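My take: regular expressions simply aren't written for humans to read, let alone maintain. Touch one and it breaks. So why not do what an ORM does and wrap each piece in a method, then build the regular expression out of method calls?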
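To make the idea concrete, here is a minimal sketch of what "one method per regex piece" could look like. It is not this library's implementation: `MiniRegexBuilder`, `miniRegex`, and `digits` are hypothetical names made up for illustration, and only `Regex`, `Regex.escape`, and `matches` come from the Kotlin standard library.

```kotlin
// Hypothetical mini-builder for illustration only; these names are NOT this library's API.
class MiniRegexBuilder {
    private val pattern = StringBuilder()

    // Match the given text literally; Regex.escape quotes any metacharacters in it.
    fun literal(text: String) {
        pattern.append(Regex.escape(text))
    }

    // Match exactly `count` decimal digits, e.g. digits(4) appends \d{4}.
    fun digits(count: Int) {
        pattern.append("\\d{").append(count).append('}')
    }

    fun build(): Regex = Regex(pattern.toString())
}

// Entry point: run the block against a fresh builder and compile the result.
fun miniRegex(block: MiniRegexBuilder.() -> Unit): Regex =
    MiniRegexBuilder().apply(block).build()

fun main() {
    val date = miniRegex {
        digits(4)
        literal("-")
        digits(2)
        literal("-")
        digits(2)
    }
    println(date.matches("2023-01-30")) // true
    println(date.matches("2023/01/30")) // false
}
```

The lambda-with-receiver (`MiniRegexBuilder.() -> Unit`) is what lets each call read like a statement inside the braces, the same shape as the `buildRegex { ... }` blocks in this README.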
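Disclaimer: I've only learned basic regular expressions and have never had to use them heavily. If you're going to flame me, please open an ISSUE while you're at it.
"[\w.-]+@[\w.-]+\.\w{2,}"
Sample strings:
- [email protected] (matches)
- [email protected] (matches)
- [email protected] (not fully matched, because multi-part top-level domains such as .co.uk are not taken into account)
Build this regular expression: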
val regex = buildRegex {
    onemore(
        enumerate('A'..'Z')
            .concat('a'..'z')
            .concat(0..9)
            .concat('.')
            .concat('_')
            .concat('+')
            .concat('-')
    )
    literal("@")
    onemore(
        enumerate('A'..'Z')
            .concat('a'..'z')
            .concat(0..9)
            .concat('.')
            .concat('-')
    )
    escape('.')
    repeat(enumerate('A'..'Z').concat('a'..'z'), 2)
}
Check what it matches:
assertTrue(regex.matches("[email protected]"))
assertTrue(regex.matches("[email protected]"))
assertFalse(regex.matches("invalid-email@com"))
By the way, here is the regular expression it generates 🤣. Now you can see what a pain this stuff is:
[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._+-]([ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._+-])+@[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.-]([ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.-])+\.[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]([ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]{2,})
"\+?[\d- ]{7,}"
- +1-800-123-4567 (matches)
- 0800123456 (matches)
- 123-456-7890 (matches)
- 123-abc-defg (no match, because it contains non-digit characters)
val regex = buildRegex {
    maybe(escape('+'))
    repeat(
        enumerate(0..9).concat(escape('-')).concat(' '),
        minCount = 7
    )
}
assertTrue(regex.matches("+1-800-123-4567"))
assertTrue(regex.matches("0800123456"))
assertTrue(regex.matches("123-456-7890"))
assertFalse(regex.matches("123-abc-defg"))
"\d{4}-\d{2}-\d{2}"
- 2023-01-30 (matches)
- 1999-12-31 (matches)
- 2023/01/30 (no match, because the separator is not -)
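One thing worth keeping in mind: this pattern only checks the shape of the string, so something like 2023-02-30 passes too. The sketch below is not part of this library. It uses a plain Kotlin `Regex` plus `java.time.LocalDate` (so it assumes a JVM target), and `dateShape` and `isRealDate` are hypothetical names.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeParseException

// Shape check only: four digits, dash, two digits, dash, two digits.
val dateShape = Regex("""\d{4}-\d{2}-\d{2}""")

// Hypothetical helper: shape check first, then let java.time reject impossible dates.
fun isRealDate(text: String): Boolean {
    if (!dateShape.matches(text)) return false
    return try {
        LocalDate.parse(text) // ISO-8601 yyyy-MM-dd, strict about day and month ranges
        true
    } catch (e: DateTimeParseException) {
        false
    }
}

fun main() {
    println(dateShape.matches("2023-02-30")) // true: the shape is fine
    println(isRealDate("2023-02-30"))        // false: February has no 30th
    println(isRealDate("2023-01-30"))        // true
}
```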
val regex = buildRegex {
    fixedRepeat(digit(), 4)
    escape('-')
    fixedRepeat(digit(), 2)
    escape('-')
    fixedRepeat(digit(), 2)
}
assertTrue(regex.matches("2023-01-30"))
assertTrue(regex.matches("1999-12-31"))
assertFalse(regex.matches("2023/01/30"))
"http(s)?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(/[^ \t\n\r\f\v]*)?"
- http://example.com (matches)
- https://www.subdomain.example.org/path/to/resource (matches)
- ftp://example.com (no match, because the protocol is neither http nor https)
val regex = buildRegex {
    literal("http")
    maybe(literal("s"))
    literal("://")
    onemore(enumerate(word()).concat('.').concat('-'))
    escape('.')
    repeat(
        enumerate(letter()).concat(capitalLetter()),
        minCount = 2
    )
    maybe(
        group(
            literal("/"),
            anycount(exclude(space()))
        )
    )
}
assertTrue(regex.matches("http://example.com"))
assertTrue(regex.matches("https://www.subdomain.example.org/path/to/resource"))
assertFalse(regex.matches("ftp://example.com"))
"<(\w+)[^>]*>(.*?)</\1>"
This pattern matches an HTML element that has both an opening and a closing tag, together with its content. Note that it does not handle self-closing tags.
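val regex = buildRegex {
    literal("<")
    group(onemore(word()))
    anycount(exclude('>'))
    literal(">")
    group(anycount(any()).lazyMatch())
    literal("<")
    escape('/')
    // Note: this is another capture group that accepts any tag name,
    // not a back-reference (\1) to the first group as in the pattern above.
    group(onemore(word()))
    literal(">")
}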
assertTrue(regex.matches("<div>Hello World</div>"))
assertTrue(regex.matches("<p>This is a paragraph.</p>"))
assertFalse(regex.matches("<img src=\"image.jpg\" />"))
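For comparison, here is a small sketch of what the `\1` back-reference in the hand-written pattern buys you. It uses a plain Kotlin `Regex` and none of this library's builders. `pairedTag` is just a name chosen for the example.

```kotlin
// The hand-written pattern: \1 must repeat whatever the first group (the tag name) captured.
val pairedTag = Regex("""<(\w+)[^>]*>(.*?)</\1>""")

fun main() {
    println(pairedTag.matches("<div>Hello World</div>"))      // true
    println(pairedTag.matches("<div>Hello World</p>"))        // false: closing tag differs
    println(pairedTag.matches("<img src=\"image.jpg\" />"))   // false: no closing tag

    // Group 1 is the tag name, group 2 is the element content.
    val m = pairedTag.matchEntire("<p>This is a paragraph.</p>")
    println(m?.groupValues?.get(1)) // p
    println(m?.groupValues?.get(2)) // This is a paragraph.
}
```

A pattern that matches the closing tag name with a second `\w+` group instead of `\1` would accept the mismatched `<div>...</p>` case as well.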
- Enumeration of escape characters
- More extended regular-expression syntax
- Take cues from modern regular-expression syntax