正则表达式简介

正则表达式的常规语法，python 和 scala 的基本实现

常用符号

句点 . ：通配符，用于匹配一个字符
[] ：限定匹配
() ：分组符，它会将 () 中匹配的内容保存起来，可以对其进行访问
| ：或匹配
$ ：匹配行结束符，即以什么结尾行
^ ：匹配行开始符，即以什么开始行
* ：匹配 0 至多个字符
+ ：匹配 1 次或多次
? ：匹配 0 次或 1 次
{n} ：匹配 n 次
{n,} ：至少匹配 n 次
{n,m} ：至少匹配 n 次，最多匹配 m 次
\ ：转义符

限定匹配 []

[a-z] ：条件限制在小写 a to z 范围中一个字符
[A-Z] ：条件限制在大写 A to Z 范围中一个字符
[a-zA-Z] ：条件限制在小写 a to z 或大写 A to Z 范围中一个字符
[0-9] ：条件限制在小写 0 to 9 范围中一个字符
[0-9a-z] ：条件限制在小写 0 to 9 或 a to z 范围中一个字符
[0-9[a-z]] ：条件限制在小写 0 to 9 或 a to z 范围中一个字符(交集)
^ 符号在限定匹配 [] 中有另外的含义，即取反操作
- [^a-z] ：限制在非小写 a to z 范围中一个字符
- [^A-Z] ：件限制在非大写 A to Z 范围中一个字符
- [^a-zA-Z] ：件限制在非小写 a to z 或大写 A to Z 范围中一个字符
- [^0-9] ：限制在非小写 0 to 9 范围中一个字符
- [^0-9a-z] ：限制在非小写 0 to 9 或 a to z 范围中一个字符

转义字符

\ ：反斜杠
\t ：间隔 ('/u0009')
\n ：换行 ('/u000A')
\r ：回车 ('/u000D')
\d ：数字 [0-9]
\D ：非数字 [^0-9]
\s ：空白符号 [/t/n/x0B/f/r]
\S ：非空白符号 [^/t/n/x0B/f/r]
\w ：单独字符 [a-zA-Z_0-9]
\W ：非单独字符 [^a-zA-Z_0-9]
\f ：换页符
\e ：Escape
\b ：一个单词的边界
\B ：一个非单词的边界
\G ：前一个匹配的结束

简单示例

scala

object RegexMatch {
  def main(args: Array[String]): Unit = {
    matchMail("old.echo@gamil.com")
    matchURL("http://oldecho.github.io")
    matchPhoneNumber("813740879427dfkh8613740879427")
    matchIP("127.0.0.1")
  }

  def matchMail(mail: String): Unit = {
    val sparkRegex = "^[\\w-]+(\\.[\\w-]+)*@[\\w]+(\\.[\\w-]+)+$".r
    println(sparkRegex.findAllIn(mail).getClass)
    println(sparkRegex.findAllIn(mail).getClass.getSimpleName)
    for(matchString <- sparkRegex.findAllIn(mail))
      println(matchString)
  }

  def matchURL(url: String): Unit = {
    val sparkRegex="^[a-zA-Z]+://(\\w+(-\\w+)*)(\\.(\\w+(-\\w+)*))*(\\?\\s*)?$".r
    for(matchString <- sparkRegex.findAllIn(url))
      println(matchString)
  }


  def matchPhoneNumber(pnum: String): Unit = {
    val sparkRegex="(86)*0*13\\d{9}".r
    for(matchString <- sparkRegex.findAllIn(pnum))
      println(matchString)
  }

  def matchIP(ip: String): Unit = {
    val sparkRegex="(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)".r
    for(matchString <- sparkRegex.findAllIn(ip))
      println(matchString)
  }

}

// 输出：
// class scala.util.matching.Regex$MatchIterator
// MatchIterator
// oldechochuan@gamil.com
// http://oldecho.github.io
// 13740879427
// 8613740879427
// 127.0.0.1

python 3

# -*- coding: utf-8 -*-

import re


def match_mail(mail):
    regex = re.compile('^[\w-]+(\.[\w-]+)*@[\w]+(\.[\w-]+)+$', re.S)
    mail_in = regex.search(mail)
    print(type(mail_in))
    if mail_in is not None:
        print(mail_in.group(0))

        
if __name__ == '__main__':
    match_mail('old.echo@gamil.com')

# 输出：
# <class '_sre.SRE_Match'>
# old.echo@gamil.com