Meta-characters

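    The power of regular expressions comes from the ability to include alternatives and repetitions in a pattern. Some characters are given a special meaning so that they no longer simply stand for themselves; these specially interpreted characters in a pattern are called meta-characters.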

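    There are two different sets of meta-characters: those recognized anywhere in a pattern outside square brackets, and those recognized inside square brackets.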

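    The meta-characters recognized outside square brackets are as follows:

    • \: general escape character with several uses
    • ^: asserts the start of the subject (or of a line, in multiline mode)
    • $: asserts the end of the subject or the position before a terminating newline (or the end of a line, in multiline mode)
    • .: matches any character except newline (by default)
    • [: starts a character class definition
    • ]: ends a character class definition
    • |: starts an alternative branch
    • (: starts a subpattern
    • ): ends a subpattern
    • ?: as a quantifier, matches 0 or 1 times; placed after another quantifier, makes it lazy instead of greedy; also extends the meaning of (
    • *: quantifier, 0 or more matches
    • +: quantifier, 1 or more matches
    • {: starts a min/max quantifier
    • }: ends a min/max quantifier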
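As a rough illustration of the list above, the following sketch exercises several of these meta-characters with `preg_match()` and `preg_match_all()`. The subject strings and variable names are made up for this example and do not come from the manual text.

```php
<?php
// Illustrative subject strings and variable names; none of them come from the manual text.
$lines = "cat\ndog\ncow";

// Without /m, ^ and $ assert only at the start and end of the whole subject,
// and \w+ cannot cross the embedded newlines, so nothing matches.
$wholeCount = preg_match_all('/^\w+$/', $lines, $whole);

// With /m, ^ and $ also assert at the start and end of every line.
$lineCount = preg_match_all('/^\w+$/m', $lines, $perLine);

// ( ) delimit a subpattern and | separates its alternative branches.
preg_match_all('/^c(at|ow)$/m', $lines, $branches);

// . matches any character except newline by default, so it cannot bridge "cat\ndog".
$dotCount = preg_match('/cat.dog/', $lines);

// + is greedy; a ? placed after it makes it lazy.
preg_match('/<.+>/', '<a><b>', $greedy);
preg_match('/<.+?>/', '<a><b>', $lazy);

// { } give explicit repetition bounds: here, two or three digits.
preg_match('/\d{2,3}/', '12345', $digits);

// \ removes the special meaning of the next character: \. is a literal dot.
$dots = preg_match_all('/\./', 'a.b.c', $literalDots);

echo implode(',', $perLine[0]), "\n";   // cat,dog,cow
echo $greedy[0], ' ', $lazy[0], "\n";   // <a><b> <a>
echo $digits[0], "\n";                  // 123
```

The patterns are written in single-quoted strings so that PHP passes the backslashes through to PCRE unchanged.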

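    The part of a pattern enclosed in square brackets is called a "character class". Within a character class, the only meta-characters are:

    • \: general escape character
    • ^: negates the class, but only if it is the first character inside the brackets
    • -: indicates a character range

    The following sections describe the use of each meta-character.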
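The character class rules can be sketched in the same way. This is a minimal example under assumed inputs: the subject strings and variable names are invented for illustration only.

```php
<?php
// Illustrative inputs; the strings and variable names are assumptions for this sketch.

// A class matches any single character it lists.
preg_match_all('/[aeiou]/', 'regex rules', $vowels);

// ^ as the first character negates the class: anything that is not a digit.
preg_match_all('/[^0-9]/', 'a1b2', $nonDigits);

// - between two characters denotes a range.
preg_match_all('/[a-c]/', 'abcdef', $range);

// ^ that is not first, and - placed last, are ordinary literal characters.
preg_match_all('/[a^-]/', 'a^b-c', $literals);

// \ escapes characters that would otherwise be special inside the class.
preg_match_all('/[\]\-]/', 'x]y-z', $escaped);

// Meta-characters from outside the brackets lose their meaning inside them.
preg_match_all('/[.*+]/', '1+1=2.0', $plain);

echo implode(' ', $vowels[0]), "\n";     // e e u e
echo implode(' ', $nonDigits[0]), "\n";  // a b
echo implode(' ', $range[0]), "\n";      // a b c
echo implode(' ', $literals[0]), "\n";   // a ^ -
echo implode(' ', $escaped[0]), "\n";    // ] -
echo implode(' ', $plain[0]), "\n";      // + .
```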

    A hint for those of you trying to work around the problem of matching a pattern correctly at the end (or at the beginning) of every line, whatever line-break style the text uses, with or without multiline mode (/m) and the meta-character assertions ($ or ^).
    <?php
    // Operating systems use different line-break (end-of-line) sequences:
    // - Windows uses CR+LF (\r\n);
    // - Linux and modern macOS use LF (\n);
    // - classic Mac OS (before OS X) used CR (\r).
    // PCRE's newline convention is chosen when the library is compiled and typically defaults to LF,
    // so in multiline mode (/m) the dollar assertion ($) recognizes only \n as a line end.
    // That is why $ sometimes seems to fail with /m: it looks like a bug (seen here on PHP 5.3.8),
    // but it is most likely just this compile-time default for the ^ and $ assertions.
    $str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
    //          C          3                   p          0                   _
    
    $pat3='/\w\R?$/mi';    // Disappointing: it still depends on $ in /m mode, so lines ending in a bare \r are missed
    $pat3_2='/\w(?=\R)/i';

    // Much better: the lookahead assertion detects the line break without capturing it and needs no multiline (/m) mode.
    // With an alternative for the end of the string, (?=\R|$), it would grab all 7 elements as expected,
    // although '/(*ANYCRLF)\w$/mi' is more straightforward to use anyway.
    
    $p=preg_match_all($pat3, $str, $m3);
    $r=preg_match_all($pat3_2, $str, $m4);
    
    echo $str."\n3 !!! $pat3 ($p): ".print_r($m3[0], true) ."\n3_2 !!! $pat3_2 ($r): ".print_r($m4[0], true);
    
    // Note how differently the same helpful escape sequence (\R) behaves in $pat3 and $pat3_2; this matters for some applications at least.
    
    /* The code above results in the following output:
    ABC ABC
    
    123 123
    def def
    nop nop
    890 890
    QRS QRS
    
    ~-_ ~-_
    3 !!! /\w\R?$/mi (5): Array
    (
        [0] => C
    
        [1] => 3
        [2] => p
        [3] => 0
        [4] => _
    )
    
    3_2 !!! /\w(?=\R)/i (6): Array
    (
        [0] => C
        [1] => 3
        [2] => f
        [3] => p
        [4] => 0
        [5] => S
    )
    */
    ?>
    Unfortunately, I have no access to a server with the latest PHP version: my local PHP is 5.3.8 and my public host runs PHP 5.2.17.
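The two alternatives mentioned in the comments of the note, the lookahead with an end-of-string branch and the `(*ANYCRLF)` prefix, can be tried on the same subject string. This is a sketch rather than part of the original note: the variable names are illustrative, and it assumes a PCRE build that supports the `(*ANYCRLF)` newline prefix.

```php
<?php
// Same subject as in the note: seven lines with mixed \n, \r\n and \r line breaks.
$str = "ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";

// Lookahead with an end-of-subject alternative; no /m modifier is needed.
// \R matches any line-break sequence and the lookahead does not consume it.
$lookaheadCount = preg_match_all('/\w(?=\R|$)/', $str, $viaLookahead);

// (*ANYCRLF) at the very start of the pattern makes PCRE treat CR, LF and CRLF
// as newlines, so $ in /m mode asserts before each of them.
$anycrlfCount = preg_match_all('/(*ANYCRLF)\w$/m', $str, $viaAnycrlf);

echo $lookaheadCount, ': ', implode(' ', $viaLookahead[0]), "\n"; // 7: C 3 f p 0 S _
echo $anycrlfCount, ': ', implode(' ', $viaAnycrlf[0]), "\n";     // 7: C 3 f p 0 S _
```

Both patterns return only the word character itself, since the line break is either inside a lookahead or matched by a zero-width assertion.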