元字符
正则表达式的威力源于它可以在模式中拥有选择和重复的能力。 一些字符被赋予 特殊的涵义,使其不再单纯的代表自己,模式中的这种有特殊涵义的编码字符 称为 元字符。
共有两种不同的元字符:一种是可以在模式中方括号外任何地方使用的,另外一种 是需要在方括号内使用的。
在方括号外使用的元字符如下:
\:一般用于转义字符^:断言目标的开始位置(或在多行模式下是行首)$:断言目标的结束位置(或在多行模式下是行尾).:匹配除换行符外的任何字符(默认)[:开始字符类定义]:结束字符类定义|:开始一个可选分支(:子组的开始标记):子组的结束标记?:作为量词,表示 0 次或 1 次匹配。位于量词后面用于改变量词的贪婪特性。*:量词,0 次或多次匹配+:量词,1 次或多次匹配{:自定义量词开始标记}:自定义量词结束标记
模式中方括号内的部分称为“字符类”。 在一个字符类中仅有以下可用元字符:
\:转义字符^:仅在作为第一个字符(方括号内)时,表明字符类取反-:标记字符范围下面部分描述每个元字符的用法。
A hint for those of you who are trying to fight off (or work around at least) the problem of matching a pattern correctly at the end (or at the beginning) of any line even without the multiple lines mode (/m) or meta-character assertions ($ or ^).
<?php
// Various OS-es have various end line (a.k.a line break) chars:
// - Windows uses CR+LF (\r\n);
// - Linux LF (\n);
// - OSX CR (\r).
// And that's why single dollar meta assertion ($) sometimes fails with multiline modifier (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?) of default configuration option for meta-character assertions (^ and $) at compile time of PCRE.
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
// C 3 p 0 _
$pat3='/\w\R?$/mi'; // Somehow disappointing according to php.net and pcre.org when used improperly
$pat3_2='/\w(?=\R)/i';
// Much better with allowed lookahead assertion (just to detect without capture) without multiline (/m) mode; note that with alternative for end of string ((?=\R|$)) it would grab all 7 elements as expected, but '/(*ANYCRLF)\w$/mi' is more straightforward in use anyway
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat3_2, $str, $m4);
echo $str."\n3 !!! $pat3 ($p): ".print_r($m3[0], true) ."\n3_2 !!! $pat3_2 ($r): ".print_r($m4[0], true);
// Note the difference between the two very helpful escape sequences in $pat3 and $pat3_2 (\R) - for some applications at least.
/* The code above results in the following output:
ABC ABC
123 123
def def
nop nop
890 890
QRS QRS
~-_ ~-_
3 !!! /\w\R?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
3_2 !!! /\w(?=\R)/i (6): Array
(
[0] => C
[1] => 3
[2] => f
[3] => p
[4] => 0
[5] => S
)
*/
?>
Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17
