mb_ereg()
(PHP 4 >= 4.2.0, PHP 5, PHP 7)
Regular expression match with multibyte support
Description
mb_ereg(string $pattern, string $string[, array &$regs]): int
Executes the regular expression match with multibyte support.
Parameters
- $pattern
The search pattern.
- $string
The search string.
- $regs
If matches are found for parenthesized substrings of $pattern and the function is called with the third argument $regs, the matches will be stored in the elements of the array $regs. If no matches are found, $regs is set to an empty array.
$regs[1] will contain the substring which starts at the first left parenthesis; $regs[2] will contain the substring starting at the second, and so on. $regs[0] will contain a copy of the complete string matched.
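The way $regs is filled can be illustrated with a short sketch. The pattern and the sample sentence are made up for illustration, and the result is compared against FALSE rather than used as a length, since the exact return value depends on the PHP version.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// The date pattern and the subject string are illustrative, not from the manual.
mb_regex_encoding('UTF-8');

$regs = array();
$matched = mb_ereg('([0-9]{4})-([0-9]{2})-([0-9]{2})', 'Released on 2017-12-01.', $regs);

if ($matched !== false) {
    echo $regs[0], "\n"; // complete match: 2017-12-01
    echo $regs[1], "\n"; // first parenthesized group: 2017
    echo $regs[2], "\n"; // second parenthesized group: 12
    echo $regs[3], "\n"; // third parenthesized group: 01
}
```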
Return Values
Returns the byte length of the matched string if a match for $pattern was found in $string, or FALSE if no matches were found or an error occurred.
If the optional parameter $regs was not passed or the length of the matched string is 0, this function returns 1.
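Because the return value is 1 both for an empty match and whenever $regs is omitted, it may be safer to test it strictly against FALSE and to measure the match through $regs[0]. The following is a minimal sketch of that approach; it assumes the script file is saved as UTF-8, and the Japanese sample text is only an illustration.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded and this file is UTF-8 encoded.
mb_regex_encoding('UTF-8');

$regs = array();
$result = mb_ereg('日本語', 'これは日本語です', $regs);

// Compare strictly with false; do not rely on the integer as a length.
if ($result !== false) {
    echo strlen($regs[0]), "\n";             // length in bytes: 9
    echo mb_strlen($regs[0], 'UTF-8'), "\n"; // length in characters: 3
}
```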
Changelog
Version | Description |
---|---|
7.1.0 | mb_ereg() will now set $regs to an empty array if nothing matched. Formerly, $regs was not modified in that case. |
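Code that has to run on both sides of this change could avoid depending on what happens to $regs after a failed match. One possible approach, sketched here with a hypothetical helper named mb_ereg_groups() (not part of mbstring), is to start from a fresh array and ignore it unless the call succeeded.

```php
<?php
// Sketch only: mb_ereg_groups() is a hypothetical helper, not an mbstring function.
// It returns the captured groups, or an empty array when nothing matched,
// regardless of whether mb_ereg() itself resets $regs on failure.
function mb_ereg_groups($pattern, $subject)
{
    $regs = array(); // always start from a fresh array
    $result = mb_ereg($pattern, $subject, $regs);

    return $result === false ? array() : $regs;
}

var_dump(mb_ereg_groups('(b)', 'abc')); // two elements: the full match and group 1
var_dump(mb_ereg_groups('xyz', 'abc')); // empty array
```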
Notes
Note: The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
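In practice this means the regex encoding can be set explicitly before matching and restored afterwards. The sketch below assumes the script file itself is saved as UTF-8; the sample character is only an illustration.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded and this file is UTF-8 encoded.
$previous = mb_regex_encoding();   // without arguments, returns the current regex encoding

mb_regex_encoding('UTF-8');        // patterns and subjects are now treated as UTF-8

// "é" is two bytes in UTF-8 but a single character, so "^.$" matches it.
$isSingleChar = mb_ereg('^.$', 'é') !== false;
var_dump($isSingleChar);           // bool(true)

mb_regex_encoding($previous);      // restore the earlier setting
```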
See Also
- mb_regex_encoding() - Set/Get character encoding for multibyte regex
- mb_eregi() - Regular expression match ignoring case with multibyte support
The old link to the Oniguruma regex syntax no longer works. A working one is: https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt
Note that mb_ereg() does not support the \uFFFF Unicode syntax but uses \x{FFFF} instead:
<?php
$text = 'Peter is a boy.';      // English
$text = 'بيتر هو صبي.';         // Arabic
//$text = 'פיטר הוא ילד.';      // Hebrew

mb_regex_encoding('UTF-8');

if(mb_ereg('[\x{0600}-\x{06FF}]', $text))   // Arabic range
//if(mb_ereg('[\x{0590}-\x{05FF}]', $text)) // Hebrew range
{
    echo "Text has some arabic/hebrew characters.";
}
else
{
    echo "Text doesn't have arabic/hebrew characters.";
}
?>
I hope this information is shown somewhere on php.net. According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma", the bundled Oniguruma regex library version seems to be:
- 4.7.1 in PHP 5.3 - 5.4.45
- 5.9.2 in PHP 5.5 - 7.1.16
- 6.3.0 since PHP 7.2
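On newer builds the version may also be readable at runtime. The sketch below relies on the MB_ONIGURUMA_VERSION constant, which is assumed here to be available only from PHP 7.4 onward, so it checks for the constant before reading it.

```php
<?php
// Sketch only: MB_ONIGURUMA_VERSION is assumed to exist on PHP 7.4 and later builds
// with mbstring regex support; older builds fall back to a placeholder string.
$onigVersion = defined('MB_ONIGURUMA_VERSION')
    ? constant('MB_ONIGURUMA_VERSION')
    : 'unknown (constant not available on this PHP build)';

echo 'PHP ', PHP_VERSION, ' / Oniguruma ', $onigVersion, "\n";
```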
<?php
// in PHP_VERSION 7.1

// WITHOUT $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
var_dump($int);                     // int(1)

$int = mb_ereg('ab', '_ab_');       // [2 bytes match]
var_dump($int);                     // int(1)

$int = mb_ereg('^', '_ab_');        // [0 bytes match]
var_dump($int);                     // int(1)

$int = mb_ereg('ab', '__');         // [not match]
var_dump($int);                     // bool(false)

$int = mb_ereg('', '_ab_');         // [error : empty pattern]
                                    // Warning: mb_ereg(): empty pattern in ...
var_dump($int);                     // bool(false)

$int = mb_ereg('ab');               // [error : fewer arguments]
                                    // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int);                     // bool(false)

// Without 3rd argument, mb_ereg() returns either int(1) or bool(false).

// WITH $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_', $regs); // [5 bytes match]
var_dump($int);                            // int(5)
var_dump($regs);                           // array(1) { [0]=> string(5) "abcde" }

$int = mb_ereg('ab', '_ab_', $regs);       // [2 bytes match]
var_dump($int);                            // int(2)
var_dump($regs);                           // array(1) { [0]=> string(2) "ab" }

$int = mb_ereg('^', '_ab_', $regs);        // [0 bytes match]
var_dump($int);                            // int(1)
var_dump($regs);                           // array(1) { [0]=> bool(false) }

$int = mb_ereg('ab', '__', $regs);         // [not match]
var_dump($int);                            // bool(false)
var_dump($regs);                           // array(0) { }

$int = mb_ereg('', '_ab_', $regs);         // [error : empty pattern]
                                           // Warning: mb_ereg(): empty pattern in ...
var_dump($int);                            // bool(false)
var_dump($regs);                           // array(0) { }

$int = mb_ereg('ab');                      // [error : fewer arguments]
                                           // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int);                            // bool(false)
var_dump($regs);                           // array(0) { }

// With 3rd argument, mb_ereg() returns either int(how many bytes matched) or bool(false)
// and 3rd argument is a bit complicated.
?>
mb_ereg() seems unable to use named subpatterns. preg_match() seems to be a substitute, but only in UTF-8 encoding.
<?php
$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*'; // "?P" causes "mbregex compile err" in PHP 5.3.5
if(mb_ereg($pattern, $text, $matches)){
    echo '<pre>'.print_r($matches, true).'</pre>';
}else{
    echo 'no match';
}
?>
This code ignores "?<name>" in $pattern and displays the following:
Array
(
    [0] => multi_byte_string
    [1] => string
)
Using
$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){
instead of the $pattern and if(mb_ereg(...)) lines displays the following (in UTF-8 encoding):
Array
(
    [0] => multi_byte_string
    [name] => string
    [1] => string
)
While it is hardly mentioned anywhere, it may be useful to note that mb_ereg() uses the Oniguruma library internally. The syntax for the default mode (Ruby) is described here: http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
Hebrew regex tested on PHP 5, Ubuntu 8.04. It seems to work fine without the mb_regex_encoding lines (commented out). It didn't seem to work with \uxxxx (also commented out).
<?php
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $this->current_line))
if(mb_ereg(".*([א-ת]).*", $this->current_line))
{
    echo "has";
}
else
{
    echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>