mb_ereg()
(PHP 4 >= 4.2.0, PHP 5, PHP 7)
Regular expression match with multibyte support
Description
mb_ereg(string $pattern, string $string [, array &$regs]): int
Executes the regular expression match with multibyte support.
Parameters
- $pattern
The search pattern.
- $string
The search string.
- $regs
If matches are found for parenthesized substrings of $pattern and the function is called with the third argument $regs, the matches will be stored in the elements of the array $regs. If no matches are found, $regs is set to an empty array.
$regs[1] will contain the substring which starts at the first left parenthesis; $regs[2] will contain the substring starting at the second, and so on. $regs[0] will contain a copy of the complete string matched.
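The layout of $regs described above can be illustrated with a minimal sketch. It assumes the mbstring extension is loaded; the pattern and the sample string are made up for illustration.

```php
<?php
// Sample data only: the pattern and the subject string are illustrative.
mb_regex_encoding('UTF-8');

$matched = mb_ereg('([0-9]{4})-([0-9]{2})', 'released 2017-06', $regs);

if ($matched !== false) {
    echo $regs[0], "\n"; // complete match: 2017-06
    echo $regs[1], "\n"; // first parenthesized group: 2017
    echo $regs[2], "\n"; // second parenthesized group: 06
}
```

Testing the result with `!== false` rather than inspecting the integer keeps the check independent of the exact return value.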
Return Values
Returns the byte length of the matched string if a match for $pattern was found in $string, or FALSE if no matches were found or an error occurred.
If the optional parameter $regs was not passed or the length of the matched string is 0, this function returns 1.
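Because the length is counted in bytes, not characters, a multibyte match reports a larger number than its character count. The sketch below shows the difference by measuring $regs[0] directly. It assumes the source file is saved as UTF-8, and it deliberately does not rely on the integer return value, since PHP 8.0 and later return a boolean instead.

```php
<?php
// Assumes this file is saved as UTF-8; the Japanese text is sample data.
mb_regex_encoding('UTF-8');

$matched = mb_ereg('日本語', 'これは日本語です', $regs);

if ($matched !== false) {
    $bytes = strlen($regs[0]);             // length in bytes
    $chars = mb_strlen($regs[0], 'UTF-8'); // length in characters
    echo "bytes: $bytes, characters: $chars\n"; // bytes: 9, characters: 3
}
```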
Changelog
| Version | Description |
|---|---|
| 7.1.0 | mb_ereg() will now set $regs to an empty array if nothing matched. Formerly, $regs was not modified in that case. |
Notes
Note: The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
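Since the regex encoding is a global setting, one possible approach is to set it only for the duration of a match and then restore it. The helper name `ereg_with_encoding` below is hypothetical and not part of mbstring; the sketch also assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: match under a given regex encoding, then restore the previous one.
function ereg_with_encoding($pattern, $subject, $encoding)
{
    $previous = mb_regex_encoding();  // called without arguments: returns the current encoding
    mb_regex_encoding($encoding);     // switch to the requested encoding
    $found = mb_ereg($pattern, $subject) !== false;
    mb_regex_encoding($previous);     // restore the earlier setting
    return $found;
}

// Sample data: look for at least one hiragana character.
$hasKana = ereg_with_encoding('[ぁ-ん]', 'ひらがな text', 'UTF-8');
var_dump($hasKana); // bool(true)
```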
See Also
- mb_regex_encoding() - Set/Get character encoding for multibyte regex
- mb_eregi() - Regular expression match ignoring case with multibyte support
The old link to the Oniguruma regex syntax no longer works; a working copy is available at https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt
Note that mb_ereg() does not support the \uFFFF Unicode syntax; use \x{FFFF} instead:
<?php
// Keep exactly one $text line (and the matching range check below) active.
//$text = 'Peter is a boy.'; // English
$text = 'بيتر هو صبي.'; // Arabic
//$text = 'פיטר הוא ילד.'; // Hebrew
mb_regex_encoding('UTF-8');
if (mb_ereg('[\x{0600}-\x{06FF}]', $text)) // Arabic range
//if (mb_ereg('[\x{0590}-\x{05FF}]', $text)) // Hebrew range
{
    echo "Text has some Arabic/Hebrew characters.";
}
else
{
    echo "Text doesn't have Arabic/Hebrew characters.";
}
?>
I hope this information is shown somewhere on php.net. According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma", the bundled Oniguruma regex library version seems to be 4.7.1 in PHP 5.3 through 5.4.45, 5.9.2 in PHP 5.5 through 7.1.16, and 6.3.0 from PHP 7.2 onward.
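On newer builds the version can also be checked at runtime. This is a minimal sketch that assumes the `MB_ONIGURUMA_VERSION` constant is only defined on PHP 7.3 and later, so it falls back to a placeholder string elsewhere.

```php
<?php
// Assumption: MB_ONIGURUMA_VERSION exists only on newer PHP builds (7.3+),
// so guard the lookup with defined() and fall back to 'unknown'.
$onigVersion = defined('MB_ONIGURUMA_VERSION') ? MB_ONIGURUMA_VERSION : 'unknown';

echo 'PHP ', PHP_VERSION, ' / Oniguruma ', $onigVersion, "\n";
```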
<?php
// in PHP_VERSION 7.1
// WITHOUT $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('ab', '_ab_'); // [2 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('^', '_ab_'); // [0 bytes match]
var_dump($int); // int(1)
$int = mb_ereg('ab', '__'); // [not match]
var_dump($int); // bool(false)
$int = mb_ereg('', '_ab_'); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)
$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)
// Without 3rd argument, mb_ereg() returns either int(1) or bool(false).
// WITH $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_', $regs);// [5 bytes match]
var_dump($int); // int(5)
var_dump($regs); // array(1) { [0]=> string(5) "abcde" }
$int = mb_ereg('ab', '_ab_', $regs); // [2 bytes match]
var_dump($int); // int(2)
var_dump($regs); // array(1) { [0]=> string(2) "ab" }
$int = mb_ereg('^', '_ab_', $regs); // [0 bytes match]
var_dump($int); // int(1)
var_dump($regs); // array(1) { [0]=> bool(false) }
$int = mb_ereg('ab', '__', $regs); // [not match]
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
$int = mb_ereg('', '_ab_', $regs); // [error : empty pattern]
// Warning: mb_ereg(): empty pattern in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
$int = mb_ereg('ab'); // [error : fewer arguments]
// Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int); // bool(false)
var_dump($regs); // array(0) { }
// With the 3rd argument, mb_ereg() returns either int (the number of bytes matched) or bool(false).
// The resulting contents of $regs vary from case to case, as shown above.
?>
mb_ereg() seems unable to use named subpatterns.
preg_match() seems to be a substitute only in UTF-8 encoding.
<?php
$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*'; // "?P" causes "mbregex compile err" in PHP 5.3.5
if(mb_ereg($pattern, $text, $matches)){
echo '<pre>'.print_r($matches, true).'</pre>';
}else{
echo 'no match';
}
?>
This code ignores "?<name>" in $pattern and displays the following:
Array
(
[0] => multi_byte_string
[1] => string
)
Replacing the $pattern assignment and the if condition in the code above with
$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){
displays the following (in UTF-8 encoding):
Array
(
[0] => multi_byte_string
[name] => string
[1] => string
)
Although it is hardly mentioned anywhere, it may be useful to note that mb_ereg() uses the Oniguruma library internally. The syntax for the default mode (ruby) is described here: http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
Hebrew regex tested on PHP 5, Ubuntu 8.04.
It seems to work fine without the mb_regex_encoding() lines (commented out).
It did not seem to work with \uxxxx (also commented out).
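<?php
// $current_line stands in for the $this->current_line property used in the
// original class; the sample value is illustrative only.
$current_line = "שלום עולם";
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $current_line))
if(mb_ereg(".*([א-ת]).*", $current_line))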
{
echo "has";
}
else
{
echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>
