mb_ereg()

(PHP 4 >= 4.2.0, PHP 5, PHP 7)

Regular expression match with multibyte support

Description

mb_ereg(string $pattern, string $string[, array &$regs]): int

Executes the regular expression match with multibyte support.

Parameters

$pattern

The search pattern.

$string

The search string.

$regs

If matches are found for parenthesized substrings of $pattern and the function is called with the third argument $regs, the matches will be stored in the elements of the array $regs. If no matches are found, $regs is set to an empty array.

$regs[1] will contain the substring which starts at the first left parenthesis; $regs[2] will contain the substring starting at the second, and so on. $regs[0] will contain a copy of the complete string matched.
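
The indexing of $regs described above can be illustrated with a minimal sketch. The pattern and the sample string are assumptions made up for this example; only the truthiness of the return value is checked, since its exact value depends on the PHP version.

```php
<?php
// Sample pattern and subject are illustrative assumptions.
mb_regex_encoding('UTF-8');

$regs = [];
$matched = mb_ereg('(\d{4})-(\d{2})', 'Released 2018-12', $regs);

if ($matched !== false) {
    echo $regs[0], "\n"; // complete match: 2018-12
    echo $regs[1], "\n"; // first parenthesized group: 2018
    echo $regs[2], "\n"; // second parenthesized group: 12
}
```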

Return Values

Returns the byte length of the matched string if a match for $pattern was found in $string, or FALSE if no matches were found or an error occurred.

If the optional parameter $regs was not passed or the length of the matched string is 0, this function returns 1.
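
Because a successful call may return 1 or a byte length, callers that only need a yes/no answer may find it simpler to compare against FALSE. The sketch below wraps that comparison; ereg_found() is a hypothetical helper name, not part of mbstring, and the guard for an empty pattern is an assumption about what a caller would want.

```php
<?php
// ereg_found() is a hypothetical helper, not an mbstring function.
function ereg_found(string $pattern, string $subject, ?array &$regs = null): bool
{
    $regs = [];
    if ($pattern === '') {
        // An empty pattern is an error for mb_ereg(); report "no match" instead.
        return false;
    }
    // Any non-FALSE result (1 or a byte length) counts as a match.
    return mb_ereg($pattern, $subject, $regs) !== false;
}

var_dump(ereg_found('ab', '_ab_')); // bool(true)
var_dump(ereg_found('^', '_ab_'));  // bool(true), zero-length match
var_dump(ereg_found('ab', '__'));   // bool(false)
var_dump(ereg_found('', '_ab_'));   // bool(false), empty pattern guarded
```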

Changelog

Version	Description
7.1.0	mb_ereg() will now set `$regs` to an empty array if nothing matched. Formerly, `$regs` was not modified in that case.

Notes

Note:
The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
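
As a minimal sketch of the note above, the following sets the regex encoding explicitly before matching and restores the previous value afterwards. It assumes the source file itself is saved as UTF-8; the sample string is illustrative.

```php
<?php
// Assumes this file is saved as UTF-8.
$previous = mb_regex_encoding();   // remember the current regex encoding
mb_regex_encoding('UTF-8');

// Under UTF-8, "." matches one character, so three characters
// (nine bytes) satisfy ^.{3}$.
$threeChars = mb_ereg('^.{3}$', '日本語') !== false;

mb_regex_encoding($previous);      // restore the earlier setting
var_dump($threeChars);             // bool(true)
```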

See Also

mb_regex_encoding() - Set/Get character encoding for multibyte regex
mb_eregi() - Regular expression match ignoring case with multibyte support

The old link to the Oniguruma regex syntax no longer works; here is a working one:
https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt

Note that mb_ereg() does not support the \uFFFF Unicode syntax; use \x{FFFF} instead:
<?php
//$text = 'Peter is a boy.'; // English
$text = 'بيتر هو صبي.'; // Arabic
//$text = 'פיטר הוא ילד.'; // Hebrew
mb_regex_encoding('UTF-8');
if (mb_ereg('[\x{0600}-\x{06FF}]', $text)) // Arabic range
//if (mb_ereg('[\x{0590}-\x{05FF}]', $text)) // Hebrew range
{
  echo "Text has some Arabic/Hebrew characters.";
}
else
{
  echo "Text doesn't have Arabic/Hebrew characters.";
}
?>
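
For patterns that were written with \uXXXX escapes, one possible approach is to rewrite them into the \x{XXXX} form before calling mb_ereg(). The helper below is a hypothetical sketch, not part of mbstring; it assumes exactly four hex digits per escape and does not try to detect an escaped backslash in front of the "u".

```php
<?php
// u_to_x_escapes() is a hypothetical helper, not part of mbstring.
function u_to_x_escapes(string $pattern): string
{
    // Rewrite each \uXXXX (exactly four hex digits) as \x{XXXX}.
    return preg_replace_callback(
        '/\\\\u([0-9A-Fa-f]{4})/',
        function (array $m): string {
            return '\x{' . $m[1] . '}';
        },
        $pattern
    );
}

mb_regex_encoding('UTF-8');
$pattern = u_to_x_escapes('[\u0590-\u05FF]'); // becomes [\x{0590}-\x{05FF}]
var_dump(mb_ereg($pattern, 'פיטר הוא ילד.') !== false); // bool(true)
```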

I hope this information is shown somewhere on php.net.
According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma",
the bundled Oniguruma regex library version seems to be:
 4.7.1 in PHP 5.3 - 5.4.45,
 5.9.2 in PHP 5.5 - 7.1.16,
 6.3.0 since PHP 7.2.

<?php
// in PHP_VERSION 7.1
// WITHOUT $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
var_dump($int);           // int(1)
$int = mb_ereg('ab', '_ab_');    // [2 bytes match]
var_dump($int);           // int(1)
$int = mb_ereg('^', '_ab_');    // [0 bytes match]
var_dump($int);           // int(1)
$int = mb_ereg('ab', '__');     // [not match]
var_dump($int);           // bool(false)
$int = mb_ereg('', '_ab_');     // [error : empty pattern]
                  // Warning: mb_ereg(): empty pattern in ...
var_dump($int);           // bool(false)
$int = mb_ereg('ab');        // [error : fewer arguments]
                  // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int);           // bool(false)
// Summary: without the 3rd argument, mb_ereg() returns either int(1) or bool(false).
// WITH $regs (3rd argument)
$int = mb_ereg('abcde', '_abcde_', $regs);// [5 bytes match]
var_dump($int);              // int(5)
var_dump($regs);             // array(1) { [0]=> string(5) "abcde" }
$int = mb_ereg('ab', '_ab_', $regs);   // [2 bytes match]
var_dump($int);              // int(2)
var_dump($regs);             // array(1) { [0]=> string(2) "ab" }
$int = mb_ereg('^', '_ab_', $regs);    // [0 bytes match]
var_dump($int);              // int(1)
var_dump($regs);             // array(1) { [0]=> bool(false) }
$int = mb_ereg('ab', '__', $regs);    // [not match]
var_dump($int);              // bool(false)
var_dump($regs);             // array(0) { }
$int = mb_ereg('', '_ab_', $regs);    // [error : empty pattern]
                     // Warning: mb_ereg(): empty pattern in ...
var_dump($int);              // bool(false)
var_dump($regs);             // array(0) { }
$int = mb_ereg('ab');           // [error : fewer arguments]
                     // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
var_dump($int);              // bool(false)
var_dump($regs);             // array(0) { }
// Summary: with the 3rd argument, mb_ereg() returns either int(how many bytes matched) or bool(false),
// and the contents of the 3rd argument are a bit complicated.
?>

mb_ereg() seems unable to use "named subpatterns".
preg_match() seems to be a substitute only for UTF-8 encoded text.
<?php
$text = 'multi_byte_string';
$pattern = '.*(?<name>string).*';    // "?P" causes "mbregex compile err" in PHP 5.3.5
if(mb_ereg($pattern, $text, $matches)){
  echo '<pre>'.print_r($matches, true).'</pre>';
}else{
  echo 'no match';
}
?>
This code ignores "?<name>" in $pattern and displays the following:
Array
(
  [0] => multi_byte_string
  [1] => string
)
Using the following two lines instead of lines 2 and 3 of the code above:
$pattern = '/.*(?<name>string).*/u';
if(preg_match($pattern, $text, $matches)){
displays the following (in UTF-8 encoding):
Array
(
  [0] => multi_byte_string
  [name] => string
  [1] => string
)

While hardly mentioned anywhere, it may be useful to note that mb_ereg() uses the Oniguruma library internally. The syntax for the default mode (ruby) is described here:
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

Hebrew regex tested on PHP 5, Ubuntu 8.04.
It seems to work fine without the mb_regex_encoding lines (commented out).
It didn't seem to work with \uxxxx (also commented out).
<?php
// Sample input for illustration; the original snippet read $this->current_line inside a class.
$current_line = 'שלום world';
echo "Line ";
//mb_regex_encoding("ISO-8859-8");
//if(mb_ereg(".*([\u05d0-\u05ea]).*", $current_line))
if(mb_ereg(".*([א-ת]).*", $current_line))
{
  echo "has";
}
else
{
  echo "doesn't have";
}
echo " Hebrew characters.<br>";
//mb_regex_encoding("UTF-8");
?>