mb_ereg()

    (PHP 4 >= 4.2.0, PHP 5, PHP 7)

    Regular expression match with multibyte support

    Description

    mb_ereg(string $pattern, string $string [, array &$regs]): int

    Executes the regular expression match with multibyte support.

    Parameters

    $pattern

    The search pattern.

    $string

    The search string.

    $regs

    If matches are found for parenthesized substrings of $pattern and the function is called with the third argument $regs, the matches will be stored in the elements of the array $regs. If no matches are found, $regs is set to an empty array.

    $regs[1] will contain the substring which starts at the first left parenthesis; $regs[2] will contain the substring starting at the second, and so on. $regs[0] will contain a copy of the complete string matched.
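The layout of $regs described above can be sketched as follows. The date pattern and the sample sentence are illustrative assumptions, not part of the manual; the result is compared against false so the check does not depend on the exact success value.

```php
<?php
// Sketch only: pattern and input string are made up for illustration.
mb_regex_encoding('UTF-8');

$regs = [];
$found = mb_ereg('(\d{4})-(\d{2})-(\d{2})', 'Released on 2017-12-01.', $regs);

if ($found !== false) {
    echo $regs[0], "\n"; // complete match: 2017-12-01
    echo $regs[1], "\n"; // first parenthesized group: 2017
    echo $regs[2], "\n"; // second parenthesized group: 12
    echo $regs[3], "\n"; // third parenthesized group: 01
}
```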

    Return Values

    Returns the byte length of the matched string if a match for $pattern was found in $string, or FALSE if no matches were found or an error occurred.

    If the optional parameter $regs was not passed or the length of the matched string is 0, this function returns 1.
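Because the success value may be 1 or a byte length (and PHP 8.0 and later return true instead), a strict comparison against false is the safest test. The sketch below uses a hypothetical helper, mb_ereg_matches(), which is not part of mbstring, and takes the byte length from $regs[0] rather than from the return value.

```php
<?php
// Hypothetical helper (not part of mbstring): reduces mb_ereg()'s result to a strict bool.
function mb_ereg_matches(string $pattern, string $subject): bool
{
    // false means "no match" or "error"; any other value is a successful match.
    return mb_ereg($pattern, $subject) !== false;
}

mb_regex_encoding('UTF-8');

var_dump(mb_ereg_matches('日本', '日本語')); // bool(true)
var_dump(mb_ereg_matches('^', 'abc'));      // bool(true), zero-length match
var_dump(mb_ereg_matches('xyz', 'abc'));    // bool(false)

// The byte length of the match can be read from $regs[0] on any version.
$regs = [];
if (mb_ereg('日本', '日本語', $regs) !== false) {
    echo strlen($regs[0]), "\n";             // 6 (bytes in UTF-8)
    echo mb_strlen($regs[0], 'UTF-8'), "\n"; // 2 (characters)
}
```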

    Changelog

    Version  Description
    7.1.0    mb_ereg() will now set $regs to an empty array if nothing matched. Formerly, $regs was not modified in that case.
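A minimal sketch of what this change means in practice: code that must run on both sides of 7.1.0 should rely on the return value rather than on the contents of $regs. The stale starting value is an assumption, standing in for data left over from an earlier call.

```php
<?php
// Sketch: $regs starts with stale data from an imagined earlier call.
mb_regex_encoding('UTF-8');

$regs = ['stale value'];
$result = mb_ereg('ab', '__', $regs); // the pattern does not match

var_dump($result); // bool(false)
var_dump($regs);   // empty array on PHP 7.1.0 and later; left untouched before that

// Version-independent: trust the return value, not the contents of $regs.
$matched = ($result !== false) ? $regs : [];
```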

    Notes

    Note:

    The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
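To illustrate why the regex encoding matters, the sketch below matches the same bytes under two encodings. It assumes the script file is saved as UTF-8 and that ISO-8859-1 is available as a regex encoding in the local build; the previous encoding is restored at the end.

```php
<?php
// Sketch: one UTF-8 character interpreted under two regex encodings.
$previous = mb_regex_encoding(); // remember the current regex encoding

$char = 'é'; // two bytes (0xC3 0xA9) when this file is saved as UTF-8

mb_regex_encoding('UTF-8');
$asUtf8 = mb_ereg('^.$', $char) !== false;  // the two bytes form one character

mb_regex_encoding('ISO-8859-1');
$asLatin1 = mb_ereg('^.$', $char) !== false; // the two bytes are two characters

mb_regex_encoding($previous); // restore the original setting

var_dump($asUtf8);   // expected: bool(true)
var_dump($asLatin1); // expected: bool(false)
```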

    See Also

    The old link to the Oniguruma regex syntax no longer works; here is a working one:
    https://github.com/geoffgarside/oniguruma/blob/master/Syntax.txt
    Note that mb_ereg() does not support the \uFFFF Unicode syntax but uses \x{FFFF} instead:
    <?php
    //$text = 'Peter is a boy.'; // English
    $text = 'بيتر هو صبي.'; // Arabic
    //$text = 'פיטר הוא ילד.'; // Hebrew
    mb_regex_encoding('UTF-8');
    if(mb_ereg('[\x{0600}-\x{06FF}]', $text)) // Arabic range
    //if(mb_ereg('[\x{0590}-\x{05FF}]', $text)) // Hebrew range
    {
      echo "Text has some Arabic/Hebrew characters.";
    }
    else
    {
      echo "Text doesn't have Arabic/Hebrew characters.";
    }
    ?>
    
    I hope this information is shown somewhere on php.net.
    According to "https://github.com/php/php-src/tree/PHP-5.6/ext/mbstring/oniguruma",
    the bundled Oniguruma regex library version seems to be:
     4.7.1 in PHP 5.3 - 5.4.45,
     5.9.2 in PHP 5.5 - 7.1.16,
     6.3.0 since PHP 7.2.
    <?php
    // in PHP_VERSION 7.1
    // WITHOUT $regs (3rd argument)
    $int = mb_ereg('abcde', '_abcde_'); // [5 bytes match]
    var_dump($int);           // int(1)
    $int = mb_ereg('ab', '_ab_');    // [2 bytes match]
    var_dump($int);           // int(1)
    $int = mb_ereg('^', '_ab_');    // [0 bytes match]
    var_dump($int);           // int(1)
    $int = mb_ereg('ab', '__');     // [not match]
    var_dump($int);           // bool(false)
    $int = mb_ereg('', '_ab_');     // [error : empty pattern]
                      // Warning: mb_ereg(): empty pattern in ...
    var_dump($int);           // bool(false)
    $int = mb_ereg('ab');        // [error : fewer arguments]
                      // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
    var_dump($int);           // bool(false)
    // Without the 3rd argument, mb_ereg() returns either int(1) or bool(false).
    // WITH $regs (3rd argument)
    $int = mb_ereg('abcde', '_abcde_', $regs);// [5 bytes match]
    var_dump($int);              // int(5)
    var_dump($regs);             // array(1) { [0]=> string(5) "abcde" }
    $int = mb_ereg('ab', '_ab_', $regs);   // [2 bytes match]
    var_dump($int);              // int(2)
    var_dump($regs);             // array(1) { [0]=> string(2) "ab" }
    $int = mb_ereg('^', '_ab_', $regs);    // [0 bytes match]
    var_dump($int);              // int(1)
    var_dump($regs);             // array(1) { [0]=> bool(false) }
    $int = mb_ereg('ab', '__', $regs);    // [not match]
    var_dump($int);              // bool(false)
    var_dump($regs);             // array(0) { }
    $int = mb_ereg('', '_ab_', $regs);    // [error : empty pattern]
                         // Warning: mb_ereg(): empty pattern in ...
    var_dump($int);              // bool(false)
    var_dump($regs);             // array(0) { }
    $int = mb_ereg('ab');           // [error : fewer arguments]
                         // Warning: mb_ereg() expects at least 2 parameters, 1 given in ...
    var_dump($int);              // bool(false)
    var_dump($regs);             // array(0) { }
    // With the 3rd argument, mb_ereg() returns either int(number of bytes matched) or bool(false),
    // and the content of the 3rd argument is a bit complicated.
    ?>
    
    mb_ereg() does not seem to support named subpatterns.
    preg_match() seems to be a substitute, but only for UTF-8 encoded text.
    <?php
    $text = 'multi_byte_string';
    $pattern = '.*(?<name>string).*';    // "?P" causes "mbregex compile err" in PHP 5.3.5
    if(mb_ereg($pattern, $text, $matches)){
      echo '<pre>'.print_r($matches, true).'</pre>';
    }else{
      echo 'no match';
    }
    ?>
    This code ignores "?<name>" in $pattern and displays the following:
    Array
    (
      [0] => multi_byte_string
      [1] => string
    )
    Using preg_match() instead, by replacing the $pattern assignment and the mb_ereg() call with
    $pattern = '/.*(?<name>string).*/u';
    if(preg_match($pattern, $text, $matches)){
    displays the following (in UTF-8 encoding):
    Array
    (
      [0] => multi_byte_string
      [name] => string
      [1] => string
    )
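For reference, the preg_match() variant from this note can be written out as a complete script. This is only a sketch of the comparison, and it applies to UTF-8 input because of the /u modifier.

```php
<?php
// Complete preg_match() version of the named-subpattern example (UTF-8 only).
$text = 'multi_byte_string';
$pattern = '/.*(?<name>string).*/u'; // PCRE delimiters plus the /u (UTF-8) modifier

if (preg_match($pattern, $text, $matches)) {
    // The named group appears under both its name and its numeric index.
    print_r($matches);
} else {
    echo 'no match';
}
```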
    Although it is hardly mentioned anywhere, it may be useful to note that mb_ereg() uses the Oniguruma library internally. The syntax for the default mode (Ruby) is described here:
    http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
    Hebrew regex tested on PHP 5, Ubuntu 8.04.
    It seems to work fine without the mb_regex_encoding() lines (commented out).
    It did not seem to work with \uxxxx (also commented out).
    <?php
    echo "Line ";
    //mb_regex_encoding("ISO-8859-8");
    //if(mb_ereg(".*([\u05d0-\u05ea]).*", $this->current_line))
    if(mb_ereg(".*([א-ת]).*", $this->current_line))
    {
      echo "has";
    }
    else
    {
      echo "doesn't have";
    }
    echo " Hebrew characters.<br>";  
    //mb_regex_encoding("UTF-8");
    ?>