mb_regex_encoding()

(PHP 4 >= 4.2.0, PHP 5, PHP 7)

Set/Get character encoding for multibyte regex

Description

mb_regex_encoding([string $encoding = mb_regex_encoding()]): mixed

Set/Get character encoding for a multibyte regex.

Parameters

$encoding: The $encoding parameter is the character encoding. If it is omitted, the internal character encoding is used.

Return Values

If $encoding is set, returns TRUE on success or FALSE on failure. In this case, the internal character encoding is NOT changed. If $encoding is omitted, the current character encoding name for a multibyte regex is returned.
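The two return modes described above can be checked with a short sketch. It assumes the mbstring extension is built with regex support; the variable names are illustrative only.

```php
<?php
// Remember the internal encoding so we can confirm it is left untouched.
$internalBefore = mb_internal_encoding();

// Setter form: returns a boolean for a supported encoding name.
$ok = mb_regex_encoding('UTF-8');

// Getter form: returns the current regex encoding name as a string.
$current = mb_regex_encoding();

var_dump($ok);
var_dump($current);

// Setting the regex encoding should not alter the internal encoding.
var_dump(mb_internal_encoding() === $internalBefore);
```

Using var_dump() rather than echo keeps the boolean visible, since echo prints TRUE as "1" and FALSE as an empty string.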

Changelog

Version	Description
5.6.0	The default encoding was changed to UTF-8. It was previously EUC-JP.

See Also

mb_internal_encoding() - Set/Get internal character encoding
mb_ereg() - Regular expression match with multibyte support

Beware: mb_regex_encoding() does not support the same set of encodings as those listed by mb_list_encodings().
Example:
<?php
mb_internal_encoding('CP936');
mb_regex_encoding('CP936'); # this line produces an error
?>

The mb_ereg functionality is provided by the Oniguruma regex library, not by PCRE. mb_regex_encoding() supports only a subset of the encoding names reported by mb_list_encodings() and mb_encoding_aliases().
Currently the following names are supported (case-insensitive):
UCS-4
UCS-4LE
UTF-32
UTF-32BE
UTF-32LE
UTF-16
UTF-16BE
UTF-16LE
UTF-8
utf8
ASCII
US-ASCII
EUC-JP
eucJP
x-euc-jp
SJIS
eucJP-win
SJIS-win
CP932
MS932
Windows-31J
ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
ISO-8859-10
ISO-8859-13
ISO-8859-14
ISO-8859-15
ISO-8859-16
EUC-CN
EUC_CN
eucCN
gb2312
EUC-TW
EUC_TW
eucTW
BIG-5
CN-BIG5
BIG-FIVE
BIGFIVE
EUC-KR
EUC_KR
eucKR
KOI8-R
KOI8R
The list is a mixture of base names and aliases and applies to PHP 5.4.45 (Oniguruma lib v4.7.1), PHP 5.6.31 (v5.9.5), PHP 7.0.22 (v5.9.6) and PHP 7.1.8 (v5.9.6). Be aware of the inconsistency: mb_regex_encoding() accepts for example the base name 'UTF-8' and its only alias 'utf8', but it does not accept aliases 'utf16', 'utf32' or 'latin1'.
Also note that the informal name/alias 'latin9' for ISO/IEC 8859-15:1999 (which includes the Euro sign at 0xA4) is not known to mb_list_encodings() either. It can only be addressed as 'ISO-8859-15' or 'ISO_8859-15', and for mb_regex_encoding() solely as 'ISO-8859-15'.
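Since the accepted names differ between versions, it may be easier to probe at runtime than to rely on a fixed list. The following is a minimal sketch; regex_encoding_supported() is a hypothetical helper, not part of mbstring. It assumes the documented behaviour that PHP before 8.0 returns FALSE with a warning for an unknown name, while PHP 8.0 and later throws a ValueError.

```php
<?php
/**
 * Hypothetical helper: reports whether mb_regex_encoding() accepts $name.
 * The previous regex encoding is restored before returning.
 */
function regex_encoding_supported(string $name): bool
{
    $previous = mb_regex_encoding();
    try {
        // PHP < 8.0 returns false and raises a warning; "@" silences it.
        $ok = @mb_regex_encoding($name);
    } catch (\ValueError $e) {
        // PHP >= 8.0 throws ValueError for an unknown encoding name.
        $ok = false;
    }
    // Put the regex encoding back the way we found it.
    mb_regex_encoding($previous);
    return $ok === true;
}

// Results depend on the PHP and Oniguruma versions in use.
foreach (['UTF-8', 'ISO-8859-15', 'latin1', 'CP1252'] as $name) {
    printf("%-12s %s\n", $name, regex_encoding_supported($name) ? 'accepted' : 'rejected');
}
```

Restoring the previous encoding keeps the probe free of side effects, so it can be called before deciding which encoding to set.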

mb_regex_encoding does not recognize CP1252 or Windows-1252 as valid encodings, although they are in the list generated by mb_list_encodings.
ISO-8859-1 (AKA "Latin-1") is supported, but it's not the same as the Windows variety of Latin-1.
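One possible workaround, sketched here as an assumption rather than an officially recommended approach, is to convert Windows-1252 input to UTF-8 with mb_convert_encoding() and run the multibyte regex functions in UTF-8. The sample string and variable names are illustrative.

```php
<?php
// 0x80 is the Euro sign in Windows-1252; ISO-8859-1 has no Euro sign there.
$cp1252 = "Price: 10\x80";

// mb_convert_encoding() does know Windows-1252, so convert to UTF-8 first.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

// UTF-8 is accepted by mb_regex_encoding().
mb_regex_encoding('UTF-8');

// "\u{20AC}" is the Euro sign (requires PHP 7.0 or later for this escape).
// mb_ereg() returns false when there is no match; the success value
// differs between PHP versions, so compare against false only.
$found = mb_ereg("[0-9]+\u{20AC}", $utf8, $matches);

var_dump($found !== false);
var_dump($matches[0]);
```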

To also change the regex encoding:
<?php
echo "current mb_internal_encoding: ".mb_internal_encoding()."<br />";
echo "changing mb_internal_encoding to UTF-8<br />";
mb_internal_encoding("UTF-8"); 
echo "new mb_internal_encoding: ".mb_internal_encoding()."<br />";
echo "current mb_regex_encoding: ".mb_regex_encoding()."<br />";
echo "changing mb_regex_encoding to UTF-8<br />";
mb_regex_encoding('UTF-8');
echo "new mb_regex_encoding: ".mb_regex_encoding()."<br />";
?>

Return values differ between setting and getting:
<?php
echo mb_regex_encoding();
// returns the encoding name as a string
?>
<?php
echo mb_regex_encoding("UTF-8");
// returns true (success) or false (failure) as a boolean
?>