mb_regex_encoding()
(PHP 4 >= 4.2.0, PHP 5, PHP 7)
Set/Get character encoding for multibyte regex
Description
mb_regex_encoding([string $encoding = mb_regex_encoding()]): mixed
Set/Get character encoding for a multibyte regex.
Parameters
- $encoding
The $encoding parameter is the character encoding. If it is omitted, the internal character encoding value will be used.
Return Values
If $encoding is set, returns TRUE on success or FALSE on failure. In this case, the internal character encoding is NOT changed. If $encoding is omitted, the current character encoding name for a multibyte regex is returned.
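The setter and getter behavior described above can be sketched as follows. This is a minimal illustration, assuming the mbstring extension is loaded and that 'EUC-JP' is an accepted regex encoding on the running build; the variable names are only for illustration.

```php
<?php
// Assumption: the mbstring extension is available.
$internalBefore = mb_internal_encoding();
$regexBefore    = mb_regex_encoding();    // getter: returns the encoding name as a string

$ok = mb_regex_encoding('EUC-JP');        // setter: returns true on success

$regexAfter    = mb_regex_encoding();     // now reports the newly set regex encoding
$internalAfter = mb_internal_encoding();  // not affected by the call above

mb_regex_encoding($regexBefore);          // restore the previous regex encoding

var_dump($ok, $regexAfter, $internalAfter === $internalBefore);
```

Saving and restoring the previous value keeps the change from leaking into unrelated code that also uses the mb_ereg family.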
Changelog
Version | Description |
---|---|
5.6.0 | The default encoding changed to UTF-8. It was previously EUC-JP. |
See Also
- mb_internal_encoding() - Set/get internal character encoding
- mb_ereg() - Regular expression match with multibyte support
Beware: mb_regex_encoding() does not support the same set of encodings as listed by mb_list_encodings(). Example:
<?php
mb_internal_encoding('CP936');
mb_regex_encoding('CP936'); // this line produces an error
?>
The mb_ereg functionality is provided by the Oniguruma regex library, not by PCRE. mb_regex_encoding() supports only a subset of the encoding names known to mb_list_encodings() and mb_encoding_aliases(). Currently the following names are supported (case-insensitive):
UCS-4 UCS-4LE UTF-32 UTF-32BE UTF-32LE UTF-16 UTF-16BE UTF-16LE UTF-8 utf8
ASCII US-ASCII
EUC-JP eucJP x-euc-jp SJIS eucJP-win SJIS-win CP932 MS932 Windows-31J
ISO-8859-1 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6 ISO-8859-7 ISO-8859-8 ISO-8859-9 ISO-8859-10 ISO-8859-13 ISO-8859-14 ISO-8859-15 ISO-8859-16
EUC-CN EUC_CN eucCN gb2312
EUC-TW EUC_TW eucTW BIG-5 CN-BIG5 BIG-FIVE BIGFIVE
EUC-KR EUC_KR eucKR
KOI8-R KOI8R
The list is a mixture of base names and aliases and applies to PHP 5.4.45 (Oniguruma lib v4.7.1), PHP 5.6.31 (v5.9.5), PHP 7.0.22 (v5.9.6) and PHP 7.1.8 (v5.9.6).
Be aware of the inconsistency: mb_regex_encoding() accepts, for example, the base name 'UTF-8' and its only alias 'utf8', but it does not accept the aliases 'utf16', 'utf32' or 'latin1'.
Note also that the informal name/alias 'latin9' for ISO/IEC 8859-15:1999 (which includes the Euro sign at 0xA4) is not known by mb_list_encodings() either. It can only be addressed as 'ISO-8859-15' or 'ISO_8859-15', and for mb_regex_encoding() solely as 'ISO-8859-15'.
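Because the accepted names differ between PHP and Oniguruma versions, it may be safer to probe a name at runtime than to rely on a fixed list. The sketch below is one way to do that; regex_encoding_supported() is a hypothetical helper, not part of mbstring, and it assumes PHP 7.0 or later with the mbstring extension loaded.

```php
<?php
// Hypothetical helper (not part of mbstring): reports whether
// mb_regex_encoding() accepts $name, then restores the previous setting.
function regex_encoding_supported(string $name): bool
{
    $previous = mb_regex_encoding();
    try {
        // PHP 7 and earlier: unknown names give false plus a warning (suppressed here).
        $ok = @mb_regex_encoding($name);
    } catch (\ValueError $e) {
        // PHP 8 and later: unknown names throw ValueError instead.
        $ok = false;
    }
    mb_regex_encoding($previous);
    return $ok === true;
}

foreach (['UTF-8', 'utf8', 'ISO-8859-15', 'no-such-encoding'] as $name) {
    echo $name, ': ', regex_encoding_supported($name) ? 'accepted' : 'rejected', PHP_EOL;
}
```

The name 'no-such-encoding' is a deliberately invalid placeholder; substitute whichever alias you need to check on your own installation.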
mb_regex_encoding does not recognize CP1252 or Windows-1252 as valid encodings, although they are in the list generated by mb_list_encodings. ISO-8859-1 (AKA "Latin-1") is supported, but it's not the same as the Windows variety of Latin-1.
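One possible workaround, offered as a sketch rather than a recommended practice, is to convert Windows-1252 input to UTF-8 first and run the multibyte regex on the converted string. It assumes PHP 7.0 or later, that mb_convert_encoding() accepts 'Windows-1252' on the running build, and uses a made-up sample string.

```php
<?php
// Sample input (made up for illustration): byte 0x80 is the Euro sign in Windows-1252.
$cp1252 = "Price: 10\x80";

// Convert to UTF-8, which mb_regex_encoding() does accept.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

$previous = mb_regex_encoding();
mb_regex_encoding('UTF-8');

// "\u{20AC}" is the Euro sign written as a Unicode escape.
// mb_ereg() returns false when there is no match.
$matched = mb_ereg('[0-9]+' . "\u{20AC}", $utf8, $regs) !== false;

mb_regex_encoding($previous); // restore the previous regex encoding

var_dump($matched, $regs[0]);
```

Converting up front avoids ISO-8859-1, which would misinterpret bytes in the 0x80 to 0x9F range that Windows-1252 assigns to printable characters.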
To change the regex encoding as well:
<?php
echo "current mb_internal_encoding: ".mb_internal_encoding()."<br />";
echo "changing mb_internal_encoding to UTF-8<br />";
mb_internal_encoding("UTF-8");
echo "new mb_internal_encoding: ".mb_internal_encoding()."<br />";
echo "current mb_regex_encoding: ".mb_regex_encoding()."<br />";
echo "changing mb_regex_encoding to UTF-8<br />";
mb_regex_encoding('UTF-8');
echo "new mb_regex_encoding: ".mb_regex_encoding()."<br />";
?>
Return values differ between setting and getting:
<?php
echo mb_regex_encoding(); // returns the encoding name as a string
?>
<?php
echo mb_regex_encoding("UTF-8"); // returns true (success) or false (failure) as a boolean
?>