mb_decode_numericentity()
(PHP 4 >= 4.0.6, PHP 5, PHP 7)
根据 HTML 数字字符串解码成字符
说明
mb_decode_numericentity (string $str,array $convmap[,string $encoding= mb_internal_encoding() ] ) : string
将数字字符串的引用$str按指定的字符块转换成字符串。
参数
- $str
要解码的 string。
- $convmap
$convmap是一个 array,指定了要转换的代码区域。
- $encoding
$encoding参数为字符编码。如果省略,则使用内部字符编码。
返回值
转换后的字符串。
范例
Example #1$convmap例子
<?php $convmap = array ( int start_code1, int end_code1, int offset1, int mask1, int start_code2, int end_code2, int offset2, int mask2, ........ int start_codeN, int end_codeN, int offsetN, int maskN ); // Specify Unicode value for start_codeN and end_codeN // Add offsetN to value and take bit-wise 'AND' with maskN, // then convert value to numeric string reference. ?>
参见
mb_encode_numericentity()
Encode character to HTML numeric string reference
Example #2$convmap的例子: 编码(escape) JavaScript 字符串
<?php function escape_javascript_string($str) { $map = [ 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,0,0, // 49 0,0,0,0,0,0,0,0,1,1, 1,1,1,1,1,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,1,1,1,1,1,1,0,0,0, // 99 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, // 149 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, // 199 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1, // 249 1,1,1,1,1,1,1, // 255 ]; // Char encoding is UTF-8 $mblen = mb_strlen($str, 'UTF-8'); $utf32 = bin2hex(mb_convert_encoding($str, 'UTF-32', 'UTF-8')); for ($i=0, $encoded=''; $i < $mblen; $i++) { $u = substr($utf32, $i*8, 8); $v = base_convert($u, 16, 10); if ($v < 256 && $map[$v]) { $encoded .= '\\x'.substr($u, 6,2); } else if ($v == 2028) { $encoded .= '\\u2028'; } else if ($v == 2029) { $encoded .= '\\u2029'; } else { $encoded .= mb_convert_encoding(hex2bin($u), 'UTF-8', 'UTF-32'); } } return $encoded; } // Test data $convmap = [ 0x0, 0xffff, 0, 0xffff ]; $msg = ''; for ($i=0; $i < 1000; $i++) { // chr() cannot generate correct UTF-8 data larger value than 128, use mb_decode_numericentity(). $msg .= mb_decode_numericentity('&#'.$i.';', $convmap, 'UTF-8'); } // var_dump($msg); var_dump(escape_javascript_string($msg));
note that at this time it seems that mb_decode_numericentity() only works with decimal entities and not hexadecimal entities. This fact would have saved me a good hour of time in debugging. For those who need to convert hex entities try first converting them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
Just two great functions for daily use: /* Converts any HTML-entities into characters */ function my_numeric2character($t) { $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF); return mb_decode_numericentity($t, $convmap, 'UTF-8'); } /* Converts any characters into HTML-entities */ function my_character2numeric($t) { $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF); return mb_encode_numericentity($t, $convmap, 'UTF-8'); } print my_numeric2character('’ ἀ â'); print my_character2numeric(' ');
Here are functions to convert hankaku to zenkaku characters (and vice-versa) in Japanese text. <?php // Supported characters: // (space) // !#$%&()*+,./0123456789:;<=>?@ // ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_` // abcdefghijklmnopqrstuvwxyz{|} // (Katakana isn't supported.) function f_han2zen ($string,$encoding = null) { if (is_null($encoding)) $encoding = mb_internal_encoding(); $convmap = array( 0x20,0x20,0x3000-0x20,0xffff, // Space 0x21,0x7e,0xff01-0x21,0xffff); $temp = mb_encode_numericentity($string,$convmap,$encoding); $convmap = array(0,0xffff,0,0xffff); return mb_decode_numericentity($temp,$convmap,$encoding); } function f_zen2han ($string,$encoding = null) { if (is_null($encoding)) $encoding = mb_internal_encoding(); $convmap = array( 0x3000,0x3000,-(0x3000-0x20),0xffff, // Space 0xff01,0xff5e,-(0xff01-0x21),0xffff); $temp = mb_encode_numericentity($string,$convmap,$encoding); $convmap = array(0,0xffff,0,0xffff); return mb_decode_numericentity($temp,$convmap,$encoding); } // Sample usage: f_han2zen("test","shift_jis"); f_han2zen("test","utf-8"); ?>
Many web browsers will tend upload high order characters as UTF-8 encoded entities. Here is some simple code to convert UTF-8 HTML entities within a block of text into proper characters: <?php //decode decimal HTML entities added by web browser $body = preg_replace('/&#\d{2,5};/ue', "utf8_entity_decode('$0')", $body ); //decode hex HTML entities added by web browser $body = preg_replace('/&#x([a-fA-F0-7]{2,8});/ue', "utf8_entity_decode('&#'.hexdec('$1').';')", $body ); //callback function for the regex function utf8_entity_decode($entity){ $convmap = array(0x0, 0x10000, 0, 0xfffff); return mb_decode_numericentity($entity, $convmap, 'UTF-8'); } ?>
Manual entity => utf8 conversion: <?php // parse entities $raw = preg_replace_callback ( "/&#(\\d+);/u", "_pcreEntityToUtf", $raw ); function _pcreEntityToUtf($matches) { $char = intval(is_array($matches) ? $matches[1] : $matches); if ($char < 0x80) { // to prevent insertion of control characters if ($char >= 0x20) return htmlspecialchars(chr($char)); else return "&#$char;"; } else if ($char < 0x8000) { return chr(0xc0 | (0x1f & ($char >> 6))) . chr(0x80 | (0x3f & $char)); } else { return chr(0xe0 | (0x0f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))). chr(0x80 | (0x3f & $char)); } } ?>
By use of function utf8_decode you'll get a problem with all extended chars above ISO-8859-1 charset. You can solve this problem by using the function mb_encode_numericentity before: // convert $text from UTF-8 to ISO-8859-1 $convmap = array(0xFF, 0x2FFFF, 0, 0xFFFF); $text = mb_encode_numericentity($text, $convmap, "UTF-8"); $text = utf8_decode($text); The second line encodes all extended chars below 0xFF, the third line converts the rest: 0x80 - 0xFF