get_html_translation_table()
(PHP 4, PHP 5, PHP 7)
Returns the translation table used by htmlspecialchars() and htmlentities()
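Description
get_html_translation_table([int $table = HTML_SPECIALCHARS[, int $flags = ENT_COMPAT | ENT_HTML401[, string $encoding = 'UTF-8']]]) : array
get_html_translation_table() returns the translation table that htmlspecialchars() and htmlentities() use internally.
Note: Special characters can be encoded in several ways. For example, " can be encoded as &quot;, &#34; or &#x22;. get_html_translation_table() returns only the most commonly used form.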
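As a small illustration of the note above, the following sketch decodes the three spellings of the double quote and then looks up the single form stored in the table. The variable names are illustrative only, and the flags and encoding are passed explicitly so the result does not depend on version-specific defaults.

```php
<?php
// Three different encodings of the same character (a double quote).
$forms = array('&quot;', '&#34;', '&#x22;');

$decoded = array();
foreach ($forms as $form) {
    // html_entity_decode() understands named, decimal and hexadecimal forms.
    $decoded[] = html_entity_decode($form, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

// The translation table holds exactly one replacement per character.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($decoded);     // every element is the same one-character string
var_dump($table['"']);  // the form chosen for the table
```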
Parameters
- $table
Which table to return. Two constants are available for this: HTML_ENTITIES and HTML_SPECIALCHARS.
- $flags
A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default is ENT_COMPAT | ENT_HTML401.
Available $flags constants
Constant Name | Description |
---|---|
ENT_COMPAT | Table will contain entities for double-quotes, but not for single-quotes. |
ENT_QUOTES | Table will contain entities for both double and single quotes. |
ENT_NOQUOTES | Table will neither contain entities for single quotes nor for double quotes. |
ENT_HTML401 | Table for HTML 4.01. |
ENT_XML1 | Table for XML 1. |
ENT_XHTML | Table for XHTML. |
ENT_HTML5 | Table for HTML 5. |
- $encoding
Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
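The following character sets are supported:
Supported charsets
Charset | Aliases | Description |
---|---|---|
ISO-8859-1 | ISO8859-1 | Western European, Latin-1. |
ISO-8859-5 | ISO8859-5 | Little used Cyrillic charset (Latin/Cyrillic). |
ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1). |
UTF-8 | | ASCII compatible multi-byte 8-bit Unicode. |
cp866 | ibm866, 866 | DOS-specific Cyrillic charset. Supported as of 4.3.2. |
cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. Supported as of 4.3.2. |
cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European. |
KOI8-R | koi8-ru, koi8r | Russian. Supported as of 4.3.2. |
BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
GB2312 | 936 | Simplified Chinese, national standard character set. |
BIG5-HKSCS | | Traditional Chinese, Big5 with Hong Kong extensions. |
Shift_JIS | SJIS, 932 | Japanese |
EUC-JP | EUCJP | Japanese |
MacRoman | | Charset that was used by Mac OS. |
'' | | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended. |
Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.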
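The effect of the three quote flags described above can be checked directly on the returned array. This is a minimal sketch; the variable names are made up for the example and the encoding is fixed to UTF-8.

```php
<?php
$encoding = 'UTF-8';

// Same table, three different quote flags.
$compat   = get_html_translation_table(HTML_SPECIALCHARS, ENT_COMPAT | ENT_HTML401, $encoding);
$quotes   = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, $encoding);
$noquotes = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES | ENT_HTML401, $encoding);

// Report, for each flag, whether the double and single quote are in the table.
foreach (array('ENT_COMPAT' => $compat, 'ENT_QUOTES' => $quotes, 'ENT_NOQUOTES' => $noquotes) as $name => $table) {
    printf(
        "%-12s double: %s, single: %s\n",
        $name,
        isset($table['"']) ? 'yes' : 'no',
        isset($table["'"]) ? 'yes' : 'no'
    );
}
```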
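Return Values
Returns the translation table as an array, with the original characters as keys and the corresponding entities as values.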
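One way to use the returned array is to pass it to strtr(), and to pass its flipped version for the reverse direction. The sketch below assumes UTF-8 input and uses an arbitrary sample string; in real code htmlspecialchars() and htmlspecialchars_decode() are usually the simpler choice.

```php
<?php
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');

$raw = '<a href="x">Tom & Jerry</a>';

// Encode: replace every key of the table with its entity.
$encoded = strtr($raw, $table);

// Decode: swap keys and values, then translate back.
$decoded = strtr($encoded, array_flip($table));

echo $encoded, "\n";
echo $decoded, "\n";
```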
Changelog
Version | Description |
---|---|
5.4.0 | The default value for the $encoding parameter was changed to UTF-8. |
5.4.0 | The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added. |
5.3.4 | The $encoding parameter was added. |
Examples
Translation Table Example
<?php
var_dump(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5));
?>
The above example will output something similar to:
array(1510) {
  ["
"]=>
  string(9) "&NewLine;"
  ["!"]=>
  string(6) "&excl;"
  ["""]=>
  string(6) "&quot;"
  ["#"]=>
  string(5) "&num;"
  ["$"]=>
  string(8) "&dollar;"
  ["%"]=>
  string(8) "&percnt;"
  ["&"]=>
  string(5) "&amp;"
  ["'"]=>
  string(6) "&apos;"
  // ...
}
See Also
- htmlspecialchars() - Convert special characters to HTML entities
- htmlentities() - Convert all applicable characters to HTML entities
- html_entity_decode() - Convert HTML entities to their corresponding characters
Be careful using get_html_translation_table() in a loop, as it's very slow.
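If the table is needed repeatedly, one option is to build it once and reuse it. The helper below is a hypothetical sketch (the name cached_translation_table() is not part of PHP) that memoizes the result per argument combination in a static variable.

```php
<?php
// Hypothetical helper: returns the same table as get_html_translation_table(),
// but only calls the built-in once per distinct argument combination.
function cached_translation_table($table = HTML_SPECIALCHARS, $flags = ENT_COMPAT | ENT_HTML401, $encoding = 'UTF-8')
{
    static $cache = array();

    $key = $table . '|' . $flags . '|' . $encoding;
    if (!isset($cache[$key])) {
        $cache[$key] = get_html_translation_table($table, $flags, $encoding);
    }

    return $cache[$key];
}

$lines = array('a < b', 'b > c', 'c & d');
foreach ($lines as $line) {
    // The built-in is invoked only on the first iteration.
    echo strtr($line, cached_translation_table()), "\n";
}
```

Fetching the table into a local variable before the loop achieves the same thing when the loop lives in a single function.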
The fact that MS Word and some other sources use CP-1252, and that it is so close to Latin-1 ('ISO-8859-1'), causes a lot of confusion. What confused me the most was finding that MySQL uses CP-1252 by default. You may run into trouble if you find yourself tempted to do something like this:
<?php
$trans[chr(149)] = '&bull;';  // Bullet
$trans[chr(150)] = '&ndash;'; // En Dash
$trans[chr(151)] = '&mdash;'; // Em Dash
$trans[chr(152)] = '&tilde;'; // Small Tilde
$trans[chr(153)] = '&trade;'; // Trade Mark Sign
?>
Don't do it. DON'T DO IT! You can use:
<?php
$translationTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'WINDOWS-1252');
?>
or just convert directly:
<?php
$output = htmlentities($input, ENT_NOQUOTES, 'WINDOWS-1252');
?>
But your web page is probably encoded in UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
<?php
$output = mb_convert_encoding($input, 'UTF-8', 'WINDOWS-1252');
$output = htmlentities($output);
?>
Not sure what's going on here, but I've run into a problem that others might face as well.
<?php
$translations = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES));
?>
returns the single quote ' as being equal to &#39;, while
<?php
$translatedString = htmlentities($string, ENT_QUOTES);
?>
returns it as being equal to &#039;. I've had to do a specific string replacement for the time being. Not sure if it's an issue with the function or with the array manipulation. -Pat
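I wrote a quick little snippet for converting something like '&middot;' into '&#183;':
<?php
$to_convert = '&middot;';
// Request the ISO-8859-1 table explicitly so that every key is a single byte and ord() yields its code point.
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
$equiv = '&#' . ord(array_search($to_convert, $table)) . ';';
?>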
To display the mapping on a web page no matter what the server encoding is, this can be used:
<?php
echo "<pre>\n";
echo htmlentities(print_r(get_html_translation_table(HTML_SPECIALCHARS), true));
echo htmlentities(print_r(get_html_translation_table(HTML_ENTITIES), true));
?>
This is needed because get_html_translation_table() gives the special characters in ISO-8859-1 (Latin-1) encoding when that is the default encoding (PHP versions before 5.4.0). To see the tables correctly with a plain print_r(get_html_translation_table(HTML_ENTITIES)); your server needs to send an ISO-8859-1 HTTP header, unless you call header() yourself or manually set the browser's encoding to ISO-8859-1. You also need to view the source of the page to see the mapping (except that the English version of IE 7 outputs the page source as ISO-8859-1 anyway).
get_html_translation_table() works only with the first 256 code positions. For higher positions, for example ф (a Cyrillic letter), the character comes out unchanged.
Without heavy scientific analysis, this seems to work as a quick fix for making text that originates from a Microsoft Word document display as HTML:
<?php
function DoHTMLEntities($string)
{
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    // MS Word strangeness:
    // smart single/double quotes
    $trans_tbl[chr(145)] = '\'';
    $trans_tbl[chr(146)] = '\'';
    $trans_tbl[chr(147)] = '"';
    $trans_tbl[chr(148)] = '"';
    // Acute 'e'
    $trans_tbl[chr(142)] = '&eacute;';
    return strtr($string, $trans_tbl);
}
?>
htmlentities() covers everything htmlspecialchars() does, so here is how to convert a UTF-8 string: htmlentities($string, ENT_QUOTES, 'UTF-8');
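The relationship between the two functions can be seen in their tables: with the same flags and encoding, every pair in the HTML_SPECIALCHARS table also appears in the HTML_ENTITIES table. The following sketch checks this for ENT_COMPAT and UTF-8; the variable names are illustrative.

```php
<?php
$flags = ENT_COMPAT | ENT_HTML401;

$special  = get_html_translation_table(HTML_SPECIALCHARS, $flags, 'UTF-8');
$entities = get_html_translation_table(HTML_ENTITIES, $flags, 'UTF-8');

// Pairs from the small table that are absent from (or different in) the large one.
$missing = array_diff_assoc($special, $entities);

var_dump(count($special), count($entities), count($missing));
```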
If you have trouble (like me) getting data from ISO-8859-1 encoded forms where users copy and paste from Word, this routine could be useful. It adds to the standard get_html_translation_table() result the codes of the characters that MS Word usually substitutes into typed text. Otherwise those characters would never be displayed correctly in HTML output.
<?php
function get_html_translation_table_CP1252()
{
    $trans = get_html_translation_table(HTML_ENTITIES);
    $trans[chr(130)] = '&sbquo;';  // Single Low-9 Quotation Mark
    $trans[chr(131)] = '&fnof;';   // Latin Small Letter F With Hook
    $trans[chr(132)] = '&bdquo;';  // Double Low-9 Quotation Mark
    $trans[chr(133)] = '&hellip;'; // Horizontal Ellipsis
    $trans[chr(134)] = '&dagger;'; // Dagger
    $trans[chr(135)] = '&Dagger;'; // Double Dagger
    $trans[chr(136)] = '&circ;';   // Modifier Letter Circumflex Accent
    $trans[chr(137)] = '&permil;'; // Per Mille Sign
    $trans[chr(138)] = '&Scaron;'; // Latin Capital Letter S With Caron
    $trans[chr(139)] = '&lsaquo;'; // Single Left-Pointing Angle Quotation Mark
    $trans[chr(140)] = '&OElig;';  // Latin Capital Ligature OE
    $trans[chr(145)] = '&lsquo;';  // Left Single Quotation Mark
    $trans[chr(146)] = '&rsquo;';  // Right Single Quotation Mark
    $trans[chr(147)] = '&ldquo;';  // Left Double Quotation Mark
    $trans[chr(148)] = '&rdquo;';  // Right Double Quotation Mark
    $trans[chr(149)] = '&bull;';   // Bullet
    $trans[chr(150)] = '&ndash;';  // En Dash
    $trans[chr(151)] = '&mdash;';  // Em Dash
    $trans[chr(152)] = '&tilde;';  // Small Tilde
    $trans[chr(153)] = '&trade;';  // Trade Mark Sign
    $trans[chr(154)] = '&scaron;'; // Latin Small Letter S With Caron
    $trans[chr(155)] = '&rsaquo;'; // Single Right-Pointing Angle Quotation Mark
    $trans[chr(156)] = '&oelig;';  // Latin Small Ligature OE
    $trans[chr(159)] = '&Yuml;';   // Latin Capital Letter Y With Diaeresis
    ksort($trans);
    return $trans;
}
?>
If you want to display special HTML entities in a web browser, you can use the following code:
<?php
$entities = get_html_translation_table(HTML_ENTITIES);
$new_entities = array();
foreach ($entities as $entity) {
    // Escape the entity itself so the browser shows it literally.
    $new_entities[$entity] = htmlspecialchars($entity);
}
echo "<pre>";
print_r($new_entities);
echo "</pre>";
?>
If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)
I found this useful for converting Latin characters:
<?php
function convertLatin1ToHtml($str)
{
    $allEntities     = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
    $specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
    // Keep only the entities that are not markup characters, so tags survive.
    $noTags = array_diff($allEntities, $specialEntities);
    $str = strtr($str, $noTags);
    return $str;
}
?>
If you want to decode all those &#123; symbols as well:
<?php
function unhtmlentities($string)
{
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    $ret = strtr($string, $trans_tbl);
    // The /e modifier was removed in PHP 7, so a callback does the conversion instead.
    return preg_replace_callback('/\&\#([0-9]+)\;/', function ($m) {
        return chr((int) $m[1]);
    }, $ret);
}
?>
Alan's version didn't seem to work right. If you're having the same problem, consider using this slightly modified version instead:
<?php
function unhtmlentities($string)
{
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    $ret = strtr($string, $trans_tbl);
    // The /e modifier was removed in PHP 7, so a callback does the conversion instead.
    return preg_replace_callback('/&#(\d+);/', function ($m) {
        return chr((int) $m[1]);
    }, $ret);
}
?>
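For comparison, the built-in html_entity_decode() listed under See Also already handles named, decimal and hexadecimal entities, so a hand-written decoder is often unnecessary. A minimal sketch, assuming UTF-8 output and an arbitrary sample string:

```php
<?php
// Named, decimal and hexadecimal entities mixed in one string.
$input = 'caf&eacute; &#123;x&#125; &amp; &#x41;';

// Decode everything in one call; the result is a UTF-8 string.
$output = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $output, "\n";
```

Unlike the chr()-based versions above, this also works for code points above 255, because the result is produced in the requested multi-byte encoding.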