get_html_translation_table()
(PHP 4, PHP 5, PHP 7)
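Returns the translation table used by htmlspecialchars() and htmlentities()
Description
get_html_translation_table([int $table = HTML_SPECIALCHARS[, int $flags = ENT_COMPAT | ENT_HTML401[, string $encoding = 'UTF-8']]]) : array
get_html_translation_table() returns the translation table that htmlspecialchars() and htmlentities() use internally.
Note: Special characters can be encoded in several ways. For example, " can be encoded as &quot;, &#34; or &#x22;. get_html_translation_table() returns only the most commonly used form.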
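Because the table maps raw characters to entities, it can be handed straight to strtr(). The following is a minimal sketch of that idea, assuming PHP 5.4 or later; the helper name escape_with_table() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: escape a string by applying the translation table directly.
function escape_with_table($text)
{
    // Keys are the raw characters, values are the entities that replace them.
    $table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // strtr() with an array never re-scans its own replacements,
    // so the "&" inside "&lt;" is not escaped a second time.
    return strtr($text, $table);
}

echo escape_with_table('<a href="x">Tom & Jerry</a>'), "\n";
```

In everyday code htmlspecialchars() is the simpler choice; the table is mostly useful when the mapping itself has to be inspected or modified.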
Parameters
- $table
Two constants, HTML_ENTITIES and HTML_SPECIALCHARS, let you choose which table to return.
- $flags
A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default is ENT_COMPAT | ENT_HTML401.
Available $flags constants
| Constant Name | Description |
|---|---|
| ENT_COMPAT | Table will contain entities for double-quotes, but not for single-quotes. |
| ENT_QUOTES | Table will contain entities for both double and single quotes. |
| ENT_NOQUOTES | Table will neither contain entities for single quotes nor for double quotes. |
| ENT_HTML401 | Table for HTML 4.01. |
| ENT_XML1 | Table for XML 1. |
| ENT_XHTML | Table for XHTML. |
| ENT_HTML5 | Table for HTML 5. |
- $encoding
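Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
The following character sets are supported:
Supported character sets
| Charset | Aliases | Description |
|---|---|---|
| ISO-8859-1 | ISO8859-1 | Western European, Latin-1. |
| ISO-8859-5 | ISO8859-5 | Little used Cyrillic charset (Latin/Cyrillic). |
| ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1). |
| UTF-8 | | ASCII compatible multi-byte 8-bit Unicode. |
| cp866 | ibm866, 866 | DOS-specific Cyrillic charset. Supported since 4.3.2. |
| cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. Supported since 4.3.2. |
| cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European. |
| KOI8-R | koi8-ru, koi8r | Russian. Supported since 4.3.2. |
| BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
| GB2312 | 936 | Simplified Chinese, national standard character set. |
| BIG5-HKSCS | | Big5 with Hong Kong extensions, Traditional Chinese. |
| Shift_JIS | SJIS, 932 | Japanese |
| EUC-JP | EUCJP | Japanese |
| MacRoman | | Charset that was used by Mac OS. |
| '' | | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended. |
Note: Any other character set is not recognized. The default encoding will be used instead and a warning will be emitted.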
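To make the effect of the quote flags concrete, the sketch below checks which quote characters end up as keys in the table for each flag. It assumes PHP 5.4 or later (for ENT_HTML401); the $summary array is only an illustrative variable.

```php
<?php
// Compare which quote characters each flag puts into the table.
$flags = array(
    'ENT_COMPAT'   => ENT_COMPAT,
    'ENT_QUOTES'   => ENT_QUOTES,
    'ENT_NOQUOTES' => ENT_NOQUOTES,
);

$summary = array();
foreach ($flags as $name => $flag) {
    $table = get_html_translation_table(HTML_SPECIALCHARS, $flag | ENT_HTML401, 'UTF-8');

    // A character is translated only if it appears as a key in the table.
    $summary[$name] = array(
        'double' => isset($table['"']),
        'single' => isset($table["'"]),
    );

    printf(
        "%-12s double=%s single=%s\n",
        $name,
        $summary[$name]['double'] ? 'yes' : 'no',
        $summary[$name]['single'] ? 'yes' : 'no'
    );
}
```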
Return Values
Returns the translation table as an array.
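As a rough illustration of the shape of the returned array, the sketch below prints the small HTML_SPECIALCHARS table. It assumes PHP 5.4 or later and passes every argument explicitly so that the result does not depend on version-specific defaults.

```php
<?php
// Assumes PHP 5.4 or later, where the ENT_HTML401 constant exists.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Keys are the raw characters, values are the entities that replace them.
foreach ($table as $char => $entity) {
    echo $char, ' => ', $entity, "\n";
}
```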
Changelog
| Version | Description |
|---|---|
| 5.4.0 | The default value for the $encoding parameter was changed to UTF-8. |
| 5.4.0 | The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added. |
| 5.3.4 | The $encoding parameter was added. |
Examples
Translation Table Example
<?php
var_dump(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML5));
?>
The above example will output something similar to:
array(1510) {
  ["
"]=>
  string(9) "&NewLine;"
  ["!"]=>
  string(6) "&excl;"
  ["""]=>
  string(6) "&quot;"
  ["#"]=>
  string(5) "&num;"
  ["$"]=>
  string(8) "&dollar;"
  ["%"]=>
  string(8) "&percnt;"
  ["&"]=>
  string(5) "&amp;"
  ["'"]=>
  string(6) "&apos;"
  // ...
}
See Also
- htmlspecialchars() - Convert special characters to HTML entities
- htmlentities() - Convert all applicable characters to HTML entities
- html_entity_decode() - Convert HTML entities to their corresponding characters
Be careful using get_html_translation_table() in a loop, as it's very slow.
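One way to follow this advice is to build the table once and reuse it for every item. The sketch below assumes PHP 5.4 or later; the helper name escape_all() is hypothetical.

```php
<?php
// Hypothetical helper: fetch the table once, then reuse it for every row.
function escape_all(array $rows)
{
    // Built a single time, outside the loop.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    $escaped = array();
    foreach ($rows as $key => $row) {
        $escaped[$key] = strtr($row, $table);
    }
    return $escaped;
}

print_r(escape_all(array('a < b', 'x & y')));
```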
The fact that MS Word and some other sources use CP-1252, and that it is so close to Latin-1 ('ISO-8859-1'), causes a lot of confusion. What confused me the most was finding that MySQL uses CP-1252 by default.
You may run into trouble if you find yourself tempted to do something like this:
<?php
$trans[chr(149)] = '&bull;'; // Bullet
$trans[chr(150)] = '&ndash;'; // En Dash
$trans[chr(151)] = '&mdash;'; // Em Dash
$trans[chr(152)] = '&tilde;'; // Small Tilde
$trans[chr(153)] = '&trade;'; // Trade Mark Sign
?>
Don't do it. DON'T DO IT!
You can use:
<?php
$translationTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'WINDOWS-1252');
?>
or just convert directly:
<?php
$output = htmlentities($input, ENT_NOQUOTES, 'WINDOWS-1252');
?>
But your web page is probably encoded UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
<?php
$output = mb_convert_encoding($input, 'UTF-8', 'WINDOWS-1252');
$output = htmlentities($output);
?>
Not sure what's going on here, but I've run into a problem that others might face as well.
<?php
$translations = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES));
?>
returns the single quote ' as being equal to &#39;, while
<?php
$translatedString = htmlentities($string, ENT_QUOTES);
?>
returns it as being equal to &#039;. I've had to do a specific string replacement for the time being. Not sure if it's an issue with the function or the array manipulation.
-Pat
I wrote a quick little function for converting something like '&middot;' into '&#183;':
<?php
$to_convert = '&middot;';
$table = get_html_translation_table(HTML_ENTITIES);
$equiv = '&#' . ord(array_search($to_convert, $table)) . ';';
?>
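The ord() trick only yields a code point when the table keys are single bytes. A hedged variant of the same idea requests the ISO-8859-1 table explicitly so that this holds on PHP 5.4 and later as well; the helper name named_to_numeric() is hypothetical, and the approach is limited to characters in the Latin-1 range.

```php
<?php
// Hypothetical helper: turn a named entity into a decimal numeric reference.
// Assumes the entity maps to a single ISO-8859-1 byte, so ord() is its code point.
function named_to_numeric($entity)
{
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

    // array_search() returns the key (the raw character) for the given entity.
    $char = array_search($entity, $table, true);
    if ($char === false) {
        return $entity; // Unknown entity: leave it untouched.
    }

    return '&#' . ord($char) . ';';
}

echo named_to_numeric('&middot;'), "\n";
```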
To display the mapping on a web page no matter what the server encoding is, this can be used:
<?php
echo "<pre>\n";
echo htmlentities(print_r(get_html_translation_table(HTML_SPECIALCHARS), true));
echo htmlentities(print_r(get_html_translation_table(HTML_ENTITIES), true));
?>
This is needed because get_html_translation_table() actually gives the special characters in ISO-8859-1 (Latin-1) encoding. To see the tables correctly using print_r(get_html_translation_table(HTML_ENTITIES)); your server needs to send an HTTP header declaring ISO-8859-1, unless you use header() or manually set the browser's encoding to ISO-8859-1. You also need to view the source of the page to see the mapping (except that the English version of IE 7 outputs the page source as ISO-8859-1 anyway).
get_html_translation_table() works only with the first 256 code positions. For higher positions, for example ф (a Cyrillic letter), it shows the same character.
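Whether a character beyond the Latin-1 range appears in the table depends on the encoding and document type requested. The sketch below is a quick check under the assumption of PHP 5.4 or later with the UTF-8 HTML 4.01 table; the characters are written as UTF-8 byte escapes so the result does not depend on how the script file is saved.

```php
<?php
$alpha = "\xCE\xB1"; // Greek small letter alpha (U+03B1) as UTF-8 bytes
$ef    = "\xD1\x84"; // Cyrillic small letter ef (U+0444) as UTF-8 bytes

$html4 = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// HTML 4.01 defines a named entity for alpha, so it has a table entry.
var_dump(isset($html4[$alpha]));

// HTML 4.01 defines no named entities for Cyrillic letters, so there is no entry.
var_dump(isset($html4[$ef]));
```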
Without heavy scientific analysis, this seems to work as a quick fix for making text originating from a Microsoft Word document display as HTML:
<?php
function DoHTMLEntities ($string)
{
    $trans_tbl = get_html_translation_table (HTML_ENTITIES);
    // MS Word strangeness..
    // smart single/ double quotes:
    $trans_tbl[chr(145)] = '\'';
    $trans_tbl[chr(146)] = '\'';
    $trans_tbl[chr(147)] = '"';
    $trans_tbl[chr(148)] = '"';
    // Acute 'e'
    $trans_tbl[chr(142)] = '&eacute;';
    return strtr ($string, $trans_tbl);
}
?>
htmlentities() includes htmlspecialchars(), so here's how to convert a UTF-8 string:
htmlentities($string, ENT_QUOTES, 'UTF-8');
If you have trouble (like me) getting data from ISO-8859-1 encoded forms where users copy and paste from Word, this routine could be useful.
It adds to the standard get_html_translation_table() output the codes of the characters that MS Word usually substitutes into typed text.
Otherwise those characters would never be displayed correctly in HTML output.
<?php
function get_html_translation_table_CP1252() {
    // Written for the single-byte ISO-8859-1 table; from PHP 5.4.0 the default table is UTF-8.
    $trans = get_html_translation_table(HTML_ENTITIES);
    $trans[chr(130)] = '&sbquo;';  // Single Low-9 Quotation Mark
    $trans[chr(131)] = '&fnof;';   // Latin Small Letter F With Hook
    $trans[chr(132)] = '&bdquo;';  // Double Low-9 Quotation Mark
    $trans[chr(133)] = '&hellip;'; // Horizontal Ellipsis
    $trans[chr(134)] = '&dagger;'; // Dagger
    $trans[chr(135)] = '&Dagger;'; // Double Dagger
    $trans[chr(136)] = '&circ;';   // Modifier Letter Circumflex Accent
    $trans[chr(137)] = '&permil;'; // Per Mille Sign
    $trans[chr(138)] = '&Scaron;'; // Latin Capital Letter S With Caron
    $trans[chr(139)] = '&lsaquo;'; // Single Left-Pointing Angle Quotation Mark
    $trans[chr(140)] = '&OElig;';  // Latin Capital Ligature OE
    $trans[chr(145)] = '&lsquo;';  // Left Single Quotation Mark
    $trans[chr(146)] = '&rsquo;';  // Right Single Quotation Mark
    $trans[chr(147)] = '&ldquo;';  // Left Double Quotation Mark
    $trans[chr(148)] = '&rdquo;';  // Right Double Quotation Mark
    $trans[chr(149)] = '&bull;';   // Bullet
    $trans[chr(150)] = '&ndash;';  // En Dash
    $trans[chr(151)] = '&mdash;';  // Em Dash
    $trans[chr(152)] = '&tilde;';  // Small Tilde
    $trans[chr(153)] = '&trade;';  // Trade Mark Sign
    $trans[chr(154)] = '&scaron;'; // Latin Small Letter S With Caron
    $trans[chr(155)] = '&rsaquo;'; // Single Right-Pointing Angle Quotation Mark
    $trans[chr(156)] = '&oelig;';  // Latin Small Ligature OE
    $trans[chr(159)] = '&Yuml;';   // Latin Capital Letter Y With Diaeresis
    ksort($trans);
    return $trans;
}
?>
If you want to display special HTML entities in a web browser, you can use the following code:
<?php
$entities = get_html_translation_table(HTML_ENTITIES);
$new_entities = array();
foreach ($entities as $entity) {
    // Escape the entity so the browser shows it literally instead of rendering it.
    $new_entities[$entity] = htmlspecialchars($entity);
}
echo "<pre>";
print_r($new_entities);
echo "</pre>";
?>
If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)
I found this useful in converting Latin characters:
<?php
function convertLatin1ToHtml($str) {
    // Take the full entity table and drop the entries for markup characters,
    // so existing HTML tags in $str are left intact.
    $allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
    $specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
    $noTags = array_diff($allEntities, $specialEntities);
    $str = strtr($str, $noTags);
    return $str;
}
?>
If you want to decode all those &#123; symbols as well:
<?php
function unhtmlentities ($string) {
    $trans_tbl = get_html_translation_table (HTML_ENTITIES);
    $trans_tbl = array_flip ($trans_tbl);
    $ret = strtr ($string, $trans_tbl);
    // The /e modifier was removed in PHP 7.0, so a callback does the conversion.
    return preg_replace_callback('/\&\#([0-9]+)\;/m', function ($m) {
        return chr((int) $m[1]);
    }, $ret);
}
?>
Alan's version didn't seem to work right. If you're having the same problem, consider using this slightly modified version instead:
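<?php
function unhtmlentities ($string) {
    $trans_tbl = get_html_translation_table (HTML_ENTITIES);
    $trans_tbl = array_flip ($trans_tbl);
    $ret = strtr ($string, $trans_tbl);
    // The /e modifier was removed in PHP 7.0, so a callback does the conversion.
    return preg_replace_callback('/&#(\d+);/m', function ($m) {
        return chr((int) $m[1]);
    }, $ret);
}
?>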
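For comparison, html_entity_decode() (listed under See Also) decodes named and numeric entities in one call, which may make a hand-rolled unhtmlentities() unnecessary. This is a minimal sketch with an arbitrary sample string, assuming PHP 5.4 or later for the ENT_HTML401 constant.

```php
<?php
// Sample input mixing named entities and decimal numeric references.
$encoded = '&lt;p&gt;caf&eacute; &#123; &amp;&#39;';

// ENT_QUOTES is needed so that &#39; is decoded to a single quote.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $decoded, "\n";
```

Unlike chr(), which works on single bytes, html_entity_decode() produces the decoded character in the requested encoding, so code points above 255 are handled as well.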