get_html_translation_table()

(PHP 4, PHP 5, PHP 7)

返回使用htmlspecialchars()和htmlentities()后的转换表

说明

get_html_translation_table([int $table= HTML_SPECIALCHARS[,int $flags= ENT_COMPAT | ENT_HTML401[,string $encoding= 'UTF-8']]]) : array

get_html_translation_table()将返回htmlspecialchars()和htmlentities()处理后的转换表。

Note:
特殊字符可以使用多种转换方式。例如："可以被转换成","或者&#x22.get_html_translation_table()返回其中最常用的。

参数

$table

有两个新的常量(HTML_ENTITIES,HTML_SPECIALCHARS)允许你指定你想要的表。

$flags

A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default isENT_COMPAT | ENT_HTML401.

**Available$flagsconstants**
Constant Name	Description
`ENT_COMPAT`	Table will contain entities for double-quotes, but not for single-quotes.
`ENT_QUOTES`	Table will contain entities for both double and single quotes.
`ENT_NOQUOTES`	Table will neither contain entities for single quotes nor for double quotes.
`ENT_HTML401`	Table for HTML 4.01.
`ENT_XML1`	Table for XML 1.
`ENT_XHTML`	Table for XHTML.
`ENT_HTML5`	Table for HTML 5.

$encoding

Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

支持以下字符集：

**支持的字符集列表**
字符集	别名	描述
ISO-8859-1	ISO8859-1	西欧，Latin-1
ISO-8859-5	ISO8859-5	Little used cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	西欧，Latin-9。增加欧元符号，法语和芬兰语字母在 Latin-1(ISO-8859-1)中缺失。
UTF-8		ASCII 兼容的多字节 8 位 Unicode。
cp866	ibm866, 866	DOS 特有的西里尔编码。本字符集在 4.3.2 版本中得到支持。
cp1251	Windows-1251, win-1251, 1251	Windows 特有的西里尔编码。本字符集在 4.3.2 版本中得到支持。
cp1252	Windows-1252, 1252	Windows 特有的西欧编码。
KOI8-R	koi8-ru, koi8r	俄语。本字符集在 4.3.2 版本中得到支持。
BIG5	950	繁体中文，主要用于中国台湾省。
GB2312	936	简体中文，中国国家标准字符集。
BIG5-HKSCS		繁体中文，附带香港扩展的 Big5 字符集。
Shift_JIS	SJIS, 932	日语
EUC-JP	EUCJP	日语
MacRoman		Mac OS 使用的字符串。
''		An empty string activates detection from script encoding (Zend multibyte),default_charsetand current locale (seenl_langinfo()andsetlocale()), in this order. Not recommended.

Note:其他字符集没有认可。将会使用默认编码并抛出异常。

返回值

将转换表作为一个数组返回。

更新日志

版本	说明
5.4.0	The default value for the$encodingparameter was changed to UTF-8.
5.4.0	The constants`ENT_HTML401`,`ENT_XML1`,`ENT_XHTML`and`ENT_HTML5`were added.
5.3.4	The$encodingparameter was added.

范例

Translation Table Example

<?php
var_dump(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES  |  ENT_HTML5));
?>

以上例程的输出类似于：

array(1510) {
  ["
"]=>
  string(9) "&NewLine;"
  ["!"]=>
  string(6) "&excl;"
  ["""]=>
  string(6) "&quot;"
  ["#"]=>
  string(5) "&num;"
  ["$"]=>
  string(8) "&dollar;"
  ["%"]=>
  string(8) "&percnt;"
  ["&"]=>
  string(5) "&amp;"
  ["'"]=>
  string(6) "&apos;"
  // ...
}

参见

htmlspecialchars()将特殊字符转换为 HTML 实体
htmlentities()将字符转换为 HTML 转义字符
html_entity_decode()Convert HTML entities to their corresponding characters

Be careful using get_html_translation_table() in a loop, as it's very slow.

The fact that MS-word and some other sources use CP-1252, and that it is so close to Latin1 ('ISO-8859-1') causes a lot of confusion. What confused me the most was finding that mySQL uses CP-1252 by default.
You may run into trouble if you find yourself tempted to do something like this:
<?php
  $trans[chr(149)] = '&bull;';  // Bullet
  $trans[chr(150)] = '&ndash;';  // En Dash
  $trans[chr(151)] = '&mdash;';  // Em Dash
  $trans[chr(152)] = '&tilde;';  // Small Tilde
  $trans[chr(153)] = '&trade;';  // Trade Mark Sign
?>
Don't do it. DON'T DO IT!
You can use:
<?php
  $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'WINDOWS-1252');
?>
or just convert directly:
<?php
  $output = htmlentities($input, ENT_NOQUOTES, 'WINDOWS-1252');
?>
But your web page is probably encoded UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
<?php
  $output = mb_convert_encoding($input, 'UTF-8', 'WINDOWS-1252');
  $ouput = htmlentities($output);
?>

Not sure what's going on here but I've run into a problem that others might face as well...
<?php
$translations = array_flip(get_html_translation_table(HTML_ENTITIES,ENT_QUOTES));
?>
returns the single quote ' as being equal to &#39; while
<?php
$translatedString = htmlentities($string,ENT_QUOTES);
?>
returns it as being equal to &#039;
I've had to do a specific string replacement for the time being... Not sure if it's an issue with the function or the array manipulation.
-Pat

I wrote a quick little function for converting something like '&middot;' into '&#183;':
$to_convert = '&middot;'; 
$table = get_html_translation_table(HTML_ENTITIES);
$equiv = '&#'.ord(array_search($to_convert,$table)).';';

to display the mapping on a webpage no matter what the server encoding is, this can be used
 echo "<pre>\n";
 echo htmlentities(print_r((get_html_translation_table(HTML_SPECIALCHARS)), true));
 echo htmlentities(print_r((get_html_translation_table(HTML_ENTITIES)), true));
since get_html_translation_table() actually gives the special chars in iso-8859-1 (Latin-1) encoding, so to see the tables correctly using
 print_r(get_html_translation_table(HTML_ENTITIES));
your server needs to give a HTTP header as iso-8859-1, unless you use header() or manually set the browser's encoding setting to iso-8859-1. And you need to view the source of the page to see the mapping. (except English version of IE 7 outputs the page source as iso-8859-1 anyway).

get_html_translation_table
It works only with the first 256 Codepositions.
For Higher Positions, for Example &#1092;
(a kyrillic Letter) it shows the same.

without heavy scientific analysis, this seems to work as a quick fix to making text originating from a Microsoft Word document display as HTML:
<?php
function DoHTMLEntities ($string)
  {
    $trans_tbl = get_html_translation_table (HTML_ENTITIES);
    
    // MS Word strangeness.. 
    // smart single/ double quotes:
    $trans_tbl[chr(145)] = '\''; 
    $trans_tbl[chr(146)] = '\''; 
    $trans_tbl[chr(147)] = '&quot;'; 
    $trans_tbl[chr(148)] = '&quot;'; 
        // Acute 'e'
    $trans_tbl[chr(142)] = '&eacute;';
    
    return strtr ($string, $trans_tbl);
  }
?>

htmlentities includes htmlspecialchars, so here's how to convert an UTF-8 string :
htmlentities($string, ENT_QUOTES, 'UTF-8');

If you have troubles (like me) getting data from ISO-8859-1 encoded forms where user copy and paste from word, this routine could be useful.
It adds to the standard get_html_translation_table the codes of the characters usually M$ Word replacs into typed text.
Otherwise those characters would never be displayed correctly in html output.
function get_html_translation_table_CP1252() {
  $trans = get_html_translation_table(HTML_ENTITIES);
  $trans[chr(130)] = '&sbquo;';  // Single Low-9 Quotation Mark
  $trans[chr(131)] = '&fnof;';  // Latin Small Letter F With Hook
  $trans[chr(132)] = '&bdquo;';  // Double Low-9 Quotation Mark
  $trans[chr(133)] = '&hellip;';  // Horizontal Ellipsis
  $trans[chr(134)] = '&dagger;';  // Dagger
  $trans[chr(135)] = '&Dagger;';  // Double Dagger
  $trans[chr(136)] = '&circ;';  // Modifier Letter Circumflex Accent
  $trans[chr(137)] = '&permil;';  // Per Mille Sign
  $trans[chr(138)] = '&Scaron;';  // Latin Capital Letter S With Caron
  $trans[chr(139)] = '&lsaquo;';  // Single Left-Pointing Angle Quotation Mark
  $trans[chr(140)] = '&OElig;  ';  // Latin Capital Ligature OE
  $trans[chr(145)] = '&lsquo;';  // Left Single Quotation Mark
  $trans[chr(146)] = '&rsquo;';  // Right Single Quotation Mark
  $trans[chr(147)] = '&ldquo;';  // Left Double Quotation Mark
  $trans[chr(148)] = '&rdquo;';  // Right Double Quotation Mark
  $trans[chr(149)] = '&bull;';  // Bullet
  $trans[chr(150)] = '&ndash;';  // En Dash
  $trans[chr(151)] = '&mdash;';  // Em Dash
  $trans[chr(152)] = '&tilde;';  // Small Tilde
  $trans[chr(153)] = '&trade;';  // Trade Mark Sign
  $trans[chr(154)] = '&scaron;';  // Latin Small Letter S With Caron
  $trans[chr(155)] = '&rsaquo;';  // Single Right-Pointing Angle Quotation Mark
  $trans[chr(156)] = '&oelig;';  // Latin Small Ligature OE
  $trans[chr(159)] = '&Yuml;';  // Latin Capital Letter Y With Diaeresis
  ksort($trans);
  return $trans;
}

If you want to display special HTML entities in a web browser, you can use the following code:
<?
$entities = get_html_translation_table(HTML_ENTITIES);
foreach ($entities as $entity) {
  $new_entities[$entity] = htmlspecialchars($entity);
}
echo "<pre>";
print_r($new_entities);
echo "</pre>";
?>
If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)

I found this useful in converting latin characters
<?php
function convertLatin1ToHtml($str) { 
$allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES); 
$specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES); 
$noTags = array_diff($allEntities, $specialEntities); 
$str = strtr($str, $noTags); 
return $str; 
}
?>

If you want to decode all those &#123; symbols as well.... 
function unhtmlentities ($string) {
  $trans_tbl = get_html_translation_table (HTML_ENTITIES);
  $trans_tbl = array_flip ($trans_tbl);
  $ret = strtr ($string, $trans_tbl);
  return preg_replace('/\&\#([0-9]+)\;/me', 
    "chr('\\1')",$ret);
}

Alans version didn't seem to work right. If you're having the same problem consider using this slightly modified version instead:
function unhtmlentities ($string) {
  $trans_tbl = get_html_translation_table (HTML_ENTITIES);
  $trans_tbl = array_flip ($trans_tbl);
  $ret = strtr ($string, $trans_tbl);
  return preg_replace('/&#(\d+);/me', 
   "chr('\\1')",$ret);
}