• 首页
  • vue
  • TypeScript
  • JavaScript
  • scss
  • css3
  • html5
  • php
  • MySQL
  • redis
  • jQuery
  • get_html_translation_table()

    (PHP 4, PHP 5, PHP 7)

    返回使用htmlspecialchars()和htmlentities()后的转换表

    说明

    get_html_translation_table([int $table= HTML_SPECIALCHARS[,int $flags= ENT_COMPAT | ENT_HTML401[,string $encoding= 'UTF-8']]]) : array

    get_html_translation_table()将返回htmlspecialchars()和htmlentities()处理后的转换表。

    Note:

    特殊字符可以使用多种转换方式。例如:"可以被转换成","或者&#x22.get_html_translation_table()返回其中最常用的。

    参数

    $table

    有两个新的常量(HTML_ENTITIES,HTML_SPECIALCHARS)允许你指定你想要的表。

    $flags

    A bitmask of one or more of the following flags, which specify which quotes the table will contain as well as which document type the table is for. The default isENT_COMPAT | ENT_HTML401.

    Available$flagsconstants
    Constant NameDescription
    ENT_COMPATTable will contain entities for double-quotes, but not for single-quotes.
    ENT_QUOTESTable will contain entities for both double and single quotes.
    ENT_NOQUOTESTable will neither contain entities for single quotes nor for double quotes.
    ENT_HTML401Table for HTML 4.01.
    ENT_XML1Table for XML 1.
    ENT_XHTMLTable for XHTML.
    ENT_HTML5Table for HTML 5.
    $encoding

    Encoding to use. If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

    支持以下字符集:

    支持的字符集列表
    字符集别名描述
    ISO-8859-1ISO8859-1西欧,Latin-1
    ISO-8859-5ISO8859-5Little used cyrillic charset (Latin/Cyrillic).
    ISO-8859-15ISO8859-15西欧,Latin-9。增加欧元符号,法语和芬兰语字母在 Latin-1(ISO-8859-1)中缺失。
    UTF-8ASCII 兼容的多字节 8 位 Unicode。
    cp866ibm866, 866DOS 特有的西里尔编码。本字符集在 4.3.2 版本中得到支持。
    cp1251Windows-1251, win-1251, 1251Windows 特有的西里尔编码。本字符集在 4.3.2 版本中得到支持。
    cp1252Windows-1252, 1252Windows 特有的西欧编码。
    KOI8-Rkoi8-ru, koi8r俄语。本字符集在 4.3.2 版本中得到支持。
    BIG5950繁体中文,主要用于中国台湾省。
    GB2312936简体中文,中国国家标准字符集。
    BIG5-HKSCS繁体中文,附带香港扩展的 Big5 字符集。
    Shift_JISSJIS, 932日语
    EUC-JPEUCJP日语
    MacRomanMac OS 使用的字符串。
    ''An empty string activates detection from script encoding (Zend multibyte),default_charsetand current locale (seenl_langinfo()andsetlocale()), in this order. Not recommended.

    Note:其他字符集没有认可。将会使用默认编码并抛出异常。

    返回值

    将转换表作为一个数组返回。

    更新日志

    版本说明
    5.4.0The default value for the$encodingparameter was changed to UTF-8.
    5.4.0The constantsENT_HTML401,ENT_XML1,ENT_XHTMLandENT_HTML5were added.
    5.3.4The$encodingparameter was added.

    范例

    Translation Table Example

    <?php
    var_dump(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES  |  ENT_HTML5));
    ?>

    以上例程的输出类似于:

    array(1510) {
      ["
    "]=>
      string(9) "&NewLine;"
      ["!"]=>
      string(6) "&excl;"
      ["""]=>
      string(6) "&quot;"
      ["#"]=>
      string(5) "&num;"
      ["$"]=>
      string(8) "&dollar;"
      ["%"]=>
      string(8) "&percnt;"
      ["&"]=>
      string(5) "&amp;"
      ["'"]=>
      string(6) "&apos;"
      // ...
    }
    

    参见

    Be careful using get_html_translation_table() in a loop, as it's very slow.
    The fact that MS-word and some other sources use CP-1252, and that it is so close to Latin1 ('ISO-8859-1') causes a lot of confusion. What confused me the most was finding that mySQL uses CP-1252 by default.
    You may run into trouble if you find yourself tempted to do something like this:
    <?php
      $trans[chr(149)] = '&bull;';  // Bullet
      $trans[chr(150)] = '&ndash;';  // En Dash
      $trans[chr(151)] = '&mdash;';  // Em Dash
      $trans[chr(152)] = '&tilde;';  // Small Tilde
      $trans[chr(153)] = '&trade;';  // Trade Mark Sign
    ?>
    Don't do it. DON'T DO IT!
    You can use:
    <?php
      $translationTable = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'WINDOWS-1252');
    ?>
    or just convert directly:
    <?php
      $output = htmlentities($input, ENT_NOQUOTES, 'WINDOWS-1252');
    ?>
    But your web page is probably encoded UTF-8, and you probably don't really want CP-1252 text flying around, so fix the character encoding first:
    <?php
      $output = mb_convert_encoding($input, 'UTF-8', 'WINDOWS-1252');
      $ouput = htmlentities($output);
    ?>
    Not sure what's going on here but I've run into a problem that others might face as well...
    <?php
    $translations = array_flip(get_html_translation_table(HTML_ENTITIES,ENT_QUOTES));
    ?>
    returns the single quote ' as being equal to &#39; while
    <?php
    $translatedString = htmlentities($string,ENT_QUOTES);
    ?>
    returns it as being equal to &#039;
    I've had to do a specific string replacement for the time being... Not sure if it's an issue with the function or the array manipulation.
    -Pat
    I wrote a quick little function for converting something like '&middot;' into '&#183;':
    $to_convert = '&middot;'; 
    $table = get_html_translation_table(HTML_ENTITIES);
    $equiv = '&#'.ord(array_search($to_convert,$table)).';';
    to display the mapping on a webpage no matter what the server encoding is, this can be used
     echo "<pre>\n";
     echo htmlentities(print_r((get_html_translation_table(HTML_SPECIALCHARS)), true));
     echo htmlentities(print_r((get_html_translation_table(HTML_ENTITIES)), true));
    since get_html_translation_table() actually gives the special chars in iso-8859-1 (Latin-1) encoding, so to see the tables correctly using
     print_r(get_html_translation_table(HTML_ENTITIES));
    your server needs to give a HTTP header as iso-8859-1, unless you use header() or manually set the browser's encoding setting to iso-8859-1. And you need to view the source of the page to see the mapping. (except English version of IE 7 outputs the page source as iso-8859-1 anyway).
    get_html_translation_table
    It works only with the first 256 Codepositions.
    For Higher Positions, for Example &#1092;
    (a kyrillic Letter) it shows the same.
    without heavy scientific analysis, this seems to work as a quick fix to making text originating from a Microsoft Word document display as HTML:
    <?php
    function DoHTMLEntities ($string)
      {
        $trans_tbl = get_html_translation_table (HTML_ENTITIES);
        
        // MS Word strangeness.. 
        // smart single/ double quotes:
        $trans_tbl[chr(145)] = '\''; 
        $trans_tbl[chr(146)] = '\''; 
        $trans_tbl[chr(147)] = '&quot;'; 
        $trans_tbl[chr(148)] = '&quot;'; 
            // Acute 'e'
        $trans_tbl[chr(142)] = '&eacute;';
        
        return strtr ($string, $trans_tbl);
      }
    ?>
    htmlentities includes htmlspecialchars, so here's how to convert an UTF-8 string :
    htmlentities($string, ENT_QUOTES, 'UTF-8');
    If you have troubles (like me) getting data from ISO-8859-1 encoded forms where user copy and paste from word, this routine could be useful.
    It adds to the standard get_html_translation_table the codes of the characters usually M$ Word replacs into typed text.
    Otherwise those characters would never be displayed correctly in html output.
    function get_html_translation_table_CP1252() {
      $trans = get_html_translation_table(HTML_ENTITIES);
      $trans[chr(130)] = '&sbquo;';  // Single Low-9 Quotation Mark
      $trans[chr(131)] = '&fnof;';  // Latin Small Letter F With Hook
      $trans[chr(132)] = '&bdquo;';  // Double Low-9 Quotation Mark
      $trans[chr(133)] = '&hellip;';  // Horizontal Ellipsis
      $trans[chr(134)] = '&dagger;';  // Dagger
      $trans[chr(135)] = '&Dagger;';  // Double Dagger
      $trans[chr(136)] = '&circ;';  // Modifier Letter Circumflex Accent
      $trans[chr(137)] = '&permil;';  // Per Mille Sign
      $trans[chr(138)] = '&Scaron;';  // Latin Capital Letter S With Caron
      $trans[chr(139)] = '&lsaquo;';  // Single Left-Pointing Angle Quotation Mark
      $trans[chr(140)] = '&OElig;  ';  // Latin Capital Ligature OE
      $trans[chr(145)] = '&lsquo;';  // Left Single Quotation Mark
      $trans[chr(146)] = '&rsquo;';  // Right Single Quotation Mark
      $trans[chr(147)] = '&ldquo;';  // Left Double Quotation Mark
      $trans[chr(148)] = '&rdquo;';  // Right Double Quotation Mark
      $trans[chr(149)] = '&bull;';  // Bullet
      $trans[chr(150)] = '&ndash;';  // En Dash
      $trans[chr(151)] = '&mdash;';  // Em Dash
      $trans[chr(152)] = '&tilde;';  // Small Tilde
      $trans[chr(153)] = '&trade;';  // Trade Mark Sign
      $trans[chr(154)] = '&scaron;';  // Latin Small Letter S With Caron
      $trans[chr(155)] = '&rsaquo;';  // Single Right-Pointing Angle Quotation Mark
      $trans[chr(156)] = '&oelig;';  // Latin Small Ligature OE
      $trans[chr(159)] = '&Yuml;';  // Latin Capital Letter Y With Diaeresis
      ksort($trans);
      return $trans;
    }
    If you want to display special HTML entities in a web browser, you can use the following code:
    <?
    $entities = get_html_translation_table(HTML_ENTITIES);
    foreach ($entities as $entity) {
      $new_entities[$entity] = htmlspecialchars($entity);
    }
    echo "<pre>";
    print_r($new_entities);
    echo "</pre>";
    ?>
    If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)
    I found this useful in converting latin characters
    <?php
    function convertLatin1ToHtml($str) { 
    $allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES); 
    $specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES); 
    $noTags = array_diff($allEntities, $specialEntities); 
    $str = strtr($str, $noTags); 
    return $str; 
    }
    ?>
    If you want to decode all those &#123; symbols as well.... 
    function unhtmlentities ($string) {
      $trans_tbl = get_html_translation_table (HTML_ENTITIES);
      $trans_tbl = array_flip ($trans_tbl);
      $ret = strtr ($string, $trans_tbl);
      return preg_replace('/\&\#([0-9]+)\;/me', 
        "chr('\\1')",$ret);
    }
    Alans version didn't seem to work right. If you're having the same problem consider using this slightly modified version instead:
    function unhtmlentities ($string) {
      $trans_tbl = get_html_translation_table (HTML_ENTITIES);
      $trans_tbl = array_flip ($trans_tbl);
      $ret = strtr ($string, $trans_tbl);
      return preg_replace('/&#(\d+);/me', 
       "chr('\\1')",$ret);
    }

    上篇:fprintf()

    下篇:hebrevc()