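Tidy parses and formats HTML. It is a capable HTML parsing engine that was originally designed to automatically fix errors and sloppy markup in HTML. As a library it can also convert HTML to XHTML or to XML.
As the name suggests, tidy means "to tidy up". The PHP tidy extension is a binding for the Tidy HTML clean and repair utility. It lets you not only clean and manipulate HTML documents but also traverse the document tree.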
Installation
This extension is bundled with PHP 5 and later and is installed using the --with-tidy configure option.
On Red Hat-style Linux distributions, you must install both libtidy and libtidy-devel (PHP 5.x): sudo yum install libtidy libtidy-devel
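After installing, it can help to confirm that PHP actually sees the extension. The following is a minimal sketch; the helper name tidyAvailability() is made up for illustration, while extension_loaded() and tidy_get_release() are standard PHP functions.

```php
<?php
// Hypothetical helper: report whether the tidy extension is available in this PHP build.
function tidyAvailability(): string
{
    if (!extension_loaded('tidy')) {
        return 'tidy extension is not loaded';
    }
    // tidy_get_release() returns the release string of the linked libtidy.
    return 'tidy extension is loaded, libtidy release: ' . tidy_get_release();
}

echo tidyAvailability(), PHP_EOL;
```

Running the script from the command line (php check.php, for example) shows whether the CLI build has tidy; a web server may use a different php.ini, so the result there can differ.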
Runtime Configuration
The behaviour of these functions is affected by settings in php.ini.
Name | Default | Changeable | Changelog |
---|---|---|---|
tidy.default_config | "" | PHP_INI_SYSTEM | |
tidy.clean_output | "0" | PHP_INI_USER | PHP_INI_PERDIR prior to PHP 5.4.0 |
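- tidy.default_config string: default path of the Tidy configuration file.
- tidy.clean_output boolean: turns output repair by Tidy on or off. The default is "0".
Warning:
Do not enable tidy.clean_output when generating non-HTML content such as dynamic images.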
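To see which values are in effect, the two directives above can be read at runtime with ini_get(). This is only a sketch; the function name tidyIniSettings() is an illustrative assumption.

```php
<?php
// Hypothetical helper: read the two tidy-related php.ini settings currently in effect.
function tidyIniSettings(): array
{
    return [
        'tidy.default_config' => ini_get('tidy.default_config'),
        'tidy.clean_output'   => ini_get('tidy.clean_output'),
    ];
}

foreach (tidyIniSettings() as $name => $value) {
    // ini_get() returns false when the directive is unknown,
    // which is what happens if the tidy extension is not loaded.
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}
```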
The tidy class
(PHP 5, PHP 7, PECL tidy >= 0.5.2)
Represents an HTML document parsed by tidy; its nodes are exposed as tidyNode objects.
tidy class synopsis

    tidy {
        /* Properties */
        public string $errorBuffer;

        /* Methods */
        public body ( void ) : tidyNode
        public cleanRepair ( void ) : bool
        public __construct ([ string $filename [, mixed $config [, string $encoding [, bool $use_include_path ]]]] )
        public diagnose ( void ) : bool
        public getConfig ( void ) : array
        public getHtmlVer ( void ) : int
        public getOpt ( string $option ) : mixed
        public getOptDoc ( string $optname ) : string
        public getRelease ( void ) : string
        public getStatus ( void ) : int
        public head ( void ) : tidyNode
        public html ( void ) : tidyNode
        public isXhtml ( void ) : bool
        public isXml ( void ) : bool
        public parseFile ( string $filename [, mixed $config [, string $encoding [, bool $use_include_path = FALSE ]]] ) : bool
        public parseString ( string $input [, mixed $config [, string $encoding ]] ) : bool
        public repairFile ( string $filename [, mixed $config [, string $encoding [, bool $use_include_path = FALSE ]]] ) : string
        public repairString ( string $data [, mixed $config [, string $encoding ]] ) : string
        public root ( void ) : tidyNode
    }
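As a rough illustration of how several of the methods in the synopsis fit together, the sketch below parses a broken fragment, repairs it, and collects the status and error buffer. The function name repairFragment() and the sample input are assumptions made for this example; 'show-body-only' is a libtidy option that leaves out the html/head wrapper.

```php
<?php
// Hypothetical helper: repair a fragment and return the cleaned markup plus diagnostics.
// Returns null when the tidy extension is unavailable.
function repairFragment(string $html): ?array
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $tidy = new tidy();
    $tidy->parseString($html, ['show-body-only' => true], 'utf8');
    $tidy->cleanRepair();

    return [
        'html'   => (string) $tidy,               // a tidy object casts to the repaired markup
        'status' => $tidy->getStatus(),           // 0 = no issues, 1 = warnings, 2 = errors
        'errors' => (string) $tidy->errorBuffer,  // warnings and errors reported by libtidy
    ];
}

$result = repairFragment('<p>unclosed <b>bold');
if ($result !== null) {
    echo $result['html'], PHP_EOL;
    echo 'status: ', $result['status'], PHP_EOL;
}
```

The extension_loaded() guard is a design choice for this sketch so that the script degrades gracefully instead of raising a fatal error where tidy is not installed.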
The tidyNode class
(PHP 5, PHP 7)
An HTML node in an HTML file, as detected by tidy.
tidyNode class synopsis

    final tidyNode {
        /* Properties */
        public string $value;
        public string $name;
        public int $type;
        public int $line;
        public int $column;
        public bool $proprietary;
        public int $id;
        public array $attribute;
        public array $child;

        /* Methods */
        private __construct ( void )
        public getParent ( void ) : tidyNode
        public hasChildren ( void ) : bool
        public hasSiblings ( void ) : bool
        public isAsp ( void ) : bool
        public isComment ( void ) : bool
        public isHtml ( void ) : bool
        public isJste ( void ) : bool
        public isPhp ( void ) : bool
        public isText ( void ) : bool
    }
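Since the constructor is private, tidyNode objects are obtained from a tidy object (for example via body() or root()) and then walked through the $child property. The following sketch prints an indented outline of element names; outlineNodes() and the sample markup are illustrative assumptions, not part of the extension.

```php
<?php
// Hypothetical helper: recursively collect an indented list of element names under $node.
// Text nodes and unnamed nodes are skipped.
function outlineNodes(tidyNode $node, int $depth = 0, array &$out = []): array
{
    if (!$node->isText() && !empty($node->name)) {
        $out[] = str_repeat('  ', $depth) . $node->name;
    }
    // $child is only iterable when the node actually has children.
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            outlineNodes($child, $depth + 1, $out);
        }
    }
    return $out;
}

if (extension_loaded('tidy')) {
    $tidy = new tidy();
    $tidy->parseString('<html><body><h1>Title</h1><p>Text <em>here</em></p></body></html>');
    echo implode(PHP_EOL, outlineNodes($tidy->body())), PHP_EOL;
}
```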
Examples

    <?php
    ob_start();
    ?>
    <html>a html document</html>
    <?php
    $html = ob_get_clean();

    // Specify configuration
    $config = array(
        'indent'       => true,
        'output-xhtml' => true,
        'wrap'         => 200);

    // Tidy
    $tidy = new tidy;
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    // Output
    echo $tidy;
    ?>
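A note for anyone trying to specify "indent: auto" as documented at http://tidy.sourceforge.net/docs/quickref.html#indent: the string value is reported not to work from PHP; pass the integer 2 instead.

    <?php
    $html = '<p>a html fragment</p>'; // sample input, added so the snippet runs on its own

    $tidy_options = array('indent' => 'auto'); // WILL NOT WORK
    $tidy_options = array('indent' => 2);      // equivalent of auto

    $tidy = new Tidy();
    $tidy->parseString($html, $tidy_options);
    ?>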
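Predefined Constants
The constants below are defined by this extension, and are only available when the extension has either been compiled into PHP or dynamically loaded at runtime.
Each TIDY_TAG_XXX constant represents an HTML tag. For example, TIDY_TAG_A represents the <a href="XX">link</a> tag.
tidy defines a relatively large number of constants because it targets HTML: they cover HTML tags, attributes, and node types.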
Constant | Notes |
---|---|
TIDY_TAG_UNKNOWN | |
TIDY_TAG_A | |
TIDY_TAG_ABBR | |
TIDY_TAG_ACRONYM | |
TIDY_TAG_ALIGN | |
TIDY_TAG_APPLET | |
TIDY_TAG_AREA | |
TIDY_TAG_ARTICLE | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_ASIDE | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_AUDIO | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_B | |
TIDY_TAG_BASE | |
TIDY_TAG_BASEFONT | |
TIDY_TAG_BDI | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_BDO | |
TIDY_TAG_BGSOUND | |
TIDY_TAG_BIG | |
TIDY_TAG_BLINK | |
TIDY_TAG_BLOCKQUOTE | |
TIDY_TAG_BODY | |
TIDY_TAG_BR | |
TIDY_TAG_BUTTON | |
TIDY_TAG_CANVAS | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_CAPTION | |
TIDY_TAG_CENTER | |
TIDY_TAG_CITE | |
TIDY_TAG_CODE | |
TIDY_TAG_COL | |
TIDY_TAG_COLGROUP | |
TIDY_TAG_COMMAND | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_COMMENT | |
TIDY_TAG_DATALIST | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_DD | |
TIDY_TAG_DEL | |
TIDY_TAG_DETAILS | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_DFN | |
TIDY_TAG_DIALOG | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_DIR | |
TIDY_TAG_DIV | |
TIDY_TAG_DL | |
TIDY_TAG_DT | |
TIDY_TAG_EM | |
TIDY_TAG_EMBED | |
TIDY_TAG_FIELDSET | |
TIDY_TAG_FIGCAPTION | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_FIGURE | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_FONT | |
TIDY_TAG_FOOTER | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_FORM | |
TIDY_TAG_FRAME | |
TIDY_TAG_FRAMESET | |
TIDY_TAG_H1 | |
TIDY_TAG_H2 | |
TIDY_TAG_H3 | |
TIDY_TAG_H4 | |
TIDY_TAG_H5 | |
TIDY_TAG_H6 | |
TIDY_TAG_HEAD | |
TIDY_TAG_HEADER | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_HGROUP | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_HR | |
TIDY_TAG_HTML | |
TIDY_TAG_I | |
TIDY_TAG_IFRAME | |
TIDY_TAG_ILAYER | |
TIDY_TAG_IMG | |
TIDY_TAG_INPUT | |
TIDY_TAG_INS | |
TIDY_TAG_ISINDEX | |
TIDY_TAG_KBD | |
TIDY_TAG_KEYGEN | |
TIDY_TAG_LABEL | |
TIDY_TAG_LAYER | |
TIDY_TAG_LEGEND | |
TIDY_TAG_LI | |
TIDY_TAG_LINK | |
TIDY_TAG_LISTING | |
TIDY_TAG_MAIN | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_MAP | |
TIDY_TAG_MARK | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_MARQUEE | |
TIDY_TAG_MENU | |
TIDY_TAG_MENUITEM | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_META | |
TIDY_TAG_METER | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_MULTICOL | |
TIDY_TAG_NAV | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_NOBR | |
TIDY_TAG_NOEMBED | |
TIDY_TAG_NOFRAMES | |
TIDY_TAG_NOLAYER | |
TIDY_TAG_NOSAVE | |
TIDY_TAG_NOSCRIPT | |
TIDY_TAG_OBJECT | |
TIDY_TAG_OL | |
TIDY_TAG_OPTGROUP | |
TIDY_TAG_OPTION | |
TIDY_TAG_OUTPUT | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_P | |
TIDY_TAG_PARAM | |
TIDY_TAG_PLAINTEXT | |
TIDY_TAG_PRE | |
TIDY_TAG_PROGRESS | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_Q | |
TIDY_TAG_RB | |
TIDY_TAG_RBC | |
TIDY_TAG_RP | |
TIDY_TAG_RT | |
TIDY_TAG_RTC | |
TIDY_TAG_RUBY | |
TIDY_TAG_S | |
TIDY_TAG_SAMP | |
TIDY_TAG_SCRIPT | |
TIDY_TAG_SECTION | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_SELECT | |
TIDY_TAG_SERVER | |
TIDY_TAG_SERVLET | |
TIDY_TAG_SMALL | |
TIDY_TAG_SOURCE | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_SPACER | |
TIDY_TAG_SPAN | |
TIDY_TAG_STRIKE | |
TIDY_TAG_STRONG | |
TIDY_TAG_STYLE | |
TIDY_TAG_SUB | |
TIDY_TAG_SUMMARY | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_SUP | |
TIDY_TAG_TABLE | |
TIDY_TAG_TBODY | |
TIDY_TAG_TD | |
TIDY_TAG_TEMPLATE | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_TEXTAREA | |
TIDY_TAG_TFOOT | |
TIDY_TAG_TH | |
TIDY_TAG_THEAD | |
TIDY_TAG_TIME | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_TITLE | |
TIDY_TAG_TR | |
TIDY_TAG_TRACK | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_TT | |
TIDY_TAG_U | |
TIDY_TAG_UL | |
TIDY_TAG_VAR | |
TIDY_TAG_VIDEO | Added in libtidy 5.0.0. Available as of PHP 7.4.0. |
TIDY_TAG_WBR | |
TIDY_TAG_XMP | |

Constant | Description |
---|---|
TIDY_NODETYPE_ROOT | root node |
TIDY_NODETYPE_DOCTYPE | doctype |
TIDY_NODETYPE_COMMENT | HTML comment |
TIDY_NODETYPE_PROCINS | Processing Instruction |
TIDY_NODETYPE_TEXT | Text |
TIDY_NODETYPE_START | start tag |
TIDY_NODETYPE_END | end tag |
TIDY_NODETYPE_STARTEND | empty tag |
TIDY_NODETYPE_CDATA | CDATA |
TIDY_NODETYPE_SECTION | XML section |
TIDY_NODETYPE_ASP | ASP code |
TIDY_NODETYPE_JSTE | JSTE code |
TIDY_NODETYPE_PHP | PHP code |
TIDY_NODETYPE_XMLDECL | XML declaration |
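The two groups of constants are typically compared against the $id and $type properties of a tidyNode. The sketch below collects link targets by matching TIDY_TAG_A and counts text nodes by matching TIDY_NODETYPE_TEXT. The helper names collectLinks() and countTextNodes() and the sample markup are assumptions for illustration.

```php
<?php
// Hypothetical helper: gather the href of every <a> element under $node.
function collectLinks(tidyNode $node, array &$links = []): array
{
    // $id holds a TIDY_TAG_* value for element nodes.
    if ($node->id === TIDY_TAG_A && isset($node->attribute['href'])) {
        $links[] = $node->attribute['href'];
    }
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            collectLinks($child, $links);
        }
    }
    return $links;
}

// Hypothetical helper: count text nodes under $node using the node type constant.
function countTextNodes(tidyNode $node): int
{
    $count = ($node->type === TIDY_NODETYPE_TEXT) ? 1 : 0;
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            $count += countTextNodes($child);
        }
    }
    return $count;
}

if (extension_loaded('tidy')) {
    $tidy = new tidy();
    $tidy->parseString('<p>See <a href="https://example.com/">this page</a>.</p>');
    print_r(collectLinks($tidy->body()));
    echo 'text nodes: ', countTextNodes($tidy->body()), PHP_EOL;
}
```

Comparing against the constants rather than against tag-name strings avoids case and spelling issues, at the cost of depending on which tags the linked libtidy version knows about.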