
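    The tidy library (HTML parsing)

    Tidy is an HTML parsing engine that can parse and format HTML. It was originally designed to automatically correct errors and sloppy markup in HTML. It can also convert HTML to XHTML or to XML.

    The name means "to tidy up". The tidy extension is a binding for the Tidy HTML clean and repair utility. It lets you clean and manipulate HTML documents and also traverse the document tree.

    Installation

    This extension is bundled with PHP 5 and later, and is enabled with the --with-tidy configure option.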

    On Redhat-ish linux, you must install both libtidy and libtidy-devel (PHP 5.x):
    sudo yum install libtidy libtidy-devel
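
    After installing, it can help to confirm that PHP actually sees the extension. The following is a minimal sketch; describeTidySupport() is a hypothetical helper name introduced here, while extension_loaded() and tidy_get_release() are standard PHP functions.

```php
<?php
// describeTidySupport() is a hypothetical helper, not part of the extension.
function describeTidySupport(): string
{
    if (!extension_loaded('tidy')) {
        return 'tidy is not loaded';
    }
    // tidy_get_release() reports the release of the libtidy that PHP was built against.
    return 'tidy is loaded, libtidy release: ' . tidy_get_release();
}

echo describeTidySupport(), PHP_EOL;
```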
    

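    Runtime configuration

    The behavior of these functions is affected by settings in php.ini.

    Tidy Configuration Options
    Name                  Default  Changeable       Changelog
    tidy.default_config   ""       PHP_INI_SYSTEM
    tidy.clean_output     "0"      PHP_INI_USER     PHP_INI_PERDIR prior to PHP 5.4.0
    • tidy.default_config (string): default path to the Tidy configuration file.
    • tidy.clean_output (boolean): turns Tidy output repair on or off. The default is "0".
    Warning:
    Do not enable tidy.clean_output when generating non-HTML content, such as dynamic images.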
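
    The current values of both settings can be inspected at runtime. This is only a sketch; readTidyIniSettings() is a hypothetical helper name, and the values it returns depend on the local php.ini.

```php
<?php
// readTidyIniSettings() is a hypothetical helper for illustration.
function readTidyIniSettings(): array
{
    // ini_get() returns false for a directive that is not registered,
    // which is the case when the tidy extension is not loaded.
    return [
        'tidy.default_config' => ini_get('tidy.default_config'),
        'tidy.clean_output'   => ini_get('tidy.clean_output'),
    ];
}

var_dump(readTidyIniSettings());
```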

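    The tidy class

    (PHP 5, PHP 7, PECL tidy >= 0.5.2)

    Represents an HTML document parsed by Tidy. The nodes of the document are exposed as tidyNode objects.

    tidy class synopsis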

    tidy 
    {
    	/* Properties */
    	public string $errorBuffer;
    	/* Methods */
    	public body ( void ) : tidyNode
    	public cleanRepair ( void ) : bool
    	public __construct ([ string $filename [, mixed $config [, string $encoding [, bool $use_include_path ]]]] )
    	public diagnose ( void ) : bool
    	public getConfig ( void ) : array
    	public getHtmlVer ( void ) : int
    	public getOpt ( string $option ) : mixed
    	public getOptDoc ( string $optname ) : string
    	public getRelease ( void ) : string
    	public getStatus ( void ) : int
    	public head ( void ) : tidyNode
    	public html ( void ) : tidyNode
    	public isXhtml ( void ) : bool
    	public isXml ( void ) : bool
    	public parseFile ( string $filename [, mixed $config [, string $encoding [, bool $use_include_path = FALSE ]]] ) : bool
    	public parseString ( string $input [, mixed $config [, string $encoding ]] ) : bool
    	public repairFile ( string $filename [, mixed $config [, string $encoding [, bool $use_include_path = FALSE ]]] ) : string
    	public repairString ( string $data [, mixed $config [, string $encoding ]] ) : string
    	public root ( void ) : tidyNode
    }
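
    Several of these methods are typically used together: parse, repair, then inspect the status and the collected messages. The sketch below assumes a small invalid fragment as input; checkMarkup() is a hypothetical helper name, not part of the extension.

```php
<?php
// checkMarkup() is a hypothetical helper that wraps a few tidy methods.
function checkMarkup(string $html): array
{
    $tidy = new tidy();
    $tidy->parseString($html, ['output-xhtml' => true], 'utf8');
    $tidy->cleanRepair();

    return [
        // getStatus(): 0 = no problems, 1 = warnings, 2 = errors.
        'status'   => $tidy->getStatus(),
        // errorBuffer holds the warnings and errors reported while parsing (may be empty).
        'messages' => (string) $tidy->errorBuffer,
        // Casting the object to string yields the repaired markup.
        'repaired' => (string) $tidy,
    ];
}

// Guarded so the script still runs where the tidy extension is missing.
if (class_exists('tidy')) {
    $report = checkMarkup('<p>Hello <b>world</p>');
    printf("status: %d\n%s\n", $report['status'], $report['messages']);
}
```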
    

    The tidyNode class

    (PHP 5, PHP 7)

    An HTML node in an HTML file, as detected by tidy.

    tidyNode class synopsis

    final tidyNode
    {
    	/* Properties */
    	public string $value;
    	public string $name;
    	public int $type;
    	public int $line;
    	public int $column;
    	public bool $proprietary;
    	public int $id;
    	public array $attribute;
    	public array $child;
    	/* Methods */
    	private __construct ( void )
    	public getParent ( void ) : tidyNode
    	public hasChildren ( void ) : bool
    	public hasSiblings ( void ) : bool
    	public isAsp ( void ) : bool
    	public isComment ( void ) : bool
    	public isHtml ( void ) : bool
    	public isJste ( void ) : bool
    	public isPhp ( void ) : bool
    	public isText ( void ) : bool
    }
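
    The $child property and hasChildren() make it possible to walk the document tree recursively. The following sketch prints an indented outline of element names; outlineNode() is a hypothetical helper name, and the sample markup is an assumption for illustration.

```php
<?php
// outlineNode() is a hypothetical helper: it returns one line per element,
// indented by nesting depth.
function outlineNode(tidyNode $node, int $depth = 0): string
{
    $out = '';
    // Only element nodes (start tags and empty tags) are listed.
    if ($node->type === TIDY_NODETYPE_START || $node->type === TIDY_NODETYPE_STARTEND) {
        $out .= str_repeat('  ', $depth) . $node->name . "\n";
        $depth++;
    }
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            $out .= outlineNode($child, $depth);
        }
    }
    return $out;
}

// Guarded so the script still runs where the tidy extension is missing.
if (class_exists('tidy')) {
    $tidy = new tidy();
    $tidy->parseString('<ul><li>one</li><li>two</li></ul>', [], 'utf8');
    echo outlineNode($tidy->root());
}
```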
    

    Examples

    <?php
    ob_start();
    ?>
    <html>a html document</html>
    <?php
    $html = ob_get_clean();
    // Specify configuration
    $config = array(
               'indent'         => true,
               'output-xhtml'   => true,
               'wrap'           => 200);
    // Tidy
    $tidy = new tidy;
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();
    // Output
    echo $tidy;
    ?>
    
    Note for anyone trying to specify "indent: auto" as documented at http://tidy.sourceforge.net/docs/quickref.html#indent: passing the string 'auto' from PHP does not work; pass the integer 2 instead.
    <?php
    $tidy_options = array('indent' => 'auto'); // WILL NOT WORK
    $tidy_options = array('indent' => 2); // equivalent of auto
    $tidy = new Tidy();
    $tidy->parseString($html, $tidy_options);
    ?>
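
    When only a repaired fragment is needed, rather than a full document object, the procedural tidy_repair_string() function can be used in one step. This is a sketch under assumptions: repairFragment() is a hypothetical helper name, and the option set shown ('show-body-only', 'output-xhtml', 'wrap') is just one reasonable choice.

```php
<?php
// repairFragment() is a hypothetical helper around tidy_repair_string().
function repairFragment(string $fragment): string
{
    $config = [
        'show-body-only' => true, // emit only the contents of <body>
        'output-xhtml'   => true,
        'wrap'           => 0,    // 0 disables line wrapping
    ];
    $result = tidy_repair_string($fragment, $config, 'utf8');

    // tidy_repair_string() returns false on failure; fall back to the input.
    return $result === false ? $fragment : $result;
}

// Guarded so the script still runs where the tidy extension is missing.
if (function_exists('tidy_repair_string')) {
    echo repairFragment('<p>Hello <b>world</p>'), PHP_EOL;
}
```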
    

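    Predefined constants

    The constants below are defined by this extension, and are only available when the extension has either been compiled into PHP or dynamically loaded at runtime.

    Each TIDY_TAG_XXX constant represents an HTML tag. For example, TIDY_TAG_A represents an "<a href="XX">link</a>" tag.

    tidy defines a fairly large number of constants. Because it targets HTML, they cover HTML tags, attributes, and node types.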
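
    A common use of the tag constants is comparing them with the $id property of a tidyNode to pick out specific elements. The sketch below collects the href of every link; collectLinks() is a hypothetical helper name and the sample markup is an assumption.

```php
<?php
// collectLinks() is a hypothetical helper: it returns the href value of
// every <a> element below $node, in document order.
function collectLinks(tidyNode $node): array
{
    $links = [];
    // $id holds a TIDY_TAG_* value for element nodes.
    if ($node->id === TIDY_TAG_A && isset($node->attribute['href'])) {
        $links[] = $node->attribute['href'];
    }
    // $child is null for leaf nodes, hence the ?? [] fallback.
    foreach ($node->child ?? [] as $child) {
        $links = array_merge($links, collectLinks($child));
    }
    return $links;
}

// Guarded so the script still runs where the tidy extension is missing.
if (class_exists('tidy')) {
    $tidy = new tidy();
    $tidy->parseString('<p><a href="https://example.com/a">A</a> <a href="/b">B</a></p>', [], 'utf8');
    print_r(collectLinks($tidy->root()));
}
```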

    tidy tag constants
    Constant              Notes
    TIDY_TAG_UNKNOWN
    TIDY_TAG_A
    TIDY_TAG_ABBR
    TIDY_TAG_ACRONYM
    TIDY_TAG_ALIGN
    TIDY_TAG_APPLET
    TIDY_TAG_AREA
    TIDY_TAG_ARTICLE      Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_ASIDE        Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_AUDIO        Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_B
    TIDY_TAG_BASE
    TIDY_TAG_BASEFONT
    TIDY_TAG_BDI          Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_BDO
    TIDY_TAG_BGSOUND
    TIDY_TAG_BIG
    TIDY_TAG_BLINK
    TIDY_TAG_BLOCKQUOTE
    TIDY_TAG_BODY
    TIDY_TAG_BR
    TIDY_TAG_BUTTON
    TIDY_TAG_CANVAS       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_CAPTION
    TIDY_TAG_CENTER
    TIDY_TAG_CITE
    TIDY_TAG_CODE
    TIDY_TAG_COL
    TIDY_TAG_COLGROUP
    TIDY_TAG_COMMAND      Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_COMMENT
    TIDY_TAG_DATALIST     Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_DD
    TIDY_TAG_DEL
    TIDY_TAG_DETAILS      Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_DFN
    TIDY_TAG_DIALOG       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_DIR
    TIDY_TAG_DIV
    TIDY_TAG_DL
    TIDY_TAG_DT
    TIDY_TAG_EM
    TIDY_TAG_EMBED
    TIDY_TAG_FIELDSET
    TIDY_TAG_FIGCAPTION   Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_FIGURE       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_FONT
    TIDY_TAG_FOOTER       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_FORM
    TIDY_TAG_FRAME
    TIDY_TAG_FRAMESET
    TIDY_TAG_H1
    TIDY_TAG_H2
    TIDY_TAG_H3
    TIDY_TAG_H4
    TIDY_TAG_H5
    TIDY_TAG_H6
    TIDY_TAG_HEAD
    TIDY_TAG_HEADER       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_HGROUP       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_HR
    TIDY_TAG_HTML
    TIDY_TAG_I
    TIDY_TAG_IFRAME
    TIDY_TAG_ILAYER
    TIDY_TAG_IMG
    TIDY_TAG_INPUT
    TIDY_TAG_INS
    TIDY_TAG_ISINDEX
    TIDY_TAG_KBD
    TIDY_TAG_KEYGEN
    TIDY_TAG_LABEL
    TIDY_TAG_LAYER
    TIDY_TAG_LEGEND
    TIDY_TAG_LI
    TIDY_TAG_LINK
    TIDY_TAG_LISTING
    TIDY_TAG_MAIN         Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_MAP
    TIDY_TAG_MARK         Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_MARQUEE
    TIDY_TAG_MENU
    TIDY_TAG_MENUITEM     Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_META
    TIDY_TAG_METER        Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_MULTICOL
    TIDY_TAG_NAV          Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_NOBR
    TIDY_TAG_NOEMBED
    TIDY_TAG_NOFRAMES
    TIDY_TAG_NOLAYER
    TIDY_TAG_NOSAVE
    TIDY_TAG_NOSCRIPT
    TIDY_TAG_OBJECT
    TIDY_TAG_OL
    TIDY_TAG_OPTGROUP
    TIDY_TAG_OPTION
    TIDY_TAG_OUTPUT       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_P
    TIDY_TAG_PARAM
    TIDY_TAG_PLAINTEXT
    TIDY_TAG_PRE
    TIDY_TAG_PROGRESS     Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_Q
    TIDY_TAG_RB
    TIDY_TAG_RBC
    TIDY_TAG_RP
    TIDY_TAG_RT
    TIDY_TAG_RTC
    TIDY_TAG_RUBY
    TIDY_TAG_S
    TIDY_TAG_SAMP
    TIDY_TAG_SCRIPT
    TIDY_TAG_SECTION      Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_SELECT
    TIDY_TAG_SERVER
    TIDY_TAG_SERVLET
    TIDY_TAG_SMALL
    TIDY_TAG_SOURCE       Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_SPACER
    TIDY_TAG_SPAN
    TIDY_TAG_STRIKE
    TIDY_TAG_STRONG
    TIDY_TAG_STYLE
    TIDY_TAG_SUB
    TIDY_TAG_SUMMARY      Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_SUP
    TIDY_TAG_TABLE
    TIDY_TAG_TBODY
    TIDY_TAG_TD
    TIDY_TAG_TEMPLATE     Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_TEXTAREA
    TIDY_TAG_TFOOT
    TIDY_TAG_TH
    TIDY_TAG_THEAD
    TIDY_TAG_TIME         Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_TITLE
    TIDY_TAG_TR
    TIDY_TAG_TRACK        Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_TT
    TIDY_TAG_U
    TIDY_TAG_UL
    TIDY_TAG_VAR
    TIDY_TAG_VIDEO        Added in libtidy 5.0.0. Available as of PHP 7.4.0.
    TIDY_TAG_WBR
    TIDY_TAG_XMP

    tidy nodetype constants
    Constant                Description
    TIDY_NODETYPE_ROOT      root node
    TIDY_NODETYPE_DOCTYPE   doctype
    TIDY_NODETYPE_COMMENT   HTML comment
    TIDY_NODETYPE_PROCINS   processing instruction
    TIDY_NODETYPE_TEXT      text
    TIDY_NODETYPE_START     start tag
    TIDY_NODETYPE_END       end tag
    TIDY_NODETYPE_STARTEND  empty tag
    TIDY_NODETYPE_CDATA     CDATA
    TIDY_NODETYPE_SECTION   XML section
    TIDY_NODETYPE_ASP       ASP code
    TIDY_NODETYPE_JSTE      JSTE code
    TIDY_NODETYPE_PHP       PHP code
    TIDY_NODETYPE_XMLDECL   XML declaration
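
    The node type constants can be compared with the $type property of a tidyNode to filter a tree by kind of node. The sketch below gathers the values of all nodes of one type, for example every HTML comment; collectByType() is a hypothetical helper name and the sample markup is an assumption.

```php
<?php
// collectByType() is a hypothetical helper: it returns the trimmed $value of
// every node below $node whose type equals the given TIDY_NODETYPE_* constant.
function collectByType(tidyNode $node, int $type): array
{
    $found = [];
    if ($node->type === $type) {
        $found[] = trim((string) $node->value);
    }
    // $child is null for leaf nodes, hence the ?? [] fallback.
    foreach ($node->child ?? [] as $child) {
        $found = array_merge($found, collectByType($child, $type));
    }
    return $found;
}

// Guarded so the script still runs where the tidy extension is missing.
if (class_exists('tidy')) {
    $tidy = new tidy();
    $tidy->parseString('<body><!-- note --><p>Hello</p></body>', [], 'utf8');
    print_r(collectByType($tidy->root(), TIDY_NODETYPE_COMMENT));
    print_r(collectByType($tidy->root(), TIDY_NODETYPE_TEXT));
}
```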