Full-Text Stopwords

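    The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses might occur for stopword lookups if the stopword file or the columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server.

    Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case-insensitive if the collation is utf8mb4_0900_ai_ci, whereas lookups are case-sensitive if the collation is utf8mb4_0900_as_cs or utf8mb4_bin.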
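
    As a rough illustration of the collation behavior described above, the following sketch displays the server settings and compares two strings under both kinds of collation. It uses ordinary string comparison as a stand-in for stopword lookup, which is an assumption made for illustration only; the literals 'The' and 'the' are arbitrary, and the collations named require MySQL 8.0.

    ```sql
    -- Server character set and collation used for stopword lookups
    SELECT @@character_set_server, @@collation_server;

    -- Compare the same pair of strings under a case-insensitive
    -- and a case-sensitive collation
    SELECT _utf8mb4'The' = _utf8mb4'the' COLLATE utf8mb4_0900_ai_ci AS ai_ci_equal,
           _utf8mb4'The' = _utf8mb4'the' COLLATE utf8mb4_0900_as_cs AS as_cs_equal;
    ```

    Under these collations the first comparison evaluates to 1 and the second to 0, which mirrors why a stopword such as 'the' would or would not match 'The'.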

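    • Stopwords for InnoDB Search Indexes
    • Stopwords for MyISAM Search Indexes

    Stopwords for InnoDB Search Indexes

    InnoDB has a relatively short list of default stopwords, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for "to be or not to be" and expect to get a sensible result, rather than having all of those words ignored.

    To see the list of default InnoDB stopwords, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.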

    mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
    +-------+
    | value |
    +-------+
    | a     |
    | about |
    | an    |
    | are   |
    | as    |
    | at    |
    | be    |
    | by    |
    | com   |
    | de    |
    | en    |
    | for   |
    | from  |
    | how   |
    | i     |
    | in    |
    | is    |
    | it    |
    | la    |
    | of    |
    | on    |
    | or    |
    | that  |
    | the   |
    | this  |
    | to    |
    | was   |
    | what  |
    | when  |
    | where |
    | who   |
    | will  |
    | with  |
    | und   |
    | the   |
    | www   |
    +-------+
    36 rows in set (0.00 sec)
    

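    To define your own stopword list for all InnoDB tables, create a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the value of the innodb_ft_server_stopword_table option to a value in the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB.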

    -- Create a new stopword table
    
    mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
    Query OK, 0 rows affected (0.01 sec)
    
    -- Insert stopwords (for simplicity, a single stopword is used in this example)
    
    mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
    Query OK, 1 row affected (0.00 sec)
    
    -- Create the table
    
    mysql> CREATE TABLE opening_lines (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    opening_line TEXT(500),
    author VARCHAR(200),
    title VARCHAR(200)
    ) ENGINE=InnoDB;
    Query OK, 0 rows affected (0.01 sec)
    
    -- Insert data into the table
    
    mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
    ('Call me Ishmael.','Herman Melville','Moby-Dick'),
    ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
    ('I am an invisible man.','Ralph Ellison','Invisible Man'),
    ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
    ('It was love at first sight.','Joseph Heller','Catch-22'),
    ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
    ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
    ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
    Query OK, 8 rows affected (0.00 sec)
    Records: 8  Duplicates: 0  Warnings: 0
    
    -- Set the innodb_ft_server_stopword_table option to the new stopword table
    
    mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
    Query OK, 0 rows affected (0.00 sec)
    
    -- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)
    
    mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
    Query OK, 0 rows affected, 1 warning (1.17 sec)
    Records: 0  Duplicates: 0  Warnings: 1
    

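    Verify that the specified stopword ('Ishmael') does not appear by querying the words in INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE.

    Note

    By default, words less than 3 characters in length or greater than 84 characters in length do not appear in an InnoDB full-text search index. The maximum and minimum word length values are configurable using the innodb_ft_max_token_size and innodb_ft_min_token_size variables. This default behavior does not apply to the ngram parser plugin. The ngram token size is defined by the ngram_token_size option.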

    mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
    +-----------+
    | word      |
    +-----------+
    | across    |
    | all       |
    | burn      |
    | buy       |
    | call      |
    | comes     |
    | dalloway  |
    | first     |
    | flowers   |
    | happened  |
    | herself   |
    | invisible |
    | less      |
    | love      |
    | man       |
    +-----------+
    15 rows in set (0.00 sec)
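
    Because the query above is limited to 15 rows, a more direct check is to filter for the stopword itself. This is a sketch that assumes innodb_ft_aux_table is still set to 'test/opening_lines' as shown above; the second statement simply displays the word-length limits that also keep very short and very long words out of the index.

    ```sql
    -- Expected to return an empty set if 'Ishmael' was treated as a stopword
    SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE WHERE word = 'ishmael';

    -- Current minimum and maximum token sizes for InnoDB full-text indexes
    SELECT @@innodb_ft_min_token_size, @@innodb_ft_max_token_size;
    ```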
    

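    To create stopword lists on a table-by-table basis, create additional stopword tables and use the innodb_ft_user_stopword_table option to specify the stopword table that you want to use before you create the full-text index.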
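
    A minimal sketch of the per-table approach, reusing the opening_lines table from the earlier example. The stopword table name my_table_stopwords, the stopword 'rainbow', and the index name idx_title are hypothetical, and the test schema is assumed as before.

    ```sql
    -- Create and populate a stopword table intended for one table only
    CREATE TABLE my_table_stopwords(value VARCHAR(30)) ENGINE = INNODB;
    INSERT INTO my_table_stopwords(value) VALUES ('rainbow');

    -- Point the current session at it, in db_name/table_name form
    SET SESSION innodb_ft_user_stopword_table = 'test/my_table_stopwords';

    -- A full-text index created afterwards in this session is expected
    -- to be built with that stopword list
    CREATE FULLTEXT INDEX idx_title ON opening_lines(title);
    ```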

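    Stopwords for MyISAM Search Indexes

    The stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, utf16le, or utf32.

    To override the default stopword list for MyISAM tables, set the ft_stopword_file system variable. (See "Server System Variables".) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.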
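
    The steps above can be sketched as follows. The file path and tbl_name are placeholders; because ft_stopword_file cannot be changed at runtime, the option-file lines are shown as comments rather than as statements.

    ```sql
    -- Set at server startup, for example in the [mysqld] section of an option file:
    --   [mysqld]
    --   ft_stopword_file=/path/to/my_stopwords.txt

    -- After restarting the server, confirm the value in effect
    SELECT @@ft_stopword_file;

    -- Rebuild the FULLTEXT indexes of each affected MyISAM table
    REPAIR TABLE tbl_name QUICK;
    ```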

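    The stopword list is free-form: stopwords are separated by any nonalphanumeric character, such as a newline, space, or comma. The exceptions are the underscore character (_) and a single apostrophe ('), which are treated as part of a word. The character set of the stopword list is the server's default character set. See "Server Character Set and Collation".

    The following list shows the default stopwords for MyISAM search indexes. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file.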

    a's           able          about         above         according
    accordingly   across        actually      after         afterwards
    again         against       ain't         all           allow
    allows        almost        alone         along         already
    also          although      always        am            among
    amongst       an            and           another       any
    anybody       anyhow        anyone        anything      anyway
    anyways       anywhere      apart         appear        appreciate
    appropriate   are           aren't        around        as
    aside         ask           asking        associated    at
    available     away          awfully       be            became
    because       become        becomes       becoming      been
    before        beforehand    behind        being         believe
    below         beside        besides       best          better
    between       beyond        both          brief         but
    by            c'mon         c's           came          can
    can't         cannot        cant          cause         causes
    certain       certainly     changes       clearly       co
    com           come          comes         concerning    consequently
    consider      considering   contain       containing    contains
    corresponding could         couldn't      course        currently
    definitely    described     despite       did           didn't
    different     do            does          doesn't       doing
    don't         done          down          downwards     during
    each          edu           eg            eight         either
    else          elsewhere     enough        entirely      especially
    et            etc           even          ever          every
    everybody     everyone      everything    everywhere    ex
    exactly       example       except        far           few
    fifth         first         five          followed      following
    follows       for           former        formerly      forth
    four          from          further       furthermore   get
    gets          getting       given         gives         go
    goes          going         gone          got           gotten
    greetings     had           hadn't        happens       hardly
    has           hasn't        have          haven't       having
    he            he's          hello         help          hence
    her           here          here's        hereafter     hereby
    herein        hereupon      hers          herself       hi
    him           himself       his           hither        hopefully
    how           howbeit       however       i'd           i'll
    i'm           i've          ie            if            ignored
    immediate     in            inasmuch      inc           indeed
    indicate      indicated     indicates     inner         insofar
    instead       into          inward        is            isn't
    it            it'd          it'll         it's          its
    itself        just          keep          keeps         kept
    know          known         knows         last          lately
    later         latter        latterly      least         less
    lest          let           let's         like          liked
    likely        little        look          looking       looks
    ltd           mainly        many          may           maybe
    me            mean          meanwhile     merely        might
    more          moreover      most          mostly        much
    must          my            myself        name          namely
    nd            near          nearly        necessary     need
    needs         neither       never         nevertheless  new
    next          nine          no            nobody        non
    none          noone         nor           normally      not
    nothing       novel         now           nowhere       obviously
    of            off           often         oh            ok
    okay          old           on            once          one
    ones          only          onto          or            other
    others        otherwise     ought         our           ours
    ourselves     out           outside       over          overall
    own           particular    particularly  per           perhaps
    placed        please        plus          possible      presumably
    probably      provides      que           quite         qv
    rather        rd            re            really        reasonably
    regarding     regardless    regards       relatively    respectively
    right         said          same          saw           say
    saying        says          second        secondly      see
    seeing        seem          seemed        seeming       seems
    seen          self          selves        sensible      sent
    serious       seriously     seven         several       shall
    she           should        shouldn't     since         six
    so            some          somebody      somehow       someone
    something     sometime      sometimes     somewhat      somewhere
    soon          sorry         specified     specify       specifying
    still         sub           such          sup           sure
    t's           take          taken         tell          tends
    th            than          thank         thanks        thanx
    that          that's        thats         the           their
    theirs        them          themselves    then          thence
    there         there's       thereafter    thereby       therefore
    therein       theres        thereupon     these         they
    they'd        they'll       they're       they've       think
    third         this          thorough      thoroughly    those
    though        three         through       throughout    thru
    thus          to            together      too           took
    toward        towards       tried         tries         truly
    try           trying        twice         two           un
    under         unfortunately unless        unlikely      until
    unto          up            upon          us            use
    used          useful        uses          using         usually
    value         various       very          via           viz
    vs            want          wants         was           wasn't
    way           we            we'd          we'll         we're
    we've         welcome       well          went          were
    weren't       what          what's        whatever      when
    whence        whenever      where         where's       whereafter
    whereas       whereby       wherein       whereupon     wherever
    whether       which         while         whither       who
    who's         whoever       whole         whom          whose
    why           will          willing       wish          with
    within        without       won't         wonder        would
    wouldn't      yes           yet           you           you'd
    you'll        you're        you've        your          yours
    yourself      yourselves    zero