Stop words #54

Open
tenlee2012 opened this issue Oct 15, 2022 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

Owner

tenlee2012 commented Oct 15, 2022

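I have seen a lot of people asking about stop word support.
Sorry, but this plugin does not support stop word configuration or remote stop word dictionaries.
The reason is that Elasticsearch already has stop word functionality built in, and Chinese stop words are not updated often, so I did not want to reinvent the wheel.
If you need stop words, please use the stop token filter that Elasticsearch provides natively.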
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "whitespace",
          "filter": [ "my_custom_stop_words_filter" ]
        }
      },
      "filter": {
        "my_custom_stop_words_filter": {
          "type": "stop",
          "stopwords_path": "path to the stop word file, one word per line",
          "ignore_case": true
        }
      }
    }
  }
}
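
For anyone who would rather script this setup, here is a minimal Python sketch (standard library only) that writes a stop word file in the expected one-word-per-line, UTF-8 format and builds the same request body as above. The helper names `write_stopwords_file` and `build_index_settings` and the file name `zh_stopwords.txt` are made up for illustration and are not part of this plugin or of Elasticsearch. It assumes you copy the file to every node yourself, at a path that is absolute or relative to the Elasticsearch config directory.

```python
import json
from pathlib import Path


def write_stopwords_file(path, words):
    """Write one stop word per line (UTF-8); blanks and duplicates are dropped."""
    kept = []
    for word in words:
        word = word.strip()
        if word and word not in kept:
            kept.append(word)
    Path(path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept


def build_index_settings(stopwords_path, filter_name="my_custom_stop_words_filter"):
    """Build the index-creation body shown above for a given stop word file path."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "tokenizer": "whitespace",
                        "filter": [filter_name],
                    }
                },
                "filter": {
                    filter_name: {
                        "type": "stop",
                        "stopwords_path": stopwords_path,
                        "ignore_case": True,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical file name; adjust to wherever the file lives on your nodes.
    write_stopwords_file("zh_stopwords.txt", ["的", "了", "是"])
    body = build_index_settings("zh_stopwords.txt")
    # Send this JSON as the body of PUT /my-index-000001.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted into the console as the body of the `PUT` request or sent with any HTTP client.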

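PS: Hot-reloading a dictionary only means that documents indexed after the update are analyzed with the new words; existing documents still depend on rebuilding the index.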
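
As a rough illustration of that rebuild step, the sketch below builds the body for Elasticsearch's `_reindex` API, which copies documents from one index into another so they are re-analyzed with the destination index's settings. The helper name `build_reindex_body` and the destination index name `my-index-000002` are assumptions for the example. It also assumes the destination index has already been created with the updated analysis settings.

```python
import json


def build_reindex_body(source_index, dest_index):
    """Build the body for POST /_reindex; source and destination must differ."""
    if source_index == dest_index:
        raise ValueError("source and destination index must be different")
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # my-index-000002 is a hypothetical index created with the new settings.
    body = build_reindex_body("my-index-000001", "my-index-000002")
    print(json.dumps(body, indent=2))
```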

tenlee2012 added the documentation label Nov 11, 2022