Add stop words and use them to filter text matches
Stop words are very common words that carry basically no information. Stop word lists are usually language specific, and it's easy to see why: "hat" is a normal word in English but carries no information in German, while "these" might be a stop word in English but is a useful word in German. Unfortunately, we don't have the luxury of supporting only one language. In fact, we don't even know the language of a given document. So we are more or less forced to use a combined list. I created it semi-manually by combining the DE and EN lists (the only languages we currently support), making sure that words that carry meaning in either language are not marked as stop words. More languages can be added in the future, but each one decreases the usefulness of the list, since a word can only be a stop word if it is meaningless in every supported language. Once the need arises, we can also easily add a way to configure your own stop words.

We could simply send these stop words to Meili and instruct it to ignore them. Unfortunately, that has some disadvantages, as Meili doesn't deal with stop words nicely IMO: especially in phrase search, the highlighting is broken and might confuse users. Phrase search still kind of works, but from reading the docs, I think that with the stop words "the" and "a", searching for "foo the bar" will also find documents containing "foo a bar". See https://github.com/orgs/meilisearch/discussions/793

So instead, we only use the stop words to filter out matches in texts. That doesn't improve indexing speed, search speed, or index size in Meili, but it can vastly reduce the size of the GQL response to the frontend and makes the frontend less likely to choke on these useless matches. We might use our stop words for more in the future (ignoring matches in metadata, or even sending them to Meili once Meili fixes its problems).
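A minimal sketch of the two ideas above: building one combined list from per-language lists while excluding words that are meaningful in any language, and dropping text matches whose matched word is a stop word. It assumes a Rust backend and that a text match is a byte range into the searched text; `TextMatch`, `combined_stop_words` and `filter_matches` are hypothetical names, and the word lists are tiny illustrative samples, not the actual lists from this commit.

```rust
use std::collections::HashSet;

/// Hypothetical match type: a byte range (start, length) into the searched text.
#[derive(Debug, Clone, PartialEq)]
struct TextMatch {
    start: usize,
    len: usize,
}

/// Union of all per-language stop word lists, minus every word that carries
/// meaning in at least one supported language.
fn combined_stop_words(per_language: &[&[&str]], meaningful: &[&str]) -> HashSet<String> {
    per_language
        .iter()
        .flat_map(|list| list.iter())
        .filter(|w| !meaningful.contains(*w))
        .map(|w| w.to_lowercase())
        .collect()
}

/// Keeps only matches whose matched word is not a stop word (case-insensitive).
/// Matches whose range does not map onto `text` are kept untouched, since this
/// filter is only meant to remove stop words.
fn filter_matches(text: &str, matches: &[TextMatch], stop_words: &HashSet<String>) -> Vec<TextMatch> {
    matches
        .iter()
        .filter(|m| {
            text.get(m.start..m.start + m.len)
                .map_or(true, |word| !stop_words.contains(&word.to_lowercase()))
        })
        .cloned()
        .collect()
}

fn main() {
    // Illustrative samples only, NOT the real DE/EN lists.
    let en: &[&str] = &["the", "a", "these", "and"];
    let de: &[&str] = &["der", "und", "hat"];
    // "hat" means something in English, "these" means something in German.
    let meaningful: &[&str] = &["hat", "these"];
    let stop_words = combined_stop_words(&[en, de], meaningful);

    let text = "The hat and these theses";
    let matches = vec![
        TextMatch { start: 0, len: 3 },  // "The"
        TextMatch { start: 4, len: 3 },  // "hat"
        TextMatch { start: 8, len: 3 },  // "and"
        TextMatch { start: 12, len: 5 }, // "these"
    ];
    let kept = filter_matches(text, &matches, &stop_words);
    // Prints: [TextMatch { start: 4, len: 3 }, TextMatch { start: 12, len: 5 }]
    println!("{:?}", kept);
}
```

Filtering after the search, rather than handing the list to Meili, leaves Meili's phrase search and highlighting untouched and only shrinks what is sent to the frontend.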