forked from elastic/elasticsearch
[DOCS] Add full-text search overview
1 parent 7f37edf commit 0ea93d2
Showing 4 changed files with 139 additions and 0 deletions.
62 changes: 62 additions & 0 deletions
docs/reference/images/search/full-text-search-overview.svg
68 changes: 68 additions & 0 deletions
docs/reference/search/search-your-data/full-text-search.asciidoc
@@ -0,0 +1,68 @@
[[full-text-search]]
== Full-text search

.Hands-on introduction to full-text search
[TIP]
====
Would you prefer to jump straight into a hands-on tutorial?
Refer to our quick start <<full-text-filter-tutorial,full-text search tutorial>>.
====

Full-text search, also known as lexical search, is a technique for fast, efficient searching through text fields in documents.
Documents and search queries are transformed so that searches return https://www.elastic.co/what-is/search-relevance[relevant] results rather than only exact term matches.
Fields of type <<text-field-type,`text`>> are analyzed and indexed for full-text search.

Built on decades of information retrieval research, full-text search in {es} is a compute-efficient, deterministic approach that scales predictably with data volume.
Full-text search is the cornerstone of production-grade search solutions.
Combine full-text search with <<semantic-search,semantic search using vectors>> to build modern hybrid search applications.

[discrete]
[[full-text-search-how-it-works]]
=== How full-text search works

The following diagram illustrates the components of full-text search. Note that the query text also undergoes text analysis, so that it's transformed in the same way as the indexed text.

image::images/search/full-text-search-overview.svg[Components of full-text search from analysis to relevance scoring, align=center, width=500]

At a high level, full-text search involves the following:

* <<analysis-overview,*Text analysis*>>: Analysis consists of a pipeline of sequential transformations. Text is transformed into a format optimized for searching through steps such as stemming, lowercasing, and stop word removal. {es} contains a number of built-in <<analysis-analyzers,analyzers>> (including language-specific analyzers) and tokenizers, and you can also create custom analyzers.
+
[TIP]
====
Refer to <<test-analyzer,Test an analyzer>> to learn how to test an analyzer and inspect the tokens and metadata it generates.
====
* *Inverted index*: After analysis is complete, {es} builds an inverted index from the resulting tokens.
An inverted index is a data structure that maps each token to the documents that contain it.
It's made up of two key components:
** *Dictionary*: A sorted list of all unique terms in the collection of documents in your index.
** *Posting list*: For each term, a list of document IDs where the term appears, along with optional metadata like term frequency and position.
* *Relevance scoring*: Results are ranked by how relevant they are to the given query. The relevance score of each document is represented by a positive floating-point number called the `_score`. The higher the `_score`, the more relevant the document.
+
The default <<index-modules-similarity,similarity algorithm>> {es} uses for calculating relevance scores is https://en.wikipedia.org/wiki/Okapi_BM25[Okapi BM25], a variation of the https://en.wikipedia.org/wiki/Tf–idf[TF-IDF algorithm]. BM25 calculates relevance scores based on term frequency, document frequency, and document length.
Refer to this https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[technical blog post] for a deep dive into BM25.
* *Full-text search query*: Query text is analyzed <<analysis-index-search-time,the same way as the indexed text>>, and the resulting tokens are used to search the inverted index.
+
Query DSL supports a number of <<full-text-queries,full-text queries>>.
+
As of 8.17, {esql} also supports <<esql-search-functions,full-text search>> functions.
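
The steps above can be sketched in a few lines of Python. This is a toy illustration only, not how {es} or Lucene implements them: the `analyze`, `build_index`, and `search` functions, the stop word list, the naive plural stemmer, and the sample documents are all hypothetical, and the scoring uses the textbook BM25 formula with the commonly cited defaults `k1=1.2` and `b=0.75`.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "to", "in", "of", "is"}


def analyze(text):
    """Toy analyzer: lowercase, tokenize, drop stop words, strip a plural "s"."""
    tokens = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOP_WORDS:
            continue
        # Very crude stemming; real analyzers use proper stemmers.
        if len(token) > 3 and token.endswith("s"):
            token = token[:-1]
        tokens.append(token)
    return tokens


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = analyze(text)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, doc_len


def search(query, postings, doc_len, k1=1.2, b=0.75):
    """Analyze the query like the documents, then rank matches with BM25."""
    if not doc_len:
        return []
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in analyze(query):
        matches = postings.get(term, {})
        if not matches:
            continue
        df = len(matches)  # document frequency of the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in matches.items():
            # Longer-than-average documents are penalized via b.
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents.
DOCS = {
    1: "Quick brown dogs",
    2: "The lazy cat sleeps in the sun",
    3: "A quick guide to walking a dog and a cat",
}

postings, doc_len = build_index(DOCS)
for doc_id, score in search("quick dog", postings, doc_len):
    print(doc_id, round(score, 3))
```

With these sample documents, document 1 ranks above document 3: both match `quick` and `dog` once, but document 1 is shorter, so BM25's length normalization favors it. Document 2 matches neither term and isn't returned. Because the query goes through the same `analyze` function as the documents, the query term `dog` matches the indexed text `dogs`.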

[discrete]
[[full-text-search-learn-more]]
=== Learn more

.Getting Started
* <<full-text-filter-tutorial,Hands-on full-text search tutorial>>

.Core Concepts
* <<text,Text fields>>
* <<analysis,Text analysis>>
* <<analysis-tokenizers,Tokenizers>>
* <<analysis-analyzers,Analyzers>>

.Search APIs
* <<full-text-queries,Full-text queries using Query DSL>>
* <<esql-search-functions,Full-text search functions in {esql}>>

.Advanced Topics
* https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables[Practical BM25: Part 2 - The BM25 Algorithm and its Variables]