From 60ccb376d888aed1812b2255792a7149d0d8207c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Wed, 18 Oct 2023 11:30:38 +0200
Subject: [PATCH] [DOCS] Adds links to token section in ESLER conceptual. (#101033)

---
 .../search-your-data/semantic-search-elser.asciidoc | 13 +++++++++----
 .../semantic-search/generate-embeddings.asciidoc    |  4 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/docs/reference/search/search-your-data/semantic-search-elser.asciidoc b/docs/reference/search/search-your-data/semantic-search-elser.asciidoc
index eab2bd19b0e73..0178ede83054f 100644
--- a/docs/reference/search/search-your-data/semantic-search-elser.asciidoc
+++ b/docs/reference/search/search-your-data/semantic-search-elser.asciidoc
@@ -208,7 +208,10 @@ GET my-index/_search
 The result is the top 10 documents that are closest in meaning to your query
 text from the `my-index` index sorted by their relevancy. The result also
 contains the extracted tokens for each of the relevant search results with their
-weights.
+weights. Tokens are learned associations capturing relevance, they are not
+synonyms. To learn more about what tokens are, refer to
+{ml-docs}/ml-nlp-elser.html#elser-tokens[this page]. It is possible to exclude
+tokens from source, refer to <> to learn more.
 
 [source,consol-result]
 ----
@@ -325,12 +328,14 @@ by using the <> mapping to remove the ELSER terms from the
 document source.
 
 WARNING: Reindex uses the document source to populate the destination index.
-Once the ELSER terms have been excluded from the source, they cannot be
-recovered through reindexing. Excluding the tokens from the source is a
+**Once the ELSER terms have been excluded from the source, they cannot be**
+**recovered through reindexing.** Excluding the tokens from the source is a
 space-saving optimsation that should only be applied if you are certain that
 reindexing will not be required in the future! It's important to carefully
 consider this trade-off and make sure that excluding the ELSER terms from the
-source aligns with your specific requirements and use case.
+source aligns with your specific requirements and use case. Review the
+<> and <> sections carefully to learn
+more about the possible consequences of excluding the tokens from the `_source`.
 
 The mapping that excludes `content_embedding` from the `_source` field can be
 created by the following API call:
diff --git a/docs/reference/tab-widgets/semantic-search/generate-embeddings.asciidoc b/docs/reference/tab-widgets/semantic-search/generate-embeddings.asciidoc
index caf6523783b02..2294d3c5598c5 100644
--- a/docs/reference/tab-widgets/semantic-search/generate-embeddings.asciidoc
+++ b/docs/reference/tab-widgets/semantic-search/generate-embeddings.asciidoc
@@ -39,7 +39,9 @@ and the `output_field` that will contain the {infer} results.
 To ingest data through the pipeline to generate tokens with ELSER, refer to the
 <> section of the tutorial. After you successfully ingested
 documents by using the pipeline, your index will contain the tokens
-generated by ELSER.
+generated by ELSER. Tokens are learned associations capturing relevance, they
+are not synonyms. To learn more about what tokens are, refer to
+{ml-docs}/ml-nlp-elser.html#elser-tokens[this page].
 
 // end::elser[]