Commit

[DOCS] Fixes conflicts.

szabosteve committed Oct 11, 2023
1 parent 948f1fa commit 46e92bb
Showing 6 changed files with 129 additions and 121 deletions.
36 changes: 23 additions & 13 deletions docs/reference/ingest/processors/inference.asciidoc
.{infer-cap} Options
[options="header"]
|======
| Name | Required | Default | Description
| `model_id` | yes | - | (String) The ID or alias for the trained model, or the ID of the deployment.
| `input_output` | no | - | (List) Input fields for {infer} and output (destination) fields for the {infer} results. This option is incompatible with the `target_field` and `field_map` options.
| `target_field` | no | `ml.inference.<processor_tag>` | (String) Field added to incoming documents to contain results objects.
| `field_map` | no | If defined, the model's default field map | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration.
| `inference_config` | no | The default settings defined in the model | (Object) Contains the {infer} type and its options.
| `ignore_missing` | no | `false` | (Boolean) If `true` and any of the input fields defined in `input_output` are missing, those missing fields are quietly ignored; otherwise a missing field causes a failure. Only applies when using `input_output` configurations to explicitly list the input fields.
include::common-options.asciidoc[]
|======

IMPORTANT: You cannot use the `input_output` field with the `target_field` and
`field_map` fields. For NLP models, use the `input_output` option. For
{dfanalytics} models, use the `target_field` and `field_map` options.


[discrete]
[[inference-input-output-example]]
==== Configuring input and output fields
Select the `content` field for inference and write the result to
`content_embedding`.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
      {
        "input_field": "content",
        "output_field": "content_embedding"
      }
    ]
  }
}
--------------------------------------------------
// NOTCONSOLE
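
To try such a processor end to end, you can wrap it in a pipeline definition
and run it through the simulate API. A minimal sketch - the model ID and the
sample document are illustrative only:

[source,console]
----
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "model_deployment_for_inference", <1>
          "input_output": [
            {
              "input_field": "content",
              "output_field": "content_embedding"
            }
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "content": "Sample text to run through the model." <2>
      }
    }
  ]
}
----
// TEST[skip:TBD]
<1> Illustrative model ID; use the ID or alias of a model deployed in your
cluster.
<2> Illustrative sample document containing the configured `input_field`.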

==== Configuring multiple inputs

The `content` and `title` fields will be read from the incoming document and
sent to the model for inference. The inference output is written to
`content_embedding` and `title_embedding` respectively.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "input_output": [
      {
        "input_field": "content",
        "output_field": "content_embedding"
      },
      {
        "input_field": "title",
        "output_field": "title_embedding"
      }
    ]
  }
}
--------------------------------------------------
// NOTCONSOLE

Selecting the input fields with `input_output` is incompatible with
the `target_field` and `field_map` options.

{dfanalytics-cap} models must use the `target_field` to specify the root
location that results are written to, and optionally a `field_map` to map field
names in the input document to the model input fields.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "target_field": "FlightDelayMin_prediction_infer",
    "field_map": {
      "your_field": "my_field"
    },
    "inference_config": { "regression": {} }
  }
}
--------------------------------------------------
// NOTCONSOLE


[discrete]
[[inference-processor-classification-opt]]
==== {classification-cap} configuration options
164 changes: 84 additions & 80 deletions docs/reference/search/search-your-data/semantic-search-elser.asciidoc
[[elser-mappings]]
==== Create the index mapping

First, create the mapping of the destination index - the index that contains
the tokens that the model creates based on your text. The destination index
must have a field with the <<sparse-vector, `sparse_vector`>> or
<<rank-features,`rank_features`>> field type to index the ELSER output.

NOTE: ELSER output must be ingested into a field with the `sparse_vector` or
`rank_features` field type. Otherwise, {es} interprets the token-weight pairs as
a massive amount of fields in a document.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "content_embedding": { <1>
        "type": "sparse_vector" <2>
      },
      "content": { <3>
        "type": "text" <4>
      }
    }
  }
}
----
// TEST[skip:TBD]
<1> The name of the field to contain the generated tokens. It must be
referenced in the {infer} pipeline configuration in the next step.
<2> The field to contain the tokens is a `sparse_vector` field.
<3> The name of the field from which to create the sparse vector representation.
In this example, the name of the field is `content`. It must be referenced in
the {infer} pipeline configuration in the next step.
<4> The field type, which is `text` in this example.

To learn how to optimize space, refer to the <<save-space>> section.
Create an ingest pipeline with an {infer} processor to use ELSER to infer
against the data that is being ingested in the pipeline.

[source,console]
----
PUT _ingest/pipeline/elser-v2-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [ <1>
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
----
// TEST[skip:TBD]
<1> Configuration object that defines the `input_field` for the {infer} process
and the `output_field` that will contain the {infer} results.

////
[source,console]
----
DELETE _ingest/pipeline/elser-v2-test
----
// TEST[continued]
////
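
After the pipeline exists, you can index a single document through it to verify
the setup - a minimal sketch, with an illustrative document ID and content:

[source,console]
----
PUT my-index/_doc/1?pipeline=elser-v2-test
{
  "content": "Sample text to create embeddings for." <1>
}
----
// TEST[skip:TBD]
<1> Illustrative document; the `content` field matches the pipeline's
`input_field`, so the resulting document also contains a `content_embedding`
field with the generated tokens.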


[discrete]
==== Load data

The data set that you use in this example is the
https://microsoft.github.io/msmarco/[MS MARCO Passage Ranking data set]. It
consists of 200 queries, each accompanied by
a list of relevant text passages. All unique passages, along with their IDs,
have been extracted from that data set and compiled into a
https://github.com/elastic/stack-docs/blob/main/docs/en/stack/ml/nlp/data/msmarco-passagetest2019-unique.tsv[tsv file].

Download the file and upload it to your cluster using the
{kibana-ref}/connect-to-elasticsearch.html#upload-data-kibana[Data Visualizer]
in the {ml-app} UI. Assign the name `id` to the first column and `content` to
the second column. The index name is `test-data`. Once the upload is complete,
you can see an index named `test-data` with 182469 documents.


[discrete]
==== Semantic search by using the `text_expansion` query

To perform semantic search, use the `text_expansion` query, and provide the
query text and the ELSER model ID. The example below uses the query text "How
to avoid muscle soreness after running?"; the `content_embedding` field
contains the generated ELSER output:

[source,console]
----
GET my-index/_search
{
   "query":{
      "text_expansion":{
         "content_embedding":{
            "model_id":".elser_model_2",
            "model_text":"How to avoid muscle soreness after running?"
         }
      }
   }
}
----
// TEST[skip:TBD]

The result is the top-scored documents that are closest in meaning to your
query text from the `my-index` index, sorted by their relevancy. The result
also contains the extracted tokens for each relevant search result with their
weights.

[source,console-result]
----
"hits":[
{
"_index":"my-index",
"_id":"978UAYgBKCQMet06sLEy",
"_score":18.612831,
"_ignored":[
"text.keyword"
],
"_source":{
"id":7361587,
"text":"For example, if you go for a run, you will mostly use the muscles in your lower body. Give yourself 2 days to rest those muscles so they have a chance to heal before you exercise them again. Not giving your muscles enough time to rest can cause muscle damage, rather than muscle development.",
"ml":{
"tokens":{
"muscular":0.075696334,
"mostly":0.52380747,
"practice":0.23430172,
"rehab":0.3673556,
"cycling":0.13947526,
"your":0.35725075,
"years":0.69484913,
"soon":0.005317828,
"leg":0.41748235,
"fatigue":0.3157955,
"rehabilitation":0.13636169,
"muscles":1.302141,
"exercises":0.36694175,
(...)
},
"model_id":".elser_model_2"
}
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": 26.199875,
"hits": [
{
"_index": "my-index",
"_id": "FPr9HYsBag9jXmT8lEpI",
"_score": 26.199875,
"_source": {
"content_embedding": {
"muscular": 0.2821541,
"bleeding": 0.37929374,
"foods": 1.1718726,
"delayed": 1.2112266,
"cure": 0.6848574,
"during": 0.5886185,
"fighting": 0.35022718,
"rid": 0.2752442,
"soon": 0.2967024,
"leg": 0.37649947,
"preparation": 0.32974035,
"advance": 0.09652356,
(...)
},
"id": 1713868,
"model_id": ".elser_model_2",
"content": "For example, if you go for a run, you will mostly use the muscles in your lower body. Give yourself 2 days to rest those muscles so they have a chance to heal before you exercise them again. Not giving your muscles enough time to rest can cause muscle damage, rather than muscle development."
}
},
(...)
]
},
(...)
]
}
----
// NOTCONSOLE

[discrete]
==== Combining semantic search with other queries

You can combine the `text_expansion` query with other queries in a compound
query. For example, use the `text_expansion` query in a `should` clause of a
`bool` query together with a full-text query, and use `boost` values to weight
the clauses:

[source,console]
----
GET my-index/_search
{
  "query": {
    "bool": { <1>
      "should": [
        {
          "text_expansion": {
            "content_embedding": {
              "model_text": "How to avoid muscle soreness after running?",
              "model_id": ".elser_model_2",
              "boost": 1 <2>
            }
          }
        },
        {
          "query_string": {
            "query": "toxins",
            "boost": 4 <3>
          }
        }
      ]
    }
  }
}
----
// TEST[skip:TBD]
<1> Both queries are in a `should` clause of a `bool` query.
<2> The `boost` value is `1` for the `text_expansion` query, which is the
default.
<3> The `boost` value is `4` for the `query_string` query, so it has a higher
weight in the final score.

[discrete]
[[save-space]]
==== Saving disk space by excluding the ELSER tokens from document source

The tokens generated by ELSER must be indexed to be usable in the
`text_expansion` query, but they do not need to be retained in the document
source. You can save disk space by excluding the tokens from `_source`.
However, this is a space-saving optimization that should only be applied if you
are certain that reindexing will not be required in the future! It's important
to carefully consider this trade-off and make sure that excluding the ELSER
terms from the source aligns with your specific requirements and use case.

The mapping that excludes `content_embedding` from the `_source` field can be
created by the following API call:

[source,console]
----
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "content_embedding"
      ]
    },
    "properties": {
      "content_embedding": {
        "type": "sparse_vector"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
----
// TEST[skip:TBD]
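
Because the tokens are indexed but excluded from `_source`, retrieving a
document from this index returns its text fields only, while `text_expansion`
queries against `content_embedding` keep working - a quick check, with an
illustrative document ID:

[source,console]
----
GET my-index/_doc/1
----
// TEST[skip:TBD]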
[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_tokens": { <1>
        "type": "sparse_vector" <2>
      },
      "my_text_field": { <3>
        "type": "text" <4>
      }
    }
  }
}
----
This is how an ingest pipeline that uses the ELSER model is created:

[source,console]
----
PUT _ingest/pipeline/my-text-embeddings-pipeline
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [ <1>
          {
            "input_field": "my_text_field",
            "output_field": "my_tokens"
          }
        ]
      }
    }
  ]
}
----
<1> Configuration object that defines the `input_field` for the {infer} process
and the `output_field` that will contain the {infer} results.

To ingest data through the pipeline to generate tokens with ELSER, refer to the
<<reindexing-data-elser>> section of the tutorial. After you successfully
ingested documents by using the pipeline, your index will contain the tokens
that ELSER generated.
[source,console]
----
GET my-index/_search
{
   "query": {
      "text_expansion": {
         "my_tokens": {
            "model_id": ".elser_model_2",
            "model_text": "the query string"
         }
      }
   }
}
----