Add reindex examples
lcawl committed Jan 16, 2025
1 parent f0a380b commit 64d7de0
Showing 36 changed files with 778 additions and 153 deletions.
68 changes: 40 additions & 28 deletions output/openapi/elasticsearch-openapi.json

Large diffs are not rendered by default.

62 changes: 37 additions & 25 deletions output/openapi/elasticsearch-serverless-openapi.json

Large diffs are not rendered by default.

152 changes: 98 additions & 54 deletions output/schema/schema.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion specification/_doc_ids/table.csv
Original file line number Diff line number Diff line change
@@ -148,7 +148,6 @@ docs-multi-termvectors,https://www.elastic.co/guide/en/elasticsearch/reference/{
docs-reindex,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-reindex.html
docs-termvectors,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-termvectors.html
docs-update-by-query,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-update-by-query.html
docs-update,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/docs-update.html
document-input-parameters,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/query-dsl-mlt-query.html#_document_input_parameters
dot-expand-processor,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/dot-expand-processor.html
@@ -680,6 +679,7 @@ set-processor,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/s
shape,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/shape.html
simulate-ingest-api,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/simulate-ingest-api.html
simulate-pipeline-api,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/simulate-pipeline-api.html
slice-scroll,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/paginate-search-results.html#slice-scroll
slm-api-delete-policy,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/slm-api-delete-policy.html
slm-api-execute-lifecycle,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/slm-api-execute-lifecycle.html
slm-api-execute-retention,https://www.elastic.co/guide/en/elasticsearch/reference/{branch}/slm-api-execute-retention.html
213 changes: 206 additions & 7 deletions specification/_global/reindex/ReindexRequest.ts

Large diffs are not rendered by default.

47 changes: 47 additions & 0 deletions specification/_global/reindex/ReindexResponse.ts
@@ -25,21 +25,68 @@ import { DurationValue, EpochTime, UnitMillis } from '@_types/Time'

export class Response {
body: {
/**
* The number of scroll responses that were pulled back by the reindex.
*/
batches?: long
/**
* The number of documents that were successfully created.
*/
created?: long
/**
* The number of documents that were successfully deleted.
*/
deleted?: long
/**
* If there were any unrecoverable errors during the process, this array contains those failures.
* If this array is not empty, the request ended because of those failures.
* Reindex is implemented using batches; any failure causes the entire process to end, but all failures in the current batch are collected into the array.
* You can use the `conflicts` option to prevent the reindex from ending on version conflicts.
*/
failures?: BulkIndexByScrollFailure[]
/**
* The number of documents that were ignored because the script used for the reindex returned a `noop` value for `ctx.op`.
*/
noops?: long
/**
* The number of retries attempted by reindex.
*/
retries?: Retries
/**
* The number of requests per second effectively executed during the reindex.
*/
requests_per_second?: float
slice_id?: integer
task?: TaskId
/**
* The number of milliseconds the request slept to conform to `requests_per_second`.
*/
throttled_millis?: EpochTime<UnitMillis>
/**
* This field should always be equal to zero in a reindex response.
* It has meaning only when using the task API, where it indicates the next time (in milliseconds since epoch) that a throttled request will be run again in order to conform to `requests_per_second`.
*/
throttled_until_millis?: EpochTime<UnitMillis>
/**
* If any of the requests that ran during the reindex timed out, it is `true`.
*/
timed_out?: boolean
/**
* The total milliseconds the entire operation took.
*/
took?: DurationValue<UnitMillis>
/**
* The number of documents that were successfully processed.
*/
total?: long
/**
* The number of documents that were successfully updated.
* That is, the number of documents with the same ID that already existed before the reindex updated them.
*/
updated?: long
/**
* The number of version conflicts that occurred.
*/
version_conflicts?: long
}
}
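The optional counters above can be combined into a quick health check for a completed reindex. A minimal sketch in Python (the `summarize_reindex` helper and the sample payload are hypothetical illustrations, not part of the spec):

```python
# Sketch: summarize a reindex response body using the optional fields
# documented in the Response class above. All field names follow the spec;
# the helper and sample payload are hypothetical.
def summarize_reindex(body: dict) -> str:
    # Documents processed = created + updated + deleted + noops.
    processed = sum(body.get(k, 0) for k in ("created", "updated", "deleted", "noops"))
    status = "timed out" if body.get("timed_out") else "completed"
    return (
        f"{status}: {processed}/{body.get('total', 0)} documents processed "
        f"in {body.get('took', 0)} ms with {body.get('version_conflicts', 0)} version conflicts"
    )

sample = {
    "took": 147,
    "timed_out": False,
    "total": 120,
    "created": 120,
    "updated": 0,
    "deleted": 0,
    "noops": 0,
    "version_conflicts": 0,
}
print(summarize_reindex(sample))
# → completed: 120/120 documents processed in 147 ms with 0 version conflicts
```

When the operation finished without errors, `total` should equal the sum of `created`, `updated`, `deleted`, and `noops`.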
@@ -0,0 +1,16 @@
summary: Reindex multiple sources
# method_request: POST _reindex
description: >
Run `POST _reindex` to reindex from multiple sources.
The `index` attribute in source can be a list, which enables you to copy from many sources in one request.
This example copies documents from the `my-index-000001` and `my-index-000002` indices.
# type: request
value: |-
{
"source": {
"index": ["my-index-000001", "my-index-000002"]
},
"dest": {
"index": "my-new-index-000002"
}
}
@@ -0,0 +1,12 @@
summary: Reindex with Painless
# method_request: POST _reindex
description: >
You can use Painless to reindex daily indices to apply a new template to the existing documents.
The script extracts the date from the index name and creates a new index with `-1` appended.
For example, all data from `metricbeat-2016.05.31` will be reindexed into `metricbeat-2016.05.31-1`.
# type: request
value: |-
  {
    "source": {
      "index": "metricbeat-*"
    },
    "dest": {
      "index": "metricbeat"
    },
    "script": {
      "lang": "painless",
      "source": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
    }
  }
@@ -0,0 +1,11 @@
summary: Reindex a random subset
# method_request: POST _reindex
description: >
Run `POST _reindex` to extract a random subset of the source for testing.
You might need to adjust the `min_score` value depending on the relative amount of data extracted from source.
# type: request
value: |-
  {
    "max_docs": 10,
    "source": {
      "index": "my-index-000001",
      "query": {
        "function_score": {
          "random_score": {},
          "min_score": 0.9
        }
      }
    },
    "dest": {
      "index": "my-new-index-000001"
    }
  }
@@ -0,0 +1,11 @@
summary: Reindex modified documents
# method_request: POST _reindex
description: >
Run `POST _reindex` to modify documents during reindexing.
This example bumps the version of the source document.
# type: request
value: |-
  {
    "source": {
      "index": "my-index-000001"
    },
    "dest": {
      "index": "my-new-index-000001",
      "version_type": "external"
    },
    "script": {
      "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
      "lang": "painless"
    }
  }
@@ -0,0 +1,11 @@
summary: Reindex from remote on Elastic Cloud
# method_request: POST _reindex
description: >
When using Elastic Cloud, you can run `POST _reindex` and authenticate against a remote cluster with credentials, such as a username and password.
# type: request
value: |-
  {
    "source": {
      "remote": {
        "host": "http://otherhost:9200",
        "username": "user",
        "password": "pass"
      },
      "index": "my-index-000001",
      "query": {
        "match": {
          "test": "data"
        }
      }
    },
    "dest": {
      "index": "my-new-index-000001"
    }
  }
@@ -0,0 +1,19 @@
summary: Manual slicing
# method_request: POST _reindex
description: >
Run `POST _reindex` to slice a reindex request manually.
Provide a slice ID and total number of slices to each request.
# type: request
value: |-
{
"source": {
"index": "my-index-000001",
"slice": {
"id": 0,
"max": 2
}
},
"dest": {
"index": "my-new-index-000001"
}
}
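Each manually sliced request above differs from its siblings only in the slice ID, while `max` stays fixed at the total number of slices. A sketch of generating all such request bodies programmatically (the `sliced_reindex_bodies` helper is hypothetical, not part of the spec):

```python
# Sketch: build the request bodies for a manually sliced reindex.
# Each of the `slices` requests shares the same source/dest and gets
# its own slice ID; `max` is the total slice count, as in the example above.
def sliced_reindex_bodies(source_index: str, dest_index: str, slices: int) -> list[dict]:
    return [
        {
            "source": {
                "index": source_index,
                "slice": {"id": slice_id, "max": slices},
            },
            "dest": {"index": dest_index},
        }
        for slice_id in range(slices)
    ]

bodies = sliced_reindex_bodies("my-index-000001", "my-new-index-000001", 2)
```

Each body would then be sent as its own `POST _reindex` request; the first one matches the example above.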
@@ -0,0 +1,9 @@
summary: Automatic slicing
# method_request: POST _reindex?slices=5&refresh
description: >
Run `POST _reindex?slices=5&refresh` to automatically parallelize using sliced scroll to slice on `_id`.
The `slices` parameter specifies the number of slices to use.
# type: request
value: |-
  {
    "source": {
      "index": "my-index-000001"
    },
    "dest": {
      "index": "my-new-index-000001"
    }
  }
@@ -0,0 +1,11 @@
summary: Routing
# method_request: POST _reindex
description: >
By default, if reindex sees a document with routing, the routing is preserved unless it's changed by the script.
You can set `routing` on the `dest` request to change this behavior.
In this example, run `POST _reindex` to copy all documents with the company name `cat` from the `source` index into the `dest` index with routing set to `cat`.
# type: request
value: |-
  {
    "source": {
      "index": "source",
      "query": {
        "match": {
          "company": "cat"
        }
      }
    },
    "dest": {
      "index": "dest",
      "routing": "=cat"
    }
  }
@@ -0,0 +1,7 @@
summary: Ingest pipelines
# method_request: POST _reindex
description: Run `POST _reindex` and use the ingest pipelines feature.
# type: request
value: |-
  {
    "source": {
      "index": "source"
    },
    "dest": {
      "index": "dest",
      "pipeline": "some_ingest_pipeline"
    }
  }
@@ -0,0 +1,10 @@
summary: Reindex with a query
# method_request: POST _reindex
description: >
Run `POST _reindex` and add a query to the `source` to limit the documents to reindex.
For example, this request copies documents into `my-new-index-000001` only if they have a `user.id` of `kimchy`.
# type: request
value: |-
  {
    "source": {
      "index": "my-index-000001",
      "query": {
        "term": {
          "user.id": "kimchy"
        }
      }
    },
    "dest": {
      "index": "my-new-index-000001"
    }
  }
@@ -0,0 +1,9 @@
summary: Reindex with max_docs
# method_request: POST _reindex
description: >
You can limit the number of processed documents by setting `max_docs`.
For example, run `POST _reindex` to copy a single document from `my-index-000001` to `my-new-index-000001`.
# type: request
value: |-
  {
    "max_docs": 1,
    "source": {
      "index": "my-index-000001"
    },
    "dest": {
      "index": "my-new-index-000001"
    }
  }
@@ -0,0 +1,10 @@
summary: Reindex selected fields
# method_request: POST _reindex
description: >
You can use source filtering to reindex a subset of the fields in the original documents.
For example, run `POST _reindex` to reindex only the `user.id` and `_doc` fields of each document.
# type: request
value: |-
  {
    "source": {
      "index": "my-index-000001",
      "_source": ["user.id", "_doc"]
    },
    "dest": {
      "index": "my-new-index-000001"
    }
  }
@@ -0,0 +1,10 @@
summary: Reindex new field names
# method_request: POST _reindex
description: >
A reindex operation can build a copy of an index with renamed fields.
If your index has documents with `text` and `flag` fields, you can change the latter field name to `tag` during the reindex.
# type: request
value: |-
  {
    "source": {
      "index": "my-index-000001"
    },
    "dest": {
      "index": "my-new-index-000001"
    },
    "script": {
      "source": "ctx._source.tag = ctx._source.remove(\"flag\")"
    }
  }
37 changes: 26 additions & 11 deletions specification/_global/reindex/types.ts
@@ -42,8 +42,9 @@ export class Destination {
*/
index: IndexName
/**
* Set to `create` to only index documents that do not already exist.
* Important: To reindex to a data stream destination, this argument must be `create`.
* If it is `create`, the operation will only index documents that do not already exist (also known as "put if absent").
*
* IMPORTANT: To reindex to a data stream destination, this argument must be `create`.
* @server_default index
*/
op_type?: OpType
@@ -52,8 +53,10 @@
*/
pipeline?: string
/**
* By default, a document's routing is preserved unless it’s changed by the script.
* Set to `discard` to set routing to `null`, or `=value` to route using the specified `value`.
* By default, a document's routing is preserved unless it's changed by the script.
* If it is `keep`, the routing on the bulk request sent for each match is set to the routing on the match.
* If it is `discard`, the routing on the bulk request sent for each match is set to `null`.
* If it is `=value`, the routing on the bulk request sent for each match is set to the value specified after the equals sign (`=`).
* @server_default keep
*/
routing?: Routing
@@ -66,11 +69,11 @@ export class Source {
export class Source {
/**
* The name of the data stream, index, or alias you are copying from.
* Accepts a comma-separated list to reindex from multiple sources.
* It accepts a comma-separated list to reindex from multiple sources.
*/
index: Indices
/**
* Specifies the documents to reindex using the Query DSL.
* The documents to reindex, which is defined with Query DSL.
*/
query?: QueryContainer
/**
@@ -79,17 +82,27 @@ export class Source {
remote?: RemoteSource
/**
* The number of documents to index per batch.
* Use when indexing from remote to ensure that the batches fit within the on-heap buffer, which defaults to a maximum size of 100 MB.
* Use it when you are indexing from remote to ensure that the batches fit within the on-heap buffer, which defaults to a maximum size of 100 MB.
* @server_default 1000
*/
size?: integer
/**
* Slice the reindex request manually using the provided slice ID and total number of slices.
*/
slice?: SlicedScroll
/**
* A comma-separated list of `<field>:<direction>` pairs to sort by before indexing.
* Use it in conjunction with `max_docs` to control what documents are reindexed.
*
* WARNING: Sort in reindex is deprecated.
* Sorting in reindex was never guaranteed to index documents in order and prevents further development of reindex such as resilience and performance improvements.
* If used in combination with `max_docs`, consider using a query filter instead.
* @deprecated 7.6.0
*/
sort?: Sort
/**
* If `true` reindexes all source fields.
* Set to a list to reindex select fields.
* If `true`, reindex all source fields.
* Set it to a list to reindex select fields.
* @server_default true
* @codegen_name source_fields */
_source?: Fields
@@ -99,7 +112,7 @@ export class RemoteSource {
export class RemoteSource {
/**
* The remote connection timeout.
* Defaults to 30 seconds.
* @server_default 30s
*/
connect_timeout?: Duration
/**
@@ -108,6 +121,7 @@ export class RemoteSource {
headers?: Dictionary<string, string>
/**
* The URL for the remote instance of Elasticsearch that you want to index from.
* This information is required when you're indexing from remote.
*/
host: Host
/**
@@ -119,7 +133,8 @@
*/
password?: Password
/**
* The remote socket read timeout. Defaults to 30 seconds.
* The remote socket read timeout.
* @server_default 30s
*/
socket_timeout?: Duration
}
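The `RemoteSource` fields above require a `host` and give both timeouts a `30s` server default. As a sketch of how those defaults behave, the following hypothetical helper (not part of the spec) normalizes a remote-source configuration client-side:

```python
# Sketch: apply the documented defaults to a RemoteSource-style config.
# Field names follow the spec above; the helper itself is hypothetical.
def normalize_remote_source(remote: dict) -> dict:
    # `host` is required when reindexing from a remote cluster.
    if "host" not in remote:
        raise ValueError("host is required when reindexing from remote")
    out = dict(remote)
    out.setdefault("connect_timeout", "30s")  # @server_default 30s
    out.setdefault("socket_timeout", "30s")   # @server_default 30s
    return out

cfg = normalize_remote_source({"host": "http://otherhost:9200", "username": "user"})
```

Explicit values in the input are preserved; only missing timeouts pick up the `30s` defaults.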
@@ -25,10 +25,20 @@ import { float } from '@_types/Numeric'
* Throttle a reindex operation.
*
* Change the number of requests per second for a particular reindex operation.
* For example:
*
* ```
* POST _reindex/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
* ```
*
* Rethrottling that speeds up the query takes effect immediately.
* Rethrottling that slows down the query takes effect after the current batch completes.
* This behavior prevents scroll timeouts.
* @rest_spec_name reindex_rethrottle
* @availability stack since=2.4.0 stability=stable
* @availability serverless stability=stable visibility=private
* @doc_tag document
* @doc_id docs-reindex
*/
export interface Request extends RequestBase {
urls: [
@@ -39,13 +49,14 @@ export interface Request extends RequestBase {
]
path_parts: {
/**
* Identifier for the task.
* The task identifier, which can be found by using the tasks API.
*/
task_id: Id
}
query_parameters: {
/**
* The throttle for this request in sub-requests per second.
* It can be either `-1` to turn off throttling or any decimal number like `1.7` or `12` to throttle to that level.
*/
requests_per_second?: float
}
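The `task_id` path part and `requests_per_second` query parameter above fully determine the rethrottle request. A sketch of composing that request path (the `rethrottle_path` helper is hypothetical; the endpoint shape matches the example in the doc comment above):

```python
# Sketch: compose the reindex rethrottle request path from the task
# identifier and the new requests_per_second value. -1 disables throttling;
# any positive decimal (for example 1.7 or 12) sets a new throttle level.
def rethrottle_path(task_id: str, requests_per_second: float) -> str:
    return f"_reindex/{task_id}/_rethrottle?requests_per_second={requests_per_second}"

path = rethrottle_path("r1A2WoRbTwKZ516z6NEs5A:36619", -1)
# → _reindex/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1
```

The resulting path would be sent as a `POST` request to the cluster.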
