Add Multi-Field Support for Semantic Text Fields
Semantic text fields now support multi-fields, either as part of a multi-field structure or containing multi-fields internally.
This enhancement aligns with the semantic text field's current behavior as a standard text field.

Note: Multi-field support is only available for the new index format. Attempting to set a multi-field on an index created with the older format will still result in a failure.
jimczi committed Jan 14, 2025
1 parent 30948ac commit 0faa561
Showing 8 changed files with 286 additions and 131 deletions.
48 changes: 26 additions & 22 deletions docs/reference/mapping/types/semantic-text.asciidoc
@@ -182,16 +182,11 @@ Even if the script targets non-`semantic_text` fields, the update will fail when

 [discrete]
 [[copy-to-support]]
-==== `copy_to` support
+==== `copy_to` and multi-fields support

-The `semantic_text` field type can be the target of
-<<copy-to,`copy_to` fields>>. This means you can use a single `semantic_text`
-field to collect the values of other fields for semantic search. Each value has
-its embeddings calculated separately; each field value is a separate set of chunk(s) in
-the resulting embeddings.
-
-This imposes a restriction on bulk requests and ingestion pipelines that update documents with `semantic_text` fields.
-In these cases, all fields that are copied to a `semantic_text` field, including the `semantic_text` field value, must have a value to ensure every embedding is calculated correctly.
+The semantic_text field type can serve as the target of <<copy-to,copy_to fields>>,
+be part of a <<multi-fields,multi-field>> structure, or contain <<multi-fields,multi-fields>> internally.
+This means you can use a single field to collect the values of other fields for semantic search.

 For example, the following mapping:

@@ -201,39 +196,48 @@ PUT test-index
 {
   "mappings": {
     "properties": {
-      "infer_field": {
-        "type": "semantic_text",
-        "inference_id": ".elser-2-elasticsearch"
-      },
       "source_field": {
         "type": "text",
         "copy_to": "infer_field"
+      },
+      "infer_field": {
+        "type": "semantic_text",
+        "inference_id": ".elser-2-elasticsearch"
       }
     }
   }
 }
 ------------------------------------------------------------
 // TEST[skip:TBD]

-Will need the following bulk update request to ensure that `infer_field` is updated correctly:
+can also be declared as multi-fields:

 [source,console]
 ------------------------------------------------------------
-PUT test-index/_bulk
-{"update": {"_id": "1"}}
-{"doc": {"infer_field": "updated inference field", "source_field": "updated source field"}}
+PUT test-index
+{
+  "mappings": {
+    "properties": {
+      "source_field": {
+        "type": "text",
+        "fields": {
+          "infer_field": {
+            "type": "semantic_text",
+            "inference_id": ".elser-2-elasticsearch"
+          }
+        }
+      }
+    }
+  }
+}
 ------------------------------------------------------------
 // TEST[skip:TBD]

-Notice that both the `semantic_text` field and the source field are updated in the bulk request.
-
-
 [discrete]
 [[limitations]]
 ==== Limitations

 `semantic_text` field types have the following limitations:

 * `semantic_text` fields are not currently supported as elements of <<nested,nested fields>>.
-* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.
-* `semantic_text` fields can't be defined as <<multi-fields,multi-fields>> of another field, nor can they contain other fields as multi-fields.
+* `semantic_text` fields can't currently be set as part of <<dynamic-templates>>.
@@ -43,8 +43,10 @@ private void expand() {
         path = newPath;
     }

-    public void remove() {
-        path[--index] = null;
+    public String remove() {
+        var ret = path[--index];
+        path[index] = null;
+        return ret;
     }

     public void setWithinLeafObject(boolean withinLeafObject) {
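Returning the removed element lets a caller pop a path element, work one level up, and then push the same element back, which is how `addDynamicUpdate` later in this commit uses it. The following is a minimal Java sketch of that pattern, not the class changed in this hunk: `PathStackDemo`, `PathStack`, `pathAsText`, and `describeParent` are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

public class PathStackDemo {
    // Hypothetical, simplified stand-in for the path stack touched by this hunk.
    static final class PathStack {
        private String[] path = new String[2];
        private int index = 0;

        void add(String name) {
            if (index == path.length) {
                path = Arrays.copyOf(path, path.length * 2); // grow when full
            }
            path[index++] = name;
        }

        // Same shape as the new remove(): pop the last element and return it.
        String remove() {
            String ret = path[--index];
            path[index] = null; // clear the slot so the array does not retain the reference
            return ret;
        }

        String pathAsText() {
            return String.join(".", Arrays.copyOf(path, index));
        }
    }

    // Step up one level, read the parent path, then restore the removed element.
    static String describeParent(PathStack stack) {
        String removed = stack.remove();
        try {
            return stack.pathAsText();
        } finally {
            stack.add(removed);
        }
    }

    public static void main(String[] args) {
        PathStack stack = new PathStack();
        stack.add("obj");
        stack.add("source_field");
        stack.add("infer_field");
        System.out.println(describeParent(stack)); // parent path
        System.out.println(stack.pathAsText());    // full path, restored by the finally block
    }
}
```

The `try`/`finally` keeps the stack balanced even if the work done at the parent level throws.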
@@ -1386,6 +1386,11 @@ public Builder init(FieldMapper initializer) {
             return this;
         }

+        public Builder addMultiField(FieldMapper.Builder builder) {
+            this.multiFieldsBuilder.add(builder);
+            return this;
+        }
+
         protected BuilderParams builderParams(Mapper.Builder mainFieldBuilder, MapperBuilderContext context) {
             return new BuilderParams(multiFieldsBuilder.build(mainFieldBuilder, context), copyTo, sourceKeepMode, hasScript, onScriptError);
         }
@@ -143,13 +143,7 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
         // directly. We can safely split on all "." chars because semantic text fields cannot be used when subobjects == false.
         String[] fieldNameParts = fieldName.split("\\.");
         setPath(context.path(), fieldNameParts);
-
-        var parent = context.parent().findParentMapper(fieldName);
-        if (parent == null) {
-            throw new IllegalArgumentException("Field [" + fieldName + "] does not have a parent mapper");
-        }
-        String suffix = parent != context.parent() ? fieldName.substring(parent.fullPath().length() + 1) : fieldName;
-        var mapper = parent.getMapper(suffix);
+        var mapper = context.mappingLookup().getMapper(fieldName);
         if (mapper instanceof SemanticTextFieldMapper fieldMapper) {
             XContentLocation xContentLocation = context.parser().getTokenLocation();
             var input = fieldMapper.parseSemanticTextField(context);
@@ -42,7 +42,9 @@
 import org.elasticsearch.index.mapper.Mapper;
 import org.elasticsearch.index.mapper.MapperBuilderContext;
 import org.elasticsearch.index.mapper.MapperMergeContext;
+import org.elasticsearch.index.mapper.MapperParsingException;
 import org.elasticsearch.index.mapper.MappingLookup;
+import org.elasticsearch.index.mapper.MappingParserContext;
 import org.elasticsearch.index.mapper.NestedObjectMapper;
 import org.elasticsearch.index.mapper.ObjectMapper;
 import org.elasticsearch.index.mapper.SimpleMappedFieldType;
@@ -83,6 +85,7 @@
 import java.util.Objects;
 import java.util.Optional;
 import java.util.Set;
+import java.util.function.BiConsumer;
 import java.util.function.Function;

 import static org.elasticsearch.search.SearchService.DEFAULT_SIZE;
@@ -119,12 +122,22 @@ public class SemanticTextFieldMapper extends FieldMapper implements InferenceFie

     public static final TypeParser PARSER = new TypeParser(
         (n, c) -> new Builder(n, c::bitSetProducer, c.getIndexSettings()),
-        List.of(notInMultiFields(CONTENT_TYPE), notFromDynamicTemplates(CONTENT_TYPE))
+        List.of(validateParserContext(CONTENT_TYPE))
     );

+    public static BiConsumer<String, MappingParserContext> validateParserContext(String type) {
+        return (n, c) -> {
+            if (InferenceMetadataFieldsMapper.isEnabled(c.getIndexSettings().getSettings()) == false && c.isWithinMultiField()) {
+                throw new MapperParsingException("Field [" + n + "] of type [" + type + "] can't be used in multifields");
+            }
+            if (c.isFromDynamicTemplate()) {
+                throw new MapperParsingException("Field [" + n + "] of type [" + type + "] can't be used in dynamic templates");
+            }
+        };
+    }
+
     public static class Builder extends FieldMapper.Builder {
         private final boolean useLegacyFormat;
-        private final IndexVersion indexVersionCreated;

         private final Parameter<String> inferenceId = Parameter.stringParam(
             INFERENCE_ID_FIELD,
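The new `validateParserContext` folds two checks into one validator and makes only the multi-field check depend on the index format. The sketch below illustrates that decision table in plain Java under stated assumptions: `Ctx` and its three flags are a hypothetical stand-in for `MappingParserContext`, `isAccepted` is a helper added for the demo, and `IllegalArgumentException` replaces `MapperParsingException` so the example has no Elasticsearch dependency.

```java
import java.util.function.BiConsumer;

public class ParserContextValidationSketch {
    // Hypothetical stand-in for the mapping parser context; the three flags are assumptions.
    record Ctx(boolean legacyFormat, boolean withinMultiField, boolean fromDynamicTemplate) {}

    // One validator covering both rules; only the multi-field rule depends on the index format.
    static BiConsumer<String, Ctx> validateParserContext(String type) {
        return (name, ctx) -> {
            if (ctx.legacyFormat() && ctx.withinMultiField()) {
                throw new IllegalArgumentException("Field [" + name + "] of type [" + type + "] can't be used in multifields");
            }
            if (ctx.fromDynamicTemplate()) {
                throw new IllegalArgumentException("Field [" + name + "] of type [" + type + "] can't be used in dynamic templates");
            }
        };
    }

    // Demo helper: true when the validator raises no error.
    static boolean isAccepted(String name, Ctx ctx) {
        try {
            validateParserContext("semantic_text").accept(name, ctx);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // New format, inside a multi-field: accepted.
        System.out.println(isAccepted("infer_field", new Ctx(false, true, false)));
        // Legacy format, inside a multi-field: rejected.
        System.out.println(isAccepted("infer_field", new Ctx(true, true, false)));
        // Dynamic templates are rejected regardless of format.
        System.out.println(isAccepted("infer_field", new Ctx(false, false, true)));
    }
}
```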
@@ -178,7 +191,6 @@ public static Builder from(SemanticTextFieldMapper mapper) {

         public Builder(String name, Function<Query, BitSetProducer> bitSetProducer, IndexSettings indexSettings) {
             super(name);
-            this.indexVersionCreated = indexSettings.getIndexVersionCreated();
             this.useLegacyFormat = InferenceMetadataFieldsMapper.isEnabled(indexSettings.getSettings()) == false;
             this.inferenceFieldBuilder = c -> createInferenceField(
                 c,
@@ -225,10 +237,10 @@ protected void merge(FieldMapper mergeWith, Conflicts conflicts, MapperMergeCont

         @Override
         public SemanticTextFieldMapper build(MapperBuilderContext context) {
-            if (copyTo.copyToFields().isEmpty() == false) {
+            if (useLegacyFormat && copyTo.copyToFields().isEmpty() == false) {
                 throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support [copy_to]");
             }
-            if (multiFieldsBuilder.hasMultiFields()) {
+            if (useLegacyFormat && multiFieldsBuilder.hasMultiFields()) {
                 throw new IllegalArgumentException(CONTENT_TYPE + " field [" + leafName() + "] does not support multi-fields");
             }
             final String fullName = context.buildFullName(leafName());
@@ -247,7 +259,6 @@ public SemanticTextFieldMapper build(MapperBuilderContext context) {
                     searchInferenceId.getValue(),
                     modelSettings.getValue(),
                     inferenceField,
-                    indexVersionCreated,
                     useLegacyFormat,
                     meta.getValue()
                 ),
@@ -277,13 +288,33 @@ private SemanticTextFieldMapper copySettings(SemanticTextFieldMapper mapper, Map

     private SemanticTextFieldMapper(String simpleName, MappedFieldType mappedFieldType, BuilderParams builderParams) {
         super(simpleName, mappedFieldType, builderParams);
+        ensureMultiFields(builderParams.multiFields().iterator());
     }
+
+    private void ensureMultiFields(Iterator<FieldMapper> mappers) {
+        while (mappers.hasNext()) {
+            var mapper = mappers.next();
+            if (mapper.leafName().equals(CHUNKED_EMBEDDINGS_FIELD)) {
+                throw new IllegalArgumentException(
+                    "Field ["
+                        + mapper.fullPath()
+                        + "] is already used by another field ["
+                        + fullPath()
+                        + "] internally. Please choose a different name."
+                );
+            }
+        }
+    }

     @Override
     public Iterator<Mapper> iterator() {
-        List<Mapper> subIterators = new ArrayList<>();
-        subIterators.add(fieldType().getInferenceField());
-        return subIterators.iterator();
+        List<Mapper> mappers = new ArrayList<>();
+        Iterator<Mapper> m = super.iterator();
+        while (m.hasNext()) {
+            mappers.add(m.next());
+        }
+        mappers.add(fieldType().getInferenceField());
+        return mappers.iterator();
     }

     @Override
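Because a semantic text field now exposes user-defined multi-fields alongside its internal inference sub-field, `ensureMultiFields` rejects a multi-field whose leaf name matches the internal one. Here is a minimal sketch of that name-collision guard, assuming a plain list of leaf names in place of `FieldMapper` instances; `RESERVED_LEAF` is a placeholder, since the actual value of `CHUNKED_EMBEDDINGS_FIELD` does not appear in this diff.

```java
import java.util.List;

public class ReservedSubFieldSketch {
    // Placeholder only; the real value of CHUNKED_EMBEDDINGS_FIELD is not shown in this diff.
    static final String RESERVED_LEAF = "reserved_leaf";

    // Reject any multi-field whose leaf name collides with the internally used sub-field name.
    static void ensureMultiFields(String parentPath, List<String> multiFieldLeafNames) {
        for (String leaf : multiFieldLeafNames) {
            if (leaf.equals(RESERVED_LEAF)) {
                throw new IllegalArgumentException(
                    "Field [" + parentPath + "." + leaf + "] is already used by another field ["
                        + parentPath + "] internally. Please choose a different name."
                );
            }
        }
    }

    public static void main(String[] args) {
        // A non-colliding multi-field name passes silently.
        ensureMultiFields("infer_field", List.of("keyword"));
        try {
            ensureMultiFields("infer_field", List.of("keyword", RESERVED_LEAF));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```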
@@ -352,20 +383,7 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel

         final SemanticTextFieldMapper mapper;
         if (fieldType().getModelSettings() == null) {
-            context.path().remove();
-            Builder builder = (Builder) new Builder(
-                leafName(),
-                fieldType().getChunksField().bitsetProducer(),
-                fieldType().getChunksField().indexSettings()
-            ).init(this);
-            try {
-                mapper = builder.setModelSettings(field.inference().modelSettings())
-                    .setInferenceId(field.inference().inferenceId())
-                    .build(context.createDynamicMapperBuilderContext());
-                context.addDynamicMapper(mapper);
-            } finally {
-                context.path().add(leafName());
-            }
+            mapper = addDynamicUpdate(context, field);
         } else {
             Conflicts conflicts = new Conflicts(fullFieldName);
             canMergeModelSettings(fieldType().getModelSettings(), field.inference().modelSettings(), conflicts);
@@ -440,6 +458,32 @@ void parseCreateFieldFromContext(DocumentParserContext context, SemanticTextFiel
         }
     }

+    private SemanticTextFieldMapper addDynamicUpdate(DocumentParserContext context, SemanticTextField field) {
+        context.path().remove();
+        Builder builder = (Builder) getMergeBuilder();
+        try {
+            builder.setModelSettings(field.inference().modelSettings()).setInferenceId(field.inference().inferenceId());
+            if (context.mappingLookup().isMultiField(fullPath())) {
+                // The field is part of a multi-field, so the parent field must also be updated accordingly.
+                var fieldName = context.path().remove();
+                try {
+                    var parentMapper = ((FieldMapper) context.mappingLookup().getMapper(context.mappingLookup().parentField(fullPath())))
+                        .getMergeBuilder();
+                    context.addDynamicMapper(parentMapper.addMultiField(builder).build(context.createDynamicMapperBuilderContext()));
+                    return builder.build(context.createDynamicMapperBuilderContext());
+                } finally {
+                    context.path().add(fieldName);
+                }
+            } else {
+                var mapper = builder.build(context.createDynamicMapperBuilderContext());
+                context.addDynamicMapper(mapper);
+                return mapper;
+            }
+        } finally {
+            context.path().add(leafName());
+        }
+    }
+
     @Override
     protected String contentType() {
         return CONTENT_TYPE;
@@ -460,11 +504,14 @@ public InferenceFieldMetadata getMetadata(Set<String> sourcePaths) {

     @Override
     protected void doValidate(MappingLookup mappers) {
-        int parentPathIndex = fullPath().lastIndexOf(leafName());
+        String fullPath = mappers.isMultiField(fullPath()) ? mappers.parentField(fullPath()) : fullPath();
+        String leafName = mappers.getMapper(fullPath).leafName();
+        int parentPathIndex = fullPath.lastIndexOf(leafName);
         if (parentPathIndex > 0) {
+            String parentName = fullPath.substring(0, parentPathIndex - 1);
             // Check that the parent object field allows subobjects.
             // Subtract one from the parent path index to omit the trailing dot delimiter.
-            ObjectMapper parentMapper = mappers.objectMappers().get(fullPath().substring(0, parentPathIndex - 1));
+            ObjectMapper parentMapper = mappers.objectMappers().get(parentName);
             if (parentMapper == null) {
                 throw new IllegalStateException(CONTENT_TYPE + " field [" + fullPath() + "] does not have a parent object mapper");
             }
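The reworked `doValidate` first swaps a multi-field's path for its parent field's path, then strips the leaf name to find the enclosing object. The sketch below walks through that string arithmetic under two assumptions: a hypothetical `Map` from multi-field path to parent field path stands in for `MappingLookup`, and the leaf name is taken as the text after the last dot (the real code asks the mapper for it).

```java
import java.util.Map;
import java.util.Optional;

public class ParentObjectLookupSketch {
    // multiFieldParents maps a multi-field's full path to its parent field's full path (hypothetical lookup).
    static Optional<String> parentObjectName(String fullPath, Map<String, String> multiFieldParents) {
        // For a multi-field, resolve the enclosing object from the parent field's path instead.
        String effectivePath = multiFieldParents.getOrDefault(fullPath, fullPath);
        // Assumption: the leaf name is whatever follows the last dot.
        String leafName = effectivePath.substring(effectivePath.lastIndexOf('.') + 1);
        int parentPathIndex = effectivePath.lastIndexOf(leafName);
        if (parentPathIndex > 0) {
            // Subtract one to drop the dot that separates the parent object from the leaf.
            return Optional.of(effectivePath.substring(0, parentPathIndex - 1));
        }
        return Optional.empty(); // top-level field: no parent object to check
    }

    public static void main(String[] args) {
        Map<String, String> parents = Map.of("obj.source_field.infer_field", "obj.source_field");
        System.out.println(parentObjectName("obj.source_field.infer_field", parents)); // multi-field inside an object
        System.out.println(parentObjectName("obj.infer_field", parents));              // regular field inside an object
        System.out.println(parentObjectName("infer_field", parents));                  // top-level field
    }
}
```

Without the parent-field substitution, a multi-field such as `obj.source_field.infer_field` would look up `obj.source_field` as an object mapper, which it is not.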
@@ -482,7 +529,6 @@ public static class SemanticTextFieldType extends SimpleMappedFieldType {
         private final String searchInferenceId;
         private final SemanticTextField.ModelSettings modelSettings;
         private final ObjectMapper inferenceField;
-        private final IndexVersion indexVersionCreated;
         private final boolean useLegacyFormat;

         public SemanticTextFieldType(
@@ -491,7 +537,6 @@ public SemanticTextFieldType(
             String searchInferenceId,
             SemanticTextField.ModelSettings modelSettings,
             ObjectMapper inferenceField,
-            IndexVersion indexVersionCreated,
             boolean useLegacyFormat,
             Map<String, String> meta
         ) {
@@ -500,7 +545,6 @@ public SemanticTextFieldType(
             this.searchInferenceId = searchInferenceId;
             this.modelSettings = modelSettings;
             this.inferenceField = inferenceField;
-            this.indexVersionCreated = indexVersionCreated;
             this.useLegacyFormat = useLegacyFormat;
         }
