Refactor semantic text field to align with text field behaviour #119183

Merged: jimczi merged 46 commits into main from inference_metadata_fields on Dec 30, 2024

@jimczi jimczi commented Dec 20, 2024

This PR updates the semantic text field to function similarly to a standard text field. The original source content is preserved, while embedding chunks are retrieved from:

  • Doc values for dense embeddings
  • Term vectors for sparse embeddings
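
As a rough illustration of the split described above, here is a toy Python model. It is not Elasticsearch code and does not reflect the actual storage format: the `ToyIndex` class, its `dense_store` and `sparse_store` attributes, and the `content` field name are all hypothetical, standing in for doc values and term vectors respectively. It only shows the idea that the source keeps the original text while embeddings are read back from separate per-document stores.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model only: not Elasticsearch code or its on-disk format."""

    # doc_id -> original document, stored exactly as submitted
    source: dict = field(default_factory=dict)
    # doc_id -> list of dense chunk vectors (stand-in for doc values)
    dense_store: dict = field(default_factory=dict)
    # doc_id -> list of {token: weight} maps (stand-in for term vectors)
    sparse_store: dict = field(default_factory=dict)

    def index(self, doc_id, text, dense_chunks=None, sparse_chunks=None):
        # The source holds only what the user sent, as with a plain text field.
        self.source[doc_id] = {"content": text}
        if dense_chunks is not None:
            self.dense_store[doc_id] = [list(v) for v in dense_chunks]
        if sparse_chunks is not None:
            self.sparse_store[doc_id] = [dict(c) for c in sparse_chunks]

    def embeddings(self, doc_id):
        # Embeddings come from the side stores, never from the source.
        if doc_id in self.dense_store:
            return self.dense_store[doc_id]
        return self.sparse_store.get(doc_id, [])


idx = ToyIndex()
idx.index("1", "hello world", dense_chunks=[[0.1, 0.2], [0.3, 0.4]])
idx.index("2", "hello again", sparse_chunks=[{"hello": 1.5, "again": 0.7}])
```

In this sketch, `idx.source["1"]` contains only the original text, and `idx.embeddings("1")` returns the chunk vectors from the separate store.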

The new format is not enabled by default. An internal index setting enables it, which allows a controlled transition. The switch to the new format will be implemented in a subsequent PR.
Highlighting support for the new format is also missing and will be added in a follow-up.

Note: Individual commits were reviewed in the original branch for clarity.
This PR is intended for merging rather than detailed review.

jimczi and others added 30 commits November 21, 2024 20:46

@elasticsearchmachine
Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Pinging @elastic/search-relevance (Team:Search - Relevance)

@elasticsearchmachine
Hi @jimczi, I've created a changelog YAML for you.

@Mikep86 Mikep86 left a comment

Partial review, but I think the comments cover the main points to discuss.

@Mikep86 Mikep86 left a comment

LGTM! I flagged some things we should address in follow-ups.

@jimczi jimczi merged commit 12e86b1 into main Dec 30, 2024
17 checks passed
@jimczi jimczi deleted the inference_metadata_fields branch December 30, 2024 08:31
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 30, 2024
elasticsearchmachine pushed a commit that referenced this pull request Dec 30, 2024
#119339)

* Refactor semantic text field to align with text field behaviour   (#119183)

Co-authored-by: Mike Pellegrini

* fix compil after backport

* fix compil after backport (bis)

---------

Co-authored-by: Mike Pellegrini
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jan 6, 2025
This change adapts the semantic highlighter to work with the new format introduced in elastic#119183.
jimczi added a commit that referenced this pull request Jan 7, 2025
#119604)

This change adapts the semantic highlighter to work with the new format introduced in #119183.

Co-authored-by: Kathleen DeRusso
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jan 7, 2025
elastic#119604)

This change adapts the semantic highlighter to work with the new format introduced in elastic#119183.

Co-authored-by: Kathleen DeRusso
elasticsearchmachine pushed a commit that referenced this pull request Jan 7, 2025
#119604) (#119657)

This change adapts the semantic highlighter to work with the new format introduced in #119183.

Co-authored-by: Kathleen DeRusso
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jan 30, 2025
The semantic text format was updated in elastic#119183. This commit removes the last remaining reference to the old format from the documentation to ensure consistency.
leemthompo pushed a commit that referenced this pull request Jan 30, 2025
The semantic text format was updated in #119183. This commit removes the last remaining reference to the old format from the documentation to ensure consistency.
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jan 30, 2025
…21276)

The semantic text format was updated in elastic#119183. This commit removes the last remaining reference to the old format from the documentation to ensure consistency.
elasticsearchmachine pushed a commit that referenced this pull request Jan 30, 2025
…121289)

The semantic text format was updated in #119183. This commit removes the last remaining reference to the old format from the documentation to ensure consistency.
Labels: >enhancement, :SearchOrg/Relevance, v8.18.0, v9.0.0
3 participants