3888 Adds support for additional search connectors. #4906

Open: wants to merge 3 commits into main


@albertisfu (Contributor) commented Jan 10, 2025

This PR adds support for additional search connectors, as discussed in #3888.

The new search connectors and their behavior are as follows:

% But not

Here, % (surrounded by spaces) is replaced by NOT (surrounded by spaces).

& connector

Here, & (surrounded by spaces) is replaced by AND (surrounded by spaces).
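A minimal sketch of the two connector substitutions above, in Python. This is not the PR's code: the helper name convert_connectors is hypothetical, and it assumes "surrounded by spaces" means any whitespace character on both sides.

```python
import re

# "%" or "&" with whitespace on both sides. The lookarounds leave the
# surrounding whitespace in place, so tokens like "50%" or "AT&T" are untouched.
_NOT_CONNECTOR = re.compile(r"(?<=\s)%(?=\s)")
_AND_CONNECTOR = re.compile(r"(?<=\s)&(?=\s)")


def convert_connectors(query: str) -> str:
    """Hypothetical helper: map the "%" and "&" connectors to ES operators."""
    query = _NOT_CONNECTOR.sub("NOT", query)
    return _AND_CONNECTOR.sub("AND", query)
```

For example, court % appeal & brief becomes court NOT appeal AND brief, while a bare 50% or AT&T is left as typed.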

! Root expander suffix for variant endings

Here, ! at the beginning of a word (matching [a-zA-Z]) is replaced with * at the end of the word, which serves as the equivalent wildcard operator to match words with multiple endings.
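The root-expander rewrite can be sketched with a single regular-expression substitution. The helper name convert_root_expander is hypothetical; the pattern only assumes what the paragraph states: a ! at the start of a word, followed by letters in [a-zA-Z].

```python
import re

# "!" that is not preceded by a letter and is followed by one or more letters.
_ROOT_EXPANDER = re.compile(r"(?<![a-zA-Z])!([a-zA-Z]+)")


def convert_root_expander(query: str) -> str:
    """Hypothetical helper: rewrite "!root" as "root*" (multiple-endings wildcard)."""
    return _ROOT_EXPANDER.sub(r"\1*", query)
```

For example, Howard !Honda becomes Howard Honda*, and a trailing ! as in wow! is not rewritten because no letters follow it.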

Universal Character (*)

In this case, a * within a word is replaced with ?, the equivalent ES operator for a single-character wildcard. The exception is a * at the end of a word, which is preserved to support ES's built-in matching of multiple endings.
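A sketch of the in-word replacement, assuming "within a word" means a * with a letter on both sides. The helper name convert_inner_wildcards is hypothetical. A trailing * has no letter after it, so it falls outside the pattern and is kept.

```python
import re

# A "*" with a letter immediately before and after it, e.g. the "*" in "wom*n".
_INNER_STAR = re.compile(r"(?<=[a-zA-Z])\*(?=[a-zA-Z])")


def convert_inner_wildcards(query: str) -> str:
    """Hypothetical helper: replace in-word "*" with ES's single-character "?".

    A trailing "*" (as in "bank*") is left alone so ES keeps treating it
    as a multiple-endings wildcard.
    """
    return _INNER_STAR.sub("?", query)
```

Using lookarounds instead of capturing the neighboring letters means only the * itself is replaced, so something like m*n* becomes m?n* in one pass.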

Disallowed expensive wildcards

The ES documentation on wildcard queries discourages certain uses of * because of their high query cost, and I confirmed this with a couple of tests.

Using * at the beginning of a word, as in *ing, is particularly expensive because every term in the index must be examined; such queries timed out.

So it's better to disallow these queries and return an error message instead.
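A minimal sketch of rejecting a leading wildcard before the query reaches ES. The exception class DisallowedWildcardError, the helper check_leading_wildcard, and the message text are all hypothetical placeholders, not the PR's actual names or wording.

```python
import re

# "*" that starts a word: not preceded by a letter, but followed by one,
# as in "*ing".
_LEADING_WILDCARD = re.compile(r"(?<![a-zA-Z])\*(?=[a-zA-Z])")


class DisallowedWildcardError(ValueError):
    """Hypothetical exception type for rejected wildcard queries."""


def check_leading_wildcard(query: str) -> None:
    """Raise if any word in the query starts with "*"."""
    if _LEADING_WILDCARD.search(query):
        # Placeholder message, not the PR's actual wording.
        raise DisallowedWildcardError(
            "Wildcards at the beginning of a word are not allowed."
        )
```

Failing fast here avoids sending a query that would make ES scan every term in the index only to time out.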

Using * with very short words such as a*, b*, or c* is also costly, because too many terms could potentially match. These queries also timed out.

Queries with slightly longer roots completed, but were still expensive:

  • Two characters, such as ap*: ~38,000 ms. Quite slow, but it didn't time out.
  • Three characters, such as app*: ~19,000 ms.
  • Four characters, such as appl*: ~8,000 ms.

Therefore, it seems reasonable to allow multiple-ending wildcards (! or *) only if they are used in words with at least three characters. If the wildcard is used in a word with fewer than three characters, an error is thrown.
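The three-character rule can be sketched as a check on the root in front of a multiple-endings wildcard. The names MIN_ROOT_LENGTH and has_short_wildcard are hypothetical. The check assumes the two forms described earlier, !root and root*, and runs on the raw query so it doesn't depend on the order of the other rewrites.

```python
import re

# Minimum root length, taken from the three-character rule above.
MIN_ROOT_LENGTH = 3

# "!root": "!" not preceded by a letter, followed by the root's letters.
_BANG_ROOT = re.compile(r"(?<![a-zA-Z])!([a-zA-Z]+)")
# "root*": letters followed by a trailing "*" (no letter after it), so an
# in-word "*" such as the one in "wom*n" is not counted.
_STAR_ROOT = re.compile(r"([a-zA-Z]+)\*(?![a-zA-Z])")


def has_short_wildcard(query: str) -> bool:
    """Return True if any multiple-endings wildcard has a root that is too short."""
    roots = _BANG_ROOT.findall(query) + _STAR_ROOT.findall(query)
    return any(len(root) < MIN_ROOT_LENGTH for root in roots)
```

With this check, a*, ap*, and !ap are rejected, while app* and !apple are allowed.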

The error message shown when * is used at the beginning of a word, or at the end of a short word, is as follows:
[Screenshot (2025-01-10): error message for disallowed wildcard queries]

Once we write documentation for these search connectors, we can link to it from the error message to provide more details.

These new search connectors, validations, and error messages also apply to V3 and V4 of the search API.

  • Added multiple tests to ensure proper functionality for both the frontend and API.
  • Extended the test_query_cleanup_function test cases and moved them to ESCommonSearchTest since they don't require a Selenium setup to work.
  • Removed a test for a query like Howard !Honda, which was treated as equivalent to Howard -Honda. The ! operator now represents a multiple-endings variant wildcard instead of negation. Using ! for negation is not currently documented, whereas - is.

Let me know what you think.

@albertisfu albertisfu marked this pull request as ready for review January 10, 2025 17:19
@albertisfu albertisfu requested a review from mlissner January 10, 2025 17:19
@mlissner (Member) left a comment


LGTM.

Labels
None yet
Projects
Status: To Do

3 participants