3888 Adds support for additional search connectors. #4906
This PR adds support for additional search connectors as discussed in #3888
The new search connectors with their functionalities are as follows:
`%` But not
In this case, `%` (surrounded by spaces) is replaced by `NOT` (surrounded by spaces).

`&` connector
Here, `&` (surrounded by spaces) is replaced by `AND` (surrounded by spaces).

`!` Root expander suffix for variant endings
Here, `!` at the beginning of a word (matching `[a-zA-Z]`) is replaced with `*` at the end of the word, which serves as the equivalent wildcard operator to match words with multiple endings.

Universal character (`*`)
In this case, `*` within words is replaced with `?`, the equivalent operator in ES to act as a single-character wildcard. An exception is made if `*` appears at the end of the word, where it is preserved to support the built-in functionality for matching multiple endings.

Disallowed expensive wildcards
The ES documentation on wildcards discourages certain uses of `*` due to their high query cost. I confirmed this with a couple of tests.

Using `*` at the beginning of words, like `*ing`, is particularly heavy because all terms in the index need to be examined; such queries timed out. So it's better not to allow this kind of query and to throw an error message instead.

Using `*` in very short words like `a*`, `b*`, or `c*` is quite costly, since it requires examining too many terms that could potentially match. These queries also resulted in timeouts.

A test with two characters, such as `ap*`, was quite slow (~38,000 ms) but didn't time out, so it's still quite costly. A test with three characters, such as `app*`, took ~19,000 ms. A test with four characters, such as `appl*`, took ~8,000 ms.

Therefore, it seems reasonable to allow multiple-ending wildcards (`!` or `*`) only if they are used in words with at least three characters. If the wildcard is used in a word with fewer than three characters, an error is thrown.

The error message for using `*` at the beginning of words or at the end of short words is as follows:

When working on documentation for these search connectors, we can add a link in the error message to provide more details.
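To make the validation rule concrete, here is a minimal sketch of how such a check could look. It is not the code in this PR: the names `validate_wildcards`, `DisallowedWildcardError`, and `MIN_STEM_LENGTH`, as well as the error message text, are hypothetical, and the sketch splits on whitespace only, ignoring quoted phrases and fielded queries.

```python
MIN_STEM_LENGTH = 3  # threshold taken from the description above


class DisallowedWildcardError(ValueError):
    """Hypothetical exception for expensive wildcard queries."""


def validate_wildcards(query: str) -> None:
    """Raise if the query uses a wildcard pattern the description disallows.

    Assumption: terms are separated by whitespace; phrases and field
    prefixes are not handled in this sketch.
    """
    for term in query.split():
        # Leading "*" (e.g. "*ing") forces ES to examine every term.
        if term.startswith("*"):
            raise DisallowedWildcardError(
                f"Leading wildcard is not allowed: {term!r}"  # placeholder text
            )
        # Multiple-ending wildcards: trailing "*" or leading "!".
        if term.endswith("*") or term.startswith("!"):
            stem = term.strip("!*")
            if len(stem) < MIN_STEM_LENGTH:
                raise DisallowedWildcardError(
                    f"Wildcard term is too short: {term!r}"  # placeholder text
                )
```

With this sketch, `app*` and `!apple` pass, while `*ing`, `a*`, `ap*`, and `!ap` raise the error.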
These new search connectors, validations, and error messages also apply to V3 and V4 of the search API.
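The connector rewrites listed in this description could be approximated with a few regular expressions. This is only an illustrative sketch under simplifying assumptions, not the implementation in this PR: the function name `rewrite_connectors` is hypothetical, and quoted phrases, escaping, and field prefixes are ignored.

```python
import re


def rewrite_connectors(query: str) -> str:
    """Translate the new connectors into ES query-string operators (sketch)."""
    # " % " (surrounded by spaces) becomes " NOT ".
    query = query.replace(" % ", " NOT ")
    # " & " (surrounded by spaces) becomes " AND ".
    query = query.replace(" & ", " AND ")
    # "!" at the start of an alphabetic word becomes a trailing "*".
    query = re.sub(r"(?<!\S)!([a-zA-Z]+)", r"\1*", query)
    # "*" between word characters becomes "?"; a trailing "*" is preserved.
    query = re.sub(r"(?<=\w)\*(?=\w)", "?", query)
    return query


print(rewrite_connectors("Howard % Honda"))  # Howard NOT Honda
print(rewrite_connectors("!bankrupt"))       # bankrupt*
print(rewrite_connectors("gr*mmar"))         # gr?mmar
```

In a real pipeline the validation of short or leading wildcards would likely run alongside this step so that disallowed patterns are rejected before the query reaches ES.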
The `test_query_cleanup_function` test cases were moved to `ESCommonSearchTest`, since they don't require a Selenium setup to work.

A query such as `Howard !Honda` was previously equivalent to `Howard -Honda`. However, the `!` operator now represents a multi-ending variant wildcard instead of negation. The `!` operator is not currently documented, whereas `-` is.

Let me know what you think.