Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] [Inference API] Add node-local rate limiting for the inference API (#120400) #121251

Merged
merged 3 commits into from
Jan 30, 2025

Conversation

timgrein
Copy link
Contributor

Backports the following commits to 8.x:

…lastic#120400)

* Add node-local rate limiting for the inference API

* Fix integration tests by using new LocalStateInferencePlugin instead of InferencePlugin and adjust formatting.

* Correct feature flag name

* Add more docs, reorganize methods and make some methods package private

* Clarify comment in BaseInferenceActionRequest

* Fix wrong merge

* Fix checkstyle

* Fix checkstyle in tests

* Check that the service we want to the read the rate limit config for actually exists

* [CI] Auto commit changes from spotless

* checkStyle apply

* Update docs/changelog/120400.yaml

* Move rate limit division logic to RequestExecutorService

* Spotless apply

* Remove debug sout

* Adding a few suggestions

* Adam feedback

* Fix compilation error

* [CI] Auto commit changes from spotless

* Add BWC test case to InferenceActionRequestTests

* Add BWC test case to UnifiedCompletionActionRequestTests

* Update x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/InferenceServiceNodeLocalRateLimitCalculator.java

Co-authored-by: Adam Demjen <[email protected]>

* Update x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/InferenceServiceNodeLocalRateLimitCalculator.java

Co-authored-by: Adam Demjen <[email protected]>

* Remove addressed TODO

* Spotless apply

* Only use new rate limit specific feature flag

* Use ThreadLocalRandom

* [CI] Auto commit changes from spotless

* Use Randomness.get()

* [CI] Auto commit changes from spotless

* Fix import

* Use ConcurrentHashMap in InferenceServiceNodeLocalRateLimitCalculator

* Check for null value in getRateLimitAssignment and remove AtomicReference

* Remove newAssignments

* Up the default rate limit for completions

* Put deprecated feature flag back in

* Check feature flag in BaseTransportInferenceAction

* spotlessApply

* Export inference.common

* Do not export inference.common

* Provide noop rate limit calculator, if feature flag is disabled

* Add proper dependency injection

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Jonathan Buttner <[email protected]>
Co-authored-by: Adam Demjen <[email protected]>
@timgrein timgrein added :ml Machine learning >feature auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport Team:ML Meta label for the ML team labels Jan 29, 2025
@elasticsearchmachine elasticsearchmachine merged commit f0a5e25 into elastic:8.x Jan 30, 2025
15 checks passed
@timgrein timgrein deleted the backport/8.x/pr-120400 branch January 30, 2025 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >feature :ml Machine learning Team:ML Meta label for the ML team v8.18.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants