[8.x] [Inference API] Add node-local rate limiting for the inference API (#120400) #121251

timgrein · 2025-01-29T23:51:25Z

Backports the following commits to 8.x:

[Inference API] Add node-local rate limiting for the inference API (#120400)

…lastic#120400)

* Add node-local rate limiting for the inference API
* Fix integration tests by using new LocalStateInferencePlugin instead of InferencePlugin and adjust formatting.
* Correct feature flag name
* Add more docs, reorganize methods and make some methods package private
* Clarify comment in BaseInferenceActionRequest
* Fix wrong merge
* Fix checkstyle
* Fix checkstyle in tests
* Check that the service we want to the read the rate limit config for actually exists
* [CI] Auto commit changes from spotless
* checkStyle apply
* Update docs/changelog/120400.yaml
* Move rate limit division logic to RequestExecutorService
* Spotless apply
* Remove debug sout
* Adding a few suggestions
* Adam feedback
* Fix compilation error
* [CI] Auto commit changes from spotless
* Add BWC test case to InferenceActionRequestTests
* Add BWC test case to UnifiedCompletionActionRequestTests
* Update x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/InferenceServiceNodeLocalRateLimitCalculator.java Co-authored-by: Adam Demjen <[email protected]>
* Update x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/common/InferenceServiceNodeLocalRateLimitCalculator.java Co-authored-by: Adam Demjen <[email protected]>
* Remove addressed TODO
* Spotless apply
* Only use new rate limit specific feature flag
* Use ThreadLocalRandom
* [CI] Auto commit changes from spotless
* Use Randomness.get()
* [CI] Auto commit changes from spotless
* Fix import
* Use ConcurrentHashMap in InferenceServiceNodeLocalRateLimitCalculator
* Check for null value in getRateLimitAssignment and remove AtomicReference
* Remove newAssignments
* Up the default rate limit for completions
* Put deprecated feature flag back in
* Check feature flag in BaseTransportInferenceAction
* spotlessApply
* Export inference.common
* Do not export inference.common
* Provide noop rate limit calculator, if feature flag is disabled
* Add proper dependency injection

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Jonathan Buttner <[email protected]>
Co-authored-by: Adam Demjen <[email protected]>


timgrein added the :ml (Machine learning), >feature, auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!), backport, and Team:ML (Meta label for the ML team) labels Jan 29, 2025

elasticsearchmachine added the v8.18.0 label Jan 29, 2025

elasticsearchmachine mentioned this pull request Jan 29, 2025

[Inference API] Add node-local rate limiting for the inference API #120400

Merged

timgrein and others added 2 commits January 30, 2025 10:26

Merge branch '8.x' into backport/8.x/pr-120400

8a817fd

Use .get(0) as getFirst() doesn't exist in 8.18 (probably JDK difference?)

9712691
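For context on the second commit: `List.getFirst()` was added to the JDK in Java 21 as part of `SequencedCollection`, so it is unavailable when compiling against an older release, whereas `List.get(0)` has always been part of `List`. Whether that is the exact cause on the 8.x branch is not confirmed in this PR. The sketch below is illustrative only; the class name `FirstElement`, the helper `first`, and the sample values are hypothetical and do not come from the Elasticsearch code base.

```java
import java.util.List;
import java.util.NoSuchElementException;

public class FirstElement {

    // Hypothetical helper: returns the first element using List.get(0),
    // which compiles on JDKs older than 21 (where List.getFirst() is missing).
    // Mirrors getFirst() by throwing NoSuchElementException on an empty list.
    public static <T> T first(List<T> items) {
        if (items.isEmpty()) {
            throw new NoSuchElementException("list is empty");
        }
        return items.get(0);
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        List<String> values = List.of("a", "b", "c");
        System.out.println(first(values)); // prints "a"
    }
}
```

One behavioral difference to keep in mind: on an empty list, a bare `get(0)` throws `IndexOutOfBoundsException`, while `getFirst()` throws `NoSuchElementException`. The explicit emptiness check above keeps the latter behavior.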

elasticsearchmachine merged commit f0a5e25 into elastic:8.x Jan 30, 2025
15 checks passed

timgrein deleted the backport/8.x/pr-120400 branch January 30, 2025 11:09
