Adaptive rate limiting for OpenSearch bulk requests #1011
@@ -535,7 +535,11 @@ In the index mapping, the `_meta` and `properties` field stores meta and schema i
- `spark.datasource.flint.write.batch_size`: The number of documents written to Flint in a single batch request. Default value is Integer.MAX_VALUE.
- `spark.datasource.flint.write.batch_bytes`: The approximate amount of data in bytes written to Flint in a single batch request. The actual data written to OpenSearch may exceed this. Default value is 1mb. The writing process checks after each document whether the total number of documents (docCount) has reached batch_size or the buffer size has surpassed batch_bytes. If either condition is met, the current batch is flushed and the document count resets to zero.
- `spark.datasource.flint.write.refresh_policy`: default value is false. valid values [NONE(false), IMMEDIATE(true), WAIT_UNTIL(wait_for)]
- `spark.datasource.flint.write.bulkRequestRateLimitPerNode`: [Experimental] Rate limit (requests/sec) for bulk requests per worker node. Only accepts integer values. To reduce the traffic below 1 req/sec, batch_bytes or batch_size should be reduced. Default value is 0, which disables the rate limit.
- `spark.datasource.flint.write.bulk.rate_limit_per_node.enabled`: [Experimental] Enable rate limiting for bulk requests per worker node. Default is false.
- `spark.datasource.flint.write.bulk.rate_limit_per_node.min`: [Experimental] Lower limit (documents/sec) for adaptive rate limiting of bulk requests per worker node, if the rate limit is enabled. The adaptive rate will not drop below this value. Must be greater than 0. Default is 5000.
- `spark.datasource.flint.write.bulk.rate_limit_per_node.max`: [Experimental] Upper limit (documents/sec) for adaptive rate limiting of bulk requests per worker node, if the rate limit is enabled. The adaptive rate will not exceed this value. Set to -1 for no upper bound. Default is 50000.
- `spark.datasource.flint.write.bulk.rate_limit_per_node.increase_step`: [Experimental] Adaptive rate limit increase step for bulk requests per worker node, if the rate limit is enabled. Must be greater than 0. Default is 500.
- `spark.datasource.flint.write.bulk.rate_limit_per_node.decrease_ratio`: [Experimental] Adaptive rate limit decrease ratio for bulk requests per worker node, if the rate limit is enabled. Must be between 0 and 1. Default is 0.8.
Review comment: Is there a reason why we are using a step to increase and a ratio to decrease? Can this be consistent?

Reply: This uses the idea of the AIMD (additive-increase, multiplicative-decrease) algorithm from TCP congestion control. I'm not well versed in the mathematics behind it, but AIAD (additive-increase, additive-decrease) and MIMD (multiplicative-increase, multiplicative-decrease) are both said to be unable to reach stability in dynamic systems.

Review comment: The 0.8 ratio looks too aggressive. Do you have some testing to explain why you chose 0.8?

Reply: Mainly because 0.8^3 ~= 0.5, so if a bulk request attempts 3 times and all attempts fail, we cut the rate limit in half.
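For intuition, here is a minimal, illustrative simulation of the AIMD dynamics under the defaults above (min 5000, step 500, ratio 0.8). The class and loop are illustrative only, not part of the PR:

// Illustrative AIMD simulation using the PR's default parameters.
// Not part of the PR; it only shows how the rate would evolve.
public class AimdSketch {
  public static void main(String[] args) {
    final long min = 5000, max = 50000, step = 500;
    final double ratio = 0.8;
    long rate = min;  // the limiter starts at the configured minimum

    // Additive increase: each successful bulk request adds a fixed step.
    for (int i = 0; i < 10; i++) {
      rate = Math.min(max, rate + step);
    }
    System.out.println("after 10 successes: " + rate + " docs/sec");  // 10000

    // Multiplicative decrease: three consecutive failures roughly halve
    // the rate, since 0.8^3 ~= 0.512 -- the rationale given for 0.8.
    for (int i = 0; i < 3; i++) {
      rate = Math.max(min, (long) (rate * ratio));
    }
    System.out.println("after 3 failures: " + rate + " docs/sec");    // 5120
  }
}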
- `spark.datasource.flint.read.scroll_size`: default value is 100.
- `spark.datasource.flint.read.scroll_duration`: default value is 5 minutes. scroll context keep alive duration.
- `spark.datasource.flint.retry.max_retries`: max retries on failed HTTP request. default value is 3. Use 0 to disable retry.
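To make the new keys concrete, a sketch of how they might be set on a Spark session; the session setup, input path, format name, and index name are placeholders, not taken from this PR, and the values shown are simply the documented defaults:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class FlintRateLimitConfigSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("flint-rate-limit-sketch")  // placeholder app name
        .getOrCreate();

    // Turn on adaptive rate limiting and set explicit bounds.
    spark.conf().set("spark.datasource.flint.write.bulk.rate_limit_per_node.enabled", "true");
    spark.conf().set("spark.datasource.flint.write.bulk.rate_limit_per_node.min", "5000");
    spark.conf().set("spark.datasource.flint.write.bulk.rate_limit_per_node.max", "50000");
    spark.conf().set("spark.datasource.flint.write.bulk.rate_limit_per_node.increase_step", "500");
    spark.conf().set("spark.datasource.flint.write.bulk.rate_limit_per_node.decrease_ratio", "0.8");

    Dataset<Row> df = spark.read().json("/tmp/input.json");      // placeholder input
    df.write().format("flint").mode("append").save("my_index");  // placeholder sink
  }
}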
BulkRequestRateLimiterImpl.java
@@ -0,0 +1,83 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.flint.core.storage;

import com.google.common.util.concurrent.RateLimiter;
import java.util.logging.Logger;
import org.opensearch.flint.core.FlintOptions;
import org.opensearch.flint.core.metrics.MetricConstants;
import org.opensearch.flint.core.metrics.MetricsUtil;

public class BulkRequestRateLimiterImpl implements BulkRequestRateLimiter {
  private static final Logger LOG = Logger.getLogger(BulkRequestRateLimiterImpl.class.getName());
  private final RateLimiter rateLimiter;

  private final long minRate;
  private final long maxRate;
  private final long increaseStep;
  private final double decreaseRatio;

  public BulkRequestRateLimiterImpl(FlintOptions flintOptions) {
    minRate = flintOptions.getBulkRequestMinRateLimitPerNode();
    maxRate = flintOptions.getBulkRequestMaxRateLimitPerNode();
    increaseStep = flintOptions.getBulkRequestRateLimitPerNodeIncreaseStep();
    decreaseRatio = flintOptions.getBulkRequestRateLimitPerNodeDecreaseRatio();

    LOG.info("Setting rate limit for bulk request to " + minRate + " documents/sec");
    this.rateLimiter = RateLimiter.create(minRate);
    MetricsUtil.addHistoricGauge(MetricConstants.OS_BULK_RATE_LIMIT_METRIC, minRate);
  }

  // Block until a permit is available, so callers never exceed the rate limit.
  @Override
  public void acquirePermit() {
    this.rateLimiter.acquire();
    LOG.info("Acquired 1 permit");
  }

  @Override
  public void acquirePermit(int permits) {
    this.rateLimiter.acquire(permits);
    LOG.info("Acquired " + permits + " permits");
  }

  /**
   * Increase rate limit additively.
   */
  @Override
  public void increaseRate() {
    setRate(getRate() + increaseStep);
  }

  /**
   * Decrease rate limit multiplicatively.
   */
  @Override
  public void decreaseRate() {
    setRate((long) (getRate() * decreaseRatio));
  }

  @Override
  public long getRate() {
    return (long) this.rateLimiter.getRate();
  }

  /**
   * Set rate limit to the given value, clamped by minRate and maxRate. Non-positive maxRate means
   * there's no maximum rate restriction, and the rate can be set to any value greater than
   * minRate.
   */
  @Override
  public void setRate(long permitsPerSecond) {
    if (maxRate > 0) {
      permitsPerSecond = Math.min(permitsPerSecond, maxRate);
    }
    permitsPerSecond = Math.max(minRate, permitsPerSecond);
    LOG.info("Setting rate limit for bulk request to " + permitsPerSecond + " documents/sec");
    this.rateLimiter.setRate(permitsPerSecond);
    MetricsUtil.addHistoricGauge(MetricConstants.OS_BULK_RATE_LIMIT_METRIC, permitsPerSecond);
  }
}
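The diff above doesn't show the call sites, but to illustrate how the interface is meant to be driven, here is a hypothetical caller. `MyBulkClient`, `MyBulkRequest`, and `writeBatch` are assumptions for the sketch, not code from this PR:

// Hypothetical collaborators, declared only so the sketch is self-contained.
interface MyBulkClient { void bulk(MyBulkRequest request) throws Exception; }
interface MyBulkRequest { int docCount(); }

class BulkWriteSketch {
  // Throttle before sending, then feed the AIMD loop based on the outcome.
  void writeBatch(BulkRequestRateLimiter limiter, MyBulkClient client, MyBulkRequest request)
      throws Exception {
    // One permit per document matches the limiter's documents/sec semantics.
    limiter.acquirePermit(request.docCount());
    try {
      client.bulk(request);
      limiter.increaseRate();   // additive increase after a success
    } catch (Exception e) {
      limiter.decreaseRate();   // multiplicative decrease after a failure
      throw e;
    }
  }
}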
BulkRequestRateLimiterNoop.java
@@ -0,0 +1,37 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.flint.core.storage;

import java.util.logging.Logger;

public class BulkRequestRateLimiterNoop implements BulkRequestRateLimiter {
  private static final Logger LOG = Logger.getLogger(BulkRequestRateLimiterNoop.class.getName());

  public BulkRequestRateLimiterNoop() {
    LOG.info("Rate limit for bulk request was not set.");
  }

  @Override
  public void acquirePermit() {}

  @Override
  public void acquirePermit(int permits) {}

  @Override
  public void increaseRate() {}

  @Override
  public void decreaseRate() {}

  @Override
  public long getRate() {
    return 0;
  }

  @Override
  public void setRate(long permitsPerSecond) {}
}
Review comment: I am not sure what the main purpose of having a min value is. Are there any concerns if the rate limiter goes down to zero in case of continuous failures?
Reply: For one, the min value is used as the initial rate limit.
Second, the rate can only increase after a successful request, so we can't have a rate limit of 0, or else it would never bounce back at all.
Having too small a rate could also be troublesome, because it makes the rate limiter react more slowly. For example, say the rate limit becomes 10 docs per second but a single request consists of 1000 docs. Once that 1000-doc request goes through, there will be 100 seconds during which no other request can go through (so the 10 docs/sec rate is satisfied). That's 100 seconds of not updating the rate.
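That reaction-latency point can be reproduced directly with Guava's RateLimiter, which this PR builds on: an oversized acquire returns immediately, and the wait is pushed onto the next caller. A small runnable illustration, with the batch size scaled down from the 1000-doc example:

import com.google.common.util.concurrent.RateLimiter;

public class RateLimiterLatencyDemo {
  public static void main(String[] args) {
    // 10 permits/sec, as in the example above.
    RateLimiter limiter = RateLimiter.create(10);

    // Guava grants an oversized request immediately...
    double firstWait = limiter.acquire(50);
    System.out.println("acquire(50) slept " + firstWait + "s");  // ~0s

    // ...but the next caller pays the debt: roughly 50 / 10 = 5 seconds.
    double secondWait = limiter.acquire(1);
    System.out.println("acquire(1) slept " + secondWait + "s");  // ~5s
  }
}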