Add a section that explains the downtime with replica 1 #163

Open. Wants to merge 1 commit into base: latest
13 changes: 11 additions & 2 deletions operators/operating-pinot/rebalance/rebalance-servers.md
@@ -155,15 +155,24 @@ Typically, the flags that need to be changed from defaults are
{% endhint %}

| Query param | Default value | Description |
| -------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| -------------------- | ------------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| dryRun | false | If set to true, **rebalance is run as a dry-run** so that you can see the expected changes to the ideal state and instance partition assignment. |
| includeConsuming | false | <p>Applicable for REALTIME tables.</p><p><strong>CONSUMING segments are rebalanced only if this is set to true</strong>.<br>Moving a CONSUMING segment involves dropping the data consumed so far on old server, and re-consuming on the new server. If an application is sensitive to <strong>increased memory utilization due to re-consumption or to a momentary data staleness</strong>, they may choose to not include consuming in the rebalance. Whenever the CONSUMING segment completes, the completed segment will be assigned to the right instances, and the new CONSUMING segment will also be started on the correct instances. If you choose to includeConsuming=false and let the segments move later on, any downsized nodes need to remain untagged in the cluster, until the segment completion happens.</p> |
| downtime | false | <p><strong>This controls whether Pinot allows downtime while rebalancing.</strong><br>If downtime = true, all replicas of a segment can be moved around in one go, which could result in a momentary downtime for that segment (time gap between ideal state updated to new servers and new servers downloading the segments).<br>If downtime = false, Pinot will make sure to keep certain number of replicas (config in next row) always up. The rebalance will be done in multiple iterations under the hood, in order to fulfill this constraint.</p><p><strong>Note</strong>: <em>If you have only 1 replica for your table, rebalance with downtime=false is not possible.</em></p> |
| downtime | false | <p><strong>This controls whether Pinot allows downtime while rebalancing.</strong><br>If downtime = true, all replicas of a segment can be moved around in one go, which could result in a momentary downtime for that segment (time gap between ideal state updated to new servers and new servers downloading the segments).<br>If downtime = false, Pinot will make sure to keep certain number of replicas (config in next row) always up. The rebalance will be done in multiple iterations under the hood, in order to fulfill this constraint.</p><p><strong>Note</strong>: <em>If you have only 1 replica for your table, see [the section below](#rebalance-with-only-1-replica).</em></p> |
| minAvailableReplicas | 1 | <p>Applicable for rebalance with downtime=false.</p><p>This is the <strong>minimum number of replicas that are expected to stay alive</strong> through the rebalance.</p> |
| bestEfforts | false | <p>Applicable for rebalance with downtime=false.</p><p>If a no-downtime rebalance cannot be performed successfully, this flag <strong>controls whether to fail the rebalance or do a best-effort rebalance</strong>.</p> |
| reassignInstances | false | Applicable to tables where the instance assignment has been persisted to zookeeper. Setting this to true will make the rebalance **first update the instance assignment, and then rebalance the segments**. |
| bootstrap | false | Rebalances all segments again, **as if adding segments to an empty table**. If this is false, then the rebalance will try to minimize segment movements. |
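
As an illustration of how the flags in the table combine, the following minimal sketch builds a rebalance request URL and includes only the parameters that differ from the defaults listed above. The endpoint path `/tables/{tableName}/rebalance`, the `type` parameter, the controller address and the table name are assumptions made for the example and are not taken from this page; `build_rebalance_url` is a hypothetical helper, not part of Pinot.

```python
from urllib.parse import urlencode

# Defaults copied from the query param table above.
REBALANCE_DEFAULTS = {
    "dryRun": False,
    "includeConsuming": False,
    "downtime": False,
    "minAvailableReplicas": 1,
    "bestEfforts": False,
    "reassignInstances": False,
    "bootstrap": False,
}


def build_rebalance_url(controller, table, table_type, **overrides):
    """Hypothetical helper: return a rebalance URL with non-default flags only.

    The path and the `type` parameter are assumptions for illustration.
    """
    unknown = set(overrides) - set(REBALANCE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown rebalance params: {sorted(unknown)}")

    params = {"type": table_type}
    for name, default in REBALANCE_DEFAULTS.items():
        value = overrides.get(name, default)
        if value != default:
            # Booleans are sent in lowercase; numbers as plain strings.
            params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{controller}/tables/{table}/rebalance?{urlencode(params)}"


# Assumed controller address and table name, for illustration only.
url = build_rebalance_url("http://localhost:9000", "myTable", "OFFLINE", dryRun=True)
print(url)
```

Starting with `dryRun=True` lets you inspect the expected ideal state changes before sending the same request without it.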

### Rebalance with only 1 replica
In general, when the table config uses only 1 replica, the downtime may be affected.
Contributor

@swaminathanmanish swaminathanmanish Apr 17, 2023

Just a minor reword if it makes sense.

"The downtime option is not relevant when rebalance is initiated on a table with a replication factor of 1 and there's no further change to the replication factor. In this case, there'll be a downtime during rebalance.

However, when the rebalance is initiated to increase replication (from 1 to > 1), the downtime flag can be used (set to false) to avoid downtime."

Collaborator

I think that with downtime = false and replication = 1 (with no change to the final replication count), the rebalance itself will be blocked. So the statement "In this case, there'll be a downtime during rebalance" would be untrue.

Contributor Author

What do you mean by "the rebalance itself will be blocked"?

Contributor Author

I've recently found that this PR is still open. I'm going to apply the change suggested by Manish. About the suggestion from Neha: do we know the actual behavior? IIRC I never actually tested that, and what is written here is what I inferred from reading the code, but I may be wrong.

Contributor

@npawar, is this one good to merge, given @gortiz's comment above?

This is why it is not recommended to create tables with only 1 replica in cases where availability is mandatory.
When rebalancing a table with a replica count of 1, you have to assume that small downtime windows are possible even if `downtime` is false.

It is important to note that rebalance may be executed in order to change the replica count.
If that is the case, the important replica number is the final count.
Specifically, when increasing the replica count from 1 to another value with `downtime` set to false, the rebalance will be done with no downtime.

### Checking status

The following API is used to check the progress of a rebalance Job. The API takes the jobId of the rebalance job. The API to see the jobIds of rebalance Jobs for a table is shown next.