1-1 restore service implementation #4199
Comments
SM can establish the node mapping on its own during the validation stage - we just look for nodes with the same token ownership. The source cluster ID is also not needed - we don't need it for finding the snapshot in the regular restore. Not having it forces us to traverse the whole meta dir, which is annoying, but that dir shouldn't be big.
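A minimal sketch of what that token-based matching could look like in Go; the types and functions below are hypothetical illustrations, not the actual SM code:

```go
package restore

import (
	"fmt"
	"sort"
	"strings"
)

// NodeTokens describes a node (by host ID) together with the tokens it owns.
// Hypothetical type used only for illustration.
type NodeTokens struct {
	HostID string
	Tokens []int64
}

// tokensKey builds a canonical key from a node's token list so that two nodes
// with identical token ownership produce identical keys.
func tokensKey(tokens []int64) string {
	sorted := append([]int64(nil), tokens...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	parts := make([]string, len(sorted))
	for i, t := range sorted {
		parts[i] = fmt.Sprint(t)
	}
	return strings.Join(parts, ",")
}

// NodeMappingByTokens maps source host ID -> destination host ID by matching
// identical token ownership. It fails if any source node has no counterpart.
func NodeMappingByTokens(src, dst []NodeTokens) (map[string]string, error) {
	dstByTokens := make(map[string]string, len(dst))
	for _, n := range dst {
		dstByTokens[tokensKey(n.Tokens)] = n.HostID
	}
	mapping := make(map[string]string, len(src))
	for _, n := range src {
		d, ok := dstByTokens[tokensKey(n.Tokens)]
		if !ok {
			return nil, fmt.Errorf("no destination node owns the same tokens as source node %s", n.HostID)
		}
		mapping[n.HostID] = d
	}
	return mapping, nil
}
```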
Also, the 1-1 restore is for data only and the schema is supposed to be restored in the regular way, right?
But why do we want to introduce a separate service for that? Shouldn't it be a part of the restore service, perhaps just a new method like
Why so? It's easier to expect this input, especially since the prerequisite for a 1-1 restore with SM is to clone the cluster.
We can traverse, but why do that if the user can just provide the source cluster ID? Especially since the path in the backup location, which is part of the backup specification, forces the use of the cluster ID.
Because it will create spaghetti code with logic to separate the flow of the L&S restore from the flow of the 1-1 restore.
Yes, the prereq here is to restore the schema first.
It could also be the very same cluster (backup -> truncate -> restore). It's still simple to construct such a mapping, but it would require fetching all host IDs from the cluster. I wouldn't say that we need to assume that the cloud would be the only user of this API - if we expose it, then it can be used. In general, I would say that we should aim to minimize the number of API params if we can reliably calculate the information on our own. It's always easier and more convenient to add an API param than to remove or modify it. Not to mention that it simplifies the UX (even if the user is the cloud).
The point is that we are already working like that in the regular restore, so perhaps it would be surprising to require it here but not there. In general, though, this is a good direction, as it helps with the underlying issue of #3873 (snapshot tag conflict).
This issue is about creating the full service that allows performing a 1-1 restore of vnode keyspaces/column families.
Design doc (https://docs.google.com/document/d/18jhhNo90JWy6fIgPCks3RI1zhHLI7eBLKpHGJRBmdpM/edit?tab=t.0#bookmark=id.e6nd5pbe9bjq)
The service must work on the following input:
The restore process consists of the following stages:
Validation:
- The service must check that agents can reach the backup location.
- The service must check that the cluster topology saved in the snapshot is the same as the destination cluster topology.
- The service must check that the source cluster ring (read from the manifest files) is the same as the ring of the destination cluster. This check can be split into two parts: reading the source cluster topology and reading the destination cluster topology (a sketch follows this list).
- If validation fails, the service must stop and return meaningful error information.
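A hedged sketch of that validation flow, reusing the NodeTokens/NodeMappingByTokens illustration from the comment above; checkLocationAccess is a hypothetical per-agent callback:

```go
import (
	"context"
	"fmt"
)

// validate sketches the validation stage: every agent must reach the backup
// location, and the source ring (read from manifests) must match the
// destination ring node-for-node.
func validate(ctx context.Context, agentHosts []string, src, dst []NodeTokens,
	checkLocationAccess func(ctx context.Context, host string) error) error {

	// Backup location must be reachable from every agent.
	for _, h := range agentHosts {
		if err := checkLocationAccess(ctx, h); err != nil {
			return fmt.Errorf("agent %s cannot access backup location: %w", h, err)
		}
	}
	// Same number of nodes in the snapshot and in the destination cluster.
	if len(src) != len(dst) {
		return fmt.Errorf("node count mismatch: source %d, destination %d", len(src), len(dst))
	}
	// Every source node must have a destination node with identical token ownership.
	if _, err := NodeMappingByTokens(src, dst); err != nil {
		return fmt.Errorf("ring topology mismatch: %w", err)
	}
	return nil
}
```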
Copying SSTables:
- The nodes mapping defines which destination node is going to restore the SSTables of which source node. The SSTables of a source node must be copied to the corresponding /keyspace/column_family/upload directories on the destination node.
- There must be (number_of_nodes) independent goroutines, one per destination node, each copying SSTables from the corresponding source node (see the sketch after this list).
- If the corresponding destination directory doesn't exist, one of the prerequisites is not met (the schema is not restored); the service should return an error informing that the keyspace/column family is missing.
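A sketch of that copy fan-out, assuming a mapping like the one above and a hypothetical copyNodeSSTables helper; one goroutine per destination node, with golang.org/x/sync/errgroup for error propagation:

```go
import (
	"context"

	"golang.org/x/sync/errgroup"
)

// copyStage starts one goroutine per destination node. Each goroutine copies
// the SSTables of its mapped source node into the destination node's
// keyspace/column_family/upload directories.
func copyStage(ctx context.Context, mapping map[string]string,
	copyNodeSSTables func(ctx context.Context, srcHostID, dstHostID string) error) error {

	g, ctx := errgroup.WithContext(ctx)
	for srcHostID, dstHostID := range mapping {
		srcHostID, dstHostID := srcHostID, dstHostID // capture loop variables
		g.Go(func() error {
			// Should return an error if a keyspace/column_family upload dir is
			// missing on the destination, i.e. the schema was not restored first.
			return copyNodeSSTables(ctx, srcHostID, dstHostID)
		})
	}
	// Wait for all per-node copies; the first error cancels the shared context.
	return g.Wait()
}
```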
After the SSTables are copied to the upload directories, SM must call the Scylla API endpoint defined in scylla-manager/v3/swagger/scylla_v1.json (line 11841 at b501137).
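Assuming the referenced swagger definition is Scylla's "load new SSTables" endpoint (POST /storage_service/sstables/{keyspace}), a rough sketch of that call, with load_and_stream disabled since each destination node already holds its own data:

```go
import (
	"context"
	"fmt"
	"net"
	"net/http"
	"net/url"
)

// loadSSTables asks a Scylla node to load the SSTables placed in the
// keyspace/table upload directory. Endpoint path, port and params are
// assumptions for illustration; error handling is minimal on purpose.
func loadSSTables(ctx context.Context, host, keyspace, table string) error {
	u := url.URL{
		Scheme:   "http",
		Host:     net.JoinHostPort(host, "10000"), // default Scylla REST API port
		Path:     "/storage_service/sstables/" + keyspace,
		RawQuery: url.Values{"cf": {table}, "load_and_stream": {"false"}}.Encode(),
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, u.String(), nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("load sstables on %s for %s.%s: unexpected status %s", host, keyspace, table, resp.Status)
	}
	return nil
}
```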