Integrate Existing etcd Clusters with etcd-druid #957

base: master
Conversation
…ruid managed cluster

- Introduced `ExistingCluster` struct in `EtcdConfig` to specify configuration for an existing etcd cluster.
- Added `MemberURLs` and `Endpoint` fields to `ExistingCluster` to define the URLs and endpoint of the existing etcd cluster members.
- Updated `prepareInitialCluster` function to include existing cluster member URLs.
- Modified `getBackupRestoreContainerCommandArgs` to append existing cluster endpoint to service endpoints.
/test pull-etcd-druid-unit

/assign
@seshachalam-yv thanks a lot for adding this functionality! I personally love that druid can now "adopt" existing etcd clusters, and possibly also help with migrating cluster data from one etcd cluster to another without the use of backups and without downtime (useful for the gardener live CPM scenario). I have a few comments and doubts. Can you please address them? Thanks.
```go
// ExistingCluster defines the configuration for an existing etcd cluster.
type ExistingCluster struct {
	// MemberURLs specifies the URLs of the existing etcd cluster members.
```
The docstring does not give sufficient information about the structure of the `MemberURLs` field. What do the keys represent? Member names? Member IDs? Quite unclear.

Also, I feel it would be cleaner to rename `MemberURLs` to `Members`, which is a slice of the `ExistingClusterMember` type.
```go
type ExistingClusterMember struct {
	Name string
	ID   string
	URLs []string
}
```
Something along these lines, to make the consumption clear and explicit. Using the map key to denote the member name/ID becomes confusing for the exact reason mentioned above: it isn't intrinsically clear whether the key represents the member name or the ID.
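For illustration, consuming the proposed slice leaves no room for guessing what each value means. This is a hypothetical sketch; the helper name and its placement are assumptions, not part of the PR:

```go
import (
	"fmt"
	"strings"
)

// buildInitialCluster shows how the proposed Members slice could be consumed:
// each entry carries an explicit Name, ID, and peer URLs, so there is no
// ambiguity about what a map key denotes. Hypothetical sketch only.
func buildInitialCluster(members []ExistingClusterMember) string {
	parts := make([]string, 0, len(members))
	for _, m := range members {
		// etcd's --initial-cluster expects "name=peerURL" pairs joined by commas.
		parts = append(parts, fmt.Sprintf("%s=%s", m.Name, strings.Join(m.URLs, ",")))
	}
	return strings.Join(parts, ",")
}
```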
```
@@ -233,6 +233,22 @@ type EtcdConfig struct {
	// ClientService defines the parameters of the client service that a user can specify
	// +optional
	ClientService *ClientService `json:"clientService,omitempty"`
	// ExistingCluster specifies the configuration for an existing etcd cluster that the etcd-druid managed cluster will join.
```
Is it possible to make this field immutable, such that once the `Etcd` spec has been created, `Spec.Etcd.ExistingCluster` can no longer be added or modified?

Also, once the new cluster has joined the existing cluster, will this field still be retained in the `Etcd` spec? If yes, then what's the behavior of etcd-druid after the user manually removes the old cluster members (assuming they wanted druid to "adopt" their existing cluster)? If `Spec.Etcd.ExistingCluster` still exists after removal of the old members, will druid still attempt to create the cluster with the old members as well, which would lead to quorum loss since only 3 out of the total 6 members would now exist?

I think we need some discussion/clarity on this behavior.
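On the immutability question, here is a minimal sketch of what an update-time check could look like, assuming a validating webhook (or similar admission step) that sees both the old and the new `Etcd` object. The function name is made up and the import path is an assumption; this is not the PR's actual validation:

```go
import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/equality"

	druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1" // import path is an assumption
)

// validateExistingClusterImmutable rejects any attempt to add, change, or
// remove spec.etcd.existingCluster after the Etcd object has been created.
// Sketch only.
func validateExistingClusterImmutable(oldEtcd, newEtcd *druidv1alpha1.Etcd) error {
	if !equality.Semantic.DeepEqual(oldEtcd.Spec.Etcd.ExistingCluster, newEtcd.Spec.Etcd.ExistingCluster) {
		return fmt.Errorf("spec.etcd.existingCluster is immutable once the Etcd resource has been created")
	}
	return nil
}
```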
```go
	// - https://etcd-1:2380
	MemberURLs map[string][]string `json:"memberURLs"`
	// Endpoint specifies the endpoint of the existing etcd cluster.
	Endpoint string `json:"endpoint"`
```
Renaming to `ClientEndpoint` makes sense here, since just `Endpoint` seems ambiguous.
```go
commandArgs = append(commandArgs, fmt.Sprintf("--service-endpoints=https://%s:%d", druidv1alpha1.GetClientServiceName(b.etcd.ObjectMeta), b.clientPort))
commandArgs = append(commandArgs, fmt.Sprintf("--service-endpoints=%s", serviceEndpoint))
```
The `--service-endpoints` CLI flag is being passed twice, which seems incorrect and could lead to errors from etcdbr.
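One way to avoid the duplicated flag, sketched under the assumption that etcdbr should receive the flag exactly once and that the existing-cluster endpoint should win when it is set (the helper name is made up; whether to prefer or merge the endpoints is up to the PR author):

```go
import "fmt"

// buildServiceEndpointsArg returns a single --service-endpoints argument,
// preferring the existing-cluster endpoint when one is configured.
// Sketch only, not the PR's implementation.
func buildServiceEndpointsArg(defaultEndpoint, existingClusterEndpoint string) string {
	endpoint := defaultEndpoint
	if existingClusterEndpoint != "" {
		endpoint = existingClusterEndpoint
	}
	return fmt.Sprintf("--service-endpoints=%s", endpoint)
}
```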
```go
} else {
	commandArgs = append(commandArgs, "--insecure-transport=true")
	commandArgs = append(commandArgs, "--insecure-skip-tls-verify=true")
	commandArgs = append(commandArgs, fmt.Sprintf("--endpoints=http://%s-local:%d", b.etcd.Name, b.clientPort))
	commandArgs = append(commandArgs, fmt.Sprintf("--service-endpoints=http://%s:%d", druidv1alpha1.GetClientServiceName(b.etcd.ObjectMeta), b.clientPort))
	commandArgs = append(commandArgs, fmt.Sprintf("--service-endpoints=%s", serviceEndpoint))
```
Same as above.
## Overview

`etcd-druid` supports the integration of new etcd members into an existing `etcd` cluster through the `etcd.spec.etcd.existingCluster` configuration. This capability ensures that when new managed members are added, they automatically join the pre-existing cluster, facilitating seamless migrations.
Suggested change:

```suggestion
`etcd-druid` supports the integration of new etcd members into an existing `etcd` cluster through the `etcd.spec.etcd.existingCluster` configuration. This capability ensures that when new managed members are added, they automatically join the pre-existing cluster, allowing etcd-druid to eventually "adopt" the etcd cluster.
```
### Single-Node `etcd` Cluster Migration

- **From Single-Node to Three-Node:** Transitioning from a single-node to a three-node etcd cluster can be achieved with zero downtime. However, once expanded to a multi-member configuration, reducing back to a single-node setup is not supported by `etcd-druid`.
- **From Single-Node to Single-Node:** Migrations of this nature typically involve some downtime. You are required to capture a snapshot using the [etcd-backup-restore tool](https://github.com/gardener/etcd-backup-restore/blob/master/docs/deployment/getting_started.md#taking-scheduled-snapshot), upload it to your designated backup bucket, and configure your new etcd CR to restore from this snapshot.
Why would there be downtime involved? The cluster gets scaled up from 1 to 2 members, and as long as both members are healthy, there's quorum. Then the user removes the old member from the cluster via an `etcdctl member remove` call, which scales the cluster back down to 1 member, which still ensures quorum. I don't see why this would "typically involve some downtime". Yes, there is a possibility of quorum loss if one of the members becomes unhealthy during this migration process. But that's a possibility for a 3-node cluster migration too, if 3 members go down.

Can you please re-word this?
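For reference, the removal step described above (the equivalent of `etcdctl member remove`) could look roughly like this with etcd's Go `clientv3` API; the endpoint and member name below are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd:2379"}, // placeholder client endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Look up the old, unmanaged member by name and remove it once the new
	// druid-managed member is healthy; quorum is preserved throughout the
	// scale-up-then-remove sequence described above.
	resp, err := cli.MemberList(ctx)
	if err != nil {
		panic(err)
	}
	for _, m := range resp.Members {
		if m.Name == "old-member" { // placeholder member name
			if _, err := cli.MemberRemove(ctx, m.ID); err != nil {
				panic(err)
			}
			fmt.Printf("removed member %s (%x)\n", m.Name, m.ID)
		}
	}
}
```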
```yaml
etcd-0:
- http://etcd-0.etcd:2380
etcd-1:
- http://etcd-1.etcd:2380
etcd-2:
- http://etcd-2.etcd:2380
endpoint: http://etcd:2379
```
The previous example used `https`, while this uses `http`. Please correct it.
Collect crucial data about your current cluster configuration:

- **Member URLs:** These are the URLs used for peer communications among cluster members.
Can you please explain here whether the `MemberURLs` map keys represent member names, IDs, or pod names? It makes consumption easier.
```go
if etcd.Spec.Etcd.ExistingCluster != nil {
	for memberName, memberURLs := range etcd.Spec.Etcd.ExistingCluster.MemberURLs {
		builder.WriteString(fmt.Sprintf("%s=%s,", memberName, strings.Trim(strings.Join(memberURLs, ","), ",")))
```
I think we need some guard rails around this, like validation to ensure that the cluster member names and endpoints are actually valid. I would not want the existing cluster to go into an irrecoverable or unusable state due to bad configuration.
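A rough idea of what such guard rails could check, as a sketch only (the function name and exact rules are assumptions; the fields follow the PR's current `MemberURLs`/`Endpoint` API):

```go
import (
	"fmt"
	"net/url"
)

// validateExistingCluster checks that member names and URLs in the
// existingCluster configuration are plausible before they are wired into
// --initial-cluster. Sketch only.
func validateExistingCluster(memberURLs map[string][]string, endpoint string) error {
	if len(memberURLs) == 0 {
		return fmt.Errorf("existingCluster.memberURLs must not be empty")
	}
	for name, urls := range memberURLs {
		if name == "" {
			return fmt.Errorf("existingCluster.memberURLs contains an empty member name/ID key")
		}
		if len(urls) == 0 {
			return fmt.Errorf("member %q must define at least one peer URL", name)
		}
		for _, raw := range urls {
			u, err := url.Parse(raw)
			if err != nil || u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
				return fmt.Errorf("member %q has an invalid peer URL %q", name, raw)
			}
		}
	}
	if u, err := url.Parse(endpoint); err != nil || endpoint == "" || u.Host == "" {
		return fmt.Errorf("existingCluster.endpoint %q is not a valid URL", endpoint)
	}
	return nil
}
```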
And also a check to ensure that the existing cluster is actually up and running. If not, then druid will end up trying to add members to a non-existing or non-running cluster, and go into a futile loop.
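And a sketch of the kind of up-front reachability check suggested here, again illustrative only, probing the configured client endpoint with etcd's `clientv3` client (TLS handling omitted):

```go
import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// existingClusterReachable returns an error if the configured existing
// cluster does not respond on its client endpoint, so druid can avoid
// looping while trying to join a cluster that is not actually running.
// Sketch only.
func existingClusterReachable(ctx context.Context, endpoint string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	statusCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	// Status fails if the endpoint is not serving, which is a reasonable
	// proxy for "the existing cluster is up and running".
	_, err = cli.Status(statusCtx, endpoint)
	return err
}
```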
@seshachalam-yv please add an e2e test for this, since it's an important feature that needs to be tested thoroughly.
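For completeness, a rough skeleton of what such an e2e test could look like with Ginkgo (illustrative only; a real test would use the repo's e2e helpers for provisioning clusters and `Etcd` CRs):

```go
import (
	. "github.com/onsi/ginkgo/v2"
)

var _ = Describe("Etcd adoption of an existing cluster", func() {
	It("joins the unmanaged members and keeps quorum after they are removed", func() {
		// 1. Bring up an unmanaged etcd cluster (e.g. a plain StatefulSet in the test cluster).
		// 2. Create an Etcd CR with spec.etcd.existingCluster pointing at its peer URLs and client endpoint.
		// 3. Wait until the druid-managed members report Ready and appear in the member list.
		// 4. Remove the unmanaged members and assert the cluster stays healthy with quorum.
		Skip("illustrative skeleton only")
	})
})
```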
How to categorize this PR?
/area control-plane
/area high-availability
/kind enhancement
What this PR does / why we need it:
This PR introduces the necessary changes to `etcd-druid` to enable seamless migration of existing `etcd` clusters into `etcd-druid` managed clusters. It includes updates to the API to support the `existingCluster` configuration, which allows clusters that are not initially managed by `etcd-druid` to be integrated without disruption.

A step-by-step guide is added to demonstrate how to migrate an unmanaged etcd cluster to an etcd-druid managed cluster. By leveraging the `existingCluster` feature, you can create an `Etcd` CR configured with your current cluster's peer URLs and endpoints, prompting etcd-druid to join your existing members. Once the newly managed members are active, you can safely remove the old, unmanaged members. This approach offers a straightforward migration path with minimal downtime.

Which issue(s) this PR fixes:
Fixes #956
Special notes for your reviewer:
Release note: