-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-16728: Require Service Deletion for LB Type Updates #1142
base: master
Are you sure you want to change the base?
OCPBUGS-16728: Require Service Deletion for LB Type Updates #1142
Conversation
@gcs278: This pull request references Jira Issue OCPBUGS-16728, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
65eea90
to
8a299a9
Compare
@gcs278 tested and found one issue:
The command to revert the change is wrong, it should be |
Another concern is the ingresscontroller status, should it always show the actual status? If yes, then the |
8a299a9
to
26a1492
Compare
@lihongan Thanks, good catch! Fixed.
Yea it's a great observation, I just talked about this with @Miciah yesterday. This PR won't address having the LB Type in the status reflect the actual LB Type status. It's complicated, and will require much more refactoring than I'm willing to bite off with this PR. I think we should consider this a technical debt item that we should try to solve later. The LB Type in the status will mostly be equivalent to the LB Type in the spec (for now). |
9d3a208
to
b02db5f
Compare
I've taken the WIP off the PR. I'm pretty content with the approach taken from some ideas of @Miciah (i.e. two disjoint "managed" annotation lists). Some parts of the code actually were simplified and code was removed as a result. I've also dropped the usage of "auto effectuation" or "manual effectuation" as I don't think this aptly conveys the idea that you need to delete/recreate the service (based on conversations with other team members). I went for |
e2e-aws-operator unrelated failure:
/test e2e-aws-operator Install failure unrelated |
@gcs278 Another issue is observed. Because PROXY proxy is enabled on Classic but disabled on NLB, so before deleting the LB service to effectuate the change, I can see the router deployment is rolled out with ENV |
Ah good catch. I forgot that somethings rely on the status LB Type to be accurate. I need to think about how to fix this. |
b02db5f
to
5e2d300
Compare
5e2d300
to
c6fd8b6
Compare
c6fd8b6
to
07b7cbb
Compare
/assign |
/retest |
/retest-required |
/jira refresh The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity. |
@openshift-bot: This pull request references Jira Issue OCPBUGS-16728, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/retest-required |
Some links in `load_balancer_service.go` were invalid due to K8S documentation URL changes.
Refactor to introduce a new map, `managedAtServiceCreationLBServiceAnnotations`, which tracks annotations that are managed by the operator but are only applied to the load balancer at the time of service creation. These annotations are also responsible for triggering the `Progressing: True` status when modified. This addition not only improves documentation of these annotations, but also simplifies the code. The `loadBalancerServiceAnnotationsChanged` function has been adapted to replace existing logic in `shouldRecreateLoadBalancer`, allowing the removal of the `scopeEqual` function as well.
Due to the AWS CCM leaking resources when the load balancer type is changed on a service, the cloud team is now blocking updates to the load balancer type. As a result, the Ingress Operator will require the service to be deleted and recreated when the Ingress Controller's load balancer type changes. This change introduces a `Progressing: True` status message when the load balancer type is modified, instructing the user on how to effectuate the change. This commit does not alter the function of the LB type in the status, which has always (incorrectly) represented the desired state. However, with this change, a new scenario is introduced where the desired LB type may differ from the actual state. Consequently, the LB type status may no longer reliably reflect the actual state as it previously did, potentially surprising users who mistakenly assumed it always indicated the current state.
IsProxyProtocolNeeded must get the LB Type from the service, as the status now may not accurately reflect the actual LB Type during an update. However, we can't rely only on the service because it doesn’t exist during the initial bootstrap (and potentially when it gets deleted). In that case, it's safe to use the status to determine the anticipated LB Type. Without this commit results in a misalignment of the proxy protocol value between the load balancer and the router deployment when an LB type change is pending. This commit will be squashed into the previous one, as both are dependent on each other. I've kept them separate for easier review.
Previously, the fix for OCPBUGS-38217 cleared inactive LB parameters in the status in setDefaultPublishingStrategy. This was valid at the time, as the desired LB type would always become the actual LB type. However, with the introduction of pending LB type changes, setDefaultPublishingStrategy was now prematurely clearing parameters for the current (but soon-to-be inactive) LB type. This caused issues because certain LB parameters, such as classicLoadBalancer.subnets or networkLoadBalancer.eipAllocations, reflect the actual state. As a result, setDefaultPublishingStrategy was clearing them too early, without knowing a type change was pending. The solution is to move the clearing logic to syncIngressControllerStatus in status.go, where it can detect pending LB type changes and avoid clearing parameters for the still-active LB type. This means that during a pending type change, both networkLoadBalancer and classicLoadBalancer parameters are present. This commit will be squashed into the previous one, as both are dependent on each other. I've kept them separate for easier review.
b27545b
to
1a57afd
Compare
New changes are detected. LGTM label has been removed. |
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@@ -23,8 +24,6 @@ import ( | |||
"k8s.io/apimachinery/pkg/api/errors" | |||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | |||
kerrors "k8s.io/apimachinery/pkg/util/errors" | |||
"k8s.io/apimachinery/pkg/util/sets" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import got erroneously moved.
if _, ok := platformsWithMutableScope[platform]; !ok { | ||
for name := range annotation { | ||
if result[platform] == nil { | ||
result[platform] = sets.New[string]() | ||
} | ||
result[platform].Insert(name) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little easier to read:
if _, ok := platformsWithMutableScope[platform]; !ok { | |
for name := range annotation { | |
if result[platform] == nil { | |
result[platform] = sets.New[string]() | |
} | |
result[platform].Insert(name) | |
} | |
} | |
if _, ok := platformsWithMutableScope[platform]; ok { | |
continue | |
} | |
for name := range annotation { | |
if result[platform] == nil { | |
result[platform] = sets.New[string]() | |
} | |
result[platform].Insert(name) | |
} |
for platform, annotation := range InternalLBAnnotations { | ||
if _, ok := platformsWithMutableScope[platform]; !ok { | ||
for name := range annotation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for platform, annotation := range InternalLBAnnotations { | |
if _, ok := platformsWithMutableScope[platform]; !ok { | |
for name := range annotation { | |
for platform, annotations := range InternalLBAnnotations { | |
if _, ok := platformsWithMutableScope[platform]; !ok { | |
for name := range annotations { |
t.Errorf("expected reason %s, got %s", tc.expectChangedAnnotations, changedAnnotations) | ||
} | ||
if !cmp.Equal(changedFields, tc.expectChangedFields, sliceCmpOpts...) { | ||
t.Errorf("expected reason %s, got %s", tc.expectChangedFields, changedFields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Errorf("expected reason %s, got %s", tc.expectChangedAnnotations, changedAnnotations) | |
} | |
if !cmp.Equal(changedFields, tc.expectChangedFields, sliceCmpOpts...) { | |
t.Errorf("expected reason %s, got %s", tc.expectChangedFields, changedFields) | |
t.Errorf("expected changedAnnotations %s, got %s", tc.expectChangedAnnotations, changedAnnotations) | |
} | |
if !cmp.Equal(changedFields, tc.expectChangedFields, sliceCmpOpts...) { | |
t.Errorf("expected changedFields %s, got %s", tc.expectChangedFields, changedFields) |
@@ -774,7 +792,7 @@ func loadBalancerServiceIsUpgradeable(ic *operatorv1.IngressController, deployme | |||
// Since the status logic runs after the controller sets the annotations, it checks for | |||
// any discrepancy (in case modified) between the desired annotation values of the controller | |||
// and the current annotation values. | |||
changed, updated := loadBalancerServiceAnnotationsChanged(current, desired, managedLoadBalancerServiceAnnotations) | |||
changed, updated, _ := loadBalancerServiceAnnotationsChanged(current, desired, managedLBServiceAnnotations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed, updated, _ := loadBalancerServiceAnnotationsChanged(current, desired, managedLBServiceAnnotations) | |
changed, _, updated := loadBalancerServiceAnnotationsChanged(current, desired, managedLBServiceAnnotations) |
changedMsg := fmt.Sprintf("The IngressController scope was changed from %q to %q.", haveScope, wantScope) | ||
err := fmt.Errorf(changedMsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want double format expansion, do we?
changedMsg := fmt.Sprintf("The IngressController scope was changed from %q to %q.", haveScope, wantScope) | |
err := fmt.Errorf(changedMsg) | |
err := fmt.Errorf("The IngressController scope was changed from %q to %q.", haveScope, wantScope) | |
changedMsg := err.Error() |
or
changedMsg := fmt.Sprintf("The IngressController scope was changed from %q to %q.", haveScope, wantScope) | |
err := fmt.Errorf(changedMsg) | |
changedMsg := fmt.Sprintf("The IngressController scope was changed from %q to %q.", haveScope, wantScope) | |
err := fmt.Errorf("%s", changedMsg) |
or import "errors"
and then
changedMsg := fmt.Sprintf("The IngressController scope was changed from %q to %q.", haveScope, wantScope) | |
err := fmt.Errorf(changedMsg) | |
changedMsg := fmt.Sprintf("The IngressController scope was changed from %q to %q.", haveScope, wantScope) | |
err := errors.New(changedMsg) |
haveLBType := getAWSLoadBalancerTypeFromServiceAnnotation(service) | ||
if platform.Type == configv1.AWSPlatformType { | ||
// Tricky: LB Type in the status indicates the *desired* scope (user desire + defaulting). | ||
wantLBType := getAWSLoadBalancerTypeInStatus(ic) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is confusing. Why is haveLBType
computed outside the if
statement and wantLBType
inside? I think haveLBType
will give operatorv1.AWSClassicLoadBalancer
on non-AWS platforms, which is kind of silly. Moreover, it seems to me that all the AWS-specific code (for LB type, subnets, and EIP) should be inside a single if platform.Type == configv1.AWSPlatformType {
block.
if _, ok := platformsWithMutableScope[platform.Type]; !ok { | ||
err = fmt.Errorf("%[1]s To effectuate this change, you must delete the service: `oc -n %[2]s delete svc/%[3]s`; the service load-balancer will then be deprovisioned and a new one created. This will most likely cause the new load-balancer to have a different host name and IP address from the old one's. Alternatively, you can revert the change to the IngressController: `oc -n openshift-ingress-operator patch ingresscontrollers/%[4]s --type=merge --patch='{\"spec\":{\"endpointPublishingStrategy\":{\"loadBalancer\":{\"scope\":\"%[5]s\"}}}}'`", err.Error(), service.Namespace, service.Name, ic.Name, haveScope) | ||
ocPatchRevertCmd := fmt.Sprintf("oc -n openshift-ingress-operator patch ingresscontrollers/%[1]s --type=merge --patch='{\"spec\":{\"endpointPublishingStrategy\":{\"loadBalancer\":{\"scope\":\"%[2]s\"}}}}'", ic.Name, haveScope) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ocPatchRevertCmd := fmt.Sprintf("oc -n openshift-ingress-operator patch ingresscontrollers/%[1]s --type=merge --patch='{\"spec\":{\"endpointPublishingStrategy\":{\"loadBalancer\":{\"scope\":\"%[2]s\"}}}}'", ic.Name, haveScope) | |
ocPatchRevertCmd := fmt.Sprintf(`oc -n openshift-ingress-operator patch ingresscontrollers/%[1]s --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"%[2]s"}}}}'`, ic.Name, haveScope) |
if _, ok := platformsWithMutableScope[platform.Type]; !ok { | ||
err = fmt.Errorf("%[1]s To effectuate this change, you must delete the service: `oc -n %[2]s delete svc/%[3]s`; the service load-balancer will then be deprovisioned and a new one created. This will most likely cause the new load-balancer to have a different host name and IP address from the old one's. Alternatively, you can revert the change to the IngressController: `oc -n openshift-ingress-operator patch ingresscontrollers/%[4]s --type=merge --patch='{\"spec\":{\"endpointPublishingStrategy\":{\"loadBalancer\":{\"scope\":\"%[5]s\"}}}}'`", err.Error(), service.Namespace, service.Name, ic.Name, haveScope) | ||
ocPatchRevertCmd := fmt.Sprintf("oc -n openshift-ingress-operator patch ingresscontrollers/%[1]s --type=merge --patch='{\"spec\":{\"endpointPublishingStrategy\":{\"loadBalancer\":{\"scope\":\"%[2]s\"}}}}'", ic.Name, haveScope) | ||
err = fmt.Errorf(effectuateMessage(changedMsg, ocPatchRevertCmd, service)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have extra format expansion again, here and everywhere where you have fmt.Errorf(effectuateMessage(...))
.
expect: false, | ||
}, | ||
{ | ||
description: "if the service.beta.kubernetes.io/aws-load-balancer-type is added", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description: "if the service.beta.kubernetes.io/aws-load-balancer-type is added", | |
description: "if the service.beta.kubernetes.io/aws-load-balancer-type annotation is added", |
@@ -1370,19 +1370,23 @@ func TestAWSResourceTagsChanged(t *testing.T) { | |||
assertLoadBalancerServiceAnnotationWithPollImmediate(t, kclient, defaultIC, awsLBAdditionalResourceTags, expectedTags) | |||
} | |||
|
|||
// TestAWSLBTypeChange verifies that process of changing the IngressController's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TestAWSLBTypeChange verifies that process of changing the IngressController's | |
// TestAWSLBTypeChange verifies the process of changing the IngressController's |
}, | ||
{ | ||
description: "loadbalancer strategy with ELB, but a service with NLB shouldn't use PROXY", | ||
strategy: &loadBalancerStrategyWithNLB, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strategy: &loadBalancerStrategyWithNLB, | |
strategy: &loadBalancerStrategy, |
based on the description of the test case.
want, desired, err := desiredLoadBalancerService(ic, deploymentRef, platform, subnetsAWSEnabled, eipAllocationsAWSEnabled) | ||
proxyNeeded, err := IsProxyProtocolNeeded(ic, platform, current) | ||
if err != nil { | ||
return fmt.Errorf("failed to determine if proxy protocol is needed for ingresscontroller %s/%s: %v", ic.Namespace, ic.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("failed to determine if proxy protocol is needed for ingresscontroller %s/%s: %v", ic.Namespace, ic.Name, err) | |
return fmt.Errorf("failed to determine if proxy protocol is needed for ingresscontroller %s/%s: %w", ic.Namespace, ic.Name, err) |
|
||
proxyNeeded, err := IsProxyProtocolNeeded(ci, platformStatus, currentLBService) | ||
if err != nil { | ||
errs = append(errs, fmt.Errorf("failed to determine if proxy protocol is needed for ingresscontroller %s/%s: %v", ci.Namespace, ci.Name, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errs = append(errs, fmt.Errorf("failed to determine if proxy protocol is needed for ingresscontroller %s/%s: %v", ci.Namespace, ci.Name, err)) | |
errs = append(errs, fmt.Errorf("failed to determine if proxy protocol is needed for ingresscontroller %s/%s: %w", ci.Namespace, ci.Name, err)) |
{ | ||
name: "loadbalancer type changed from ELB to NLB, with old ELB parameters removal", | ||
ic: makeIC(spec(nlb()), status(elbWithNullParameters())), | ||
expectedResult: true, | ||
expectedIC: makeIC(spec(nlb()), status(nlbWithNullParameters())), | ||
domainMatchesBaseDomain: true, | ||
}, | ||
{ | ||
name: "loadbalancer type changed from NLB to ELB, with old NLB parameters removal", | ||
ic: makeIC(spec(elb()), status(nlbWithNullParameters())), | ||
expectedResult: true, | ||
expectedIC: makeIC(spec(elb()), status(elbWithNullParameters())), | ||
domainMatchesBaseDomain: true, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you removed these test cases rather than changing them to match the new behavior? Granted, TestSetDefaultPublishingStrategyHandlesUpdates
is already a bit unwieldy, and you did add plenty of test coverage in this commit, and that should be adequate.
Thanks for the reviews @Miciah, I'm running low on time and don't think I'll be able to get to this today, so I think someone else will have to follow up, or we should merge as is. I have no preference. Let's be sure to coordinate this with @JoelSpeed and the merging of openshift/cluster-cloud-controller-manager-operator#362 before we merge this PR. This PR only is required when openshift/cluster-cloud-controller-manager-operator#362 is merged. |
Due to the AWS CCM leaking resources when the load balancer type is changed on a service, the cloud team is now blocking updates to the load balancer type. As a result, the Ingress Operator will require the service to be deleted and recreated when the Ingress Controller's load balancer type changes.
This change introduces a Progressing=True status message when the load balancer type is modified, instructing the user on how to effectuate the change. Additionally, the
service.beta.kubernetes.io/aws-load-balancer-type
annotation is now added to themanagedAtServiceCreationLBServiceAnnotations
map along with other annotations that require service deletion.This commit does not alter the function of the LB type in the status, which has always (incorrectly) represented the desired state. However, with this change, a new scenario is introduced where the desired LB type may differ from the actual state. Consequently, the LB type status may no longer reliably reflect the actual state as it previously did, potentially surprising users who mistakenly assumed it always indicated the current state.