Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AUTH-541: OIDC structured auth config #713

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

liouk
Copy link
Member

@liouk liouk commented Oct 8, 2024

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR and validates it
  • serializes it into JSON and stores it into a configmap
  • syncs that configmap into openshift-config-managed, where it will be picked up by the KAS-o and synced into a static file and passed on to the KAS pods

KAS-o functionality PR: openshift/cluster-kube-apiserver-operator#1760

Enhancement: openshift/enhancements#1632

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 8, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 8, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

In response to this:

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR
  • serializes it into a configmap
  • syncs that configmap into openshift-config, where it will be picked up by the KAS-o and synced into a static file (not yet implemented)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from deads2k and ibihim October 8, 2024 12:52
@liouk liouk changed the title AUTH-541: OIDC structured auth config WIP: AUTH-541: OIDC structured auth config Oct 8, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch from 35b2d3d to 7e8ad90 Compare October 8, 2024 13:31
@liouk liouk changed the title WIP: AUTH-541: OIDC structured auth config AUTH-541: OIDC structured auth config Oct 8, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 8, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch from 7e8ad90 to 31e7cc5 Compare October 10, 2024 13:13
@liouk liouk changed the title AUTH-541: OIDC structured auth config WIP: AUTH-541: OIDC structured auth config Oct 10, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch 5 times, most recently from f066dae to c4f822c Compare October 17, 2024 13:00
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch 2 times, most recently from 8ca7a87 to 36db406 Compare October 22, 2024 10:24
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 22, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

In response to this:

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR and validates it
  • serializes it into a configmap
  • syncs that configmap into openshift-config-managed, where it will be picked up by the KAS-o and synced into a static file (not yet implemented) and passed on to the KAS pods

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@liouk liouk force-pushed the oidc-config-structured-auth branch 5 times, most recently from 81af1fc to 46663cd Compare October 28, 2024 09:06
@liouk liouk changed the title WIP: AUTH-541: OIDC structured auth config AUTH-541: OIDC structured auth config Oct 28, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 28, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Oct 28, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

In response to this:

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR and validates it
  • serializes it into JSON and stores it into a configmap
  • syncs that configmap into openshift-config-managed, where it will be picked up by the KAS-o and synced into a static file (not yet implemented) and passed on to the KAS pods

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@liouk liouk force-pushed the oidc-config-structured-auth branch from bca8702 to 5457a34 Compare November 19, 2024 17:09
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 19, 2024
Copy link
Contributor

openshift-ci bot commented Nov 19, 2024

New changes are detected. LGTM label has been removed.

@liouk liouk force-pushed the oidc-config-structured-auth branch from 5457a34 to a11dfa2 Compare November 19, 2024 17:55
}

if !featureGates.Enabled(features.FeatureGateExternalOIDC) {
return nil, nil, nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to set up a child context with cancellation so that the feature gate accessor that is launched above can terminate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to keep the accessor running so that it can os.Exit if the feature gates change?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I saw that nothing else was using it and did not make the connection that the comment at the top was saying that this all depends on the exit-on-feature-change behavior.

if err != nil {
return fmt.Errorf("could not marshal auth config into JSON: %v", err)
}
authConfigJSON := strings.TrimSpace(string(encoded))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the JSON serializer appends a trailing newline, but what problem was that causing?


var (
cfgScheme = runtime.NewScheme()
codecs = serializer.NewCodecFactory(cfgScheme, serializer.EnableStrict)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The EnableStrict option isn't really doing anything for you since you are only building an encoder from this.

var (
cfgScheme = runtime.NewScheme()
codecs = serializer.NewCodecFactory(cfgScheme, serializer.EnableStrict)
serializerInfo, _ = runtime.SerializerInfoForMediaType(codecs.SupportedMediaTypes(), runtime.ContentTypeJSON)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Best to check the second return value and panic on false in case this should somehow break. That would be a clearer failure than a panic later inside of runtime.Encode.

return err
}

encoded, err := runtime.Encode(codecs.EncoderForVersion(serializerInfo.Serializer, apiserverv1beta1.ConfigSchemeGroupVersion), authConfig)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, but I'm also unsure what benefit setting up and using a CodecFactory is providing. We're not converting or defaulting anything and we always want it to use JSON. May as well use https://pkg.go.dev/k8s.io/apimachinery/pkg/util/json#Marshal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this might be my fault. I mentioned that I am not sure if using the api machinery is better / more idiomatic than json.Marshal.

@liouk's original solution was based on json.Marshal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main benefit was to serialize the data in a more structured way, e.g. without having to define the type meta manually. But it gets more complicated than what I thought, so I will revert back to using json.Marshal.

}

cm := corev1ac.ConfigMap(targetAuthConfigCMName, managedNamespace).WithData(map[string]string{authConfigDataKey: authConfigJSON})
if _, err := c.configMaps.ConfigMaps(managedNamespace).Apply(ctx, cm, metav1.ApplyOptions{FieldManager: c.name}); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is another field manager going to be writing to the same configmap? If not, it is probably reasonable to set Force: true to allow stomping conflicts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the CAO must be the only one -- great point.

return fmt.Errorf("auth config validation failed: %v", errList)
}

cm := corev1ac.ConfigMap(targetAuthConfigCMName, managedNamespace).WithData(map[string]string{authConfigDataKey: authConfigJSON})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been trying to reduce no-op Apply requests by extracting the current apply configuration from the local informer's cache with https://pkg.go.dev/k8s.io/client-go/applyconfigurations/core/v1#ExtractConfigMap and doing a apiequality.Semantic.DeepEqual between them first. I haven't heard of any issues arising from that approach yet, does it make sense to do it here to avoid making a write on every resync?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the intention of this check: https://github.com/openshift/cluster-authentication-operator/pull/713/files#diff-3c99f304cc2949488aa2fa2b8aea2d7e8ddb0c8baa42e66f128eacd7cdbda11aR135-R137

	if existingCM != nil && existingCM.Data[authConfigDataKey] == authConfigJSON {
		return nil
	}

Do you think the DeepEqual() is preferable?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the DeepEqual would be preferable because it would make updates in the future easier.

As an example, if in the future we needed to add new data to the ConfigMap, we would need to add a new check to prevent the no-op apply and add a new entry in the applyconfiguration we are creating.

If we had a single place to update our desired apply configuration (maybe a function?) we could simply do the semantic equality check between our desired state and the extracted apply configuration to determine if we need to do the apply operation.

jwt.Issuer.CertificateAuthority = caData
}

switch provider.ClaimMappings.Username.PrefixPolicy {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a new valid option is added in the future? Can there be a period of time during an upgrade from N to N+1 where there is a cluster-authentication-operator at N and a CRD at N+1? I would include a default case to avoid all doubt.

Comment on lines 230 to 232
// TODO currently validations from k8s.io/apiserver/pkg/apis/apiserver/validation cannot be used here
// since they aren't defined for the beta type; once the feature goes out of beta, we should replace
// this func with the upstream validations (but keep CA cert validation)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're defined for the internal type, would it be easier to convert to internal and use those? This seems to be how it is done for unserved APIs like the apiserver configuration files, I don't see any external-versioned validations there.

Also, what happens if there is a bug in the validation used by cluster-authentication-operator that causes the validation to be overly strict? Even if this changes to use the validations from k8s.io/apiserver, this can be out of sync with whatever a particular kube-apiserver was compiled against. This is a form of client-side validation, which we have been trying to move away from. Revisioned kube-apiserver rollouts should mitigate the risk of writing an invalid config here, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I see the point in keeping validations at the server-side, especially with revisioned rollouts being in place.

However, the KAS pods do not really validate the CA cert, if specified. If the CA cert is not the correct one, the KAS pods will log an error but will not crash, so the rollout will be completed correctly. Therefore I'm considering keeping the CA cert validation at the CAO side, but dropping the rest. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a fixup 20a1f72 that demonstrates what I described above. Placing a hold until this gets squashed or dropped.

/hold

@liouk liouk force-pushed the oidc-config-structured-auth branch from a11dfa2 to 20a1f72 Compare November 21, 2024 14:08
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 21, 2024
@liouk liouk force-pushed the oidc-config-structured-auth branch from 20a1f72 to 08155bf Compare November 21, 2024 14:16
@xingxingxia
Copy link
Contributor

From test result perspective, based on good pre-merge test results in https://issues.redhat.com/browse/OCPBUGS-44592?focusedId=26134688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-26134688 , adding below label:
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Nov 22, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 22, 2024

@liouk: This pull request references AUTH-541 which is a valid jira issue.

In response to this:

This PR adds a controller behind the ExternalOIDC feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:

  • creates a structured auth config object based on the auth CR and validates it
  • serializes it into JSON and stores it into a configmap
  • syncs that configmap into openshift-config-managed, where it will be picked up by the KAS-o and synced into a static file and passed on to the KAS pods

KAS-o functionality PR: openshift/cluster-kube-apiserver-operator#1760

Enhancement: openshift/enhancements#1632

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@liouk
Copy link
Member Author

liouk commented Nov 22, 2024

/retest

…or OIDC

Changes as per review from everettraven.
@liouk
Copy link
Member Author

liouk commented Jan 28, 2025

Note to self: commit 57c8bbc needs to be split, and controller functionality to be merged back on f8f5973.

Copy link
Contributor

openshift-ci bot commented Jan 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ibihim, liouk
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

liouk added 2 commits January 29, 2025 10:39
…or OIDC

Changes as per review from everettraven.
…or OIDC

Changes as per review from everettraven.
Copy link

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Have a handful of new comments

}

return nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unnecessary newline

Comment on lines 192 to 193
case configv1.NoPrefix:
jwt.ClaimMappings.Username.Prefix = ptr.To("-")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the prefix actually be set to "-" in this case? Looking at https://github.com/kubernetes/apiserver/blob/c09fadd4305dde0b8861d8a7595690800a4a0db0/pkg/apis/apiserver/v1beta1/types.go#L332-L333 it seems like using the flags, "-" signalled no prefix but for equivalent mapping in the structured authentication config it should be ""?

Copy link
Member Author

@liouk liouk Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a great find. You're right -- we must translate NoPrefix to "". But that's not the only issue; we also need to translate NoOpinion to a special prefix logic.

From here:

--oidc-username-prefix="" and --oidc-username-claim != "email", prefix was "<value of --oidc-issuer-url>#". For the same behavior using authentication config, set username.prefix="<value of issuer.url>#"

Therefore, NoOpinion means:

  • set username.prefix = "", if username.claim == "email"
  • set username.prefix = "<issuer.URL>#" otherwise

This has been accounted for in our authentication config API (and here).

This seems to be an issue in hypershift as well; will look further into that.

Pushed a fix for this in 5fecaf6.

},
}

if len(provider.Issuer.Audiences) > 0 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if block necessary?

It looks like the jwt.Issuer.Audiences is required , and the provider.Issuer.Audiences is required with a minimum items of 1 so it should be safe to avoid this conditional and just run the logic that is in the conditional block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's a for loop that appends the strings, it's not needed anyways. Will remove!

Comment on lines +108 to +110
if err := validateAuthConfig(*authConfig); err != nil {
return fmt.Errorf("auth config validation failed: %v", err)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be done prior to generating the ApplyConfiguration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, we decided to keep this sequence due to the fact that the validation performs a network call to the provider to grab the server certificates, which we can avoid if the two configs are equivalent.

Comment on lines 234 to 248
existingCM, err := c.configMapLister.ConfigMaps(managedNamespace).Get(targetAuthConfigCMName)
if err != nil && !apierrors.IsNotFound(err) {
return nil, fmt.Errorf("could not retrieve auth configmap %s/%s to check data before sync: %v", managedNamespace, targetAuthConfigCMName, err)
}

if existingCM != nil {
existingCMApplyConfig, err := corev1ac.ExtractConfigMap(existingCM, c.name)
if err != nil {
return nil, fmt.Errorf("could not extract ConfigMap apply configuration: %v", err)
}

if equality.Semantic.DeepEqual(existingCMApplyConfig.Data, expectedCMApplyConfig.Data) {
return nil, nil
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the fetching of the existing ConfigMap and the equality check in getExpectedApplyConfig feels like it overloads this method a bit.

What do you think of separating the fetching of the existing ApplyConfiguration into a separate method and putting the equality check in the sync method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e8dfce7

liouk added 2 commits January 31, 2025 13:37
…or OIDC

Changes as per review from everettraven.
…or OIDC

Changes as per review from everettraven.

case configv1.NoOpinion:
prefix := ""
if provider.ClaimMappings.Username.Claim == "email" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood #713 (comment) correctly, this should be set to provider.Issuer.URL + # if the username claim is not email:

Suggested change
if provider.ClaimMappings.Username.Claim == "email" {
if provider.ClaimMappings.Username.Claim != "email" {

Copy link
Contributor

openshift-ci bot commented Jan 31, 2025

@liouk: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/test-operator-integration 5fecaf6 link false /test test-operator-integration
ci/prow/unit 5fecaf6 link true /test unit
ci/prow/okd-scos-e2e-aws-ovn 5fecaf6 link false /test okd-scos-e2e-aws-ovn
ci/prow/verify 5fecaf6 link true /test verify

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants