Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: initial commit for deploy fix when having duplicate kinds #896

Closed
wants to merge 14 commits into from

Conversation

renescheepers
Copy link
Contributor

@renescheepers renescheepers commented Jul 13, 2022

crds/crd.yaml Outdated Show resolved Hide resolved
@renescheepers renescheepers force-pushed the renescheepers/fix_deploy_duplicate_kind branch from efd35c5 to 741c133 Compare July 13, 2022 13:41
@renescheepers renescheepers force-pushed the renescheepers/fix_deploy_duplicate_kind branch 20 times, most recently from 052d972 to 2edd0b3 Compare July 21, 2022 07:08
@renescheepers renescheepers self-assigned this Jul 21, 2022
@renescheepers renescheepers added the 🪲 bug Something isn't working label Jul 21, 2022
@renescheepers renescheepers marked this pull request as ready for review July 21, 2022 08:08
@renescheepers renescheepers requested a review from a team as a code owner July 21, 2022 08:08
@renescheepers renescheepers requested review from ethanaubuchon and camteasdale143 and removed request for a team July 21, 2022 08:08
@renescheepers renescheepers requested a review from a team July 26, 2022 18:58
@jpfourny
Copy link
Contributor

FYI: You will need to merge with this PR: #900
Version 2.4.9 will be cut from that, shortly.

@renescheepers renescheepers force-pushed the renescheepers/fix_deploy_duplicate_kind branch 2 times, most recently from cfa48a9 to eb4971c Compare July 27, 2022 18:39
@renescheepers renescheepers force-pushed the renescheepers/fix_deploy_duplicate_kind branch from eb4971c to f0e0da7 Compare July 29, 2022 12:40
Comment on lines 1829 to 1836
def test_duplicate_kind_resource_definition
result = deploy_global_fixtures("crd", subset: ["deployment.yml"])
assert_deploy_success(result)

result = deploy_fixtures("crd", subset: ["web.yml"], global_timeout: 30)
assert_deploy_success(result)
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is passing just fine on master branch, I don't think it's forcing the failure you think it is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now, the testing code applies a prefix to the Kind so in that case there is no duplicate.

@@ -37,12 +37,39 @@ def fetch_resources(namespaced: false)
end.compact.uniq { |r| "#{r['apigroup']}/#{r['kind']}" }
end

def fetch_group_kinds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of parsing the tabular information from api-resources, can you collect this from the calls we make during fetch_resources above? I believe api-resources is re-making those exact same calls, and it can be very expensive in large clusters. Furthermore, upstream makes no stability guarantees on tabular data output (vs json or yaml).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does indeed do separate calls in the background. Let me check if I can re-use the existing calls.

def kind
name.demodulize
# Converts Krane::ApiextensionsK8sIo::CustomResourceDefinition to CustomResourceDefinition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't name.demodulize still do what you want? I also have the same question about why we aren't using the information from the definition, like at L51... though in this case you're following what we (probably I) originally did. Something has gone very wrong during instantiation if the two don't match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the class is called where is no instance is present. Here for example:

@@ -503,6 +547,36 @@ def selected?(selector)
selector.nil? || selector.to_h <= labels
end

def self.group_from_api_version(input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add these separately instead of with the rest of the class methods in class << self above?

field_part = FIELDS.map { |f| "{{#{f}}}" }.join(%({{print "#{FIELD_SEPARATOR}"}}))
%({{range .items}}#{condition_start}#{field_part}{{print "#{EVENT_SEPARATOR}"}}{{end}}{{end}})
%({{range .items}}#{and_conditions_string}#{field_part}{{print "#{EVENT_SEPARATOR}"}}#{ends}{{end}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the change to nested syntax required? does it not lazy evaluate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the older version of Kubectl (which we still support) are compiled with Go 1.17. Only since 1.18 it evaluates the statements correctly.

https://tip.golang.org/doc/go1.18#:~:text=test.fuzzminimizetime.-,text/template,-Within%20a%20range

# frozen_string_literal: true

module Krane
class APIResource
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this class? Krane::Resource also represents API resources

Copy link
Contributor Author

@renescheepers renescheepers Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only for containing the response of the api resources call, so I don't have to return a hash. Could think of a better name though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, in that case we shouldn't need it, because per my other comment we should be using the discovery information directly instead of re-making the calls via kubectl and parsing tabular output. But if that really isn't possible for some reason, I'd make this class private to the class that needs it, to avoid confusion.

return ""
end

# Converts Krane::ApiextensionsK8sIo::CustomResourceDefinition to apiextensions.k8s.io
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why extract from the class name instead of the apiVersion? In fact, could we save the extraction at L50 in an instance variable and use an accessor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API version isn't always available, for example here. There is no instance available to fetch this information for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right we're still in the class methods here 🤦‍♀️

r = KubernetesResource.build(context: context, logger: logger, definition: r_def,
crd: crd, global_names: global_kinds, statsd_tags: statsd_tags)
crd: crd, group_kinds: group_kinds, statsd_tags: statsd_tags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is only used for the calculation of whether the resource is global. Could it make sense to do that here instead and make more use of cluster_resource_discoverer? E.g. have a cluster_resource_discoverer.global_resource?(group, kind).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that does make sense. I tried to make the minimum amount of changes, so that most stays the same.

@@ -167,6 +167,12 @@ def identify_target_deployments(selector: nil)
apps_v1_kubeclient.get_deployments(namespace: @namespace, label_selector: selector_string)
end
deployments.select { |d| d.dig(:metadata, :annotations, ANNOTATION) }
.map do |d|
d["apiVersion"] = "apps/v1"
d["kind"] = "Deployment"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to set/override these now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The response doesn't contain these values and we now depend on these values being set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, we got them from Kubeclient instead of kubectl here. Related issue: ManageIQ/kubeclient#368

@@ -238,18 +210,22 @@ def secrets_from_ejson
def discover_resources
@logger.info("Discovering resources:")
resources = []
crds_by_kind = cluster_resource_discoverer.crds.group_by(&:kind)
crds_grouped = cluster_resource_discoverer.crds.group_by(&:cr_group_kind)
group_kinds = @task_config.group_kinds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within task config, you get this from an instance of cluster_resource_discoverer. Why not call it directly here? Since that class controls expensive operations, I'd think we should reuse a single instance as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code did somewhat the same, thats why I didn't change it at first. Changing now.

r = KubernetesResource.build(namespace: @namespace, context: @context, logger: @logger, definition: r_def,
statsd_tags: @namespace_tags, crd: crd, global_names: @task_config.global_kinds)
statsd_tags: @namespace_tags, crd: crd, group_kinds: group_kinds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in global_deploy_task about using cluster_resource_discoverer here instead of passing the list of group kinds to build.

@KnVerey
Copy link
Contributor

KnVerey commented Aug 16, 2022

fix: all tests serial

I don't see this actually changing tests to run differently, but the commit message caught my eye. It's important that Krane be able to run multiple deploys currently, even when used as a gem. So just wanted to make super sure we are aware of that. 🙏

@renescheepers
Copy link
Contributor Author

renescheepers commented Aug 17, 2022

fix: all tests serial

I don't see this actually changing tests to run differently, but the commit message caught my eye. It's important that Krane be able to run multiple deploys currently, even when used as a gem. So just wanted to make super sure we are aware of that. 🙏

Yes, because the tests take a long time I went through each test-suite separately. One of those is integration-serial so while I didn't specifically touch any files there, I needed to do these fixes to have these tests pass.

@renescheepers renescheepers force-pushed the renescheepers/fix_deploy_duplicate_kind branch from 5ca1ff6 to 5868680 Compare August 18, 2022 20:31
@renescheepers renescheepers force-pushed the renescheepers/fix_deploy_duplicate_kind branch from 5868680 to a956fac Compare August 23, 2022 14:44
@gmarx-shopify gmarx-shopify removed their request for review November 21, 2022 12:37
@renescheepers
Copy link
Contributor Author

Not planning on merging this pull request. This pull request was needed to resolve some issues that would happen when used in combination with Crossplane. However we stopped using Crossplane. Even though it still would be nice to fix, it is way too risky to deploy since this is used everywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🪲 bug Something isn't working #gsd:24452
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Krane is unable to fully deploy resources when there are duplicate Kinds
5 participants