Aggregator by Default #2886

michaelmdresser · 2023-12-19T19:45:53Z

What does this PR change?

Does this PR relate to any other PRs?

How does this PR impact users? (This is the kind of thing that goes in release notes!)

Kubecost's new query backend "Aggregator" now serves API requests in all configurations. You will notice a new container running in the cost-analyzer Pod by default. If you are an Enterprise Thanos user, follow this guide TODO for necessary configuration changes. If you are an Enterprise Federated ETL user, follow this guide TODO for necessary configuration changes. If you are a Free user, no configuration change is required. This does not introduce a new container image.
Cloud Cost data gathering now runs in a separate container (within cost-analyzer) by default. If using an Enterprise Aggregator configuration, Cloud Cost will run in a one-replica Deployment. No configuration or value change is required. This does not introduce a new container image.
Removes the "HA" (and associated StatefulSet) mode for cost-analyzer. To deploy a highly-available backend, contact Kubecost Support about using the Enterprise Aggregator configuration with replicas > 1. If the cost-model's low-footprint metric building has an HA requirement in your environment, contact Kubecost Enterprise Support so we can learn more.
The federatedETL.primaryCluster value has been removed. In an Enterprise Federated ETL configuration all cost-model containers should behave as if they were secondaries. Aggregator now serves combined queries. If you are an Enterprise Federated ETL user, follow this guide TODO for necessary configuration changes.
The Federator component (enabled via federatedETL.federator.enabled) has been removed. In an Enterprise Federated ETL configuration, all queries and data combination are handled by Aggregator. If you are an Enterprise Federated ETL user, follow this guide TODO for necessary configuration changes.
The kubecostModel.warmSavingsCache value has been removed. If you need to disable the savings cache in Aggregator, set the env var SAVINGS_ENABLED to false.
Query Service (Replicas) has been removed. Aggregator now takes its place, serving queries faster with a lower resource footprint. If you are an Enterprise Federated ETL user, follow this guide TODO for necessary configuration changes.
Unsupported Helm values for an old metric storage method have been removed. No impact is expected.
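
For the savings cache item above, here is a minimal sketch of what the container environment might look like once the variable is set. Only the variable name SAVINGS_ENABLED and the value false come from the notes above. The container name and the place in the chart's values where this entry would be injected are assumptions and are not confirmed by this PR.

```yaml
# Hypothetical fragment of a rendered Pod spec (illustrative only)
containers:
  - name: aggregator          # assumed container name
    env:
      - name: SAVINGS_ENABLED # env var named in the notes above
        value: "false"        # Kubernetes env values must be strings, so quote it
```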

Links to Issues or tickets this PR addresses or fixes

Main PR for https://kubecost.atlassian.net/browse/SELFHOST-1043

What risks are associated with merging this PR? What is required to fully test this PR?

How was this PR tested?

Have you made an update to documentation? If so, please provide the corresponding PR.

WIP

gitguardian · 2023-12-19T19:45:58Z

⚠️ GitGuardian has uncovered 4 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

| GitGuardian id | Secret | Commit | Filename |
| --- | --- | --- | --- |
| 7414 | Google API Key | `fb28d67` | cost-analyzer/templates/_helpers.tpl |
| 7414 | Google API Key | `3c2979e` | cost-analyzer/templates/query-service-deployment-template.yaml |
| 7414 | Google API Key | `c9e8622` | cost-analyzer/templates/_helpers.tpl |
| 7414 | Google API Key | `c9e8622` | cost-analyzer/templates/aggregator-statefulset.yaml |

🛠 Guidelines to remediate hardcoded secrets

Understand the implications of revoking this secret by investigating where it is used in your code.
Replace and store your secrets safely. Learn here the best practices.
Revoke and rotate these secrets.
If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

following these best practices for managing and storing secrets, including API keys and other credentials
installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

chipzoller · 2023-12-19T20:26:14Z

Please also remember to check/update the files required for CI processes:

michaelmdresser · 2023-12-19T21:07:09Z

I'm still working out some things in the backend code that are causing chart tests to fail. While I'm working through testing, @chipzoller, are you willing to give this PR a review while it is in a draft state? I'd like some feedback from your Helm expertise on things in here, particularly my use of the _helpers.tpl file to establish some guards and to make a common ContainerSpec which is shared by multiple deployment methods. Are those bad practices? Is there a better way?

chipzoller

These helper template changes seem extremely over-the-top. What problem are you looking to solve here that you think isn't appropriate for the templates?

michaelmdresser · 2023-12-19T22:03:38Z

> What problem are you looking to solve here that you think isn't appropriate for the templates?

There are two containers, Aggregator and Cloud Cost, which must be deployable either within the cost-analyzer Pod or under their own controllers (a StatefulSet and Deployment, respectively). The design is diagrammed out in this doc if you'd like to have a look.

I'm using the templates aggregator.containerTemplate and aggregator.cloudCost.containerTemplate defined in _helpers.tpl to centralize the configuration of these containers to simplify development and hopefully stave off drift. Is there a better way to do this?
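
To make the pattern concrete, here is a rough sketch of the shape. The template name aggregator.containerTemplate is the one used in this PR. The container body, the image, and the indentation level are placeholders for illustration, not the chart's actual contents.

```yaml
{{/* _helpers.tpl: the container is defined once (body below is a placeholder) */}}
{{- define "aggregator.containerTemplate" -}}
- name: aggregator
  image: example.invalid/placeholder:tag   # placeholder, not the real image
  env:
    - name: EXAMPLE_SETTING                # hypothetical variable
      value: "1"
{{- end }}

{{/* In both the cost-analyzer Deployment template and the Aggregator StatefulSet template */}}
      containers:
        {{- include "aggregator.containerTemplate" . | nindent 8 }}
```

Because both controllers render the same named template, a change to the container only has to be made in one place.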

nik-kc · 2023-12-20T18:59:15Z

Looking good so far! Fills me with joy to see such a high deletion to addition ratio in a PR.

While you're in here performing this overhaul, what are your thoughts on more thorough automated testing of the helm chart? I know we perform linting, but I feel it would be useful to have the ability to run automated validations on the chart with a wide array of values, in order to verify template correctness for a variety of install configurations.

chipzoller · 2023-12-20T19:05:01Z

@nik-kc, I have already set this up and it is being done on all PRs, including deployment and basic e2e smoke tests on real EKS and OpenShift clusters. See here.

@michaelmdresser, ok I understand the goals. I will give this a good look soon. I think the idea of creating a standard container template and stamping it out in the necessary Pod controller form factor is a good practice to reduce/eliminate drift. The challenge is ensuring that stays universal enough so controller-specific configs don't creep in.

ameijer

so much tech debt paid, bravo. that dead postgres config search and destroy was 💯 . I don't think the reliance on helpers.tpl is too onerous, especially considering it buys us something approaching 1k lines of avoided code duplication?

cost-analyzer/values.yaml

cost-analyzer/templates/cost-analyzer-deployment-template.yaml

cost-analyzer/templates/cost-analyzer-frontend-config-map-template.yaml

cost-analyzer/templates/cost-analyzer-deployment-template.yaml

cost-analyzer/templates/aggregator-cloud-cost-deployment.yaml

cost-analyzer/templates/_helpers.tpl

ameijer

so this LGTM at this point. Will leave it to @chipzoller to approve after his review since he said he wanted to put eyes on

chipzoller

Michael, I've given this a good look through (yet have not verified the end results myself...will wait for final review), and you've done some really nice work here. I know how much time this took. Thank you.

I have added some relatively minor comments, requests, and questions throughout, but here are some of the more general thoughts/requests:

Also, assuming you have not done so, we'd appreciate it if you go through all the values files present in this repo and ensure they're in good shape based on these changes. I know you did that for the CI files (thanks!) but please check all others if not done.
Looks like you're doing it, but because I haven't done a thorough survey, please just double-check and make sure you've removed all defined templates corresponding to excised sections of the chart (ex., query service and postgres).
We need to include annotations and labels defined under global.podAnnotations and global.additionalLabels. We're terribly inconsistent about that, but "global" means "everywhere" and not selectively.
For any new fields added to the values file, we (and, specifically, users) would appreciate a comment on each one explaining what it does. We will eventually use this to build docs from this values file (a later initiative).
etlBucketConfigSecret is not defined in values.yaml yet is still referenced by templates in this PR implying it's still necessary. Kindly document.
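
As a rough illustration of the global labels and annotations request above, a Pod template could pick them up along these lines. The value paths global.additionalLabels and global.podAnnotations are the ones named above. The surrounding structure, the `app: aggregator` label, and the indentation are assumptions made for this sketch.

```yaml
  template:
    metadata:
      labels:
        app: aggregator   # placeholder label for the sketch
        {{- with .Values.global.additionalLabels }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      {{- with .Values.global.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```

Wrapping each block in `with` keeps the rendered output clean when the values are unset.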

cost-analyzer/values.yaml

cost-analyzer/templates/_helpers.tpl

cost-analyzer/templates/cost-analyzer-deployment-template.yaml

QSR was an optimization attempt made before Aggregator for high-scale environments. Aggregator's existence obsoletes QSR, so for this next release of Kubecost which changes the default install config we should remove QSR completely. Signed-off-by: Michael Dresser <[email protected]>

Kubecost's Aggregator component is now the officially-supported method for highly-available query serving. It has the ability to be split into its own StatefulSet for managing multiple replica situations. The cost-analyzer Pod is now dedicated almost wholly to building metric data and thus no longer needs the highly-complex leader/follower configuration. Discussed with Bolt, Ajay, Alex Signed-off-by: Michael Dresser <[email protected]>

michaelmdresser · 2024-01-02T18:12:09Z

@chipzoller Thank you for the thorough review, it's exactly what I was looking for. I've addressed as much as I could and left questions on the rest of the feedback. Please let me know if I've misinterpreted any of your ideas.

chipzoller · 2024-01-02T19:53:52Z

Chart build is failing across all instances.

michaelmdresser · 2024-01-02T20:05:27Z

@chipzoller The tests are failing because https://github.com/kubecost/kubecost-cost-model/pull/2002 is required for this Helm change to work. Unless values are set explicitly, the Helm chart uses a v1.108 backend image, which does not function in this configuration of the Helm chart, so all tests fail.

I believe it is still reviewable.

chipzoller · 2024-01-03T14:28:35Z

Michael, I think the best place to accept this PR is in the 2.0-helm-changes branch we have been working on since this is a 2.0 change. There have already been significant changes to the chart reflected in that branch. Can you rebase on top of it and merge there instead?

michaelmdresser · 2024-01-03T19:11:55Z

Closing in favor of #2898 which targets 2.0-helm-changes. Cleaning that up before review.

michaelmdresser force-pushed the mmd/default-aggregator branch from a152057 to 22e1cd2 Compare December 19, 2023 20:48

michaelmdresser changed the title ~~Aggregator by Default~~ HOLD UNTIL AFTER HOLIDAY (for nightly stability) | Aggregator by Default Dec 19, 2023

michaelmdresser requested a review from chipzoller December 19, 2023 21:07

chipzoller reviewed Dec 19, 2023

ameijer reviewed Dec 21, 2023

michaelmdresser force-pushed the mmd/default-aggregator branch 2 times, most recently from c5b447b to 5cc809e Compare December 21, 2023 18:37

ameijer reviewed Dec 21, 2023

chipzoller reviewed Dec 22, 2023

michaelmdresser changed the title ~~HOLD UNTIL AFTER HOLIDAY (for nightly stability) | Aggregator by Default~~ Aggregator by Default Jan 2, 2024

michaelmdresser marked this pull request as ready for review January 2, 2024 17:29

michaelmdresser added 13 commits January 2, 2024 09:45

Remove Federator

7caac30

With the advent of Aggregator as the only query handler, there is no more need for the Federator component to combine federated bingen files. Signed-off-by: Michael Dresser <[email protected]>

Remove kubecostModel.openSourceOnly option

20fbc9d

Signed-off-by: Michael Dresser <[email protected]>

Remove all Postgres references

45365b8

Postgres as a code path has been obsolete for years. It is not supported. Signed-off-by: Michael Dresser <[email protected]>

Nit: clean template spaces

4869651

Signed-off-by: Michael Dresser <[email protected]>

Remove etlToDisk option

8ebcb74

Because of file-based data sharing in the default free config, running Kubecost fully in-memory cannot be supported. Signed-off-by: Michael Dresser <[email protected]>

Move Jaeger container template to helpers

2e70420

Signed-off-by: Michael Dresser <[email protected]>

Make Aggregator usable in cost-analyzer or STS

c9e8622

Moved Aggregator container config to helpers so it can be shared by the cost-analyzer Pod and the dedicated StatefulSet. Signed-off-by: Michael Dresser <[email protected]>

Always disable savings cache in cost-model

c9bfb4a

Signed-off-by: Michael Dresser <[email protected]>

Make Aggregator service flexible, Pod vs STS

7cecb98

Signed-off-by: Michael Dresser <[email protected]>

Make Cloud Cost deployable in cost-analyzer

78530f1

Signed-off-by: Michael Dresser <[email protected]>

Remove Federated Primary option

0fc3401

Signed-off-by: Michael Dresser <[email protected]>

michaelmdresser added 13 commits January 2, 2024 09:45

Remove duplicate PVC definition in Aggregator STS

87b0561

Signed-off-by: Michael Dresser <[email protected]>

Remove unnecessary quotes

d404ce5

Signed-off-by: Michael Dresser <[email protected]>

Clean up resources in values.yaml

6b22498

Signed-off-by: Michael Dresser <[email protected]>

Fix templating to remove whitespace

f5634a1

Signed-off-by: Michael Dresser <[email protected]>

Remove unnecessary newlines

2fe29b4

Signed-off-by: Michael Dresser <[email protected]>

Fix indentation

fb28d67

Signed-off-by: Michael Dresser <[email protected]>

Remove quotes

992fc04

Signed-off-by: Michael Dresser <[email protected]>

Add empty resources block to cloudCost

b318c43

Signed-off-by: Michael Dresser <[email protected]>

Remove quotes

98144dc

Signed-off-by: Michael Dresser <[email protected]>

Format

e289c50

Signed-off-by: Michael Dresser <[email protected]>

Record cloudCost vars in values.yaml

28ba109

Signed-off-by: Michael Dresser <[email protected]>

Document persistentVolume.dbPVEnabled

416d037

Signed-off-by: Michael Dresser <[email protected]>

Document fullImageName

6876caa

Signed-off-by: Michael Dresser <[email protected]>

michaelmdresser force-pushed the mmd/default-aggregator branch from 5c180ae to 6876caa Compare January 2, 2024 17:45

michaelmdresser added 7 commits January 2, 2024 09:48

Document imageVersion

8a148a2

Signed-off-by: Michael Dresser <[email protected]>

Move subconfigs for clarity

5d1f91d

Signed-off-by: Michael Dresser <[email protected]>

Remove redocumented fullImageName

1d2a30a

Signed-off-by: Michael Dresser <[email protected]>

Template readinessProbe for aggregator and CC

6ea7a16

Signed-off-by: Michael Dresser <[email protected]>

Remove unnecessary whitespace

8482e40

Signed-off-by: Michael Dresser <[email protected]>

Use global.additionalLabels, aggregator templates

aed9428

Signed-off-by: Michael Dresser <[email protected]>

Add etlBucket, federatedStorage config in values

d6787a0

Signed-off-by: Michael Dresser <[email protected]>

michaelmdresser requested a review from chipzoller January 2, 2024 19:52

michaelmdresser mentioned this pull request Jan 3, 2024

Aggregator by Default - 2.0 #2898

Merged

michaelmdresser closed this Jan 3, 2024

michaelmdresser deleted the mmd/default-aggregator branch January 3, 2024 19:12
