Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add task group to metrics #399

Closed
wants to merge 3 commits into from

Conversation

giovanni-guidini
Copy link
Contributor

We have task health metrics (success, failure, ...)
But we also have a lot of tasks. And the graphs in grafana don't look amazing with all those tasks there.
In particular I'd like to have general success / failure rates per task group.
"task_group" is the task_config_group (part of the task name).

This can be used so we know at any given time how's the general health for the group as a whole.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

We have task health metrics (success, failure, ...)
But we also have a lot of tasks. And the graphs in grafana don't look amazing with all those tasks there.
In particular I'd like to have general success / failure rates per task group.
"task_group" is the task_config_group (part of the task name).

This can be used so we know at any given time how's the general health for the group as a whole.
@giovanni-guidini giovanni-guidini requested review from matt-codecov and a team April 19, 2024 15:34
@codecov-notifications
Copy link

codecov-notifications bot commented Apr 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found ☺️

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #399      +/-   ##
==========================================
+ Coverage   97.48%   97.50%   +0.01%     
==========================================
  Files         395      395              
  Lines       33271    33265       -6     
==========================================
- Hits        32435    32434       -1     
+ Misses        836      831       -5     
Flag Coverage Δ
integration 97.50% <100.00%> (+0.01%) ⬆️
latest-uploader-overall 97.50% <100.00%> (+0.01%) ⬆️
unit 97.50% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.89% <100.00%> (+0.03%) ⬆️
OutsideTasks 97.60% <ø> (+0.06%) ⬆️
Files Coverage Δ
tasks/base.py 96.59% <100.00%> (+0.05%) ⬆️
tasks/tests/unit/test_base.py 98.36% <100.00%> (+<0.01%) ⬆️

... and 6 files with indirect coverage changes

Copy link

codecov bot commented Apr 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.50%. Comparing base (735b82a) to head (a8ffcc0).

Changes have been made to critical files, which contain lines commonly executed in production. Learn more

✅ All tests successful. No failed tests found ☺️

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #399   +/-   ##
=======================================
  Coverage   97.50%   97.50%           
=======================================
  Files         426      426           
  Lines       33971    33983   +12     
=======================================
+ Hits        33123    33135   +12     
  Misses        848      848           
Flag Coverage Δ
integration 97.48% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 97.48% <100.00%> (+<0.01%) ⬆️
unit 97.48% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.86% <100.00%> (+<0.01%) ⬆️
OutsideTasks 97.53% <ø> (ø)
Files Coverage Δ
tasks/base.py Critical 96.68% <100.00%> (+0.15%) ⬆️
tasks/tests/unit/test_base.py 98.39% <100.00%> (+0.01%) ⬆️

This change has been scanned for critical changes. Learn more

Copy link

codecov-public-qa bot commented Apr 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.48%. Comparing base (735b82a) to head (a8ffcc0).

✅ All tests successful. No failed tests found ☺️

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #399   +/-   ##
=======================================
  Coverage   97.48%   97.48%           
=======================================
  Files         395      395           
  Lines       33271    33283   +12     
=======================================
+ Hits        32435    32447   +12     
  Misses        836      836           
Flag Coverage Δ
integration 97.48% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 97.48% <100.00%> (+<0.01%) ⬆️
unit 97.48% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.86% <100.00%> (+<0.01%) ⬆️
OutsideTasks 97.53% <ø> (ø)
Files Coverage Δ
tasks/base.py 96.68% <100.00%> (+0.15%) ⬆️
tasks/tests/unit/test_base.py 98.38% <100.00%> (+0.01%) ⬆️

@codecov-qa
Copy link

codecov-qa bot commented Apr 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.48%. Comparing base (735b82a) to head (a8ffcc0).

✅ All tests successful. No failed tests found ☺️

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #399   +/-   ##
=======================================
  Coverage   97.48%   97.48%           
=======================================
  Files         395      395           
  Lines       33271    33283   +12     
=======================================
+ Hits        32435    32447   +12     
  Misses        836      836           
Flag Coverage Δ
integration 97.48% <100.00%> (+<0.01%) ⬆️
latest-uploader-overall 97.48% <100.00%> (+<0.01%) ⬆️
unit 97.48% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
NonTestCode 94.86% <100.00%> (+<0.01%) ⬆️
OutsideTasks 97.53% <ø> (ø)
Files Coverage Δ
tasks/base.py 96.68% <100.00%> (+0.15%) ⬆️
tasks/tests/unit/test_base.py 98.38% <100.00%> (+0.01%) ⬆️

tasks/base.py Outdated
cls.name = name
# All task names follow the format `app.[cron|task].<task_group>.<task>`
task_group = name.split(".")[-2] if name != "unknown_task" else "unknown_group"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this give an error if the length of name.split('.') is 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but all tasks are supposed to have a task_group in their name (separated by '.'. Check the celery config

tasks/base.py Outdated
@@ -42,49 +42,58 @@ def metrics_prefix(self):

def on_timeout(self, soft: bool, timeout: int):
res = super().on_timeout(soft, timeout)
task_group = (
self.name.split(".")[-2] if self.name is not None else "unknown_group"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] this could be in a function since you are doing it somewhere else as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also are we positive all task names definitely fit this pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are supposed to fit this pattern. It's the pattern we use for the task routes in celery config

I can add a try/except just in case

Copy link
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can sort of do this with label filters in grafana. filter where task matches some regex, or exactly matches something in a set

@giovanni-guidini
Copy link
Contributor Author

can sort of do this with label filters in grafana. filter where task matches some regex, or exactly matches something in a set

Do you mean by that that I can extract a new label from the task name and use this new label in filters OR that I'd have to use 1 expression per task_group and edit all dashboards that do that if a new group is introduced?

If it's option 1 I'm OK with that. If it's option 2 I'd rather make my life simpler with this extra label

thanks @joseph-sentry for the suggestion.

Also avoids index errors by checking that the task name includes at least one '.'
Copy link
Contributor

@joseph-sentry joseph-sentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on last nit comment

@@ -35,16 +36,22 @@
)


def _get_task_group_from_name(task_name: Optional[str]) -> Optional[str]:
if task_name is None or ("." not in task_name):
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] do we want this to return None or "unknown_group"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought "unknown_group" but after checking the existing dashboards for the tasks I noticed that it's possible to have the task_name there are None... so just being consistent with returning None as well

@giovanni-guidini
Copy link
Contributor Author

As mentioned by @matt-codecov it's possible to get what I wanted using the label_replace function in grafana.

So no need to send this extra tag in the metric

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants