-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add task group to metrics #399
Conversation
We have task health metrics (success, failure, ...) But we also have a lot of tasks. And the graphs in grafana don't look amazing with all those tasks there. In particular I'd like to have general success / failure rates per task group. "task_group" is the task_config_group (part of the task name). This can be used so we know at any given time how's the general health for the group as a whole.
Codecov ReportAll modified and coverable lines are covered by tests ✅ ✅ All tests successful. No failed tests found @@ Coverage Diff @@
## main #399 +/- ##
==========================================
+ Coverage 97.48% 97.50% +0.01%
==========================================
Files 395 395
Lines 33271 33265 -6
==========================================
- Hits 32435 32434 -1
+ Misses 836 831 -5
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Changes have been made to critical files, which contain lines commonly executed in production. Learn more ✅ All tests successful. No failed tests found Additional details and impacted files@@ Coverage Diff @@
## main #399 +/- ##
=======================================
Coverage 97.50% 97.50%
=======================================
Files 426 426
Lines 33971 33983 +12
=======================================
+ Hits 33123 33135 +12
Misses 848 848
Flags with carried forward coverage won't be shown. Click here to find out more.
This change has been scanned for critical changes. Learn more |
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found @@ Coverage Diff @@
## main #399 +/- ##
=======================================
Coverage 97.48% 97.48%
=======================================
Files 395 395
Lines 33271 33283 +12
=======================================
+ Hits 32435 32447 +12
Misses 836 836
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found Additional details and impacted files@@ Coverage Diff @@
## main #399 +/- ##
=======================================
Coverage 97.48% 97.48%
=======================================
Files 395 395
Lines 33271 33283 +12
=======================================
+ Hits 32435 32447 +12
Misses 836 836
Flags with carried forward coverage won't be shown. Click here to find out more.
|
tasks/base.py
Outdated
cls.name = name | ||
# All task names follow the format `app.[cron|task].<task_group>.<task>` | ||
task_group = name.split(".")[-2] if name != "unknown_task" else "unknown_group" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this give an error if the length of name.split('.')
is 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but all tasks are supposed to have a task_group
in their name (separated by '.'
. Check the celery config
tasks/base.py
Outdated
@@ -42,49 +42,58 @@ def metrics_prefix(self): | |||
|
|||
def on_timeout(self, soft: bool, timeout: int): | |||
res = super().on_timeout(soft, timeout) | |||
task_group = ( | |||
self.name.split(".")[-2] if self.name is not None else "unknown_group" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] this could be in a function since you are doing it somewhere else as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also are we positive all task names definitely fit this pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are supposed to fit this pattern. It's the pattern we use for the task routes in celery config
I can add a try/except just in case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can sort of do this with label filters in grafana. filter where task matches some regex, or exactly matches something in a set
Do you mean by that that I can extract a new label from the task name and use this new label in filters OR that I'd have to use 1 expression per task_group and edit all dashboards that do that if a new group is introduced? If it's option 1 I'm OK with that. If it's option 2 I'd rather make my life simpler with this extra label |
thanks @joseph-sentry for the suggestion. Also avoids index errors by checking that the task name includes at least one '.'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on last nit comment
@@ -35,16 +36,22 @@ | |||
) | |||
|
|||
|
|||
def _get_task_group_from_name(task_name: Optional[str]) -> Optional[str]: | |||
if task_name is None or ("." not in task_name): | |||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] do we want this to return None
or "unknown_group"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I thought "unknown_group" but after checking the existing dashboards for the tasks I noticed that it's possible to have the task_name there are None
... so just being consistent with returning None
as well
As mentioned by @matt-codecov it's possible to get what I wanted using the label_replace function in grafana. So no need to send this extra tag in the metric |
We have task health metrics (success, failure, ...)
But we also have a lot of tasks. And the graphs in grafana don't look amazing with all those tasks there.
In particular I'd like to have general success / failure rates per task group.
"task_group" is the task_config_group (part of the task name).
This can be used so we know at any given time how's the general health for the group as a whole.
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.