Pagerduty alert text is not updated by handler #14
Comments
Are you setting the incident_key field? https://developer.pagerduty.com/documentation/integration/events/trigger -> if incident_key is set, this event will be appended to that incident's log, but the text might remain the same |
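For reference, a trigger sent to the Events API described at that link carries the incident_key alongside the description, roughly like this (the key value below is made up):
{
  "service_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "event_type": "trigger",
  "incident_key": "client.example.com/check-example",
  "description": "CheckDisk WARNING: / 71%"
}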
@analytically No, I'm not setting the incident_key field. Example configuration: The PagerDuty plugin was installed using [...]. This example uses the [...].

/etc/sensu/plugins/check-example:

#!/bin/bash
# Always exits 1, which Sensu treats as a WARNING
echo "$(date)"
exit 1

/etc/sensu/conf.d/check-example.json:

{
  "checks": {
    "disk": {
      "command": "/etc/sensu/plugins/check-example",
      "handlers": [
        "pagerduty"
      ],
      "interval": 10,
      "subscribers": [
        "all"
      ]
    }
  }
}

/etc/sensu/conf.d/handler-pagerduty.json:

{
  "handlers": {
    "pagerduty": {
      "command": "/opt/sensu/embedded/bin/handler-pagerduty.rb",
      "type": "pipe"
    }
  },
  "pagerduty": {
    "api_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
  }
}
|
I think what I'm asking for is that the description be updated every time the alert is triggered: |
What I just wrote was wrong. For this it doesn't really make sense either, because Sensu would then have to keep track of outputs instead of exit codes. |
Also, the documentation you linked makes it look like the description of the alert is immutable. |
I wonder if, now that there is the concept of an alert and an incident, we could create an alert, attach it to an incident, and then close and open new incidents with the same alert to keep track of history? BTW, I still don't like this; I just can't think of a way to solve it nicely given the constraints of Sensu and PagerDuty. |
I would still be interested in a long-term solution. We have the exact same problem. The problem is that only the last issue gets resolved. Examples: |
I would argue this isn't an issue with the handler but with how Sensu handles state transitions. Additionally, it's a hard problem to solve: since Sensu only keeps a finite history, you can always think of a case that will go unresolved (ok -> w -> C (10000 times) -> ok). |
I will ping some internal Sensu folks and get their take on it, but I agree there is really not much I see we can do given how Sensu handles state transitions and also how the PagerDuty API works. |
Also, I had a conversation with PagerDuty about this a while ago, and we can't change the actual severity of the event once it has been sent to PagerDuty. |
The only thing I can think of is the hack outlined here: #30 (comment) |
@eliten00b to properly manage a PagerDuty incident for each Sensu check severity transition, this handler plugin could use an incident key that includes severity. The handler could then check for the existence of a PagerDuty incident for the previous severity (Sensu event check history -1), and resolve that incident if it exists. |
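A rough sketch of that approach, assuming the Sensu 1.x event format (event['check']['history'], event['client']['name'], etc.) and the Events API v1 endpoint; the key layout and helper names are illustrative, not what the handler currently does:

require 'json'
require 'net/http'
require 'uri'

SEVERITY = { 0 => 'ok', 1 => 'warning', 2 => 'critical' }.freeze

# Compose an incident key that includes the check severity, e.g.
# "client.example.com/disk/critical" instead of "client.example.com/disk".
def incident_key(event, status)
  "#{event['client']['name']}/#{event['check']['name']}/#{SEVERITY.fetch(status, 'unknown')}"
end

# Send a trigger or resolve event to the PagerDuty Events API (v1 endpoint).
def send_event(service_key, event_type, key, description)
  uri = URI('https://events.pagerduty.com/generic/2010-04-15/create_event.json')
  payload = {
    'service_key'  => service_key,
    'event_type'   => event_type,       # 'trigger' or 'resolve'
    'incident_key' => key,
    'description'  => description
  }
  Net::HTTP.post(uri, payload.to_json, 'Content-Type' => 'application/json')
end

# On a severity transition (history[-2] != history[-1]), resolve the incident
# keyed to the previous severity, then trigger one keyed to the new severity.
def handle(event, service_key)
  history  = event['check']['history'].map(&:to_i) # assumed field layout
  current  = history[-1]
  previous = history[-2]
  if previous && previous != current && previous != 0
    send_event(service_key, 'resolve', incident_key(event, previous),
               'resolved by severity transition')
  end
  if current != 0
    send_event(service_key, 'trigger', incident_key(event, current),
               event['check']['output'])
  end
end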
@zbintliff I don't see what "how sensu handles state transitions" has to do with this issue, can you elaborate and provide an example? |
@majormoses I see this as a PagerDuty API limitation, being able to update the incident description would be rad, and make things considerably simpler. |
@portertech Sensu itself doesn't issue a RESOLVE event on a state transition (2 -> 1), only when it hits 0. Having a handler take on that logic is a bit out of scope for its responsibility. If the handler takes responsibility for it, then we need to know the check occurrences required to alert and look that many entries back in history. Then we also need to check whether the previous state happened enough times to actually alert. For example, if I have occurrences set to 5 and my history is 0,1,2,2,2,2,2, we will trigger a critical. We go back 5 and see a 1, but it didn't happen enough times to actually alert with a warning. |
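Put differently, pushing that logic into the handler would mean something like the following lookback (a sketch only; event['check']['occurrences'] and event['check']['history'] are assumed to carry the configured threshold and the past statuses, per the Sensu 1.x event format):

# Sketch: decide whether the *previous* severity ever met the alert threshold.
def previous_severity_alerted?(event)
  required = event['check']['occurrences'] || 1
  history  = event['check']['history'].map(&:to_i)
  previous = history[-2]
  return false if previous.nil? || previous.zero?

  # Count consecutive entries of that severity immediately before the current one.
  run = history[0..-2].reverse.take_while { |s| s == previous }.size
  run >= required
end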
btw I agree updating description on an alert would be nice. |
@zbintliff I would leverage PagerDuty's API and its use of composed incident keys and the existence of a matching open incident, instead of having to consider Sensu event filtering methods (which could also be customized and added to by a user). |
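Checking for a matching open incident could look roughly like this, assuming the REST API v2 /incidents endpoint and its incident_key filter, and a REST API token that is separate from the Events API integration key in the handler config:

require 'json'
require 'net/http'
require 'uri'

# Sketch: return true if an open (triggered/acknowledged) incident exists for
# the given incident key.
def open_incident?(rest_api_token, key)
  uri = URI('https://api.pagerduty.com/incidents')
  uri.query = URI.encode_www_form(
    'incident_key' => key,
    'statuses[]'   => %w[triggered acknowledged]
  )
  req = Net::HTTP::Get.new(uri)
  req['Authorization'] = "Token token=#{rest_api_token}"
  req['Accept']        = 'application/vnd.pagerduty+json;version=2'
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  JSON.parse(res.body)['incidents'].any?
end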
Thanks for all the responses. I fully agree that everything would be a workaround because of the PagerDuty API limitation. I will try to solve the problem with a filter. |
What would you say the right move is here? I think documenting this as a known limitation of PagerDuty's API, with a workaround of using filtering, would be handy. Could be in a callout in the main README.md. Thoughts? |
Definitely we should document this behavior, as the text will not update and there is no workaround. There is overlap with #30, which does have workarounds, even if they are hacky. Going forward on #30 I would guess that we filter events in the handler (I know we are supposed to be trending away from that, but I see no way around this) and use |
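For the documentation, the mechanics of attaching a filter to this handler might be sketched like this (the filter name and attributes are placeholders; the actual filter logic for the workaround is up to the user):
{
  "filters": {
    "example_filter": {
      "negate": false,
      "attributes": {
        "occurrences": 1
      }
    }
  },
  "handlers": {
    "pagerduty": {
      "command": "/opt/sensu/embedded/bin/handler-pagerduty.rb",
      "type": "pipe",
      "filters": [
        "example_filter"
      ]
    }
  }
}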
Hello from the PagerDuty Team! Following up to confirm this is a limitation in the PagerDuty API rather than in the Sensu integration itself. At the moment, incidents are immutable. I've submitted a feature request on behalf of the maintainer of this integration so our product team knows mutable incidents are important to our customers. If you have any questions, please feel free to reach out to [email protected]. |
Thanks for confirming this. |
The alert text in Pagerduty does not change even when the output from the check has changed.
This means alerts can get suddenly worse, without me being aware of it.
Example:
Suppose I have a disk usage check which runs every ten minutes.
The disk is filling up so the check begins alerting (as seen in Uchiwa):
CheckDisk WARNING: / 71%
I get a Pagerduty alert with text:
CheckDisk WARNING: / 71%
Ten minutes later, the disk is filling up really fast so the alert changes (again as seen in Uchiwa):
CheckDisk CRITICAL: / 95%
But the alert in Pagerduty still says:
CheckDisk WARNING: / 71%