An alert that starts as a warning and goes to critical doesn't send the resolution state #30
Comments
Hmm, I am seeing the same and need to spend some time looking into this. I wonder if maybe we can work around this using event suppression rules?
Taking a look, these are the only valid actions: https://sensuapp.org/docs/0.25/reference/events.html#how-are-sensu-events-created
I honestly have not played much with flapping in Sensu, but this might help with some of the situations, though not all. By default a handler has handle_flapping set to true. https://sensuapp.org/docs/0.24/reference/handlers.html#handler-configuration
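For reference, explicitly opting a handler out of flapping events would look something like this sketch (the handler name and command are placeholders, not from this repo's docs):

{
  "handlers": {
    "pagerduty": {
      "type": "pipe",
      "command": "handler-pagerduty.rb",
      "handle_flapping": false
    }
  }
}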
I checked on the PagerDuty side, and event suppression can only be done on initial ingest.
@eheydrick what are your thoughts? This is not really PagerDuty-specific, nor can I think of a PagerDuty-specific workaround.
I checked with PagerDuty, and event suppression will not serve as a workaround, unless we want to resolve on state change (I am torn on this). I think we might want to pose a generic question to the Sensu community and see what people think.
Hi, the workaround we had to go for was to include the resolved status in warning alerts. Far from ideal, but it does the job! Dave.
@DaveWitchalls did you apply this via a filter or mutator? I have not had any time to really look into this much and would like to try out your workaround and see if it works for us.
Hi @majormoses, I asked the guy who did the work for me and got the below. It's reasonably long-winded, as it doesn't make much sense out of context. Hope it helps!

NOTES ABOUT OUR USE OF SENSU

Therefore, if a check state changes from CRITICAL to WARNING, we use a mutator to generate a fake RESOLVE message to be sent to PagerDuty to clear the alert.

For checks which only require email alerts, use the following handlers list:

For checks which require both email and PagerDuty alerts (critical only), use the following handlers list:

FILTER CONFIGURATION

/etc/sensu/conf.d/filters/alert_filters.json

NOTE the use of a custom key/value pair - remind_every - which is set up on the check. If not set, it will default to emailing reminders every 20 occurrences.

{
"filters": {
"mail_alert_filter": {
"negate": false,
"attributes": {
"action": "create",
"occurrences": "eval: value == :::check.occurrences|5::: || value % :::check.remind_every|20::: == 0"
}
},
"pagerduty_alert_filter": {
"negate": false,
"attributes": {
"check": {
"status": 2
},
"action": "create",
"occurrences": "eval: value == :::check.occurrences|5:::"
}
}
}
}

/etc/sensu/conf.d/filters/recovery_filters.json

{
"filters": {
"recovery_filter": {
"negate": false,
"attributes": {
"action": "resolve",
"occurrences": "eval: value >= :::check.occurrences|5:::"
}
},
"resolve_on_warning_filter": {
"negate": false,
"attributes": {
"check": {
"status": 1
},
"action": "create",
"occurrences": "eval: value == 1"
}
}
}
}

HANDLER CONFIGURATION

/etc/sensu/conf.d/handlers/mail_handlers.json

{
"handlers": {
"mail_alert_handler": {
"type": "pipe",
"command": "handler-mailer.rb -s 'SENSU TEST'",
"filter": "mail_alert_filter"
},
"mail_recovery_handler": {
"type": "pipe",
"command": "handler-mailer.rb -s 'SENSU TEST'",
"filter": "recovery_filter"
},
"mail_resolve_on_warning_handler": {
"type": "pipe",
"command": "handler-mailer.rb -s 'SENSU TEST'",
"filter": "resolve_on_warning_filter",
"mutator": "mail_resolve_on_warning_mutator"
},
"mail_handler": {
"type": "pipe",
"command": "handler-mailer.rb -s 'SENSU TEST'"
}
},
"mailer": {
"admin_gui": "https://xxx.xxx.xxx.xxx/",
"mail_from": "[email protected]",
"mail_to": ["[email protected]"],
"smtp_address": "127.0.0.1",
"smtp_port": "25",
"smtp_domain": "xxxxxxxxx.xxxxxxxxx"
}
}

/etc/sensu/conf.d/handlers/pagerduty_handlers.json

{
"handlers": {
"pagerduty_alert_handler": {
"type": "pipe",
"command": "handler-pagerduty.rb",
"filter": "pagerduty_alert_filter"
},
"pagerduty_recovery_handler": {
"type": "pipe",
"command": "handler-pagerduty.rb",
"filter": "recovery_filter"
},
"pagerduty_resolve_on_warning_handler": {
"type": "pipe",
"command": "handler-pagerduty.rb",
"filter": "resolve_on_warning_filter",
"mutator": "pagerduty_resolve_on_warning_mutator"
},
"pagerduty_handler": {
"type": "pipe",
"command": "handler-pagerduty.rb"
}
},
"pagerduty": {
"api_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
}

MUTATOR CONFIGURATION

/etc/sensu/conf.d/mutators/mail_mutators.json

{
"mutators": {
"mail_resolve_on_warning_mutator": {
"command": "/etc/sensu/mutators/mutator-mail-resolve-on-warning.rb"
}
}
}

/etc/sensu/conf.d/mutators/pagerduty_mutators.json

{
"mutators": {
"pagerduty_priority_override_mutator": {
"command": "mutator-pagerduty-priority-override.rb"
},
"pagerduty_resolve_on_warning_mutator": {
"command": "/etc/sensu/mutators/mutator-pagerduty-resolve-on-warning.rb"
}
}
}

MUTATORS

/etc/sensu/mutators/mutator-mail-resolve-on-warning.rb

#!/usr/bin/env ruby
require 'json'

module Sensu
  module Mutator
    class Mail
      class ResolveOnWarning
        def execute(input = STDIN)
          event = JSON.parse(input.read, symbolize_names: true)
          occurrences = event[:check][:occurrences]
          history = event[:check][:history]

          ### EDGE CASE CHECKS ###
          # Check if number of occurrences is > 20 (max history length) - if so change it to 20.
          occurrences = 20 if occurrences > 20
          # Exit if length of history < occurrences + 1 as it can't have been in an alert state beforehand.
          exit 1 if history.length < (occurrences + 1)
          ########################

          # Compare the history entries immediately before the current status against an
          # array of critical ("2") statuses; only then emit a fake resolve action.
          test_array = Array.new(occurrences, "2")
          if history[-(occurrences + 1), occurrences] == test_array
            event[:action] = 'resolve'
            event[:check][:occurrences] = 1
            JSON.dump(event)
          else
            exit 1
          end
        end
      end
    end
  end
end

## Is called from Gem script. Program name is full path to this script
### __FILE__ is the initial script run, which is
### /etc/sensu/mutators/mutator-mail-resolve-on-warning.rb
if $PROGRAM_NAME.include?(__FILE__.split('/').last)
  mutator = Sensu::Mutator::Mail::ResolveOnWarning.new
  puts mutator.execute
end

/etc/sensu/mutators/mutator-pagerduty-resolve-on-warning.rb

#!/usr/bin/env ruby
require 'json'

module Sensu
  module Mutator
    class PagerDuty
      class ResolveOnWarning
        def execute(input = STDIN)
          event = JSON.parse(input.read, symbolize_names: true)
          occurrences = event[:check][:occurrences]
          history = event[:check][:history]

          ### EDGE CASE CHECKS ###
          # Check if number of occurrences is > 20 (max history length) - if so change it to 20.
          occurrences = 20 if occurrences > 20
          # Exit if length of history < occurrences + 1 as it can't have been in an alert state beforehand.
          exit 1 if history.length < (occurrences + 1)
          ########################

          # Compare the history entries immediately before the current status against an
          # array of critical ("2") statuses; only then emit a fake resolve action.
          test_array = Array.new(occurrences, "2")
          if history[-(occurrences + 1), occurrences] == test_array
            event[:action] = 'resolve'
            event[:check][:occurrences] = 1
            JSON.dump(event)
          else
            exit 1
          end
        end
      end
    end
  end
end

## Is called from Gem script. Program name is full path to this script
### __FILE__ is the initial script run, which is
### /etc/sensu/mutators/mutator-pagerduty-resolve-on-warning.rb
if $PROGRAM_NAME.include?(__FILE__.split('/').last)
  mutator = Sensu::Mutator::PagerDuty::ResolveOnWarning.new
  puts mutator.execute
end
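For context, a hypothetical check definition that wires these filters, handlers, and mutators together might look like the sketch below; the check name, command, thresholds, and remind_every value are illustrative and were not part of the original notes:

{
  "checks": {
    "check_cpu": {
      "command": "check-cpu.rb -w 50 -c 70",
      "subscribers": ["all"],
      "interval": 60,
      "occurrences": 5,
      "remind_every": 20,
      "handlers": [
        "mail_alert_handler",
        "mail_recovery_handler",
        "mail_resolve_on_warning_handler",
        "pagerduty_alert_handler",
        "pagerduty_recovery_handler",
        "pagerduty_resolve_on_warning_handler"
      ]
    }
  }
}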
@DaveWitchalls thanks for the info. I will take a look and see if I can find some bastard amalgamation based on yours that works for us.
Interesting, though considering this I am not sure it's worth the effort for me right now to use a filter, now that occurrences is an extension: https://github.com/sensu-extensions/sensu-extensions-occurrences
Sensu events now have an "occurrences_watermark"; the Sensu built-in "occurrence" filter now uses it instead of "occurrences" for the purpose of the resolve action. These changes are in the Sensu Core 0.29 release.
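As a sketch of what that enables, a check that leans on the built-in occurrences filter could then be as simple as the following (the check name, command, and values are illustrative, not from the release notes):

{
  "checks": {
    "check_cpu": {
      "command": "check-cpu.rb -w 50 -c 70",
      "subscribers": ["all"],
      "interval": 60,
      "occurrences": 3,
      "handlers": ["pagerduty"]
    }
  }
}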
OK, cool, we can have a reasonable path forward when 0.29 is supported by this plugin...
@majormoses Any thoughts on whether/when that might happen? (For the record, Sensu skipped from 0.29 to 1.0.0, and subsequently 1.0.2, in July.)
When someone is motivated enough and has the time to work on it. There are ~200 plugins and realistically 2 active maintainers (neither of us working for Sensu), and we rely mostly on other community members to contribute. Beyond my boilerplate 🤷‍♂️ answer of when... taking a quick look at the plugin, I don't think it would be too hard.
Does this plugin not work with Sensu >= 0.29? Or is there just work required to support changes related specifically to this issue?
This plugin does work with all recent versions of Sensu; I am currently running Sensu 1.1.1 and do not have issues. The comment referenced is regarding a workaround for moving from warning -> critical -> warning. The idea was to auto-resolve the incident and create a new one based on the new state.
Hello from the PagerDuty Team! Following up to confirm this is a limitation in the PagerDuty API rather than in the Sensu integration itself. At the moment, incidents are immutable, so the parent incident can't be updated when the severity changes. You can click into the newest alert itself to get the latest data. I've submitted a feature request on behalf of the maintainer of this integration so our product team knows mutable incidents are important to our customers. If you have any questions, please feel free to reach out to [email protected].
Thanks for confirming this.
I recently did some testing with this issue, and it looks like PagerDuty has updated their API to allow for escalating an incident from warning/low urgency to critical/high if the incident key matches. For testing, we triggered an alert starting as a warning, which opened a low-urgency incident in PagerDuty. I then changed the alert to the critical threshold, which escalated the incident to high urgency in PagerDuty. One thing to note is that it will not de-escalate back to a low-urgency incident.
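To illustrate, here are two hypothetical Events API v2 trigger payloads (POSTed to https://events.pagerduty.com/v2/enqueue) that share a dedup_key; sending the second while the first incident is still open is what escalates the urgency. The routing key, dedup key, and summaries are placeholders:

{
  "routing_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "event_action": "trigger",
  "dedup_key": "check-cpu/web-01",
  "payload": {
    "summary": "CPU usage above warning threshold on web-01",
    "source": "web-01",
    "severity": "warning"
  }
}

{
  "routing_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "event_action": "trigger",
  "dedup_key": "check-cpu/web-01",
  "payload": {
    "summary": "CPU usage above critical threshold on web-01",
    "source": "web-01",
    "severity": "critical"
  }
}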
I think this is semi by design, but any views appreciated.
We've come across this during testing, using CPU usage as an example:
If we set a warning threshold of 50%, an email gets sent. If it then goes to critical at 70%, the alert in PagerDuty is triggered. The problem for us is that if the CPU goes back below the critical threshold, but is still within the warning threshold, the resolution isn't sent to PagerDuty, as the incident is still live even though it is no longer in a critical state.
Any thoughts or workarounds for this?
Thanks,
Dave.