Decouple severity from Nagios plugin return codes
This change is backwards-compatible.

Nagios plugins for non-trivial services will often poke multiple data
points looking for anomalous or undesirable behaviour.  Each data point
is often reflected with its own check Result in nagiosplugin.  If all
data points are within tolerance, we can rely on nagiosplugin to return
an OK result to its parent process.  Similarly, if one or more data
points are outside tolerance, we can rely on nagiosplugin to return a
non-OK (WARNING or CRITICAL) result to its parent process.  This pattern
greatly simplifies plugin development:  plugins may focus on the
monotonous task of data-gathering, leaving the ultimate 'result join' to
the nagiosplugin library.

Life gets complicated when one or more data points cannot be queried
(for whatever reason).  Prior to this change, any single UNKNOWN Result
would have superseded all other batched check Results, including
WARNING and CRITICAL Results.  Amongst organisations that do not treat
an individual UNKNOWN result as a pageable event, this behaviour
suppressed actual failures (see #8).
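
To make the old behaviour concrete, here is a minimal standalone sketch
(not the library's code).  It assumes only the conventional 0-3 return
codes; the helper name worstByReturnCode is hypothetical.  Joining
results by raw return code lets UNKNOWN (3) mask CRITICAL (2):

```go
package main

import "fmt"

// Status mirrors the conventional Nagios plugin return codes (0-3).
type Status uint

const (
	OK Status = iota
	WARNING
	CRITICAL
	UNKNOWN
)

// worstByReturnCode is a hypothetical helper, not part of nagiosplugin.
// It joins results the way the library did before this change: the
// highest numeric return code wins.
func worstByReturnCode(results []Status) Status {
	worst := OK
	for _, s := range results {
		if s > worst {
			worst = s
		}
	}
	return worst
}

func main() {
	// One data point is genuinely broken, another could not be queried.
	results := []Status{OK, CRITICAL, UNKNOWN}

	// UNKNOWN has the highest return code, so it hides the CRITICAL.
	fmt.Println(worstByReturnCode(results) == UNKNOWN) // prints true
}
```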

This commit introduces the notion of a 'status policy':  a mapping
between a conventional check status and its severity, relative to other
statuses.  Results are now ordered by severity instead of the fixed
numeric constants defined by Nagios.  Organisations may now prioritise
plugin return codes to match their established monitoring policy.  Unit
tests in check_test.go demonstrate use.

This decoupling is invisible by default.  The default status policy
mimics old behaviour.  To enable alternative severity prioritisation,
the caller must invoke the new NewCheckWithOptions() initialiser.
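
The idea behind a status policy can be sketched in isolation as below.
This is an illustrative stand-in rather than the code in this commit:
the names policy, newPolicy and worst are hypothetical, and the sketch
skips the completeness check that NewStatusPolicy() performs.  It shows
the same pair of results resolving differently under two orderings:

```go
package main

import "fmt"

// Status mirrors the conventional Nagios plugin return codes (0-3).
type Status uint

const (
	OK Status = iota
	WARNING
	CRITICAL
	UNKNOWN
)

// policy is a hypothetical stand-in for a status policy: it maps each
// status to a relative severity rank. Only the ordering matters.
type policy map[Status]int

// newPolicy ranks statuses by their position in a slice listed in
// ascending severity order. Unlike the real initialiser, it does not
// verify that every status is present.
func newPolicy(ascending []Status) policy {
	p := make(policy)
	for i, s := range ascending {
		p[s] = i
	}
	return p
}

// worst returns the most severe status in results according to p,
// starting from OK as an empty check would.
func (p policy) worst(results []Status) Status {
	worst := OK
	for _, s := range results {
		if p[s] > p[worst] {
			worst = s
		}
	}
	return worst
}

func main() {
	results := []Status{WARNING, UNKNOWN}

	// Conventional ordering: UNKNOWN outranks everything.
	conventional := newPolicy([]Status{OK, WARNING, CRITICAL, UNKNOWN})
	// Alternative ordering: UNKNOWN ranks just above OK.
	custom := newPolicy([]Status{OK, UNKNOWN, WARNING, CRITICAL})

	fmt.Println(conventional.worst(results) == UNKNOWN) // prints true
	fmt.Println(custom.worst(results) == WARNING)       // prints true
}
```

With the second ordering an unreachable data point no longer hides a
WARNING or CRITICAL from the parent process.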
Saj Goonatilleke committed Jan 4, 2016
1 parent 72f8b19 commit 07fe3c5
Showing 4 changed files with 135 additions and 9 deletions.
41 changes: 37 additions & 4 deletions check.go
@@ -19,14 +19,46 @@ func Exit(status Status, message string) {

// Represents the state of a Nagios check.
type Check struct {
results []Result
perfdata []PerfDatum
status Status
results []Result
perfdata []PerfDatum
status Status
statusPolicy *statusPolicy
}

// CheckOptions contains knobs that modify default Check behaviour. See
// NewCheckWithOptions().
type CheckOptions struct {

// StatusPolicy defines the relative severity of different check
// results by status value.
//
// A Nagios plugin must ultimately report a single status to its
// parent process (OK, CRITICAL, etc.). nagiosplugin allows plugin
// developers to batch multiple check results in a single plugin
// invocation. The most severe result will be reflected in the
// plugin's final exit status. By default, results are prioritised
// by the numeric 'plugin return codes' defined by the Nagios Plugin
// Development Guidelines. Results with CRITICAL status will take
// precedence over WARNING, WARNING over OK, and UNKNOWN over all
// other results. This ordering may be tailored with a custom
// policy. See NewStatusPolicy().
StatusPolicy *statusPolicy
}

// NewCheck returns an empty Check object.
func NewCheck() *Check {
c := new(Check)
c.statusPolicy = NewDefaultStatusPolicy()
return c
}

// NewCheckWithOptions returns an empty Check object with
// caller-specified behavioural modifications. See CheckOptions.
func NewCheckWithOptions(options CheckOptions) *Check {
c := NewCheck()
if options.StatusPolicy != nil {
c.statusPolicy = options.StatusPolicy
}
return c
}

@@ -38,7 +70,8 @@ func (c *Check) AddResult(status Status, message string) {
result.status = status
result.message = message
c.results = append(c.results, result)
if result.status > c.status {

if (*c.statusPolicy)[result.status] > (*c.statusPolicy)[c.status] {
c.status = result.status
}
}
28 changes: 28 additions & 0 deletions check_test.go
@@ -3,6 +3,7 @@ package nagiosplugin
import (
"fmt"
"math/rand"
"strings"
"testing"
"time"
)
@@ -22,3 +23,30 @@ func TestCheck(t *testing.T) {
t.Errorf("Expected check output %v, got check output %v", expected, result)
}
}

func TestDefaultStatusPolicy(t *testing.T) {
c := NewCheck()
c.AddResult(WARNING, "Isolated-frame flux emission outside threshold")
c.AddResult(UNKNOWN, "No response from betaform amplifier")

expected := "UNKNOWN"
actual := strings.SplitN(c.String(), ":", 2)[0]
if actual != expected {
t.Errorf("Expected %v status, got %v", expected, actual)
}
}

func TestCustomStatusPolicy(t *testing.T) {
p, _ := NewStatusPolicy([]Status{OK, UNKNOWN, WARNING, CRITICAL})
c := NewCheckWithOptions(CheckOptions{
StatusPolicy: p,
})
c.AddResult(WARNING, "Isolated-frame flux emission outside threshold")
c.AddResult(UNKNOWN, "No response from betaform amplifier")

expected := "WARNING"
actual := strings.SplitN(c.String(), ":", 2)[0]
if actual != expected {
t.Errorf("Expected %v status, got %v", expected, actual)
}
}
55 changes: 50 additions & 5 deletions result.go
@@ -1,16 +1,61 @@
package nagiosplugin

import "fmt"

// Nagios plugin exit status.
type Status uint

// The usual mapping from 0-3.
// https://nagios-plugins.org/doc/guidelines.html#AEN78
const (
OK Status = iota
WARNING
CRITICAL
UNKNOWN
)

type (
// Check results are ordered by severity: only the most severe check
// result will be captured in the plugin's exit status. A status
// policy is used to define severity as a function of check status.
// Higher relative statusSeverity values assign higher severity to a
// status. (Absolute values are insignificant.)
statusSeverity uint
statusPolicy map[Status]statusSeverity
)

// NewDefaultStatusPolicy returns a status policy that assigns relative
// severity in accordance with conventional Nagios plugin return codes.
// Statuses associated with higher return codes are more severe.
func NewDefaultStatusPolicy() *statusPolicy {
return &statusPolicy{
OK: statusSeverity(OK),
WARNING: statusSeverity(WARNING),
CRITICAL: statusSeverity(CRITICAL),
UNKNOWN: statusSeverity(UNKNOWN),
}
}

// NewStatusPolicy returns a status policy that assigns relative
// severity in accordance with a user-configurable prioritised slice.
// Check statuses must be listed in ascending severity order.
func NewStatusPolicy(statuses []Status) (*statusPolicy, error) {
newPol := make(statusPolicy)
for i, status := range statuses {
newPol[status] = statusSeverity(i)
}

// Ensure all statuses are covered by the new policy.
defaultPol := NewDefaultStatusPolicy()
for status := range *defaultPol {
_, ok := newPol[status]
if !ok {
return nil, fmt.Errorf("missing status: %v", status)
}
}

return &newPol, nil
}

// Returns string representation of a Status. Panics if given an invalid
// status (this will be recovered in check.Finish if it has been deferred).
func (s Status) String() string {
@@ -27,10 +72,10 @@ func (s Status) String() string {
panic("Invalid nagiosplugin.Status.")
}

// Result is a combination of a Status and infotext. A check can have
// multiple of these, and only the most important (greatest badness)
// will be reported on the first line of output or represented in the
// plugin's exit status.
// Result encapsulates a machine-readable result code and a
// human-readable description of a problem. A check may have multiple
// Results. Only the most severe Result will be reported on the first
// line of plugin output and in the plugin's exit status.
type Result struct {
status Status
message string
20 changes: 20 additions & 0 deletions result_test.go
@@ -0,0 +1,20 @@
package nagiosplugin

import (
"testing"
)

func TestNewStatusPolicyAcceptsCompleteStatuses(t *testing.T) {
_, err := NewStatusPolicy([]Status{OK, UNKNOWN, WARNING, CRITICAL})
if err != nil {
t.Errorf("NewStatusPolicy(): %v", err)
}
}

func TestNewStatusPolicyRejectsIncompleteStatuses(t *testing.T) {
// Missing UNKNOWN.
_, err := NewStatusPolicy([]Status{OK, WARNING, CRITICAL})
if err == nil {
t.Errorf("expected NewStatusPolicy() to return an error")
}
}
