
Cluster check #84

Open
dselans opened this issue Feb 5, 2017 · 2 comments

Comments

@dselans
Member

dselans commented Feb 5, 2017

Create a 'cluster'/rollup check.

This would allow you to group multiple checks together and expose the 'cluster' check as a single entity. Thresholds should be percentage based.

Would be nice:

Cluster checks that support the use of 'tags', i.e. when creating the cluster check, you do not have to list specific checks; you just specify one or more tags that other checks use.

example:

monitor:
  exec-cluster-check:
    type: cluster-tags
    description: cluster check for important execs
    interval: 10s
    monitor-tags:
      - very-important
    warning-threshold: 20% # 20 percent of the checks are failing
    critical-threshold: 50% # 50 percent of the checks are failing
    warning-alerter:
      - primary-slack
    critical-alerter:
      - primary-email
    tags:
      - our-cluster-checks

  exec-check1:
    type: exec
    description: exec check test
    timeout: 5s
    command: echo
    args:
      - hello
      - world
    interval: 10s
    return-code: 0
    expect: hello
    warning-threshold: 1
    critical-threshold: 3
    tags:
      - super-exec-checks
      - very-important

  exec-check2:
    type: exec
    description: exec check test
    timeout: 5s
    command: echo
    args:
      - hello
      - world
    interval: 10s
    return-code: 0
    expect: world
    warning-threshold: 1
    critical-threshold: 3
    warning-alerter:
      - primary-slack
    critical-alerter:
      - primary-email
    tags:
      - super-exec-checks
      - very-important

In the above example:

We create an 'exec-cluster-check' that monitors the state of the 2 checks selected through the very-important tag. If 20% of the underlying checks fail, it produces a warning alert; if 50% of the underlying checks fail, it produces a critical alert.
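
To make the percentage thresholds concrete, here is a rough Python sketch of how the rollup could be evaluated. Everything in it is an assumption for illustration: the `Check` class, `parse_percent` and `cluster_status` are hypothetical names, members are selected if they share at least one tag with `monitor-tags`, and a threshold is treated as reached when the failing percentage is greater than or equal to it.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """Hypothetical stand-in for a configured check and its latest state."""
    name: str
    tags: set
    failing: bool


def parse_percent(value: str) -> float:
    """Turn a threshold string such as '20%' into the number 20.0."""
    return float(value.strip().rstrip("%"))


def cluster_status(checks, monitor_tags, warning="20%", critical="50%"):
    """Roll up member checks selected by tag into ok/warning/critical."""
    wanted = set(monitor_tags)
    # Assumption: a check is a member if it carries any of the monitor tags.
    members = [c for c in checks if c.tags & wanted]
    if not members:
        return "unknown"  # nothing matched the tags, so nothing to roll up

    failed_pct = 100.0 * sum(1 for c in members if c.failing) / len(members)

    # Assumption: reaching the threshold exactly counts as crossing it.
    if failed_pct >= parse_percent(critical):
        return "critical"
    if failed_pct >= parse_percent(warning):
        return "warning"
    return "ok"


checks = [
    Check("exec-check1", {"super-exec-checks", "very-important"}, failing=False),
    Check("exec-check2", {"super-exec-checks", "very-important"}, failing=True),
]
print(cluster_status(checks, ["very-important"]))
```

Under these assumptions, a cluster of only two members jumps straight to critical on a single failure (1 of 2 is 50%), so the 20% warning threshold only becomes distinguishable once five or more checks share the tag.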

@relistan
Collaborator

relistan commented Mar 4, 2017

Do you anticipate this check running those checks a second time, or re-using the existing check state from the last run?

@dselans
Member Author

dselans commented Apr 30, 2017

I think this should reuse check state data. Not sure how tricky that could be, though (having only partial state, etc.).
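
One way to picture the partial-state problem is the Python sketch below. It is only an illustration under stated assumptions: `failing_percentage` is a hypothetical helper, `last_states` is an assumed mapping from check name to its last recorded result (`"ok"` or `"failing"`), and a check with no entry is one that has not reported yet. The open question is whether such missing members are ignored or counted as failing, so the sketch exposes that as a flag.

```python
def failing_percentage(member_names, last_states, count_missing_as_failing=False):
    """Compute the failing percentage from stored state, without re-running checks.

    member_names: names of the checks that belong to the cluster.
    last_states:  assumed mapping of check name -> "ok" or "failing";
                  a missing key means no result has been recorded yet.
    Returns None when there is no state to base a percentage on.
    """
    known = [name for name in member_names if name in last_states]
    missing = len(member_names) - len(known)
    failing = sum(1 for name in known if last_states[name] == "failing")

    if count_missing_as_failing:
        # Pessimistic policy: a member without state counts as a failure.
        failing += missing
        total = len(member_names)
    else:
        # Optimistic policy: only members with recorded state are counted.
        total = len(known)

    if total == 0:
        return None
    return 100.0 * failing / total


members = ["exec-check1", "exec-check2"]
states = {"exec-check1": "ok"}  # exec-check2 has not reported yet

print(failing_percentage(members, states))
print(failing_percentage(members, states, count_missing_as_failing=True))
```

With the same stored state, the two policies give 0% and 50% here, which is the difference between no alert and a critical alert under the thresholds in the example config.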
