Document monitoring, metrics, tracing, observability and alerting #4797

Open
1 task done
AgaDufrat opened this issue Sep 12, 2024 · 3 comments · Fixed by #4863

Comments

@AgaDufrat
Contributor

AgaDufrat commented Sep 12, 2024

As an engineer on GOV.UK,
I want to know how to configure monitoring, metrics, tracing, observability and alerting for my applications,
so that we can detect and resolve issues proactively, maintain performance and improve reliability through real-time insight into application health and behaviour, and inform product decisions.

Current documentation

Logging

How logging works on GOV.UK
Request tracing

Monitoring

Debug underperforming search - I've asked Search team to review and probably remove this
How we handle errors
Pingdom
Sentry

Alerting

Pingdom Bouncer canary check
Router error ratio too high
Travel Advice or Drug and Medical Device email alerts not sent
Signon API user token expires soon
PagerDuty
Things that may contact on-call - I suggest the specifics get taken out of here and instead link to the relevant pages

[WIP] Missing documentation

[WIP] Documentation that could do with a refresh

@nicholsj
Contributor

For info, the Tech 2nd Line tech leads have a card to document which alerts go to PagerDuty

@nicholsj
Contributor

I made a stab at a diagram of how I think things fit together and where things are documented or not

@nicholsj
Contributor

Closed by accident I think - still more to do here

nicholsj reopened this Nov 19, 2024