This checklist provides incident response procedures. For more detail, see the Security Incident Response Plan.
- Contents
- 1. Breathe
- 2. Start documenting
- 3. Initiate the response
- 4. Assess the incident
- 5. Remediate
- 6. Conclude the incident
No one's life is in danger.
Begin documenting all steps and findings. Documentation makes hand-offs and responder onboarding easier. The Slack channel #None is recommended because it is most widely accessible, but other communication channels may be used.
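One way to keep the running record consistent is to timestamp every note in UTC before pasting it into the channel. The following is a minimal sketch, not part of the plan itself; the function name `timeline_entry` and the line layout are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def timeline_entry(author: str, note: str, when: Optional[datetime] = None) -> str:
    """Format one incident-log line with a UTC timestamp.

    Hypothetical helper: the "[timestamp] author: note" layout is an
    assumption, not a format required by the checklist.
    """
    # Default to the current time when the caller does not supply one.
    when = when or datetime.now(timezone.utc)
    # Normalize to UTC so entries from responders in different zones line up.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    return f"[{stamp}] {author}: {note}"


if __name__ == "__main__":
    print(timeline_entry("alice", "Restarted the web service"))
```

Entries in this shape sort chronologically as plain text, which helps when a new responder reads the history during a hand-off.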
At this stage, the First Responder is usually working alone, and is also the Incident Commander (IC).
- Issue a broadcast notification via one or more of the following:
  - Slack channel #None. Use @channel to notify the Project team. This may have been automatic via OpsGenie pager alarms.
  - Email to "on call" system admin: [email protected]
  - Email/telephone to CivicActions/Project IR Team
- For an incident requiring more than 30 minutes to resolve:
  - Recruit additional IR Team responders via the Slack channel #None.
  - Designate an Incident Commander and hand off the IC duties. More information on incident response roles and responsibilities:
    - Responder
    - Incident Commander (IC)
    - Communications Officer (CO)
Use the Explicit Handoff Ceremony when transferring/changing roles.
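The 30-minute threshold can be checked mechanically, for example by a reminder bot. This sketch assumes the threshold is measured as elapsed time since the incident started, which is one reading of the checklist; `needs_escalation` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the checklist; measuring it as elapsed time is an assumption.
ESCALATION_THRESHOLD = timedelta(minutes=30)


def needs_escalation(started_at: datetime, now: datetime, resolved: bool = False) -> bool:
    """Return True once an unresolved incident has run longer than 30 minutes."""
    return (not resolved) and (now - started_at) > ESCALATION_THRESHOLD


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(needs_escalation(start, start + timedelta(minutes=45)))
```

A First Responder who already expects a long incident should not wait for the timer and can recruit help immediately.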
Conclude the incident. Proceed to 6. Conclude the incident.
- Gather information, and document your findings.
- Was the event triggered by an external dependency?
- Is a system failure causing the disruption?
- Proceed to the next step for a confirmed incident. (For a false alarm, proceed to 6. Conclude the incident.)
Use the rubric in the IR guide. (Project incidents are generally "Low severity".)
Consider whether Disaster Recovery is required. If so, activate the contingency plan.
Reminder: Use the Explicit Handoff Ceremony when transferring/changing roles.
- Post an initial situation report, called a sitrep (example sitrep), to the Slack channel #None. Include a descriptive name, and identify the current Incident Commander and Responders.
- Ensure that a JIRA ticket has been created. Do this even if the First Responder/IC manages the incident fully, for example by simply restarting a service.
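The initial sitrep fields named above (a descriptive name, the current Incident Commander, and the Responders) can be captured in a small structure so that every report has the same shape. This is a sketch under assumptions: the `Sitrep` class, its field names, and the rendered layout are hypothetical and do not reproduce the team's example sitrep.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sitrep:
    """Hypothetical container for the sitrep fields the checklist asks for."""

    name: str                 # descriptive incident name
    incident_commander: str   # current IC
    responders: List[str]     # current Responders
    severity: str = "Low"     # Project incidents are generally "Low severity"
    summary: str = ""         # optional free-text status

    def render(self) -> str:
        """Return the sitrep as plain text suitable for pasting into chat."""
        lines = [
            f"SITREP: {self.name}",
            f"Severity: {self.severity}",
            f"Incident Commander: {self.incident_commander}",
            f"Responders: {', '.join(self.responders) or 'none yet'}",
        ]
        if self.summary:
            lines.append(f"Summary: {self.summary}")
        return "\n".join(lines)


if __name__ == "__main__":
    print(Sitrep("Web outage", "alice", ["bob", "carol"]).render())
```

Keeping the Incident Commander on its own line makes a role change easy to spot when later sitreps are compared.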
- Determine the cause, implement a resolution, and return the system to normal operations. Make every attempt to identify the cause; this can prevent incident recurrence.
- If you observe suspicious activity or other unanswered questions remain, do the following before making any changes:
  - Make backup snapshots of relevant volumes and data.
  - Preserve logs.
  - Take screen captures of anomalous activity that can be used in post-remediation forensic analysis.
  - Consider implementing a containment strategy. For example, reconfigure firewall rules for the affected instance to drop all ingress and egress traffic, except from specific IPs like yours, until forensics can be performed.
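The containment example above amounts to an allowlist followed by a default drop. The sketch below models that rule ordering in a vendor-neutral way so it can be reviewed before being translated into a real firewall or security-group configuration. It does not touch any firewall; the `Rule` type and the function names are hypothetical, and it assumes IPv4 only.

```python
import ipaddress
from typing import List, NamedTuple


class Rule(NamedTuple):
    direction: str                   # "ingress" or "egress"
    network: ipaddress.IPv4Network   # peer network the rule matches
    action: str                      # "allow" or "drop"


ANYWHERE = ipaddress.IPv4Network("0.0.0.0/0")


def containment_rules(allowed: List[str]) -> List[Rule]:
    """Build an ordered rule list: allow the given peers, then drop everything."""
    rules: List[Rule] = []
    for cidr in allowed:
        net = ipaddress.IPv4Network(cidr)  # raises ValueError on malformed input
        rules.append(Rule("ingress", net, "allow"))
        rules.append(Rule("egress", net, "allow"))
    # Catch-all drops come last so the allow rules above match first.
    rules.append(Rule("ingress", ANYWHERE, "drop"))
    rules.append(Rule("egress", ANYWHERE, "drop"))
    return rules


def decide(rules: List[Rule], direction: str, peer_ip: str) -> str:
    """Return the action of the first rule matching this direction and peer."""
    addr = ipaddress.IPv4Address(peer_ip)
    for rule in rules:
        if rule.direction == direction and addr in rule.network:
            return rule.action
    return "drop"  # default deny if nothing matched


if __name__ == "__main__":
    plan = containment_rules(["203.0.113.10"])  # documentation-range address
    print(decide(plan, "ingress", "203.0.113.10"), decide(plan, "egress", "198.51.100.7"))
```

Checking the plan against your own address before applying the real rules helps avoid locking the responders out of the instance they need to examine.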
- Maintain current information in Slack, shared Google Docs files, the JIRA Incident ticket, or other communication channels. Be sure to include:
  - Project team leads and members
  - Remediation items and their assignees
- Establish and document work shifts for an incident longer than 3 hours.
- Maintain communications with stakeholders, or designate a Communications Officer via explicit handoff.
- Share sitreps on a regular basis:
  - High severity: hourly
  - Medium severity: twice daily
  - Low severity: daily
- Focus on coordination, not remediation.
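The sitrep cadence above can be expressed as a lookup from severity to interval, which makes it easy to compute when the next report is due. This is a minimal sketch; `next_sitrep_due` is a hypothetical name, and treating the twice-daily cadence as evenly spaced 12-hour intervals is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Intervals follow the checklist cadence.
SITREP_INTERVAL = {
    "high": timedelta(hours=1),     # hourly
    "medium": timedelta(hours=12),  # twice daily; even spacing is an assumption
    "low": timedelta(days=1),       # daily
}


def next_sitrep_due(severity: str, last_sent: datetime) -> datetime:
    """Return when the next sitrep is due for the given severity."""
    return last_sent + SITREP_INTERVAL[severity.lower()]


if __name__ == "__main__":
    sent = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(next_sitrep_due("High", sent).isoformat())
```

An unknown severity raises `KeyError`, which is deliberate here: the Incident Commander should set the severity from the rubric before scheduling reports.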
Update the JIRA ticket and set the status to one of the following:
- Confirmed incident: Ready for QA
- False alarm: Done
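If ticket updates are scripted, the outcome-to-status mapping above can be kept in one place. This sketch only returns the status name and does not call JIRA; the outcome keys and the `closing_status` function are hypothetical.

```python
# Status names come from the checklist; the outcome keys are assumptions.
JIRA_STATUS_BY_OUTCOME = {
    "confirmed_incident": "Ready for QA",
    "false_alarm": "Done",
}


def closing_status(outcome: str) -> str:
    """Return the JIRA status to set when concluding an incident."""
    try:
        return JIRA_STATUS_BY_OUTCOME[outcome]
    except KeyError:
        raise ValueError(f"unknown incident outcome: {outcome!r}") from None


if __name__ == "__main__":
    print(closing_status("false_alarm"))
```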
Notify the Slack channel #None that the incident has been resolved.
Schedule an IR Team retrospective. Optional for false alarms.
Share the final sitrep with stakeholders.
Thank everyone for their service.