What is an incident?
Before jumping into categorization, you’ll need to understand what constitutes an incident. For an engineering organization, an incident is most often defined as an unexpected interruption or degradation of service, but it’s important to note that there are “positive” incidents as well. A routine maintenance window with no impact, or a successful deploy, can also be treated as an incident. The types of events that may be defined as an incident vary from business to business: an incident could be an unexpected system change, failure, or outage.
How to categorize the impact of your incidents
To better manage your incidents, you’ll first have to decide what counts as an incident and how you want incidents to be handled at your organization. The most common way of categorizing impact is to first classify an incident as major or minor, and then place it on a scale of severity levels, each defined by a set of guidelines.
Most organizations use severity levels ranging from 1 to 4, and some use 5 or more. The number of levels you have will depend on the complexity of your technology, organization, and team size.
Severity Level 1
A Severity 1, or SEV-1, incident is the most extreme of circumstances, where everything is on fire and all hands must be on deck to fix the issue. This is undoubtedly a major incident, and one of the highest priority.
A SEV-1 requires a dedicated process that involves multiple team members and a dedicated internal and external communication strategy. Your stakeholders will be assigned roles and have a centralized place to communicate, such as a dedicated Slack channel created for the incident or a Zoom bridge. Additionally, it will be necessary to communicate with other internal stakeholders, such as senior leadership or customer-facing team members, and with customers through a status page.
An example of a SEV-1 is your system going down, leaving your customers without access. One very public instance of what could be considered a SEV-1 is the Slack outage that occurred at the beginning of 2021 (read their retrospective to learn about what happened!).
Severity Level 2
A Severity 2, or SEV-2, is an incident that affects your customers but may not be critical in terms of the number of customers or the type of service impacted. This is still considered a major incident. During a SEV-2, you may need to address customer questions and let customers know you are working to fix the issue (usually through a status page). If you have a dedicated team member for customer support, they will handle the communication here.
Severity Levels 3, 4, 5, and beyond
A Severity 3, or SEV-3, is a minor incident. Typically, these incidents don’t affect a majority of customers, but they have the potential to become major incidents if they aren’t addressed. Levels 4, 5, and beyond can be defined as you need, but they generally cover smaller bugs that barely impact essential services.
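The levels described above can be sketched in code as a small enumeration plus a helper that separates major from minor incidents. This is only a minimal illustration: the names `Severity` and `is_major` are hypothetical, and the five-level scale is an assumption, since your organization may define fewer or more levels.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; a lower number means a more severe incident."""

    SEV1 = 1  # system down, customers without access
    SEV2 = 2  # customers affected, but limited in number or service type
    SEV3 = 3  # minor incident that could escalate if left unaddressed
    SEV4 = 4  # smaller bugs that barely impact essential services
    SEV5 = 5  # assumed extra level; define it as your organization needs


def is_major(severity: Severity) -> bool:
    """Return True for major incidents (SEV-1 and SEV-2 in this article's scheme)."""
    return severity <= Severity.SEV2
```

With this sketch, `is_major(Severity.SEV2)` is true and `is_major(Severity.SEV3)` is false, which matches the major and minor split described in the sections above.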
How to effectively use your severity levels
Now that you know what constitutes an incident and how to categorize one, turn that work into action with a severity matrix, which helps you quickly identify what severity level to assign based on the characteristics and impact of an incident.
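One way to picture a severity matrix is as a lookup table keyed by an incident's characteristics. The sketch below is a hypothetical example, not a standard: the two axes (how many customers are affected, and whether the affected service is essential), the category names, and the level assigned to each cell are all assumptions that you would replace with your own guidelines.

```python
# Hypothetical severity matrix: (customer impact, service type) -> severity level.
# Every cell value here is an illustrative assumption, not a recommendation.
SEVERITY_MATRIX = {
    ("all", "essential"): 1,
    ("all", "non_essential"): 2,
    ("some", "essential"): 2,
    ("some", "non_essential"): 3,
    ("few", "essential"): 3,
    ("few", "non_essential"): 4,
}


def severity_for(customer_impact: str, service_type: str) -> int:
    """Look up the severity level for an incident's characteristics."""
    key = (customer_impact, service_type)
    if key not in SEVERITY_MATRIX:
        raise ValueError(f"No severity defined for {key!r}")
    return SEVERITY_MATRIX[key]


if __name__ == "__main__":
    # A full outage of an essential service maps to the most severe level.
    print(f"SEV-{severity_for('all', 'essential')}")
```

Keeping the matrix as plain data rather than nested conditionals makes it easy to review with stakeholders and to change as your definitions evolve.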