Incident Communication Plan: 6 Best Practices

Here's a quick guide to effective incident communication:

Plan Ahead
Set Up Clear Communication Channels
Use a Tiered Alert System
Provide Timely and Transparent Updates
Designate Spokespersons and Roles
Conduct Post-Incident Reviews

Key points:

Create a detailed plan before incidents occur
Use multiple communication channels (status page, email, social media)
Categorize issues by severity and respond accordingly
Update stakeholders frequently and honestly
Assign specific roles for communication during incidents
Learn from each incident to improve future responses

Practice	Purpose	Example
Plan Ahead	Prepare for quick response	Pre-written templates
Clear Channels	Reach all stakeholders	Status page, email, social media
Tiered Alerts	Prioritize response	P1 (critical) to P4 (minor)
Timely Updates	Keep everyone informed	Updates every 30 minutes for major issues
Designated Speakers	Ensure consistent messaging	Tech lead for IT teams, PR for media
Post-Incident Review	Improve future responses	Team debrief, customer feedback

This guide helps maintain trust, minimize disruption, and improve incident management.

Basics of Incident Communication

Main Parts of Incident Communication

Incident communication has three key phases:

First contact: Initial alert about the incident
Regular updates: Ongoing communication during the incident
Resolution and post-mortem: Final update and analysis after the incident

To communicate well during incidents:

Pick a spokesperson or team to handle messages
Use many channels to reach everyone affected
Adjust messages for different groups (teams, customers, public)
Keep information clear and the same across all platforms

Common Problems When Communicating During Issues

Here are some challenges in incident communication and how to fix them:

Problem	Impact	Fix
Slow response	Users get upset	Set up a quick alert system
Mixed messages	People get confused	Use pre-made templates and one main info source
Not being open	Hurts company image	Tell people what happened, why, and how you're fixing it
Not enough updates	People worry and guess	Set a schedule for regular updates
Too much tech talk	Non-tech people don't understand	Write messages for each group

To avoid these problems:

Make a full incident plan before issues happen
Give clear jobs to communication teams
Write message templates for different problems
Use tools like Statuspage for easy updates
Look at what happened after each incident to do better next time

Real-World Example: Facebook's 2010 Outage

In 2010, Facebook had a big outage that lasted about 2.5 hours. Here's how they handled it:

What happened: A problem with their database caused the site to go down
How they communicated: After fixing the issue, a Facebook engineer wrote a detailed post
What the post included:
- An apology to users
- An explanation of what went wrong
- Steps taken to prevent future incidents

This approach helped Facebook:

Show they cared about users
Explain the problem in simple terms
Build trust by being open about the issue

Tips for Better Incident Communication

Define incidents clearly: Use a system like the 4-tier severity scale many web companies use. This helps teams know how serious a problem is.
Prepare ahead of time: Have your communication tools, channels, and message templates ready before incidents happen.
Tell the right people: Start with your core team, then spread the word to other staff and customers as needed.
Be extra careful with security issues: If there's a security problem or data loss, tell everyone right away.
Make your first update count: When you first tell people about an issue:
- Say you know there's a problem
- Tell them what's not working
- Promise to give more updates soon
- Let them know if their data is safe

1. Plan Ahead

Create a Clear Response Plan

To handle problems well, make a plan before they happen. Here's what to do:

1. Define what counts as an incident for your company

2. Set up a system to rate how bad incidents are

3. Write down how you'll respond, including:

Which tools you'll use to talk to people
What you'll say in different situations
Who does what when there's a problem

List Key People and Their Jobs

Make a list of who does what during an incident:

Role	Job
Incident Manager	Runs the whole response
Communications Lead	Writes and sends out messages
Technical Lead	Gives updates on fixing the problem
Customer Support	Answers user questions
Executive Sponsor	Approves big announcements

Make sure everyone knows their job and how to do it when there's a problem.

Make Message Templates

Write messages ahead of time for different types of problems. This saves time and keeps your messages the same. Make templates for:

Telling people about a new problem
Giving updates
Saying the problem is fixed
Following up after the problem

Remember to change your messages for different people. Start with your team, then tell others as needed. For security problems or lost data, tell everyone right away.

Use Different Ways to Talk to People

Don't just use one way to tell people about problems. Use many, like:

A special website just for updates
Email
Chat tools
Social media
Text messages

This helps make sure everyone gets the news.

Real Example: Slack's Big Outage

In 2021, Slack, a popular work chat app, had a big problem. Here's how they handled it:

They used their status page to give updates every 30 minutes
They posted on Twitter to reach more people
They explained what went wrong in simple terms
They said sorry and thanked people for being patient

Slack's CEO, Stewart Butterfield, said: "We know how much you depend on Slack, and we take our reliability very seriously. We're deeply sorry for this disruption to your work day."

This shows how important it is to:

Use different ways to talk to people
Give regular updates
Explain things simply
Say sorry when things go wrong

2. Set Up Clear Communication Channels

Choose the Right Communication Tools

Pick tools that help you talk to people quickly when problems happen. A status page is usually the best place to start. It's a dedicated website that tells everyone what's going on.

Atlassian, a big software company, uses a status page as their main way to tell people about problems. This helps them:

Let users sign up for updates
Answer fewer questions from users
Keep everyone up to date

Use Many Ways to Talk to People

Don't just use one way to tell people about problems. Use lots of ways:

Way to Talk	What It's For	Example
Status Page	Main place for updates	Statuspage
Email	Long messages	Company email list
Work Chat	Team updates	Slack or Microsoft Teams
Social Media	Quick public messages	Twitter or LinkedIn
Text Messages	Urgent alerts	Text message service

Make sure you have backup ways to talk if the internet goes down. This could be extra internet connections or even satellite phones.

Make One Place for All Messages

Have one place where your team can talk about the problem. This could be:

A special chat room in your work chat app
A tool like Jira Service Management

For people outside your company, put status info right on your website. This way, they can see what's happening without going to another page.

Tips for Better Communication

Pick your tools before problems happen
Use a status page as your main way to tell people what's going on
Put status info on your website
Use many ways to talk to people (email, chat, social media)
Have a place for your team to talk about the problem
Keep contact lists up to date
Test your plan to make sure it works

3. Use a Tiered Alert System

Group Issues by How Serious They Are

A tiered alert system helps teams respond to problems quickly and correctly. Here's how to group issues:

Level	What It Means	Example
P1	Very bad: Whole service down	Website crashes
P2	Bad: Big part not working	Can't log in
P3	Not great: Small problem	Search is slow
P4	Small issue: Doesn't hurt much	Button looks wrong

This system helps teams know how fast to act and who to call.
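
The table above can be sketched as a small lookup in code. This is only a rough example: the three yes/no questions and the `classify` function are made up for illustration, and your own rules will likely need more detail.

```python
from enum import Enum


class Severity(Enum):
    """The four levels from the table above."""
    P1 = "Whole service down"
    P2 = "Big part not working"
    P3 = "Small problem"
    P4 = "Small issue that doesn't hurt much"


def classify(service_down: bool, major_feature_broken: bool,
             users_affected: bool) -> Severity:
    """Map three yes/no questions onto a level (hypothetical rules)."""
    if service_down:
        return Severity.P1
    if major_feature_broken:
        return Severity.P2
    if users_affected:
        return Severity.P3
    return Severity.P4


# Example: users can't log in, but the rest of the site works.
print(classify(service_down=False, major_feature_broken=True,
               users_affected=True).name)  # P2
```

Checking the worst case first means a problem that matches several questions always gets the highest level that applies.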

Steps for Each Problem Level

For each level, have clear steps:

1. P1 (Very Bad)

Tell everyone right away
Call the boss
Start fixing within 15 minutes
Update every 30 minutes

2. P2 (Bad)

Tell team leaders
Get main team together
Update every hour

3. P3 (Not Great)

Tell the team in charge
Plan to fix soon
Update daily

4. P4 (Small Issue)

Write it down
Fix when there's time
Check weekly

Set Up Automatic Alerts

Use tools to send alerts fast:

Connect monitoring tools to communication tools (like PagerDuty to Slack)
Set alerts for certain problems
Make sure alerts escalate to the next person if no one answers

For big problems, the system could:

Text the on-call team
Make a new chat room
Update the status page
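
Here is a rough sketch of the escalation idea in plain Python. It does not call any real paging tool: the `Alert` class, the `escalate` function, and the `answers` lookup are all made-up stand-ins for whatever your alerting tool provides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Alert:
    summary: str
    severity: str  # "P1" to "P4"
    acknowledged_by: Optional[str] = None
    log: List[str] = field(default_factory=list)


def escalate(alert: Alert, on_call_chain: List[str],
             answers: Dict[str, bool]) -> Alert:
    """Page each person in order until someone answers.

    `answers` stands in for a real tool's acknowledgement signal
    (hypothetical: True means that person picked up).
    """
    for person in on_call_chain:
        alert.log.append(f"paged {person}")
        if answers.get(person, False):
            alert.acknowledged_by = person
            break
    return alert


def big_problem_actions(alert: Alert) -> List[str]:
    """Return the extra steps listed above, for P1 problems only."""
    if alert.severity != "P1":
        return []
    return ["text on-call team", "make chat room", "update status page"]


alert = escalate(Alert("Database unreachable", "P1"),
                 ["ana", "ben", "cy"], {"ben": True})
print(alert.acknowledged_by)  # ben
print(alert.log)              # ['paged ana', 'paged ben']
```

If nobody in the chain answers, `acknowledged_by` stays empty, which is a sign the chain needs more people or a final fallback.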

Real-World Example: GitHub's 24-Hour Outage

In October 2018, GitHub had a big problem:

What Happened: A data storage system broke
How They Used Tiers: They called it a P1 (worst) problem
What They Did:
- Told users within 5 minutes on the status page
- Updated every 30 minutes
- Fixed in 24 hours
Result: Users trusted them more for being open

GitHub's VP of Engineering, Sam Lambert, said: "We believe in being as transparent as possible about service disruptions."

Tips for Better Alerts

Make your tiers fit your business
Train teams on what each tier means
Test your system often
Learn from each problem to make the system better

4. Give Quick and Open Updates

Update Frequency Based on Issue Severity

Match your update frequency to how bad the problem is:

Issue Level	How Often to Update
P1 (Very Bad)	Every 30 minutes
P2 (Bad)	Every hour
P3 (Not Great)	Once a day
P4 (Small)	Once a week

Stick to this schedule. It helps people trust you and stay informed without getting too many messages.
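
One way to stick to the schedule is to let a small script work out when the next update is due. The sketch below uses the intervals from the table above; the function names are made up for this example.

```python
from datetime import datetime, timedelta

# Intervals taken from the table above.
UPDATE_INTERVAL = {
    "P1": timedelta(minutes=30),
    "P2": timedelta(hours=1),
    "P3": timedelta(days=1),
    "P4": timedelta(weeks=1),
}


def next_update_due(severity: str, last_update: datetime) -> datetime:
    """Return the time by which the next update should go out."""
    return last_update + UPDATE_INTERVAL[severity]


def is_overdue(severity: str, last_update: datetime, now: datetime) -> bool:
    """True if the promised update time has already passed."""
    return now > next_update_due(severity, last_update)


last = datetime(2024, 1, 1, 9, 0)
print(next_update_due("P1", last))                         # 2024-01-01 09:30:00
print(is_overdue("P1", last, datetime(2024, 1, 1, 9, 45)))  # True
```

A check like `is_overdue` could feed a reminder to the person writing updates, so a busy team doesn't miss the next one.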

Be Open While Keeping Some Things Private

Tell people:

What the problem is
How it affects users
What you're doing to fix it
When you think it will be fixed (if you know)

Don't share:

Secret security info
Personal details
Things you're not sure about

Keep your messages clear and factual. This stops people from getting confused or worried for no reason.

Write Clear Messages That Tell People What to Do

Make your messages easy to understand and act on:

Sum up the problem
Say what's happening now
List what you're doing to fix it
Tell users what to do (if needed)
Say when the next update will come

Use this format:

[Problem ID]: Short description
Status: Still happening / Fixed
Impact: What's not working
What we're doing: Steps we're taking
What users should do: Actions for users (if any)
Next update: When we'll say more

This helps everyone know what's going on and what to expect.
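
The format above can be turned into a reusable template. This is a minimal sketch using Python's built-in `string.Template`; the field names (`incident_id`, `summary`, and so on) and the default text for user actions are assumptions made for this example.

```python
from string import Template

# Mirrors the six-line format above; field names are hypothetical.
UPDATE_TEMPLATE = Template(
    "[$incident_id]: $summary\n"
    "Status: $status\n"
    "Impact: $impact\n"
    "What we're doing: $actions\n"
    "What users should do: $user_actions\n"
    "Next update: $next_update"
)


def render_update(**fields: str) -> str:
    """Fill in the template; raises KeyError if a field is missing."""
    fields.setdefault("user_actions", "No action needed")
    return UPDATE_TEMPLATE.substitute(fields)


print(render_update(
    incident_id="INC-1",
    summary="Login failing",
    status="Still happening",
    impact="Users cannot sign in",
    actions="Rolling back the last release",
    next_update="10:30 UTC",
))
```

Because `substitute` fails on a missing field, a half-finished update can't go out by accident.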

Real-World Example: Slack's 2021 Outage

On January 4, 2021, Slack had a big outage that lasted about 4 hours. Here's how they handled it:

They posted updates on their status page every 30 minutes
They used Twitter to reach more people, with 9 tweets during the outage
They explained the problem simply: "Customers may have trouble connecting to Slack to send messages and files"
After fixing it, they said sorry and thanked users for being patient

Slack's CEO, Stewart Butterfield, tweeted: "We're still in a holding pattern. There's no resolution yet, but we'll be sharing more news as soon as we have it. Thanks for your patience."

This shows how to:

Use different ways to talk to people
Give updates often
Keep things simple
Say sorry when things go wrong

Tips for Better Updates

Make a list of who needs to know about problems
Write message templates for different types of issues
Train your team on how to write clear updates
Have a backup plan if your main way of talking to people doesn't work
After each problem, look at how you did and find ways to do better next time

5. Choose Who Speaks and What They Do

Pick People to Talk to Different Groups

Select the right people to talk to each group during a problem:

Role	Talks To	Example
Tech Lead	IT teams, developers	John Smith, CTO of Acme Corp
Support Manager	Users, clients	Sarah Lee, Head of Customer Care at Zendesk
PR Specialist	Media, public	Mike Johnson, Communications Director at Slack
Executive	High-level stakeholders	Satya Nadella, CEO of Microsoft

Make sure each person knows who they should talk to and how.

Train Speakers for Emergency Talks

Get your speakers ready for tough situations:

Do practice runs often
Learn to give short, clear messages
Practice answering hard questions calmly

Make a list of key points for each type of problem. This helps speakers stay on track and give the same info to everyone.

Make a Clear Order for Sharing Information

Set up a clear way to share info:

Problem team → Company staff
Tech lead → Support manager
Support manager → Users
PR person → News and public
Executive → Big partners

Keep all info in one place that's always up to date. This stops people from saying different things.

Use this plan to share info based on how bad the problem is:

How Bad	Who Tells Whom
Small	Team lead → Department head
Medium	Department head → Division boss
Big	Division boss → Top leaders
Very big	Top leaders → Board members

Real-World Example: Cloudflare's 2019 Outage

In July 2019, Cloudflare, a big internet security company, had a major outage:

What Happened: A config change caused 50% of their network to go down
How They Talked About It:
- CTO John Graham-Cumming wrote a detailed blog post within 24 hours
- CEO Matthew Prince tweeted updates and answered questions
What They Did Well:
- Quick first update (6 minutes after the problem started)
- Clear, honest explanations of what went wrong
- Regular updates on their status page

John Graham-Cumming said: "We believe in transparent communication during incidents. It's crucial for maintaining trust with our customers and the broader internet community."

This approach helped Cloudflare:

Keep users informed
Show they were working hard to fix the problem
Build trust by being open about what happened

Tips for Better Speaker Management

Make a list of who talks to whom before problems happen
Train your speakers regularly
Use simple words to explain tech issues
Have backup speakers ready
After each problem, talk about what went well and what to do better next time

6. Review After the Problem is Fixed

Check How Well Communication Worked

After fixing an issue, it's important to look at how well you talked about it. Use this checklist:

Aspect	Questions to Ask
Speed	How fast did we tell people?
Updates	Did we give enough updates?
Clarity	Could everyone understand our messages?
Channels	Did we reach everyone we needed to?
Feedback	Did people feel well-informed?

Use these questions to find what worked and what didn't. This helps you do better next time.
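
The Speed and Updates questions can be answered with numbers if you keep the times of each message. Here is a rough sketch; the `review_timeline` function and its inputs are made up for this example, and it assumes at least one update was sent.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def review_timeline(started: datetime, updates: List[datetime],
                    promised_interval: timedelta) -> Dict[str, object]:
    """Summarize how fast and how regularly updates went out."""
    updates = sorted(updates)
    first_delay = updates[0] - started
    # Time between each update and the one after it.
    gaps = [later - earlier for earlier, later in zip(updates, updates[1:])]
    longest_gap = max(gaps, default=timedelta(0))
    return {
        "time_to_first_update": first_delay,
        "longest_gap": longest_gap,
        "kept_schedule": longest_gap <= promised_interval,
    }


report = review_timeline(
    started=datetime(2024, 1, 1, 9, 0),
    updates=[datetime(2024, 1, 1, 9, 6),
             datetime(2024, 1, 1, 9, 35),
             datetime(2024, 1, 1, 10, 20)],
    promised_interval=timedelta(minutes=30),
)
print(report["time_to_first_update"])  # 0:06:00
print(report["kept_schedule"])         # False
```

In this made-up timeline the first update was quick, but one gap of 45 minutes broke the 30-minute promise, which is the kind of thing a review should catch.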

Ask People What They Thought

Get feedback from different groups to learn more:

Send out surveys to teams, customers, and partners
Talk one-on-one with key people
Have a meeting with everyone involved

Ask questions like:

"Did you get updates fast enough?"
"Were our messages clear?"
"Did you feel sure we were fixing the problem?"
"How can we talk better next time?"

Look at what people say to find ways to improve.

Update the Plan Based on What You Learned

Use what you learned to make your plan better:

Fix gaps: If some people didn't get messages, change how you reach them
Improve messages: Update your ready-made messages based on feedback
Make decisions faster: If things were slow, fix your process
Train more: Help your speakers or team members if needed
Get better tools: If your current tools didn't work well, find new ones

Real-World Example: Atlassian's 2022 Outage

In April 2022, Atlassian, a big software company, had a major outage that lasted up to two weeks for the affected customers. Here's how they handled it:

Action	Result
Quick first update	Told customers within hours
Regular updates	Posted on status page daily
Clear explanations	Explained the problem in simple terms
Executive involvement	Zoe Nicholson, Atlassian's CTO, gave updates

After the outage, Atlassian did a thorough review:

They talked to affected customers
They looked at their communication process
They made changes to prevent similar issues

Scott Farquhar, Atlassian's co-CEO, said: "We've learned a lot from this incident and are taking steps to improve our systems and processes."

This shows how important it is to:

Act fast when problems happen
Keep talking to people throughout the issue
Learn from what went wrong
Make real changes to do better next time

How to Use These Tips

Steps to Implement These Practices

1. Review Current Process

Look at past incidents
Check existing tools
Ask team members for input

2. Choose What to Fix First

Start with easy, quick changes
Plan for bigger updates later

3. Update Your Playbook

Add these 6 tips to your guide
Clearly state who does what

4. Train Your Team

Hold workshops on new practices
Practice with fake incidents

5. Start Small

Use 1-2 new practices at a time
Check if they work and get feedback

Common Problems and Fixes

Problem	How to Fix It
Different messages	Make ready-to-use message templates
Slow alerts	Set up automatic alerts
Unclear updates	Use a set format for all messages
Too many messages	Use levels to send only important info
Poor after-incident reviews	Always have a meeting after big problems

Key Points to Remember

Check and update your plan often
Ask everyone for honest feedback
Be ready to change your approach
Keep training your team on good communication

Wrap-up

Quick List of the 6 Main Tips

Here's a recap of the six best practices for an effective incident communication plan:

Plan Ahead
Set Up Clear Communication Channels
Use a Tiered Alert System
Provide Timely and Transparent Updates
Designate Spokespersons and Roles
Conduct Post-Incident Reviews

Keep Working on Better Communication

To maintain an effective incident communication plan:

Regular Reviews: Check your plan every 3-6 months.
Practice Runs: Do mock incidents to test your processes.
Stay Current: Keep up with new communication tools and best practices.
Get Feedback: Ask team members and stakeholders for input often.
Be Ready to Change: Update your plan as your company grows or faces new challenges.

Real-World Example: Slack's 2021 Outage Response

On January 4, 2021, Slack faced a major outage lasting about 4 hours. Here's how they handled it:

Action	Details
Quick Updates	Posted on status page every 30 minutes
Multiple Channels	Used Twitter, with 9 tweets during the outage
Clear Communication	Explained the problem simply: "Customers may have trouble connecting to Slack to send messages and files"
Leadership Involvement	CEO Stewart Butterfield tweeted updates

After fixing the issue, Slack:

Apologized to users
Thanked them for their patience
Conducted a thorough review to prevent future incidents

This approach helped Slack:

Keep users informed
Show they were actively working on the problem
Build trust through open communication

Key Takeaways

Act Fast: Tell users about problems quickly.
Use Many Channels: Reach out through different platforms.
Keep It Simple: Explain issues in easy-to-understand terms.
Learn and Improve: Look at what happened and make your plan better.

FAQs

What is an incident communication plan?

An incident communication plan is a key part of IT incident management. It's a detailed guide that covers:

Technical steps for response teams
Who does what during an incident
How to handle the incident quickly

This plan helps teams share information fast and fix problems with less impact on users.

What's a good way to talk to people during an outage?

A good process for talking to people during an outage includes:

Picking the right ways to reach people
Saying who will do the talking
Telling people about the problem right away
Giving updates often
Being honest about what's happening
Saying sorry to affected users
Reaching out before users ask

These steps help keep trust and keep everyone informed while fixing the problem.

What's in an incident communication strategy?

An incident communication strategy is a plan that says:

Who's in charge during a problem
Who talks to users and the media
How to talk to affected people

The strategy tries to keep things calm by making sure only certain people talk to users. It usually has rules about:

What to say in messages
How often to give updates
Which channels to use (like email or social media)

Real-world example: GitHub's 24-hour outage

In October 2018, GitHub had a major outage:

What Happened: A data storage system broke, and the outage lasted 24 hours
How They Handled It:
- Called it their worst-level problem
- Told users within 5 minutes on the status page
- Updated every 30 minutes
Result: Users trusted them more

GitHub's VP of Engineering, Sam Lambert, said: "We believe in being as transparent as possible about service disruptions."

Tips for better incident communication

Make your plan fit your business
Train teams on what to do
Test your system often
Learn from each problem
Use simple words to explain tech issues
Have backup speakers ready
After each problem, talk about what to do better next time

Key things to remember

Check and update your plan often
Ask everyone for honest feedback
Be ready to change how you do things
Keep training your team on good communication
