Can we break it? 7 business continuity plan testing scenarios
Nobody wants to get those dreaded 3 a.m. phone calls. “The servers are down.” “The backups failed.” “Ed from logistics opened a phishing email again!” These calls are an IT professional’s worst nightmare. The good news: with the right business continuity plan testing scenarios, you can make those calls far less likely, and far less damaging when they do come.
Creating a business continuity plan (BCP) is only the first step toward implementing a rock-solid continuity strategy. The systems and protocols outlined in your plan might sound good in theory, but how do they hold up in a real-world disaster?
- Can your backup systems survive a real data meltdown?
- Will you be able to meet your recovery time objective (RTO) when restoring data?
- How well will employees follow emergency procedures?
- Will your emergency communication strategy work out as planned, or will it implode?
- What will really happen when things go bad?
There’s no way to know for sure without testing, which is why testing is a critical component of continuity planning. If you never put your BCP to the test, you won’t know whether your company is truly prepared for a disaster until it’s too late.
Today, we look at 7 business continuity plan testing scenarios that can help ensure your technologies and teams are ready for whatever comes.
Get your hammer ready (metaphorically speaking)
Once your plan is finalized, it’s time to try to break it.
Don’t worry: you’re not actually shredding the document you spent weeks writing, editing and getting approved by higher-ups. And you’re not actually breaking anything at all.
However, you do need to prove the soundness of everything you put in the plan. By that, we mean using strategic tests that will help you to:
- Identify weaknesses in your BC systems
- Confirm that infrastructure investments meet your continuity objectives
- Evaluate the company’s response to different types of disruptive events
- Make improvements to systems and procedures based on test findings
- Update your BCP accordingly
Don’t make the mistake of creating a comprehensive plan but never putting it to the test. That’s more than just laziness. It’s dangerous.
Without testing your plan, you’re putting both the business and its people at risk.
Keep this in mind: according to Datto, only 6 percent of companies without a disaster recovery plan survive a disaster. And having an inadequate plan is just as risky as having no plan at all.
Before you get started
The question you’re probably already asking is: what do you test and how often?
If you’ve done your job, then your BCP is already filled with hundreds of procedures for various events, including even the smallest of emergency-response steps, such as calling 9-1-1 in a fire. Do you test everything? How much is too much?
This depends on your company’s unique risks (as you’ve hopefully identified in a thorough risk assessment and business impact analysis).
- A company that has more to lose from a disruption (revenue losses, operational downtime, credibility / reputation, etc.) will usually require a higher number of business continuity plan testing scenarios, as well as a greater frequency of those tests.
- Keep in mind: the tests that you deem as “most important” may not be as important to another business in the same building as you. They may not even apply at all.
Every business is different, and thus its BCP is different as well: in scope and priority.
Below, we’ve included tests that we recommend for most businesses that are concerned about continuity. Depending on your operations, some of these recommendations may be a bit general, so customize them to fit your business’s unique needs.
Business continuity plan testing scenarios
As you prepare for your tests, you’ll also need to determine just how “real” you want the test to be.
Testing is often a challenge for companies. The tests require time and resources for planning and executing them. For that reason, you may find it easier to conduct certain tests sitting around a conference table, rather than involving the entire organization in a full-scale drill. In business continuity, these varying types of tests are typically defined as follows:
- Plan review: the most basic test, in which the recovery teams go over the BCP, line by line, to make sure everything is accurate and shipshape.
- Tabletop test: a more involved version of the plan review, in which employees participate in actual exercises (usually in a conference-room setting) to confirm that everyone knows their responsibilities in various types of emergencies. These tests may also be used for testing technology components, so that multiple people can evaluate how the systems behave and how they affect their roles.
- Simulation test: this is the most realistic test, requiring team members to perform their BC/DR duties within their actual work environments. For certain types of disasters, this may even mean going off-site (for example, to resolve issues at a local data center or to run a mock setup of a backup office location).
Full-scale simulation tests are ideal because they allow you to evaluate your teams’ and technologies’ response to disasters in a way that’s as close to the “real thing” as possible. But if time and resources don’t allow for repeated simulations, then fall back on the tabletop tests (rather than not testing at all).
Okay, let’s dive into the tests …
1) Data loss
Let’s start with one of the most common workplace disasters today: a loss of data. This loss could be caused by a number of culprits:
- Ransomware and other cyberattacks
- Accidentally deleted files or folders
- Server / drive failure
- Data center outage
Assume that the lost data is mission-critical. Perhaps it’s your CRM information or the data that runs your sales and logistics applications.
The obvious goal is to get that data back as quickly as possible, ideally by restoring a backup. But whose job is it to do that? How should they communicate the problem to other personnel (and at what point in the crisis)? What are the priorities? Do outside vendors, such as managed service providers (MSPs), need to be contacted?
If your primary IT person isn’t available to start the recovery, do other team members know how to do it?
These are all questions that should be answered by your test.
2) Data recovery
You need to make sure your BC/DR systems work like they’re supposed to. Conduct a test that simulates the loss of a large amount of data, and then try to recover it.
Here’s what you’ll need to evaluate:
- How long does the recovery take?
- Were any files corrupted during the recovery?
- Did you meet your RTO?
- If you virtualized a backup in the cloud, were there any issues? Did internal applications run without connectivity issues or lag?
Make sure that the teams who rely on this business-critical data participate in the test. For example, if they’ll be expected to work in a virtualized environment, watch them do it and note what questions they have or what issues they run into.
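Several of the questions above (how long the restore took, whether any files were corrupted, whether you met your RTO) can be checked with a short script rather than by eye. Below is a minimal Python sketch, assuming you record a SHA-256 hash of every file in the protected data set before the test and then restore into a separate folder. The function names (`build_manifest`, `compare_restore`, `met_rto`) and the folder layout are hypothetical illustrations, not features of any particular backup product.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a hash for every file under root. Run this BEFORE the test."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(manifest: dict[str, str], restored_root: Path) -> dict[str, list[str]]:
    """Compare the restored copy against the pre-test manifest.

    "missing" = files the restore never brought back;
    "corrupted" = files that came back but whose contents changed.
    """
    missing, corrupted = [], []
    for rel_path, expected in manifest.items():
        restored = restored_root / rel_path
        if not restored.is_file():
            missing.append(rel_path)
        elif sha256_of(restored) != expected:
            corrupted.append(rel_path)
    return {"missing": missing, "corrupted": corrupted}


def met_rto(started: datetime, finished: datetime, rto: timedelta) -> bool:
    """True if the restore finished within the recovery time objective."""
    return finished - started <= rto
```

In a drill, you would note the time when the "loss" is declared and the time when the restored data is usable, then pass both to `met_rto` along with the RTO from your BCP. Any non-empty "missing" or "corrupted" list is a finding to record and fix before the next test.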
3) Power outage
Scenario: Last night, power was knocked out by a storm. The utility company says it won’t be back up for days.
So, what now? What does your BCP say should happen in an event like this?
As part of the test, you’ll want to make sure that your DR team knows their responsibilities and how to communicate with the rest of the organization.
- How will personnel be notified? Are they expected to come into work?
- If a prolonged work stoppage occurs, do HR and Accounting know how it would impact payroll?
- Are there backup generators that need to be manually started?
- Is there a backup office location?
These answers should already be in your BCP. But with the test, you’ll be able to confirm that everyone follows the protocols as outlined.
4) Network outage
The concerns here are very similar. Chances are that if there’s no electricity, there’s no network either. But there are also plenty of situations in which you have electricity and the network is still down.
For situations like this (if the outage is prolonged), it’s increasingly common for organizations to provide personnel with the means to work remotely from home. So as part of this test, you’ll want to make sure that this plan works as designed:
- Do employees know how to use/access the remote desktop systems?
- Does the technology work as designed? Are speeds/connectivity strong enough to maintain productivity levels?
- How is the network being restored? Do recovery teams know what to do?
5) On-site danger
This is a very important office-wide drill that you must conduct at least once a year. Your local fire code may already require periodic fire drills; if it doesn’t, it’s critical that you conduct one anyway.
In addition to fire, these drills can be used for testing response to other dangerous situations, such as:
- Bomb threats
- Terrorist attacks
- Gas leaks
- Structural instability
As part of your test, make sure people know their emergency procedures, whether it’s evacuation, duck and cover, retreating to a safe area, or even staying at their desks.
Additionally, you should be testing your procedures for maintaining operations in case such an event is prolonged.
6) Communication protocols
Communication is critical in a disaster. And in the most disruptive events (such as a severe natural disaster), you’ll probably lose most of your traditional communication channels.
Your BCP should already outline how communication should occur in these situations: who should call whom and how. Some companies use calling trees. Some have an emergency email alert system, a call-in number for updates, or special company websites used exclusively for communicating during these events.
Your tests should check that these systems and steps actually work: that personnel know they exist, that they know how to use them, and that they work as designed.
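If your plan uses a calling tree, one useful tabletop question is: who never hears the message if certain people don’t pick up? Below is a minimal Python sketch, assuming the tree is written down as a mapping from each caller to the people they are responsible for calling. The `run_call_tree_drill` function and the role names in the example are hypothetical.

```python
from collections import deque


def run_call_tree_drill(
    tree: dict[str, list[str]], root: str, unreachable: set[str]
) -> tuple[list[str], list[str]]:
    """Walk a calling tree breadth-first from the person who starts the alert.

    Anyone in `unreachable` doesn't answer, so they can't pass the message on,
    and everyone below them is cut off (unless the plan names a backup caller).
    Returns (reached, cut_off), where cut_off is everyone who never got the message.
    """
    reached: list[str] = []
    seen: set[str] = set()
    queue = deque([root])
    while queue:
        person = queue.popleft()
        if person in seen:  # guard against accidental loops in the tree
            continue
        seen.add(person)
        if person in unreachable:
            continue
        reached.append(person)
        queue.extend(tree.get(person, []))

    everyone = {root} | set(tree) | {c for callees in tree.values() for c in callees}
    cut_off = sorted(everyone - set(reached))
    return reached, cut_off


# Hypothetical example: the operations manager doesn't answer during the drill.
tree = {
    "IT lead": ["Ops manager", "HR manager"],
    "Ops manager": ["Warehouse", "Drivers"],
    "HR manager": ["Payroll"],
}
reached, cut_off = run_call_tree_drill(tree, "IT lead", {"Ops manager"})
print(reached)  # ['IT lead', 'HR manager', 'Payroll']
print(cut_off)  # ['Drivers', 'Ops manager', 'Warehouse']
```

A result like this points to a gap worth fixing in the BCP, such as naming a backup caller for each branch or pairing the calling tree with a broadcast channel like an emergency email alert.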
7) Crisis of any kind
Let’s face it—there are so many different disasters that threaten your operations. Hopefully they’re already thoroughly defined in your business continuity plan.
Your job is to make sure you’re creating realistic tests that prepare the business for each of these crises. We’ve included some of the most destructive (and common) disasters in the recommended tests above, but there are numerous others to consider as part of your testing, including:
- Loss of personnel (transportation blockage, strike, illness, etc.)
- Additional utility outages (gas, telecommunications)
- Application outages
- On-site flooding
- City/area-wide evacuation
- IT infrastructure failure or damage
As with each of the tests outlined above, your drills for these scenarios should be designed to ensure that personnel know how to respond, that they’ll be safe, and that the business can continue running.
Get more information
To learn more about how your company can mitigate downtime after data loss and other disasters, contact our business continuity experts at Invenio IT. Request a free demo or contact us today by calling (646) 395-1170 or by emailing [email protected].