Can we break it? 9 business continuity plan testing scenarios
Nobody wants to get those dreaded 3 a.m. phone calls. “The servers are down.” “The backups failed.” “Ed from logistics opened a phishing email again!” These calls are an IT professional’s worst nightmare. The good news: by exploring the right business continuity plan testing scenarios, you can make those calls far less likely.
Creating a business continuity plan (BCP) is only the first step toward implementing a rock-solid continuity strategy. The systems and protocols outlined in your plan might sound good in theory, but how do they hold up in a real-world disaster?
- Can your backup systems survive a real data meltdown?
- Will you be able to meet your recovery time objective (RTO) for restoring data?
- How well will employees follow emergency procedures?
- Will your emergency communication strategy work out as planned, or will it implode?
- What will really happen when things go bad?
There’s no way to know for sure without testing, which is why testing is a critical component of continuity planning. Without putting your BCP to the test, you won’t know whether your company is truly prepared for a disaster until it’s too late.
Today, we look at 9 business continuity plan testing scenarios that can help ensure your technologies and teams are ready for anything.
Get your hammer ready (metaphorically speaking)
Once your plan is finalized, it’s time to try to break it.
Don’t worry: you’re not actually shredding the document that you spent so many weeks writing, editing and getting approved by higher-ups. And you’re not actually breaking anything at all.
However, you do need to prove the soundness of everything you put in the plan. By that, we mean using strategic tests that will help you to:
- Identify weaknesses in your BC systems
- Confirm that infrastructure investments meet your continuity objectives
- Evaluate the company’s response to different types of disruptive events
- Make improvements to systems and procedures based on test findings
- Update your BCP accordingly
Don’t make the mistake of creating a comprehensive plan but never putting it to the test. That’s more than just laziness. It’s dangerous.
Without testing your plan, you’re putting both the business and its people at risk.
Keep this in mind: only 6 percent of companies without a disaster recovery plan survive a disaster, according to Datto. Having an inadequate plan is just as risky as having no plan at all.
Before you get started
The question you’re probably already asking is: what do you test and how often?
If you’ve done your job, then your BCP is already filled with hundreds of procedures for various events, including even the smallest of emergency-response steps, such as calling 9-1-1 in a fire. Do you test everything? How much is too much?
This depends on your company’s unique risks (as you’ve hopefully identified in a thorough risk assessment and business impact analysis).
- A company that has more to lose from a disruption (revenue losses, operational downtime, credibility / reputation, etc.) will usually require a higher number of business continuity plan testing scenarios, as well as a greater frequency of those tests.
- Keep in mind: the tests that you deem “most important” may not be as important to another business in the same building. They may not even apply at all.
Every business is different, and thus its BCP is different as well: in scope and priority.
Below, we’ve included tests that we recommend for most businesses that are concerned about continuity. Some of the recommendations may be too general for your operations, so customize and implement them as needed.
Business continuity plan testing scenarios
As you prepare for your tests, you’ll also need to determine just how “real” you want the test to be.
Testing is often a challenge for companies. The tests require time and resources for planning and executing them. For that reason, you may find it easier to conduct certain tests sitting around a conference table, rather than involving the entire organization in a full-scale drill. In business continuity, these varying types of tests are typically defined as follows:
- Plan review: the most basic test, in which the recovery teams go over the BCP, line by line, to make sure everything is accurate and shipshape.
- Tabletop test: a more involved version of the plan review, in which employees participate in actual exercises (usually in a conference-room setting) to confirm that everyone knows their responsibilities in various types of emergencies. These tests may also be used to test technology components so that multiple people can evaluate how the systems behave and how they affect their roles.
- Simulation test: this is the most realistic test, requiring team members to perform their BC/DR duties within their actual work environments. For certain types of disasters, this may even mean going off-site (for example, to resolve issues at a local data center or to do a trial setup of a backup office location).
Full-scale simulation tests are ideal because they allow you to evaluate your teams’ and technologies’ response to disasters in a way that’s as close to the “real thing” as possible. But if time and resources don’t allow for repeated simulations, then fall back on the tabletop tests (rather than not testing at all).
Okay, let’s dive into the tests …
1) Data loss
Let’s start with one of the most common workplace disasters today: a loss of data. This loss could be caused by a number of culprits:
- Ransomware and other cyberattacks
- Accidentally deleted files or folders
- Server / drive failure
- Data center outage
Assume that the lost data is mission-critical. Perhaps it’s your CRM information or the data that runs your sales and logistics applications.
The obvious goal is to get that data back as quickly as possible, ideally by restoring a backup. But whose job is it to do that? How should they communicate the problem to other personnel (and at what point in the crisis)? What are the priorities? Do outside vendors, such as managed service providers (MSPs), need to be contacted?
If your primary IT person isn’t available to start the recovery, do other team members know how to do it?
These are all questions that should be answered by your test.
2) Data recovery
You need to make sure your BC/DR systems work like they’re supposed to. Conduct a test that involves losing a massive amount of data, and then try to recover it.
Here’s what you’ll need to evaluate:
- How long does the recovery take?
- Were any files corrupted during the recovery?
- Did you meet your RTO?
- If you virtualized a backup in the cloud, were there any issues? Did internal applications run without connectivity issues or lag?
Make sure that the teams who rely on this business-critical data participate in the test. For example, if they’ll be expected to work in a virtualized environment, watch them do it and note what questions they have or what issues they run into.
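Two of the questions above (how long the recovery takes and whether any files were corrupted) can be measured rather than eyeballed. The following is a minimal Python sketch, assuming you can trigger your restore as a callable and that you record a checksum manifest of the protected data before the test. The function names and the RTO value are hypothetical illustrations, not part of any backup product.

```python
from __future__ import annotations

import hashlib
import time
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> dict[str, list[str]]:
    """Compare a restored copy against the manifest taken before the test."""
    missing: list[str] = []
    corrupted: list[str] = []
    for rel_path, expected in manifest.items():
        candidate = restored_root / rel_path
        if not candidate.is_file():
            missing.append(rel_path)
        elif sha256_of(candidate) != expected:
            corrupted.append(rel_path)
    return {"missing": missing, "corrupted": corrupted}


def timed_restore(restore_fn: Callable[[], object], rto_seconds: float) -> tuple[float, bool]:
    """Run the restore, returning elapsed seconds and whether it met the RTO."""
    start = time.monotonic()
    restore_fn()
    elapsed = time.monotonic() - start
    return elapsed, elapsed <= rto_seconds
```

In practice, you would build the manifest from the production data before the test, pass whatever kicks off your real restore to `timed_restore`, and then run `verify_restore` against the restored location. Anything listed under `missing` or `corrupted`, or a `False` RTO result, becomes a finding for the test report.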
3) Power outage
Scenario: Last night, power was knocked out by a storm. The utility company says it won’t be back up for days.
So, what now? What does your BCP say should happen in an event like this?
As part of the test, you’ll want to make sure that your DR team knows their responsibilities and how to communicate with the rest of the organization.
- How will personnel be notified? Are they expected to come to work?
- If a prolonged work stoppage occurs, do HR and Accounting know how it impacts payroll?
- Are there backup generators that need to be manually started?
- Is there a backup office location?
These answers should already be in your BCP. But with the test, you’ll be able to confirm that everyone follows the protocols as outlined.
4) Network and/or Internet outages
The concerns here are very similar. Chances are that if there’s no electricity, there’s no network either. However, there are also many situations in which the power is on but the network is down.
For situations like this (if the outage is prolonged), it’s increasingly common for organizations to provide personnel with the means to work remotely from home (more on that in the public health crisis scenario, below). So as part of this test, you’ll want to make sure that this plan works as designed:
- Do employees know how to use/access the remote desktop systems?
- Does the technology work as designed? Are speeds/connectivity strong enough to maintain productivity levels?
- How is the network being restored? Do recovery teams know what to do?
What about network tests?
In addition to testing your preparedness for a network outage, you’ll want to test the network itself. This will enable you to verify the resilience of the network in various scenarios, such as cyberattacks, heavy bandwidth usage, changes in network configurations and so on.
There are numerous types of network stress tests that allow you to simulate congested network conditions. Sometimes referred to as “torture testing,” these tests give you insight into how your network performs when pushed to its limits. Most network testing tools will allow you to measure bandwidth utilization and latency, and to see how spikes in packet rates affect the performance of your network devices.
In addition to routine testing, these tests should also be conducted prior to the rollout of new applications or other significant changes to the network.
Remember: a critical aspect of these testing scenarios is to test your response to the simulated incident. So in a simulated network outage, for example, you’ll want to run through the steps needed to resolve the problem. Then, conduct a post-incident analysis to measure the speed and effectiveness of that response.
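Dedicated network testing tools are the right way to generate heavy load. Still, it helps to record a latency baseline before a test and compare it with readings taken while the network is under stress. Here is a minimal Python sketch of such a probe. It only times TCP connections to a host and port you choose, so it measures responsiveness rather than bandwidth. The function names are hypothetical.

```python
from __future__ import annotations

import math
import socket
import statistics
import time


def measure_connect_latency(host: str, port: int, attempts: int = 20,
                            timeout: float = 2.0) -> tuple[list[float], int]:
    """Time repeated TCP handshakes to host:port.

    Returns the successful connect times in milliseconds and a count of
    attempts that failed or timed out.
    """
    samples_ms: list[float] = []
    failures = 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection returns once the TCP handshake completes.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failures += 1
            continue
        samples_ms.append((time.perf_counter() - start) * 1000)
    return samples_ms, failures


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Min, median, nearest-rank 95th percentile and max of the samples."""
    if not samples_ms:
        raise ValueError("no successful samples to summarize")
    ordered = sorted(samples_ms)
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

You might run the probe against a critical service once under normal conditions and again during the stress test. A hypothetical call would be `measure_connect_latency("app.internal.example", 443)`, where the hostname is a placeholder. A large jump in `p95` or a rising failure count shows where the network starts to degrade, which feeds directly into the post-incident analysis.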
5) Application failure
What happens when an application that is most critical to your operations suddenly stops working? Aside from bringing your operations to a halt, your employees will likely be idled with nothing to do. This is an extremely costly scenario for most businesses, because it means that revenue is halted while expenses continue (and are wasted).
Routinely testing your applications can help to prevent these costly outages from happening and ensure that teams know how to rapidly respond when failure does occur.
Here’s what to consider as part of this testing scenario:
- What events or conditions are most likely to cause the application to fail (e.g., heavy network usage, large-scale changes)?
- When failure occurs, what steps are needed for recovery?
- What can be done to mitigate or eliminate these outages in the future?
Stress tests and performance tests are especially valuable, as they can help to identify how the application performs under different workloads. If the applications are externally developed and there are bugs or other issues inherent in the software (as opposed to adverse internal conditions, such as network issues), then organizations should work with their software vendor to identify a fix.
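A basic stress test sends many concurrent requests to an application and records how many fail and how slow the worst response gets. The following is a minimal Python sketch using a thread pool. It assumes a hypothetical `call` function that performs one request against your application, such as a request to a health-check endpoint. A dedicated load-testing tool would replace this for serious testing.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load_test(call: Callable[[int], object], total_requests: int,
                  concurrency: int) -> dict[str, float]:
    """Invoke `call` total_requests times using `concurrency` worker threads.

    Any exception raised by `call` counts as a failed request.
    """
    def one_request(i: int) -> tuple[bool, float]:
        start = time.perf_counter()
        try:
            call(i)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(total_requests)))

    failures = sum(1 for ok, _ in results if not ok)
    durations = [elapsed for _, elapsed in results]
    return {
        "requests": total_requests,
        "failures": failures,
        "error_rate": failures / total_requests,
        "slowest_s": max(durations),
    }
```

To find a rough breaking point, run the test at increasing `concurrency` levels (say 10, 50 and 100) and note where `error_rate` or `slowest_s` jumps. Threads suit this case because the work is waiting on I/O. Compare that point against your expected peak usage.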
6) Public health crisis
This is a larger-scope continuity testing scenario that businesses became well-acquainted with during the Covid-19 pandemic.
As the coronavirus spread, organizations raced to adhere to critical health guidelines that ushered in a new era of remote work, virtually overnight. Not all businesses were able to quickly adapt to this sudden shift. However, some organizations had been testing such a scenario as part of their continuity planning long before the pandemic started.
Businesses of all sizes need to be sure they can continue to operate during a public health crisis that threatens the health, wellness and availability of workers. This means testing the ability to shift operations, as it relates to both logistical feasibility and IT infrastructure:
- Can employees perform their jobs remotely?
- Do they already have devices that make remote work possible? Or would new devices need to be acquired?
- Are IT systems already in place that would enable workers to securely connect to the network?
- In the event of prolonged staffing issues, can critical operations be carried out by limited personnel?
As the pandemic showed, a global health crisis can occur at any time. Businesses need to regularly test their ability to adapt to such an event so that operations can continue with as little interruption as possible.
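The question of whether limited personnel can carry out critical operations can be checked on paper before a drill. A minimal Python sketch follows. It assumes you keep a simple matrix of who is trained for each critical function (the names in the example are hypothetical). It removes the staff marked absent for a scenario, then reports which functions are left uncovered and which depend on a single person.

```python
from __future__ import annotations


def coverage_gaps(trained: dict[str, set[str]],
                  absent: set[str]) -> tuple[dict[str, set[str]], list[str], list[str]]:
    """Check which critical functions can still be staffed.

    trained maps each critical function to the people trained to perform it.
    Returns (who is still available per function,
             functions that nobody available can perform,
             functions that depend on a single trained person).
    """
    available = {fn: people - absent for fn, people in trained.items()}
    gaps = sorted(fn for fn, people in available.items() if not people)
    single_points = sorted(fn for fn, people in trained.items() if len(people) == 1)
    return available, gaps, single_points
```

For example, if only one person is trained to restore backups and that person is out sick, `gaps` will list backup restoration. This is the same situation raised in the data loss scenario when the primary IT person isn’t available.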
7) On-site danger
This is a very important office-wide drill that you must conduct at least once a year. Your local fire code may already require periodic fire drills. If not, it’s critical that you conduct one anyway.
In addition to fire, these drills can be used for testing response to other dangerous situations, such as:
- Bomb threats
- Terrorist attacks
- Gas leaks
- Structural instability
As part of your test, make sure people know their emergency procedures, whether it’s evacuation, duck and cover, retreating to a safe area or even staying at their desks.
Additionally, you should be testing your procedures for maintaining operations in case such an event is prolonged.
8) Communication protocols
Communication is critical in a disaster. And in the most disruptive events (such as a severe natural disaster), you’ll probably lose most of your traditional communication means.
Your BCP should already outline how communication should occur in these situations: who should call whom and how. Some companies use calling trees. Some have an emergency email alert system, a call-in number for updates or special company websites used exclusively for communicating during these events.
Your tests should check that these systems and steps actually work: that personnel know they exist, that they know how to use them and that they work as designed.
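A calling tree has an obvious weak spot: if one caller can’t be reached, everyone below them may never hear the alert. The following is a minimal Python sketch for a tabletop test of that weakness. It assumes the tree is written down as a mapping from each caller to the people they call (the role names are hypothetical), and it reports who is reached when selected people don’t answer.

```python
from __future__ import annotations

from collections import deque


def simulate_call_tree(tree: dict[str, list[str]], root: str,
                       unreachable: set[str]) -> tuple[set[str], set[str]]:
    """Walk a calling tree and report who is reached and who is cut off.

    Each person who answers calls everyone listed under them in `tree`.
    Someone in `unreachable` never answers, so nobody below them is called.
    Returns (people reached, reachable people who never got the call).
    """
    everyone = {root} | set(tree) | {p for callees in tree.values() for p in callees}
    reached: set[str] = set()
    queue = deque([root])
    while queue:
        person = queue.popleft()
        if person in unreachable or person in reached:
            continue
        reached.add(person)
        queue.extend(tree.get(person, []))
    cut_off = everyone - reached - unreachable
    return reached, cut_off
```

Any non-empty `cut_off` set points to a branch that needs a backup caller or a second channel, such as the emergency email alert or call-in number.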
9) Crisis of any kind
Let’s face it—there are so many different disasters that threaten your operations. Hopefully they’re already thoroughly defined in your business continuity plan.
Your job is to make sure you’re creating realistic tests that prepare the business for each of these crises. We’ve included some of the most destructive (and common) disasters in the recommended tests above, but there are numerous others to consider as part of your testing, including:
- Loss of personnel (transportation blockage, strike, illness, etc.)
- Additional utility outages (gas, telecommunications)
- Application outages
- On-site flooding
- City/area-wide evacuation
- IT infrastructure failure or damage
As with each of the tests outlined above, your drills for these scenarios should be designed to ensure that personnel know how to respond, that they’ll be safe and that the business can continue running.
Documenting your testing scenarios
All tests should be thoroughly documented. This enables organizations to identify how the test was conducted, what went right and what needs to be improved. Each test provides a baseline for conducting future tests and also for making changes to continuity planning.
Each testing scenario should be individually documented, but can also be summarized to provide a high-level overview. Here is a very basic example of what that might look like, just for templating purposes:
| Testing scenario | Outcome | Action steps |
|---|---|---|
| Local data backup recovery | Failure to restore; corrupted data | Further evaluation of the cause of failure/corrupted data; consideration for new BC/DR investment |
| Network stress test | Application failure at peak bandwidth utilization | Reconfigure network settings to balance network load |
This summary should be followed by a more detailed description of each test, when it was conducted, what occurred and recommendations for further testing and/or improvements.
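If you track tests in a script or spreadsheet export, keeping one structured record per test makes the high-level summary easy to regenerate. Here is a minimal Python sketch, assuming a hypothetical record type whose fields mirror the summary columns. It adds the test date and the test type (plan review, tabletop or simulation).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class ContinuityTestRecord:
    scenario: str
    test_type: str            # "plan review", "tabletop" or "simulation"
    conducted_on: date
    outcome: str
    action_steps: list[str] = field(default_factory=list)


def summary_table(records: list[ContinuityTestRecord]) -> str:
    """Render the high-level summary as a pipe-delimited table."""
    lines = [
        "| Testing scenario | Test type | Date | Outcome | Action steps |",
        "|---|---|---|---|---|",
    ]
    for r in records:
        lines.append(
            f"| {r.scenario} | {r.test_type} | {r.conducted_on.isoformat()} "
            f"| {r.outcome} | {'; '.join(r.action_steps)} |"
        )
    return "\n".join(lines)
```

The detailed write-up for each test can then reference the same record. This keeps the summary and the full report consistent as findings are resolved.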
Frequently asked questions
1) What is an example of business continuity?
Business continuity is an operational objective that means a business can continue to function without disruption or interruption. One example of maintaining business continuity would be a hospital that is able to continue providing healthcare services during a hurricane.
2) How do you write a business continuity plan?
Writing a business continuity plan involves outlining your business’s unique risks, the impact of those adverse events, protocols for mitigation, response and recovery, and the systems that support those continuity efforts. For more tips, see our related post on how to develop a business continuity plan.
3) Why is a business continuity plan important?
Planning for potential operational disruptions is one of the best things businesses can do to prevent, mitigate and recover from such events. A business continuity plan serves as important documentation for understanding risks and guiding an organization through all stages of a disruption, from prevention to recovery.
4) How often should we test our business continuity plan?
The timing and frequency of your plan testing will depend on the unique objectives of each business, and you will likely need to test different parts of your BCP at different times. For example, you might decide to conduct emergency preparedness drills for employees once a year, while your data backups may need to be tested every few months.
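Once you’ve chosen a frequency for each test, a small script can flag which ones are overdue. The following is a minimal Python sketch, assuming a hypothetical schedule that records when each test last ran and its interval. Intervals are given in days for simplicity, so a quarterly test is roughly 90 days.

```python
from __future__ import annotations

from datetime import date, timedelta


def next_due(last_run: date, interval_days: int) -> date:
    """Date on which a test becomes due again."""
    return last_run + timedelta(days=interval_days)


def overdue_tests(schedule: dict[str, tuple[date, int]], today: date) -> list[str]:
    """Return the names of tests whose due date has passed, sorted by name.

    schedule maps a test name to (date last run, interval in days).
    """
    return sorted(
        name
        for name, (last_run, interval_days) in schedule.items()
        if today > next_due(last_run, interval_days)
    )
```

For example, a backup restore test last run on January 1, 2024 with a 90-day interval would show up as overdue on May 1, 2024. A fire drill last run on June 1, 2023 with a 365-day interval would not.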
Get more information
To learn more about how your company can mitigate downtime after data loss and other disasters, contact our business continuity experts at Invenio IT. Request a free demo or contact us today by calling (646) 395-1170 or by emailing success@invenioIT.com.