The Ultimate Guide to Disaster Recovery
Disaster recovery planning typically focuses on information technology, though it applies to all business operations, and can encompass a broad range of tools and processes:
- Data backup and recovery technologies
- Failover systems
- Redundant hardware and equipment
- Secondary business locations
- Recovery protocols and procedures
Together, these components guide a business through all stages of the disaster management cycle: prevention, preparation, mitigation and recovery.
Why it’s important
Operational disruptions—and the downtime they cause—are arguably the single greatest threat to a business.
- 40% to 60% of small businesses never reopen their doors following a disaster, according to FEMA.
- 90% of smaller companies fail within a year if they can’t resume operations within 5 days after experiencing a disaster.
- Each hour of downtime can cost businesses anywhere from $10,000 to millions of dollars, depending on the size of the company.
Disaster recovery planning is critical to ensuring that companies are prepared for any threat and that personnel know how to respond when those incidents happen.
The truth about disasters
Disaster recovery is not focused solely on destructive natural disasters, such as tornadoes and hurricanes. While those are indeed serious threats that require proper planning, other types of disasters are far more common.
- Ransomware & malware
- Data loss
- Network outages
- Hardware failure
- Utility outages
Each of these events—even the loss of a single critical file—can pose enormous challenges for a business. Operational disruptions of any kind can translate into tremendous costs that are difficult for smaller companies to overcome.
Costs of prolonged recoveries
Consider, for example, the impact of a single ransomware infection. With files encrypted, entire computers become unusable and operations are effectively frozen across the organization. This results in lost wages, lost productivity, interrupted revenue streams, costly recovery efforts and a host of other expenses.
In 2017, the NotPetya ransomware attack cost FedEx a staggering $300 million, underscoring the importance of comprehensive disaster recovery planning.
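To make these cost categories concrete, here is a minimal back-of-the-envelope sketch. The function name, parameters and every figure in the example are hypothetical and chosen only for illustration; a real estimate would also account for items such as reputational damage and contractual penalties.

```python
def downtime_cost(hours_down, hourly_revenue, affected_staff, hourly_wage,
                  recovery_expenses=0):
    """Rough downtime estimate: interrupted revenue + idle wages + recovery costs."""
    lost_revenue = hours_down * hourly_revenue
    idle_wages = hours_down * affected_staff * hourly_wage
    return lost_revenue + idle_wages + recovery_expenses


# Hypothetical scenario: an 8-hour outage at a company earning $5,000 per hour,
# with 40 idle employees paid $30 per hour and $15,000 in recovery expenses.
estimate = downtime_cost(8, 5_000, 40, 30, 15_000)
print(f"Estimated cost: ${estimate:,.0f}")  # Estimated cost: $64,600
```

Even with these modest assumed numbers, a single working day of downtime runs well into five figures, which is why recovery speed matters as much as recoverability.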
Creating a disaster recovery plan
A disaster recovery plan outlines a business’s strategies for dealing with operational disruptions. Much like a business continuity plan, it’s a comprehensive document that spells out how the business should respond to various disaster scenarios (and how to avoid them).
A typical disaster recovery plan includes the following sections:
- Key Contacts: Contact information for recovery personnel or key stakeholders
- Objectives: Purpose and scope of the plan
- Review & update schedule: How often the plan should be updated and by whom
- Activation protocol: Under what circumstances the plan is activated and how
- Recovery procedures: Detailed processes for recovering from specific incidents
- Systems: Data backup and other IT systems that support the recovery process
- Secondary locations & assets: Backup space, equipment and resources for temporary relocation
- Recommended action steps: Identification of areas that require additional planning
Creating a disaster recovery plan is the first critical step of the planning process. It requires a business to consider the specific incidents that threaten operations and create detailed recovery protocols for each scenario.
Business continuity and disaster recovery (BC/DR)
The term “business continuity and disaster recovery” is often used to describe a business’s data backup system. Shortened as BC/DR, it is an essential IT deployment that ensures a business can restore data from a backup after a disaster.
- While many forms of data backup software exist, BC/DR systems typically offer greater protection against a range of data-loss events.
- Many of today’s disaster recovery solutions deploy a dedicated backup device, combined with intelligent software and cloud storage.
- High backup frequencies and fast recovery methods are what define the best BC/DR solutions, ensuring that businesses can maintain continuity through any disaster. (Below, we identify some specific features to look for in a disaster recovery system.)
Beyond data, businesses also need to have a backup plan for replacing a wide array of systems that are critical for company operations to function. Failover systems create redundancy, enabling businesses to quickly fall back on secondary resources when primary systems become unavailable.
Examples of failover systems include:
- Backup generators that continue to supply power to the business during electrical outages
- Network failover systems that enable communication to continue during outages (e.g., via redundant telecommunications lines or wireless failover)
- Failover servers that are activated during planned maintenance or unplanned outages on primary systems
Failover helps keep critical systems available, even when primary resources go down.
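The underlying logic of failover is simple: try the primary resource first, and fall back to the next one in line when it is unhealthy. The sketch below illustrates that idea only. The resource names and the health-check callables are hypothetical stand-ins; real failover systems rely on dedicated hardware, routing protocols or clustering software rather than a script like this.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each resource is a (name, health_check) pair, listed in priority order.
Resource = Tuple[str, Callable[[], bool]]


def select_active(resources: Sequence[Resource]) -> Optional[str]:
    """Return the name of the first healthy resource, or None if all are down."""
    for name, is_healthy in resources:
        try:
            if is_healthy():
                return name
        except Exception:
            # A health check that raises is treated the same as a failed check.
            continue
    return None


# Hypothetical example: the primary line is down, so the backup link takes over.
links = [
    ("primary-fiber", lambda: False),
    ("backup-lte", lambda: True),
]
print(select_active(links))  # backup-lte
```

Listing resources in priority order keeps the decision predictable: the secondary resource is used only while the primary fails its check.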
Creating effective disaster recovery protocols is impossible without having a deep understanding of the potential risks. A risk assessment is needed to identify the most likely threats and their impact on the business.
A risk assessment is typically included within the disaster recovery plan or business continuity plan. An IT-specific risk assessment is often created separately from the larger, business-wide assessment. This helps to measure the unique impact of a disaster on essential technology deployments.
- Identified risks should be prioritized by their likelihood, as well as their impact.
- Risk assessments are typically accompanied by an impact analysis, which provides greater insight into the specific consequences of each disruption.
- Impact is typically measured by the end cost of the disruption as it affects IT and all other business functions (e.g., downtime, revenue disruptions)
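One common way to prioritize by both likelihood and impact is a simple risk matrix, where each risk is scored as likelihood multiplied by impact. The sketch below assumes a 1-to-5 scale for both values; the scale, the class name and the sample ratings are illustrative assumptions, not figures from an actual assessment.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Classic risk-matrix score: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritize(risks):
    """Return risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical ratings for illustration only.
risks = [
    Risk("Hurricane", likelihood=1, impact=5),
    Risk("Ransomware", likelihood=4, impact=5),
    Risk("Hardware failure", likelihood=3, impact=3),
    Risk("Utility outage", likelihood=2, impact=2),
]
for risk in prioritize(risks):
    print(f"{risk.score:>2}  {risk.name}")
```

With these sample ratings, ransomware ranks first and the hurricane ranks below hardware failure: a severe but rare event can score lower than a moderate but frequent one.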
Steps to prevent a disaster are just as critical as those to recover from one, if not more so. As such, prevention is an important piece of disaster recovery planning: it enables continuity without the need to activate a recovery plan.
Examples of preventative steps:
- Security solutions to prevent disruptions from malware, cyberattack, data theft, etc.
- Network/firewall configurations to block dangerous incoming/outgoing traffic
- Access control/permissions to restrict users from accessing sensitive file directories
- Load balancing to prevent network slowdown and server crashes
- Scheduled server maintenance and hardware replacement to prevent unexpected failure
Another critical form of prevention that’s often overlooked is user education. Today’s most destructive events, like ransomware attacks, are often caused by user error. For example, users may inadvertently open a malicious email attachment or fall victim to a phishing scam that steals their login credentials.
Ongoing employee training programs can greatly reduce the risk of these events by educating users on safe web/email practices.
Every disaster requires its own unique process for recovery. As part of the disaster recovery plan, businesses must carefully outline the steps that personnel should follow to carry out the recovery. This could include steps for restoring a backup, reinstalling a critical application or even moving mission-critical operations to a secondary location.
Tips for effective recovery protocols:
- Create specific procedures for each scenario identified in the risk assessment and/or impact analysis.
- Leave nothing to guesswork. Clearly spell out each step with the assumption that it could be carried out by personnel who aren’t deeply familiar with the process.
- When applicable, incorporate diagrams, flow charts or other visuals to make the process easier to follow.
Recovery procedures should also state who is responsible for carrying them out, including any secondary/substitute personnel for scenarios in which the primary recovery team is unavailable.
The speed and timing of the recovery process should be guided by the objectives set in the disaster recovery plan. Within IT, two of the core objectives pertain to how quickly systems should be recovered to prevent the negative consequences of a prolonged outage. Those two objectives are referred to as:
- Recovery time objective (RTO): The desired maximum amount of time that the recovery process should take. This can be applied toward specific systems or events, such as data loss, network outages, website outages, and so on.
- Recovery point objective (RPO): The desired maximum age of the most recent backup. This objective sets a limit for the age of backups (as well as goals for backup frequency), helping to minimize the amount of data loss when a backup needs to be restored.
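Both objectives reduce to simple time comparisons. The sketch below shows one way to check them, along with the backup frequency that a given RPO implies. The function names, timestamps and the 15-minute and 4-hour targets are hypothetical examples, not recommended values.

```python
import math
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the most recent backup is no older than the RPO."""
    return now - last_backup <= rpo


def meets_rto(outage_start: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """True if the recovery finished within the RTO."""
    return restored_at - outage_start <= rto


def backups_per_day(rpo: timedelta) -> int:
    """Minimum number of evenly spaced backups per day needed to satisfy the RPO."""
    return math.ceil(timedelta(days=1) / rpo)


# Hypothetical targets: a 15-minute RPO and a 4-hour RTO.
rpo, rto = timedelta(minutes=15), timedelta(hours=4)
now = datetime(2024, 1, 1, 10, 0)

print(meets_rpo(datetime(2024, 1, 1, 9, 40), now, rpo))  # False: backup is 20 minutes old
print(meets_rto(now, datetime(2024, 1, 1, 13, 30), rto))  # True: restored in 3.5 hours
print(backups_per_day(rpo))                               # 96
```

The last function shows why a tight RPO drives backup frequency: a 15-minute RPO requires at least 96 backups a day.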
Secondary locations and assets
In the event of catastrophic disasters in which physical business locations become inaccessible, organizations must have a plan for restoring critical operations at a secondary location. This means having access not only to the backup location itself but also equipment and resources for that location.
- If a business does not already have access to a secondary location, it should have a plan for quickly securing one.
- Backup equipment must be made available to the mission-critical personnel who will use the secondary location. Beyond server and network infrastructure, this can include individual computers, desks, chairs and so on.
- The disaster recovery plan should identify which personnel relocate first, and the business should communicate this to all affected personnel via the emergency communication methods identified in the plan.
Recommended Data Backup & Recovery
As mentioned above, data backup is an important piece of the disaster-recovery puzzle. When data is lost, for whatever reason and no matter how small or large the loss, businesses must be able to restore it quickly in order to prevent an operational disruption.
While there are numerous BC/DR solutions on the market, there are some key capabilities that today’s businesses should look for when comparing options:
- Dedicated backup devices to process and store the backups
- Hybrid cloud backups (stored locally and in the cloud)
- Ability to perform backups frequently (every few minutes, in some cases)
- Ability to boot backups as virtual machines
- Numerous recovery options: file-level, rapid rollback, bare metal restore, direct restore, etc.
- Automatic backup integrity checks
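As a rough illustration of the last item, one basic form of integrity check compares a backup file's checksum against a value recorded when the backup was taken. The sketch below uses SHA-256 from Python's standard library. It is an assumption about how such a check could work, not a description of any vendor's method; commercial BC/DR products may go further, for example by test-booting the backup.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """True if the file's current checksum matches the one recorded at backup time."""
    return sha256_of(path) == expected_sha256.lower()
```

In practice, the checksum would be stored alongside the backup when it is created, and a call such as `backup_is_intact(Path("backup.img"), recorded_checksum)` (both names hypothetical) would run on a schedule so that corruption is caught before a restore is needed.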
In the age of ransomware, cybersecurity experts advise businesses to deploy robust disaster recovery solutions that can quickly recover the entire infrastructure, in addition to individual files and folders.
Solutions like the Datto SIRIS provide a complete infrastructure backup (physical, virtual, cloud) as often as every five minutes, while also enabling near-instant backup virtualization, locally or in the cloud.