Invenio IT

Prepare for the Worst with RTO Disaster Recovery Planning

Tracy Rock


Director of Marketing @ Invenio IT



One in four businesses never reopens its doors after a disaster, and those that do face an uphill battle. Disasters lead to downtime, which has long-lasting effects on your brand, customer base, and bottom line. The most frustrating part is that so many of these failures could be prevented or minimized with effective RTO disaster recovery planning.

Businesses that haven’t properly prepared for the possibility of a lengthy period of downtime may be in for a rude awakening. Recent studies show that the prevalence and severity of downtime are worsening. According to a study by the Uptime Institute, only 8% of major public outages lasted more than 24 hours in 2017. By 2021, this number had increased to almost 30%. This trend is particularly worrying when you consider just how costly downtime can be, and it’s all the more reason to invest the necessary time and resources in developing an accurate RTO as part of your business continuity and disaster recovery plans.

What Is RTO Disaster Recovery Planning?

An RTO, or recovery time objective, is the maximum amount of downtime that your business can experience following a disaster before the losses become unacceptable. Before we get into the details of calculating an RTO for your business, let’s take a closer look at what RTOs are and how they fit in with the rest of your risk management and continuity planning.

The Role of RTOs in Business Continuity Planning

Regardless of size or industry, every business needs a comprehensive business continuity plan (BCP). This is a working document that serves to identify the organization’s unique disaster risks, preventative measures, and recovery solutions.

The primary goal of a BCP is to answer the following questions:

  1. Which disaster scenarios put your business at risk?
  2. What is the impact of those disasters on your operations and revenue?
  3. How quickly could operations be restored after a critical event?
  4. What tools and protocols are needed to prevent an interruption in operations?
  5. What steps need to be taken to restore operations?

Question three is where the RTO comes into play. This critical metric helps you determine not only how fast you can bring your operations back online but also how much time can elapse before your financial losses become unsustainable.

RTOs Explained

In the IT world, the term RTO generally refers to the recovery time of specific computer networks, data, applications, servers, or other systems. It is the amount of downtime that a business can reasonably tolerate before the disaster becomes more devastating in terms of revenue loss, projected costs of survival, and other factors.

For example, if your business could survive an email system outage for a period of six hours before experiencing irreversible damage, then your RTO for that particular system would be six hours, at most. Having this number in hand helps your IT department set a timeframe to get back up and running and implement prevention and recovery measures.

Identifying your RTO in relation to specific business systems is thus a crucial part of your business continuity planning. It’s a starting point for determining what kind of interruption the business can withstand and what actions must be taken to meet those recovery time objectives.

Why Is RTO Disaster Recovery Planning Important?

Once you have a clear understanding of what an RTO is, the next logical question is why you need one. The short answer? Downtime. RTO disaster recovery planning involves developing strategies that allow you to survive and minimize downtime, which is increasingly likely to occur and can have serious ramifications for your business’s long-term success.

Downtime Is Common

Whether you operate a small business or an enormous enterprise, chances are good that you’ll experience downtime at some point. For example, a 2020 study by LogicMonitor found that 96% of surveyed IT leaders experienced one or more outages in the past three years. Similarly, a 2022 report from the Uptime Institute shows that 80% of data center managers and operators experienced an outage of some kind in the prior three years. Businesses that are willing to acknowledge the reality that downtime is often unavoidable can better protect themselves by calculating an RTO and incorporating it into their continuity and recovery plans.

Downtime Is Damaging

When your business experiences downtime, operations may slow or come to a complete stop. The longer the outage lasts, the more severe the consequences become. Possible outcomes include:

  • Reduced or lost productivity
  • Decreased revenue
  • Reputational damage
  • Data loss

The financial impacts of downtime can be particularly staggering. Experts estimate that Facebook lost nearly $100 million in revenue due to a seven-hour period of downtime in October 2021. Likewise, in the single hour that Amazon was down in June 2021, it lost approximately $34 million in sales.

For big businesses, losses totaling tens of millions of dollars are a mere blip. Unfortunately, smaller organizations can be devastated or even destroyed by costly periods of downtime. Developing comprehensive disaster recovery and business continuity plans that include your RTO can help reduce the likelihood that downtime will occur and shorten its duration if it does.

How Are RTOs Measured?

Depending on the type of outage, your RTO may be measured in weeks, days, hours, minutes, or even seconds. Essential systems and applications naturally have shorter RTOs because they have a more significant influence on the business’s ability to function.

Consider a major online retailer being knocked offline by a cyberattack. While companies like Amazon have proven that they could likely survive a prolonged attack despite millions of dollars in losses, you can bet that these companies consider almost any amount of recovery time to be unacceptable. Thus, they put an astonishing number of safeguards in place to minimize the risks of downtime, and they may conclude that the RTO for major systems, particularly those that directly affect customers, is only a few seconds.

In contrast, less important systems may have an RTO of several weeks or even months. A single computer failure at a small business, for example, may not be immediately devastating. However, if the issue isn’t resolved over time, the losses incurred will eventually hit an unacceptable point, especially if they’re tied to idle workers and other dependencies.

How Do You Determine Your RTO?

Since an RTO usually relates to operational costs and revenues, you will likely need to consult with different department managers and business units before establishing it. Ideally, this group of personnel will already be identified as the recovery team in your business continuity plan. You’ll need to collect key data points from each department to develop an accurate and useful RTO.

Cost of Potential Losses

One of the most important calculations you’ll complete is the total cost that you could incur as a result of system failures. This should take into account expenses like:

  • Wages paid to idle workers
  • Revenue losses
  • Technology repairs
  • Restoration of lost data
  • Government fines

Keep in mind that not all financial losses will be immediate. Downtime can cause considerable damage to your company’s reputation, which can have a lasting effect on customer loyalty and future sales.

The number you land on will depend on the structure, size, and mission of your business, but be careful not to underestimate the potential costs. A 2022 study by Information Technology Intelligence Consulting (ITIC) revealed that for 91% of mid-sized enterprises and large enterprises, a single hour of server downtime costs at least $300,000. Even more frightening, 44% of those enterprises had hourly outage costs ranging from $1 million to over $5 million.
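As a rough illustration of how the expense categories listed above might be combined, here is a minimal Python sketch. The function name, its parameters, and every dollar figure are hypothetical; it assumes wages and revenue are lost at a steady hourly rate and treats repairs, data restoration, and fines as one-time charges. Reputational damage is left out because it is hard to express as a single number.

```python
def estimate_outage_cost(
    hours_down,
    idle_wages_per_hour,
    lost_revenue_per_hour,
    repairs=0.0,
    data_restoration=0.0,
    fines=0.0,
):
    """Estimate the direct cost of a single outage (illustrative model only)."""
    if hours_down < 0:
        raise ValueError("hours_down must not be negative")
    # Costs that keep accumulating for as long as the system is down.
    hourly = idle_wages_per_hour + lost_revenue_per_hour
    # Costs assumed to be paid once, regardless of outage length.
    one_time = repairs + data_restoration + fines
    return hours_down * hourly + one_time


# Hypothetical figures for a four-hour outage at a small business.
example_cost = estimate_outage_cost(
    hours_down=4,
    idle_wages_per_hour=1_200,
    lost_revenue_per_hour=3_000,
    repairs=2_500,
    data_restoration=1_500,
)
print(f"Estimated cost: ${example_cost:,.0f}")
```

Running the same function for several outage lengths gives each department a shared baseline to challenge and refine.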

Critical Dependencies

The next question to answer is how various operations depend on a single business system, application, or technology. In other words, if one system fails, is the effect contained, or will it ripple out to other aspects of your business? Consider the impact of system failure across the organization, and identify the functions, services, and processes that would grind to a halt (or even just slow down) if that single system were to fail. Systems with a high number of dependencies should, if at all possible, have shorter RTOs, whereas a lengthier recovery time may be acceptable for those with few or no dependencies.
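One way to explore these ripple effects is to record which systems rely on which, then walk that map to see everything a single failure would touch. The Python sketch below is only an illustration: the system names and the dependency map are invented, and a real inventory would come from your own IT and department leads.

```python
from collections import deque

# Hypothetical map: each system lists the systems it depends on.
DEPENDS_ON = {
    "email": ["directory"],
    "crm": ["database", "directory"],
    "web_store": ["database", "payments"],
    "payments": ["directory"],
    "database": [],
    "directory": [],
}


def impacted_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    # Invert the map so we can look up "who relies on this system?".
    dependents = {name: set() for name in depends_on}
    for system, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(system)

    # Breadth-first walk outward from the failed system.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


# Rank systems by how many others would be affected if they failed.
ranking = sorted(
    DEPENDS_ON,
    key=lambda name: len(impacted_by(name, DEPENDS_ON)),
    reverse=True,
)
print(ranking[0], sorted(impacted_by(ranking[0], DEPENDS_ON)))
```

Systems near the top of such a ranking are the natural candidates for shorter RTOs.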

Possible Workarounds

While your ultimate goal is to fully restore operations, you may be able to find temporary workarounds that help mitigate the effects of a system failure. Your BCP should identify a Plan B that may help to partially restore operations until a full recovery is completed. This is an ideal process for RTO disaster recovery planning because it enables you to extend the recovery time objective. Your recovery team needs as much time as possible to get things back online, and anything you can do that reasonably stretches the maximum RTO is beneficial.

Losses Incurred Over Time

Determine how the length of downtime will influence the cost of losses, and don’t assume that the relationship is linear. For instance, it seems reasonable to say that if one hour of downtime causes your business to lose $5,000, two hours would cost $10,000. In reality, you may discover that losses accelerate with each additional hour of downtime as the situation escalates. Thus, while one hour may cost your business $5,000, two hours may cost $15,000. Be sure to factor those potential increases into your RTO formula.
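To make the difference concrete, the short Python sketch below compares a flat hourly loss with one that grows every hour. The doubling factor is an assumption chosen only because it reproduces the $5,000 and $15,000 figures above; your own growth rate would have to come from your loss projections.

```python
def cumulative_loss(hours, first_hour_loss=5_000.0, growth=2.0):
    """Total loss after a whole number of hours of downtime.

    The loss for each hour is the previous hour's loss multiplied by
    `growth`. A growth of 1.0 gives the simple linear assumption.
    """
    total = 0.0
    hourly = first_hour_loss
    for _ in range(hours):
        total += hourly
        hourly *= growth
    return total


for hours in (1, 2, 3, 4):
    linear = cumulative_loss(hours, growth=1.0)
    escalating = cumulative_loss(hours, growth=2.0)
    print(f"{hours}h  linear=${linear:,.0f}  escalating=${escalating:,.0f}")
```

Even in this toy model, the gap between the two estimates widens quickly, which is why a linear projection can badly understate the cost of a long outage.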

Acceptable Recovery Time

With these numbers in hand, you have the knowledge you need to determine the acceptable length of time for an outage to continue before it’s too late. That length of time is your RTO. Keep in mind that RTOs may vary based on factors like the time of year. Many companies face much greater losses during high sales periods leading up to the holidays, so they might institute shorter RTOs for those times.
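The sketch below shows one way this last step could look in Python: given the largest loss the business can absorb and a projected loss curve, it finds the longest outage that stays within that limit. The loss curves, the 1.5 growth factor, and the $100,000 limit are all hypothetical; the two curves stand in for a regular period and a holiday peak.

```python
def make_loss_curve(first_hour_loss, growth):
    """Build a hypothetical cumulative-loss function of whole hours."""
    def cumulative(hours):
        return sum(first_hour_loss * growth**h for h in range(hours))
    return cumulative


def rto_hours(tolerable_loss, cumulative_loss, max_hours=24 * 30):
    """Longest whole-hour outage whose projected loss stays within the limit."""
    hours = 0
    while hours < max_hours and cumulative_loss(hours + 1) <= tolerable_loss:
        hours += 1
    return hours


TOLERABLE_LOSS = 100_000  # hypothetical ceiling set by the recovery team

regular_season = make_loss_curve(first_hour_loss=5_000, growth=1.5)
holiday_peak = make_loss_curve(first_hour_loss=20_000, growth=1.5)

print("Regular season RTO:", rto_hours(TOLERABLE_LOSS, regular_season), "hours")
print("Holiday peak RTO:", rto_hours(TOLERABLE_LOSS, holiday_peak), "hours")
```

With these made-up inputs the holiday RTO comes out shorter than the regular one, which mirrors the seasonal adjustment described above.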

While it may be useful to research the RTOs of other businesses and compare them with your own, keep in mind that the number you arrive at should be specific to your organization. You won’t win an award for having the world’s shortest RTO, but you will win the favor of your customers and staff if you establish reasonable recovery objectives and plans that help you resume operations quickly and efficiently.

What’s the Difference Between an RTO and an RPO?

RTOs and RPOs are both important metrics for disaster recovery planning. They’re based on a similar concept, but it’s important to recognize the fundamental distinction between the two terms.

RPO stands for recovery point objective. Whereas an RTO defines an acceptable amount of time for recovery, RPO refers to an acceptable amount of data you can lose, measured in time. In short, an RPO is used to determine the appropriate recovery point for data backups.

For example, if your organization determines that losing more than four hours of data would cause unacceptable losses or other adverse impacts on business operations, then your RPO is four hours, and your backups should run at least every four hours so that a recovery point always falls within that window.
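A simple way to check a backup schedule against an RPO is to look at the longest stretch of time not covered by a recovery point. The Python sketch below assumes you can export backup completion times as timestamps; the function names and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta


def max_backup_gap(backup_times, now):
    """Longest stretch without a recovery point, including time since the last one."""
    if not backup_times:
        raise ValueError("at least one backup time is required")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times, now, rpo):
    """True if no gap between recovery points exceeds the RPO."""
    return max_backup_gap(backup_times, now) <= rpo


RPO = timedelta(hours=4)  # the four-hour example from the text
day = datetime(2024, 1, 15)

# Hypothetical schedule with a six-hour hole between 08:00 and 14:00.
uneven = [day + timedelta(hours=h) for h in (0, 4, 8, 14)]
print(meets_rpo(uneven, now=day + timedelta(hours=16), rpo=RPO))
```

A failure that struck just before the 14:00 backup in this sample would lose almost six hours of data, so the schedule misses a four-hour RPO even though most backups are four hours apart.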

Determining your RPO and RTO are equally important tasks that help protect your business from the negative impacts of downtime. Both should be adequately addressed within your continuity planning.

Why Do RTOs Fail?

As part of your business continuity plan, you should be testing your business’s resiliency on a regular basis. This can include mock recoveries and other drills to ensure your teams can meet the recovery time objectives you’ve identified. Unfortunately, even businesses with ample preparation sometimes fail to achieve their RTOs in a real-world event. Take note of these common mistakes to avoid falling short of your RTOs in the event of a disaster.
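Drill results are easiest to act on when they are compared directly against the objectives. The Python sketch below is one hypothetical way to do that: the six-hour email RTO comes from the earlier example, while the other system names, objectives, and measured times are invented for illustration.

```python
from datetime import timedelta

# Objectives per system. Only the email figure comes from the article.
RTO = {
    "email": timedelta(hours=6),
    "file_server": timedelta(hours=8),
    "web_store": timedelta(minutes=30),
}

# Hypothetical recovery times measured during a mock recovery.
DRILL_RESULTS = {
    "email": timedelta(hours=4, minutes=30),
    "file_server": timedelta(hours=10),
}


def missed_objectives(rto, measured):
    """Map each system that exceeded its RTO to the amount it overran."""
    return {
        system: measured[system] - target
        for system, target in rto.items()
        if system in measured and measured[system] > target
    }


def untested_systems(rto, measured):
    """Systems that have an RTO but were not exercised in the drill."""
    return sorted(set(rto) - set(measured))


print("Missed:", missed_objectives(RTO, DRILL_RESULTS))
print("Untested:", untested_systems(RTO, DRILL_RESULTS))
```

Tracking the untested list alongside the misses helps keep an RTO from being assumed achievable simply because nobody has tried to meet it.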

Unrealistic Expectations

Keep in mind that a recovery time objective is exactly that: an objective. It does not necessarily have to identify the absolute point of no return after a disaster. Instead, it can be used as a realistic goal for your recovery teams.

As history has shown, large businesses like Amazon and Facebook have the resources to handle an extended outage, but they also have the resources to set aggressive RTOs. With more personnel, more preventive technologies, and more comprehensive continuity planning, these businesses can safely aim for the lowest recovery time objectives possible.

The same can’t be said for every business. If you are setting an aggressive RTO, be sure it’s realistic and based on actual risk and loss projections. You will need to calculate exactly how much downtime the business can handle before the losses are too much to overcome.

Selecting an impossible RTO makes failure inevitable, as does pulling your RTO out of thin air. Your recovery time objective will always be limited by the capabilities of the technologies that are restoring your systems and the people managing that process. Set realistic expectations based on a thorough analysis of your business’s unique disaster-recovery preparedness.

Misguided Backup Management

Many businesses fail to consider the bigger picture of disaster recovery. Incomplete backups can become a hindrance and can extend the amount of time that it takes for your business to recover from an outage. Ideally, your backups should include:

  • Files
  • Network configurations
  • System state information
  • Applications
  • System settings

This information is vital to restore both the servers and the network. If your RTO only accounts for the time to restore basic data, you may find that full recovery will actually take much longer, and you will fail to meet your objective.

Inefficient Backup Recovery Methods

Choosing the right kind of backup system can have a significant influence on how long it takes to recover from downtime. A high-quality hybrid backup solution, for instance, can instantly restore your data from both local and cloud storage locations.

If you’re backing up to tapes, be aware of the limitations of recovery due to tape contention. In many cases, multiple recovery tasks have to compete for resources on the same tape, which significantly extends your recovery time. Rather than restoring several systems simultaneously, you will have to wait for one system to complete before you can begin the next.
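The effect of contention on total recovery time can be approximated with very simple arithmetic. The Python sketch below is a deliberate simplification with invented restore durations: it assumes that restores sharing one tape run strictly one after another, and that independent restores can run fully in parallel without competing for bandwidth or staff.

```python
# Hypothetical restore durations, in hours, for three systems.
RESTORE_HOURS = {
    "file_server": 3.0,
    "database": 5.0,
    "email": 2.0,
}


def sequential_restore_hours(restore_hours):
    """Total time when each restore must wait for the previous one to finish."""
    return sum(restore_hours.values())


def parallel_restore_hours(restore_hours):
    """Total time when all restores can run at once (idealized)."""
    return max(restore_hours.values(), default=0.0)


print("One tape, one at a time:", sequential_restore_hours(RESTORE_HOURS), "hours")
print("Independent restores:", parallel_restore_hours(RESTORE_HOURS), "hours")
```

In this toy case the sequential path takes twice as long as the idealized parallel one, so an RTO based on the single longest restore would be missed.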

When calculating your RTO, know the limitations of your backup technologies and the dependencies that will influence your recovery time. If necessary, upgrade your backup method to one that can enable you to better meet your objectives.

How Can Businesses Speed Up Recovery Times?

Calculating an RTO requires time and resources that you may feel could be better spent elsewhere. However, with the ever-present risk of downtime putting the future of your company in the balance, taking the time to calculate an RTO is a small price to pay.

In addition to determining a realistic RTO, you can also work toward shortening the time it takes to recover from a disaster. Develop strong plans and invest in backup solutions that can restore data more quickly. If you need guidance or insights into the best possible business continuity products and practices, reach out to Invenio IT’s business continuity experts. From natural disasters to SMB ransomware, the team at Invenio IT can help you prepare for the worst.
