By W. Curtis Preston
Disaster recovery (DR) is the perfect workload for the cloud. The main reason is that disasters happen suddenly and without warning, so you must be able to declare one just as suddenly. The cloud is ready to go at a moment’s notice, making it perfect for that scenario. Disaster recovery also happens to be a need that every company has, so any company can use it as a test case for the value of the cloud.
A company considering migrating workloads to the cloud needs a successful initial project to create momentum. Some start with a lift-and-shift project because it’s simple and will probably succeed technically, but lift-and-shift projects also come with a near-guarantee of increased costs. A fully automated DR system, on the other hand, can be very successful and decrease costs. This is why so many experts talk about DR as a perfect first workload to migrate to the cloud. More on this later.
How DR is Traditionally Done
To understand why DR is perfect for the cloud, it’s important to first discuss how DR is typically done without the cloud. You basically have two options: spend tons of money or don’t bother doing DR. There are a few ways to do DR, but they always amount to significant costs.
The traditional DR approach is to lease or buy a standby data center that you manage. You then connect your production data center to the standby data center and replicate data between the two locations. For this to work, the hardware in the standby site must closely match the hardware in the production site. Virtualization made this a little easier, but the compute, storage, and networking capacity still needs to be comparable.
Since this hardware goes completely unused until you declare a disaster, it is a huge investment for something that is barely used. Besides the huge waste from purchasing resources that only get used in a rare situation, those servers and storage arrays usually stay powered on, consuming power every single day. To use a modern term – that’s not very green.
Some companies decided to use DR service providers that will rent them compute, network, and storage in case of a disaster. This is usually less expensive than purchasing and maintaining your own standby data center, because you’re only paying for the resources when you use them; however, this method comes with additional risks. In a true disaster, there can be a “run on the bank,” since such resources tend to be regionally located close to the company in question. If a large disaster happens, more companies may declare disasters than the DR service provider has resources for, which can create massive delays and challenges for the affected companies. That means this method has less waste, but also comes with higher risks.
Can the Cloud Meet Your Needs?
Companies declaring a disaster or testing their ability to recover from a disaster have very specific needs. They need (in no particular order):
- The ability to restore data very quickly
- The necessary amount of storage upon which to restore said data
- An effectively unlimited number of VMs to replace the servers or VMs they have onsite
- A significant amount of network capacity
Enterprises need all of this at a moment’s notice, don’t want to pay for it until they need it, and need to feel confident there won’t be a “run on the bank” if a disaster happens.
A large public cloud provider is the only way to meet these requirements. Public cloud data centers are dispersed widely enough that demand from a regional disaster is not an issue, and even if one cloud region were affected, you could recover in another region. Unlike with traditional DR service providers, there is no need to co-locate the DR resources near the company declaring a disaster. Everything can be automated, with no physical hands needed to make it happen.
DIY DR
Some companies decide to design their own DR plan. They use traditional replication software running in some VMs in the cloud. This requires storing all necessary data on a block device in the cloud. For an AWS customer, for example, this means the copy will be stored on Elastic Block Store (EBS). The recovery VMs and a VPN are also created in advance, but none of them are powered on until a disaster is declared.
DRaaS via Replication
There are Disaster Recovery as a Service (DRaaS) providers that will automate everything listed above for you, using the cloud as their resource for doing so. Many of them are old-guard vendors that have been in the DR services space for a long time. Where they previously leased equipment during a disaster, they now automate the use of the cloud for that function. Their value is in simplifying DR for those who don’t want the complexity of the DIY approach.
Service providers use a variety of tools to get your data ready for a disaster. Sometimes they have their own software, and other times they leverage someone else’s product and simply hide its complexities from the customer.
DRaaS via Backup
The advantage of DIY and DRaaS via replication is that they usually use replication (instead of backup) to maintain the DR copy. This provides very tight recovery point objectives (RPOs) and recovery time objectives (RTOs). The downside is that customers using them have to maintain two separate infrastructures, one for backup and one for DR, and the destination copy must be on expensive block storage.
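To make the RPO difference concrete, here is a minimal sketch that treats the data-loss window as the time between the last good copy and the disaster. The 15-minute RPO, the timestamps, and the function names are all hypothetical values chosen for illustration, not figures from any particular product.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_copy: datetime, disaster_time: datetime) -> timedelta:
    """Data written after the last good copy is lost when the disaster hits."""
    return disaster_time - last_good_copy


def meets_rpo(last_good_copy: datetime, disaster_time: datetime, rpo: timedelta) -> bool:
    """True if the data-loss window fits inside the recovery point objective."""
    return data_loss_window(last_good_copy, disaster_time) <= rpo


# Hypothetical scenario: disaster at 09:00 with a 15-minute RPO.
disaster = datetime(2024, 1, 10, 9, 0)
rpo = timedelta(minutes=15)

# Assumption: replication lags production by about 30 seconds.
last_replicated_write = disaster - timedelta(seconds=30)
# Assumption: the most recent backup finished at midnight.
last_nightly_backup = datetime(2024, 1, 10, 0, 0)

print(meets_rpo(last_replicated_write, disaster, rpo))  # True
print(meets_rpo(last_nightly_backup, disaster, rpo))    # False
```

With these assumed numbers, only the replicated copy satisfies the RPO. A backup-based approach would need more frequent backups, or a looser RPO, to pass the same check.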
There is another approach to DR, where cloud-based data protection vendors leverage the fact that they already have a backup of the latest version of each server in the cloud. Instead of managing a separate system that is only used in DR, these vendors perform a restore of each server or VM into the appropriate place.
This restore can also be done in advance to allow for quicker RTOs, and the restored data doesn’t have to be stored on block devices. For example, in the case of AWS, the data can be stored in EBS snapshots, which are roughly half the price of EBS volumes and can be quickly restored to EBS in case of disaster. Since these systems use backup data instead of replication, they will not be able to supply the same RPOs that a system using replication can; however, they should be able to supply DR at a much lower cost than maintaining a separate DR system alongside backup.
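The storage-cost argument can be sketched as simple arithmetic. The per-GB price and the protected capacity below are placeholder assumptions, not quoted AWS prices; only the half-price ratio comes from the text above. The sketch also assumes the snapshot copy occupies the same number of gigabytes as the block copy, which may not hold in practice.

```python
def monthly_dr_storage_cost(
    protected_gb: float,
    block_price_per_gb: float,
    snapshot_ratio: float = 0.5,  # "half the price", per the article
) -> dict:
    """Compare keeping the DR copy on block storage vs. in snapshots."""
    block = protected_gb * block_price_per_gb
    snapshot = block * snapshot_ratio
    return {
        "block": round(block, 2),
        "snapshot": round(snapshot, 2),
        "savings": round(block - snapshot, 2),
    }


# Hypothetical inputs: 10 TB protected at an assumed 0.10 per GB-month.
costs = monthly_dr_storage_cost(protected_gb=10_000, block_price_per_gb=0.10)
print(costs)
```

Plugging in current prices for your own region and volume type would give a more realistic comparison.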
Don’t Count Out a Commercial Solution
The DIY approach may look inexpensive at first, which is why a lot of people consider it. However, it doesn’t cut costs quite like you’d think. You’re still paying for 24×7 resources, replication software, VMs, and storage, on top of resources for the actual disaster. A commercial cloud-based DR system can pay for itself by reducing the use of these expensive resources.
In addition, any orchestration in a DIY system will be provided by user-managed scripts, with all the risks that come with them. Commercial DRaaS solutions include multiple levels of orchestration, allowing customers to specify multiple recovery groups and the recovery order within each group. This ensures that core services needed by other systems, such as Active Directory domain controllers, are restored first. It also makes sure any recovery starts with the systems that have the highest priority to the business. A commercial system can also provide more support for different types of workloads, including cloud workloads, on-premises workloads, or a hybrid of both.
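As a rough illustration of recovery groups and recovery order, the sketch below sorts groups by priority and then sorts the servers inside each group. The class names, fields, and server names are hypothetical and do not reflect any vendor’s actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    name: str
    order: int  # position within its recovery group (lower restores first)


@dataclass
class RecoveryGroup:
    name: str
    priority: int  # lower number means the group is restored earlier
    servers: List[Server] = field(default_factory=list)


def recovery_plan(groups: List[RecoveryGroup]) -> List[str]:
    """Return server names in the order they should be restored."""
    plan = []
    for group in sorted(groups, key=lambda g: g.priority):
        for server in sorted(group.servers, key=lambda s: s.order):
            plan.append(server.name)
    return plan


# Hypothetical inventory: core services first, then business applications.
groups = [
    RecoveryGroup(
        "business-apps",
        priority=2,
        servers=[Server("app-01", order=2), Server("db-01", order=1)],
    ),
    RecoveryGroup("core-services", priority=1, servers=[Server("dc-01", order=1)]),
]

print(recovery_plan(groups))  # ['dc-01', 'db-01', 'app-01']
```

A real orchestrator would also wait for each system to pass a health check before starting the next one; this sketch covers only the ordering.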
It’s All About the Cloud
The bottom line is that the ability to scale immediately and without practical limits is why the cloud makes so much sense for DR. It offers the necessary scalability and performance without the huge costs we’ve all traditionally associated with DR. Companies testing or declaring a disaster need resources on demand and only want to pay for them when they are used, and that’s exactly what the cloud was made for.
It’s a marriage made in heaven – or at least in the clouds.