If you’re lucky, you’ve only had to worry about managing a data disaster recovery effort once or twice in your career, if at all. However, as natural disasters have become more frequent, the chances of having to navigate a worst-case scenario have risen.
As of April 11, 2023, the U.S. had already recorded its highest number of tornadoes for the first three months in a year. Meanwhile, according to the National Oceanic and Atmospheric Administration, the 2022 hurricane season produced some of the strongest and most damaging storms in recent history.
Both tornadoes and hurricanes can be deadly and destructive, but their impacts on technical infrastructure can vary dramatically, so the disaster preparedness and recovery efforts required for each are quite different. Organizations can often prepare for a hurricane a week or more in advance, valuable time that allows for mission-critical business continuity preparations. Tornadoes often strike without warning and can leave you scrambling to pick up the pieces of your data unless you have a regional disaster recovery strategy that allows you to resume business activities quickly.
Having a regional disaster recovery plan in place has never been more important, given the increasing frequency and severity of weather- and climate-related events like these. Most organizations would benefit greatly from some form of asynchronous data replication that stores data safely at a remote, unaffected location and keeps that data accessible. They should also be able to restore services rapidly without any detrimental impact on their applications or business needs.
Kubernetes is designed with a fault-tolerant architecture in mind, which helps keep deployed applications highly available. Partner-developed tools can usually integrate seamlessly into Kubernetes deployments and enable additional functions such as persistent data management, application state awareness, and remote cluster connectivity for backup and recovery actions. Each of these feature sets is necessary when developing an appropriate disaster recovery solution.
Reducing RPO and RTO
Disaster recovery is often measured in terms of Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO is the maximum amount of data, measured in time, that you can afford to lose; the goal is to keep backed-up data as current as possible so that the potential for data loss during an event is kept to a minimum. RTO is the maximum time that services can be unavailable before critical business systems become affected.
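To make these two measures concrete, here is a minimal Python sketch that checks a replication schedule against RPO and RTO targets. Everything in it is an illustrative assumption: the function names, the 15-minute interval, and the target values are hypothetical and do not come from any particular product. It also assumes that asynchronous replication runs on a fixed interval, so the worst-case data loss is roughly one interval plus the time a transfer takes.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Upper bound on data lost if the primary site fails just before a
    replication cycle finishes (assumes fixed-interval async replication)."""
    return interval + transfer_time


def meets_objectives(interval, transfer_time, estimated_recovery, rpo, rto):
    """Return (rpo_ok, rto_ok) for a hypothetical schedule and recovery estimate."""
    rpo_ok = worst_case_data_loss(interval, transfer_time) <= rpo
    rto_ok = estimated_recovery <= rto
    return rpo_ok, rto_ok


# Hypothetical numbers, for illustration only.
result = meets_objectives(
    interval=timedelta(minutes=15),
    transfer_time=timedelta(minutes=2),
    estimated_recovery=timedelta(minutes=10),
    rpo=timedelta(minutes=15),
    rto=timedelta(minutes=30),
)
print(result)  # (False, True): 17 minutes of potential loss exceeds a 15-minute RPO
```

In this example the recovery estimate fits within the RTO, but the schedule misses the RPO, which suggests replicating more often or tolerating a looser target.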
In the case of a forecasted weather event, such as a hurricane or blizzard, these factors don’t matter as much, as data and services can be preemptively failed over to the remote site to ensure that services will not be affected. In the case of an unexpected weather event that disables services at your primary data center, you want to be able to restore those services as soon as possible, with the loss of as little data as possible.
This is why it’s an excellent idea to have a regional disaster recovery site that’s close enough to your primary data center for rapid asynchronous data transfer, but far enough away that the disaster itself does not impact it. In a well-designed disaster recovery solution, you should be able to resume normal operations with your most recent data files in a matter of minutes. Ideally, it would seem as if there were no interruptions to services at all.
While proximity to ancillary data centers certainly helps, your efforts must focus on more than just transferring files, restarting applications, and reloading data. You must be able to replicate configuration files, objects, custom configurations, and application namespaces across geographically dispersed sites: essentially everything your applications need to function correctly.
Portability, Resiliency, and Automation
As an open-source container orchestration platform, Kubernetes is by nature built for portability and mobility. Deployments are not tied to a specific location, and applications (and all of their corresponding data) can be made easily portable and then replicated between sites.
As previously stated, Kubernetes is also remarkably resilient. If a container crashes or stops responding to its health checks, the platform restarts it, and if a node fails, Kubernetes creates replacement pods for the application deployment on other nodes in the cluster to maintain the desired number of replicas.
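This self-healing behavior can be pictured as a reconciliation loop that compares the desired state with what is actually running. The Python sketch below is a toy model of that idea only. The `reconcile` function, the pod names, and the node names are all hypothetical, and this is not how the real Kubernetes controllers or scheduler are implemented.

```python
def reconcile(desired_replicas, pods, healthy_nodes):
    """Toy reconciliation step: keep pods on healthy nodes, then add replacements.

    `pods` maps pod name -> node name. Scale-down and real scheduling
    constraints are deliberately left out of this sketch.
    """
    if not healthy_nodes:
        return {}
    # Drop pods whose node is no longer healthy.
    surviving = {name: node for name, node in pods.items() if node in healthy_nodes}
    nodes = sorted(healthy_nodes)
    counter = 0
    # Create replacements until the desired count is met, spreading across nodes.
    while len(surviving) < desired_replicas:
        surviving[f"replacement-{counter}"] = nodes[counter % len(nodes)]
        counter += 1
    return surviving


pods = {"web-a": "node-1", "web-b": "node-2", "web-c": "node-3"}
# node-2 is lost in the (hypothetical) outage.
after = reconcile(3, pods, {"node-1", "node-3"})
print(after)
```

After the simulated node loss, the application is back to three pods, all placed on the two remaining nodes.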
After you’ve assessed the damage and remediated the initial impact of the disaster, the accessibility provided by the Kubernetes API makes it easier to automate a return to functionality. Using the API functions that are available in Kubernetes natively, system administrators can easily redeploy applications or transfer data between clusters as needed.
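As one illustration of what such automation might involve, an object exported from the primary cluster usually carries server-populated fields that should not be copied to another cluster. The following Python sketch strips a commonly removed set of those fields from an exported manifest. The helper name `portable_manifest` and the sample Deployment are hypothetical, and the exact list of fields to remove is an assumption that can vary by resource type; dedicated backup tools handle many more cases.

```python
import copy
import json

# Server-populated metadata that describes one cluster's copy of an object.
# This list is an assumption for illustration, not an exhaustive rule.
CLUSTER_SPECIFIC_METADATA = (
    "uid", "resourceVersion", "creationTimestamp",
    "generation", "managedFields", "selfLink",
)


def portable_manifest(obj: dict) -> dict:
    """Return a copy of an exported object with cluster-specific fields removed."""
    clean = copy.deepcopy(obj)   # leave the caller's object untouched
    clean.pop("status", None)    # status is reported by the cluster, not declared
    metadata = clean.get("metadata", {})
    for field in CLUSTER_SPECIFIC_METADATA:
        metadata.pop(field, None)
    return clean


# Hypothetical exported Deployment, trimmed for brevity.
exported = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "web",
        "namespace": "shop",
        "uid": "example-uid",
        "resourceVersion": "12345",
        "creationTimestamp": "2023-04-11T00:00:00Z",
    },
    "spec": {"replicas": 3},
    "status": {"readyReplicas": 3},
}

print(json.dumps(portable_manifest(exported), indent=2))
```

The cleaned manifest could then be saved to a file and applied to the recovery cluster, for example with `kubectl apply -f`, or submitted through a Kubernetes client library.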
Back in Business
As the severity of weather events grows, so does the chance that your business will experience some form of outage. As the old saying goes, it’s not a matter of if, but when.
Be prepared when it happens. Build a disaster recovery plan for the applications that your organization has deployed on Kubernetes and get your organization back in business quickly.