By Stanley Zaffos.
As our abilities to create digital twins or models of our business and manufacturing processes, the things we build, weather patterns, games, traffic flows, voting patterns, DNA, and chemical and nuclear reactions have improved, so have our ambitions. We are now on the path toward creating a digital twin or model of the world we live in. Since the world we live in is alive and continuously evolving, it is not stateless. A digital twin cares about the history (a.k.a. past states) of its real-world counterpart, which means that the amount of data we store will continue to grow at exponential rates forever.
For those who question why we are building digital twins of our world, the short answer is that doing so enables us to experiment with things without ever building them. It also allows us to perform “what if” analyses that lead to better decisions. The most obvious example of leveraging historical data is the cybersecurity industry, which uses historical access patterns and attack data to predict future intrusions. Other examples include designing automobiles or airplanes and refining them in simulation, where the benefits relative to cost are obvious; laying out new factory floors to optimize production flow and flexibility; optimizing traffic flows in new ship designs before they are built; managing the effects of droughts and floods; and building “smart cities.” These round out a short list of high-visibility, high-impact digital twins that have proven their ability to improve agility while saving time and money.
While these storage-hungry applications are great business for storage vendors, they create user conversion and migration problems that only get bigger over time because users don’t want to recreate their world every time they do a storage infrastructure refresh. Hence the need for digital twin data to reside in a living, non-disruptive, self-renewing storage infrastructure that makes data migrations a property of the infrastructure and the data residing in it effectively immortal.
So, what new requirements does immortal data place upon storage infrastructure refreshes? None really, aside from scale. But that is a big deal: as the cost of storing, managing, and coaxing value out of data continues to drop, organizations are choosing to store more data rather than cut their storage spend rates, in the belief that at some future time this “unneeded” data will give them a competitive advantage. Since the time and resources needed to migrate data from older production arrays onto newer arrays are roughly proportional to the amount of data being migrated, it is only a matter of time before data migrations become an ongoing process rather than a series of periodic projects. Add the need for 24x7 availability and cost effectiveness, and the need to rethink infrastructure design and operating visions becomes self-evident.
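A back-of-the-envelope sketch illustrates why migrations stop being periodic projects at petabyte scale. The capacities and the 1 GB/s sustained copy rate below are illustrative assumptions, not measurements from any particular array.

```python
# Hypothetical sketch: how migration time scales with capacity.
# Assumes a sustained, non-disruptive copy rate of 1 GB/s per array pair;
# real-world throughput varies widely with workload and media.

SECONDS_PER_DAY = 86_400
throughput_gb_per_s = 1.0  # assumed sustained copy rate

for capacity_pb in (0.1, 1, 5, 10):
    capacity_gb = capacity_pb * 1_000_000  # 1 PB = 1,000,000 GB (decimal)
    days = capacity_gb / throughput_gb_per_s / SECONDS_PER_DAY
    print(f"{capacity_pb:>5} PB -> ~{days:,.0f} days of continuous copying")

# Rounded output: 0.1 PB ~ 1 day, 1 PB ~ 12 days, 5 PB ~ 58 days, 10 PB ~ 116 days.
# At multi-PB scale one refresh cycle starts overlapping the next, so data
# migration effectively becomes a permanent background process.
```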
Unsurprisingly, building, managing, and maintaining petabyte-scale infrastructures using arrays designed for a terabyte world has many complications. Established storage vendors, with a near-absolute need to maintain backwards compatibility, have chosen to use data analytics and AI/ML to manage these complications; that is, they hide the architectural ugliness of their solutions rather than start with a clean-sheet design that would force them to recompete for their installed bases. Media-based vendors constrained by high media costs are relying on storage virtualization, archiving, and integration with the cloud to provide affordable petabyte scale, which adds its own complexity and complications. Darwinism and the inherent advantages of simplicity make inevitable the market’s turn to storage vendors focused on PB-scale, self-managing, affordable storage arrays that can be woven into an infrastructure that provides the illusion of data immortality.
Immortal Data Storage Requirements
Providing the illusion of data immortality requires a storage infrastructure that is always on, elastic and infinitely scalable, self-managing, self-optimizing, and priced like a utility. It is storage that is integrated with cloud-based analytics and AI to improve agility, staff productivity, and usable availability by reducing human- and software-caused outages, the root cause of 80 percent of all outages. It is storage that makes data migrations a property of the infrastructure instead of a time-consuming, resource-intensive, error-prone project, and it is storage that is “non-disruptively” self-renewing, meaning that hosted applications are unaware of infrastructure refreshes before, during, and after they occur. The operational benefits of an immortal storage infrastructure include:
- 100% availability
- Infrastructure refreshes that are non-disruptive and transparent to hosted applications
- Protection from architecture obsolescence
- The ability to always meet service level objectives
- Workload-independent and repeatable $/TB/month costs
The technologies needed to create immortal storage are not futurists’ fantasies. They are either already in hand or within our grasp and are listed below:
- Storage arrays that are self-managing; have no single points of failure (SPOFs); support non-disruptive everything (repairs, updates, and expansions); and do not need tuning to meet service level objectives until upgrades are needed
- Storage arrays that can replicate non-disruptively across array generations
- Server software that maintains the links between applications and their data even as the data moves between storage solutions (see the sketch after this list)
- Opex pricing models that enable users and vendors to optimize cost structures even as they deliver workload-independent and repeatable $/TB/month costs
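To make the third item above concrete, the sketch below illustrates the general idea of a level of indirection between an application’s stable data handle and the physical array currently serving it. The class and method names are illustrative inventions, not any vendor’s API.

```python
# Illustrative sketch (not a real product API): a stable logical handle
# maps to whichever backend array currently holds the data, so the
# application never notices the migration.

class LogicalVolume:
    def __init__(self, name: str, backend: str):
        self.name = name          # stable handle the application uses
        self.backend = backend    # physical array currently serving I/O

    def read(self, block: int) -> bytes:
        # In a real system this would issue I/O to self.backend.
        return f"data from {self.backend}, block {block}".encode()

    def migrate(self, new_backend: str) -> None:
        # Data is copied in the background; once the copy converges, the
        # mapping is switched. The application keeps using the same
        # logical name throughout.
        self.backend = new_backend


vol = LogicalVolume("erp_db", backend="array_gen3")
print(vol.read(0))            # served by the old array
vol.migrate("array_gen4")     # infrastructure refresh, invisible to the app
print(vol.read(0))            # same handle, now served by the new array
```

The design point is simply the extra level of indirection: as long as applications bind to the logical name rather than a physical device, array generations can come and go underneath them.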
Immortal Storage vs the Public Cloud
The benefits of immortal storage, coupled with effective post-sales support, expansive ecosystem support, and consumption-based pricing models that encourage consumption, together provide users with a cloud-like experience. Table 1 compares the benefits of public cloud IaaS (Infrastructure as a Service) to on-premises immortal storage.
Table 1: Public cloud IaaS vs. on-premises immortal storage
Notes
- *Cloud block storage is much more expensive than object storage, and managing cloud IaaS consumption is expensive
- **Cloud storage lacks the auto-tiering capabilities of modern hybrid (a.k.a. integrated mixed media) storage arrays, the foundational building blocks of immortal storage
- ***Immortal storage should be installed in a colocation (colo) facility to provide low-cost offsite archives
- ****When capacity growth plans exceed the reserve capacity of the storage infrastructure, those plans need to be shared with operations
Workload-independent and repeatable $/TB/month costs can be achieved in only two ways: bill or charge to the worst-case situation, which is inherently uncompetitive, or build intelligence into the infrastructure that automatically migrates data to the lowest-cost media that can meet service level objectives. Since most users agree that at least two-thirds of their data is stale, and the $/TB price delta between flash and HDD storage will stay in the 5:1 to 10:1 range through 2030, hybrid storage is not optional. It is the prerequisite to building affordable immortal storage infrastructures.
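A simple blended-cost calculation shows why. Only the two-thirds-stale figure and the 5:1 flash-to-HDD ratio come from the text above; the absolute $/TB values are illustrative assumptions, not quotes.

```python
# Blended $/TB for a hybrid array vs. all-flash, assuming two-thirds of the
# data is stale enough to live on HDD and flash costs 5x HDD per TB.
# The absolute dollar figures are illustrative assumptions.

hdd_cost_per_tb = 20.0                     # assumed
flash_cost_per_tb = 5 * hdd_cost_per_tb    # 5:1 ratio cited in the text

stale_fraction = 2 / 3                     # data that tolerates HDD latency
hot_fraction = 1 - stale_fraction

all_flash = flash_cost_per_tb
hybrid = hot_fraction * flash_cost_per_tb + stale_fraction * hdd_cost_per_tb

print(f"All-flash : ${all_flash:.2f}/TB")
print(f"Hybrid    : ${hybrid:.2f}/TB  "
      f"({(1 - hybrid / all_flash) * 100:.0f}% cheaper)")

# Hybrid comes out to ~$46.67/TB vs. $100/TB, roughly a 53% saving at a
# 5:1 price ratio; at 10:1 the saving is larger still.
```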
Ironically, as the operational and financial differences between on-premises and cloud infrastructures shrink and cloud providers expand into the data center, the decision to move applications to the cloud or to repatriate workloads from the cloud becomes more difficult and will rest on other considerations. These include having early access to AI and IoT platforms unavailable on-premises; the cost and disruptions of data center refreshes; managing the security challenges created by putting sensitive data on shared public infrastructures; and circumventing the problems caused by the cloud’s siloed approach to storage as a service (STaaS).
Budgeting for Software Refreshes
Without software that can translate data into information that is understandable to humans and to analytics and/or AI/ML/DL applications, immortal storage is nothing but an expensive room heater. Keeping that translation software current is much more problematic than refreshing the storage hardware because middleware and application software data models are not set by standards committees; new software releases may or may not create incompatibilities; and the timing of major new releases is unpredictable. So, while budgeting for software refreshes and conversions is simple enough in concept, it is difficult in practice and further complicated by the following:
- Storage infrastructure and software refreshes each occur on their own timelines.
- The time and resources needed to extract, transform, and load data into a format that is compatible with an organization’s new software is difficult to forecast and often dependent upon outside consultants.
- Many organizations choose to run old and new systems in parallel for years to avoid both the risk of breaking mission-critical applications and the expense of conversion. However, doing so increases software licensing fees and storage needs.
Conclusions
Large organizations with multi-PB storage requirements will, by necessity, move beyond technology- and economics-based infrastructure refresh decisions to embrace the idea of a living storage infrastructure that provides the illusion of immortal data. Such an infrastructure mirrors the operational capabilities of cloud infrastructures and supports the transparent migration of applications and their data between on-premises and cloud infrastructures.
Building living storage infrastructures will cost no more than building ad hoc infrastructures on a $/PB basis, especially if a user’s incumbent storage vendors are pursuing product developments focused on eliminating the short interruptions normally associated with migrating data from arrays scheduled for decommissioning onto new arrays.