There are clear indications that the shift to contemporary cloud application strategies, and in particular modern container workload models, has accelerated further during the current pandemic. The move is driven by the cost benefits of re-platforming core IT to respond faster to changing market conditions and to counter competitive threats from new market entrants. At the same time, data security risks continue to surface that are significant blockers to realizing the benefits expected from the digital transformation journey. A recent survey put a spotlight on the risks this journey carries, noting that 79 percent of organizations suffered a significant cloud breach of some kind in the last 18 months.
The Move to Digital Transformation
While digital transformation is a catch-all marketing phrase that is often thrown around, and cloud projects are not new, it is only in the last two to three years that the enabling technology and open, de facto cloud platform standards have emerged to make the move much simpler.
We have seen container orchestration come to be dominated by Kubernetes, and serverless functions increasingly used for everyday data processing. AI and machine learning services for automation and insight have flourished as technology has advanced and compute and storage suited to these data-hungry applications have become readily available. These new foundations deliver on the CTO's vision of intelligent, agile applications that truly run in private and public clouds easily, uniformly, and automatically, without change, for a few dollars of compute spend. What used to require sophisticated mission-critical hardware for resilience is now overshadowed by today's self-healing, auto-scaling, continuously scalable IT stack.
However, compliance and data leakage risks can temper that enthusiastic adoption when workloads contain sensitive data, as they inevitably do. CISO concerns vary but span exploitable vulnerabilities, human error stemming from sheer complexity, high rates of change and short software lifecycles, insider threats, and continued external attack through exploits, social engineering, and malware injection. Nor is it always a central purchase that creates the risk: business units under pressure build what they need, when they need it, on whatever best suits competitive market objectives.
Self-Healing and Failsafe Don’t Mean Secure
All security practitioners expect vulnerabilities in all software, and modern container ecosystems take away the pain of traditional patching with more streamlined service and container image updates. When the issue lies in the foundation stack itself, however, it introduces a more worrying risk and a concerning gap. Short-term fixes and reconfiguration may not be easy to deploy, or may only reduce severity rather than fully mitigate the problem, which is hardly an ideal outcome in the eyes of a regulatory audit or against an acceptable internal risk posture.
For instance, a recently reported bug in Kubernetes container networking in some distributions highlighted exploit risk, exposing affected networking components to a man-in-the-middle attack. The recommended workarounds do not prevent all attack vectors, leaving a window of risk to accept at the most fundamental level of the infrastructure, infrastructure that may well be running the business's top revenue-generating app and is unlikely to be taken offline. Interested readers can review the details in the Kubernetes project under issue 91507. There is no doubt the bug will be addressed in time, but it illustrates that self-healing and failsafe don't necessarily mean secure, and that the new world still has old risk problems.
Cloud Complexity
In conversations with Chief Data Officers and CISOs, another big issue with cloud is complexity, which is now well recognized, especially in the container world, where entirely new tools have emerged to try to manage the risk. We have seen recent breaches caused by configuration issues; I have lost count of the reports of millions of records exposed through a misconfiguration of a relatively simple cloud service like Amazon S3, a breach pattern highlighted in the 2020 Verizon Data Breach Investigations Report. Now consider the far more complex service mesh architectures built on Envoy and Istio, which are hugely powerful but also hugely complex. The same goes for the distributed storage where sensitive data may be processed in these ecosystems. The risk climbs sharply, particularly when hybrid approaches are used and data bursts from private to public clouds with different security control frameworks and subtle configuration variances.
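Guardrails for the simpler misconfigurations can at least be automated. Below is a minimal sketch, assuming the AWS boto3 SDK and a hypothetical bucket name, of how a team might programmatically enforce the S3 Block Public Access settings that would have prevented many of the exposures described above.

```python
# Minimal sketch: enforce S3 "Block Public Access" on a bucket with boto3.
# The bucket name is a hypothetical example.
import boto3

s3 = boto3.client("s3")

def enforce_public_access_block(bucket: str) -> None:
    """Apply all four public-access blocks to a single bucket."""
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

if __name__ == "__main__":
    enforce_public_access_block("example-ingest-bucket")  # hypothetical bucket
```

Applying policy as code rather than by hand is the same idea the newer cloud posture management tools generalize across services.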
Then there is the question of the native controls. Technology stacks like Kubernetes have amazing workload orchestration capability, but their security controls revolve around more traditional strategies: granular network controls and firewalls, transport encryption (which routinely relies on risky self-signed certificates, another point of risk and a topic unto itself), and some basic data-at-rest protection, all depending, again, on configuration.
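To make that configuration dependence concrete, here is a minimal sketch, using the official Kubernetes Python client, of applying a default-deny NetworkPolicy, one example of the granular network controls mentioned above. The namespace name is a hypothetical example, and whether the policy is enforced at all depends on the CNI plugin in use.

```python
# Minimal sketch: apply a default-deny NetworkPolicy with the official
# Kubernetes Python client. The "payments" namespace is hypothetical.
from kubernetes import client, config

def apply_default_deny(namespace: str) -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    policy = client.V1NetworkPolicy(
        api_version="networking.k8s.io/v1",
        kind="NetworkPolicy",
        metadata=client.V1ObjectMeta(name="default-deny-all"),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),  # empty selector = all pods
            policy_types=["Ingress", "Egress"],     # deny both directions by default
        ),
    )
    client.NetworkingV1Api().create_namespaced_network_policy(
        namespace=namespace, body=policy
    )

if __name__ == "__main__":
    apply_default_deny("payments")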
The Role of Data Tokenization
What is missing from all of this is a unifying, risk-neutralizing technology and architecture approach that has been proven in complex data processing ecosystems: data tokenization. Tokenization replaces sensitive data elements with substitute values that behave the same way in the application or network but have no intrinsic value if lost or compromised. Architecturally, the approach is to tokenize as soon as data is collected, which could be on ingest into a cluster, and to detokenize only under tight control by a limited, allowable set of processes. This inverts the security model from "protect everything" to "protect what is most valuable by neutralizing it."
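As a minimal sketch of that flow, the Python below shows the tokenize-on-ingest pattern: sensitive values are replaced the moment a record enters the cluster, and only an allow-listed caller may detokenize. The field names, the in-memory token store, and the caller name are all hypothetical stand-ins for a real tokenization service.

```python
# Illustrative sketch of tokenize-on-ingest. The in-memory "vault" and the
# allow-list are placeholders for a real tokenization service; the point is
# the flow: replace sensitive values at the edge, detokenize only for a
# limited set of authorized processes.
import secrets

_VAULT = {}                                   # stand-in for a real token service
_ALLOWED_DETOKENIZERS = {"settlement-batch"}  # hypothetical privileged process

def tokenize(value: str) -> str:
    token = "tok_" + secrets.token_hex(8)     # random surrogate, no intrinsic value
    _VAULT[token] = value
    return token

def detokenize(token: str, caller: str) -> str:
    if caller not in _ALLOWED_DETOKENIZERS:
        raise PermissionError(f"{caller} is not authorized to detokenize")
    return _VAULT[token]

# Tokenize as soon as the record is ingested into the cluster...
record = {"customer": "Alice", "card_number": tokenize("4111111111111111")}

# ...downstream analytics and ML pipelines only ever see the token, while a
# tightly controlled process can recover the original value when required.
original = detokenize(record["card_number"], caller="settlement-batch")
```

The lookup table above is only there to keep the example self-contained; as discussed below, vault-style token stores do not scale and become a honeypot in their own right, which is where stateless tokenization comes in.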
Considering the risks above, even with configuration issues, exploits, and attacks, once data is sufficiently tokenized, systems can operate at a lower level of trust without painful trade-offs between keeping the apps running and opening a window of attack that puts data, and potentially privacy, at risk.
Tokenization has been an evolution unto itself. When it first arrived a decade ago, tokens were created via databases and indexes. That is fine for a small, low-throughput scenario like straightforward payment card data, but today's AI and machine learning pipelines, ingest feeds, and IoT-scale capture put an instant strain on such approaches and turn the token database into a juicy honeypot holding all the sensitive data, which is not a realistic scenario.
Fortunately, modern stateless tokenization based on standards like ANSI X9.119-2 can deliver powerful, container- and service-compatible tokenization that can be built right in. The upshot is that higher-risk, high-benefit technologies like container and Kubernetes cloud architectures can be adopted and embraced with full agility, without fear of attacks on data or accidental loss. Tokenization is, in effect, the C-suite's best friend. By putting it in the developer's toolset, powerful data security can be built in and a secure digital culture fostered from the beginning. Of course, good practice dictates that defense in depth should always be used, but tokenization, when it can scale and snap into modern ecosystems, is an essential component of modern digital transformation.
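For illustration only, the sketch below shows the stateless idea in its simplest form: the surrogate digits are derived from a secret key and the input value, so no token database is needed and the same input always yields the same token. This is a deliberately simplified, one-way derivation, not an implementation of ANSI X9.119-2 or of format-preserving encryption, which is what production stateless tokenization systems are built on; the key handling shown is also purely hypothetical.

```python
# Simplified sketch of "stateless" tokenization: tokens are derived from a
# secret key plus the input value, so no token database exists to breach.
# This keyed, length-preserving derivation is NOT a standards-compliant
# scheme such as ANSI X9.119-2; it only illustrates the concept.
import hashlib
import hmac

SECRET_KEY = b"demo-key-change-me"  # hypothetical; in practice held in a KMS/HSM

def tokenize_digits(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` digits with key-derived digits,
    preserving length and format so downstream systems keep working."""
    head, tail = value[:-keep_last], value[-keep_last:]
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).digest()
    surrogate = "".join(str(b % 10) for b in digest)[: len(head)]
    return surrogate + tail

# Deterministic: the same card number always maps to the same token, which
# preserves joins and analytics without exposing the real value.
print(tokenize_digits("4111111111111111"))  # 12 key-derived digits + real last 4
```

Because the output preserves length and format and keeps the last four digits, downstream analytics, joins, and validation logic can continue to operate on tokens rather than on live data.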