By Kendall Clark.
What if data really is the fuel that powers engines of insight? Imagine a world exactly like ours in every way except for one key difference. Instead of the modern fossil fuel supply chain (i.e., Big Oil) being created before (or roughly coincident with) the mass production of automobiles (i.e., Ford, Volkswagen, etc.), the technology underlying actual race cars was perfected first. So, in this world, you get the Ford vs. Ferrari battles before there’s a gas station on every corner. In this world, you still have the thrilling 24-hour race at Le Mans and so on, but the fuel that powered those beautiful machines would be an artisanal, bespoke affair, roughly like the production of craft beer in Brooklyn or burritos in the Mission District.
That world might be unusual, but it’s perfectly conceivable. Here’s the point of my thought experiment: It’s a perfectly apt analogy for the actual world – our world – with respect to IT, data, and analytics. In this thought experiment by analogy, the “beautiful machines” (i.e., the race cars) are our actual ML- and AI-powered analytics systems and the artisanal supply chain for gasoline is data integration. Data integration in our world is, pardon the pun, relatively crude. We have cutting-edge engines of insight, but they are still powered by the same kludgy data integration systems we first started using in the 1970s. To adopt a shorthand, they are still powered by storage-level physical consolidation of data; they’re still powered by data location.
Surely there’s a better way? Of course there is. Since data really is the fuel that powers our beautiful insight machines, we better start acting like it; that is, we better move data integration from the storage layer to the computation layer, and we better leverage data meaning rather than just data location. My thought experiment describes our status quo and it’s not great. It’s workable but far from ideal. But the very near-term future looks much more troubling.
Four Reasons to Be Concerned
First, we appear to be nearing the end of a long cycle of data centralization and consolidation. The stunning success of cloud-based rejuvenations of old strategies like traditional data warehouses and ELT suggests the capstone of an era rather than the beginning of a new one. If you look very closely you can see the turn starting to occur. I’m aware of at least six important startups in the data analytics and integration space that are pitching a near-term vision where data is integrated in a new way; all of them share the general view that, as I put it above, data integration can no longer live exclusively in the storage layer and needs to move to the computation layer too.
Second, there is growing discontent at the business implications of this era of consolidation. Those implications include asymmetric, unrealistic cloud costs: for example, ingress is free while egress fees are growing consistently. They also include worry about vendor lock-in and growth in cloud costs generally. In some specific markets, they include worries about competitive fairness. Enterprise customers are beginning to push back broadly.
Third, data repatriation is on the rise, and that’s a sign that cloud-based Data Management will need to take account of this counter-cyclical trend. It’s tempting to dismiss repatriation by just looking at its volume and comparing it to the volume of data assets going the other way, that is, going into the cloud. I think that’s a mistake, since where any particular data set lives matters less than a much more crucial question: How many distinct data environments does the enterprise have to take account of in order to fuel analytics with data to generate insight?
Fourth, and perhaps most compelling of all, that question we just asked is actually quite generalizable. That is, the number of cloud or data environments is increasing, not decreasing. I’ve lost track of the number of companies that are claiming to be “cloud vendor #4” and soon the action will be for #5 and so on it goes. An enterprise cannot simultaneously claim to take data seriously as a key strategic asset while also settling on a data integration strategy that is unable to consider data that happens to live in the “wrong place.” The hybrid, multicloud era is well and truly upon us. The main implication of this era is that “the wrong place” doesn’t make much sense anymore. The center cannot hold; or, when it comes to data, everything is the edge now.
What Should a Modern Data Supply Chain Look Like?
I mentioned earlier that there is an alternative; in fact, there are viable alternatives in the marketplace now, contending for what the near-term future will look like. I’m alluding here to contemporary developments like data fabric, data mesh, and so on. Rather than pitching a particular solution, I want to conclude by discussing the three crucial components of a viable solution:
- A future-proof data integration capacity must take account of, in a really deep way, the hybrid multicloud era. That is, data is everywhere and all that data matters more or less equally. A data integration solution that’s optimal for one rather than for many data environments is not compelling.
- Data volume is growing and will never stop. According to IDC, the planet will create 59 zettabytes of data this year; 90% of that will be replicas and copies. But network performance, which has never been subject to Moore’s Law, is flat and, relative to data volume growth, declining. Moving all the data around over networks that keep getting slower relative to data volume is a mug’s game. All of this means that moving data to computation, which is what we overwhelmingly do today, cannot scale. Rather, we need to start moving computation to data. That’s not necessarily simple, but it will scale.
- What is ultimately most strategic is less data location and much more data meaning. While there are a few exceptions, what matters most is what some piece of data means in the context of some business and the rest of the relevant data. What matters very rarely is what storage system contains that bit of data. Data meaning is relatively stable, at least compared to data location. Data integration systems that leverage what data means will free us up to decouple meaning from location, and that would be revolutionary.
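The second point above, moving computation to data rather than data to computation, is at bottom an arithmetic argument, and a toy example can make it concrete. The Python sketch below is an illustration under stated assumptions, not a description of any particular data fabric product: the `Site` class, both functions, the record sizes, and the sample sales figures are all hypothetical. It computes the same per-region totals two ways and counts the bytes that would cross the network in each case.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ROW_BYTES = 64      # assumed wire size of one raw record (hypothetical)
PARTIAL_BYTES = 32  # assumed wire size of one (region, subtotal) pair (hypothetical)


@dataclass
class Site:
    """A hypothetical data environment, e.g. one cloud or one on-prem store."""
    name: str
    rows: List[Tuple[str, float]]  # (region, sale_amount); made-up records


def move_data_to_compute(sites: List[Site]) -> Tuple[Dict[str, float], int]:
    """Ship every raw row to a central location, then aggregate there."""
    totals: Dict[str, float] = {}
    bytes_moved = 0
    for site in sites:
        bytes_moved += len(site.rows) * ROW_BYTES  # all raw rows cross the network
        for region, amount in site.rows:
            totals[region] = totals.get(region, 0.0) + amount
    return totals, bytes_moved


def move_compute_to_data(sites: List[Site]) -> Tuple[Dict[str, float], int]:
    """Aggregate inside each site and ship only the per-region subtotals."""
    totals: Dict[str, float] = {}
    bytes_moved = 0
    for site in sites:
        local: Dict[str, float] = {}
        for region, amount in site.rows:  # this loop runs where the data lives
            local[region] = local.get(region, 0.0) + amount
        bytes_moved += len(local) * PARTIAL_BYTES  # only subtotals cross the network
        for region, subtotal in local.items():
            totals[region] = totals.get(region, 0.0) + subtotal
    return totals, bytes_moved


def sample_sites() -> List[Site]:
    """Two made-up environments holding 1,000 rows each."""
    return [
        Site("cloud_a", [("east", 10.0), ("west", 5.0)] * 500),
        Site("on_prem", [("east", 2.0)] * 1000),
    ]


if __name__ == "__main__":
    central_totals, central_bytes = move_data_to_compute(sample_sites())
    local_totals, local_bytes = move_compute_to_data(sample_sites())
    print("same answer:", central_totals == local_totals)
    print("bytes moved, data to compute:", central_bytes)
    print("bytes moved, compute to data:", local_bytes)
```

With these made-up numbers, both strategies return identical totals, but the centralized version ships every raw record while the second ships only one small subtotal per region per site. The gap widens as raw data grows, because the size of the partial results depends on the number of regions, not on the number of rows. Real queries are rarely this easy to decompose, which is one reason moving computation to data is "not necessarily simple."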
Data fabric approaches are the future of enterprise data integration because they are aligned with the salient features of the data landscape today and into the future. Data fabrics that connect data without consolidating it in some storage layer, and do so based on what the data means in context, are able to act like the world in which data as fuel is cheap, ubiquitous, and plentiful. The only way to win the future is to generate better, faster insight than the competition, and the only way to do that is to start integrating data like it’s the most strategic enterprise asset … because it is!