Big Data at 10: Did Bigger Mean Better?

By Mathias Golombek

If you were working in data and analytics this time 10 years ago, something was about to happen that would go on to dominate a large part of your professional life. I’m talking about the emergence of “big data.”

I recently read that the origins of the concept can be traced to 2011. Apart from reminding me exactly how long I’d been in the industry, this gave me reason to reflect on what, ultimately, all the fuss was actually about.

Big Data or Big Diversion?

I was (and remain) skeptical about how big data is usually described. I felt the best approach was to think of big data as a by-product arising from the digitization of entire businesses, not a defining feature or driver of that transformation. But many businesses were instead encouraged by analysts and consultants to believe they could achieve transformation, directly or indirectly, by taming the technical challenges associated with their data’s size.

I wrote about this in 2015, if you want to dive into this gripe.

The resulting obsession with the technical challenge meant that for years big data was directly associated with Hadoop, Spark, and data lakes brimming with unstructured data. These technologies were going to conquer the world, but in the end, most projects failed or didn’t deliver the expected value. Today, it’s generally accepted that structured, relational databases provide a more suitable core for data analytics and data applications.

This thinking around big data led to two problems. First, as it became more affordable to store huge piles of data, companies created vast data lakes and reservoirs that they struggled to extract much value from. As a consequence, these data lakes became passive storage resources rather than unified data layers and sources of new insights.

Second, the focus on technology distracted organizations from the real challenges of establishing a data-driven culture. Large teams were hired to handle complex big data technologies, but that turned these groups into development teams focused on technology-level implementations rather than on transforming the company. It took longer than it should have to establish that the key to a data-driven culture lies in the simplicity of technology rather than its complexity: in setting the right data strategy, securing leadership’s commitment, educating users, and defining standards, processes, and rules of engagement.

OK, Fine – It Wasn’t All Bad

We are still very far from a state where companies are routinely data-driven or data-transformed. However, while it has failed (for now) to fulfill its promise, big data can take credit for a lot that is great today in the world of enterprise analytics.

Organizations have created entire strategies and teams specifically to manage data as a corporate asset – often under a dedicated executive leader. None of that existed before the trend for big data. There have also been fundamental improvements to the ways large data sets are turned into insights, used to automate decisions, or used to drive optimizations.

Thanks to the experience gained and the foundations laid during the big data journey, it’s also never been easier to integrate diverse data sources. Improvements in Data Literacy, Data Quality measures, Data Governance, data catalogs, and security – we have the big data movement to thank for all of these developments.

Where Next?

Still, my wish for the next decade would be for data and analytics, and the professionals working in these disciplines, to break free of any associations with big data. Small data sets, medium-sized ones, or simply statistically sound samples of bigger ones are just as capable of creating amazing use cases, informing important decisions, or optimizing processes.
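As a toy illustration of that point (the data and numbers below are entirely invented, not drawn from any real system), a plain random sample will often estimate a business metric to within a fraction of a percent of the answer a full scan would give:

    import random

    # Invented example: one million synthetic "order values"
    random.seed(42)
    orders = [random.gauss(100.0, 20.0) for _ in range(1_000_000)]

    # The full-scan answer
    full_mean = sum(orders) / len(orders)

    # A 1% random sample typically lands very close to it
    sample = random.sample(orders, 10_000)
    sample_mean = sum(sample) / len(sample)

    print(f"Mean over all 1,000,000 records:  {full_mean:.2f}")
    print(f"Mean over a 10,000-record sample: {sample_mean:.2f}")

For many everyday decisions, the difference between those two numbers simply doesn’t matter – which is exactly why sheer data volume was never the point.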

After putting the big data era’s focus on complexity behind us, I’m looking forward to seeing how today’s focus on simplicity plays out. For me, the convergence of Data Science, relational databases, and business intelligence should finally be able to deliver on the promise of data-driven teams and the cultures in which they thrive. (This is important because Data Science must be operationalized and made available to standard data users if it is to avoid becoming a repeat of the Hadoop world and its esoteric sandboxes.)

In a similar vein, it would be truly transformational to transfer data ownership to the business so that central data teams are no longer a bottleneck for projects and applications. Interesting possible paths to this outcome include the breaking down of centralized data architectures in favor of distributed models, such as data mesh designs.

It’s also still too complex today to create data applications, so the industry needs to accelerate the development of tools and techniques that empower organizations to do more with data without specialist skill sets. Hyperautomation trends related to RPA, AI, and machine learning all promise a great deal here, both in how we work with data and in how we use it to digitize a wider range of company processes than we do today. For example, how long before we can create digital twins of entire organizations through which we can constantly optimize processes and operations?

And, finally, if it’s not too much to ask: I hope that this is the last time for at least 10 years that I have to write a blog post about big data.
