
The idea of managing data at tremendous scale is hardly new. Most businesses embraced the concept of “big data” and associated technologies (like data lakes) at least a decade ago.
However, the adoption of modern AI technology has introduced major new challenges to the big data world – so many, in fact, that it feels like we’ve entered a “second wave” of large-scale data management and modernization. The technologies and practices that have been used to manage vast volumes of data over the past 10 to 15 years can no longer keep up with the demands of AI.
As a result, businesses seeking to build the data infrastructure and practices necessary to take full advantage of AI must fundamentally rethink their data management strategies. In effect, they need to modernize their approach to data all over again.
The Challenges of Managing Data at Scale
Thanks to the “first wave” of data modernization and big data technology, the typical business became adept at managing vast quantities of data. For example, many organizations built data lakes in the cloud, where the ultra-low cost of storage meant that they could essentially store all of their data forever.
That is a valuable practice in an era where data has become the “new oil” – the more data that organizations have to work with, the more insights and value they can create.
The problem, though, is that simply building a large-scale data infrastructure isn’t always enough to unlock full value from the data. Often, businesses don’t properly secure, integrate, or clean all of the data that they dump into their data lakes. As a result, the lakes become, at least in part, data swamps – meaning that the information they house is poorly organized and managed.
How AI Exacerbates Data Management Challenges
During the “first wave” of big data – which is to say, between the late 2000s and the late 2010s – these sorts of issues were manageable enough. It certainly wasn’t ideal to have some data that was low in quality or lacked proper access controls, for example, but it wasn’t the end of the world. In general, it didn’t prevent the typical company from deriving value from the data that it did manage effectively through traditional analytics processes.
Modern AI technology, however, has changed this. When businesses want to use big data to power AI solutions – as opposed to the more traditional types of analytics workloads that predominated during the first wave of big data modernization – the problems stemming from poor data management snowball. They transform from mere annoyances or hindrances into showstoppers.
As an example, consider what happens when a non-technical employee wants to pose a question and receive an answer based on the data owned by the organization. Ten years ago, this process would likely have involved writing and running an SQL query to analyze information and pull out a result. Because that process was technically complex, it would have required assistance from technical teams, who would have helped work around any challenges created by data quality or security deficiencies.
But in the age of AI, this process would likely instead entail giving the employee access to a generative AI tool that can interpret a question formulated using natural language and generate a response based on the organizational data that the AI was trained on.
In this case, data quality or security issues translate directly into bad outcomes. The AI tool might generate a response that is inaccurate because it was trained on irrelevant data, for example. Or, it might expose information that the employee should not be able to view because access controls did not factor into the training process. And because the employee is accessing the data directly with the help of AI, there are no engineers in the mix to create guardrails or smooth over any problems with the data.
This is just one example of an AI use case complicated by data quality and security issues. Other challenges can arise, too, when managing data in the age of AI. For instance, multiple versions of the same document may exist, with no way for the AI to tell how they differ or which version is authoritative.
Managing Data Effectively in the AI Era
Now that we’ve explored the data management problems organizations face in the age of modern AI technology, let’s talk about solutions.
Unfortunately, there is no magic bullet that can cure all the types of issues I’ve laid out above. A large part of the solution involves continuing to do the hard work of improving data quality, erecting effective access controls, and making data infrastructure even more scalable.
As they do these things, however, businesses must pay careful attention to the unique requirements of AI use cases. For example, when they create security controls, they must do so in ways that are recognizable to AI tools, such that the tools will know which types of data should be accessible to which users.
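One way to picture access controls that an AI tool can recognize is to attach permissions to each document as metadata and filter on them before any content reaches the tool. The following is a minimal sketch under that assumption, not a production design; the `Document`, `User`, and `visible_documents` names and the role labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this document


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def visible_documents(user, documents):
    """Return only the documents that share at least one role with the user."""
    return [doc for doc in documents if user.roles & doc.allowed_roles]


# Hypothetical sample data: one widely readable document, one restricted one.
docs = [
    Document("handbook", "Vacation policy ...", frozenset({"employee", "hr"})),
    Document("salaries", "Compensation bands ...", frozenset({"hr"})),
]
analyst = User("analyst", frozenset({"employee"}))

# Only the filtered list would be handed to the AI tool as context.
context = visible_documents(analyst, docs)
print([doc.doc_id for doc in context])  # prints ['handbook']
```

Because the filter runs before the AI tool sees anything, the tool cannot reveal a document the user was never entitled to read, whatever question is asked.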
To help with these processes, organizations may consider adopting certain types of tools that haven’t always factored into data management in the past, such as:
- Data lineage tools, which track where data originated and how it has evolved over time
- Tools that expose data products as APIs, making it easier to access the data in a flexible, scalable way
- Data discovery tools, which can help locate data assets (especially unstructured data assets) an organization may not know about or may not be properly managing
- Version control software, which keeps track of multiple versions of the same data (although these tools have historically been used mostly to manage code, they are also valuable for managing unstructured data, like documents, that evolves over time)
When paired with more traditional data management tools, like data lake platforms, these types of solutions empower businesses to thrive in the face of the new wave of data management challenges.
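To make the version-tracking idea concrete, the sketch below records several versions of the same document and picks one per document for an AI tool to use. It assumes, purely for illustration, that the highest version number is the authoritative one; a real organization might rely on approval status or another rule instead. The `DocumentVersion` and `latest_versions` names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentVersion:
    doc_id: str
    version: int
    modified: date
    content: str

    @property
    def checksum(self):
        """SHA-256 of the content, useful for spotting identical copies."""
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


def latest_versions(versions):
    """Map each doc_id to its highest-numbered version (assumed authoritative)."""
    latest = {}
    for candidate in versions:
        current = latest.get(candidate.doc_id)
        if current is None or candidate.version > current.version:
            latest[candidate.doc_id] = candidate
    return latest


# Hypothetical sample data: two versions of one policy, one of another.
history = [
    DocumentVersion("travel-policy", 1, date(2022, 3, 1), "Economy class only."),
    DocumentVersion("travel-policy", 2, date(2024, 6, 15), "Economy or rail."),
    DocumentVersion("expense-policy", 1, date(2023, 1, 10), "Submit within 30 days."),
]

current = latest_versions(history)
print(current["travel-policy"].content)  # prints Economy or rail.
```

Feeding only the selected versions to an AI tool keeps superseded text out of its answers, while the checksum offers a cheap way to detect when two supposedly different versions are in fact the same file.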
Conclusion: Embracing the Second Wave of Data Modernization
The changes currently taking place in the realm of data modernization are just as momentous as those that transformed data infrastructure and management practices when the big data concept first appeared on the scene more than 15 years ago.
Yet the stakes, arguably, are even higher today than they were then. Today, modernizing your data is not only important as a way of enabling basic analytics or helping correlate different types of information. It’s also critical for unlocking the powerful new innovations offered by AI, which is poised to become the key factor separating “winners” from “losers” in the realm of business.