DataOps has been building at the edges of enterprise data strategies for a couple of years now, steadily gaining followers and climbing the agenda of data professionals. Many believe it is finally about to enter the mainstream.
The number of data requests from the business keeps growing and shows no sign of slowing – and these requests are increasingly complex and changeable. So DataOps’ promise of reducing the analytics cycle time between idea and insight, from business need to the creation of models and charts, could be exactly what pressured data teams need.
If so, the first questions many will ask are: what is DataOps, and where did it come from? And how, exactly, can it drive enterprise data strategies forward?
What Is DataOps?
DataOps’ intellectual heritage has been covered in depth by the DataKitchen team, but to summarize, it is more than simply DevOps for data.
It certainly draws heavily on DevOps to optimize code builds, improve Data Quality and reduce time to market on analytics applications. Like DevOps, it also promotes the adoption of automated tools, and it shuns siloed teams in favor of cross-functional groups.
But it also introduces agile methodologies to enable data teams to respond more quickly to business demands, iterating on releases to enable rapid intervals of innovation. And it even has roots in lean manufacturing, leveraging statistical process control (SPC) to automate data pipeline management and monitoring (the “ops” side of analytics). In doing so, it introduces continuous Data Governance and quality assurance – crucial to accelerating production intervals.
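To make the SPC idea concrete, here is a minimal sketch in Python of how a pipeline check might flag an unusual run. It is an illustration under stated assumptions, not a description of any particular DataOps tool: the function names, the row-count metric, and the sample figures are all hypothetical, and the mean plus or minus three standard deviations rule is a simplified stand-in for a full control chart.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits from a baseline of healthy runs.

    Simplification: uses mean +/- sigmas * sample standard deviation rather
    than a moving-range chart.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(value, limits):
    """True when a new observation falls outside the control limits."""
    lower, upper = limits
    return value < lower or value > upper


# Hypothetical daily row counts from a pipeline that was behaving normally.
baseline_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]
limits = control_limits(baseline_row_counts)

# Check two hypothetical new loads against the limits.
for todays_count in (1003, 600):
    status = "ALERT" if out_of_control(todays_count, limits) else "ok"
    print(f"rows={todays_count}: {status}")
```

In practice the same kind of check could run automatically after each pipeline step, on whichever metrics matter to the team (row counts, null rates, value ranges), so that a suspect load is caught before it reaches a model or a chart.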
By bringing all this together, the goal of DataOps is to allow users – typically data scientists or analysts in this context – to spend more of their time building, deploying, and improving models and visualizations, and less on data engineering and tooling concerns.
Machine Learning Drives DataOps’ Recent Popularity – but It Has Wider Appeal
So, that’s the theory. But what does DataOps enable in practice?
From my observations with customers, DataOps’ rise is increasingly linked to that of machine learning and the need to operationalize models to accelerate time to value. By bringing data scientists, software engineers, and related disciplines together on cross-functional DataOps teams, machine learning models benefit from expert operational support post-deployment. This makes it easier and faster for data scientists (who are not natural software engineers) to ensure models are deployed as intended and to provide direction on maintenance, updates, and performance optimizations.
But more generally, DataOps offers advantages that will benefit any analytics function and use case.
As data volumes and types continue to grow, and the number of data users in organizations increases too, Data Management and delivery tend to be the main bottlenecks in the time-to-value cycle. By borrowing from DevOps to tighten the interplay of people, process, and technology, exactly as DevOps has done for software development, DataOps can dramatically improve Data Science productivity.
Following that story through, if it improves the delivery of high-quality data and trusted insights, then DataOps has real potential to drive data democratization and self-service analytics.
DataOps Is Inevitable, at Least for Larger Organizations
DataOps is not a small undertaking, considering the way it impacts teams as much as tooling. (People are always slower to change than technology.)
But data volumes continue to grow, and demands from the business are also increasing. Indeed, our own research recently revealed that more than two-thirds of business and technical decision-makers report receiving a higher number of data analytics requests from multiple business departments. These respondents also expect that demand from all areas of the business is likely to increase in the future.
In this context, most data leaders know that a more formalized yet flexible approach to wrangling data and deploying models is badly needed. It is not sustainable for data organizations and IT to provide daily and direct support to any and every business group.
The shift to DataOps is perhaps inevitable for most organizations, and almost certainly for larger ones.