… and your data warehouse / data lake / data lakehouse.
A few months ago, I talked about how nearly all of our analytics architectures are stuck in the 1990s. Maybe an executive at your company read that article, and now you have a mandate to “modernize analytics.” Let’s say that they even understand that just putting everything into the cloud or using cloud tools doesn’t count as modernization. They want to take advantage of that new data cloth or data fabric or mesh or something. That’s OK. They’re on the right track. You can fix the vocabulary later. Take it as a win that your executive wants to move analytics forward. Now it’s your job to do it.
You start thinking about modern architectures like data mesh and data fabric, and the data products that are fundamental to them. You consider the gap between your current state and what you imagine as your future state. You begin to list the things that your team will have to do in order to make that leap. And then the ton of bricks hits you. So many fundamental disconnects. You shake your head to try to sort them out. You narrow them down to two big ones.
The first is the realization that everybody on the analytics team, at the company, and, frankly, across most of the field has been quantifying analytics resources the wrong way.
Data quantity and availability have been prioritized over quality, understanding, and usability.
The whole self-perception of analytics teams tends to sound something like, “Our data lake has 29 gazillion million petabytes of data from more than eleventy thousand source systems!!” Demand from business users tends to sound something like, “Just get the data into the lakehouse today and I’ll figure it out from there.”
Information management folks are left chasing the train as it speeds away and having conversations that tend to sound something like, “For how much of that data do you know what it means or what it’s supposed to contain?” “Some.” “Some? Like what percentage?” “Well, maybe one percent, if you round up.”
You stare into the deadlights of a seemingly overwhelming reality that very, very few of your feeds, streams, or data sets satisfy the requirements for a data product.
It amazes me that we’ve managed to be reasonably successful. Analytics users at nearly every company where I’ve discussed this topic have referred derisively to their data lake as a data swamp, cesspool, or quagmire. The enterprise analytics team is often viewed with similar esteem, usually as a bottleneck on the project critical path. After all, the implementation of hundreds, maybe thousands, of data feeds depends on this single team. And the responsibility for all those feeds falls on the back of that same team. Would that be tolerated in any other domain? Of course not. This approach is not scalable, it’s not sustainable, and it’s not the best use of resources.
Which leads directly to the second realization:
Enterprise analytics architecture modernization cannot happen with an enterprise analytics team that is responsible for the care and feeding of the data warehouse / lake / lakehouse.
The typical enterprise analytics team evaluates requests for new data and works with the source systems to establish data file feeds and/or transaction streams. Sometimes the source system team pushes the data into the analytical ecosystem, and sometimes the enterprise team reaches into the source system repository and retrieves the data. The enterprise team then loads the data, manages the feed, and supports the users.
As far as the source system team is concerned, once the data is over the wall, that’s it. Hopefully the data gets to where it needs to go. Hopefully it’s usable. And if there’s a question about content or quality, the response is always the same: “The data is good enough to run the business. It’s good enough for you. But if you feel you have to, submit a support ticket and we’ll get to it never.” In short, analytics is not considered to be within the scope of a source system team’s responsibility.
Modernizing analytics requires the material participation of the business process teams, and that will require an enterprise mindset shift.
Nuts. You knew this was coming. Your mandate to “modernize analytics” probably came with the “don’t adversely impact any other teams” constraint attached.
Many companies don’t have dedicated business process teams, but rather partnerships between a business team that defines the business process and a system team that implements that business process. And analytics is rarely on their radar.
The first job of executive management in analytics modernization is to make it clear that analytics is part of the business process teams’ responsibility – whether the consolidated business process team or the business/system team partnership.
The focus must be on people and process. Not technology. These business process teams, regardless of how they’re constituted, must be responsible for creating foundational data products, which include:
- Content
- Curation, with definitions and expected content at a minimum
- Transportation into the analytical environment
- Content monitoring
- Maintenance and user support
- Lifecycle management
And it can’t be “submit a ticket into a black hole.” It must be a priority. Think of analytics as another business process for which each business process team is responsible.
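To make the list above a little more concrete, here is a minimal sketch of what a foundational data product descriptor might look like. Everything in it is an assumption for illustration: the class names `FieldSpec` and `FoundationalDataProduct`, the attribute names, and the `customer_orders` example are hypothetical, not a standard and not any particular tool's format. The point is only that ownership, support, delivery, lifecycle, and minimum curation can be written down and checked.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One curated field: a definition plus a description of expected content."""
    name: str
    definition: str
    expected_content: str


@dataclass
class FoundationalDataProduct:
    """Hypothetical descriptor owned by a business process team."""
    name: str
    owning_team: str        # the business process team, not enterprise analytics
    support_contact: str    # where users go for maintenance and support
    delivery: str           # how content reaches the analytical environment
    lifecycle_stage: str    # e.g. "proposed", "active", "retired"
    fields: list[FieldSpec] = field(default_factory=list)

    def curation_gaps(self) -> list[str]:
        """Return the names of fields missing a definition or expected content."""
        return [
            spec.name
            for spec in self.fields
            if not spec.definition.strip() or not spec.expected_content.strip()
        ]


# Illustrative values only; none of these names come from a real system.
orders = FoundationalDataProduct(
    name="customer_orders",
    owning_team="Order Management",
    support_contact="order-mgmt-data@example.com",
    delivery="daily file feed",
    lifecycle_stage="active",
    fields=[
        FieldSpec("order_id", "Unique identifier for an order", "non-empty string"),
        FieldSpec("order_status", "", "one of OPEN, SHIPPED, CANCELLED"),
    ],
)

# Lists the fields that do not yet meet the minimum curation bar.
print(orders.curation_gaps())
```

In this sketch, a feed whose `curation_gaps()` list is not empty would not yet qualify as a data product, which gives the owning team a short, specific to-do list instead of a ticket.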
Meanwhile, the enterprise analytics team prepares by developing the tools and processes that will enable the business process teams to do their job with as little friction as possible.
The key here is “with as little friction as possible.” Analytics and information management teams are notorious for over-engineering processes. We love process. We love completeness and precision and accuracy. We want everything to be perfect. We need to let go a little.
Provide the tools and resources that allow the business process teams to analyze the data and to move it into the analytical ecosystem as easily as possible. Implement monitoring processes and applications that identify and expose errors. Encourage your corporate analytics community to freely make comments and corrections without six layers of review. Define processes that ensure compliance, but analyze each step to automate and accelerate as much as possible.
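As one hedged illustration of monitoring that identifies and exposes errors, the sketch below compares incoming rows against expected content and reports every mismatch in plain language. The function name `check_content`, the expectation rules, and the sample rows are all hypothetical; a real implementation would read from your own feeds and publish results wherever your analytics community can see them.

```python
def check_content(rows, expectations):
    """Compare rows against per-column expectations and collect violations.

    `expectations` maps a column name to a predicate that returns True
    when a value matches the expected content for that column.
    """
    errors = []
    for row_number, row in enumerate(rows, start=1):
        for column, is_valid in expectations.items():
            value = row.get(column)
            if not is_valid(value):
                errors.append(f"row {row_number}: unexpected {column}={value!r}")
    return errors


# Hypothetical expectations for an illustrative orders feed.
expectations = {
    "order_id": lambda v: isinstance(v, str) and v != "",
    "order_status": lambda v: v in {"OPEN", "SHIPPED", "CANCELLED"},
}

# Made-up sample rows; two of them violate the expectations above.
rows = [
    {"order_id": "A-100", "order_status": "OPEN"},
    {"order_id": "", "order_status": "SHIPPED"},
    {"order_id": "A-102", "order_status": "LOST"},
]

# Expose every problem rather than stopping at the first one.
for error in check_content(rows, expectations):
    print(error)
```

The design choice here is deliberate: the check collects and reports problems instead of blocking the load, in keeping with the goal of as little friction as possible.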
Centralize standards. Distribute implementation.
Let’s be super-clear. The business process teams are responsible for implementation. The enterprise analytics team provides the tools and processes and ensures compliance.
This transformation can proceed incrementally. Some business process teams will be easier to engage than others. But executive support will undoubtedly be required. Sometimes it will take something a little more assertive than support.
Eventually, your data warehouse or data lake will be populated with foundational data products. From the outside, it may look very similar to how it always has, or maybe even the same. But functionally it will be very different. Roles, responsibilities, and expertise are aligned. Processes are streamlined, scalable, and sustainable. The data is understood. Analytical insights are accelerated. And you have taken a huge first step toward modernizing your analytics architecture.