Being able to trace data from its origin to its destination is no longer a nice-to-have. It’s a must-have if you are to govern data — and of course you’ve got to govern data.
Without metadata, data lineage can’t exist, and if data lineage can’t exist, neither can Data Governance. Metadata brings together assets, processes, and the people who use them, and it can show where a particular data element came from. It’s also the backbone for making sure data is used appropriately and intelligently, both to meet regulatory requirements and to perform the work of the business efficiently and effectively.
And, according to data lineage automation vendor MANTA, one of the biggest holes in the Data Governance market around data lineage is the failure to scan lineage information from programming code.
“There are lots of Data Governance and Data Management solutions that market data lineage capabilities, but it turns out there are some gaps,” said Jan Ulrych, Vice President of Presales at the company, which automates data lineage for impact and root cause analysis, risk and compliance, data migration, enhanced governance, and agile data warehousing.
Failing to scan lineage information from programming code in data integration, ETL, and reporting tools, where a lot of code exists, prevents a full understanding of how data flows, which in turn keeps companies from answering the important question of lineage, he said.
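To make the idea of scanning code for lineage concrete, here is a minimal sketch in Python. It is not MANTA's method: the regular expressions, the `scan_sql_lineage` function, and the sample ETL script are all hypothetical, and the approach only handles simple `INSERT INTO ... SELECT ... FROM/JOIN` statements. A production scanner would need a full parser for each SQL dialect and programming language.

```python
import re

# Hypothetical patterns: they only recognize simple INSERT ... SELECT statements.
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def scan_sql_lineage(script: str) -> set[tuple[str, str]]:
    """Return (source_table, target_table) edges found in a SQL script."""
    edges = set()
    for statement in script.split(";"):
        target = TARGET_RE.search(statement)
        if target is None:
            continue  # not a statement that writes data, so no lineage edge
        for source in SOURCE_RE.findall(statement):
            edges.add((source.lower(), target.group(1).lower()))
    return edges


# Hypothetical ETL script used only for illustration.
etl_script = """
INSERT INTO staging.orders SELECT * FROM raw.orders;
INSERT INTO mart.revenue
SELECT o.region, SUM(o.amount)
FROM staging.orders o JOIN staging.customers c ON o.customer_id = c.id
GROUP BY o.region;
"""

edges = scan_sql_lineage(etl_script)
for source, target in sorted(edges):
    print(f"{source} -> {target}")
```

Even this toy version shows why the code matters: the link from `raw.orders` to `mart.revenue` exists only inside the script, so a catalog that never reads the script cannot see it.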
Also, keep in mind that many systems implemented years ago are still around. “We see a lot of companies that still use a lot of COBOL, a lot of custom-coded applications,” according to Ulrych. These systems still hold important information for the people who continue to use them, and that has to be accounted for. Sometimes these systems are decommissioned, but not that often, he said.
Know the Flow
“You have to be able to look at the whole flow to see where everything is going, where it’s changing,” he said, and he thinks more companies want the ability to automate harvesting data lineage from anywhere. They want that to become integral to the design of systems.
“On the technical side, it’s the variety and diversity of the technology landscape, and the volume of the code that’s out there that matters to the data lineage challenge.”
There’s no way to harvest this data lineage manually. You’d have to hire a whole team and they’d work their entire lives — and they still might not finish the job.
The alternative would be to do things selectively and not go that deep. Either way, Ulrych said, it’s not repeatable.
Virtues of Automation
Organizations often ask for help with the business side of lineage. They want to understand data flows at a higher level, in terms of the business and its assets.
Self-service, of course, has become embedded in the enterprise, and being able to automatically track data lineage clearly has at least two big benefits. One is immediate insight into where the data comes from, which builds the trust needed for data-based decisions. The other is speed: business users and analysts don’t have to go around to IT and other lines of business (LoBs) to find out where a number comes from, because that information is immediately accessible to them.
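Answering "where does this number come from?" amounts to walking a lineage graph upstream. The sketch below assumes the lineage has already been harvested into a simple dictionary; the asset names, the `SOURCES` mapping, and the `trace_origins` function are hypothetical and are not part of any vendor's API.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it is built from.
SOURCES = {
    "report.quarterly_revenue": ["mart.revenue"],
    "mart.revenue": ["staging.orders", "staging.customers"],
    "staging.orders": ["raw.orders"],
    "staging.customers": ["raw.customers"],
}


def trace_origins(asset, sources):
    """Walk upstream breadth-first and return every asset that feeds `asset`."""
    found = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in sources.get(current, []):
            if parent not in found:  # skip assets already visited
                found.append(parent)
                queue.append(parent)
    return found


origins = trace_origins("report.quarterly_revenue", SOURCES)
print(origins)
```

The list runs from the nearest sources to the most distant ones, which is the order an analyst would typically want when checking a figure in a report.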
Data lineage also can assist data migrations; it’s possible to understand the dependencies between various data sets and to manage a migration in phases. “Instead of doing a big bang approach you can easily and very nicely work in a controlled way,” Ulrych says. MANTA’s customers take a measured approach for these and other use cases where data lineage is important.
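A phased migration of this kind can be sketched as a topological sort over the dependencies between data sets. The example below uses Python's standard `graphlib` module; the data set names, the `DEPENDS_ON` mapping, and the `migration_phases` function are hypothetical and only illustrate the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each data set maps to the data sets it reads from.
DEPENDS_ON = {
    "raw.orders": set(),
    "raw.customers": set(),
    "staging.orders": {"raw.orders"},
    "staging.customers": {"raw.customers"},
    "mart.revenue": {"staging.orders", "staging.customers"},
}


def migration_phases(depends_on):
    """Group data sets into phases; each phase depends only on earlier ones."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose sources are done
        phases.append(ready)
        sorter.done(*ready)
    return phases


phases = migration_phases(DEPENDS_ON)
for number, phase in enumerate(phases, start=1):
    print(f"Phase {number}: {', '.join(phase)}")
```

Each phase can be migrated and validated before the next one starts, which is the controlled alternative to a big bang cutover.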
MANTA also partners with Data Management and Data Governance solutions to provide them with this information. It has out-of-the-box API connectors, so it’s just a matter of configuring them for the other solution, he says. Sometimes additional metadata is pushed over to those tools as well.
Here Today, Here Tomorrow
In June, MANTA issued its 3.25 update that supports Microsoft Excel, IBM Cognos, and SAP (Sybase) PowerDesigner. MANTA explains that many customers have “disturbing amounts of data in Excel databases, which was more challenging for our customers to include in their data lineage.”
With the new version, organizations can track data lineage from database tables and CSV files through tables and graphs among multiple Excel workbooks and then can push everything into a native visualization, as well as into third-party solutions.
For Cognos, Ulrych says they created complete end-to-end data lineage from the database data sources, analytical models, and reports in Cognos Analytics by scanning reports, interactive reports, Framework Manager models, and database connections and then pushing that into its visualization tool.
With the third connector to the modeling tool SAP (Sybase) PowerDesigner, MANTA can scan PowerDesigner and automatically pull physical, logical, and conceptual models that can then be added to a company’s data lineage to create end-to-end logical data lineage, the company says.
The focus for the next couple of years, as Ulrych sees it, is to really bring together the business and technology worlds for data lineage, so that everyone is looking at and working with the same data.
“So, if a business analyst asked how this report is built and what it’s based on, IT is looking at the same information,” he said. “What we believe is the right way to do this is to build the business lineage as an abstraction on the physical one.”
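One way to picture business lineage as an abstraction on the physical one is to map each physical table to a business term and collapse the column-level edges accordingly. The sketch below is an illustration under that assumption, not a description of MANTA's product; the edges, the `BUSINESS_TERMS` mapping, and both functions are hypothetical.

```python
# Hypothetical physical lineage: (source column, target column) pairs.
PHYSICAL_EDGES = [
    ("raw.orders.amount", "staging.orders.amount"),
    ("staging.orders.amount", "mart.revenue.total"),
    ("raw.customers.region", "staging.customers.region"),
    ("staging.customers.region", "mart.revenue.region"),
    ("mart.revenue.total", "report.revenue.total"),
]

# Hypothetical mapping from physical tables to business terms.
BUSINESS_TERMS = {
    "raw.orders": "Order Intake",
    "staging.orders": "Order Intake",
    "raw.customers": "Customer Master",
    "staging.customers": "Customer Master",
    "mart.revenue": "Revenue",
    "report.revenue": "Revenue Report",
}


def business_term(column):
    """Look up the business term for the table that owns a column."""
    table = column.rsplit(".", 1)[0]
    return BUSINESS_TERMS[table]


def business_lineage(physical_edges):
    """Collapse physical edges into edges between distinct business terms."""
    edges = set()
    for source, target in physical_edges:
        source_term = business_term(source)
        target_term = business_term(target)
        if source_term != target_term:  # hide hops inside one business term
            edges.add((source_term, target_term))
    return sorted(edges)


business_edges = business_lineage(PHYSICAL_EDGES)
for source_term, target_term in business_edges:
    print(f"{source_term} -> {target_term}")
```

Because the business view is derived from the physical edges rather than maintained separately, the analyst and IT are looking at the same underlying information at different levels of detail.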