What outcomes do you expect from data lineage? No one’s just doing it for fun, after all. Generally speaking, data lineage is a major asset for regulatory reporting and governance, trust in decision-making, and on-premises-to-cloud migrations.
Data lineage tools track business data flow from originating source through all the steps in its lifecycle to destination. Data lineage tools can also track technical data transformation logic. A visual representation provides an intuitive way to view the overall flow.
Familiarity with the foundations of data capabilities is the mandate of today’s CDO – and the challenge, too. According to an interview with Ramesh Nair, North America Financial Services leader at Accenture Applied Intelligence, the foundational elements keeping leaders up at night are extracting value from their current big data investments; preparing for the future; and Metadata Management, Data Quality, Data Governance, and data lineage. Yet one survey finds that 66 percent of CDOs have not deployed data lineage.
Christopher Butler, the CDO for Asia-Pacific International Markets at HSBC, has commented on the importance of data lineage in the heavily regulated financial services sector. He related that parts of HSBC are building out the company’s data lineage program to gain a granular view of data across the entire organization, enabling HSBC to extract important elements and identify aspects such as the owner of the data.
In its Magic Quadrant for Metadata Management Solutions, Gartner says that there is improved risk management and better assessment by decision-makers with regard to the impact of change within an enterprise – thanks to communication of a clear lineage for data and its use.
The Value of System Metadata Analysis
The ability to analyze system metadata (for example, by parsing complex SQL scripts, ETL configurations, or report definitions) makes it possible to extract lineage and data flow information and trace data paths. Because this approach processes every piece of logic, it covers the entire spectrum of data that can serve as input for calculating other data. It is known as decoded lineage, and it can be used in concert with data similarity lineage, which examines the data values of schemas to look for similarities without accessing code.
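To make the idea of reading code rather than data concrete, here is a minimal sketch in Python. It is not how MANTA or any other vendor works; it is a toy that assumes simple `INSERT INTO ... SELECT ... FROM ... JOIN ...` statements, and every table name in it is hypothetical. A real decoded lineage tool would use a full parser for each SQL dialect and would track columns, not just tables.

```python
import re

# Hypothetical SQL script; all table names are invented for illustration.
SQL_SCRIPT = """
INSERT INTO staging.orders_clean
SELECT o.id, o.amount FROM raw.orders o WHERE o.amount > 0;

INSERT INTO mart.revenue_report
SELECT c.region, SUM(o.amount)
FROM staging.orders_clean o
JOIN raw.customers c ON c.id = o.customer_id
GROUP BY c.region;
"""

# Naive patterns: good enough for this sample, not for real-world SQL
# (no subqueries, CTEs, comments, or quoted identifiers are handled).
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(script):
    """Return a sorted list of (source_table, target_table) edges."""
    edges = set()
    for statement in script.split(";"):
        target = TARGET_RE.search(statement)
        if not target:
            continue  # statement writes nothing, so it adds no lineage
        for source in SOURCE_RE.findall(statement):
            edges.add((source, target.group(1)))
    return sorted(edges)


EDGES = extract_table_lineage(SQL_SCRIPT)
for source, target in EDGES:
    print(f"{source} -> {target}")
```

Note that the script never touches a row of data: the edges come entirely from the logic in the SQL text, which is the defining trait of the decoded approach described above.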
Among the companies that provide decoded lineage capabilities are MANTA, Octopai, and Spline. MANTA SVP of products Ernie Ostic offers more insight into the use cases for decoded lineage for the enterprise and its CDO, such as privacy compliance mandates. “We look at code, not data,” Ostic says. If an Oracle database were full of PII that shouldn’t be there, regulators would want to know how it got there. “You need to be able to show them you have a full understanding of how it got there,” he says, whether through data lake, data vault, or data warehouse feeds and reports.
A decoded lineage tool can illustrate that.
“The problems and dilemmas facing those who need lineage are far greater than the cost of any software, especially for large banks with regulators looking over their shoulders,” he says. “The need to answer lineage and demonstrate full control and awareness of their data is huge for them to avoid fines.”
You need pure, raw trust in data for effective decision-making, but data is often suspect. Ostic paints the all-too-typical scenario of someone dealing with a key decision-maker who needs to know how something was calculated in a report. The leader suspects that the number is not right, and wants it proved via lineage to know exactly where it came from. The answer is usually ‘Let me check,’ he notes. It’s a long chain then, from finding the colleague who was thought to have written the report to learning that another person actually did it and then emailing right down to the mainframe person who runs the report at the end of every week… and so it goes.
“In a large enterprise you are crossing multiple groups,” Ostic says. “How long that takes has an implication for the executive trying to make decisions and conduct business.” Being able to automate the picture of lineage through program analysis can speed things up significantly.
MANTA can keep a time slice of lineage, too. For instance, if a company needs to see how a report was calculated on a certain date last year, a user can pull up the lineage as of that date with a click and compare it with the current lineage to find any differences.
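Conceptually, comparing two time slices comes down to a set difference over lineage edges. The Python sketch below illustrates that idea only; it says nothing about how MANTA stores or compares snapshots, and the function name, snapshot contents, and table names are all hypothetical.

```python
def diff_lineage(old_edges, new_edges):
    """Compare two lineage snapshots given as (source, target) edges."""
    old, new = set(old_edges), set(new_edges)
    return {
        "added": sorted(new - old),    # flows present now but not before
        "removed": sorted(old - new),  # flows that existed before but are gone
    }


# Hypothetical snapshots of the lineage feeding one report.
LINEAGE_LAST_YEAR = {
    ("raw.orders", "staging.orders_clean"),
    ("staging.orders_clean", "mart.revenue_report"),
}
LINEAGE_TODAY = {
    ("raw.orders", "staging.orders_clean"),
    ("staging.orders_clean", "mart.revenue_report"),
    ("raw.fx_rates", "mart.revenue_report"),
}

DIFF = diff_lineage(LINEAGE_LAST_YEAR, LINEAGE_TODAY)
print(DIFF)
```

In this invented example the diff shows that a new feed now contributes to the report, which is the kind of change someone questioning a historical number would want to see.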
Decoded lineage also relates to cloud migrations in terms of reducing the time and resources dedicated to coding decisions. One financial organization MANTA worked with needed to tackle moving data from ground to cloud. It had about 2,000 tables in Microsoft SQL Server and DB2, plus a couple of hundred reports and 73 other ETL programs. When the critical reports were reduced in number to seventeen, the firm needed to do lineage on only those reports. By scanning the metadata and reading all the SQL programming code and logic stored in it, MANTA could create a detailed visualization of the data lineage to scope the migration and condense the timeframe.
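Scoping a migration this way amounts to walking the lineage graph backward from the critical reports and keeping only what they depend on. The Python sketch below shows that traversal under stated assumptions: lineage is available as a list of (source, target) edges, and all table and report names are hypothetical. It is an illustration of the idea, not a description of MANTA’s implementation.

```python
from collections import defaultdict, deque


def upstream_of(edges, targets):
    """Return every node that feeds, directly or indirectly, into targets."""
    parents = defaultdict(set)
    for source, target in edges:
        parents[target].add(source)

    seen = set()
    queue = deque(targets)
    while queue:  # breadth-first walk against the direction of data flow
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical lineage edges and critical-report list.
EDGES = [
    ("raw.orders", "staging.orders_clean"),
    ("staging.orders_clean", "mart.revenue_report"),
    ("raw.customers", "mart.revenue_report"),
    ("raw.web_logs", "mart.clickstream_report"),
]
CRITICAL_REPORTS = ["mart.revenue_report"]

IN_SCOPE = sorted(upstream_of(EDGES, CRITICAL_REPORTS))
print(IN_SCOPE)
```

Here `raw.web_logs` drops out of scope because no critical report depends on it, mirroring how narrowing the report list shrinks the set of tables that must be migrated first.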
In addition to offering a standalone tool, MANTA supports users who have invested in a catalog tool such as IBM’s Information Governance Platform, Collibra, Informatica, TopQuadrant, or others: it can optionally package lineage and push it to the catalog through that tool’s native API.
“Lineage is hugely important to customers,” Ostic says. “It’s about what can be done with it, not just lineage itself.”