An apt metaphor for data lineage is complex bookkeeping. But instead of financial transactions recorded in a ledger, it tracks the data asset values that populate a given platform or document. Data lineage describes data origins, movements, characteristics, and quality across the data lifecycle. Traditionally, data lineage has been thought of as a map of tables and joins that guides which SQL to use for selecting, summarizing, or grouping data in a data warehouse. With the increased velocity, volume, and variety of data sources, data lineage has become more complex. In addition, self-service business intelligence has grown, and it has specific lineage requirements of its own to be effective. Meaningful data lineage therefore needs to cover multiple dimensions, including the use of metadata. Data lineage can be broken down into:
- Business Lineage: The who, what, where, why, and how of the business data. Business lineage reports show a simplified view of lineage that highlights the transformation and aggregation of data that a business user needs.
- Technical Lineage: Shows the flow of physical data through the underlying applications, services, and data stores, in support of developing, updating, and maintaining a Data Architecture.
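As a rough illustration of technical lineage, the flow of data between assets can be sketched as a directed graph. The asset names below (`orders_raw`, `sales_dashboard`, and so on) and both helper functions are hypothetical and are not taken from any particular lineage tool. The sketch only assumes that each asset records the assets it is directly derived from.

```python
from collections import deque

# Hypothetical lineage edges: each key is a data asset, and each value
# lists the assets it is directly derived from (its upstream sources).
UPSTREAM = {
    "sales_dashboard": ["monthly_sales_summary"],
    "monthly_sales_summary": ["orders_clean", "customers"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "customers": [],
}


def _walk(start, edges):
    """Breadth-first walk returning every asset reachable from `start`."""
    seen = set()
    queue = deque(edges.get(start, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(edges.get(current, []))
    return seen


def trace_upstream(asset, upstream=UPSTREAM):
    """Return every asset that `asset` depends on, directly or indirectly."""
    return _walk(asset, upstream)


def trace_downstream(asset, upstream=UPSTREAM):
    """Return every asset that would be affected if `asset` changed."""
    # Invert the upstream edges so they point from source to consumer.
    downstream = {}
    for child, parents in upstream.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(child)
    return _walk(asset, downstream)


print(sorted(trace_upstream("sales_dashboard")))
print(sorted(trace_downstream("orders_raw")))
```

Tracing upstream answers the origin question ("where did this report's data come from?"), while tracing downstream supports impact analysis ("what breaks if this source changes?"). A business lineage view would typically hide intermediate assets such as `orders_clean` and show only the sources and the final report.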
Other Definitions of Data Lineage Include:
- A document on “how data under analysis is acquired or created in an organization,” in addition to how the data flows. (DAMA DMBoK2)
- “An essential tool connecting data across an organization.” (Amber Lee Dennis)
- “Shows movement of data through a job or multiple jobs.” (IBM)
- “Understanding of data’s source, parentage, and journey in the data warehouse.” (O’Reilly)
- “Data coherence, connection, and organization. Metadata describes data adequately.” (TechRepublic)
- “Understanding of data from its inception to its current state.” (MITRE)
Data Lineage Use Case Examples Include:
- Creating “a world class process and strategy to automate the data forensics and resolve regulatory requirements across the organization”
- Complying with the European Union’s General Data Protection Regulation (GDPR)
- Understanding biases in election reporting, e.g., the Chicago Daily Tribune’s “Dewey Defeats Truman” headline
Businesses Need Data Lineage To:
- Comply with the law.
- Track data assets.
- Validate data usage and identify risks that need to be mitigated.
- Know how the Data Quality of an enterprise is affected.
- Analyze business impacts.