“Every day, companies need to track and trace data movement processes through metadata discovery and data lineage,” emphasizes Amnon Drori, CEO and co-founder of Octopai. This task has not been easy and has often frustrated Drori throughout his twenty-year career. In the past, Drori’s teams have had to spend time checking and rechecking data, a cumbersome process of retrieving, understanding, and trusting data.
Case in point: about four years ago, Drori and a former colleague looked at the same report and presented two very different, yet reasonable, numbers to the CEO. Puzzled, they investigated further. After two and a half weeks, the team found that the reports came from two different systems, using different rules and business processes to calculate the data in the report. The difference amounted to seven million dollars, which was very significant to the business.
Over time, this problem has intensified. He noted that organizations can have 30 or even 40 systems using different requirements and combining numbers differently. This number of data technologies “used to transform, manage, and shift the data only continues to grow,” stated Drori in a recent DATAVERSITY® interview. In addition, “the complexity between data applications has only multiplied, as they come from different vendors.” Drori asked:
“Why does tracking and tracing data always lead to a lose-lose situation? Four years ago, it took us a day. Now it takes two weeks. We end up doing the same work again and again, while not having enough manpower, meaning we end up delayed and behind.”
He concluded that there must be a smarter way to track data and manage data lineage through metadata automation.
Simplifying Data Management is Smarter
Drori mused, “while development has progressed in a data source and data visualization in reports, little has been done to manage data.” This puts Business Intelligence (BI) and the folks that ship, transform, and consolidate data, at a disadvantage. BI deals with the complex space in between data sources and reporting. There, the business needs to understand how data moves by “analyzing the metadata,” said Drori. “All this needs to happen, simply with the click of the button.” To get to this smarter type of Data Management, it must be broken down into three components:
- Horizontal or Cross-platform Axis: The horizontal axis spans processes from Extract, Transform, and Load (ETL), to database tables and views, to the reporting and analysis tools comprising customer products. “Metadata can be centralized from different junctions, or steps where the data is moving,” stated Drori.
- Vertical Axis: According to Drori, the vertical axis answers the question, “in making a data world, where is the vendor incidental?” Companies transfer data using tools from multiple vendors (e.g., from SAS to Microsoft to IBM). The image below shows how the data transforms from the report to the data source, the vertical axis.
The number of data sources does not matter in vertical lineage. What matters, Drori thinks, is whether the data can be trusted.
- Metadata Management: This third component considers multiple data events. Drori commented: “By automating Metadata Management, it shines in making data lineage available. This kind of Metadata Management is disruptive in an agile way, done much better and more innovatively.”
Handling Common Metadata Problems Through Metadata Automation
Metadata Automation for advanced data lineage requires understanding and talking about the problem an organization is trying to solve. Through asking a series of pointed questions, it is possible to discover which metadata needs to be found. The reasons typically come down to “four or five critical use cases,” noted Drori.
In one common scenario, a user does not see the expected results from a report. At this point, Metadata Automation can trace back how the data landed in the report. This means showing all the different pipes, junctions, transformations, and join points of how the data moved and where there are gaps.
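The trace-back described above can be pictured as a walk over a lineage graph. The following is a minimal sketch, not Octopai's implementation: the element names, the `UPSTREAM` map, the `KNOWN_SOURCES` set, and the `trace_upstream` function are all hypothetical and exist only for illustration. It starts at a report field, follows each element back to what feeds it, and flags any element that has no recorded input and is not a known source system as a gap.

```python
from collections import deque

# Hypothetical lineage map: each element lists the elements it is derived from.
UPSTREAM = {
    "report.revenue": ["view.sales_summary"],
    "view.sales_summary": ["table.orders", "table.refunds"],
    "table.orders": ["etl.load_orders"],
    "etl.load_orders": ["source.crm"],
    "table.refunds": [],  # nothing recorded upstream
}

# Hypothetical set of systems where data legitimately originates.
KNOWN_SOURCES = {"source.crm"}


def trace_upstream(element, upstream, known_sources):
    """Walk breadth-first from a report field back toward its sources.

    Returns the elements in the order visited, plus the elements that
    have no recorded input and are not known sources (lineage gaps).
    """
    visited = []
    gaps = []
    seen = {element}
    queue = deque([element])
    while queue:
        node = queue.popleft()
        visited.append(node)
        parents = upstream.get(node, [])
        if not parents and node not in known_sources:
            gaps.append(node)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return visited, gaps


visited, gaps = trace_upstream("report.revenue", UPSTREAM, KNOWN_SOURCES)
for step in visited:
    print(step)
print("gaps:", gaps)
```

In this toy map, the walk reaches `source.crm` through the orders pipeline, while `table.refunds` is reported as a gap because nothing documents where its data comes from.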
Another problem, Drori said, is that “data elements change meaning, unintended.” He stated that this happens because:
“Most metadata is not recorded, documented, managed, stored anywhere. A typical company has tens of millions or millions of data elements without descriptions. To deal with this, organizations add the descriptions manually or try to fix it with data catalog applications.”
In contrast, Octopai captures metadata automatically, allowing companies to view metadata strategically. Drori related that these organizations can then address day-to-day operations through metadata discovery and metadata stitching. Octopai creates a map of metadata and is able to show complete data lineage, which can also be used for data cataloging.
Connecting Metadata to Lineage Through Automation
To help with common Metadata Management problems, automation must connect metadata to BI lineage within the larger business context, according to Drori. Key to this is understanding how to create reports for business users from the BI systems and the data catalog. Drori stated that this automation, employed to connect metadata to lineage, can be used “for other purposes, like Data Governance.” This has been “highly discussed in the past year and will be more and more,” due to the GDPR, said Drori. He believes that the GDPR has “forced companies to understand more about data lineage in addition to have a grip, very quickly, of where the data is located. This is where data automation takes place.”
Given the GDPR’s enforcement, timing is critical. Metadata Automation involves only one or two hours of a customer’s time to set up and run. Working within chaotic environments (say, 30 or 40 different systems) takes Octopai only a few days. Alternative manual or special custom developments would take more time and may not manage the metadata holistically.
“Data lineage taps into cross-organizational needs, not just a very specific one discovered within the last year,” noted Drori. The lineage designed for BI has been requested by other business units that span the entire organization. “It does not matter where the user is located, whether in a business unit or in IT,” said Drori. Using Metadata Automation, users get verifiable and reliable information about how the data is moving across the data environments.
Taking Metadata Management to the Next Level with Machine Learning
Reliable and verifiable Metadata Management has made significant progress thanks to Machine Learning applications. Machine Learning algorithms index metadata items and provide understanding about relationships between the metadata. Drori stated, “we can actually stitch the connections and then draw a map.” In the example below, a machine discovers the term “product” and its different permutations in the combined data systems.
Drori added, “Machine Learning happens in real time where it discovers changes to metadata elements, allowing users to enhance their understanding of the relationship between them.” Much of this happens behind the scenes, and the user just pushes a button to get data lineage analysis. Progression in metadata analysis advances by automatically tagging each data element as a whole, and also its parts.
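To make the idea of tagging an element "as a whole, and also its parts" concrete, here is a small rule-based sketch. It is not Machine Learning and not Octopai's method: the `tokens` and `tag_elements` functions, the `ALIASES` table, and the sample column names are all assumptions made for illustration. It splits each metadata name into its parts and tags the whole name with a business term when one of the parts is a known variant of that term.

```python
import re
from collections import defaultdict

# Hypothetical table of business terms and the variants that stand for them.
ALIASES = {"product": {"product", "products", "prod"}}


def tokens(name):
    """Split a metadata name into lowercase parts.

    Handles separators such as '_' and '.', plus camelCase boundaries.
    """
    parts = []
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [part.lower() for part in parts]


def tag_elements(names, aliases):
    """Tag each whole name with every term whose variant appears among its parts."""
    tagged = defaultdict(list)
    for name in names:
        name_parts = set(tokens(name))
        for term, variants in aliases.items():
            if name_parts & variants:
                tagged[term].append(name)
    return dict(tagged)


# Hypothetical element names gathered from several systems.
names = [
    "ProductName",
    "PROD_ID",
    "dim_product.prod_desc",
    "customer_id",
    "reproduction_rate",
]
print(tag_elements(names, ALIASES))
```

Matching on parts rather than on raw substrings keeps `reproduction_rate` out of the "product" group even though it contains the letters "prod". A learned model would replace the hand-written alias table, which is the piece this sketch deliberately leaves simple.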
Drori remarked:
“Metadata objects could represent a constant, formula, or calculation. Octopai’s applications get the meaning of it so fast that it can show a data element appearing 40,000 times and respond within four to five seconds. These programs extract, centralize, analyze, and tag metadata while creating the relationship between the metadata. This ability pushes Metadata Management over the top.”
The resulting increase in speed and accuracy makes Machine Learning for Metadata Management an impressive innovation.
Drori sees Metadata Automation as enabling customers to work more effectively. These clients adopt customized automated data lineage and feel their needs are met, from product creation to finish. In the end, Drori said there is a smarter way to do data lineage so that companies do not have to spend weeks tracing data and explaining differing reporting results.