Click to learn more about author Mandy Chessell.
If you have bought data tools from different vendors, you have probably noticed three things:
- Although each tool adds value, no single vendor can support all of your organization’s needs.
- Each tool starts “empty” – with no knowledge of your data landscape. Only through use does it build up knowledge about your organization’s data.
- The knowledge of your data cannot easily be exchanged between tools unless they have been designed as a suite.
The knowledge about your data that drives your data tools is called metadata (literally, “data about data”). Depending on your tools, metadata can cover a data store’s location, the technology hosting the data, how the data is structured, where it came from, and how it is classified and governed, along with its business definitions and use.
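To make that list concrete, here is a minimal sketch of what a single metadata record might hold. The class name, field names and sample values are all invented for illustration; they are not taken from Egeria or from any metadata standard.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class DataStoreMetadata:
    """Hypothetical metadata record; every field name here is illustrative."""
    location: str                 # where the data store lives
    technology: str               # what hosts the data
    structure: Dict[str, str]     # column name -> data type
    origin: str                   # where the data came from
    classification: str           # how the data is classified
    business_definition: str      # what the data means to the business
    governance_rules: List[str] = field(default_factory=list)


# Invented example: metadata describing one customer table.
customer_table = DataStoreMetadata(
    location="jdbc:postgresql://db.example.com/sales",
    technology="PostgreSQL",
    structure={"customer_id": "integer", "email": "text"},
    origin="CRM nightly export",
    classification="confidential",
    business_definition="One row per active customer account",
    governance_rules=["mask email in test environments"],
)

# A plain dictionary view of the record, e.g. for serialization.
record = asdict(customer_table)
```

Note that nothing in the record is the data itself; it only describes the data.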
Although there are many good standards for metadata, each covers only a narrow aspect of the metadata you need, such as describing the format of a particular type of data, or the structure of a business rule. There is no single standard that covers all of the metadata an organization needs to fully understand its data, manage it and get value from it.
Data tool vendors have needed to create their own data structures and formats for metadata, and as a result they are all different, even though they essentially contain the same information. Each vendor then has to build its own bridges and connectors to your data platforms to locate and extract as much information about your data as it can and convert it into its tool’s metadata format. This process is expensive, both for the vendor and for your organization.
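The sketch below illustrates the problem with two made-up vendor formats that describe the same table. Both formats, the common shape and the converter functions are hypothetical and exist only for this example; they do not represent any real product or Egeria's own interfaces.

```python
# Two invented vendor formats carrying the same facts about one table.
vendor_a_record = {
    "tableName": "customers",
    "cols": [{"n": "customer_id", "t": "integer"}, {"n": "email", "t": "text"}],
    "srcSystem": "CRM",
}
vendor_b_record = {
    "asset": {"name": "customers", "lineage": "CRM"},
    "fields": {"customer_id": "integer", "email": "text"},
}


def from_vendor_a(record):
    """Convert the hypothetical vendor A shape into a common shape."""
    return {
        "name": record["tableName"],
        "columns": {col["n"]: col["t"] for col in record["cols"]},
        "origin": record["srcSystem"],
    }


def from_vendor_b(record):
    """Convert the hypothetical vendor B shape into the same common shape."""
    return {
        "name": record["asset"]["name"],
        "columns": dict(record["fields"]),
        "origin": record["asset"]["lineage"],
    }


# Different shapes, identical information once converted.
common_a = from_vendor_a(vendor_a_record)
common_b = from_vendor_b(vendor_b_record)
same_information = common_a == common_b
```

Every pair of tools that needs to share metadata requires a converter like these, which is where the cost comes from.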
The Egeria open source project provides open interfaces and metadata exchange capabilities that allow metadata from different vendors’ tools to be automatically captured and synchronized. The aim is to create a consistent view of your data across all the tools that you use, whilst reducing the cost of metadata capture and management for data tool vendors.
The open metadata and governance capability of Egeria has been incubating as part of the Apache Atlas project, an open source metadata repository designed for the Apache Hadoop ecosystem. It has now reached a level of maturity that means it qualifies as its own open source project. The Egeria project is hosted by the ODPi Data Governance Initiative at the Linux Foundation.