The pre-digital card catalogs of libraries offer a good example of metadata (title, author, copyright, location on shelf). Metadata essentially means “data offering information about the data” or simply “data about data.” It is also often defined as “data in context,” since it supplies the “what, where, when, why, and how” of data. Metadata is part of a cataloging system that summarizes the most basic information about the data, making finding and tracking it easier. Some examples of the basic information used in the cataloging process include:
- The time and date of creation
- Author(s) of the data
- Where the data was created
- File size
- The standards used
- Data Quality information
- The source of the data
- The purpose of the data
- How the data was created
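As a rough illustration of the list above, a catalog entry for a single file might be assembled as follows. This is a minimal sketch: the `FileMetadata` class and the `describe_file` helper are hypothetical names, not part of any product discussed here. The file system only knows the size and timestamps, so fields such as author, source, and purpose have to be supplied by whoever catalogs the data. The last-modified time is used because a true creation time is not available on every platform.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FileMetadata:
    """Hypothetical catalog record; the field names are illustrative only."""
    path: str
    size_bytes: int
    modified_utc: datetime
    # The file system does not store these; the cataloger supplies them.
    author: str = "unknown"
    source: str = "unknown"
    purpose: str = "unknown"


def describe_file(path, author="unknown", source="unknown", purpose="unknown"):
    """Build a metadata record from file-system facts plus supplied context."""
    info = os.stat(path)
    return FileMetadata(
        path=os.path.abspath(path),
        size_bytes=info.st_size,
        modified_utc=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        author=author,
        source=source,
        purpose=purpose,
    )
```

Calling `describe_file("orders.csv", author="Jane Doe", source="CRM export")` on an existing file would return a record that can be stored and searched without opening the file itself.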
The Importance of Metadata
Metadata can be embedded in a digital image, describing the picture’s color depth, image resolution, shutter speed, and size, among other details. Text documents may contain metadata identifying the author, the document’s length, and when it was written. Web pages rely on metadata too: it describes the page content and lists keywords linked to that content. Data sources of all types, whether structured, unstructured, or semi-structured, contain metadata, and managing it is a primary facet of effective Data Management in organizations of all sizes. Solid Metadata Management and its allied practice of Data Lineage are necessary for Data Governance and Data Stewardship, Data Warehousing, Master Data Management, and BI and Analytics, to name just a few examples.
During a recent DATAVERSITY® interview, Amnon Drori, CEO and co-founder of Octopai, discussed the state of Metadata Management. He said that while Metadata Management is clearly a central piece of the Data Management landscape, many organizations are not leveraging it correctly.
“Unfortunately, in a typical environment, you can see multiple systems of different vendors that are participating in the data movement process. And moreover, they are digging to each and every one of them separately to understand the relationship between all those systems. It is mostly being done manually.”
He remarked that all that digging costs organizations incalculable amounts of productivity and money, even though many of the tasks could be automated. Being forced to manually trace the data history every time specific metadata is needed is extremely time consuming, frustrating, and error-prone, said Drori:
“Four years ago, we asked, ‘What is it that we’re doing repetitively that can be replaced with automation?’ We looked at what we were doing, and then talked to other colleagues, and suddenly we realized that what we call ‘metadata’ is lacking in support technology. We wanted to understand the data journey with a single click of a button, and we wanted it in five seconds—not five days, not five weeks—but in five seconds.”
According to Drori, organizations had invested heavily in managing data within business applications, but he and his colleagues could not find a solution that addressed the lineage and movement of data. They wanted a single tool that handled cross-platform, ETL, and Data Warehouse analysis, along with reporting.
Metadata Automation
Metadata Automation is a new trend in the industry. It replaces the increasingly painful manual data mapping associated with Metadata Management by automating the collection and analysis of metadata. The technology provides instant discovery and comprehensive lineage, enabling enterprise BI groups to find and understand their data quickly, easily, and accurately for improved reporting accuracy, regulatory compliance, Data Modeling, Data Quality, and Data Governance. It allows organizations to retrieve metadata from multi-vendor Business Intelligence systems and place it in a central Cloud platform for analysis. It is simple to use and, as a plug-and-play solution, can be installed and running within a day.
The ability to facilitate visibility and control of metadata is a key feature of Metadata Automation. Data professionals across the spectrum spend a great deal of time and energy manually exploring, discovering, and understanding metadata. Automation, on the other hand, can turn several weeks of manual labor into minutes, allowing the business to move more quickly and efficiently. Some specific strengths of Metadata Automation include:
- Automatic scanning and gathering of metadata from a wide variety of sources, including ETL, database, and reporting tools
- A central repository to store and manage metadata
- Smart algorithms to model and index all metadata types, which allows a data professional to quickly locate and understand cross connections (referred to as “metadata understanding”)
- A smart engine that uses hundreds of crawlers to search all of the metadata and presents the results within seconds
- Creation of a visual map, with the full Data Lineage of the data history as it moves through multi-vendor systems
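The repository, search, and lineage ideas in the list above can be sketched with a small in-memory graph. This is an illustration only, not how Octopai or any other product is implemented: the asset names, the `UPSTREAM` mapping, and both functions are hypothetical, and a real tool would populate the graph by scanning ETL, database, and reporting systems rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical central repository: each asset maps to the assets it reads from.
UPSTREAM = {
    "sales_report": ["dw.fact_sales"],
    "dw.fact_sales": ["etl.load_sales"],
    "etl.load_sales": ["crm.orders", "erp.invoices"],
    "churn_report": ["dw.dim_customer"],
    "dw.dim_customer": ["crm.customers"],
}


def trace_lineage(asset, upstream=UPSTREAM):
    """Return every upstream asset of `asset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(upstream.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited through another path
        seen.append(node)
        queue.extend(upstream.get(node, []))
    return seen


def search_assets(term, upstream=UPSTREAM):
    """Case-insensitive substring search over every asset name in the graph."""
    names = set(upstream) | {s for sources in upstream.values() for s in sources}
    return sorted(n for n in names if term.lower() in n.lower())


print(trace_lineage("sales_report"))
# ['dw.fact_sales', 'etl.load_sales', 'crm.orders', 'erp.invoices']
print(search_assets("crm"))
# ['crm.customers', 'crm.orders']
```

Walking the graph from a report back to its source systems is the programmatic counterpart of the visual lineage map: the same edges could be rendered as a diagram instead of a list.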
GDPR
The General Data Protection Regulation (GDPR) requires businesses to behave responsibly with the data collected on European Union residents. Internet users must give permission for the storage of their personal data, must be made fully aware of the ways their data is used, and must have the freedom to export or delete data about themselves. Businesses must also ensure personal data is protected. Octopai helps address these issues.
Businesses that do not comply with the GDPR may be fined up to €20 million or 4 percent of their worldwide annual turnover, whichever is higher. Complying with the GDPR is an expensive change for many businesses. According to a Netsparker survey, approximately 60 percent of businesses will spend between $50,000 and $1,000,000 to become compliant, and over 10 percent will spend even more. Speaking about GDPR, Drori stated:
“If organizations cannot manage their vast amounts of data, they cannot take advantage of its value. Organizations are spending more and more time on tracking, finding, and understanding data, and the metadata behind it.”
Now that the General Data Protection Regulation is in force, European businesses and those doing business with European companies must consider what kind of data they have stored and whether it contains personally identifiable information (PII) that should be erased. GDPR compliance involves scrubbing data, anonymizing it, and ensuring company servers store only valuable, useful information.
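One way metadata can support this kind of review is by scanning column names in a catalog for likely PII before anyone opens the underlying tables. The sketch below is a simple name-based heuristic under stated assumptions: the keyword list, the catalog layout, and the `flag_pii_columns` function are all hypothetical, and a match only marks a column for human review. It does not establish that a column actually holds personal data, nor that unflagged columns are safe.

```python
import re

# Hypothetical keyword list; a real review would use a vetted, broader set.
PII_PATTERN = re.compile(
    r"(email|phone|birth|ssn|passport|address|first_name|last_name|full_name)",
    re.IGNORECASE,
)


def flag_pii_columns(catalog):
    """Map each table to the columns whose names suggest personal data."""
    flagged = {}
    for table, columns in catalog.items():
        hits = [column for column in columns if PII_PATTERN.search(column)]
        if hits:
            flagged[table] = hits
    return flagged


# Illustrative catalog: table name -> column names gathered as metadata.
catalog = {
    "crm.customers": ["customer_id", "Email", "phone_number", "signup_date"],
    "dw.fact_sales": ["order_id", "amount"],
}

print(flag_pii_columns(catalog))
# {'crm.customers': ['Email', 'phone_number']}
```

Combined with lineage information, a list like this would also show which downstream reports inherit the flagged columns.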
Metadata Management Moves Forward
An organization’s use of Metadata Management suggests its commitment to a philosophy of improvement and shared knowledge about data. This philosophy includes a focus on Data Quality and consistency for specific data-driven projects.
According to Drori,
“Metadata Management can be used to improve quality and minimize duplication at source systems. These systems and processes can help an organization streamline the discovery and documentation of data related to the data in other systems, such as definitions of customers, products, or other topics of interest.”
Octopai’s automated Metadata Management platform operates with a Cloud-based, centralized, cross-platform search engine. It helps organizations find needed data and provides full Data Lineage. Many organizations still rely on dedicated legacy Metadata Management systems, but many of those systems are not purpose-built for the new world of Data Management, with its real-time platforms, Advanced Analytics, Machine Learning, and other technologies.
BI Analysts can now spend less time struggling with inaccurate, mismatched reports. Researchers no longer have to track data manually to find an error’s source; they can let Metadata Automation do the work. Metadata Automation can also be used to discover the related ETL tables and processes used to create reports, and to compare multiple reports using specific metadata items.
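Comparing reports by their metadata can be as simple as lining up the recorded attributes and listing the ones that differ. The following is a minimal sketch with made-up report descriptions; the attribute names and the `compare_reports` function are assumptions for illustration, and a real platform would pull these attributes from the reporting tools themselves.

```python
def compare_reports(first, second):
    """Return the metadata items whose values differ between two reports.

    Each differing key maps to a (first_value, second_value) pair; a key
    missing from one report shows up with None on that side.
    """
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Illustrative metadata for two reports that show different sales totals.
regional_sales = {
    "source_table": "dw.fact_sales",
    "currency": "USD",
    "refresh": "daily",
}
executive_sales = {
    "source_table": "dw.fact_sales_v2",
    "currency": "USD",
    "refresh": "daily",
}

print(compare_reports(regional_sales, executive_sales))
# {'source_table': ('dw.fact_sales', 'dw.fact_sales_v2')}
```

Here the mismatch points to the two reports reading from different tables, which is the kind of lead an analyst would otherwise have to find by hand.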
Drori described the reaction of an insurance company’s CIO after Octopai had been installed. When Drori asked why the company had invested in Octopai, the CIO responded, “Well, I don’t really remember the reasons. But I can tell you this: Octopai was able to double my team’s efficiency, without hiring anybody else.”
This is just one success story reflecting Metadata Automation’s growing recognition and importance in the industry.