Automated metadata tools can be used to develop and build business glossaries, metadata graphs, and data catalogs. By reducing the amount of manual work involved, they minimize metadata errors and let tasks be completed much more quickly. Business glossaries act as the foundation for a shared, common language.
Automated metadata tools can cut down on the amount of time spent on tasks such as data tagging and developing business glossaries.
Businesses are currently exploring a small range of automated metadata tools to support their data infrastructure. The goal is an infrastructure that operates reliably at scale, in which researchers can find and understand the data while trusting its quality and accuracy.
Automated metadata tools support Data Governance and good communications.
Successful communications require the use of a shared common language—or a mutual understanding of what certain terms and concepts mean. Unfortunately, the staff in different departments do not always share the same understanding of certain words and phrases, and communications within an organization can break down. These misunderstandings often have a negative impact on the entire organization.
A business glossary (which is coordinated and used through metadata) assigns a definition or meaning to business and data terms.
A recent Gartner study found that more and more businesses are using metadata tools, particularly “automated solutions” that help with Data Governance. According to the study, 50% of the participants using metadata solutions said their primary use was supporting their Data Governance program.
What Is Metadata?
Metadata is quite similar to the card catalogs used by libraries a couple of decades ago. It provides small bits of information that help identify and locate the data. Metadata maximizes the value of a business’s information assets through the use of context. Context refers to answering the “who, what, when, where, and how” questions. More specifically, metadata can answer:
- Who created this data?
- What is it, and what is its privacy level?
- When was it created?
- Where is it from?
- How was it created, and how may it be used?
Metadata can often be accessed directly from the source that created and maintains the data. It can be extracted automatically or created manually. For example, a digital image typically records the date and time it was taken. The image can then be tagged with keywords describing who created it, the event or place, or its title. Tagging adds distinguishing information that helps identify the image so it can be found and shared.
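To make the who/what/when/where/how questions and the split between automatic and manual metadata concrete, here is a minimal Python sketch. It assumes the data asset is a local file; `MetadataRecord` and `extract_metadata` are hypothetical names. Only the location and the last-modified time are extracted automatically (a portable creation timestamp is not available on every operating system), while the remaining fields and the tags are supplied by a person.

```python
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MetadataRecord:
    """Context for one data asset, organized by the who/what/when/where/how questions."""
    created_by: str          # who: entered manually in this sketch
    description: str         # what: entered manually
    privacy_level: str       # what: hypothetical labels such as "public" or "internal"
    modified_at: datetime    # when: read automatically (last-modified time)
    location: str            # where: read automatically (absolute file path)
    usage_terms: str         # how it may be used: entered manually
    tags: set = field(default_factory=set)  # extra keywords added by people


def extract_metadata(path, created_by, description, privacy_level, usage_terms):
    """Combine automatically extracted file facts with manually supplied context."""
    stat = os.stat(path)
    return MetadataRecord(
        created_by=created_by,
        description=description,
        privacy_level=privacy_level,
        # st_mtime is used because creation time is not reported portably by every OS.
        modified_at=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        location=os.path.abspath(path),
        usage_terms=usage_terms,
    )
```

A person would then add tags, for example `record.tags.update({"conference", "keynote"})`, to capture the event or place that the file system cannot know about.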
Metadata is normally categorized by the function it serves within the business environment. There are a variety of metadata types, but three common ones are:
- Descriptive Metadata: Describes the data so it can be identified and discovered. The descriptive metadata for a book would include the title, author, genre, and ISBN. Keywords can also be used as descriptive metadata.
- Structural Metadata: Describes how the data is structured and how compound objects are organized, such as the data’s format and how its parts have been assembled. The scene selections in a DVD menu are an example of structural metadata.
- Administrative Metadata: Communicates important instructions about the data. It may list the applicable restrictions regarding the file, including who has access to it. It plays a critical role in the management, archiving, and preservation of resources.
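The three types can be illustrated by sorting a flat record into groups. This Python sketch uses a hypothetical mapping of field names to types (`METADATA_TYPES`) and placeholder book values; real catalogs define their own field vocabularies and standards.

```python
# Hypothetical assignment of field names to the three metadata types above.
METADATA_TYPES = {
    "descriptive": {"title", "author", "genre", "isbn", "keywords"},
    "structural": {"format", "chapters"},
    "administrative": {"access", "restrictions", "retention"},
}


def categorize(record):
    """Split a flat metadata record into descriptive, structural, and administrative parts."""
    grouped = {kind: {} for kind in METADATA_TYPES}
    grouped["uncategorized"] = {}
    for key, value in record.items():
        for kind, fields in METADATA_TYPES.items():
            if key in fields:
                grouped[kind][key] = value
                break
        else:  # no type claimed this field
            grouped["uncategorized"][key] = value
    return grouped


book = {
    "title": "Example Title",
    "author": "Example Author",
    "genre": "Non-fiction",
    "isbn": "placeholder",
    "format": "EPUB",
    "chapters": ["Introduction", "Chapter 1", "Chapter 2"],
    "access": "licensed patrons only",
}
parts = categorize(book)
```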
Metadata helps discover, classify, describe, archive, control and manage data.
According to Octopai, a Metadata Management vendor:
“Most metadata is not recorded, documented, managed, or stored anywhere. A typical company has tens of millions or millions of data elements without descriptions. To deal with this, organizations add the descriptions manually or try to fix it with data catalog applications.”
A Lack of Automated Metadata Tools
Historically, working with metadata has been a manual process. Fortunately, this is changing. Manual work is much slower and clumsier than the automated metadata tools that are starting to appear on the market, and those tools also help eliminate human error.
In the world of digital data research, metadata is used for every report, data warehouse, visualization, and dashboard. Without access to metadata, businesses can’t locate and gather information.
It is not unusual for researchers without automated metadata tools to perform time-consuming detective work to resolve conflicting or confusing information. The same type of data may be stored and labeled differently because it lives in different systems, or it may be stored in an unreadable format that makes it unusable in its current state. Automated metadata tools can resolve these situations without time-consuming manual labor.
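One common chore behind that detective work is recognizing that differently labeled columns hold the same kind of data. The Python sketch below assumes a hand-written synonym table (`SYNONYMS`, hypothetical) and simple naming conventions; commercial tools may use far richer matching, so treat this only as an illustration of the idea.

```python
import re
from collections import defaultdict

# Hypothetical synonym table; a real tool might derive this from a business glossary.
SYNONYMS = {
    "cust_id": "customer_id",
    "customerid": "customer_id",
    "client_no": "customer_id",
    "dob": "date_of_birth",
    "birth_dt": "date_of_birth",
}


def normalize(name):
    """Convert camelCase, spaces, and hyphens to lower snake_case, then apply synonyms."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # "CustomerID" -> "Customer_ID"
    snake = re.sub(r"[\s\-]+", "_", snake).lower()        # "birth-dt" -> "birth_dt"
    return SYNONYMS.get(snake, snake)


def group_equivalent_fields(systems):
    """Map each canonical name to the (system, original column) pairs that use it."""
    groups = defaultdict(list)
    for system, columns in systems.items():
        for column in columns:
            groups[normalize(column)].append((system, column))
    return dict(groups)


systems = {  # hypothetical source systems and their column labels
    "billing": ["CustomerID", "InvoiceTotal", "DOB"],
    "crm": ["cust_id", "birth-dt", "Email"],
    "support": ["Client No", "Email"],
}
groups = group_equivalent_fields(systems)
```

Here `groups["customer_id"]` collects `CustomerID`, `cust_id`, and `Client No` from the three systems, which is the kind of cross-system match a researcher would otherwise work out by hand.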
Because researchers often spend large amounts of time manually tracking metadata and errors, a lack of automated metadata tools can result in a significant waste of time and money.
Replacing manual labor with automated processes can be remarkably cost-effective while also minimizing human error. Automated metadata tools can track and catalog metadata, and data lineage tools can present visual reports showing how data moves between systems. These tools free up researchers’ time, allowing them to help the organization gain insights from its data.
The Business Glossary
Business glossaries are a key component of Data Governance programs and promote understanding across business cultures. They are designed to provide an organization’s staff and partners with common definitions. They define the meaning of commonly used business and data terms and help ensure the terms are used correctly and in the proper context. A robust business glossary defines key business concepts and terms and establishes the relationships between them.
Some software solutions use machine learning to automatically generate a business glossary, providing suggested terms and phrases to help in promoting a common understanding.
Without a business glossary acting as a reference source, there is no “authority” promoting a shared and common language. For example, the billing department may think the term “customer” represents a company, while the sales department thinks the term refers to an individual, leaving management in a mild state of confusion. A business glossary will prevent this by providing a single definition to be used on a company-wide basis. (The more discrepancies in definitions, the more confusion.)
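The “customer” example can be expressed as a check a data steward might run: collect how each department uses a term, flag terms with more than one meaning, and confirm that the glossary holds a single definition for each. The department usages, the definition text, and the `GlossaryTerm` structure below are hypothetical illustrations in Python, not any product’s data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    definition: str
    steward: str  # hypothetical field: who approved the definition


def find_conflicts(usages):
    """Group (department, term, meaning) observations and keep terms with >1 meaning."""
    by_term = {}
    for department, term, meaning in usages:
        meanings = by_term.setdefault(term.lower(), {})
        meanings.setdefault(meaning, []).append(department)
    return {term: meanings for term, meanings in by_term.items() if len(meanings) > 1}


usages = [
    ("billing", "customer", "a company that receives invoices"),
    ("sales", "customer", "an individual buyer"),
    ("sales", "lead", "a prospect who has shown interest"),
]
conflicts = find_conflicts(usages)

# The glossary records one company-wide definition for each conflicting term.
glossary = {
    "customer": GlossaryTerm(
        name="customer",
        definition="Any company or individual that has purchased from us (example wording)",
        steward="data steward",
    ),
}
unresolved = sorted(set(conflicts) - set(glossary))  # conflicts still lacking a definition
```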
One of the most significant benefits of having a business glossary is its ability to display the relationships that exist between business terms, making research much easier. A well-designed business glossary defines terms, shows how the terms relate, and provides examples. It also links each term to all of its associated artifacts (processes, databases and systems, KPIs, data owners, and even data stewards).
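A glossary that records relationships and linked artifacts is essentially a small graph. This Python sketch assumes a hypothetical `BusinessGlossary` class that stores definitions with optional examples, symmetric term-to-term relationships, and links from each term to artifacts such as databases, KPIs, owners, and stewards; every term and artifact name is made up for illustration.

```python
from collections import defaultdict


class BusinessGlossary:
    """Minimal glossary: definitions, term-to-term relationships, and linked artifacts."""

    def __init__(self):
        self.definitions = {}
        self.related = defaultdict(set)     # term -> related terms (stored in both directions)
        self.artifacts = defaultdict(list)  # term -> [(artifact kind, artifact name)]

    def define(self, term, definition, example=None):
        self.definitions[term] = {"definition": definition, "example": example}

    def relate(self, a, b):
        self.related[a].add(b)
        self.related[b].add(a)

    def link(self, term, kind, name):
        self.artifacts[term].append((kind, name))

    def describe(self, term):
        """Return the definition plus related terms and linked artifacts."""
        entry = dict(self.definitions[term])
        entry["related_terms"] = sorted(self.related[term])
        entry["artifacts"] = list(self.artifacts[term])
        return entry


g = BusinessGlossary()
g.define("customer", "Any company or individual that has purchased from us", example="Example Corp")
g.define("order", "A confirmed request to purchase products")
g.relate("customer", "order")
g.link("customer", "database", "crm.customers")
g.link("customer", "KPI", "customer churn rate")
g.link("customer", "data owner", "Head of Sales")
g.link("customer", "data steward", "Customer data steward")
entry = g.describe("customer")
```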
Modern business glossaries are an important aspect of Data Governance.
The term “artifact” has a broad range of definitions, depending on the activity. Generally speaking, a data artifact is something produced, directly or indirectly, as the result of another process or creation. Data Governance artifacts, for example, are byproducts delivered internally during the program’s implementation.
Business glossaries are often developed by the data steward as a first step in initiating a Data Governance program.
Sources of Automated Metadata Tools
People who work with data often spend large amounts of time and energy manually exploring and discovering data. Automation can accomplish several weeks’ worth of that manual labor within minutes, which also allows the business as a whole to operate more efficiently. The ability to provide visibility and control is a very useful feature of metadata automation. Other useful features include:
- Smart engines that use hundreds of crawlers to search all metadata and present the results in seconds
- A central repository where metadata is stored and managed
- Smart algorithms that model and index metadata types
- Automatic scanning and collection of metadata from a variety of sources
- A visual map displaying the full lineage of data as it moves through different systems
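The scanning, central-repository, and indexing features above can be sketched in a few lines of Python, using the standard `sqlite3` module as a stand-in for real source systems. The `scan_sqlite` function and `MetadataRepository` class are hypothetical; they read table and column names through SQLite’s `sqlite_master` table and `PRAGMA table_info`, store the results centrally, and build a simple keyword index so searches return immediately. Production tools crawl many kinds of sources and use far more sophisticated indexing.

```python
import sqlite3
from collections import defaultdict


def scan_sqlite(source_name, conn):
    """Collect table and column metadata from one SQLite source."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace("'", "''")  # escape quotes for the string literal
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for row in conn.execute(f"PRAGMA table_info('{quoted}')"):
            entries.append({"source": source_name, "table": table,
                            "column": row[1], "type": row[2]})
    return entries


class MetadataRepository:
    """Central store of scanned metadata plus a keyword index for fast lookup."""

    def __init__(self):
        self.entries = []
        self.index = defaultdict(set)  # keyword -> positions in self.entries

    def add(self, entries):
        for entry in entries:
            position = len(self.entries)
            self.entries.append(entry)
            for value in (entry["source"], entry["table"], entry["column"]):
                for token in value.lower().replace(".", "_").split("_"):
                    if token:
                        self.index[token].add(position)

    def search(self, keyword):
        return [self.entries[i] for i in sorted(self.index.get(keyword.lower(), ()))]


crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)")
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (invoice_id INTEGER, customer_id INTEGER, total REAL)")

repo = MetadataRepository()
repo.add(scan_sqlite("crm", crm))
repo.add(scan_sqlite("billing", billing))
hits = repo.search("customer")
```

Splitting names on underscores means a search for `customer` finds the `customer_id` columns in both sources but does not match the `customers` table name on its own; a real search engine would add stemming or fuzzy matching.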
Alation, BigID, Collibra, and Octopai are a few of the companies offering automated metadata tools. Details about each are listed below.
Alation offers a data catalog that automatically indexes all data by source. The catalog also gathers information about the data automatically, making it easy for everyone who needs access to the data to collaborate.
BigID offers a business glossary and data discovery using an AI-based, machine learning-driven approach that locates data wherever it resides and then classifies and tags the metadata according to business needs. BigID’s automated system can:
- Align business terms with the metadata
- Minimize the number of manual steps
- Classify both data and metadata
- Facilitate collaboration between technical staff and management
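BigID’s own approach is AI- and machine learning-based; the Python sketch below swaps in simple regular-expression rules purely to show the shape of a classify-then-tag step that aligns tags with business terms. The rules, tag names, the 0.8 threshold, and the business-term mapping are all hypothetical, and none of this reflects BigID’s actual API.

```python
import re

# Hypothetical classification rules: tag -> pattern applied to sample values.
RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"),
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}


def classify_column(values, threshold=0.8):
    """Return tags whose rule matches at least `threshold` of the non-empty sample values."""
    sample = [str(v).strip() for v in values if v not in (None, "")]
    if not sample:
        return set()
    tags = set()
    for tag, pattern in RULES.items():
        matches = sum(1 for v in sample if pattern.match(v))
        if matches / len(sample) >= threshold:
            tags.add(tag)
    return tags


def tag_table(columns, business_terms):
    """Tag each column and attach the business terms mapped to its tags."""
    result = {}
    for name, values in columns.items():
        tags = classify_column(values)
        terms = {business_terms[t] for t in tags if t in business_terms}
        result[name] = {"tags": tags, "terms": terms}
    return result


business_terms = {  # hypothetical mapping from tags to glossary terms
    "email": "Customer Contact Information",
    "date": "Event Date",
}
columns = {
    "contact": ["a@example.com", "b@example.org", "c@example.net"],
    "signup": ["2023-01-05", "2023-02-10", None],
    "notes": ["call back", "vip"],
}
tagged = tag_table(columns, business_terms)
```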
Collibra offers a package of automated metadata tools. It can automatically map the data’s lineage to show how data is transformed, and how it flows from system to system. Additional tools and features include:
- A Metadata Graph: Builds a graph connecting technical, business, and privacy metadata with lineage and Data Quality information.
- A Data Catalog: Provides visibility into the organization’s data assets and offers business “context.”
- An Embedded Data Governance and Privacy Feature: Ensures users can access only data that is trusted and compliant.
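Lineage mapping of the kind described for Collibra can be modeled as a directed graph in which an edge from A to B means B is derived from A. This Python sketch is not Collibra’s API; the `LineageGraph` class, asset names, and transformation labels are hypothetical. Walking edges backward answers “where did this report’s data come from?”, and walking them forward answers “what is affected if this source changes?”

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data flows: an edge source -> target means target is derived from source."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)
        self.transformations = {}  # (source, target) -> description of the transformation

    def add_flow(self, source, target, transformation=""):
        self.downstream[source].add(target)
        self.upstream[target].add(source)
        self.transformations[(source, target)] = transformation

    @staticmethod
    def _walk(start, edges):
        """Breadth-first search returning every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream_of(self, asset):
        return self._walk(asset, self.upstream)

    def impact_of(self, asset):
        return self._walk(asset, self.downstream)


lineage = LineageGraph()
lineage.add_flow("crm.customers", "warehouse.dim_customer", "deduplicate")
lineage.add_flow("billing.invoices", "warehouse.fact_sales", "convert currency")
lineage.add_flow("warehouse.dim_customer", "dashboard.revenue_by_region", "join")
lineage.add_flow("warehouse.fact_sales", "dashboard.revenue_by_region", "aggregate by region")
```

A visual lineage map is, in effect, a rendering of this graph; the traversal itself is what lets a tool answer impact and provenance questions quickly.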
Octopai offers a “metadata platform” that can extract, collect, and analyze metadata almost instantly. From one centralized platform, Octopai provides automated access to a data catalog, data lineage, and data discovery. It also offers an automated business glossary.