In recent years, active metadata (as opposed to passive metadata) has promoted the development of new types of metadata and ways to manage them. Metadata is a labeling system that can be read by humans and computers and allows search engines to locate data using defined metadata fields.
Passive metadata provides a basic identification system using technical information but does not offer significant context, and the metadata is considered static (a semi-permanent label). Fortunately, modern data stacks have prompted the development of active metadata, which supports new metadata description systems and much more context, and is considered dynamic (meaning the metadata is updated whenever the data is altered).
Systems using active metadata rely on machine learning and automation. An active metadata management system uses software to promote the continuous updating of metadata being used for ongoing projects and real-time customer service.
It can also track the data as it moves through the data pipeline and report any changes it undergoes. An active metadata management system requires automation that supports the continuous processing and updating of metadata labels.
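To make the "dynamic" idea concrete, here is a minimal sketch of metadata that is recomputed whenever the data changes. The `TrackedTable` class and its field names are hypothetical and exist only for illustration; a real active metadata system would hook into the pipeline or warehouse rather than wrap an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TrackedTable:
    """Hypothetical in-memory table whose metadata refreshes on every write."""
    name: str
    rows: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)

    def __post_init__(self):
        # Record the initial (empty) state as the first change event.
        self._refresh("created")

    def append(self, row: dict) -> None:
        """Add a row and immediately update the metadata labels."""
        self.rows.append(row)
        self._refresh("append")

    def _refresh(self, event: str) -> None:
        # Recompute the descriptive labels from the data itself.
        columns = sorted({key for row in self.rows for key in row})
        self.metadata = {
            "row_count": len(self.rows),
            "columns": columns,
            "last_modified": datetime.now(timezone.utc).isoformat(),
        }
        # Keep a simple report of every change the data undergoes.
        self.change_log.append((event, self.metadata["row_count"]))


table = TrackedTable("orders")
table.append({"order_id": 1, "amount": 19.99})
table.append({"order_id": 2, "amount": 5.00, "coupon": "SPRING"})
print(table.metadata["row_count"], table.metadata["columns"])
print(table.change_log)
```

A passive label, by contrast, would be written once when the table was created and would still report zero rows after both appends.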
Active metadata is taken from sources in real time, allowing management to identify, track, understand, and manage data assets. Active metadata builds trust and democratizes data.
The Active Metadata Platform
Modern businesses need to develop a metadata strategy that ensures their data isn’t being siloed and that their metadata (both passive and active) is accurate, consistent, and reliable.
An active metadata platform allows metadata to flow smoothly and quickly throughout the entire data stack. Managing active metadata is more complex than managing passive metadata, and the platform covers a range of metadata description systems.
An active metadata platform is an action-oriented system that is always on and constantly gathers metadata without the need for manual entries. It uses machine learning to process that metadata continuously, interconnecting data sets and files and developing actionable business intelligence from them.
An active metadata system becomes smarter as people use it and as it gains more experience with metadata.
The Different Types of Metadata
There are several types of metadata, all of which may be useful in increasing the value of a business’s data assets. The metadata types that have been developed alongside active metadata support much more flexibility when describing the data’s content.
At the most basic level, metadata should communicate information about content, context, and structure. Some of the more common metadata types are listed below.
Technical Metadata: This type of metadata is commonly used with passive metadata and includes the name of the database table and the column name, as well as the data type, ETL jobs involving the data, indexes referencing the data, etc. Technical metadata includes:
- File formats
- File names
- Schemas
- Data sources
- Geographic locations
Business Metadata: It provides definitions, business rules, restrictions on the data’s use, and context for data. Business metadata is easily understood by non-technicians and provides a common language. Business metadata includes:
- Timelines
- Business requirements and models
- Business process flows
- Metrics
- Business terminology
Operational Metadata: This form of metadata includes information about when and how the data was transformed or created. It provides additional details about how the data was used. This type of metadata includes information on:
- Dates of updates
- Loading date
- Lineage
- Data’s status
Process Metadata: This is a subdivision of operational metadata that is stored within a data warehouse or a data lake. Process metadata provides details of the process of loading data into storage. This type of information is useful in case of a problem. Process metadata may include:
- Error logs
- Job execution logs
- Audit results
Provenance Metadata: This metadata type tracks the data’s origin and any changes over time. It provides data traceability, so inaccurate data can be found and removed, improving Data Quality. Provenance metadata may include information on:
- Authority
- Change logs
- Ownership records
- Versioning records
Structural Metadata: This provides information about the physical organization of the data, including its relationships, types, versions, and other characteristics. Structural metadata can be used to create and maintain data dictionaries. Some forms of structural metadata are:
- Data element types
- Table names
- Record size
Administrative Metadata: This type of metadata provides information that is used in Data Governance. It helps to manage and establish the data’s credibility. Administrative metadata can include information about preservation, rights, and use. It provides controls on who can use the files and how they may be used. Administrative metadata can include:
- Copyright information and license agreements
- Technical data on rights management
- User restrictions
- Access control information
Social Metadata: This provides useful information about how people use data. Using the context social metadata provides, businesses can decide to decrease, maintain, or increase advertising or productivity. Social metadata includes information about:
- Author information
- Most queried tables
- Frequency of use
Managing Active Metadata
Managing active metadata makes data searches quick and efficient, providing the insights needed to make data-driven decisions. Businesses should have a strategy for managing their metadata. Without an intelligent strategy, data can become extremely disorganized, making it difficult for researchers to determine the accuracy of data, and causing them to question its reliability.
Be sure to include the following in an active metadata program:
The automatic classification of sensitive data: Sensitive data (such as personal data) is protected using automation. Data that falls under privacy laws and regulations (or other requirements) is classified appropriately and automatically, reducing the risk of human error.
Data can truly be democratized when users have visibility into all existing data. At the same time, active metadata management allows a business to classify sensitive data automatically, hiding some of it while making the remaining data visible only to authorized users. (Policies regarding sensitive data can be customized.)
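As a rough illustration of automatic classification, the sketch below tags a column as sensitive when most of its values match a known pattern, then hides tagged columns from unauthorized users. The patterns, the 0.8 threshold, and the function names are all assumptions made for this example; production systems typically rely on much more robust detection than two regular expressions.

```python
import re

# Hypothetical patterns chosen for illustration only.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(values, threshold=0.8):
    """Return a sensitivity tag if most non-empty values match a pattern, else None."""
    non_empty = [str(v) for v in values if v not in (None, "")]
    if not non_empty:
        return None
    for tag, pattern in SENSITIVE_PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return tag
    return None


def visible_columns(table, user_is_authorized):
    """List the columns a user may see; sensitive ones require authorization."""
    tags = {col: classify_column(vals) for col, vals in table.items()}
    return [col for col, tag in tags.items() if tag is None or user_is_authorized]


customers = {
    "name": ["Ada", "Grace"],
    "contact": ["ada@example.com", "grace@example.org"],
    "tax_id": ["123-45-6789", "987-65-4321"],
}
print(visible_columns(customers, user_is_authorized=False))  # ['name']
print(visible_columns(customers, user_is_authorized=True))   # ['name', 'contact', 'tax_id']
```

The customizable policy mentioned above would correspond here to editing `SENSITIVE_PATTERNS` and the threshold rather than changing the classification logic.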
The purging of outdated data: A useful process supported by active metadata management is the systematic removal of old, outdated data. It can be set up to determine the date a document or batch of data was last used and/or the number of staff who have accessed it.
For example, a data resource can be archived automatically if it has not been accessed in the last 60 days. If it has not been accessed in the last 90 days, it can be purged automatically.
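The 60-day and 90-day rule can be sketched as a small function over the last-access date. The function name and the choice to treat exactly 60 or 90 idle days as crossing the threshold are assumptions for this example; the thresholds themselves come from the text.

```python
from datetime import date, timedelta

ARCHIVE_AFTER = timedelta(days=60)  # idle time before archiving
PURGE_AFTER = timedelta(days=90)    # idle time before purging


def retention_action(last_accessed: date, today: date) -> str:
    """Return 'keep', 'archive', or 'purge' based on how long the asset has been idle."""
    idle = today - last_accessed
    # Check the longer threshold first so a 90-day-idle asset is purged, not archived.
    if idle >= PURGE_AFTER:
        return "purge"
    if idle >= ARCHIVE_AFTER:
        return "archive"
    return "keep"


today = date(2024, 6, 30)
for days_idle in (10, 75, 120):
    last_accessed = today - timedelta(days=days_idle)
    print(days_idle, retention_action(last_accessed, today))
```

An active metadata system would supply `last_accessed` from its own usage logs, so the rule keeps working without anyone maintaining the dates by hand.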
Downstream end-user alerts: Active metadata management can be set up to directly notify the appropriate people when a database has been modified or when a potential anomaly has been detected.
If a discrepancy is found, the system can quickly trace it back to the creator and then notify that person of the error and/or correct it immediately.
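One way to picture these alerts is a schema comparison that produces a message for the table's owner and for each downstream consumer. This is a simplified sketch: the function names, the `{column: type}` schema format, and the recipient addresses are hypothetical, and actually delivering the messages (email, chat, ticketing) is left out.

```python
def diff_schema(old: dict, new: dict) -> list:
    """Describe column-level differences between two {column: type} schemas."""
    changes = []
    for col in sorted(old.keys() - new.keys()):
        changes.append(f"column removed: {col}")
    for col in sorted(new.keys() - old.keys()):
        changes.append(f"column added: {col}")
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            changes.append(f"type changed: {col} {old[col]} -> {new[col]}")
    return changes


def build_alerts(table: str, old: dict, new: dict, owner: str, downstream: list) -> list:
    """Return (recipient, message) pairs; an empty list means nothing changed."""
    changes = diff_schema(old, new)
    if not changes:
        return []
    # Notify the owner first, then every downstream user exactly once.
    recipients = [owner] + [r for r in downstream if r != owner]
    message = f"{table}: " + "; ".join(changes)
    return [(recipient, message) for recipient in recipients]


old_schema = {"id": "int", "amount": "float"}
new_schema = {"id": "int", "amount": "str", "currency": "str"}
alerts = build_alerts(
    "orders", old_schema, new_schema,
    owner="owner@example.com",
    downstream=["analyst@example.com", "owner@example.com"],
)
for recipient, message in alerts:
    print(recipient, "|", message)
```

The list of downstream recipients would normally come from lineage metadata, which is what lets the system decide who is affected by a change.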
Identifying the most frequently used data assets: Active metadata management can be used to develop a customized popularity score for each data resource. This popularity score can be developed from the usage information of query logs, data provenance, and business intelligence dashboards. The data resources that are most used and most relevant should then appear in search results more frequently.
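A popularity score of this kind could be as simple as a weighted sum of usage signals. In the sketch below, the three inputs mirror the sources named above (query logs, provenance, dashboards), but the `UsageStats` fields, the weights, and the sample numbers are invented for illustration and would need tuning for a real catalog.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    name: str
    query_count: int        # taken from query logs
    downstream_assets: int  # taken from provenance/lineage
    dashboard_refs: int     # taken from BI dashboards


# Hypothetical weights: a dashboard reference counts more than a single query.
WEIGHTS = {"query_count": 1.0, "downstream_assets": 5.0, "dashboard_refs": 10.0}


def popularity(stats: UsageStats) -> float:
    """Weighted sum of the usage signals for one data asset."""
    return (
        WEIGHTS["query_count"] * stats.query_count
        + WEIGHTS["downstream_assets"] * stats.downstream_assets
        + WEIGHTS["dashboard_refs"] * stats.dashboard_refs
    )


def rank_for_search(assets: list) -> list:
    """Order asset names so the most popular appear first in search results."""
    return [a.name for a in sorted(assets, key=popularity, reverse=True)]


assets = [
    UsageStats("legacy_export", query_count=2, downstream_assets=0, dashboard_refs=0),
    UsageStats("orders", query_count=120, downstream_assets=4, dashboard_refs=3),
    UsageStats("customers", query_count=40, downstream_assets=6, dashboard_refs=5),
]
print(rank_for_search(assets))  # ['orders', 'customers', 'legacy_export']
```

Keeping the weights in one dictionary is what makes the score "customized": each business can decide which signals matter most without touching the ranking code.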
The Future of Metadata Management
Dimitri Sirota, CEO and co-founder of BigID, predicts the use of active metadata hubs will be the next development in active metadata.
The goal of the active metadata hub is to interconnect an organization’s data and serve as its search engine. It will have access to the organization’s entire data ecosystem and will accelerate data solutions through automation and machine learning.
An active metadata hub uses a data catalog supported by machine learning. It is designed to promote the orchestration and enrichment of metadata. Additionally, metadata taken from a variety of sources is interconnected with currently stored data and integrated with other Data Management tools. An active metadata hub allows metadata to be exchanged, updated, and shared.