Metadata is like the secret sauce of the internet. You type a word or phrase in a search engine, press enter, and the information you’re looking for appears (usually). An action that once seemed “indistinguishable from magic,” to quote the third of Arthur C. Clarke’s three laws, is now as commonplace as heating up leftovers in a microwave.
Metadata can be defined simply as “data about data” – and more completely as information that describes the data’s characteristics without disclosing its contents. Because data changes constantly, its metadata must be updated frequently to ensure important information remains discoverable in new contexts. Manually defining, curating, and documenting metadata is time-consuming and prone to human error; metadata maintained this way is called “passive” metadata.
Active metadata addresses the shortcomings of passive approaches by automatically updating the metadata whenever an important aspect of the information changes. Understanding active metadata, and why it matters, starts with the shift in organizations’ data strategies from a focus on acquiring data to a focus on consuming it. The goal of active metadata is to keep information resources discoverable as they are acquired, adapted, and applied over time.
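To make the passive/active distinction concrete, here is a minimal Python sketch, assuming a toy in-memory dataset. The `ActiveDataset` and `Metadata` classes are hypothetical illustrations, not any product's API: every write triggers a metadata refresh, while the passive entry keeps whatever value it was given when someone last filled it in.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Metadata:
    """Facts about a dataset, kept separate from the rows themselves."""
    row_count: int = 0
    columns: tuple[str, ...] = ()
    last_modified: datetime | None = None


class ActiveDataset:
    """Hypothetical wrapper whose metadata refreshes on every write."""

    def __init__(self) -> None:
        self._rows: list[dict[str, Any]] = []
        self.metadata = Metadata()

    def append(self, row: dict[str, Any]) -> None:
        self._rows.append(row)
        # The "active" step: metadata is recomputed from the data itself,
        # so no one has to remember to update the catalog by hand.
        self._refresh_metadata()

    def _refresh_metadata(self) -> None:
        columns = sorted({key for row in self._rows for key in row})
        self.metadata = Metadata(
            row_count=len(self._rows),
            columns=tuple(columns),
            last_modified=datetime.now(timezone.utc),
        )


# A passive entry is written once and drifts out of date as the data changes.
passive_entry = Metadata(row_count=0)

survey = ActiveDataset()
survey.append({"species": "orchid", "region": "north"})
survey.append({"species": "fern", "region": "south", "count": 3})
print(passive_entry.row_count, survey.metadata.row_count)  # 0 2
print(survey.metadata.columns)  # ('count', 'region', 'species')
```

A production system would more likely react to events such as database change logs or pipeline runs than wrap each write, but the principle of refreshing metadata whenever the data changes is the same.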
What Is Metadata and How Does It Work?
Metadata is any information that gives data context. Imagine an image of a flower growing in a field. To the untrained eye, it appears to be a typical flower on a generic hillside, but the image’s metadata tells us that this is not just any flower growing anywhere: it’s a species thought to be extinct in the region where the picture was taken. Metadata lets us discover many important aspects of data without having to access its content directly.
Metadata is found in various structures that surround the data itself:
- A document’s metadata may include its author, title, version, and length. The metadata is stored in information tags attached to the file but not part of the document’s contents.
- Social metadata uses meta tags, such as those defined by Facebook’s Open Graph protocol, to describe a shared page, its source, and any associated images.
- HTML metadata is placed inside the head element of HTML pages, mostly as meta tags. It includes keywords that relate to the page’s content as well as the page’s title, author, and other information about its source.
- Relational database metadata is stored in a data dictionary and describes the database’s tables, columns, data types, table relationships, constraints, views, and indexes. It records how the data itself is organized: each column defines an attribute, each row holds a unique record with a value for each attribute, and key columns link records across tables to establish relationships between data points.
- Email metadata resides in the message’s header and includes the date and time the email was sent and received, the names and addresses of the sender and recipients, and the subject line. The metadata may also describe the message itself and any attached files, including URLs for retrieving the files.
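Several of these structures can be read without touching the content they describe. The sketch below uses only Python's standard library (`html.parser`, `email`, and `sqlite3`) to pull meta tags from an HTML head, header fields from an email, and column definitions from SQLite's catalog. The helper names are hypothetical, and SQLite stands in for relational databases generally; other engines expose their data dictionaries differently, for example through `information_schema` views.

```python
from __future__ import annotations

import sqlite3
from email import message_from_string
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta> tags keyed by their name or property attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Standard tags use name="..."; Open Graph tags use property="og:...".
        key = attributes.get("name") or attributes.get("property")
        if key and attributes.get("content") is not None:
            self.meta[key] = attributes["content"]


def html_metadata(page: str) -> dict[str, str]:
    parser = MetaTagParser()
    parser.feed(page)
    return parser.meta


def email_metadata(raw_message: str) -> dict[str, str]:
    # Only header fields are returned; the body is ignored.
    message = message_from_string(raw_message)
    fields = ("From", "To", "Date", "Subject")
    return {name: message[name] for name in fields if message[name] is not None}


def table_columns(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated, so it must come from a trusted source.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


page = (
    '<head><title>Field notes</title>'
    '<meta name="author" content="J. Doe">'
    '<meta property="og:title" content="A rare flower"></head>'
)
raw = (
    "From: a@example.com\nTo: b@example.com\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\nSubject: Survey photos\n\n"
    "Body text the function ignores.\n"
)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

print(html_metadata(page))             # {'author': 'J. Doe', 'og:title': 'A rare flower'}
print(email_metadata(raw)["Subject"])  # Survey photos
print(table_columns(conn, "species"))  # [('id', 'INTEGER'), ('name', 'TEXT')]
```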
Types of Metadata
No single approach to metadata can be applied to catalog the variety of data types that are collected and analyzed by information systems. While metadata can be placed in any number of categories, the three primary types of metadata are structural, descriptive, and administrative:
- Structural metadata establishes relationships between data objects and the hierarchies of various data resources. The goal is to organize the data coherently to ensure the format and interactions of data elements are clearly communicated to data consumers.
- Descriptive metadata is intended to represent the who, what, where, and when of the data content to help users find the precise data they’re looking for. Descriptive metadata standards include Dublin Core and Machine-Readable Cataloging (MARC).
- Administrative metadata provides data managers with information that helps them determine whether the data meets governance requirements and is adequately protected while permitting access by authorized users. It describes any copyright, licensing, or rights restrictions, as well as contractual or fiduciary requirements.
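As an illustration, a single catalog record can carry all three types side by side. The sketch below is a hypothetical structure, not a standard schema: the descriptive section borrows a few Dublin Core element names (`title`, `creator`, `date`, `subject`), while the structural keys (`part_of`, `has_part`) and the administrative `allowed_roles` key are assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    # Descriptive: who/what/where/when, using a few Dublin Core element names.
    descriptive: dict[str, str]
    # Structural: how this resource fits into a larger hierarchy (illustrative keys).
    structural: dict[str, list[str]] = field(default_factory=dict)
    # Administrative: rights and access rules used by data managers (illustrative keys).
    administrative: dict[str, object] = field(default_factory=dict)


def find(records: list[CatalogRecord], **criteria: str) -> list[CatalogRecord]:
    """Descriptive metadata in use: match records on their descriptive fields."""
    return [
        r for r in records
        if all(r.descriptive.get(key) == value for key, value in criteria.items())
    ]


def can_access(record: CatalogRecord, user_roles: set[str]) -> bool:
    """Administrative metadata in use: check roles against a hypothetical allow-list."""
    allowed = set(record.administrative.get("allowed_roles", []))
    return bool(allowed & user_roles)


survey = CatalogRecord(
    descriptive={"title": "Hillside flora survey", "creator": "Field team",
                 "date": "2023-05-01", "subject": "botany"},
    structural={"part_of": ["regional-survey-2023"], "has_part": ["img-001", "img-002"]},
    administrative={"rights": "CC BY 4.0", "allowed_roles": ["researcher"]},
)

print([r.descriptive["title"] for r in find([survey], subject="botany")])  # ['Hillside flora survey']
print(can_access(survey, {"viewer"}))  # False
```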
Other categories of metadata are applied to data based on its function or source. Among the functions of metadata are supporting search and retrieval, data preservation and integrity checking, multi-versioning and reuse, and interoperability.
- Provenance metadata indicates where the data originated and aspects of its creation and use, such as ownership, transformations, and archival requirements. This metadata is updated each time a new version of the data is created, building up a version history. It provides a complete history of the data, including the software that was used to generate it, legal rights retained by the data’s creator, how the data was used, security measures, and the steps taken to ensure its integrity, accuracy, and completeness.
- Definitional metadata serves as a shared vocabulary to provide consistent interpretations of the data’s meaning, including the rules governing its context and the logic that was applied to derive it. It comes in two types: schematic, for structured data sets typically found in databases, and semantic, for unstructured data such as text files and multimedia.
- Preservation metadata protects the integrity of the data over time by recording a history of its collection and use, including any applicable copyrights. It may involve rights management, including the permissions granted by rights holders.
- Business metadata represents the context, meaning, and relevance of data as it is used within an organization. In addition to describing the data itself, it includes classifications, ownership information, and the business rules and policies that apply to the data’s collection, storage, use, and retention.
- Collaboration metadata records all the comments, discussions, chats, tags, and other insights of users related to the data. This information helps the people responsible for maintaining the data track issues that users raise about it.
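Provenance metadata's version history can be sketched as an append-only log in which each entry records a checksum of the data, a link to the previous version, and what produced it. This is a minimal Python illustration under simplifying assumptions (data held as bytes in memory, SHA-256 as the integrity check); the class names, field names, and tool names are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEntry:
    version: int
    checksum: str                # SHA-256 of this version's bytes, for integrity checks
    parent_checksum: str | None  # links the entry to the version it was derived from
    transformation: str          # what was done to produce this version
    software: str                # tool that generated it (hypothetical names below)
    recorded_at: str             # UTC timestamp in ISO 8601 format


class ProvenanceLog:
    """Append-only version history; earlier entries are never rewritten."""

    def __init__(self) -> None:
        self.entries: list[ProvenanceEntry] = []

    def record(self, data: bytes, transformation: str, software: str) -> ProvenanceEntry:
        parent = self.entries[-1].checksum if self.entries else None
        entry = ProvenanceEntry(
            version=len(self.entries) + 1,
            checksum=hashlib.sha256(data).hexdigest(),
            parent_checksum=parent,
            transformation=transformation,
            software=software,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self.entries.append(entry)
        return entry

    def verify(self, version: int, data: bytes) -> bool:
        """Return True if `data` matches the checksum recorded for `version`."""
        return self.entries[version - 1].checksum == hashlib.sha256(data).hexdigest()


log = ProvenanceLog()
v1 = log.record(b"species,count\norchid,1\n", "ingested from field survey", "survey-app 1.0")
v2 = log.record(b"species,count\norchid,1\nfern,3\n", "appended new sightings", "etl-tool 2.3")
print(v2.parent_checksum == v1.checksum)           # True
print(log.verify(1, b"species,count\norchid,9\n"))  # False: the data was altered
```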
How Active Metadata Enhances Data Management and Governance
Because active metadata updates automatically whenever the data it describes changes, its usefulness extends beyond profiling the data to managing data access, classification, and quality. Passive metadata’s static nature largely limits its use to data discovery, but the dynamic nature of active metadata delivers real-time insights into the data’s lineage that help automate data governance:
- Get a 360-degree view of data. Active metadata’s ability to auto-update ensures that metadata delivers complete and up-to-date descriptions of the data’s lineage, context, and quality. Companies can tell at a glance whether the data is being used effectively, appropriately, and in compliance with applicable regulations.
- Monitor data quality in real time. Automatic metadata updates improve data quality management by providing up-to-the-minute metrics on data completeness, accuracy, and consistency. This allows organizations to identify and respond to potential data problems before they affect the business.
- Patch potential governance holes. Active metadata allows data governance rules to be enforced automatically to safeguard access to the data, ensure it’s appropriately classified, and confirm it meets all data retention requirements. The added layer of protection reduces the chances of the company’s sensitive data being compromised.
- Make better business decisions. Analytics are a vital component of management decision-making. Active metadata delivers context and insights that aren’t possible with passive metadata. The technology highlights trends and patterns in the data and establishes correlations between disparate data elements that wouldn’t surface otherwise.
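The quality-monitoring and governance points above can be sketched as a handler that re-profiles a table whenever it changes. In this hypothetical Python example, the 90% completeness threshold, the email-pattern rule for flagging sensitive columns, and the `steward` role are all assumptions chosen for illustration; real classification rules are usually far broader.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed rule: a column whose values all look like email addresses is sensitive.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def profile_column(values: list[Any]) -> dict[str, Any]:
    present = [v for v in values if v is not None and v != ""]
    completeness = len(present) / len(values) if values else 1.0
    is_email = bool(present) and all(
        isinstance(v, str) and EMAIL_PATTERN.fullmatch(v) is not None for v in present
    )
    return {"completeness": completeness,
            "classification": "sensitive" if is_email else "general"}


def on_data_change(table: dict[str, list[Any]], min_completeness: float = 0.9) -> dict[str, Any]:
    """Re-profile every column and flag those below the completeness threshold."""
    columns = {name: profile_column(values) for name, values in table.items()}
    alerts = [name for name, m in columns.items() if m["completeness"] < min_completeness]
    return {"columns": columns, "alerts": alerts}


def read_column(table: dict[str, list[Any]], metadata: dict[str, Any],
                column: str, roles: set[str]) -> list[Any]:
    """Enforce the classification automatically at read time."""
    if metadata["columns"][column]["classification"] == "sensitive" and "steward" not in roles:
        raise PermissionError(f"column {column!r} is classified as sensitive")
    return table[column]


contacts = {"email": ["a@example.com", "b@example.org", None],
            "region": ["north", "south", "east"]}
metadata = on_data_change(contacts)
print(metadata["alerts"])  # ['email'] (2 of 3 values present, below 90%)
print(read_column(contacts, metadata, "region", {"analyst"}))  # ['north', 'south', 'east']
```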
From data consumers’ perspective, active metadata adds depth and breadth to their view of the data that fuels their decision-making. By highlighting connections between data elements that would otherwise be hidden, active metadata helps them reason about data assets, especially when working on complex problems that involve many disconnected business and technical entities.
- The active metadata analytics workflow orchestrates metadata management across platforms to enhance application integration, resource management, and quality monitoring. It provides a single, comprehensive snapshot of the current status of all data assets involved in business decision-making.
- The technology augments metadata with information gleaned from business processes and information systems. This helps teams collaborate more efficiently while enhancing the overall accuracy of the company’s decision processes.
- By allowing decision makers to visualize metadata and identify and trace links between data assets, active metadata delivers a more complete picture with greater detail and the most up-to-date context. It promotes standardization of metadata definitions across applications, and throughout and beyond the enterprise.
- Active metadata’s built-in change management and auditing make what-if scenarios and forecasting more reliable, which allows managers to anticipate the impact of proposed changes and updates with more precision. They also improve the company’s ability to respond quickly and effectively to changing conditions.
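Tracing links between data assets and anticipating the impact of a change both come down to walking a lineage graph. The sketch below assumes a small, hypothetical lineage map (all asset names are invented) and uses breadth-first search to list every downstream asset that a change to one source could affect, plus the reverse walk to find where an asset's data comes from.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage map: each asset lists the assets built directly from it.
LINEAGE = {
    "crm.customers": ["warehouse.dim_customer"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.dim_customer": ["model.churn_forecast", "report.churn_dashboard"],
    "warehouse.fact_orders": ["model.churn_forecast", "report.revenue_dashboard"],
    "model.churn_forecast": ["report.churn_dashboard"],
}


def downstream_impact(lineage: dict[str, list[str]], changed: str) -> set[str]:
    """Breadth-first search for every asset that depends, directly or not, on `changed`."""
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        asset = queue.popleft()
        for child in lineage.get(asset, []):
            if child not in affected:  # skip assets already reached by another path
                affected.add(child)
                queue.append(child)
    return affected


def upstream_sources(lineage: dict[str, list[str]], asset: str) -> set[str]:
    """Trace the links in the other direction: which assets feed into `asset`."""
    reverse: dict[str, list[str]] = {}
    for parent, children in lineage.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream_impact(reverse, asset)


print(sorted(downstream_impact(LINEAGE, "crm.customers")))
# ['model.churn_forecast', 'report.churn_dashboard', 'warehouse.dim_customer']
print(sorted(upstream_sources(LINEAGE, "model.churn_forecast")))
# ['crm.customers', 'erp.orders', 'warehouse.dim_customer', 'warehouse.fact_orders']
```

In an active metadata system the lineage map itself would be kept current automatically, which is what makes this kind of impact analysis trustworthy; a hand-maintained map would carry the same staleness risk as any passive metadata.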
Slow and steady may apply to races between tortoises and hares, but when it comes to making smart, winning business decisions, those who snooze, lose. Successful organizations aren’t satisfied with keeping up with the competition – they look to set the pace and leave their rivals far behind. Active metadata gives business decision-makers an edge by providing the most accurate, relevant, and timely information available.