Using metadata management best practices helps to maximize the value of the data stored by an organization. Finding the right data after it has been placed in storage can be difficult if the storage system has no organization. Metadata is used to organize the data so it can be found easily.
Metadata is essentially a labeling system, similar to the card catalogs library patrons use to find specific books on the shelves. Some form of metadata has been used to locate information for thousands of years.
The best practices of metadata management involve establishing a system for handling a specific organization’s metadata in a useful and organized way.
Metadata uses descriptions and keywords associated with a file’s content, allowing a search engine to find it easily. A file’s metadata is typically based on information describing or relating to the file, using features such as its name, date, author, type, and location. These features are used to filter and organize the files.
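As a rough illustration of filtering files by their metadata, the sketch below reads a few of the features mentioned above (name, type, location, date) from the file system and filters on one of them. The names `FileMetadata`, `describe`, and `find_by_type` are invented for this example, and "author" is left out because the file system does not expose it in a portable way.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class FileMetadata:
    """A small, hypothetical set of descriptive features for one file."""
    name: str
    file_type: str      # file extension, lowercased, e.g. ".pdf"
    location: str       # directory that contains the file
    size_bytes: int
    modified: datetime  # last modification time (UTC)


def describe(path: Path) -> FileMetadata:
    """Collect basic metadata for a single file from the file system."""
    info = path.stat()
    return FileMetadata(
        name=path.name,
        file_type=path.suffix.lower(),
        location=str(path.parent),
        size_bytes=info.st_size,
        modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    )


def find_by_type(root: Path, file_type: str) -> list[FileMetadata]:
    """Return metadata, sorted by name, for files under root with this extension."""
    records = [describe(p) for p in root.rglob("*") if p.is_file()]
    matches = [r for r in records if r.file_type == file_type.lower()]
    return sorted(matches, key=lambda r: r.name)
```

Calling `find_by_type(Path("."), ".pdf")` would list the PDF files under the current directory; the same pattern extends to filtering by date or location.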
By developing an organized system for managing metadata, and using the appropriate software, a business can ensure its information can be located.
Unfortunately, there aren’t many tools or platforms available that provide actual management of the metadata structure. Software described as being for metadata management is often a platform that uses a metadata search engine and provides a variety of services (such as data lineage, semantic definitions, and impact analysis) but offers very little actual management of the metadata’s structure.
When a business uses a clear, well-organized system for managing its metadata, the data needed for making business decisions (business intelligence) can be located quickly and easily. Metadata can be generated automatically whenever data is created, altered, or updated, but it can also be created or altered manually. However, the selection of metadata that is used should be specific to the needs of the organization. Selecting the right metadata for an organization’s specific needs is a best practice.
Without a strategy for organizing and managing the metadata, and providing accurate information, the chances of making bad decisions increase. Each organization’s strategy for managing metadata will be unique. The metadata/labeling system used depends on the type of business, as well as its goals and priorities.
Metadata management best practices suggest a business should use a metadata structure designed to maximize the value of its data.
Metadata as a Support System
A well-designed metadata management system also supports Data Governance, data catalogs, and security. Metadata can support a Data Governance program’s efforts to ensure high-quality data by recording the data’s source, its date, its history, and the number of copies.
It can also be used to support the creation, updating, and maintenance of a data catalog (an organized inventory of an organization’s data assets). Metadata can be used in data security as well, to protect sensitive data and prevent unauthorized users from accessing and modifying the data.
Additionally, metadata helps to provide context for humans using it during research. For example, metadata may contain the title, a description, the date it was filed, etc.
Examples of Metadata
On my macOS laptop, a file’s metadata (shown in a pop-up window accessed through “Get Info”) begins with a title at the top (for example, “Images JPG” or “robot novel”) followed by the “Kind” of document (JPEG, rich text document, PDF, etc.). This information is followed by other reference terms, such as Size, Where (the file location), Created (the date), and Modified (the date).
Near the center of the metadata pop-up are the reference terms More Info, Name & Extension, Comments, and Open with (the application used to open the file, which reflects its format: RTF, PDF, JPEG). The bottom of the metadata pop-up contains a preview of the file and Sharing & Permissions (security, administrative access).
As a result, I can find files by searching for the title, the creation date, the last modification date, or by location. The other information can be useful for identification and research.
The Dublin Core Metadata Initiative has developed a list of basic, useful standardized reference terms (often referred to as “elements”) that can be used for creating a metadata format. They are presented below. However, when developing a metadata format, there is no reason to be restricted to their terms. When developing your own metadata framework, feel free to develop your own uniquely tailored reference terms.
- Contributor
- Coverage
- Creator
- Date
- Description
- Format
- Identifier
- Language
- Publisher
- Relation
- Rights
- Source
- Subject
- Title
- Type
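One way to picture a metadata format built on these elements is a record whose field names are checked against an allowed vocabulary. The sketch below is a minimal illustration, not part of the Dublin Core standard itself: the function `build_record`, the `custom_terms` parameter, and the sample values are all assumptions made for this example. It shows the fifteen elements being combined with an organization’s own tailored terms.

```python
# The fifteen elements listed above, lowercased for comparison.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def build_record(fields: dict[str, str],
                 custom_terms: frozenset[str] = frozenset()) -> dict[str, str]:
    """Normalize field names and reject any term outside the allowed vocabulary."""
    allowed = DUBLIN_CORE_ELEMENTS | {term.lower() for term in custom_terms}
    record = {}
    for key, value in fields.items():
        term = key.strip().lower()
        if term not in allowed:
            raise ValueError(f"unknown metadata term: {key!r}")
        record[term] = value
    return record


# Hypothetical usage: standard elements plus one organization-specific term.
record = build_record(
    {"Title": "Robot Novel", "Creator": "J. Doe",
     "Format": "application/pdf", "Department": "Fiction"},
    custom_terms=frozenset({"department"}),
)
```

Rejecting unknown terms keeps the vocabulary controlled, while `custom_terms` leaves room for the tailored reference terms described above.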
Best Practices for Developing a Metadata Management Program
Ideally, the Data Governance steward or Data Governance team will be responsible for developing and implementing a metadata management program. If the organization is large enough and/or busy enough, a metadata manager position may need to be created.
Metadata management is useful in maximizing the value of an organization’s data. Implementing a metadata management program can bring numerous benefits to an organization. It promotes the discovery and understanding of the organization’s data assets, in turn supporting more productive and efficient work.
The best practices in developing (or restructuring) a metadata management program are listed below:
1. Review the organization’s goals. Clarifying them is a good first step. Profits are supported, in part, by business intelligence and efficiency. Business intelligence and efficiency are supported, in part, by a well-designed metadata management program.
2. Establish the specific goals of the metadata program. What type of business do you have, and what kind of work does your organization perform? The informational needs of a hospital are different from the needs of a manufacturing business. In manufacturing, being able to access data about defects in an assembly line, along with their times, dates, and the names of the employees involved, could be useful.
3. Find ways to modify or edit the metadata “elements.” Selecting useful and easily understood metadata reference terms/elements can improve overall efficiency. Automated metadata handling is normally built into computer operating systems (Mac, Linux, Windows) so that files are stored on hard drives in an organized fashion. Because these built-in elements are fixed by the operating system, editing them is difficult, but there are a few tactics available.
CollectiveAccess is open-source software (its cataloging application is named Providence) that can provide an alternative, editable metadata management system. It works well with Linux, Windows, and Mac OS X 10.9+ systems, but it also requires a MySQL database.
Microsoft Word does not allow its standard “document properties” (elements) to be removed or altered, but it does allow for the addition of new elements/document properties, which can be used to serve the same purpose.
4. Select the best reference terms/elements. Determining which reference terms are used is important for developing an efficient metadata program. For example, retail businesses would benefit from analyzing the metadata related to their sales and customer feedback, as a way to identify trends in which products are being purchased and to make decisions accordingly. By using the relevant metadata terms, such as type of “product” (shirt, dress), the “sales date,” the “size,” the “color,” and the “brand name,” metadata can be used to research and improve the business’s online sales.
Additionally, the use of “keywords” within the metadata allows shoppers to locate what they are seeking more efficiently.
On the other hand, a hospital would select metadata terms focused on patients and their treatment. This metadata would contain the patient’s “name,” perhaps their “doctor,” the date of their “last visit,” any “medication” they are taking, and “health issues.”
5. Update the staff on changes. The staff should be given notification, and perhaps training, on any changes to the metadata structure or to how the staff uses it. No best practices list is complete if the staff are left in the dark about the changes.
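The retail example in step 4 can be sketched in a few lines. The reference terms (“product,” “sales date,” “size,” “color,” “brand name”) come from that example; the sample records, the brand names, and the helper `count_by` are invented purely for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sales records tagged with the retail reference terms from step 4.
sales = [
    {"product": "shirt", "sales_date": date(2024, 3, 1), "size": "M",
     "color": "blue", "brand_name": "Acme"},
    {"product": "shirt", "sales_date": date(2024, 3, 2), "size": "L",
     "color": "blue", "brand_name": "Acme"},
    {"product": "dress", "sales_date": date(2024, 3, 2), "size": "S",
     "color": "red", "brand_name": "Birch"},
    {"product": "shirt", "sales_date": date(2024, 3, 3), "size": "M",
     "color": "white", "brand_name": "Birch"},
]


def count_by(records, *terms):
    """Count records grouped by the values of the chosen metadata terms."""
    return Counter(tuple(record[term] for term in terms) for record in records)


# Which product/color combination sold most often in the sample data?
best_seller, units = count_by(sales, "product", "color").most_common(1)[0]
```

A hospital would run the same kind of grouping over its own terms, such as “doctor” or “medication,” which is why the choice of elements matters more than the code.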
Best Practices for Metadata Management Maintenance
Metadata management is, unfortunately, not a one-time activity. It must be maintained, adjusted, and improved as the organization grows and evolves. Metadata is used in supporting Data Governance, and many of the skills used in maintaining the Data Governance program are similar to the skills needed to maintain and support metadata management.
Someone (preferably the Data Governance steward or team) should be assigned responsibility for updating the staff on any changes to the metadata system, maintaining the system, and editing and altering the metadata system, as needed. This person (and perhaps two or three backups, just in case something happens to that person) should be the only one(s) with access to the metadata’s “system” to prevent malicious acts.
Conducting regular audits to determine the accuracy and functionality of your metadata can help to assess and identify areas needing improvement. The most obvious metadata maintenance concerns are:
- Accrual: Ensuring accurate metadata is attached to all new records
- Deletion: The removal of unnecessary records and their metadata
- Modification: Altering metadata as needed to assure its accuracy
- Sharing: Copying and sharing selected data needed for other uses
- Migration: Transferring data from one system or architecture to another
- Exposure: Making data available for research
- Security: Restricting access to metadata controls to selected individuals
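A regular audit of the first concern, accrual, can be as simple as checking that every record carries the metadata the organization has decided it needs. The sketch below assumes records are stored as dictionaries and that `title`, `creator`, and `date` are required; both the policy in `REQUIRED_TERMS` and the sample records are hypothetical.

```python
# Hypothetical policy: every record must carry these terms.
REQUIRED_TERMS = ("title", "creator", "date")


def is_blank(value) -> bool:
    """Treat missing values, None, and whitespace-only strings as blank."""
    return value is None or not str(value).strip()


def audit(records, required=REQUIRED_TERMS):
    """Map each incomplete record's identifier to the terms it is missing."""
    problems = {}
    for position, record in enumerate(records):
        missing = [term for term in required if is_blank(record.get(term))]
        if missing:
            key = record.get("identifier") or f"record #{position}"
            problems[key] = missing
    return problems


# Invented sample records for illustration.
records = [
    {"identifier": "doc-1", "title": "Q1 report", "creator": "A. Lee",
     "date": "2024-04-02"},
    {"identifier": "doc-2", "title": "Defect log", "creator": "  ",
     "date": "2024-04-03"},
    {"identifier": "doc-3", "creator": "B. Cho", "date": None},
]
report = audit(records)
```

Here `report` flags `doc-2` as missing a creator and `doc-3` as missing a title and date, which gives the person responsible for maintenance a concrete list to work from.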
The Future of Metadata Management
During a DATAVERSITY® interview about the future of metadata management, Emily Washington, Precisely’s senior VP of product management, said,
“Metadata information needs to be refreshed as new fields get added to systems or new inputs and outputs flow to and from them. Lots of automation helps manage metadata, keeping it up-to-date, so changes, additions, and deletions can be checked. Machine learning and AI can monitor historical metadata trends and usage. It can figure out, from metadata, what data has been touched most frequently, where sensitive information lives, and where redundant data exists.”
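One of the tasks mentioned in the quote, finding where redundant data exists, can be approximated without machine learning when each record’s metadata includes a content checksum. The sketch below uses plain grouping rather than the ML and AI monitoring described in the interview, and the `checksum` and `location` field names, the sample values, and the function `find_redundant` are assumptions made for illustration.

```python
from collections import defaultdict


def find_redundant(records):
    """Group locations by checksum and keep only checksums stored more than once."""
    by_checksum = defaultdict(list)
    for record in records:
        by_checksum[record["checksum"]].append(record["location"])
    return {checksum: locations
            for checksum, locations in by_checksum.items()
            if len(locations) > 1}


# Invented sample metadata: two locations hold identical content.
catalog = [
    {"location": "/finance/q1.csv", "checksum": "a1f3"},
    {"location": "/archive/q1_copy.csv", "checksum": "a1f3"},
    {"location": "/hr/roster.csv", "checksum": "9c2e"},
]
duplicates = find_redundant(catalog)
```

Because the check reads only metadata, it can be run regularly without opening the underlying files.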