Developing a metadata strategy is necessary for a growing business to maintain and improve efficiency. Metadata is a small amount of data that is used to identify a larger collection of data (images, text, files, digital objects). It is generated each time data is collected from its source, moved through a data system, accessed by users, integrated with other data, cleansed, or analyzed.
Any form or amount of data can be tagged with metadata, either automatically or manually. Metadata tags are typically designed to make it easy to find the desired data.
The information (descriptors or keywords) conveyed by the metadata tags is typically associated with relevant elements, such as the title, dates, the creators, or technical information. On a web page, for example, the tags are not presented to the user, but instead are hidden within the source code, where they convey the content of the metadata to browsers, search engines, and other tools. Metadata may also communicate how data has been used. There are six basic types of metadata:
- Descriptive metadata: This type of metadata is used for discovery and identification. It includes descriptors such as the title, author, and keywords.
- Structural metadata: Contains descriptors about containers of data. It describes the version, relationships, and other features of digital materials.
- Administrative metadata: Presents information for managing a resource, such as the resource type, permissions, and how and when the data was created.
- Reference metadata: This form of metadata is about the contents and quality of statistical data.
- Statistical metadata: Can be used to describe the processes involved in collecting, processing, or producing statistical data.
- Legal metadata: It provides information about the creator, the copyright holder, and public licenses.
The purpose of metadata is to provide a way of indexing, preserving, accessing, and discovering digital resources.
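As a rough illustration of how the six types listed above might be kept apart in practice, the following minimal sketch groups the tags for one data asset by type. The class names (`MetadataType`, `MetadataRecord`), the asset name, and the example descriptors are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataType(Enum):
    """The six basic types of metadata described above."""
    DESCRIPTIVE = "descriptive"
    STRUCTURAL = "structural"
    ADMINISTRATIVE = "administrative"
    REFERENCE = "reference"
    STATISTICAL = "statistical"
    LEGAL = "legal"


@dataclass
class MetadataRecord:
    """Metadata attached to one data asset, grouped by type (hypothetical model)."""
    asset_id: str
    tags: dict = field(default_factory=dict)  # MetadataType -> {descriptor: value}

    def add(self, kind: MetadataType, descriptor: str, value: str) -> None:
        """Store one descriptor/value pair under the given metadata type."""
        self.tags.setdefault(kind, {})[descriptor] = value

    def find(self, descriptor: str) -> list:
        """Return (type, value) pairs for every tag with this descriptor name."""
        return [(kind, values[descriptor])
                for kind, values in self.tags.items()
                if descriptor in values]


# Example asset and descriptors are made up for illustration.
record = MetadataRecord(asset_id="report-2023-q4.pdf")
record.add(MetadataType.DESCRIPTIVE, "title", "Q4 Sales Report")
record.add(MetadataType.ADMINISTRATIVE, "created", "2024-01-05")
record.add(MetadataType.LEGAL, "license", "internal-use-only")

print(record.find("title"))
```

Grouping tags by type is only one possible design. It makes it easy to ask, for example, for all legal metadata on an asset, at the cost of a slightly more nested structure.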
Some organizations have never really organized or developed their data architecture, and as they’ve grown, their data has become scattered and disorganized. This can make it challenging to find the desired data. For businesses to be successful in this modern world, they must be able to locate and use their data quickly and efficiently.
Data Governance and Metadata
Metadata works hand in hand with Data Governance software and is a critical feature of Data Governance, allowing data sets to be indexed and accessed. A metadata strategy must include integrating the metadata with the Data Governance program. This helps protect sensitive or confidential data and reduces the risk of violating privacy regulations or laws (such as the GDPR, CCPA, or LGPD). Data Governance provides accountability for data assets and helps keep the metadata accurate and consistent. Traditionally, metadata management has been used for organizing and classifying data for compliance reasons.
Currently, machine learning capabilities embedded into many Data Governance programs can automate the process of capturing and curating metadata.
A Data Governance framework often includes the use of several apps and software programs, such as data warehousing, data quality, master data management, and metadata management. Data Governance programs can be used to support complete transparency about the business’s data flow, allowing data assets to be defined, tracked, measured, and managed.
Development and Implementation
A thorough understanding of the organization’s metadata is critical to effectively implementing a metadata strategy. There are a number of steps involved in developing a metadata system. It is especially important to schedule the time needed to organize, implement, and test the system (repeatedly) until all the requirements are met. The implementation plan should include the schedule and all details of the project.
The implementation plan should break the process down into discrete, manageable tasks. For instance, developing a map of all active data assets will involve any data lakes, data warehouses, databases, cloud storage, emails, and other storage used by the business. Each storage site should be listed and scheduled for research individually. (Tracking the metadata in a data lake, with its unindexed data, may itself need to be broken down into several smaller tasks.)
Implementing a metadata strategy typically includes the following steps and sub-steps:
Develop a metadata template: At this point, the goal is to determine what types of metadata should be used to make the data as easy to discover as possible. This requires gathering information from people using the data on how to best design the template. During this information-gathering phase, staff can be interviewed, customers can be surveyed, and workshops can be set up to gain input from IT and stakeholders. Be sure to assess how clients and business users tag their own metadata and identify common elements.
- Identify the types of metadata to be used: Here, the goal is to determine the types of metadata that best communicate the business’s content and needs (descriptive, structural, administrative, reference, statistical, legal). Then decide which field formats best describe the organization’s data assets, such as integers, free text, strings, dates, or date/time fields. Finally, determine if rules are needed (for example, title fields may need to be limited to 50 characters, or the date/time fields may need to use international display standards).
- Establish a metadata vocabulary: A formal definition of descriptors should be developed for consistent communication of the metadata. Typically, metadata vocabularies are based on domain-specific data. Metadata elements are often grouped into categories, for instance, customer data, product data, and images. Developing a metadata glossary to support the vocabulary can help with communications. The glossary should also be a part of the Data Governance strategy, which emphasizes Data Quality.
- Be aware of the subject metadata: Curiously, metadata contains sub-metadata. The structures that hold metadata often have their own metadata, such as a descriptive name or a maximum character length. This kind of metadata is referred to here as subject metadata. The descriptors of subject metadata can be used to link contributing partners’ and institutions’ records with other records, making them easier to find.
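The template step above can be sketched as a small set of field rules plus a validation function. This is a minimal sketch, not a definitive design: the field names in `TEMPLATE` and the `validate` helper are hypothetical, the 50-character title limit comes from the example in the text, and ISO 8601 is assumed as the "international display standard" for dates.

```python
from datetime import datetime

# Hypothetical metadata template: field name -> rules for that field.
TEMPLATE = {
    "title": {"type": str, "required": True, "max_length": 50},
    "author": {"type": str, "required": True},
    "created": {"type": str, "required": True, "format": "iso8601"},
    "page_count": {"type": int, "required": False},
}


def validate(metadata: dict) -> list:
    """Return a list of human-readable rule violations (empty if valid)."""
    errors = []
    for name, rule in TEMPLATE.items():
        if name not in metadata:
            if rule["required"]:
                errors.append(f"{name}: missing required field")
            continue
        value = metadata[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
            continue
        max_length = rule.get("max_length")
        if max_length is not None and len(value) > max_length:
            errors.append(f"{name}: longer than {max_length} characters")
        if rule.get("format") == "iso8601":
            try:
                # Accepts values such as 2024-01-05 or 2024-01-05T10:30:00.
                datetime.fromisoformat(value)
            except ValueError:
                errors.append(f"{name}: not an ISO 8601 date or date/time")
    return errors


good = {"title": "Q4 Sales Report", "author": "Finance", "created": "2024-01-05"}
bad = {"title": "x" * 51, "created": "05/01/2024"}

print(validate(good))
print(validate(bad))
```

Keeping the rules in a plain dictionary means the template itself can be reviewed by non-programmers and versioned alongside the metadata glossary.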
Map the metadata: Create some sort of a trackable chart. It could be a spreadsheet or table on a computer. A whiteboard is an option, although steps should be taken to keep it from being accidentally erased. Using the information gathered from the previous steps, map out the metadata, indicating where and how it is used.
- After listing the metadata and its locations, look for common descriptors. (Sometimes descriptors have different names but serve the same purpose. For research purposes, they would qualify as common descriptors.) Remember that it is important to be able to trace data back to its original source (such as an ERP or CRM system).
- Create a data catalog. A data catalog is an organized inventory of data assets for a business. This catalog should be maintained and updated on a scheduled basis.
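The mapping step above can be sketched in a few lines: list where each descriptor is used, fold differently named descriptors that serve the same purpose into one common name, and keep the source system so each entry can be traced back. The systems, descriptor names, and the `SYNONYMS` table below are invented for illustration. In practice the synonym list would come from the interviews and workshops described earlier.

```python
from collections import defaultdict

# Hypothetical inventory: (source system, descriptor as named in that system).
INVENTORY = [
    ("crm", "cust_name"),
    ("erp", "customer_name"),
    ("data_lake", "CustomerName"),
    ("erp", "invoice_date"),
    ("crm", "created_on"),
]

# Assumed synonym table: lowercased local name -> common descriptor.
SYNONYMS = {
    "cust_name": "customer_name",
    "customername": "customer_name",
}


def canonical(descriptor: str) -> str:
    """Map a local descriptor name to its common (canonical) name."""
    key = descriptor.strip().lower()
    return SYNONYMS.get(key, key)


def build_map(inventory) -> dict:
    """Group every (system, descriptor) pair under its common descriptor."""
    mapping = defaultdict(list)
    for system, descriptor in inventory:
        mapping[canonical(descriptor)].append((system, descriptor))
    return dict(mapping)


metadata_map = build_map(INVENTORY)

# Common descriptors are those used in more than one place.
common = {name: uses for name, uses in metadata_map.items() if len(uses) > 1}

for name, uses in common.items():
    print(name, "->", uses)
```

Because each entry keeps its source system, the same structure can serve as a starting point for a data catalog that traces descriptors back to the ERP or CRM system they came from.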
Assessment: At this stage, the goal is to determine whether any import/export, synchronization, or master data management tools are needed to keep the metadata consistent and clean throughout the business. The following information will be useful in determining how to design the metadata, and the kinds of metadata management tools to research for supporting the metadata strategy.
Understand the people and the processes: This is an important part of the assessment phase, which involves understanding how the processes work, the problems people are having, and their solutions. Listed below are some ways to gain a better understanding of the people and the processes:
- Track how the data moves through the business. Look for common descriptors as the data moves within the system.
- Understand how the metadata is used. Is it used for completing forms, or to connect with other systems? Will it initiate workflow processes?
- Determine how the descriptors will be organized. Will the metadata capture process allow the use of a freestyle method of tagging the content (called “folksonomy”) or will it be completely automated?
- What training or education will staff need to adjust smoothly to the changes? How will the training be accomplished?
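One way to weigh the folksonomy question raised above is to compare the tags people actually apply against a controlled vocabulary. The sketch below is only an illustration under assumed data: the vocabulary, the sample tags, and the threshold of two uses are all hypothetical.

```python
from collections import Counter

# Hypothetical controlled vocabulary (approved descriptors).
CONTROLLED_VOCABULARY = {"invoice", "contract", "customer"}

# Hypothetical free-form tags applied by users to four documents.
user_tags = [
    ["Invoice", "2023", "urgent"],
    ["invoice", "customer"],
    ["urgent", "Contract"],
    ["urgent"],
]


def split_tags(tag_lists):
    """Count tags that match the vocabulary separately from free-form tags."""
    approved, free = Counter(), Counter()
    for tags in tag_lists:
        for tag in tags:
            key = tag.strip().lower()
            if key in CONTROLLED_VOCABULARY:
                approved[key] += 1
            else:
                free[key] += 1
    return approved, free


approved, free = split_tags(user_tags)

# Free-form tags used at least twice are candidates for the vocabulary.
candidates = [tag for tag, count in free.most_common() if count >= 2]

print("approved:", dict(approved))
print("candidates:", candidates)
```

Frequently used free-form tags suggest descriptors the formal vocabulary is missing, which is useful input whether the final process is freestyle, automated, or a mix of both.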
Design the metadata model for continuous improvements: Feedback is important for the continuous improvement and evolution of the metadata model. It is crucial to collect feedback from your staff and customers to ensure the metadata plan continues to support the business’s objectives.
Here are some recommendations to incorporate continuous improvement into your design:
- Check in with managers at regular intervals to assess the functionality of the metadata model.
- As business objectives change, the metadata model may need to change as well.
- Provide a feedback mechanism for anyone with a suggestion or complaint about the metadata.
Automate wherever possible: There are three basic reasons for automation. It is much faster; it reduces human error; and it makes sure the task actually gets accomplished. Automating metadata can significantly decrease the time spent on tasks like data tagging and cataloging.
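As a small illustration of automated tagging, the sketch below derives a few descriptive and technical tags directly from files on disk, using only the Python standard library. The function names and the choice of tags are assumptions made for this example. A real deployment would more likely rely on a metadata management tool.

```python
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def extract_file_metadata(path: Path) -> dict:
    """Derive basic metadata tags from a file without manual input."""
    stat = path.stat()
    mime_type, _ = mimetypes.guess_type(path.name)
    return {
        "title": path.stem,
        "format": mime_type or "unknown",
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


def catalog_directory(root: Path) -> list:
    """Tag every file under root and return the entries as a simple catalog."""
    return [extract_file_metadata(p)
            for p in sorted(root.rglob("*")) if p.is_file()]


# Demonstration on a temporary directory with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "q4_report.txt"
    sample.write_text("hello")
    entries = catalog_directory(Path(tmp))

for entry in entries:
    print(entry)
```

Running the same function on a schedule keeps the catalog current without anyone having to remember to update it.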
The Benefits of Implementing a Metadata Strategy
Metadata is an important factor in gaining the maximum value from your data. It helps ensure data consistency, supports Data Governance, and helps with regulatory compliance. It also supports the research used when making intelligent business decisions.
The use of real-time metadata automation can be both extremely useful and cost-effective. Staff can access the most up-to-date data, which improves efficiency and Data Quality and supports better decisions. Automation can be used to standardize, classify, and corroborate data. As a consequence, many data inconsistencies and other issues can be corrected in real time.
Warning: Thorough research (and/or hiring a consultant) should be done prior to implementing a metadata strategy. Wasting time and money on tools that don’t work is counterproductive.