Understanding Big Data and Data Governance goes hand in hand with understanding the concept of a Data Dictionary. Data Dictionaries have been integral to business functions. This article demystifies the Data Dictionary model.
What is a Data Dictionary?
A Data Dictionary provides the ingredients and steps needed to create relevant business reports from a database. The UC Merced Library simply states, in “What is A Data Dictionary,” that a Data Dictionary is a “collection of names, definitions and attributes about elements that are being used or captured in a database.” Because this collection describes a live database, it needs to provide guidelines as users enter, edit, and delete data in real time. A Database Administrator is likely to deal with such fluid data. In this case, an Active Data Dictionary, as defined by Gartner’s IT Glossary, provides a “facility for storing dynamically accessible and modifiable information.”
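To make the UC Merced definition concrete, here is a minimal sketch of a Data Dictionary as a collection of names, definitions, and attributes, with an update method standing in for the “dynamically accessible and modifiable” behavior of an Active Data Dictionary. The class names, the `order_date` element, and its attributes are hypothetical, chosen only for illustration; real dictionaries usually live in a database catalog or a documentation tool rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One entry: a name, a plain-language definition, and descriptive attributes."""
    name: str
    definition: str
    attributes: dict = field(default_factory=dict)


class DataDictionary:
    """A minimal in-memory data dictionary keyed by element name (hypothetical)."""

    def __init__(self):
        self._elements = {}

    def add(self, element):
        # Reject duplicates so each name has exactly one agreed-upon meaning.
        if element.name in self._elements:
            raise ValueError(f"{element.name!r} is already defined")
        self._elements[element.name] = element

    def lookup(self, name):
        return self._elements[name]

    def update_definition(self, name, definition):
        # An "active" dictionary is modified in place as the database evolves.
        self._elements[name].definition = definition


dd = DataDictionary()
dd.add(DataElement(
    name="order_date",
    definition="Date the customer submitted the order.",
    attributes={"type": "DATE", "table": "orders", "nullable": False},
))
print(dd.lookup("order_date").attributes["type"])  # DATE
```

Rejecting duplicate names is a design choice in this sketch: it forces a second meaning to be discussed and named explicitly instead of silently overwriting the first.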
The International Organization for Standardization (ISO) proposes, in Understanding the Data Dictionary, three categories: Business Concepts, Data Types, and Message Concepts. Business Concepts define a business Metadata layer, described by Zaino as the “definitions for the physical data that people will access in business terms.” Data Types describe the formats data elements must follow to be considered valid. Message Concepts establish a shared understanding between institutions and companies so that business communications stay within the same context. These three categories, Business Concepts, Data Types, and Message Concepts, are interrelated.
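The relationship between the first two categories can be sketched in a few lines: each business concept carries a plain-language definition and points to a data type, and the data type decides whether a value is valid. This is an illustrative sketch, not ISO’s own model; the `DataType` and `BusinessConcept` classes, the two patterns, and the field names are all hypothetical, and Message Concepts are left out for brevity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """A named format; a value is valid only if it fully matches the pattern."""
    name: str
    pattern: str

    def is_valid(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value) is not None


@dataclass(frozen=True)
class BusinessConcept:
    """A business-facing definition bound to the data type that governs its format."""
    name: str
    definition: str
    data_type: DataType


# Hypothetical data types; the date pattern checks shape only, not calendar validity.
ISO_DATE = DataType("ISODate", r"\d{4}-\d{2}-\d{2}")
CURRENCY_CODE = DataType("CurrencyCode", r"[A-Z]{3}")

concepts = {
    "settlement_date": BusinessConcept(
        "settlement_date", "Date the payment settles.", ISO_DATE),
    "currency": BusinessConcept(
        "currency", "Three-letter code of the payment currency.", CURRENCY_CODE),
}


def validate(record: dict) -> list:
    """Return the names of fields whose values violate their concept's data type."""
    return [
        field_name
        for field_name, value in record.items()
        if field_name in concepts and not concepts[field_name].data_type.is_valid(value)
    ]


print(validate({"settlement_date": "2024-03-15", "currency": "usd"}))  # ['currency']
```

Keeping the format rule on the data type, rather than on each concept, means several business concepts can share one definition of “valid.”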
Advantages of a Data Dictionary
A Data Dictionary makes change possible. It saves the time otherwise spent figuring out what the data means and how it interrelates. Advantages of a Data Dictionary include:
- Consistent Use of Vocabulary: Meaningful information requires instructions on how vocabulary is used and an understanding of its context. For example, take the “contact” data element. In a college’s Corporate Relations office, a contact may mean a person at a private corporation who would be willing to fund college research and scholarships. To an Admissions Department, a contact is a student’s parent or an alumnus. To a newly hired administrative assistant, a contact may mean anyone he or she has telephoned or emailed. Without a clear definition in a Data Dictionary, the data entered could take any one of these meanings.
- Useful Reports: As the University of Michigan’s Information and Technology Services states, “if you don’t understand how the data is structured, the links between tables, and which BusinessObjects folders to use, your report results may be incorrect.” Add the need to generate reports in a dynamic environment, and Data Dictionaries become essential.
- Easier Data Document Management: Keeping a Data Dictionary responsive to change requires only a word processor, or even pen and paper. Blaha states in Documenting Data Models that a Data Dictionary can be easily printed. Such a resource “is simple to receive and requires no modeling tool skill. There is no tool cost,” and no special software is needed to access the information.
- Smoother Database Upgrades: Like the Windows operating system, database software, such as Oracle’s, needs to be upgraded periodically. A Data Dictionary is crucial to doing this, and it is a built-in aspect of such software. For example, the documentation for Oracle Financial Services Analytical Applications (OFSAA) and the Oracle Financial Services Data Foundation (OFSDF) details how to generate Data Dictionary documentation “to account for site-specific changes as well as release-specific changes from Oracle.”
- More Meaningful Metadata: For data to be accessible, it needs to be “properly collected and stored.” Metadata provides information about the “context, content, quality, provenance, and/or accessibility of a set of data.” Data Dictionaries provide a centralized location to describe Metadata about the database. As mentioned by AHIMA, having an established Data (AHIMA, 2016). This includes the Metadata pertaining to a database. Just as in the health industry, a Data Dictionary maps any business’s data use by keeping everyone on the same page about the data’s function.
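The “contact” example under Consistent Use of Vocabulary can be sketched as definitions keyed by business context as well as by element name, so that a lookup without a known context fails loudly instead of guessing. The context names and the exact wording of each definition are hypothetical paraphrases of the example above.

```python
# Hypothetical context-qualified definitions, keyed by (business context, element).
DEFINITIONS = {
    ("corporate_relations", "contact"):
        "A person at a private corporation who may fund research or scholarships.",
    ("admissions", "contact"): "A student's parent or an alumnus.",
    ("office_admin", "contact"): "A person the staff member has telephoned or emailed.",
}


def define(context: str, element: str) -> str:
    """Return the agreed definition of an element within one business context."""
    try:
        return DEFINITIONS[(context, element)]
    except KeyError:
        # Refuse to guess: a missing meaning should be added to the dictionary first.
        raise KeyError(f"No definition of {element!r} for context {context!r}") from None


def meanings(element: str) -> dict:
    """List every context-specific meaning of an element, exposing ambiguity."""
    return {ctx: text for (ctx, name), text in DEFINITIONS.items() if name == element}


print(define("admissions", "contact"))  # A student's parent or an alumnus.
print(len(meanings("contact")))         # 3
```

Listing all meanings side by side is one simple way to spot a term that needs either separate element names or an explicit context qualifier.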
Alternatives to a Data Dictionary
Data Dictionaries do have some drawbacks. First, maintaining and using a comprehensive Data Dictionary can be time-consuming and cumbersome for a business. For example, it would be inconvenient for a customer to learn Metadata in order to place an order. Likewise, a Business Analyst under a tight deadline may not have time to update or consult Data Dictionary documentation. And a start-up may not yet have the information needed to begin a Data Dictionary. Consider these alternatives to a Data Dictionary:
- Captions and Prompts in Forms and Reports: Define data elements as they are needed. For example, go to the Address section of a typical e-commerce site. A “Select” caption next to State or Province instructs the user to choose from a drop-down list. The options shown depend on the country chosen. This prevents customers from entering bad data and keeps data consistent. Should a Business Analyst need to report on revenue from a particular state, a similar prompt and drop-down list can be used. This kind of prompting may be used with other data elements to keep business data consistent.
- User Stories: In Agile development, user stories form the basis for creating or updating a product, including a database. “A user story is an artifact describing that an agent (the who) wants to do a specific action (the what) for a specific purpose (the why). It also specifies what steps are required to show or measure (the how).”
As project managers and participants hash out how a program functions and what a customer needs, they define data elements in terms of business context, format, and message. Add specifics about what needs to be captured or used in the database to each story, and make the collection of user stories searchable by business context for future sprints. Voilà: the objective of creating a common understanding and vocabulary of data elements is met alongside the objectives of the Agile development process.
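The caption-and-prompt approach for the address form can be sketched as a country-to-region lookup: the form offers only the regions listed for the chosen country, and a revenue report reuses the same list so its filter matches what customers could enter. The region lists, order records, and function names here are hypothetical; a real site would load its options from its own reference data.

```python
# Hypothetical reference data; a real site would load these lists from its own tables.
REGIONS_BY_COUNTRY = {
    "US": ["California", "New York", "Texas"],
    "CA": ["British Columbia", "Ontario", "Quebec"],
}


def region_options(country):
    """Options for the State/Province drop-down once a country has been chosen."""
    return REGIONS_BY_COUNTRY.get(country, [])


def accept_region(country, region):
    """Accept only a region that the drop-down actually offered for this country."""
    return region in region_options(country)


def revenue_for_region(orders, country, region):
    """Report prompt: reuse the same option list so the report filter matches the form."""
    if not accept_region(country, region):
        raise ValueError(f"{region!r} is not an option for {country!r}")
    return sum(o["amount"] for o in orders
               if o["country"] == country and o["region"] == region)


orders = [
    {"country": "US", "region": "Texas", "amount": 120},
    {"country": "US", "region": "Texas", "amount": 80},
    {"country": "CA", "region": "Ontario", "amount": 50},
]
print(region_options("CA"))                       # ['British Columbia', 'Ontario', 'Quebec']
print(accept_region("US", "Ontario"))             # False
print(revenue_for_region(orders, "US", "Texas"))  # 200
```

Because the form and the report share one option list, a value that could never be entered also can never be requested in a report.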
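The idea of attaching data elements to user stories and making them searchable by business context can be sketched as follows. The `UserStory` fields mirror the who/what/why/how structure quoted above, while `business_context`, `data_elements`, the two sample stories, and the glossary builder are hypothetical additions for illustration. Grouping each element’s definitions by context gives a lightweight glossary that can later seed a full Data Dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    """A user story (who/what/why/how) annotated with the data elements it defines."""
    who: str
    what: str
    why: str
    how: str
    business_context: str
    data_elements: dict = field(default_factory=dict)  # element name -> definition


def find_by_context(stories, context):
    """Return the stories tagged with a business context (case-insensitive)."""
    return [s for s in stories if s.business_context.lower() == context.lower()]


def build_glossary(stories):
    """Collect every element's definitions, grouped by business context."""
    glossary = {}
    for story in stories:
        for element, definition in story.data_elements.items():
            glossary.setdefault(element, {})[story.business_context] = definition
    return glossary


stories = [
    UserStory(
        who="an admissions officer",
        what="records a prospective student's parent as a contact",
        why="to send enrollment information",
        how="the parent appears on the applicant's contact list",
        business_context="Admissions",
        data_elements={"contact": "A student's parent or an alumnus."},
    ),
    UserStory(
        who="a corporate relations officer",
        what="logs a potential research sponsor",
        why="to track funding prospects",
        how="the sponsor appears in the funding-prospects report",
        business_context="Corporate Relations",
        data_elements={"contact": "A person at a private corporation who may fund research."},
    ),
]

print([s.who for s in find_by_context(stories, "admissions")])  # ['an admissions officer']
print(sorted(build_glossary(stories)["contact"]))  # ['Admissions', 'Corporate Relations']
```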
While captions, prompts, or user stories may provide an immediate fix for defining databases, they are probably not a good long-term strategy. Over time, businesses grow and databases evolve. The data elements needed to report on, for example, how business contacts benefit a business or how many doctor’s visits a patient needs become murky and complex. Spending the extra time to construct a Data Dictionary brings clarity sooner rather than later.
Data Dictionaries: A Case Study
To see the value of a Data Dictionary, consider the Human Genome Project (HGP). International researchers have worked for years on the HGP to construct a genetic map of humans, account for genetic variation, and make this genetic information available for use and analysis. A Data Dictionary-type resource became a crucial requirement for the HGP and was created accordingly. This established Data Dictionary contributed to the success of the HGP, which, according to the National Institutes of Health (NIH), includes:
- Completion of the Human Genome Project, under budget and more than two years ahead of schedule, in April 2003.
- Discovery of more than 1,800 disease genes, as of this writing.
- The ability to identify the genetic cause of a disease in a matter of days, down from many years.
- More than 2,000 genetic tests, enabling patients to learn about their risks.
As the HGP demonstrates, quicker results at lower cost come, in part, from an excellent Data Dictionary with a shared vocabulary. This continuing legacy shows how Metadata in a database contributes to business success with Big Data.