Master Data vs. Reference Data

The terms “master data” and “reference data” are easily confused. Both describe data that changes only occasionally and that is meant to be accurate and up to date.

Master data provides the accurate information needed for business transactions that are critical to the running of a business – the permanent/semi-permanent information about customers, employees, and suppliers, and the organization’s products and assets. 

Reference data, on the other hand, is typically (though not always) long-term and is used to define and classify “other” data.

The confusion is compounded by software platform reviews whose titles focus on reference data but which then shift to describing master data management (MDM) platforms, only one or two of which support reference data.

There is no standardized definition of reference data. (For a better understanding of reference data, think of the reference section in your public library.) “Other” long-term data covers a broad range of needs and industries. For example, in the finance industry, reference data is a catch-all term for the detailed information used during transactions, which is an example of dynamic reference data. The children’s growth reference data developed by the World Health Organization is an example of static reference data. A small sampling of the variety of reference data types includes:

Master data provides the basic information needed for business transactions and may require limited access for security reasons. Reference data provides additional information that helps the business operate more efficiently, and is often easily accessible to all staff.

Reference data and master data are required to be both accurate and up to date.  

Organizations can store reference data in a number of locations. If the software supports it, reference data can be saved in data catalogs, data governance software, and master data management platforms. Additionally, there are a few software programs specific to reference data. In some circumstances, such as when using a data warehouse, reference data can be set up as a subdivision of master data. 

What Is Reference Data?

Reference data comes from a variety of sources and must be managed so that the business’s systems stay synchronized. An efficient way to do this is to use a data governance platform that includes reference data software. Without this type of management, reference data may be siloed within a single department. It may also be defined and managed inconsistently if different departments use their own methods for gathering and storing it. Examples of commonly used reference data include:

  • Transaction codes
  • Tasks and business processes
  • Financial hierarchies
  • Customer segmentation
  • Currency information
  • State or country codes
  • Organizational unit types
  • Language codes
  • Cost centers

Reference data can be taken from both public and private sources, and supplies information to different domains. Because of the complex connections between the domains and applications that support reference data, managing it can present some challenges. Managing reference data should not be done manually. Reference data is typically used by every department in the organization to help provide context to their data. It supports data quality and data usability. 

Reference data provides a foundation for the data interpretation process that is used across various applications, systems, and processes.

The primary purpose of reference data is to establish common definitions, classifications, and relationships for data elements, which it does through predefined codes and values. By doing this, reference data enhances data quality and streamlines the data integration process. This, in turn, simplifies data sharing.
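As a rough illustration of how predefined codes can support data quality, the sketch below checks incoming records against a small lookup table. The `COUNTRY_CODES` table, the `classify_orders` function, and the sample orders are all hypothetical names and values chosen for this example, not part of any particular platform.

```python
# Hypothetical reference table: two-letter country codes mapped to country names.
COUNTRY_CODES = {
    "US": "United States",
    "DE": "Germany",
    "JP": "Japan",
}


def classify_orders(orders, country_codes):
    """Split orders into those with a recognized country code and those without."""
    valid, rejected = [], []
    for order in orders:
        code = order["country"].strip().upper()  # normalize before the lookup
        if code in country_codes:
            # Attach the shared, agreed-upon name from the reference table.
            valid.append({**order, "country": code, "country_name": country_codes[code]})
        else:
            rejected.append(order)
    return valid, rejected


orders = [
    {"order_id": 1, "country": "us"},
    {"order_id": 2, "country": "Germany"},  # free text rather than a code
    {"order_id": 3, "country": "JP "},
]
valid, rejected = classify_orders(orders, COUNTRY_CODES)
```

Here the free-text value "Germany" is rejected because it is not one of the predefined codes, which is the kind of inconsistency a shared reference table is meant to surface.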

For example, the financial industry uses security identifiers, such as International Securities Identification Numbers (ISINs) or ticker symbols, as reference data that identifies financial instruments such as bonds, stocks, and derivatives. In e-commerce, product codes and categorization make standardized inventory management and pricing much easier. In healthcare, medical coding systems help to accurately classify and bill for medical services.
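Because identifiers like these follow a fixed format, they can be checked mechanically. The sketch below validates the check digit of an ISIN, assuming the standard layout of a two-letter country prefix, nine alphanumeric characters, and a final check digit computed with the Luhn algorithm. The function name `isin_is_valid` is a hypothetical choice for this example.

```python
def isin_is_valid(isin: str) -> bool:
    """Return True if the string has the ISIN layout and a correct check digit."""
    isin = isin.strip().upper()
    if (
        len(isin) != 12
        or not isin.isascii()
        or not isin.isalnum()
        or not isin[:2].isalpha()
        or not isin[-1].isdigit()
    ):
        return False

    # Letters become two-digit numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)

    # Luhn check: starting from the right, double every second digit.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A check like this only confirms that an identifier is well formed. Whether it refers to a real instrument still has to be confirmed against the reference data itself.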

Master Data and Master Data Management Explained

Two types of master data management have developed: analytical and operational. Operational master data management deals with the core data an organization uses to do business. This data must be accurate and trustworthy to prevent transaction and delivery snags and to support the smooth flow of business.

Analytical master data management systems use master data to avoid problems arising from conflicting and redundant information. Without a shared set of master data, different departments tend to develop their own versions, resulting in multiple listings for the same entity, with errors creeping in.

To better understand master data, consider what master data is not. 

  • It is not transactional data: Transactional data is generated by the various applications supporting the business’s day-to-day processes of selling and buying. While this information is recorded and stored, transactional data is not used on a regular basis.
  • It is not unstructured data: Freeform or unstructured data is neither organized nor formatted. Freeform data consists of unstructured text, numbers, dates, and basically any data that isn’t formatted/transformed to work with the organization’s system. Unstructured data can include the written content of web pages or documents, emails, surveys, journal articles, marketing research, etc.

With the appropriate software, master data management can provide a broad range of services, such as data cleansing, data transformation, and data integration processes. As new data sources are added, the master data management software can identify, collect, transform, and integrate new data into the master data system. 
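The cleanse, transform, and integrate steps can be pictured with a very small sketch. Everything here is an assumption made for illustration: the `cleanse` and `integrate` functions, the choice of email address as the matching key, and the sample records. Real MDM software uses far more sophisticated matching rules.

```python
def cleanse(record):
    """Trim whitespace, lowercase the email, and keep only digits in the phone number."""
    return {
        "email": record["email"].strip().lower(),
        "name": " ".join(record["name"].split()).title(),
        "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
    }


def integrate(master, incoming):
    """Merge cleansed records into the master store, keyed by email (an assumed match key)."""
    for raw in incoming:
        rec = cleanse(raw)
        existing = master.get(rec["email"], {})
        # Keep existing non-empty values; fill in only fields that are missing or empty.
        merged = {**rec, **{k: v for k, v in existing.items() if v}}
        master[rec["email"]] = merged
    return master


master = {}
crm = [{"email": " Ana@Example.com", "name": "ana  lopez", "phone": ""}]
billing = [{"email": "ana@example.com ", "name": "Ana Lopez", "phone": "(555) 010-0199"}]
integrate(master, crm)
integrate(master, billing)
```

After both sources are integrated, the master store holds a single record for the customer, with the phone number filled in from the second source.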

Examples of commonly used master data are listed below:

  • Customer data: Generally considered the most commonly used form of master data, basic customer data includes billing addresses, email addresses, and phone numbers, but has grown to include individual shopping preferences based on previous purchases.
  • Product data: This type of data lists all the information needed to support the design, production, delivery, and maintenance of a business’s products. Product data includes the technical specifications, drawings, parts, and assemblies. It may also include the bills of materials, work instructions, and approved suppliers.
  • Employee data: This data should not be available to all staff, but only to a select few. It typically includes an employee’s social security number and direct deposit account, which should be kept private. Information such as their home address, phone number, and next of kin may also be listed.
  • Purchases: Data regarding large purchases and specific stock trades may be listed as master data.
  • Branch location data: The locations of branches, stores, facilities, and franchises are permanent/semi-permanent information and used on a regular basis.

Master data, combined with master data management, can be used to support data analytics. Master data is often used with analytics, in part because it is reliable, consistent, and trustworthy. For instance, a business that uses multiple systems to store its customers’ data runs the risk of each system working with a different version of that data, which can ruin an analysis that combines data from the different systems.
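The effect of conflicting versions on an analysis can be shown with a short sketch. The two source systems, the customer names, the amounts, and the `MASTER_IDS` mapping are all hypothetical values invented for this example.

```python
from collections import defaultdict

# Hypothetical extracts from two systems; the same customer appears under different spellings.
web_store = [("Acme Corp.", 1200.0), ("Globex", 300.0)]
billing_system = [("ACME Corporation", 800.0), ("Globex", 150.0)]

# Hypothetical master data: each local name maps to one master customer ID.
MASTER_IDS = {
    "Acme Corp.": "C001",
    "ACME Corporation": "C001",
    "Globex": "C002",
}


def totals_by(rows, key_fn):
    """Sum amounts per key, where key_fn turns a local name into a grouping key."""
    totals = defaultdict(float)
    for name, amount in rows:
        totals[key_fn(name)] += amount
    return dict(totals)


combined = web_store + billing_system
naive = totals_by(combined, lambda name: name)                 # groups by raw name
mastered = totals_by(combined, lambda name: MASTER_IDS[name])  # groups by master ID
```

Grouping by raw name reports three customers and splits one customer’s total in two, while grouping by master ID reports two customers with complete totals.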

Many organizations prefer, for security reasons, to limit access to master data to a small number of appropriate staff, while making reference data available to everyone in the organization.

Data Warehouses, Master Data, and Reference Data

A data warehouse is a form of data management and storage that is designed to support analytics and the development of business intelligence. Additionally, it can be used to store both master data and reference data. Data warehouses are scalable and can be expanded easily. As a business grows, its data storage needs increase, including its needs for a growing amount of master data and reference data storage. Data warehouses allow for this growth.

Data warehouses can accomplish a variety of tasks, including the development of business intelligence, and they are especially useful for analyzing significant amounts of data over extended periods of time.

Analytical master data management can be coordinated with a data warehouse, where it centralizes and consolidates the data. Data flows to the data warehouse from a variety of sources, including data collected from external sites, in-house transactional data, operational master data, and reference data. The process allows organizations to gain valuable insights from their data.

Reference data can be stored in a data warehouse, typically as a subdivision of the master data. Data warehouses often organize the data using a star or snowflake schema, with a central “fact” table that contains the primary data, and additional “dimension” tables, which contain reference data related to the primary data. For instance, in a banking data warehouse, the fact table might contain banking data, such as the amount of a loan, the date the loan was made, and the customer who received the loan, while the dimension tables (reference data) could contain product information, customer demographics, and location data.
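A minimal sketch of the banking example as a star schema follows, using Python’s built-in `sqlite3` module with an in-memory database. The table names, column names, and sample rows are assumptions made for illustration, and a real warehouse would have many more dimensions and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive data that the fact table points to.
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name TEXT);

    -- Fact table: one row per loan, with keys into each dimension.
    CREATE TABLE fact_loan (
        loan_id     INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        loan_date   TEXT,
        amount      REAL
    );

    INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Small business');
    INSERT INTO dim_product  VALUES (10, 'Mortgage'), (20, 'Auto loan');
    INSERT INTO fact_loan VALUES
        (100, 1, 10, '2024-01-15', 250000.0),
        (101, 2, 20, '2024-02-01',  30000.0),
        (102, 1, 20, '2024-03-09',  18000.0);
""")

# Join the fact table to its dimensions to total loan amounts by segment and product.
rows = conn.execute("""
    SELECT c.segment, p.product_name, SUM(f.amount)
    FROM fact_loan AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    JOIN dim_product  AS p ON p.product_id  = f.product_id
    GROUP BY c.segment, p.product_name
    ORDER BY c.segment, p.product_name
""").fetchall()
```

The fact table stores only keys and measures, so descriptive values such as the segment or product name are defined once in a dimension table and reused by every query.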