Data Quality metrics are a measuring system that allows the “quality of data” to be evaluated. Data Quality metrics can be used to determine how useful and relevant data is, and they help to separate high-quality data from low-quality data. It is much easier, and safer, to make business decisions based on reliable information.
Poor information (based on low-quality data) can have a negative impact on business decisions. Poor quality data can result in damage to business performance, a reduction in innovation, and less competitiveness.
People developing business intelligence and performing data analytics should learn how Data Quality is measured, and which dimensions to use. While the Data Quality will not be perfect, it is important to get it as clean as possible, particularly if it is going to be used for business intelligence. When the right metrics are applied, enterprises know where they stand, and the goals to aim for.
What Can Negatively Affect Data Quality?
There are many factors that can affect Data Quality, and determining them is important for eliminating the damaging ones. (There might also be some positive factors worth paying attention to.) Here are the most common damaging factors:
- Manual Entry: Mistakes made during manual entry of data
- Data Decay: Data can start off being accurate, but as time passes, things change and the data never gets updated
- Data Movement: Errors can appear when data is moved from one system to another, especially when it is transformed from one format to another
Metrics vs KPIs
Metrics and KPIs (key performance indicators) are often confused. Key performance indicators are a way of measuring performance over a period of time, while working toward a specific goal. KPIs supply target goals for teams, and milestones to measure progress. Data Quality metrics, on the other hand, use dimensions to measure the quality of data.
It is, unfortunately, easy to use the terms interchangeably, but they are not the same thing. Key performance indicators can help develop an organization’s strategy and focus. Metrics are more of a “business as usual” measurement system. A KPI is one kind of metric.
Importance and Benefits of Data Quality Metrics
Business organizations struggle to adapt to the flood of new technologies and data processing techniques. The ability to not only adjust to changing circumstances, but to eclectically embrace the best of those technologies and techniques, can lead to long-term improvements, help to minimize work stress, and increase profits. Using high-quality data for decision-making can be the difference between success and failure.
The key goals of a business are to become more profitable and successful, and high data quality (accurate information) can help to achieve those goals.
High-quality data can benefit businesses from all industries and sectors. Some of the benefits gained from understanding and maintaining high quality data are:
- Improved Marketing: Data accuracy plays a vital role in marketing, ranging from the customer experience to marketing research. The volume of marketing data currently available can help in achieving certain business goals.
- Improved Decision-making: Data quality metrics can provide accurate data for decision-making purposes. In turn, productivity increases, as does confidence in business intelligence and data analytics.
- Improved Customer Satisfaction: Customer service is currently a very data-oriented process. Today’s customers have learned to expect efficient, personalized experiences, and become upset if those expectations are broken. (Those pesky broken expectations.)
- Cost Reductions: This benefit is based on many dimensions. Data reliability allows businesses to finish more projects in less time.
The Data Steward
A data steward provides oversight for data within an organization, and is also responsible for ensuring the quality of an organization’s data assets. This includes understanding and measuring the data for quality, or accuracy. By using the right Data Quality tools, a data steward can develop a system that supports high quality data.
Dave Wells, a Data Management instructor and analyst at Eckerson, stated:
“Today, data stewards are at the center of collaboration, coordination, and communication among data consumers, data governors, and Data Management staff.”
Although a large number of Data Quality dimensions have been created and defined, there are a few key dimensions that data stewards can use with relative ease to assure quality data.
Key Data Quality Dimensions
Data Quality dimensions provide a way to categorize different types of measurements for Data Quality. This system of measurements provides an underlying structure that supports trust in the data being used.
Generally speaking, different dimensions apply different measurement systems. For instance, accuracy compares the data (a symbolic representation of the real world, or at least parts of it) with the actual real world, while measuring completeness requires determining the amount of missing information in forms. Listed below are the basic dimensions needed to make reliable assessments:
- Accuracy: Data accuracy is critical in large organizations, where the penalties for failure are high. In the financial sector, data accuracy is usually black or white—it either is or isn’t accurate. Measuring accuracy involves finding the “percentage of values” that are correct, as compared to reality. This can be done by taking samples and using statistics.
- Completeness: This dimension is used to confirm that all the needed information is included in forms and applications.
- Consistency: Maintaining a synchronous relationship with other databases is essential to ensuring data remains consistent, and is regularly updated. (Certain software systems can help with this.)
- Integrity: Measuring data integrity involves using all the data quality metrics listed above. (Best to take those measurements first, record them, and then use them to measure integrity.) An example of metrics for integrity is the percentage of data that is “the same” across multiple systems.
- Duplication: Duplication can be a source of inconsistency. When the original document is updated, the copy may not be. The problem arises when someone uses the copy, instead of the updated original.
- Timeliness: This dimension shows the accuracy of data during specific points in time. For example, a customer moves and notifies their bank, but the bank doesn’t process the change-of-address for three days. These kinds of timeliness delays can lead to mistakes. A metric for timeliness is the percentage of accurate data that can be obtained within a certain amount of time (months, weeks, or days).
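Several of the dimensions above reduce to a simple percentage. The following is a minimal Python sketch of how such percentages might be computed, not a standard implementation. The sample records, the `reference_city` lookup (standing in for verified real-world values), and all function names are hypothetical. Timeliness is approximated here as the share of records updated within a chosen window, which is only one possible reading of that dimension.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "email": "ann@example.com", "city": "Austin", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "city": "Boston", "updated": date(2023, 3, 2)},
    {"id": 3, "email": "cy@example.com", "city": "Denver", "updated": date(2024, 1, 12)},
    {"id": 3, "email": "cy@example.com", "city": "Denver", "updated": date(2024, 1, 12)},
]

# Hypothetical verified values for a sample of records (the "real world").
reference_city = {1: "Austin", 2: "Chicago", 3: "Denver"}


def completeness(rows, field):
    """Share of rows in which `field` is filled in."""
    return sum(r[field] is not None for r in rows) / len(rows)


def duplication(rows, key):
    """Share of rows whose key value repeats an earlier row."""
    seen, dupes = set(), 0
    for r in rows:
        if r[key] in seen:
            dupes += 1
        seen.add(r[key])
    return dupes / len(rows)


def accuracy(rows, field, reference):
    """Share of sampled rows whose `field` matches the verified value."""
    sampled = [r for r in rows if r["id"] in reference]
    return sum(r[field] == reference[r["id"]] for r in sampled) / len(sampled)


def timeliness(rows, as_of, max_age_days):
    """Share of rows updated within `max_age_days` of `as_of` (an assumed proxy)."""
    return sum((as_of - r["updated"]).days <= max_age_days for r in rows) / len(rows)


print(f"completeness(email): {completeness(records, 'email'):.0%}")
print(f"duplication(id):     {duplication(records, 'id'):.0%}")
print(f"accuracy(city):      {accuracy(records, 'city', reference_city):.0%}")
print(f"timeliness(30 days): {timeliness(records, date(2024, 1, 31), 30):.0%}")
```

In practice, the accuracy check would run against a sample that has been verified by hand or against a trusted source, since the full "real world" is rarely available for comparison.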
Data Governance and Data Quality
After an organization decides to focus on the issue of Data Quality, analysts can use data quality metrics to identify data errors overall. Data Governance, however, can be difficult to measure, primarily because it involves not just data quality metrics, but also new processes, new expectations, and new responsibilities. It is important to consider different metrics that can reflect the complexities of Data Governance.
Although Data Governance and Data Quality are different disciplines, they do work in parallel.
When a focus on Data Quality is applied to a Data Governance program, Data Quality provides a much more complex view of quality. It offers a view of quality that expresses how the organization uses data, and allows data stewards to identify issues needing to be addressed.
Business value measures track the value gained from the program. Examples of business value include increases in profits, reductions in costs, and increases in productivity.
Accountability and compliance metrics measure the adoption of business standards and the Data Governance program’s performance. The following elements can be used in evaluating accountability and compliance. The number of:
- Departments using data standards
- Information systems sharing data standards
- Business processes using data standards
- Production reports using data standards
- People in the organization using data standards
Simple Data Quality Metrics
There are some Data Quality measurement systems that focus on processes with error rates. The errors are measured and then used as indicators of the data’s quality. These are the metrics companies can use in measuring their data’s quality:
- Ratio of Errors to Data: This metric shows the number of errors compared to the size of the data set. Some common errors include incomplete, duplicated, or missing entries.
- Number of Empty Values: This measures the number of empty fields in a data set, or data that is located in the wrong field.
- Data Transformation Errors: Data transformation converts data from one format to another. The number of errors is compared to the number of transformations.
- Email Bounce Rates: Emails that are returned suggest your data is of low quality. Often, emails are sent to the wrong address, and bounce back because of outdated or missing information.
- Dark Data: Dark data is data that was acquired through different computer network operations, but is not used for decision-making or gaining insights.
- Data Storage Costs: When the cost of storing data increases for an organization while the amount of data being used remains the same, there are likely Data Quality issues. The bulk of the growing data is being stored, but not used.
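The error-rate metrics in this list are all ratios of a problem count to a total. The short Python sketch below shows one way they might be calculated. The sample rows, the email counts, and the helper names (`ratio`, `empty_value_count`, `to_int`) are made up for illustration, and the string-to-integer conversion stands in for whatever transformation an organization actually runs.

```python
def ratio(part, whole):
    """Return part / whole, or 0.0 when there is nothing to measure."""
    return part / whole if whole else 0.0


def empty_value_count(rows):
    """Count fields that are None or blank across all rows."""
    return sum(
        1
        for row in rows
        for value in row.values()
        if value is None or str(value).strip() == ""
    )


def to_int(value):
    """Hypothetical transformation: string to int, or None when it fails."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Hypothetical data set with one blank name, one missing email, one bad age.
rows = [
    {"name": "Ann", "age": "34", "email": "ann@example.com"},
    {"name": "", "age": "n/a", "email": "bob@example.com"},
    {"name": "Cy", "age": "51", "email": None},
]

# Number of empty values, and its ratio to the size of the data set.
empties = empty_value_count(rows)             # 2 empty fields
total_fields = sum(len(row) for row in rows)  # 9 fields in total
empty_ratio = ratio(empties, total_fields)

# Data transformation errors compared to the number of transformations.
converted = [to_int(row["age"]) for row in rows]
transformation_error_rate = ratio(converted.count(None), len(converted))

# Email bounce rate, using assumed campaign counts.
emails_sent, emails_bounced = 200, 14
bounce_rate = ratio(emails_bounced, emails_sent)

print(f"empty values:          {empties} of {total_fields} ({empty_ratio:.1%})")
print(f"transformation errors: {transformation_error_rate:.1%}")
print(f"email bounce rate:     {bounce_rate:.1%}")
```

Recording these ratios on a regular schedule, rather than once, is what makes them useful as indicators.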
Dashboards for Data Quality
Data Quality dashboards are information management tools that can visually track, analyze, and display the metrics used in measuring Data Quality. Data Quality dashboards can be customized to a business’s specific needs. They can be used to provide a snapshot of the data’s quality and can also use historical data to identify trends and patterns.
There is specific software available, as well as some dashboards that can be adapted. Also, a Data Quality dashboard can be built.
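For a sense of what building one involves, the snapshot-plus-trend idea can be sketched in a few lines of Python as a plain-text report. This is only an illustration under assumed inputs: the `history` list, its scores, and the `trend` and `render_dashboard` functions are hypothetical, and a real dashboard would normally read from stored measurements and present them graphically.

```python
# Hypothetical metric history: one dict of scores (0.0 to 1.0) per period.
history = [
    {"period": "2024-01", "completeness": 0.91, "accuracy": 0.88},
    {"period": "2024-02", "completeness": 0.93, "accuracy": 0.85},
    {"period": "2024-03", "completeness": 0.96, "accuracy": 0.84},
]


def trend(values):
    """Compare the latest value with the earliest: 'up', 'down', or 'flat'."""
    if len(values) < 2 or values[-1] == values[0]:
        return "flat"
    return "up" if values[-1] > values[0] else "down"


def render_dashboard(snapshots):
    """Build a text snapshot of the latest period with a trend per metric."""
    latest = snapshots[-1]
    lines = [f"Data Quality snapshot for {latest['period']}"]
    for metric in (key for key in latest if key != "period"):
        series = [snapshot[metric] for snapshot in snapshots]
        lines.append(f"{metric:<14}{latest[metric]:>6.0%}  trend: {trend(series)}")
    return "\n".join(lines)


print(render_dashboard(history))
```

Comparing only the first and last periods is a deliberately crude trend measure. It is enough to flag a metric that is drifting, but it ignores what happened in between.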