Succeeding as a data-driven business means understanding data problems quickly, before spending time and money on a ready-made solution. As Albert Einstein reportedly said, “If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” Organizations need to use their time wisely by understanding where Data Governance and Data Quality issues overlap and where they differ before attempting to solve them.
Clearly, working with product or customer data from disparate sources without considering Data Quality can lead to disastrous results. According to Gartner, poor Data Quality costs the average organization upwards of $8.2 million a year. But addressing only Data Quality risks leaving Data Governance issues, such as security, regulation, and exploration, uncovered. The price of non-compliance can be huge. Starting in May 2018, penalties for businesses that fail to comply with the European General Data Protection Regulation (GDPR) can reach around $22 million or four percent of total business turnover.
It may seem tempting to use a Data Governance solution to solve both Data Quality and Data Governance problems; however, this approach may compromise the timeliness and completeness of Big Data for business deliverables. For example, a telecommunications company that needs transparency on mobile service for consumers and operators requires a different kind of solution than a telecommunications business that lacks Data Management and ownership. Understanding the similarities and differences between Data Governance and Data Quality means understanding the problem and spending less time and fewer resources on its resolution.
Data Governance vs Data Quality: What Are They?
Think about Data Quality the next time you grab tomatoes to eat. Data Quality describes a system for keeping something usable (e.g., grabbing tomatoes from the fridge or counter results in eating firm, sweet-smelling, shiny tomatoes instead of soft-spotted, putrid, moldy ones). Data Quality means being able to rely on the accuracy, consistency, and completeness of data so that it is useful across the enterprise. Gartner breaks the Data Quality problem down further into these aspects:
- Parsing and standardization
- Generalized “cleansing”
- Matching
- Profiling
- Monitoring
- Enrichment
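To make the first few aspects concrete, here is a minimal Python sketch of parsing and standardization followed by matching. The record layout, field names, and helper functions are hypothetical and chosen only for illustration; real Data Quality tools use far richer rules than exact matching on cleaned values.

```python
import re

def standardize_phone(raw: str) -> str:
    """Parse a phone number written in any common format and keep digits only."""
    return re.sub(r"\D", "", raw)

def standardize_name(raw: str) -> str:
    """Collapse repeated whitespace and normalize letter case."""
    return " ".join(raw.split()).title()

def deduplicate(records):
    """Match records on their standardized (name, phone) pair; keep the first of each."""
    seen = set()
    unique = []
    for rec in records:
        key = (standardize_name(rec["name"]), standardize_phone(rec["phone"]))
        if key not in seen:
            seen.add(key)
            unique.append({"name": key[0], "phone": key[1]})
    return unique

# Hypothetical customer records from two source systems.
customers = [
    {"name": "ada  lovelace", "phone": "(555) 010-2000"},
    {"name": "Ada Lovelace", "phone": "555.010.2000"},
    {"name": "Grace Hopper", "phone": "555-010-3000"},
]
clean_customers = deduplicate(customers)
print(clean_customers)
```

The first two records differ only in formatting, so they collapse into one entry once standardized. Matching on exact cleaned values is the simplest possible choice here; fuzzy matching would be needed for misspellings.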
Unlike Data Quality, Data Governance describes the practices and processes put in place to manage data assets. This includes authority and control (planning, monitoring, and enforcement). Think of Data Governance as a parent asking a much older child to get some produce (e.g., tomatoes) for supper. The parent will likely suggest that the child buy tomatoes from a reputable store or farm stand. If the child decided to take a shortcut and pick a tomato from a neighbor’s garden without permission, the parent would probably not approve of this process. Eran Levy distills Data Governance problems further into:
- Data Availability
- Data Consistency
- Data Selection
- Consistent Analytics, Metrics, and Reporting
- Data Compliance
How Do Data Quality and Data Governance Overlap?
Data Governance and Data Quality problems overlap in processes that address data credibility. A high level of Data Quality can be measured by confidence in the data. From a Data Quality perspective, this confidence can be built with Data Quality tools that handle parsing and standardization, matching, generalized “cleansing”, profiling, and monitoring. Goals formulated using the Data Quality Framework, such as standardizing data represented in multiple formats, ensuring there are no duplicates, or gaining a better understanding of data through Metadata or Master Data Management, fall under data cleansing. These Data Quality and data cleansing goals intersect with those of Data Governance.
Part of Data Governance practice requires clean and usable data. Business goals that call for resolving past and current data issues, and for monitoring and tracking clean data, fall under data cleansing and converge with Data Quality. In a Data Governance context, Data Stewards ensure pristine data, among other tasks. They act as part of Data Governance to keep data clean around Data Quality rules and policies, Data Integration, and Business Glossary standardization. Specific data cleansing goals intersect with Data Quality activities that ensure completeness, accuracy, and consistency. Processes that deal with developing confidence in data fall in both the Data Quality and Data Governance domains.
Data Governance vs Data Quality: The Contrasts
While Data Quality and Data Governance overlap on data credibility issues, each uses a different framework for viewing and describing data. As a result, each framework defines problems differently. Questions posed from a Data Quality perspective include:
- How useful is my data across the enterprise?
- How complete is my data?
- What is my data structure?
- How quickly can I upload and download my data?
Addressing these questions lends itself nicely to a first step toward Master Data Management (MDM) implementation and integration. Take another example: a call center integrating others’ products. The company requires not only clean data for the telemarketers, but also analytics and reports for the call center to fuel sales. Such a problem may be best addressed with a complete MDM package. Alternatively, perhaps data looks fine in one system but gets really dirty when it is extracted to a different context, costing data analysts and business owners extra time. In that case, a specific Data Quality product from a vendor specializing in such issues may be a better fit.
The Data Governance viewpoint, in contrast to the Data Quality one, does not concern itself directly with data usability between marketing and sales or with how to set up Master Data Management in the data warehouse. The Data Governance framework looks at authority and control over data assets. The Data Governance approach asks questions like the following:
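One of the Data Quality questions above, “How complete is my data?”, can be answered with a simple profile. The sketch below is an illustration only: the `completeness` function and the sample order records are hypothetical, and it treats a field as missing when it is absent, `None`, or an empty string.

```python
def completeness(records, fields):
    """Return, per field, the share of records that hold a non-empty value."""
    total = len(records)
    profile = {}
    for field in fields:
        filled = sum(1 for rec in records if rec.get(field) not in (None, ""))
        profile[field] = filled / total if total else 0.0
    return profile

# Hypothetical order records with some gaps.
orders = [
    {"customer": "Ada", "email": "ada@example.com", "country": "UK"},
    {"customer": "Grace", "email": "", "country": "US"},
    {"customer": "Alan", "email": None, "country": None},
    {"customer": "Edsger", "email": "edsger@example.com", "country": "NL"},
]
profile = completeness(orders, ["customer", "email", "country"])
for field, share in profile.items():
    print(f"{field}: {share:.0%} complete")
```

A profile like this helps show whether the gaps are spread across all fields or concentrated in one, which in turn hints at whether a targeted Data Quality product or a broader MDM effort is the better fit.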
- Who is responsible for what aspects of my data?
- How well does the data comply with the GDPR so that the company can do business in Europe?
- How will new hires and existing staff be trained on company data policies?
- How can data scientists be assured access to raw data to identify new business opportunities?
These questions can, for example, reveal conflicts between the data control needs of Business Analysts and Data Scientists and the company’s requirement to be legally compliant.
A Data Governance solution, such as Collibra’s Catalog, provides an Amazon-style checkout application that addresses these issues by enforcing data authority and control. Verizon Wireless experienced such a problem when it needed to overcome reporting discrepancies driven by a lack of management and control. The Data Governance perspective may also identify data silos among different departments. These silos result in a lack of access, hindering staff and customers in communicating essential information, and create pockets of adoption. In this case, a Data Stewardship solution from BackOffice Associates would set and enforce data policies aimed at connecting all of an organization’s data for analysis and use.
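Two of the Data Governance questions above, who is responsible for which data and who may access it, can be sketched as a small asset register. Everything here is an assumption made for illustration: the `DataAsset` class, its fields, the role names, and the review rule do not come from any particular governance product, and the check is not a legal test of GDPR compliance.

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    steward: str                      # person accountable for the asset; "" if unassigned
    contains_personal_data: bool
    allowed_roles: set = field(default_factory=set)

def can_access(asset: DataAsset, role: str) -> bool:
    """Authority and control: only roles listed on the asset may read it."""
    return role in asset.allowed_roles

def assets_needing_review(assets):
    """Flag assets that hold personal data but have no steward assigned."""
    return [a.name for a in assets if a.contains_personal_data and not a.steward]

# Hypothetical register of two data assets.
register = [
    DataAsset("raw_clickstream", "J. Rivera", False, {"data_scientist", "analyst"}),
    DataAsset("customer_contacts", "", True, {"analyst"}),
]

print(can_access(register[0], "data_scientist"))
print(can_access(register[1], "data_scientist"))
print(assets_needing_review(register))
```

Even this toy register surfaces the kind of conflict the questions are meant to find: the data scientist can reach the raw clickstream but not the customer contacts, and the contacts asset has no accountable owner.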
Data Governance vs Data Quality: Final Conclusions
Companies need to know how to define data problems as Data Quality and/or Data Governance ones. While there is some overlap between the two perspectives, the questions posed by each differ and affect how data problems are understood. It may seem easier just to focus on milestones and results when analysts or other staff are spending 75 percent of their time cleaning up data to prepare it for use. But this type of issue could be the result of a bad data transformation from one system to another, of staff being unable to check out information, or of a combination of both.
The remedy for what seems like quite a simple problem truly depends on comprehending the problem. Data Quality and Data Governance concepts provide tools for defining the data problems at hand. Using both frameworks reduces the risk of seeing an issue through only one lens and investing in a solution that does not address the real data problem that a company wants to fix.
Photo Credit: Titima Ongkantong/Shutterstock.com