“To tackle today’s Data Quality challenges, you need a more strategic approach,” said Nigel Turner, Principal Consultant, Global Data Strategy. Turner spoke at the DATAVERSITY® Enterprise Data Governance Online (EDGO) Conference about Data Quality Management, what it entails, and how to succeed by aligning it closely with two other disciplines.
What is Data Quality Management?
The DAMA-DMBoK2 defines Data Quality Management as the planning, implementation and control of activities that apply quality management techniques to data in order to assure it is fit for consumption and meets the needs of data consumers. Quality data is demonstrably fit for purpose because it can be proven to meet the needs of the users of the data, said Turner.
The level of accuracy needed to consider data fit for purpose varies based on the end use of the data. One hundred percent accuracy is not always achievable, nor is it necessary in every context. When invoicing customers, for example, 100% should always be the target, because that directly affects income, he said.
For a marketing prospect database, on the other hand, 85% is probably acceptable. The effort and costs to achieve 100% accuracy should be weighed against any potential gains. In order to prove the effectiveness of Data Quality measures, baselines must be established so that when improvements are made, resulting outcomes can be measured.
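The idea of purpose-specific targets and a measured baseline can be sketched in a few lines of Python. This is only an illustration: the `TARGETS` table reuses the two figures quoted above, while the function names and the sample records are assumptions made for the example, not anything Turner presented.

```python
# Hypothetical accuracy targets per use, based on the two figures quoted above.
TARGETS = {"invoicing": 1.00, "marketing_prospects": 0.85}


def accuracy_rate(checked_records):
    """Share of records marked accurate; each item is (record_id, is_accurate)."""
    if not checked_records:
        raise ValueError("no records to measure")
    accurate = sum(1 for _, ok in checked_records if ok)
    return accurate / len(checked_records)


def fit_for_purpose(rate, use):
    """True when the measured rate meets the target for this use."""
    return rate >= TARGETS[use]


def improvement(baseline_rate, current_rate):
    """Percentage-point change measured against the baseline."""
    return round((current_rate - baseline_rate) * 100, 2)


# Made-up sample: 3 of 4 records accurate before the fix, 9 of 10 after it.
baseline = accuracy_rate([(1, True), (2, False), (3, True), (4, True)])
current = accuracy_rate([(i, i != 7) for i in range(1, 11)])

print(fit_for_purpose(current, "marketing_prospects"))
print(fit_for_purpose(current, "invoicing"))
print(improvement(baseline, current))
```

The same measured rate can pass for one use and fail for another, which is why the target belongs to the use rather than to the data set.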
Data Quality Criteria
Turner said these five key parameters are used to evaluate Data Quality:
- Accuracy: Does the data reflect the real world? “If an organization has my name and address, do I actually live at that address?”
- Completeness: Are all the elements of the address there? Is any needed data missing?
- Reliability: When data is duplicated, is the data consistent across all sources? “If my name and address is held in six different data sources, are all six data sources accurate?”
- Accessibility: Are the right people able to access the right data?
- Timeliness: Can they do so in a timely way when they need it, rather than when it is too late to be of any value or use to them?
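Two of these parameters, completeness and reliability, lend themselves to simple automated checks. The sketch below is a minimal Python illustration under stated assumptions: the required field list, the normalization rule (trim and lowercase), and the sample records are all hypothetical. Accuracy, accessibility, and timeliness are not covered here because they depend on comparing data with the real world or with how it is used.

```python
# Hypothetical set of fields that every address record is expected to carry.
REQUIRED_FIELDS = ("name", "street", "city", "postcode")


def completeness(record):
    """Fraction of required fields that are present and non-blank."""
    filled = sum(1 for f in REQUIRED_FIELDS if str(record.get(f) or "").strip())
    return filled / len(REQUIRED_FIELDS)


def is_consistent(copies):
    """True when every copy of a record holds the same normalized values."""
    normalized = {
        tuple(str(c.get(f) or "").strip().lower() for f in REQUIRED_FIELDS)
        for c in copies
    }
    return len(normalized) <= 1


# Made-up copies of one customer's record held in three systems.
crm = {"name": "A. Example", "street": "1 High St", "city": "Leeds", "postcode": "ZZ1 1ZZ"}
billing = {"name": "A. Example", "street": "1 High St", "city": "leeds ", "postcode": "ZZ1 1ZZ"}
marketing = {"name": "A. Example", "street": "", "city": "Leeds", "postcode": None}

print(completeness(crm))
print(completeness(marketing))
print(is_consistent([crm, billing]))
print(is_consistent([crm, billing, marketing]))
```

Consistent copies are not necessarily accurate ones: all six sources in Turner's example could agree with each other and still hold an old address.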
How Poor Data Quality Impacts Organizations and Individuals
Regulatory: Data Quality is becoming increasingly important due to new laws and regulations for data use, and the consequences for violating them.
Decision-Making: “If your data isn’t fit for purpose, then being data-driven can lead you to some very unwanted outcomes,” he said. If performance data is inaccurate or missing, for example, bad decisions can result.
Lost Revenue and Increased Costs: Failing to bill customers on time, or failing to bill for all services or products due to poor quality data creates a loss of revenue. Costs increase when the wrong product is sent to the customer. The customer has to deal with a return and the right product must be re-sent.
Reputation costs: Bad data stories can damage the brand and the reputation of the company, and in some cases, cause real harm.
Data Quality Fail: Two Cautionary Tales
For a recent Prime Day sale event, Amazon featured a Canon telephoto lens, which normally retails at around $13,000, on sale for $9,498. The ad for the sale, however, listed the price as $94.98. Although Amazon discovered and corrected the error within a few hours, the listing had spread quickly on social media, and hundreds of people had already purchased the lens at the price of $94.98.
Despite the loss of $7,738 per lens sold, Amazon chose to honor the deal at that price rather than risk possible legal battles with customers who bought the product in good faith at the price advertised. Had they not fixed the problem and put things right, it could have affected customer loyalty to their brand, he said.
A man entered a National Health Service hospital in the U.K. for a cystoscopy, a diagnostic procedure that entails inserting a camera into the bladder to investigate potential problems. The patient’s name was very similar to that of another patient also awaiting surgery, and because no one verified the identity of the patient, he was given a circumcision instead of a cystoscopy. The hospital was left with one understandably disgruntled patient taking legal action, and the case shows, Turner said, that poor Data Quality can cause personal harm as well as economic damage.
Data Quality Is in Worse Shape Than Most Managers Realize
Turner shared that in a 2017 study, Harvard Business Review asked 75 senior executives in different organizations to review 100 records at random from a key data source. They found that only three in 100 records—3% of the records reviewed—met expected Data Quality thresholds. In other words, 97% had significant Data Quality problems. Although much of that 97% can be attributed to old data, even among newer records, about half also had Data Quality failures.
Industry Impact: Four Studies Paint the Picture
Over half of organizations believe that at least 26% of their data is inaccurate (BARC 2019). On average, poor Data Quality costs companies between 15 and 25% of revenue, according to a 2017 report by MIT Sloan.
According to a 2016 IBM study, the U.S. economy loses $3.1 trillion a year due to poor Data Quality, and in 2017, Royal Mail Data Services concluded that poor quality data costs U.K. companies an average of 6% of their annual revenues.
Why Does Poor Data Quality Persist?
Data Quality is complex because businesses and organizations are complex. Twenty years ago, the norm was a mainframe with one database, accessed by end users on dumb terminals. Managing Data Quality in that environment was a lot simpler.
Since then, the rate of growth of data has increased significantly, and it’s very hard to keep a handle on it, Turner said. Collection points for data are now ubiquitous and the variety of data now includes more structured, semi-structured, and unstructured data. Potential for duplication and errors increases exponentially now that people carry multiple devices and have multiple contact options.
The business environment is constantly changing, with mergers, acquisitions, startups and closures, so that a business-to-business contact list now decays at a rate of 2% a month, making 75% of that database useless within three years.
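The article does not say how the three-year figure is derived from the monthly rate. As a rough check, the Python sketch below works out the loss under two assumed readings of "2% a month": a straight-line loss of 2% of the original list each month, and a compounding loss of 2% of whatever is still valid. Both readings are assumptions made for illustration.

```python
MONTHLY_DECAY = 0.02  # rate quoted above
MONTHS = 36           # three years

# Reading 1 (assumption): 2% of the original list goes stale every month.
linear_loss = MONTHLY_DECAY * MONTHS

# Reading 2 (assumption): 2% of the still-valid contacts go stale every month.
compound_loss = 1 - (1 - MONTHLY_DECAY) ** MONTHS

print(f"straight-line loss after {MONTHS} months: {linear_loss:.0%}")
print(f"compounding loss after {MONTHS} months: {compound_loss:.0%}")
```

The straight-line reading gives about 72%, close to the quoted 75%, while the compounding reading gives a little over half. Either way, a large share of an unmaintained contact list is stale within three years.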
Also, a lack of understanding about data’s role can lead to a lack of responsibility. Poor Data Quality is a business problem, not an IT problem, he said, and if no one is formally responsible for the improvement of data, it never gets fixed.
People Make Mistakes
Very often when people don’t get the training they need, they misunderstand the meaning of data entry fields. A simple lack of commonly understood definitions for data collection can lead to complicated issues, and Turner presented the following example:
A telephone company in Wolverhampton, U.K., discovered an alarming rate of vandalism to their equipment in the community and put together a plan to reduce it. Turner was hired to help find a solution, and his team went to work analyzing the vandalism data and how it was collected.
In the course of that process, they sat with the front-line workers at the company who entered the data about when and where the vandalism occurred. One woman, whose name was Veronica, took a call about a flooded basement while they were there. In the field designated for the reason for the call, rather than “F” for flood, she put her initial, “V,” (unbeknownst to her, the company’s code for “vandalism”) to indicate that she’d handled it. Every call she had taken was erroneously reported as vandalism.
“The moral of that story is that it doesn’t matter how good your systems are,” he said. “If you don’t train people to input the data correctly, they make mistakes.”
Cleaning Up the Mess
Data scientists are now spending 80% of their time organizing and cleaning data, instead of using their expertise to provide business insights, and according to a 2018 study by RapidMiner, those tasks are the least enjoyable part of working as a data scientist.
Fixing data problems by hand simply isn’t viable given the volume and velocity of data today. “Automating some of that data prep is absolutely key to getting the best out of your analytics capabilities,” Turner said.
Data validation must now happen in real time, stopping the creation of bad data before it occurs, rather than waiting until it goes wrong and then trying to fix it. To tackle both traditional challenges and new data challenges, approaches should be automated, reusable and linked to Data Architecture. Tool sets should work in many different environments, he said.
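As a minimal sketch of validating at the point of entry, the Python below checks a call record before it is stored and keeps the rules in a reusable table. Everything here is hypothetical: the field name `reason_code`, the `ValidationResult` type, and the code table, which contains only the two codes mentioned in the Wolverhampton story.

```python
from dataclasses import dataclass

# Hypothetical code table; only "F" (flood) and "V" (vandalism) appear in the article.
REASON_CODES = {"F": "flood", "V": "vandalism"}


@dataclass(frozen=True)
class ValidationResult:
    ok: bool
    message: str


def validate_reason_code(raw):
    """Reject blank or unknown codes; echo the meaning of a valid one."""
    code = (raw or "").strip().upper()
    if not code:
        return ValidationResult(False, "reason code is required")
    if code not in REASON_CODES:
        return ValidationResult(False, f"unknown reason code {code!r}")
    return ValidationResult(True, f"{code} = {REASON_CODES[code]}")


def check_record(record, validators):
    """Run every validator before the record is stored; return error messages."""
    errors = []
    for field_name, check in validators.items():
        result = check(record.get(field_name))
        if not result.ok:
            errors.append(f"{field_name}: {result.message}")
    return errors


# Reusable rule table: one entry per field, shared by every entry screen.
VALIDATORS = {"reason_code": validate_reason_code}

print(check_record({"reason_code": "f"}, VALIDATORS))
print(check_record({"reason_code": "X"}, VALIDATORS))
print(validate_reason_code("V").message)
```

A check like this would not have rejected Veronica's "V", because it is a valid code. What it can do is show "V = vandalism" back to the operator at entry time, which is why automated validation complements training rather than replacing it.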
Data Governance and Data Architecture: Keys to Success
Data Governance creates the strategic framework within which companies can address Data Quality problems across the organization, he said. “Data Quality is really the end product of the Data Governance process.” Turner sees Governance as a process that delivers better Data Quality, but adds that perhaps the most underused strategy is maximizing the synergy between Data Quality and Data Architecture.
Data models can help to identify which data areas are most important as well as highlight key entities and attributes. Once issues are identified, a physical data model can show the platform and location where problem data is held. Data architecture also provides data flow diagrams or process swim lanes that can document processes.
A Strategic Approach to Data Quality Management
Addressing Data Quality issues requires a holistic approach that combines people, process, and technology, Turner said. The traditional bottom-up approach to solving problems at the project level has no inherent overarching policy tying Data Quality to the goals of the business. This narrow focus also leads to individuals finding multiple disparate solutions to quality issues rather than sharing knowledge, leveraging solutions, and reusing them company-wide.
A more strategic approach relies on a structured Data Governance framework to set the priorities for Data Quality improvement. Data Quality improvement should be managed, he said, like any good project, with a proper plan and a proper framework for delivering the benefits that the business case proposes. Most importantly, the quality improvement work needs a business case, he said. “If you can’t link Data Quality improvement to a business benefit, you should never do it.”