Ensuring optimal Data Quality is a difficult and often problematic task for many organizations. A study of 75 executives conducted over the last couple of years revealed that only 3 percent found that their departments fell within the minimum acceptable range of 97 or more correct data records out of 100, according to an account in The Harvard Business Review.
The research, carried out by Thomas C. Redman, President of Data Quality Solutions, with Cork University Business School Lecturer Tadhg Nagle and Professor David Sammon, also revealed that on average, 47 percent of newly created data records have at least one work-impacting error. “No sector, government agency, or department is immune to the ravages of extremely poor Data Quality,” the authors concluded.
Is it possible to drive high-performing Data Analytics with so much data of rather dubious quality? Yes, if you take a pragmatic approach to the work, advised Andrew Patricio, President of Data Effectiveness Inc., in his presentation titled “Practical Data Strategies in the Real World of Poor Data Quality” at the Enterprise Data World 2017 Conference.
“Data reporting must be clean enough to propagate up the stack. If you have a beautiful analytics machine but you put in stinky data you get garbage out,” Patricio said.
Steps to Success
Solving Data Quality issues to improve Data Analytics starts with having the business users articulate not what they want – say, a Data Warehouse or Data Lake – but the problem they’re trying to solve. That way, they’re partners in the effort and are equally invested in ensuring that the solution to that problem isn’t just about technology but also encompasses training and new processes where necessary. “If you’re talking about the problem to solve, whatever changes may happen in getting to a solution, your goal is still the same,” he pointed out.
When it comes to the technology quotient, he also said it’s best to just “slightly lead” the data processes that are already in place. That is, evolve the data capture system along with the processes and have the reporting sophistication keep pace with Data Quality. “Lead it a bit to give them room to grow from one solution to the next,” he said, because there’s so much work that goes into transforming data that you’ll save time by making small iterations rather than huge leaps.
Drawing on his past life as Chief Data Officer at the District of Columbia Public Schools, for instance, he explained how it would make sense for a school system to move from a Notepad data entry system with poor process sophistication, Data Quality and reporting sophistication to Excel, before progressing from data cells to data records with a system like Microsoft Access. Only then could the journey progress to a Student Information System with normalized data models for strong process sophistication and Data Quality and a separate reporting system to add reporting sophistication to the mix, too. In other words, don’t charge ahead and build a formal Data Warehouse for Excel “data systems.”
It’s equally pragmatic to prioritize the data you need to get into shape. That means putting attention on the data that feeds the analytics that inform the decisions that are being made in support of organizational goals. Not every piece of data needs to be at the same level of quality as the data that is relevant to achieving an organizational goal, whether foundational or value-adding.
In the case of the D.C. public schools, for instance, it was more important to have high-quality enrollment data than high-quality attendance report data. Why? The former directly relates to funding, he said: “So, if you missed information on a student you wouldn’t get funded for that student. The organizational goal of having money led to the outcome of having the enrollment audit be correct, so the analytics had to be right, which meant data for the analytics had to be right.”
Get in the CAR
Organizations have to improve Data Quality in the long term but also make data-driven decisions in the present, all in the face of data entry mistakes and legacy data whose definitions may shift as regulations or mandates change. So, what makes data good enough to be relied on for the jobs that need to be done now?
To that end, Patricio recommended embracing the ongoing CAR (consistency, accuracy, relevancy) data cycle. Consistency is driven by reporting and relates to a metric having the same value for the same parameter regardless of who pulls it. “If I get different numbers then no one trusts the data even if one of the numbers is correct,” he said.
The same metric in different reports must be able to be traced back to the same source, and care must be taken to ensure that it isn’t confused with a metric that has different parameters (in a school setting, for instance, truant absences are different from sick-day absences). Also, the time factor has to be considered, as legitimate changes can be made after a report is run.
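The consistency check described above can be sketched in a few lines of Python. This is only an illustration: the report names, field names, and values are hypothetical, and the sketch assumes each report can export its metrics as simple rows. It groups values by metric, parameter, and as-of date, so that truant and sick-day absences are never compared with each other and reports run at different times are not flagged as conflicts.

```python
from collections import defaultdict


def find_inconsistent_metrics(report_rows):
    """Return metric keys that were reported with more than one distinct value.

    A key is (metric, parameters, as_of). Comparing only rows that share
    all three avoids confusing metrics with different parameters or
    reports pulled at different times.
    """
    values_by_key = defaultdict(set)
    for row in report_rows:
        key = (row["metric"], row["parameters"], row["as_of"])
        values_by_key[key].add(row["value"])
    return {
        key: sorted(values)
        for key, values in values_by_key.items()
        if len(values) > 1
    }


# Hypothetical rows exported from three different reports.
rows = [
    {"report": "principal_dashboard", "metric": "absences",
     "parameters": "truant", "as_of": "2017-03-01", "value": 42},
    {"report": "district_summary", "metric": "absences",
     "parameters": "truant", "as_of": "2017-03-01", "value": 45},
    {"report": "nurse_report", "metric": "absences",
     "parameters": "sick", "as_of": "2017-03-01", "value": 30},
]

conflicts = find_inconsistent_metrics(rows)
for key, values in conflicts.items():
    print(key, values)
```

Here only the truant-absence figure is flagged, because two reports disagree on it for the same date. The sick-day figure has different parameters and is left alone.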
“You can’t talk about accuracy until you have agreement of the values being discussed,” he said. Accuracy, which is driven by Analytics, depends upon using only good data. If there’s a bug in the query causing a metric to be inaccurate, for instance, fix it. If business rule definitions are wrong or inconsistent, correct them.
“Once there’s agreement that the number is consistent and reflects reality, the question is, ‘do we care?’” he said. That is, is the metric relevant – does it matter toward meeting the business’s goal? If it’s not, then either the goal or the metric must change. In the case of his work at the D.C. public schools, for example, once its truancy number was deemed consistent and accurate, it became possible to analyze the impact of in-seat attendance (which counts all absences except in-school suspension) on academics.
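To make the in-seat attendance definition concrete, here is a minimal sketch. The talk only says the metric counts all absences except in-school suspension, so the exact formula, the reason codes, and the numbers below are assumptions chosen for illustration.

```python
def in_seat_attendance_rate(enrolled_days, absences_by_reason):
    """Share of enrolled days a student was in seat.

    Assumption: every absence reason counts against the rate except
    in-school suspension, following the definition given in the talk.
    """
    if enrolled_days <= 0:
        raise ValueError("enrolled_days must be positive")
    counted_absences = sum(
        days
        for reason, days in absences_by_reason.items()
        if reason != "in_school_suspension"
    )
    return (enrolled_days - counted_absences) / enrolled_days


# Hypothetical student: 6 truant days, 3 sick days, 2 in-school suspension days.
rate = in_seat_attendance_rate(
    180, {"truant": 6, "sick": 3, "in_school_suspension": 2}
)
print(f"{rate:.1%}")
```

Writing the business rule down as a single function gives every report one definition to trace back to, which is the precondition for asking whether the metric is relevant.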
Ensure Everyone is Invested
One aspect not to ignore when it comes to improving Data Quality is the need to drive a Data Quality culture. “In the end, people enter data,” he said, “so they must understand the point of Data Quality.”
Take the case of the front-line worker who is trying to enter data fast so as to keep the customer experience moving along. Speed might result in creating data inconsistencies that later impact that worker’s ability to reap any real value from reports. While it’s possible to put in place validations and restrictions around data entry to help keep quality high, “there’s only so much a computer can do to stop people from doing silly things,” he said. The key is helping them realize why they shouldn’t do that thing in the first place – “stop it at the source so it doesn’t impact the reporting side,” he said.
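As an illustration of the kind of validation that can sit around data entry, the sketch below checks a single enrollment record before it is saved. The field names and rules are hypothetical, not those of any particular Student Information System, and, as Patricio notes, checks like these catch only part of the problem.

```python
from datetime import date


def validate_enrollment(record):
    """Return a list of problems found in a record; an empty list means it passes."""
    problems = []
    if not record.get("student_id"):
        problems.append("student_id is required")
    start = record.get("start_date")
    end = record.get("end_date")  # None means the enrollment is still open
    if start is None:
        problems.append("start_date is required")
    elif end is not None and end < start:
        problems.append("end_date is before start_date")
    return problems


good = {"student_id": "S1", "start_date": date(2016, 8, 22), "end_date": None}
bad = {"student_id": "", "start_date": date(2016, 8, 22),
       "end_date": date(2016, 8, 1)}

print(validate_enrollment(good))
print(validate_enrollment(bad))
```

Returning readable messages instead of simply rejecting the record gives the person at the keyboard a reason for the rule, which fits the point about stopping problems at the source.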
The best way to get to that is by not thinking of data entry personnel as users but as part of the data team. It’s a mental shift and a practical one, too, that requires they have tools and training on how to do things in a better way and how to avoid making entry mistakes that take a toll on report quality. At the D.C. public schools, the approach Patricio followed was to connect the dots for data entry personnel: “Let me tell you what happens when you do x and how it affects the report you complain about,” he related of his conversations with staff like registrars who inadvertently created enrollment overlaps by working around data entry procedures in an attempt to keep the registration process moving forward.
Nine times out of ten, Patricio said, if you show and explain how the algorithm you use to process the data doubles truancy or enrollment counts or whatever it might be, “they realize that later they will feel the pain themselves.”
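The enrollment overlaps mentioned above are easy to demonstrate. The following sketch, with hypothetical students, schools, and field names, flags any student who has two enrollment spans covering the same dates. Those are the records that would be double counted by a naive enrollment tally. It is one simple way to surface the problem, not the algorithm used at the D.C. public schools.

```python
from datetime import date
from itertools import combinations


def find_enrollment_overlaps(enrollments):
    """Return (student_id, school_a, school_b) for each pair of overlapping spans.

    An end_date of None is treated as an enrollment that is still open.
    """
    by_student = {}
    for enrollment in enrollments:
        by_student.setdefault(enrollment["student_id"], []).append(enrollment)

    overlaps = []
    for student_id, spans in by_student.items():
        for a, b in combinations(spans, 2):
            a_end = a["end_date"] or date.max
            b_end = b["end_date"] or date.max
            # Two date ranges overlap when each one starts before the other ends.
            if a["start_date"] <= b_end and b["start_date"] <= a_end:
                overlaps.append((student_id, a["school"], b["school"]))
    return overlaps


enrollments = [
    # S1 was registered at School B without being withdrawn from School A.
    {"student_id": "S1", "school": "School A",
     "start_date": date(2016, 8, 22), "end_date": None},
    {"student_id": "S1", "school": "School B",
     "start_date": date(2016, 11, 1), "end_date": None},
    # S2 transferred cleanly: the first span was closed before the second began.
    {"student_id": "S2", "school": "School A",
     "start_date": date(2016, 8, 22), "end_date": date(2016, 10, 31)},
    {"student_id": "S2", "school": "School B",
     "start_date": date(2016, 11, 1), "end_date": None},
]

overlaps = find_enrollment_overlaps(enrollments)
print(overlaps)
```

Showing a registrar a concrete list like this, next to the inflated count it produces, is one way to connect a data entry workaround to the report it distorts.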
Check out Enterprise Data World at www.enterprisedataworld.com