By Ronald G. Ross.
Business has a fundamental problem with Data Quality. In some places it's merely painful; in others it's nearly catastrophic. Why is the problem so pervasive? Why does it never seem to get fixed? Perhaps we've been thinking about the problem the wrong way. Time for a fresh look.
The central flaw in the long-running discussion over Data Quality is literally its focus on ‘data’. Stored data is merely the system or database residue of things that have already happened in the business, a memory of past events.
Truly fixing 'Data Quality' problems requires a business perspective: a shift in focus from data design or data cleansing to what occurs in the business itself. Our sights should be trained squarely on the business activity that produces the data. This discussion introduces six dimensions of Semantic Quality to ensure our aim is true.
Consider what workers are actually doing when they create a piece of data. In the business world, of course, they’re probably just doing a bit of work. Look more closely, however, and in some ways what they’re doing is actually quite profound. Think about it this way:
Creating data is a business communication to people in the future.
In other words, the act of creating data is the act of sending a message.
Normally we think of communication in terms of either direct conversations or (in the spirit of the times) a flurry of text messages exchanged more or less in real time with people we know. In either case there’s usually a shared context within which the meaning of the messages can be interpreted.
What’s distinct about creating data as an act of business communication is that you’re almost certainly not going to be face-to-face with the recipients of the message or connected live with them via an interactive network. That fact rules out body language (e.g., raised eyebrows or emoticons) and dialog (including grunts and groans, or more emoticons) to clarify what you mean. In that sense the communication is blind, as illustrated in Figure 1.
Figure 1. The Act of Creating Data as a Blind Business Communication to People in the Future
As a consequence, the data a worker creates literally needs to speak for itself. The emphasis needs to be on communication quality.
Communication quality focuses on whether the meaning of a message is clear. Formatting data correctly isn't enough to get you there. If the meaning isn't clear, a business communication won't be properly understood. In other words, you need Semantic Quality, not just Data Quality.
The six dimensions of Semantic Quality are presented in Table 1. They apply equally to structured data and to 'unstructured data'. Don't be put off by the word semantic. We're simply talking about the meaning of messages.
The six dimensions of Semantic Quality are discussed more fully later. They seem largely self-evident, but as you might imagine, there is much more to them than initially meets the eye.
The Role of Data/System Architectures and What Data Quality Is Really About
Because of the time delay in delivering blind communications to everyone in the future who might need them, a secure, well-organized holding area is needed. IT professionals, ideally guided by knowledgeable data architects, create data/system architectures for that purpose, as illustrated in Figure 2.
Figure 2. Data/System Architecture as a Rest Stop for Business Communications (Data)
Unfortunately, typical Data Quality measures in current use (refer to Appendix 1) focus on the health of the content of the data/system architecture rather than on the Semantic Quality of the original business communications. That focus serves a purpose for data management, but misses the mark almost entirely in clarifying what practices produce good business communications in the first place. Compared with the Semantic Quality dimensions presented above, typical Data Quality dimensions are:
- Retroactive rather than proactive
- Quantitative rather than qualitative
- Systemic rather than semantic
The quality of data in a data/system architecture can never be any better than the quality of the business communications that produced it. A systematic means of managing data at rest simply does not guarantee the vitality (the semantic health) of the business communications it supports. Unfortunately, many IT professionals fail to understand this point. (Many data professionals do understand it, but do not quite know how to articulate it or feel powerless to do much about it.)
To put it another way, it is entirely possible to assess your Data Quality as outstanding even though the business communications that produced the data were confusing, contradictory, unintelligible, or otherwise ineffective. Such an assessment would, of course, be nonsense.
This article continues in Part Two.