Click to learn more about co-author John Ladley.
Click to learn more about co-author Len Silverston.
People are unconsciously incompetent at communicating with data. This applies to our personal and professional lives. This ignorance creates an ethical issue that is profoundly important to all business and societal communities.
This inability to understand and communicate with data has caused aggravation and divisiveness. We can argue that data issues around COVID-19 have caused deaths. This is a problem for society to address, not just governments and companies.
A key issue is that users of data manipulate data to support and foster their own position. People regurgitate charts and other data without an understanding of the source or context. Many “official sources of data” are biased or taken out of context. Often the same data seems to result in differing opinions or is dismissed because it does not support a certain context or point of view. This behavior has certainly been a factor in the current pandemic.
We have reached a tipping point, but none of this is surprising. Humanity fails at data, which is astonishing given how much of it we produce.
The authors of this article are “data professionals.” We help organizations get “data literate,” improving how they manage and use data. We feel a broader view is required, and we need to address social Data Literacy, i.e., society adopting standards of behavior for communicating and using data. Think of other standards we use in communicating, such as in legal matters, where terms are defined at the beginning of a contract. We are accustomed to standardization in communication. With data, anything goes. There are no guardrails or commonly accepted standards for communicating and using data in our society. We are data illiterate, and if we don’t become more “data literate,” harmful trends will continue.
People have always spread information well before it is verified. Historically, data is filtered or adjusted to support opinions, motivations, perceptions, stories, and judgments. Rumors around the village campfire gave way to court gossip, which gave way to yellow journalism. Now we have false internet posts. None of this is new.
However, we are now in a period of human existence where data growth is exposing us to greater numbers of opinions and judgments. Our ability to effectively perceive is compromised since we are overloaded. The viral aspects of the internet encourage nasty and imprudent actions. Data is no longer just information you pull in and consider. It is intertwined in all human activity. Ethically, as a species, we can no longer tolerate the attitudes that led to yellow journalism and now lead to the manipulation of elections, environmental actions, health trends, and other endeavors.
Data should represent the truth or at least come close. But in today’s world, the vast proliferation of data means each of us will be a perpetrator or a victim of the misuse of data.
This is a new problem. We have seen a world where data is now ubiquitous. We did not see this coming. There are many examples that demonstrate our challenges, and we are living such an example now.
Our current challenge, specifically, is a novel coronavirus that causes a disease called coronavirus disease 2019, shortened to COVID-19. [1] We have all seen the published data and charts, with one side saying “We should do A” based on the data, and the other side saying “We should do B” based on the same data. The following are examples that illustrate this Data Literacy problem.
A news story compared the 2002-2004 SARS outbreak to the COVID-19 pandemic. It mentioned the total deaths from SARS, but in the context of an interpreted “higher percentage of fatalities from SARS vs. COVID-19.” The story then concluded SARS was far more dangerous, yet the global economy was not shut down. Based on that interpretation, the news story editorialized that society was overreacting. Some government officials accepted information such as this and limited their actions based on these perceptions. You could argue that people died because of this data interpretation.
From a data standpoint, there were multiple issues with the above.
- The percentage of fatalities was measured differently. The population affected by SARS was much smaller than the population affected by COVID-19. The data sampling was different, and, thus, the comparison is not uniform or fair without taking the various differentiating factors into consideration.
- The aggressive reaction taken in response to SARS, such as regions in Asia shutting down travel, was not mentioned in the news story.
- Worst of all, SARS is over. It mutated. It faded away. COVID-19, as of this writing, is expanding. We do not have all of the COVID-19 data yet. So, the time frames cannot be accurately compared.
In layman’s terms, we aren’t talking apples to apples. The article could be right or wrong. There is no way to draw a reasoned conclusion if you do not understand the proper use of data. But that did not stop people from acting upon skewed judgments.
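To make the apples-to-apples point concrete, here is a minimal sketch of how one death count yields very different “fatality percentages” depending on the denominator chosen. Every number below is hypothetical and invented for illustration; none is a real SARS or COVID-19 figure, and the function name is our own.

```python
# Hypothetical counts, invented purely for illustration.
# They are NOT real SARS or COVID-19 figures.

def fatality_rate(deaths: int, denominator: int) -> float:
    """Return deaths as a percentage of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * deaths / denominator

deaths = 50
confirmed_cases = 1_000        # people who were tested and confirmed
estimated_infections = 5_000   # assumed total, including untested mild cases
resolved_cases = 400           # cases with a known outcome so far

rate_confirmed = fatality_rate(deaths, confirmed_cases)
rate_infections = fatality_rate(deaths, estimated_infections)
rate_resolved = fatality_rate(deaths, resolved_cases)

print(f"Deaths / confirmed cases:      {rate_confirmed:.1f}%")   # 5.0%
print(f"Deaths / estimated infections: {rate_infections:.1f}%")  # 1.0%
print(f"Deaths / resolved cases:       {rate_resolved:.1f}%")    # 12.5%
```

The same 50 deaths read as 1.0%, 5.0%, or 12.5%. Comparing a finished outbreak with one still in progress mixes these denominators, which is why a headline percentage means little without its definition.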
Here are other so-called “facts” about COVID-19. The following quotes were published and spread widely on social media:
- “Every election year has a disease. SARS-2004 Avian-2008 Swine-2010 MERS-2012 Ebola-2014 Zika-2016 Ebola-2018 Corona-2020.”
- “COVID-19 has a 99.7 percent cure rate for people under 50, and its spread is leveling off.”
- “Coronavirus has a contagion factor of 2, SARS was 4, and the measles was 18.”
One of the authors, Len, saw variations of these points in a healthcare professional’s office. However, all of the above statements were challenged by many sources. We all know, intellectually, that posting something on the internet does not make it a “fact.” Yet there is evidence that continuous sharing and forwarding of these so-called “facts” creates an aura of truth. Many people accepted these “facts” and acted accordingly. This could have led to increasing the spread of the disease.
Besides the accuracy of “facts,” consider muddled terminology. The above-mentioned social media posts called these “coronavirus facts,” comparing coronavirus with SARS. Coronaviruses are a family of viruses: some cause the common cold, another is SARS-CoV (which caused the 2002-2004 outbreak), and another is SARS-CoV-2, which causes COVID-19. When “facts” are stated with inconsistent definitions, the doors to distortion are opened. When disease categories are confused with a specific disease, the data is not distinguished or classified properly, and accurate measurement is difficult.
Understanding context is key to Data Literacy. People often look at isolated pieces of information out of context, and this skews their conclusions. For example, South Korea was declared one of the more dangerous countries because of the large number of people testing positive for COVID-19. The same data was used to compliment South Korea because the number of tests was almost one for every 150 people. Because South Korea tested many more people than most countries as a percentage of its population, it appeared to have more sick people. Various presenters ignored this context: the number of positive tests alone does not give a truthful picture. Again, opinion was applied to the interpretation instead of viewing appropriate contextual factors.
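A small sketch can show why a raw count of positive tests misleads without testing context. The two countries and all of their numbers below are hypothetical, invented for illustration; they are not South Korea’s figures or any other country’s.

```python
from dataclasses import dataclass

@dataclass
class CountryReport:
    """Hypothetical testing summary for one country."""
    name: str
    population: int
    tests: int
    positives: int

    @property
    def people_per_test(self) -> float:
        # How many residents there are for each test performed.
        return self.population / self.tests

    @property
    def positivity_pct(self) -> float:
        # Share of performed tests that came back positive.
        return 100.0 * self.positives / self.tests

# Invented numbers: same population, very different testing effort.
reports = [
    CountryReport("Country A", population=50_000_000, tests=400_000, positives=8_000),
    CountryReport("Country B", population=50_000_000, tests=20_000, positives=2_000),
]

for r in reports:
    print(
        f"{r.name}: {r.positives} positives, "
        f"1 test per {r.people_per_test:.0f} people, "
        f"{r.positivity_pct:.1f}% of tests positive"
    )
```

Country A reports four times as many positives as Country B, yet it tests twenty times as many people and only a fifth as large a share of its tests come back positive. Which country looks “more dangerous” depends entirely on which of these numbers is quoted.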
A lack of understanding of basic statistics is an obstacle to Data Literacy. The difference between correlation and causality is important. Just because there is a correlation of data doesn’t mean that there was causation, or in other words, a cause and effect relationship between the variables. For instance, a study suggested that a certain vaccination offers some level of protection against infection by the novel coronavirus and even reduces mortality. This sounds promising. However, experts communicated that we must be careful not to suggest that the vaccination definitively affects the mortality rate. Thus, correlation is useful to suggest a plausible hypothesis, but clearly, “We need more data from trials to be able to say anything with confidence.”
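The difference between correlation and causation can be sketched with simulated data. In this illustrative setup, which is our assumption and not taken from any study, a hidden factor `z` drives both `x` and `y`, while `x` has no effect on `y` at all. The two still correlate until the hidden factor is accounted for.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5_000

# Hidden common cause (simulated, hypothetical).
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus independent noise; x never influences y.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

raw = pearson(x, y)

# Remove the hidden factor. We can subtract z directly only because we
# built the data ourselves; real analyses would need to estimate this.
x_adj = [xi - zi for xi, zi in zip(x, z)]
y_adj = [yi - zi for yi, zi in zip(y, z)]
adjusted = pearson(x_adj, y_adj)

print(f"Correlation of x and y:            {raw:.2f}")
print(f"After removing the hidden factor:  {adjusted:.2f}")
```

The first figure should come out clearly positive and the second close to zero, even though nothing about `x` changed `y`. A correlation is a reason to investigate, not a demonstration of cause and effect.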
The issue is now ethics as it relates to data. Spreading misinformation causes harm. Interestingly, we use data every day for ourselves with few problems. We compare prices. We review the statistics of our favorite team. But passing on inaccurate or inappropriate data, or misinformation, is an ethical issue. When you do something that creates harm, you have done something unethical.
So, what is to be done?
First, before automatically sharing data, create some space, and wait. “Before fear makes you press ‘share,’ take a deep breath and check. A dose of caution can stop you from making a bad situation worse.”
Second, do not assume that data is factual. Know the difference between a story with data and the data itself. Consider asking the following questions:
- How true is this?
- What are possible biases?
- What about this is a fact or a story?
When accessing data, understand the data supply chain. What is the source of your data? What are the underlying motivations of the data suppliers? Sites like Media Bias/Fact Check (MBFC) and Snopes provide some insight into biases in data sources and offer fact-checking.
Third, consider the context used in creating the data. Often an isolated fact is exploited without a referral to the bigger picture. Your use of the data may be in opposition to that context. Understand the variables in use. Don’t look at one piece of data in isolation. What are the important factors about the data you receive?
Fourth, and perhaps most painfully, we all need to grasp some basic statistics to be responsible data citizens. Besides the aforementioned difference between correlation and causation, the statistics behind data can be very misleading. Show your data with statistical margins of error and probabilities. Be careful of rounding the numbers to prove a point.
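As a minimal sketch of what showing a margin of error means, the following computes an approximate 95% margin for a survey proportion using the standard normal approximation. The survey numbers are hypothetical, invented for illustration, and the function name is our own.

```python
from math import sqrt

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation; z = 1.96 corresponds to roughly
    95% confidence.
    """
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    return z * sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical survey: 520 of 1,000 respondents agree with a statement.
agree, respondents = 520, 1_000
p_hat = agree / respondents
moe = margin_of_error(p_hat, respondents)
low, high = p_hat - moe, p_hat + moe

print(f"Point estimate: {p_hat:.1%}")
print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```

Reported alone, 52% invites the headline “a majority agrees.” With the margin attached, the interval runs from roughly 49% to 55%, so the data does not settle whether a majority exists.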
Fifth, consider the amount of data that was used to provide evidence of truth. In general, a larger data set with uniform data collection processes can provide a more reliable analysis. This includes limiting the types of data to what is most important for the analysis. Collecting a great variety of different variables can confuse and complicate the findings, making it difficult to assess useful data.
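The effect of data set size can be sketched with a simulation. Here we repeatedly sample from the same simulated population, whose true rate is a hypothetical 30%, and compare how much the estimates scatter for small and large samples. All values are assumptions chosen for illustration.

```python
import random
from statistics import pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_RATE = 0.30         # hypothetical true rate in the population

def estimate(sample_size: int) -> float:
    """Estimate the rate from one random sample of the given size."""
    hits = sum(rng.random() < TRUE_RATE for _ in range(sample_size))
    return hits / sample_size

def spread(sample_size: int, repeats: int = 500) -> float:
    """Standard deviation of the estimate across repeated samples."""
    estimates = [estimate(sample_size) for _ in range(repeats)]
    return pstdev(estimates)

small = spread(25)
large = spread(2_500)

print(f"Spread of estimates with   25 observations: {small:.3f}")
print(f"Spread of estimates with 2500 observations: {large:.3f}")
```

The estimates from the larger samples should cluster far more tightly around the true rate. The simulation collects data in one uniform way; a large data set gathered inconsistently would not earn the same trust.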
Sixth, do a gut check on the data. Check in to see if the data seems truthful. In addition to intellectual understanding, use intuitive capabilities to assess how true things are. [2]
Challenge yourself when you receive or send data. Is it possible to pause, question the veracity of the data, consider context, correctly use applicable statistics, consider the amount of supporting evidence, and/or intuitively check in regarding how true the data is?
References
[1] A new strain of virus, not previously identified in humans — from the World Health Organization regional office for Europe.
[2] For example, studies using muscle response testing (MRT) have shown that people’s muscles often test stronger when presented with true statements and weaker when presented with false information, thus helping to distinguish lies from truths.