Data Quality Tax

May 28, 2020 in Research




Note: The follow-up to this article can be seen here: https://medius.re/research/bricks-without-clay

Here’s a question: If you could reduce or eliminate a 16.6% tax on your business or research program, would you do it? Perhaps a better, more serious question would be: How fast would you do it? Now consider what you would do if it were a self-imposed tax. Perhaps instead of walking, you’d be sprinting toward a solution? The straightforward, yet unfortunate, reality is that we levy this tax on ourselves through poor data quality. Whether data is inaccurate, stale, inaccessible, or non-standardized, think of poor data quality as anything that compromises its usability.

A Costly Lesson in Data Standardization

In 1628, the crown jewel of the Swedish navy — the Vasa warship — sank less than a mile into its maiden voyage, killing 30 people in the process. Thanks to the very cold, poorly oxygenated water of the Baltic Sea, an estimated 95% of the wooden vessel remained intact when the ship was recovered over 300 years later, in 1961. Once it was restored, archaeologists were able to conduct a modern-day post-mortem of the ship’s sinking that yielded some startling clues.

Recovered Vasa in the Vasa Museum

According to Fred Hocker, an archaeologist at the Vasa Museum in Stockholm, “There is more ship structure on the port side of the hull than on the starboard side. Unballasted, the ship would probably heel to port.”

Dr. Hocker went on to explain that during the examination, which included measuring every piece of Vasa, four rulers were found: two graduated in 12-inch Swedish feet and two in 11-inch Amsterdam feet. Ultimately, the loss of the ship can be reduced to data that was not standardized, an important lesson for all of us regardless of vocation.

The challenge of identifying the value of accurate, usable data is not a small one. That’s why it was big news a few years ago when IBM estimated that the cost of poor quality data to the U.S. economy was $3.1 trillion in 2016 alone, according to the Harvard Business Review. Put another way, think of it as a 16.6% tax on the entire $18.7 trillion U.S. economy in 2016. To provide some perspective, that monstrous number is three times bigger than the entire U.S. agricultural economy, which hovers around $1 trillion. Even more shocking, the cost of poor data quality to the U.S. economy alone is higher than the gross domestic product of every country in the world except the U.S., China, Japan, and Germany, according to World Bank figures.

Top 9 Countries and Data Quality in Gross Domestic Product for 2018

The IBM study did not break down the cost of poor data quality by sector, but earlier this month we highlighted a report by IDC on data preparation that noted that the significant negative impacts of poor data transcend regional and industrial boundaries. So, for what it’s worth, a data quality tax rate of 16.6% on the $16.3 billion spent annually on agricultural research translates to an estimated $2.7 billion, which still exceeds the GDP of 33 countries in the world. How many data-reliant organizations would love to reduce or eliminate a drag on growth like that? How many researchers would take a 16.6% increase in their budget? Imagine the compound gains over time if decisions were made based on accurate, usable data.
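As a quick sanity check, the arithmetic behind these figures can be reproduced directly. All numbers come from the article itself; the 16.6% “tax rate” is simply IBM’s $3.1 trillion estimate divided by 2016 U.S. GDP of $18.7 trillion, then applied to the $16.3 billion agricultural research figure:

```python
# Back-of-the-envelope check of the figures quoted above.
bad_data_cost = 3.1e12       # IBM estimate of poor-data cost, 2016 USD
us_gdp_2016 = 18.7e12        # U.S. gross domestic product, 2016 USD

# Implied "data quality tax rate" on the whole economy
tax_rate = bad_data_cost / us_gdp_2016
print(f"Implied tax rate: {tax_rate:.1%}")            # -> 16.6%

# The same rate applied to annual agricultural research spending
ag_research_spend = 16.3e9   # USD per year, per the article
ag_tax = ag_research_spend * tax_rate
print(f"Cost to ag research: ${ag_tax / 1e9:.1f}B")   # -> $2.7B
```

Both results match the article’s figures to the stated precision, so the numbers are internally consistent.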

The reason bad data costs so much is that decision makers, managers, knowledge workers, data scientists, and others must accommodate it in their everyday work. And doing so is both time-consuming and expensive.

Thomas C. Redman, Harvard Business Review

Which brings us to the significance of data quality generally. Data serves as the foundation for our decisions, and this is especially true in variety development. When building a structure, the foundation must be both solid and level. Without those two things, everything above will have to compensate for the inadequacies of the foundation below (if that is even possible). Even an error of one degree grows to more than two inches over 10 linear feet. Logic can be airtight, but if the underlying data is unreliable it will compromise the results — in other words: garbage in, garbage out. In the best-case scenario, the original error is fixed at the source, which takes time both to discover and to correct. Otherwise, the error lives on and must be compensated for in every future calculation. In the worst-case scenario, there is complete system failure. Wherever the error is eventually identified and addressed, it costs time and resources. In the Vasa example, an undiscovered difference in the fundamental unit of measurement, repeated over time, literally sank the whole project — but only after a tremendous loss of time, resources, and life.
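The one-degree claim above is easy to verify with basic trigonometry: a run of 10 feet (120 inches) tilted by one degree rises by tan(1°) × 120 inches.

```python
import math

# How far off level is a 10-foot run tilted by one degree?
run_inches = 10 * 12                          # 10 linear feet in inches
tilt_degrees = 1
offset = math.tan(math.radians(tilt_degrees)) * run_inches
print(f"Offset after 10 ft at 1 degree: {offset:.2f} in")  # -> 2.09 in
```

About 2.1 inches, which is indeed “more than two inches,” and the offset keeps compounding linearly with every additional foot of run.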

Thankfully, in the world of modern variety development, poor data quality does not generally yield such dire outcomes. Or does it? Perhaps sound decisions arrived at sooner would have diverted limited resources away from a project destined for failure and toward one with greater promise. In a time when food supply chains are being disrupted beyond our collective imagination, who’s to say that more effective use of limited resources in the past would not have had a dramatic effect in a food-insecure place today? Thanks to the data quality tax, we’ll never know what might have been. On the bright side, however, we can take active steps to reduce the negative impacts of poor data quality in the future through accountability and collaborative platforms using technology that is already available to us.
