A Data-Driven Society

25. 8. 2021

“In God we trust. All others must bring data.” — W. Edwards Deming

Making decisions based on data is a sensible idea now accepted by the vast majority of managers. There is also a fairly broad consensus on the prerequisites for data-driven decision-making in companies. First and foremost, the goals for the decisions must be defined. Then, there has to be relevant data that can be used for decision-making, and this data must be of sufficient quality.

But what about decision-making in broader society, which can be understood as a set of organizations, groups, informal communities and the general public? Here the situation is completely different. Nevertheless, data-based decision-making is still the best option. It’s much better than making decisions based on the ideologies, traditions or opinions of the individuals who shout the loudest.

What makes it different from the corporate environment is the definition of goals. In companies, management sets goals, but in society, different groups have different goals (often diametrically opposed), and numerous decisions are made at once in many directions. Data should inform all of these decisions.

We need a lot of data, and we already have a lot of usable and useful data at our disposal: data whose production and management are paid for by the public. These are the data managed by the government, government institutions, public administration bodies and contributing organizations that get government funding.

The first question is: how do we find out about the existence of the data? And the next question is: how can we access the data?

In my opinion, the answer to the first question is simple. It is just basic decency that if I create data using public money, I inform the public about it. Technically, this means publishing metadata about all the data collected.
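As a rough illustration of what "publishing metadata" could look like in practice, here is a minimal sketch of a machine-readable catalog entry. The field names, the `DatasetRecord` class and the example values are all hypothetical. They are not taken from any particular standard or institution, and a real catalog would more likely follow an established metadata vocabulary.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DatasetRecord:
    """Discovery metadata for one publicly funded dataset (hypothetical fields)."""
    title: str
    publisher: str
    description: str
    update_frequency: str
    access: str  # for example "open", "aggregated only" or "restricted"
    keywords: list[str] = field(default_factory=list)


def to_catalog_json(records: list[DatasetRecord]) -> str:
    """Serialize the records into a JSON list that could be published as a catalog."""
    return json.dumps([asdict(r) for r in records], indent=2, ensure_ascii=False)


# Invented example entry; the publisher and dataset are placeholders.
example = DatasetRecord(
    title="Hospital bed occupancy",
    publisher="Example Ministry of Health",
    description="Daily counts of occupied and free beds per region.",
    update_frequency="daily",
    access="aggregated only",
    keywords=["health", "hospitals"],
)

catalog = to_catalog_json([example])
print(catalog)
```

Note that the entry describes the dataset even though the data itself is only available in aggregated form. The public learns that the data exists, regardless of how much of it can be released.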

The more complex question is how to publish the data. I believe that, unless the data is a threat to national security, it should be published in a form that complies with laws and regulations such as the Personal Data Protection Act and the GDPR. This can be accomplished by anonymizing and aggregating the data before publication. Although there will still be disputes about the degree of anonymization, and about when to prioritize personal data protection and when to prioritize the public interest, this is no reason not to publish the data.
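To make the idea of aggregation concrete, the sketch below counts individual records per group and hides any group that is too small to publish safely. The threshold `MIN_GROUP_SIZE`, the function name and the sample records are assumptions made for illustration. Small-group suppression is only one simple technique and does not by itself guarantee anonymity or legal compliance.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # hypothetical threshold; the right value is a policy decision


def aggregate_with_suppression(rows, keys, min_group_size=MIN_GROUP_SIZE):
    """Count rows per combination of `keys`, hiding groups that are too small.

    Returns a dict mapping each key combination to its count, or to None
    when the group has fewer than `min_group_size` members.
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {
        group: (n if n >= min_group_size else None)
        for group, n in counts.items()
    }


# Invented individual-level records; names never reach the published output.
records = (
    [{"name": f"person{i}", "region": "North", "age_band": "30-39"} for i in range(7)]
    + [{"name": "person_x", "region": "South", "age_band": "80+"}]
)

published = aggregate_with_suppression(records, keys=("region", "age_band"))
for group, count in published.items():
    print(group, "suppressed" if count is None else count)
```

The dispute about the degree of anonymization shows up here as a single parameter: a higher threshold protects individuals better but hides more of the data.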

There are two myths associated with the issue of data publishing.

The first myth is that hiding data prevents their misuse. This myth is based on the naive notion that data owners can prevent data misuse. But it is an illusion. Any data that exist can be misused. The bad guys will always find a way to access the data if it pays off financially or otherwise. Conversely, those who paid for the data to be created and could benefit from their publication won’t have access.

The second myth is that organizations must handle data with due diligence, which prevents publication. Yes, data is an asset and has value. But the value of data is maximized when everyone can use it. This is a big difference between society and companies. With companies, corporate law states, among other things, that acting with due diligence means “acting in the defensible interest of a business entity.” But as far as I know, neither government organizations nor contributing organizations are yet business entities that would be covered by corporate law, at least under Czech law.

The last prerequisite for data-based management mentioned in the introduction is sufficient data quality for decision-making.

What is data quality? Data quality is a very subjective concept. There is no such thing as a technical specification for data quality. Data are of good quality if they are sufficient for the needs of their users.

In companies, it is simple, because what the data should be used for is well-defined. Companies invest huge sums in obtaining and maintaining data at the quality they need for their decision-making. They therefore build special systems (data warehouses) and invest enormous amounts of money in applications for data cleaning and integration. And because companies know what the data are for, they can invest in data quality efficiently.

In society, it is different. Of course, I assume that data managers maintain data of a quality that corresponds to the reasons for its creation. However, the myth survives that there is such a thing as objectively high-quality data. Data managers sometimes try to create data that everyone is happy with. But because the definition of data quality is subjective and because there are many stakeholders in society, this task is impossible.

It’s possible to identify inconsistencies in any published data, and these are often mistakenly called data errors. You have to accept them. Only metadata can improve data quality in such situations. The metadata should include information on how the data are created, how they are processed, who modifies and integrates the data and how, who is responsible for them and what conditions they meet. This increases the quality of the data, at least in terms of their credibility. Based on this information, each user can decide to what extent the data are relevant and applicable to their decisions.
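The items listed above can be turned into a simple checklist. The following sketch, with hypothetical field names chosen only to mirror that list, reports which pieces of descriptive information a published dataset still lacks. It checks only that the information is present, not that it is true.

```python
# Hypothetical field names mirroring the items the metadata should include.
REQUIRED_FIELDS = (
    "how_created",        # how the data are created
    "processing_steps",   # how they are processed
    "modified_by",        # who modifies and integrates the data, and how
    "responsible_party",  # who is responsible for them
    "conditions_met",     # what conditions they meet
)


def missing_provenance(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty in `metadata`."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# Invented example: a dataset whose publisher has documented only some items.
example_metadata = {
    "how_created": "Reported daily by hospitals through a web form.",
    "processing_steps": ["deduplication", "aggregation by region"],
    "modified_by": "",
    "responsible_party": "Example Institute of Health Statistics",
}

print(missing_provenance(example_metadata))
```

A user who sees that some of this information is missing can still use the data, but knows to treat it with more caution.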

The summary is simple: If we want the best possible decisions to be made at all levels of society, and I believe we all want that, we need data… all data. The more we know about them, the more useful they will be in our decision-making. On the other hand, we have to accept that the data we get will never be perfect.

In conclusion, I would like to express one wish. Financial services companies are bound by strong regulations such as the Basel Accords for the banking sector and Solvency II for insurance companies. Among other things, these regulations precisely describe the requirements for publishing data and the requirements for their quality. These regulations were created in response to the global financial crisis. Let’s hope that the COVID-19 pandemic and the decision-making experience in this crisis will lead to improved public access to data and to society as a whole being able to make better decisions based on data.

Author: Ondřej Zýka

Data and information are assets that need to be properly cared for. Ondřej Zýka has been promoting this idea for over fifteen years as Information Management Principal Consultant at Profinit and has been guided by it in projects involving data warehouse development, data quality improvement, metadata management and master data management deployment. It is also one of his lecture topics at universities, where he lectures on database systems.