Data governance is a set of processes, policies, standards, roles, and metrics that ensure information is used efficiently, effectively, and ethically to achieve organizational goals. The phrase is used at both the micro and the macro level. At the macro level, data governance means governing data flows across countries, and it includes the norms and principles that apply to cross-border flows of various types of data. At the micro level, data governance refers to how individual organizations store and manage data, and how that data can create value within the organization and contribute to its overall goals.
The 2019 State of Data Management Report stated that data governance was one of the top priorities of global organizations in 2019. Many organizations want data to be shared across organizations, as technology trends like machine learning and artificial intelligence rely heavily on quality data. In this article, we will discuss some facts about data governance.
1. A Data Set Produces Benefits only when it is used to Make Decisions
Even if we adhere to best practices, such as keeping a clean data set, publishing organized categories of information, assigning a data steward, and layering on an open API, the data set will not produce any direct benefit if nobody ever uses that information. Benefits are produced by taking a decision and then acting on it; until the data set is used to support that decision making, it is all useless. It simply means wasting resources by incurring costs that do not produce results. A ready-to-use data set has an option value, but that should not be the primary focus of data governance. By option value, we mean the willingness to pay for maintaining or preserving the data set even if there is little chance of it ever being used.
2. Value Chain of a Data Set
The value of a data set is described as the total benefits it produces minus the cost to use the data set. By benefits, we mean the benefits produced when the data set is used to support a decision.
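As a minimal sketch of this definition, the arithmetic can be written out in Python. The function name `data_set_value` and all of the figures are hypothetical, chosen only to illustrate the subtraction; they do not come from any real data set.

```python
def data_set_value(decision_benefits, usage_costs):
    """Value = total benefits from supported decisions minus total cost to use the data set."""
    return sum(decision_benefits) - sum(usage_costs)


# Hypothetical figures: two decisions were supported by the data set,
# and three cost items were incurred to make it usable.
benefits = [50_000, 20_000]
costs = [10_000, 15_000, 5_000]

value = data_set_value(benefits, costs)
print(value)

# A data set that never supports a decision has costs but no benefits,
# so under this definition its value is negative.
unused_value = data_set_value([], costs)
print(unused_value)
```

The second call reflects the earlier point that a well-maintained data set nobody uses only incurs cost.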
A data set has a value chain with four links, as explained below.
The first link in the chain is the data producer, which could be a sensor, an open source, or another system. Next is the publisher, who acquires the data set, stores it, and makes it available within an enterprise. A consumer then develops an application or an analytic that uses the data set to support a decision. Lastly, the decision-maker uses the data set to support their decision. A single entity can play multiple roles: for instance, the producer may also publish, or the consumer and the decision-maker could be the same entity. In this value chain, the first three links only incur costs, while the benefit is produced by the decision-maker alone.
As the producer and the decision-maker are beyond our authority, we will only focus on the publisher and the consumer.
When there is a single publisher and a single consumer, things are easy to manage: there is just one value chain, and the two can work out how to split the cost. When there is more than one consumer, things get more complicated. Each consumer has different needs, and the publisher has to negotiate separately with each of them. This can lead to duplication of effort, which reduces the value produced from the use of the data set.
3. Governance Constrains the Data Publisher to help the Data Consumers
With data governance come responsibilities and freedoms. One of the important facts about data governance is that it constrains the publisher in order to benefit the consumers. The publisher is expected to deliver the data set in a way that benefits all consumers. Responsibilities are allocated between the publisher and the consumers so as to minimize the cost and maximize the value produced by all uses of the data set.
Constraints on the publisher may reduce its costs, but they can also increase them, since the publisher may have to maintain higher standards. These requirements can in turn reduce the consumers’ cost of using the data. They may also reduce duplication across consumers and hence increase total value.
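The trade-off can be illustrated with a small Python sketch. The function `total_cost` and every cost figure are hypothetical assumptions made for illustration; the only point is that a higher publisher cost can still lower the total once duplicated consumer effort is removed.

```python
def total_cost(publisher_cost, consumer_costs):
    """Total cost across the publisher and every consumer of one data set."""
    return publisher_cost + sum(consumer_costs)


# Hypothetical ungoverned case: three consumers each clean and
# reformat the raw data themselves, duplicating the same effort.
ungoverned = total_cost(publisher_cost=10, consumer_costs=[8, 8, 8])

# Hypothetical governed case: the publisher does that work once to a
# shared standard, so its own cost rises while each consumer's cost falls.
governed = total_cost(publisher_cost=18, consumer_costs=[2, 2, 2])

savings = ungoverned - governed
print(ungoverned, governed, savings)
```

With different assumed numbers the result could just as easily go the other way, which is why the constraints have to be justified per data set.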
4. Governance is Beneficial only if it Increases Value
An enterprise’s data governance strategy does not need to cover every data set. A data set should be governed only if governance increases its value. For instance, if there are only one-to-one exchanges between a single publisher and a single consumer, data governance doesn’t make much sense, as it will only increase cost without any accompanying benefit.
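This rule can be sketched as a simple filter over a catalogue of data sets. The helper `governance_adds_value`, the data set names, and the value estimates below are all hypothetical; in practice the estimates would be the hard part and are simply assumed here.

```python
def governance_adds_value(value_without, value_with):
    """Govern a data set only if its estimated value is higher with governance."""
    return value_with > value_without


# Hypothetical catalogue: (name, value without governance, value with governance).
data_sets = [
    # One publisher, one consumer: governance only adds cost.
    ("one_to_one_feed", 40, 35),
    # Many consumers: shared standards remove duplicated effort.
    ("shared_customer_master", 26, 36),
]

to_govern = [
    name
    for name, value_without, value_with in data_sets
    if governance_adds_value(value_without, value_with)
]
print(to_govern)
```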
5. Focus on What Data Consumers Want
Since governance constrains the publisher, those constraints should be set according to the needs of each data set. For instance, one data set may require considerable investment to improve its quality, while another may not require any improvement at all. Treating all data sets equally and constraining every publisher in the same way is therefore not right. Another fact about data governance is that data consumers have many concerns, and data governance should address those concerns and focus on what consumers want.