
As companies increasingly rely on data to inform their decisions, it is critical that data owners understand the rapidly evolving risk-management needs of data that crosses applications, on-premises facilities, and clouds.
Successful data governance is achievable for applications that span private on-premises environments and public cloud resources, but governance must be a fundamental part of design and implementation, not an afterthought.
5 Steps to a Complete Data Governance Strategy
- Value – Know what the data is worth in terms of the cost if it is lost, the cost to generate it, and the value derived through analysis. These metrics determine the safeguards needed to protect the data and the relative cost of storing it on different platforms.
- Location – Know both where the data is created and where it is stored. This determines which safeguards are needed to protect the data at rest and in transit, and it helps identify the best methods for moving data between sites for analysis, transformation, and integration.
- Risk – Knowing what risk the data poses to your organization is key to ensuring it is appropriately protected. High-risk data includes Social Security numbers, addresses, and credit card information, which can require notifying affected customers if lost or compromised.
- Know your decision makers – Every organization has individuals at many levels who access data in different ways. A solid data governance strategy includes an inventory of these decision makers, covering what data they need access to, on what time frames, and with what tools. This enables the organization to plan how to serve these users while managing the associated risk.
- Accuracy – Data quality, and therefore the quality of decisions derived from data, has many dimensions, including completeness, non-obsolescence, precision, and repeatability. Every data set needs an associated set of quality policies that drive the organization to properly clean incoming data and to gauge the accuracy of results derived from that data.
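One way to make these five steps concrete is to record them in a single profile per data set. The Python sketch below is illustrative only: the `DataAssetProfile` and `Accessor` classes, the sensitive-category list, the backup threshold, and the 95% completeness target are all hypothetical choices, not a standard or any product's API. It shows how value, location, and risk can feed directly into the safeguards a data set needs, and how a quality policy can be checked against incoming data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical categories treated as high risk (Risk).
HIGH_RISK_CATEGORIES = {"ssn", "address", "credit_card"}
# Hypothetical cost of loss above which replicated backups are required (Value).
BACKUP_THRESHOLD = 100_000.0


@dataclass
class Accessor:
    """One entry in the decision-maker inventory."""
    role: str
    frequency: str  # e.g. "daily", "monthly"
    tool: str       # e.g. "BI dashboard", "notebook"


@dataclass
class DataAssetProfile:
    name: str
    cost_if_lost: float        # Value
    cost_to_generate: float    # Value
    created_at: str            # Location: where the data is created
    stored_at: list[str]       # Location: where the data is stored
    categories: set[str]       # Risk: kinds of fields the data contains
    accessors: list[Accessor] = field(default_factory=list)  # Decision makers
    min_completeness: float = 0.95  # Accuracy: hypothetical quality policy

    @property
    def risk(self) -> Risk:
        """High risk if the data contains any sensitive category."""
        return Risk.HIGH if self.categories & HIGH_RISK_CATEGORIES else Risk.LOW

    def safeguards(self) -> list[str]:
        """Derive illustrative safeguards from value, risk, and location."""
        needed = []
        if self.cost_if_lost >= BACKUP_THRESHOLD:
            needed.append("replicated backups")
        if self.risk is Risk.HIGH:
            needed += ["encryption at rest", "breach-notification plan"]
        # Any copy stored away from its point of creation has to travel there.
        if any(loc != self.created_at for loc in self.stored_at):
            needed.append("encryption in transit")
        return needed

    def passes_quality(self, non_null: int, total: int) -> bool:
        """Check a batch of records against the completeness policy."""
        return total > 0 and non_null / total >= self.min_completeness


# Usage with hypothetical values: on-premises data that is also copied to a cloud region.
orders = DataAssetProfile(
    name="customer_orders",
    cost_if_lost=250_000.0,
    cost_to_generate=40_000.0,
    created_at="on-prem-dc1",
    stored_at=["on-prem-dc1", "cloud-us-east"],
    categories={"address", "order_total"},
    accessors=[Accessor("finance analyst", "monthly", "BI dashboard")],
)
print(orders.risk)  # Risk.HIGH
print(orders.safeguards())
```

Keeping all five facts in one record means a change to any of them, such as a new storage location, immediately changes the derived safeguards.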
7 Data Best Practices in a Cloud-First World
- Keep security context with the data as it moves between systems – Keeping the security context attached to a data set ensures that it is protected uniformly across systems that may hold duplicate copies of the same data.
- Set a lifecycle and stick to it – A lifecycle that defines the point at which data is retired ensures that stale data does not linger, incurring costs and driving decisions.
- Track metadata consistently across the organization – Metadata has become more critical in recent years with the growth of unstructured data being stored and analyzed. Metadata about creation, owners, and topics is key to understanding and increasing the value of a data set. An organization-wide policy and a single system for tracking all metadata enable anyone in the organization to quickly locate information relevant to their work.
- Track copies of the same data set, with their locations and creation times – As information systems grow more complex, it is increasingly common for a data set to be copied multiple times within an organization. These replicas are key to successful operations, but they should be tracked consistently, along with their creation dates, so that they can be updated or removed when necessary.
- Consider integration and transformation separately – Data governance policies should address each on its own terms:
  - Integration – Policies for data integration should define what types of data can be combined and what security posture the resulting data should take. They should also specify where data can be combined and which processes must be documented to ensure repeatability.
  - Transformation – Transformation policies should document what happens to the original data: is it kept or removed after transformation? Because data is commonly transformed during analysis, the value of retaining the original should be weighed in case future workflows require the data in its original form.
- Model management – Predictive models drive many organizations, powering everything from recommendations to risk profiling. These models are just as critical as the data feeding them, if not more so. A data governance strategy should therefore cover who can approve new model deployments, how models are tested, and what documentation is required for every model produced.
- All data requires an assigned SME – Many organizations assign a data owner to define and implement policies for specific data sets. I recommend instead assigning a data subject matter expert (SME) to each data set. Ultimately, the data is owned by the organization, so the “owner” title is misleading. The SME title describes the role accurately: the person who truly understands the risk and value the data brings and how to maximize that value for the organization.
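Several of these practices can be captured in one catalog record per data set. The following Python sketch assumes a hypothetical `CatalogEntry` that holds the assigned SME, shared metadata topics, a retention period, and a list of tracked replicas, each of which carries the same `SecurityContext` as its source. The classification levels, and the rule that combined data takes the stricter classification and only the roles allowed on both inputs, are illustrative assumptions for an integration policy rather than a prescribed standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical classification levels, from least to most restrictive.
LEVELS = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class SecurityContext:
    classification: str
    allowed_roles: frozenset[str]

    def combine(self, other: SecurityContext) -> SecurityContext:
        """Integration policy sketch: the combined data takes the stricter
        classification, and only roles allowed on both inputs may read it."""
        stricter = max(self.classification, other.classification, key=LEVELS.index)
        return SecurityContext(stricter, self.allowed_roles & other.allowed_roles)


@dataclass
class Replica:
    location: str
    created_on: date
    security: SecurityContext  # the security context travels with the copy


@dataclass
class CatalogEntry:
    dataset: str
    sme: str                   # assigned subject matter expert
    security: SecurityContext
    created_on: date
    retention: timedelta       # lifecycle: retire the data after this period
    topics: set[str] = field(default_factory=set)          # shared metadata
    replicas: list[Replica] = field(default_factory=list)

    def copy_to(self, location: str, on: date) -> Replica:
        """Record a new copy, carrying the same security context with it."""
        replica = Replica(location, on, self.security)
        self.replicas.append(replica)
        return replica

    def is_retired(self, today: date) -> bool:
        """True once the data set has outlived its retention period."""
        return today >= self.created_on + self.retention

    def stale_replicas(self, source_updated_on: date) -> list[Replica]:
        """Copies made before the source's last update need refreshing or removal."""
        return [r for r in self.replicas if r.created_on < source_updated_on]


# Usage with hypothetical names, dates, and locations.
ctx = SecurityContext("confidential", frozenset({"analyst", "sme"}))
orders = CatalogEntry(
    dataset="customer_orders",
    sme="j.doe",
    security=ctx,
    created_on=date(2023, 1, 1),
    retention=timedelta(days=365),
    topics={"sales"},
)
r1 = orders.copy_to("cloud-us-east", date(2023, 2, 1))
r2 = orders.copy_to("on-prem-dc1", date(2023, 6, 1))
print(orders.stale_replicas(date(2023, 3, 1)))  # only the February copy
```

In practice these records would live in an organization-wide metadata catalog rather than in application code; the sketch only shows which facts each record needs to hold.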
In today’s cloud-first, data-driven world, data has never been more important. Policies for integration, transformation, and security context have to be updated to accommodate the regular movement of applications and their supporting data. These changes should be handled through risk management, led by an SME responsible for defining all aspects of data governance for a specific data set. The SME ensures risk is properly managed and balanced against the needs of the business and of the individuals who access and analyze the data. Only then will companies have a sound data governance strategy.