Data security was a straightforward process back when organizations stored all their data in on-premises IT environments. Organizations used the “lobster” security model, building hard-shelled exteriors to fend off threats at the perimeter while leaving fewer internal protections for the soft, delicious data inside.
In a cloud-oriented world, data protection is much more complicated. Companies are using the cloud for a growing list of benefits beyond scalable compute and storage, while still managing resources in on-premises IT environments. In a cloud or hybrid environment, the focus should be on protecting the workloads themselves, not just the perimeter. Perimeters are far more fluid in the cloud, so lobster models do not work in these scenarios. To protect data in today’s hybrid IT world, organizations need to take a closer look at how workloads behave and adjust their overall data protection approach.
One way to better understand the changing nature of data protection is to see it in terms of data gravity. Although data is stored in a digital format, it behaves like a physical mass, pulling in other resources throughout the IT stack. Simply put, the bigger the mass of data, the more strongly it attracts applications and services that work with that data. In the cloud, data forms complicated relationships with associated apps and services. Organizations need to map these relationships, rethink their security strategies and leverage new tools to manage the increasingly complex data protection process.
Let us assume your company, in order to compete more effectively, is looking to leverage the public cloud to accelerate time-to-value for your customers and optimize IT costs. As you evaluate which applications should migrate to the cloud, you will need to think through how data gravity and data protection needs affect your strategic goals.
To create a data protection plan that meets hybrid IT demands, organizations need to consider a number of factors around how data is organized, consumed and classified. They have to classify their data according to a number of parameters, including business needs, regulatory requirements and specific data residency constraints, each of which carries its own weight. They need to analyze how their data is grouped, and understand which groupings of data exert which kinds of gravitational force. Doing so will help organizations develop more effective workload and data protections as they consider where their data may relocate.
If your organization has not done so yet, the first step on the data protection journey is to take inventory of your data and classify it according to various measures. Risk is an essential place to start. Rank your data from lowest to highest risk, starting with public-access content (such as a public-facing website), moving up through internal business communications, to ultra-secret data (such as trade secrets and regulated content like PII and HIPAA-protected health information). Then determine which data groupings carry other residency or regulatory requirements. (This is always worth reconfirming.) Finally, map out your business needs for the data in terms of the types and frequency of access required by which users in the organization. This mapping is essential to align how each type of data should be treated and protected. Simply applying the same classification and protection model across all your different data assets will create unnecessary risk or expense.
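As a rough illustration, an inventory can start as a simple structured list capturing each asset’s risk tier, regulatory constraints, residency requirements and access pattern. The asset names and tiers in this sketch are hypothetical:

```python
# A minimal, hypothetical data inventory: each asset gets a risk tier,
# any residency/regulatory constraints, and its primary access pattern.
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    risk_tier: str                                    # "public" < "internal" < "confidential" < "restricted"
    regulations: list = field(default_factory=list)   # e.g., ["GDPR", "HIPAA"]
    residency: str = "any"                            # e.g., "eu-only"
    access: str = "read-heavy"                        # how the business consumes it

inventory = [
    DataAsset("marketing-site-content", "public"),
    DataAsset("internal-wiki", "internal"),
    DataAsset("customer-pii", "restricted", ["GDPR"], "eu-only", "read-write"),
]

# Group assets by tier so each tier can get its own protection model.
by_tier = {}
for asset in inventory:
    by_tier.setdefault(asset.risk_tier, []).append(asset.name)
print(by_tier)
```

Grouping by tier makes it straightforward to attach a different protection model, and a different cost profile, to each level of risk.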
Things get interesting when you start tagging data with those classifications, which is not necessarily an easy task. Classifying chunks of data can be a challenge, since they have likely developed a life of their own in your environment and may no longer be clearly organized. Or, as is very common, so many copies of the data have proliferated that it is difficult to determine which is the source of truth. Migrating to the cloud, and considering whether to relocate this data, is a prime opportunity to address the organization, classification and tagging of data.
Many companies have a data classification strategy but an incomplete or inconsistent deployment of that strategy, typically because they lack the capability, toolset or process to apply the rules to their data. In some instances, classification is applied only at the platform level, e.g., a “Confidential” MSSQL database, or all “Super Secret” files on a particular server.
Making sure data classification is defined and applied via tagging is an important step to take, whether or not data is moving to the cloud. Furthermore, enforcing tagging through automation when data is generated, or when it is ingested into the cloud estate, aids with compliance and scalability. This helps identify the location of your confidential data, and provides guidance on how to protect it.
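As a minimal sketch of that kind of automation, the following applies a classification tag as objects are ingested into S3. The bucket name, tag scheme and trigger wiring are assumptions for illustration:

```python
# Sketch: enforce a classification tag whenever an object lands in S3.
# Bucket name, tag keys, and the example classification are illustrative.
import boto3

s3 = boto3.client("s3")

def tag_on_ingest(bucket: str, key: str, classification: str) -> None:
    """Apply a classification tag to a newly ingested object."""
    s3.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": [{"Key": "classification", "Value": classification}]},
    )

# In practice this would run from an event trigger (e.g., an S3 event
# notification driving a Lambda function) so that no object enters the
# estate untagged.
tag_on_ingest("example-ingest-bucket", "reports/q3.csv", "confidential")
```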
How Will People Use Your Data?
Once you classify your data, you need to analyze how people use it. Where do they put it? Is it file system data, database data, data streams or a data lake?
As you consider whether an application should move to the cloud, determine what data that application requires. If the app needs to span the chasm between an on-premises and a cloud-based environment, then data gravity will matter. That is because you will have to determine where your biggest chunks of data are, and whether you need to do analytics near that data. If you are pulling data from one environment into another, you are apt to run up transfer costs or introduce latency across that chasm. If the data and the application sit on the same side of the chasm, the data gravity issue is less pressing.
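To make the transfer-cost side of data gravity concrete, here is a back-of-the-envelope calculation. The egress rate and daily volume are illustrative assumptions, not quoted prices:

```python
# Back-of-the-envelope data gravity math: pulling data across the
# on-prem/cloud chasm every day adds up. Rates here are illustrative.
EGRESS_RATE_PER_GB = 0.09   # assumed $/GB egress; check your provider's pricing
DAILY_PULL_GB = 500         # analytics job pulls 500 GB/day across the chasm

monthly_cost = DAILY_PULL_GB * 30 * EGRESS_RATE_PER_GB
print(f"~${monthly_cost:,.0f}/month just to move the data")  # ~$1,350/month
```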
If people do not know where certain data resides, they might institute a blanket rule to encrypt all data of certain types. Or they may plan ETL work with that data, unaware of the transfer costs or protection requirements, and lose visibility into where confidential data is located. In the cloud, you want to define and tag your data so you know where it is and how you want to encrypt it.
Characteristics That Drive Data Protection Decisions
Now you have an inventory with the key data characteristics (classification, identification and location) needed to develop your protection strategy. Next, you must determine how to protect it and who should have access to it.
But why develop this inventory for data protection purposes? First, you will likely have different requirements for different levels of risk, and the approach for each level will have different costs.
Second, the tools for data protection on-premises, in the cloud and between cloud providers are unlikely to have feature parity, so it is often necessary to implement different controls in each environment. For example, consider the need to shift the security focus in the cloud from perimeter protections to workloads. Many on-premises data protection tools are perimeter-oriented, or simply not optimized for public cloud workloads.
Encryption and Key Management
The first rule of cloud is: encrypt everything! While many firms have data classification strategies in place, data tagging is less common, and a consistent deployment of encryption tied to the classification strategy is rarer still. To simplify implementation, encryption is embedded in the platforms of the three major cloud service providers. You no longer have to bear the computational or time expense of encryption because it is provided, rapidly, by the platform. As always, when you encrypt something, you need a key. This is similar to your house key: anyone you give it to has access to your house. So if you want to limit access to a set of data, you must limit the people to whom you give the encryption key. You will also want to limit your “blast radius” by scoping your keys to the levels of risk in your data classification policies.
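As a minimal sketch of platform-provided encryption, here is an S3 write using a customer-managed KMS key via boto3. The bucket name and key alias are hypothetical; the point is that the platform does the encryption work while the key scopes who can decrypt:

```python
# Sketch: let the platform encrypt on write, but scope access via the key.
# Bucket name and key alias are hypothetical.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="example-finance-bucket",
    Key="ledgers/2024-q1.parquet",
    Body=b"...",
    ServerSideEncryption="aws:kms",            # the platform does the encryption work
    SSEKMSKeyId="alias/finance-confidential",  # only principals with access to
)                                              # this key can decrypt the object
```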
Here is a simple example. You have four business functions: finance, trading, marketing and customer service. All have confidential data to protect. While it is possible to leverage just a single key to protect the private data across all these functions, if that one key is compromised, all the data is compromised.
In this scenario, you could use two tiers of keys to manage blast radius: one general key that covers everything in your AWS account, and a separate key for each business unit’s private data. At a minimum, each group gets its own key, and private information that spans the groups is protected by one shared key that all four groups can use.
This simple model is typically expanded to distinguish between service types (e.g., Amazon S3), but it should also be extended to support your specific data classification strategy.
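A sketch of that key layout using AWS KMS might look like the following. The aliases are hypothetical, and in practice each key would also carry a key policy restricting which principals can use it:

```python
# Sketch of the blast-radius model above: one KMS key per business unit,
# plus one shared key for cross-functional private data. Aliases are
# hypothetical; real keys also need IAM/key policies scoping who can use them.
import boto3

kms = boto3.client("kms")

business_units = ["finance", "trading", "marketing", "customer-service"]
for bu in business_units:
    key = kms.create_key(Description=f"{bu} confidential data")
    kms.create_alias(
        AliasName=f"alias/{bu}-confidential",
        TargetKeyId=key["KeyMetadata"]["KeyId"],
    )

# One additional key for private data shared across all four groups.
shared = kms.create_key(Description="cross-functional confidential data")
kms.create_alias(
    AliasName="alias/shared-confidential",
    TargetKeyId=shared["KeyMetadata"]["KeyId"],
)
```

Compromise of any one business unit’s key then exposes only that unit’s data, not the whole estate.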
A Strategy for Data Leakage
Since data does not always stay stationary, you will also need to put strategies in place to deal with data leakage monitoring and protection.
Data loss prevention (DLP) is the ability to understand and prevent data from going someplace it should not be. In the cloud, it is easy to monitor leakage but harder to stop it, because cloud DLP tools are not as mature as on-premises ones. Even the more cloud-ready DLP tools remain focused on the perimeter rather than on the workloads themselves. Paying attention to where your data is located and wrapping a DLP tool around it offers a layer of protection, but it can impose an on-premises-oriented architecture.
Cloud providers also offer tools to help with DLP, such as Amazon Macie and Azure Advanced Threat Protection, both of which have data leakage monitoring and alerting capabilities. These tools are still maturing and need to be augmented in the near term with enforcement capabilities.
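As a hedged sketch, findings from a service like Amazon Macie can be pulled programmatically and fed into downstream alerting or enforcement. In a real deployment these findings typically flow through an event bus rather than a polling loop:

```python
# Sketch: pull recent Amazon Macie findings so leakage alerts can feed
# downstream automation. The page size and printed fields are illustrative.
import boto3

macie = boto3.client("macie2")

finding_ids = macie.list_findings(maxResults=25).get("findingIds", [])
if finding_ids:
    findings = macie.get_findings(findingIds=finding_ids)["findings"]
    for f in findings:
        print(f["severity"]["description"], f["type"], f.get("title", ""))
```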
How It All Evolves
As your cloud estate grows, so too does its data gravity. This requires evolving and scaling your data management and data protection strategies in the cloud. To understand whether data gravity and location are creating friction, you will need to expand your logging and monitoring capabilities, and continue to actively monitor performance across the chasms of your hybrid IT environment. Increased cloud data gravity also demands automated compliance enforcement and remediation in order to deliver security and data protection effectively. Finally, your business continuity and disaster recovery capabilities must expand to protect this growing critical asset. With a well-planned data protection foundation, all of these capabilities can be scaled effectively.
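A simple example of automated compliance remediation, under the assumption that your policy requires default encryption on every S3 bucket: scan for buckets missing the setting and enable it. In production this is typically handled by an AWS Config rule with a remediation action; the inline loop below is illustrative:

```python
# Sketch of automated compliance remediation: find S3 buckets without
# default encryption and turn it on.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        s3.get_bucket_encryption(Bucket=name)
    except ClientError as err:
        if err.response["Error"]["Code"] == "ServerSideEncryptionConfigurationNotFoundError":
            # Remediate: enable default SSE-KMS on the noncompliant bucket.
            s3.put_bucket_encryption(
                Bucket=name,
                ServerSideEncryptionConfiguration={
                    "Rules": [{"ApplyServerSideEncryptionByDefault":
                               {"SSEAlgorithm": "aws:kms"}}]
                },
            )
            print(f"enabled default encryption on {name}")
```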
Organizations are leveraging their data to its fullest extent in an effort to improve operations and gain market share. Protecting data is therefore a mission-critical priority, and data gravity complicates the landscape. To fend off threats and manage data resources well into the future, organizations should develop comprehensive data security policies, examining practices governing classification, tagging, encryption and key management with an understanding of where their data will be located. Using this information, they will need to reexamine the use of on-premises tools for data protection in a hybrid IT environment, and confirm those tools can meet the needs of hybrid IT workloads.