Untangling Data Entanglement

Understanding where the data entanglements lie, their impact and the requirements to replicate that data to the right place at the right time are all critical to the success of your data strategy.
Ed Featherston, VP, Principal Architect
May 29, 2019 | The Doppler

In the technology world, buzzwords and analogies flood the landscape. Analogies help us explain unfamiliar concepts in familiar terms, and when the analogy strikes a chord, it often launches a buzzword. This can be helpful in bridging the communications gap between technology and business. “Data gravity” is one of those buzzwords that now permeate our discussions around digital innovation.

Data gravity was coined by Dave McCrory, a software engineer, in a blog post almost ten years ago. It uses the basic concept of gravity to help explain the behaviors and patterns resulting from the ever-growing disruptions caused by the cloud and big data. The law of gravity states that the attraction between two objects is proportional to the product of their masses. McCrory posited that in ever-growing cloud and big data environments, the accumulating mass of data increases the number of applications, services and consumers drawn to that data. Basically, the more data that is accumulated, the more applications and services come into the orbit of that data to consume it.

This phenomenon becomes key when architecting, designing and building out your cloud and big data environments. Where the data resides in relation to the applications and services being drawn to it impacts system performance, cost and reliability. Consumers of the data may reside in the same cloud as the data. They may also still be in on-premises systems, or in another cloud platform. It is a hybrid cloud world, and that must be considered when implementing solutions. The explosive growth and proliferation of applications and services caused by the increase in data gravity can quickly overwhelm a system, degrading performance, increasing costs and even impacting the quality of the results, due to reliability issues with the content.

 


 

Data Entanglement to the Rescue

Enter a new analogy and buzzword to save the day. “Data entanglement” plays off the concept of quantum entanglement, again drawing on the world of physics. In simplified terms, quantum entanglement posits that two objects separated by great distances demonstrate a connection when a change to one is reflected in the other. In the data world, data entanglement means that when two data stores share common information, a change in one is reflected in the other.

Yes, we are fundamentally talking about data replication, but there is a benefit to thinking about its challenges from the perspective of entangled systems. This encourages you, when considering use case scenarios, to take all the potential impacts into account. Let us look at some high-level scenarios, and why we should consider entanglements.

Living on the Edge in the World of IoT

With IoT devices, sensors of all types generate massive amounts of data. These devices live out on the edge of the cloud, and create multiple data entanglement impact scenarios, as follows:

  • Real-time system scenarios – In manufacturing, data from IoT sensors frequently demands real-time acquisition and processing. Immediate response is required, so manufacturers cannot wait for the data to make it to the cloud and back. This means the data store needs to be on or near the edge of the services and applications doing the data processing. (For example, when sensors in a manufacturing line report an issue in one part of the system, the response must be immediate. Or when a new smart car senses an adverse driving condition, the analysis and response cannot be delayed by latencies back to the cloud.)
  • Long-term analysis scenarios – It is often beneficial to analyze IoT device data over the long term — for example, when doing predictive maintenance. Such applications and services do not need real-time data access and capabilities, so inherent latency is not an issue. The original data at the edge is entangled with the data stores used by the long-term analytical applications, which can be in an entirely different location within the cloud.
  • Feedback/updates to devices scenarios – Based on the various analyses done on the data, it may be important to send feedback/updates to the IoT devices. (In the smart car example, the analysis may provide data that improves the performance of the smart car features, so you would want that data to upgrade all cars in the fleet.) The devices and back-end systems are inexorably entangled; changes propagate and replicate through the back end and ultimately return to the devices involved.
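The edge scenarios above can be sketched as a local store that answers real-time reads immediately while replicating each reading to a cloud sink in the background. This is a hedged illustration, not a real API; the `EdgeDataStore` class and its method names are assumptions made for the sketch:

```python
import queue
import threading
import time

class EdgeDataStore:
    """Illustrative sketch: a store on an edge device that serves
    real-time reads locally and replicates readings to a cloud sink
    asynchronously, so latency back to the cloud never blocks the
    real-time path."""

    def __init__(self, cloud_sink):
        self.latest = {}                 # sensor_id -> latest reading (real-time path)
        self.outbox = queue.Queue()      # readings awaiting replication
        self.cloud_sink = cloud_sink     # long-term analytical store (here: a list)
        threading.Thread(target=self._replicate, daemon=True).start()

    def ingest(self, sensor_id, value):
        # Real-time path: update local state first, no cloud round trip.
        self.latest[sensor_id] = value
        # Entanglement path: queue the reading for eventual replication.
        self.outbox.put({"sensor": sensor_id, "value": value, "ts": time.time()})

    def read(self, sensor_id):
        return self.latest.get(sensor_id)

    def _replicate(self):
        # Background loop: latency on this path is acceptable.
        while True:
            record = self.outbox.get()
            self.cloud_sink.append(record)
```

The design choice mirrors the scenarios: the real-time path never waits on the cloud, while the entangled long-term store catches up asynchronously.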

As you can see from these high-level scenarios, data is indeed entangled between systems, so we need to ensure the right data is in the right place at the right time.

Data Replication Scenarios

Most replication needs fall into a small set of scenarios (although, as usual, there are exceptions outside these scenarios).

  • Data synchronization scenarios – One of the best-known examples is the full synchronization of data between two or more databases, typically in a near real-time fashion. All database systems have some level of this capability built-in. Potential negative impacts can affect cost, resources and performance. In this model, all systems perform reads and updates and are kept in sync. Race conditions are always a risk in this scenario, so a large amount of resources is required to ensure data integrity. While conceptually the simplest solution, this can be overkill for most needs.
  • Snapshot scenarios – The snapshot replication of one or more data tables at predetermined time intervals can be useful when there is no real-time need for data access, and destinations only require read access. For example, in the IoT examples above, feedback/updates could potentially be accomplished using the snapshot technique.
  • Transactional scenarios – Transactional replication is a step down from pure data synchronization. Data is copied from a master system to slave systems at or near real time. This is usually thought of as an incremental update from a snapshot, and it is frequently used as a mechanism for backup and passive system availability.
  • Read-only scenarios – Read-only copies are created in a similar manner to transactional ones, usually for performance reasons in systems doing heavy analysis. In most cases, applications can fall back to the active master system if the read-only system is not available.
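The snapshot and transactional scenarios above can be contrasted in a small sketch: a master keeps an ordered change log, a snapshot is a full point-in-time copy, and a transactional replica applies the log incrementally. The class and method names are illustrative assumptions, not any particular database's API:

```python
class MasterStore:
    """Illustrative master that keeps rows plus an ordered change log,
    so replicas can choose snapshot or transactional catch-up."""

    def __init__(self):
        self.rows = {}
        self.log = []          # ordered (op, key, value) entries

    def upsert(self, key, value):
        self.rows[key] = value
        self.log.append(("upsert", key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.log.append(("delete", key, None))

    def snapshot(self):
        # Snapshot replication: a full copy at a point in time.
        return dict(self.rows)

class TransactionalReplica:
    """Read-only replica that applies the master's change log
    incrementally, rather than recopying everything."""

    def __init__(self):
        self.rows = {}
        self.applied = 0       # position reached in the master's log

    def catch_up(self, master):
        for op, key, value in master.log[self.applied:]:
            if op == "upsert":
                self.rows[key] = value
            else:
                self.rows.pop(key, None)
        self.applied = len(master.log)
```

A snapshot taken before later changes stays frozen at that point in time, while the transactional replica converges to the master's current state each time it catches up.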

Security and Privacy Considerations

One of the critical pieces that is often not addressed, or even considered, when implementing replication/entanglement scenarios is data security. When data is replicated from one environment to another, the security requirements surrounding that data must follow it. Many times, people assume security constraints are set up identically in both environments, but this is not necessarily the case.

A second challenge in replication revolves around data privacy issues. As an organization, you are responsible for any data passing through/replicated into your environment. A simple example is the GDPR “right to be forgotten” rule. If your system has personal information about an individual who has requested to be removed from your system, that means their data must be deleted from any and all systems where it resides. If their information is entangled throughout your environment, you must make sure you have identified all those connections in order to remain compliant.
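As an illustration of the "right to be forgotten" challenge, an erasure sweep across entangled stores might look like the following minimal sketch. The function and the dict-of-dicts store model are hypothetical simplifications; a real implementation would work against actual databases:

```python
def erase_subject(subject_id, stores):
    """Hypothetical GDPR 'right to be forgotten' sweep: delete a
    subject's records from every registered store and report which
    stores actually held data, so compliance can be evidenced.
    `stores` maps a store name to a dict of subject_id -> records."""
    report = {}
    for name, store in stores.items():
        # pop returns None when the subject was never in this store.
        report[name] = store.pop(subject_id, None) is not None
    return report
```

The sweep is only as complete as the `stores` registry: any entangled copy missing from it silently survives the deletion, which is exactly the compliance risk described above.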

The best way to address both these considerations is to have a comprehensive data governance process in place. With copies of data moving throughout your system, it is critical to know what data you have and where you have it. In many organizations, data replications and entanglements have grown organically, and without good governance, data can be forgotten, putting the organization at risk.
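A governance process ultimately needs an inventory of where every copy of a dataset lives. A minimal sketch of such a catalog, assuming each replication registers the new copy (the class and names are illustrative, not a real governance product):

```python
class DataCatalog:
    """Minimal governance-catalog sketch: every time data is
    replicated, the new copy is registered, so 'what data do we
    have and where do we have it' stays answerable."""

    def __init__(self):
        self.copies = {}    # dataset -> set of locations holding a copy

    def register_copy(self, dataset, location):
        self.copies.setdefault(dataset, set()).add(location)

    def locations_of(self, dataset):
        # Every entangled copy that an audit or deletion must reach.
        return sorted(self.copies.get(dataset, set()))
```

Organically grown replications that bypass registration are precisely the "forgotten data" risk: the catalog is only trustworthy if registering the copy is part of the replication process itself.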

No Technology Negates the Need for Good Design and Planning

In the rapidly changing world of digital disruption, the ever-growing number of available technology tools, along with the pressure to quickly implement new solutions, makes it easy to fall into a "ready, fire, aim" approach. Hybrid cloud and huge volumes of data provide tremendous opportunities to add value to your business. But the need for speed does not negate the need for good design and planning.

We as technologists have a responsibility to make sure we understand where the entanglements of data are, what the impacts are and what the requirements are to replicate that information to the right place, at the right time, taking into account performance, cost, security and privacy. Only then can we truly provide lasting and scalable business value.
