By adopting a cloud enterprise data warehouse (EDW) to complement existing EDW platforms, organizations can enable more agile growth, more flexibility in workload location and better price points for different types of workloads.
The EDW is a key technology used in every large company today. EDWs can support a variety of workloads, including financial reporting, customer satisfaction analysis, manufacturing quality, shipping & logistics, as well as ad hoc workloads from individual business units. An EDW often is the lifeblood of an organization, and has strict policies to ensure maximum availability and performance for key business processes and departments. Today, most EDWs are based in traditional data centers and come from a small handful of vendors, including Oracle, Teradata and IBM.
A traditional EDW (Figure 1) is fed through a series of extract, transform, load (ETL) processes from different online transaction processing (OLTP) systems. These OLTP systems are commonly business unit specific and support a variety of transactional processing across the enterprise. The EDW serves as a focal point for analyzing data across these OLTP platforms, while providing company-wide operational reporting.
As organizations renew their EDW maintenance agreements, while also adding new and complex workloads, the cost of operating legacy systems continues to increase. Many organizations are also running into scalability challenges because legacy EDW technologies were not designed to handle today’s complex analytical workloads. As a result, many are now looking to cloud based technologies to manage their EDW operational costs more effectively, while gaining advanced capabilities for modern data analysis. There are many reasons for leveraging cloud based technologies to complement existing EDW platforms:
- Cost savings & cost flexibility
- Using PaaS to lower operational overhead
- Eliminating capital costs
- Eliminating costly license renewals
- Adding advanced capabilities
- Keeping historical data longer than is practical with an on-premise solution
- Elastic capacity
One reason today’s EDW platforms are so complex is the growth in analytical and operational workloads over the many years that organizations have utilized EDW platforms. This growth has led to a mix of workloads that do not naturally fit together and can negatively impact one another with regard to scheduling and performance.
For prioritization and planning, EDW workloads can be classified into several key categories:
- Auditable – Auditable workloads include those that are key for business operations and that are legally required for the company to operate. These include workloads for reporting organizational compliance, evaluating company risk and responding to government requests. These workloads are the most critical and must be maintained to ensure the legal compliance of the organization.
- Organizational Reporting – Organizational reporting workloads are commonly exposed through KPIs that management uses to measure company performance. These workloads are often a good fit for a cloud EDW because they are run at scheduled times during the day and have a uniform data set, which they analyze for each execution of the jobs.
- Sustained – Sustained workloads are commonly used by company management during the day for gathering reports and checking into key indicators for CRM, customer satisfaction, call center reporting and other metrics used to drive more tactical business decisions.
- Variable – Variable workloads are those operational and financial reporting activities that an organization can plan for, although they do not continuously use EDW resources. These workloads are commonly used for planning sales teams’ compensation and sales territories. Such workloads are good fits for cloud EDW.
- Business Unit Specific – Business unit workloads are commonly part of a departmental data mart, or provide organization specific reporting. These are often the first workloads to move to cloud EDW, because individual departments are less dependent on corporate IT and have complete access to their own data sets.
The above categories help to define the lowest risk, easiest technical use cases and workloads to migrate to a cloud EDW. This initial migration will provide the minimum capabilities to enable the future migration of use cases with higher technical needs and risk.
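As a rough illustration of how these categories might drive a migration backlog, the sketch below sorts candidate workloads by category. The rank values are an assumption inferred from the descriptions above (business unit workloads first, organizational reporting and variable workloads next, sustained and auditable workloads last); the workload names in the usage note are hypothetical, and a real ranking would come out of the assessment process.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    AUDITABLE = "Auditable"
    ORGANIZATIONAL_REPORTING = "Organizational Reporting"
    SUSTAINED = "Sustained"
    VARIABLE = "Variable"
    BUSINESS_UNIT_SPECIFIC = "Business Unit Specific"


# Assumed migration rank (lower moves earlier). This ordering is an
# illustration inferred from the category descriptions, not a fixed rule.
MIGRATION_RANK = {
    Category.BUSINESS_UNIT_SPECIFIC: 0,    # often the first to move
    Category.ORGANIZATIONAL_REPORTING: 1,  # scheduled, uniform data sets
    Category.VARIABLE: 1,                  # plannable, non-continuous usage
    Category.SUSTAINED: 2,                 # used throughout the business day
    Category.AUDITABLE: 3,                 # most critical, legally required
}


@dataclass(frozen=True)
class Workload:
    name: str
    category: Category


def migration_order(workloads):
    """Return workloads sorted by assumed rank, then by name for stability."""
    return sorted(workloads, key=lambda w: (MIGRATION_RANK[w.category], w.name))
```

For example, given a departmental data mart, a KPI dashboard, a sales compensation model and a risk report, `migration_order` would place the data mart first and the risk report last.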
Moving to an EDW platform in the cloud involves several important design and migration considerations. These are key to ensuring that the EDW functionality moved to the cloud is seamlessly integrated with workloads that will stay on premise, and that downtime is minimized. Each of the following categories should contribute to driving the target architecture for a cloud based EDW, and to determining the priority of workloads that will move to the cloud.
- Movement of data to the cloud – Moving data between facilities, especially at volume, can be a time-consuming process. When migrating an EDW, it is important to define the data sets and volumes early, so that proper connectivity can be enabled for data movement, and a project schedule accurately built for the data migration time frames.
- Data Integration & access – Movement to the cloud will require ETL processes and data flows to be extended beyond the on-premise implementation. ETL tools should be validated for cloud operation, vendor support and the features needed to integrate with cloud-native EDW technologies.
- Data transit costs – Data movement costs from a cloud provider to on-premise deployments can add up quickly if data is not moved efficiently. Because cloud providers charge for data egress, ETL processes and data migration must take advantage of best practices around sending only differential changes, as well as maximizing data compression before sending.
- Developer experience – Cloud based EDWs provide unique flexibility and capabilities that are not available in on-premise solutions. This can be challenging for developers who need to keep up with regularly released new features and learn new methods for service utilization. Cloud EDW projects should include training and enablement efforts, to ensure developers have access to examples, curriculum and resources when adopting cloud EDW.
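Two of the considerations above lend themselves to a small sketch: sending only differential changes in compressed form, and estimating transfer time early enough to build a schedule. The example below is a minimal illustration using only the Python standard library. The snapshot format (a dict keyed by primary key), the function names and the `efficiency` factor are all assumptions made for the example; a production pipeline would normally rely on the change data capture features of its ETL tool rather than comparing snapshots.

```python
import gzip
import json


def changed_rows(previous, current):
    """Return rows that are new or modified since the previous snapshot.

    Both arguments map a primary key (string) to a row dict.
    Deleted rows are not detected in this simplified sketch.
    """
    return {key: row for key, row in current.items() if previous.get(key) != row}


def pack_delta(previous, current):
    """Serialize only the changed rows and gzip them before transfer."""
    delta = changed_rows(previous, current)
    payload = json.dumps(delta, sort_keys=True).encode("utf-8")
    return gzip.compress(payload)


def unpack_delta(blob):
    """Reverse pack_delta on the receiving side."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Rough transfer time in hours for size_gb (decimal GB) over a link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    bits = size_gb * 8 * 1000**3
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 3600
```

As a sanity check on the arithmetic, 1,000 GB over a fully utilized 1 Gbps link (`efficiency=1.0`) takes 8,000 seconds, or a little over two hours; real links rarely sustain their nominal rate, which is why the default assumes less.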
While any IT service migration is risky, an EDW migration carries additional risk because of the critical role the system plays in managing the core business functions of a company. When moving to a cloud EDW, there are several key areas that should be assessed, evaluated and planned for prior to migration:
- Use of proprietary features in on-premise EDW implementation – All uses of proprietary features should be evaluated prior to migration to determine how best to provide similar functionality in the cloud.
- Developer Experience & Data Access – Developers need to be provided a similar experience with their cloud based EDW as they’ve been accustomed to in the data center. This ensures their productivity is not impacted when deploying new workloads in the cloud, or working on integration activities between the platforms.
- Query Cost – Some cloud services charge based on the amount of data queried. This model can provide advantages with many workloads, but needs to be communicated to all developers to ensure they are efficient at all tasks and understand the cost of adding new reporting capabilities.
- Industry specific data models & workflows – Many on-premise EDWs make use of vendor specific data models and workflows targeted to specific verticals. These allow businesses to quickly adopt an EDW that matches their needs, and customize it for their unique situations. This can present challenges in the cloud because of the different underlying technologies. Use cases should be evaluated to determine if the logic and data models can easily move to a cloud based EDW, or will require a level of redevelopment prior to implementation.
- ETL vendor support for Cloud integration – Many ETL technologies in use today were built before the growth of cloud capabilities. Therefore, many ETL vendors are now playing catch-up to add capabilities for natively accessing cloud EDW and other relational stores. Any ETL tools that will be leveraged should be evaluated to determine if they will support the cloud based technologies targeted for use and the level of support the vendor will provide. While making the change, it might be advantageous for the organization to also evaluate native cloud data integration tools.
- Proprietary vendor development and analytical languages – Many EDW platforms provide the ability to natively execute mathematical and analytical models in the database queries, as well as their own extended languages and models for advanced analysis. Prior to moving any workloads to a cloud EDW, an analysis should be completed to identify workloads that will require their analytical models to be updated to languages supported by the cloud provider and cloud EDW platform.
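To make the query cost consideration concrete, the sketch below estimates what a report costs under a pay-per-data-scanned model. The price passed in is a hypothetical placeholder, not any provider's actual rate, and the function names are illustrative; real pricing varies by provider, region and billing model.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def estimate_query_cost(bytes_scanned, price_per_tib):
    """Cost of one query when billing is proportional to data scanned."""
    return bytes_scanned / TIB * price_per_tib


def monthly_cost(bytes_per_run, runs_per_day, price_per_tib, days=30):
    """Cost of a scheduled report that scans the same volume on every run."""
    return estimate_query_cost(bytes_per_run, price_per_tib) * runs_per_day * days
```

At a hypothetical price of 5.00 per TiB, a query that scans half a TiB costs 2.50, and running it four times a day for 30 days costs 300.00. Numbers like these give developers a simple way to see how scan volume and schedule frequency drive the bill before a new report is added.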
The major cloud vendors, including Google, AWS and Azure, each provide their own unique set of capabilities to enable deployment and operations of EDW platforms. While the underlying technical architecture is different for these providers, their capabilities and service delivery are similar.
Migration Strategy & Considerations
Figure 3 shows the CTP approach for adopting cloud technologies. The adoption of an EDW in the cloud is a combination of a new deployment and a migration of existing workloads. This combination is best served by a methodical approach, in order to fully understand business needs, dependencies, data elements and organizational readiness.
EDWs are specific technical solutions that provide business capabilities, often critical to business operations. The Cloud Adoption Program (CAP) approach provides a high-level set of phases for EDW adoption in the cloud, with specific considerations, listed below, for each phase.
- CAP Phase 1: Workshop
  - Engage all supporting team members – Business Analysts, BI Developers, DBAs, Operations, Security, Governance
  - Define business metrics and drivers for measurement of project deliverables
  - Brainstorm potential workloads and use cases for migration to a cloud EDW
  - OUTPUT: Define target workloads and use cases to migrate
- CAP Phase 2: Assess & Plan
  - Map potential workloads to specific data sets, tables and structures
  - Map the source of authority for all in-scope data sets
  - Define Security controls for implementation of Governance and Compliance policies
  - Define Roadmap for Minimum Viable Cloud (MVC) and MVC+ (MVC + Ongoing Roadmap)
  - Define staffing plan for execution of MVC Build, Migration and EDW operations
- CAP Phase 3: Build
  - Build MVC, including core elements of connectivity, access controls, routing, auditing, logging, ACLs and automated deployment tools
  - Create separate Dev, Test and Prod environments to match release and test cycles
  - Build database instances
  - Set up staging environments for data replication, transformation
- CAP Phase 4: Migrate
  - Move data sets to the cloud EDW platform
  - Implement ETL processes for ensuring data integrity between on-premise and cloud EDW platforms
  - Migrate individual workloads and associated workflows, most commonly moved in order of Dev, Test, Production workloads
  - Retire on-premise supporting systems and infrastructure
  - Migrate workflows to native cloud services where feasible
  - Update all analytical models to leverage community supported languages like R or Python for easy execution in the cloud and portability
- CAP Phase 5: Operate
  - Implement operational monitoring and automated response to system events
  - Train operations staff to identify, debug and correct system events
  - Identify operations team, existing or new, to operate and manage the cloud EDW and supporting cloud services
Many organizations have organically built up EDW environments much larger than traditional technologies can effectively accommodate. Organizations are spending more every year on support, maintenance and added capacity for varying workloads. The cloud provides unique capabilities to accelerate many EDW workloads, while providing for better cost management and visibility. The migration of EDW workloads to the cloud should follow a methodical process that makes certain data quality, performance and developer experience are positively impacted, so that the organization can take full advantage of the new capabilities provided by a cloud based EDW.
When you follow a proven approach to safely accelerate your cloud initiative, you can quickly realize the benefits of cloud technology to grow and stay competitive. CTP brings the collective experience of hundreds of cloud migration projects, moving thousands of applications, to enable your organization to successfully leverage cloud based EDW platforms to empower decision makers.