Migrating applications to the cloud is challenging, and many aspects need to be considered. One of them is designing a High Availability (HA)/Disaster Recovery (DR) solution that meets an application’s Recovery Time Objective (RTO)/Recovery Point Objective (RPO) requirements. There are multiple ways to approach the design, and the cost of DR varies depending on the solution. In this article we address some of those HA/DR challenges and describe your design options in the cloud.
This first article in the series focuses on Online Transaction Processing (OLTP) DB design options for maintaining RTO/RPO using appropriate HA/DR and failover strategies, with an emphasis on methods for data replication in AWS. Future articles will address the application of similar strategies in Google Cloud Platform (GCP) and Microsoft Azure.
Determining Application Suitability on the Cloud
Before diving too deep into the design considerations, it’s important to note that these architectures, tools and methods are not meant to be a universal replacement for Oracle RAC or MSSQL in AWS. Careful research and consideration are required before determining whether OLTP workloads are suitable for the cloud. Even good candidates for migration may require significant redesigning and/or refactoring to make them suitable for public cloud. Workloads best suited for migration might include one or more of the following characteristics:
- The database needs to move out of the datacenter quickly
- Moving to Oracle Cloud is not a practical option
- Available resources and time are insufficient to make changes to the DB platform
- The application needs to maintain a consistent SLA and RTO/RPO
- Geographic separation is required for scale or redundancy, and current data centers are limited in location availability
Currently, AWS options for databases in the cloud fall into two primary categories: Amazon Relational Database Service (RDS), where maintenance and administration of the database are abstracted away from the user; and Amazon EC2-based deployments, which are essentially standard installations and configurations of popular databases on EC2 virtual machines.
Below are the primary RDBMS options available in each category:
- Plus others
While there are many flavors of relational databases used in the industry, such as those listed above, this article will primarily focus on Oracle Real Application Clusters (RAC) and MSSQL, which are more commonly used in the enterprise.
Oracle RAC provides application high availability by failing over seamlessly to other cluster nodes in case of a node failure. Most enterprises replicate their Oracle RAC cluster to another datacenter using Oracle GoldenGate, Active Data Guard, Data Guard or SharePlex to accomplish seamless failover, resulting in typical recovery times of less than a minute.
Oracle RAC is not currently supported natively in AWS, so in order to meet similar SLAs and HA requirements, we have two options: either create an Amazon RDS Multi-AZ instance, or install Oracle on EC2 and replicate data to a separate, geographically dispersed region for HA and failover. For online retail or other applications that need to support instant DB failover, DBs need to be cross-replicated on a near real-time basis between two regions to support the necessary RTO and RPO.
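To make the RTO/RPO reasoning concrete, here is a minimal Python sketch that checks whether a candidate design fits a set of recovery targets. The class names, the function name, and all of the numbers are hypothetical and chosen only for illustration; real values would come from measuring replication lag and failover time in your own environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets agreed for an application, in seconds."""
    rto_seconds: float  # maximum tolerable downtime
    rpo_seconds: float  # maximum tolerable window of lost data


@dataclass(frozen=True)
class FailoverDesign:
    """Measured or estimated behavior of a candidate HA/DR design."""
    name: str
    replication_lag_seconds: float  # worst-case lag between primary and replica
    failover_seconds: float         # time to detect a failure and redirect traffic


def meets_objectives(design: FailoverDesign, targets: RecoveryObjectives) -> bool:
    """Return True when worst-case data loss and downtime fit within the targets."""
    data_loss_ok = design.replication_lag_seconds <= targets.rpo_seconds
    downtime_ok = design.failover_seconds <= targets.rto_seconds
    return data_loss_ok and downtime_ok


if __name__ == "__main__":
    # Illustrative targets and estimates only (assumptions, not benchmarks).
    targets = RecoveryObjectives(rto_seconds=60, rpo_seconds=5)
    candidates = [
        FailoverDesign("cross-region near real-time replication", 2, 45),
        FailoverDesign("nightly backup restore", 86_400, 14_400),
    ]
    for candidate in candidates:
        verdict = "meets targets" if meets_objectives(candidate, targets) else "misses targets"
        print(f"{candidate.name}: {verdict}")
```

The sketch treats worst-case replication lag as the bound on data loss and failover time as the bound on downtime, which is a simplification; it ignores factors such as failure detection errors or partial outages.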
High Availability and Replication Options for Databases in AWS
There are several tools and methods available to synchronize DBs across two regions. Requirements that affect the choice of solution include whether active-active is needed, and whether replication is intended to be unidirectional or bidirectional.
Available solutions for achieving high availability and cross-region replication vary, depending on whether we’re discussing RDS-based or EC2-based databases.
RDS Multi-AZ Oracle and MSSQL are built to provide high availability within the region, but neither provides the ability to replicate across regions. As a result, cross-region replication must be designed by the application owner or AWS customer.
- Oracle: Active-active, cross-region, bidirectional replication can be managed using Oracle GoldenGate. SharePlex or GoldenGate are also capable of active-active mode cross-region replication, but only with unidirectional replication.
- MSSQL: With Amazon RDS-MSSQL, cross-region bidirectional replication cannot be performed. Third-party tools, such as CloudBasic, may be used to replicate MSSQL DB data from one region to another.
- Oracle: There are situations where RDS Oracle is not a viable option, for example when the total storage requirement is larger than 16TB, which is not supported in RDS Oracle. The alternative is to install Oracle on EC2. In this case, active-active, cross-region, bidirectional replication can be managed using Oracle GoldenGate or SharePlex. To simulate a RAC-like cluster in AWS, stand-alone Oracle on EC2 instances can be configured in multiple Availability Zones (AZs) in each region. Active-active read/write capability can then be provided with GoldenGate or SharePlex replication, which syncs the nodes across the AZs within the region. This type of setup can help achieve RTO/RPO and SLAs similar to those of an on-prem Oracle RAC cluster. However, it is important to be aware that it creates more admin overhead, can add latency, and results in a more complex DB schema. Because AZs can potentially be 30 miles apart, a cross-AZ setup introduces some lag in cross-node health checks and may perform more slowly than RAC. It also forces you to use synchronous replication for all nodes.
- MSSQL: With MSSQL Server AlwaysOn Async, MSSQL native replication can be used to synchronize data across two AWS regions.
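The options in the list above can be summarized as a small decision helper. The following Python sketch is only one way to encode them, under stated assumptions: the `Workload` and `Recommendation` types and the `recommend` function are hypothetical names, the 16TB limit is the figure cited in this article and should be checked against current AWS documentation, and routing every bidirectional MSSQL workload to EC2 is a simplification of the text rather than a rule the article states.

```python
from dataclasses import dataclass

# Storage limit for RDS Oracle as cited in this article; verify before relying on it.
RDS_ORACLE_MAX_STORAGE_TB = 16


@dataclass(frozen=True)
class Workload:
    engine: str          # "oracle" or "mssql"
    storage_tb: float    # total storage requirement
    bidirectional: bool  # True if both regions must accept writes


@dataclass(frozen=True)
class Recommendation:
    platform: str        # "RDS" or "EC2"
    replication: str     # replication approach named in the article


def recommend(workload: Workload) -> Recommendation:
    """Map a workload to a platform and replication approach (illustrative only)."""
    engine = workload.engine.lower()
    if engine == "oracle":
        if workload.storage_tb > RDS_ORACLE_MAX_STORAGE_TB:
            # RDS Oracle is not viable at this size, so install Oracle on EC2.
            return Recommendation("EC2", "Oracle GoldenGate or SharePlex")
        if workload.bidirectional:
            return Recommendation("RDS", "Oracle GoldenGate")
        return Recommendation("RDS", "SharePlex or Oracle GoldenGate")
    if engine == "mssql":
        if workload.bidirectional:
            # RDS-MSSQL cannot do cross-region bidirectional replication.
            # Assumption: fall back to EC2 and evaluate native options there.
            return Recommendation("EC2", "AlwaysOn Async with MSSQL native replication")
        return Recommendation("RDS", "third-party tool such as CloudBasic")
    raise ValueError(f"unsupported engine: {workload.engine!r}")


if __name__ == "__main__":
    for w in (Workload("oracle", 20, True), Workload("mssql", 2, False)):
        print(w, "->", recommend(w))
```

A real selection process would also weigh licensing, cost, latency tolerance, and operational overhead, none of which this helper models.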
There are many tools available to meet instant-failover and RTO/RPO requirements. Recommended common architecture patterns, tools and methods are listed in Figure 2.
It is important to keep in mind that moving to the recommended architecture highlighted in Figure 2 generally results in increased DB administration overhead, because Oracle RAC management capabilities and features are not available. Furthermore, the DB schema setup can be complex, and DB instance patching is more manual, since it is done on a per-node basis rather than through Oracle RAC.
AWS services and capabilities are constantly changing. There are many PaaS and DBaaS offerings in the cloud for RDBMS, but choosing the right platform and the appropriate replication tool for high availability and business continuity is quite challenging. If your organization is currently moving to the cloud and experiencing the DB challenges outlined above, give CTP a call.