The “Ops” in DevOps is just as important as development and testing, yet it gets the least attention. CloudOps is no different, even though getting operations right is key to success in the cloud. CloudOps means continuous operations and continuous improvement. If this is your path, you need to rethink traditional operations.
Before getting started, however, it’s important to define what continuous operations means in CloudOps. From there, you need to understand the core components, emerging tools, and best practices.
Here’s what cloud computing brings to the table that makes CloudOps different. Cloud-based platforms are:
Able to scale out
You can expand capacity at any time. Clouds let you self- or auto-provision servers. This feature adds a great deal of value but can be a challenge to manage.
Distributed and stateless
Operations must adjust to management that could span the world.
Clouds can abstract the underlying infrastructure from the platforms and applications.
You don’t really care where the physical servers exist, but you must manage them consistently.
Latency can vary a great deal, and you’ll need to operate and manage clouds using the same attributes.
Clouds run applications that share common services but aren’t bound together.
Data that is sharded, replicated, and distributed
Data isn’t centrally located and is either physically or logically separated.
Cloud operations rely heavily on automation.
Clouds use automation to fix common operational problems without affecting the applications or users.
Dual active (or active/active)
This refers to how the cloud uses a network of independent processing nodes. Each node has access to a replicated database, so every node can serve the same application.
With usage-based accounting systems in place, those leveraging cloud resources have their cloud usage tracked. They can then allocate the costs accordingly, with showbacks and chargebacks.
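As a rough illustration of showback, the sketch below totals tracked usage per team. The team names, resource types, and rates are hypothetical placeholders, not real provider prices, and a real accounting system would read usage from the cloud provider’s billing data rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical rates in cents per unit (illustrative only, not real prices).
RATES_CENTS = {"vm_hours": 10, "storage_gb_month": 2}


def allocate_costs(usage_records, rates=RATES_CENTS):
    """Return each team's total cost in cents from tracked usage records.

    Each record is a (team, resource, units) tuple.
    """
    totals = defaultdict(int)
    for team, resource, units in usage_records:
        totals[team] += units * rates[resource]
    return dict(totals)


# Hypothetical tracked usage for one month.
usage = [
    ("payments", "vm_hours", 720),
    ("payments", "storage_gb_month", 500),
    ("search", "vm_hours", 1440),
]
report = allocate_costs(usage)
for team, cents in sorted(report.items()):
    print(f"{team}: ${cents / 100:.2f}")
```

Whether the resulting report is used only for visibility (showback) or is actually billed to each team (chargeback) is a policy decision, not a technical one.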
While many of these cloud platform features aren’t new, the rise of cloud has brought many of the more advanced ones to public cloud platforms. Therefore, those charged with operations, or CloudOps, need to define the right operational procedures and practices around what clouds can do, rather than bend traditional approaches to operations to fit the cloud.
CloudOps relies on continuous operations, an approach that’s emerging from best practices around DevOps. Continuous operations means running cloud-based systems in such a way that there’s never a need to take part or all of an application out of service. The goal is zero downtime.
To achieve this objective, the software must be updated and placed into production without any interruption in service. Thus, continuous operations, as related to CloudOps, means installing mechanisms that allow zero downtime procedures to occur.
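One common mechanism of this kind is a rolling update, in which redundant nodes are updated one at a time while the others keep serving. The following is a minimal sketch, not a production deployment tool: the `rolling_update` function, the `is_healthy` callback, and the node names are all hypothetical, and a real system would drain traffic through a load balancer rather than an in-memory set.

```python
def rolling_update(versions, new_version, is_healthy, min_in_service=1):
    """Update nodes one at a time so some node is always serving.

    versions: dict mapping node name -> deployed version (mutated in place).
    is_healthy: callback (node, version) -> bool, run after each update.
    Returns True if every node was updated, False if a health check
    failed and the rollout was halted after rolling that node back.
    """
    in_service = set(versions)
    for node in list(versions):
        # Refuse to proceed if draining this node would cause an outage.
        if len(in_service) - 1 < min_in_service:
            raise RuntimeError("not enough redundant nodes for a zero-downtime update")
        in_service.discard(node)       # drain: stop routing traffic to the node
        previous = versions[node]
        versions[node] = new_version   # apply the update
        if not is_healthy(node, new_version):
            versions[node] = previous  # roll back the failed node
            in_service.add(node)
            return False
        in_service.add(node)           # return the node to service
    return True


# Hypothetical two-node cluster and an always-passing health check.
cluster = {"node-a": "1.0", "node-b": "1.0"}
ok = rolling_update(cluster, "1.1", lambda node, version: True)
print(ok, cluster)
```

The check on `min_in_service` is the important part: with only a single node, the sketch refuses to update at all, which is exactly the situation that forces a planned outage.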
Focus on zero downtime
If the objective of CloudOps is zero downtime, then what are the best practices and procedures you need to have in place to achieve that? Although CloudOps is still a young discipline, patterns are emerging that are on the way to becoming best practices.
Redundancy is core to all good cloud operations. Years ago, redundant systems were costly, so most operations teams ran each system on a single server. When the server was being updated with new patches and fixes, operations had to stop. When things went wrong, such as network or storage problems, application services stopped as well.
For many, downtime is an inevitable inconvenience. Most enterprises experience several outages, both planned and unplanned, each year. A study by the Ponemon Institute reported that unplanned data center outages remain a significant threat to organizations in terms of lost revenue. Unplanned outages are so feared that 84 percent of survey respondents said they “would rather walk barefoot over hot coals than have their data center go down.”
Moreover, 91 percent of survey respondents reported having experienced an unplanned data center outage in the past 24 months, with an average of two complete data center outages during that period. Partial outages, which are limited to specific racks, occurred six times in the same time span. Device-level outages (those limited to individual servers) were the most frequent, with an average of 11.
However, as we move those systems to public or private clouds, the demand from users is for no outages at all. This is a tall order, given that cloud computing platforms are relatively new, but they are being sold as being more reliable and more scalable than traditional systems in enterprise data centers.
That doesn’t mean that continuous operations are an automatic part of cloud technology. You achieve continuous operations through the effective use of CloudOps procedures and best practices. Public and private cloud platforms support auto- and self-provisioning, which means you have the ability to set up dual redundant systems. The result is that operations remain up and running during system or software updates, and even during system failures that would bring down traditional systems.
A matter of abstraction
The ability to set up redundant systems is only part of the CloudOps battle. The real action is in the cloud’s ability to place these systems behind a layer of management software that can manage machine instances in a way that works around updates and failures.
There are two flavors of these tools:
Cloud management platform (CMP) tools
These tools let you manage cloud services, provision and de-provision machines and services, and automate continuous operations, since you can place a layer of automation around cloud-based machine instances and cloud services.
System failures can typically be worked around automatically. Most common problems, such as storage system and network device failures, can usually be self-healed without users even realizing there was a problem. Also, when software is updated, automated processes that are typically linked with automated DevOps processes can test, stage, and deploy the updates without any interruption in application services.
Metrics and monitoring systems tools
These tools on private and public clouds are more data driven. The idea is to spot issues proactively as they arise in the operations of cloud-based systems. These tools constantly gather data that reflects the current state of the system, and fire off automated procedures based on that data to correct issues as they become problems or, ideally, before.
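To make the data-driven idea concrete, here is a minimal sketch of a metric check that fires an automated procedure when recent readings cross a threshold. The function name, the CPU-utilization samples, the threshold of 80 percent, and the "scale_out" action are all assumptions made for illustration; real monitoring tools offer far richer alerting rules.

```python
from statistics import mean


def check_and_remediate(samples, threshold, remediate, window=3):
    """Call remediate() if the mean of the last `window` samples exceeds threshold.

    Averaging over a window avoids reacting to a single noisy reading.
    Returns True if remediation was triggered, otherwise False.
    """
    if len(samples) < window:
        return False  # not enough data yet to judge the trend
    if mean(samples[-window:]) > threshold:
        remediate()
        return True
    return False


# Hypothetical CPU utilization readings (percent) and a stand-in action.
actions = []
cpu_samples = [40, 85, 90, 95]
triggered = check_and_remediate(cpu_samples, 80, lambda: actions.append("scale_out"))
print(triggered, actions)
```

In practice the remediation callback would call into the cloud management layer, for example to provision another instance, rather than append to a list.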
Of course, CloudOps isn’t about what tools you buy. It’s about how you use them and the procedures and processes you place around them. Many enterprises fool themselves into thinking that a new tool or technology will deliver CloudOps capabilities, but that’s only a small fraction of what needs to be done.
To achieve continuous operations and zero downtime with CloudOps, you first need to do some basic blocking and tackling. This includes:
- Assessing the needs of the applications and data sets that you’re looking to host in the cloud. What changes need to be made to support CloudOps?
- Creating an update and deployment plan that eliminates planned outages, where updates to systems and applications don’t stop operations.
- Creating a strategy and technology solution to work around common problems that would normally cause downtime. Use the auto- and self-provisioning mechanisms of your cloud platforms to build and leverage redundant services that can function independently.
- Selecting the CloudOps tools best suited to your needs. At a minimum, you’ll need CMP and monitoring and metrics tools.
- Creating a process to receive continuous feedback as to the true effectiveness of CloudOps, to make sure that you’re seeing ongoing and continuous improvement.
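For the feedback step in the list above, one simple starting point is to track measured availability from period to period. The sketch below assumes you already record downtime in minutes per period; the function names and the sample figures are hypothetical.

```python
def availability_percent(period_minutes, downtime_minutes):
    """Availability over a period, as a percentage of time in service."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def is_improving(downtime_history):
    """True if downtime never increased from one period to the next."""
    return all(
        later <= earlier
        for earlier, later in zip(downtime_history, downtime_history[1:])
    )


# Hypothetical downtime, in minutes, for three consecutive 30-day months.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200
monthly_downtime = [90.0, 43.2, 10.0]
for minutes in monthly_downtime:
    print(f"{availability_percent(MINUTES_PER_MONTH, minutes):.3f}%")
print("improving:", is_improving(monthly_downtime))
```

A trend that stops improving is the signal to revisit the plan, the tools, or the procedures around them.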
CloudOps is yet another buzzword that IT must contend with. However, considering that its objective is an operation that never, ever stops, its value to the business is both large and enduring.
Links to existing practices, such as DevOps, are critical as well. The idea is to have continuous operations as the end state of continuous development, testing, deployment, and so on, which means that we’re moving toward a streamlined way of building and deploying software. CloudOps is nothing more than the migration of continuous operations into the world of cloud computing.