How is your cloud doing? Is it delivering performance according to plan? Is it generating value for the organization as quickly as you projected? If you are relatively new to the cloud, you are probably asking yourself these exact questions – and coming up empty for answers.
The problem cloud users run up against is that clouds operate in a shared resource model that provides less consistency, predictability or reliability than dedicated data center environments. Clouds are harder to manage, harder to model and harder to measure. Performance in “like instances” can vary widely, and today’s “performance management” tools focus only on use (consumption metrics); they are not configured to truly measure the performance throughput (outcome metrics) of your cloud service.
To deal with these issues, the industry is starting to embrace a new management model that focuses on integrating continuous testing/continuous optimization into cloud operations. This model focuses less on metrics tied to individual components and more on the quality of service — the consistency, predictability and reliability — that the cloud delivers.
Tools Are Not Keeping Pace
Before we look more closely at the new quality of service (QoS) model, let’s explore why cloud management tools are not keeping pace with the demands imposed by today’s rapid cloud adoption.
First, traditional tools assume you own your IT resources. They enable you to manage individual components like you did in the data center – the network, the storage, the compute resources. But these traditional tools do not take into account how cloud is inherently different from the legacy data center – how it is an integrated service, not just the sum of individual parts. A cloud can have all the power and capacity in the world, but as a shared, multi-tenant service, if for any reason it doesn’t provide the right resources at the right time, the cloud’s delivered performance will vary, impacting the application and the customer experience.
Second, if you’re using traditional, so-called “ground tools” to manage your cloud, you’re going to spend most of your time only managing the “availability” of your cloud component resources. Availability means nothing if the cloud service isn’t capable of delivering the desired performance, functionality and capability. With ground tools, you’re watching component availability when you should be testing service capability.
Third, traditional cloud management tools do not evaluate the maximum potential of your service. Your application is your consumer. What are the consumer’s limits? Can you generate throughput models about how many compute transactions are executable in a particular time period, in a particular configuration, in a particular cloud location? Traditional utilization metrics do none of this, as their measurements are relative to the whole. In the cloud, you need to test a workload profile and measure the maximum total potential throughput, to enable true capacity planning.
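The throughput modeling described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's tooling: the function and its headroom parameter are assumptions, and the figures are made up for the example.

```python
# Hypothetical sketch: turning a measured maximum throughput into a simple
# capacity model for planning. Names and numbers are illustrative only.

def capacity(max_ops_per_sec: float, window_sec: float, headroom: float = 0.8) -> float:
    """Estimate how many transactions an instance can absorb in a time
    window, reserving (1 - headroom) of throughput as a safety margin."""
    return max_ops_per_sec * window_sec * headroom

# An instance benchmarked at 238,000 ops/sec, planned over a one-hour window:
print(int(capacity(238_000, 3600)))  # -> 685440000
```

The point is the direction of the measurement: you start from a tested maximum for a specific workload profile, then derive capacity, rather than inferring capacity from relative utilization percentages.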
Achieving “Mastery of the Cloud”
When cloud management tools are not measuring the right cloud performance metrics, users end up confused about whether their cloud programs are doing what they should. They wonder why cloud is costing more than they expected, and they have no way to go back and see whether there was growth in application and user demand, or simply a decrease in service performance. They end up questioning every move and every decision made, as there is no transparency or accountability with either the cloud or application teams. Did the team do a bad job putting together the original ROI model? Did the workloads change? Are more customers using the company’s app and drawing down more cloud resources? Or is the cloud simply not generating the performance that was promised?
Cloud operations teams are looking for transparent service knowledge – what is often referred to as “mastery of the cloud.” They want to know what their applications are demanding, and what they need to buy, so they do not end up with a storehouse of underperforming – or overperforming – units. They want to ensure they are getting a consistent, predictable, reliable service based on what they are paying.
How do you achieve mastery of the cloud? In short, it involves taking several steps to assign resources for your cloud environment, to set performance measuring criteria and then to manage facets of the environment based on performance over time.
Find the Best Fit, the Best Cost Solution
You need insight into what you are buying and what you are actually getting, to ensure you are provisioning the best units available. The only way to do that is to know what you need, then test every instance upon provisioning to confirm that it meets your minimum requirements. If an instance does not meet that threshold, kill it and provision a new one.
How do you set the requirements? First, profile your workloads. Every workload has a resource consumption profile based on concurrent consumption of compute, network, memory and storage for IaaS, or of database transactions in the case of DBaaS. To define the “workload profile,” collect time series data and look for daily patterns showing where maximum concurrent utilization peaks.
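The profiling step above can be sketched as follows. This is a minimal, hypothetical example: the sample format and function names are assumptions, not a description of any particular profiling tool.

```python
# Hypothetical sketch of workload profiling: given per-minute samples of
# concurrent resource utilization, find the daily peak that defines the
# workload profile. Data shapes and names are assumptions for illustration.
from collections import defaultdict

def daily_peaks(samples):
    """samples: iterable of (day, minute, utilization) tuples.
    Returns the peak concurrent utilization observed for each day."""
    peaks = defaultdict(float)
    for day, _minute, util in samples:
        peaks[day] = max(peaks[day], util)
    return dict(peaks)

samples = [("mon", 540, 0.62), ("mon", 900, 0.91), ("tue", 545, 0.88)]
print(daily_peaks(samples))  # -> {'mon': 0.91, 'tue': 0.88}
```

In practice the peaks across compute, network, memory and storage together define the synthetic workload you will test against.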
Once the workload profiles are captured and quantified, build out a synthetic workload that matches the native load, and place that load on a number of instance types (big, medium, small). This helps you identify the best service available for your workload at the best total cost, defining the “best fit, best cost” solution.
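The selection logic behind “best fit, best cost” can be sketched simply: drop the instance types that miss the throughput requirement, then take the cheapest survivor. The prices, throughputs and type names below are invented for illustration.

```python
# Hypothetical sketch of "best fit, best cost": run the synthetic workload
# on each candidate instance type, discard types that miss the throughput
# requirement, and pick the cheapest remaining option. Figures are made up.

def best_fit_best_cost(candidates, required_ops):
    """candidates: list of (instance_type, measured_ops_per_sec, hourly_cost).
    Returns the cheapest candidate meeting required_ops, or None."""
    eligible = [c for c in candidates if c[1] >= required_ops]
    return min(eligible, key=lambda c: c[2], default=None)

candidates = [
    ("small",  150_000, 0.10),   # too slow for the workload
    ("medium", 240_000, 0.20),   # meets the requirement, cheaper
    ("large",  310_000, 0.40),   # meets the requirement, costlier
]
print(best_fit_best_cost(candidates, 230_000))  # -> ('medium', 240000, 0.2)
```

Note that the inputs are measured throughputs from the synthetic workload test, not the vendor's advertised specifications, which is the whole point of testing before buying.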
Manage the Environment – Providing Service Assurance
Once you are running in the cloud, how do you validate that the service you were getting on day one is the same service, or better, on day two, and on day 200? In a DevOps framework, you set up regular tests that re-validate the cloud service against a baseline to ensure your cloud continues to run at optimal levels. When services are not running at an optimal level, you use the DevOps continuous integration/continuous deployment (CI/CD) pipeline to prune and replace the underperforming cloud instances.
If, for instance, acceptable performance for a given workload profile is 230,000 operations per second, you need to find the right IaaS platform for the job. If a platform does not meet the requirement, throw it away and get another one in a matter of minutes. A 10-minute test, run daily, weekly or monthly, will ensure you are not buying more than you need, confirm you are getting all the performance your application requires and eliminate churn in your DevOps/CloudOps teams.
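The provision-test-replace loop described above can be sketched as below. This is a hedged illustration, not a real integration: `provision_instance()` and `run_benchmark()` are stand-ins for your cloud API and 10-minute test harness, and the benchmark numbers are random placeholders.

```python
# Hypothetical sketch of the provision-test-replace loop: benchmark each
# new instance, keep it only if it clears the baseline, otherwise terminate
# it and provision a replacement. All names here are illustrative stand-ins.
import random

BASELINE_OPS = 230_000  # acceptable ops/sec for this workload profile

def provision_instance():
    # Stand-in for a call to the cloud provider's provisioning API.
    return {"id": random.randint(1000, 9999)}

def run_benchmark(instance):
    # Stand-in for a real 10-minute synthetic-workload test.
    return random.uniform(130_000, 260_000)

def provision_validated(max_attempts=10):
    """Provision until an instance clears the baseline, or give up."""
    for _ in range(max_attempts):
        inst = provision_instance()
        ops = run_benchmark(inst)
        if ops >= BASELINE_OPS:
            return inst, ops          # roll into production
        # Underperformer: terminate here and try again.
    return None, 0.0

inst, ops = provision_validated()
```

Wiring a gate like this into the CI/CD pipeline is what turns the one-off acceptance test into continuous testing and continuous optimization.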
Set Performance Measuring Criteria
As an example of this, we just completed a Proof of Value for a large banking application. The customer has a mature CI/CD pipeline. The application is very sensitive to any performance delays, so the company was auto-scaling, resulting in a 50 percent overrun in budgeted costs. Leaders approached CTP and asked for a Proof of Value assessing their cloud services with the CloudQoS™ performance management solution. The goal was to determine whether there were potential cost savings while continuing to provide the same performance. In other words, they wanted to find out if they could get the same performance on fewer units at less cost.
The Proof of Value test approach:
- CI/CD integration of the CloudQoS solution to run a 10-minute test at provisioning
- Turned up 897 m3.large servers in AWS us-east-1 over a three-day period and executed a 10-minute test
- Automated the comparison of the test results with the service benchmark and the de-provisioning of the underperforming units
The Proof of Value results:
- Rolled “passed” systems into production at an average of 238,000 operations per second
- Eliminated “failed” systems immediately at an average of 136,000 operations per second
- A 43 percent reduction in cost while delivering the same performance
The customer was pleased with the results and started to refer to the integrated service as CI/CD/CT/CO (Continuous Integration/Continuous Deployment/Continuous Test/Continuous Optimization).
It can often be difficult to gauge whether a cloud is doing its job. Sometimes, if it seems to be falling short, the problem may not be the cloud itself. The problem can be that you have not bought exactly what you need or have not adapted the environment to fit evolving needs. Efficiency is not just using everything you’re buying. It is buying the right thing, and getting everything you have paid for.
Cloud is not a “set it and forget it” initiative. It needs to be continuously tested and actively managed. If you commit to these functions, cloud performance will be less of a mystery and more of a reliable metric to build your business on and ensure that you are mastering your cloud.
Clinton France is a former Cloud Technology Partners Cloud Architect and is now the CEO and Founder of Krystallize Technologies, providers of CloudQoS™ solutions.