Datadog reports massive adoption of Docker. Its study tracked 10,000 companies' use of Docker and found that two-thirds of companies that try Docker end up adopting it, and that adopters quintuple their usage within nine months. The report makes it clear: things in the data center need to change.
For example, what will it take to rearchitect and optimize data center infrastructure around containers? What operations tools and processes can help with the transition? How can data centers be better orchestrated to support container-based computing?
Here are 5 steps to survive the transition.
1. Understand changes needed to support containers in production
The reality is that we’ve seen this movie before. Container architectures are not new: J2EE application servers used a container model, and Docker containers follow the same basic patterns as distributed objects. However, Docker has its own interpretation of what a container is. Making things more complex, there are other container platforms, such as CoreOS’s rkt (formerly Rocket), which some enterprises are picking as well.
The basics of Docker are pretty easy to understand:
- You can get Docker images from a registry such as Docker Hub, which is best described as GitHub for Docker images.
- You can create new images or modify existing ones. Docker containers are simply running instances of images; they exist only at runtime, in memory.
- To run Docker containers, you need the Docker engine. Think of it as the runtime for Docker containers.
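The image/container/engine relationship above can be sketched as a toy model. This is illustrative Python only; the class names are invented for the sketch and are not real Docker APIs.

```python
# Toy model of the concepts above: an image is a template, a container
# is a running instance of an image, and the engine is the runtime that
# creates and tracks containers. (Illustrative only, not real Docker APIs.)

class Image:
    def __init__(self, name, tag="latest"):
        self.name, self.tag = name, tag

class Container:
    def __init__(self, image, container_id):
        self.image, self.id, self.running = image, container_id, True

class Engine:
    """Stands in for the Docker engine: the runtime for containers."""
    def __init__(self):
        self._containers = []

    def run(self, image):
        container = Container(image, container_id=len(self._containers) + 1)
        self._containers.append(container)
        return container

engine = Engine()
nginx = Image("nginx", "1.25")      # an image pulled from a registry
c1 = engine.run(nginx)              # a container is an instance of an image
c2 = engine.run(nginx)              # the same image can back many containers
print(c1.id, c2.id)                 # prints: 1 2
```

The point of the sketch is the one-to-many relationship: one image, many runtime containers, all managed by a single engine.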
If this seems simple and lightweight, you’re right. Most enterprises won’t deploy containers that way. A growing pattern in container deployment is the use of container orchestration technology, which can manage multiple containers, or container clusters, at scale.
The basics of these technologies are that they form a shared computing environment made up of servers (nodes), within which resources are clustered together to support the workloads and processes running in the cluster. This requires a cluster management framework, which typically includes a resource manager that keeps track of resources (memory, CPU, and storage).
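What a resource manager does can be sketched in a few lines: pool the nodes' capacity, then place workloads wherever capacity remains. This is a deliberately minimal sketch with invented names, not the API of Mesos, Kubernetes, or any real framework.

```python
# Minimal sketch of a cluster resource manager: it pools the nodes'
# resources (CPU, memory) and tracks what each workload has claimed.
# (Hypothetical structure for illustration, not a real framework's API.)

class Node:
    def __init__(self, name, cpus, mem_gb):
        self.name, self.cpus, self.mem_gb = name, cpus, mem_gb

class ResourceManager:
    def __init__(self, nodes):
        self.free = {n.name: {"cpus": n.cpus, "mem_gb": n.mem_gb} for n in nodes}

    def claim(self, cpus, mem_gb):
        """Place a workload on the first node with enough free capacity."""
        for name, res in self.free.items():
            if res["cpus"] >= cpus and res["mem_gb"] >= mem_gb:
                res["cpus"] -= cpus
                res["mem_gb"] -= mem_gb
                return name
        return None  # cluster is out of capacity

rm = ResourceManager([Node("node1", cpus=4, mem_gb=16),
                      Node("node2", cpus=4, mem_gb=16)])
a = rm.claim(cpus=2, mem_gb=8)   # fits on node1
b = rm.claim(cpus=4, mem_gb=8)   # node1 now too full, lands on node2
print(a, b)                      # prints: node1 node2
```

Real orchestrators layer scheduling policy, health checks, and networking on top of this bookkeeping, but the core abstraction is the same: containers ask for resources, and the manager decides where they land.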
2. Understand container orchestration
The key point about these tools, and about how data centers need to change, is that they abstract the underlying resources away from the containers. You no longer deal with individual containers directly; instead, you manage the orchestration layer itself, along with its impact on the network and the other issues mentioned above.
Google’s Kubernetes, Apache Mesos, and Docker Swarm are three of the most widely used options that can run in your own data center. Others, such as AWS’s and Microsoft’s container services, are public cloud-based, with Microsoft’s offering based on Mesos.
In order to understand the data center requirements for container orchestration tools, it’s helpful to look at the platform requirements. For instance, Mesos’ requirements are as follows:
- Mesos runs on Linux (64-bit) and Mac OS X (64-bit). To build Mesos from source, GCC 4.8.1+ or Clang 3.5+ is required.
- For full support of process isolation under Linux, a recent kernel (3.10 or newer) is required.
That list does not include what’s really required to build a platform with the proper scalability. The sizing of your CPUs, memory, number of servers, and network will therefore be determined more through trial and error than from exact requirements supplied by the provider.
3. Be prepared to conduct your own proofs of concept
Given that this technology is new and that there are few metrics on its impact on existing data center infrastructure, how does the data center manager address the change?
The best way is to do proofs of concept (POCs) to understand the new platform requirements. When doing POCs to determine platform sizing, there are a few core metrics that you want to consider.
- The number of CPUs, including the number of physical and virtual servers that you need.
- How the VMs or physical servers are sized with memory.
- The operating systems that the VMs run.
- Patterns and volume of communication through the network (this is important!).
- Finally, how containers will persist data, whether to container-oriented data stores or to traditional databases.
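Once the POC yields per-container figures for the metrics above, a back-of-the-envelope sizing calculation follows directly. The sketch below uses placeholder numbers that are purely illustrative; substitute your own POC measurements, and treat the 20% headroom factor as an assumption, not a rule.

```python
# Back-of-the-envelope sizing from POC metrics: given the CPU and memory
# one container consumed in the POC, estimate how many containers fit on
# a node and how many nodes a target workload needs.
# (All numbers are placeholder assumptions, not measurements.)

import math

def containers_per_node(node_cpus, node_mem_gb,
                        cpu_per_container, mem_per_container,
                        headroom=0.8):
    """Whichever resource runs out first is the binding constraint."""
    by_cpu = (node_cpus * headroom) / cpu_per_container
    by_mem = (node_mem_gb * headroom) / mem_per_container
    return int(min(by_cpu, by_mem))

def nodes_needed(total_containers, per_node):
    return math.ceil(total_containers / per_node)

per_node = containers_per_node(node_cpus=16, node_mem_gb=64,
                               cpu_per_container=0.5, mem_per_container=4.0)
print(per_node)                      # 12 -- memory-bound in this example
print(nodes_needed(200, per_node))   # 17 nodes for 200 containers
```

The useful output of a POC is exactly these two constants per workload type; the network and storage metrics in the list feed a similar calculation for bandwidth and I/O capacity.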
4. Consider a container orchestration engine
With the data gathered from the POCs, it is possible to create performance models that show the impact on CPU, memory, and networks. From there, build your data center impact models based on what you’ve found, covering both raw containers (Docker on its own) and containers managed by an orchestration engine.
When leveraging an orchestration engine, the likely impact will be on storage I/O and the network. These engines, while all different, tend to be a bit chatty on the network as they communicate between container clusters and manage shared resources.
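A crude way to fold that chattiness into a network impact model is to add a per-container overhead term on top of the application traffic measured in the POC. The overhead figure below is an illustrative assumption, not a measurement from any particular engine; the point is the shape of the model.

```python
# Rough network-impact model: application traffic measured per container
# in the POC, plus an assumed per-container overhead for the orchestration
# engine's own cluster-management traffic. Figures are illustrative.

def projected_network_mbps(containers, app_mbps_per_container,
                           orchestration_mbps_per_container=0.2):
    app = containers * app_mbps_per_container
    overhead = containers * orchestration_mbps_per_container
    return app + overhead

raw = projected_network_mbps(500, 2.0, orchestration_mbps_per_container=0.0)
orchestrated = projected_network_mbps(500, 2.0)
print(raw, orchestrated)  # 1000.0 vs 1100.0
```

Even a small per-container overhead becomes visible at scale, which is why the POC should measure the orchestration engine's traffic separately from the application's.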
The good news is that these orchestration engines make things more predictable in terms of platform impact. The bad news is that you’ll have to constantly keep up with them as they change over time. Moreover, all container-based applications will leverage their features and functions differently.
5. Consider an alternative: Non-converged, converged, and hyper-converged approaches
So, what canned solutions are out there for data center managers who are not into DIY? A case in point is the growing interest in converged and hyper-converged platforms.
In a non-converged infrastructure, physical servers run a virtualization hypervisor, which then operates each of the virtual machines (VMs) created on that server. This currently exists in most data centers. The data storage for those physical and virtual machines is provided by direct attached storage (DAS), network attached storage (NAS), or a storage area network (SAN).
A converged infrastructure is an approach to data center management that seeks to minimize compatibility issues between devices. Here, storage is attached directly to the physical servers, and SSD-based storage is generally used for high performance. This means that as containers or container orchestration layers access the I/O subsystems, which they will do often, they do so at the highest efficiency and performance.
Hyper-convergence is a type of infrastructure system with a software-centric architecture. This is in contrast to the hardware-centric architecture of a converged infrastructure. A hyper-converged architecture tightly integrates compute, storage, networking, and virtualization resources, and other technologies, from scratch, in a commodity hardware box that is supported by a single vendor. Thus, it could be considered a container platform in a box.
The hyper-converged infrastructure has the storage controller function running as a service on each node in the cluster to improve scalability and resilience. This also allows containers and container orchestration to run faster and scale further, considering that they are typically I/O bound.
Containers will bring changes to your data center
No matter what you do in terms of containers, it’s a sound bet that ongoing support for containers will change aspects of your data center in the next few years. In some instances, the degree of change needed in platforms and networking will require major surgery, along with some attention to the other supporting infrastructure, such as power and cooling.
But be careful. If you’re looking to increase the size of your data center, that won’t be a popular pitch in the boardroom these days. Data center managers need to have a zero-sum-game plan in their back pockets. Looking at the in-a-box solution could be the right choice. On the other hand, if you’re taking a DIY approach, you’d better get started now.