Cloud DevOps tools and ecosystems have matured. We are at a tipping point: multi-regional data centers with production-ready applications can now be deployed in a matter of hours. Over the past few months CTP has implemented such transformations in AWS, and we recently adopted the same strategy to deploy on Azure in a more cloud-agnostic way. To move faster, we made a few key technology decisions that let us make large strides with Azure. This article reviews the technical facets of those decisions.
Zero Touch Deployments
The primary goal of the project is to build an end-to-end solution with near-zero-touch deployment. The solution should cover:
- Image lifecycle management (ILM)
- Infrastructure management
- Logging and monitoring
- Security and compliance
- Application deployment
Image Lifecycle Management
To meet the CISO's requirements, every Azure Marketplace image needs to be hardened. The Chef Supermarket holds some useful cookbooks for hardening both Linux and Windows images. A combination of Packer and Chef Solo can be used to harden the Marketplace images and publish them as custom VHDs. Down the line, once Jenkins is provisioned, it will be used to automate ILM.
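As a rough sketch of how such a hardening build might be described, the Python snippet below assembles a Packer JSON template with an `azure-arm` builder and a `chef-solo` provisioner. The base image, region, VM size, resource group, storage account, cookbook path and the `os-hardening` recipe name are placeholders chosen for illustration, not values from this project.

```python
import json


def build_packer_template(image_name, run_list):
    """Return a Packer (JSON-format) template as a plain Python dict."""
    return {
        # Credentials are read from the environment, not hard-coded.
        "variables": {
            "client_id": "{{env `ARM_CLIENT_ID`}}",
            "client_secret": "{{env `ARM_CLIENT_SECRET`}}",
            "subscription_id": "{{env `ARM_SUBSCRIPTION_ID`}}",
            "tenant_id": "{{env `ARM_TENANT_ID`}}",
        },
        "builders": [{
            "type": "azure-arm",
            "client_id": "{{user `client_id`}}",
            "client_secret": "{{user `client_secret`}}",
            "subscription_id": "{{user `subscription_id`}}",
            "tenant_id": "{{user `tenant_id`}}",
            # Marketplace image to start from (placeholder values).
            "os_type": "Linux",
            "image_publisher": "Canonical",
            "image_offer": "UbuntuServer",
            "image_sku": "16.04-LTS",
            "location": "West US",
            "vm_size": "Standard_DS2_v2",
            # Where the hardened VHD is captured (placeholder names).
            "resource_group_name": "example-images-rg",
            "storage_account": "exampleimages",
            "capture_container_name": "images",
            "capture_name_prefix": image_name,
        }],
        "provisioners": [{
            "type": "chef-solo",
            "cookbook_paths": ["cookbooks"],
            "run_list": run_list,
        }],
    }


template = build_packer_template("hardened-ubuntu", ["recipe[os-hardening]"])
print(json.dumps(template, indent=2))
```

Written to a file, a template like this could be passed to `packer build`; a Jenkins job could later run the same command on a schedule.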
Infrastructure Management
The choice between cloud-native tools such as ARM templates and cloud-agnostic tools such as Terraform is crucial, because tool selection dictates how cloud resources are created, updated and deleted. Both Terraform and ARM templates have their pros and cons.
Terraform is an attractive automation tool for teams that run workloads in multiple clouds or are migrating from one cloud to another. Our latest use case was exactly such a scenario: the application had a footprint in AWS (Route 53, S3, etc.) and IaaS workloads running in Azure.
At a high level, a Terraform implementation requires InfoSec to clear the HashiCorp tools against the required security standards (PCI, NIST, etc.), and it requires expertise in those tools. In return, you get code modularity, state management for the Azure resources and a similar codebase for managing both AWS and Azure. ARM templates provide ready-made templates for deploying Azure resources, but they offer no easy way to share the state of resources that are already deployed. This matters in large-scale deployments that span multiple environments and regions. For example, you may need subnet IDs from West US when you deploy NSGs in East US. With the inclusion of workspaces, it is now possible to share state reliably across multiple regions or tiers. And for Azure resources that Terraform does not yet support, the Azure CLI can be integrated within Terraform resource blocks.
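To make the cross-region example concrete, here is a minimal Python sketch that reads the JSON printed by `terraform output -json` for one region and turns it into input variables for another. The output name `subnet_ids`, the variable name `peer_subnet_ids` and the resource IDs are hypothetical; the project's actual wiring between regions is not described in this article.

```python
import json

# Hypothetical sample of what `terraform output -json` might print for a
# West US network stack. The output name and the IDs are made up.
WEST_US_OUTPUT = """
{
  "subnet_ids": {
    "sensitive": false,
    "type": ["list", "string"],
    "value": [
      "/subscriptions/0000/resourceGroups/west-rg/providers/Microsoft.Network/virtualNetworks/west-vnet/subnets/app",
      "/subscriptions/0000/resourceGroups/west-rg/providers/Microsoft.Network/virtualNetworks/west-vnet/subnets/db"
    ]
  }
}
"""


def read_outputs(output_json):
    """Flatten `terraform output -json` text into {output_name: value}."""
    raw = json.loads(output_json)
    return {name: entry["value"] for name, entry in raw.items()}


def east_us_variables(west_outputs):
    """Build input variables for an East US NSG deployment (assumed names)."""
    return {"peer_subnet_ids": west_outputs["subnet_ids"]}


west = read_outputs(WEST_US_OUTPUT)
east_vars = east_us_variables(west)
print(json.dumps(east_vars, indent=2))
```

The resulting dictionary could be saved as a `.tfvars.json` file and handed to the East US run with `-var-file`.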
Terraform with Consul and Vault
Terraform state can be stored locally, in Azure Blob storage or in Consul. Although Consul adds operational overhead for installation and configuration, it provides a key-value store for the state and a mechanism to lock a state file when more than one deployment acts on it at the same time. This capability is invaluable in multiuser scenarios. Consul also lets you sync state across multiple regions. Because it is a key-value store, Consul is also used to hold application and other configuration data, even for workloads in different geographical locations.
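The value of state locking is easiest to see in a toy model. The Python class below is an in-memory stand-in for a key-value state backend with per-key locks. It is not a Consul client and does not reproduce Consul's session mechanics; the class name, key paths and owner names are invented for illustration, and the class is not thread-safe.

```python
class ToyStateStore:
    """In-memory stand-in for a key-value state backend with locking."""

    def __init__(self):
        self._data = {}
        self._locks = {}  # key -> name of the deployment holding the lock

    def acquire(self, key, owner):
        """Take the lock if it is free; return True if `owner` holds it."""
        holder = self._locks.setdefault(key, owner)
        return holder == owner

    def release(self, key, owner):
        """Release the lock, but only if `owner` is the current holder."""
        if self._locks.get(key) == owner:
            del self._locks[key]
            return True
        return False

    def write(self, key, value, owner):
        """Refuse writes from anyone who does not hold the lock."""
        if self._locks.get(key) != owner:
            raise PermissionError(f"{owner} does not hold the lock on {key}")
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


store = ToyStateStore()
key = "states/prod/network"

first = store.acquire(key, "deploy-a")    # lock is free, so deploy-a gets it
second = store.acquire(key, "deploy-b")   # refused while deploy-a holds it
store.write(key, {"serial": 1}, "deploy-a")
store.release(key, "deploy-a")
third = store.acquire(key, "deploy-b")    # free again, so deploy-b gets it
```

Without the lock check in `write`, two simultaneous deployments could each overwrite the other's view of the infrastructure.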
Vault is a secret store that uses Consul as its storage backend for keys, secrets, etc. It is used to pass admin credentials and connection strings securely to Terraform or the Azure CLI. Vault also provides advanced features such as a certificate authority (CA), multi-region coverage, dynamic secrets and easy integration with Terraform. Future iterations will consider Azure Key Vault as a possible replacement for Vault.
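One way to hand secrets to Terraform without writing them to disk is through environment variables. The sketch below assumes a secret stored in a Vault KV version 1 mount and maps its fields to the `ARM_*` variables read by Terraform's Azure provider and to a `TF_VAR_` input variable. The secret path, field names and the `sql_connection_string` variable are hypothetical, and the fetch function is shown but not called here because it needs a live Vault server.

```python
import json
import urllib.request


def fetch_kv_v1_secret(vault_addr, token, path):
    """Read a secret from a KV version 1 mount via Vault's HTTP API."""
    request = urllib.request.Request(
        f"{vault_addr}/v1/{path}",
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]


def terraform_env(secret, mapping):
    """Translate secret fields into environment variables for Terraform."""
    return {env_name: secret[field] for field, env_name in mapping.items()}


# Field names on the left are assumptions about how the secret is laid out.
FIELD_TO_ENV = {
    "client_id": "ARM_CLIENT_ID",
    "client_secret": "ARM_CLIENT_SECRET",
    "subscription_id": "ARM_SUBSCRIPTION_ID",
    "tenant_id": "ARM_TENANT_ID",
    "sql_connection_string": "TF_VAR_sql_connection_string",
}

# Stand-in for fetch_kv_v1_secret(addr, token, "secret/azure/prod").
sample_secret = {
    "client_id": "app-id",
    "client_secret": "app-secret",
    "subscription_id": "sub-id",
    "tenant_id": "tenant-id",
    "sql_connection_string": "Server=example;Database=app;",
}

env = terraform_env(sample_secret, FIELD_TO_ENV)
```

The `env` dictionary could then be merged into `os.environ` for a `subprocess.run(["terraform", "apply"], env=...)` call, so the credentials exist only for the lifetime of that process.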
Logging and Monitoring
To meet security requirements, all activity within the Azure infrastructure needs to be logged. Third-party logging and monitoring tools are quite mature in the AWS space, but on Azure they need custom forwarders, especially for cloud-native logs such as Azure Activity logs and Azure AD logs. To simplify matters, we decided to use Azure Log Analytics, because it supports logging from cloud-native Azure services (Azure Functions, Azure Activity logs, etc.) across multiple subscriptions, consolidated within a single workspace. For infrastructure logs, such as NSG logs, Application Gateway logs and Azure Key Vault logs, Azure CLI scripts are embedded in Terraform to forward the logs to the specified workspaces. For agent-based logging on VMs, containers, etc., Chef cookbooks are used to provision the agents. Azure Log Analytics also provides pre-built dashboards and solutions for reporting on key Azure resources such as containers, Azure AD and NSGs.
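As an illustration of the kind of Azure CLI call that could forward NSG logs to a workspace, the Python helper below builds the argument list for `az monitor diagnostic-settings create`. The setting name and the resource and workspace IDs are placeholders; the exact scripts used in the project are not shown in this article.

```python
import json


def diagnostic_settings_command(name, resource_id, workspace_id, categories):
    """Build the az CLI arguments that send log categories to a workspace."""
    logs = [{"category": category, "enabled": True} for category in categories]
    return [
        "az", "monitor", "diagnostic-settings", "create",
        "--name", name,
        "--resource", resource_id,
        "--workspace", workspace_id,
        "--logs", json.dumps(logs),
    ]


# Placeholder IDs; real ones would come from Terraform outputs.
NSG_ID = (
    "/subscriptions/0000/resourceGroups/east-rg/providers/"
    "Microsoft.Network/networkSecurityGroups/app-nsg"
)
WORKSPACE_ID = (
    "/subscriptions/0000/resourceGroups/logging-rg/providers/"
    "Microsoft.OperationalInsights/workspaces/central-logs"
)

command = diagnostic_settings_command(
    "nsg-to-log-analytics",
    NSG_ID,
    WORKSPACE_ID,
    ["NetworkSecurityGroupEvent", "NetworkSecurityGroupRuleCounter"],
)
print(" ".join(command))
```

Keeping the command as a list makes it safe to pass to `subprocess.run` without shell quoting issues.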
Security and Compliance
Security standards and best practices need to be embedded in every Azure resource and in the deployment process itself. Azure Security Center (ASC) should be enabled to detect infrastructure vulnerabilities. ASC is now natively integrated with Azure Log Analytics and its agents, which it uses to build security awareness and recommendations. As part of the automation, encryption at rest is enabled on all VM disks and storage accounts. Advanced threat detection features in Azure AD are enabled to monitor login patterns. Trend Micro Deep Security is deployed to provide network- and host-based IDS/IPS, and its agents are provisioned within each VM as part of the automation. Together, these features provide the security coverage required to detect suspicious activity within the Azure infrastructure.
Application Deployment
With a wide range of tools in the mix, such as Terraform, Azure Automation, Azure VM extensions and Chef, application configuration functionality can overlap. Azure VM extensions support provisioning Chef agents, Trend Micro agents, custom scripts, etc., but there isn’t an easy way to manage VM extensions across multiple servers. To keep things maintainable, all custom VM images are pre-built with Chef agents, using Packer and Chef as the standard platforms for all application configuration. Terraform is used strictly to stand up the cloud infrastructure and to supply configuration data, such as cookbook attributes and connection strings, from Consul and Vault to the Chef agents on the VMs. The Chef agents then pull the cookbooks and configure the VMs. Jenkins, in conjunction with Chef and Artifactory, will establish CI/CD pipelines for both infrastructure and application deployments.
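The handoff from Terraform to Chef can be pictured as a small JSON document. The sketch below merges non-secret settings (as they might come from Consul) and secrets (as they might come from Vault) into a node attributes file of the kind `chef-client` accepts through its `-j` option. The `myapp` attribute namespace, the recipe name and all values are hypothetical.

```python
import json


def build_node_json(run_list, config, secrets, namespace="myapp"):
    """Combine a run list with config and secret attributes for one node."""
    attributes = dict(config)
    attributes.update(secrets)  # secrets win if a key appears in both
    return {"run_list": list(run_list), namespace: attributes}


# Stand-ins for values that would be read from Consul and Vault.
consul_config = {"listen_port": 8080, "log_level": "info"}
vault_secrets = {"db_connection_string": "Server=example;Database=app;"}

node = build_node_json(["recipe[myapp::default]"], consul_config, vault_secrets)
print(json.dumps(node, indent=2))
```

Cookbooks would then read these values as node attributes, for example `node['myapp']['listen_port']`, which keeps environment-specific data out of the cookbooks themselves.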
With the current maturity of the CLI, OMS and security features in Azure, along with third-party DevOps toolsets, it is quite possible to maintain end-to-end automation and to build reliable, repeatable and maintainable infrastructure in Azure, just as you can with AWS.