High Performance Compute on AWS, Google & Azure

We now have the technology in the cloud to run HPC workloads and the real-world use cases to demonstrate its success.
Joey Jablonski, Former VP, CTO
September 5, 2016 | January 22, 2020 | THE DOPPLER

High Performance Compute, or High Performance Computing (HPC), generally refers to aggregating computing power to deliver much higher performance than a typical desktop computer or workstation can provide, in order to solve large problems in science, engineering or business.

Today we not only have the technology in the cloud needed to run HPC workloads, but also real-world use cases from companies that are doing so successfully.

Here is our breakdown of HPC capabilities across all three major public cloud providers.

Amazon Web Services

Amazon Web Services was the first major cloud provider to offer services tailored to the unique needs of the High Performance Computing (HPC) community.

This gave AWS an early set of adopters who supplied feedback and enabled Amazon to continually expand its capabilities to support a wider range of HPC workloads. Today, through its platform features, case studies and marketplace partners, AWS supports a range of specialized HPC workloads:

  • 3D Rendering & Special Effects
    As more consumer entertainment relies on special effects, HPC has become a common tool for rendering video and for adding digital elements that are not easily filmed and can be made more vivid through graphic arts.
  • EDA
    Electronic Design Automation is a common HPC application used to simulate performance and failures within silicon chips during the design phase. Many application vendors, including Cadence and Synopsys, provide highly scalable tools that execute in parallel across systems for designing and analyzing complex circuits and chips.
  • Genomics
    While some Genomics workloads fit into the Big Data space more than HPC, the Genomics space continues to leverage complex HPC platforms. Many Genomics firms today have a combination of Big Data and HPC technologies integrated into a single analysis pipeline.
  • CFD
    Computational Fluid Dynamics was one of the early HPC workloads and has continued to gain momentum in the industrial field as product design has become more virtual. These workloads are mathematically intensive.
  • Risk Modeling & Back Testing
    Specific to the financial services domain, many firms continually analyze the organization’s exposure against current and future trades. New models are tested against past data sets to see how changes would potentially impact the market and the organization’s positions.
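
To make the back testing idea concrete, here is a minimal sketch in Python. It assumes a simple historical value-at-risk (VaR) model, which is only one of many possible risk models and is not taken from any particular firm's practice. The function names, the window size and the sample returns are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def historical_var(returns: Sequence[float], confidence: float = 0.95) -> float:
    """Estimate the loss (as a positive number) that past returns
    stayed within at the given confidence level."""
    if not returns:
        raise ValueError("need at least one return")
    losses = sorted(-r for r in returns)  # losses in ascending order
    # Simple empirical quantile, no interpolation (an assumption of this sketch).
    idx = min(len(losses) - 1, int(confidence * len(losses)))
    return losses[idx]


def backtest_var(returns: Sequence[float], window: int,
                 confidence: float = 0.95) -> List[int]:
    """Walk forward through history. For each day, estimate VaR from the
    previous `window` days and record the days whose loss exceeded it."""
    breaches = []
    for day in range(window, len(returns)):
        var = historical_var(returns[day - window:day], confidence)
        if -returns[day] > var:
            breaches.append(day)
    return breaches


if __name__ == "__main__":
    # Made-up daily returns, for illustration only.
    history = [0.01, -0.02, 0.005, -0.01, 0.015,
               -0.005, 0.02, -0.03, 0.01, -0.06]
    print(backtest_var(history, window=5))
```

Each candidate model and each historical window can be evaluated independently of the others, which is one reason this kind of workload tends to spread well across many compute nodes.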

As part of AWS’s investment in HPC capabilities, there are a variety of technologies available to accelerate performance, simplify configuration and automate monitoring for HPC workloads on AWS:

  • CfnCluster
    A series of scripts that leverage CloudFormation for the rapid setup and configuration of HPC clusters in AWS. CfnCluster enables administrators to automate the deployment of compute resources for HPC workloads as well as ensure that the proper environment and libraries are in place to support application execution.
  • Placement Groups
    Placement groups are groupings of EC2 instances that provide lower, more consistent network latency than would be available if instance location were decided dynamically. Placement groups enable more efficient communication between nodes. Only certain instance types can be assigned to placement groups.
  • Enhanced Networking
    Single root I/O virtualization (SR-IOV) enables more efficient communication from EC2 instances and allows higher packet-per-second rates in network communication.
  • GPU Instances
    AWS provides specialized instances with attached GPUs, which offer high core counts for parallel processing using CUDA or OpenCL.
  • CloudFormation
    CloudFormation allows administrators and engineers to automate the process of setting up AWS resources, including EC2 instances, networking and VPCs. CloudFormation has programmatic methods and well-documented APIs that can be integrated with existing HPC workload schedulers to quickly create highly customized environments for executing workloads.
  • CloudWatch
    CloudWatch provides a centralized service to monitor and respond to service outages and collect logs. CloudWatch is a key operational tool in a dynamic cloud environment that ensures application developers and engineers have a central repository for event and log information as services start and stop over time.
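
As an illustration of how placement groups and CloudFormation fit together, the sketch below uses Python to generate a small CloudFormation template with one cluster placement group and a few EC2 instances launched into it. This is a hedged example, not the template CfnCluster produces: the function name, the logical resource names, the instance type and the `ami-EXAMPLE` image ID are placeholders, and a real cluster would also need networking, security groups and storage.

```python
import json


def build_cluster_template(node_count: int, instance_type: str,
                           image_id: str) -> dict:
    """Build a CloudFormation template (as a dict) with one cluster
    placement group and `node_count` EC2 instances placed in it."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    resources = {
        "HpcPlacementGroup": {
            "Type": "AWS::EC2::PlacementGroup",
            # "cluster" packs instances close together for low latency.
            "Properties": {"Strategy": "cluster"},
        }
    }
    for i in range(node_count):
        resources[f"ComputeNode{i}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": instance_type,  # must support placement groups
                "ImageId": image_id,            # placeholder, not a real AMI
                "PlacementGroupName": {"Ref": "HpcPlacementGroup"},
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative HPC cluster sketch",
        "Resources": resources,
    }


if __name__ == "__main__":
    template = build_cluster_template(2, "c4.8xlarge", "ami-EXAMPLE")
    print(json.dumps(template, indent=2))
```

Generating the template from code rather than writing it by hand makes it easier for a workload scheduler to size the cluster per job, which is the kind of integration the CloudFormation entry above describes.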

In addition to the native HPC capabilities that AWS delivers out of the box, the AWS Marketplace provides a variety of third-party technologies that are tested and certified to run on AWS.

These technologies can be deployed quickly and have simple elastic pricing models:

  • Intel Lustre
    Lustre is a very common parallel filesystem in the HPC space. Intel provides a pre-packaged AMI with Lustre installed. These AMIs can be used to quickly deploy a fully-supported version of Lustre on AWS.
  • FSMLabs TimeKeeper
    Many HPC workloads, specifically from the Financial Services industry, require highly accurate timestamps to operate and ensure data integrity. FSMLabs offers their TimeKeeper product in the AWS marketplace to assist admins in deploying highly accurate time synchronization.
  • Univa Grid Engine
    Univa Grid Engine, a leading workload scheduler for HPC, is available as a preloaded AMI on AWS. Univa provides AMIs for both head node and compute node variants, with cloud-based usage pricing.

Google Cloud Platform

Google takes a slightly different approach to the HPC market than other cloud vendors. In addition to supporting IaaS capabilities for compute resources, it provides advanced machine learning capabilities that it often also refers to as HPC. This is uncommon in the marketplace, as machine learning applies to a different set of domains than HPC. We will discuss Google's capabilities in both areas for comparison.

While some customers do run HPC workloads on Google for jobs like product design, rendering, special effects and other modeling of the physical world, Google does not offer the same breadth of capabilities as other providers. Google has a set of IaaS capabilities, including high CPU count instances, that can be used for HPC workloads.

Google Genomics supports genomics analysis workloads, a very common application for HPC, with significantly less upfront configuration and setup than traditional tools.

Google’s differentiation lies in the advanced computing capabilities it provides around machine learning. Machine learning is commonly leveraged by applications that analyze human interactions, including social media, image analysis and social influences. Google describes its machine learning capabilities as HPC because of the highly scalable nature of its ML implementations and the specialized hardware it uses to provide high levels of performance. Two technologies are at the core of Google's advanced ML work:

  • TensorFlow
    TensorFlow is an open source set of libraries for numerical computation using data flow graphs. Google developed this capability as part of its machine learning work and open sourced the tool as a generic analysis platform that can be applied to many different domains and problem sets.
  • TPU
    Google worked to design and deploy the Tensor Processing Units inside its data centers to accelerate ML workloads. The TPU is a custom ASIC, specifically optimized for ML workloads.
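
For readers unfamiliar with the term, a data flow graph describes a computation as nodes (operations) connected by the values that flow between them. The sketch below shows the idea in plain Python. It deliberately does not use the TensorFlow API, and the `evaluate` function and `GRAPH` structure are hypothetical names invented for this example.

```python
import operator
from typing import Callable, Dict, Optional, Sequence, Tuple

# Each node maps to (operation, names of the nodes that feed it).
Node = Tuple[Callable[..., float], Sequence[str]]


def evaluate(graph: Dict[str, Node], target: str, feeds: Dict[str, float],
             cache: Optional[Dict[str, float]] = None) -> float:
    """Compute `target` by first evaluating the nodes it depends on.
    Inputs come from `feeds`; results are cached so each node runs once."""
    if cache is None:
        cache = dict(feeds)
    if target in cache:
        return cache[target]
    op, inputs = graph[target]
    args = [evaluate(graph, name, feeds, cache) for name in inputs]
    cache[target] = op(*args)
    return cache[target]


# y = w * x + b, expressed as two operation nodes.
GRAPH: Dict[str, Node] = {
    "product": (operator.mul, ("w", "x")),
    "y": (operator.add, ("product", "b")),
}

if __name__ == "__main__":
    print(evaluate(GRAPH, "y", {"w": 2.0, "x": 3.0, "b": 1.0}))  # 2*3 + 1
```

Because the graph makes dependencies explicit, a runtime can decide which nodes may execute in parallel and on which hardware, which is part of what makes the model a fit for accelerators.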

Google also continues to build its partner ecosystem, with some partners focused on HPC. One partner with a key capability is Cycle Computing, which makes it easy to schedule HPC workloads to run on a variety of cloud providers, including Google.

While AWS and Azure provide rich sets of IaaS functionality, complemented by HPC-specific technologies, Google has taken a path of PaaS capabilities with Google Genomics and Google Machine Learning. This allows organizations to analyze large, complex data sets without having to deploy, configure and manage IaaS services. Google’s approach is unique and is likely to be expanded to additional domains.

Microsoft Azure

Azure refers to HPC-centric workloads as Big Compute. This distinguishes HPC workloads, which are processor and interconnect intensive, from Big Data workloads, which have very different communication patterns. The Big Compute capabilities on Azure cover several specific domains:

  • Engineering Design and Simulation
    Simulations, including finite element analysis, structural analysis and computational fluid dynamics, which commonly support product design and validation.
  • Genomics Research
    Workloads that enable researchers to evaluate larger and larger sets of population and genomic data, to accelerate time to market for new treatments and diagnostic routines.
  • Financial Risk Modeling
    Workloads that let financial organizations quickly assess risk, make decisions efficiently and ensure compliance with industry regulations.
  • Rendering
    Workloads that scale resources rapidly to support the production of special effects for movies, and that let designers model products in new and interactive ways.

Azure has invested to ensure specific technologies are available for supporting HPC workloads. These technologies are focused on rapid deployment, high performance and scalability.

Some key capabilities include:

  • Hybrid HPC Pack
    Many HPC users start with hybrid models, then grow to native cloud deployments for all HPC resources. The Hybrid HPC Pack from Azure allows a head node to be configured on-premises and then to distribute jobs to Azure compute nodes. This model lets large workloads scale very quickly and minimizes cost, because fewer nodes have to be purchased for on-premises use.
  • RDMA and MPI
    Remote Direct Memory Access (RDMA) allows very high speed communications between nodes by using lighter weight protocols. Coupling RDMA with the Message Passing Interface (MPI) included with the Microsoft HPC Pack enables highly efficient communications between nodes at low latency.
  • Azure Resource Manager
    Resource Manager allows the creation of templates that span multiple Azure services to automate deployment, configuration and monitoring. Resource Manager is a key component of any HPC deployment on Azure that ensures consistency in deployed instances, and monitors for any failed resources that require re-provisioning.
  • Future GPU instances
    Graphics Processing Units (GPU) provide very large core count processors for execution of certain workloads in parallel. Many workloads can benefit from GPU acceleration. To assist these workloads, Microsoft has announced the future availability of GPU instances on Azure.
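
To show what a Resource Manager template for a set of compute nodes might look like, the sketch below builds a skeleton template in Python and prints it as JSON. Treat it as an outline under stated assumptions rather than a deployable template: the `hpcnode` name prefix, the `nodeLoop` copy name, the `Standard_A9` size and the `2016-03-30` API version are choices made for this example, and a real virtual machine resource also needs storage, OS and network profiles that are omitted here.

```python
import json


def build_compute_nodes_template(vm_size: str = "Standard_A9",
                                 default_nodes: int = 4) -> dict:
    """Build a skeleton Azure Resource Manager template that repeats one
    virtual machine definition `nodeCount` times using a copy loop."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/"
                   "2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "nodeCount": {"type": "int", "defaultValue": default_nodes},
        },
        "resources": [
            {
                "type": "Microsoft.Compute/virtualMachines",
                "apiVersion": "2016-03-30",  # assumption: verify before use
                "name": "[concat('hpcnode', copyIndex())]",
                "location": "[resourceGroup().location]",
                "copy": {
                    "name": "nodeLoop",
                    "count": "[parameters('nodeCount')]",
                },
                # Skeleton only: storageProfile, osProfile and
                # networkProfile are intentionally left out.
                "properties": {"hardwareProfile": {"vmSize": vm_size}},
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_compute_nodes_template(), indent=2))
```

Defining the node once and repeating it with a copy loop keeps every compute node identical, which supports the consistency goal described for Resource Manager above.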

Microsoft has a large set of industry software vendors that sell commercial applications tested and supported for running on Windows-based hosts. These applications are commonly run in Azure so that customers can rapidly scale capacity as user demand changes. Azure provides a scalable platform for the execution of HPC workloads, with additional capabilities to ensure automated management and high performance.
