Introduction
Amazon EC2 (Elastic Compute Cloud) provides a scalable and flexible cloud computing environment, and High-Performance Computing (HPC) on EC2 allows users to run compute-intensive workloads, such as simulations, modeling, and data analysis, on a large scale. EC2 offers various instance types optimized for different workloads, including HPC.
Amazon EC2 Hpc7g, Hpc7a, and Hpc6id are HPC-optimized instances purpose-built for running HPC workloads at scale on AWS. To run an HPC workload successfully, you can choose any of these instances and combine it with the HPC supporting services that AWS offers.
An Overview of EC2 HPC instance types offered
When you launch an instance from the AWS console, you choose an AMI and an instance type. The supported HPC instances vary by region, i.e., some regions do not offer every HPC-optimized instance type. Choose the HPC-optimized instance that fits your application workload, keeping in mind that HPC-optimized instances cost more than general-purpose instances. It is therefore advisable to list the EC2 HPC instances available in your region and check the hourly cost of each one.
Listing the EC2 HPC instances in an AWS region
Follow these steps to find the EC2 HPC instances supported in a particular region:
- Choose the region where you want to deploy the workload.
- Open the EC2 service by searching for “EC2” in the search box, then start the “Launch an instance” wizard.
- Under “Instance type”, find the “Compare instance types” link (highlighted in a red box in the image below).
- Click “Compare instance types”. The AWS console lists all the available instance types across several pages. At the top is a search box for finding resources by attribute or tag. Type “hpc” and press Enter to list all the HPC instances available in the region. The screenshot below shows such a search in the Ohio region.
From the list, choose an instance type and click “Select instance type” to proceed to create an EC2 instance.
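The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 SDK and configured AWS credentials; the function name `list_hpc_instance_types` is illustrative and not part of any AWS tooling. It queries the EC2 `DescribeInstanceTypeOfferings` API with a wildcard filter on the instance type name. The client is passed in as an argument so the function can be tried without network access.

```python
def list_hpc_instance_types(ec2_client):
    """Return the sorted HPC instance type names offered in the client's region."""
    paginator = ec2_client.get_paginator("describe_instance_type_offerings")
    pages = paginator.paginate(
        LocationType="region",
        # The filter value accepts wildcards, so "hpc*" matches every hpc family.
        Filters=[{"Name": "instance-type", "Values": ["hpc*"]}],
    )
    names = {
        offering["InstanceType"]
        for page in pages
        for offering in page["InstanceTypeOfferings"]
    }
    return sorted(names)


# Example call (needs boto3 and AWS credentials; us-east-2 is the Ohio region):
#   import boto3
#   print(list_hpc_instance_types(boto3.client("ec2", region_name="us-east-2")))
```

The result depends on the region and on when the query runs, so treat the output as a snapshot rather than a fixed list.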
Product details of some of the listed Instance types
hpc6a.48xlarge
Hpc6a instances feature 3rd Gen AMD EPYC 7003 series processors with up to 3.6 GHz all-core turbo frequency, built on a 7 nm process node for increased efficiency. The hpc6a.48xlarge size provides 96 vCPUs and 384 GiB of memory, and supports up to 100 Gbps of network bandwidth.
hpc6id.32xlarge
EC2 Hpc6id instances feature 3rd Generation Intel Xeon Scalable processors that run at frequencies up to 3.5 GHz for increased efficiency. These instances are designed to improve performance for memory-bound workloads by offering 5 GB/s of memory bandwidth per vCPU. The hpc6id.32xlarge size provides 64 vCPUs and 1,024 GiB of memory, and supports up to 200 Gbps of network bandwidth.
hpc7a series
Amazon Elastic Compute Cloud (Amazon EC2) Hpc7a instances, powered by 4th Gen AMD EPYC processors, deliver up to 2.5x better performance compared to Amazon EC2 Hpc6a instances. Hpc7a instances feature 2x higher core density (up to 192 cores), 2.1x higher memory bandwidth throughput, 2x memory (768 GB), and 3x higher network bandwidth compared to Hpc6a instances.
- hpc7a.12xlarge supports up to 24 vCPUs.
- hpc7a.24xlarge supports up to 48 vCPUs.
- hpc7a.48xlarge supports up to 96 vCPUs.
- hpc7a.96xlarge supports up to 192 vCPUs.
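To illustrate how the sizes above might guide a choice, here is a small sketch that picks the smallest hpc7a size covering a required vCPU count. The vCPU figures come from the hpc7a sizes listed above; the helper name `smallest_hpc7a_size` is hypothetical.

```python
# vCPU counts per size, taken from the hpc7a sizes listed above.
HPC7A_VCPUS = {
    "hpc7a.12xlarge": 24,
    "hpc7a.24xlarge": 48,
    "hpc7a.48xlarge": 96,
    "hpc7a.96xlarge": 192,
}


def smallest_hpc7a_size(required_vcpus):
    """Return the smallest hpc7a size with at least `required_vcpus`, or None."""
    candidates = [
        (vcpus, name)
        for name, vcpus in HPC7A_VCPUS.items()
        if vcpus >= required_vcpus
    ]
    # Tuples compare by vCPU count first, so min() picks the smallest fit.
    return min(candidates)[1] if candidates else None


print(smallest_hpc7a_size(64))   # hpc7a.48xlarge
print(smallest_hpc7a_size(256))  # None: no single instance is large enough
```

A request that exceeds 192 vCPUs returns `None`, which signals that the job needs more than one instance.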
Apart from instance types, several features and considerations matter when running HPC workloads on Amazon EC2:
Placement Groups
To achieve low-latency communication between instances, you can use EC2 placement groups. Placement groups come in different types, such as “Cluster”, which packs instances close together for low latency, and “Spread”, which places instances on distinct hardware for increased availability.
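As a rough sketch of how a placement group might be created programmatically, the snippet below maps the two goals mentioned above to an EC2 strategy name and passes it to the `create_placement_group` call of a boto3 EC2 client. The helper names and the goal labels (`"low-latency"`, `"availability"`) are assumptions made for illustration.

```python
def placement_strategy(goal):
    """Map a goal to an EC2 placement strategy name (goal labels are illustrative)."""
    strategies = {"low-latency": "cluster", "availability": "spread"}
    if goal not in strategies:
        raise ValueError(f"unknown goal: {goal!r}")
    return strategies[goal]


def create_hpc_placement_group(ec2_client, name, goal="low-latency"):
    """Create a placement group for the given goal and return the strategy used."""
    strategy = placement_strategy(goal)
    ec2_client.create_placement_group(GroupName=name, Strategy=strategy)
    return strategy


# Example call (needs boto3 and AWS credentials; the group name is a placeholder):
#   import boto3
#   create_hpc_placement_group(boto3.client("ec2"), "my-hpc-group")
```

For tightly coupled jobs the default `"low-latency"` goal, and therefore the cluster strategy, is usually the relevant one.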
Elastic Fabric Adapter (EFA)
EFA is a network interface designed for HPC applications that require low-latency and high-bandwidth communication between instances. It enables MPI (Message Passing Interface) communication for tightly-coupled HPC workloads.
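An EFA is requested at launch time through the instance's network interface settings. The sketch below builds the keyword arguments for a boto3 `run_instances` call with `InterfaceType` set to `"efa"`. All identifiers passed in (AMI, subnet, security group, placement group) are placeholders, and the helper name `efa_launch_params` is hypothetical. It also assumes the AMI already has the EFA software installed and that the security group allows traffic to and from itself.

```python
def efa_launch_params(ami_id, instance_type, subnet_id,
                      security_group_id, placement_group, count=2):
    """Build run_instances keyword arguments for EFA-enabled instances."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        # Keep the nodes close together for low-latency MPI traffic.
        "Placement": {"GroupName": placement_group},
        "NetworkInterfaces": [
            {
                "DeviceIndex": 0,
                "SubnetId": subnet_id,
                "Groups": [security_group_id],
                "InterfaceType": "efa",  # request an Elastic Fabric Adapter
            }
        ],
    }


# Example call (placeholder IDs; needs boto3 and AWS credentials):
#   import boto3
#   params = efa_launch_params("ami-EXAMPLE", "hpc6a.48xlarge",
#                              "subnet-EXAMPLE", "sg-EXAMPLE", "my-hpc-group")
#   boto3.client("ec2").run_instances(**params)
```

Building the parameters separately from the API call makes it easy to inspect the request before launching billable instances.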
AWS ParallelCluster
AWS ParallelCluster is an open-source cluster management tool that simplifies the deployment and management of HPC clusters on EC2. Current releases (ParallelCluster 3) support the Slurm and AWS Batch schedulers. Torque was supported only in older 2.x releases.
Amazon FSx for Lustre
Lustre is a high-performance file system commonly used in HPC environments. Amazon FSx for Lustre is a fully managed service that provides scalable and high-performance file storage for HPC workloads.
Elastic Load Balancing
Elastic Load Balancing (ELB) distributes incoming network traffic across multiple instances. For distributed applications that serve requests, this spreads the load evenly and improves resource utilization.
Custom AMIs
Create custom Amazon Machine Images (AMIs) tailored for your HPC applications to streamline instance launches and reduce setup time.
AWS Batch
AWS Batch is a fully managed service that enables you to run batch computing workloads on EC2 instances. It is suitable for parallel and distributed processing scenarios.
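One common way to express a parallel workload in AWS Batch is an array job, where a single submission fans out into many child jobs. The sketch below builds the request for the boto3 Batch client's `submit_job` call. The helper name and the queue and job definition names are placeholders, and the sketch assumes the queue and job definition already exist.

```python
def array_job_request(job_name, job_queue, job_definition, task_count):
    """Build submit_job keyword arguments for an AWS Batch array job."""
    # AWS Batch accepts array sizes from 2 to 10,000.
    if not 2 <= task_count <= 10000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "arrayProperties": {"size": task_count},
    }


# Example call (placeholder names; needs boto3 and AWS credentials):
#   import boto3
#   request = array_job_request("sweep", "my-queue", "my-job-def", 100)
#   boto3.client("batch").submit_job(**request)
```

Each child job receives its index in the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable, which the application can use to select its share of the input.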
Cost Optimization
Choose the most cost-effective instance type for your specific workload, and use Spot Instances or Reserved Instances to optimize costs.
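The most cost-effective choice is not always the instance with the lowest hourly rate, because a faster instance may finish the job sooner. The sketch below compares two options by total job cost. The option names, hourly rates, and runtimes are made-up placeholders, not AWS prices; substitute the published rates for your region and your own benchmark timings.

```python
def job_cost(nodes, hours, hourly_rate_usd):
    """Total cost of a job: nodes x runtime in hours x hourly rate per node."""
    return nodes * hours * hourly_rate_usd


# Hypothetical figures for illustration only (not real AWS prices).
options = {
    "option-a": {"nodes": 4, "hours": 10.0, "hourly_rate_usd": 3.0},
    "option-b": {"nodes": 4, "hours": 4.0, "hourly_rate_usd": 7.0},
}

costs = {name: job_cost(**spec) for name, spec in options.items()}
cheapest = min(costs, key=costs.get)

print(costs)     # {'option-a': 120.0, 'option-b': 112.0}
print(cheapest)  # option-b
```

In this made-up case the option with the higher hourly rate is cheaper overall because it finishes in less than half the time.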
Conclusion
Running High-Performance Computing (HPC) workloads on Amazon EC2 offers several benefits, making it a compelling choice for organizations and researchers with demanding computational requirements. Here are some key advantages of using Amazon EC2 for HPC:
Scalability: EC2 provides on-demand scalability, allowing you to scale your HPC infrastructure up or down based on your workload requirements. This elasticity ensures that you can handle varying computational needs without over-provisioning resources.
Diverse Instance Types: EC2 offers a variety of instance types optimized for different workloads, including compute-intensive HPC applications. Users can choose instances with specific CPU, GPU, or FPGA configurations based on their performance requirements.
Cost Efficiency: EC2 provides various pricing options, including On-Demand Instances, Reserved Instances, and Spot Instances. Users can optimize costs based on their workload characteristics and availability requirements. Spot Instances, in particular, offer significant cost savings for fault-tolerant and flexible workloads.
Managed Services: AWS offers managed services like AWS ParallelCluster, which simplifies the setup, configuration, and management of HPC clusters on EC2. Additionally, services like Amazon FSx for Lustre provide fully managed high-performance file systems.
Elastic Fabric Adapter (EFA): EFA is a network interface designed for HPC workloads that require low-latency communication between instances. It enables MPI communication, helping to achieve better performance for tightly-coupled parallel applications.
Customization and Flexibility: Users can customize Amazon Machine Images (AMIs) and choose from a wide range of operating systems to meet specific HPC application requirements. EC2 also supports various job schedulers, enabling users to use their preferred scheduler for workload management.
Global Reach: EC2 is available in multiple AWS regions worldwide, allowing users to deploy HPC workloads close to their end-users or data sources. This global reach helps reduce latency and improves overall performance.
Integration with AWS Services: EC2 integrates with other AWS services, such as Amazon S3 for object storage, AWS Identity and Access Management (IAM) for security, and Amazon CloudWatch for monitoring. This integration simplifies the overall management of HPC workloads and extends the capabilities of the infrastructure.
Security and Compliance: AWS provides a robust security framework, including features like Virtual Private Cloud (VPC), security groups, and network ACLs. Users can implement encryption, access controls, and compliance measures to ensure the security of their HPC applications and data.
Community and Support: The AWS community and support resources provide assistance and guidance for users deploying HPC workloads on EC2. Documentation, forums, and AWS support help users troubleshoot issues and optimize their HPC infrastructure.