aws/aws-parallelcluster v3.14.0 on GitHub

We're excited to announce the release of AWS ParallelCluster 3.14.0.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster
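
After upgrading, it can be useful to confirm which version is actually installed. The following is a minimal sketch, assuming the package is installed in the same Python environment that runs the script. The helper names `parse_version` and `is_at_least` are hypothetical and only handle plain numeric versions such as 3.14.0.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain numeric version such as '3.14.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least(installed, required="3.14.0"):
    """Return True when the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = metadata.version("aws-parallelcluster")
    except metadata.PackageNotFoundError:
        print("aws-parallelcluster is not installed in this environment")
    else:
        status = "OK" if is_at_least(installed) else "older than 3.14.0"
        print(f"aws-parallelcluster {installed}: {status}")
```

Comparing integer tuples rather than raw strings avoids treating 3.9.x as newer than 3.14.0.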

ENHANCEMENTS

Include drivers for P6e-GB200 and P6-B200 instances. ParallelCluster sets up the Slurm topology plugin to handle P6e-GB200 UltraServers. See the LIMITATIONS section for important additional setup requirements.
Support the prioritized and capacity-optimized-prioritized allocation strategies. This allows users to prioritize subnets for instance placement to optimize costs and performance.
Add build-image support for Amazon Linux 2023 AMIs based on kernel 6.12 (in addition to 6.1).
Support DCV on Amazon Linux 2023.
Echo chef-client logs in the instance console when a node fails to bootstrap. This helps with investigating bootstrap failures when CloudWatch logs are not available.

LIMITATIONS

P6e-GB200 instances are only tested on Amazon Linux 2023, Ubuntu 22.04 and Ubuntu 24.04.
Using IMEX on P6e-GB200 requires additional setup. Please refer to the dedicated tutorial in our public documentation.
P6-B200 instances are only tested on Amazon Linux 2023, RHEL 8 & 9, Rocky 8 & 9, Ubuntu 22.04 and Ubuntu 24.04.
GPU HealthChecks are not recommended for instances with GPU memory above 320GB (such as p6-b200.48xlarge). Health check duration can exceed 10 minutes, potentially causing job failures and significantly reducing job throughput.
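
The GPU memory guideline above can be expressed as a simple check. This is a sketch under stated assumptions: it reads "GPU memory" as the total across all GPUs on the instance, treats 320GB as 320 GiB, and uses a hypothetical helper name. The total GPU memory of an instance type is reported by the EC2 DescribeInstanceTypes API in `GpuInfo.TotalGpuMemoryInMiB`; the values in the usage lines are illustrative, not looked-up figures.

```python
# Threshold from the limitation above; reading "320GB" as 320 GiB is an assumption.
GPU_MEMORY_LIMIT_MIB = 320 * 1024


def gpu_health_check_advised(total_gpu_memory_mib):
    """Return False when total GPU memory is above the 320 GB threshold."""
    return total_gpu_memory_mib <= GPU_MEMORY_LIMIT_MIB


# Illustrative totals in MiB (hypothetical instance shapes, not real lookups).
small_instance = 4 * 24 * 1024    # 4 GPUs with 24 GiB each
large_instance = 8 * 180 * 1024   # 8 GPUs with 180 GiB each

print(gpu_health_check_advised(small_instance))
print(gpu_health_check_advised(large_instance))
```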

CHANGES

Install nvidia-imex for all OSs except Amazon Linux 2.
Remove UnkillableStepTimeout from slurm.conf and let Slurm set this value.
Upgrade Python runtime used by Lambda functions to Python 3.12 (from 3.9). See Lambda Documentation for important information about Python 3.9 EOL: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
Support encryption of the EFS file system used for the head node internal shared storage via a new configuration parameter, HeadNode/SharedStorageEfsSettings/Encrypted.
Add a validator that warns against using non-GPU instances with DCV.
Upgrade Slurm to version 24.11.6 (from 24.05.8).
Upgrade EFA installer to 1.43.2 (from 1.41.0).
- Efa-driver: efa-2.17.2-1
- Efa-config: efa-config-1.18-1
- Efa-profile: efa-profile-1.7-1
- Libfabric-aws: libfabric-aws-2.1.0-5
- Rdma-core: rdma-core-58.0-1
- Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6-11
Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
Upgrade NVIDIA driver to version 570.172.08 (from 570.86.15) for all OSs except Amazon Linux 2.
Upgrade CUDA Toolkit to version 12.8.1 (from 12.8.0) for all OSs except Amazon Linux 2.
Upgrade DCGM to version 4.4.1 (from 3.3.6) for all OSs except Amazon Linux 2.
Upgrade Python to 3.12.11 (from 3.12.8) for all OSs except Amazon Linux 2.
Upgrade Python to 3.9.23 (from 3.9.20) for Amazon Linux 2.
Upgrade Intel MPI Library to 2021.16.0 (from 2021.13.1).
Upgrade DCV to version 2024.0-19030.
Upgrade the official ParallelCluster Amazon Linux 2023 AMIs to kernel 6.12 (from 6.1).
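
To illustrate where the new HeadNode/SharedStorageEfsSettings/Encrypted parameter listed above sits in a cluster configuration, here is a minimal sketch that treats the configuration as a nested dict. The `set_by_path` helper and the trimmed sample configuration are hypothetical, and the boolean value type is an assumption; check the configuration reference for the accepted values.

```python
def set_by_path(config, path, value):
    """Set a nested key given a slash-separated path, creating dicts as needed."""
    keys = path.split("/")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


# Hypothetical, heavily trimmed cluster configuration held as a dict.
cluster_config = {"HeadNode": {"InstanceType": "t3.medium"}}

# Assumption: the parameter takes a boolean.
set_by_path(cluster_config, "HeadNode/SharedStorageEfsSettings/Encrypted", True)
print(cluster_config)
```

Existing keys under HeadNode are left untouched; only the missing intermediate section is created.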

BUG FIXES

Prevent build-image stack deletion failures by deploying a global role that automatically deletes the build-image stack after images either succeed or fail the build.
The role is meant to exist even after the stack has been deleted. See #5914.
Fix an issue where Security Group validation failed when a rule contained both IPv4 ranges (IpRanges) and security group references (UserIdGroupPairs).
Fix build-image failure on Rocky 9, occurring when the parent image does not ship the latest kernel version on the latest Rocky minor version.
Fix a cluster ID mismatch issue that caused cluster update failures when Slurm accounting is used.
Fix a race condition in CloudWatch Agent startup that could cause node bootstrap failures.

DEPRECATIONS

The configuration parameter LoginNodes/Pools/Ssh/KeyName has been deprecated, and it will be removed in future releases. The CLI now returns a warning message when it is used in the cluster configuration.
See #6811.
Ubuntu 20.04 is no longer supported.
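
Before upgrading, it may help to check whether an existing configuration still uses the deprecated LoginNodes/Pools/Ssh/KeyName parameter. The sketch below assumes the configuration has already been loaded into a dict and that Pools is a list of mappings, each with an optional Name; the function name and the sample configuration are hypothetical.

```python
def pools_using_deprecated_keyname(config):
    """Return names of login node pools that still set Ssh/KeyName."""
    pools = config.get("LoginNodes", {}).get("Pools", [])
    return [
        pool.get("Name", f"<pool #{index}>")
        for index, pool in enumerate(pools)
        if "KeyName" in pool.get("Ssh", {})
    ]


# Hypothetical, trimmed configuration: one pool sets the deprecated key.
sample_config = {
    "LoginNodes": {
        "Pools": [
            {"Name": "login", "Ssh": {"KeyName": "my-key"}},
            {"Name": "other"},
        ]
    }
}

print(pools_using_deprecated_keyname(sample_config))
```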
