AWS ParallelCluster v2.10.0
enrico-usai released this on 18 Nov 16:21 · 137 commits to release-2.10 since this release
We're excited to announce the release of AWS ParallelCluster 2.10.0.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for CentOS 8 in all Commercial regions.
- Add support for P4d instance type as compute node.
- Add the possibility to enable NVIDIA GPUDirect RDMA support on EFA by using the new `enable_efa_gdr` configuration parameter.
- Enable support for NICE DCV in GovCloud regions.
- Enable support for AWS Batch scheduler in GovCloud regions.
- FSx Lustre:
  - Add possibility to configure Auto Import policy through the new `auto_import_policy` parameter.
  - Add support for the HDD storage type and the new `storage_type` and `drive_cache_type` configuration parameters.
- Create a CloudWatch Dashboard for the cluster, named `<clustername>-<region>`, including head node EC2 metrics and cluster logs. It can be disabled by configuring the `enable` parameter in the `dashboard` section.
- Add `-r/--region` arg to `pcluster configure` command. If this arg is provided, configuration will skip region selection.
- Add `-r/--region` arg to `ssh` and `dcv connect` commands.
- Add `cluster_resource_bucket` parameter under `cluster` section to allow the user to specify an existing S3 bucket.
- `createami`:
  - Add validation step to fail when using a base AMI created by a different version of ParallelCluster.
  - Add validation step for AMI creation process to fail if the selected OS and the base AMI OS are not consistent.
  - Add `--post-install` parameter to use a post installation script when building an AMI.
  - Add the possibility to use a ParallelCluster base AMI.
- Add possibility to change tags when performing a `pcluster update`.
- Add new `all_or_nothing_batch` configuration parameter for `slurm_resume` script. When `True`, `slurm_resume` will succeed only if all the instances required by all the pending jobs in Slurm are available.
- Enable queue resizing on update without requiring the compute fleet to be stopped. Stopping the compute fleet is only necessary when existing instances risk being terminated.
- Add validator for EBS volume size, type and IOPS.
- Add validators for `shared_dir` parameter when used in both `cluster` and `ebs` sections.
- Add validator for `cfn_scheduler_slots` key in the `extra_json` parameter.
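As an illustration of how the new configuration parameters listed above might fit together, here is a minimal Python sketch that renders a ParallelCluster configuration fragment with `configparser`. The section labels (`default`, `scratch`, `nodash`), the linking keys `fsx_settings` and `dashboard_settings`, the bucket name, and every parameter value are assumptions chosen for the example and are not stated in these release notes. A real `fsx` section needs additional settings, so check the ParallelCluster configuration documentation before using any of this.

```python
import configparser
import io


def build_example_config():
    """Build an illustrative config fragment using parameters named in 2.10.0."""
    config = configparser.ConfigParser()

    # [cluster] section: EFA with GPUDirect RDMA and an existing S3 bucket.
    # The values below are assumed for illustration only.
    config["cluster default"] = {
        "enable_efa": "compute",
        "enable_efa_gdr": "compute",
        "cluster_resource_bucket": "my-existing-bucket",  # hypothetical bucket
        "fsx_settings": "scratch",        # assumed link to the fsx section
        "dashboard_settings": "nodash",   # assumed link to the dashboard section
    }

    # [fsx] section: HDD storage and Auto Import policy (values assumed).
    config["fsx scratch"] = {
        "storage_type": "HDD",
        "drive_cache_type": "READ",
        "auto_import_policy": "NEW_CHANGED",
    }

    # [dashboard] section: turn off the CloudWatch Dashboard.
    config["dashboard nodash"] = {"enable": "false"}
    return config


def render(config):
    """Serialize the config to INI text."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render(build_example_config()))
```

Running the script prints the fragment in INI form, which can be compared by eye against an existing configuration file.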
CHANGES
- CentOS 6 is no longer supported.
- Upgrade EFA installer to version 1.10.1
  - EFA configuration: `efa-config-1.5` (from efa-config-1.4)
  - EFA profile: `efa-profile-1.1` (from efa-profile-1.0.0)
  - EFA kernel module: `efa-1.10.2` (from efa-1.6.0)
  - RDMA core: `rdma-core-31.amzn0` (from rdma-core-28.amzn0)
  - Libfabric: `libfabric-1.11.1amzn1.1` (from libfabric-1.10.1amzn1.1)
  - Open MPI: `openmpi40-aws-4.0.5` (from openmpi40-aws-4.0.3)
  - Unifies installer runtime options across x86 and aarch64
  - Introduces `-g/--enable-gdr` switch to install packages with GPUDirect RDMA support
  - Updates to OMPI collectives decision file packaging, migrated from efa-config to efa-profile
  - Introduces CentOS 8 support
- Upgrade NVIDIA driver to version 450.80.02.
- Install NVIDIA Fabric manager to enable NVIDIA NVSwitch on supported platforms.
- Remove default region `us-east-1`. After the change, `pcluster` will adhere to the following lookup order for region:
  1. `-r/--region` arg.
  2. `AWS_DEFAULT_REGION` environment variable.
  3. `aws_region_name` in ParallelCluster configuration file.
  4. `region` in AWS CLI configuration file.
- Slurm: change `SlurmctldPort` to 6820-6829 to not overlap with default `slurmdbd` port (6819).
- Slurm: add `compute_resource` name and `efa` as node features.
- Remove validation on `ec2_iam_role` parameter.
- Improve retrieval of instance type info by using `DescribeInstanceTypes` API.
- Remove `custom_awsbatch_template_url` configuration parameter.
- Upgrade `pip` to latest version in virtual environments.
- Upgrade image used by CodeBuild environment when building container images for Batch clusters, from `aws/codebuild/amazonlinux2-x86_64-standard:1.0` to `aws/codebuild/amazonlinux2-x86_64-standard:3.0`.
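The region lookup order described in the changes above can be sketched in Python as follows. This is an illustrative sketch only, not the actual `pcluster` implementation: the function names `resolve_region` and `_get` are hypothetical, and the assumption that the region lives in the `[aws]` section of the ParallelCluster configuration file and in the `[default]` profile of the AWS CLI configuration file is mine, not something these notes state.

```python
import configparser
import os


def _get(parser, section, option):
    """Return an option value, or None if the parser, section or option is missing."""
    if parser is None:
        return None
    return parser.get(section, option, fallback=None)


def resolve_region(cli_region=None, pcluster_config=None, aws_config=None, env=None):
    """Return the first region found, following the documented lookup order."""
    env = os.environ if env is None else env
    candidates = [
        cli_region,                                       # 1. -r/--region arg
        env.get("AWS_DEFAULT_REGION"),                    # 2. environment variable
        _get(pcluster_config, "aws", "aws_region_name"),  # 3. ParallelCluster config (assumed [aws] section)
        _get(aws_config, "default", "region"),            # 4. AWS CLI config (assumed [default] profile)
    ]
    for region in candidates:
        if region:
            return region
    # There is no implicit us-east-1 fallback in this sketch.
    raise ValueError("No region configured")
```

For example, calling `resolve_region("eu-west-1", env={"AWS_DEFAULT_REGION": "us-west-2"})` returns `"eu-west-1"`, because the command-line argument takes precedence over the environment variable.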
BUG FIXES
- Retrieve the right number of compute instance slots when instance type is updated.
- Include user tags in compute nodes and EBS volumes.
- Fix `pcluster status` output when head node is stopped.
- `pcluster update`:
  - Fix issue when tags are specified but not changed.
  - Fix issue when the `cluster` section label changed.
  - Fix issue when `shared_dir` and `ebs_settings` are both configured in the `cluster` section.
  - Fix `cluster` and `cfncluster` compatibility in `extra_json` parameter.
- Fix `pcluster configure` to avoid using default/initial values for internal parameter initialization.
- Fix pre/post install script arguments management when using double quotes.
- Fix a bug that was causing `clustermgtd` and `computemgtd` sleep interval to be incorrectly computed when system timezone is not set to UTC.
- Fix queue name validator to properly check for capital letters.
- Fix `enable_efa` parameter validation for `queue` section.
- Fix CloudWatch Log Group creation for AWS Lambda functions handling CloudFormation Custom Resources.