aws-samples/aws-pcluster-post-samples

ParallelCluster post-install samples

This repository gathers some ParallelCluster post-install samples for common HPC-related operations.
The primary post-install script sets the environment and launches the secondary scripts according to their naming convention.
All these scripts are meant to be stored in an S3 bucket. See the Requirements section below for details.

At the moment we are including:

  1. 01.install.enginframe.master.sh
    Secondary script installing NICE EnginFrame HPC portal
  2. 02.install.dcv.broker.master.sh
    Secondary script installing DCV Session Manager Broker

Software and Services used

AWS ParallelCluster is an open source cluster management tool that simplifies deploying and managing HPC clusters with Amazon FSx for Lustre, EFA, a variety of job schedulers, and the MPI library of your choice. AWS ParallelCluster simplifies cluster orchestration on AWS so that HPC environments become easy to use even if you’re new to the cloud.

NICE EnginFrame is the leading grid-enabled application portal for user-friendly submission, control and monitoring of HPC jobs and interactive remote sessions. It includes sophisticated data management for all stages of HPC job lifetime and is integrated with most popular job schedulers and middleware tools to submit, monitor, and manage jobs.

NICE DCV is a remote visualization technology that enables users to securely connect to graphic-intensive 3D applications hosted on a remote, high-performance server. With NICE DCV, you can make a server's high-performance graphics processing capabilities available to multiple remote users by creating secure client sessions.

NICE DCV Session Manager is a set of two software packages (an Agent and a Broker) and an application programming interface (API) that makes it easy for developers and independent software vendors (ISVs) to build front-end applications that programmatically create and manage the lifecycle of NICE DCV sessions across a fleet of NICE DCV servers.

Overview

I’ll add the following 2 options to my ParallelCluster configuration file:
post_install = s3://<bucket>/<bucket key>/scripts/post.install.sh
post_install_args = '<bucket> <bucket key> <efadmin password (optional)>'
The first one, post_install, specifies a Bash script stored on Amazon S3 as ParallelCluster post-install option. This is my main script that will run secondary scripts for EnginFrame and DCV Session Manager broker respectively.

The second parameter, post_install_args, passes a set of arguments to the above script:
  • the S3 bucket name,
  • the S3 bucket key identifying the location of the secondary scripts, and
  • the password for the EnginFrame administrator user (efadmin). It is optional: if omitted, it has to be set manually after the installation.
The secondary scripts receive those arguments, detect all the other information they need, and install the 2 components on the ParallelCluster master host.

EnginFrame and DCV Session Manager Broker secondary scripts are separated, so you can potentially install just one of them.

Note: This procedure has been tested with EnginFrame version 2020.0 and DCV Session Manager Broker version 2020.2. With small modifications it can also work with earlier versions; in that case, remember to add license management.

Walkthrough

Requirements

To perform a successful installation of EnginFrame and DCV Session Manager Broker, you’ll need:
  • An S3 bucket, made accessible to ParallelCluster via its s3_read_resource or s3_read_write_resource [cluster] settings. Refer to ParallelCluster configuration for details.
  • An EnginFrame efinstall.config file, containing the desired settings for the EnginFrame installation. This enables the post-install script to install EnginFrame in unattended mode. An example efinstall.config is provided in this repository: you can review and modify it according to your preferences.
    Alternatively, you can generate your own one by performing an EnginFrame installation: in this case an efinstall.config containing all your choices will be generated in the folder where you ran the installation.
  • A security group allowing EnginFrame inbound port. By default ParallelCluster creates a new Master security group with just port 22 publicly opened, so you can either use a replacement (via ParallelCluster vpc_security_group_id setting) or add an additional security group (additional_sg setting). In this post I’ll specify an additional security group.
  • ParallelCluster configuration including post_install and post_install_args as mentioned above and described later with more details
  • (Optionally) EnginFrame and DCV Session Manager packages, available online from https://download.enginframe.com. Having them in the bucket removes the need for outgoing internet access from your ParallelCluster master to download them. In this article I copy them into my target S3 bucket, and my scripts copy them from S3 to the master node.
Note: neither EnginFrame 2020 nor DCV Session Manager Broker needs a license when running on EC2 instances. For more details please refer to their documentation.

Step 1. Review and customize post-install scripts

GitHub code repository for this article contains 3 main scripts:
  • post.install.sh 
    Primary post-install script, preparing the environment and launching secondary scripts in alphanumerical order
  • 01.install.enginframe.master.sh
    Secondary script installing EnginFrame
    Most installation parameters are defined in the efinstall.config file
  • 02.install.dcv.broker.master.sh
    Secondary script installing DCV Session Manager Broker
Secondary scripts follow this naming convention: they start with a number that will set their execution order, then they describe their purpose, and finally define the node type in which they should be executed (master or compute) as a last argument, just before the extension, e.g.:
01.install.enginframe.master.sh
|  |                  |      |    
|  |                  |      file extension
|  purpose            |
|                     to be run on master or compute nodes
execution order
While main post-install script post.install.sh just sets environment variables and launches secondary scripts, you might want to check the secondary ones: 01.install.enginframe.master.sh installing EnginFrame and 02.install.dcv.broker.master.sh installing DCV Session Manager Broker.
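
To make the naming convention concrete, here is a minimal sketch that splits a script name into its three parts using Bash parameter expansion. The function name parse_script_name is hypothetical and is not part of the repository; it only illustrates how the fields line up.

```shell
#!/bin/bash
# Split a secondary script name of the form
#   <order>.<purpose>.<node type>.<extension>
# into its parts. parse_script_name is a hypothetical helper for illustration.
parse_script_name() {
    local name base order rest node_type purpose
    name="$(basename "$1")"
    base="${name%.*}"          # drop the extension:  01.install.enginframe.master
    order="${base%%.*}"        # leading number:      01
    node_type="${base##*.}"    # last field:          master
    rest="${base#*.}"          # everything after the order
    purpose="${rest%.*}"       # drop the node type:  install.enginframe
    echo "order=${order} purpose=${purpose} node_type=${node_type}"
}

parse_script_name 01.install.enginframe.master.sh
# prints: order=01 purpose=install.enginframe node_type=master
```

Since the purpose may itself contain dots, the sketch takes the first field as the order and the last field before the extension as the node type, and treats everything in between as the purpose.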

Crucial parameters are set in the ParallelCluster configuration file, and some EnginFrame settings are defined in the efinstall.config file. Review all these files so that they reflect the setup you want.

You can also add further custom scripts, in the same folder, following the naming convention stated above. An example could be installing an HPC application locally on a compute node, or in the master shared folder.

Each script sources /etc/parallelcluster/cfnconfig to get the required information about current cluster settings, AWS resources involved and node type. Specifically, cfnconfig defines 
  • cfn_node_type=MasterServer if current node is the master node 
  • cfn_node_type=ComputeFleet if current node is a compute node
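
As a rough sketch of how a custom secondary script could use these values, the helper below sources cfnconfig and maps cfn_node_type to the short names used in the file naming convention. The function name node_role and its optional path argument are assumptions made for illustration; the argument only exists so the function can be tried against a copy of the file outside a cluster.

```shell
#!/bin/bash
# node_role is a hypothetical helper, not part of the repository.
# It prints "master" or "compute" depending on cfn_node_type.
node_role() {
    local cfnconfig="${1:-/etc/parallelcluster/cfnconfig}"
    # Source in a subshell so cluster variables do not leak into the caller.
    (
        . "${cfnconfig}" || exit 1
        case "${cfn_node_type}" in
            MasterServer) echo "master" ;;
            ComputeFleet) echo "compute" ;;
            *)            echo "unknown"; exit 1 ;;
        esac
    )
}

# Possible use at the top of a secondary script meant for the master only:
# [ "$(node_role)" = "master" ] || exit 0
```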
Note: More details on each script are provided in the Post-install scripts details section following the Walkthrough.

Step 2. Prepare your S3 bucket 

I create an S3 bucket, e.g. mys3bucket, with the following structure and contents under a prefix of my choice (package names and version numbers may vary):
packages
├── NICE-GPG-KEY.conf
├── efinstall.config
├── enginframe-2020.0-r58.jar
└── nice-dcv-session-manager-broker-2020.2.78-1.el7.noarch.rpm
scripts
├── 01.install.enginframe.master.sh
├── 02.install.dcv.broker.master.sh
└── post.install.sh
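
One possible way to upload this layout is with the AWS CLI, as sketched below. It assumes local packages and scripts folders that mirror the tree above; the bucket name and the prefix myprefix are placeholders, and the run and upload_tree helpers are hypothetical. Setting DRY_RUN prints the commands instead of executing them.

```shell
#!/bin/bash
# Placeholders: replace with your own bucket and prefix.
BUCKET="mys3bucket"
KEY="myprefix"            # hypothetical prefix ("bucket key")

# Print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

upload_tree() {
    # Assumes local ./packages and ./scripts folders mirroring the layout above.
    run aws s3 sync packages "s3://${BUCKET}/${KEY}/packages"
    run aws s3 sync scripts  "s3://${BUCKET}/${KEY}/scripts"
}

DRY_RUN=1 upload_tree
```

Calling upload_tree without DRY_RUN performs the actual upload, provided the AWS CLI is installed and configured with write access to the bucket.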

Step 3. Modify or create your ParallelCluster configuration file

As mentioned, the only settings required by my scripts are the following in the [cluster] section:  post_install, post_install_args and s3_read_resource:
post_install = s3://<bucket>/<bucket key>/scripts/post.install.sh
post_install_args = '<bucket> <bucket key> <efadmin password (optional)>'
s3_read_resource = arn:aws:s3:::<bucket>/<bucket key>/*
The post.install.sh main script is set as the post_install option value, with its S3 full path, and provided arguments:
a) bucket name 
b) bucket folder/key location
c) efadmin user (primary EnginFrame administrator) password
all separated by space. All post install arguments must be enclosed in a single pair of single quotes, as in the example code.
Finally, the s3_read_resource option grants the master access to the same S3 location to download secondary scripts: first one installing EnginFrame (01.install.enginframe.master.sh) and second one installing DCV Session Manager broker (02.install.dcv.broker.master.sh).

Note: you may wish to associate a custom role to the ParallelCluster master instead of using the s3_read_resource option.

Note: ParallelCluster documentation suggests using double quotes for post_install_args. This did not work with the latest version of ParallelCluster available when this article was written, so I’m using single quotes. A fix is in progress, so this will probably change in the near future.

A sample configuration file is provided under the parallelcluster folder of the GitHub repository.

Step 4. Create ParallelCluster

You can now start ParallelCluster creation with your preferred invocation command, e.g.:
pcluster create --norollback --config parallelcluster/config.sample PC291
Hint: when testing it’s probably better to disable rollback like in the above command line: this will allow you to connect via ssh to the Master instance to diagnose problems if something with the post-install scripts went wrong.

Cleaning up

To avoid incurring future charges, delete idle ParallelCluster instances via its delete command:
pcluster delete --config parallelcluster/config.sample PC291

Post-install scripts details

In this section I’ll list some more details on the scripts logic. This could be a starting point in customizing, evolving or adding more secondary scripts to the solution. For example, you might want to add a further script to automatically install an HPC application into ParallelCluster master node.

Main post.install.sh

Post-install script post.install.sh goes through the following steps:
  • Gets post-install arguments, and exports them as environment variables, in particular:
    export S3Bucket="$2"
    export S3Key="$3"
    export efadminPassword="$4"

  • Downloads the entire scripts subfolder from the S3 bucket into master node /tmp/scripts folder
  • Runs every script in /tmp/scripts in alphanumerical order
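
The steps above can be sketched roughly as follows. This is not the repository's post.install.sh: run_secondary_scripts is a hypothetical helper, and selecting scripts by the node type in their file name is an assumption about where that check happens (it could equally live inside each secondary script).

```shell
#!/bin/bash
# Run every script in a folder whose name ends in .<node type>.sh, in
# alphanumerical order. run_secondary_scripts is a hypothetical helper.
run_secondary_scripts() {
    local dir="$1" node_type="$2" script
    # Pathname expansion returns matches sorted, so 01.* runs before 02.*.
    for script in "${dir}"/*."${node_type}".sh; do
        [ -f "${script}" ] || continue    # the pattern matched nothing
        echo "Running ${script##*/}"
        bash "${script}" || return 1      # stop at the first failure
    done
}

# Hypothetical wiring on a cluster node (needs the AWS CLI):
# export S3Bucket="$2" S3Key="$3" efadminPassword="$4"
# aws s3 cp "s3://${S3Bucket}/${S3Key}/scripts" /tmp/scripts --recursive
# run_secondary_scripts /tmp/scripts master
```

Because the variables are exported before the loop, each secondary script started with bash inherits S3Bucket, S3Key and efadminPassword from the environment.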

EnginFrame

Provided script 01.install.enginframe.master.sh performs the following steps:
  • Installs openjdk (required for EnginFrame)
  • Downloads the packages subfolder of the bucket into /tmp/packages. This fetches the EnginFrame installer and, in advance, the packages needed by the other secondary scripts
  • Checks if EnginFrame installer and efinstall.config are available under /tmp/packages
  • Inline modifies its efinstall.config copy to install EnginFrame under ParallelCluster shared folder cfn_shared_dir
  • Adds efadmin and efnobody local users, again required by EnginFrame. Sets efadmin password if present. If not present you should set it later, for example by connecting via ssh to the master node
  • Installs EnginFrame in unattended mode into the ParallelCluster shared folder
  • Enables and starts EnginFrame service
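
The inline modification of efinstall.config could look something like the sketch below, which rewrites the value of a single key with GNU sed. The helper set_config_value is hypothetical, and the key nice.root.dir.ui in the usage comment is an assumption about the efinstall.config format; check your own file for the actual key that holds the installation root.

```shell
#!/bin/bash
# Replace the value of a "key = value" line in a properties-style file, in place.
# set_config_value is a hypothetical helper. It assumes the key contains no
# regex metacharacters other than dots, and the value contains no '|' or '&'.
set_config_value() {
    local key="$1" value="$2" file="$3"
    local pattern="${key//./\\.}"     # escape dots so they match literally
    sed -i -E "s|^(${pattern}[[:space:]]*=[[:space:]]*).*|\1${value}|" "${file}"
}

# Possible use on the master node, after sourcing cfnconfig
# (the key name is an assumption):
# set_config_value nice.root.dir.ui "${cfn_shared_dir}/nice" /tmp/packages/efinstall.config
```

Using '|' as the sed delimiter keeps paths with slashes readable without extra escaping.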

DCV Session Manager Broker

Provided script 02.install.dcv.broker.master.sh performs the following steps:
  • Downloads the packages subfolder of the bucket into /tmp/packages
  • Checks if NICE-GPG-KEY and DCV Session Manager Broker package are available under /tmp/packages
  • Imports NICE-GPG-KEY and installs DCV Session Manager Broker rpm
  • Modifies broker configuration to switch port to 8446 since 8443 is used by EnginFrame 
  • Enables and starts DCV Session Manager Broker service
  • Copies DCV Session Manager Broker certificate under efadmin’s home
Optionally, if EnginFrame is installed, it:
  • Registers EnginFrame as API client
  • Saves API client credentials into EnginFrame configuration
  • Adds DCV Session Manager Broker certificate into Java keystore
  • Restarts EnginFrame
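
The availability check on /tmp/packages can be sketched as below. The helper require_packages and the glob patterns in the usage comment are illustrative assumptions, chosen so that the check keeps working when package version numbers change.

```shell
#!/bin/bash
# Succeed only if every glob pattern matches at least one regular file in dir.
# require_packages is a hypothetical helper, not part of the repository.
require_packages() {
    local dir="$1" pattern match found
    shift
    for pattern in "$@"; do
        found=0
        # The pattern is left unquoted on purpose so the shell expands it.
        for match in "${dir}"/${pattern}; do
            [ -f "${match}" ] && found=1 && break
        done
        if [ "${found}" -eq 0 ]; then
            echo "Missing ${pattern} in ${dir}" >&2
            return 1
        fi
    done
}

# Possible use before importing the key and installing the rpm:
# require_packages /tmp/packages 'NICE-GPG-KEY*' 'nice-dcv-session-manager-broker-*.rpm' || exit 1
```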

Troubleshooting

Detailed output log is available on the master node, in:
  • /var/log/cfn-init.log
  • /var/log/cfn-init-cmd.log
You can reach it via ssh, after getting the master node IP address from AWS Console → EC2 → Instances and looking for an instance named Master.
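
Once connected, a quick first pass over these logs could be a keyword search like the one sketched here. The helper scan_log and the keyword list are assumptions; adjust the pattern to whatever your scripts print on failure.

```shell
#!/bin/bash
# Print log lines that mention errors or failures, prefixed by line number.
# scan_log is a hypothetical helper; the keyword list is an assumption.
scan_log() {
    grep -inE 'error|fail' "$1"
}

# Possible use on the master node (reading the logs may require root):
# scan_log /var/log/cfn-init.log
# scan_log /var/log/cfn-init-cmd.log
```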

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.