Let’s now look at what really happens in AWS when you submit a job and reveal a bit of the magic behind your auto-scaling computational resources.
At the heart of AWS ParallelCluster is an Auto Scaling group: a logical group of instances that can scale up and down based on a set of criteria. In the case of AWS ParallelCluster, three processes control the scaling of the cluster.
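As a sketch of what the console shows, you can also inspect the Auto Scaling group from the AWS CLI. This assumes the AWS CLI is configured with credentials for the account running your cluster; the exact group name varies, so the query below lists all groups rather than filtering by name.

```shell
# List every Auto Scaling group in the region with its sizing fields.
# Before any job is submitted, the cluster's compute group typically
# shows Desired = 0; it grows as jobs arrive.
aws autoscaling describe-auto-scaling-groups \
  --query 'AutoScalingGroups[].{Name:AutoScalingGroupName,Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}' \
  --output table
```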
More details can be found in the AWS ParallelCluster documentation.
Let’s take a closer look at how this works, starting with the Auto Scaling group.
Go back to your AWS Cloud9 environment and launch a new job with the commands below. The job simply sleeps for 5 minutes.
```bash
cat > sleep_script.sbatch << EOF
#!/bin/bash
#SBATCH --job-name=hello-world-job
#SBATCH --ntasks=2
#SBATCH --output=%x_%j.out

sleep 300
EOF

sbatch sleep_script.sbatch
```
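While the Auto Scaling group launches a compute node, you can watch the scale-up from the scheduler's side. This is a sketch using standard Slurm commands on the head node; the job sits in the pending (PD) state until a compute node registers, then moves to running (R).

```shell
# Show the job queue: the sleep job is PD (pending) until a node is
# available, then R (running).
squeue

# Show the partitions and node counts; the node count grows as the
# new instance joins the cluster.
sinfo
```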
Go back to the EC2 Dashboard and choose Auto Scaling Groups, refreshing the fields with the circling arrows if necessary. You should see that the Desired capacity field has changed from 0 to 1. This single instance corresponds to the 2 tasks we requested, which fit on the 2 physical cores of one c4.xlarge.
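The sizing arithmetic behind that Desired value can be sketched as a ceiling division: the number of instances launched is the number of requested tasks divided by the cores per instance, rounded up. The values below are assumptions matching this walkthrough (2 tasks, c4.xlarge with 2 physical cores), not parameters read from the cluster.

```shell
# Hedged sketch: how many instances are needed to host the tasks.
ntasks=2              # from #SBATCH --ntasks=2
cores_per_instance=2  # c4.xlarge has 2 physical cores
instances=$(( (ntasks + cores_per_instance - 1) / cores_per_instance ))
echo "instances needed: $instances"   # → instances needed: 1
```

With 2 tasks and 2 cores per instance, a single c4.xlarge suffices, which is why Desired moves from 0 to 1 rather than to 2.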
On the EC2 Dashboard, click Instances on the left side of the window. You should see your compute instances labeled as Compute.
Now you have a better understanding of how AWS ParallelCluster operates. If you are interested, take a look at the documentation; there is more to explore.