Run a Docker-based workload on HPC with GPU
In this case study, we will walk through how to convert a Docker image into Singularity format and import it into the cluster, how to look up appropriate hardware, and finally how to enqueue a job.
Due to security concerns, OAsis HPC supports Singularity rather than Docker. However, you can easily convert Docker images from the command line.
Convert a Docker image to Singularity
If you use containers other than Docker and Singularity, please consult this page for details.
To use our GPUs, your container should include CUDA; version 11.6 is recommended. Instead of packaging CUDA from scratch, you may extend the built-in image nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif in /pfss/containers, which ships with many GPU libraries and utilities pre-built.
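Extending the built-in image can be sketched with a Singularity definition file that bootstraps from the local .sif. This is a minimal sketch, not a confirmed OAsis procedure: the file names extend-nvhpc.def and my-nvhpc.sif and the installed package are hypothetical, and the --fakeroot build is an assumption that depends on it being enabled for your account.

```shell
#!/usr/bin/env bash
# Sketch only: file names and the installed package are hypothetical.
base_image="/pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif"
def_file="extend-nvhpc.def"

# Write a definition file that starts from the built-in image.
cat > "$def_file" <<'EOF'
Bootstrap: localimage
From: /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif

%post
    # Example package only; install whatever your workload needs.
    apt-get update
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends python3-pip
    rm -rf /var/lib/apt/lists/*
EOF

# Build only where Singularity and the base image are both present.
# Assumption: --fakeroot is permitted for your account on this machine.
if command -v singularity >/dev/null 2>&1 && [ -f "$base_image" ]; then
    mkdir -p ~/containers
    singularity build --fakeroot ~/containers/my-nvhpc.sif "$def_file"
else
    echo "Wrote $def_file; run the build where Singularity and the base image are available."
fi
```

If unprivileged builds are not allowed, the same definition file can be built on another machine and the resulting .sif copied to the cluster.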
We recommend putting your containers in the containers folder under one of the scratch folders so you can browse them inside the portal. The following example converts the julia Docker image to Singularity and places it in the containers folder in your home directory.
singularity pull julia.1.8.2.sif docker://julia:alpine3.16
# move it to the containers folder, then we can run it in the web portal
mkdir -p ~/containers
mv julia.1.8.2.sif ~/containers
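Before using the image in a job, it may be worth checking that it runs. The sketch below assumes the julia binary is on the container's PATH, as it is in the official Docker image. Since the alpine3.16 tag does not name a Julia release, printing the version also confirms that the file name matches what was actually pulled.

```shell
# Sketch: sanity-check the converted image.
sif="$HOME/containers/julia.1.8.2.sif"

if command -v singularity >/dev/null 2>&1 && [ -f "$sif" ]; then
    # Print the Julia version packaged in the image
    singularity exec "$sif" julia --version
    # Show image metadata, such as labels carried over from Docker
    singularity inspect "$sif"
else
    echo "Skipping check: singularity or $sif is not available here"
fi
```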
Explore available GPUs
There may be more than one GPU available for your account. Head to the partitions page to look up the best partition for your job.
You may also click on the node count to see the nodes in that partition. In the following example, only one node is currently available in the gpu partition, and it has 6 idle GPUs ready for use.
a100 refers to the NVIDIA A100 80GB GPU. 1g.10gb and 3g.40gb are MIG (Multi-Instance GPU) partitions.
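The same information can usually be queried from a login node with Slurm's sinfo. This is a minimal sketch: list_gpu_nodes is a hypothetical helper, and the default partition name gpu is taken from the example above. Note that %G lists the GPUs configured on each node, which is not necessarily the number currently idle.

```shell
# Sketch: list nodes, their GPUs (GRES) and state for one partition.
list_gpu_nodes() {
    partition="${1:-gpu}"   # partition name; "gpu" is assumed from the example
    if command -v sinfo >/dev/null 2>&1; then
        # -N = one line per node
        # %N = node name, %G = generic resources (GPUs), %t = node state
        sinfo -p "$partition" -N -o "%N %G %t"
    else
        echo "sinfo not found; run this on a cluster login node" >&2
        return 1
    fi
}

list_gpu_nodes gpu || true
```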
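Run a command in the container
You can run a command in the container either interactively with srun or through an sbatch script. In this example, we request one a100 GPU and print its details with nvidia-smi (srun) or nvaccelinfo (sbatch). Either command should show the GPU information.
With srun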
srun -p gpu --gpus a100:1 singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvidia-smi
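The GPU request above follows a type:count pattern. As a sketch, and assuming the MIG type names listed for the partition (for example 1g.10gb) can be requested the same way, which is not confirmed here, a small helper can print the command for a given GPU type so you can review it before running it. gpu_run_cmd is a hypothetical name.

```shell
# Hypothetical helper: print (not run) the srun command for a GPU type.
SIF=/pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif

gpu_run_cmd() {
    gpu_type="$1"   # e.g. a100, or a MIG slice such as 1g.10gb (assumed)
    shift
    echo "srun -p gpu --gpus ${gpu_type}:1 singularity exec --nv ${SIF} $*"
}

# Print the command for a small MIG slice; review it, then run it yourself.
gpu_run_cmd 1g.10gb nvidia-smi
```

Requesting a smaller slice instead of a full a100 may shorten queue time when the workload does not need the whole GPU.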
With an sbatch script
#!/usr/bin/env bash
#SBATCH -p gpu
#SBATCH --gpus a100:1
singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvaccelinfo
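To submit the script, save it to a file and pass it to sbatch. The sketch below assumes a hypothetical file name, gpu-info.sbatch, and relies on Slurm's default behaviour of writing job output to slurm-<jobid>.out in the submission directory.

```shell
# Sketch: save the batch script, submit it, and check its status.
job_script="gpu-info.sbatch"   # hypothetical file name

cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
#SBATCH -p gpu
#SBATCH --gpus a100:1
singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvaccelinfo
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints "jobid" (or "jobid;cluster"); keep only the ID
    job_id="$(sbatch --parsable "$job_script")"
    job_id="${job_id%%;*}"
    echo "Submitted job $job_id"
    squeue -j "$job_id"   # shows whether the job is pending or running
    echo "Output will be written to slurm-${job_id}.out by default"
else
    echo "sbatch not found; submit $job_script from a cluster login node"
fi
```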