Run docker-based workload on HPC with GPU
In this case study, we will walk through how to convert a Docker image into Singularity format and import it into the cluster, how to look up appropriate hardware, and finally how to enqueue a job.
Due to security concerns, OAsis HPC supports Singularity rather than Docker. However, you can easily convert Docker images from the command line.
Convert a docker image to Singularity
If you use containers other than Docker and Singularity, please consult this page for details.
To communicate with our GPUs, your container should include CUDA. Version 11.6 is recommended. Instead of packaging CUDA from scratch, you may also extend the built-in image nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif in /pfss/containers, which has many GPU libraries and utilities pre-built.
We recommend putting your containers in the containers folder under one of the scratch folders so you can browse them inside the portal. The following example converts the julia Docker image to Singularity and puts it in the containers folder in your home directory.
singularity pull julia.1.8.2.sif docker://julia:alpine3.16
# move it to the containers folder, then we can run it in the web portal
mkdir -p ~/containers
mv julia.1.8.2.sif ~/containers
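Before using the converted image in the portal, a quick smoke test can confirm that it runs. The sketch below is only an illustration: check_image is a hypothetical helper name, the image path is the one from the example above, and it assumes the julia binary is on the container's default PATH.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test a converted image.
# check_image is a hypothetical helper, not part of Singularity or OAsis.
check_image() {
  local sif="$1"
  shift
  # Fail early if the image file is missing.
  [ -f "$sif" ] || { echo "image not found: $sif" >&2; return 1; }
  # Run the remaining arguments as a command inside the container.
  singularity exec "$sif" "$@"
}

# Usage (on a machine with Singularity installed):
#   check_image ~/containers/julia.1.8.2.sif julia --version
```

If the command prints a Julia version string, the conversion worked and the image is ready to use.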
Explore available GPUs
There may be more than one GPU available for your usage. Head to the partitions page to look up the best partition for your job.
You may also click on the nodes count to check out the nodes under this partition. In this popup, we can find all the GPUs and their availability in this cluster. In the following example, there is only one node currently available in the gpu partition. That node has 6 idle GPUs ready for use.
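The same information can usually be queried from a login shell with the standard Slurm sinfo command. The following is a minimal sketch, assuming sinfo is available on the login node; list_gpu_nodes is a hypothetical helper name, and gpu is the partition name used in this guide.

```shell
#!/usr/bin/env bash
# Sketch: list the nodes of a partition together with their GPU resources.
# list_gpu_nodes is a hypothetical helper; "gpu" is the partition used in this guide.
list_gpu_nodes() {
  local partition="${1:-gpu}"
  # -N prints one line per node.
  # %N = node name, %G = generic resources (GPUs), %t = node state (compact form)
  sinfo -p "$partition" -N -o "%N %G %t"
}

# Usage:
#   list_gpu_nodes gpu
```

Note that %G reports the GPUs configured on each node, not how many are idle, so the portal remains the easier place to check availability.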
a100 refers to the NVIDIA A100 80GB GPU. 1g.10gb and 3g.40gb are MIG partitions.
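In NVIDIA's MIG naming, the number before "g" is the count of compute slices and the number before "gb" is the memory size in GB. The sketch below just splits such a name into its two parts; describe_mig is a hypothetical helper for illustration and does not query the cluster.

```shell
#!/usr/bin/env bash
# Sketch: split a MIG profile name such as "3g.40gb" into its two parts.
# describe_mig is a hypothetical helper for illustration only.
describe_mig() {
  local profile="$1"
  local slices="${profile%%g.*}"   # text before "g." -> number of compute slices
  local mem="${profile#*g.}"       # text after "g."  -> memory, e.g. "40gb"
  mem="${mem%gb}"                  # drop the "gb" suffix
  echo "$profile: $slices compute slice(s), $mem GB memory"
}

describe_mig 1g.10gb
describe_mig 3g.40gb
```

A smaller partition such as 1g.10gb is often enough for testing or light inference, while 3g.40gb suits jobs that need more memory.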
4. Execute a command in the container using either the srun command or an sbatch script file. In these examples, we request one a100 GPU and run a GPU query tool (nvidia-smi with srun, nvaccelinfo with sbatch). The output should show the GPU information.
4.1 With the srun command
srun -p gpu --gpus a100:1 singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvidia-smi
4.2 With sbatch script
#!/usr/bin/env bash
#SBATCH -p gpu
#SBATCH --gpus a100:1
singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvaccelinfo
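To enqueue the script, pass it to sbatch. The sketch below submits a script, waits for the job to end, and prints its output. It assumes the script above was saved as job.sh (a hypothetical file name) and that the job uses Slurm's default output file, slurm-<jobid>.out in the submission directory; submit_and_show is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: submit a batch script, wait for it to finish, and print its output.
# submit_and_show and the file name job.sh are hypothetical.
submit_and_show() {
  local script="$1" jobid status
  # --parsable prints only the job ID; --wait blocks until the job terminates.
  jobid="$(sbatch --parsable --wait "$script")"
  status=$?
  jobid="${jobid%%;*}"   # drop the ";cluster" suffix added on multi-cluster setups
  [ -n "$jobid" ] || return 1
  # By default Slurm writes the job's stdout and stderr to slurm-<jobid>.out.
  cat "slurm-${jobid}.out"
  return "$status"
}

# Usage:
#   submit_and_show job.sh
```

For longer jobs you may prefer a plain sbatch job.sh and then check the queue with squeue -u $USER instead of blocking the shell.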