Run docker-based workload on HPC with GPU

1. Prepare a container image by converting a Docker image from Docker Hub. For further details, please check the ref. It is recommended to use a container that already has CUDA installed. We provide several containers with CUDA pre-installed and configured, such as nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif. You can find all of the provided images under the /pfss/containers/ directory.

# pull the Docker image from Docker Hub and convert it into a Singularity image (.sif)
singularity pull julia.1.8.2.sif docker://julia:alpine3.16

# move it to the containers folder, then we can run it in the web portal
mkdir -p ~/containers
mv julia.1.8.2.sif ~/containers
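
Besides building your own image, you can check what is already available. A minimal sketch, assuming read access to the shared directory; the nvidia/cuda tag below is only an example, pick the CUDA version you actually need:

# list the pre-built, CUDA-ready images provided by the HPC center
ls /pfss/containers/

# or pull your own CUDA image from Docker Hub and keep it with your other containers
singularity pull cuda.11.8.0.sif docker://nvidia/cuda:11.8.0-runtime-ubuntu20.04
mv cuda.11.8.0.sif ~/containers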

2. Prepare the sbatch arguments for GPU usage

    2.1 First, find the gpu partition in the web portal, then open the Nodes tab.

[Screenshot: Partitions view in the OAsis HPC Center web portal]

    2.2 In this popup, we can see every GPU and its availability in the cluster. For example, there are 4 GPUs with 1 GPU core and 10 GB of memory each, 1 GPU with 3 GPU cores and 40 GB of memory, and 1 GPU with the full A100 capability.

[Screenshot: GPU availability in the Nodes popup, OAsis HPC Center web portal]
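
If you prefer the command line, roughly the same information can be obtained from Slurm directly. A minimal sketch, assuming standard Slurm tooling; the partition name gpu comes from the portal above, and the output columns may look different on your cluster:

# show each node in the gpu partition, its GPU resources (GRES) and its state
sinfo -p gpu -N -o "%N %G %t"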

3. Execute a command in the container using srun or an sbatch script file. In this example, we request 1 A100 GPU and query the GPU information (nvidia-smi in the srun example, nvaccelinfo in the sbatch example). The result should show us the GPU info.

    3.1 With srun

# request 1 A100 GPU in the gpu partition and run nvidia-smi inside the container
srun -p gpu --gpus a100:1 singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvidia-smi
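
If you would rather work interactively inside the container, singularity shell can be used the same way. A sketch under the same partition and GPU assumptions:

# open an interactive shell inside the container on a GPU node
srun -p gpu --gpus a100:1 --pty singularity shell --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif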

    3.2 With an sbatch script

#!/usr/bin/env bash

# request 1 A100 GPU in the gpu partition
#SBATCH -p gpu
#SBATCH --gpus a100:1

# run nvaccelinfo inside the container; --nv exposes the host NVIDIA driver and devices to the container
singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvaccelinfo
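
To run it, save the script to a file and submit it with sbatch. The file name gpu_test.sbatch below is only an example:

# submit the job and note the job id that sbatch prints
sbatch gpu_test.sbatch

# check the queue; once the job finishes, its output is written to slurm-<jobid>.out
squeue -u $USER
cat slurm-*.out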