Run a Docker-based workload on HPC with a GPU

In this case study, we walk through how to convert a Docker image into Singularity format, how to look up appropriate hardware, and finally how to enqueue a job.

Convert a Docker image to Singularity

1. Prepare a container image by converting a Docker image from Docker Hub. For further details, please check ref. We recommend using a container that has CUDA installed. We provide several container images with CUDA already installed and configured, such as nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif. All provided images are in the /pfss/containers/ directory.

singularity pull julia.1.8.2.sif docker://julia:alpine3.16

# Move the image to the ~/containers folder so that it can be run from the web portal
mkdir -p ~/containers
mv julia.1.8.2.sif ~/containers
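The pull-and-move commands above can be wrapped in a small helper that skips the download when the image already exists. This is only a sketch: the function name pull_image and the CONTAINER_DIR variable are hypothetical, and it assumes singularity is on the PATH of the node where you run it.

```shell
# Hypothetical helper: pull a Docker Hub image straight into ~/containers,
# unless a file with the same name is already there.
pull_image() {
    image_uri="$1"   # e.g. docker://julia:alpine3.16
    sif_name="$2"    # e.g. julia.1.8.2.sif
    target_dir="${CONTAINER_DIR:-$HOME/containers}"

    mkdir -p "$target_dir"

    # Skip the download if the image was pulled earlier
    if [ -f "$target_dir/$sif_name" ]; then
        echo "exists: $target_dir/$sif_name"
        return 0
    fi

    # Fail early with a clear message if singularity is not available
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity not found on PATH" >&2
        return 1
    fi

    # singularity pull <output file> <URI>
    singularity pull "$target_dir/$sif_name" "$image_uri"
}

# Usage (same image as above):
#   pull_image docker://julia:alpine3.16 julia.1.8.2.sif
```

Writing the image directly to its final location avoids the separate mv step.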

2. Prepare sbatch arguments for GPU usage

3.1 First, open the partition page, and then open the nodes tab.

3.2 This popup lists all the GPUs in the cluster and their availability. For example, there are four GPUs with 1 GPU core and 10 GB of memory, one GPU with 3 GPU cores and 40 GB of memory, and one GPU with the full A100 capability.
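The same information can usually be queried from a login shell with Slurm's sinfo command. The sketch below assumes that the partition is named gpu (as in the srun example in this guide) and that the cluster exposes its GPUs as generic resources (GRES); the function name list_partition_gpus is hypothetical, and the output may not match the web portal exactly.

```shell
# Hypothetical helper: list nodes, their GPUs, and their state for one partition.
list_partition_gpus() {
    partition="${1:-gpu}"   # assumed partition name; adjust for your cluster

    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found on PATH" >&2
        return 1
    fi

    # %N = node names, %G = generic resources (e.g. GPUs), %t = node state
    sinfo --noheader -p "$partition" -o "%N %G %t"
}

# Usage:
#   list_partition_gpus gpu
```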

4. Execute a command in the container, either with the srun command or with an sbatch script file. In this example, we use one A100 GPU to run nvidia-smi (with srun) or nvaccelinfo (with sbatch). Either command should print the GPU information.

4.1 With the srun command

srun -p gpu --gpus a100:1 singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvidia-smi

4.2 With sbatch script

#!/usr/bin/env bash

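# Submit to the gpu partition
#SBATCH -p gpu
# Request one A100 GPU
#SBATCH --gpus a100:1

# --nv makes the host NVIDIA driver and GPU devices available inside the container
singularity exec --nv /pfss/containers/nvhpc.22.9-devel-cuda_multi-ubuntu20.04.sif /bin/sh -c nvaccelinfo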
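A minimal sketch of how the script above might be submitted and its output read. It assumes the script is saved as job.sh (a hypothetical file name) and that Slurm writes the job output to the default slurm-<jobid>.out file in the submission directory, since the script sets no --output option. The helper names submit_job and wait_for_job are also hypothetical.

```shell
# Submit a batch script and print only its job ID.
submit_job() {
    script="$1"
    # --parsable makes sbatch print the job ID (optionally followed by ";cluster")
    job_id=$(sbatch --parsable "$script") || return 1
    # Strip the optional ";cluster" suffix
    job_id=${job_id%%;*}
    echo "$job_id"
}

# Block until the job no longer appears in the queue.
wait_for_job() {
    job_id="$1"
    # squeue -h prints one line (no header) while the job is pending or running
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "${POLL_SECONDS:-10}"
    done
}

# Usage:
#   job_id=$(submit_job job.sh) && wait_for_job "$job_id" && cat "slurm-$job_id.out"
```

Polling squeue is the simplest approach; it does not distinguish a successful job from a failed one, so check the output file afterwards.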