Run nemo-megatron-gpt-5B model with NVIDIA NeMo

Introduction

NVIDIA NeMo is a powerful toolkit designed for researchers working on various conversational AI tasks, including automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP). It aims to facilitate the reuse of existing code and pretrained models while enabling the creation of new conversational AI models. In this tutorial, we will explore NeMo's capabilities and learn how to use the Megatron-GPT 5B language model for language modeling tasks.

Model and Software References:

NVIDIA NeMo: [https://github.com/NVIDIA/NeMo]
nemo-megatron-gpt-5B: [https://huggingface.co/nvidia/nemo-megatron-gpt-5B]

Launch Jupyter Lab Job

Create a Jupyter Lab job with the following specifications:

CPU Cores: 4
Memory: 64 GB
GPU: 3g.40gb

Open your web browser and navigate to the Jupyter Lab web interface.

In the Jupyter Lab menu, open the Terminal.

Enabling the NeMo Container Kernel in Jupyter Lab

Execute the following commands in the Terminal:

cd $HOME
mkdir -p .local/share/jupyter/kernels/ngc.nemo.22.07
echo '
{
 "language": "python",
 "argv": ["/usr/bin/singularity",
   "exec",
   "--nv",
   "-B",
   "/run/user:/run/user",
   "/pfss/containers/ngc.nemo.22.07.sif",
   "python",
   "-m",
   "ipykernel",
   "-f",
   "{connection_file}"
 ],
 "display_name": "nemo22.07"
}
' > .local/share/jupyter/kernels/ngc.nemo.22.07/kernel.json

After adding the content to the kernel.json file, refresh your browser by pressing F5. You should now see "nemo22.07" under the Notebook section in Jupyter Lab Launcher.

Launch eval server

Execute the following command in the Terminal:

# set the TMPDIR environment variable
export TMPDIR=/pfss/scratch02/appcara/nlp/tmp

# start eval server with nemo-megatron-gpt-5B model by nemo container
singularity run --nv /pfss/containers/ngc.nemo.22.07.sif python /pfss/scratch02/appcara/nlp/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py gpt_model_file=/pfss/scratch02/appcara/nlp/nemo_gpt5B_fp16_tp1.nemo server=true tensor_model_parallel_size=1 trainer.devices=1 port=5556

Send prompts to the model

Copying the Jupyter Lab File:

# copy the jupyter example file into your home folder
cp $SCRATCH_APPCARA/nlp/nemo-megatron-gpt-template.ipynb $HOME

In the "File Browser" section of Jupyter Lab, locate the copied file and open it. Also change the kernel to nemo22.07.

Edit what you want to talk with chatbot in the "sentences" section of the file. Run the program.

Brief introduction to the cluster

Access the cluster

Builtin software

Finding help

Submit jobs

Fine tune your workload

Access your files

Manage your team

Custom software

Quick jobs

Jobs, quota, and setup alerts

Manage accounts and quotas

Billing, cost allocation and reports

Integrate your own workflow with job automation APIs

Run docker-based workload on HPC with GPU

Render 3D graphics with Blender

AI painting with stable diffusion

Run and train chatbots with OpenChatKit

PyTorch with GPU in Jupyter Lab using container-based kernel

Run NVIDIA-Merlin MovieLens Example in Jupyter Lab

Multinode PyTorch Model Training using MPI and Singularity

Running the Vicuna-33B/13B/7B Chatbot with FastChat

Run nemo-megatron-gpt-5B model with NVIDIA NeMo

Accelerating molecular dynamics simulations with MPI and GPU

Accelerate a simple C++ program with MPI and CUDA

Accelerate FASTQ to BAM conversion using GPU and Parabricks

Generate sound effect/music with Meta's AudioCraft

Introduce Nvidia Modulus Symbolic (Modulus Sym)

Nvidia Modulus Symbolic(Modulus Sym) Workflow and Example

Retrieval Augmentation Generation - Langchain integration with local LLM

Using 10x Genomics Cell Ranger

Creating JupyterLab in Kubernetes workspace

Insufficient disk space for Anaconda3

Run nemo-megatron-gpt-5B model with NVIDIA NeMo

Introduction

Model and Software References:

Launch Jupyter Lab Job

Enabling the NeMo Container Kernel in Jupyter Lab

Launch eval server

Send prompts to the model