Run the nemo-megatron-gpt-5B model with NVIDIA NeMo
Introduction
NVIDIA NeMo is a toolkit for researchers working on conversational AI tasks, including automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP). It aims to make it easy to reuse existing code and pretrained models when building new conversational AI models. In this tutorial, we use the NeMo container to serve the Megatron-GPT 5B language model from a Jupyter Lab job and send text prompts to it.
Model and Software References:
- NVIDIA NeMo: [https://github.com/NVIDIA/NeMo]
- nemo-megatron-gpt-5B: [https://huggingface.co/nvidia/nemo-megatron-gpt-5B]
Launch Jupyter Lab Job
Create a Jupyter Lab job with the following specifications:
- CPU Cores: 4
- Memory: 64 GB
- GPU: 3g.40gb
Open your web browser and navigate to the Jupyter Lab web interface.
In the Jupyter Lab menu, open the Terminal.
Enabling the NeMo Container Kernel in Jupyter Lab
Execute the following commands in the Terminal:
cd $HOME
mkdir -p .local/share/jupyter/kernels/ngc.nemo.22.07
echo '
{
  "language": "python",
  "argv": [
    "/usr/bin/singularity",
    "exec",
    "--nv",
    "-B",
    "/run/user:/run/user",
    "/pfss/containers/ngc.nemo.22.07.sif",
    "python",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "display_name": "nemo22.07"
}
' > .local/share/jupyter/kernels/ngc.nemo.22.07/kernel.json
After writing the kernel.json file, refresh your browser by pressing F5. You should now see "nemo22.07" under the Notebook section of the Jupyter Lab Launcher.
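If "nemo22.07" does not appear in the Launcher, a malformed kernel.json is one likely cause. The following is a minimal sketch of a sanity check, not part of the original setup. The helper name check_kernel_spec is hypothetical, and the checks only cover the fields used in the kernel.json above (valid JSON, an "argv" list containing the {connection_file} placeholder, and a "display_name").

```python
import json
from pathlib import Path

# Location written by the commands above.
KERNEL_JSON = Path.home() / ".local/share/jupyter/kernels/ngc.nemo.22.07/kernel.json"


def check_kernel_spec(path):
    """Return a list of problems found in a kernel.json file (empty if none).

    Hypothetical helper: it only checks the fields this tutorial relies on.
    """
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        spec = json.loads(path.read_text())
    except json.JSONDecodeError as err:
        return [f"{path} is not valid JSON: {err}"]
    if not isinstance(spec, dict):
        return [f"{path} must contain a JSON object"]

    problems = []
    argv = spec.get("argv")
    if not isinstance(argv, list) or not argv:
        problems.append("'argv' must be a non-empty list")
    elif "{connection_file}" not in argv:
        problems.append("'argv' lacks the {connection_file} placeholder")
    if not spec.get("display_name"):
        problems.append("'display_name' is missing")
    return problems
```

Calling check_kernel_spec(KERNEL_JSON) from any Python session in the job should return an empty list when the file is well-formed; otherwise it lists what to fix.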
Launch eval server
Execute the following commands in the Terminal:
# set the TMPDIR environment variable
export TMPDIR=/pfss/scratch02/appcara/nlp/tmp
# start the eval server with the nemo-megatron-gpt-5B model, using the NeMo container
singularity run --nv /pfss/containers/ngc.nemo.22.07.sif \
    python /pfss/scratch02/appcara/nlp/NeMo/examples/nlp/language_modeling/megatron_gpt_eval.py \
    gpt_model_file=/pfss/scratch02/appcara/nlp/nemo_gpt5B_fp16_tp1.nemo \
    server=true tensor_model_parallel_size=1 trainer.devices=1 port=5556
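Loading the checkpoint can take some time, so the server may not accept connections right away. As a rough sketch (not part of the original instructions), the snippet below polls the port passed to the command above until a TCP connection succeeds. The helper name wait_for_port and its timeout values are illustrative assumptions, and an open port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=5556, timeout=600.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once a connection is accepted, or False if `timeout`
    seconds pass without success. Hypothetical helper for this tutorial.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Running wait_for_port() in a second terminal or in a notebook cell gives a simple yes/no answer before you start sending prompts.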
Send prompts to the model
Copy the example notebook by running the following command in the Terminal:
# copy the example Jupyter notebook into your home folder
cp $SCRATCH_APPCARA/nlp/nemo-megatron-gpt-template.ipynb $HOME
In the "File Browser" section of Jupyter Lab, locate the copied file and open it, then change the kernel to nemo22.07.
In the "sentences" section of the notebook, enter the prompts you want to send to the model, then run the notebook.
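The contents of the template notebook are not reproduced here. As a hedged sketch of what sending a prompt might look like, the code below follows the request format shown on the nemo-megatron-gpt-5B model card: a PUT request to a /generate endpoint with a JSON body containing "sentences" and sampling parameters. The port 5556 matches the server command above. The function names build_payload and send_prompts are hypothetical, the sampling values are the model card's example values, and the endpoint or field names may differ in other NeMo versions.

```python
import json
import urllib.request

# Assumption: the eval server started earlier listens on port 5556
# and exposes a /generate endpoint, as on the model card.
SERVER_URL = "http://localhost:5556/generate"


def build_payload(sentences, tokens_to_generate=50):
    """Build the JSON body for a generation request.

    Field names and values follow the model card example; treat them
    as assumptions for other NeMo versions.
    """
    return {
        "sentences": list(sentences),
        "tokens_to_generate": tokens_to_generate,
        "temperature": 1.0,
        "add_BOS": True,
        "top_k": 0,
        "top_p": 0.9,
        "greedy": False,
        "all_probs": False,
        "repetition_penalty": 1.2,
        "min_tokens_to_generate": 2,
    }


def send_prompts(sentences, url=SERVER_URL, timeout=120):
    """PUT the prompts to the server and return the generated sentences."""
    body = json.dumps(build_payload(sentences)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["sentences"]
```

With the server running, send_prompts(["Tell me an interesting fact about space travel."]) should return a list with one generated continuation per prompt.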