Running the Vicuna-13B Chatbot with FastChat

Introduction

The Vicuna-13B chatbot is an open-source conversational AI model created by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. According to its authors' preliminary evaluation, it reaches more than 90% of the quality of OpenAI ChatGPT and Google Bard while outperforming other models such as LLaMA and Stanford Alpaca in more than 90% of cases. This tutorial guides you through setting up the environment and running the Vicuna-13B chatbot with the FastChat inference software.

Model and Software References:

- Vicuna-13B: https://lmsys.org/blog/2023-03-30-vicuna/
- FastChat: https://github.com/lm-sys/FastChat

Installation and Setup

# Create conda environment
# conda create -n [env_name]
conda create -n chatbotDemo
# source activate [env_name]
source activate chatbotDemo

# Install required packages
conda install pip
pip3 install fschat
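
Before requesting compute resources, you can optionally confirm that the package installed correctly. pip3 show prints the installed fschat version, and the one-liner verifies that the fastchat Python module imports cleanly:

# optional: verify that fschat installed correctly
pip3 show fschat
python3 -c "import fastchat; print('fastchat imported OK')"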

Running the Chatbot

After creating the conda environment, you can activate it at any time by running:

# source activate [env_name]
source activate chatbotDemo

Single GPU Case

To run the Vicuna-13B chatbot using a GPU (requires around 28GB of GPU memory), execute the following commands:

# request 4 cores, 50 GB RAM, and one 3g.40gb GPU slice with an interactive shell
srun -p gpu --gpus 3g.40gb:1 -c 4 --mem 50000 --pty bash

source activate chatbotDemo
python3 -m fastchat.serve.cli --model-path /pfss/toolkit/vicuna-13b --style rich
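
If the chatbot fails to start with a CUDA error, you can first confirm that the allocated GPU slice is visible inside the job. This is an optional check; it assumes the chatbotDemo environment is active, since torch is installed as a dependency of fschat:

# optional: confirm the allocated MIG slice is visible
nvidia-smi
# torch should report that CUDA is available
python3 -c "import torch; print(torch.cuda.is_available())"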

CPU-Only Case

If you prefer to run the chatbot on a CPU (requires around 60GB of CPU memory), follow these steps:

# request 4 cores, 70 GB RAM with an interactive shell
srun -p batch -c 4 --mem 70000 --pty bash

source activate chatbotDemo
# --device cpu forces CPU inference (the default device is cuda)
python3 -m fastchat.serve.cli --model-path /pfss/toolkit/vicuna-13b --style rich --device cpu
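
If the node runs out of memory with the full-precision weights, FastChat also supports 8-bit compression via the --load-8bit flag, which per the FastChat documentation reduces memory usage by roughly half at a slight cost in model quality. The flag can be combined with either the GPU or the CPU command above:

# optional: enable 8-bit compression to reduce memory usage
python3 -m fastchat.serve.cli --model-path /pfss/toolkit/vicuna-13b --style rich --load-8bit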

Conclusion

By following these steps, you can set up and run the Vicuna-13B chatbot with the FastChat inference software. Feel free to explore fine-tuning the model and evaluating the chatbot using the resources available on the Vicuna-13B website: https://lmsys.org/blog/2023-03-30-vicuna/.