AudioGen experiment from Facebook AudioCraft
AudioGen generates audio samples from text descriptions. The following commands create a Conda environment, install AudioCraft, request a GPU node, and run the script.
module load Anaconda3/2022.05 GCCcore/11.3.0 FFmpeg/4.4.2
# Create conda environment
# conda create -n [env_name]
conda create -n audioCraft
# source activate [env_name]
source activate audioCraft
# Install required packages
conda install pip
pip3 install git+https://github.com/facebookresearch/audiocraft.git
srun --pty -p gpu --cpus-per-task=12 --gres=gpu:a100:1 --mem=100G bash
python3 audio_craft.py
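Before requesting a GPU node, it can help to confirm that the loaded modules actually put the expected tools on the PATH. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` and the list of tool names are illustrative assumptions, not part of AudioCraft.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed tool list: adjust to whatever the loaded modules should provide.
    missing = missing_tools(["ffmpeg", "conda", "python3"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools are available.")
```

If `ffmpeg` is reported as missing, the `module load` line above probably has not been run in the current shell.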
audio_craft.py imports the necessary packages and loads the pre-trained model from our storage. It then sets parameters for the audio generation and provides three sample descriptions. The model generates audio based on these descriptions, and the resulting audio is saved to a file using loudness normalization.
## audio_craft.py
import torchaudio
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write
model = AudioGen.get_pretrained('/pfss/toolkit/audio_craft_audiogen_medium_1.5b/snapshots/3b776a70d1d682d75e01ed5c4924ea31d156a62c/')
model.set_generation_params(duration=5) # generate 5 seconds.
descriptions = ['The sound of nails on a chalkboard in a noisy classroom', 'someone chew with their mouth open', 'sound of a car alarm going off repeatedly']
wav = model.generate(descriptions) # generates 3 samples.
for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
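The comment in the script mentions normalization to -14 dB LUFS. The sketch below shows only the gain arithmetic behind moving a signal from one level to another; it does not measure LUFS (that requires the weighting and gating defined for loudness meters) and it is not how `audio_write` is implemented. The helper names `gain_for_target` and `apply_gain` are hypothetical.

```python
def gain_for_target(measured_db, target_db=-14.0):
    """Linear amplitude factor that moves a signal from measured_db to target_db."""
    return 10.0 ** ((target_db - measured_db) / 20.0)


def apply_gain(samples, gain):
    """Scale every sample by the same linear factor."""
    return [s * gain for s in samples]


if __name__ == "__main__":
    # A clip measured at -20 dB needs +6 dB, roughly a factor of two in amplitude.
    factor = gain_for_target(-20.0)
    print(round(factor, 3))
```

A large positive gain can push samples beyond the valid range, which is presumably why the script also enables `loudness_compressor=True`.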
AudioGen is an autoregressive transformer LM that synthesizes general audio conditioned on text (Text-to-Audio). Internally, AudioGen operates over discrete representations learnt from the raw waveform, using an EnCodec tokenizer.
AudioGen was presented in AudioGen: Textually Guided Audio Generation by Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi.
AudioGen 1.5B is a variant of the original AudioGen model that follows the MusicGen architecture. More specifically, it is trained over a 16kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz with a delay pattern between the codebooks. Having only 50 auto-regressive steps per second of audio, this AudioGen model allows faster generation while reaching similar performance to the original AudioGen model introduced in the paper.
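The figures quoted above (16 kHz audio, 4 codebooks, 50 Hz frame rate) translate into a simple token budget per clip. The sketch below works through that arithmetic and illustrates one common way a delay pattern can be laid out, with codebook k shifted by k steps. The layout is an assumption for illustration, and the names `token_budget` and `delay_codebooks` are hypothetical; this is not AudioCraft's implementation.

```python
SAMPLE_RATE_HZ = 16_000  # from the text: 16 kHz EnCodec tokenizer
FRAME_RATE_HZ = 50       # from the text: codebooks sampled at 50 Hz
NUM_CODEBOOKS = 4        # from the text: 4 codebooks


def token_budget(duration_s):
    """Frame, token, and sample counts for a clip of duration_s seconds."""
    frames = int(duration_s * FRAME_RATE_HZ)
    return {
        "frames": frames,
        "tokens": frames * NUM_CODEBOOKS,
        "samples": int(duration_s * SAMPLE_RATE_HZ),
        "samples_per_frame": SAMPLE_RATE_HZ // FRAME_RATE_HZ,
    }


def delay_codebooks(codes, pad=None):
    """Shift codebook k right by k steps (assumed layout), padding the gaps.

    codes: list of K rows, each of length T. Returns K rows of length T + K - 1.
    """
    k_total = len(codes)
    return [
        [pad] * k + list(row) + [pad] * (k_total - 1 - k)
        for k, row in enumerate(codes)
    ]


if __name__ == "__main__":
    print(token_budget(5))
    print(delay_codebooks([[1, 2, 3], [4, 5, 6]], pad=0))
```

For the 5-second clips generated by the script, this gives 250 frames and 1000 tokens per sample, each frame standing in for 320 waveform samples.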