How to Convert Whisper from HF's Transformers format into CTranslate2 format (needed for FasterWhisper)

TL;DR

# Create a virtual environment named myvenv
python -m venv myvenv
# Activate this venv
source myvenv/bin/activate
# Now the venv is activated, install the packages
pip install transformers ctranslate2
ct2-transformers-converter \
    --model whisper-large-v2 \
    --output_dir whisper-large-v2-ct2 \
    --copy_files tokenizer_config.json \
    --quantization float16

Overview

Faster Whisper is a Python package for running OpenAI's Whisper model efficiently. It lets you transcribe (and translate) speech with lower memory requirements and lower latency. However, this package only supports CTranslate2 models; it cannot use Hugging Face Transformers models directly. You need to manually convert these models from transformers (PyTorch) into CTranslate2 format. This way, you can use any of the fine-tuned Whisper models available on the Hugging Face Hub.
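
For example, you can pull a Whisper checkpoint (the official one or any fine-tuned variant) from the Hub before converting it. A minimal sketch using the huggingface_hub package (a separate dependency, not installed below; the repo ID is just an illustration, and local_dir requires a recent version of huggingface_hub):

from huggingface_hub import snapshot_download

# Download the repo's files into a local directory named whisper-large-v2
snapshot_download(repo_id="openai/whisper-large-v2", local_dir="whisper-large-v2")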

Dependencies

To be able to convert the models from HF's transformers into CTranslate2, you need the following packages:

  1. transformers
  2. ctranslate2

That's all we need :) You can easily install them using pip as follows:

pip install transformers ctranslate2

Note

It's generally recommended to create a Python virtual environment before installing these packages to prevent dependency conflicts. You can do that as follows:

# Create a virtual environment named myvenv
python -m venv myvenv
# Activate this venv
source myvenv/bin/activate
# Now the venv is activated, install the packages
pip install transformers ctranslate2

Conversion

Now we can easily convert a model from transformers into CTranslate2. There are three steps to convert the model, plus an optional fourth:

  1. Load the model into memory in transformers format
  2. Convert it into CTranslate2 format
  3. Save the converted model in CTranslate2 format for later use
  4. [Optional] Copy the tokenizer config into the model directory for easier packaging

Assuming the transformers model is in a directory named whisper-large-v2, and we want to save the converted model into a directory named whisper-large-v2-ct2 and copy tokenizer_config.json into it, this can be done as follows:

ct2-transformers-converter \
    --model whisper-large-v2 \
    --output_dir whisper-large-v2-ct2 \
    --copy_files tokenizer_config.json \
    --quantization float16
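
If you prefer to run the conversion from Python rather than the CLI, CTranslate2 exposes the same converter as a class. A minimal sketch with the same directories and options as above:

from ctranslate2.converters import TransformersConverter

# Build a converter from the local transformers checkpoint,
# copying the tokenizer config alongside the converted weights
converter = TransformersConverter(
    "whisper-large-v2",
    copy_files=["tokenizer_config.json"],
)
# Convert and save with float16 quantization
converter.convert("whisper-large-v2-ct2", quantization="float16")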

Faster Whisper

Now we can easily use this model in Faster Whisper (installable with pip install faster-whisper) as follows:

from faster_whisper import WhisperModel

model_path = "whisper-large-v2-ct2"

# Load model on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")

# Transcribe a wav file
segments, info = model.transcribe("83.wav", beam_size=1, language='ar', task="translate")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

# Print transcript with timestamps
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))