@sanchit-gandhi
Last active February 18, 2024 02:54
The Whisper JAX demo can be used as an endpoint through the Gradio Client library. The transcription API takes as input the audio file you want to transcribe, as well as optional arguments such as the task (transcribe or translate) and whether to return timestamps.
from gradio_client import Client

API_URL = "https://sanchit-gandhi-whisper-jax.hf.space/"

# set up the Gradio client
client = Client(API_URL)

def transcribe_audio(audio_path, task="transcribe", return_timestamps=False):
    """Function to transcribe an audio file using the Whisper JAX endpoint."""
    if task not in ["transcribe", "translate"]:
        raise ValueError("task should be one of 'transcribe' or 'translate'.")

    text, runtime = client.predict(
        audio_path,
        task,
        return_timestamps,
        api_name="/predict_1",
    )
    return text

# transcribe an audio file using the Whisper JAX endpoint
output = transcribe_audio("audio.mp3")

# transcribe and return timestamps
output_with_timestamps = transcribe_audio("audio.mp3", return_timestamps=True)
@sanchit-gandhi
Author

To use the Gradio Client library, ensure you have the latest version of the package installed:

pip install --upgrade gradio_client

@silvacarl2

Hi, we have 12M names and we would like to fine-tune Whisper on them. We are also happy to share the results with you.

The question is: is it better to fine-tune Whisper using the entire spoken name, or to fine-tune using individual names, with recorded snippets of each name spoken?

@shanky100

websockets.exceptions.ConnectionClosedError: no close frame received or sent

I am getting the above error when I run the provided code:

from gradio_client import Client

API_URL = "https://sanchit-gandhi-whisper-jax.hf.space/"

# set up the Gradio client
client = Client(API_URL)

def transcribe_audio(audio_path, task="transcribe", return_timestamps=False):
    """Function to transcribe an audio file using the Whisper JAX endpoint."""
    if task not in ["transcribe", "translate"]:
        raise ValueError("task should be one of 'transcribe' or 'translate'.")

    text, runtime = client.predict(
        audio_path,
        task,
        return_timestamps,
        api_name="/predict_1",
    )
    return text

# transcribe an audio file using the Whisper JAX endpoint
output = transcribe_audio("audio.mp3")

# transcribe and return timestamps
output_with_timestamps = transcribe_audio("audio.mp3", return_timestamps=True)

Is this just me, or is anyone else facing this issue?

@heroaltman

Can we do this with plain requests instead of gradio_client? Why do we need that library, and is the requests approach possible?
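It may be possible: Gradio 3.x Spaces also exposed a plain HTTP route (POST to the Space's /run/{api_name} path with a JSON {"data": [...]} body, and file inputs passed as base64 data URLs). The sketch below only builds that payload with the standard library; the route name /run/predict_1 is inferred from api_name="/predict_1" above, and the exact payload format can differ between Gradio versions, so treat the commented-out request as an unverified assumption.

```python
import base64
import json
import urllib.request

API_URL = "https://sanchit-gandhi-whisper-jax.hf.space"

def build_payload(audio_path, task="transcribe", return_timestamps=False):
    """Encode the audio file as a base64 data URL, the format Gradio 3.x
    expected for file inputs sent over its plain HTTP API."""
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {"data": [f"data:audio/mpeg;base64,{audio_b64}", task, return_timestamps]}

# Hypothetical call -- route inferred from api_name="/predict_1" above:
# payload = build_payload("audio.mp3")
# req = urllib.request.Request(
#     f"{API_URL}/run/predict_1",
#     data=json.dumps(payload).encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     text, runtime = json.load(resp)["data"]
```

Note that gradio_client handles this encoding, the websocket queue, and result polling for you, which is the main reason to prefer it over hand-rolled requests.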

@abdshomad

I leave a working sample here: https://colab.research.google.com/drive/11J39vhEu-JfPOoMK9P_d9lU7f_9ZHJ_w hope others find it helpful.

@Molotov79

@abdshomad @sanchit-gandhi Can you please make Whisper JAX available with diarization? I tested it on a 40-minute Icelandic audio file and it always returns an error.

@ashishb

ashishb commented Feb 18, 2024

@sanchit-gandhi Any way to increase the timeout? The default seems to be 60 seconds.
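One possible client-side workaround (a sketch, not verified against this Space): gradio_client lets you submit a request as a background Job via client.submit() and then wait on it yourself with Job.result(timeout=...), instead of blocking inside client.predict(). Note this only controls how long the client waits; if the 60-second limit is enforced server-side, the Space may still cut the request off.

```python
def transcribe_audio_with_timeout(client, audio_path, timeout=600,
                                  task="transcribe", return_timestamps=False):
    """Submit the request as a background job and wait up to `timeout`
    seconds for the result, rather than blocking in client.predict()."""
    job = client.submit(audio_path, task, return_timestamps, api_name="/predict_1")
    text, runtime = job.result(timeout=timeout)  # raises TimeoutError if exceeded
    return text

if __name__ == "__main__":
    from gradio_client import Client
    client = Client("https://sanchit-gandhi-whisper-jax.hf.space/")
    print(transcribe_audio_with_timeout(client, "audio.mp3", timeout=600))
```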
