These are instructions for generating an SRT file with the transcript of a media file, given the URL of that file. To achieve this we are going to use Google Cloud Run (GCRun), a part of the Google Cloud (GC) environment (available in the "Always Free" tier).
Credits
Part of these instructions, as well as the original idea, comes from vxlabs.
- Create a Google Cloud Project or select one: https://console.cloud.google.com/
- Enable the Speech-to-Text API for your project
- Create a new bucket in the Storage section of your project
- Download the SRT generator and edit app-config.json, specifying the name of the bucket you just created
- Use the following button to create a new Run service:
5.1. During deployment you also have to define the environment variable BUCKET with the name of the bucket you wish to use as temporary storage.
5.2. Edit the newly created service. You may use 1GB of RAM or more instead of the default 512MB. Be sure to change the timeout to the maximum available as well.
Problem: Cloud Run has a maximum duration of 15 minutes, and the speech-to-text processing may take longer than that. This means that if the audio file is long (I don't know the exact limit, but longer than 50 minutes), the Cloud Run service may not be able to generate the SRT and the extra manual steps must be followed.
- Use the SRT Generator Frontend (source code) to craft the request. Make sure the "download only" option is not selected. Repeat from this step once per file.
- Wait for the web page to alert you when the SRT is generated. You may copy the text or use the download button to download the SRT file.
7.1. If the page doesn't respond after 15 minutes, the function has timed out and you have to continue from step 7 of "For long clips".
- Use the SRT Generator Frontend (source code) to craft the request. Make sure the "download only" option is selected. The "words" input will have no effect.
- Create a service account for your project and download the JSON key. Please make sure to grant Storage and Speech to text permissions to the account.
- Open the Cloud shell and upload the JSON key.
- Set the authentication environment variable using
export GOOGLE_APPLICATION_CREDENTIALS=~/auth.json
You must do this once per session.
- Set the request config using a JSON file. Repeat from this step once per file
{
"config": {
"encoding": "FLAC",
"languageCode": "en-US",
"enableWordTimeOffsets": true,
"speechContexts": [
{
"phrases": [
"word1"
]
}
]
},
"audio": {
"uri": "gs://your-bucket/your-file.flac"
}
}
(speechContexts can be omitted)
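If you process many files, writing the request JSON by hand gets tedious. The following is a minimal sketch, not part of the original tooling, that builds the same structure in Python. The function name `build_request` and its parameters are hypothetical; the field names mirror the JSON above, using the REST API's camelCase `languageCode` spelling.

```python
import json


def build_request(gcs_uri, language_code="en-US", encoding="FLAC", phrases=None):
    """Build the JSON body for speech:longrunningrecognize as a dict."""
    config = {
        "encoding": encoding,
        "languageCode": language_code,
        "enableWordTimeOffsets": True,  # per-word timestamps are needed for subtitles
    }
    if phrases:
        # speechContexts is optional, so it is only added when hints are given
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return {"config": config, "audio": {"uri": gcs_uri}}


if __name__ == "__main__":
    body = build_request("gs://your-bucket/your-file.flac", phrases=["word1"])
    print(json.dumps(body, indent=2))
```

Redirecting the printed output to petition.json gives a file you can pass to curl with `--data @petition.json`.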
- Make the request:
curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" --data @petition.json "https://speech.googleapis.com/v1/speech:longrunningrecognize"
- The call will return a long number. This is your job number.
- Use the job number to check the progress of the operation:
curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" "https://speech.googleapis.com/v1/operations/JOBNUMBER"
- That call will tell you the progress percentage. Once the job is ready, the same call returns the result, which you can save to a file:
curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" "https://speech.googleapis.com/v1/operations/JOBNUMBER" > transcript.txt
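Instead of re-running the status call by hand, you could poll it from a script. This is a rough sketch under a few assumptions: `gcloud` is on the PATH (as in Cloud Shell), the operation JSON carries `done`, `error` and `metadata.progressPercent` fields, and the helper names (`get_access_token`, `fetch_operation`, `wait_for_operation`) are made up for illustration.

```python
import json
import subprocess
import time
import urllib.request

OPERATIONS_URL = "https://speech.googleapis.com/v1/operations/"


def get_access_token():
    """Ask gcloud for a token, the same way the curl commands do."""
    return subprocess.check_output(
        ["gcloud", "auth", "application-default", "print-access-token"],
        text=True,
    ).strip()


def fetch_operation(job_number):
    """GET the operation status and return it as a dict."""
    request = urllib.request.Request(
        OPERATIONS_URL + job_number,
        headers={"Authorization": "Bearer " + get_access_token()},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_operation(job_number, fetch=fetch_operation, interval=30, sleep=time.sleep):
    """Poll until the operation reports done, then return the full response."""
    while True:
        operation = fetch(job_number)
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(operation["error"])
            return operation
        percent = operation.get("metadata", {}).get("progressPercent", 0)
        print(f"{percent}% done, checking again in {interval}s")
        sleep(interval)
```

Calling `wait_for_operation("JOBNUMBER")` and writing the returned dict out with `json.dump` produces the same content as transcript.txt. The `fetch` and `sleep` parameters exist only so the loop can be tried without network access.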
- Copy the contents of the response and paste them into the SRT converter (source code), then click generate.
- You may copy the text or use the download button to download the SRT file.
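To illustrate what the conversion step involves, here is a small sketch that turns a saved operation response into SRT text. It is not the SRT converter's actual code. It assumes the response holds `response.results[].alternatives[0].words[]`, each with `word`, `startTime` and `endTime` given as strings such as "1.300s". The function names and the `words_per_cue` grouping are hypothetical choices.

```python
def parse_seconds(duration):
    """Turn a duration string such as "1.300s" into float seconds."""
    return float(duration.rstrip("s"))


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def operation_to_srt(operation, words_per_cue=8):
    """Group the recognized words into numbered SRT cues."""
    words = []
    for result in operation.get("response", {}).get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Only the first (most likely) alternative is used
            words.extend(alternatives[0].get("words", []))

    cues = []
    for index in range(0, len(words), words_per_cue):
        chunk = words[index:index + words_per_cue]
        start = srt_timestamp(parse_seconds(chunk[0]["startTime"]))
        end = srt_timestamp(parse_seconds(chunk[-1]["endTime"]))
        text = " ".join(item["word"] for item in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line
    return "\n".join(cues)
```

A fixed number of words per cue is the simplest grouping. Splitting on pauses or sentence boundaries would read better but needs more logic.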
I tried to use "STR converter" but I got this error: