
@LuisMayo
Last active July 8, 2022 22:12
Instructions to generate subtitles in SubRip (SRT) format for a given media file

Here are instructions for generating an SRT file with the transcript of a media file, given the URL of that file. To achieve this we're going to use Google Cloud Run (GCRun), part of the Google Cloud (GC) environment (available in the "Always Free" tier).

Credits

Some of these instructions, as well as the original idea, come from vxlabs.

Steps

  1. Create a Google Cloud project or select an existing one: https://console.cloud.google.com/
  2. Enable the Speech-to-Text API for your project.
  3. Create a new bucket in the Storage section of your project.
  4. Download the SRT generator and edit app-config.json, specifying the name of the bucket you just created.
  5. Use the "Run on Google Cloud" button to create a new Cloud Run service.

5.1. During deployment you also have to define the environment variable BUCKET with the name of the bucket you want to use as temporary storage.

5.2. Edit the newly created service. Consider giving it 1 GB of RAM or more instead of the default 512 MB, and be sure to raise the request timeout to the maximum available as well.

Problem: Cloud Run limits a request to 15 minutes at most, and Speech-to-Text processing may take longer than that. This means that if the audio file is long (I don't know the exact threshold, roughly longer than 50 minutes), the service may not be able to generate the SRT, and some extra manual steps are needed.

For short clips

  1. Use the SRT Generator Frontend (source code) to craft the request. Make sure "download only" is not checked. Repeat from this step for each file.
  2. Wait for the web page to alert you that the SRT has been generated; you can then copy the text or use the download button to download the SRT file. If the page doesn't answer after 15 minutes, the function has timed out and you have to continue from step 2 of "For long clips" (creating a service account).

For long clips

  1. Use the SRT Generator Frontend (source code) to craft the request. Make sure "download only" is checked. The "words" input has no effect in this mode.
  2. Create a service account for your project and download its JSON key. Make sure to grant the account Storage and Speech-to-Text permissions.
  3. Open Cloud Shell and upload the JSON key.
  4. Set the authentication environment variable with export GOOGLE_APPLICATION_CREDENTIALS=~/auth.json (this assumes the key was uploaded to your home directory as auth.json). You must do this once per session.
  5. Write the request configuration to a JSON file named petition.json, which the curl command reads. Repeat from this step for each file.
{
    "config": {
        "encoding": "FLAC",
        "language_code": "en-US",
        "enableWordTimeOffsets": true,
        "speechContexts": [
            {
                "phrases": [
                    "word1"
                ]
            }
        ]
    },
    "audio": {
        "uri": "gs://your-bucket/your-file.flac"
    }
}

(speechContexts is optional; it lists phrases the recognizer should favor.)
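If you process many files, writing petition.json by hand gets repetitive. The following is a minimal Python sketch that builds the same request body and saves it; build_petition and save_petition are hypothetical helper names, and the gs:// URI is a placeholder for your FLAC file in the bucket. It spells the language field as languageCode, the camelCase name used in the Speech-to-Text REST reference.

```python
import json


def build_petition(gcs_uri, language_code="en-US", phrases=None):
    """Return a longrunningrecognize request body (hypothetical helper)."""
    config = {
        "encoding": "FLAC",  # same encoding as the example request
        "languageCode": language_code,
        # Per-word timestamps are what the SRT cue timings are built from.
        "enableWordTimeOffsets": True,
    }
    if phrases:  # speechContexts is optional, so only add it when needed
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return {"config": config, "audio": {"uri": gcs_uri}}


def save_petition(body, path="petition.json"):
    """Write the body to the file the curl command reads (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, indent=4)
    return path
```

For example, save_petition(build_petition("gs://your-bucket/your-file.flac", phrases=["word1"])) writes a petition.json equivalent to the example request.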

  6. Make the request: curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" --data @petition.json "https://speech.googleapis.com/v1/speech:longrunningrecognize"

  7. The call returns a JSON object whose "name" field is a long number. This is your job (operation) number.

  8. Use the job number to check the progress of the operation: curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" "https://speech.googleapis.com/v1/operations/JOBNUMBER"

  9. While the job is running, that call reports its progress as a percentage. Once the job is done, the same call returns the result, which you can save to a file: curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type: application/json; charset=utf-8" "https://speech.googleapis.com/v1/operations/JOBNUMBER" > transcript.txt

  10. Copy the contents of the response, paste them into the SRT converter (source code) and click generate.

  11. You can copy the text or use the download button to download the SRT file.
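The SRT converter step relies on the author's converter page. As an offline alternative, here is a minimal Python sketch that reads the operation JSON saved in transcript.txt and produces SRT cues. It assumes the v1 REST response layout (response.results[].alternatives[0].words[], with startTime/endTime durations such as "1.400s", present because enableWordTimeOffsets is true). The function names and the choice of at most eight words per cue are hypothetical and not necessarily how the linked converter groups words.

```python
import json


def parse_duration(value):
    """Convert a JSON-encoded duration such as "1.400s" to seconds."""
    return float(value.rstrip("s"))


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def operation_to_srt(operation, max_words=8):
    """Turn a finished longrunningrecognize operation into SRT text (hypothetical helper)."""
    if not operation.get("done"):
        raise ValueError("operation not finished yet; poll it again later")
    if "error" in operation:
        raise ValueError(f"operation failed: {operation['error']}")
    cues = []
    for result in operation.get("response", {}).get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        words = alternatives[0].get("words", [])  # best alternative only
        # Split each result into cues of at most max_words words.
        for i in range(0, len(words), max_words):
            chunk = words[i:i + max_words]
            start = parse_duration(chunk[0].get("startTime", "0s"))
            end = parse_duration(chunk[-1].get("endTime", "0s"))
            cues.append((start, end, " ".join(w["word"] for w in chunk)))
    # Each SRT block: index, time range, text; blocks separated by a blank line.
    blocks = [
        f"{n}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{text}\n"
        for n, (s, e, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


def convert_file(src="transcript.txt", dst="transcript.srt"):
    """Read the saved operation JSON and write an .srt file (hypothetical helper)."""
    with open(src, encoding="utf-8") as f:
        operation = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        f.write(operation_to_srt(operation))
    return dst
```

Calling convert_file() after saving transcript.txt writes transcript.srt next to it. Raising an error when "done" is missing lets you tell a still-running job apart from an empty transcript.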

@felipelalli

Thank you! I'm sorry, I don't have the json anymore.
