dtinth/README.md

## README.md

      
How to transcribe Thai speech in videos into text.
Requirements


Google Cloud or Firebase project with billing enabled.


gcloud command line tool installed.


ffmpeg or Docker.


youtube-dl to download YouTube videos.


A budget of about 30 Baht per hour of input audio.


Step 1: Grab the audio track

For example, to download the audio track of a YouTube video with youtube-dl:
youtube-dl -f bestaudio 'https://www.youtube.com/watch?v=..........'

Step 2: Convert

We need to convert the audio into a format that is supported by Google Cloud APIs.
We will use OGG Opus, at a 16000 Hz sample rate with a single (mono) channel.
docker run -v "$PWD:/data" jrottenberg/ffmpeg -i "/data/<FILENAME>.m4a" -c:a libopus -ar 16000 -ac 1 "/data/<FILENAME>.ogg"

To cut a portion of audio, put -ss <START TIME> -t <DURATION> before -i. For example, -ss 01:38:23 -t 00:30:00.
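
The requirements list ffmpeg as an alternative to Docker. As a minimal sketch, the same conversion with a locally installed ffmpeg could be wrapped as below. The helper name `to_opus` and the `FFMPEG` override variable are assumptions made for illustration; the ffmpeg options are the ones used in the Docker command above.

```shell
# to_opus INPUT OUTPUT [START] [DURATION]
# Hypothetical helper: converts INPUT to 16 kHz mono OGG Opus using a local ffmpeg.
# START and DURATION are optional and are placed before -i, as described above.
# Set FFMPEG="echo ffmpeg" to print the command instead of running it.
to_opus() {
  input=$1
  output=$2
  start=${3:-}
  duration=${4:-}

  # Reuse the positional parameters to collect the optional cut options.
  set --
  if [ -n "$start" ]; then set -- "$@" -ss "$start"; fi
  if [ -n "$duration" ]; then set -- "$@" -t "$duration"; fi

  ${FFMPEG:-ffmpeg} "$@" -i "$input" -c:a libopus -ar 16000 -ac 1 "$output"
}
```

For example, `to_opus "<FILENAME>.m4a" "<FILENAME>.ogg" 01:38:23 00:30:00` converts 30 minutes of audio starting at 01:38:23. Leave out the last two arguments to convert the whole file.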
Step 3: Recognize


Upload the ogg file to Google/Firebase Cloud Storage. After uploading, you will get a <STORAGE LOCATION> such as gs://<PROJECT>.appspot.com/transcribe/<FILENAME>.ogg.
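
The upload can also be done from the command line. A minimal sketch, assuming `gsutil` (which ships with the Google Cloud SDK) is installed and authenticated, and that the file should go under a `transcribe/` folder as in the example location above. The helper name `upload_audio` and the `GSUTIL` override variable are hypothetical.

```shell
# upload_audio FILE BUCKET
# Hypothetical helper: copies FILE to gs://BUCKET/transcribe/ and prints the
# resulting storage location on stdout.
# Set GSUTIL="echo gsutil" to print the copy command instead of running it.
upload_audio() {
  file=$1      # local file, e.g. "<FILENAME>.ogg"
  bucket=$2    # bucket name, e.g. "<PROJECT>.appspot.com"
  location="gs://$bucket/transcribe/$(basename "$file")"

  # Send gsutil's own output to stderr so that stdout holds only the location.
  ${GSUTIL:-gsutil} cp "$file" "$location" >&2 && echo "$location"
}
```

The printed value is the <STORAGE LOCATION> used in the next command.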


Start the transcription:
gcloud ml speech recognize-long-running "<STORAGE LOCATION>" --language-code=th --encoding=ogg-opus --include-word-time-offsets --sample-rate=16000 --async
It will print out:
{
  "name": "5766027198115285298"
}
This is your <OPERATION ID>.


Wait for the operation to finish and write the results to a file.
gcloud ml speech operations wait "<OPERATION ID>" > "<FILENAME>.json"


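View the JSON file. It contains the recognized text and, because --include-word-time-offsets was passed, the time offsets of each word.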
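
To read the results without opening the JSON by hand, the text can be pulled out with `jq`. This is a sketch under assumptions: it requires `jq`, and it assumes the file has a top-level `results` array whose entries carry `alternatives` with `transcript` and `words` (each word having `startTime`, `endTime`, and `word`). Check these field names against your own output. The helper names are hypothetical.

```shell
# print_transcript FILE
# Prints the first alternative's transcript of each result, one per line.
# Results without a transcript are skipped.
print_transcript() {
  jq -r '.results[]? | .alternatives[0].transcript // empty' "$1"
}

# print_words FILE
# Prints one line per word: start time, end time, and the word, tab-separated.
print_words() {
  jq -r '.results[]? | .alternatives[0].words[]? | [.startTime, .endTime, .word] | @tsv' "$1"
}
```

For example, `print_transcript "<FILENAME>.json" > "<FILENAME>.txt"` saves the plain transcript to a text file.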