@jiru
Forked from rshipp/forvo_scraper.sh
Last active May 21, 2023 01:53
Scrape the highest rated MP3 from Forvo.com for a given word
#!/bin/bash
# Forvo scraper
language=${FORVO_LANG:-fr}
BASEURL="http://forvo.com/search/"
AUDIOURL="http://audio.forvo.com/audios/mp3/"
word=$1
if [[ -z $word ]]; then
    # No word given: print usage and exit with a non-zero status.
    echo "usage: "
    echo "FORVO_LANG=languagecode ./forvo_scraper.sh myword"
    echo "for example: "
    echo "FORVO_LANG=fr ./forvo_scraper.sh chien"
    echo "will save a single file named 'chien.mp3' in the current folder"
    exit 1
fi
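url="${BASEURL}${word}/${language}"
# playurl is not used below; kept from the original script.
playurl="${BASEURL}${word}/#${language}"
# Fetch the search page, take the first Play(...) handler, pull the
# base64-encoded MP3 path out of its arguments, and decode it.
file="$(wget -qO- "${url}" | grep 'onclick="Play(' | head -1 | sed "s/^.*Play(.*,'\([^']*\)','[^']*',.*$/\1/g" | base64 -d)"
# Stop here if nothing was found, rather than saving a bogus download.
if [[ -z $file ]]; then
    echo "no pronunciation found for '${word}' (${language})" >&2
    exit 1
fi
wget -qO"${word}.mp3" "${AUDIOURL}${file}"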
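
The extraction step can be tried offline. This is only a sketch: the sample line below is made up to fit the shape the script's sed pattern expects and is not copied from Forvo, and `extract_audio_path` is a hypothetical name for the same grep/head/sed/base64 pipeline.

```shell
#!/bin/bash
# Made-up sample line, NOT real Forvo markup: it only has the shape the sed
# pattern in the script expects, Play(id,'<base64>','<base64>',...).
encoded="$(printf '%s' 'chien.mp3' | base64)"
sample="<a onclick=\"Play(123,'${encoded}','${encoded}',false);return false;\">"

# Hypothetical wrapper around the same grep/head/sed/base64 steps as the
# script, reading the page from stdin instead of from wget.
extract_audio_path() {
    grep 'onclick="Play(' | head -1 \
        | sed "s/^.*Play(.*,'\([^']*\)','[^']*',.*$/\1/g" \
        | base64 -d
}

# Feed the sample through the pipeline; the decoded path is printed.
printf '%s\n' "$sample" | extract_audio_path
```

If the page markup changes so that no line matches `onclick="Play(`, the pipeline prints nothing, which is why an empty result is worth checking before downloading.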
@UnsolvedCypher

This is great, thank you! It looks like you can get multi-word phrases by replacing the spaces with underscores. Perhaps this script could do that automatically?
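
A minimal sketch of that substitution, assuming (as the comment reports) that Forvo accepts underscores in place of spaces. `forvo_slug` is a hypothetical helper name, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: replace every space
# in the phrase with an underscore, as the comment above suggests.
forvo_slug() {
    local phrase=$1
    printf '%s\n' "${phrase// /_}"
}

# Usage: pass the phrase as a single quoted argument.
forvo_slug "pomme de terre"
```

In the script this could replace the plain assignment, for example `word=$(forvo_slug "$1")`, so that a quoted phrase is used both in the search URL and in the output file name.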
