@cblgh
Created October 13, 2016 20:32
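# Python 2 script (urllib2 exists only in Python 2).
# Scrapes a Spanish word list and saves it as JSON.
import urllib2
import sys
import json
import codecs
from bs4 import BeautifulSoup

sys.setrecursionlimit(2500)

html = urllib2.urlopen("http://wordsgalore.com/wordsgalore/languages/spanish/spanish1000.html").read()
base_url = "http://wordsgalore.com/wordsgalore/languages/spanish/"
soup = BeautifulSoup(html, "html.parser")

# The code treats each <a> in the first table as one entry: the link text is
# the Spanish word, the node right after the link is the English translation,
# and the href (joined onto base_url) is the audio file.
dictionary = []
for line in soup.table.find_all("a"):
    dictionary.append({
        "spanish": line.contents[0],
        "english": line.next_sibling,
        "audio": base_url + line.get("href"),
    })

# ensure_ascii=False writes accented characters as-is instead of \u escapes.
with codecs.open("spanish-source.json", "w", encoding="utf-8") as src:
    src.write(json.dumps(dictionary, ensure_ascii=False))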