@py-ranoid
Last active January 14, 2019 07:03
Downloading Google CodeIn tasks without an API Key.
"""
Note: This approach is an alternative to using the API for fetching task instance info.
Save the dashboard pages to ~/Downloads/ and parse the saved files with BeautifulSoup.
Go to https://codein.withgoogle.com/dashboard/task-instances/?sp-order=name&sp-my_tasks=false&sp-page_size=100
Step through every page (1, 2, 3, ...) and save each one from the browser.
"""
from glob import glob

import pandas as pd
from bs4 import BeautifulSoup as soup

all_rows = []
col_names = []
for fname in glob("/Users/vishalgupta/Downloads/Task instances _ Google Code-in *.htm"):
    with open(fname) as f:
        cont = f.read()
    # Name the parser explicitly; bs4 otherwise warns and guesses one.
    s = soup(cont, "html.parser")
    # Each <tr> in the table body is one task instance.
    for row in s.select('md-table-container tbody tr'):
        # Keep only the non-empty cells of the row.
        vals = [i.text.strip() for i in row.select('td') if i.text.strip()]
        all_rows.append(vals)
    # The header row is read from each page; the last page's headers are used.
    col_names = [i.text.strip() for i in s.select('md-table-container th') if i.text.strip()]
df = pd.DataFrame(all_rows, columns=col_names)
df.to_csv("GCI_instance_dump.csv")
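
To check the selectors without saving any pages first, the same extraction can be run on a small inline snippet. This is only a sketch: `SAMPLE_HTML`, its column names (`Task`, `Status`) and the helper `parse_table` are hypothetical stand-ins, and the real dashboard markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one saved dashboard page (assumed markup).
SAMPLE_HTML = """
<md-table-container>
  <table>
    <thead><tr><th></th><th>Task</th><th>Status</th></tr></thead>
    <tbody>
      <tr><td></td><td> Fix typo </td><td>Completed</td></tr>
      <tr><td></td><td>Add tests</td><td>Claimed</td></tr>
    </tbody>
  </table>
</md-table-container>
"""


def parse_table(html):
    """Return (headers, rows) using the same selectors as the script above."""
    page = BeautifulSoup(html, "html.parser")
    # Non-empty header cells become the column names.
    headers = [th.text.strip()
               for th in page.select("md-table-container th")
               if th.text.strip()]
    # One list of non-empty cell texts per table-body row.
    rows = [[td.text.strip() for td in tr.select("td") if td.text.strip()]
            for tr in page.select("md-table-container tbody tr")]
    return headers, rows


headers, rows = parse_table(SAMPLE_HTML)
print(headers)
print(rows)
```

Because empty cells are dropped from both the header and the rows, a row with a blank value in a real column would come out shorter than the header, so it is worth comparing row lengths against `len(headers)` before building the DataFrame.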