@cab938
Last active April 4, 2018 21:46
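# pd.read_html needs html5lib here; run `pip install html5lib` once beforehand.
import pandas as pd

# read_html returns a list of DataFrames, one per table found on the page.
url = 'https://proxy.mentoracademy.org/getContentFromUrl/?userid=brooks&url=https://en.wikipedia.org/wiki/List_of_highest-grossing_Indian_films'
tables = pd.read_html(url, header=0)

# The table positions 5 and 13 are hard-coded and will shift if the page layout changes.
bollywood = tables[5].head(10)['Worldwide gross']
tollywood = tables[13].head(10)['Worldwide gross']

# Keep the digits between '₹' and the next space, then drop the thousands separators.
def to_number(x):
    return float(x.split('₹')[1].split(' ')[0].replace(',', ''))

# Ratio of the summed top-10 worldwide gross, Bollywood over Tollywood.
bolly_top = bollywood.apply(to_number).sum()
tolly_top = tollywood.apply(to_number).sum()
size_ratio = bolly_top / tolly_top
print(size_ratio)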
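
The string handling above fails with an IndexError on any cell that has no '₹'. A more tolerant parser could be sketched as below. This is only an illustration and not part of the original gist: `parse_gross` is a hypothetical helper, and the sample strings are made up rather than taken from the Wikipedia tables.

```python
import re

def parse_gross(text):
    """Return the number that follows '₹' in text as a float, or None if there is none."""
    match = re.search(r'₹\s*([\d,]+(?:\.\d+)?)', text)
    if match is None:
        return None
    # Remove thousands separators before converting.
    return float(match.group(1).replace(',', ''))

# Made-up sample strings, for illustration only.
samples = ['₹1,234 crore', '₹56.5 crore (note)', 'n/a']
parsed = [parse_gross(s) for s in samples]
print(parsed)  # [1234.0, 56.5, None]
```

With pandas, the same helper could be passed to `Series.apply` in place of the inline string slicing, and cells that return None can then be dropped or inspected before summing.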