@shantanuo
Created September 14, 2015 08:07
A Python script that downloads the Excel files of village amenities for all states and merges them into one CSV file.
### Download the xlsx files in the range of 1,000 to 2,999
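import time
import urllib.error
import urllib.request

for i in range(1000, 3000):
    filename = 'DCHB_Village_Release_' + str(i) + '.xlsx'
    url = "http://www.censusindia.gov.in/2011census/dchb/" + filename
    try:
        # In Python 3, urlretrieve raises HTTPError for a missing file,
        # so no separate urlopen probe is needed before downloading
        urllib.request.urlretrieve(url, filename)
    except urllib.error.HTTPError as e:
        # The server answered with an error status (e.g. 404 for an unused number)
        print(e.code)
    except urllib.error.URLError as e:
        # The server could not be reached at all
        print(e.args)
    finally:
        # Pause between requests to avoid overloading the server
        time.sleep(1)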
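The loop above makes 2,000 requests with a one-second pause each, so an interrupted run is slow to repeat. A minimal sketch of one way to make it resumable, assuming the files are saved to a local directory. The helper name `pending_files` and its parameters are hypothetical; only the filename pattern and base URL come from the script.

```python
import os

# Same base URL as the download script
BASE_URL = "http://www.censusindia.gov.in/2011census/dchb/"


def pending_files(start, stop, directory="."):
    """Return (filename, url) pairs in [start, stop) not yet saved in `directory`.

    Hypothetical helper, not part of the original script.
    """
    pending = []
    for i in range(start, stop):
        filename = "DCHB_Village_Release_" + str(i) + ".xlsx"
        if os.path.exists(os.path.join(directory, filename)):
            continue  # already downloaded on an earlier run
        pending.append((filename, BASE_URL + filename))
    return pending
```

Looping over `pending_files(1000, 3000)` instead of the full range would skip anything already on disk. It does not detect partially written files, so a download cut off mid-transfer would still need to be deleted by hand.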
### Open the excel files downloaded from the above script and load data as dataFrame for each file
import glob
import pandas as pd

dfList = []
for myfile in glob.glob('/home/ubuntu/datameet/*.xlsx'):
    # Show progress: print the path of the file being read
    print(myfile)
    # Read the first sheet of the workbook into a DataFrame
    dfList.append(pd.read_excel(myfile))
### merge all dataframes into one
final=pd.concat(dfList)
### export all data to csv
# Write UTF-8 explicitly so non-ASCII text is preserved in the output
final.to_csv('final_data.csv', encoding='utf-8')
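Once the frames are concatenated, nothing records which file each row came from, and the merged index repeats each file's own row numbers. A minimal sketch of one way to address both, using small in-memory frames in place of the real Excel files. The function name `merge_with_source`, the `source_file` column, and the sample `village` values are illustrative assumptions, not part of the original script.

```python
import pandas as pd


def merge_with_source(frames_by_name):
    """Concatenate DataFrames, tagging each row with the name of its source file.

    `frames_by_name` maps a file's base name to its DataFrame (hypothetical input shape).
    """
    tagged = []
    for name, frame in frames_by_name.items():
        # assign() returns a copy, so the caller's frames are left unchanged
        tagged.append(frame.assign(source_file=name))
    # ignore_index=True gives the merged frame one continuous 0..n-1 index
    return pd.concat(tagged, ignore_index=True)


# Made-up sample data standing in for two downloaded workbooks
frames = {
    "DCHB_Village_Release_1000": pd.DataFrame({"village": ["A", "B"]}),
    "DCHB_Village_Release_1001": pd.DataFrame({"village": ["C"]}),
}
merged = merge_with_source(frames)
print(merged)
```

In the real script, the dictionary could be filled inside the loading loop, keyed by `os.path.basename(myfile)`.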