@tatianass
Last active March 17, 2019 04:10
Code to download the metadata of all Kaggle datasets (references, titles, sizes, download counts), including tags.
#!/usr/bin/python
import csv

from kaggle.api.kaggle_api_extended import KaggleApi

# Authentication
# Make sure to set your username and key in your environment variables
# (KAGGLE_USERNAME and KAGGLE_KEY).
api = KaggleApi()
api.authenticate()

fields = ['ref', 'title', 'tags', 'size', 'lastUpdated', 'downloadCount']
page = 1

with open('kaggle_datasets.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerow(fields)  # writes header

    # Writes information while there are still pages to search
    while True:
        try:
            datasets = api.dataset_list(sort_by='hottest', size='all', file_type='all', license_name='all', page=page)
        except Exception as e:
            # The API raised an error (past the last page, rate limit, network problem, ...)
            print('No more pages to load: {}'.format(e))
            break
        if not datasets:
            # An empty page also means there is nothing left to fetch
            print('No more pages to load.')
            break
        # One row per (dataset, tag) pair; datasets without tags produce no rows
        for i in datasets:
            for tag in i.tags:
                writer.writerow([i.ref, i.title, tag, i.size, i.lastUpdated, i.downloadCount])
        page += 1
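
The paging and row-flattening logic of the script can be tried offline, without Kaggle credentials. The following is a minimal sketch, not part of the original gist: `iter_pages`, `write_tag_rows`, `fake_fetch` and the sample records are hypothetical names and stand-in data, and the sketch assumes that a page past the end comes back as an empty list.

```python
import csv
import io
from types import SimpleNamespace


def iter_pages(fetch_page, first_page=1):
    """Yield items page by page until fetch_page returns an empty page."""
    page = first_page
    while True:
        items = fetch_page(page)
        if not items:
            return
        yield from items
        page += 1


def write_tag_rows(writer, datasets):
    """Write one row per (dataset, tag) pair and return the number of rows."""
    rows = 0
    for d in datasets:
        for tag in d.tags:
            writer.writerow([d.ref, d.title, tag, d.size, d.lastUpdated, d.downloadCount])
            rows += 1
    return rows


# Hypothetical stand-in data: two pages of results, then nothing.
FAKE_PAGES = {
    1: [SimpleNamespace(ref='alice/iris', title='Iris',
                        tags=['biology', 'classification'],
                        size='4KB', lastUpdated='2019-01-01', downloadCount=10)],
    2: [SimpleNamespace(ref='bob/untagged', title='Untagged', tags=[],
                        size='1MB', lastUpdated='2019-02-01', downloadCount=3)],
}


def fake_fetch(page):
    """Stand-in for the API call: returns [] once the pages run out."""
    return FAKE_PAGES.get(page, [])


buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';')
row_count = write_tag_rows(writer, iter_pages(fake_fetch))
csv_text = buffer.getvalue()
print(csv_text)
```

The untagged dataset on the second page contributes no rows, which mirrors the behaviour of the nested loop in the script. With the real client, `fetch_page` could be something like `lambda p: api.dataset_list(sort_by='hottest', page=p)`, provided the installed `kaggle` version returns an empty list past the last page.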