@jacKlinc
Last active January 9, 2021 14:54
Pull data from Kaggle dataset
from kaggle.api.kaggle_api_extended import KaggleApi
from zipfile import ZipFile
import pandas as pd


def get_kaggle_dataset(dataset, d_file, used_dtypes, usecols):
    '''
    Download one file from a Kaggle dataset and load it with pandas.

    dataset: Kaggle dataset identifier (user/dataset)
    d_file: name of the file inside the dataset
    used_dtypes, usecols: passed to pd.read_csv as dtype and usecols
    Returns a pandas DataFrame for the file.
    **Your Kaggle API key must be saved in ~/.kaggle/kaggle.json
    '''
    # set up the API connection
    api = KaggleApi()
    api.authenticate()
    # download the file; it is expected to arrive as <d_file>.zip
    api.dataset_download_file(dataset, d_file)
    # extracted data is saved in the same directory as the notebook
    with ZipFile(d_file + '.zip') as zf:
        zf.extractall()
    return pd.read_csv(d_file, dtype=used_dtypes, usecols=usecols)
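
The unzip step above assumes the download always arrives as `<d_file>.zip`. A minimal sketch of a more defensive version follows, on the assumption that the file may sometimes already be present uncompressed. The helper name `extract_if_zipped` and its `directory` parameter are hypothetical and not part of the original gist or the Kaggle API.

```python
import os
import zipfile


def extract_if_zipped(d_file, directory="."):
    """Hypothetical helper: unpack <d_file>.zip if it exists, then return the path to d_file."""
    archive = os.path.join(directory, d_file + ".zip")
    target = os.path.join(directory, d_file)
    # Only extract when a real zip archive is present next to the target
    if os.path.exists(archive) and zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
    # Fail early with a clear message instead of letting the CSV reader fail later
    if not os.path.exists(target):
        raise FileNotFoundError(f"{d_file} was not found in {directory}")
    return target
```

The returned path could then be passed to `pd.read_csv` in place of `d_file`.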