@jayspeidell
Last active July 18, 2023 12:23
Sample script to download Kaggle files
# Info on how to get your API key (kaggle.json) here: https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle
import json
import os
import zipfile

api_token = {"username": "USERNAME", "key": "API_KEY"}

# Create the config directory first; open() fails if it does not exist
os.makedirs('/content/.kaggle', exist_ok=True)
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json

# Syntax from older kaggle CLI releases; newer ones use: kaggle config set -n path -v /content
!kaggle config path -p /content
!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge

os.chdir('/content/competitions/jigsaw-toxic-comment-classification-challenge')
# Extract every zip archive in the competition folder
for file in os.listdir():
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()
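
The credential step in the script above can also be done in plain Python, without the `!chmod` shell call. This is only a minimal sketch: the helper name `write_kaggle_credentials` is hypothetical, and the directory you pass in is an assumption about where your kaggle CLI looks for `kaggle.json`.

```python
import json
import os
import stat
from pathlib import Path


def write_kaggle_credentials(config_dir, username, key):
    """Write kaggle.json into config_dir and restrict it to the owner.

    Hypothetical helper, not part of the kaggle package.
    """
    config_path = Path(config_dir)
    # Create the directory (and parents) if it is missing
    config_path.mkdir(parents=True, exist_ok=True)
    token_file = config_path / "kaggle.json"
    with open(token_file, "w") as fh:
        json.dump({"username": username, "key": key}, fh)
    # Owner read/write only, equivalent to `chmod 600`
    os.chmod(token_file, stat.S_IRUSR | stat.S_IWUSR)
    return token_file
```

For example, `write_kaggle_credentials('/content/.kaggle', 'USERNAME', 'API_KEY')` would mirror the paths used in the script.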
@XiaohanYa

Thanks for sharing! However, is there any way to use a dataset without downloading it? Some of the datasets are quite large, 100 GB or more.

Maybe you could try it on Google Colab. It's quite fast.

@dhosco

dhosco commented Dec 12, 2019

Hi guys!
Can someone actually put all this together? I am a newbie, I like using Colab, and I want to be able to do everything straight from Google Colab. Also, is there a way to avoid downloading the data locally?

@Adnan-Toky

I found this gist today because I wanted to download a Kaggle dataset into Google Colab, but I ended up using a different approach. I hope it helps:

!mkdir -p /root/.kaggle
!echo '{"username":"USERNAME","key":"API_KEY"}' > /root/.kaggle/kaggle.json

This way you don't need to change the kaggle.json directory in the kaggle settings,
and now you can use kaggle to do whatever you need!

!kaggle datasets download -d owner/dataset-slug

This is simple and it works! Thanks, man!

@arthurcotaf

I found this gist today because I wanted to download a Kaggle dataset into Google Colab, but I ended up using a different approach. I hope it helps:

!mkdir /root/.kaggle
!echo '{"username":"USERNAME","key":"API_KEY"}' > /root/.kaggle/kaggle.json

This way you don't need to change the kaggle.json directory in the kaggle settings,
and now you can use kaggle to do whatever you need!

!kaggle datasets download -d owner/dataset-slug

Is it possible to load the downloaded file into a variable?
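
One way to get a downloaded file into a variable is to read it straight out of the zip archive, without extracting it to disk first. The sketch below assumes the archive contains a UTF-8 CSV file; the function name `read_csv_from_zip` and the archive and member names in the usage note are hypothetical.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_path, member_name):
    """Return the rows of one CSV file inside a zip archive as a list of dicts.

    Hypothetical helper; assumes the member is UTF-8 encoded CSV with a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member_name) as raw:
            # Wrap the binary stream so the csv module can read text
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Materialize the rows before the archive is closed
            return list(csv.DictReader(text))
```

Usage would look like `rows = read_csv_from_zip('dataset-slug.zip', 'train.csv')`, where both names depend on the dataset you downloaded.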

@ayushxx7

ayushxx7 commented Oct 29, 2020

Setup and Download dataset

Imports

import json
import zipfile
import os
!pip install kaggle
api_token = {"username":"---Your Username","key":"Your API Key"}
!mkdir -p ~/.kaggle
with open('kaggle.json', 'w') as file:
    json.dump(api_token, file)
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d heeraldedhia/groceries-dataset
  • The dataset will now be present in the /content/ folder (you can see it using os.listdir())

Further, to extract the dataset,

for file in os.listdir():
    # Only touch zip archives; other files in the folder are skipped
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()
  • This will also place the files directly inside the /content/ folder
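
If you would rather keep the extracted files out of the working directory, the extraction loop can be wrapped in a small function that takes a target folder. This is a sketch only: `extract_zip_archives` is a hypothetical name, and it detects archives by content with `zipfile.is_zipfile` rather than by file extension.

```python
import zipfile
from pathlib import Path


def extract_zip_archives(directory, target=None):
    """Extract every zip archive found in directory into target.

    Hypothetical helper. If target is None, files are extracted next to
    the archives. Returns the names of the archives that were extracted.
    """
    directory = Path(directory)
    target = Path(target) if target is not None else directory
    extracted = []
    for path in sorted(directory.iterdir()):
        # is_zipfile inspects the file content, not the extension
        if path.is_file() and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as archive:
                archive.extractall(target)
            extracted.append(path.name)
    return extracted
```

For example, `extract_zip_archives('/content', '/content/data')` would leave the archives in `/content` and put their contents in `/content/data`.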

@Sebastian-ctr

Sebastian-ctr commented Mar 18, 2021

just set the variables...

# Set the environment variables
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle datasets download -d iarunava/happy-house-dataset
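
The same idea also works outside a notebook, where the `!` shell syntax is not available. The sketch below passes the credentials to the CLI through a copied environment instead of modifying `os.environ` globally. The function names are hypothetical, and it assumes the `kaggle` executable is installed and on the PATH.

```python
import os
import subprocess


def kaggle_environment(username, key, base_env=None):
    """Return a copy of the environment with the Kaggle credentials set."""
    env = dict(os.environ if base_env is None else base_env)
    env["KAGGLE_USERNAME"] = username
    env["KAGGLE_KEY"] = key
    return env


def dataset_download_command(dataset_slug, dest="."):
    """Build the CLI call that downloads a dataset into dest and unzips it."""
    return ["kaggle", "datasets", "download", "-d", dataset_slug,
            "-p", dest, "--unzip"]


def download_dataset(dataset_slug, username, key, dest="."):
    """Run the kaggle CLI; raises CalledProcessError if the download fails."""
    subprocess.run(dataset_download_command(dataset_slug, dest),
                   env=kaggle_environment(username, key), check=True)
```

Calling `download_dataset('iarunava/happy-house-dataset', 'xxxx', 'your-key')` would then correspond to the cell above.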

Thanks, it works, but when I participate in a competition:
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle competition download -c xxxxxxxxxx
