@jayspeidell
Last active July 18, 2023 12:23
Sample script to download Kaggle files
# Info on how to get your API key (kaggle.json): https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle
import json
import os
import zipfile

api_token = {"username": "USERNAME", "key": "API_KEY"}

# Create the config directory first; open() fails if it does not exist.
os.makedirs('/content/.kaggle', exist_ok=True)
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json
!kaggle config path -p /content
!kaggle competitions download -c jigsaw-toxic-comment-classification-challenge

os.chdir('/content/competitions/jigsaw-toxic-comment-classification-challenge')
# Extract every downloaded archive into the current directory.
for file in os.listdir('.'):
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()
@oldJonny

Dude, where is the path '/content/.kaggle/kaggle.json'?
Is it hosted on my Google Drive?

@sherlock1453

IOError                                   Traceback (most recent call last)
in ()
4 import zipfile
5 import os
----> 6 with open('/content/.kaggle/kaggle.json', 'w') as file:
7 json.dump(api_token, file)
8 get_ipython().system(u'chmod 600 /content/.kaggle/kaggle.json')

IOError: [Errno 2] No such file or directory: '/content/.kaggle/kaggle.json'

getting this error from your code
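
The error above comes from open(): it creates the file but not the missing /content/.kaggle directory. A minimal sketch of a fix, assuming Python 3 (the helper name write_kaggle_token is made up for illustration, and the credentials are placeholders):

```python
import json
import os
import stat


def write_kaggle_token(config_dir, username, key):
    """Write kaggle.json into config_dir, creating the directory if needed."""
    os.makedirs(config_dir, exist_ok=True)  # avoids "No such file or directory"
    token_path = os.path.join(config_dir, "kaggle.json")
    with open(token_path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # Same effect as `chmod 600`: owner read/write only.
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)
    return token_path


# Placeholder credentials; replace them with the values from your own kaggle.json.
# write_kaggle_token("/content/.kaggle", "USERNAME", "API_KEY")
```

The call is left commented out because the right directory depends on your environment; later comments in this thread use /root/.kaggle on Colab.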

@sherlock1453

TypeError                                 Traceback (most recent call last)
in ()
1 import zipfile
2 import os
----> 3 for file in os.listdir():
4 zip_ref = zipfile.ZipFile(file, 'r')
5 zip_ref.extractall()

TypeError: listdir() takes exactly 1 argument (0 given)

another error
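
This TypeError is what Python 2 raises, because its os.listdir() requires a path argument; the argument only became optional in Python 3.2. A small sketch that behaves the same on both versions:

```python
import os

# Passing the path explicitly works on both Python 2 and Python 3.
entries = os.listdir(".")
print(entries)
```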

@jagatfx

jagatfx commented Sep 18, 2018

Read about the Kaggle API at https://github.com/Kaggle/kaggle-api#api-credentials for info on kaggle.json.

@jayspeidell
Author

Sorry I didn't read the replies earlier. You have to download the token from Kaggle's website; you can find all the info at the link jagatfx provided. Once you have the JSON file, you can use whatever file transfer method you like to get it onto the server you're working on, but I don't think you can automate downloading it from Kaggle. I'll edit the Kaggle API link into the gist.

@mzmmoazam

Solution for the "kaggle.json not found" error:

!pip install kaggle
import json
import os
import zipfile

api_token = {"username": "username", "key": "TOKEN_HERE"}
# Write the token to /root/.kaggle, creating the directory if it is missing.
os.makedirs('/root/.kaggle', exist_ok=True)
with open('/root/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)

# Download into the target directory so the extraction loop below finds the archive.
os.makedirs('/content/competitions/happy-house-dataset', exist_ok=True)
!kaggle datasets download -d iarunava/happy-house-dataset -p /content/competitions/happy-house-dataset
os.chdir('/content/competitions/happy-house-dataset')
for file in os.listdir():
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()

@canivel

canivel commented Jan 20, 2019

just set the variables...

# Set the environment variables
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle datasets download -d iarunava/happy-house-dataset
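
A hedged sketch of the same idea wrapped in a function, so the key does not have to be pasted into a cell. KAGGLE_USERNAME and KAGGLE_KEY are the variables used above; the helper name set_kaggle_credentials is hypothetical:

```python
import os


def set_kaggle_credentials(username, key):
    """Expose credentials through the environment variables used above."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


# In a notebook you might prompt for the key instead of hardcoding it:
# from getpass import getpass
# set_kaggle_credentials("xxxx", getpass("Kaggle API key: "))
```

Commands started with ! from the same notebook run as child processes, so they inherit these variables.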

@ucalyptus

To all guys still in doubt,
https://www.kaggle.com/general/51898

@bothmena

I found this gist today because I wanted to download a Kaggle dataset into Google Colab, but I ended up using a different approach. I hope it helps:

!mkdir /root/.kaggle
!echo '{"username":"USERNAME","key":"API_KEY"}' > /root/.kaggle/kaggle.json

This way you don't need to change the kaggle.json directory in the Kaggle settings,
and you can now use kaggle to do whatever you need!

!kaggle datasets download -d owner/dataset-slug

@TosinJayeola

Thanks bothmena

@bmusangu

bmusangu commented Jun 24, 2019

Hello, could someone help me figure out why I am getting this error message?
Here is the code I have:

import json
import zipfile
import os
!pip install kaggle
api_token = {"username":"username","key":"TOKEN_HERE"}
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!kaggle config set -n path -v/content
!chmod 600 /root/.kaggle/kaggle.json
!kaggle competitions download -c ga-customer-revenue-prediction
os.chdir('/content/competitions/ga-customer-revenue-prediction')
for file in os.listdir():
    zip_ref = zipfile.ZipFile(file, 'r')
    zip_ref.extractall()
    zip_ref.close()

It downloads my files:

  • path is now set to: /content
    sample_submission_v2.csv.zip: Skipping, found more recently modified local copy (use --force to force download)
    test_v2.csv.zip: Skipping, found more recently modified local copy (use --force to force download)
    train_v2.csv.zip: Skipping, found more recently modified local copy (use --force to force download)

But then I get this error message.

/usr/lib/python3.6/zipfile.py in _RealGetContents(self)
1196 raise BadZipFile("File is not a zip file")
1197 if not endrec:
-> 1198 raise BadZipFile("File is not a zip file")
1199 if self.debug > 1:
1200 print(endrec)

BadZipFile: File is not a zip file

Any suggestions? Thank you
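
One possible cause, offered as an assumption since the directory listing is not shown: the "Skipping, found more recently modified local copy" lines suggest the cell had already run once, so the directory may hold extracted CSV files next to the archives, and the loop tries to open every file as a zip. A minimal sketch that skips anything that is not a zip archive (the helper name extract_archives is made up):

```python
import os
import zipfile


def extract_archives(directory="."):
    """Extract every real zip archive in `directory`; skip all other files."""
    extracted = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as archive:
                archive.extractall(directory)
            extracted.append(name)
    return extracted
```

Because non-zip files are skipped, the function can be run again in the same directory without raising BadZipFile.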

@bmusangu

I got it to work just by breaking the code down into separate cells. For some reason it wouldn't work with the code as a single block in one cell.

@zionverse

@bothmena's approach above (create /root/.kaggle, then echo the credentials JSON into /root/.kaggle/kaggle.json) is simple and it works! Thanks, man!

@bmusangu

bmusangu commented Jul 9, 2019

Re: @bothmena's approach and @zionverse's reply above. Nice! Cheers!

@Joycechidi

Thanks so much, @bothmena. This clearly works. It's the easiest way to download a Kaggle dataset.

@McGregorWwww

Thanks for sharing! However, I wonder if there is any way to use the dataset without downloading it? Some of the datasets are quite large, like 100GB+.

@ucalyptus

Does it work with private datasets?

@bothmena

bothmena commented Nov 6, 2019

@ucalyptus It should work if you are authenticated, so you should be prompted every time you run the command to insert your credentials. You can probably overcome this by using an ssh key, but I don't recommend it especially if you will share the notebook with others.

Note that I did not try my method with private datasets.
Edit: @ucalyptus try using my method without any modifications, I believe it should work, because you're already using an API key for authentication.

@McGregorWwww I do not think it's possible to use Kaggle datasets on Google Colab without downloading them. If you wish to use datasets without downloading them your only option is to use Kaggle kernels.
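
This does not avoid downloading, but when only part of a large dataset is needed, the CLI's -f option for datasets download fetches a single file. A rough sketch that builds such a command; the helper name and the file name train.csv are hypothetical, while owner/dataset-slug is the placeholder used earlier in this thread:

```python
import subprocess


def dataset_file_command(dataset, file_name, path="."):
    """Build a CLI call that downloads one file from a dataset into `path`."""
    return ["kaggle", "datasets", "download",
            "-d", dataset, "-f", file_name, "-p", path]


cmd = dataset_file_command("owner/dataset-slug", "train.csv", "/content")
print(" ".join(cmd))
# To actually run it (needs the kaggle CLI and valid credentials):
# subprocess.run(cmd, check=True)
```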

@mdresaj

mdresaj commented Nov 6, 2019

Once the data is downloaded using bothmena's method, how do you define it and actually begin to use it? I received the message saying the download was successful, but lack the ability to actually see/use the data now.

@bothmena

bothmena commented Nov 7, 2019

@mdresaj Try running !ls in a cell. It lists the files and directories in your current working directory, and you should see the files the command downloaded there.
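
Once the files are visible (and unzipped, if they arrived as an archive), they can be read like any local file. A small sketch using only the standard library; the helper name preview_csv and the file name train.csv are assumptions for illustration:

```python
import csv
import os


def preview_csv(path, rows=5):
    """Return the header and the first `rows` data rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        sample = [row for _, row in zip(range(rows), reader)]
    return header, sample


print(os.listdir("."))  # the downloaded files should show up here
# header, sample = preview_csv("train.csv")  # hypothetical file name
```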

@XiaohanYa

@McGregorWwww wrote: Thanks for sharing! However, I wonder if there is any way to use the dataset without downloading it? Some of the datasets are quite large, like 100GB+.

Maybe you could try it on Google Colab. It's quite fast.

@dhosco

dhosco commented Dec 12, 2019

Hi guys!
Can someone actually put all this together? I am a newbie, I like using Colab, and I want to be able to do everything straight from Google Colab. Also, is it possible to avoid downloading the data locally?

@Adnan-Toky

@bothmena's approach above (create /root/.kaggle, then echo the credentials JSON into /root/.kaggle/kaggle.json) is simple and it works! Thanks, man!

@arthurcotaf

Regarding @bothmena's approach above (create /root/.kaggle, echo the credentials JSON into /root/.kaggle/kaggle.json, then run !kaggle datasets download -d owner/dataset-slug): is it possible to load the downloaded file into a variable for use?
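
On getting the downloaded file into a variable: one option is to read a member of the downloaded zip directly into memory with the standard zipfile module, without extracting it to disk. A minimal sketch; the helper name read_zip_member and the file names in the usage comment are hypothetical:

```python
import zipfile


def read_zip_member(zip_path, member):
    """Return the bytes of one file inside a zip archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member)


# Hypothetical names; use the archive the CLI downloaded and a file inside it.
# data = read_zip_member("dataset-slug.zip", "train.csv")
# text = data.decode("utf-8")
```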

@ayushxx7

ayushxx7 commented Oct 29, 2020

Setup and Download dataset

Imports

import json
import zipfile
import os
!pip install kaggle
api_token = {"username":"---Your Username","key":"Your API Key"}
!mkdir -p ~/.kaggle
with open('kaggle.json', 'w') as file:
    json.dump(api_token, file)
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
!chmod 600 /root/.kaggle/kaggle.json
!kaggle datasets download -d heeraldedhia/groceries-dataset
  • The dataset will now be present in the /content/ folder (you can see it using os.listdir())

Further, to extract the dataset,

for file in os.listdir():
    # Only open files that look like zip archives.
    if file.endswith('.zip'):
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall()
  • This will also place the files directly inside the /content/ folder

@Sebastian-ctr

Sebastian-ctr commented Mar 18, 2021

@canivel wrote: just set the variables...

# Set the environment variables
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle datasets download -d iarunava/happy-house-dataset

Thanks, it works. But when I take part in a competition:
import os
os.environ['KAGGLE_USERNAME'] = "xxxx"
os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
!kaggle competition download -c xxxxxxxxxx
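
The comment ends before showing the error, so this is only a guess at what went wrong: the gist's own script spells the subcommand competitions (plural), while the command above uses competition. A sketch that builds the call with the plural form; the helper name is made up and the competition name stays a placeholder:

```python
import subprocess


def competition_download_command(competition, path="."):
    """Build the CLI call using the plural `competitions` subcommand."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", path]


cmd = competition_download_command("xxxxxxxxxx")
print(" ".join(cmd))
# To actually run it (needs the kaggle CLI and valid credentials):
# subprocess.run(cmd, check=True)
```

As far as I know, the competition's rules also have to be accepted on the Kaggle website before the API will serve its files.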
