@Quasimondo
Created March 10, 2021 11:23
This is a very basic no-frills scraper to retrieve the metadata and digital assets from all tokens minted on hicetnunc.xyz. I share this as a starting point for people who want to experiment with building alternative views on the works created on the platform or preserve the data. Feel free to improve upon this or add additional features.
import os

import ipfsApi
import requests

# IPFS client used to fetch the assets.
api = ipfsApi.Client(host='https://ipfs.infura.io', port=5001)

# Token list of the hicetnunc contract.
url = "https://better-call.dev/v1/contract/mainnet/KT1RJ6PbjHpwc3M5rw5s2Nbmefwbuwbdxton/tokens"
r = requests.get(url)
data = r.json()

# Maps a token's MIME type to the file suffix used when saving it.
format2suffix = {}
format2suffix['image/png'] = "png"
format2suffix['image/jpeg'] = "jpg"
format2suffix['video/mp4'] = "mp4"
format2suffix['image/gif'] = "gif"
format2suffix['video/quicktime'] = "mov"
format2suffix['image/svg+xml'] = "svg"
format2suffix['audio/mpeg'] = "mp3"  # audio/mpeg is MP3 audio, not MPEG video
format2suffix['application/pdf'] = "pdf"
format2suffix['image/tiff'] = "tif"
format2suffix['video/avi'] = "avi"
format2suffix['image/webp'] = "webp"
format2suffix['image/bmp'] = "bmp"
format2suffix['video/x-matroska'] = "mkv"
format2suffix['video/webm'] = "webm"

# Download everything into assets/ and work from inside that folder.
assetFolder = "assets/"
os.makedirs(assetFolder, exist_ok=True)
os.chdir(assetFolder)

print(len(data), "tokens")
for i in range(len(data)):
    #print ("token data",data[i])
    if "token_info" in data[i]:
        if "formats" in data[i]["token_info"]:
            # Only the first listed format is used.
            mimeType = data[i]["token_info"]["formats"][0]["mimeType"]
            uri = data[i]["token_info"]["formats"][0]["uri"].split("ipfs://")[1]
            if mimeType in format2suffix:
                saveName = str(data[i]["token_id"]) + "." + format2suffix[mimeType]
                # Skip assets that were already downloaded in an earlier run.
                if not os.path.exists(saveName):
                    print("downloading", uri)
                    # api.get saves the file under its IPFS hash; rename it to <token_id>.<suffix>.
                    api.get(uri)
                    if os.path.exists(uri):
                        os.rename(uri, saveName)
            else:
                print("unknown mime type:", mimeType)
    else:
        print("incomplete token data:", data[i])
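The per-token logic in the loop above (take the first format, strip the ipfs:// prefix, look up the suffix) can also be pulled into a small function that is easy to test without touching the network. This is only a sketch: `parse_token` and `FORMAT_TO_SUFFIX` are hypothetical names that are not part of the gist, and the table is a shortened copy of `format2suffix`.

```python
from typing import Optional, Tuple

# Shortened copy of the format2suffix table from the script (assumption: extend as needed).
FORMAT_TO_SUFFIX = {
    "image/png": "png",
    "image/jpeg": "jpg",
    "video/mp4": "mp4",
    "image/gif": "gif",
}


def parse_token(token: dict, suffixes: dict = FORMAT_TO_SUFFIX) -> Optional[Tuple[str, str]]:
    """Return (save_name, ipfs_hash) for one token entry, or None if it is unusable."""
    formats = (token.get("token_info") or {}).get("formats")
    if not formats:
        return None  # incomplete token data
    first = formats[0]  # like the script, only the first format is considered
    mime_type = first.get("mimeType")
    uri = first.get("uri", "")
    if mime_type not in suffixes or not uri.startswith("ipfs://"):
        return None  # unknown MIME type or not an IPFS URI
    ipfs_hash = uri[len("ipfs://"):]
    return f"{token['token_id']}.{suffixes[mime_type]}", ipfs_hash
```

A caller would loop over the token list, skip entries where this returns None, and pass the hash to the IPFS client as the script does.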
@vorg

vorg commented Mar 10, 2021

On macOS you might need to change ipfsApi to lowercase ipfsapi:

import ipfsapi
api = ipfsapi.Client(host='https://ipfs.infura.io', port=5001)

To install the dependencies and run the script:

pip3 install ipfsapi
pip3 install requests
python3 hic_et_nunc_basic_scraper.py

Please note that this is not the recommended way of running it; you should probably use a virtual environment (for example via pyenv) to manage the dependencies.

@Quasimondo
Author

There is now an improved version by Marcel Schwittlick that adds parallel downloads which should speed up the process quite a bit:
https://gist.github.com/schwittlick/a6a839b211060dcbf766d24b99e0ad1a
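For readers who only want the general idea of parallel downloads: the sketch below uses a thread pool from the standard library. It is not Marcel Schwittlick's implementation (see the link above for that), and `download_all` as well as the `download` callable are hypothetical names introduced here for illustration. The `download` argument stands in for whatever actually fetches one asset, for example a wrapper around `api.get` plus the rename step from the original script.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional, Tuple


def download_all(
    jobs: Iterable[Tuple[str, str]],
    download: Callable[[str, str], object],
    max_workers: int = 8,
) -> Dict[str, Optional[BaseException]]:
    """Run download(ipfs_hash, save_name) for every (save_name, ipfs_hash) job in a thread pool.

    Returns a dict mapping save_name to None on success, or to the exception that was raised.
    """
    results: Dict[str, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which file each future belongs to.
        futures = {pool.submit(download, ipfs_hash, name): name for name, ipfs_hash in jobs}
        for future in as_completed(futures):
            # exception() returns None if the call finished without raising.
            results[futures[future]] = future.exception()
    return results
```

Threads suit this job because the work is dominated by waiting on the network. Collecting the exceptions instead of raising them lets one failed asset be reported without aborting the rest of the run.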

@Quasimondo
Author

This will not work anymore because of some of the latest changes to the API.

@myownelixir2

I was just looking at your script. It does not work, as you noted, but is there a reason why it only pulls the last 10 items? Am I missing something? I am referring to this bit:
url = "https://better-call.dev/v1/contract/mainnet/KT1RJ6PbjHpwc3M5rw5s2Nbmefwbuwbdxton/tokens"

@Quasimondo
Author

Did you miss my comment right above yours? The API was changed a few weeks after I published this and this approach does not work anymore.
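On the 10-item question in general terms: when a list endpoint returns only a fixed-size page per request, a client has to walk the pages until an empty one comes back. The sketch below shows that pattern with the page fetch injected as a function, so it makes no claim about how the better-call.dev API behaves now. `fetch_all` and `fetch_page` are hypothetical names, and an offset-based scheme is an assumption.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int], List[dict]], max_pages: int = 1000) -> List[dict]:
    """Collect items by calling fetch_page(offset) until it returns an empty page."""
    items: List[dict] = []
    offset = 0
    for _ in range(max_pages):  # hard cap so a misbehaving endpoint cannot loop forever
        page = fetch_page(offset)
        if not page:
            break
        items.extend(page)
        offset += len(page)  # advance by however many items the server actually returned
    return items
```

With requests, `fetch_page` could be something like `lambda offset: requests.get(url, params={"offset": offset}).json()`, where the `offset` query parameter name is an assumption that would need to be checked against the API's current documentation.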
