@hexcowboy
Last active May 13, 2023 00:31
NFT Image Scraper

Started by asking it to write Python code with an example from the Reservoir docs

can you turn this into a python3 request

curl --request GET \
     --url 'https://api.reservoir.tools/tokens/v6?collection=0x3Fe1a4c1481c8351E91B64D5c398b159dE07cbc5&sortBy=tokenId&sortDirection=asc&limit=10&includeAttributes=true' \
     --header 'accept: */*' \
     --header 'x-api-key: demo-api-key'
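
The requests version it produced was, give or take variable names, something like this (a reconstruction, not the verbatim output):

import requests

url = "https://api.reservoir.tools/tokens/v6"
params = {
    "collection": "0x3Fe1a4c1481c8351E91B64D5c398b159dE07cbc5",
    "sortBy": "tokenId",
    "sortDirection": "asc",
    "limit": 10,
    "includeAttributes": "true",
}
headers = {"accept": "*/*", "x-api-key": "demo-api-key"}

response = requests.get(url, params=params, headers=headers)
print(response.json())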

Asked it to iterate over the 20k NFT collection

can you make it iterate 100 at a time, from 0-20,000
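
Its first attempt paged with an offset parameter, roughly like this (again reconstructed; as the next prompt explains, the API actually paginates differently):

import requests

url = "https://api.reservoir.tools/tokens/v6"
headers = {"accept": "*/*", "x-api-key": "demo-api-key"}

# Walk the collection 100 tokens at a time, 0 through 20,000.
for offset in range(0, 20000, 100):
    params = {
        "collection": "0x3Fe1a4c1481c8351E91B64D5c398b159dE07cbc5",
        "sortBy": "tokenId",
        "sortDirection": "asc",
        "limit": 100,
        "offset": offset,  # replaced by a continuation key below
    }
    data = requests.get(url, params=params, headers=headers).json()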

Told it a bit about how the response was formatted and how it could interface with the API

so actually instead of providing "offset", each response will provide a continuation key in the form of

{
    ...rest of response
    "continuation": "abcdef123"
}

which you must use in the subsequent request params as "continuation": "abcdef123"
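
That turns the loop into continuation-based pagination, which in the final script boils down to this (shown here as a synchronous sketch for clarity):

import requests

url = "https://api.reservoir.tools/tokens/v6"
headers = {"accept": "*/*", "x-api-key": "demo-api-key"}
continuation = None

while True:
    params = {
        "collection": "0x3Fe1a4c1481c8351E91B64D5c398b159dE07cbc5",
        "sortBy": "tokenId",
        "sortDirection": "asc",
        "limit": 100,
    }
    if continuation:
        params["continuation"] = continuation
    data = requests.get(url, params=params, headers=headers).json()
    # Each response hands back the cursor for the next page; an empty
    # value means we've reached the end of the collection.
    continuation = data.get("continuation")
    if not continuation:
        break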

Now that it had a loop set up properly, I told it about the business logic I wanted it to include

great. lets exclude the "includeAttributes" param. the response will have the following shape

{
    "tokens": [
        {
            "token": {
                "tokenId": "0",
                "image": "https://i.seadn.io/gcs/files/9acb975358b9caf654f9103f309c1b3e.png?w=500&auto=format"
            },
        },
    ],
    "continuation": "MHhkNjY4YTJlMDAxZjMzODViOGJiYzVhODY4MmFjM2MwZDgzYzE5MTIyXzk5",
}

there will be many `tokens`. you will need to go through each token on each request to get the `token.image`, download the image, and output it to a folder called "./images/". name the image by the `token.tokenId`, so the resulting file should be at `./images/0.webp`

some of the `token.images` will be gif and some will be .avif, so you will need to first determine which one it is, and then convert it to a webp image when you save the file.

also, you should resize the images so they are 100x100 pixels.
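
The image-handling part of the prompt comes down to a few lines of Pillow. One caveat that's mine, not ChatGPT's: Pillow has no built-in AVIF decoder, so opening the .avif files needs the pillow-avif-plugin package installed (the import below registers the codec). The helper name here is illustrative:

import io

from PIL import Image
import pillow_avif  # noqa: F401 -- third-party plugin that adds AVIF support to Pillow


def save_webp_thumbnail(image_data: bytes, token_id: str) -> None:
    # Pillow sniffs the real format (GIF, AVIF, PNG, ...) from the bytes,
    # so no explicit detection step is needed.
    image = Image.open(io.BytesIO(image_data))
    image = image.convert("RGB")  # flatten palette/alpha before WebP encoding
    image = image.resize((100, 100), Image.LANCZOS)
    image.save(f"./images/{token_id}.webp", format="webp")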

I noticed a bug, so I asked it to fix it

i found a bug. to fix it, you have to check if `token.image` is None, and if it is, use `token.collection.image` instead
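
In the generated script that fix is a simple fallback (this is taken straight from the final code below):

image_url = token.get("token", {}).get("image")
if image_url is None:
    # Fall back to the collection-level image when the token has none.
    image_url = token.get("token", {}).get("collection", {}).get("image")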

It was running very slowly, so I asked it to run the downloads in parallel. This is usually a relatively big engineering task.

this works great. can you please make it async so it can download images in parallel?
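
The pattern it reached for is the standard asyncio one: wrap each download in a task and gather them. A minimal self-contained sketch of the idea (not the final script, which follows below):

import asyncio

import aiohttp


async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
    async with session.get(url) as response:
        return await response.read()


async def fetch_all(urls: list[str]) -> list[bytes]:
    async with aiohttp.ClientSession() as session:
        # create_task schedules every download at once; gather awaits them
        # together, so the requests overlap instead of running serially.
        tasks = [asyncio.create_task(fetch(session, u)) for u in urls]
        return await asyncio.gather(*tasks)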

Once it did this, I just had it fix a few smaller bugs and it was set to go. In total, I spent 20 minutes prompting ChatGPT before it had completely built this scraper. I probably would've spent 1-2 hours writing it myself, and a junior dev would probably have spent a day or more.

import asyncio
import io
import os
import sys

import aiohttp
import PIL
from PIL import Image

url = "https://api.reservoir.tools/tokens/v6"
headers = {"accept": "*/*", "x-api-key": "demo-api-key"}
continuation_key = None
image_folder = "./images/"

# Create the image folder if it doesn't exist
if not os.path.exists(image_folder):
    os.makedirs(image_folder)


async def download_image(session, token_id, image_url):
    try:
        async with session.get(image_url) as response:
            if response.status == 200:
                image_data = await response.read()
                try:
                    # Open the image using PIL
                    image = Image.open(io.BytesIO(image_data))
                    # Convert the image to WebP format
                    image = image.convert("RGB")
                    image_format = "webp"
                    # Resize the image to 100x100 pixels
                    image = image.resize((100, 100), Image.LANCZOS)
                    # Save the image to the specified folder
                    image_path = os.path.join(
                        image_folder, f"{token_id}.{image_format}"
                    )
                    image.save(image_path, format=image_format)
                    print(f"Saved image {token_id}.{image_format}")
                except PIL.UnidentifiedImageError:
                    print(f"Error: Failed to identify image file for token {token_id}")
            else:
                print(
                    f"Failed to download image for token {token_id}. Status: {response.status}"
                )
    except aiohttp.ClientError as e:
        print(f"Error downloading image for token {token_id}: {e}")


async def process_tokens(tokens):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for token in tokens:
            token_id = token.get("token", {}).get("tokenId")
            image_url = token.get("token", {}).get("image")
            if image_url is None:
                image_url = token.get("token", {}).get("collection", {}).get("image")
            if image_url is None:
                print(f"Token {token_id} has no image URL. Skipping...")
                continue
            task = asyncio.create_task(download_image(session, token_id, image_url))
            tasks.append(task)
        await asyncio.gather(*tasks)


async def main():
    global continuation_key  # Declare continuation_key as a global variable

    # Retrieve the filenames from the image folder
    processed_tokens = set()
    for filename in os.listdir(image_folder):
        token_id = os.path.splitext(filename)[0]
        processed_tokens.add(token_id)

    while True:
        params = {
            "collection": "0xd668A2E001f3385B8BBC5a8682AC3C0D83C19122",
            "sortBy": "tokenId",
            "sortDirection": "asc",
            "limit": 100,
        }
        if continuation_key:
            params["continuation"] = continuation_key

        async with aiohttp.ClientSession() as session:
            try:
                async with session.get(url, params=params, headers=headers) as response:
                    data = await response.json()
                    # Extract the continuation key from the response
                    continuation_key = data.get("continuation")
                    tokens = data.get("tokens", [])
                    # Skip tokens that have already been processed
                    tokens = [
                        token
                        for token in tokens
                        if token["token"]["tokenId"] not in processed_tokens
                    ]
                    await process_tokens(tokens)
                    # Update the set of processed tokens
                    processed_tokens.update(
                        token["token"]["tokenId"] for token in tokens
                    )
                    # Break the loop if no more continuation key is provided
                    if not continuation_key:
                        break
            except aiohttp.ClientError as e:
                print("Error:", e)
                sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())