@alfredplpl
Created October 9, 2023 06:15
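import os

import requests
from datasets import load_dataset
from PIL import Image
from tqdm import tqdm

# The output directory must exist before any file is written into it.
os.makedirs("dalle3", exist_ok=True)

dataset = load_dataset("laion/dalle-3-dataset", split="train")
for i, row in enumerate(tqdm(dataset)):
    # Save the caption under a zero-padded index, e.g. dalle3/000000.txt.
    with open(f"dalle3/{i:06}.txt", "w", encoding="utf-8") as f:
        f.write(row["caption"])
    # Download the image and save it under the same index as a PNG.
    img_url = row["link"]
    raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
    raw_image.save(f"dalle3/{i:06}.png")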
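The loop above stops at the first row whose download or decode raises an exception, which can happen when a link in the dataset is dead. The following is a minimal sketch of one way to keep going and collect the failures instead. The names `process_rows` and `save_one` are hypothetical and not part of the original script; `save_one` stands in for whatever per-row work you want to do.

```python
from typing import Callable, Iterable, List, Mapping, Tuple


def process_rows(
    rows: Iterable[Mapping[str, str]],
    save_one: Callable[[int, Mapping[str, str]], None],
) -> List[Tuple[int, str]]:
    """Call save_one(i, row) for every row and return the failures.

    Hypothetical helper: each failure is recorded as (row index, error text)
    so the run continues past rows that cannot be saved.
    """
    failures: List[Tuple[int, str]] = []
    for i, row in enumerate(rows):
        try:
            save_one(i, row)
        except Exception as exc:
            # Broad on purpose: a single bad row should not end the whole run.
            failures.append((i, repr(exc)))
    return failures
```

To use it with the script, move the body of the loop (writing the caption, downloading the image, saving it) into a function that takes `i` and `row`, then call `process_rows(tqdm(dataset), that_function)` and inspect the returned list for rows to retry.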

alfredplpl commented Oct 9, 2023

If you want smaller files, save the images as WebP instead of PNG. Note that with quality=90 the WebP output is lossy, unlike PNG. Change the code as follows:

- raw_image.save(f"dalle3/{i:06}.png")
+ raw_image.save(f"dalle3/{i:06}.webp", quality=90)
