@satyarth
Last active January 20, 2016 03:29
Script to scrape the hi-res tiles of a painting from tretyakovgallery.ru and stitch them into a single image
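import os
import urllib.request

from bs4 import BeautifulSoup
from PIL import Image

base_url = "http://www.tretyakovgallery.ru/"

# table.txt holds the inner HTML of the zoom popup's <tbody>
with open('table.txt') as f:
    html = f.read()

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')

# Collect tile URLs row by row; each <td> references its tile in its style attribute
urls = []
for row in soup.find_all('tr'):
    row_urls = []
    for tile in row.find_all('td'):
        style = tile['style']
        # Take the text between "(" and ")", dropping one quote character at each end
        url = style[style.find("(")+2:style.find(")")-1]
        row_urls.append(url)
    urls.append(row_urls)

# Download every tile to img/<row>_<column>.jpg
os.makedirs("img", exist_ok=True)
for i in range(len(urls)):
    print(i)
    for j in range(len(urls[0])):
        print(j)
        urllib.request.urlretrieve(base_url + urls[i][j], "img/"+str(i)+"_"+str(j)+".jpg")

xs = len(urls[0])  # tiles per row
ys = len(urls)     # number of rows

# Open the downloaded tiles in the same grid order
imgs = []
for i in range(ys):
    row = []
    for j in range(xs):
        row.append(Image.open("img/"+str(i)+"_"+str(j)+".jpg"))
    imgs.append(row)

# Canvas height is the sum of the first column's heights
y_max = 0
for i in range(ys):
    y_max += imgs[i][0].size[1]

# Canvas width is the sum of the first row's widths
x_max = 0
for j in range(xs):
    x_max += imgs[0][j].size[0]

out = Image.new('RGB', (x_max, y_max))

# Paste tiles left to right, top to bottom
y = 0
for i in range(ys):
    x = 0
    for j in range(xs):
        out.paste(imgs[i][j], (x,y))
        x += imgs[i][j].size[0]
    y += imgs[i][0].size[1]

out.save("out.jpg")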
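The stitching step at the end of the script can be checked without downloading anything. The following is a minimal sketch, not part of the original script: `tile_offsets` is a hypothetical helper, and the tile sizes are made up. It assumes, as the script does, that tiles in one row share a height and tiles in one column share a width.

```python
def tile_offsets(sizes):
    """Given a grid of (width, height) tile sizes, return the canvas size
    and the top-left paste position of every tile."""
    offsets = []
    y = 0
    for row in sizes:
        x = 0
        row_offsets = []
        for width, _height in row:
            row_offsets.append((x, y))
            x += width              # advance by this tile's width
        offsets.append(row_offsets)
        y += row[0][1]              # advance by the row's height (first tile)
    canvas_width = sum(width for width, _ in sizes[0])
    canvas_height = sum(row[0][1] for row in sizes)
    return (canvas_width, canvas_height), offsets


# Hypothetical 2x2 grid whose last column and last row are smaller
sizes = [[(256, 256), (100, 256)],
         [(256, 80), (100, 80)]]
canvas, offsets = tile_offsets(sizes)
print(canvas)
print(offsets)
```

Edge tiles are often smaller than the rest, which is why the positions are accumulated from the actual sizes rather than computed from one fixed tile size.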
@satyarth

Usage:

  1. Go to the painting you want to download, e.g. http://www.tretyakovgallery.ru/en/collection/_show/image/_id/252
  2. Click on the image to bring up the zoomed-in popup
  3. Right-click on the popup, then 'Inspect element'
  4. Scroll up to the first <tbody> tag you see. Right click it, and then 'Copy inner HTML'
  5. Paste what you just copied into a file called 'table.txt'
  6. Run the script in the same directory as table.txt. The stitched image is saved as out.jpg
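
The script pulls each tile's path out of the `style` attribute of a `<td>` by slicing between the parentheses, which assumes the path is wrapped in quotes. If the copied HTML turns out to have unquoted paths, a regular expression is one possible alternative. This is only a sketch: `tile_url` is a hypothetical helper, and the style strings below are made up rather than copied from the gallery's markup.

```python
import re

# Matches the path inside url(...), with or without surrounding quotes
URL_IN_STYLE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def tile_url(style):
    """Return the path inside url(...) in a CSS style string, or None."""
    match = URL_IN_STYLE.search(style)
    return match.group(1) if match else None


# Hypothetical style values, for illustration only
quoted = "background-image: url('/tiles/0_0.jpg'); width: 256px"
unquoted = "background-image: url(/tiles/0_1.jpg)"

print(tile_url(quoted))
print(tile_url(unquoted))
```

Returning `None` when no `url(...)` is present makes it easy to spot a cell that does not carry a tile, instead of silently building a bad download URL.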
