@ShapeLayer
Last active December 1, 2018 12:14
Image scraper
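import re
import urllib.parse
import urllib.request

# Page title, used as the filename prefix for downloaded images.
re_title = re.compile(r'<title>(?P<title>.*?)</title>', re.I | re.S)
# The src value of every <img> tag: everything up to the closing double quote.
re_img = re.compile(r'<img\b[^>]*?\ssrc="(?P<src>[^"]+)"', re.I)
# Supported image extension, optionally followed by a query string or fragment.
re_ext = re.compile(r'\.(bmp|gif|jpe?g|png)(?=$|[?#])', re.I)

target = input()
req = urllib.request.Request(target, headers={'User-Agent': 'Mozilla/5.0'})
scrap = urllib.request.urlopen(req).read().decode('utf-8', errors='replace')

found = re_title.search(scrap)
raw_title = found.group('title') if found else ''
# Replace characters that are not allowed in filenames; fall back to a fixed
# name when the page has no usable <title>.
title = re.sub(r'[\\/:*?"<>|\s]+', ' ', raw_title).strip() or 'untitled'
print('Loaded: {}'.format(title))

images = re_img.findall(scrap)
print('Found {} images: {}'.format(len(images), title))

for i, src in enumerate(images, start=1):
    url = urllib.parse.urljoin(target, src)  # resolve relative src values
    print('Downloading {} ...'.format(url))
    try:
        exten = re_ext.search(url).group()
        urllib.request.urlretrieve(url, '{}-{}{}'.format(title, i, exten))
        print('Done. {}/{}'.format(i, len(images)))
    except (AttributeError, ValueError, OSError) as err:
        # AttributeError: no supported extension; OSError: download or write failed.
        print('Failed. {}/{} ({})'.format(i, len(images), err))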
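
The `<img>` extraction in the script above relies on a regular expression, which only sees double-quoted `src` attributes. As a rough sketch of an alternative, the standard library's `html.parser` can collect the same URLs while also handling single-quoted and unquoted attributes. The names `ImgSrcParser` and `find_images` are hypothetical helpers introduced here for illustration; they are not part of the original gist.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so 'IMG' arrives here as 'img'.
        if tag != 'img':
            return
        src = dict(attrs).get('src')
        if src:
            # Resolve relative paths against the page URL.
            self.sources.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Return absolute image URLs found in html, in document order."""
    parser = ImgSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources
```

Assuming the page source is already in a string, `find_images(scrap, target)` could stand in for the `findall` call; the download loop would stay the same.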