@miratcan
Created June 30, 2011 18:34
A crawler for komikaze.net
"""
Mirat Can Bayrak / 2009
"""
from urllib import urlopen, urlretrieve
from datetime import date as Date
from datetime import timedelta
from xml.dom import minidom
from os.path import basename
import re

ONE_DAY = timedelta(1)


def asstring(date):
    return "%s%s%s" % (date.year, date.month, date.day)


def nextday(date):
    return date + ONE_DAY


day = Date(2006, 1, 1)
today = Date.today()

# Walk the archive one day at a time, from 2006-01-01 up to (not including) today.
while day != today:
    url = "http://www.komikaze.net/Default.asp?gun=%s" % asstring(day)
    print "checking url :", url
    page = urlopen(url).read()
    # Non-greedy match with an escaped dot, so each .jpg path is captured separately.
    images = re.findall(r'/karikaturler/.*?\.jpg', page)
    if images:
        for image in images:
            image = "http://www.komikaze.net/komikaze/" + image
            urlretrieve(image, basename(image))
            print image, "downloaded"
    else:
        print "no image found"
    day = nextday(day)
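
The script above is Python 2. As a rough sketch only, the two pure steps (building one URL per day and pulling image paths out of a page) might look like this in Python 3. The names `BASE`, `IMAGE_RE`, `as_string`, `page_urls` and `image_urls` are hypothetical and not part of the original script. The stricter pattern and the single slash after `/komikaze` are assumptions about what the site expected, not something the original confirms. Downloading is left out so the snippet runs without network access.

```python
import re
from datetime import date, timedelta

# Hypothetical constants; the host and paths are copied from the original script.
BASE = "http://www.komikaze.net"
# Assumption: stop at quotes, whitespace or angle brackets so each path matches on its own.
IMAGE_RE = re.compile(r'/karikaturler/[^"\s<>]*?\.jpg')


def as_string(day):
    # Same unpadded year/month/day format as asstring() in the original script.
    return "%s%s%s" % (day.year, day.month, day.day)


def page_urls(start, end):
    """Yield one archive URL per day from start up to, but not including, end."""
    day = start
    while day < end:
        yield "%s/Default.asp?gun=%s" % (BASE, as_string(day))
        day += timedelta(days=1)


def image_urls(page):
    """Return an absolute URL for every cartoon path found in the page text."""
    return [BASE + "/komikaze" + path for path in IMAGE_RE.findall(page)]
```

Each URL from `page_urls(date(2006, 1, 1), date.today())` could then be fetched with `urllib.request.urlopen`, and each result of `image_urls` saved with `urllib.request.urlretrieve`, the Python 3 counterparts of the calls used in the original.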
@cevheroglu

It was a great site; it was my home page back in high school :)
