@kurokikaze
Created April 4, 2014 07:31
Fetches GIFs from togif.me together with their view counts and writes them to stdout as CSV (one `views,image_url` row per GIF).
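import csv
import re
import sys
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

# CSV rows go to stdout: views,image_url
writer = csv.writer(sys.stdout, lineterminator="\n")

# Walk catalog pages 1..149; each one links to individual GIF pages.
for num in range(1, 150):
    url = "http://togif.me/catalog/" + str(num)
    with urlopen(url) as res:
        soup = BeautifulSoup(res.read(), "html.parser")
    for link in soup.select('td a[href]'):
        page_url = urljoin(url, link.attrs['href'])
        with urlopen(page_url) as page_res:
            page_soup = BeautifulSoup(page_res.read(), "html.parser")
        # The view count is the first number inside div.image-date.
        views = page_soup.select('div.image-date')[0].text
        views_int = int(re.findall(r'\d+', views)[0])
        # The first <img> on the page is taken to be the GIF itself.
        img_src = urljoin(page_url, page_soup.select('img')[0].attrs['src'])
        writer.writerow([views_int, img_src])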
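The per-page step (take the first number inside `div.image-date`, resolve the first `<img>` against the page URL) can be pulled out into a function that works on an HTML string, so it can be tested without touching the network. This is a sketch, not the author's code: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, the names `GifPageParser`, `parse_gif_page` and `to_csv_row` are hypothetical, and it assumes the page markup matches what the script expects.

```python
import csv
import io
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class GifPageParser(HTMLParser):
    """Hypothetical helper: collects the text of the first div.image-date
    and the src of the first <img> that has one (mirrors the script's selectors)."""

    def __init__(self):
        super().__init__()
        self.date_depth = 0      # > 0 while inside the first div.image-date
        self.date_done = False   # set once that div has been closed
        self.date_text = []
        self.img_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.date_depth:
                self.date_depth += 1  # a div nested inside image-date
            elif not self.date_done and "image-date" in (attrs.get("class") or "").split():
                self.date_depth = 1
        elif tag == "img" and self.img_src is None and attrs.get("src"):
            self.img_src = attrs["src"]

    def handle_endtag(self, tag):
        if tag == "div" and self.date_depth:
            self.date_depth -= 1
            if self.date_depth == 0:
                self.date_done = True

    def handle_data(self, data):
        if self.date_depth:
            self.date_text.append(data)


def parse_gif_page(page_url, html):
    """Return (views, absolute image URL) for one GIF page,
    or None if the expected elements are missing."""
    parser = GifPageParser()
    parser.feed(html)
    parser.close()
    numbers = re.findall(r"\d+", "".join(parser.date_text))
    if not numbers or parser.img_src is None:
        return None
    return int(numbers[0]), urljoin(page_url, parser.img_src)


def to_csv_row(views, img_src):
    """Format one output line as views,url (quoting URLs that contain commas)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([views, img_src])
    return buf.getvalue()
```

With this in place, the inner loop reduces to fetching `page_url`, decoding the body (UTF-8 is an assumption here), calling `parse_gif_page`, and skipping pages that return `None` rather than stopping the whole run with an `IndexError`.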