@pombredanne
Forked from paultag/scraper.clj
Created March 11, 2014 15:41
#!/usr/bin/env hython
; not actually clojure ...
(import "urllib2")
(import-from "lxml" "html")

(defn form-pts-url [developer]
  "Return the PTS URL for a given developer."
  (+ "http://qa.debian.org/developer.php?login=" developer))

(defn digest-page [url]
  "Take a URL and return an lxml.html page ready for scraping."
  (.fromstring html (.read (.urlopen urllib2 url))))

(defn scrape-page [page]
  "Scrape the PTS for bugs"
  (do
    (def xpath-base "//table[contains(@class, 'packagetable sortable')]")
    (zip
      (.xpath page (+ xpath-base "//a[@name]/text()"))
      (.xpath page (+ xpath-base "//td[2]//a[contains(@href, 'bugs.debian.org')]/text()")))))

(print (scrape-page (digest-page (form-pts-url "paultag"))))
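
; A minimal offline check of scrape-page, sketched under assumptions:
; sample-html is a hand-written fragment that only mimics the table layout
; the XPath queries above expect. The name sample-html, the package "hy",
; and the bug count "3" are made up for illustration; the real PTS markup
; may differ. It reuses the html module and scrape-page defined above and
; needs no network access.
(def sample-html "<table class='packagetable sortable'><tr><td><a name='hy'>hy</a></td><td><a href='http://bugs.debian.org/src:hy'>3</a></td></tr></table>")
(print (scrape-page (.fromstring html sample-html)))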