@TauPan
Created July 26, 2011 11:51
Beginnings of a prototype scraper in racket for s-bahn-hamburg.de
#!/usr/bin/env racket
#lang scheme/base
;; Downloading and compiling the PLaneT packages required below takes
;; a long time, but only on the first run.
(require srfi/1
         srfi/13
         net/url
         (planet bzlib/http:1:0)
         (planet neil/htmlprag:1:6)
         ;; (except-in (planet lizorkin/ssax:2:0/ssax)
         ;;            ;; conflicts with srfi-1:
         ;;            fold-right
         ;;            fold
         ;;            filter
         ;;            cons*)
         (planet lizorkin/sxml:2:1/sxml))
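;; Parse the http_proxy environment variable into (scheme host port).
;; http_proxy is #f when the variable is unset or does not match.
(define http_proxy
  (let ((proxy (getenv "http_proxy")))
    (and proxy
         (regexp-match
          #rx"^([Hh][Tt][Tt][Pp])(?:://)?([^:/]+)(?::([0-9]+))?/?$"
          proxy))))
(when http_proxy
  (current-proxy-servers
   (let ((p (cdr http_proxy)))
     `((,(string-downcase (first p))
        ,(second p)
        ;; assumption: fall back to port 80 when no port is given
        ,(if (third p) (string->number (third p)) 80))))))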
(define web-url "http://www.s-bahn-hamburg.de/s_hamburg/view/index.shtml")
(define get-sxml (compose html->sxml http-get))
(define s-bahn-sxml (get-sxml web-url))
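;; See http://www.metapaper.net/lisovsky/query/examples/xpath/ for an
;; explanation of the sxpath/txpath queries used here: txpath selects
;; the info-box divs, then sxpath collects all text nodes below them.
(define verkehrsmeldungen
  ((sxpath '(// *text*))
   ((txpath "//div[contains(@class, 'info-box context info')]")
    s-bahn-sxml)))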
(printf "~a~n" verkehrsmeldungen)