@gilesc
Created April 28, 2011 01:26
twss
(require '[net.cgrand.enlive-html :as html])
(defn fetch-url
  "Fetch the page at url and parse it into Enlive nodes."
  [url]
  (html/html-resource (java.net.URL. url)))
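(defn fetch-twss
  "Return the quoted text of each <p> on page n of twssstories.com."
  [n]
  ;; Take the first double-quoted span from each paragraph's text
  ;; (nil if there is none) and drop the last two results.
  (drop-last 2
    (map #(second (re-find #"\"(.+?)\"" (html/text %)))
         (html/select
           (fetch-url (str "http://twssstories.com/node?page=" n))
           [:p]))))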
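(defn fetch-fml
  "Return the sentences of the stories on page n of fmylife.com."
  [n]
  (->> (html/select (fetch-url (str "http://www.fmylife.com/?page=" n)) [:p])
       (map html/text)
       rest             ; skip the first paragraph
       (take-nth 3)     ; keep every third paragraph
       (drop-last 2)
       ;; Trim 7 leading and 4 trailing characters (presumably the
       ;; "Today, " prefix and " FML" suffix), then split on ". ".
       (map #(seq (.split (subs % 7 (- (count %) 4)) "[.] ")))
       flatten))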
(defonce corpora
  {:twss (flatten (map fetch-twss (range 1 212)))
   :fml  (flatten (map fetch-fml (range 1 80)))})