@owainlewis
Last active August 29, 2015 13:56
Sitecore job parser
(ns mammoth.parsers.sitecore
  (:require
    [cheshire.core :as json]
    [athena.core :as ath]
    [mammoth.benchmark :refer [bench]]
    [mammoth.crawler :as crawler]))

(defn job-list
  "Returns a vector of job maps (:title, :href, :description) parsed from
  the Sitecore job listings page for the given location."
  [location]
  (let [sc       "http://www.sitecore.net/Company/Job-Opportunities/%s.aspx"
        selector ".documentlist"
        document (ath/parse
                   (crawler/fetch (format sc location)))
        el       (ath/query-selector document selector)
        titles   (ath/query-selector el "h4 a")
        descs    (ath/query-selector el "p")]
    ;; Pair each title link with the description at the same position.
    (into []
          (for [[link d] (map vector titles descs)]
            (let [href        (first (ath/get-attr link :href))
                  title       (ath/text link)
                  description (ath/text d)]
              {:title       title
               :href        (str "http://www.sitecore.net" href)
               :description description})))))
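
;; A minimal sketch of the pairing step in job-list, using plain strings in
;; place of parsed nodes. The sample titles and descriptions are hypothetical.
;; Note that (map vector ...) stops at the shorter collection, so a title
;; without a matching description is silently dropped.
(comment
  (let [titles ["Developer" "Consultant" "Designer"]
        descs  ["Builds things" "Advises clients"]]
    (into []
          (for [[t d] (map vector titles descs)]
            {:title t :description d})))
  ;; => [{:title "Developer", :description "Builds things"}
  ;;     {:title "Consultant", :description "Advises clients"}]
  )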

(defn run-parser
  "Finds all Sitecore jobs, running the crawl for each location in parallel."
  []
  (let [locations ["France" "United-Kingdom" "Southeast-Asia"
                   "Belgium" "Denmark" "North-America"]]
    ;; doall starts every future before the first deref blocks.
    (->> (doall (map #(future (job-list %)) locations))
         (map deref)
         flatten
         (into []))))
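
;; A minimal sketch of the fan-out used by run-parser, with a hypothetical
;; stub (fake-job-list) standing in for the network call. mapv is eager, so
;; every future is started before the first deref blocks, and mapcat joins
;; the per-location vectors in input order without flatten.
(comment
  (defn fake-job-list [location]
    (Thread/sleep 100) ; simulate a slow HTTP fetch
    [{:title (str "Job in " location)}])

  (->> ["France" "Belgium" "Denmark"]
       (mapv #(future (fake-job-list %)))
       (mapcat deref)
       (into []))
  ;; => [{:title "Job in France"} {:title "Job in Belgium"} {:title "Job in Denmark"}]

  ;; Futures run on the agent thread pool; a standalone script may need
  ;; (shutdown-agents) at the end to exit promptly.
  )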

;; Runs the parser and serializes the resulting vector of jobs to a JSON string.
(def all-jobs (comp json/generate-string run-parser))

(defn benchmark [] (bench all-jobs))