@michelrandahl
Created July 3, 2016 11:37
#r "../../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
#r "System.Xml.Linq.dll"
open FSharp.Data
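// Site root; the hrefs scraped below are assumed to be root-relative, so this is prepended to each.
let base_url = "http://fsharp.github.io"
// The type provider infers its types from this sample page at compile time.
type HTML = HtmlProvider<"http://fsharp.github.io/FSharp.Data/index.html">
let html = HTML.GetSample()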
html.Lists.Html.CssSelect "li"
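// Jump to the "Documentation" item, then drop it and the two items after it.
|> Seq.skipWhile (fun el -> el.InnerText() <> "Documentation")
|> Seq.skip 3
// Keep items only for as long as each one contains a link.
|> Seq.takeWhile (fun el -> not (Seq.isEmpty (el.CssSelect "a")))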
|> Seq.map (fun el -> el.CssSelect "a" |> Seq.head)
|> Seq.map (fun el -> el.Attribute "href")
|> Seq.map (fun el -> el.Value())
|> Seq.map ((+) base_url)
|> Seq.map (HTML.AsyncLoad) // using the same type because all these pages look the same
// Download all pages concurrently and block until every one has loaded.
|> Async.Parallel
|> Async.RunSynchronously
|> Seq.map (fun site -> site.Html.CssSelect "h1" |> Seq.head)
|> Seq.map (fun el -> el.InnerText())
|> List.ofSeq
|> printfn "%A"