@hpoit
Last active April 8, 2019 17:56
Cascadia.jl working webscraping example
# https://github.com/Algocircle/Cascadia.jl#webscraping-example
using Cascadia
using Gumbo
using HTTP
# using Requests (deprecated)
r = HTTP.get("http://stackoverflow.com/questions/tagged/julia-lang")
h = parsehtml(String(r.body))
qs = eachmatch(Selector(".question-summary"), h.root)
println("StackOverflow Julia Questions (votes answered? url)")
for q in qs
    votes = nodeText(eachmatch(Selector(".votes .vote-count-post "), q)[1])
    answered = length(eachmatch(Selector(".status.answered"), q)) > 0
    href = eachmatch(Selector(".question-hyperlink"), q)[1].attributes["href"]
    println("$votes $answered http://stackoverflow.com$href")
end
using JSON
JSON.json(qs) # throws StackOverflowError
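
The `JSON.json(qs)` call above most likely overflows the stack because each Gumbo `HTMLElement` holds a reference to its parent as well as its children, so a field-by-field serializer can recurse without end. One possible workaround, sketched here under that assumption, is to copy the plain values out of each node into a `Dict` and serialize those instead. The inline HTML string, the `rows` name, and the key names are made up for illustration so the sketch runs without a network request; the selectors are the ones used in the script above.

```julia
using Cascadia
using Gumbo
using JSON

# Hypothetical stand-in for the fetched page (not real StackOverflow markup).
html = """
<div class="question-summary">
  <div class="votes"><span class="vote-count-post">3</span></div>
  <div class="status answered"></div>
  <a class="question-hyperlink" href="/questions/1/example">Example</a>
</div>
"""

h = parsehtml(html)
qs = eachmatch(Selector(".question-summary"), h.root)

# Build one Dict of plain strings and booleans per question,
# so the serializer never touches the HTML tree itself.
rows = map(qs) do q
    href = eachmatch(Selector(".question-hyperlink"), q)[1].attributes["href"]
    Dict(
        "votes" => nodeText(eachmatch(Selector(".votes .vote-count-post"), q)[1]),
        "answered" => !isempty(eachmatch(Selector(".status.answered"), q)),
        "url" => "http://stackoverflow.com" * href,
    )
end

println(JSON.json(rows))
```

With the live page, replacing `html` with `String(r.body)` from the script above should give one entry per question, assuming the page still uses these class names.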