@renkun-ken
Last active August 29, 2015 14:06
Where do R's supportive members mainly come from?
library(pipeR) # https://github.com/renkun-ken/pipeR
library(rlist) # https://github.com/renkun-ken/rlist
library(rvest) # https://github.com/hadley/rvest
library(stringr) # https://github.com/hadley/stringr
# please ensure rvest is the latest dev version
Pipe("http://www.r-project.org/foundation/memberlist.html")$
  html()$ # download and parse the page
  html_nodes(xpath = "//table[2]//td//text() | //table[3]//td//text()")$ # use xpath to scrape the name list
  html_text(trim = TRUE)$
  str_match_all(".+\\s\\((.+)\\)")$ # select their nations
  list.rbind()[,2]$ # combine to matrix and select nation column
  str_split(", ")$ # some members have multiple nationalities
  unlist()$
  table()$
  sort(decreasing = TRUE)$
  head(10)$
  barplot(main = "Where do R's supportive members mainly come from?")
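To make the string-processing steps of this pipeline easier to follow, here is a base-R sketch of the same logic (regex capture, splitting multiple nationalities, counting, sorting) applied to a few made-up entries. It assumes members are listed as "Name (Country)", as the regex implies. The sample strings are hypothetical, and regexec()/strsplit() stand in for the stringr and rlist calls, so no network access is needed.

```r
# Hypothetical sample entries in the "Name (Country)" form that the
# regex in the pipeline expects; these are NOT scraped from the real page.
members <- c(
  "Alice Example (Austria)",
  "Bob Example (Germany, Austria)",
  "Carol Example (USA)",
  "No country given"
)

# Like str_match_all(".+\\s\\((.+)\\)") + list.rbind()[, 2]:
# capture the text inside the parentheses; non-matching entries are dropped.
m <- regmatches(members, regexec(".+\\s\\((.+)\\)", members))
nations <- vapply(m[lengths(m) > 0], function(x) x[2], character(1))

# Like str_split(", ") + unlist(): one element per nationality.
nations <- unlist(strsplit(nations, ", ", fixed = TRUE))

# Like table() + sort(decreasing = TRUE) + head(10).
counts <- head(sort(table(nations), decreasing = TRUE), 10)
print(counts)
```

Entries without a parenthesised country simply fail to match and drop out, which mirrors how the pipeline ignores table cells that are not member names.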
@renkun-ken (author)

Thanks, @daroczig, for sharing; that's really cool and amazing!

@pssguy commented Sep 13, 2014

Error in get(y, envir = parent.frame(), mode = "function") :
object 'html_node' of mode 'function' was not found

@renkun-ken (author)

@pssguy, please update the rvest package to the latest dev version at https://github.com/hadley/rvest; it has major API changes from the early release. xpath() is no longer exported, and html_node() now accepts either a CSS selector or an XPath expression. html_text() extracts the text, so there's no need for XML::xmlValue() here.
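When an error like the one above appears, it can help to check what the installed rvest actually exports before rerunning the script. This is a minimal base-R sketch; the helper name has_export() is hypothetical, and it only reports whether a function is exported, not which version introduced it.

```r
# Hypothetical helper: TRUE if `pkg` is installed and exports `fn`.
has_export <- function(pkg, fn) {
  requireNamespace(pkg, quietly = TRUE) &&
    fn %in% getNamespaceExports(pkg)
}

# Check whether the installed rvest (if any) provides html_nodes().
if (has_export("rvest", "html_nodes")) {
  message("rvest ", as.character(packageVersion("rvest")), " exports html_nodes()")
} else {
  message("rvest is missing or too old; update it before running the gist")
}
```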

@renkun-ken (author)

rvest has developed to the point where httr is no longer required here: html() handles both downloading and parsing, so library(httr) can be omitted.

@pssguy commented Sep 13, 2014

Thanks. It's amazing what you can produce in one second now.
Did you try SelectorGadget in conjunction with html_node()?

Although some of the country names do not line up, I used the choroplethr package to construct a map.
Is there a way to pipe this? It would be nice to have it as, say, a side effect, so as to produce both the map and the plot from the same basic code.

What I did was assign your code, down to table(), to a variable tab and then add:

df <- data.frame(tab$value)
names(df) <- c("region","value")
choroplethr(df,"world")

I guess it could be more elegant.
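For reference, this is what the conversion step does to a one-way table in plain base R. The counts below are made up; in the snippet above, tab is a Pipe object, and tab$value is presumably used to pull the underlying table out of it before calling data.frame().

```r
# Made-up nationality counts standing in for the table() result.
counts <- table(c("Austria", "Germany", "Austria", "USA"))

# data.frame() on a one-way table gives columns Var1 (the levels) and Freq.
df <- data.frame(counts)
names(df) <- c("region", "value")

# Var1 comes back as a factor; convert it before matching names against a map.
df$region <- as.character(df$region)
print(df)
```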

@renkun-ken (author)

Thanks, @pssguy, for introducing me to this fantastic map package! Unfortunately, it uses a fixed set of country names and does not support aliases (like USA). Here is how the code could be organized with Pipe():

library(pipeR) # https://github.com/renkun-ken/pipeR
library(rlist) # https://github.com/renkun-ken/rlist
library(rvest) # https://github.com/hadley/rvest
library(stringr) # https://github.com/hadley/stringr

ptable <- Pipe("http://www.r-project.org/foundation/memberlist.html")$
  html()$ # download and parse the page
  html_nodes(xpath = "//table[2]//td//text() | //table[3]//td//text()")$ # use xpath to scrape the name list
  html_text(trim = TRUE)$
  str_match_all(".+\\s\\((.+)\\)")$ # select their nations
  list.rbind()[,2]$ # combine to matrix and select nation column
  str_split(", ")$ # some members have multiple nationalities
  unlist()$
  table()

ptable$sort(decreasing = TRUE)$
  head(10)$
  barplot(main = "Where do R's supportive members mainly come from?")  

library(choroplethr)

ptable$data.frame()$
  # some work to transform the data
  setNames(c("region","value"))$
  choroplethr("world")
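The alias problem (USA here, UK in the map) can be handled by recoding names before the data reaches choroplethr. This is a minimal sketch under stated assumptions: the alias table and the lower-case target spellings are hypothetical and must be checked against the region names that the installed choroplethr version expects.

```r
# Hypothetical alias map: the target spellings are assumptions, not
# choroplethr's documented region names.
aliases <- c(USA = "united states of america", UK = "united kingdom")

recode_regions <- function(x, map) {
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  tolower(x)  # assumption: the map's region names are lower case
}

df <- data.frame(region = c("Austria", "USA", "UK"), value = c(2, 1, 1),
                 stringsAsFactors = FALSE)
df$region <- recode_regions(df$region, aliases)
print(df)
```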

@renkun-ken (author)

If you really want to keep the main pipeline (plotting the bar chart) and branch off it to draw a map, the following code works:

library(pipeR) # https://github.com/renkun-ken/pipeR
library(rlist) # https://github.com/renkun-ken/rlist
library(rvest) # https://github.com/hadley/rvest
library(stringr) # https://github.com/hadley/stringr
library(choroplethr)

ptable <- Pipe("http://www.r-project.org/foundation/memberlist.html")$
  html()$ # download and parse the page
  html_nodes(xpath = "//table[2]//td//text() | //table[3]//td//text()")$ # use xpath to scrape the name list
  html_text(trim = TRUE)$
  str_match_all(".+\\s\\((.+)\\)")$ # select their nations
  list.rbind()[,2]$ # combine to matrix and select nation column
  str_split(", ")$ # some members have multiple nationalities
  unlist()$
  table()$
  .(~ Pipe(.)$
      data.frame()$
      setNames(c("region","value"))$
      choroplethr("world") -> map)$
  sort(decreasing = TRUE)$
  head(10)$
  barplot(main = "Where do R's supportive members mainly come from?")$
  .(~ print(map))

This requires pipeR v0.5, which allows -> for assignment; otherwise you can only use the less elegant = :)
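The .(~ ...) step acts like a tee: it runs a side branch on the current value, keeps the branch's result, and passes the value on unchanged. Here is the same idea in plain base R, without pipeR. tee() is a hypothetical helper written for illustration, not part of pipeR's API, and it stores the branch result in an environment instead of using -> assignment.

```r
side <- new.env()

# Hypothetical helper: evaluate branch(x), store the result under `name`
# in `env`, and return x unchanged so the main pipeline can continue.
tee <- function(x, branch, name, env) {
  assign(name, branch(x), envir = env)
  x
}

counts <- table(c("Austria", "Germany", "Austria", "USA"))

top <- head(sort(tee(counts,
                     function(t) setNames(data.frame(t), c("region", "value")),
                     "map_data", side),
                 decreasing = TRUE), 10)

print(top)            # main branch: sorted counts
print(side$map_data)  # side branch: data frame built from the same table
```

Returning x untouched is what lets the bar-chart steps continue after the branch, which is the same reason the pipeR version can keep sort(), head(), and barplot() chained after the map step.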

@pssguy commented Sep 15, 2014

Thanks very much.
Yes, Ari has done a good job on choroplethr, and it works in Shiny and R Markdown, which I don't believe some of the attractive rMaps options do. There is also a problem with country aliases, e.g. UK in my example.
