GitHub Gist Comments Feed Generator in R (this is how much I hate Ruby)
# Roll your own GitHub Gist Comments Feed in R
library(xml2) # github version
library(rvest) # github version
library(stringr) # for str_trim & str_replace
library(dplyr) # for data_frame & bind_rows
library(pbapply) # free progress bars for everyone!
library(XML) # to build the Atom feed
who <- "hrbrmstr" # CHANGE ME!
# Grab the user's gist feed -----------------------------------------------
gist_feed <- sprintf("", who)
feed_pg <- read_xml(gist_feed)
ns <- xml_ns_rename(xml_ns(feed_pg), d1 = "feed")
# Extract the links & titles of the gists in the feed ---------------------
links <- xml_attr(xml_find_all(feed_pg, "//feed:entry/feed:link", ns), "href")
titles <- xml_text(xml_find_all(feed_pg, "//feed:entry/feed:title", ns))
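# A minimal offline sketch of the namespace handling used above, in case it
# helps to see it without hitting the network. Assumptions (not from the
# original script): the inline document, the example.com link and every
# demo_* name are made up for illustration. xml2 labels a default namespace
# "d1", and xml_ns_rename() gives it a friendlier prefix for XPath queries.
demo_atom <- read_xml(paste0(
  '<feed xmlns="http://www.w3.org/2005/Atom">',
  '<entry><title>demo gist</title>',
  '<link href="https://example.com/demo"/></entry>',
  '</feed>'))
demo_ns <- xml_ns_rename(xml_ns(demo_atom), d1 = "feed")
demo_links <- xml_attr(xml_find_all(demo_atom, "//feed:entry/feed:link", demo_ns), "href")
demo_titles <- xml_text(xml_find_all(demo_atom, "//feed:entry/feed:title", demo_ns))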
#' This function does the hard part by iterating over the
#' links/titles and building a tbl_df of all the comments per-gist
get_comments <- function(links, titles) {
  bind_rows(pblapply(seq_along(links), function(i) {
    # get gist
    pg <- read_html(links[i])
    # look for comments
    ref <- tryCatch(html_attr(html_nodes(pg, "div.timeline-comment-wrapper a[href^='#gistcomment']"), "href"),
                    error=function(e) character(0))
    # in theory if 'ref' exists then the rest will
    if (length(ref) != 0) {
      # if there were comments, get all the metadata we care about
      author <- html_text(html_nodes(pg, "div.timeline-comment-wrapper"))
      timestamp <- html_attr(html_nodes(pg, "div.timeline-comment-wrapper time"), "datetime")
      contentpg <- str_trim(html_text(html_nodes(pg, "div.timeline-comment-wrapper div.comment-body")))
    } else {
      ref <- author <- timestamp <- contentpg <- character(0)
    }
    # bind_rows ignores length 0 tbl_df's, so return an empty one
    # whenever any of the fields came back empty
    if (any(lengths(list(ref, author, timestamp, contentpg)) == 0)) {
      return(data_frame())
    }
    return(data_frame(title=titles[i], link=links[i],
                      ref=ref, author=author,
                      timestamp=timestamp, contentpg=contentpg))
  }))
}
comments <- get_comments(links, titles)
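# A small sketch of why an empty data frame is a safe "no comments" return
# value: bind_rows() simply skips zero-row pieces. Everything here is
# illustrative only (demo_parts, demo_bound and the sample values are
# hypothetical), and plain data.frame() stands in for data_frame().
demo_parts <- list(
  data.frame(),                                   # a gist with no comments
  data.frame(title="demo gist", ref="#gistcomment-1",
             stringsAsFactors=FALSE))             # a gist with one comment
demo_bound <- bind_rows(demo_parts)
# demo_bound should hold only the row from the second piece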
feed <- xmlTree("feed")
feed$addNode("id", sprintf("user:%s", who))
feed$addNode("title", sprintf("%s's gist comments", who))
feed$addNode("icon", "")
feed$addNode("link", attrs=list(href=sprintf("", who)))
feed$addNode("updated", format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz="GMT"))
for (i in seq_len(nrow(comments))) {
  cmt <- comments[i, ] # one comment per feed entry
  feed$addNode("entry", close=FALSE)
  feed$addNode("id", sprintf("gist:comment:%s:%s", who, cmt$timestamp))
  feed$addNode("link", attrs=list(href=sprintf("%s%s", cmt$link, cmt$ref)))
  feed$addNode("title", sprintf("Comment by %s", cmt$author))
  feed$addNode("updated", cmt$timestamp)
  feed$addNode("author", close=FALSE)
  feed$addNode("name", cmt$author)
  feed$closeTag() # closes <author>
  feed$addNode("content", saveXML(xmlTextNode(as.character(cmt$contentpg)), prefix=""),
               attrs=list(type="html"))
  feed$closeTag() # closes <entry>
}
rss <- str_replace(saveXML(feed), "<feed>", '<feed xmlns="">')
writeLines(rss, con="feed.xml")
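# A tiny standalone sketch of the xmlTree() nesting pattern the feed builder
# relies on: addNode(..., close=FALSE) leaves an element open so later nodes
# become its children, and closeTag() closes the most recently opened one.
# The demo_* names and the "someone" author are hypothetical.
demo_tree <- xmlTree("feed")
demo_tree$addNode("entry", close=FALSE)
demo_tree$addNode("title", "Comment by someone")
demo_tree$addNode("author", close=FALSE)
demo_tree$addNode("name", "someone")
demo_tree$closeTag() # closes <author>
demo_tree$closeTag() # closes <entry>
demo_xml <- saveXML(demo_tree)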
<?xml version="1.0"?>
<feed xmlns="">
<title>hrbrmstr's gist comments</title>
<link href=""/>
<link href=""/>
<title>Comment by ateucher</title>
<content type="html">Very nice! Regarding the extreme values, is truncating them back to maximum the right thing to do? Or should they &amp;quot;wrap&amp;quot; into the other half of the globe (eg, rather than converting -186.0 to -179.99999, should it actually be 174.0?)
I ask this out of ignorance of the source of the errors...</content>
<link href=""/>
<title>Comment by hrbrmstr</title>
<name>hrbrmstr</name>
<content type="html">That&amp;apos;s a good question. I&amp;apos;m going to post this to the r-sig-geo list to get some feedback.</content>
<link href=""/>
<title>Comment by ateucher</title>
<content type="html"></content>
<link href=""/>
<title>Comment by ateucher</title>
<content type="html">So I don&amp;apos;t think that chopping it at 180 is the answer, as those values &amp;gt; 180 are actually &amp;apos;valid&amp;apos;, as Russia, Fiji, and Antarctica all cross the 180th meridian ( But I don&amp;apos;t know what the answer is - see the &amp;apos;software representation problems&amp;apos; in the Wikipedia article - we&amp;apos;re not alone :)
world &amp;lt;- map_data(&amp;quot;world&amp;quot;)
gg &amp;lt;- ggplot()
gg &amp;lt;- gg + geom_map(data=world, map=world,
aes(x=long, y=lat, map_id=region))
gg &amp;lt;- gg + xlim(c(170, 200)) + ylim(c(60, 70))
<link href=""/>
<title>Comment by hrbrmstr</title>
<content type="html">As I posted on Twitter (adding it here just for folks who stumble on this via my blog post) i totally knew I was DESTROYING THE EARTH with that hack ;-) rworldmap::getMap() has a cleaner shapefile for the world that doesn&amp;apos;t impact this, but I do need to do something about this before it becomes &amp;quot;a real thing&amp;quot; for folks. No replies from r-sig-geo yet but I&amp;apos;ll research over the weekend and see what I can come up with. It won&amp;apos;t be super-scary math, but i need to ensure I cover all the edge cases (no pun intended).</content>
<link href=""/>
<title>Comment by hadley</title>
<content type="html">There some good stuff on the general problem in</content>
<link href=""/>
<title>Comment by hrbrmstr</title>
<content type="html">heh. that site of Bostock&amp;apos;s always makes me dizzy. thx for that, tho. hopefully won&amp;apos;t be too hard to work around.</content>
<link href=""/>
<title>Comment by hrbrmstr</title>
<content type="html">This comment is solely to see if the IFTTT action is working</content>
<link href=""/>
<title>Comment by abresler</title>
<content type="html">This code is so clean just wanted to say nice!!!</content>
<link href=""/>
<title>Comment by bearloga</title>
<content type="html">Error in UseMethod(&amp;quot;html_nodes&amp;quot;) :
no applicable method for &amp;apos;html_nodes&amp;apos; applied to an object of class &amp;quot;c(&amp;apos;xml_document&amp;apos;, &amp;apos;xml_node&amp;apos;)&amp;quot;
:\ Have you seen that error?
P.S. My machine has:
Package Version
1 xml2 0.1.1
2 rvest 0.2.0
3 htmltools 0.2.6</content>
<link href=""/>
<title>Comment by hrbrmstr</title>
<content type="html">aye. i just made a note in the source.
rvest * 2015-06-21 Github (hadley/rvest@9461bc4) is what I&amp;apos;m using. I think i can tweak this, tho.</content>
<link href=""/>
<title>Comment by hrbrmstr</title>
<content type="html">and, it should work on stable and github versions</content>
<link href=""/>
<title>Comment by cpsievert</title>
<content type="html">I think you want xml2, not xml</content>
<link href=""/>
<title>Comment by hrbrmstr</title>
<content type="html">aye. thxk @cpsievert. v1 was beautiful. v2+ has been coded whilst catching up from being on vacation and dealing with the morning routine.</content>
<link href=""/>
<title>Comment by irichgreen</title>
<content type="html">Hi,
I&amp;apos;ve got an error message in the line number 9 code.
&amp;quot;us &amp;lt;- readOGR(&amp;quot;us_states_hexgrid.geojson&amp;quot;, &amp;quot;OGRGeoJSON&amp;quot;)&amp;quot;
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
GDAL Error 3: Cannot open file &amp;apos;us_states_hexgrid.geojson&amp;apos;
Could you please resolve it?</content>
<link href=""/>
<title>Comment by bnjcbsn</title>
<content type="html">Curious about this error as well. Interesting topic.</content>
<link href=""/>
<title>Comment by hrbrmstr</title>
<content type="html">I really need to figure out how to get notices abt comments on gists
You need the latest gdal library and the a fresh install of rgdal
You need the shapefile referenced in the previous blog post. Here&amp;apos;s the link to said shapefile
I also added it here</content>
<link href=""/>
<title>Comment by timelyportfolio</title>
<content type="html">freaking awesome</content>
