Created
April 29, 2012 01:50
-
-
Save flodel/2523265 to your computer and use it in GitHub Desktop.
[R] Scrape crantastic.org for most used packages
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
library(plyr) | |
library(XML) | |
# build a vector of URL pages we'll want to use | |
urls <- paste("http://crantastic.org/popcon?page=", 1:10, sep = "") | |
# scrape all the data from the URLs into one big data.frame | |
packages.df <- ldply(urls, function(url)readHTMLTable(url)[[1]]) | |
# turn the "Users" column from factor to numeric | |
packages.df$Users <- as.numeric(as.character(packages.df$Users)) | |
# sort by decreasing "Users" | |
packages.df <- arrange(packages.df, desc(Users)) | |
# print the 50 most used packages | |
head(packages.df$`Package Name`, 50) | |
# [1] ggplot2 plyr data.table reshape Sim.DiffProc | |
# [6] Sim.DiffProcGUI lme4 Hmisc lattice RODBC | |
# [11] randomForest xtable RColorBrewer stringr sp | |
# [16] RSQLite MASS foreign RTextTools quantmod | |
# [21] foreach xts reshape2 XML car | |
# [26] cacheSweave twitteR zoo rgl maxent | |
# [31] Matrix survival latticeExtra ape maptools | |
# [36] survey RMySQL vegan rJava nlme | |
# [41] xlsReadWrite tikzDevice PerformanceAnalytics rpart MCMCglmm | |
# [46] rgdal boot Rcmdr Rcpp fortunes |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment