@mokjpn
Created February 23, 2016 06:54
Read MediaWiki's WikiTable text and convert it into an R data frame
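# Read MediaWiki's WikiTable text and convert it into a data frame
lines <- readLines(file.choose())
columns <- NULL              # header names collected from "! ..." lines
current_row <- data.frame()  # cells of the row being assembled
df <- data.frame()           # accumulated result
cell_index <- 0              # position of the current cell within the row
for (line in lines) {
  # Header cell: "! Name"
  if ((m <- sub("^\\! *(.*)$", "\\1", line)) != line) {
    columns <- append(columns, trimws(m))
  }
  # Row separator "|-" or table end "|}": flush the finished row
  if (grepl("^\\|[-}]", line)) {
    if (nrow(current_row) != 0) df <- rbind(df, current_row)
    current_row <- data.frame()
    cell_index <- 0
  }
  # Data cell: "| value"
  if ((m <- sub("^\\| (.*)$", "\\1", line)) != line) {
    cell_index <- cell_index + 1
    cell <- data.frame(trimws(m), stringsAsFactors = FALSE)
    colnames(cell) <- columns[cell_index]
    if (nrow(current_row) != 0) {
      current_row <- cbind(current_row, cell)
    } else {
      current_row <- cell
    }
  }
}
# Flush the last row if the input ended without a closing "|}"
if (nrow(current_row) != 0) df <- rbind(df, current_row)
# Resulting data frame
df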
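
To try the approach without picking a file, the same line-by-line parse can be sketched as a function over a character vector. This is a minimal sketch, not the original script: the function name `parse_wikitable` and the sample table are illustrative assumptions, and it collects each row as a character vector instead of building one-cell data frames. It assumes one cell per line, a space after each cell's leading `|`, and the same number of cells in every row as there are headers.

```r
# Hypothetical helper (not part of the original script): parse WikiTable
# markup held in a character vector, one table line per element.
parse_wikitable <- function(lines) {
  header <- character(0)  # names from "! ..." lines
  rows <- list()          # finished rows, each a character vector
  cells <- character(0)   # cells of the row being assembled
  for (line in lines) {
    if (grepl("^!", line)) {
      # Header cell: strip the leading "!" and surrounding whitespace
      header <- c(header, trimws(sub("^!", "", line)))
    } else if (grepl("^\\|[-}]", line)) {
      # Row separator "|-" or table end "|}": store the finished row, if any
      if (length(cells) > 0) rows[[length(rows) + 1]] <- cells
      cells <- character(0)
    } else if (grepl("^\\| ", line)) {
      # Data cell: strip the leading "| " and surrounding whitespace
      cells <- c(cells, trimws(sub("^\\| ", "", line)))
    }
  }
  # Keep a trailing row when the input lacks a closing "|}"
  if (length(cells) > 0) rows[[length(rows) + 1]] <- cells
  if (length(rows) == 0) return(data.frame())
  # Assumption: every row has exactly length(header) cells
  df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  colnames(df) <- header
  df
}

# Sample input, made up for illustration
wiki <- c(
  "{| class=\"wikitable\"",
  "|-",
  "! Name",
  "! Age",
  "|-",
  "| Alice",
  "| 30",
  "|-",
  "| Bob",
  "| 25",
  "|}"
)
result <- parse_wikitable(wiki)
print(result)
```

All values come back as character strings; converting columns to numbers or dates is left to the caller.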