Last active
November 20, 2016 12:43
-
-
Save andrie/a9ff69a274963c70b72f to your computer and use it in GitHub Desktop.
Functions to read Google Docs spreadsheet into R
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
library(XML) | |
cleanGoogleTable <- function(dat, table=1, skip=0, ncols=NA, nrows=-1, header=TRUE, dropFirstCol=NA){ | |
if(!is.data.frame(dat)){ | |
dat <- dat[[table]] | |
} | |
if(is.na(dropFirstCol)) { | |
firstCol <- na.omit(dat[[1]]) | |
if(all(firstCol == ".") || all(firstCol== as.character(seq_along(firstCol)))) { | |
dat <- dat[, -1] | |
} | |
} else if(dropFirstCol) { | |
dat <- dat[, -1] | |
} | |
if(skip > 0){ | |
dat <- dat[-seq_len(skip), ] | |
} | |
if(nrow(dat) == 1) return(dat) | |
if(nrow(dat) >= 2){ | |
if(all(is.na(dat[2, ]))) dat <- dat[-2, ] | |
} | |
if(header && nrow(dat) > 1){ | |
header <- as.character(dat[1, ]) | |
names(dat) <- header | |
dat <- dat[-1, ] | |
} | |
# Keep only desired columns | |
if(!is.na(ncols)){ | |
ncols <- min(ncols, ncol(dat)) | |
dat <- dat[, seq_len(ncols)] | |
} | |
# Keep only desired rows | |
if(nrows > 0){ | |
nrows <- min(nrows, nrow(dat)) | |
dat <- dat[seq_len(nrows), ] | |
} | |
# Rename rows | |
rownames(dat) <- seq_len(nrow(dat)) | |
dat | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
readGoogleSheet <- function(url, na.string="", header=TRUE){ | |
stopifnot(require(XML)) | |
# Suppress warnings because Google docs seems to have incomplete final line | |
suppressWarnings({ | |
doc <- paste(readLines(url), collapse=" ") | |
}) | |
if(nchar(doc) == 0) stop("No content found") | |
htmlTable <- gsub("^.*?(<table.*</table).*$", "\\1>", doc) | |
ret <- readHTMLTable(htmlTable, header=header, stringsAsFactors=FALSE, as.data.frame=TRUE) | |
lapply(ret, function(x){ x[ x == na.string] <- NA; x}) | |
} |
I have the same issue as colinglaes: trying to read a private spreadsheet.
Hi, It works if you use the sharable link from google doc, without web publishing
best
Humm... does anyone know how to import more than 99 lines?
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hey Andrie,
I would love to use this code for a project but was wondering if there was any way to use a private google doc so that I don't have to publish it online, the information i want to utilize is sensitive.
Thanks,
Colin