Skip to content

Instantly share code, notes, and snippets.

@jennybc
Created May 23, 2015 20:27
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jennybc/279ab4675f5f4d275350 to your computer and use it in GitHub Desktop.
Save jennybc/279ab4675f5f4d275350 to your computer and use it in GitHub Desktop.
2015-05-23_read-excel-sheet

Puzzling interaction of read_excel and a Google Sheet

Jenny Bryan
23 May, 2015

library("readxl")

I'm getting puzzling behavior from the sheet = argument of readxl::read_excel() with a small spreadsheet I use for examples and tests in googlesheets. The file is mini-gap.xlsx and it was created by downloading a Google Sheet as an xlsx file. Basically, I don't get the sheet I'm requesting.

mini_gap <- read_excel("mini-gap.xlsx")
  #read_excel(system.file("mini-gap.xlsx", package = "googlesheets"))
  #https://github.com/jennybc/googlesheets/blob/master/inst/mini-gap.xlsx
str(mini_gap)
## Classes 'tbl_df', 'tbl' and 'data.frame':	5 obs. of  6 variables:
##  $ country  : chr  "Albania" "Austria" "Belgium" "Bosnia and Herzegovina" ...
##  $ continent: chr  "Europe" "Europe" "Europe" "Europe" ...
##  $ year     : num  1952 1952 1952 1952 1952
##  $ lifeExp  : num  55.2 66.8 68 53.8 59.6
##  $ pop      : num  1282697 6927772 8730405 2791000 7274900
##  $ gdpPercap: num  1601 6137 8343 974 2444
mini_gap$continent
## [1] "Europe" "Europe" "Europe" "Europe" "Europe"

The "Europe" sheet actually appears as the 4th tab. Why is it the default sheet, i.e. the first sheet?

Let's systematically request the sheets 1 through 5.

continents <- sapply(1:5, function(i) {
  mini_gap <- read_excel("mini-gap.xlsx", sheet = i)
  mini_gap$continent[1]
})
continents
## [1] "Europe"   "Oceania"  "Americas" "Africa"   "Asia"

The tabs are arranged alphabetically in the spreadsheet but when read_excel() access them via the integers 1 through 5, we get Europe (4), Oceania (5), Americas (2), Africa (1), Asia (3).

Let's systematically request the sheet by name.

(continents <- sort(continents))
## [1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania"
sapply(continents, function(i) {
  mini_gap <- read_excel("mini-gap.xlsx", sheet = i)
  mini_gap$continent[1]
})
##     Africa   Americas       Asia     Europe    Oceania 
##   "Europe"  "Oceania" "Americas"   "Africa"     "Asia"

When I access the sheets by name, I get the same strange behavior.

If I open mini-gap.xlsx in Excel and save it the file, this weird behavior goes away. I've confirmed all of this with a second sheet downloaded as xlsx from Google Sheets. I don't know if Google writes non-standard xlsx or ???

---
title: "Puzzling interaction of read_excel and a Google Sheet"
author: "Jenny Bryan"
date: "23 May, 2015"
output:
html_document:
keep_md: yes
theme: cosmo
---
```{r}
library("readxl")
```
I'm getting puzzling behavior from the `sheet =` argument of `readxl::read_excel()` with a small spreadsheet I use for examples and tests in [`googlesheets`](https://github.com/jennybc/googlesheets). The file is [mini-gap.xlsx](https://github.com/jennybc/googlesheets/blob/master/inst/mini-gap.xlsx) and it was created by downloading a Google Sheet as an `xlsx` file. Basically, I don't get the sheet I'm requesting.
```{r}
mini_gap <- read_excel("mini-gap.xlsx")
#read_excel(system.file("mini-gap.xlsx", package = "googlesheets"))
#https://github.com/jennybc/googlesheets/blob/master/inst/mini-gap.xlsx
str(mini_gap)
mini_gap$continent
```
The "Europe" sheet actually appears as the 4th tab. Why is it the default sheet, i.e. the first sheet?
Let's systematically request the sheets 1 through 5.
```{r}
continents <- sapply(1:5, function(i) {
mini_gap <- read_excel("mini-gap.xlsx", sheet = i)
mini_gap$continent[1]
})
continents
```
The tabs are arranged alphabetically in the spreadsheet but when `read_excel()` access them via the integers 1 through 5, we get Europe (4), Oceania (5), Americas (2), Africa (1), Asia (3).
Let's systematically request the sheet by name.
```{r}
(continents <- sort(continents))
sapply(continents, function(i) {
mini_gap <- read_excel("mini-gap.xlsx", sheet = i)
mini_gap$continent[1]
})
```
When I access the sheets by name, I get the same strange behavior.
If I open `mini-gap.xlsx` in Excel and save it the file, this weird behavior goes away. I've confirmed all of this with a second sheet downloaded as `xlsx` from Google Sheets. I don't know if Google writes non-standard `xlsx` or ???
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment