@markwh markwh/responseAnalysis.Rmd Secret
Last active Nov 2, 2015

October 2015 GRiD interim elections and survey analysis
---
output:
  html_document:
    keep_md: true
---
```{r, message = FALSE, warning = FALSE}
library(knitr)
opts_chunk$set(fig.width = 12, warning = FALSE, message = FALSE)
```
The big news out of GRiD this past week was the interim elections for GRiD club officers. While having a full executive board is very exciting, the election data unfortunately are not, since everybody was running unopposed. Just for fun, I'll do a visualization of the results anyway. Plus, I'll look at the responses to two survey questions on the election form. All analysis will be conducted using R, with figures made using the **ggplot2** package.
### Step 1: Getting the data
The elections were conducted online using Google Forms, which returns results as a CSV file, so loading the data is straightforward:
```{r}
responses = read.csv("responses.csv")
```
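One thing worth knowing about `read.csv`: by default it rewrites column headers into syntactically valid R names, so a Google Forms header such as "Co-Chair of Operations" comes back as `Co.Chair.of.Operations`. Here is a minimal sketch using a tiny inline table. The two rows are made up for illustration and are not taken from the real file:

```r
# Tiny inline CSV standing in for responses.csv (made-up rows, illustration only)
csv_text <- 'Timestamp,Co-Chair of Operations,Have you completed our member survey?
10/25/2015 22:54:52,Candidate A,Yes.
10/25/2015 22:55:50,,No.'

# Default behaviour: headers are converted to syntactically valid names
demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)
names(demo)

# check.names = FALSE keeps the headers exactly as they appear in the file
demo_raw <- read.csv(text = csv_text, stringsAsFactors = FALSE, check.names = FALSE)
names(demo_raw)
```

Selecting columns by position, as the rest of this post does, sidesteps the renamed headers entirely.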
### Step 2: Choosing a ggplot theme
Since this is a post about elections, I thought I'd go with the `ggthemes::theme_fivethirtyeight` theme. That theme leaves out the axis titles, though, and I want a y-axis title, so I'll copy the axis-title settings over from `theme_bw`:
```{r}
library(ggplot2)
library(ggthemes)
theme_grid <- theme_fivethirtyeight()
theme_grid$axis.title = theme_bw()$axis.title
theme_grid$axis.title.x = theme_bw()$axis.title.x
theme_grid$axis.title.y = theme_bw()$axis.title.y
```
### Step 3: Plot the election results
The code below produces the plot that follows it. It includes some light preprocessing to get the data into the form that ggplot likes. I won't go into detail about this; have a look at the documentation for the functions below if you want to know more.
```{r}
library(reshape2)
library(dplyr)
election <- responses[, 2:6]
# Stack the five office columns into long format and drop blank votes
eln.m <- melt(election, id.vars = NULL) %>%
  filter(value != "")
# Count votes per candidate, then take each count as a share of all
# votes cast for that office
eln.m %>%
  count(variable, value) %>%
  group_by(variable) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ggplot(aes(x = value, y = pct)) +
  geom_bar(stat = "identity") +
  facet_wrap(~variable, scales = "free_x", nrow = 1) +
  theme_grid +
  theme(axis.title.x = element_blank()) +
  ylab("Percent of Vote")
```
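For a contested race, the percentages only mean something if each candidate's count is divided by the total number of ballots cast for that office, not by the size of the candidate's own group. Here is a base-R sketch with made-up ballots (the candidate names and counts are hypothetical):

```r
# Hypothetical ballots for a single office; "" marks a blank vote
ballots <- c("Candidate A", "Candidate B", "Candidate A", "", "Candidate A")

# Drop blanks, count votes per candidate, then divide by the office total
cast <- ballots[ballots != ""]
votes <- table(cast)
pct <- 100 * prop.table(votes)
pct
```

With four ballots cast, Candidate A ends up with 75% and Candidate B with 25%.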
As I said, not very exciting. Everyone got 100% of the vote. That's what happens when you run unopposed I guess.
### Step 4: Plot the survey responses
Before presenting the survey results, a brief note for future Google Form setup: *don't include commas in checkbox options!* Google returns each respondent's answers to a checkbox question as a single comma-delimited string. If the checkbox options themselves contain commas, it's impossible for R to tell the delimiter commas from the text commas. For example, here is one person's response to the question "What kinds of GRiD events are you most interested in?":
```{r}
responses[8, 8] %>%
  as.character()
```
Splitting this string by comma yields a character vector of length 7, despite only 4 checkboxes having been selected:
```{r}
responses[8, 8] %>%
  as.character() %>%
  strsplit(split = ", ") %>%
  `[[`(1)
```
To remedy this, I had to manually replace the non-separating commas with something else; I chose to use a forward slash. I did this replacement using the `gsub` function and (not so much here as in the next question) appropriately chosen [regular expressions](https://en.wikipedia.org/wiki/Regular_expression). Here is the result of doing this with the first survey question:
```{r}
fixfun <- function(x) {
  # fixed = TRUE matches each pattern literally, so "." is not a wildcard
  x %>%
    gsub(", etc.", " etc.", x = ., fixed = TRUE) %>%
    gsub("shops, inter", "shops/inter", x = ., fixed = TRUE) %>%
    gsub("tice, career", "tice/career", x = ., fixed = TRUE)
}
surv1 <- responses[, 8] %>%
  as.character() %>%
  fixfun()
```
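As a quick check of the repair, here is the same idea applied to one response from the data, written in base R so it runs without the pipe. `fix_commas` is a hypothetical stand-in for `fixfun`, and matching the patterns literally with `fixed = TRUE` is an assumption of this sketch:

```r
# One respondent's answer, with four checkboxes selected
answer <- paste0(
  "Talks by people in industry, Hands-on data projects / hackathons, ",
  "Coding, etc. tutorials, Professional events--resume workshops, ",
  "interview practice, career fairs"
)

# Hypothetical stand-in for fixfun(): replace the commas that belong to an
# option's own text, matching each pattern literally (fixed = TRUE)
fix_commas <- function(x) {
  x <- gsub(", etc.", " etc.", x, fixed = TRUE)
  x <- gsub("shops, inter", "shops/inter", x, fixed = TRUE)
  gsub("tice, career", "tice/career", x, fixed = TRUE)
}

before <- strsplit(answer, ", ", fixed = TRUE)[[1]]
after <- strsplit(fix_commas(answer), ", ", fixed = TRUE)[[1]]
length(before)  # 7 pieces
length(after)   # 4 pieces, one per checkbox
```

Once the in-text commas are gone, the number of pieces matches the number of boxes ticked.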
The next survey question had a lot of commas inside of parentheses and only one outside of parentheses. Replacement of the inside-parentheses commas is pretty simple using the right regular expression. Actually, I just did away with the parenthetical statements altogether.
```{r}
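# Fix the one comma outside parentheses, then drop each parenthetical
# along with the space in front of it
surv2 <- as.character(responses[, 9]) %>%
  gsub("works, deep", "works / deep", x = .) %>%
  gsub(" *\\([^)]+\\)", "", x = .)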
```
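To see what the parenthesis pattern does, here is a small base-R sketch on part of one answer from the data. The pattern `\\([^)]+\\)` matches an opening parenthesis, one or more characters that are not a closing parenthesis, and then the closing parenthesis. The `trimws()` call is an addition for this sketch; it tidies up the space left behind where a parenthetical was removed:

```r
topics <- paste0(
  "Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), ",
  "Bayesian statistics, Big data (Hadoop, Apache Spark)"
)

# Remove each "(...)" group; the commas inside it go with it
stripped <- gsub("\\([^)]+\\)", "", topics)

# Only separator commas remain, so splitting on ", " is now safe
items <- trimws(strsplit(stripped, ", ", fixed = TRUE)[[1]])
items
```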
Now that that's fixed, we can finally look at the survey responses!
#### Question 1: *What kinds of GRiD events are you most interested in?*
```{r}
surv1 %>%
  strsplit(split = ", ") %>%
  unlist() %>%
  as.factor() %>%
  summary() %>%
  data.frame(item = names(.), n = .) %>%
  ggplot(aes(x = item, y = n)) +
  geom_bar(stat = "identity") +
  theme_grid +
  theme(axis.title.y = element_blank()) +
  coord_flip() +
  ylab("Number Interested")
```
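The counting step in the pipeline above (split, flatten, tally) can also be written with `table()`. Here is a minimal base-R sketch with three made-up responses, sorted so the most popular option comes first:

```r
# Three made-up responses, already cleaned so that ", " only separates options
cleaned <- c(
  "Social events, Talks by students",
  "Talks by students",
  "Social events, Talks by students, Hands-on data projects / hackathons"
)

# Split each response, flatten to one vector of selections, and count them
selections <- unlist(strsplit(cleaned, ", ", fixed = TRUE))
counts <- sort(table(selections), decreasing = TRUE)

# Same two-column shape the plotting code expects: item and n
tally <- data.frame(item = names(counts), n = as.integer(counts),
                    stringsAsFactors = FALSE)
tally
```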
#### Question 2: *What data science topics do you want to learn (more) about?*
```{r}
surv2 %>%
  strsplit(", ") %>%
  unlist() %>%
  as.factor() %>%
  summary() %>%
  data.frame(item = names(.), n = .) %>%
  ggplot(aes(x = item, y = n)) +
  geom_bar(stat = "identity") +
  theme_grid +
  theme(axis.title.y = element_blank()) +
  ylab("Number Interested") +
  coord_flip()
```
Neat! So it looks like GRiD members are most interested in attending talks by people in industry and professional-development events. We'll be sure to host more of these! There is also major interest in learning big data methods (Spark, Hadoop, and the like), data visualization, Bayesian stats, and others. This is great. So many things to get excited about.
If you are a GRiD member and haven't yet completed the member survey, you can still do that [here](http://goo.gl/forms/t8pPMoAj4K).
Timestamp Co-Chair of Operations Chief Technical Officer Chief Outreach Officer Chief Networking Officer Chief Communications Officer Have you completed our member survey? Untitled Question
10/25/2015 22:54:52 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Yes.
10/25/2015 22:55:50 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Yes.
10/25/2015 22:57:16 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Social events, Talks by people in industry, Networking with other grad-student groups Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.)
10/25/2015 23:05:59 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Maybe. Social events, Talks by people in industry, Talks by academic faculty / researchers, Hands-on data projects / hackathons, Coding, etc. tutorials, Professional events--resume workshops, interview practice, career fairs Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), Bayesian statistics, Big data (Hadoop, Apache Spark), Software development (code structure, testing, etc.), Version control (git, github, svn)
10/25/2015 23:06:36 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Social events, Talks by people in industry, Talks by academic faculty / researchers, Talks by students, Networking with other grad-student groups, Hands-on data projects / hackathons, Coding, etc. tutorials, Professional events--resume workshops, interview practice, career fairs Basics of data-science languages (R, Python, etc.), Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), Regression modeling, Bayesian statistics, Machine learning libraries (scikit-learn, caret, etc.), Neural networks, deep learning, Big data (Hadoop, Apache Spark), Software development (code structure, testing, etc.), Databases (sql, mongoDB, ???)
10/25/2015 23:13:00 Kostis Gourgoulias Andy Smith Emma Kearney Yes.
10/25/2015 23:29:05 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Talks by people in industry, Hands-on data projects / hackathons, Coding, etc. tutorials, Professional events--resume workshops, interview practice, career fairs Machine learning libraries (scikit-learn, caret, etc.), Big data (Hadoop, Apache Spark), Databases (sql, mongoDB, ???)
10/26/2015 1:23:07 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No.
10/26/2015 6:54:44 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Social events, Talks by people in industry, Talks by academic faculty / researchers, Talks by students, Coding, etc. tutorials, Professional events--resume workshops, interview practice, career fairs Basics of data-science languages (R, Python, etc.), Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), Regression modeling, Big data (Hadoop, Apache Spark), Writing web applications (Shiny, javascript, Docker)
10/26/2015 9:48:44 Kostis Gourgoulias Andy Smith Emma Kearney No. Social events, Talks by people in industry, Talks by academic faculty / researchers, Hands-on data projects / hackathons Basics of data-science languages (R, Python, etc.), Bayesian statistics, Machine learning libraries (scikit-learn, caret, etc.), Big data (Hadoop, Apache Spark)
10/26/2015 9:53:22 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Talks by academic faculty / researchers, Talks by students, Coding, etc. tutorials Basics of data-science languages (R, Python, etc.), Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), Regression modeling, Software development (code structure, testing, etc.)
10/26/2015 10:43:13 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Yes.
10/26/2015 10:48:05 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Maybe. Talks by people in industry, Networking with other grad-student groups, Professional events--resume workshops, interview practice, career fairs Big data (Hadoop, Apache Spark), Software development (code structure, testing, etc.), Databases (sql, mongoDB, ???)
10/26/2015 11:00:50 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Maybe. Talks by people in industry, Talks by academic faculty / researchers, Talks by students, Coding, etc. tutorials, Professional events--resume workshops, interview practice, career fairs Basics of data-science languages (R, Python, etc.), Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), Bayesian statistics, Software development (code structure, testing, etc.), Version control (git, github, svn), Writing web applications (Shiny, javascript, Docker)
10/26/2015 17:57:52 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Yes.
10/26/2015 21:52:01 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Yes.
10/27/2015 8:20:35 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Social events, Talks by people in industry, Talks by academic faculty / researchers, Talks by students, Networking with other grad-student groups, Hands-on data projects / hackathons, Coding, etc. tutorials, Professional events--resume workshops, interview practice, career fairs Basics of data-science languages (R, Python, etc.), Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), Bayesian statistics, Machine learning libraries (scikit-learn, caret, etc.), Big data (Hadoop, Apache Spark), Version control (git, github, svn), Databases (sql, mongoDB, ???)
10/27/2015 14:39:17 Kostis Gourgoulias Andy Smith Yes.
10/27/2015 16:43:06 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Social events, Talks by people in industry, Talks by students, Networking with other grad-student groups, Hands-on data projects / hackathons, Professional events--resume workshops, interview practice, career fairs Bayesian statistics, Machine learning libraries (scikit-learn, caret, etc.), Neural networks, deep learning, Big data (Hadoop, Apache Spark), Software development (code structure, testing, etc.), Databases (sql, mongoDB, ???)
10/27/2015 21:04:26 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar Maybe. Talks by people in industry, Networking with other grad-student groups, Hands-on data projects / hackathons, Coding, etc. tutorials, Professional events--resume workshops, interview practice, career fairs Data visualization (ggplot2, d3.js, tableau, htmlwidgets, etc.), Big data (Hadoop, Apache Spark), Databases (sql, mongoDB, ???), Writing web applications (Shiny, javascript, Docker)
10/28/2015 8:08:30 Anupama Pasumarthy Kostis Gourgoulias Andy Smith Emma Kearney Ankita Shankhdhar No. Talks by people in industry, Talks by academic faculty / researchers, Talks by students, Networking with other grad-student groups, Professional events--resume workshops, interview practice, career fairs Bayesian statistics, Big data (Hadoop, Apache Spark)