@mrcsparker
Last active November 13, 2015 15:03
R in 10 minutes

Sample R Code

# Filtering data sets
install.packages("dplyr")

# Visualizing data sets
install.packages("ggplot2")

# Clean up data sets
install.packages("tidyr")

Load movie data

# Load dplyr
library(dplyr)

# Load ggplot2
library(ggplot2)

# Load movies
movies <- read.csv("~/movies.csv")

# Preview the first six rows of the data
head(movies)
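
If movies.csv is not at hand, a small stand-in data frame is enough to try the ideas in this gist. This is only a sketch: the name `sample_movies` and every row in it are made up, and only the column names `title` and `genres` come from the code in this gist.

```r
# Hypothetical stand-in for ~/movies.csv. The rows are invented; only the
# column names `title` and `genres` are taken from the examples here.
sample_movies <- data.frame(
  title = c("Example One (2014)", "Example Two (2014)", "Example Three (2015)"),
  genres = c("Drama", "Comedy|Drama", "Horror"),
  stringsAsFactors = FALSE
)

# Column names and types
str(sample_movies)

# Number of rows
nrow(sample_movies)
```

`str()` complements `head()`: it shows each column's type rather than the first rows.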

Filtering data

# Find movies whose genres value is exactly 'Drama'
movies %>%
  filter(genres == 'Drama') %>%
  head()


# Filter by year (matched anywhere in the title) and genre (matched anywhere in genres)
movies %>%
  filter(grepl("2014", title), grepl("Drama", genres)) %>%
  head()
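
Note that `grepl("2014", title)` matches "2014" anywhere in the title, not only a release year. A possible refinement, sketched below, pulls the year into its own column first. It assumes titles end in "(YYYY)", which this gist does not confirm, and `sample_movies`, `with_year` and `dramas_2014` are hypothetical names with made-up rows.

```r
library(dplyr)

# Hypothetical rows; the trailing "(YYYY)" title format is an assumption.
sample_movies <- data.frame(
  title = c("2014: An Example (1999)", "Example Drama (2014)", "Example Comedy (2014)"),
  genres = c("Drama", "Comedy|Drama", "Comedy"),
  stringsAsFactors = FALSE
)

# Extract the four digits from a trailing "(YYYY)" into an integer column.
# Titles that do not follow the pattern would become NA (with a warning).
with_year <- sample_movies %>%
  mutate(year = as.integer(sub(".*\\(([0-9]{4})\\)[[:space:]]*$", "\\1", title)))

# Compare the year column instead of searching the whole title.
dramas_2014 <- with_year %>%
  filter(year == 2014, grepl("Drama", genres))

dramas_2014
```

The first row has "2014" in its title and a Drama genre, but it is filtered out because its extracted year is 1999.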

Functions

# Turn the above code into a function
byYear <- function(movies, y) {
  movies %>%
    filter(grepl(y, title))
}

byGenreAndYear <- function(movies, g, y) {
  movies %>%
    byYear(y) %>%
    filter(grepl(g, genres))
}

# Count the number of movies
movieCount <- function(movies) {
  movies %>% summarize(n = n())
}

# Combine (arguments are genre first, then year)
movies %>%
  byGenreAndYear("Drama", 2014) %>%
  movieCount()
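
`movieCount()` returns a single total. For one count per distinct `genres` value, dplyr's `count()` is one option. The sketch below uses made-up rows, and `sample_movies` and `genre_counts` are hypothetical names.

```r
library(dplyr)

# Hypothetical rows, for illustration only.
sample_movies <- data.frame(
  title = c("A (2014)", "B (2014)", "C (2014)", "D (2015)", "E (2015)"),
  genres = c("Drama", "Drama", "Comedy|Drama", "Horror", "Drama"),
  stringsAsFactors = FALSE
)

# One row per distinct genres value, with the count in column `n`,
# sorted so the most frequent value comes first.
genre_counts <- sample_movies %>%
  count(genres, sort = TRUE)

genre_counts
```

Each distinct `genres` string is counted as it appears, so "Comedy|Drama" is a separate group from "Drama".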

Graphs with ggplot2

m <- movies %>% byYear(2015) %>% head(100)

# Not readable
# (inside aes(), refer to columns by name rather than with m$)
ggplot(m) + geom_bar(aes(genres))

# Much better
ggplot(m, aes(genres)) + geom_bar() + coord_flip()

# Bring down the number of items
n <- movies %>% byGenreAndYear("Horror", 2015)
ggplot(n, aes(genres)) + geom_bar() + coord_flip()
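
Another way to cut down the number of bars is to split combined genre values into one row per genre with tidyr, which is installed above but not otherwise used. This is only a sketch: the "|" separator is an assumption about the file, and `sample_movies`, `by_genre` and `p` are hypothetical names with made-up rows.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical rows; the "|" separator between genres is an assumption.
sample_movies <- data.frame(
  title = c("A (2015)", "B (2015)", "C (2015)"),
  genres = c("Comedy|Drama", "Drama", "Horror|Thriller"),
  stringsAsFactors = FALSE
)

# One row per movie/genre pair. `sep` is a regular expression,
# so the pipe character has to be escaped.
by_genre <- sample_movies %>%
  separate_rows(genres, sep = "\\|")

# One bar per individual genre instead of one per combination.
p <- ggplot(by_genre, aes(genres)) + geom_bar() + coord_flip()
p
```

After the split, a movie with several genres is counted once in each of its genre bars.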

Follow up

RStudio

Run R locally in an IDE (Integrated Development Environment)
