@padpadpadpad
Last active February 2, 2017 17:39
Example of using R to add columns, filter and create means for groups of data and plot output
# good coding practice ####
# 1. comment your code with # so you know what it does
# 2. clear workspace and load packages at the top to keep track of what you have loaded
# 3. make sure your working directory is in the right place
# 4. space things out in a way that makes your code readable to you
# 5. google things you do not understand. The answers are out there, go find them
# 6. do not get scared/angry when you get errors. It does get easier.... eventually
# clear workspace - good practice to do this first ####
rm(list = ls())
# set working directory - do not need to do here ####
#setwd("~/where/your/stuff/is")
# load packages ####
# if you do not have these packages - install.packages('package name')
library(dplyr)
library(tidyr)
library(ggplot2)
library(magrittr)
library(lubridate)
# load data ####
# df <- read.csv("data.csv", stringsAsFactors = FALSE)
# create dummy data ####
# %>% (the pipe) means take what I have on the left and pass it as the first argument to the next function
# '.' stands for whatever the object on the left is
df <- data.frame(time = seq(as.POSIXct("2016-06-24 23:00:00"),
                            as.POSIXct("2016-10-25 08:30:00"),
                            by = "15 mins")) %>%
  mutate(birds = rnorm(n(), 30, 5),
         temp = rnorm(n(), 25, 4),
         random = NA)
# look at column names
colnames(df)
# look at first 6 rows of dataframe
head(df)
# look at the structure of the data (column types)
str(df)
# deselect the column random ####
df <- select(df, -random)
# rename a column ####
df <- rename(df, date = time)
# check difference
colnames(df)
# make new columns for hour and day of the month, then a column for night or day - you can adapt this to your own timeframes
# here 'night' is any hour from 18 through to 7 (so 18:00 to 07:59)
# mutate allows us to make new columns without multiple assignments and without excessive use of the $ sign
df <- mutate(df, hour = hour(date),
             day = day(date),
             time_of_day = ifelse(hour >= 18 | hour <= 7, 'night', 'day'))
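# a small check of the night/day rule above - a sketch using a few hand-picked hours
# 'check' and its values are made up for illustration and are not part of the real data
library(dplyr) # already loaded above, repeated so this snippet runs on its own
check <- data.frame(hour = c(0, 7, 8, 12, 17, 18, 23)) %>%
  mutate(time_of_day = ifelse(hour >= 18 | hour <= 7, 'night', 'day'))
check
# hours 0, 7, 18 and 23 are labelled 'night', while 8, 12 and 17 are labelled 'day'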
# calculate the mean number of birds (and mean temperature) for every 30 minute window
# group_by allows us to group variables in the dataframe to then do the same action on all of those groups
# cut() splits the date-times into 30 minute bins
df2 <- df %>%
  group_by(day, time_of_day, time = cut(date, breaks = '30 min')) %>%
  summarise(birds = mean(birds, na.rm = TRUE),
            temp = mean(temp, na.rm = TRUE)) %>%
  data.frame()
# cut() returns a factor, so our time column is no longer a date-time!
# we can easily convert it back though using mutate!
df2 <- mutate(df2, time = as.POSIXct(strptime(time, format = '%Y-%m-%d %H:%M:%S')),
              hour = hour(time))
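# an alternative sketch (not the approach used above): floor_date() from lubridate
# rounds each time down to the start of its 30 minute window and keeps the POSIXct
# class, so there is no factor to convert back
# 'toy' and its values are made up for illustration
library(dplyr) # already loaded above, repeated so this snippet runs on its own
library(lubridate)
toy <- data.frame(date = as.POSIXct(c("2016-06-24 23:00:00", "2016-06-24 23:15:00",
                                      "2016-06-24 23:30:00", "2016-06-24 23:45:00"),
                                    tz = "UTC"),
                  birds = c(28, 32, 30, 34))
toy2 <- toy %>%
  group_by(time = floor_date(date, "30 minutes")) %>%
  summarise(birds = mean(birds, na.rm = TRUE)) %>%
  data.frame()
# time should still be a date-time here
str(toy2)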
# a quick plot
# 1. does number of birds change with temperature?
ggplot(df2) +
  geom_point(aes(x = temp, y = birds, col = time_of_day)) +
  facet_wrap(~ time_of_day)
# no because this is my made up data!!!!
# 2. look at change through time
ggplot(df2) +
  geom_point(aes(x = time, y = birds, col = time_of_day))
# so many points! we could change our grouping so that we get one value per day for each time of day
# Quick redo ####
df2 <- df %>%
  group_by(day, time_of_day, time = cut(date, breaks = '1 day')) %>%
  summarise(birds = mean(birds, na.rm = TRUE),
            temp = mean(temp)) %>%
  data.frame()
# time is a factor again after cut()!
# we can easily convert it back though using mutate!
df2 <- mutate(df2, time = as.POSIXct(strptime(time, format = '%Y-%m-%d')))
ggplot(df2) +
  geom_point(aes(x = time, y = birds, col = time_of_day))
# looks a bit better!!!
@MolKems
MolKems commented Feb 1, 2017

I want to plot the number drinking against hour, like we have for total present, so I entered the code:

ggplot(df2) +
geom_point(aes(x = hour, y = Drinking)) +
facet_wrap(~ Location)

But I got the error message: Error in eval(expr, envir, enclos) : object 'Drinking' not found
I have checked the colnames and tried using the Drinking column in other commands, but I get the same error message. Any suggestions?
Thanks!

@padpadpadpad
Author

padpadpadpad commented Feb 2, 2017

Hi

So the error means that Drinking is not in the data frame... Or if it is, that R cannot find it there!

You are doing the correct things but you could try:

# check column names
colnames(df2)

# check that Drinking can be pulled out of the data frame
head(df2$Drinking)
# if this returns values then the column exists and I am not sure what is wrong
# if this returns NULL or gives an error then Drinking is not a proper column !!!

# pass the data frame to geom_point() directly to make sure Drinking is looked up in df2
ggplot() +
  geom_point(aes(x = hour, y = Drinking), data = df2) +
  facet_wrap(~ Location)

# other than this check with other variables apart from Total and see if that works
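
One more thing worth ruling out, as a sketch rather than a diagnosis: if df2 was built with group_by() and summarise() like in the gist, it only keeps the grouping columns plus whatever was calculated inside summarise(). The example below assumes raw data with the columns Location, hour, Total and Drinking. The data frame names (raw, only_total, both) and all the values are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# made-up raw data using the column names mentioned in this thread
raw <- data.frame(Location = rep(c('A', 'B'), each = 4),
                  hour = rep(c(6, 6, 7, 7), times = 2),
                  Total = c(10, 12, 8, 6, 20, 22, 18, 16),
                  Drinking = c(2, 4, 1, 3, 5, 7, 6, 4))

# summarising only Total drops Drinking from the result
only_total <- raw %>%
  group_by(Location, hour) %>%
  summarise(Total = mean(Total)) %>%
  data.frame()
colnames(only_total)

# summarising both columns keeps both
both <- raw %>%
  group_by(Location, hour) %>%
  summarise(Total = mean(Total),
            Drinking = mean(Drinking)) %>%
  data.frame()
colnames(both)

# the plot from the question now finds Drinking
ggplot(both) +
  geom_point(aes(x = hour, y = Drinking)) +
  facet_wrap(~ Location)
```

If colnames(df2) does not list Drinking, adding it to the summarise() call that created df2 may be all that is needed.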
