@mGalarnyk
Last active September 11, 2023 08:00
R Programming, Programming Assignment 1 (Week 2), Johns Hopkins Data Science Specialization on Coursera. GitHub repo: https://github.com/mGalarnyk/datasciencecoursera

R Programming Project 1

GitHub repo for the rest of the specialization: Data Science Coursera

For this first programming assignment you will write three functions that are meant to interact with the dataset that accompanies this assignment. The dataset is contained in a zip file, specdata.zip, that you can download from the Coursera web site.

Although this is a programming assignment, you will be assessed using a separate quiz.

The zip file containing the data can be downloaded here: specdata.zip [2.4MB]
Description: The zip file contains 332 comma-separated-value (CSV) files containing pollution monitoring data.

Part 1 (pollutantmean.R)

# install.packages("data.table")
library("data.table")

pollutantmean <- function(directory, pollutant, id = 1:332) {
  
  # Format number with fixed width and then append .csv to number
  fileNames <- paste0(directory, '/', formatC(id, width=3, flag="0"), ".csv" )
  
  # Reading in all files and making a large data.table
  lst <- lapply(fileNames, data.table::fread)
  dt <- rbindlist(lst)
  
  # Fail loudly on an unknown pollutant instead of silently returning NULL
  if (!pollutant %in% names(dt)) {
    stop("unknown pollutant: ", pollutant)
  }
  return(mean(dt[[pollutant]], na.rm = TRUE))
}

# Example usage
pollutantmean(directory = '~/Desktop/specdata', pollutant = 'sulfate', id = 20)
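
For readers who have not met data.table yet, the same idea can be sketched in base R. This is only an illustration, not the gist author's method: the function name pollutantmean_base, the toy directory, and the values in the toy files are all made up here so the block runs without specdata.zip.

```r
# Base R sketch of the same idea; the function name and toy data are hypothetical.
pollutantmean_base <- function(directory, pollutant, id = 1:332) {
  # Zero-pad each monitor id to three digits, e.g. 1 -> "001.csv"
  file_names <- file.path(directory, sprintf("%03d.csv", as.integer(id)))
  # Read every file and stack the rows into one data frame
  all_rows <- do.call(rbind, lapply(file_names, read.csv))
  if (!pollutant %in% names(all_rows)) {
    stop("unknown pollutant: ", pollutant)
  }
  # Average the chosen column, ignoring missing readings
  mean(all_rows[[pollutant]], na.rm = TRUE)
}

# Toy directory with two small monitor files (made-up values)
toy_dir <- file.path(tempdir(), "toy_specdata")
dir.create(toy_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"), sulfate = c(1, NA),
                     nitrate = c(2, 4), ID = 1L),
          file.path(toy_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"), sulfate = c(3, 5),
                     nitrate = c(NA, 6), ID = 2L),
          file.path(toy_dir, "002.csv"), row.names = FALSE)

toy_mean <- pollutantmean_base(toy_dir, "sulfate", id = 1:2)
print(toy_mean)  # mean of the non-missing sulfate values 1, 3 and 5
```

The only moving parts are building the zero-padded file names, stacking the files, and calling mean() with na.rm = TRUE; the data.table version above does the same three things with fread and rbindlist.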

Part 2 (complete.R)

complete <- function(directory,  id = 1:332) {
  
  # Format number with fixed width and then append .csv to number
  fileNames <- paste0(directory, '/', formatC(id, width=3, flag="0"), ".csv" )
  
  # Reading in all files and making a large data.table
  lst <- lapply(fileNames, data.table::fread)
  dt <- rbindlist(lst)
  
  return(dt[complete.cases(dt), .(nobs = .N), by = ID])
  
}

# Example usage
complete(directory = '~/Desktop/specdata', id = 20:30)
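
The counting step in complete() can be tried on a small in-memory table without any files. The sketch below uses base R only; the readings data frame and its values are invented for illustration, and the output column name id is an assumption, not something the code above produces (it returns ID).

```r
# Toy stacked readings for two hypothetical monitors (made-up values)
readings <- data.frame(
  sulfate = c(1.2, NA, 3.1, 2.0, NA),
  nitrate = c(0.5, 0.7, 0.9, 1.1, NA),
  ID      = c(1L, 1L, 1L, 2L, 2L)
)

# TRUE for rows where no column is NA
is_complete <- complete.cases(readings)

# Count complete rows per monitor; fixing the factor levels keeps
# monitors that have zero complete rows in the result
counts <- table(factor(readings$ID[is_complete], levels = unique(readings$ID)))
nobs_by_id <- data.frame(id = as.integer(names(counts)), nobs = as.vector(counts))
print(nobs_by_id)
```

One difference worth knowing: because the data.table version filters to complete rows before grouping by ID, a monitor with no complete rows does not appear in its result at all, whereas this sketch would report it with nobs = 0.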

Part 3 (corr.R)

corr <- function(directory, threshold = 0) {
  
  # Reading in all .csv files and making a large data.table
  # (pattern is a regular expression, not a glob, so match a literal ".csv" at the end)
  lst <- lapply(list.files(path = directory, pattern = "\\.csv$", full.names = TRUE), data.table::fread)
  dt <- rbindlist(lst)
  
  # Only keep completely observed cases
  dt <- dt[complete.cases(dt),]
  
  # Apply threshold
  dt <- dt[, .(nobs = .N, corr = cor(x = sulfate, y = nitrate)), by = ID][nobs > threshold]
  return(dt[, corr])
}

# Example usage
corr(directory = '~/Desktop/specdata', threshold = 150)
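
The threshold-then-correlate logic can also be sketched in base R on a toy table. Everything here is hypothetical (the readings data frame, its values, and the threshold of 3); it is meant to show the shape of the computation, not to reproduce the results for specdata.

```r
# Toy readings for two hypothetical monitors (made-up values)
readings <- data.frame(
  sulfate = c(1, 2, 3, 4, 2, 4),
  nitrate = c(2, 4, 6, 8, 5, 1),
  ID      = c(1L, 1L, 1L, 1L, 2L, 2L)
)
threshold <- 3

# Only keep completely observed rows
readings <- readings[complete.cases(readings), ]

# Split rows by monitor, then keep monitors with more than `threshold` complete rows
by_monitor <- split(readings, readings$ID)
enough <- Filter(function(d) nrow(d) > threshold, by_monitor)

# One correlation per remaining monitor; a zero-length numeric vector if none qualify
correlations <- vapply(enough, function(d) cor(d$sulfate, d$nitrate), numeric(1))
print(unname(correlations))
```

In this toy data monitor 1 has four complete rows and passes the threshold, while monitor 2 has only two and is dropped, so a single correlation comes back.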
@flaviaouyang

Here it says when I try to do part 1 that there's no package named 'data.table', what should I do?

You need to install the package first: install.packages("data.table")

@Bell-016

I am very frustrated with this course. I took it assuming it would explain things from the beginning for a beginner, but the first assignment is unreadable to me. I would never have come up with this answer, because I feel I was never taught the things you used in it.

@utamadonny

I ran corr.R and it returned "Error in eval(bysub, parent.frame(), parent.frame()) :
object 'ID' not found"

@Rushield

Rushield commented Apr 9, 2022

Bruh this course is just annoying because it ain't show us how to do those things, even understanding your simplified code in a week 2 is damn hard.

@emcdowell28

I am very frustrated with this course. I took it assuming it would explain things from the beginning for a beginner, but the first assignment is unreadable to me. I would never have come up with this answer, because I feel I was never taught the things you used in it.

You and me both. I've been using multiple other online textbooks to try and gain any kind of fundamental understanding of this material. I don't usually struggle with things like this, but nothing makes me feel more unintelligent than being tested over things we haven't even been taught yet.
