@mGalarnyk
Last active September 11, 2023 08:00
R Programming: Programming Assignment 1 (Week 2), Johns Hopkins Data Science Specialization on Coursera. GitHub repo: https://github.com/mGalarnyk/datasciencecoursera

R Programming Project 1

GitHub repo for the rest of the specialization: Data Science Coursera

For this first programming assignment you will write three functions that are meant to interact with the dataset that accompanies this assignment. The dataset is contained in a zip file, specdata.zip, that you can download from the Coursera web site.

Although this is a programming assignment, you will be assessed using a separate quiz.

The zip file containing the data can be downloaded here: specdata.zip [2.4MB]
Description: The zip file contains 332 comma-separated-value (CSV) files containing pollution monitoring data.

Part 1 (pollutantmean.R)

# install.packages("data.table")
library("data.table")

pollutantmean <- function(directory, pollutant, id = 1:332) {
  
  # Format number with fixed width and then append .csv to number
  fileNames <- paste0(directory, '/', formatC(id, width=3, flag="0"), ".csv" )
  
  # Reading in all files and making a large data.table
  lst <- lapply(fileNames, data.table::fread)
  dt <- rbindlist(lst)
  
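  # Fail loudly on an unknown column instead of silently returning NULL
  if (!pollutant %in% names(dt)) {
    stop("unknown pollutant: ", pollutant)
  }
  
  # Mean of the requested column across all selected monitors, ignoring NAs
  mean(dt[[pollutant]], na.rm = TRUE)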
}

# Example usage
pollutantmean(directory = '~/Desktop/specdata', pollutant = 'sulfate', id = 20)
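
The file-name step can be checked on its own, without the data on disk. The following is a minimal base R sketch, assuming the files are named 001.csv through 332.csv as in specdata; `monitor_files` is a hypothetical helper name used only for illustration and is not part of the assignment.

```r
# Hypothetical helper (not part of the assignment): build zero-padded
# CSV paths for a vector of monitor IDs.
monitor_files <- function(directory, id) {
  # "%03d" pads each ID to width 3 with leading zeros, e.g. 7 -> "007"
  file.path(directory, sprintf("%03d.csv", as.integer(id)))
}

# "specdata" is an assumed relative directory name
paths <- monitor_files("specdata", c(1, 20, 332))
print(paths)
```

This builds the same kind of path as the `paste0()`/`formatC()` line in `pollutantmean`, so printing `paths` is a quick way to confirm that the directory and the padding are what you expect before reading any files.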

Part 2 (complete.R)

complete <- function(directory,  id = 1:332) {
  
  # Format number with fixed width and then append .csv to number
  fileNames <- paste0(directory, '/', formatC(id, width=3, flag="0"), ".csv" )
  
  # Reading in all files and making a large data.table
  lst <- lapply(fileNames, data.table::fread)
  dt <- rbindlist(lst)
  
  return(dt[complete.cases(dt), .(nobs = .N), by = ID])
  
}

# Example usage
complete(directory = '~/Desktop/specdata', id = 20:30)
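
One detail of the grouped count is worth seeing on toy data: because incomplete rows are filtered out before grouping, a monitor with no complete rows does not appear in the result at all. The sketch below uses base R and a made-up data frame (the values and the helper name `count_complete` are assumptions for illustration) to show a variant that keeps such monitors with `nobs` equal to 0 and returns rows in the order the IDs were requested.

```r
# Made-up toy data: monitor 1 has two complete rows, monitor 2 has none.
toy <- data.frame(
  sulfate = c(1.0, 2.0, NA, NA),
  nitrate = c(0.5, 0.7, 0.1, NA),
  ID      = c(1L, 1L, 2L, 2L)
)

# Hypothetical helper: count complete rows per requested monitor ID.
count_complete <- function(df, id) {
  ok <- complete.cases(df)
  # One count per requested ID, so IDs with zero complete rows are kept
  nobs <- vapply(id, function(i) sum(ok & df$ID == i), integer(1))
  data.frame(id = id, nobs = nobs)
}

res <- count_complete(toy, id = c(2L, 1L))
print(res)
```

Whether zero-count monitors should be reported depends on what the quiz expects, so treat this as one possible design rather than the required behavior.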

Part 3 (corr.R)

corr <- function(directory, threshold = 0) {
  
  # Reading in all files and making a large data.table
  # pattern is a regular expression, not a glob, so match a literal ".csv" suffix
  lst <- lapply(list.files(path = directory, pattern = "\\.csv$", full.names = TRUE), data.table::fread)
  dt <- rbindlist(lst)
  
  # Only keep completely observed cases
  dt <- dt[complete.cases(dt),]
  
  # Apply threshold
  dt <- dt[, .(nobs = .N, corr = cor(x = sulfate, y = nitrate)), by = ID][nobs > threshold]
  return(dt[, corr])
}

# Example usage
corr(directory = '~/Desktop/specdata', threshold = 150)
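
The threshold logic can be tried without the real files. This is a minimal base R sketch on made-up data; `corr_by_monitor` is a hypothetical name, and it takes an already-loaded data frame rather than a directory, which is an assumption made to keep the example self-contained.

```r
# Made-up toy data: monitor 1 has four complete rows, monitor 2 has two.
toy <- data.frame(
  sulfate = c(1, 2, 3, 4, 1, 2),
  nitrate = c(2, 4, 6, 8, 5, 3),
  ID      = c(1, 1, 1, 1, 2, 2)
)

# Hypothetical helper: per-monitor sulfate/nitrate correlation.
corr_by_monitor <- function(df, threshold = 0) {
  # Keep only completely observed rows
  df <- df[complete.cases(df), ]
  groups <- split(df, df$ID)
  # Keep monitors whose complete-row count is strictly above the threshold
  groups <- groups[vapply(groups, nrow, integer(1)) > threshold]
  # Drop the ID names so the result is a plain numeric vector
  unname(vapply(groups, function(g) cor(g$sulfate, g$nitrate), numeric(1)))
}

print(corr_by_monitor(toy, threshold = 3))
print(corr_by_monitor(toy, threshold = 10))
```

With a threshold that no monitor exceeds, the function returns a numeric vector of length 0, which mirrors what the assignment describes for that case.
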
@SUSANKI

SUSANKI commented Jul 30, 2020

Thank u so much, It's a little bit complicated for me @-@

@Cyberclip

Here it says when I try to do part 1 that there's no package named 'data.table', what should I do?

@harshit229

> Here it says when I try to do part 1 that there's no package named 'data.table', what should I do?

use rstudio

@Romeroc3

Romeroc3 commented Dec 27, 2020

Thank you very much for this assignment information. I am currently doing my case study on the refugee situation and I need to study data science to analyze the data. Interestingly, the idea for the research came spontaneously when I read https://samplius.com/free-essay-examples/refugee/ in preparation for lesson. These free essay examples got me interested more in migration and globalization issue. Therefore, I decided to do a little research, but I lack the skills to do a high-quality analysis of big data.

@kennethwoanyah

@SUSANKI yep, complicated for me too . lol.
Works perfectly though.

@flaviaouyang

> Here it says when I try to do part 1 that there's no package named 'data.table', what should I do?

You need to install the package first: install.packages("data.table")

@Bell-016

I am very frustrated with this course. I took it assuming it would explain things from the beginning for a beginner, but the first assignment is unreadable to me. I would never have given this answer because I feel I never learned the things you used in it.

@utamadonny

I ran corr.R and it returned "Error in eval(bysub, parent.frame(), parent.frame()) :
object 'ID' not found"

@Rushield

Rushield commented Apr 9, 2022

Bruh this course is just annoying because it ain't show us how to do those things, even understanding your simplified code in a week 2 is damn hard.

@emcdowell28

> I am very frustrated with this course. I took it assuming it would explain things from the beginning for a beginner, but the first assignment is unreadable to me. I would never have given this answer because I feel I never learned the things you used in it.

You and me both. I've been using multiple other online textbooks to try and gain any kind of fundamental understanding of this material. I don't usually struggle with things like this, but nothing makes me feel more unintelligent than being tested over things we haven't even been taught yet.
