Created March 7, 2017 18:04
Getting and Cleaning Data Quiz 1 (Week 1), Johns Hopkins Data Science Specialization (Coursera), for the GitHub repo

Getting and Cleaning Data Quiz 1 (JHU) Coursera

Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

and load the data into R. The code book, describing the variable names is here:

How many housing units in this survey were worth more than $1,000,000?

# fread url requires curl package on mac 
# install.packages("curl")

housing <- data.table::fread("")

# VAL attribute says how much property is worth, .N is the number of rows
# VAL == 24 means more than $1,000,000
housing[VAL == 24, .N]

# Answer: 
# 53
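
The counting idiom above can be tried without the download. This is a minimal sketch on a made-up table, assuming the data.table package is installed; the `toy` table and its VAL codes are invented for illustration and are not survey data.

```r
library(data.table)

# Hypothetical stand-in for the housing table; these VAL codes are invented
toy <- data.table(VAL = c(24, 3, NA, 24, 17, 24))

# i filters rows (an NA in VAL never matches), .N in j counts what remains
n_top <- toy[VAL == 24, .N]
print(n_top)
```

Because the filter and the count happen in one call, no intermediate subset needs to be stored.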

Question 2

Use the data you loaded from Question 1. Consider the variable FES in the code book. Which of the "tidy data" principles does this variable violate?


Answer: Tidy data has one variable per column.
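
FES packs family type and employment status into a single column, which is why it breaks this principle. The sketch below shows the general fix on invented data: the labels, the `|` separator, and the column names `family_type` and `employment` are all hypothetical and do not come from the code book.

```r
# Hypothetical labels in the spirit of FES: two variables packed into one
# column (invented values, not the real code book entries)
fes <- data.frame(
  FES = c("married couple|both in labor force",
          "married couple|neither in labor force",
          "single parent|in labor force"),
  stringsAsFactors = FALSE
)

# Split the packed column so each variable gets its own column
parts <- strsplit(fes$FES, "|", fixed = TRUE)
tidy <- data.frame(
  family_type = vapply(parts, `[`, character(1), 1),
  employment  = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
print(tidy)
```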

Question 3

Download the Excel spreadsheet on Natural Gas Acquisition Program here:

Read rows 18-23 and columns 7-15 into R and assign the result to a variable called:


What is the value of:


(original data source:

fileUrl <- ""
download.file(fileUrl, destfile = paste0(getwd(), '/getdata%2Fdata%2FDATA.gov_NGAP.xlsx'), method = "curl")

dat <- xlsx::read.xlsx(file = "getdata%2Fdata%2FDATA.gov_NGAP.xlsx", sheetIndex = 1, rowIndex = 18:23, colIndex = 7:15)

# Answer:
# 36534720

Question 4

Read the XML data on Baltimore restaurants from here:

How many restaurants have zipcode 21231?

Use http instead of https: with the https URL, parsing fails with the message Error: XML content does not seem to be XML: ''.

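# install.packages("XML")
# fileURL holds the https URL from the question (elided here)
fileURL <- ""
# Swap only the leading scheme; useInternalNodes is the full argument name
doc <- XML::xmlTreeParse(sub("^https", "http", fileURL), useInternalNodes = TRUE)
rootNode <- XML::xmlRoot(doc)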

zipcodes <- XML::xpathSApply(rootNode, "//zipcode", XML::xmlValue)
xmlZipcodeDT <- data.table::data.table(zipcode = zipcodes)
xmlZipcodeDT[zipcode == "21231", .N]

# Answer: 
# 127
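
The XPath step can be checked offline on a small inline document. This is a sketch only, assuming the XML package is installed; the `xml_text` string, the restaurant names, and the element layout are invented and may not match the real feed.

```r
library(XML)

# Hypothetical miniature of a restaurant feed; names and zipcodes are invented
xml_text <- paste0(
  "<response>",
  "<row><name>A</name><zipcode>21231</zipcode></row>",
  "<row><name>B</name><zipcode>21206</zipcode></row>",
  "<row><name>C</name><zipcode>21231</zipcode></row>",
  "</response>"
)

# asText = TRUE parses the string itself instead of treating it as a path
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# "//zipcode" matches every zipcode element at any depth under the root
zips <- xpathSApply(xmlRoot(doc), "//zipcode", xmlValue)
n_match <- sum(zips == "21231")
print(n_match)
```

The values come back as character strings, so the comparison is against "21231" rather than the number 21231.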

Question 5

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

using the fread() command load the data into an R object


Which of the following is the fastest way to calculate the average value of the variable


broken down by sex using the data.table package?

DT <- data.table::fread("")

# Answer (fastest):
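
The answer line above is blank in the source. One way to compare candidate expressions is to time them with system.time(). The sketch below does that on invented data, assuming data.table is installed; `grp` and `w` are hypothetical stand-ins for the sex column and the weight variable, and the three expressions are examples rather than the quiz's actual options.

```r
library(data.table)

set.seed(1)
# Hypothetical data: grp stands in for sex, w for the weight variable
toy <- data.table(grp = sample(1:2, 1e5, replace = TRUE), w = rnorm(1e5))

# Three ways to compute the mean of w within each group
by_dt     <- toy[, mean(w), by = grp]
by_tapply <- tapply(toy$w, toy$grp, mean)
by_split  <- sapply(split(toy$w, toy$grp), mean)

# Repeat each approach to get a measurable elapsed time; results vary by machine
print(system.time(for (i in 1:20) toy[, mean(w), by = grp]))
print(system.time(for (i in 1:20) tapply(toy$w, toy$grp, mean)))
print(system.time(for (i in 1:20) sapply(split(toy$w, toy$grp), mean)))
```

All three return the same group means, so only the elapsed times differ.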
letyndr commented Sep 3, 2017

Question number 5 is not complete.

Question number 5 is not complete.

indeed it is a correct answer.

Thank you

Question number 5 is not complete.

Answer Question 5
