@khufkens
Created June 12, 2019 12:49
Ghent air quality study VMM rehash
# Ghent air pollution analysis
# parametric t-tests
# non-parametric Mann-Whitney U test
# download the report and extract the text of page 2,
# which holds the table of measurements
pdf_data <- paste0(
  pdftools::pdf_text("https://klimaat.stad.gent/sites/default/files/nota_circulatieplangent_3.pdf")[2],
  collapse = " ")
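# clean up the raw text: decimal commas become points and
# line breaks become spaces
subset <- gsub(',', ".", pdf_data)
subset <- gsub('\n', " ", subset)
# split on runs of two or more whitespace characters and keep the
# 120 tokens (20 rows x 6 columns) that make up the table;
# the index range 10:129 is specific to this PDF
subset <- unlist(strsplit(subset, "\\s{2,}"))[10:129]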
# recast into a data frame (20 rows, 6 columns, filled row by row)
df <- data.frame(matrix(subset, nrow = 20, ncol = 6, byrow = TRUE),
                 stringsAsFactors = FALSE)
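# assign column names (Dutch, as in the report):
# meetplaatscode = measurement location code, straat = street,
# voor = before, na = after, verschil = difference,
# verschil_perc = difference in percent
names(df) <- c("meetplaatscode",
               "straat",
               "voor",
               "na",
               "verschil",
               "verschil_perc")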
# convert to numeric
df$voor <- as.numeric(df$voor)
df$na <- as.numeric(df$na)
df$verschil <- as.numeric(df$verschil)
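# Illustration of the parsing steps above on a small made-up string
# (NOT taken from the PDF): fields separated by runs of two or more
# spaces are split, recast row by row into a matrix and converted to
# numeric. All demo_ names and values are hypothetical. Unlike the
# script above, line breaks are replaced by two spaces here so that
# the split also works when the next line has no leading whitespace.
demo_text <- "X01  Straat A  40,1  36,0\nX02  Straat B  35,2  33,1"
demo_tokens <- gsub(",", ".", demo_text, fixed = TRUE)
demo_tokens <- gsub("\n", "  ", demo_tokens, fixed = TRUE)
demo_tokens <- unlist(strsplit(demo_tokens, "\\s{2,}"))
demo_df <- data.frame(matrix(demo_tokens, ncol = 4, byrow = TRUE),
                      stringsAsFactors = FALSE)
names(demo_df) <- c("meetplaatscode", "straat", "voor", "na")
demo_df$voor <- as.numeric(demo_df$voor)
demo_df$na <- as.numeric(demo_df$na)
# any NA here would point to a token that failed to parse as a number
stopifnot(!anyNA(demo_df$voor), !anyNA(demo_df$na))
str(demo_df)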
# test the differences for normality (Shapiro-Wilk)
s_test <- shapiro.test(df$verschil)
print(s_test)
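# Sketch of how a Shapiro-Wilk result might be read, using made-up
# differences (NOT the report's values). The 0.05 threshold is a
# conventional choice and an assumption here, not something taken
# from the original script; demo_ names are hypothetical.
demo_diff <- c(-4.1, -2.1, -5.5, -2.5, -3.4, -3.7)
demo_s <- shapiro.test(demo_diff)
if (demo_s$p.value > 0.05) {
  message("no evidence against normality; a t-test seems reasonable")
} else {
  message("normality is doubtful; prefer the non-parametric test")
}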
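# Welch two-sample t-test comparing the before and after values
# (note: this treats the two columns as unpaired samples)
t.test(df$voor, df$na)
# same test with a constant offset of 3.7 added to the 'na' values
t.test(df$voor, df$na + 3.7)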
# non-parametric alternative: Mann-Whitney U (Wilcoxon rank-sum) test,
# without and with the 3.7 offset
wilcox.test(df$voor, df$na)
wilcox.test(df$voor, df$na + 3.7)
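# A possible complement to the tests above, not part of the original
# analysis: 'voor' and 'na' are measured at the same locations, so the
# observations could also be treated as pairs. The sketch below wraps
# the paired t-test and the Wilcoxon signed-rank test. paired_tests
# and the demo_ vectors are hypothetical names with made-up values.
paired_tests <- function(before, after, offset = 0) {
  # offset is added to the 'after' values, mirroring the + 3.7 shift
  list(
    t_test = t.test(before, after + offset, paired = TRUE),
    # exact = FALSE uses the normal approximation, which avoids
    # warnings when the differences contain ties
    wilcoxon = wilcox.test(before, after + offset,
                           paired = TRUE, exact = FALSE)
  )
}

# made-up demo values, NOT taken from the report
demo_voor <- c(40.1, 35.2, 50.3, 42.7, 38.9, 45.0)
demo_na <- c(36.0, 33.1, 44.8, 40.2, 35.5, 41.3)
demo_res <- paired_tests(demo_voor, demo_na)
print(demo_res$t_test)
print(demo_res$wilcoxon)

# with the scraped data this would be called as:
# paired_tests(df$voor, df$na)
# paired_tests(df$voor, df$na, offset = 3.7)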