The original code for the desaster markdown file, as well as a corrected version
---
title: "What a desaster!"
author: "Claudius"
date: '2022-04-06'
output: pdf_document
---
# Packages used
```{r}
library(tidyverse)
library(DataScienceExercises)
library(knitr)
```
# Exploring flight data
In this short text we explore the following data set on flights departing
from New York.
```{r}
base_data <- DataScienceExercises::nycflights21_small[1:200, ]
data.frame(head(DataScienceExercises::nycflights21_small, 50))
```
To get a first look at the relationships between the variables, consider the
following scatter plots:
```{r, out.width='40%', out.height='40%', fig.pos="center"}
arrival_dep <- ggplot(data = base_data) +
  geom_point(mapping = aes(x=arr_delay, y=dep_delay),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Arrival delay", y="Departure delay") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
arrival_dist <- ggplot(data = base_data) +
  geom_point(mapping = aes(x=arr_delay, y=distance),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Arrival delay", y="Distance") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
arrival_month <- ggplot(data = base_data) +
  geom_point(mapping = aes(y=arr_delay, x=month),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Month", y="Arrival delay") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
arrival_carrier <- ggplot(data = base_data) +
  geom_point(mapping = aes(y=arr_delay, x=carrier),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Carrier", y="Arrival delay") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
ggpubr::ggarrange(
  arrival_dep, arrival_dist,
  arrival_month, arrival_carrier,
  ncol = 2, nrow = 2)
```
This suggests that there is a strong correlation between departure and arrival
delay. To compute the correlation we might use the following R code:
```{r, echo=FALSE}
cor(base_data$arr_delay, base_data$dep_delay)
```
There is indeed a very strong correlation. But is it significant? Let's check
it using the Pearson correlation test:
```{r}
cor.test(base_data$arr_delay, base_data$dep_delay, method = "pearson")
```
Of course, these are just preliminary results; from a methodological point of
view there is still much to do...
---
title: "What a beauty!"
author: "Claudius"
date: '2022-04-06'
output: pdf_document
header-includes:
  - \usepackage{setspace}
  - \onehalfspacing
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  warning = FALSE, message = FALSE)
```
# Packages used
```{r}
library(tidyverse)
library(DataScienceExercises)
library(knitr)
```
# Exploring flight data
In this short text we explore the following data set on flights departing
from New York.
```{r, echo=FALSE}
base_data <- DataScienceExercises::nycflights21_small[1:200, ]
knitr::kable(head(DataScienceExercises::nycflights21_small, 5))
```
To get a first look at the relationships between the variables, consider the
following scatter plots:
```{r, fig.align='center', out.width='70%', out.height='70%', echo=FALSE}
arrival_dep <- ggplot(data = base_data) +
  geom_point(mapping = aes(x=arr_delay, y=dep_delay),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Arrival delay", y="Departure delay") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
arrival_dist <- ggplot(data = base_data) +
  geom_point(mapping = aes(x=arr_delay, y=distance),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Arrival delay", y="Distance") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
arrival_month <- ggplot(data = base_data) +
  geom_point(mapping = aes(y=arr_delay, x=month),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Month", y="Arrival delay") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
arrival_carrier <- ggplot(data = base_data) +
  geom_point(mapping = aes(y=arr_delay, x=carrier),
             alpha=0.5, color="#00395B") +
  ggplot2::theme_bw() +
  labs(x="Carrier", y="Arrival delay") +
  theme(
    legend.position = "bottom",
    legend.title = ggplot2::element_blank(),
    panel.border = ggplot2::element_blank(),
    axis.line = ggplot2::element_line(colour = "grey"),
    axis.ticks = ggplot2::element_line(colour = "grey")
  )
ggpubr::ggarrange(
  arrival_dep, arrival_dist,
  arrival_month, arrival_carrier,
  ncol = 2, nrow = 2)
```
These plots suggest that there is a strong correlation between departure and
arrival delay. To compute the correlation we might use the following R code:
```{r, echo=TRUE, results='hide'}
cor_coef <- cor(base_data$arr_delay, base_data$dep_delay)
```
This produces a correlation coefficient of `r round(cor_coef, 3)`, suggesting
that there is indeed a very strong correlation.
But is it significant? Let's check it using the Pearson correlation test:
```{r}
c_test <- cor.test(
  x = base_data$arr_delay,
  y = base_data$dep_delay,
  method = "pearson")
```
The most relevant statistics are:
```{r, echo=FALSE}
knitr::kable(tibble(
  "t-stat"=c_test$statistic,
  "df"=c_test$parameter,
  "p-val"=c_test$p.value,
  "95% conf interval"=paste0(
    "[",
    paste(round(c_test$conf.int, 3), collapse = "; "),
    "]")
  ), align = "c")
```
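The table above picks the relevant fields out of `c_test` by hand. If the same summary is needed for several tests, the extraction could be wrapped in a small function. The following is only a minimal sketch in base R: the name `summarise_cor_test()`, its `digits` argument, and the synthetic data are illustrative assumptions and not part of the original analysis.

```r
# Hypothetical helper (not part of the original report): condense the object
# returned by stats::cor.test() into a one-row data frame.
summarise_cor_test <- function(test, digits = 3) {
  stopifnot(inherits(test, "htest"))
  data.frame(
    estimate = round(unname(test$estimate), digits),
    t_stat   = round(unname(test$statistic), digits),
    df       = unname(test$parameter),
    p_value  = signif(test$p.value, digits),
    conf_int = paste0(
      "[", paste(round(test$conf.int, digits), collapse = "; "), "]"
    )
  )
}

# Synthetic data standing in for the arrival and departure delays.
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)

result <- summarise_cor_test(cor.test(x, y, method = "pearson"))
print(result)
```

In the report itself, `summarise_cor_test(c_test)` could then be handed to `knitr::kable()` in place of the hand-built tibble.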
Of course, these are just preliminary results; from a methodological point of
view there is still much to do...
# The corrections we did

To make this document look *much* nicer immediately, the following changes
were made:

* Suppress warnings and messages by default
* Set line spacing to one and a half (just looked it up on the internet)
* Do not show the whole table at the beginning but only the first lines;
    * do not show the R code in this context since it is not meaningful;
    * use `knitr::kable()` to print tables
* Do not show the code for preparing the plot; it is not necessary to understand
  the message
* Adjust the `out.width` and `out.height` options in the plot chunk such that the
  plot is easier to read
* Show the code used to compute the correlation coefficient in a readable
  way, but summarize the output concisely, focusing on what is relevant
* Report the result of the Pearson correlation test in a more concise way

Of course, the closing sentence of the report is true: to analyze this data in a
meaningful way, we must put more thought into choosing the right analysis
method!
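
One further tidy-up that the list above does not cover: the four scatter plots repeat the same styling block each time. A possible way to write it only once is sketched below. The function name `delay_scatter()`, its arguments, and the toy data frame are hypothetical and only illustrate the idea; the sketch assumes that `ggplot2` is installed and uses the `.data` pronoun to refer to columns by name.

```r
library(ggplot2)

# Hypothetical helper (not in the original report): one scatter plot with the
# shared styling, so the theme is written once instead of four times.
delay_scatter <- function(data, x_var, y_var, x_lab, y_lab) {
  ggplot(data = data) +
    geom_point(
      mapping = aes(x = .data[[x_var]], y = .data[[y_var]]),
      alpha = 0.5, color = "#00395B"
    ) +
    theme_bw() +
    labs(x = x_lab, y = y_lab) +
    theme(
      legend.position = "bottom",
      legend.title = element_blank(),
      panel.border = element_blank(),
      axis.line = element_line(colour = "grey"),
      axis.ticks = element_line(colour = "grey")
    )
}

# Synthetic stand-in for base_data; the column names mirror the report.
toy_data <- data.frame(
  arr_delay = c(-5, 12, 40, 3),
  dep_delay = c(-2, 10, 35, 0),
  distance  = c(200, 1400, 760, 2500)
)

arrival_dep <- delay_scatter(
  toy_data, "arr_delay", "dep_delay", "Arrival delay", "Departure delay"
)
arrival_dist <- delay_scatter(
  toy_data, "arr_delay", "distance", "Arrival delay", "Distance"
)
```

The resulting plot objects can be passed to `ggpubr::ggarrange()` exactly as before. Passing the axis labels explicitly also makes it harder to carry a label over from one plot to the next by accident.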