@turgeonmaxime
Created February 27, 2019 15:36
Add a trend line to see how many points the Tampa Bay Lightning could finish the season with
library(rvest)
library(tidyverse)
library(lubridate)
url <- "https://www.hockey-reference.com/teams/TBL/2019_games.html"
webpage <- read_html(url)
results <- webpage %>%
  html_table(fill = TRUE) %>%
  .[[1]] %>%
  repair_names() %>% # Some column names are empty
  dplyr::select(GP, Date, Time, Opponent,
                GF, GA, V3) %>%
  as_tibble() %>%
  rename(ExtraTime = V3) %>%
  filter(GP != "GP") %>% # Remove embedded repeat headers
  mutate(GP = as.numeric(GP),
         GF = as.numeric(GF),
         GA = as.numeric(GA),
         Points = case_when(
           is.na(GF) ~ NA_integer_, # Game not played yet
           GF > GA ~ 2L,            # Win
           ExtraTime != "" ~ 1L,    # Loss in overtime or shootout
           TRUE ~ 0L                # Loss in regulation
         ),
         TotalPoints = cumsum(Points), # NA for games not played yet
         Date = ymd_hm(paste(Date, Time)))
results %>%
  ggplot(aes(Date, TotalPoints)) +
  geom_line() +
  theme_minimal() +
  geom_smooth(fullrange = TRUE, method = "lm",
              formula = y ~ splines::bs(x, df = 3)) +
  ggtitle("Tampa Bay Lightning-Total Points over time",
          subtitle = "Trend line: B-splines with 3 dfs")
@turgeonmaxime

This is the resulting graph:

[Image: rplot]

@turgeonmaxime

What looks like a losing streak at the end of January is actually the All-Star break. Looking at the total points as a function of the number of games played gives a higher estimate.
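
A minimal sketch of that alternative view, assuming the `results` tibble built in the gist (with its `GP` and `TotalPoints` columns) is available. The helper name `plot_points_by_games` and the plot title are hypothetical and not part of the original script; the trend line reuses the same B-spline fit, only with games played on the x-axis so that breaks in the schedule no longer flatten the curve.

```r
library(ggplot2)

# Hypothetical helper (not in the original gist): plots cumulative points
# against the number of games played instead of the calendar date.
# Assumes `results` has numeric columns GP and TotalPoints.
plot_points_by_games <- function(results) {
  ggplot(results, aes(GP, TotalPoints)) +
    geom_line() +
    theme_minimal() +
    # Same trend line as in the gist: linear model on a B-spline basis,
    # extended over the full range of the x-axis
    geom_smooth(fullrange = TRUE, method = "lm",
                formula = y ~ splines::bs(x, df = 3)) +
    ggtitle("Tampa Bay Lightning-Total Points by games played",
            subtitle = "Trend line: B-splines with 3 dfs")
}
```

Calling `plot_points_by_games(results)` should draw the plot. Because games not yet played are kept in `results` with a missing `TotalPoints`, the x-axis still spans the whole schedule and the trend line extends to the final game.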
