@erictleung
Created March 15, 2022 21:08
For each row, count the number of other rows that fall within a time interval
start_date,stop_date
1999-07-15,1999-11-15
1999-11-15,2000-02-15
1999-12-15,2000-02-15
2000-09-15,2002-02-15
2002-02-15,2003-12-15
2002-02-15,2003-12-15
2003-02-15,2004-03-15
2004-04-15,2004-08-15
2004-08-15,2005-04-15
2005-04-15,2017-12-15
2006-02-15,2013-04-15
# Load libraries
library(tidyverse) # CRAN v1.3.1
library(lubridate) # CRAN v1.7.10

# Read in example data
in_data <- read_csv("example_data.csv")

# Add a one-year lookback interval that ends at each row's start_date
df <- in_data %>%
  mutate(
    minus1Y = start_date %m-% years(1),
    interval = interval(ymd(minus1Y), ymd(start_date)),
    count = 0
  )

# Collect the results in this data frame
df_out <- data.frame()

# Loop through rows and compare each row to the rest of the data frame
for (i in seq_len(nrow(df))) {
  # Pull out row of interest
  tmp <- df[i, ]

  # Count the other rows whose stop_date falls within the interval of interest.
  # The current row is dropped by position: setdiff() is a set operation, so it
  # would also drop exact duplicates of this row and collapse duplicated rows
  # elsewhere (e.g. the two identical 2002-02-15 rows), undercounting them.
  n_count <- df %>%
    slice(-i) %>%
    mutate(in_interval = stop_date %within% tmp$interval) %>%
    filter(in_interval) %>%
    nrow()

  # Save the count back into the row and append it to the results
  tmp$count <- n_count
  df_out <- bind_rows(df_out, tmp)
}

df_out
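
The loop above could also be written without growing a data frame row by row. The following is a minimal sketch, not part of the original gist: `count_prior_stops` is a hypothetical helper name, the example data is typed inline instead of being read from example_data.csv, and the window is assumed to be inclusive at both ends, which matches how `%within%` treats interval boundaries.

```r
library(lubridate)

# Example data typed inline (assumption: same values as example_data.csv)
start_date <- ymd(c(
  "1999-07-15", "1999-11-15", "1999-12-15", "2000-09-15", "2002-02-15",
  "2002-02-15", "2003-02-15", "2004-04-15", "2004-08-15", "2005-04-15",
  "2006-02-15"
))
stop_date <- ymd(c(
  "1999-11-15", "2000-02-15", "2000-02-15", "2002-02-15", "2003-12-15",
  "2003-12-15", "2004-03-15", "2004-08-15", "2005-04-15", "2017-12-15",
  "2013-04-15"
))

# Hypothetical helper: for each row i, count the other rows whose stop_date
# lies in the closed window [start_date[i] - 1 year, start_date[i]].
count_prior_stops <- function(start_date, stop_date) {
  window_start <- start_date %m-% years(1)
  vapply(seq_along(start_date), function(i) {
    others <- stop_date[-i]  # drop row i by position, keeping duplicates
    sum(others >= window_start[i] & others <= start_date[i])
  }, integer(1))
}

counts <- count_prior_stops(start_date, stop_date)
data.frame(start_date, stop_date, count = counts)
```

Dropping the current row by position rather than by value keeps duplicated rows, such as the two 2002-02-15 entries, in the comparison set.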