@sdtaylor
Last active July 21, 2021 13:59
Identify continuous non-zero values in a time series
library(tidyverse)

identify_continuous_non_zero_series = function(df, min_sequence_size = 5){
  # Adds a new column called 'continuous_series' to the df data.frame,
  # flagging rows where the 'value' column is > 0 for at least
  # min_sequence_size consecutive dates.
  # Note: the diff(date) != 1 check below treats consecutive dates as
  # exactly 1 day apart.

  # Filter to only continuous chunks > 0, assign each an ID,
  # and flag each chunk with length >= min_sequence_size.
  # The cumsum() trick is from https://stackoverflow.com/a/42734207/6615512
  temp_df = df %>%
    filter(value > 0) %>%
    arrange(date) %>%
    # any gap other than 1 day starts a new sequence
    mutate(sequence_id = cumsum(c(1, diff(date)) != 1)) %>%
    group_by(sequence_id) %>%
    mutate(continuous_series = ifelse(n() >= min_sequence_size, 'Yes', 'No')) %>%
    ungroup() %>%
    select(date, continuous_series)

  # Join to the original and fill in 'No' for all dates dropped by the
  # filter above (those where value is not > 0).
  df %>%
    left_join(temp_df, by = 'date') %>%
    mutate(continuous_series = replace_na(continuous_series, 'No'))
}

# Example: 21 daily values containing three non-zero runs of length 6, 3 and 6
df <- tibble(
  value = c(.5, .8, .7, .5, .2, .06, 0, 0, 0, 0, 0, .1, .3, .2, 0, 0.5, 0.67, 0.32, 0.34, 0.34, 0.33),
  date = as.Date("2005-07-20") + 0:20
)

identify_continuous_non_zero_series(df, min_sequence_size = 5)
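
# A minimal base-R sketch of how the cumsum() trick assigns sequence IDs,
# assuming daily dates. The names example_dates, starts_new_run and run_id
# are hypothetical and used only for this illustration; they are not part
# of the function above.
example_dates = as.Date("2005-07-20") + c(0, 1, 2, 5, 6, 10)

# TRUE wherever the gap to the previous date is not exactly 1 day.
# The leading 1 makes the first date count as "no gap".
starts_new_run = c(1, as.numeric(diff(example_dates))) != 1

# The running count of gaps gives every unbroken run its own ID.
run_id = cumsum(starts_new_run)
run_id
# run_id is 0 0 0 1 1 2

# Run lengths per ID, which is what n() returns inside group_by(sequence_id).
table(run_id)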