Save blackhole077/eedfa52b7500de3cb5a16dfaf020fbc0 to your computer and use it in GitHub Desktop.
Find all contiguous chunks of time where some condition was either True or False. Given a Pandas DataFrame that has timestamp information (e.g., Unix Time in ms) and some column where the values are either True or False, this function will find the start and end times, along with what the value was for that slice of time, for all chunks present,…
import numpy as np
import pandas as pd
from typing import List, Tuple
def get_start_end_times_from_dataframe_based_on_binary_condition(
    pd_timestamp_series: pd.Series, pd_binary_condition_series: pd.Series
) -> List[Tuple[int, int, bool]]:
    """Find all contiguous chunks of time where some condition was either True or False.

    Given a Pandas Series of timestamps (e.g., Unix Time in ms) and a Series of the same length whose values are either True or False,
    this function finds the start and end times of every contiguous chunk, along with the condition value during that chunk,
    and returns them as a list of tuples.

    NOTE: While it could be argued that this would work for categorical conditions (i.e., non-binary conditions), I did not write this function
    with that in mind. In other words, I can't guarantee that it will work off-the-shelf.

    :param pd_timestamp_series: A Pandas Series that contains timestamp information, such as the Unix Time in milliseconds.
    :type pd_timestamp_series: pd.Series
    :param pd_binary_condition_series: A Pandas Series that contains either 0 or 1 (or some analogous setup) representing the state of some condition.
    :type pd_binary_condition_series: pd.Series
    :return: A list of (start timestamp, end timestamp, condition value) tuples, one per contiguous chunk, in order of appearance.
    :rtype: List[Tuple[int, int, bool]]
    """
    np_binary_condition_array: np.ndarray = pd_binary_condition_series.to_numpy()
    # An empty input contains no chunks, and the index arithmetic below would otherwise produce invalid positions.
    if len(np_binary_condition_array) == 0:
        return []
    # Find every point where the condition changes. Each value is the LAST index of a run of identical values.
    np_end_indices: np.ndarray = np.argwhere(np.diff(np_binary_condition_array) != 0).flatten()
    # Since the previous line returns the last index of each chunk, the START of the next chunk is the subsequent index.
    # Prepend index zero to the starting indices. Similarly, append the final index to the ending indices.
    start_indices: List[int] = [0] + (np_end_indices + 1).tolist()
    end_indices: List[int] = np_end_indices.tolist() + [len(np_binary_condition_array) - 1]
    # The indices are positions, not labels, so use .iloc. Plain [] indexing breaks when the Series index is not 0..n-1.
    start_end_timestamps_with_condition_value = list(
        zip(
            pd_timestamp_series.iloc[start_indices],
            pd_timestamp_series.iloc[end_indices],
            pd_binary_condition_series.iloc[start_indices],
        )
    )
    return start_end_timestamps_with_condition_value
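
To make the chunk-finding idea concrete, here is a minimal, self-contained sketch of the same technique applied to a small made-up DataFrame. The helper name `find_condition_runs`, the column names `unix_ms` and `is_recording`, and the sample values are all assumptions chosen for illustration and are not part of the gist. The sketch compares neighbouring values directly instead of calling `np.diff`, which is one possible way to detect changes, and it uses a non-default index to show why positional (`.iloc`) lookups matter.

```python
import numpy as np
import pandas as pd


def find_condition_runs(timestamps: pd.Series, condition: pd.Series) -> list:
    """Hypothetical compact variant: return (start, end, value) per contiguous run."""
    values = condition.to_numpy()
    if len(values) == 0:
        return []
    # Positions where a value differs from its successor, i.e. the last index of each run.
    change_points = np.flatnonzero(values[1:] != values[:-1])
    # A run starts right after the previous run ends; the first run starts at position 0.
    starts = np.concatenate(([0], change_points + 1))
    # The final run ends at the last position of the array.
    ends = np.concatenate((change_points, [len(values) - 1]))
    return list(
        zip(
            timestamps.iloc[starts].tolist(),
            timestamps.iloc[ends].tolist(),
            condition.iloc[starts].tolist(),
        )
    )


# Made-up sample data; the index deliberately does not start at zero.
frame = pd.DataFrame(
    {
        "unix_ms": [1000, 1010, 1020, 1030, 1040, 1050],
        "is_recording": [True, True, False, False, False, True],
    },
    index=[10, 11, 12, 13, 14, 15],
)

runs = find_condition_runs(frame["unix_ms"], frame["is_recording"])
print(runs)
# [(1000, 1010, True), (1020, 1040, False), (1050, 1050, True)]
```

Note that a run consisting of a single row, such as the last one here, has identical start and end timestamps.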