Created March 25, 2022 16:34
Find all contiguous chunks of time where some condition was either True or False. Given a Pandas DataFrame that has timestamp information (e.g., Unix Time in ms) and some column where the values are either True or False, this function will find the start and end times, along with what the value was for that slice of time, for all chunks present,…
from typing import List, Tuple

import numpy as np
import pandas as pd


def get_start_end_times_from_dataframe_based_on_binary_condition(
    pd_timestamp_series: pd.Series, pd_binary_condition_series: pd.Series
) -> List[Tuple[int, int, int]]:
    """Find all contiguous chunks of time where some condition was either True or False.

    Given two aligned columns of a Pandas DataFrame, one with timestamp information (e.g., Unix Time in ms)
    and one whose values are either True or False, this function will find the start and end times, along
    with what the value was for that slice of time, for all chunks present, returning them as a list of tuples.

    NOTE: While it could be argued that this would work for categorical conditions (i.e., non-binary conditions),
    I personally did not write this function with that in mind. In other words, I can't guarantee that it will
    work off-the-shelf.

    :param pd_timestamp_series: A Pandas Series that contains timestamp information, such as the Unix Time in milliseconds.
    :type pd_timestamp_series: pd.Series
    :param pd_binary_condition_series: A Pandas Series that contains either 0 or 1 (or some analogous setup) representing the state of some condition.
    :type pd_binary_condition_series: pd.Series
    :return: A list of all contiguous chunks, each given as (start timestamp, end timestamp, condition value), found based on the condition provided.
    :rtype: List[Tuple[int, int, int]]
    """
    np_binary_condition_array: np.ndarray = pd_binary_condition_series.to_numpy()
    # Nothing to report for empty input; the indexing below would otherwise raise.
    if len(np_binary_condition_array) == 0:
        return []
    # Find every point where the condition changes. Each value is the LAST index of a run of identical values.
    np_end_indices: np.ndarray = np.argwhere(np.diff(np_binary_condition_array) != 0).flatten()
    # Since the previous line returns the last index of each chunk, the START of the next chunk is the subsequent index.
    # Prepend index 0 to the starting indices and append the final index to the ending indices.
    start_indices: List[int] = [0] + (np_end_indices + 1).tolist()
    end_indices: List[int] = np_end_indices.tolist() + [len(np_binary_condition_array) - 1]
    # Use .iloc because these are positions, not labels; plain [] would misbehave on a non-default index.
    start_end_timestamps_with_condition_value = list(
        zip(
            pd_timestamp_series.iloc[start_indices],
            pd_timestamp_series.iloc[end_indices],
            pd_binary_condition_series.iloc[start_indices],
        )
    )
    return start_end_timestamps_with_condition_value
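A minimal usage sketch may help show what the returned list looks like. The sample data, the column names `timestamp_ms` and `is_recording`, and the helper name `find_contiguous_chunks` are all assumptions made for illustration. The helper is a hypothetical, condensed stand-in for the function above so that the snippet runs by itself; it selects rows by position with `.iloc`, which matters when the DataFrame index is not the default 0..n-1.

```python
import numpy as np
import pandas as pd


def find_contiguous_chunks(timestamps: pd.Series, condition: pd.Series):
    """Hypothetical condensed stand-in for the gist's function (positional indexing)."""
    values = condition.to_numpy()
    if values.size == 0:
        return []
    # Positions whose value differs from the next one: the last index of each chunk.
    end_positions = np.flatnonzero(values[1:] != values[:-1])
    # Each chunk starts right after the previous chunk ends; the first starts at 0.
    starts = np.concatenate(([0], end_positions + 1))
    ends = np.concatenate((end_positions, [values.size - 1]))
    return list(
        zip(
            timestamps.iloc[starts].tolist(),
            timestamps.iloc[ends].tolist(),
            condition.iloc[starts].tolist(),
        )
    )


# Made-up sample data: Unix time in ms plus a True/False flag.
df = pd.DataFrame(
    {
        "timestamp_ms": [1000, 1100, 1200, 1300, 1400, 1500, 1600],
        "is_recording": [True, True, False, False, False, True, True],
    },
    index=[10, 11, 12, 13, 14, 15, 16],  # non-default index, e.g. after filtering rows
)

chunks = find_contiguous_chunks(df["timestamp_ms"], df["is_recording"])
for start, end, value in chunks:
    print(start, end, value)
# 1000 1100 True
# 1200 1400 False
# 1500 1600 True
```

Each tuple reads as (start timestamp, end timestamp, condition value), and both timestamps are inclusive: the end is the last sample that still had that value, not the first sample of the next chunk.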