@chausen
Created March 18, 2025 14:33
data alignment
# Goals
# Label each dataset with its day or config (e.g., “6 cores” vs. “10 cores”).
# Combine them into a single DataFrame (or keep them separate if you prefer).
# Align them by time so you can compare performance around similar test phases.
import pandas as pd
# Read each day, convert timestamp, label config
df_day1 = pd.read_csv("day1_raw.csv")
df_day1["timestamp"] = pd.to_datetime(df_day1["timestamp"])
df_day1["config"] = "6 cores" # or "Day 1"
df_day2 = pd.read_csv("day2_raw.csv")
df_day2["timestamp"] = pd.to_datetime(df_day2["timestamp"])
df_day2["config"] = "10 cores" # or "Day 2"
df_day3 = pd.read_csv("day3_raw.csv")
df_day3["timestamp"] = pd.to_datetime(df_day3["timestamp"])
df_day3["config"] = "4 cores" # or "Day 3"
# Combine
df_all = pd.concat([df_day1, df_day2, df_day3], ignore_index=True)
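# Optionally sort by timestamp for a chronological DataFrame.
# Assigning the result (instead of inplace=True) and passing ignore_index=True
# keeps the row index in 0..n-1 order after the sort.
df_all = df_all.sort_values(by="timestamp", ignore_index=True)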
# Normalize timestamps: align each day by its own start time
# Calculate each day's start time (earliest timestamp in that day's data)
start_day1 = df_day1["timestamp"].min()
start_day2 = df_day2["timestamp"].min()
start_day3 = df_day3["timestamp"].min()
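# Create a relative time column (seconds since that day's start)
df_day1["relative_time_s"] = (df_day1["timestamp"] - start_day1).dt.total_seconds()
df_day2["relative_time_s"] = (df_day2["timestamp"] - start_day2).dt.total_seconds()
df_day3["relative_time_s"] = (df_day3["timestamp"] - start_day3).dt.total_seconds()
# df_all was built before relative_time_s existed, so rebuild it to carry the new column
df_all = pd.concat([df_day1, df_day2, df_day3], ignore_index=True)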
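# Alternative sketch: compute the relative time directly on a combined DataFrame,
# grouping by the config label instead of tracking one start time per day.
# The helper name add_relative_time and the tiny sample data are made up for
# illustration; the column names "timestamp" and "config" match the ones above.

```python
import pandas as pd


def add_relative_time(df, group_col="config", time_col="timestamp"):
    """Return a copy with seconds elapsed since each group's first timestamp."""
    out = df.copy()
    # transform("min") broadcasts each group's earliest timestamp to all its rows
    start = out.groupby(group_col)[time_col].transform("min")
    out["relative_time_s"] = (out[time_col] - start).dt.total_seconds()
    return out


# Hypothetical sample standing in for a combined frame such as df_all
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-03-01 10:00:00", "2025-03-01 10:00:30",
        "2025-03-02 14:00:00", "2025-03-02 14:01:00",
    ]),
    "config": ["6 cores", "6 cores", "10 cores", "10 cores"],
})
aligned = add_relative_time(sample)
```

# With this approach every config starts at relative_time_s == 0, so rows from
# different days can be compared at the same offset into the test. It assumes
# each config label corresponds to exactly one test run.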