@chausen
Created March 18, 2025 14:06
Downsample large data file
import pandas as pd
# 1. Read raw CSV
df = pd.read_csv("raw_data.csv")
# 2. Convert the timestamp column to datetime (adjust column name/format as necessary)
df["timestamp"] = pd.to_datetime(df["timestamp"])
# 3. Make the timestamp the index
df.set_index("timestamp", inplace=True)
# 4. Resample to 10-second intervals and compute mean, min, max for each numeric column
#    (lowercase "10s" avoids the deprecated "10S" alias in newer pandas)
df_agg = df.select_dtypes("number").resample("10s").agg(["mean", "min", "max"])
# 5. Optional: Flatten the multi-level column names produced by agg()
df_agg.columns = ["_".join(col).strip() for col in df_agg.columns.values]
# 6. Save the aggregated data to a new CSV
df_agg.to_csv("aggregated_data_10s.csv")
print("Done! Aggregated file saved as aggregated_data_10s.csv")
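If raw_data.csv is too large to load in one pass, the same aggregation can be done chunk by chunk. The sketch below is not part of the original gist and rests on assumptions: a chunksize of 1,000,000 rows, a "timestamp" column, numeric value columns, and the output name aggregated_data_10s_chunked.csv. It stores per-chunk sums and counts rather than means so that 10-second windows split across chunk boundaries are still combined exactly.

import pandas as pd

# Chunked variant (sketch): the chunksize and column names are assumptions.
parts = []
for chunk in pd.read_csv(
    "raw_data.csv", parse_dates=["timestamp"], chunksize=1_000_000
):
    chunk = chunk.set_index("timestamp").select_dtypes("number")
    # Keep sums and counts (not means) so windows split across chunk
    # boundaries can be recombined exactly afterwards.
    parts.append(chunk.resample("10s").agg(["sum", "count", "min", "max"]))

grouped = pd.concat(parts).groupby(level=0)
sums = grouped.sum()   # per-window sums and counts add up
mins = grouped.min()   # per-window minima reduce with min
maxs = grouped.max()   # per-window maxima reduce with max

df_agg = pd.concat(
    {
        "mean": sums.xs("sum", axis=1, level=1) / sums.xs("count", axis=1, level=1),
        "min": mins.xs("min", axis=1, level=1),
        "max": maxs.xs("max", axis=1, level=1),
    },
    axis=1,
).swaplevel(axis=1).sort_index(axis=1)   # back to (column, stat) ordering

df_agg.columns = ["_".join(col) for col in df_agg.columns]
df_agg.to_csv("aggregated_data_10s_chunked.csv")

The result keeps the same flattened column_stat naming as the single-pass version, so downstream code can consume either file.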