@geoHeil
Last active November 9, 2020 13:34
matrixprofile / stumpy / multiple time-series handling / pandas UDF per group (no aggregation to scalar)
%matplotlib inline
import random

import numpy as np
import pandas as pd
import stumpy

# fix the seeds of both RNGs so the synthetic data is reproducible
random_seed = 47
np.random.seed(random_seed)
random.seed(random_seed)


def generate_df_for_device(n_observations, n_metrics, device_id, geo_id, topology_id, cohort_id):
    """Hourly random-normal metrics for a single device, tagged with its group ids."""
    df = pd.DataFrame(
        np.random.randn(n_observations, n_metrics),
        index=pd.date_range('2020', freq='h', periods=n_observations),
    )
    df.columns = [f'metrik_{c}' for c in df.columns]
    df['geospatial_id'] = geo_id
    df['topology_id'] = topology_id
    df['cohort_id'] = cohort_id
    df['device_id'] = device_id
    return df


def generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels):
    """Stack the frames of devices 1..n_devices into one long-format frame."""
    results = []
    for i in range(1, n_devices + 1):
        # random group memberships for device i
        geo = random.randrange(1, n_devices)
        cohort = random.randrange(1, cohort_levels)
        topo = random.randrange(1, topo_levels)
        df_single_device = generate_df_for_device(n_observations, n_metrics, i, geo, topo, cohort)
        results.append(df_single_device)
    return pd.concat(results)


# hourly data, 1 week of data
n_observations = 7 * 24
n_metrics = 1
n_devices = 20
cohort_levels = 3
topo_levels = 5

df = generate_multi_device(n_observations, n_metrics, n_devices, cohort_levels, topo_levels)
df = df.sort_index()
df = df.reset_index().rename(columns={'index': 'hour'})
df.head()

# matrix profile over ALL rows at once: after sorting by hour the devices'
# values are interleaved, so this is not a per-device result
m = 30
mp_T1 = stumpy.stump(df['metrik_0'], m)
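What follows is a minimal sketch of the per-group variant the title asks for. It assumes the long-format `df` built above, with columns `hour`, `device_id` and `metrik_0`.

- `per_device_matrix_profile` and `naive_matrix_profile` are hypothetical helpers introduced for illustration.
- The brute-force `naive_matrix_profile` only stands in for `stumpy.stump` so the sketch runs without stumpy.
- Its exclusion zone of `ceil(m / 4)` is an assumption meant to resemble stumpy's default.
- A matrix profile has `n - m + 1` entries for `n` observations. Each device's result is therefore padded with `NaN` so it can be written back row by row instead of being aggregated to a scalar.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def naive_matrix_profile(T, m):
    """Brute-force z-normalized matrix profile for one series.

    Hypothetical stand-in for stumpy.stump(T, m)[:, 0]; assumes len(T) >= m and
    that no window is constant (its standard deviation would be zero).
    """
    T = np.asarray(T, dtype=np.float64)
    W = sliding_window_view(T, m)                          # (n - m + 1, m) windows
    Z = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
    # z-normalized Euclidean distance: ||z_i - z_j||^2 = 2m - 2 z_i . z_j
    D = np.sqrt(np.maximum(2 * m - 2 * (Z @ Z.T), 0.0))
    # exclusion zone around the diagonal against trivial matches (assumed ceil(m / 4))
    idx = np.arange(len(Z))
    D[np.abs(idx[:, None] - idx[None, :]) <= int(np.ceil(m / 4))] = np.inf
    return D.min(axis=1)


def per_device_matrix_profile(df, m, value_col='metrik_0', group_col='device_id',
                              time_col='hour', mp_func=naive_matrix_profile):
    """Hypothetical helper: one matrix-profile value per row of df, per device.

    The last m - 1 rows (in time) of every device get NaN, because no
    length-m subsequence starts there. The result is aligned with df.index.
    """
    def one_device(s):
        mp = np.asarray(mp_func(s.to_numpy(), m), dtype=np.float64)
        return np.concatenate([mp, np.full(m - 1, np.nan)])  # pad back to len(s)

    ordered = df.sort_values([group_col, time_col])          # time order inside each device
    return ordered.groupby(group_col)[value_col].transform(one_device).reindex(df.index)
```

With stumpy installed, the real routine can be plugged in as `mp_func=lambda T, m: stumpy.stump(T, m)[:, 0]`, since the first column of `stumpy.stump`'s output is the matrix profile. Then `df['mp'] = per_device_matrix_profile(df, 30)` keeps one value per original row.

Note that `groupby(...).transform` with a Python callable still loops over the groups internally. It removes the hand-written loop, not the per-group calls.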

geoHeil (author) commented on Nov 9, 2020:

mp_T1 = stumpy.stump(df['metrik_0'], m)

applies the UDF (here `stumpy.stump`) to all values at once, i.e. to the hour-sorted values of all devices interleaved into one series (but this is not the desired result).
I could certainly iterate over the devices, as outlined beforehand, but I would like to vectorize this per-device iteration.
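One way to vectorize the per-device loop is sketched below. It assumes, as in the generated data, that every device has a value at every hour.

- The long frame is pivoted to a wide `hour × device_id` table.
- All devices' profiles are then computed with batched NumPy operations instead of a Python loop.
- `batched_matrix_profile` and `matrix_profile_wide` are hypothetical helpers, not stumpy functions.
- The approach is brute force with the same assumed `ceil(m / 4)` exclusion zone. It materializes a `devices × (n - m + 1)²` distance tensor, so for long series a per-device call to `stumpy.stump` is likely the more practical choice.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def batched_matrix_profile(X, m):
    """Hypothetical helper: matrix profiles of k equal-length series at once.

    X has shape (k, n); the result has shape (k, n - m + 1).
    Brute force: memory grows as k * (n - m + 1) ** 2.
    """
    X = np.asarray(X, dtype=np.float64)
    W = sliding_window_view(X, m, axis=1)                  # (k, l, m) with l = n - m + 1
    Z = (W - W.mean(axis=2, keepdims=True)) / W.std(axis=2, keepdims=True)
    G = np.einsum('kim,kjm->kij', Z, Z)                    # per-series dot products z_i . z_j
    D = np.sqrt(np.maximum(2 * m - 2 * G, 0.0))            # z-normalized Euclidean distances
    idx = np.arange(Z.shape[1])
    excl = int(np.ceil(m / 4))                             # assumed exclusion zone
    D[:, np.abs(idx[:, None] - idx[None, :]) <= excl] = np.inf
    return D.min(axis=2)


def matrix_profile_wide(df, m, value_col='metrik_0', group_col='device_id', time_col='hour'):
    """Hypothetical helper: one column per device, one row per subsequence start time."""
    wide = df.pivot(index=time_col, columns=group_col, values=value_col).sort_index()
    if wide.isna().any().any():
        raise ValueError('every device needs a value at every time step')
    P = batched_matrix_profile(wide.to_numpy().T, m)       # (devices, n - m + 1)
    return pd.DataFrame(P.T, index=wide.index[: P.shape[1]], columns=wide.columns)
```

For the generated data, `matrix_profile_wide(df, 30)` would return a 139 × 20 frame: 168 − 30 + 1 = 139 start hours for each of the 20 devices.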
