@HariSan1
Created December 1, 2019 20:29
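# Prepare data for a Ludwig time series model (adapted from the Ludwig examples)
import pandas as pd
from ludwig.utils.data_utils import add_sequence_feature_column

# Load only the Los Angeles column and fill gaps: backfill first, then
# forward-fill whatever is still missing at the end of the series.
df = pd.read_csv(
    '/content/weather_forecast/temperature.csv',
    usecols=['Los Angeles']
).rename(
    columns={"Los Angeles": "temperature"}
).bfill().ffill()
# head is a method; call it to print the first rows rather than the bound method
print(df.head())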
# normalize to zero mean and unit standard deviation
df.temperature = ((df.temperature - df.temperature.mean()) /
                  df.temperature.std())
train_size = int(0.6 * len(df))
vali_size = int(0.2 * len(df))
# train, validation, test split
df['split'] = 0
df.loc[
    (
        (df.index.values >= train_size) &
        (df.index.values < train_size + vali_size)
    ),
    ('split')
] = 1
df.loc[
    df.index.values >= train_size + vali_size,
    ('split')
] = 2
# prepare time series input feature column
# (here we are using 20 preceding values to predict the target)
add_sequence_feature_column(df, 'temperature', 20)
df.to_csv('/content/weather_forecast/temperature_la.csv')
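
To make the sequence-feature step easier to inspect, here is a minimal sketch of what a helper like `add_sequence_feature_column` is understood to do: for each row, it joins the preceding values into one space-delimited string. The function name `sketch_sequence_feature`, the `_feature` column suffix, and the backfilling of the first rows are assumptions for illustration and are not taken from Ludwig's source, so the real helper may differ in its details.

```python
import pandas as pd


def sketch_sequence_feature(df, col_name, seq_length):
    """Hypothetical stand-in, not Ludwig's code.

    For row i, join the seq_length values that precede it into one
    space-delimited string and store it in a new column.
    """
    values = df[col_name].tolist()
    if len(values) <= seq_length:
        raise ValueError("need more rows than seq_length")

    # One window per row that has seq_length predecessors.
    windows = [
        ' '.join(str(v) for v in values[i - seq_length:i])
        for i in range(seq_length, len(values))
    ]
    # Assumption: the first seq_length rows have no full window, so they
    # reuse the first complete one (a backfill).
    padding = [windows[0]] * seq_length

    out = df.copy()  # leave the caller's frame untouched
    out[col_name + '_feature'] = padding + windows  # assumed column name
    return out


toy = pd.DataFrame({'temperature': [1, 2, 3, 4, 5]})
result = sketch_sequence_feature(toy, 'temperature', 2)
print(result)
```

Unlike the call in the script above, which is applied to `df` directly, this sketch returns a copy so the toy input can be compared with the output.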