@rchurch4
Last active October 3, 2022 16:00
from gdtm.helpers.common import (
    load_dated_dataset,
    load_split_dataset,
    month,
    save_split_dataset,
    split_dataset_by_date,
)

# Load the dated data set from disk
path_to_data = 'path/to/data/dated_tweets.csv'
dataset = load_dated_dataset(path=path_to_data, date_delimiter='\t', doc_delimiter=',')
# Split the data by month (there are epoch functions for day and week as well)
split_dataset = split_dataset_by_date(dataset, epoch_function=month)
# Record the number of time periods; it is needed to reload the split data set later
num_time_periods = len(split_dataset.keys())
# Save the split data set to make it easier to load in the future
# This is useful if we are running multiple experiments on the same data
save_split_dataset(path=path_to_data, file_name='split_dataset', dataset=split_dataset, delimiter=' ')
# Load the split data set
loaded_dataset = load_split_dataset(path=path_to_data, file_name='split_dataset',
                                    num_time_periods=num_time_periods, delimiter=' ')
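
To illustrate what splitting by month amounts to conceptually, here is a minimal standard-library sketch. It is not gdtm's implementation. The names `month_key` and `group_by_epoch` are hypothetical, and the sketch assumes the data is a list of `(date_string, document)` pairs with ISO-formatted dates, which may differ from what `load_dated_dataset` actually returns.

```python
from collections import defaultdict
from datetime import datetime


def month_key(date_string):
    """Hypothetical epoch function: map an ISO date string to a (year, month) pair."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    return (parsed.year, parsed.month)


def group_by_epoch(dated_docs, epoch_function):
    """Group (date_string, document) pairs into buckets keyed by epoch, in sorted key order."""
    buckets = defaultdict(list)
    for date_string, document in dated_docs:
        buckets[epoch_function(date_string)].append(document)
    return dict(sorted(buckets.items()))


# Made-up sample data, for illustration only
sample = [
    ("2020-01-03", "first january tweet"),
    ("2020-02-14", "a february tweet"),
    ("2020-01-29", "second january tweet"),
]
grouped = group_by_epoch(sample, month_key)
num_periods = len(grouped)
```

Swapping in a different epoch function (for example one keyed on the ISO week) would change the granularity without touching the grouping logic.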