#!/usr/bin/env python3
'''
This script extracts training variables from all logs from
tensorflow event files ("event*"), writes them to Pandas
and finally stores them in long format in a CSV file including
all (readable) runs of the logging directory.
The magic "5" assumes there are only the following v.tags:
[lr, loss, acc, val_loss, val_acc]
'''
import glob
import os

import pandas as pd
import tensorflow as tf

# Get all event* runs from logging_dir subdirectories
logging_dir = './logs'
event_paths = glob.glob(os.path.join(logging_dir, "*", "event*"))


# Extraction function
def sum_log(path):
    rows = []
    try:
        # tf.train.summary_iterator in TF 1.x; the compat.v1 alias also exists in TF 2.x
        for e in tf.compat.v1.train.summary_iterator(path):
            for v in e.summary.value:
                rows.append({'metric': v.tag, 'value': v.simple_value})
    # Dirty catch of DataLossError
    except Exception:
        print('Event file possibly corrupt: {}'.format(path))
        return None
    # Rows are collected in a list first because DataFrame.append was removed in pandas 2.0
    runlog = pd.DataFrame(rows, columns=['metric', 'value'])
    # Each block of 5 consecutive rows (one per tag) is counted as one epoch
    runlog['epoch'] = [item for sublist in [[i]*5 for i in range(0, len(runlog)//5)] for item in sublist]
    return runlog


# Call & append
all_logs = []
for path in event_paths:
    log = sum_log(path)
    if log is not None:
        all_logs.append(log)
all_log = pd.concat(all_logs, ignore_index=True) if all_logs else pd.DataFrame()

# Inspect
print(all_log.shape)
print(all_log.head())

# Store
all_log.to_csv('all_training_logs_in_one_file.csv', index=False)
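The epoch column in the script relies on the magic "5": it assumes exactly five tags are logged per epoch, always in the same order. As a minimal sketch of an alternative that does not depend on the number of tags, the epoch can be taken as the running count of each tag, assuming every tag is written once per epoch. The helper name `assign_epochs` and the sample rows are hypothetical and only serve as illustration; they are not part of the original script.

```python
from collections import defaultdict


def assign_epochs(rows):
    """Attach an epoch index to each (tag, value) pair.

    The n-th occurrence of a tag is treated as epoch n, so the result
    does not depend on how many distinct tags were logged.
    Assumption: every tag is written exactly once per epoch.
    """
    seen = defaultdict(int)  # tag -> number of occurrences so far
    records = []
    for tag, value in rows:
        records.append({'metric': tag, 'value': value, 'epoch': seen[tag]})
        seen[tag] += 1
    return records


# Hypothetical rows, in the order an event file might yield them
rows = [('loss', 0.9), ('acc', 0.5), ('loss', 0.7), ('acc', 0.6)]
records = assign_epochs(rows)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(records)` to obtain the same long format as the script produces.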
Hello theRealSuperMario,
how can I use your script tflogs2pandas.py to get the TensorBoard data from a trained model?
I have the event file named "events.out.tfevents.1566371516.VTD2-PC",
but I do not know how to point your code at this saved event file.
The script was intended to be used on log folders, not files.
However, I updated the script and it now supports your use case. You can now either run the script on a folder path, in which case it converts all the
logs within that folder to a pandas DataFrame (useful when you interrupt and resume training and create multiple log files),
or provide the explicit path to a log file, in which case it converts just that file.
Therefore, you should now be able to run
tflogs2pandas.py xx/yy/events.out.tfevents.1566371516.VTD2-PC --write-csv --no-write-pkl -o converted
or
cd xx/yy
tflogs2pandas.py . --write-csv --no-write-pkl -o converted
Feel free to create issues on the repo, so that I can keep track of what is missing.
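The folder-or-file behaviour described in this reply can be sketched as follows. This is only an illustration of the idea, not the actual code of tflogs2pandas.py: the helper name `find_event_files` is hypothetical, and the `events.out.tfevents.*` pattern is an assumption based on the file name quoted in the question.

```python
import glob
import os


def find_event_files(path):
    """Return a sorted list of event files for a file or folder path.

    Hypothetical helper: a file path is returned as-is, while a folder
    is searched recursively for files named 'events.out.tfevents.*'.
    """
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        pattern = os.path.join(path, '**', 'events.out.tfevents.*')
        matches = glob.glob(pattern, recursive=True)
        return sorted(f for f in matches if os.path.isfile(f))
    raise FileNotFoundError('No such file or folder: {}'.format(path))
```

Each path returned this way could then be handed to an extraction function that reads a single event file, so that interrupted and resumed runs in one folder end up in the same DataFrame.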
@theRealSuperMario I tried your script, but unfortunately it doesn't work on PyTorch events, and for my TF log files it only prints the headers ... :(
Unfortunately, I am no longer involved in TF and PT related projects, so do not expect any updates on this. Sorry, guys.
Feel free to modify the code in any way that makes it useful to you.
Thank you, @theRealSuperMario!