@ptschandl
Last active January 18, 2024 19:00
Extract all tensorboard events files to pandas dataframe
#!/usr/bin/env python3
'''
This script extracts training variables from all TensorFlow
event files ("event*") in a logging directory, collects them
in a pandas DataFrame, and finally stores them in long format
in a CSV file including all (readable) runs of that directory.
The magic "5" assumes there are only the following v.tags:
[lr, loss, acc, val_loss, val_acc]
'''
import glob
import os

import pandas as pd
import tensorflow as tf

# Get all event* runs from logging_dir subdirectories
logging_dir = './logs'
event_paths = glob.glob(os.path.join(logging_dir, "*", "event*"))


# Extraction function
def sum_log(path):
    rows = []
    try:
        # TF 2.x only exposes the iterator under tf.compat.v1
        # (in old TF 1.x releases it was tf.train.summary_iterator)
        for e in tf.compat.v1.train.summary_iterator(path):
            for v in e.summary.value:
                rows.append({'metric': v.tag, 'value': v.simple_value})
    # Dirty catch, typically a DataLossError from a truncated file
    except Exception:
        print('Event file possibly corrupt: {}'.format(path))
        return None
    # Build the frame once; DataFrame.append was removed in pandas 2.0
    runlog = pd.DataFrame(rows, columns=['metric', 'value'])
    # Assumes 5 values per epoch, logged in order; integer division also
    # works when the number of rows is not a multiple of 5
    runlog['epoch'] = [i // 5 for i in range(len(runlog))]
    return runlog


# Call & append
logs = [log for log in map(sum_log, event_paths) if log is not None]
all_log = pd.concat(logs, ignore_index=True) if logs else pd.DataFrame()

# Inspect
print(all_log.shape)
print(all_log.head())

# Store
all_log.to_csv('all_training_logs_in_one_file.csv', index=False)
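The hard-coded 5 in the epoch column only holds when every epoch logs exactly five tags. A minimal sketch of an alternative, assuming each tag is written once per epoch and in order: number the rows within each metric with `groupby(...).cumcount()`. The helper name `add_epoch_column` and the toy values are hypothetical and not part of the gist.

```python
import pandas as pd


def add_epoch_column(runlog):
    """Number each metric's values 0, 1, 2, ... in logging order.

    Assumes every tag is written once per epoch, so the n-th value
    of a tag belongs to epoch n. No fixed tag count is needed.
    """
    out = runlog.copy()
    out['epoch'] = out.groupby('metric').cumcount()
    return out


# Toy data: two epochs of three tags (hypothetical values)
demo = pd.DataFrame({
    'metric': ['lr', 'loss', 'acc', 'lr', 'loss', 'acc'],
    'value': [0.1, 0.9, 0.5, 0.1, 0.7, 0.6],
})
demo = add_epoch_column(demo)

# Long to wide: one row per epoch, one column per metric
wide = demo.pivot(index='epoch', columns='metric', values='value')
print(wide)
```

If a run logs some tags less often than others (for example validation metrics every few epochs), this numbering counts occurrences per tag rather than true epochs, so the step stored in the event record would be the safer index.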
@viniciusarruda

viniciusarruda commented Apr 5, 2018

Line 38 is giving an error.

@theRealSuperMario

I also wrote a small script here.
https://github.com/theRealSuperMario/supermariopy/blob/master/scripts/tflogs2pandas.py

I think it is a bit faster and runs without errors.

@Siggi1988

Hello theRealSuperMario,

How can I use your code tflogs2pandas.py to get the TensorBoard data from a trained model?
I have the event data with the name "events.out.tfevents.1566371516.VTD2-PC",
but I do not know how to point your code at this saved event file.

@theRealSuperMario

> Hello theRealSuperMario,
>
> How can I use your code tflogs2pandas.py to get the TensorBoard data from a trained model?
> I have the event data with the name "events.out.tfevents.1566371516.VTD2-PC",
> but I do not know how to point your code at this saved event file.

The script was intended to be used on log folders, not files.
However, I updated the script and it now supports your use case. You can now either run the script on a folder path, and it converts all the
logs within that folder to a pandas DataFrame (useful when you interrupt and resume training and create multiple log files),
or provide the explicit path to a log file, and it converts just that file.

Therefore, you should now be able to run
tflogs2pandas.py xx/yy/events.out.tfevents.1566371516.VTD2-PC --write-csv --no-write-pkl -o converted

or

cd xx/yy
tflogs2pandas.py . --write-csv --no-write-pkl -o converted
Feel free to create issues on the repo, so that I can keep track of what is missing.
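The folder-or-file behaviour described above can be sketched as follows. This is an illustration under assumptions and is not taken from tflogs2pandas.py: the helper name `find_event_files` is hypothetical, and the pattern assumes the default `events.out.tfevents.*` file naming.

```python
import glob
import os


def find_event_files(path):
    """Return event files for a path that is either a file or a folder."""
    if os.path.isfile(path):
        # An explicit log file was given: use it as-is
        return [path]
    # A folder was given: collect event files at any depth below it
    pattern = os.path.join(path, '**', 'events.out.tfevents.*')
    return sorted(glob.glob(pattern, recursive=True))
```

Each returned path could then be passed to an extraction function such as `sum_log` from the gist.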

@Siggi1988

Siggi1988 commented Aug 30, 2019 via email

@Ademord

Ademord commented Dec 15, 2021

@theRealSuperMario I tried your script, but unfortunately it doesn't work on PyTorch events, and for my TF log files it only prints the headers... :(

@j3soon

j3soon commented May 10, 2022

@Ademord You can use tbparse instead, which handles both PyTorch and TensorFlow events.
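For reference, a minimal sketch of the tbparse route, assuming the package is installed (`pip install tbparse`). `SummaryReader` and its `scalars` property are taken from the tbparse README as I understand it; the wrapper name `load_scalars` is hypothetical.

```python
def load_scalars(log_dir):
    """Read all scalar events under log_dir into a long-format DataFrame."""
    # Imported lazily so this module still loads without tbparse installed
    from tbparse import SummaryReader

    # log_dir may be a single event file or a directory of runs
    reader = SummaryReader(log_dir)
    return reader.scalars


# Usage ('./logs' is the logging directory used in the gist):
# df = load_scalars('./logs')
# df.to_csv('all_training_logs_in_one_file.csv', index=False)
```

The resulting frame should hold one row per logged scalar, with the step, tag, and value as columns, so no fixed number of tags per epoch has to be assumed.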

@theRealSuperMario

Unfortunately, I am not involved in TF- and PT-related projects anymore, so do not expect any updates on this. Sorry, guys.

Feel free to modify the code in any way that makes it useful.

@aroraakshit

Thank you, @theRealSuperMario!
