@owenrh
Created November 25, 2021 09:34
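import glob
import re

# Matches "[Stage ...]" fragments, e.g. Spark console progress bars.
STAGE_PATTERN = re.compile(r'\[Stage.*?\]', flags=re.DOTALL)

for in_file in glob.glob('/home/owen/logs/some-spark-job-log*.log'):
    # Write the cleaned log next to the original, with a '.p' suffix.
    out_file = f'{in_file}.p'
    with open(in_file, mode='r') as f, open(out_file, mode='w') as out:
        for line in f:
            # Skip lines that are empty or nearly empty.
            if len(line) > 2:
                with_stage_removed = STAGE_PATTERN.sub('', line)
                # Drop lines that contained nothing but "[Stage ...]" output.
                if with_stage_removed.strip():
                    out.write(with_stage_removed)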