Convert Mailman archives to text and mbox-formatted archives.

mailmanToMBox.py
Python
#!/usr/bin/env python3
"""
mailmanToMBox.py: Inserts line feeds to create mbox format from Mailman Gzip'd
Text archives
Usage: ./mailmanToMBox.py dir
Where dir is a directory containing .txt.gz files pulled from mailman Gzip'd Text
"""
import gzip
import os
import sys

def makeMBox(fIn, fOut):
    '''
    from http://lists2.ssc.com/pipermail/linux-list/2006-February/026220.html
    Returns False if fIn is missing or fOut already exists, True otherwise.
    '''
    if not os.path.exists(fIn):
        return False
    if os.path.exists(fOut):
        return False

    lineNum = 0  # number of "From " separator lines seen so far

    # Decompress while reading; "rt" yields text lines rather than raw bytes.
    with gzip.open(fIn, "rt", encoding="utf-8", errors="replace") as src, \
         open(fOut, "w", encoding="utf-8") as out:
        for line in src:
            if line.find("From ") == 0:
                # Separate messages with a blank line, as mbox expects.
                if lineNum != 0:
                    out.write("\n")
                lineNum += 1
                # Undo Mailman's address obfuscation on the separator line.
                line = line.replace(" at ", "@")
            out.write(line)
    return True

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print(__doc__)
        sys.exit()

    rootDir = sys.argv[1]
    numConv = 0
    for root, dirs, files in os.walk(rootDir):
        for fil in files:
            if fil.endswith('.txt.gz'):
                # Join with root (not rootDir) so files in subdirectories resolve.
                inFile = os.path.join(root, fil)
                outFile = inFile.replace('.txt.gz', '.mbox')
                if not makeMBox(inFile, outFile):
                    print(outFile, 'already exists, did not overwrite')
                else:
                    numConv += 1
    print('Converted', numConv, 'archives to mbox format')

Hey @olinslac we can perhaps salvage some of this for indexing email archives.
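
For the indexing idea, here is a minimal sketch of how a converted file might be read back, assuming the output is a well-formed mbox. It uses the standard-library `mailbox` module; the function name `index_mbox` and the choice of fields (sender and subject) are hypothetical, picked only for illustration.

```python
import mailbox


def index_mbox(path):
    """Return a list of (sender, subject) pairs, one per message in the mbox."""
    index = []
    # create=False: fail on a missing file rather than silently creating one.
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            # Missing headers fall back to an empty string.
            index.append((msg.get("From", ""), msg.get("Subject", "")))
    finally:
        box.close()
    return index
```

Something like `index_mbox("2006-February.mbox")` (file name made up here) would then give a list that could be fed to whatever search index is in use.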

Hey Cory, looks like you have an error in this, or at least it didn't work for me like this. The files should be gunzipped before being processed. Right now they're just being processed as is, and I get what look like binary mbox files, which totally don't work! :-)

line 9: import gzip

line 23: for line in gzip.open(fIn):
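
To illustrate the difference, here is a small sketch that checks whether a file is really gzip-compressed and reads it accordingly. The helper names `is_gzipped` and `read_lines` are hypothetical, and UTF-8 with replacement of undecodable bytes is an assumption about the archive encoding, not something the gist specifies.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_lines(path):
    """Yield text lines, decompressing first when the file is gzip-compressed."""
    opener = gzip.open if is_gzipped(path) else open
    # "rt" makes gzip.open return decoded text instead of raw bytes.
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line
```

A plain `open()` on a `.txt.gz` file hands back the compressed bytes, which is why the output looked binary; going through `gzip.open` yields the actual archive text.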
