@kosztik
Created April 9, 2020 05:44
Mailpiler - Importing & Deduplicating email from PST Files
Hi there,
I'll be tackling how to get mail from PST files into mailpiler, deduplicating it before the import.
This howto is written for Ubuntu 16.04, the same release my Piler install howto targets.
To do this, we will need the libpst tools (specifically readpst) to extract PST files on the server. We'll also be using fdupes to check for duplicate files.
Without further ado:
On Piler server:
apt install readpst fdupes
mkdir /PilerWorking
cd /PilerWorking
mkdir PSTs        # dump your PSTs in here; the script reads from this directory
mkdir ImportDir   # where the PSTs will be extracted to
mkdir DUMPDIR CorruptedPSTs Completed   # also used by the script below
To export the contents of a PST, you'd need to do the following:
cd ImportDir
readpst -M -b ../PSTs/file.pst
Then we need to check for duplicates using fdupes.
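To get a feel for what the dedupe step will remove before letting anything be deleted, here is a minimal sketch that lists byte-identical files by SHA-256 hash. It is not fdupes and not part of the script further down; `list_duplicates` is a hypothetical helper name, and it only prints the extra copies. It assumes GNU coreutils and findutils, and filenames without newlines or backslashes.

```shell
#!/bin/bash
# list_duplicates DIR
# Prints every file under DIR whose content matches an earlier file
# (the first copy in sorted-path order is kept; nothing is deleted).
list_duplicates() {
    local dir="$1"
    find "$dir" -type f -print0 \
        | sort -z \
        | xargs -0 -r sha256sum \
        | awk '{
              hash = $1
              # The path starts after the 64-char hash and two separator chars.
              path = substr($0, 67)
              if (hash in seen) print path
              else seen[hash] = 1
          }'
}
```

Running `list_duplicates /PilerWorking/ImportDir` after an extraction prints the copies that a content-based dedupe would consider redundant.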
We'll be automating all of that 😏😀
Use this script on your piler machine and modify it as appropriate (paths etc.).
It runs through your PSTs directory, extracting the contents to the directory we'll be importing from.
If the extraction is successful, it moves the PST to the COMPLETEDDIR directory.
If the PST is corrupted (readpst fails to open it), the PST is moved to the CORRUPTEDDIR directory and the current run is marked as "CORRUPTED".
Once all the PST files have been extracted, it checks the DSTDIR for duplicates and removes them.
Then the CORRUPTEDDIR gets checked, and if there are files in there the run is terminated. The script also checks whether any corrupted PSTs were found during the current run, and only if there were none does it import the mails into piler.
Basically: dump your PSTs into the TMPDIR; the script removes spaces and other funny chars from the filenames, then moves them to SRCDIR.
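The filename clean-up can be tried on its own before any files are moved. The sketch below is an illustration, not the script itself: `sanitize_pst_name` is a hypothetical helper that applies the script's character rules (spaces become underscores; `@`, apostrophes and dots are dropped) to the name without its extension, so the result carries a single `.pst` suffix. It assumes the input ends in lowercase `.pst`.

```shell
#!/bin/bash
# sanitize_pst_name NAME
# Spaces become underscores; "@", apostrophes and dots are dropped from the
# stem, then a single ".pst" extension is put back.
sanitize_pst_name() {
    local stem
    stem=$(basename "$1" .pst)
    stem=$(printf '%s' "$stem" | sed -e 's/ /_/g' -e "s/[@'.]//g")
    printf '%s.pst\n' "$stem"
}
```

For example, `sanitize_pst_name "/PilerWorking/DUMPDIR/my mail.pst"` prints `my_mail.pst`.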
==============================================================
#! /bin/bash
ROOTDIR="/PilerWorking"
SRCDIR="/PilerWorking/PSTs"
DSTDIR="/PilerWorking/ImportDir"
CORRUPTEDDIR="/PilerWorking/CorruptedPSTs"
COMPLETEDDIR="/PilerWorking/Completed"
TMPDIR="/PilerWorking/DUMPDIR"
WORKINGDIR="/PilerWorking/Working"
READPST="/usr/bin/readpst -e -b"
FDUPES="/usr/bin/fdupes -r -d -N"
PILERIMPORT="sudo -u piler /usr/local/bin/pilerimport -e"
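# Make sure every directory the script relies on exists before counting files.
mkdir -p "$SRCDIR" "$DSTDIR" "$CORRUPTEDDIR" "$COMPLETEDDIR" "$TMPDIR"
CHECKUNPROCESSED=$(find "$TMPDIR" -type f | wc -l)
CHECKUNEXPORTED=$(find "$SRCDIR" -type f | wc -l)
CHECKCORRUPTED=$(find "$CORRUPTEDDIR" -type f | wc -l)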
RUNCORRUPTED=0
LOG="/var/log/pst-importlog.log"
touch $LOG
## Remove spaces and other funny chars from filenames before processing.
if [ "$CHECKUNPROCESSED" -gt 0 ]; then
    # Read NUL-delimited names so paths containing spaces survive intact.
    while IFS= read -r -d '' PST; do
        # Strip the extension first so removing dots cannot swallow it.
        BASENAME=$(basename "$PST" .pst)
        # Spaces become underscores; '@', apostrophes and dots are dropped.
        NEWNAME=$(printf '%s' "$BASENAME" | sed -e 's/ /_/g' -e "s/[@'.]//g")
        mv "$PST" "$SRCDIR/$NEWNAME.pst"
    done < <(find "$TMPDIR" -type f -name '*.pst' -print0)
fi
## Export all PST Files in $SRCDIR to the directory to be imported.
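cd "$DSTDIR" || exit 1
# Count here rather than at startup: the rename step may have just moved files into $SRCDIR.
if [ "$(find "$SRCDIR" -maxdepth 1 -type f -name '*.pst' | wc -l)" -gt 0 ]; then
    for FILE in "$SRCDIR"/*.pst ; do
        FILENAME=$(basename "$FILE")
        # Each PST is extracted into its own subdirectory of $DSTDIR.
        mkdir -p "$FILENAME"
        cd "$FILENAME" || exit 1
        echo "Processing $FILE ..." >> "$LOG"
        echo " " >> "$LOG"
        # $READPST is left unquoted on purpose so its flags are split into words.
        $READPST "$FILE" >> "$LOG"
        EXPORTSTAT=$?
        if [ $EXPORTSTAT -gt 0 ] ; then
            mv "$FILE" "$CORRUPTEDDIR"
            RUNCORRUPTED=1
        else
            mv "$FILE" "$COMPLETEDDIR/$FILENAME"
        fi
        cd "$DSTDIR" || exit 1
    done
fi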
## Check if CORRUPTEDDIR IS EMPTY
if [ $CHECKCORRUPTED -gt 0 ]; then
RUNCORRUPTED=1
fi
## CHECK FOR DUPLICATES
$FDUPES $DSTDIR >> $LOG
#IMPORT EMAIL
mkdir -p "$WORKINGDIR"
chown -R piler:piler "$ROOTDIR"
cd $WORKINGDIR
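if [ $RUNCORRUPTED -gt 0 ] ; then
    echo "Corrupted PSTs present; skipping import. Check $CORRUPTEDDIR." >> "$LOG"
    exit 0
else
    # Read NUL-delimited paths so mail files with spaces in their names are imported too.
    while IFS= read -r -d '' MAIL; do
        echo "Processing $MAIL..." >> "$LOG"
        # stdin comes from /dev/null so the command cannot consume the file list.
        $PILERIMPORT "$MAIL" < /dev/null >> "$LOG"
    done < <(find "$DSTDIR" -type f -name '*.eml' -print0)
fi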
rm -R $WORKINGDIR
exit 0
==============================================================
Now all you have to do once this has run through is check the CORRUPTEDDIR for any PST files that need repairing, and repair them & dump them in the PSTs directory again. Rerun & voila!
OPTIONAL:
You can use this script to search & copy all psts from a different linux server to piler. Modify as needed.
The script also adds the modify date and time to the filename to avoid overwriting files with the same name (they're all copied to the same directory, so names need to be unique). Remember to create your TMPDIR and to modify paths as needed.
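As a rough illustration of the naming scheme, the sketch below builds such a name from a file's modification time. `stamped_name` is a hypothetical helper, not something the copy script defines; it assumes GNU `date` (for `-r FILE`) and uses whole seconds only, rendered in the current time zone.

```shell
#!/bin/bash
# stamped_name FILE
# Prints FILE's base name with its modification time appended, in the form
# NAME-YYYY-MM-DD-HH-MM-SS.pst.
stamped_name() {
    local file="$1" stamp
    stamp=$(date -r "$file" '+%Y-%m-%d-%H-%M-%S') || return 1
    printf '%s-%s.pst\n' "$(basename "$file")" "$stamp"
}
```

Two files called `archive.pst` in different folders only collide under this scheme if they were also last modified in the same second.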
===========================================================
#! /bin/bash
## Tool to copy PST files to relevant mailpiler for processing.
SRC=$1
PILER=$2
TMPDIR="/mnt/RAID/PSTCopyTMP"
DEST="root@$PILER:/PilerWorking/DUMPDIR"
if [ "$SRC" = "" ] || [ "$PILER" = "" ] ; then
echo "Syntax: copyPSTs.bash [source_dir] [ip of piler]"
exit 1
fi
# Read NUL-delimited paths so filenames containing spaces are handled correctly.
while IFS= read -r -d '' THEFILE; do
    # Modification date and time (to the second), used to make the name unique.
    MODSTAMP=$(date -r "$THEFILE" '+%Y-%m-%d-%H-%M-%S')
    FILENAME=$(basename "$THEFILE")
    rsync -vratu "$THEFILE" "$TMPDIR"
    mv "$TMPDIR/$FILENAME" "$TMPDIR/$FILENAME-$MODSTAMP.pst"
done < <(find "$SRC" -type f -name '*.pst' -print0)
rsync -vhratu --progress $TMPDIR/* $DEST
rm $TMPDIR/*
exit 0
===========================================================
A few suggestions....
I deliberately wrote the import script to not do the actual import if any corrupted PSTs are encountered during a run or if any PST files are still in the CORRUPTEDDIR folder. The reason for this is so that you can process all your PSTs in one go for effective deduplication. I would therefore recommend doing all your PSTs in one go if at all possible.
So the procedure would be to fix any corrupted PSTs in that folder, move them to the SRCDIR once fixed, and rerun the script. (Make sure you clear the CORRUPTEDDIR of files once you've placed the fixed versions in the SRCDIR.)
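Before a rerun it can help to confirm that nothing is left behind in the CORRUPTEDDIR, since any file there stops the import. A minimal sketch of such a check follows; `corrupted_dir_is_clear` is a hypothetical helper name and the path in the usage line is the one from the script's defaults. It assumes GNU find (for `-quit`).

```shell
#!/bin/bash
# corrupted_dir_is_clear DIR
# Succeeds when DIR contains no regular files (at any depth).
# A missing DIR is also treated as clear.
corrupted_dir_is_clear() {
    local dir="$1"
    # -print -quit stops at the first regular file found.
    [ -z "$(find "$dir" -type f -print -quit 2>/dev/null)" ]
}
```

For example: `corrupted_dir_is_clear /PilerWorking/CorruptedPSTs || echo "still has files"`.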
Hope this helps!
2019-08-22 - Update to scripts - few bugfixes
2019-08-23 - Update to scripts - few bugfixes
Posted 21st August 2019 by Jurie Botha