Mailpiler - Importing & Deduplicating email from PST Files
Hi there,
I'll be tackling getting mail from PST files into mailpiler, and deduplicating it before the import.
This howto is written for Ubuntu 16.04, the same release my Piler install howto uses.
To do this, we will need the libpst tools (specifically readpst) to extract the PST files on the server. We'll also be using fdupes to check for duplicate files.
Without further ado:
On the Piler server:
apt install readpst fdupes
mkdir PilerWorking
cd PilerWorking
mkdir PSTs        # dump your PSTs in here; PilerWorking is our working dir
mkdir ImportDir   # where the PSTs will be extracted to
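To export the contents of a single PST by hand, you would do the following:
cd ImportDir
readpst -M -b ../PSTs/file.pst
(The script further down passes -e instead of -M, which does the same but adds extensions such as .eml to the extracted files.)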
Then we need to check for duplicates using fdupes.
We'll be automating all of that 😏😀
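Before letting fdupes delete anything, it can help to see which extracted mails would count as duplicates. Running fdupes -r ImportDir without -d only lists them. The sketch below does a similar dry run with standard tools, assuming GNU find, sha256sum, sort and awk are installed. list_duplicates is a made-up helper name, and hashing every file is a simplification, not a description of how fdupes compares files internally.

```shell
#!/bin/bash
# Dry run: print files under a directory whose content duplicates another file.
# Nothing is deleted. list_duplicates is a hypothetical helper name.
list_duplicates() {
    local dir="$1"
    # Hash every file, sort so identical hashes sit next to each other,
    # then print the path of every file whose hash was already seen.
    find "$dir" -type f -exec sha256sum {} + \
        | LC_ALL=C sort \
        | awk 'seen[$1]++ { sub(/^[^ ]+ [ *]/, ""); print }'
}
```

For example, list_duplicates /PilerWorking/ImportDir would print one line per redundant copy and leave every file in place.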
Use this script on your piler machine and modify it as appropriate (paths etc.).
It runs through your PSTs directory (SRCDIR), extracting the contents of each PST to DSTDIR, the directory we'll be importing from.
If the extraction is successful, it moves the PST to the COMPLETEDDIR directory.
If the PST is corrupted (readpst fails to open it), the PST is moved to the CORRUPTEDDIR directory and the current run is marked as "CORRUPTED".
Once all the PST files have been extracted, it checks DSTDIR for duplicates and removes them.
Then the script checks whether CORRUPTEDDIR contains any files or any corrupted PSTs were found during the current run. If so, the run is terminated; if not, the mails are imported into piler.
Basically: dump your PSTs into the TMPDIR. The script removes spaces and other funny characters from the filenames, then moves the files to SRCDIR.
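The filename clean-up described above can be tried on its own before any file is moved. This is a minimal sketch, assuming the rules are the ones the text names (spaces become underscores; @, apostrophes and dots are dropped) and that the .pst extension should be kept. sanitize_pst_name is a hypothetical helper, not part of piler or libpst.

```shell
#!/bin/bash
# sanitize_pst_name is a hypothetical helper: it prints a cleaned-up
# filename for a PST and does not touch the file itself.
sanitize_pst_name() {
    local name
    # Strip the directory and the .pst extension so the extension's dot survives.
    name=$(basename "$1" .pst)
    # Spaces become underscores; @, apostrophes and dots are removed.
    name=$(printf '%s' "$name" | tr ' ' '_' | tr -d "@'.")
    printf '%s.pst\n' "$name"
}
```

Calling sanitize_pst_name "/PilerWorking/DUMPDIR/John O'Neil @home v1.2.pst" prints John_ONeil_home_v12.pst.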
==============================================================
#!/bin/bash
# Extract PSTs with readpst, deduplicate the result with fdupes, import into piler.
ROOTDIR="/PilerWorking"
SRCDIR="$ROOTDIR/PSTs"
DSTDIR="$ROOTDIR/ImportDir"
CORRUPTEDDIR="$ROOTDIR/CorruptedPSTs"
COMPLETEDDIR="$ROOTDIR/Completed"
TMPDIR="$ROOTDIR/DUMPDIR"
WORKINGDIR="$ROOTDIR/Working"
READPST="/usr/bin/readpst -e -b"
FDUPES="/usr/bin/fdupes -r -d -N"
PILERIMPORT="sudo -u piler /usr/local/bin/pilerimport -e"
RUNCORRUPTED=0
LOG="/var/log/pst-importlog.log"
touch "$LOG"
# Make sure every directory exists before find and mv touch it.
mkdir -p "$SRCDIR" "$DSTDIR" "$CORRUPTEDDIR" "$COMPLETEDDIR" "$TMPDIR"
## REMOVE SPACES and other funny chars from filenames, then move the PSTs to $SRCDIR.
while IFS= read -r -d '' PST; do
    BASENAME=$(basename "$PST" .pst)
    # Spaces become underscores; @, apostrophes and dots are dropped.
    NEWNAME=$(printf '%s' "$BASENAME" | sed -e 's/ /_/g' -e "s/[@'.]//g")
    mv "$PST" "$SRCDIR/$NEWNAME.pst"
done < <(find "$TMPDIR" -type f -name '*.pst' -print0)
## Export all PST files in $SRCDIR to the directory to be imported.
# The files are globbed here, after the move above, so PSTs that were
# dumped just before this run are picked up in the same run.
shopt -s nullglob
for FILE in "$SRCDIR"/*.pst; do
    FILENAME=$(basename "$FILE")
    mkdir -p "$DSTDIR/$FILENAME"
    cd "$DSTDIR/$FILENAME" || continue
    echo "Processing $FILE ..." >> "$LOG"
    echo " " >> "$LOG"
    if $READPST "$FILE" >> "$LOG" 2>&1; then
        mv "$FILE" "$COMPLETEDDIR/$FILENAME"
    else
        # readpst failed: park the PST and flag this run.
        mv "$FILE" "$CORRUPTEDDIR"
        RUNCORRUPTED=1
    fi
done
## Check if CORRUPTEDDIR IS EMPTY (counted now, so leftovers from earlier runs count too)
if [ "$(find "$CORRUPTEDDIR" -type f | wc -l)" -gt 0 ]; then
    RUNCORRUPTED=1
fi
## CHECK FOR DUPLICATES
$FDUPES "$DSTDIR" >> "$LOG"
## IMPORT EMAIL
if [ "$RUNCORRUPTED" -gt 0 ]; then
    echo "Corrupted PSTs present in $CORRUPTEDDIR, import skipped." >> "$LOG"
    exit 0
fi
mkdir -p "$WORKINGDIR"
chown -R piler:piler "$ROOTDIR"
# Run the import from a directory the piler user owns.
cd "$WORKINGDIR" || exit 1
while IFS= read -r -d '' MAIL; do
    echo "Processing $MAIL..." >> "$LOG"
    # </dev/null keeps sudo/pilerimport from swallowing the file list on stdin.
    $PILERIMPORT "$MAIL" < /dev/null >> "$LOG"
done < <(find "$DSTDIR" -type f -name '*.eml' -print0)
cd "$ROOTDIR" || exit 1
rm -R "$WORKINGDIR"
exit 0
==============================================================
Once this has run through, all you have to do is check the CORRUPTEDDIR for any PST files that need repairing, repair them and dump them in the PSTs directory again. Rerun & voila!
OPTIONAL:
You can use this script to search for and copy all PSTs from a different Linux server to piler. Modify as needed.
The script also adds the modification date and time to the filename to avoid overwriting files with the same name (they're all copied to the same directory, so names need to be unique). Remember to create your TMPDIR and to modify paths as needed.
===========================================================
#!/bin/bash
## Tool to copy PST files to relevant mailpiler for processing.
SRC=$1
PILER=$2
TMPDIR="/mnt/RAID/PSTCopyTMP"
DEST="root@$PILER:/PilerWorking/DUMPDIR"
if [ -z "$SRC" ] || [ -z "$PILER" ]; then
    echo "Syntax: copyPSTs.bash [source_dir] [ip of piler]"
    exit 1
fi
# Stage every PST in $TMPDIR under a unique name: name-moddate-modtime.pst
while IFS= read -r -d '' THEFILE; do
    # GNU date: -r FILE prints the file's last modification time.
    STAMP=$(date -r "$THEFILE" '+%Y-%m-%d-%H-%M-%S')
    FILENAME=$(basename "$THEFILE" .pst)
    rsync -vau "$THEFILE" "$TMPDIR/$FILENAME-$STAMP.pst"
done < <(find "$SRC" -type f -name '*.pst' -print0)
# Send the staged files to piler; only clear the staging area if that worked.
if rsync -vhau --progress "$TMPDIR"/*.pst "$DEST"; then
    rm -f "$TMPDIR"/*.pst
else
    echo "rsync to $DEST failed; staged files kept in $TMPDIR" >&2
    exit 1
fi
exit 0
===========================================================
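The timestamped naming used by the copy script can be checked in isolation before pointing it at real data. A minimal sketch, assuming GNU date (its -r FILE option prints a file's last modification time) and second-level precision being unique enough for your files. stamped_name is a hypothetical helper name.

```shell
#!/bin/bash
# stamped_name is a hypothetical helper: it prints NAME-YYYY-MM-DD-HH-MM-SS.pst
# for a PST file, based on the file's last modification time.
stamped_name() {
    local file="$1" base stamp
    base=$(basename "$file" .pst)
    # GNU date: -r FILE uses the file's modification time instead of "now".
    stamp=$(date -r "$file" '+%Y-%m-%d-%H-%M-%S')
    printf '%s-%s.pst\n' "$base" "$stamp"
}
```

Two files called archive.pst from different folders then end up as two distinct names, as long as their modification times differ by at least a second.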
A few suggestions....
I deliberately wrote the import script not to do the actual import if any corrupted PSTs are encountered during a run or if any PST files are still in the CORRUPTEDDIR folder. The reason for this is so that you can process all your PSTs in one go for effective deduplication. I would therefore recommend doing all your PSTs in one go if at all possible.
So the procedure would be to fix any corrupted PSTs in that folder, move them to the SRCDIR once fixed, and rerun the script. (Make sure you clear the CORRUPTEDDIR of files once you've placed the fixed versions in the SRCDIR.)
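The requeue step above is easy to get half right (fixed copy queued, broken copy left behind, import still blocked). Here is a small sketch of doing both in one go. requeue_fixed_pst is a hypothetical helper, and the directory defaults are assumed to match the import script; override them if yours differ.

```shell
#!/bin/bash
# requeue_fixed_pst is a hypothetical helper. The defaults mirror the import
# script's paths (an assumption); set CORRUPTEDDIR / SRCDIR to override them.
CORRUPTEDDIR="${CORRUPTEDDIR:-/PilerWorking/CorruptedPSTs}"
SRCDIR="${SRCDIR:-/PilerWorking/PSTs}"
requeue_fixed_pst() {
    local fixed="$1" name
    name=$(basename "$fixed")
    # Queue the repaired file for the next run ...
    mv -- "$fixed" "$SRCDIR/$name" || return 1
    # ... and drop the broken original so it no longer blocks the import.
    rm -f -- "$CORRUPTEDDIR/$name"
}
```

If the PST was repaired in place inside CORRUPTEDDIR, the mv already empties that slot and the rm is a harmless no-op.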
Hope this helps!
2019-08-22 - Update to scripts - few bugfixes
2019-08-23 - Update to scripts - few bugfixes
Posted 21st August 2019 by Jurie Botha