
@dustinknopoff
Last active December 12, 2024 19:57
Automatically save emails to a folder using Mail.app, AppleScript, and Bash.

How to get it to work.

NOTE: This is macOS only.

  1. Go to Finder.
  2. Press CMD+SHIFT+G.
  3. Type ~/Library/Application Scripts/com.apple.mail.
  4. Open saveByRule.scpt and change theFolder to the folder where you'd like emails to be saved.
  5. Copy and paste saveByRule.scpt into ~/Library/Application Scripts/com.apple.mail.
  6. Go to Mail > Preferences > Rules > Add Rule (Mail > Settings on macOS Ventura and later).
  7. Enter the conditions for your rule, choose Run AppleScript as the action to perform, and select saveByRule as the script.
  8. Click OK.
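Steps 3 to 5 can also be done from Terminal. A minimal sketch, assuming you pass in the path of your already-edited copy of the script; `install_rule_script` is a hypothetical helper name, not part of this gist:

```sh
# Hypothetical helper (not part of the gist): copy an edited saveByRule.scpt
# into the folder Mail.app reads rule scripts from (the folder from steps 3 and 5).
install_rule_script() {
    src=$1
    dest="$HOME/Library/Application Scripts/com.apple.mail"
    # create the folder if it does not exist yet, then copy the script into it
    mkdir -p "$dest" && cp "$src" "$dest/"
}

# Usage:
# install_rule_script ~/Downloads/saveByRule.scpt
```

After copying, the script should appear under Run AppleScript in step 7.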

For converting to PDF and HTML.

  1. In the terminal, run sudo pip install mail-parser and brew install Caskroom/cask/wkhtmltopdf.
  2. Open toPDF.sh and change location to the directory where your saved emails are located (keep the *.eml pattern).
  3. In the terminal, run bash toPDF.sh.
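Before running toPDF.sh, it can help to confirm that both command-line tools from step 1 ended up on your PATH. A minimal sketch using the POSIX `command -v` builtin; `have` is a hypothetical helper name:

```sh
# Hypothetical helper: succeed if the named command can be found on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Check both tools that toPDF.sh calls before running it.
for tool in mailparser wkhtmltopdf; do
    if have "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool (see step 1)" >&2
    fi
done
```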

For automating conversion to HTML/PDF.

  1. In Automator, create a new Folder Action.
  2. Set the folder to watch to the folder where emails are saved.
  3. Add a Run Shell Script action.
  4. Copy and paste the contents of toPDF.sh into the text area of the Run Shell Script action.
  5. Save.
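Because toPDF.sh loops over every .eml file in location, each run of the Folder Action re-converts emails that were already converted. One possible variation, assuming the Run Shell Script action's "Pass input" option is set to "as arguments", is to handle only the files Automator passes in. A sketch only; `convert_new_emails` is a hypothetical name, and the two conversion commands are the same ones toPDF.sh uses:

```sh
# Hypothetical variant of toPDF.sh for an Automator Folder Action whose
# Run Shell Script action has "Pass input" set to "as arguments".
convert_new_emails() {
    for file in "$@"; do
        # the watched folder may receive other files too; only handle .eml
        case $file in
            *.eml) ;;
            *) continue ;;
        esac
        base=${file%.eml}   # same path without the .eml extension
        mailparser -f "$file" -b > "$base.html"
        wkhtmltopdf "$base.html" "$base.pdf"
    done
}

convert_new_emails "$@"
```

Writing the .html and .pdf next to each .eml keeps the same layout as toPDF.sh while skipping files that were handled on earlier runs.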

NOTE: saveByRule.scpt is almost entirely copied from Stack Overflow.

saveByRule.scpt

using terms from application "Mail"
	on perform mail action with messages theMessages for rule theRule
		tell application "Mail"
			-- use the messages this rule matched, not whatever is currently selected in Mail
			set msgs to theMessages
			if length of msgs is not 0 then
				set theFolder to (system attribute "HOME") & "/Downloads/"
				repeat with msg in msgs
					set msgContent to source of msg
					-- get the date the message was received
					set msgDate to date received of msg
					-- format the date as YYYY-MM-DD using the pad2() handler below
					set {year:y, month:m, day:d, hours:h, minutes:min} to (msgDate)
					set msgDate to ("" & y & "-" & my pad2(m as integer) & "-" & my pad2(d))
					-- get the subject of the message
					set msgSubject to (subject of msg)
					-- build the file name: YYYY-MM-DD-<subject>.eml
					set newFile to (msgDate & "-" & msgSubject & ".eml") as rich text
					set newFilePath to theFolder & newFile as rich text
					set referenceNumber to open for access newFilePath with write permission
					set didSave to false
					try
						-- discard old contents if a file with this name already exists
						set eof referenceNumber to 0
						write msgContent to referenceNumber
						set didSave to true
					end try
					-- close the file exactly once, whether or not the write succeeded
					close access referenceNumber
					-- delete the original message only after it was saved successfully
					if didSave then delete msg
				end repeat
			end if -- msgs > 0
		end tell
	end perform mail action with messages
end using terms from

on pad2(n)
	-- zero-pad a number to two digits, e.g. 7 -> "07"
	return text -2 thru -1 of ("00" & n)
end pad2
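toPDF.sh

#!/bin/sh
# Optional: install the dependencies if they are missing.
# if [ -z "$(pip list | grep -F "mail-parser")" ]; then
#     pip install mail-parser || sudo pip install mail-parser
# fi
# if [ -z "$(brew cask list | grep -F "wkhtmltopdf")" ]; then
#     brew install Caskroom/cask/wkhtmltopdf
# fi

# Directory containing the saved .eml files (should match theFolder in saveByRule.scpt).
location="$HOME/Downloads"

for file in "$location"/*.eml; do
    # if there are no .eml files, the pattern stays unexpanded; skip it
    [ -e "$file" ] || continue
    echo "$file"
    name=${file##*/}        # strip the directory
    noeml=${name%.eml}      # strip the .eml extension
    echo "$noeml"
    # write the message body to an .html file, then render that file to PDF
    mailparser -f "$file" -b > "$location/$noeml.html"
    wkhtmltopdf "$location/$noeml.html" "$location/$noeml.pdf"
done

# Uncomment any of these if you'd prefer not to keep a file type.
# rm -f "$location"/*.eml
# rm -f "$location"/*.html
# rm -f "$location"/*.pdf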
@eightball298

Is anyone else experiencing an issue where the rule reverts to the iCloud account, even when another account is selected? I am trying to apply this to my Yahoo Mail account, but every time I open the rule with "Edit", iCloud is the selected option. I still can't get the script to work, and I doubt this is the cause, but I probably need to clear this obstacle first.
[Screenshot: Screen Shot 2024-04-28 at 11 37 23 AM]

@lcn2

lcn2 commented Nov 30, 2024

    -- assign subject of msg
    set msgSubject to (subject of msg)

    -- create filename.eml to be use as title saved
    set newFile to (msgDate & "-" & msgSubject & ".eml") as rich text

Is the use of the message subject potentially dangerous, @dustinknopoff?

What if an Email subject line were to contain text such as “/../../../../../../usr/local/bin/”, causing the Email to be written to some place you don’t want, or some place dangerous?

Perhaps, after setting the msgSubject value, potentially dangerous characters (such as / or * or ? or ; or ` or \ or $ or & or < or > or “ or ‘ or : or # or ( or ) or [ or ] or { or } or | etc.) should be removed from that variable, or perhaps changed into something harmless?

Perhaps remove anything from the msgSubject value that is not a lowercase letter, UPPERCASE LETTER, digit, or space; or perhaps convert anything in the msgSubject value that is not a lowercase letter, UPPERCASE LETTER, digit, or space into the letter x?
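A minimal sketch of the second idea in shell (the language of toPDF.sh), assuming the subject arrives as a plain string; `safe_name` is a hypothetical helper name, and the same rule could be ported to the AppleScript before newFile is built:

```sh
# Hypothetical helper: make an Email subject safe to use in a file name.
# Every byte that is not an ASCII letter, digit, or space becomes "x", so
# "/", ".", ";", "*", "$", quotes, etc. can no longer reach a path or a shell.
safe_name() {
    cleaned=$(printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9 ' 'x')
    # fall back to a fixed name when the subject is empty
    printf '%s\n' "${cleaned:-no-subject}"
}

# Usage:
# safe_name 'hello ; rm -rf * ; exit 0 ;'   # prints: hello x rm xrf x x exit 0 x
```

Because `LC_ALL=C` makes `tr` work byte by byte, non-ASCII characters (accented letters, emoji) also turn into runs of x; that loses some readability in exchange for a name that is always safe.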

UPDATE 0

Alternatively, setting newFile could be done without the use of the message subject string, as in:

    set newFile to (msgDate & ".eml") as rich text

If you do this, you might want to modify msgDate to include the hour and minute, and to include a counter when forming newFile, as suggested in GH-gistcomment-5311296 below.

UPDATE 1a

The use of the Email subject in forming the filename could become especially dangerous when used in conjunction with the toPDF.sh shell script. Consider an Email with a subject that contains some text such as:

    Subject: hello ; rm -rf * ; exit 0 ;

Whether such an Email subject line could actually cause problems might depend on what mailparser and wkhtmltopdf do with their arguments, and on which shell you are using. Nevertheless, why not avoid such problems by stripping out dangerous characters or converting them into something harmless?
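One way to see what the unquoted variables in the original toPDF.sh do with such a subject: an unquoted expansion is split into words and glob-expanded, but the shell does not re-read `;` as a command separator, so `rm` is never run. Word splitting and globbing can still turn one file name into many unexpected arguments, which is why quoting matters. A small demonstration; `count_args` is a hypothetical helper:

```sh
# Hypothetical helper: print how many arguments were received.
count_args() {
    printf '%s\n' "$#"
}

noeml='hello ; rm -rf * ; exit 0 ;'

set -f              # turn off globbing so "*" stays literal in this demo
count_args $noeml   # unquoted: split into 9 words, prints 9
count_args "$noeml" # quoted: a single argument, prints 1
set +f              # restore globbing
```

Without `set -f`, the lone `*` would additionally expand to every file name in the current directory before mailparser or wkhtmltopdf ever saw it.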

UPDATE 2

You might want to add some additional protections to toPDF.sh such as changing:

    echo $file
    name=${file##*/}
    noeml=${name%.eml}
    echo "$noml"
    mailparser -f $file -b > $location/$noeml.html
    x=$noeml.html
    wkhtmltopdf $location/$x $location/$noeml.pdf

Into

    echo "$file"
    name=${file##*/}
    noeml=${name%.eml}
    echo "$noeml"
    mailparser -f "$file" -b > "$location/$noeml.html"
    x=$noeml.html
    wkhtmltopdf "$location/$x" "$location/$noeml.pdf"

@lcn2

lcn2 commented Nov 30, 2024

What happens if you have two different Email messages with the same message date (YYYY-MM-DD) and the same subject, @dustinknopoff? Won’t your script cause the second Email file to overwrite the first?

Perhaps you should add a counter into the loop and use that counter when forming the newFile value? That way when going through the loop, the newFile value would be different even if two Email messages had the same message date (YYYY-MM-DD) and the same subject.
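A related approach, sketched in shell: instead of a loop counter, append a number only when the file name is already taken, which also avoids collisions with files saved by earlier runs. `unique_path` is a hypothetical helper; note that checking and then writing is not atomic, so two runs at the same moment could still pick the same name.

```sh
# Hypothetical helper: print a path in $1 for base name $2 and extension $3
# that does not exist yet, adding -1, -2, ... to the base name as needed.
unique_path() {
    dir=$1 base=$2 ext=$3
    candidate="$dir/$base.$ext"
    n=1
    while [ -e "$candidate" ]; do
        candidate="$dir/$base-$n.$ext"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}

# Usage:
# unique_path "$HOME/Downloads" 2024-11-30-hello eml
```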

@dustinknopoff
Author

Hey @lcn2, these are all awesome suggestions, and you're totally right about the security implications of this approach. Since writing this, I've actually shifted to a yearly export as an .mbox that I back up instead. In practice, searching through emails from more than a year ago has been a rare occurrence.

That being said, if I were to come back to this, I'd probably do my best to keep the .eml contents in memory and pass them into a language like Python as soon as possible, so that these kinds of protections, looping, and deduplicating would be much easier :)

@lcn2

lcn2 commented Dec 12, 2024
