Download all emails labeled "company-departures" with got-your-back (gyb):
./gyb --email MYGMAILACCOUNT@gmail.com --search 'label:company-departures'
Build this program (get-email-text.py), based on a Stack Overflow answer:
#!/usr/bin/env python
import os
import email
from emaildata.text import Text

# Collect every .eml file under the current directory.
emls = []
for dirpath, dirnames, filenames in os.walk("."):
    for filename in [f for f in filenames if f.endswith(".eml")]:
        emls.append(os.path.join(dirpath, filename))

# Print the text body of each message between separator lines.
for eml in emls:
    with open(eml) as fh:
        message = email.message_from_file(fh)
    text = Text.text(message)
    print("email: %s has body:\n" % eml)
    print("-----------------------")
    print(text)
    print("-----------------------")
Run the program:
./get-email-text.py > departure-email-text.txt
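In case the emaildata package is not installable in your environment, the same extraction can probably be done with the Python 3 standard library alone. This is a rough sketch, not the Stack Overflow answer's method: the helper names find_emls and body_text are made up for illustration, and get_body here prefers the text/plain part and falls back to HTML, which may differ from what Text.text returns.

```python
#!/usr/bin/env python3
import os
from email import policy
from email.parser import BytesParser


def find_emls(root="."):
    """Return paths of all .eml files under root (hypothetical helper)."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".eml"):
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)


def body_text(path):
    """Return the text body of one .eml file, or "" if none is found."""
    # Parse from bytes so the message's own charset headers are honored.
    with open(path, "rb") as fh:
        message = BytesParser(policy=policy.default).parse(fh)
    part = message.get_body(preferencelist=("plain", "html"))
    return part.get_content() if part is not None else ""


if __name__ == "__main__":
    for eml in find_emls("."):
        print("email: %s has body:\n" % eml)
        print("-----------------------")
        print(body_text(eml))
        print("-----------------------")
```

It is run the same way, with the output redirected to departure-email-text.txt.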
- Remove personal info (email addresses, etc.).
- Get rid of the annoying '^M' (carriage return) characters:
sed -i -e "s/^M//" departure-email-text.2nd-try.txt
NOTE: '^M' must be entered as a literal control character, not as a caret followed by an M:
- To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key and press V, then M.
- This effort revealed that the next step should be to keep phrases in the word cloud, e.g., "thank you", "keep in touch", etc.
Join the words of a phrase with a '~'; most word-cloud tools will then treat the phrase as a single word and remove the '~' when rendering.
This can be done with sed, vim, etc.
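The cleanup steps above (removing email addresses, dropping '^M' characters, joining phrases with '~') could also be scripted. A minimal sketch, with assumptions: the function name clean, the email regex, and the PHRASES list are illustrative only, and the regex is deliberately crude, so names and other personal info would still need a manual pass.

```python
import re

# Crude pattern for anything that looks like an email address (assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Example phrases from the notes; extend as needed.
PHRASES = ["thank you", "keep in touch"]


def clean(text, phrases=PHRASES):
    """Redact addresses, drop carriage returns, and join phrases with '~'."""
    # Removes every CR, which is slightly broader than the sed command
    # (sed only drops the first '^M' on each line).
    text = text.replace("\r", "")
    text = EMAIL_RE.sub("", text)
    for phrase in phrases:
        joined = phrase.replace(" ", "~")
        # Case-insensitive match; the amount of whitespace between words may vary.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        # The replacement is the lowercase joined form, e.g. "thank~you".
        text = re.sub(pattern, joined, text, flags=re.IGNORECASE)
    return text
```

For example, clean("Thank you all!\r\n") returns "thank~you all!\n". Reading departure-email-text.txt, passing its contents through clean, and writing the result to a new file would cover all three steps in one pass.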
- Exclude: com email hi https linkedin lot
- Group similar words? Yes
- Show frequencies? Yes
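# Copy the tagcrowd edit-box contents (with frequencies) into /tmp/1, then:
# put each "word (count)" entry on its own line
tr ')' '\n' < /tmp/1 > /tmp/2
# strip the leading space and the opening parenthesis
sed -e 's/^ //g' -e 's/(//g' /tmp/2 > /tmp/3
# sort by count, highest first, and write "word;count" lines
sort -rn -k2 /tmp/3 | tr ' ' ';' > word-cloud-seeds.downcase.csv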
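The same conversion can be sketched in Python, which avoids the intermediate /tmp files. This assumes the tagcrowd text looks like "word (12) other (5) ...", a format inferred from the shell pipeline above rather than from tagcrowd documentation; the function name tagcrowd_to_csv is made up for illustration.

```python
import re


def tagcrowd_to_csv(text):
    """Turn 'word (12) other (5)' into 'word;12' lines, highest count first."""
    # Each entry is a run of non-space characters followed by "(count)".
    pairs = re.findall(r"(\S+)\s*\((\d+)\)", text)
    # Sort numerically by count, descending; ties keep their input order.
    pairs.sort(key=lambda pair: int(pair[1]), reverse=True)
    return "".join("%s;%s\n" % (word, count) for word, count in pairs)


if __name__ == "__main__":
    with open("/tmp/1") as src:
        csv_text = tagcrowd_to_csv(src.read())
    with open("word-cloud-seeds.downcase.csv", "w") as dst:
        dst.write(csv_text)
```

Unlike the shell version, entries that do not match the "word (count)" shape are silently skipped, so it is worth comparing line counts if the output looks short.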
- Upload G mask image
- Choose colors based on the Groupon color palette (found an image in Skynet that had the hex codes)
- Copy the contents of word-cloud-seeds.downcase.csv into the words list.
- Generate image.