Ben Watts-Jones benwattsjones

benwattsjones / gmail_mbox_parser.py
Last active April 29, 2024 16:39
Quick Python code to parse mbox files, specifically those used by Gmail. Extracts the sender, date, plain-text contents, etc., and ignores base64 attachments.
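As a rough illustration of the approach described above, here is a minimal standard-library sketch, not the gist's own code. The function name `iter_plain_text` is hypothetical, and it assumes that any part with a base64 transfer encoding is an attachment that can be skipped. It only collects `text/plain` parts, so HTML-only messages yield an empty string here.

```python
import mailbox


def iter_plain_text(mbox_path):
    """Yield (sender, date, text) for each message in an mbox file."""
    for message in mailbox.mbox(mbox_path):
        chunks = []
        for part in message.walk():
            if part.is_multipart():
                continue  # containers carry no text of their own
            if part.get_content_type() != 'text/plain':
                continue  # this sketch ignores HTML and binary parts
            encoding = part.get('Content-Transfer-Encoding', '')
            if encoding.strip().lower() == 'base64':
                continue  # assumption: base64 parts are attachments
            payload = part.get_payload(decode=True) or b''
            charset = part.get_content_charset() or 'utf-8'
            chunks.append(payload.decode(charset, errors='replace'))
        yield message['From'], message['Date'], '\n'.join(chunks)
```

Calling `list(iter_plain_text('export.mbox'))` (the file name is a placeholder) would give one tuple per message. A message declaring a charset that Python does not recognise would raise `LookupError` in this sketch.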
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
import mailbox

import bs4


def get_html_text(html):
    try:
        return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
    except AttributeError:  # message contents empty