jkmackie / gmail_mbox_parser.py
Created July 21, 2023 17:57 — forked from benwattsjones/gmail_mbox_parser.py
Quick Python code to parse mbox files, specifically those exported from Gmail. Extracts the sender, date, plain-text contents, etc., and ignores base64 attachments.
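The description above can be illustrated with a minimal, standard-library-only sketch. It is not the gist's own implementation: `summarize_mbox` and `write_demo_mbox` are hypothetical helper names. The sketch assumes that "ignoring base64 attachments" means skipping any MIME part whose `Content-Disposition` is `attachment`, and it keeps only `text/plain` bodies. HTML parts, which the script handles with BeautifulSoup, are not covered here.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def summarize_mbox(path):
    """Return a list of (sender, date, plain_text) tuples, one per message.

    Hypothetical helper (not part of the gist). Only text/plain leaf parts are
    decoded; parts marked 'Content-Disposition: attachment' (typically
    base64-encoded files) are skipped.
    """
    box = mailbox.mbox(path, create=False)
    rows = []
    try:
        for msg in box:
            texts = []
            for part in msg.walk():
                if part.is_multipart():
                    continue  # multipart containers hold no payload themselves
                if part.get_content_disposition() == "attachment":
                    continue  # ignore attachments such as base64-encoded files
                if part.get_content_type() == "text/plain":
                    # decode=True undoes base64 / quoted-printable transfer encoding
                    payload = part.get_payload(decode=True) or b""
                    charset = part.get_content_charset() or "utf-8"
                    texts.append(payload.decode(charset, errors="replace"))
            rows.append((msg["From"], msg["Date"], "\n".join(texts)))
    finally:
        box.close()
    return rows


def write_demo_mbox(path):
    """Hypothetical helper: write one message (text body + base64 attachment)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["Date"] = "Fri, 21 Jul 2023 17:57:00 +0000"
    msg["Subject"] = "demo"
    msg.set_content("Hello from the sketch")
    # Bytes attachments are base64-encoded and marked as 'attachment' by default.
    msg.add_attachment(b"\x00\x01binary", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        box.add(msg)
        box.flush()
    finally:
        box.close()


if __name__ == "__main__":
    # Usage: round-trip one message through a temporary mbox file.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.mbox")
        write_demo_mbox(demo_path)
        for sender, date, text in summarize_mbox(demo_path):
            print(sender, "|", date, "|", text.strip())
```

Walking every leaf part with `Message.walk()` keeps the logic independent of how deeply Gmail nests `multipart/alternative` inside `multipart/mixed`, and `get_payload(decode=True)` handles the transfer encoding so the sketch never has to call a base64 decoder directly.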
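#! /usr/bin/env python3
# -*- coding: utf-8 -*-
import mailbox

import bs4  # BeautifulSoup 4; the 'lxml' parser used below also requires the lxml package


def get_html_text(html):
    # Parse an HTML message part and return the visible text of its <body>,
    # joining text fragments with single spaces and stripping surrounding whitespace.
    try:
        return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
    except AttributeError:  # message contents empty, so .body is None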