@kylemcdonald
Created November 14, 2014 00:51
This script processes the JSON dump from Google Takeout and extracts all messages sent between two people in a single Hangouts conversation, printing the message text one word per line. You need to know the conversation_id of that conversation before running it.
#!/usr/bin/env python
# example:
# python hangout-filter.py Hangouts.json Ugw3axd-nDFKCvFsDi94AaABAQ | sort | tr '\n' ' ' > words.txt
import json
import argparse
import re

parser = argparse.ArgumentParser(
    description='Print conversations in readable format.')
parser.add_argument('filename')
parser.add_argument('conversation_id')
args = parser.parse_args()

# Load the whole Takeout dump into memory and close the file afterwards.
with open(args.filename) as f:
    data = json.load(f)

states = data['conversation_state']
for state in states:
    conversation_state = state['conversation_state']
    if 'event' in conversation_state:
        conversations = conversation_state['event']
        for conversation in conversations:
            if 'chat_message' in conversation:
                message_content = conversation['chat_message']['message_content']
                if 'segment' in message_content:
                    segment = message_content['segment']
                    for line in segment:
                        conversation_id = conversation['conversation_id']['id']
                        if conversation_id == args.conversation_id:
                            timestamp = conversation['timestamp']
                            user_id = conversation['sender_id']['gaia_id']
                            # out = user_id + ' @ ' + timestamp + ': ' + line['text']
                            # print(out.encode('ascii', 'ignore').replace('\n', ''))
                            out = line['text']
                            # Drop non-ASCII characters, then decode back to text so
                            # the regex and print also work on Python 3.
                            out = out.encode('ascii', 'ignore').decode('ascii')
                            # Put each whitespace-separated word on its own line.
                            out = re.sub(r'\s+', '\n', out)
                            print(out)
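
The description says the conversation_id must be known in advance but does not show how to find it. One possible approach, sketched below, is to count chat messages per conversation id and pick the id from that list. It relies only on the fields the script above already reads (`conversation_state`, `event`, `chat_message`, `conversation_id.id`); the function names `count_messages_by_conversation` and `print_conversation_ids` are hypothetical helpers, not part of the original gist.

```python
import json
from collections import Counter


def count_messages_by_conversation(data):
    """Return a Counter mapping conversation id -> number of chat messages.

    Assumes the same layout the filter script reads: a top-level
    'conversation_state' list whose items hold an optional 'event' list.
    """
    counts = Counter()
    for state in data.get('conversation_state', []):
        events = state.get('conversation_state', {}).get('event', [])
        for event in events:
            # Only count events that carry a chat message.
            if 'chat_message' in event:
                counts[event['conversation_id']['id']] += 1
    return counts


def print_conversation_ids(filename):
    """Print one 'conversation_id<TAB>message_count' line per conversation."""
    with open(filename) as f:
        counts = count_messages_by_conversation(json.load(f))
    # Busiest conversations first, which makes the target easier to spot.
    for conversation_id, n in counts.most_common():
        print('%s\t%d' % (conversation_id, n))
```

Calling `print_conversation_ids('Hangouts.json')` lists every conversation id that has at least one chat message. The message counts are only a rough hint for picking the right conversation; the output does not include participant names.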