Skip to content

Instantly share code, notes, and snippets.

@NeighborGeek
Last active April 30, 2024 11:00
Show Gist options
  • Star 7 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save NeighborGeek/84d578de2bc5538bd8d1 to your computer and use it in GitHub Desktop.
Save NeighborGeek/84d578de2bc5538bd8d1 to your computer and use it in GitHub Desktop.
Parse Google Voice data exported using Google Takeout to create useful logs of text message and phone calls
<#
.Synopsis
Parses html files produced by Google Takeout to present Google Voice call and text history in a useful way.
.DESCRIPTION
When exporting Google Voice data using the Google Takeout service, the data is delivered in the form
of many individual .html files, one for each call (placed, received, or missed), each text message
conversation, and each voicemail or recorded call. For heavy users of Google Voice, this could mean many
thousands of individual files, all located in a single directory. This script parses all of the html
files to collect details of each call or message and outputs them as an object which can then be
manipulated further within powershell or exported to a file.
This script requires the "HTML Agility Pack", available via NuGet. The script was written and tested
with HTML Agility Pack version 1.4.9 For more information, go to http://htmlagilitypack.codeplex.com/
In order to run this script, you must have at least powershell version 3.0. If you're running Windows 7,
You can update powershell by installing the latest "Windows Management Framework" from microsoft, currently
WMF 4.
About This Script
-----------------
Author: Steve Whitcher
Web Site: http://www.neighborgeek.net
Version: 1.1
Date: 11/16/2015
.EXAMPLE
import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45
This command parses files in c:\temp\takeout\voice\calls using the HtmlAgilityPack.dll file located
in 'C:\packages\HtmlAgilityPack.1.4.9\lib\net45'. Run this way, all of the text message and call history would be
output to the screen only.
.EXAMPLE
import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45\ |
where-object {$_.Type -eq "Text"} | export-csv c:\temp\TextMessages.csv
This command uses the same parameters as Example 1, but then passes the information on be filtered
by Where-Object to only include records of Text messages, and not calls. After filtering, the information
is saved to c:\temp\TextMessages.csv by passing the output of Where-Object to Export-CSV.
.EXAMPLE
import-gvhistory -path c:\temp\takeout\voice\calls | export-csv c:\temp\GVHistory.csv
This command does not include the -agilitypath parameter, so the script will attempt to find
and use HTMLAgilityPack.dll in the current working directory. The command will process all call and text
message information and save it to c:\temp\GVHistory.csv
#>
function import-gvhistory
{
[CmdletBinding()]
[Alias()]
[OutputType("Selected.System.Management.Automation.PSCustomObject")]
#Requires -Version 3.0
Param
(
# Path to the "Calls" directory containing Google Voice data exported from Google Takeout.
[Parameter(Mandatory=$true,
ValueFromPipelineByPropertyName=$true,
Position=0)]
$Path,
# Path to "HtmlAgilityPack.dll" if not located in the working directory.
$AgilityPath = "."
)
Begin
{
$option = [System.StringSplitOptions]::None
$separator = "-"
$Records = (get-childitem $Path) | Where-object {$_.Name -like "*.html"}
$Calls = @()
$Texts = @()
$GVHistory = @()
add-type -assemblyname system.web
add-type -path "$($AgilityPath)\HtmlAgilityPack.dll"
}
Process
{
ForEach ($Record in $Records)
{
Write-Verbose "Record $Record.Name" # File name being processed
# Split File Name into Contact Name, Call Type, and Timestamp
$RecordName = (($Record.Name).trimend(".html")).split($separator,3,$option)
Write-Verbose "RecordName $RecordName"
$Contact = $RecordName[0].trim()
$Type = $RecordName[1].trim()
$FileTime = $RecordName[2]
Write-Verbose "Name $Contact"
Write-Verbose "Type $Type"
Write-Verbose "TimeStamp $FileTime"
Write-Verbose ""
$doc = New-Object HtmlAgilityPack.HtmlDocument
$source = $doc.Load($Record.fullname)
if ($Type -ne "Text")
{
# Record is of a phone call that was placed, received, or missed, or of a voicemail message.
# v1.1 Changed to get time from time attribute instead of innertext due to a change in how google formats
# the text format of the time.
# $GMTTime = $doc.documentnode.selectnodes("//abbr [@class='published']").InnerText.Trim()
# $CallTime = get-date $GMTTime
$AttribTime = $doc.documentnode.selectnodes("//abbr [@class='published']").getattributevalue('title','')
$CallTime = get-date $attribtime
$Tel = $doc.documentnode.selectnodes(".//a [@class='tel']")
$ContactName = $tel.selectsinglenode(".//span[1]").InnerText.Trim()
$ContactNum = $tel.GetAttributeValue("href", "Number").TrimStart("tel:+")
If ($Type -ne "Missed" -and $Type -ne "Recorded")
{
# Missed Calls don't have a duration listed. Some recorded calls might also be zero length.
# Get duration for all other call types.
$Duration = $doc.documentnode.selectnodes(".//abbr[@class='duration']").InnerText.Trim("(",")")
}
Else
{
$Duration = ""
}
If ($Type -eq "Voicemail")
{
# Get the Automated Transcription of voicemail messages as well as the name of the mp3 audio file.
$FullText = $doc.documentnode.selectnodes("//span [@class='full-text']").InnerText
$Fulltext = [System.Web.HttpUtility]::HtmlDecode($FullText)
$Audio = $doc.documentnode.selectsinglenode("//audio")
If ($Audio)
{
# If there was no audio recorded, the mp3 file won't exist.
$AudioFilePath = $Audio.GetAttributeValue("src", "")
}
}
Else
{
# Calls of type other than "Voicemail" won't have audio or transcription, so blank the associated variables.
$FullText = ""
$Audio = ""
$AudioFilePath = ""
}
# Add the details of this call record to $Calls
$Calls += [PSCustomObject]@{
Contact = $ContactName
Time = $CallTime
Type = $Type
Number = $ContactNum
Duration = $Duration
Message = $FullText
AudioFile = $AudioFilePath
Direction = ""
}
}
else
{
# Record is of an SMS Conversation containing one or more messages
$Messages = $doc.documentnode.selectnodes("//div[@class='message']")
# Each HTML file represents a single SMS "Conversation". A conversation could include many messages.
# Process each individual message.
ForEach ($Msg in $messages) {
# v1.1 Changed to get time from time attribute instead of innertext due to a change in how google formats
# the text format of the time.
# $GMTTime = $msg.selectsinglenode(".//abbr[@class='dt']").InnerText.Trim()
# $MsgTime = get-date $GMTTime
$AttribTime = $msg.selectsinglenode(".//abbr[@class='dt']").getattributevalue('title','')
$MsgTime = get-date $AttribTime
$Tel = $msg.selectsinglenode(".//a [@class='tel']")
$SenderName = $tel.InnerText.Trim()
$SenderNum = $tel.GetAttributeValue("href", "Number").TrimStart("tel:+")
$Body = $msg.selectsinglenode(".//q").InnerText.Trim()
$Body = [System.Web.HttpUtility]::HtmlDecode($Body)
if ($SenderName -eq "Me")
{
$Direction = "Received"
}
else
{
$Direction = "Sent"
}
# Add the details of this message to $Texts
$Texts += [PSCustomObject]@{
Contact = $Contact
Time = $MsgTime
Type = $Type
Direction = $Direction
Message = $Body
}
}
}
}
}
End
{
# Combine all $Calls and $Texts, sort based on the timestamp.
$GVHistory = $Calls + $Texts
$GVHistory | Sort Time
}
}
@bouchacha
Copy link

bouchacha commented Aug 28, 2020

@NeighborGeek, I admit I have no idea what dot sourcing is and am a complete PowerShell noob, so thank you for your patience. I did in fact manage to get this to work! One of the issues I ran into is that I didn't know you had to enter the code you included above in separate lines.

That said, I get a TON of these two errors:

You cannot call a method on a null-valued expression.
At C:\temp\gvimport\import-gvhistory.ps1:105 char:17
+ ...             $AttribTime = $doc.documentnode.selectnodes("//abbr [@cla ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\temp\gvimport\import-gvhistory.ps1:116 char:21
+ ...             $Duration = $doc.documentnode.selectnodes(".//abbr[@class ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

I still get a very full GVHistory.csv, but it's not clear whether any records are missing.

Also, is there an elegant way to include MMS like photos and videos? I realize that's asking a lot. It could work if this was exported to HTML instead of CSV.

Thank you for your work on this.

@ParselTon
Copy link

@NeighborGeek I am getting the same errors as @bouchacha, except in my case the CSV file is blank.

This is what I am running on an admin prompt for PowerShell:

. 'C:\Users\user.name\Documents\GV History\Scripts\import-gvhistory.ps1'
$agilitypath = 'C:\Users\user.name\Documents\GV History\Scripts\htmlagilitypack.1.11.24\lib\Net45'
$path = 'C:\Users\user.name\Documents\GV History\Takeout\Voice\Calls'
import-gvhistory -Path $path -AgilityPath $agilitypath |
where-object {$_.Type -eq "Text"} | export-csv 'C:\Users\user.name\Documents\GV History\TextMessages.csv'

Can you let me know what I am doing wrong?

@ITechGeek81
Copy link

I've been working w/ this and it looks like the null valued errors are group conversations.

@Av1dLearner
Copy link

I tried getting this to run, in all cases I just get a blank GVHistory.csv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment