@NeighborGeek
Last active April 6, 2023 15:36
Parse Google Voice data exported using Google Takeout to create useful logs of text message and phone calls
<#
.Synopsis
Parses html files produced by Google Takeout to present Google Voice call and text history in a useful way.
.DESCRIPTION
When exporting Google Voice data using the Google Takeout service, the data is delivered in the form
of many individual .html files, one for each call (placed, received, or missed), each text message
conversation, and each voicemail or recorded call. For heavy users of Google Voice, this could mean many
thousands of individual files, all located in a single directory. This script parses all of the html
files to collect details of each call or message and outputs them as an object which can then be
manipulated further within PowerShell or exported to a file.
This script requires the "HTML Agility Pack", available via NuGet. The script was written and tested
with HTML Agility Pack version 1.4.9. For more information, go to http://htmlagilitypack.codeplex.com/
In order to run this script, you must have at least PowerShell version 3.0. If you're running Windows 7,
you can update PowerShell by installing the latest "Windows Management Framework" from Microsoft, currently
WMF 4.
About This Script
-----------------
Author: Steve Whitcher
Web Site: http://www.neighborgeek.net
Version: 1.1
Date: 11/16/2015
.EXAMPLE
import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45
This command parses files in c:\temp\takeout\voice\calls using the HtmlAgilityPack.dll file located
in 'C:\packages\HtmlAgilityPack.1.4.9\lib\net45'. Run this way, all of the text message and call history would be
output to the screen only.
.EXAMPLE
import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45\ |
where-object {$_.Type -eq "Text"} | export-csv c:\temp\TextMessages.csv
This command uses the same parameters as Example 1, but then passes the information on to be filtered
by Where-Object to only include records of Text messages, and not calls. After filtering, the information
is saved to c:\temp\TextMessages.csv by passing the output of Where-Object to Export-CSV.
.EXAMPLE
import-gvhistory -path c:\temp\takeout\voice\calls | export-csv c:\temp\GVHistory.csv
This command does not include the -agilitypath parameter, so the script will attempt to find
and use HTMLAgilityPack.dll in the current working directory. The command will process all call and text
message information and save it to c:\temp\GVHistory.csv
#>
function import-gvhistory
{
    [CmdletBinding()]
    [Alias()]
    [OutputType("Selected.System.Management.Automation.PSCustomObject")]
    #Requires -Version 3.0
    Param
    (
        # Path to the "Calls" directory containing Google Voice data exported from Google Takeout.
        [Parameter(Mandatory=$true,
                   ValueFromPipelineByPropertyName=$true,
                   Position=0)]
        $Path,
        # Path to "HtmlAgilityPack.dll" if not located in the working directory.
        $AgilityPath = "."
    )
    Begin
    {
        $option = [System.StringSplitOptions]::None
        $separator = "-"
        $Records = (Get-ChildItem $Path) | Where-Object {$_.Name -like "*.html"}
        $Calls = @()
        $Texts = @()
        $GVHistory = @()
        Add-Type -AssemblyName System.Web
        Add-Type -Path "$($AgilityPath)\HtmlAgilityPack.dll"
    }
    Process
    {
        ForEach ($Record in $Records)
        {
            Write-Verbose "Record $($Record.Name)" # File name being processed
            # Split File Name into Contact Name, Call Type, and Timestamp.
            # BaseName drops the extension. TrimEnd(".html") treats its argument as a set of
            # characters and could also strip trailing '.', 'h', 't', 'm' or 'l' from the name.
            $RecordName = $Record.BaseName.Split($separator,3,$option)
            Write-Verbose "RecordName $RecordName"
            $Contact = $RecordName[0].Trim()
            $Type = $RecordName[1].Trim()
            $FileTime = $RecordName[2]
            Write-Verbose "Name $Contact"
            Write-Verbose "Type $Type"
            Write-Verbose "TimeStamp $FileTime"
            Write-Verbose ""
            $doc = New-Object HtmlAgilityPack.HtmlDocument
            $doc.Load($Record.FullName)
            if ($Type -ne "Text")
            {
                # Record is of a phone call that was placed, received, or missed, or of a voicemail message.
                # v1.1 Changed to get time from time attribute instead of innertext due to a change in how google formats
                # the text format of the time.
                # $GMTTime = $doc.documentnode.selectnodes("//abbr [@class='published']").InnerText.Trim()
                # $CallTime = get-date $GMTTime
                $Published = $doc.DocumentNode.SelectSingleNode("//abbr [@class='published']")
                if (-not $Published)
                {
                    # Some files (reportedly group conversations) lack this element.
                    # Skip them with a warning instead of calling a method on $null.
                    Write-Warning "Skipping $($Record.Name): no 'published' timestamp found."
                    continue
                }
                $AttribTime = $Published.GetAttributeValue('title','')
                $CallTime = Get-Date $AttribTime
                $Tel = $doc.DocumentNode.SelectNodes(".//a [@class='tel']")
                $ContactName = $Tel.SelectSingleNode(".//span[1]").InnerText.Trim()
                # Strip the "tel:" prefix and optional "+" with a regex; TrimStart("tel:+") trims a character set.
                $ContactNum = $Tel.GetAttributeValue("href", "Number") -replace '^tel:\+?', ''
                $Duration = ""
                If ($Type -ne "Missed" -and $Type -ne "Recorded")
                {
                    # Missed Calls don't have a duration listed. Some recorded calls might also be zero length.
                    # Get duration for all other call types, if the element is present.
                    $DurationNode = $doc.DocumentNode.SelectSingleNode(".//abbr[@class='duration']")
                    If ($DurationNode)
                    {
                        $Duration = $DurationNode.InnerText.Trim("(",")")
                    }
                }
                If ($Type -eq "Voicemail")
                {
                    # Get the Automated Transcription of voicemail messages as well as the name of the mp3 audio file.
                    $FullText = $doc.DocumentNode.SelectNodes("//span [@class='full-text']").InnerText
                    $FullText = [System.Web.HttpUtility]::HtmlDecode($FullText)
                    $Audio = $doc.DocumentNode.SelectSingleNode("//audio")
                    # Reset first so a value from a previous record is never reused.
                    $AudioFilePath = ""
                    If ($Audio)
                    {
                        # If there was no audio recorded, the mp3 file won't exist.
                        $AudioFilePath = $Audio.GetAttributeValue("src", "")
                    }
                }
                Else
                {
                    # Calls of type other than "Voicemail" won't have audio or transcription, so blank the associated variables.
                    $FullText = ""
                    $Audio = ""
                    $AudioFilePath = ""
                }
                # Add the details of this call record to $Calls
                $Calls += [PSCustomObject]@{
                    Contact = $ContactName
                    Time = $CallTime
                    Type = $Type
                    Number = $ContactNum
                    Duration = $Duration
                    Message = $FullText
                    AudioFile = $AudioFilePath
                    Direction = ""
                }
            }
            else
            {
                # Record is of an SMS Conversation containing one or more messages
                $Messages = $doc.DocumentNode.SelectNodes("//div[@class='message']")
                # Each HTML file represents a single SMS "Conversation". A conversation could include many messages.
                # Process each individual message.
                ForEach ($Msg in $Messages)
                {
                    # v1.1 Changed to get time from time attribute instead of innertext due to a change in how google formats
                    # the text format of the time.
                    # $GMTTime = $msg.selectsinglenode(".//abbr[@class='dt']").InnerText.Trim()
                    # $MsgTime = get-date $GMTTime
                    $AttribTime = $Msg.SelectSingleNode(".//abbr[@class='dt']").GetAttributeValue('title','')
                    $MsgTime = Get-Date $AttribTime
                    $Tel = $Msg.SelectSingleNode(".//a [@class='tel']")
                    $SenderName = $Tel.InnerText.Trim()
                    $SenderNum = $Tel.GetAttributeValue("href", "Number") -replace '^tel:\+?', ''
                    $Body = $Msg.SelectSingleNode(".//q").InnerText.Trim()
                    $Body = [System.Web.HttpUtility]::HtmlDecode($Body)
                    # A sender of "Me" means the account owner wrote the message, so it was sent, not received.
                    if ($SenderName -eq "Me")
                    {
                        $Direction = "Sent"
                    }
                    else
                    {
                        $Direction = "Received"
                    }
                    # Add the details of this message to $Texts
                    $Texts += [PSCustomObject]@{
                        Contact = $Contact
                        Time = $MsgTime
                        Type = $Type
                        Direction = $Direction
                        Message = $Body
                    }
                }
            }
        }
    }
    End
    {
        # Combine all $Calls and $Texts, sort based on the timestamp.
        $GVHistory = $Calls + $Texts
        $GVHistory | Sort-Object Time
    }
}
@pursehouse

hey does this still work for you? I tried it and it just runs with no output at all. I am using the same library HtmlAgilityPack.1.4.9 version you mention, and setting it to my calls path. any ideas? thanks!

@bouchacha

Every CSV this script creates comes out blank. I've tried different -path parameters (basically tried every nested folder within the Takeout archive) and also tried different HTML Agility Pack versions to no avail. CSV gets generated, so we know the script "ran", but all are blank despite a full /Calls/ folder. Any ideas on how to troubleshoot?

@NeighborGeek
Author

NeighborGeek commented Aug 19, 2020 via email

@pursehouse - It looks like I missed your post back in December, sorry about that. It sounds like you may be having the same issue as @bouchacha, so if you're still needing this feel free to join in and we'll try to figure it out together.

I downloaded my Google Voice data and tested the script against a portion of the data, and my output file contains the expected data. I used the current version of the HTML Agility Pack (1.11.24), so I don't think that's your issue.

What is the command line that you're using?

If you run the import-gvhistory script without piping the output to CSV, what do you get for output?

@bouchacha

@NeighborGeek
The command line I used was:
C:\PARSER\import-gvhistory -path C:\PARSER\Takeout\Voice\Calls\ | export-csv C:\PARSER\GVHistory.csv
PowerShell is running as admin, the Takeout folder is definitely not empty, and the 1.11.24 version of HtmlAgilityPack.dll is in the same folder. When I run that command, 'GVHistory.csv' gets created, but it's blank. If I run only the first half of the command (without the CSV output), literally nothing happens. I press enter in PowerShell, and all I see is a blank command prompt. There's no message or acknowledgement.

Would you be able to share some of your Takeout folder so that I can test the issue? Obviously feel free to anonymize it.

@NeighborGeek
Author

NeighborGeek commented Aug 27, 2020

@bouchacha -
One thought that could fit what you're describing, though based on your command line I'm not sure: since this is written as a function within the ps1 file, you have to dot source the script to load the function into your PowerShell session before you can run the function. If you just run import-gvhistory.ps1, it returns nothing. To dot source it, type a dot and a space followed by the full path to the script, like so:
. c:\parser\import-gvhistory.ps1

In my case, I put the script in c:\temp\gvimport\ and copied the CALLS directory there as well. I don't have the dll in the same path, so here's what I'm running:

. c:\temp\gvimport\import-gvhistory.ps1
$agilitypath = 'c:\Program Files\PackageManagement\NuGet\Packages\HtmlAgilityPack.1.11.24\lib\Net45'
$path = 'c:\temp\gvimport\calls'
import-gvhistory -Path $path -AgilityPath $agilitypath
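
For anyone else who hits the "no output" problem, here is a minimal sketch of what dot sourcing changes. It does not touch the real script: the temporary file name and the Get-Demo function are made up for illustration only.

```powershell
# Hypothetical demo script; the file name and function name are assumptions for illustration.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'dot-source-demo.ps1'
Set-Content -Path $demo -Value 'function Get-Demo { "loaded" }'

# Running the script executes it in its own scope: the function is defined, then discarded.
& $demo
$afterRun = [bool](Get-Command Get-Demo -ErrorAction SilentlyContinue)

# Dot sourcing runs the script in the current scope, so the function stays available.
. $demo
$afterDotSource = [bool](Get-Command Get-Demo -ErrorAction SilentlyContinue)

"Function available after running the script: $afterRun"
"Function available after dot sourcing:       $afterDotSource"

Remove-Item $demo
```

The same check (Get-Command import-gvhistory) is a quick way to confirm the function is loaded before calling it.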

@bouchacha

bouchacha commented Aug 28, 2020

@NeighborGeek, I admit I have no idea what dot sourcing is and am a complete PowerShell noob, so thank you for your patience. I did in fact manage to get this to work! One of the issues I ran into is that I didn't know you had to enter the code you included above in separate lines.

That said, I get a TON of these two errors:

You cannot call a method on a null-valued expression.
At C:\temp\gvimport\import-gvhistory.ps1:105 char:17
+ ...             $AttribTime = $doc.documentnode.selectnodes("//abbr [@cla ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\temp\gvimport\import-gvhistory.ps1:116 char:21
+ ...             $Duration = $doc.documentnode.selectnodes(".//abbr[@class ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

I still get a very full GVHistory.csv, but it's not clear whether any records are missing.

Also, is there an elegant way to include MMS like photos and videos? I realize that's asking a lot. It could work if this was exported to HTML instead of CSV.

Thank you for your work on this.

@ParselTon

@NeighborGeek I am getting the same errors as @bouchacha, except in my case the CSV file is blank.

This is what I am running on an admin prompt for PowerShell:

. 'C:\Users\user.name\Documents\GV History\Scripts\import-gvhistory.ps1'
$agilitypath = 'C:\Users\user.name\Documents\GV History\Scripts\htmlagilitypack.1.11.24\lib\Net45'
$path = 'C:\Users\user.name\Documents\GV History\Takeout\Voice\Calls'
import-gvhistory -Path $path -AgilityPath $agilitypath |
where-object {$_.Type -eq "Text"} | export-csv 'C:\Users\user.name\Documents\GV History\TextMessages.csv'

Can you let me know what I am doing wrong?

@ITechGeek81

I've been working w/ this and it looks like the null valued errors are group conversations.
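
One way to check this without running the full import is to apply the script's file name split to the names alone and list the ones whose middle part is not a known record type. This is only a sketch: the sample file names are made up, and the list of type names is an assumption pieced together from the script's description and its checks, so a real export may use others.

```powershell
# Made-up file names for illustration; real Takeout exports may differ.
$names = @(
    'Jane Doe - Text - 2020-08-01T12_00_00Z.html',
    'Jane Doe - Missed - 2020-08-02T09_30_00Z.html',
    'Group Conversation - 2020-08-03T18_45_00Z.html'
)

# Assumed set of record types; adjust to match what your export contains.
$knownTypes = 'Text', 'Placed', 'Received', 'Missed', 'Voicemail', 'Recorded'

$report = foreach ($name in $names) {
    # Same split the script uses: at most three parts, separated by "-".
    $base  = [System.IO.Path]::GetFileNameWithoutExtension($name)
    $parts = $base.Split([string[]]@('-'), 3, [System.StringSplitOptions]::None)
    $type  = if ($parts.Count -ge 2) { $parts[1].Trim() } else { '' }
    [PSCustomObject]@{
        File       = $name
        ParsedType = $type
        Recognized = $knownTypes -contains $type
    }
}

# Files listed here would not be handled as "Text" and would fall into the call branch.
$report | Where-Object { -not $_.Recognized }
```

To run it against a real export, replace $names with (Get-ChildItem $path -Filter *.html).Name.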

@Av1dLearner

I tried getting this to run, but in all cases I just get a blank GVHistory.csv.
