<#
.Synopsis
   Parses HTML files produced by Google Takeout to present Google Voice call and text history in a useful way.
.DESCRIPTION
   When exporting Google Voice data using the Google Takeout service, the data is delivered in the form
   of many individual .html files, one for each call (placed, received, or missed), each text message
   conversation, and each voicemail or recorded call. For heavy users of Google Voice, this could mean many
   thousands of individual files, all located in a single directory. This script parses all of the HTML
   files to collect details of each call or message and outputs them as objects which can then be
   manipulated further within PowerShell or exported to a file.
   This script requires the "HTML Agility Pack", available via NuGet. The script was written and tested
   with HTML Agility Pack version 1.4.9. For more information, go to http://htmlagilitypack.codeplex.com/
   In order to run this script, you must have at least PowerShell version 3.0. If you're running Windows 7,
   you can update PowerShell by installing the latest "Windows Management Framework" from Microsoft, currently
   WMF 4.
   About This Script
   -----------------
   Author: Steve Whitcher
   Web Site: http://www.neighborgeek.net
   Version: 1.1
   Date: 11/16/2015
.EXAMPLE
   import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45
   This command parses files in c:\temp\takeout\voice\calls using the HtmlAgilityPack.dll file located
   in 'C:\packages\HtmlAgilityPack.1.4.9\lib\net45'. Run this way, all of the text message and call history
   is output to the screen only.
.EXAMPLE
   import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45\ |
   where-object {$_.Type -eq "Text"} | export-csv c:\temp\TextMessages.csv
   This command uses the same parameters as Example 1, but then passes the output on to be filtered
   by Where-Object so that it includes only records of text messages, not calls. After filtering, the
   information is saved to c:\temp\TextMessages.csv by passing the output of Where-Object to Export-CSV.
.EXAMPLE
   import-gvhistory -path c:\temp\takeout\voice\calls | export-csv c:\temp\GVHistory.csv
   This command does not include the -agilitypath parameter, so the script will attempt to find
   and use HtmlAgilityPack.dll in the current working directory. The command will process all call and text
   message information and save it to c:\temp\GVHistory.csv
#>
function import-gvhistory
{
    [CmdletBinding()]
    [Alias()]
    [OutputType("Selected.System.Management.Automation.PSCustomObject")]
    #Requires -Version 3.0
    Param
    (
        # Path to the "Calls" directory containing Google Voice data exported from Google Takeout.
        [Parameter(Mandatory=$true,
                   ValueFromPipelineByPropertyName=$true,
                   Position=0)]
        $Path,
        # Path to "HtmlAgilityPack.dll" if not located in the working directory.
        $AgilityPath = "."
    )
    Begin
    {
        $option = [System.StringSplitOptions]::None
        $separator = "-"
        $Records = (get-childitem $Path) | Where-object {$_.Name -like "*.html"}
        $Calls = @()
        $Texts = @()
        $GVHistory = @()
        add-type -assemblyname system.web
        add-type -path "$($AgilityPath)\HtmlAgilityPack.dll"
    }
    Process
    {
        ForEach ($Record in $Records)
        {
            Write-Verbose "Record $($Record.Name)" # File name being processed
            # Split file name into contact name, call type, and timestamp.
            # BaseName drops the ".html" extension as a whole; TrimEnd(".html") would instead strip
            # any trailing run of the characters '.', 'h', 't', 'm', 'l'.
            $RecordName = $Record.BaseName.Split($separator,3,$option)
            Write-Verbose "RecordName $RecordName"
            $Contact = $RecordName[0].trim()
            $Type = $RecordName[1].trim()
            $FileTime = $RecordName[2]
            Write-Verbose "Name $Contact"
            Write-Verbose "Type $Type"
            Write-Verbose "TimeStamp $FileTime"
            Write-Verbose ""
            $doc = New-Object HtmlAgilityPack.HtmlDocument
            # Load() returns nothing, so there is no result to capture.
            $doc.Load($Record.FullName)
            if ($Type -ne "Text")
            {
                # Record is of a phone call that was placed, received, or missed, or of a voicemail message.
                # v1.1 Changed to get time from the title attribute instead of InnerText due to a change in how
                #      Google formats the text form of the time.
                # $GMTTime = $doc.documentnode.selectnodes("//abbr [@class='published']").InnerText.Trim()
                # $CallTime = get-date $GMTTime
                $AttribTime = $doc.documentnode.selectnodes("//abbr [@class='published']").getattributevalue('title','')
                $CallTime = get-date $AttribTime
                $Tel = $doc.documentnode.selectnodes(".//a [@class='tel']")
                $ContactName = $Tel.selectsinglenode(".//span[1]").InnerText.Trim()
                $ContactNum = $Tel.GetAttributeValue("href", "Number").TrimStart("tel:+")
                If ($Type -ne "Missed" -and $Type -ne "Recorded")
                {
                    # Missed calls don't have a duration listed. Some recorded calls might also be zero length.
                    # Get duration for all other call types.
                    $Duration = $doc.documentnode.selectnodes(".//abbr[@class='duration']").InnerText.Trim("(",")")
                }
                Else
                {
                    $Duration = ""
                }
                If ($Type -eq "Voicemail")
                {
                    # Get the automated transcription of voicemail messages as well as the name of the mp3 audio file.
                    $FullText = $doc.documentnode.selectnodes("//span [@class='full-text']").InnerText
                    $FullText = [System.Web.HttpUtility]::HtmlDecode($FullText)
                    $Audio = $doc.documentnode.selectsinglenode("//audio")
                    If ($Audio)
                    {
                        $AudioFilePath = $Audio.GetAttributeValue("src", "")
                    }
                    Else
                    {
                        # If there was no audio recorded, the mp3 file won't exist. Reset the path so a
                        # value from an earlier record doesn't carry over.
                        $AudioFilePath = ""
                    }
                }
                Else
                {
                    # Calls of type other than "Voicemail" won't have audio or transcription, so blank the associated variables.
                    $FullText = ""
                    $Audio = ""
                    $AudioFilePath = ""
                }
                # Add the details of this call record to $Calls
                $Calls += [PSCustomObject]@{
                    Contact = $ContactName
                    Time = $CallTime
                    Type = $Type
                    Number = $ContactNum
                    Duration = $Duration
                    Message = $FullText
                    AudioFile = $AudioFilePath
                    Direction = ""
                }
            }
            else
            {
                # Record is of an SMS conversation containing one or more messages.
                $Messages = $doc.documentnode.selectnodes("//div[@class='message']")
                # Each HTML file represents a single SMS "conversation". A conversation could include many messages.
                # Process each individual message.
                ForEach ($Msg in $Messages) {
                    # v1.1 Changed to get time from the title attribute instead of InnerText due to a change in how
                    #      Google formats the text form of the time.
                    # $GMTTime = $msg.selectsinglenode(".//abbr[@class='dt']").InnerText.Trim()
                    # $MsgTime = get-date $GMTTime
                    $AttribTime = $Msg.selectsinglenode(".//abbr[@class='dt']").getattributevalue('title','')
                    $MsgTime = get-date $AttribTime
                    $Tel = $Msg.selectsinglenode(".//a [@class='tel']")
                    $SenderName = $Tel.InnerText.Trim()
                    $SenderNum = $Tel.GetAttributeValue("href", "Number").TrimStart("tel:+")
                    $Body = $Msg.selectsinglenode(".//q").InnerText.Trim()
                    $Body = [System.Web.HttpUtility]::HtmlDecode($Body)
                    # A sender of "Me" means the account owner wrote the message, so it was sent;
                    # any other sender means it was received.
                    if ($SenderName -eq "Me")
                    {
                        $Direction = "Sent"
                    }
                    else
                    {
                        $Direction = "Received"
                    }
                    # Add the details of this message to $Texts
                    $Texts += [PSCustomObject]@{
                        Contact = $Contact
                        Time = $MsgTime
                        Type = $Type
                        Direction = $Direction
                        Message = $Body
                    }
                }
            }
        }
    }
    End
    {
        # Combine all $Calls and $Texts, sort based on the timestamp.
        $GVHistory = $Calls + $Texts
        $GVHistory | Sort-Object Time
    }
}
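
The file-name parsing step in the script can be tried on its own. The sketch below assumes Takeout names each file `Contact - Type - Timestamp.html`, which is what the script's three-way split implies. The `Split-GVFileName` helper and the sample file name are hypothetical and not part of the original script.

```powershell
# Hypothetical helper, not part of the original script: split a Takeout
# file name of the assumed form "Contact - Type - Timestamp.html".
function Split-GVFileName {
    param([Parameter(Mandatory = $true)][string]$FileName)

    # Remove the extension as a whole rather than trimming characters.
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)

    # Split on the first two " - " separators only, so the hyphens inside
    # the timestamp stay intact.
    $parts = $base -split ' - ', 3

    [PSCustomObject]@{
        Contact   = $parts[0].Trim()
        Type      = $parts[1].Trim()
        Timestamp = $parts[2]
    }
}

# Made-up sample name, for illustration only.
$sample = Split-GVFileName -FileName 'Jane Doe - Text - 2015-11-16T20_15_30Z.html'
$sample
```

A file name with fewer than three parts would leave `Type` or `Timestamp` empty, so a real run may want to check the part count before using them.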
@NeighborGeek I am getting the same errors as @bouchacha, except in my case the CSV file is blank.
This is what I am running on an admin prompt for PowerShell:
. 'C:\Users\user.name\Documents\GV History\Scripts\import-gvhistory.ps1'
$agilitypath = 'C:\Users\user.name\Documents\GV History\Scripts\htmlagilitypack.1.11.24\lib\Net45'
$path = 'C:\Users\user.name\Documents\GV History\Takeout\Voice\Calls'
import-gvhistory -Path $path -AgilityPath $agilitypath |
where-object {$_.Type -eq "Text"} | export-csv 'C:\Users\user.name\Documents\GV History\TextMessages.csv'
Can you let me know what I am doing wrong?
I've been working with this and it looks like the null-valued errors come from group conversations.
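
If some files lack a node the script expects, one way to avoid the null-valued errors is to check each lookup before calling methods on it. This is only a sketch: `Get-NodeText` is a hypothetical helper, and the objects below are stand-ins for HtmlAgilityPack nodes so the example runs without the DLL. Whether group conversations are really the cause is not confirmed here.

```powershell
# Hypothetical helper, not part of the original script: return the trimmed
# InnerText of a node, or a default when the lookup returned nothing.
function Get-NodeText {
    param(
        $Node,
        [string]$Default = ''
    )
    if ($null -eq $Node) { return $Default }
    return ([string]$Node.InnerText).Trim()
}

# Stand-ins for the result of SelectSingleNode(): one hit, one miss.
$found   = [PSCustomObject]@{ InnerText = '  Jane Doe  ' }
$missing = $null

$name1 = Get-NodeText -Node $found
$name2 = Get-NodeText -Node $missing -Default '(unknown)'
$name1
$name2
```

With a guard like this, a record that is missing a field still produces a row instead of an error, at the cost of hiding which files were incomplete.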
I tried getting this to run, but in all cases I just get a blank GVHistory.csv
@NeighborGeek, I admit I have no idea what dot sourcing is and am a complete PowerShell noob, so thank you for your patience. I did in fact manage to get this to work! One of the issues I ran into is that I didn't know you had to enter the code you included above on separate lines.
That said, I get a TON of these two errors:
I still get a very full GVHistory.csv, but it's not clear whether any records are missing.
Also, is there an elegant way to include MMS like photos and videos? I realize that's asking a lot. It could work if this was exported to HTML instead of CSV.
Thank you for your work on this.
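
On the question of exporting to HTML instead of CSV: the objects the function returns can be piped to the built-in `ConvertTo-Html` cmdlet in place of `Export-Csv`. The sketch below uses made-up records shaped like the script's text-message output, and the output file name is an assumption. It does not handle MMS photos or videos, which would need parsing that the script does not do.

```powershell
# Made-up records shaped like the script's text-message output.
$history = @(
    [PSCustomObject]@{ Contact = 'Jane Doe'; Time = [datetime]'2015-11-16T20:20:00'; Type = 'Text'; Direction = 'Received'; Message = 'See you soon' }
    [PSCustomObject]@{ Contact = 'Jane Doe'; Time = [datetime]'2015-11-16T20:15:30'; Type = 'Text'; Direction = 'Sent'; Message = 'On my way' }
)

# Build a simple HTML table, oldest message first.
$html = $history |
    Sort-Object Time |
    ConvertTo-Html -Property Contact, Time, Type, Direction, Message -Title 'Google Voice history'

# Write the page to the temp directory (assumed location, adjust as needed).
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'GVHistory.html'
$html | Set-Content -Path $outFile -Encoding UTF8
```

With the real script, `$history` would be replaced by the output of `import-gvhistory`.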