Skip to content

Instantly share code, notes, and snippets.

@jackross
Created December 6, 2012 15:42
Show Gist options
  • Save jackross/4225458 to your computer and use it in GitHub Desktop.
Save jackross/4225458 to your computer and use it in GitHub Desktop.
Email/Account Validity Stats

CheetahMail sends USGA Email Event Exports on a daily basis. Email Events are emailing-related events, sometimes called 'Click-Stream' or 'Conversions'. The default data format provided is a mixed-record format (the format of the data is different depending on the type of event) with all email events (sends, opens, transactions, etc) in one file and the meta-data about each mailing (issue) in a second file.

Each record in the Event file represents a single event in the click-stream. The initial event in the click-stream is always the 'Send' event. Each subsequent event in the click-stream is tied back to the send event. The send event record is the only event record that contains the USGA Account Id for the sent-to email address. Since email addresses are not unique across USGA Members/Accounts, it is ideal to have both the Account Id and the account's email address in order to tie the email events back to the Member/Account/Email Address.

The goal of this quick analysis is to determine the quality of the send event record. The Account Id/Email Address combination in the send event record should ideally match back to an Account Id/Email Address combination in the DW, since the email campaigns were generated from the same source as the DW. If the data provided in the send event record does not match back to the DW, then we must decide when and how the data should be integrated.

The first question is: What is the size and scope of the data set we're dealing with:

# Email Addresses# SentMin Sent OnMax Sent On
1,447,71055,139,910Mar 03, 2010Dec 04, 2012

Of these, how many were Member Club events:

# Email Addresses# Sent
12,882236,670

Leaving us with Non-Member-Club Account Send Events

# Email Addresses# Sent
1,447,63754,903,240

However, not all of the Account Id's match to known Member Account Id's in the DW. If we remove events linked to these Account Id's, we will remove:

# Email Addresses# Sent
499,0861,739,673

Which leaves us with Valid Member Account Send Events:

# Email Addresses# Sent
1,404,84153,163,567

However, not all of these Email Addresses exist in the DW. If we remove events linked to these Email Addresses, we will remove:

# Email Addresses# Sent
237,9385,472,344

Leaving us with Send Events for Valid Member Accounts with Known Email Addresses:

# Email Addresses# Sent
1,166,90347,691,223

Next, let's remove any send events where the Email Address in the send event is not a known Email Address for the Account Id in the send event (it is an Email Address for a different Member Account):

# Email Addresses# Sent
20,277175,681

Leaving us with Send Events for Valid Member Accounts with Matching Email Addresses:

# Email Addresses# Sent
1,163,84247,515,5420

This means that roughly 86.5% of the Non-Member-Club Account Send Events (47,515,542 / 54,903,240) match to the DW.

Why are there Account Id's that don't match to the DW?

I can't come up with a good explanation for this...

Why are there Email Addresses that don't match to the DW?

Possibly, these Email Addresses used to exist in the DW, but they have been updated to different email addresses, rather than logically deleting the old addresses (setting their status to 'D') and adding new ones. If this is the case, this could be a long term problem re: incorporating the email event data, and we will have to decide whether this is a best practice that should be continued.

{
"name": "Non-Member-Club Accounts",
"email_addresses": 1447637,
"sent": 54903240,
"children": [
{
"name": "Known Member Accounts",
"email_addresses": 1404841,
"sent": 53163567,
"children": [
{
"name": "With Email Address In DW",
"email_addresses": 1166903,
"sent": 47691223,
"children": [
{
"name": "Belonging To The Account",
"email_addresses": 1163842,
"sent": 47515542
},
{
"name": "Belonging To Another Account",
"email_addresses": 20277,
"sent": 175681
}
]
},
{
"name": "With Email Address Not In DW",
"email_addresses": 237938,
"sent": 5472344
}
]
},
{
"name": "Unknown Accounts",
"email_addresses": 499086,
"sent": 1739673,
"children": [
{
"name": "With Email Address In DW",
"email_addresses": 434874,
"sent": 1479705
},
{
"name": "With Email Address Not In DW",
"email_addresses": 64212,
"sent": 259968
}
]
}
]
}
<!DOCTYPE html>
<meta charset="utf-8">
<style>
text {
font-family: "Helvetica Neue", Helvetica, sans-serif;
}
.name {
font-weight: bold;
font-size: small;
}
.about {
fill: #777;
font-size: smaller;
}
.link {
fill: none;
stroke: #000;
shape-rendering: crispEdges;
}
</style>
<body>
<script src="http://d3js.org/d3.v2.min.js?2.9.4"></script>
<script>
var integerFormat = d3.format(",d");
var margin = {top: 0, right: 320, bottom: 0, left: 0},
width = 960 - margin.left - margin.right,
height = 500 - margin.top - margin.bottom;
var tree = d3.layout.tree()
.separation(function(a, b) { return a.parent === b.parent ? 1 : .875; })
.children(function(d) { return d.children; })
.size([height, width]);
var svg = d3.select("body").append("svg")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append("g")
.attr("transform", "translate(" + margin.left + "," + margin.top + ")");
d3.json("data.json", function(json) {
var nodes = tree.nodes(json);
var link = svg.selectAll(".link")
.data(tree.links(nodes))
.enter().append("path")
.attr("class", "link")
.attr("d", elbow);
var node = svg.selectAll(".node")
.data(nodes)
.enter().append("g")
.attr("class", "node")
.attr("transform", function(d) { return "translate(" + d.y + "," + d.x + ")"; })
node.append("text")
.attr("class", "name")
.attr("x", 8)
.attr("y", -6)
.text(function(d) { return d.name; });
node.append("text")
.attr("x", 8)
.attr("y", 8)
.attr("dy", ".71em")
.attr("class", "about lifespan")
.text(function(d) { return "# Email Addresses: " + integerFormat(d.email_addresses); });
node.append("text")
.attr("x", 8)
.attr("y", 8)
.attr("dy", "1.86em")
.attr("class", "about lifespan")
.text(function(d) { return "# Sent: " + integerFormat(d.sent); });
});
function elbow(d, i) {
return "M" + d.source.y + "," + d.source.x
+ "H" + d.target.y + "V" + d.target.x
+ (d.target.children ? "" : "h" + margin.right);
}
</script>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment