Skip to content

Instantly share code, notes, and snippets.

@jexp
Forked from Btibert3/summary-stats.adoc
Last active February 6, 2016 17:38
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jexp/b034638cabda6de11835 to your computer and use it in GitHub Desktop.
Save jexp/b034638cabda6de11835 to your computer and use it in GitHub Desktop.
Cypher query help: Summary stats for email marketing

Email Marketing Analytics App

Problem Statement

I want to summarize the email engagement for a particular contact in my database. The summary stats that I am looking for are:

  • The count of emails that have been sent to that contact

  • The timestamp for the first email sent, as well as the most recent (last) email

  • If it exists, the timestamp for the first and last open across all emails

  • If it exists, the timestamp for the first and last click across all emails

  • A count of how many distinct emails were opened

  • A count of how many distinct emails were clicked

The last two bullets above are giving me some issues.

Toy Database

MERGE (c1:Contact {id:1})
MERGE (c2:Contact {id:2})
MERGE (c3:Contact {id:3})
MERGE (m1:Email {ts:1})
MERGE (m2:Email {ts:2})
MERGE (m3:Email {ts:3})
MERGE (m4:Email {ts:4})
MERGE (m5:Email {ts:5})
MERGE (m6:Email {ts:6})
MERGE (o1:Open {ts:11})
MERGE (o2:Open {ts:12})
MERGE (o3:Open {ts:13})
MERGE (o4:Open {ts:14})
MERGE (l1:Link {ts:20})
MERGE (l2:Link {ts:21})
MERGE (l3:Link {ts:22})
CREATE (c1)-[:SENT]->(m1)
CREATE (m1)-[:NEXT_EMAIL]->(m2)
CREATE (m2)-[:NEXT_EMAIL]->(m3)
CREATE (c2)-[:SENT]->(m4)
CREATE (m4)-[:NEXT_EMAIL]->(m5)
CREATE (m5)-[:NEXT_EMAIL]->(m6)
CREATE (m1)-[:WAS_OPENED]->(o1)
CREATE (m2)-[:WAS_OPENED]->(o2)
CREATE (m2)-[:WAS_OPENED]->(o3)
CREATE (m5)-[:WAS_OPENED]->(o4)
CREATE (m1)-[:WAS_CLICKED]->(l1)
CREATE (m2)-[:WAS_CLICKED]->(l2)
CREATE (m2)-[:WAS_CLICKED]->(l3);

Now that the database is setup, this query almost gets me to the end result.

E-Mail Statistics

MATCH (c:Contact {id:1})-[*]->(e:Email)
OPTIONAL MATCH pl=(e)-[]->(click:Link)
OPTIONAL MATCH po=(e)-[]->(open:Open)
WITH c,
     e,
     click,
     open
RETURN c.id as pidm,
       COUNT(DISTINCT e) as email_count,
       MIN(e.ts) as first_email,
       MAX(e.ts) as last_email,
       MIN(click.ts) as first_click,
       MAX(click.ts) as last_click,
       MIN(open.ts) as first_open,
       MAX(open.ts) as last_open,
       SUM(CASE WHEN EXISTS((e)-[]->(click:Link)) THEN 1 ELSE 0 END) as click_ind,
       SUM(CASE WHEN EXISTS((e)-[]->(open:Open)) THEN 1 ELSE 0 END) as open_ind;

Because an individual email can be opened and clicked on multiple times, I just want to identify if it was ever clicked, or ever opened. These stats allow me to get estimate email engagement; simply, the percentage of emails that were opened and clicked.

In the table above, given the toy dataset, open_ind and click_ind should each return the value of 2

Moving beyond this, I will modify this query to look at the last N emails, which clearly is Neo4j’s wheelhouse and why it’s a great tool for these sort of analytical projects.

Any help will be greatly appreciated.

Test Michael: E-Mail Statistics

  • Q: Why the varlength path to email?

MATCH (c:Contact {id:1})-[*]->(e:Email)
WITH distinct c,e
OPTIONAL MATCH (e)-[relClick]->(click:Link)
OPTIONAL MATCH (e)-[relOpen]->(open:Open)
RETURN c.id as pidm,
       COUNT(DISTINCT e) as email_count,
       MIN(e.ts) as first_email,
       MAX(e.ts) as last_email,
       MIN(click.ts) as first_click,
       MAX(click.ts) as last_click,
       MIN(open.ts) as first_open,
       MAX(open.ts) as last_open,
       COUNT(distinct (case relClick when null then null else e end)) as click_ind,
       COUNT(distinct (case relOpen when null then null else e end)) as open_ind;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment