I want to summarize the email engagement for a particular contact in my database. The summary stats that I am looking for are:
-
The count of emails that have been sent to that contact
-
The timestamp for the first email sent, as well as the most recent (last) email
-
If it exists, the timestamp for the first and last open across all emails
-
If it exists, the timestamp for the first and last click across all emails
-
A count of how many distinct emails were opened
-
A count of how many distinct emails were clicked
The last two bullets above are giving me some issues.
MERGE (c1:Contact {id:1})
MERGE (c2:Contact {id:2})
MERGE (c3:Contact {id:3})
MERGE (m1:Email {ts:1})
MERGE (m2:Email {ts:2})
MERGE (m3:Email {ts:3})
MERGE (m4:Email {ts:4})
MERGE (m5:Email {ts:5})
MERGE (m6:Email {ts:6})
MERGE (o1:Open {ts:11})
MERGE (o2:Open {ts:12})
MERGE (o3:Open {ts:13})
MERGE (o4:Open {ts:14})
MERGE (l1:Link {ts:20})
MERGE (l2:Link {ts:21})
MERGE (l3:Link {ts:22})
CREATE (c1)-[:SENT]->(m1)
CREATE (m1)-[:NEXT_EMAIL]->(m2)
CREATE (m2)-[:NEXT_EMAIL]->(m3)
CREATE (c2)-[:SENT]->(m4)
CREATE (m4)-[:NEXT_EMAIL]->(m5)
CREATE (m5)-[:NEXT_EMAIL]->(m6)
CREATE (m1)-[:WAS_OPENED]->(o1)
CREATE (m2)-[:WAS_OPENED]->(o2)
CREATE (m2)-[:WAS_OPENED]->(o3)
CREATE (m5)-[:WAS_OPENED]->(o4)
CREATE (m1)-[:WAS_CLICKED]->(l1)
CREATE (m2)-[:WAS_CLICKED]->(l2)
CREATE (m2)-[:WAS_CLICKED]->(l3);
Now that the database is setup, this query almost gets me to the end result.
MATCH (c:Contact {id:1})-[*]->(e:Email)
OPTIONAL MATCH pl=(e)-[]->(click:Link)
OPTIONAL MATCH po=(e)-[]->(open:Open)
WITH c,
e,
click,
open
RETURN c.id as pidm,
COUNT(DISTINCT e) as email_count,
MIN(e.ts) as first_email,
MAX(e.ts) as last_email,
MIN(click.ts) as first_click,
MAX(click.ts) as last_click,
MIN(open.ts) as first_open,
MAX(open.ts) as last_open,
SUM(CASE WHEN EXISTS((e)-[]->(click:Link)) THEN 1 ELSE 0 END) as click_ind,
SUM(CASE WHEN EXISTS((e)-[]->(open:Open)) THEN 1 ELSE 0 END) as open_ind;
Because an individual email can be opened and clicked on multiple times, I just want to identify if it was ever clicked, or ever opened. These stats allow me to get estimate email engagement; simply, the percentage of emails that were opened and clicked.
In the table above, given the toy dataset, open_ind
and click_ind
should each return the value of 2
Moving beyond this, I will modify this query to look at the last N
emails, which clearly is Neo4j’s wheelhouse and why it’s a great tool for these sort of analytical projects.
Any help will be greatly appreciated.
-
Q: Why the varlength path to email?
MATCH (c:Contact {id:1})-[*]->(e:Email)
WITH distinct c,e
OPTIONAL MATCH (e)-[relClick]->(click:Link)
OPTIONAL MATCH (e)-[relOpen]->(open:Open)
RETURN c.id as pidm,
COUNT(DISTINCT e) as email_count,
MIN(e.ts) as first_email,
MAX(e.ts) as last_email,
MIN(click.ts) as first_click,
MAX(click.ts) as last_click,
MIN(open.ts) as first_open,
MAX(open.ts) as last_open,
COUNT(distinct (case relClick when null then null else e end)) as click_ind,
COUNT(distinct (case relOpen when null then null else e end)) as open_ind;