Instantly share code, notes, and snippets.

Embed
What would you like to do?
These queries let you define find user sessions against event data logged to Segment SQL, Snowplow, or Google BigQuery.
-- These queries let you define find user sessions against event data
-- logged to Segment SQL, Snowplow, or Google BigQuery.
-- For more details, see the full post:
-- LINK
--- SEGMENT SQL
-- Finding the start of every session
SELECT *
FROM (
SELECT *
LAG(sent_at,1) OVER (PARTITION BY user_id ORDER BY sent_at) AS last_event
FROM "your_project".tracks
) last
WHERE EXTRACT('EPOCH' FROM sent_at) - EXTRACT('EPOCH' FROM last_event) >= (60 * 10)
OR last_event IS NULL
-- Mapping every event to its session
SELECT *,
SUM(is_new_session) OVER (ORDER BY user_id, sent_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS global_session_id,
SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY sent_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS user_session_id
FROM (
SELECT *,
CASE WHEN EXTRACT('EPOCH' FROM sent_at) - EXTRACT('EPOCH' FROM last_event) >= (60 * 10)
OR last_event IS NULL
THEN 1 ELSE 0 END AS is_new_session
FROM (
SELECT *,
LAG(sent_at,1) OVER (PARTITION BY user_id ORDER BY sent_at) AS last_event
FROM "your_project".tracks
) last
) final
--- SNOWPLOW
-- Finding the start of every session
SELECT *
FROM (
SELECT *
LAG(collector_tstamp,1) OVER (PARTITION BY domain_userid ORDER BY collector_tstamp) AS last_event
FROM "your_project".events
) last
WHERE EXTRACT('EPOCH' FROM collector_tstamp) - EXTRACT('EPOCH' FROM last_event) >= (60 * 10)
OR last_event IS NULL
-- Mapping every event to its session
SELECT *,
SUM(is_new_session) OVER (ORDER BY domain_userid, collector_tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS global_session_id,
SUM(is_new_session) OVER (PARTITION BY domain_userid ORDER BY collector_tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS user_session_id
FROM (
SELECT *,
CASE WHEN EXTRACT('EPOCH' FROM collector_tstamp) - EXTRACT('EPOCH' FROM last_event) >= (60 * 10)
OR last_event IS NULL
THEN 1 ELSE 0 END AS is_new_session
FROM (
SELECT *,
LAG(collector_tstamp,1) OVER (PARTITION BY domain_userid ORDER BY collector_tstamp) AS last_event
FROM "your_project".events
) last
) final
--- GOOGLE BIGQUERY
-- Finding the start of every session
SELECT *
FROM (
SELECT *,
LAG(occurred_at,1) OVER (PARTITION BY fullVisitorId ORDER BY occurred_at) AS last_event
FROM (
SELECT fullVisitorId,
visitStartTime + hits.time/1000 AS occurred_at
FROM "your_tables"
) pre
) last
WHERE occurred_at - last_event >= (60 * 10)
OR last_event IS NULL
-- Mapping every event to its session
SELECT *,
SUM(is_new_session) OVER (ORDER BY fullVisitorId, occurred_at) AS global_session_id
FROM (
SELECT *,
CASE WHEN occurred_at - last_event >= (60 * 10) OR last_event IS NULL THEN 1 ELSE 0 END AS is_new_session
FROM (
SELECT *,
LAG(occurred_at,1) OVER (PARTITION BY fullVisitorId ORDER BY occurred_at) AS last_event
FROM (
SELECT fullVisitorId,
visitStartTime + hits.time/1000 AS occurred_at
FROM "your_tables"
) pre
) last
) final
ORDER BY 1,2
@fraserdeans

This comment has been minimized.

Copy link

fraserdeans commented Dec 23, 2016

Thanks so much for these. FYI there is a missing comma before the LAG on the first Segment SQL.

Should be...

SELECT *
  FROM (
       SELECT *,
              LAG(sent_at,1) OVER (PARTITION BY user_id ORDER BY sent_at) AS last_event
        FROM "your_project".tracks
      ) last
WHERE EXTRACT('EPOCH' FROM sent_at) - EXTRACT('EPOCH' FROM last_event) >= (60 * 10)
   OR last_event IS NULL
@alfurb

This comment has been minimized.

Copy link

alfurb commented Nov 22, 2017

In the Google BigQuery queries, what is the reason for adding hits.time/1000 in visitStartTime + hits.time/1000 AS occurred_at?

@guillaumewibaux

This comment has been minimized.

Copy link

guillaumewibaux commented Jan 8, 2018

A note that could help some:
It can happen than ordering of events using timestamps is nondeterministic a.k.a several timestamps are equal. (let's say in the same second) (More here: https://docs.aws.amazon.com/redshift/latest/dg/r_Examples_order_by_WF.html)

It can led to some fun stuff with the usage of window functions and 'sessions IDs' . To counter it, you could use the event id to make the sorting deterministic across the different windows.

@axrdb

This comment has been minimized.

Copy link

axrdb commented Feb 8, 2018

Is it possible to update the query to also assign new sessions based on the length of the session? we want to calculate sessions as continuous events within 60 min for example. If there is a window of > 60 min for that user then its a different session. Any idea how to either extend your query to accomplish the above?

@ankitagarwal

This comment has been minimized.

Copy link

ankitagarwal commented Apr 16, 2018

For segment wouldn't using "timestamp" field instead of "sent_at" make more sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment