Skip to content

Instantly share code, notes, and snippets.

@oluies
Last active October 7, 2016 16:59
Show Gist options
  • Save oluies/ea9a0cf559ac03d0383e78ecfdeb02d8 to your computer and use it in GitHub Desktop.
Save oluies/ea9a0cf559ac03d0383e78ecfdeb02d8 to your computer and use it in GitHub Desktop.
sqlloggik4_df = """
SELECT *
, CAST(id as BIGINT) *10000 + SUM(new_session)
OVER (PARTITION BY id ORDER BY starttid)
AS session_id
FROM(
SELECT *,
unix_timestamp(l.starttid) - LAG(unix_timestamp(l.starttid)) OVER (PARTITION BY l.id ORDER BY l.starttid) timesincelast,
CASE
WHEN unix_timestamp(l.starttid) - LAG(unix_timestamp(l.starttid)) OVER (PARTITION BY l.id ORDER BY l.starttid) >= 30 * 60
THEN 1
ELSE 0
END as new_session
from loggik_df l
) s1
"""
sql4df = sqlHContext.sql(sqlloggik4_df)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment