@VibhuJawa
Last active February 3, 2021 18:22
# Requires data in the format <user_sk>\t<timestamp>, clustered by user_sk
# and sorted by <timestamp> ascending within each user.
# click_rows is expected to be an iterable of such lines (e.g. an open file).
current_uid = ''
last_click_time = -1
perUser_sessionID_counter = 1
timeout = 60 * 60  # session timeout; one hour if timestamps are in seconds
for line in click_rows:
    user_sk, tstamp_str = line.strip().split("\t")
    tstamp = int(tstamp_str)  # Python 3: int replaces Python 2's long
    # reset if the next partition (a new user) begins
    if current_uid != user_sk:
        current_uid = user_sk
        perUser_sessionID_counter = 1
        last_click_time = tstamp
    # time between clicks exceeds session timeout?
    if tstamp - last_click_time > timeout:
        perUser_sessionID_counter += 1
    last_click_time = tstamp
    print(f"{user_sk}, {perUser_sessionID_counter}")
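The same logic can be wrapped in a generator so it is easier to try on a small input. This is only a sketch: the function name `assign_sessions`, the `sample_rows` data, and the assumption that timestamps are integer seconds are illustrative and not part of the original snippet.

```python
def assign_sessions(click_rows, timeout=60 * 60):
    """Yield (user_sk, session_id) for each tab-separated "user_sk, timestamp" row.

    Rows must be clustered by user_sk and sorted by timestamp ascending.
    """
    current_uid = None
    last_click_time = None
    session_id = 1
    for line in click_rows:
        user_sk, tstamp_str = line.strip().split("\t")
        tstamp = int(tstamp_str)
        if user_sk != current_uid:
            # A new user starts: session numbering restarts at 1.
            current_uid = user_sk
            session_id = 1
        elif tstamp - last_click_time > timeout:
            # Same user, but the gap since the previous click exceeds the timeout.
            session_id += 1
        last_click_time = tstamp
        yield user_sk, session_id


# Hypothetical sample input: timestamps are assumed to be seconds.
sample_rows = [
    "u1\t0",
    "u1\t1800",
    "u1\t5401",  # 3601 s after the previous click: new session
    "u2\t100",
    "u2\t3700",  # exactly 3600 s later: same session, since the test is strictly ">"
]
sessions = list(assign_sessions(sample_rows))
for user_sk, session_id in sessions:
    print(f"{user_sk}, {session_id}")
```

Note that the comparison is strict, so a gap exactly equal to the timeout does not open a new session.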