@patternproject
Created April 11, 2017 16:29
## `````````````````````````````````````````````
#### Read Me ####
## `````````````````````````````````````````````
## Trying to implement "sessionization" in R
## Details here:
## https://www.dataiku.com/learn/guide/code/reshaping_data/sessionization.html
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Load Libraries ####
## `````````````````````````````````````````````
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)
pacman::p_load(readr)
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Helper Function ####
## `````````````````````````````````````````````
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Global Settings ####
## `````````````````````````````````````````````
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Read Data ####
## `````````````````````````````````````````````
df.master <- read_csv("../toy_data2.csv")
#View(df.master)
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Manipulate Data ####
## `````````````````````````````````````````````
# define threshold value: 30 minutes of inactivity, in seconds
i_session <- 30 * 60
df.master %>%
  arrange(user_id, mytimestamp) %>%
  group_by(user_id) %>%
  mutate(
    # gap since the user's previous event; R picks the difftime units
    # automatically, so they are not guaranteed to be seconds
    time_interval = mytimestamp - lag(mytimestamp),
    # fixing NA values (first event of each user)
    time_interval = ifelse(is.na(time_interval), 0, time_interval),
    # setting flag based on 30 minutes of inactivity
    flag = ifelse(time_interval >= i_session, 1, 0),
    # generating session id
    session_id = paste(user_id, cumsum(flag), sep = "_")
  ) -> df.1
## `````````````````````````````````````````````
user_id mytimestamp
uid1 2013-09-04T15:49:49
uid1 2013-09-04T15:49:58
uid1 2013-09-04T16:37:11
uid1 2013-09-04T16:37:18
uid1 2013-09-04T16:39:27
uid1 2013-09-04T16:43:57
uid1 2013-09-04T20:12:03
uid1 2013-09-05T00:00:17
uid1 2013-09-05T00:20:35
uid2 2013-09-05T00:22:37
uid2 2013-09-05T00:24:10
uid2 2013-09-05T01:19:29
uid1 2013-09-05T01:19:39
uid1 2013-09-05T01:20:03
uid1 2013-09-05T01:20:17
uid1 2013-09-05T02:33:42
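
The listing above contains no commas, so if the file on disk is delimited the same way, a comma-based reader would see a single column per line. A minimal sketch of parsing such input with base R follows. It assumes the fields are separated by whitespace (spaces or tabs) and that the timestamps are in UTC; neither is stated in the gist. The name `toy` and the three-row excerpt are for illustration only.

```r
# Assumption: fields are separated by whitespace, as displayed above.
# Only a three-row excerpt of the sample data is inlined here.
toy <- read.table(
  text = "user_id mytimestamp
uid1 2013-09-04T15:49:49
uid1 2013-09-04T15:49:58
uid2 2013-09-05T00:22:37",
  header = TRUE,
  stringsAsFactors = FALSE
)

# Assumption: timestamps are UTC; the gist does not state a time zone.
toy$mytimestamp <- as.POSIXct(
  toy$mytimestamp,
  format = "%Y-%m-%dT%H:%M:%S",
  tz = "UTC"
)

str(toy)
```

For a file on disk, the same call takes a path in place of `text =`.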
patternproject commented Apr 11, 2017

The output of the script above matches the output at [1] except for the last row, where the value is 1 in the latter and 0 in the former (shown in the red rectangle in the screenshot below).

[Screenshot: comparison2]

[1] https://www.dataiku.com/learn/guide/code/reshaping_data/sessionization.html
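
One possible explanation for the mismatch, not confirmed anywhere in this thread: subtracting two POSIXct vectors returns a `difftime` whose units R chooses automatically from the smallest gap. For uid2 the smallest gap is 93 seconds, so the gaps come back in minutes and never reach the threshold of 1800. The sketch below reproduces this in base R with the three uid2 timestamps from the sample data. The variable names are illustrative, and UTC is an assumed time zone.

```r
# uid2's three events from the sample data (time zone assumed to be UTC)
uid2 <- as.POSIXct(
  c("2013-09-05 00:22:37", "2013-09-05 00:24:10", "2013-09-05 01:19:29"),
  tz = "UTC"
)
i_session <- 30 * 60  # threshold: 30 minutes, in seconds

# Plain subtraction: units are chosen automatically ("mins" here,
# because the smallest gap is at least 60 seconds but under an hour)
gaps_auto <- uid2[-1] - uid2[-length(uid2)]
units(gaps_auto)
flag_auto <- as.numeric(gaps_auto) >= i_session   # compares minutes to seconds

# Requesting seconds explicitly makes the comparison consistent
gaps_secs <- as.numeric(
  difftime(uid2[-1], uid2[-length(uid2)], units = "secs")
)
flag_secs <- gaps_secs >= i_session

print(flag_auto)  # FALSE FALSE
print(flag_secs)  # FALSE TRUE
```

If this is the cause, computing `time_interval` with `difftime(..., units = "secs")` inside the `mutate()` call should flag the 55-minute gap in uid2's last row.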