Created
April 11, 2017 16:29
Trying to implement "sessionization" in R (https://www.dataiku.com/learn/guide/code/reshaping_data/sessionization.html)
## `````````````````````````````````````````````
#### Read Me ####
## `````````````````````````````````````````````
## Trying to implement "sessionization" in R
## Details here:
## https://www.dataiku.com/learn/guide/code/reshaping_data/sessionization.html
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Load Libraries ####
## `````````````````````````````````````````````
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse)
pacman::p_load(readr)
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Helper Function ####
## `````````````````````````````````````````````
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Global Settings ####
## `````````````````````````````````````````````
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Read Data ####
## `````````````````````````````````````````````
df.master <- read_csv("../toy_data2.csv")
#View(df.master)
## `````````````````````````````````````````````
## `````````````````````````````````````````````
#### Manipulate Data ####
## `````````````````````````````````````````````
# define threshold value: 30 minutes of inactivity, expressed in seconds
i_session <- 30 * 60
df.master %>%
  arrange(user_id, mytimestamp) %>%
  group_by(user_id) %>%
  mutate(
    # gap to the user's previous event, forced to seconds so that it is
    # comparable with i_session (a plain `-` lets difftime choose its own
    # units, which can differ from one group to the next)
    time_interval = as.numeric(difftime(mytimestamp, lag(mytimestamp), units = "secs")),
    # fixing NA values (first event of each user)
    time_interval = ifelse(is.na(time_interval), 0, time_interval),
    # setting flag based on 30 minutes of inactivity
    flag = ifelse(time_interval >= i_session, 1, 0),
    # generating session id
    session_id = paste(user_id, cumsum(flag), sep = '_')) -> df.1
## `````````````````````````````````````````````
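For readers without the CSV at hand, here is a minimal sketch of the same sessionization idea in base R, run on a hand-typed subset of the toy data. The helper name `sessionize` and its `gap_secs` argument are made up for this illustration and are not part of the original script. Timestamps are converted to plain seconds before differencing, so the comparison with the threshold does not depend on `difftime` units.

```r
# Hypothetical helper (not from the original script): assign per-user session
# ids, starting a new session after `gap_secs` seconds of inactivity.
sessionize <- function(df, gap_secs = 30 * 60) {
  # sort events by user, then by time
  df <- df[order(df$user_id, df$mytimestamp), ]
  # seconds since the same user's previous event; 0 for each user's first event
  gap <- ave(as.numeric(df$mytimestamp), df$user_id,
             FUN = function(t) c(0, diff(t)))
  # 1 marks the first event of a new session
  flag <- as.integer(gap >= gap_secs)
  # running count of session starts within each user
  session_no <- ave(flag, df$user_id, FUN = cumsum)
  df$session_id <- paste(df$user_id, session_no, sep = "_")
  df
}

# Six rows copied from the toy data (three per user)
toy <- data.frame(
  user_id = c("uid1", "uid1", "uid1", "uid2", "uid2", "uid2"),
  mytimestamp = as.POSIXct(
    c("2013-09-04T15:49:49", "2013-09-04T15:49:58", "2013-09-04T16:37:11",
      "2013-09-05T00:22:37", "2013-09-05T00:24:10", "2013-09-05T01:19:29"),
    format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
  stringsAsFactors = FALSE
)

out <- sessionize(toy)
print(out)
```

On these six rows the third event of each user falls more than 30 minutes after the previous one, so each user should end up with two sessions (`_0` and `_1`).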
user_id mytimestamp
uid1 2013-09-04T15:49:49
uid1 2013-09-04T15:49:58
uid1 2013-09-04T16:37:11
uid1 2013-09-04T16:37:18
uid1 2013-09-04T16:39:27
uid1 2013-09-04T16:43:57
uid1 2013-09-04T20:12:03
uid1 2013-09-05T00:00:17
uid1 2013-09-05T00:20:35
uid2 2013-09-05T00:22:37
uid2 2013-09-05T00:24:10
uid2 2013-09-05T01:19:29
uid1 2013-09-05T01:19:39
uid1 2013-09-05T01:20:03
uid1 2013-09-05T01:20:17
uid1 2013-09-05T02:33:42
The output of the script above matches the output at [1] except for the last row, where the value is 1 in the latter and 0 in the former (shown in the red rectangle below).
[1] https://www.dataiku.com/learn/guide/code/reshaping_data/sessionization.html
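One possible explanation for a last-row mismatch, offered as a guess and not confirmed anywhere in this thread: subtracting two date-times with `-` returns a `difftime` whose units R chooses automatically from the smallest gap. For the three `uid2` timestamps the smallest gap is 93 seconds, so the result is expressed in minutes, and comparing the raw number against a threshold of 1800 then misses the 55-minute gap. The sketch below uses only base R and the `uid2` rows from the toy data.

```r
# The three uid2 timestamps from the toy data
t <- as.POSIXct(c("2013-09-05 00:22:37", "2013-09-05 00:24:10",
                  "2013-09-05 01:19:29"), tz = "UTC")

# Plain subtraction: difftime picks the units itself ("mins" here,
# because the smallest gap is between 60 and 3600 seconds)
auto_gap <- t[-1] - t[-length(t)]
print(units(auto_gap))

# Explicit units: the gaps are always in seconds
secs_gap <- difftime(t[-1], t[-length(t)], units = "secs")

threshold <- 30 * 60  # 30 minutes, in seconds

# Raw numbers compared against the same threshold give different flags
flag_auto <- as.numeric(auto_gap) >= threshold
flag_secs <- as.numeric(secs_gap) >= threshold
print(flag_auto)
print(flag_secs)
```

If this is the cause, fixing the units with `difftime(..., units = "secs")` before comparing against the threshold should make the last row agree with [1].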