@Tadge-Analytics
Last active September 17, 2020 04:02
library(tidyverse)

import <- read_csv("original data/train_users.csv")
intro_month <- as.Date("2013-12-01")  # month in which the new method is introduced
set.seed(123)                         # make the simulated columns reproducible

tidy <- import %>%
  filter(lubridate::year(date_account_created) >= 2013) %>%
  mutate(age = if_else(age > 90 | age < 18, NA_real_, age)) %>%  # treat implausible ages as missing
  select(id, date_account_created, gender, age, language,
         first_device_type, first_browser, country_destination) %>%
  mutate(
    # simulated booking value, set to missing where country_destination is "NDF"
    booking_val = runif(n(), 40, 200),
    booking_val = if_else(country_destination == "NDF", NA_real_, booking_val),
    friend_referral = sample(0:1, n(), prob = c(0.3, 0.7), replace = TRUE),
    # share of users on the new method ramps up over roughly three months after intro_month;
    # as.Date() keeps every cutoff a Date so it compares cleanly with date_account_created
    new_method = case_when(
      date_account_created < intro_month ~ 0L,
      date_account_created <= as.Date(intro_month + lubridate::dmonths(1)) ~ sample(0:1, n(), prob = c(0.6, 0.4), replace = TRUE),
      date_account_created <= as.Date(intro_month + lubridate::dmonths(2)) ~ sample(0:1, n(), prob = c(0.4, 0.6), replace = TRUE),
      date_account_created <= as.Date(intro_month - 2 + lubridate::dmonths(3)) ~ sample(0:1, n(), prob = c(0.2, 0.8), replace = TRUE),
      TRUE ~ 1L
    )
  )

tidy %>%
  write_csv("train_users.csv", na = "")