@tvladeck
Last active September 18, 2017 19:58
pseudocode for sequence clustering
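# packages used below
library(dplyr)          # %>% and mutate()
library(TraMineR)       # seqecreate()
library(TraMineRextras) # seqedist(), seqedplot()
library(cluster)        # agnes()
# normalize sequences: express each order date as time elapsed since first_date
# (assumed here to be a column holding the customer's first order date)
tx_sample <-
  tx_sample %>%
  mutate(
    order_date = as.numeric(order_date - first_date) # seqecreate() needs numeric timestamps
  )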
# create TraMineR event sequence object
tx_seq <- seqecreate(
  id = tx_sample$customer_id,
  timestamp = tx_sample$order_date,
  event = tx_sample$sale_type # this could be anything (email, return, channel, etc.)
)
# create distance matrix
seq_dist <- seqedist(
  seqe = tx_seq,
  idcost = rep(1, 3), # one insertion/deletion cost per event type; assumes 3 distinct event types
  vparam = .1,
  interval = "previous"
)
# visualize distance matrix
seq_dist %>% cmdscale %>% plot
# hierarchical clustering
seq_clust <- agnes(
  x = seq_dist,
  diss = TRUE,
  method = "ward"
)
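# A quick check before fixing the number of clusters (a sketch, not part of the
# original steps): the dendrogram and the agglomerative coefficient from agnes()
# give a rough sense of how many groups the tree supports. The plot title is
# only illustrative.
pltree(seq_clust, main = "Ward dendrogram of event sequences")
seq_clust$ac # agglomerative coefficient; values near 1 suggest strong clustering structure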
# visualize some differences
# cut the tree built above into 3 clusters (converted to hclust form for cutree)
seqedplot(tx_seq, type = "hazard", group = cutree(as.hclust(seq_clust), k = 3))
# find discriminating subsequences
# not implemented yet—see http://traminer.unige.ch/doc/seqecmpgroup.html
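# A possible sketch of that step, following the seqecmpgroup documentation:
# mine frequent subsequences, then test which ones best separate the clusters.
# Assumptions (not in the original): the 5% minimum support, the Bonferroni
# correction, and the names seq_groups, tx_subseq and tx_discr.
library(TraMineR) # seqefsub(), seqecmpgroup()
seq_groups <- factor(cutree(as.hclust(seq_clust), k = 3))
tx_subseq <- seqefsub(tx_seq, pmin.support = 0.05)
tx_discr <- seqecmpgroup(tx_subseq, group = seq_groups, method = "bonferroni")
print(tx_discr) # subsequences ordered by decreasing discriminating power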