Skip to content

Instantly share code, notes, and snippets.

@gshotwell
Created October 14, 2017 17:11
Show Gist options
  • Save gshotwell/ed54fe9602b982d32f90c7bdfcf8f855 to your computer and use it in GitHub Desktop.
Save gshotwell/ed54fe9602b982d32f90c7bdfcf8f855 to your computer and use it in GitHub Desktop.
Try to figure out an O(n) algorithm to find all unique substrings
str <- "asdfasdf"
char <- data_frame(
letters = unlist(strsplit(str, "")),
skip = FALSE)
uniques <- c()
for (q in nchar(str):1) {
for (i in 1:(nchar(str) - q + 1)) {
start <- i
end <- i + q - 1
if (all( char$skip[start:end])) {
next
}
sub <- substr(str, start, end)
if (sub %in% uniques) {
char$skip[start:end] <- TRUE
} else {
uniques <- c(uniques, sub)
}
}
}
@gshotwell
Copy link
Author

The basic idea here is that if a substring is a duplicate, then all of its substrings will also be duplicates, so you don't have to check them. I feel like this is about half of the answer to the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment