Created June 7, 2024 19:01
suppressPackageStartupMessages(library(duckdb))
suppressWarnings(suppressPackageStartupMessages(library(tidyverse)))

# In-memory DuckDB connection
con <- dbConnect(duckdb())

set.seed(1)
df <- tibble(
  id = sample(1:5, 10, replace = TRUE),
  x = sample(LETTERS[1:4], 10, replace = TRUE)
)

# Expose the tibble to DuckDB as a virtual table named "df"
duckdb_register(con, "df", df, overwrite = TRUE)

# For each distinct (id, x) pair, collect the other x values that share the id.
# An id with a single distinct x has an empty frame, so grps is NULL and the
# row is dropped by the outer filter.
q <- "select * from(
  select
    *,
    string_agg(x) over(
      partition by id order by id
      rows between unbounded preceding and unbounded following
      exclude current row
    ) as grps
  from (select distinct id, x from df order by x, id)
)
where grps is not null
order by id"

dbGetQuery(con, q)
Performance-wise, I feel like it would be better to just do filter(n() > 1) right after group_by(id) instead of filtering by grps at the very end.
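A minimal dplyr sketch of that suggestion, for illustration only. The toy data, the `res` name, and the `vapply()` helper are assumptions rather than code from the gist, and the comma separator is chosen to match what I understand to be the default of DuckDB's string_agg. Ids with a single distinct x are dropped right after group_by(id), before any strings are built.

```r
library(dplyr)

# Hypothetical toy data (not the sampled df from the gist)
df <- tibble(
  id = c(1, 1, 1, 2, 3, 3),
  x  = c("A", "B", "A", "C", "D", "A")
)

res <- df %>%
  distinct(id, x) %>%
  group_by(id) %>%
  # Filter early: ids with only one distinct x can never have a non-empty grps
  filter(n() > 1) %>%
  # For each row, paste the other x values in the same id group
  mutate(
    grps = vapply(
      x,
      function(cur) paste(x[x != cur], collapse = ","),
      character(1),
      USE.NAMES = FALSE
    )
  ) %>%
  ungroup() %>%
  arrange(id)

res
```

Whether this is actually faster than filtering on grps at the end would depend on the data and backend, so it is worth benchmarking before relying on it.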
Oh! You’re good! I believe nzchar() is faster though, in case you’re after performance.