Skip to content

Instantly share code, notes, and snippets.

@daranzolin
Created June 7, 2024 19:01
Show Gist options
  • Save daranzolin/947959a0341966582fa84581abaa03a0 to your computer and use it in GitHub Desktop.
Save daranzolin/947959a0341966582fa84581abaa03a0 to your computer and use it in GitHub Desktop.
suppressPackageStartupMessages(library(duckdb))
suppressWarnings(suppressPackageStartupMessages(library(tidyverse)))
con <- dbConnect(duckdb())
set.seed(1)
df <- tibble(
id = sample(1:5, 10, replace = TRUE),
x = sample(LETTERS[1:4], 10, replace = TRUE)
)
duckdb_register(con, "df", df, overwrite = TRUE)
q <- "select * from(
select
*,
string_agg(x) over(
partition by id order by id
rows between unbounded preceding and unbounded following
exclude current row
) as grps
from (select distinct id, x from df order by x, id)
)
where grps is not null
order by id"
dbGetQuery(con, q)
@idmn
Copy link

idmn commented Jun 8, 2024

performance-wise i feel like it would be better to just do filter(n() > 1) right aftergroup_by(id) instead of filtering by grps at the very end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment