Gist: moodymudskipper/3a624deade2c00301f14e2e65f891cfe
We often need to perform set operations on lists where the unions or
intersections are computed on the names, not the content, and special treatment
must be applied to the content of elements whose names appear in both lists.
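To make the distinction concrete: base R's `union()`, `intersect()` and `setdiff()` compare list elements by value, so name-based variants have to work on `names()` and then subset. A small illustration, reusing the example lists `X` and `Y` (the variable names `common` and `only_x` are just for illustration):

```r
X <- list(a = 1, b = 2:3, c = 4)
Y <- list(b = 3:4, d = 6)

# Base set operations compare whole elements, so b = 2:3 and b = 3:4
# are two distinct elements and both end up in the result
length(union(X, Y))   # 5

# The operations proposed here work on the names instead
common <- intersect(names(X), names(Y))  # "b": the name needing special treatment
only_x <- setdiff(names(X), names(Y))    # c("a", "c")
X[only_x]                                # list(a = 1, c = 4)
```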
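We propose a way to generalize set operations. The third argument, `conflict`, is a function (or a purrr-style formula using `.x` and `.y`) applied to the two values whenever a name is present in both lists. Take the following examples: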
```
X <- list(a = 1, b = 2:3, c = 4)
Y <- list(b = 3:4, d = 6)
# UNION(x, y, conflict)
UNION(X, Y, ~.x) # list(a = 1, b = 2:3, c = 4, d = 6)
UNION(X, Y, ~.y) # list(a = 1, b = 3:4, c = 4, d = 6)
UNION(X, Y, union) # list(a = 1, b = 2:4, c = 4, d = 6)
UNION(X, Y, c) # list(a = 1, b = c(2, 3, 3, 4), c = 4, d = 6)
UNION(X, Y, list) # list(a = 1, b = list(2:3, 3:4), c = 4, d = 6)
UNION(X, Y, ~NULL) # list(a = 1, c = 4, d = 6) # it's a xor!

# INTERSECT(x, y, conflict)
INTERSECT(X, Y, ~.x) # list(b = 2:3)
INTERSECT(X, Y, ~.y) # list(b = 3:4)
INTERSECT(X, Y, union) # list(b = 2:4)
INTERSECT(X, Y, c) # list(b = c(2, 3, 3, 4))
INTERSECT(X, Y, list) # list(b = list(2:3, 3:4))
INTERSECT(X, Y, ~NULL) # list() # not so useful!

# SETDIFF(x, y, conflict)
SETDIFF(X, Y, ~.x) # list(a = 1, b = 2:3, c = 4)
SETDIFF(X, Y, ~.y) # list(a = 1, b = 3:4, c = 4)
SETDIFF(X, Y, union) # list(a = 1, b = 2:4, c = 4)
SETDIFF(X, Y, c) # list(a = 1, b = c(2, 3, 3, 4), c = 4)
SETDIFF(X, Y, list) # list(a = 1, b = list(2:3, 3:4), c = 4)
SETDIFF(X, Y, ~NULL) # list(a = 1, c = 4)
```
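A minimal base-R sketch of the three functions, assuming `conflict` is either a function or a one-sided formula using `.x` and `.y`. The helpers `as_fn()`, `resolve_conflicts()` and `drop_null()` are hypothetical names, not part of any existing package. Conflicts are resolved first, then the name-based set operation is applied, and finally `NULL` results are dropped, which is what turns `UNION(X, Y, ~NULL)` into a xor.

```r
# Hypothetical helper: accept a function, or a one-sided formula using .x and .y
as_fn <- function(f) {
  if (inherits(f, "formula")) {
    expr <- f[[2]]          # right-hand side, e.g. `.x` or `NULL`
    env <- environment(f)
    function(.x, .y) eval(expr, list(.x = .x, .y = .y), env)
  } else {
    match.fun(f)
  }
}

# Replace each element of x whose name is also in y by conflict(x[[nm]], y[[nm]])
resolve_conflicts <- function(x, y, conflict) {
  conflict <- as_fn(conflict)
  for (nm in intersect(names(x), names(y))) {
    x[nm] <- list(conflict(x[[nm]], y[[nm]]))  # `[<-` with list() keeps NULL results
  }
  x
}

# Remove elements that a conflict function set to NULL
drop_null <- function(x) Filter(Negate(is.null), x)

UNION <- function(x, y, conflict) {
  x <- resolve_conflicts(x, y, conflict)
  drop_null(c(x, y[setdiff(names(y), names(x))]))  # append names only found in y
}

INTERSECT <- function(x, y, conflict) {
  x <- resolve_conflicts(x, y, conflict)
  drop_null(x[intersect(names(x), names(y))])      # keep shared names only
}

SETDIFF <- function(x, y, conflict) {
  drop_null(resolve_conflicts(x, y, conflict))     # x's names, shared ones resolved
}

X <- list(a = 1, b = 2:3, c = 4)
Y <- list(b = 3:4, d = 6)
UNION(X, Y, union)      # list(a = 1, b = 2:4, c = 4, d = 6)
INTERSECT(X, Y, ~.y)    # list(b = 3:4)
SETDIFF(X, Y, ~NULL)    # list(a = 1, c = 4)
```

`rlang::as_function()` performs a similar formula-to-function conversion and could replace the hand-rolled `as_fn()` if a dependency on rlang is acceptable.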
Instead of all caps, the functions could be named `list_union`, `list_intersect` and `list_setdiff`.
We need defaults for `conflict`, maybe:
```
UNION(x, y, ~.x) # add elements of y only if x doesn't already contain their names
INTERSECT(x, y, ~.x) # keep the values from x, but remove names that aren't in y
SETDIFF(x, y, ~NULL) # remove all elements whose names are used in y
```
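With these defaults, each operation reduces to plain subsetting by name, which is a useful sanity check for any implementation. A sketch assuming all elements of `x` and `y` are named and names are unique; `union_default()`, `intersect_default()` and `setdiff_default()` are hypothetical names:

```r
X <- list(a = 1, b = 2:3, c = 4)
Y <- list(b = 3:4, d = 6)

# UNION(x, y, ~.x): keep x as is, append the elements of y whose names are new
union_default <- function(x, y) c(x, y[setdiff(names(y), names(x))])
# INTERSECT(x, y, ~.x): keep the elements of x whose names also appear in y
intersect_default <- function(x, y) x[intersect(names(x), names(y))]
# SETDIFF(x, y, ~NULL): drop the elements of x whose names appear in y
setdiff_default <- function(x, y) x[setdiff(names(x), names(y))]

union_default(X, Y)      # list(a = 1, b = 2:3, c = 4, d = 6)
intersect_default(X, Y)  # list(b = 2:3)
setdiff_default(X, Y)    # list(a = 1, c = 4)
```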
Other possible defaults:
```
UNION(x, y, union)
INTERSECT(x, y, intersect)
SETDIFF(x, y, setdiff)
```
These are more consistent, since each function resolves conflicts with the base R function of the same name, though they might not reflect the most common use cases.
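Under these alternative defaults, only the shared element `b` changes, and its value is simply the base function applied to both versions. Note that `SETDIFF` then keeps `b = 2` instead of removing every name used in `y`, which is one way these defaults depart from the first proposal. The "would give" comments assume the alternative defaults above:

```r
X <- list(a = 1, b = 2:3, c = 4)
Y <- list(b = 3:4, d = 6)

# The shared element b, resolved by the base function matching each operation
union(X$b, Y$b)      # 2:4 -> UNION(X, Y)     would give list(a = 1, b = 2:4, c = 4, d = 6)
intersect(X$b, Y$b)  # 3L  -> INTERSECT(X, Y) would give list(b = 3L)
setdiff(X$b, Y$b)    # 2L  -> SETDIFF(X, Y)   would give list(a = 1, b = 2L, c = 4)
```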
We need to think about how to deal with unnamed elements in `x` and `y`. Maybe an `unnamed` argument, similar to `conflict`:
- `unnamed = c` takes the union of the unnamed elements, keeping duplicates
- `unnamed = ~unique(c(.x, .y))` keeps only unique values
- and so on
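One possible mechanism, sketched under the assumption that unnamed elements (missing, empty or `NA` name) are set aside before the name-based operation and combined separately with `unnamed`; only that unnamed half is shown. `split_named()` and `combine_unnamed()` are hypothetical helpers, and to stay short they accept only functions, so `~unique(c(.x, .y))` is written as `function(.x, .y) unique(c(.x, .y))`:

```r
# Hypothetical helper: split a list into its named and unnamed elements
# (an element counts as unnamed if its name is missing, "" or NA)
split_named <- function(x) {
  nms <- names(x)
  has_name <- if (is.null(nms)) rep(FALSE, length(x)) else !is.na(nms) & nzchar(nms)
  list(named = x[has_name], unnamed = unname(x[!has_name]))
}

# Hypothetical helper: combine the unnamed parts of x and y with `unnamed`
combine_unnamed <- function(x, y, unnamed = c) {
  unnamed <- match.fun(unnamed)
  unnamed(split_named(x)$unnamed, split_named(y)$unnamed)
}

x <- list(1, a = 2, 3)
y <- list(3, 4, b = 5)
combine_unnamed(x, y)                                       # list(1, 3, 3, 4)
combine_unnamed(x, y, function(.x, .y) unique(c(.x, .y)))   # list(1, 3, 4)
```

In a full implementation, the `named` parts would go through `UNION`, `INTERSECT` or `SETDIFF`, and the combined unnamed elements would be appended to the result.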
We might also need an argument that controls whether length-0 elements are dropped (yes by default).
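A conflict function can easily return an empty vector (for instance `setdiff(3:4, 3:4)` is `integer(0)`), so such a flag would apply as a final step. A sketch where the argument name `drop_empty` and the helper `drop_empty_elements()` are assumptions; since `NULL` has length 0, it also removes `~NULL` results:

```r
# Hypothetical final step for a `drop_empty` argument (name is an assumption):
# remove every element of length 0 (NULL, integer(0), list(), ...)
drop_empty_elements <- function(res, drop_empty = TRUE) {
  if (drop_empty) Filter(function(el) length(el) > 0, res) else res
}

res <- list(a = 1, b = integer(0), c = NULL, d = list())
drop_empty_elements(res)                      # list(a = 1)
drop_empty_elements(res, drop_empty = FALSE)  # res unchanged
```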