Skip to content

Instantly share code, notes, and snippets.

@mbjarland
Last active October 27, 2015 09:04
Show Gist options
  • Save mbjarland/0366d691e08f1a67cc3c to your computer and use it in GitHub Desktop.
Save mbjarland/0366d691e08f1a67cc3c to your computer and use it in GitHub Desktop.
Groovy / Java retrieve all subsets of size k given a list of size n

What is this?

There are a number of threads on the internet discussing performant ways of picking subsets out of a List or a Set in java / groovy. As I could not find any answers which felt clean to me, I did a bit of research and found the apache commons math Combinations class which seems quite performant, exists in mavenCentral and offers a clean interface.

For a random reference, picking sets of 10 from a list of 30 (about 30M combinations) takes about 500ms on my workhorse machine which is more or less up to date. That is just the apache commons piece which only gives back the indicies into the source (i.e. it sends out sets [1,2], [1,3], [2,3] if you tell it to pick 2 out of a set of 3, you can not ask it for subsets of length 2 for list ['a', 'b', 'c']) list/set.

In the below code we add the ability to work directly with lists and ask for subsets in groovy which means a performance hit, but I would guess that the convenience is in most cases worth the loss in performance.

I'm posting this in the hope that it can save somebody a bit of time when faced with this combinatorics problem in groovy/java.

Worth noting is the 13x performance gain from using the groovy CompileStatic annotation in the below sample. CompileStatic is a groovy 2.x feature and for code where the execution speed of the code (as opposed to the algorithm you are implementing, IO or something else external) is crucial, it really seems to make a huge difference.

Best, Matias Bjarland

keywords: groovy, java, sublist, subset, combinations, combinatorics, sublistof subsetof

// groovy / groovyclient work equally well...we use groovyclient here for faster response times
$ groovyclient subsets.groovy
1 - [a, b, c]
2 - [a, b, d]
3 - [a, c, d]
4 - [b, c, d]
5 - [a, b, e]
6 - [a, c, e]
7 - [b, c, e]
8 - [a, d, e]
9 - [b, d, e]
10 - [c, d, e]
11 - [a, b, f]
12 - [a, c, f]
13 - [b, c, f]
14 - [a, d, f]
15 - [b, d, f]
16 - [c, d, f]
17 - [a, e, f]
18 - [b, e, f]
19 - [c, e, f]
20 - [d, e, f]
dynamic groovy: 30045015 combinations took 14134ms
compile static: 30045015 combinations took 1107ms
// grab the apache math3 utils Combinations class
// which is the basis of this sample
//
// Note: For acceptable performance when executing multiple runs of
// a script with grab annotations, use groovyserv/groovyclient
// http://kobo.github.io/groovyserv/
@Grab('org.apache.commons:commons-math3:3.3')
import org.apache.commons.math3.util.Combinations
import groovy.transform.*
// this method modifies the java List interface to contain two methods
// eachSubset(int size, Closure c) and
// eachSubsetWithIndex(int size, Closure c)
addMetaMethods()
// some code to demonstrating usage
// Case 1: a fixed list and eachSubsetWithIndex
// the return value is whatever is returned
// from the last call to the closure (nothing here)
['a','b','c','d','e','f'].eachSubsetWithIndex(3) { List s, int index ->
println "${(index+1).toString().padLeft(2)} - $s"
}
// larger lists for performance measurements, we do subsets of
// 10 elements (k) out of a list of 30 elements (n)
int n = 30
int k = 10
// Create a list of 1,2,3,...,30 using groovy range syntax
List list = (1..n) as List
// Case 2: measure performance
int j = 0
long start = System.currentTimeMillis()
// iterate through all subsets of size 10 incrementing counter
list.eachSubset(k) { List s -> j++ }
long delta = System.currentTimeMillis() - start
println "dynamic groovy: $j combinations took ${delta}ms"
// Case 3: execute the same performance test using statically
// compiled code for performance comparison
staticPerformance(list, k)
// Supporting classes and code
@CompileStatic
class CombinationIterator implements Iterator {
Iterator<int[]> i
Object[] list
int k
Object[] arr
CombinationIterator(List list, int k) {
this(list as Object[], k)
}
CombinationIterator(Object[] list, int k) {
this.i = new Combinations(list.length, k).iterator()
this.list = list
this.k = k
this.arr = new Object[k]
}
boolean hasNext() {
i.hasNext()
}
Object[] next() {
int[] c = i.next()
for (int j=0; j<k; j++)
arr[j] = list[c[j]]
arr
}
void remove() {
i.remove()
}
}
@CompileStatic
Object eachSubset(List list, int k, Closure c) {
Iterator i = new CombinationIterator(list, k)
Object result
while (i.hasNext()) {
Object[] s = i.next()
result = c(s as List)
}
result
}
@CompileStatic
Object eachSubsetWithIndex(List list, int k, Closure c) {
Iterator i = new CombinationIterator(list, k)
int j = 0
Object result
while (i.hasNext()) {
Object[] s = i.next()
result = c(s as List, j++)
}
result
}
@CompileStatic
void staticPerformance(List list, int k) {
long start = System.currentTimeMillis()
int j = 0
Iterator i = new CombinationIterator(list, k)
while (i.hasNext()) {
Object[] s = i.next()
j++
}
long delta = System.currentTimeMillis() - start
println "compile static: $j combinations took ${delta}ms"
}
def addMetaMethods() {
List.metaClass.eachSubset = { int k, Closure c ->
this.eachSubset delegate, k, c
}
List.metaClass.eachSubsetWithIndex = { int k, Closure c ->
this.eachSubsetWithIndex delegate, k, c
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment