@Mortimerp9
Created July 14, 2014 18:42
toSet vs distinct in Scala

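If all you need is to remove duplicates from a sequence, you should probably use distinct instead of toSet. On a 1000-element range, distinct takes less than half the time: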

scala> th.pbenchOff()((1 to 1000).toSet.size)((1 to 1000).distinct.size)
Benchmark comparison (in 23.17 s)
Significantly different (p ~= 0)
  Time ratio:    0.43815   95% CI 0.42991 - 0.44639   (n=20)
    First     125.6 us   95% CI 124.0 us - 127.2 us
    Second    55.03 us   95% CI 54.25 us - 55.81 us
res3: Int = 1000
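
The benchmark only compares sizes, where both calls give the same answer. As a small illustration of what each call returns (the object and value names here are made up for the example, not part of the original gist): distinct keeps the sequence type and the first-occurrence order, while toSet produces a Set with no ordering guarantee.

```scala
object DedupDemo {
  // Hypothetical sample data with duplicates.
  val xs: Seq[Int] = Seq(3, 1, 3, 2, 1)

  // distinct keeps the first occurrence of each element, in original order.
  val viaDistinct: Seq[Int] = xs.distinct

  // toSet keeps the same elements but makes no promise about order.
  val viaSet: Set[Int] = xs.toSet

  def main(args: Array[String]): Unit = {
    println(viaDistinct)                      // List(3, 1, 2)
    println(viaDistinct.size == viaSet.size)  // true
  }
}
```

So the swap is safe when you want the count or an ordered, duplicate-free sequence; if you actually need a Set afterwards (for fast membership tests, say), toSet is still the right call.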

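The difference is less pronounced for random sequences (distinct takes about 73% of the toSet time, versus about 44% for the range), but distinct is still preferable: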

scala> th.pbenchOff()(Seq.fill(1000)(Random.nextInt).toSet.size)(Seq.fill(1000)(Random.nextInt).distinct.size)
Benchmark comparison (in 32.45 s)
Significantly different (p ~= 0)
  Time ratio:    0.72532   95% CI 0.71481 - 0.73582   (n=20)
    First     142.6 us   95% CI 142.2 us - 142.9 us
    Second    103.4 us   95% CI 101.9 us - 104.9 us
res5: Int = 1000
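
Both timed expressions in the random case also include the cost of Seq.fill, which is paid equally on each side and may be part of why the ratio is closer to 1. A rough sketch of how one might time only the deduplication step is below. It uses plain System.nanoTime rather than the benchmarking tool behind `th`, the helper `avgNanos` and the repetition counts are assumptions for illustration, and the numbers it prints are far less reliable than the statistics shown above.

```scala
import scala.util.Random

object DedupTiming {
  // Hypothetical helper: average wall-clock nanoseconds per call over `reps` runs.
  def avgNanos[A](reps: Int)(body: => A): Double = {
    var i = 0
    var last: Any = null // keep the result so the call is not trivially dead
    val start = System.nanoTime()
    while (i < reps) { last = body; i += 1 }
    (System.nanoTime() - start).toDouble / reps
  }

  def main(args: Array[String]): Unit = {
    // Generate the input once, outside the timed code.
    val data: Seq[Int] = Seq.fill(1000)(Random.nextInt())

    // Crude warm-up so the JIT has a chance to compile both paths.
    avgNanos(2000)(data.toSet.size)
    avgNanos(2000)(data.distinct.size)

    val viaSet = avgNanos(2000)(data.toSet.size)
    val viaDistinct = avgNanos(2000)(data.distinct.size)
    println(f"toSet:    ${viaSet / 1000}%.2f us")
    println(f"distinct: ${viaDistinct / 1000}%.2f us")
  }
}
```

For anything you intend to act on, a proper benchmarking harness with warm-up and confidence intervals, like the one used for the output above, is the better choice.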