update it was pointed out a major bug in this test is it was always testing for 56, and since we were generating our collection array in order 1..n, 56 will be near the beginning of the array for large n's, and thus give array a near 'best case' performance. So I've run it again, with the test_for
variable being calculated to be the mid-point (close to average case for array?), and another time with the test_for
being a number that is not in the collection at all (array worse case).
In the 'array average case' approximation, Array still does better than Set in some cases that might surprise you, but not as much better, and not in as many cases.
In the 'array worst case', Array doesn't really ever do better than Set, and sometimes does really bad.
See the actual output below, decide for yourself based on this or any other data (and please feel free to add a comment explaining why this data may not be useful for real-world usage prediction, either here or on the reddit thread. I was just trying to explore something I was skeptical of empirically, instead of just arguing about it without measuring it, trying to measure it. If I messed it up, feel free to say so and why.)
Answer: If you have both a non-trivial number of reads and a non-trivial collection size, Set is faster. But even a huge collection with just a few reads, Array is generally faster. And a 10 element collection needs more than 100 reads for Set to take the lead.
'reads' are check for include?
According to my code below (am I in error somewhere?):
-
For a collection of size 10, with only 10 reads, Array is (slightly) faster.
-
For a collection of size 100 with only 10 reads, Array is still (slightly) faster.
-
For a collection size of 100, with 100 reads.... Set is 5x faster.
-
For a collection size of 10 with 100 reads... Array is still very slightly faster.
-
For a collection size of 10 with 500 reads.... Set is faster again, but not by a huge amount.
-
For a collection size of 1000 with 10 reads.... Array is still 2x faster!
-
For a collection size of 10000 with 10 reads... Array somehow about 3x faster?!
Okay, but what if we do lots of writes too?
-
For a collection size of 100, with 100 reads, but where we rewrite the contents 5 times (500 writes, 100 5x)... Set is still faster, but by less than 2x now.
-
For a collection size of 100 with 100 reads but where we rewrite the contents 10 times... Array is slightly faster again.
Note, ALL these 'faster than' measurements come from repeating the whole simulation 10k times, in actual real world app any degree of 'faster' either way may or may not be significant enough to matter. beware the micro-benchmark.