Skip to content

Instantly share code, notes, and snippets.

@matthewpalmer
Created September 4, 2014 11:53
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save matthewpalmer/33be43a1cb6032fb6775 to your computer and use it in GitHub Desktop.
Save matthewpalmer/33be43a1cb6032fb6775 to your computer and use it in GitHub Desktop.
deduction
# Sort A
**Charts**
http://i.imgur.com/uxBzv0a.png
We ran our standard tests of input data from 1000 to 100,000 in size, with intervals of 1000. This resulted in the above graph.
The average (random) case is roughly polynomial, and it performs nearly as badly on descending data. Interestingly, the algorithm performs very well on ascending data. With these observations, we can narrow the algorithm down to one of the following:
- oblivious bubble sort
- bubble sort with early exit
- Vanilla Insertion sort
- Insertion sort with binary search
- insertion sort on list
- Shell sort times 3 plus 1
- Shell sort times 4
All of the above algorithms have the characteristic that their average case is O(N^2), and that their best case occurs for already sorted data.
This list can be further narrowed down by observing whether the algorithm is stable, since some of the algorithms are stable, and others are not.
If sortA is stable, we would expect to see that for entries with the same key, their order would be preserved after sorting.
**Stable sorts**
- Bubble sort with early exit
- Vanilla insertion sort
- Insertion sort with binary search
- Insertion sort on list
**Unstable sorts**
- Oblivious bubble sort
- Shell sort times 3 plus 1
- Shell sort times 4
We ran sortA on the following set of input data.
```
5 one
7 two
1 three
3 four
9 five
10 six
3 seven
5 eight
2 nine
9 ten
5 eleven
3 twelve
```
and it produced the following output.
```
1 three
2 nine
3 four
3 seven
3 twelve
5 one
5 eight
5 eleven
7 two
9 five
9 ten
10 six
```
Thus, we concluded that sortA is stable. Then it must be one of:
- Bubble sort with early exit
- Vanilla insertion sort
- Insertion sort with binary search
- Insertion sort on list
From this point, we wanted to try to distinguish between the bubble sort and the insertion sorts. A bubble sort works by comparing adjacent entries repeatedly. An insertion sort works through the array, inserting the next element in its appropriate place in the sorted section of the array.
We then tested the array on an input of 10,000, 100,000 lines of '1' followed by a single line of '0'. If it were a bubble sort, the test with 100,000 1's followed by a 0 would be significantly slower than the test with simply 100,000 1's. This is because the single 0 would have to be swapped all the way down the list, which means many more iterations and comparisons. An insertion sort, however, would simply insert the trailing 0 at the start of the list (without re-inserting it along the way as bubble sort would do) and there would be little discernible difference. The results follow.
```
Baseline (all 1's):
10k - 0.00
100k - 0.03
1's then a 0:
10k - 0.01
100k - 0.03
```
Thus, we concluded that sortA was one of the insertion sorts.
Now, if sortA was an insertion sort using a linked list rather than an array, we would expect that the cost of inserting an element at the start would be of constant cost, whereas with an array it would have a cost of O(N) because all of the elements would need to be shifted backwards.
To test this, we needed to compare an input set of random data with an input set that was in descending order. If the data was in descending order, then every insertion would have to occur at the front of the list, and would thus be much slower than the baseline.
However, we had already run this test, and in fact found that descending data was 10% faster than randomised data input. Thus, we ruled out the insertion sort using a linked list.
From here, we had to determine whether it was an insertion sort with a binary search or a vanilla insertion sort.
To determine which of the two insertion sorts sortA was, we wrote a vanilla insertion sort. This is not an ideal method for determining which algorithm was used, since implementation details will affect the performance. However, we reasoned that a vanilla insertion sort was simple enough that significant deviations from the growth rate of our insertion sort compared to sortA would provide insight.
The results follow.
```
10k Random
Our sort: 0.12
sortA: 0.4
100k Random
Our sort: 12.5
sortA: 44.28
```
As you can see, the growth rate of both of the algorithms was the same. Our sort went from 0.12s on 10k input to 12.5s on 100k input (note the N^2 growth rate), which is the same trend that sortA follows.
Thus, we concluded that sortA was a vanilla insertion sort because it followed a nearly identical rate of growth as our vanilla insertion sort.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment