Skip to content

Instantly share code, notes, and snippets.

@kgaughan
Last active December 15, 2016 18:50
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kgaughan/a6a31918ccf1baaa6d3f14abdf803bef to your computer and use it in GitHub Desktop.
Save kgaughan/a6a31918ccf1baaa6d3f14abdf803bef to your computer and use it in GitHub Desktop.
Search algorithm outline

I've an idea for a merge sort variant that's been rattling around my head for a while.

The idea is that sorting is broken up into two phase: run identification and merging. This has some similarities to Timsort, but it's not quite the same thing.

In the run identification phase, runs are identified and recorded. There is some deperturbation to try an identify the longest runs possible near the potential end of runs where if the next element, x[i], is less than x[i-1] and x[i-2], it is swapped with x[i-1].

If no run longer than N can be identified since the last identified run by M elements, those M elements are sorted using insertion sort to create a run.

When we have a list of runs, we start merging adjacent pairs of runs, starting with the smallest adjacent pairs.

Open questions:

  • How should N (the minimum run length) and M (the maximum span of any non-run), where M > N, be calculated? Should N have an upper or lower bound if calculated based on the sequence length?
  • There should be a way to identify runs in the opposite direction, and a heuristic for identifying those, and then reversing them. (Yes, naturally, but the key is when run reversal ought to be done).
  • What is the best way of doing the merge? There are a few things I think might be worth while:
    • End identification: if you have a pair of runs, compare the last element of the first run with the first element of the second run, and if the former is less than the latter, treat the two runs as merged.
    • Might it be worthwhile trying to minimise the size of the temporary storage needed to perform the run by checking if some chunk at the beginning/end of the two runs can be excluded from the merge?
    • Should reversal of opposite runs be done as part of the merge or after run identification? Is it better to do longer runs in the former and shorter runs in the latter?

Material:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment