Last active
July 25, 2022 07:01
-
-
Save Phryxia/1a0a0f5d70cd265ab3ec2cfe9a04eb92 to your computer and use it in GitHub Desktop.
TypeScript implementation of median value finder
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
function medianBySort(values: number[]) { | |
values.sort((a, b) => a - b) | |
const half = Math.floor(values.length / 2) | |
if (values.length % 2) { | |
return values[half] | |
} | |
return (values[half - 1] + values[half]) * 0.5 | |
} | |
function medianByK(values: number[], k: number): number { | |
if (values.length < k ** 2) { | |
return medianBySort(values) | |
} | |
const slicedValues = [] as number[] | |
for (let i = 0; i < values.length; i += k) { | |
slicedValues.push(medianByK(values.slice(i, i + k), k)) | |
} | |
return medianBySort(slicedValues) | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This find an approximation of the median of the given values using median of medians. Note that this one is slightly different to the original Median of median algorithm, rather, it's more simplified version. This algorithm will try to divide the given array into K slices recursively, until its size get equal to K. Then it'll compute the exact median of sliced array, and after collecting them, it'll again compute the exact median of sub values.
This has time complexity of O(N lg K) with K if I just put O(K lg K) for sorting. It's very interesting that K merely affect the time complexity if it's reasonably small. If K is big enough with respect to N, it'll start to behave like plain sorted version.
Accuracy
Here is a small experiment of comparison between different accuracy with K. I created 10000 random arrays containing 0 ~ 1 for each K then run the algorithm. I measured the absolute difference between result and their true median. Note that standard deviation is computed using unbiased sample variance formula.
The result is following
You can run this here by yourself.
I don't have any idea why it ripples. Also there is a very important factor K^2 at line 11. If you change this into K then it's accuracy will be decreased. Subarray which is smaller than K^2 will be divided into less than K and I think this may cause the performance drop.
Caveat
The result of the experiment may change if you fix the array size. This is because of divisibility of such fixed length with respect to K. Empirically increasing K might not effective to the accuracy, or even may get worse. (Sometimes it gets better)
※ Last update at 2022-07-25 - Changed into column base rather than row base division. This enhances accuracy (still I don't know why)