Skip to content

Instantly share code, notes, and snippets.

@Phryxia
Last active July 25, 2022 07:01
Show Gist options
  • Save Phryxia/1a0a0f5d70cd265ab3ec2cfe9a04eb92 to your computer and use it in GitHub Desktop.
Save Phryxia/1a0a0f5d70cd265ab3ec2cfe9a04eb92 to your computer and use it in GitHub Desktop.
TypeScript implementation of median value finder
function medianBySort(values: number[]) {
values.sort((a, b) => a - b)
const half = Math.floor(values.length / 2)
if (values.length % 2) {
return values[half]
}
return (values[half - 1] + values[half]) * 0.5
}
function medianByK(values: number[], k: number): number {
if (values.length < k ** 2) {
return medianBySort(values)
}
const slicedValues = [] as number[]
for (let i = 0; i < values.length; i += k) {
slicedValues.push(medianByK(values.slice(i, i + k), k))
}
return medianBySort(slicedValues)
}
@Phryxia
Copy link
Author

Phryxia commented Jul 24, 2022

This find an approximation of the median of the given values using median of medians. Note that this one is slightly different to the original Median of median algorithm, rather, it's more simplified version. This algorithm will try to divide the given array into K slices recursively, until its size get equal to K. Then it'll compute the exact median of sliced array, and after collecting them, it'll again compute the exact median of sub values.

This has time complexity of O(N lg K) with K if I just put O(K lg K) for sorting. It's very interesting that K merely affect the time complexity if it's reasonably small. If K is big enough with respect to N, it'll start to behave like plain sorted version.

let N = K^d for some natural number d.
T(K) = K lg K
T(N) = K * T(N / K)
     = K^2 * T(N / K^2)
     = K^3 * T(N / K^3)
     = K^(d - 1) * T(N / K^(d - 1))
     = K^(d - 1) * K lg K
     = N lg K

Accuracy

Here is a small experiment of comparison between different accuracy with K. I created 10000 random arrays containing 0 ~ 1 for each K then run the algorithm. I measured the absolute difference between result and their true median. Note that standard deviation is computed using unbiased sample variance formula.

function getStatistics(getSample: () => number, n: number): { average: number, stdvar: number } {
    const samples = [] as number[]
    for (let i = 0; i < n; ++i) {
        samples.push(getSample())
    }
    let average = 0
    for (const sample of samples) {
        average += sample
    }
    average /= n

    let variance = 0
    for (const sample of samples) {
        variance += (sample - average) ** 2
    }
    variance /= n - 1

    return { average, stdvar: variance ** 0.5 }
}

const T = 10000
const L_MAX = 1000

let str = '\nK\tavg\tstdvar'

for (let k = 3; k <= 16; ++k) {
    const { average, stdvar } = getStatistics(() => {
        const L = Math.floor(Math.random() * L_MAX + 1)
        const values = Array.from(new Array(L), () => Math.random())
        const median = medianByK(values, k)
        return Math.abs(median - medianBySort(values))
    }, T)

    str += `\n${k}\t${average}\t${stdvar}`
}

console.log(str)

The result is following

image

image

You can run this here by yourself.

I don't have any idea why it ripples. Also there is a very important factor K^2 at line 11. If you change this into K then it's accuracy will be decreased. Subarray which is smaller than K^2 will be divided into less than K and I think this may cause the performance drop.

Caveat

The result of the experiment may change if you fix the array size. This is because of divisibility of such fixed length with respect to K. Empirically increasing K might not effective to the accuracy, or even may get worse. (Sometimes it gets better)

※ Last update at 2022-07-25 - Changed into column base rather than row base division. This enhances accuracy (still I don't know why)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment