Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
Percentiles of rolling windows in time series with Cython
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Copy link

kikocorreoso commented Mar 3, 2017

import numpy as np
from numpy.lib.stride_tricks import as_strided
import pandas as pd

arr = np.random.random(100000)
s = pd.Series(arr)

# ad-hoc rolling window
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)

# Result with pure numpy
res_np = np.percentile(
    rolling_window(arr, 10),
    1, # axis

# Result with pandas
tmp = s.rolling(10).apply(np.percentile, args=(10, None, None, False,'lower'))
res_pd = tmp[tmp.notnull()]

# check if all values are similar
print(all(res_np == res_pd))

# Performance (pandas / numpy)
t_np = %timeit -o np.percentile(rolling_window(arr, 10), 10, 1, None, False, 'lower')
t_pd = %timeit -o s.rolling(10).apply(np.percentile, args=(10, None, None, False,'lower'))
print( /


100 loops, best of 3: 13.9 ms per loop
1 loop, best of 3: 5.06 s per loop


Copy link

guillemborrell commented Mar 6, 2017


Que conste que en mi máquina la versión de numpy tarda 24.9 ms.

Con pequeñas modificaciones a la función window_percentile:

def window_percentile(np.ndarray[double, ndim=1] arr, int window, double perc):
    # Create numpy array and set auxiliary array. This requires the GIL.
    cdef int imax = arr.shape[0]
    cdef np.double_t[:] buffer = <np.double_t [:imax]>malloc(imax * sizeof(double))
    cdef double NAN = math.NAN
    cdef int i, j, stride
    with nogil, parallel():
        for i in prange(imax, schedule='dynamic'):
            stride = i-(window-1)
            if i >= window - 1:
                buffer[i] = atomic_percentile(&arr[stride], window, perc)
                buffer[i] = NAN

    return np.asarray(buffer)

Lo bajo a 13.8 ms. Parte de la gracia de Cython es levantar el GIL y poder correr con varios cores.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment