Skip to content

Instantly share code, notes, and snippets.

@guillemborrell
Last active October 23, 2021 04:13
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save guillemborrell/2a11a258fbd9cfd8d8e0bc14eac83fa7 to your computer and use it in GitHub Desktop.
Save guillemborrell/2a11a258fbd9cfd8d8e0bc14eac83fa7 to your computer and use it in GitHub Desktop.
Percentiles of rolling windows in time series with Cython
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@kikocorreoso
Copy link

kikocorreoso commented Mar 3, 2017

import numpy as np
from numpy.lib.stride_tricks import as_strided
import pandas as pd

arr = np.random.random(100000)
s = pd.Series(arr)

# ad-hoc rolling window
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)

# Result with pure numpy
res_np = np.percentile(
    rolling_window(arr, 10),
    10,
    1, # axis
    None,
    False,
    'lower'
)

# Result with pandas
tmp = s.rolling(10).apply(np.percentile, args=(10, None, None, False,'lower'))
res_pd = tmp[tmp.notnull()]

# check if all values are similar
print(all(res_np == res_pd))

# Performance (pandas / numpy)
t_np = %timeit -o np.percentile(rolling_window(arr, 10), 10, 1, None, False, 'lower')
t_pd = %timeit -o s.rolling(10).apply(np.percentile, args=(10, None, None, False,'lower'))
print(t_pd.best / t_np.best)

Results:

True
100 loops, best of 3: 13.9 ms per loop
1 loop, best of 3: 5.06 s per loop
364.2848075594796

Reference: http://stackoverflow.com/a/6811241

@guillemborrell
Copy link
Author

guillemborrell commented Mar 6, 2017

@kikocorreoso

Que conste que en mi máquina la versión de numpy tarda 24.9 ms.

Con pequeñas modificaciones a la función window_percentile:

@cython.boundscheck(False)
def window_percentile(np.ndarray[double, ndim=1] arr, int window, double perc):
    # Create numpy array and set auxiliary array. This requires the GIL.
    cdef int imax = arr.shape[0]
    cdef np.double_t[:] buffer = <np.double_t [:imax]>malloc(imax * sizeof(double))
    cdef double NAN = math.NAN
    cdef int i, j, stride
    
    with nogil, parallel():
        for i in prange(imax, schedule='dynamic'):
            stride = i-(window-1)
            if i >= window - 1:
                buffer[i] = atomic_percentile(&arr[stride], window, perc)
            else:
                buffer[i] = NAN

    return np.asarray(buffer)

Lo bajo a 13.8 ms. Parte de la gracia de Cython es levantar el GIL y poder correr con varios cores.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment