{{ message }}

Instantly share code, notes, and snippets.

# guillemborrell/series.ipynb

Last active Oct 23, 2021
Percentiles of rolling windows in time series with Cython
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

### kikocorreoso commented Mar 3, 2017 • edited

```import numpy as np
from numpy.lib.stride_tricks import as_strided
import pandas as pd

arr = np.random.random(100000)
s = pd.Series(arr)

def rolling_window(a, window):
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
strides = a.strides + (a.strides[-1],)
return as_strided(a, shape=shape, strides=strides)

# Result with pure numpy
res_np = np.percentile(
rolling_window(arr, 10),
10,
1, # axis
None,
False,
'lower'
)

# Result with pandas
tmp = s.rolling(10).apply(np.percentile, args=(10, None, None, False,'lower'))
res_pd = tmp[tmp.notnull()]

# check if all values are similar
print(all(res_np == res_pd))

# Performance (pandas / numpy)
t_np = %timeit -o np.percentile(rolling_window(arr, 10), 10, 1, None, False, 'lower')
t_pd = %timeit -o s.rolling(10).apply(np.percentile, args=(10, None, None, False,'lower'))
print(t_pd.best / t_np.best)```

Results:

```True
100 loops, best of 3: 13.9 ms per loop
1 loop, best of 3: 5.06 s per loop
364.2848075594796```

Reference: http://stackoverflow.com/a/6811241

### guillemborrell commented Mar 6, 2017 • edited

@kikocorreoso

Que conste que en mi máquina la versión de numpy tarda 24.9 ms.

Con pequeñas modificaciones a la función window_percentile:

```@cython.boundscheck(False)
def window_percentile(np.ndarray[double, ndim=1] arr, int window, double perc):
# Create numpy array and set auxiliary array. This requires the GIL.
cdef int imax = arr.shape[0]
cdef np.double_t[:] buffer = <np.double_t [:imax]>malloc(imax * sizeof(double))
cdef double NAN = math.NAN
cdef int i, j, stride

with nogil, parallel():
for i in prange(imax, schedule='dynamic'):
stride = i-(window-1)
if i >= window - 1:
buffer[i] = atomic_percentile(&arr[stride], window, perc)
else:
buffer[i] = NAN

return np.asarray(buffer)```

Lo bajo a 13.8 ms. Parte de la gracia de Cython es levantar el GIL y poder correr con varios cores.