from numpy_lru_cache_decorator import np_cache

@np_cache()
def function(array):
    ...
Processing numpy arrays can be slow, even more so when doing image analysis. Simply using functools.lru_cache won't work because numpy.array is mutable and not hashable. This workaround allows caching functions that take an arbitrary numpy.array as the first parameter; other parameters are passed as is. The decorator accepts the standard lru_cache parameters (maxsize=128, typed=False).
>>> import numpy as np
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> @np_cache(maxsize=256)
... def multiply(array, factor):
...     print("Calculating...")
...     return factor*array
>>> product = multiply(array, 2)
Calculating...
>>> product
array([[ 2,  4,  6],
       [ 8, 10, 12]])
>>> multiply(array, 2)
array([[ 2,  4,  6],
       [ 8, 10, 12]])
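
The workaround described above can be sketched roughly as follows. This is a minimal illustration and not the snippet's actual code: np_cache_sketch and to_hashable are hypothetical names, and the sketch assumes the array can be rebuilt from a nested tuple, so the original dtype may not be preserved.

```python
from functools import lru_cache, wraps

import numpy as np


def to_hashable(value):
    """Recursively turn nested lists into nested (hashable) tuples."""
    if isinstance(value, list):
        return tuple(to_hashable(item) for item in value)
    return value


def np_cache_sketch(maxsize=128, typed=False):
    """Hypothetical stand-in for np_cache, for illustration only."""
    def decorator(function):
        @lru_cache(maxsize=maxsize, typed=typed)
        def cached(hashable_array, *args, **kwargs):
            # Rebuild an array from the hashable key before the real call.
            return function(np.array(hashable_array), *args, **kwargs)

        @wraps(function)
        def wrapper(array, *args, **kwargs):
            # Arrays are unhashable, so a nested tuple serves as the cache key.
            return cached(to_hashable(array.tolist()), *args, **kwargs)

        # Expose the usual lru_cache helpers on the wrapper.
        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper
    return decorator


calls = []  # records every real (non-cached) computation


@np_cache_sketch(maxsize=256)
def multiply(array, factor):
    calls.append(factor)
    return factor * array


array = np.array([[1, 2, 3], [4, 5, 6]])
first = multiply(array, 2)   # computed
second = multiply(array, 2)  # served from the cache
```

Because the key is built from the array's contents, two different array objects holding the same values share one cache entry. The price is that the conversion to a tuple runs on every call, including cache hits.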
Be very careful when mutable objects (list, dict, numpy.array...) are returned. Each cache hit returns a reference to the same object in memory, not a copy. If that object is modified, the cached value is modified with it and the cache loses its validity.
>>> array = np.array([1, 2, 3])
>>> @np_cache()
... def to_list(array):
...     print("Calculating...")
...     return array.tolist()
>>> result = to_list(array)
Calculating...
>>> result
[1, 2, 3]
>>> result.append("this shouldn't be here") # WARNING, DO NOT do this
>>> result
[1, 2, 3, "this shouldn't be here"]
>>> new_result = to_list(array)
>>> new_result
[1, 2, 3, "this shouldn't be here"] # CACHE BROKEN!!
To avoid this mutability problem, the usual approaches must be followed. In this case, either list(result) or result[:] will create a (shallow) copy. If result were a nested list, copy.deepcopy must be used. For numpy.array, array.copy() must be used, since array[:] only creates a view that shares the same data. numpy.array(array) also makes a copy by default, whereas numpy.asarray(array) returns the original object.
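
The copying rules above can be checked with a short sketch. The variable names are illustrative only, and the values stand in for objects returned from a cached function.

```python
import copy

import numpy as np

# Flat list: a shallow copy is enough to protect the cached object.
cached_list = [1, 2, 3]
safe_list = list(cached_list)
safe_list.append("extra")

# Nested list: a shallow copy still shares the inner lists.
cached_nested = [[1, 2], [3, 4]]
shallow = cached_nested[:]
deep = copy.deepcopy(cached_nested)
deep[0].append("extra")  # does not touch cached_nested

# numpy: slicing gives a view, .copy() gives independent data.
cached_array = np.array([1, 2, 3])
view = cached_array[:]
independent = cached_array.copy()

shares_view = np.shares_memory(view, cached_array)
shares_copy = np.shares_memory(independent, cached_array)
```

Here shallow[0] is still the same inner list as cached_nested[0], which is why nested results need deepcopy, and shares_view is True while shares_copy is False.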
Hey,
thanks a lot for the snippet. That's exactly what I was hoping to find!
However, this implementation seems to be rather slow when dealing with large arrays. This is due to the way the arrays are converted to tuples. If you replace your custom function array_to_tuple(np_array) with tuple(map(tuple, np_array)), you'll get much better performance.
in:
out:
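
A rough way to check this suggestion yourself is sketched below. The recursive function is only an assumed stand-in for array_to_tuple, whose code is not shown here, and the timings depend on the machine, so none are quoted. Note that tuple(map(tuple, np_array)) handles exactly two dimensions.

```python
import timeit

import numpy as np


def recursive_to_tuple(array):
    """Assumed element-by-element conversion (hypothetical stand-in)."""
    try:
        return tuple(recursive_to_tuple(item) for item in array)
    except TypeError:
        # Scalars are not iterable, so recursion stops here.
        return array


def mapped_to_tuple(array):
    """The suggested replacement; valid for 2-D arrays only."""
    return tuple(map(tuple, array))


array = np.arange(10_000).reshape(100, 100)

# Both conversions produce the same hashable key for a 2-D array.
same_key = recursive_to_tuple(array) == mapped_to_tuple(array)

slow = timeit.timeit(lambda: recursive_to_tuple(array), number=20)
fast = timeit.timeit(lambda: mapped_to_tuple(array), number=20)
print(f"recursive: {slow:.4f}s  map: {fast:.4f}s")
```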