from numpy_lru_cache_decorator import np_cache

@np_cache()
def function(array):
    ...
Processing numpy arrays can be slow, especially in image analysis. Simply using functools.lru_cache won't work because numpy.array is mutable and not hashable. This workaround allows caching functions that take an arbitrary numpy.array as the first parameter; other parameters are passed as is. The decorator accepts the standard lru_cache parameters (maxsize=128, typed=False).
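To make the idea concrete, here is a minimal sketch of how such a decorator could work. It is not necessarily the gist's actual code: the name np_cache_sketch is hypothetical, and the approach (convert the array to nested tuples, cache on those, rebuild the array inside the cached function) is an assumption based on the description above.

```python
from functools import lru_cache, wraps

import numpy as np


def array_to_tuple(array):
    """Recursively convert an array into nested tuples, which are hashable."""
    try:
        return tuple(array_to_tuple(item) for item in array)
    except TypeError:
        # Numeric scalars are not iterable; return them unchanged.
        return array


def np_cache_sketch(maxsize=128, typed=False):
    """Hypothetical stand-in for np_cache, for illustration only."""
    def decorator(function):
        @lru_cache(maxsize=maxsize, typed=typed)
        def cached(hashable_array, *args, **kwargs):
            # Rebuild the array from its hashable form before calling through.
            return function(np.array(hashable_array), *args, **kwargs)

        @wraps(function)
        def wrapper(array, *args, **kwargs):
            return cached(array_to_tuple(array), *args, **kwargs)

        # Expose the usual lru_cache helpers on the public wrapper.
        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper

    return decorator


calls = []  # records every real (non-cached) evaluation


@np_cache_sketch(maxsize=256)
def multiply(array, factor):
    calls.append(factor)
    return factor * array


a = np.array([[1, 2, 3], [4, 5, 6]])
first = multiply(a, 2)
second = multiply(a, 2)  # served from the cache, multiply's body is not run again
```

Note that every remaining argument is part of the cache key too, so it must be hashable as well. Converting a large array to tuples on every call has a cost of its own, which only pays off when the wrapped function is considerably more expensive.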
>>> import numpy as np
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> @np_cache(maxsize=256)
... def multiply(array, factor):
...     print("Calculating...")
...     return factor*array
>>> product = multiply(array, 2)
Calculating...
>>> product
array([[ 2,  4,  6],
       [ 8, 10, 12]])
>>> multiply(array, 2)
array([[ 2,  4,  6],
       [ 8, 10, 12]])
Users must be very careful when mutable objects (list, dict, numpy.array...) are returned. Each cache hit returns a reference to the same object in memory, not a copy. If that object is modified, the cached value changes with it and the cache is no longer valid.
>>> array = np.array([1, 2, 3])
>>> @np_cache()
... def to_list(array):
...     print("Calculating...")
...     return array.tolist()
>>> result = to_list(array)
Calculating...
>>> result
[1, 2, 3]
>>> result.append("this shouldn't be here")  # WARNING, DO NOT do this
>>> result
[1, 2, 3, "this shouldn't be here"]
>>> new_result = to_list(array)
>>> new_result
[1, 2, 3, "this shouldn't be here"]  # CACHE BROKEN!!
To avoid this mutability problem, follow the usual approaches. In this case, either list(result) or result[:] will create a (shallow) copy. If result were a nested list, copy.deepcopy must be used. For a numpy.array, use array.copy(): array[:] only creates a view that shares the same data, and numpy.asarray(array) returns the same object. (numpy.array(array) does copy by default, since its copy parameter defaults to True.)
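The copy semantics above can be checked directly. The snippet below is a small sketch using numpy.shares_memory; the variable names are illustrative only. It shows that slicing yields a view, while array.copy() yields independent data, and that numpy.array(array) copies by default (copy=True) whereas numpy.asarray(array) does not.

```python
import copy

import numpy as np

cached = np.array([1, 2, 3])

view = cached[:]                # slicing returns a view on the same buffer
same = np.asarray(cached)       # returns the very same object, no copy
copied = cached.copy()          # independent data
also_copied = np.array(cached)  # copy=True is the default, so this copies too

view_shares = np.shares_memory(cached, view)
copy_shares = np.shares_memory(cached, copied)
np_array_shares = np.shares_memory(cached, also_copied)

# Nested lists: a shallow copy still shares the inner lists.
nested = [[1, 2], [3, 4]]
shallow = list(nested)
deep = copy.deepcopy(nested)
shallow[0].append(99)           # also visible through nested[0], not through deep[0]
```

Here view_shares is True while copy_shares and np_array_shares are False, and only the deep copy is unaffected by the append.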
I've been trying to adapt this implementation to pandas DataFrames, but I am still struggling with arguments other than the first one, which in this case is a pandas DataFrame.
My idea is to decompose the DataFrame into a tuple of three elements: the first is a tuple containing the index, the second a tuple with the columns, and the third contains the DataFrame values, converted using the array_to_tuple function.
However, I am having problems with code like this:
getting the following error:
I cannot understand how this differs from the numpy cached version.
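For reference, the three-element decomposition described above might be sketched as follows. This is an assumption-laden illustration, not the code from the question: df_cache_sketch is a hypothetical name, and the values are converted row by row with tuple(map(tuple, ...)) as a stand-in for array_to_tuple, since DataFrame values are always two-dimensional. Dtypes may not round-trip exactly for mixed-type frames.

```python
from functools import lru_cache, wraps

import pandas as pd


def df_cache_sketch(maxsize=128, typed=False):
    """Hypothetical DataFrame counterpart of np_cache, for illustration only."""
    def decorator(function):
        @lru_cache(maxsize=maxsize, typed=typed)
        def cached(hashable_df, *args, **kwargs):
            index, columns, values = hashable_df
            # Rebuild the DataFrame from its three hashable parts.
            df = pd.DataFrame(list(values), index=list(index), columns=list(columns))
            return function(df, *args, **kwargs)

        @wraps(function)
        def wrapper(df, *args, **kwargs):
            hashable_df = (
                tuple(df.index),                  # row labels
                tuple(df.columns),                # column labels
                tuple(map(tuple, df.to_numpy())), # values, one tuple per row
            )
            return cached(hashable_df, *args, **kwargs)

        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper

    return decorator


calls = []  # records every real (non-cached) evaluation


@df_cache_sketch()
def scale(df, factor, offset=0):
    calls.append(factor)
    return df * factor + offset


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
first = scale(df, 2, offset=1)
second = scale(df, 2, offset=1)  # same key, so the cached result is returned
```

As with functools.lru_cache in general, every additional positional or keyword argument becomes part of the cache key and therefore has to be hashable; passing a list, dict, array, or another DataFrame as a later argument raises a TypeError.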