from numpy_lru_cache_decorator import np_cache

@np_cache()
def function(array):
    ...
Sometimes processing numpy arrays can be slow, especially when doing image analysis. Simply using `functools.lru_cache` won't work because `numpy.array` is mutable and not hashable. This workaround allows caching functions that take an arbitrary `numpy.array` as their first parameter; the other parameters are passed as-is. The decorator accepts the standard `lru_cache` parameters (`maxsize=128`, `typed=False`).
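The mechanism can be sketched as follows (a minimal sketch of the idea, not necessarily the package's exact source): the outer wrapper converts the array into nested tuples, which are hashable, and an inner `lru_cache`-decorated function converts them back before calling the real function.

```python
from functools import lru_cache, wraps

import numpy as np


def np_cache(*lru_args, **lru_kwargs):
    """Sketch: cache a function whose first argument is a numpy array."""
    def decorator(function):
        @lru_cache(*lru_args, **lru_kwargs)
        def cached_wrapper(hashable_array, *args, **kwargs):
            # Rebuild the array before calling the real function.
            array = np.array(hashable_array)
            return function(array, *args, **kwargs)

        def array_to_tuple(np_array):
            """Recursively convert an array into nested (hashable) tuples."""
            try:
                return tuple(array_to_tuple(sub) for sub in np_array)
            except TypeError:
                return np_array  # reached a scalar, which is not iterable

        @wraps(function)
        def wrapper(np_array, *args, **kwargs):
            return cached_wrapper(array_to_tuple(np_array), *args, **kwargs)

        # Expose lru_cache introspection helpers on the wrapper.
        wrapper.cache_info = cached_wrapper.cache_info
        wrapper.cache_clear = cached_wrapper.cache_clear
        return wrapper
    return decorator
```

Note that the decorator must be applied with parentheses (`@np_cache()`), since `np_cache` itself only builds the decorator.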
>>> array = np.array([[1, 2, 3], [4, 5, 6]])
>>> @np_cache(maxsize=256)
... def multiply(array, factor):
...     print("Calculating...")
...     return factor*array
>>> product = multiply(array, 2)
Calculating...
>>> product
array([[ 2,  4,  6],
       [ 8, 10, 12]])
>>> multiply(array, 2)
array([[ 2,  4,  6],
       [ 8, 10, 12]])
Users must be very careful when mutable objects (`list`, `dict`, `numpy.array`, ...) are returned. The cache returns a reference to the same object in memory each time, not a copy. If that object is then modified, the cache itself loses its validity.
>>> array = np.array([1, 2, 3])
>>> @np_cache()
... def to_list(array):
...     print("Calculating...")
...     return array.tolist()
>>> result = to_list(array)
Calculating...
>>> result
[1, 2, 3]
>>> result.append("this shouldn't be here") # WARNING, DO NOT do this
>>> result
[1, 2, 3, "this shouldn't be here"]
>>> new_result = to_list(array)
>>> new_result
[1, 2, 3, "this shouldn't be here"] # CACHE BROKEN!!
To avoid this mutability problem, the usual approaches must be followed. In this case, either `list(result)` or `result[:]` will create a (shallow) copy. If `result` were a nested list, `copy.deepcopy` must be used. For a `numpy.array`, use `array.copy()` (or `numpy.array(array)`, which copies by default); `array[:]` will not make a copy, only a view of the same data.
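The difference is easy to demonstrate with plain numpy (nothing here is specific to this package):

```python
import numpy as np

arr = np.array([1, 2, 3])

view = arr[:]              # a view: shares memory with arr
copy = arr.copy()          # an independent copy
converted = np.array(arr)  # also a copy (np.array copies by default)

view[0] = 99
# Mutating the view is visible in the original array,
# while the two copies are unaffected.
print(arr)        # first element is now 99
print(copy)       # still [1 2 3]
print(converted)  # still [1 2 3]
```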
Hi, @CarloNicolini
I spotted 3 lines that were giving you errors:
First, the decorator must be called, not just referenced: use `@df_cache()` instead of `@df_cache`. Notice the parentheses. This could be worked around with something like this: https://stackoverflow.com/questions/3931627/how-to-build-a-decorator-with-optional-parameters

Second, my exception handling in `array_to_tuple` was intended to handle the final case, when the recursion tries to unpack a single element. In your `dataframe_to_tuple`, that exception handling was hiding an error thrown when constructing the returned tuple: `tuple` expects a single iterable parameter. You can instead create the tuple implicitly, with comma-separated values.
Finally, the `test` function was failing for me because `cached_wrapper` was passing `function` a tuple, not a DataFrame as it expects; the tuple has to be converted back into a DataFrame before `function` is called.
This is the complete code that works for me.

Bear in mind that caching full dataframes this way could result in a lot of RAM usage. If you run into memory-overhead issues, this post could be interesting: https://stackoverflow.com/questions/23477284/memory-aware-lru-caching-in-python
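The snippet itself did not survive in this copy of the thread, so here is a hedged reconstruction of what such a `df_cache` could look like; the names follow the discussion above, but the details are assumptions rather than the exact code posted:

```python
from functools import lru_cache, wraps

import numpy as np
import pandas as pd


def dataframe_to_tuple(df):
    """Hypothetical helper: flatten a DataFrame into nested, hashable tuples."""
    def array_to_tuple(values):
        try:
            return tuple(array_to_tuple(sub) for sub in values)
        except TypeError:
            return values  # scalar element, not iterable

    # Implicit tuple (comma-separated), not tuple(a, b, c).
    return tuple(df.columns), tuple(df.index), array_to_tuple(df.values)


def df_cache(*lru_args, **lru_kwargs):
    def decorator(function):
        @lru_cache(*lru_args, **lru_kwargs)
        def cached_wrapper(hashable_df, *args, **kwargs):
            columns, index, values = hashable_df
            # Rebuild the DataFrame before calling the real function.
            df = pd.DataFrame(np.array(values), index=index, columns=columns)
            return function(df, *args, **kwargs)

        @wraps(function)
        def wrapper(df, *args, **kwargs):
            return cached_wrapper(dataframe_to_tuple(df), *args, **kwargs)

        wrapper.cache_info = cached_wrapper.cache_info
        wrapper.cache_clear = cached_wrapper.cache_clear
        return wrapper
    return decorator
```

As with `np_cache`, the decorator has to be applied with parentheses: `@df_cache()`.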