Comparison
julia> includet("./tmp.jl")
julia> @time user_code(LRU_getindex);
6.539568 seconds (840.07 k allocations: 69.739 MiB, 0.18% gc time, 1.18% compilation time: 33% of which was recompilation)
julia> @time user_code(illegal_getindex);
3.374077 seconds (252.80 k allocations: 53.082 MiB, 2.87% compilation time: 25% of which was recompilation)
Speculation of the root problem
The illegal approach avoids expensive "hash table look up", since it first check if the cluster range is cached.
One observations is that, the slowdown ratio widens as ClusterSize
increases, which is counter intuitive. This can only be explained if the majority of time is spent on checking if a cluster is cached, which happens every getindex()
. The illegal approach avoids it.
Based on this, I have also tried to use ConcurrentDict, which seems to be a linear probe dict (no hash table), but the performance is not much better