Ok, now for some diagnosis.
If it's a set/map that's large, the problem is almost certainly `ordsets`/`orddict`, and we want to look at this more. Unfortunately, swapping in a different implementation is not a simple search/replace. @seancribbs has ported `HashSet`/`HashDict` from Elixir to Erlang (https://github.com/seancribbs/hashtypes), which we may well use. We could swap to `sets`/`dict`, but their equality is broken (two values with the same contents are not guaranteed to compare equal with `==`). Converting to hashtypes is currently the lowest-hanging fruit for increasing performance, though that claim needs a benchmark behind it, not just words on a page from me.
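
To make the equality point concrete, here is a minimal sketch of comparing `sets` and `dict` values by content instead of with `==`. The module and function names are hypothetical and only for illustration; nothing like this exists in riak_dt as far as this note is concerned.

```erlang
-module(set_equality_sketch).
-export([sets_equal/2, dicts_equal/2]).

%% Hypothetical helper: compare two `sets` values by content rather than
%% by internal representation. Same size plus subset implies same elements.
sets_equal(A, B) ->
    sets:size(A) =:= sets:size(B) andalso sets:is_subset(A, B).

%% Hypothetical helper: compare two `dict` values by content, using a
%% sorted key/value list as a canonical form.
dicts_equal(A, B) ->
    lists:sort(dict:to_list(A)) =:= lists:sort(dict:to_list(B)).
```

Both helpers walk the whole structure, so they are not free. That cost is part of why a plain swap to `sets`/`dict` is unattractive anywhere the code relies on `==`.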
This is almost certainly the problem behind the exemplar GET performance issues. CRDTs go through a nontrivial decode: `riak_object` -> `riak_dt_*` -> `{ protobuffs | http/json }`. In the conversion from the dt datatype to protobuffs or JSON, we call `riak_dt_*:value/1`, which does a large `orddict` traversal. Hello `O(n)`.
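
To get numbers rather than words, a rough micro-benchmark could look like the sketch below. It is an assumption-laden stand-in: it times per-key lookups in `orddict` against the stdlib `dict` (used here only as a readily available hash-based structure, not as the proposed replacement), and it does not measure `riak_dt_*:value/1` itself. The module name is hypothetical.

```erlang
-module(dict_bench_sketch).
-export([run/1]).

%% Build an orddict and a dict holding the same N integer keys, then time
%% N lookups against each. Returns {OrddictMicros, DictMicros}.
run(N) ->
    Keys = lists:seq(1, N),
    Pairs = [{K, K} || K <- Keys],
    OD = orddict:from_list(Pairs),
    D = dict:from_list(Pairs),
    %% timer:tc/1 returns {ElapsedMicroseconds, Result}.
    {OrdT, _} = timer:tc(fun() -> [orddict:fetch(K, OD) || K <- Keys] end),
    {DictT, _} = timer:tc(fun() -> [dict:fetch(K, D) || K <- Keys] end),
    {OrdT, DictT}.
```

Something like `dict_bench_sketch:run(10000)` from a shell gives a first impression; a real comparison would want repeated runs, realistic key shapes, and the hashtypes structures in place of `dict`.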
As for general approaches to diagnosis:
- Issues with get? Instrument `riak_dt_*:value/1`
- Issues with put? Instrument `riak_dt_*:update/3`
- These don't turn up anything? Instrument `riak_dt_*:to_binary/1` and `riak_dt_*:from_binary/1`
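
One low-tech way to instrument these calls is to wrap them in `timer:tc/3`. The sketch below assumes nothing about riak internals beyond the function names listed above; the module name `dt_timing_sketch` and the helper `time_call/3` are hypothetical.

```erlang
-module(dt_timing_sketch).
-export([time_call/3]).

%% Hypothetical helper: time a single call to Mod:Fun(Args...) and print
%% the elapsed microseconds. Returns the call's own result, so it can be
%% dropped into an existing code path without changing behaviour.
time_call(Mod, Fun, Args) ->
    {Micros, Result} = timer:tc(Mod, Fun, Args),
    io:format("~p:~p/~p took ~p us~n", [Mod, Fun, length(Args), Micros]),
    Result.
```

For a slow get on a set, that might be used as `dt_timing_sketch:time_call(riak_dt_orswot, value, [Set])`, where `Set` is whatever datatype value is already in hand at that point in the code.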
Which `riak_dt_*` module?
- Counters: `riak_dt_pncounter`
- Sets: `riak_dt_orswot`
- Maps: `riak_dt_map`
There's also `riak_kv_crdt`. This is the module that does stats and proxies calls through to the right `riak_dt_*` module using information from the cluster. It's probably not worth debugging. I realise it uses `orddict` in places, but these should have at most 3 or so entries (one for each type), so they can be ignored.