Skip to content

Instantly share code, notes, and snippets.

@krestenkrab
Last active August 29, 2015 14:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save krestenkrab/374a9a06210db0558102 to your computer and use it in GitHub Desktop.
Save krestenkrab/374a9a06210db0558102 to your computer and use it in GitHub Desktop.
hash treee optimizations

hashtree.erl optimization ideas

On-disk format for segment data

The current on-disk format is that each KV is stored in the tree file as

<<$t,TreeId:22/binary,$s,Segment:64/integer,Key/binary>>

TreeId is 22 bytes (20 bytes from the vnode id, and 2 bytes for N value). The implementation in hashtree uses 1M segments, and the Segment is stored in a 64 bit unsiged. Finally, the Key is usually term_to_binary({Bucket,Key}).

See https://github.com/basho/riak_core/blob/develop/src/hashtree.erl#L651-L662

All in all, we can shave this down significantly, saving valuable I/O.

Imagine this:

Key = <<BucketLen:16,Bucket/binary,Key/binary>>,
TreeId = <<Partition:24, N:8>>
<<$t, TreeId/binary, $s,Segment:32,Key/binary>>

This would shave 33 bytes off each such entry. For a system with 1B keys, N=3, that is an over all reduction of the storage load with 100GB. That's a lot of data in need of being read to do a rehash.

Limitations:

  • can only have 2^32 segments
  • can only have 2^24 partitions
  • max N-value is 255.

But I could live with that.

Hashing

This is a minor optimization compared to the above.

When hashing segments, we load up the full list of {<<Key>>,<<Hash>>}, then run term_to_binary on it, and then pass it to crypto:sha. I imagine, we could save some by utilizing knowledge of the erlang term format.

The problem as I see it, is that the term_to_binary call could result in a very large binary; and a need to copy a whole lot of data.

The associalted sha_test.erl proves me wrong, btw.

-module(sha_test).
-export([test/0]).
hash_segment_keylist(List) ->
Length = length(List),
Init = <<131,108,Length:32>>,
Ctx0 = crypto:hash_init(sha),
Ctx = lists:foldl(fun({Key,Hash},Ctx1) ->
Len1 = byte_size(Key),
Len2 = byte_size(Hash),
IOList = [ <<104,2,109,Len1:32>>, Key,
<<109,Len2:32>>, Hash ],
crypto:hash_update(Ctx1, IOList)
end,
crypto:hash_update(Ctx0, Init),
List),
crypto:hash_final(crypto:hash_update(Ctx, <<106>>)).
hash_keylist2(List) ->
crypto:hash(sha, term_to_binary(List)).
test_list(N) ->
[ { << 0:480 >>, <<0:160>> } || _ <- lists:seq(1,N) ].
test() ->
TL = test_list(10000),
io:format("normal : ~p~n", [ timer:tc( fun hash_keylist2/1, [TL]) ]),
io:format("special: ~p~n", [ timer:tc( fun hash_segment_keylist/1, [TL]) ]),
io:format("normal : ~p~n", [ timer:tc( fun hash_keylist2/1, [TL]) ]),
io:format("special: ~p~n", [ timer:tc( fun hash_segment_keylist/1, [TL]) ]),
ok.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment