@sdebnath
Add field to Riak YZ Schema with CRDTs
This gist captures what needs to be done to add a new field to Riak's Yokozuna
search index.
Sources:
- https://github.com/basho/yokozuna/issues/130
- http://riak-users.197444.n3.nabble.com/How-to-update-existed-schema-td4032143.html
The code below is for illustration purposes only. Use at your own risk.
1. Create/Update new schema file
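For illustration only: if the new field is the set some_field embedded in the map (as in the re-index loop further down), Yokozuna flattens it to some_field_set, so the schema would need an entry roughly like the line below. The type and attributes are assumptions based on the stock Yokozuna string field type; adjust them to your data and queries.

<field name="some_field_set" type="string" indexed="true" stored="true" multiValued="true"/>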
2. Upload the schema to a node in the cluster
cat schema/my_bucket.xml | curl -XPUT http://127.0.0.1:49001/search/schema/my_bucket -H 'Content-Type:application/xml' --data-binary @-
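To confirm the upload, you can fetch the schema back with the GET form of the same endpoint (illustrative, same node/port as above):

curl http://127.0.0.1:49001/search/schema/my_bucket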
3. Reload YZ index on each node
a. individual rpc calls, one per node:
rpc:block_call('riak@172.17.0.1', yz_index, reload, [<<"my_bucket">>]).
rpc:block_call('riak@172.17.0.2', yz_index, reload, [<<"my_bucket">>]).
rpc:block_call('riak@172.17.0.3', yz_index, reload, [<<"my_bucket">>]).
b. via multicall
rpc:multicall(['riak@172.17.0.1','riak@172.17.0.2','riak@172.17.0.3'], yz_index, reload, [<<"my_bucket">>]).
If all is well you should get {ok, Nodes}, where Nodes is the list of nodes
in your Riak cluster. If something goes wrong you'll get {error, Errors},
where Errors lists the error returned by each node that failed.
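For the three-node example above, a successful reload should return something like:

{ok,['riak@172.17.0.1','riak@172.17.0.2','riak@172.17.0.3']}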
At this point any newly inserted data is searchable. To get old data re-indexed
with the new field definition, we need to read and re-write every key in the bucket:
18> {ok, Keys} = riakc_pb_socket:list_keys(Pid, {<<"my_bucket">>,<<"my_bucket">>}).
19> lists:foreach(fun(E) ->
        %% fetch the map, add then remove a sentinel value in the set so the map
        %% carries a pending op, and write it back to force a re-index of the key
        {ok, M1} = riakc_pb_socket:fetch_type(Pid, {<<"my_bucket">>, <<"my_bucket">>}, E),
        M2 = riakc_map:update({<<"some_field">>, set},
                              fun(S) -> riakc_set:del_element(<<"1">>, riakc_set:add_element(<<"1">>, S)) end, M1),
        riakc_pb_socket:update_type(Pid, {<<"my_bucket">>, <<"my_bucket">>}, E, riakc_map:to_op(M2))
    end, Keys).
WARNING: the code above can wreak havoc on your cluster, especially if you have
gazillions of keys. Think carefully. Unfortunately, this is the only way to achieve
what we need to do as of 06/26/2015.
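Once the keys have been rewritten, one quick sanity check is to query the new field over Riak's HTTP search API. Illustrative only, reusing the node/port from step 2 and the flattened field name some_field_set assumed above; substitute your own field and query:

curl "http://127.0.0.1:49001/search/query/my_bucket?wt=json&q=some_field_set:*"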

danqing commented Aug 18, 2015

Hey @sdebnath thanks for the gist! I have two questions:

  1. Where should rpc:block_call('riak@172.17.0.1', yz_index, reload, [<<"my_bucket">>]). be run? In the shell of the riak machine? What's rpc:block_call?
  2. I'd like to better understand what is meant by "get old data re-indexed with new field definition". First, iiuc, we should not remove fields from the schema, right? And if we add fields, old data should still be queryable as before; the fields that are new in the schema just won't work for old objects until they are re-PUT. Is that correct?

Thanks again :)

@AliakseiMat

These rpc commands should be run in the Erlang console. You can get that console with riak console, or with riak attach if your node is already running.
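For example (illustrative; your node name and shell prompt will differ):

$ riak attach
(riak@172.17.0.1)1> rpc:multicall(['riak@172.17.0.1','riak@172.17.0.2','riak@172.17.0.3'], yz_index, reload, [<<"my_bucket">>]).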
