russelldb/Data Types.md Secret

# Data Types in Riak

Riak 1.4 added Counters. This was the first time data stored in Riak was not treated as an opaque blob. With Counters, Riak could not only detect conflicting writes but also resolve them, semantically, at the server.

Counters are OK, but you can't build a whole application on them. Riak 2.0 adds more Data Types.

Programmers are used to composing their application's state from primitive data types (like Booleans, Registers, Sets, Maps etc.). If you can model your application in terms of these data types, and you can accept their semantics, you will never see sibling values, or need to write a merge function, again.
## Top Level

### Counters

As in Riak 1.4.
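Riak 1.4 counters are, as I understand it, PN-Counters: each replica keeps its own tally of increments and decrements, and merging two copies takes the per-replica maximum, so concurrent updates converge without siblings. A minimal sketch of that idea, assuming a state-based design; `PNCounter` and its methods are hypothetical names, not Riak's API:

```python
from collections import defaultdict


class PNCounter:
    """State-based PN-Counter sketch: per-replica increment and decrement tallies."""

    def __init__(self):
        self.p = defaultdict(int)  # total increments, keyed by replica id
        self.n = defaultdict(int)  # total decrements, keyed by replica id

    def increment(self, replica, amount=1):
        # Negative amounts are recorded as decrements so both tallies only grow.
        if amount >= 0:
            self.p[replica] += amount
        else:
            self.n[replica] -= amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Per-replica max is commutative, associative and idempotent.
        for replica, count in other.p.items():
            self.p[replica] = max(self.p[replica], count)
        for replica, count in other.n.items():
            self.n[replica] = max(self.n[replica], count)


a, b = PNCounter(), PNCounter()
a.increment("node_a", 5)   # concurrent writes at two replicas
b.increment("node_b", -2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 3 3
```

Because each replica only ever grows its own entries, merging is safe in any order and any number of times.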
### Sets

Sets are collections of things. In Riak we expect you to store binaries, which is how we encode Strings of text in Erlang. The kind of thing you’d store in a Set might be the members of a team or department, followers on a social network, or maybe objects in some real-world collection.

You may send a list of operations. The list may contain both Add ElementX and Remove ElementY operations. If you are removing elements, we strongly recommend that you first fetch the Set and its context, and send the context with the remove operation(s). Why? See Remove Context below.

All operations are executed atomically at the coordinating replica. If any operation in the list fails (only removes can fail!), then none of the operations are applied.
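To make the all-or-nothing behaviour concrete, here is a sketch that applies a batch against a staged copy and only returns the new Set if every operation succeeds. The function name, the tuple encoding of operations and the `PreconditionFailed` exception are hypothetical; the sketch also assumes operations are applied in list order, which may not match how Riak orders a batch internally.

```python
class PreconditionFailed(Exception):
    """A remove named an element that is not present at this replica."""


def apply_set_batch(elements, ops):
    """Apply ("add", x) / ("remove", x) ops atomically.

    The ops run against a copy, so if any remove fails the caller's set
    is left exactly as it was.
    """
    staged = set(elements)
    for op, value in ops:
        if op == "add":
            staged.add(value)
        elif op == "remove":
            if value not in staged:
                raise PreconditionFailed(value)
            staged.remove(value)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return staged


team = {b"alice", b"bob"}
team = apply_set_batch(team, [("add", b"carol"), ("remove", b"bob")])
try:
    # The add of b"dave" is discarded because the remove of b"erin" fails.
    apply_set_batch(team, [("add", b"dave"), ("remove", b"erin")])
except PreconditionFailed as err:
    print("rejected, nothing applied:", err)
print(sorted(team))  # [b'alice', b'carol']
```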
### Map

A Map is a way to compose Data Types into a richer, more complex structure. A Map is a collection of fields. A field is a {name, Data Type} pair, so that we don’t have to deal with merging fields of different types: if two fields with the same name but different types are added to a Map, they’re two different fields. You may only store Data Types in a Map.

You may send a list of operations. These are either Field Operations or Field Update Operations. Field Operations add or remove fields from the Map. I find it helps to think of the Map as a schema for a (JSON-like) document: Field Operations alter the schema of the Map. Field Update Operations act on the data stored in the Map. You may send any number of operations batched together, and you may mix Field Operations and Field Update Operations.

We strongly recommend you send a context with any batch of operations that contains a Field or Set element Remove, no matter how deeply nested in the Map.

You do not need to explicitly create a field. Updating a field that is not present at the coordinating replica will create and update the field.
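A sketch of the field model just described: fields are keyed by `(name, type)`, so `likes` as a counter and `likes` as a set are two different fields, and updating a field that does not exist yet creates it from an empty value first. This is plain Python with no convergence logic; `MapSketch`, `update_field`, `remove_field` and the `EMPTY` table are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, Tuple

FieldKey = Tuple[str, str]  # (field name, data type name)

# Hypothetical empty value for a newly created field of each type.
EMPTY: Dict[str, Callable[[], Any]] = {
    "counter": lambda: 0,
    "set": frozenset,
    "register": lambda: b"",
    "flag": lambda: False,
    "map": lambda: MapSketch(),  # Maps may nest
}


class MapSketch:
    def __init__(self) -> None:
        self.fields: Dict[FieldKey, Any] = {}

    def update_field(self, name: str, dtype: str, fun: Callable[[Any], Any]) -> None:
        """Field Update Operation: creates the field if absent, then applies fun."""
        key = (name, dtype)
        if key not in self.fields:
            self.fields[key] = EMPTY[dtype]()
        self.fields[key] = fun(self.fields[key])

    def remove_field(self, name: str, dtype: str) -> None:
        """Field Operation: a remove fails if the field is not present."""
        key = (name, dtype)
        if key not in self.fields:
            raise KeyError(key)
        del self.fields[key]


profile = MapSketch()
profile.update_field("likes", "counter", lambda n: n + 1)
profile.update_field("likes", "set", lambda s: s | {b"bob"})  # a different field
print(sorted(profile.fields))  # [('likes', 'counter'), ('likes', 'set')]
```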
## Second Level

You can store any of the top level types in a field in a Map, including a Map. We’ve also added:

### Registers

A binary value. It might be an email address, or a first name.

### Flags

A Boolean.
## Bucket Types

In order to use the new Data Types, you must create a Bucket Type with the property `datatype` set to one of `counter`, `set` or `map`. The Bucket Type must also have the property `allow_mult=true`.

```shell
riak-admin bucket-type create mytype '{"props": {"datatype": "map", "allow_mult": true}}'
riak-admin bucket-type activate mytype
```

## Semantics

The Data Types are still Eventually Consistent. Counters are still not
Idempotent.
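Non-idempotence matters in practice when a client retries. If an increment reached Riak but the acknowledgement was lost (say, to a timeout), resending it counts twice, because nothing identifies the second request as a duplicate. A toy sketch of that situation; the `apply_increment` helper is hypothetical:

```python
def apply_increment(tallies, replica, amount):
    """Apply an increment operation at a replica.

    The operation carries no unique id, so a replica cannot tell a
    retry apart from a genuinely new increment.
    """
    tallies[replica] = tallies.get(replica, 0) + amount


tallies = {}
apply_increment(tallies, "node1", 1)  # original request: applied, but the reply is lost
apply_increment(tallies, "node1", 1)  # client times out and retries
print(sum(tallies.values()))  # 2, although the user "liked" once
```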
### Registers

Last Write Wins, using a timestamp from the node handling the write. All the caveats about clock synchronization therefore apply.
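Here is a sketch of why clock skew matters for a Last Write Wins Register: the merge simply keeps the value with the larger timestamp, so a node whose clock runs fast can make an older write beat a newer one. `LWWRegister` is a hypothetical name, and breaking timestamp ties by comparing values is an assumption added here so that every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: bytes
    timestamp: float  # wall-clock time at the node that handled the write

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Highest timestamp wins; equal timestamps fall back to comparing
        # values so the result does not depend on merge order.
        if (other.timestamp, other.value) > (self.timestamp, self.value):
            return other
        return self


# Node A's clock runs 5 seconds fast. It handles the first write at true
# time 100; node B handles a later write at true time 102.
first = LWWRegister(b"old@example.com", timestamp=105.0)
second = LWWRegister(b"new@example.com", timestamp=102.0)
print(first.merge(second).value)  # b'old@example.com': the later write is lost
```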
### Add Wins

The semantic we’ve chosen for the Set, Map and Flag is “Add Wins”. The literature also calls this “Observed Remove”, but that name describes an implementation detail: how the Add wins.
#### Set

When any pair of operations on a Set is concurrent, and one adds an element while the other removes it, the Add wins. If the Remove causally follows the Add, then the Remove is effective. Concurrent operations on different elements work as you’d expect.
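The textbook way to get Add Wins is an Observed-Remove Set: each add is tagged with a unique identifier, and a remove deletes only the tags the replica has observed. A concurrent add carries a tag the remove never saw, so it survives the merge. This sketch keeps tombstones for clarity; Riak's own Set, as I understand it, uses an optimized variant that does not keep them, and the class and method names here are hypothetical.

```python
import uuid


class ORSet:
    """Observed-Remove Set sketch (tombstone variant, for clarity)."""

    def __init__(self):
        self.adds = {}        # element -> set of unique add tags
        self.removed = set()  # tags deleted by an observed remove

    def add(self, element):
        # Each add gets a fresh tag, so it is distinguishable from earlier adds.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Delete only the tags this replica has observed for the element.
        self.removed |= self.adds.get(element, set())

    def value(self):
        return {e for e, tags in self.adds.items() if tags - self.removed}

    def merge(self, other):
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed


a, b = ORSet(), ORSet()
a.add(b"bob")
b.merge(a)              # both replicas have seen the add

a.remove(b"bob")        # concurrently: one replica removes...
b.add(b"bob")           # ...while another adds the same element
a.merge(b)
b.merge(a)
print(b"bob" in a.value(), b"bob" in b.value())  # True True: the Add wins

a.remove(b"bob")        # this Remove causally follows both Adds
b.merge(a)
print(b"bob" in b.value())  # False
```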
#### Map

The Map borrows its behaviour directly from the Set, except that every time you update the contents of a field (say, increment the counter in the “likes” field, or add a buddy to the “follows” field), that counts as “adding” the field. This way, when the removal of a field is concurrent with an update to that field, the update wins. Add wins again.
## Client API

For Riak 2.0pre5 we only have a Protocol Buffers (PB) interface. So far only the Riak-Erlang-Client has implemented it.

### Example

See this gist.
## Remove Context

Why the context? There are two reasons. The first is simple.

We don’t allow you to remove something from a Set / Map that is not there. Since there is no guarantee that the replica coordinating your remove operation(s) contains the value(s) you want to remove (imagine an empty fallback spun up to accept the request), the context “seeds” the handling replica with the values you’ve seen. If you don’t send the context, and the replica doesn’t have the value(s) you want to remove, the operation fails with a “precondition failure” error. A precondition of removing an element or Field is that it is present.
The second reason is more subtle. Without a context for a remove, you may remove more than you planned to. The “Add Wins” semantic is based on “Observed Remove”, which means you only remove what you have seen. The context tells the replica handling the operation what you’ve seen. If that replica has handled or seen an “Add” for the element that you had not seen when you issued your remove, and there is no context, the remove would win over that concurrent Add. There may be times you want this, but in general, use the context for removes.

The context is a compact binary encoding of the Set or Map. We hope to minimize it further in future releases.
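Both reasons can be seen in a small sketch. Here the context is modelled, purely as an assumption, as a copy of the add tags the client observed when it fetched the Set; Riak's real context is an opaque binary. `Replica`, `context()` and `PreconditionFailed` are hypothetical names.

```python
import uuid


class PreconditionFailed(Exception):
    """Removing an element (or field) requires that it is present."""


class Replica:
    def __init__(self):
        self.adds = {}        # element -> set of unique add tags
        self.removed = set()  # tags that have been removed

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def value(self):
        return {e for e, tags in self.adds.items() if tags - self.removed}

    def context(self):
        """What a client 'sees' when it fetches the Set (hypothetical encoding)."""
        return {e: tags - self.removed for e, tags in self.adds.items()}

    def remove(self, element, context=None):
        if context is not None:
            # Seed this replica with everything the client observed...
            for e, tags in context.items():
                self.adds.setdefault(e, set()).update(tags)
            # ...and remove only the tags the client observed.
            observed = context.get(element, set())
        else:
            # No context: remove whatever this replica happens to hold.
            observed = self.adds.get(element, set()) - self.removed
        if not observed:
            raise PreconditionFailed(element)
        self.removed |= observed


# Reason 1: an empty fallback can only honour the remove if given a context.
primary, fallback = Replica(), Replica()
primary.add(b"bob")
ctx = primary.context()
try:
    fallback.remove(b"bob")
except PreconditionFailed:
    print("precondition failure without context")
fallback.remove(b"bob", ctx)

# Reason 2: without a context, a remove also erases an Add you never saw.
with_ctx, without_ctx = Replica(), Replica()
for replica in (with_ctx, without_ctx):
    replica.add(b"bob")
ctx = with_ctx.context()   # the client reads the Set...
with_ctx.add(b"bob")       # ...then a concurrent Add reaches the replicas
without_ctx.add(b"bob")
with_ctx.remove(b"bob", ctx)
without_ctx.remove(b"bob")
print(with_ctx.value(), without_ctx.value())  # {b'bob'} set()
```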
## More Details?

See https://gist.github.com/russelldb/f92f44bdfb619e089a4d