@macintux
Last active December 16, 2015 10:09
Draft of a blog post (or possibly a series of posts) diving into the meat of several key Riak configuration parameters. Internal links do not work, but I haven't worried about that since I'm not sure how closely GFM maps to Basho's blogging platform.
@macintux
Author

To summarize what I think I know about notfound_ok vs basic_quorum:

If R=1, the number of notfound responses required to trigger a notfound to the client:

  • Default behavior: N
  • basic_quorum=true: quorum
  • notfound_ok=false: 1

@seancribbs

@macintux, your third point should be notfound_ok=true, since it counts notfound responses toward the request quorum instead of against it.

@macintux
Author

I suppose to generalize it...

The number of notfound responses required to trigger a notfound to the client:

  • Both false: N - (R - 1)
  • basic_quorum=true: quorum - (R - 1)
  • notfound_ok=true (default): R

Will try to verify that in the code.

@macintux
Author

Reasonably happy with it at this point, seeking more intensive engineering review.

@dmitrizagidulin

I second @coderoshi's comment: "Under DW, 'value be written to the backend' should probably be clarified by comparison: W does not wait for the backend to reply. Better to be a little too pedantic than not enough in this case."

In the "Readin' and Writin' (R and W)" section, it should emphasize that "successfully read or write" means X nodes have to acknowledge the request, but not necessarily write it to the backend (except for at least 1, since DW is set to a minimum of 1). That is more confusing, but necessary to emphasize.

@macintux
Author

Thanks Dmitri. I've updated the DW section since then, but will look at W.

@dmitrizagidulin

In general, I want to say - this is an amazing writeup, and I learned a ton from it. I'm excited that this is going out on the blog (and into the docs).

One other tweak request: the notfound tuning / basic_quorum section is a bit unclear. (It took me much re-reading, and I'm still not quite sure I get it.) Would it be possible to add even a one-sentence explanation of basic_quorum? And in the table illustrating the various behaviors of notfound and basic quorum, can you explicitly expand the leftmost column to list the various combinations? Something like:
| notfound_ok=true and basic_quorum=false (Default/standard behavior) |
| notfound_ok=false and basic_quorum=true |
| notfound_ok=false and basic_quorum=false |

And maybe add an explanation of which use cases / situations would warrant the other two non-default (slower) settings?

@macintux
Author

Thanks. I'll work on that. Part of my problem is that I only vaguely grasp the impact of basic_quorum, so conveying its meaning is probably beyond my ken.

@macintux
Author

I blew it: the section on notfound_ok is badly wrong. I'm not sure how to generalize the results, but here's what I found for the least optimal scenario.

First: there's a key parameter called FailThreshold, which is used to terminate requests early. If notfound_ok is set to false, it comes into play when dealing with missing keys.

The cells in this table give the failure threshold for each combination of basic_quorum and the R value of a request (the values correspond to N=3, the case analyzed below):

| basic_quorum | R=1 | R=2 | R=3 |
|--------------|-----|-----|-----|
| true         | 2   | 2   | 1   |
| false        | 3   | 2   | 1   |
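
The threshold values in that table are consistent with a simple rule, sketched here in Python rather than Erlang. This is an assumption inferred from the table for N=3, not a quotation from the Riak source, and `fail_threshold` is a hypothetical name: without basic_quorum the request is abandoned once R successes can no longer be reached (N - R + 1 failures); with basic_quorum it may also be abandoned once a simple majority of vnodes has failed.

```python
def fail_threshold(n: int, r: int, basic_quorum: bool) -> int:
    """Failures needed before the request is abandoned (assumed rule)."""
    unreachable = n - r + 1      # after this many failures, R successes are impossible
    if basic_quorum:
        quorum = n // 2 + 1      # simple majority of the N replicas
        return min(quorum, unreachable)
    return unreachable


if __name__ == "__main__":
    # Print one row per basic_quorum setting for N=3 and R=1..3.
    for bq in (True, False):
        print(bq, [fail_threshold(3, r, bq) for r in (1, 2, 3)])
```

Under this rule, basic_quorum only changes the outcome when the majority is smaller than N - R + 1, which for N=3 happens only at R=1.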

As Sean indicated, the key R value that basic_quorum is designed to work with is R=1. With N=3, as in the table above, it makes no difference for any other R value.

To carry this analysis forward, let's assume we're asking 3 vnodes for a key, and the first 2 to respond do not have a value for that key, but the 3rd does.

This next table indicates the success vs. failure tallies accumulated by the FSM after each vnode responds; notfound_ok comes into play here, because setting that to true means that it's "ok" to get a notfound response.

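| notfound_ok | after vnode 1 | after vnode 2 | after vnode 3 |
|-------------|---------------|---------------|---------------|
| true        | 1/0           | 2/0           | 3/0           |
| false       | 0/1           | 0/2           | 1/2           |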
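
The tallies can be reproduced with a few lines of Python. This is only an illustrative sketch: `tallies` and the `"ok"` / `"notfound"` strings are hypothetical names, not anything taken from the Riak FSM.

```python
def tallies(responses, notfound_ok):
    """Return the (successes, failures) tally after each vnode response.

    Each response is "ok" (the vnode has a value) or "notfound".
    """
    ok = fail = 0
    history = []
    for resp in responses:
        if resp == "ok" or (resp == "notfound" and notfound_ok):
            ok += 1      # found, or notfound treated as an acceptable answer
        else:
            fail += 1    # notfound counted against the request
        history.append((ok, fail))
    return history


if __name__ == "__main__":
    scenario = ["notfound", "notfound", "ok"]  # first two vnodes lack the key
    print(tallies(scenario, notfound_ok=True))
    print(tallies(scenario, notfound_ok=False))
```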

(So if we look at notfound_ok=false, after the 2nd vnode has replied with its notfound, the current tally is 0 successes and 2 failures. It's not until we get to the 3rd vnode's response with a value for the requested key that we finally have a 1 in the success tally.)

To decide when the client is informed, and whether it is informed of success or failure, we have to (I think) look at enough/1 in riak_kv_get_core.erl. Abbreviated a bit, its logic reads:

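if
    NumOk >= R ->
        true;                       % enough successful replies
    NumFail >= FailThreshold ->
        true;                       % enough failures to give up
    true ->
        false                       % otherwise keep waiting
end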

In other words, as soon as the success tally is >= R or the failure tally is >= FailThreshold, we stop waiting for responses and reply to the client.

When we merge the data above into a table that indicates the number of vnodes that must respond before the client is informed of the results, we get interesting results.

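If we want the value that's only present on the 3rd vnode, then, we have to wait for the response from all 3 vnodes, as highlighted in bold below.

| basic_quorum | notfound_ok | R=1   | R=2 | R=3   |
|--------------|-------------|-------|-----|-------|
| true         | true        | 1     | 2   | **3** |
| false        | true        | 1     | 2   | **3** |
| true         | false       | 2     | 2   | 1     |
| false        | false       | **3** | 2   | 1     |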
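
As a cross-check, the whole scenario can be simulated in Python. This is a sketch under stated assumptions, not the Riak implementation: the threshold rule is inferred from the failure threshold table for N=3, and `simulate`, `fail_threshold`, and `RESPONSES` are hypothetical names. It reports how many vnode responses are consumed before the request completes and whether the stored value had been seen by then.

```python
from itertools import product

RESPONSES = ["notfound", "notfound", "ok"]  # scenario from the text; N = 3


def fail_threshold(n, r, basic_quorum):
    # Assumed rule that reproduces the threshold table for N = 3.
    unreachable = n - r + 1
    return min(n // 2 + 1, unreachable) if basic_quorum else unreachable


def simulate(r, basic_quorum, notfound_ok, responses=RESPONSES):
    """Return (responses_waited_for, value_seen) for one get request."""
    threshold = fail_threshold(len(responses), r, basic_quorum)
    ok = fail = 0
    value_seen = False
    for waited, resp in enumerate(responses, start=1):
        if resp == "ok":
            ok += 1
            value_seen = True
        elif notfound_ok:
            ok += 1          # notfound counts toward R
        else:
            fail += 1        # notfound counts against the request
        if ok >= r or fail >= threshold:   # mirrors the enough/1 check
            return waited, value_seen
    return len(responses), value_seen


if __name__ == "__main__":
    for notfound_ok, basic_quorum in product((True, False), (True, False)):
        row = [simulate(r, basic_quorum, notfound_ok) for r in (1, 2, 3)]
        print(f"basic_quorum={basic_quorum} notfound_ok={notfound_ok}: {row}")
```

The first element of each pair matches the corresponding cell of the table, and the second is true only for the combinations that wait for all 3 vnodes.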

In other words, in this scenario, there are only 3 (really, 2) combinations of configuration values which will return the desired value to the client. (To be clear, all combinations will fix it via read repair after the request is completed.)

If we set R=3 and notfound_ok=true, we'll always get the value.

Alternatively, we can set R=1, basic_quorum=false, and notfound_ok=false.

Configuring the request to fail quickly rather than scan all vnodes unnecessarily is trivial: set notfound_ok=true, which is our default.

@macintux
Author

Replaced all of the notfound tuning content. Definitely a rough draft.

@macintux
Author

Extended it a bit to talk about why R=1 is interesting for basic_quorum as an illustration for the performance rationale.
