@aphyr
Created November 22, 2011 00:08
10:11 <seancribbs> aphyr_: the internal dilemma - to encode or not to encode
10:11 <aphyr_> Seems like Riak should be content-agnostic for keys
10:12 <aphyr_> The encoding should be protocol-dependent and reversed once past the HTTP/protobufs interface
10:12 <aphyr_> I'm running out of special characters to use as separators, haha
10:13 justinsheehy joined
10:13 amerine joined
10:14 <aphyr_> Dashes appear in datetimes...
10:14 <seancribbs> aphyr: yeah I'm on the fence about that.
10:15 <seancribbs> in one sense, you want keys sent via pbc to be available on http
10:15 <seancribbs> on the other hand, http severely restricts what "valid" keys are
10:15 <aphyr_> Naw, keys sent over HTTP should be url-encoded
10:16 <aphyr_> Problem solved!
10:16 <aphyr_> Real key: "1,2". Protobufs: "1,2". HTTP: "1%2C2"
10:16 <seancribbs> O.o
10:16 russelldb joined
10:17 <aphyr_> No seriously, putting arbitrary strings into HTTP URIs is a solved problem.
10:17 <seancribbs> yes, the question is really about how much Riak should be concerned with that
10:17 <seancribbs> or whether to make client libs solve it
10:17 <seancribbs> there are convincing arguments both ways
10:18 <aphyr_> Riak presents an HTTP interface. It should un-url-encode strings in URI fragments and treat them as binary internally.
10:18 <aphyr_> Clients are responsible for encoding data for HTTP as necessary.
10:19 zerosanity joined
10:19 mattrepl joined
10:19 <aphyr_> I dunno, has any other HTTP API ever done something different?
10:22 <aphyr_> Hell, I presume riak is already doing this in other places in its HTTP API. Inline mapreduce, for example, is transmitted over the wire URI-encoded.
10:23 <aphyr_> Sorry, just freaking out because this, erm, behavior caused some major data corruption last night.
10:23 <strmpnk> aphyr_: agreed. not decoding creates some incompatible cases like making it easy to create keys that can't be accessed over HTTP.
10:23 <aphyr_> I'm happy to submit a patch if you guys will consider it.
10:28 <seancribbs> aphyr_: yes, we just need a clear story of how the problem occurred and why the fix is appropriate (and not too far-reaching)
10:29 <aphyr_> Sure. I used commas as separators for my keys because _ and - are used in some of the key components already.
10:30 Kenstigator joined
10:30 <aphyr_> Ripple was perfectly happy to write and read these keys as "1,2", but internally stored values like "1%2C2"
10:31 <aphyr_> My erlang and JS mapreduce jobs, meanwhile, were producing results like "1%2C2", which were then used as input to riak-client to fetch/store items.
10:31 <seancribbs> it might also be a matter of using URI instead of CGI
10:31 <seancribbs> that debate I'm also on the fence about
10:31 sfalcon joined
10:32 <aphyr_> As you can imagine, ripple happily re-uri-encoded "1%2C2" to "1%252C2"
10:32 <seancribbs> ah, right. you don't want to double-encode
10:32 <aphyr_> And shit proceeded to hit the fan
10:32 <aphyr_> There is no conceivable case I can envision where a user would want their input on the HTTP wire to be treated as literally encoded.
10:33 <aphyr_> Just adding url:decode around the key would a.) make every key accessible and b.) prevent confusion over key names.
10:34 <aphyr_> Also, I think this URI-encoding strategy might break links.
10:34 <aphyr_> There's also the fact that every single HTTP API I have ever encountered unencodes its input. :)
10:34 moonpolysoft joined
10:35 <aphyr_> This *would* cause backwards incompatibility for users who are currently using un-url-safe strings over the HTTP interface.
10:36 siculars joined
10:36 <aphyr_> But really, if you're using un-url-safe strings as keys over HTTP right now, you probably need to reconsider anyway. :)
10:37 <aphyr_> Probably best to make the switch sooner rather than later, to minimize disruption.
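
Note: the double-encoding failure described in the log can be reproduced in a few lines. This is a minimal sketch using Python's standard urllib.parse rather than Riak, Ripple, or riak-client code. The helper names encode_key_for_http and decode_key_from_http are hypothetical and exist only for illustration. It assumes the key is percent-encoded exactly once by the client and decoded exactly once at the HTTP boundary, which is the behavior proposed above.

```python
from urllib.parse import quote, unquote


def encode_key_for_http(key: str) -> str:
    """Hypothetical client-side helper: percent-encode a key for a URL path."""
    # safe="" so that "/" and other reserved characters are encoded too.
    return quote(key, safe="")


def decode_key_from_http(segment: str) -> str:
    """Hypothetical server-side helper: undo the encoding exactly once."""
    return unquote(segment)


real_key = "1,2"

# Client encodes once for the wire; a comma becomes %2C.
on_wire = encode_key_for_http(real_key)

# If the server stores the wire form verbatim and a later job feeds that
# stored form back through the client, the "%" itself is encoded as %25.
double_encoded = encode_key_for_http(on_wire)

# With decoding at the HTTP boundary, the stored key is the real key,
# so a second trip through the client produces the same wire form.
stored_key = decode_key_from_http(on_wire)
round_trip = encode_key_for_http(stored_key)

print(on_wire)         # 1%2C2
print(double_encoded)  # 1%252C2
print(stored_key)      # 1,2
print(round_trip)      # 1%2C2
```

Decoding at the boundary means mapreduce results carry the real key ("1,2"), so passing them back to a client encodes them once instead of twice.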