@PharkMillups
Created November 29, 2010 22:40
14:11 <quentusrex> Anyone know if there is a simple way to pull
documents (about 3 million documents) from a couchdb server into a
new riak cluster?
14:12 <benblack> the only way of which i am aware is reading the
documents and writing them to riak.
14:14 <quentusrex> thanks benblack I was afraid of that.
14:14 <benblack> why?
14:14 <benblack> seems like a pretty short shell script
14:14 <benblack> and 3 million is a pretty small number of documents
14:14 <quentusrex> yes, short. but time consuming based on the number of
documents
14:14 <quentusrex> oops,
14:14 <quentusrex> billion,
14:15 <benblack> 3 billion is rather more, yes
14:15 <quentusrex> not million
14:15 <benblack> surprised you have that in couchdb
14:15 <quentusrex> bigcouch
14:15 <benblack> that makes much more sense
14:16 <quentusrex> We fully moved the data from a sharded postgresql cluster,
to a bigcouch cluster
14:16 <quentusrex> But after disappointments, and some data loss, we are trying to test riak
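The "short shell script" approach benblack mentions can be sketched in a few lines. The sketch below pages through CouchDB's `_all_docs` view and PUTs each document into Riak over its old `/riak/<bucket>/<key>` HTTP interface; the host names, database, bucket, and batch size are placeholders, not anything from the log:

```python
import json
from urllib.parse import quote

def rows_to_objects(all_docs_page):
    """Turn a CouchDB _all_docs?include_docs=true page into (key, doc)
    pairs, dropping CouchDB-internal fields (_id, _rev) that Riak
    has no use for."""
    out = []
    for row in all_docs_page["rows"]:
        doc = {k: v for k, v in row["doc"].items() if not k.startswith("_")}
        out.append((row["id"], doc))
    return out

def transfer(couch_url, riak_url, db, bucket, batch=1000):
    """Copy every document from one CouchDB database into one Riak
    bucket. Requires live servers; shown for shape only."""
    from urllib.request import urlopen, Request
    startkey = None
    while True:
        q = f"{couch_url}/{db}/_all_docs?include_docs=true&limit={batch}"
        if startkey is not None:
            # resume after the last id seen; skip=1 avoids re-copying it
            q += f"&startkey={quote(json.dumps(startkey))}&skip=1"
        page = json.load(urlopen(q))
        if not page["rows"]:
            break
        for key, doc in rows_to_objects(page):
            req = Request(f"{riak_url}/riak/{bucket}/{key}",
                          data=json.dumps(doc).encode(),
                          headers={"Content-Type": "application/json"},
                          method="PUT")
            urlopen(req).close()
        startkey = page["rows"][-1]["id"]
```

At 3 billion documents the loop itself is trivial; the wall-clock time is all network and disk, which is what the rest of the conversation is about.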
14:21 <timanglade> benblack: Andy & I had talked about writing some sort
of binary format translator for use-cases like the one outlined by quentusrex.
Any obvious reasons why that wouldn't be possible?
14:22 <benblack> you mean converting the couch btrees to bitcask files?
14:23 <timanglade> obviously, you'd have to unmarshall & re-marshall the documents
14:23 <timanglade> but yeah
14:23 <benblack> no reason you couldn't do that in erlang
14:23 <timanglade> any other projects that could work with?
14:23 <benblack> the main problem being distribution around the ring
14:23 <timanglade> hm right this is where my limited knowledge of Riak stops.
But couldn't we initially load it all on one node and let the ring re-distribute it all?
14:24 <benblack> at EF SF i suggested having a relational db converter and
replication slave
14:24 <benblack> no
14:24 <quentusrex> I would be fine with the ability to tell each node in the
riak cluster to import documents from X to Y range
14:24 <benblack> you'd have to have a single node ring, load the data,
then add new nodes
14:24 <timanglade> benblack: yes
14:24 <timanglade> that was my idea
14:24 <benblack> quentusrex: doesn't work that way
14:24 <timanglade> quentusrex you'd be starting from scratch, right?
14:25 <benblack> quentusrex: placement in a given partition is based on
the sha1 hash, not the key
14:25 <quentusrex> At this point in time, just having the data into
riak safely would make my weekend.
14:25 <timanglade> still wondering if a binary format translation +
redistribution would be faster than a script export/import
14:26 <timanglade> depends how “parallel” the Riak re-distribution can
be, I guess
14:26 <quentusrex> are you talking about the binary files? or binary
interface to couchdb?
14:26 <timanglade> binary files
14:26 <quentusrex> aah, in that case I could just install a
riak node on the couchdb cluster,
14:27 <quentusrex> but my question would be how riak would handle one
node(the one installed on the couchdb cluster) constantly filling with data.
14:27 <quentusrex> At what point would the node start pushing the data off
of the local node to the other nodes in the cluster.
14:27 <timanglade> not well I'd think. As benblack said, you'd need to start with a single-node
riak. Then add more Riak nodes once the import is done.
14:28 <timanglade> again, not a Riak expert so I'll let somebody else confirm this
14:28 <quentusrex> that would not work well in my case.
No one node could handle all of the data.
14:28 <timanglade> but may I suggest giving Cloudant a try? ^^
14:28 <quentusrex> I am using Cloudant's bigcouch right now
14:28 <benblack> "aah, in that case I could just install a riak node on the couchdb cluster," again, no
14:29 <benblack> you cannot have mixed bigcouch/riak clusters
14:29 <benblack> converting as tim is proposing would be just the files
14:29 <acts_as> the one downside to using riak as my web crawler, instead
of hadoop, is that with no master -- I can't call it my crawl plantation.
14:29 <benblack> and would require bootstrapping a cluster in a specific way
14:29 <benblack> acts_as: anarcho-syndicalist commune
14:29 <timanglade> quentusrex: did you share your issues with the Cloudant
guys already? I'm sure they’d love to know what your problems are, even
if you've already set your mind on moving to Riak.
14:30 <acts_as> that's getting there
14:30 <quentusrex> I am not the one who is working on the bigcouch issues.
I'm looking for a replacement. But I will make sure the issues get pushed
to the developers.
14:30 <timanglade> benblack: I think he was just trying to have the binary
file translation done locally on a single server, not do a mixed cluster
14:31 <timanglade> quentusrex: hit me up on tim@cloudant.com or anybody they’re used to dealing with, I'll make sure the appropriate people hear about it
14:31 <quentusrex> will do
14:31 <benblack> quentusrex: were you thinking above that you could
have a riak node participate in a couch cluster?
14:32 <quentusrex> no,
14:32 <quentusrex> Just to remove the network latency for the file translation.
14:32 <benblack> so, not in the cluster
14:32 <quentusrex> Not in the cluster, but on one of the nodes in the cluster.
14:32 <timanglade> quentusrex: so your data does not fit on a single
machine because of what… Storage limitations?
14:33 <benblack> given the data is not on that one node i doubt that buys you much
14:33 <timanglade> ie, disk space?
14:33 <quentusrex> timanglade, correct. It is possible to build a
server with drive space to store it,
14:33 <quentusrex> but our current hardware is limited to 750GB per server.
14:34 <benblack> how large are the documents?
14:34 <timanglade> 3B HTTP lookups is a lot to start with but if you also
add the constraint of moving petabytes of data…
14:34 <benblack> each
14:35 <quentusrex> I would estimate about half are roughly 25kb,
a quarter are between 25kb and 250kb, and all but the largest 1% are
between 250kb and 1MB.
14:35 <quentusrex> With the largest at about 20MB.
14:35 <timanglade> that's not bad for an HTTP transfer
14:36 <benblack> how much total data?
14:36 <timanglade> benblack: how would riak react to all nodes but
one going down temporarily. And him adding more local binary files on
the one node that’s still up. Then reconnecting the other nodes?
14:36 <benblack> i would expect badly
14:36 <quentusrex> I can go find out, but those are the numbers I was given.
14:37 <benblack> instead, this would best be handled by dividing up
the key range on the couch side and having a bunch of writers go in parallel
14:37 <benblack> from couch to riak
14:37 <quentusrex> benblack, that was my hope. To split the range,
and then have specific riak nodes pull their range into the riak cluster.
14:38 <timanglade> … if the network arch is not the bottleneck, yeah.
14:38 <benblack> whether it is push or pull
14:38 <benblack> that division of range is up to you
14:38 <benblack> and you could not do it in such a way as to make a riak
node only pull the documents for which it would be responsible
14:38 <quentusrex> timanglade, the plan was to run gigE from the
riak+couchdb nodes to the riak cluster. Since each server has dual nic.
14:39 <benblack> even if you max it out, that is a lot of data to shove through a gig link
14:39 <timanglade> wait, why would you still have shared nodes?
14:39 <quentusrex> At the moment I'm not aware if the bigcouch nodes are
sharded or not.
14:40 <timanglade> no, shared riak+couchdb
14:40 <quentusrex> We have physically separate racks for production and sandbox
14:40 <timanglade> although I guess localhost transfers could be faster?
14:40 <benblack> shared and sharded are not the same ;)
14:40 <quentusrex> and the couchdb cluster is in production atm.
14:41 <benblack> split your key range yourself
14:41 <benblack> run transfer processes in parallel
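benblack's "split your key range yourself, run transfer processes in parallel" can be sketched as carving the document-id space into prefix-delimited sub-ranges, one per worker. Hex prefixes are an assumption (a natural fit if the doc ids are UUID-like); any known id distribution works:

```python
def hex_ranges(width=1):
    """(startkey, endkey) pairs covering ids beginning with each hex
    prefix. Widening the prefix multiplies the worker count by 16."""
    prefixes = [format(i, f"0{width}x") for i in range(16 ** width)]
    # CouchDB convention: endkey = prefix + a high sentinel bounds the range
    return [(p, p + "\ufff0") for p in prefixes]

for start, end in hex_ranges():
    # each pair becomes one worker's query, e.g.:
    #   _all_docs?startkey="<start>"&endkey="<end>"&include_docs=true
    pass

print(len(hex_ranges()))         # 16 ranges -> 16 parallel workers
print(len(hex_ranges(width=2)))  # 256 ranges
```

As benblack notes just after, this division is purely on the CouchDB side; it cannot be made to align with which Riak node will ultimately own each document.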
14:41 <quentusrex> from couchdb to riak would be localhost(both on the same server),
and from riak to riak would be gigE.
14:42 <benblack> the couchdb to riak being localhost helps you little, imo.
14:42 <benblack> because, again, can you guarantee the key range is entirely
on the local machine?
14:42 <timanglade> benblack: don't all network implementations bypass the
physical network interface on localhost?
14:42 <quentusrex> would riak be able to tell if it is saturating the
riak to riak link?
14:42 <benblack> because, again, can you guarantee the key range
is entirely on the local machine?
14:42 <benblack> no
14:43 <quentusrex> benblack, at the moment no.
14:43 <benblack> right
14:43 <benblack> so the advantage of localhost transfer is pretty small, imo
14:43 <timanglade> maybe I'm out in left field but wouldn't dedicated network links help?
14:44 <timanglade> I'm sure internal couchdb and riak cluster communications are bad enough for your network to start with
14:44 <timanglade> but running two in parallel on the same network…
14:44 <quentusrex> timanglade, yes. that would help having a dedicated
network path from the couchdb network to the riak network.
14:45 <timanglade> but maybe it's negligible. Not sure what the network
usage patterns & resiliences are
14:45 <timanglade> compared to a load-balanced double gigE connection
that you just shove everything on in parallel
14:48 <timanglade> quentusrex: let me check with Cloudant to see if
(a) you could somehow force the BigCouch partitioning to match
the expected Riak partitioning. Then maybe you could either (b)
translate the binary files locally on each machine. Or at least (c)
only do localhost transfers on each node separately. benblack, would
that work, assuming condition (a) is met?
14:48 <benblack> extraordinarily unlikely
14:49 <benblack> but have at it :)
14:49 <timanglade> you mean (a) is unlikely?
14:50 <benblack> yes
14:51 <benblack> if it could be done, it would require configuring
riak in a specific way that may or may not be what is desired and
would be really hard to change (ring_creation_size)
14:51 <timanglade> I concur but I don't know for sure that it's
impossible so I'm checking right now with the core team. I'm
just the marketing guy after all…
14:52 <timanglade> mostly, I'm just angry that NOSQL interop is as
bad as it is. I’m sure we could all work towards making it a bit easier…
14:53 davidc_ joined
14:53 <timanglade> quentusrex: at this point, if time is really
your issue, your best bet would be to start that export/import
script right now and hope it completes by Sunday, imo. If you
figure out an optimal network conf, it might work out in time.
14:54 <quentusrex> timanglade, it's been running since yesterday.
14:54 <quentusrex> I was looking for a faster way to finish off the rest.
14:54 <timanglade> quentusrex: ah. What's the progress rate?
14:54 <quentusrex> We forgot to put a progress indicator in the script.
14:55 <timanglade> can't you figure it out roughly with the disk usage?
14:55 <quentusrex> but I think we'll finish sometime soon.
14:55 <quentusrex> within the next day or so
14:55 <timanglade> so what, about 72 hours for how much? 1PB?
14:55 <quentusrex> It wasn't my script, and I'm not on that screen.
14:56 <timanglade> ok, I was just interested to know how practical
those migrations really are
14:56 <quentusrex> Not that much data I don't think
14:56 <benblack> are you running in parallel?
14:56 <quentusrex> at least not when compressed
14:56 <quentusrex> benblack, 5 parallel instances of the script
14:56 <benblack> are you saturating the network?
14:56 <quentusrex> not the cross link
14:57 <benblack> then run more
14:57 <timanglade> unless he’s overloading the CPU of the CouchDB instances ;)
14:57 <benblack> MOAR PARALLELZ
14:59 <timanglade> benblack: you got a fever?
14:59 <benblack> and the only prescription
14:59 <benblack> is more bandwidth
14:59 <timanglade> is moar parallelz
14:59 <timanglade> ;)
15:00 <timanglade> quentusrex: at any rate, make sure to let me
know how that migration worked out. And if you can have your other
guys point out the issues you had, we want to know.
15:01 <timanglade> benblack: NOSQL has an answer to SQL’s “buy a bigger box”. It's “buy more bandwidth”.
15:02 <timanglade> cheaper in theory but I'm not sure how practical it is…
15:04 <benblack> sql requires a ton of bandwidth and low latency to do this, too
15:04 <benblack> see clustrix
15:04 <timanglade> true. It's just that I see internal bandwidth
requirements as a constant improvement factor in NOSQL systems,
on a bunch of different use-cases.
15:05 <timanglade> which makes total sense considering the design
15:05 <quentusrex> timanglade, benblack I misspoke earlier. Each
server has 7TB of drive space: 8 1TB drives in RAID 5.
15:05 <timanglade> it's just another part of standard datacenter
infrastructure that could evolve with NOSQL rising.
15:06 <benblack> raid5? yikes
15:06 <benblack> and that is a _lot_ of data per node
15:06 <benblack> can i suggest your hardware is optimized for
something other than a dynamo system?
15:07 <quentusrex> Rough back of the napkin data calculations:
15:07 <quentusrex> 1.5 billion documents at 25kb = 35.7 TB
15:07 <quentusrex> 750 million documents at 100kb = 71.5 TB
15:07 <quentusrex> 720 million documents at 600kb = 412 TB
15:07 <quentusrex> 30 million documents at 20MB = 586 TB
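Redoing the napkin math in decimal units (1 TB = 10^12 bytes) gives per-tier figures slightly different from those in the log, which presumably used binary units, but the same shape and the same rough total of a petabyte that came up earlier:

```python
# Back-of-the-napkin totals in decimal units; counts and sizes are
# taken from the figures quentusrex gives in the log.
KB, MB, TB = 10**3, 10**6, 10**12

tiers = [
    (1_500_000_000, 25 * KB),   # ~half the documents at ~25kb
    (750_000_000, 100 * KB),    # a quarter between 25kb and 250kb
    (720_000_000, 600 * KB),    # most of the rest, 250kb-1MB
    (30_000_000, 20 * MB),      # the largest ~1% at up to 20MB
]

sizes_tb = [count * size / TB for count, size in tiers]
print(sizes_tb)       # [37.5, 75.0, 432.0, 600.0]
print(sum(sizes_tb))  # 1144.5 TB, i.e. roughly 1.1 PB
```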
15:08 <benblack> your hardware selection is not a great choice, imo
15:09 <quentusrex> Agreed.
15:10 <benblack> if you want fat nodes, you should have 10G ethernet
15:10 <benblack> on 1G, you need skinny nodes
15:12 <quentusrex> Know of any good 10G switches?
15:13 <benblack> depends on your budget
15:27 <timanglade> quentusrex, benblack: I was told that it would be
technically possible to hot-remap a BigCouch cluster according to an SHA-1 map
15:27 <timanglade> involves going deep in the code though
15:28 <timanglade> but could be applied to an existing cluster. Hijinks
would probably ensue…
15:32 <quentusrex> The effort would probably be better spent working on
import/export for riak
15:33 <timanglade> yes
15:33 <benblack> timanglade: and would have to inspect the riak ring and
make it match and would require the same partition count in riak
15:34 <timanglade> at that point, I was just interested to know if it was possible
15:34 <timanglade> doesn't just having to calculate 3B object SHA-1s
and map them make this approach not worth it?
15:36 <timanglade> you'd have to write so much code to distribute
that part of the job… Then plug that into BigCouch somehow. Change the
code. Wait for BigCouch to rebalance. Write a binary translator in
Erlang / transfer over localhost. Configure Riak.
15:36 <timanglade> All the while praying to Saint Jude
15:41 <quentusrex> is there a riak ui similar to futon?
15:45 <benblack> not to my knowledge
16:35 <quentusrex> I'm getting a 400 bad request from the
following curl request: curl -v -X PUT -d '{"bar":"baz"}' -H "Content-Type: application/json" http://riak1:8098/riak/test
16:37 <quentusrex> trying to add an object to the 'test' bucket, and have
riak generate the key
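The 400 here is most likely the method, not the payload: in the Riak HTTP API of that era, PUT requires the full `/riak/<bucket>/<key>` path, while a server-generated key comes from POST-ing to the bucket URL instead. A small sketch of the distinction (the host and bucket names echo the log; the helper itself is hypothetical):

```python
def riak_store_request(base, bucket, key=None):
    """Return (method, url) for storing an object in Riak's HTTP API.

    With no key, Riak generates one: POST to the bucket URL and read
    the new key back from the 201 response's Location header."""
    if key is None:
        return ("POST", f"{base}/riak/{bucket}")
    return ("PUT", f"{base}/riak/{bucket}/{key}")

print(riak_store_request("http://riak1:8098", "test"))
# ('POST', 'http://riak1:8098/riak/test')
print(riak_store_request("http://riak1:8098", "test", "k1"))
# ('PUT', 'http://riak1:8098/riak/test/k1')
```

So the original curl line should work with `-X POST` in place of `-X PUT`.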
16:42 <quentusrex> timanglade, it sounds like the biggest issue
with bigcouch was the lack of 'per bucket NRW properties'
16:43 <timanglade> quentusrex: what about the data loss you referred to?
16:44 <quentusrex> timanglade, we lost a couple servers, and the data was lost with them.
16:44 <quentusrex> bigcouch did not lose the data,
16:44 <timanglade> oh, it's your NRW that was not appropriately set then
16:44 <quentusrex> right.
16:45 <quentusrex> We have a ton of data that can be recalculated,
and a 'decent amount' that must be kept no matter how bad the cluster's health.
16:46 <timanglade> hm
16:46 <quentusrex> not the only reason, but a big one.
16:46 <timanglade> you can set nrw for each database though
16:46 <timanglade> which would be the BigCouch equivalent of a “bucket”