Created November 29, 2010 22:40
14:11 <quentusrex> Anyone know if there is a simple way to pull documents (about 3 million documents) from a couchdb server into a new riak cluster?
14:12 <benblack> the only way of which i am aware is reading the documents and writing them to riak.
14:14 <quentusrex> thanks benblack I was afraid of that.
14:14 <benblack> why?
14:14 <benblack> seems like a pretty short shell script
14:14 <benblack> and 3 million is a pretty small number of documents
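The "short shell script" benblack has in mind could be sketched roughly like this: stream document IDs out of CouchDB's `_all_docs` view and PUT each body into Riak over HTTP. The `COUCH_URL`, `RIAK_URL`, and bucket name are illustrative assumptions, not details from the log, and a real script would use a JSON tool like jq rather than grep.

```shell
#!/bin/sh
# Hypothetical migration sketch, not the script the participants ran.
# Assumed endpoints; adjust for a real deployment.
COUCH_URL="${COUCH_URL:-http://localhost:5984/mydb}"
RIAK_URL="${RIAK_URL:-http://riak1:8098/riak/mydb}"

# Build the Riak object URL for a given document key.
riak_url() {
    printf '%s/%s' "$RIAK_URL" "$1"
}

migrate() {
    # _all_docs returns one JSON row per document; extracting ids with
    # grep/cut is a sketch -- use jq in practice.
    curl -s "$COUCH_URL/_all_docs" |
    grep -o '"id":"[^"]*"' | cut -d'"' -f4 |
    while read -r id; do
        # Fetch each document from CouchDB and write it into Riak.
        curl -s "$COUCH_URL/$id" |
        curl -s -X PUT -H "Content-Type: application/json" \
             -d @- "$(riak_url "$id")"
    done
}
```

Calling `migrate` then copies everything sequentially, which is exactly why the later discussion turns to splitting the key range and running writers in parallel.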
14:14 <quentusrex> yes, short. but time consuming based on the number of documents
14:14 <quentusrex> oops,
14:14 <quentusrex> billion,
14:15 <benblack> 3 billion is rather more, yes
14:15 <quentusrex> not million
14:15 <benblack> surprised you have that in couchdb
14:15 <quentusrex> bigcouch
14:15 <benblack> that makes much more sense
14:16 <quentusrex> We fully moved the data from a sharded postgresql cluster, to a bigcouch cluster
14:16 <quentusrex> But after disappointments, and some data loss, we are trying to test riak
14:21 <timanglade> benblack: Andy & I had talked about writing some sort of binary format translator for use-cases like the one outlined by quentusrex. Any obvious reasons why that wouldn't be possible?
14:22 <benblack> you mean converting the couch btrees to bitcask files?
14:23 <timanglade> obviously, you'd have to unmarshall & re-marshall the documents
14:23 <timanglade> but yeah
14:23 <benblack> no reason you couldn't do that in erlang
14:23 <timanglade> any other projects that could work with?
14:23 <benblack> the main problem being distribution around the ring
14:23 <timanglade> hm right this is where my limited knowledge of Riak stops. But couldn't we initially load it all on one node and let the ring re-distribute it all?
14:24 <benblack> at EF SF i suggested having a relational db converter and replication slave
14:24 <benblack> no
14:24 <quentusrex> I would be fine with the ability to tell each node in the riak cluster to import documents from X to Y range
14:24 <benblack> you'd have to have a single node ring, load the data, then add new nodes
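The bootstrap sequence benblack describes (start with a one-node ring, bulk load, then grow the cluster) would look roughly like this with the `riak` and `riak-admin` tooling of that era. The node names and ordering are illustrative assumptions, not a tested procedure:

```shell
# Sketch only: single-node bootstrap, bulk load, then grow the ring.
riak start                   # on riak1: this creates the initial one-node ring

# ... run the import against riak1 until all documents are loaded ...

riak start                   # on each additional node riak2..riakN
riak-admin join riak@riak1   # on each new node: join the existing ring
riak-admin ringready         # repeat until all nodes agree on the ring
```

After the joins, Riak's handoff redistributes partitions to the new nodes over the network, which is the slow part quentusrex is worried about.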
14:24 <timanglade> benblack: yes
14:24 <timanglade> that was my idea
14:24 <benblack> quentusrex: doesn't work that way
14:24 <timanglade> quentusrex you'd be starting from scratch, right?
14:25 <benblack> quentusrex: placement in a given partition is based on the sha1 hash, not the key
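Why benblack keeps rejecting range-based imports: a simplified illustration of hash-based placement. This is not Riak's exact scheme (Riak hashes the Erlang term `{Bucket, Key}` onto a 160-bit ring); here we just hash a `bucket/key` string and reduce it modulo an assumed ring size to show that contiguous key ranges scatter across partitions.

```shell
# Simplified consistent-hashing illustration, not Riak's real algorithm.
RING_SIZE=64    # assumed ring_creation_size

partition_for() {
    # Hash the name, take the first 8 hex digits, reduce modulo ring size.
    h=$(printf '%s' "$1" | sha1sum | cut -c1-8)
    echo $(( 0x$h % RING_SIZE ))
}
```

Hashing `doc0001` and `doc0002` lands them in unrelated partitions, so no key range you choose on the CouchDB side corresponds to one Riak node's data.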
14:25 <quentusrex> At this point in time, just having the data into riak safely would make my weekend.
14:25 <timanglade> still wondering if a binary format translation + redistribution would be faster than a script export/import
14:26 <timanglade> depends how “parallel” the Riak re-distribution can be, I guess
14:26 <quentusrex> are you talking about the binary files? or binary interface to couchdb?
14:26 <timanglade> binary files
14:26 <quentusrex> aah, in that case I could just install a riak node on the couchdb cluster,
14:27 <quentusrex> but my question would be how riak would handle one node (the one installed on the couchdb cluster) constantly filling with data.
14:27 <quentusrex> At what point would the node start pushing the data off of the local node to the other nodes in the cluster.
14:27 <timanglade> not well I'd think. As benblack said, you'd need to start with a single-node riak. Then add more Riak nodes once the import is done.
14:28 <timanglade> again, not a Riak expert so I'll let somebody else confirm this
14:28 <quentusrex> that would not work well in my case. No one node could handle all of the data.
14:28 <timanglade> but may I suggest giving Cloudant a try? ^^
14:28 <quentusrex> I am using Cloudant's bigcouch right now
14:28 <benblack> "aah, in that case I could just install a riak node on the couchdb cluster," again, no
14:29 <benblack> you cannot have mixed bigcouch/riak clusters
14:29 <benblack> converting as tim is proposing would be just the files
14:29 <acts_as> the one downside to using riak as my web crawler, instead of hadoop, is that with no master -- I can't call it my crawl plantation.
14:29 <benblack> and would require bootstrapping a cluster in a specific way
14:29 <benblack> acts_as: anarcho-syndicalist commune
14:29 <timanglade> quentusrex: did you share your issues with the Cloudant guys already? I'm sure they’d love to know what your problems are, even if you've already set your mind on moving to Riak.
14:30 <acts_as> that's getting there
14:30 <quentusrex> I am not the one who is working on the bigcouch issues. I'm looking for a replacement. But I will make sure the issues get pushed to the developers.
14:30 <timanglade> benblack: I think he was just trying to have the binary file translation done locally on a single server, not do a mixed cluster
14:31 <timanglade> quentusrex: hit me up on tim@cloudant.com or anybody they’re used to dealing with, I'll make sure the appropriate people hear about it
14:31 <quentusrex> will do
14:31 <benblack> quentusrex: were you thinking above that you could have a riak node participate in a couch cluster?
14:32 <quentusrex> no,
14:32 <quentusrex> Just to remove the network latency for the file translation.
14:32 <benblack> so, not in the cluster
14:32 <quentusrex> Not in the cluster, but on one of the nodes in the cluster.
14:32 <timanglade> quentusrex: so your data does not fit on a single machine because of what… Storage limitations?
14:33 <benblack> given the data is not on that one node i doubt that buys you much
14:33 <timanglade> ie, disk space?
14:33 <quentusrex> timanglade, correct. It is possible to build a server with drive space to store it,
14:33 <quentusrex> but our current hardware is limited to 750GB per server.
14:34 <benblack> how large are the documents?
14:34 <timanglade> 3B HTTP lookups is a lot to start with but if you also add the constraint of moving petabyte data…
14:34 <benblack> each
14:35 <quentusrex> I would estimate about half are roughly 25kb, a quarter are between 25kb and 250kb, and all but the largest 1% are between 250kb and 1MB.
14:35 <quentusrex> With the largest at about 20MB.
14:35 <timanglade> that's not bad for an HTTP transfer
14:36 <benblack> how much total data?
14:36 <timanglade> benblack: how would riak react to all nodes but one going down temporarily. And him adding more local binary files on the one node that’s still up. Then reconnecting the other nodes?
14:36 <benblack> i would expect badly
14:36 <quentusrex> I can go find out, but those are the numbers I was given.
14:37 <benblack> instead, this would best be handled by dividing up the key range on the couch side and having a bunch of writers go in parallel
14:37 <benblack> from couch to riak
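benblack's parallel approach could be sketched as follows: split the CouchDB key space into ranges and run one export/import worker per range in the background. The `split_ranges` helper divides an illustrative numeric keyspace; real document IDs would need real boundary keys, and `copy_range` is assumed to be the earlier curl loop with `startkey`/`endkey` parameters added.

```shell
#!/bin/sh
# Hypothetical sketch of range-parallel migration; names are assumptions.
WORKERS=8

# Emit WORKERS "start end" pairs covering a numeric 0..total keyspace.
split_ranges() {
    total=$1; n=$2
    step=$(( total / n ))
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$(( i * step )) $(( (i + 1) * step ))"
        i=$(( i + 1 ))
    done
}

# Each worker would page through its slice of _all_docs (using
# startkey/endkey) and write into Riak, all workers in parallel:
#
# split_ranges 3000000000 "$WORKERS" | while read -r start end; do
#     copy_range "$start" "$end" &
# done
# wait
```

This matches the advice later in the log: if the cross link is not saturated, raise `WORKERS` until it is.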
14:37 <quentusrex> benblack, that was my hope. To split the range, and then have specific riak nodes pull their range into the riak cluster.
14:38 <timanglade> … if the network arch is not the bottleneck, yeah.
14:38 <benblack> whether it is push or pull
14:38 <benblack> that division of range is up to you
14:38 <benblack> and you could not do it in such a way as to make a riak node only pull the documents for which it would be responsible
14:38 <quentusrex> timanglade, the plan was to run gigE from the riak+couchdb nodes to the riak cluster. Since each server has dual nic.
14:39 <benblack> even if you max it out, that is a lot of data to shove through a gig link
14:39 <timanglade> wait, why would you still have shared nodes?
14:39 <quentusrex> At the moment I'm not aware if the bigcouch nodes are sharded or not.
14:40 <timanglade> no, shared riak+couchdb
14:40 <quentusrex> We have physically separate racks for production and sandbox
14:40 <timanglade> although I guess localhost transfers could be faster?
14:40 <benblack> shared and sharded are not the same ;)
14:40 <quentusrex> and the couchdb cluster is in production atm.
14:41 <benblack> split your key range yourself
14:41 <benblack> run transfer processes in parallel
14:41 <quentusrex> from couchdb to riak would be localhost (both on the same server), and from riak to riak would be gigE.
14:42 <benblack> the couchdb to riak being localhost helps you little, imo.
14:42 <benblack> because, again, can you guarantee the key range is entirely on the local machine?
14:42 <timanglade> benblack: don't all network implementations bypass the physical network interface on localhost?
14:42 <quentusrex> would riak be able to tell if it is saturating the riak to riak link?
14:42 <benblack> because, again, can you guarantee the key range is entirely on the local machine?
14:42 <benblack> no
14:43 <quentusrex> benblack, at the moment no.
14:43 <benblack> right
14:43 <benblack> so the advantage of localhost transfer is pretty small, imo
14:43 <timanglade> maybe I'm left field but wouldn't dedicated network links help?
14:44 <timanglade> I'm sure internal couchdb and riak cluster communications are bad enough for your network to start with
14:44 <timanglade> but running two in parallel on the same network…
14:44 <quentusrex> timanglade, yes. that would help having a dedicated network path from the couchdb network to the riak network.
14:45 <timanglade> but maybe it's negligible. Not sure what the network usage patterns & resiliences are
14:45 <timanglade> compared to a load-balanced double gigE connection that you just shove everything on in parallel
14:48 <timanglade> quentusrex: let me check with Cloudant to see if (a) you could somehow force the BigCouch partitioning to match the expected Riak partitioning. Then maybe you could either (b) translate the binary files locally on each machine. Or at least (c) only do localhost transfers on each node separately. benblack, would that work, assuming condition (a) is met?
14:48 <benblack> extraordinarily unlikely
14:49 <benblack> but have at it :)
14:49 <timanglade> you mean (a) is unlikely?
14:50 <benblack> yes
14:51 <benblack> if it could be done, it would require configuring riak in a specific way that may or may not be what is desired and would be really hard to change (ring_creation_size)
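The `ring_creation_size` benblack refers to lives in Riak's `app.config` under the `riak_core` section. A minimal fragment, with the era's default of 64 partitions shown as an assumed value:

```erlang
%% app.config fragment (sketch): ring_creation_size fixes the number of
%% partitions when the cluster is first created and is effectively
%% immutable afterwards, which is why matching it to a BigCouch layout
%% would have to be decided up front.
{riak_core, [
    {ring_creation_size, 64}
]}.
```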
14:51 <timanglade> I concur but I don't know for sure that it's impossible so I'm checking right now with the core team. I'm just the marketing guy after all…
14:52 <timanglade> mostly, I'm just angry that NOSQL interop is as bad as it is. I’m sure we could all work towards making it a bit easier…
14:53 davidc_ joined
14:53 <timanglade> quentusrex: at this point, if time is really your issue, your best bet would be to start that export/import script right now and hope it completes by Sunday, imo. If you figure out an optimal network conf, it might work out in time.
14:54 <quentusrex> timanglade, it's been running since yesterday.
14:54 <quentusrex> I was looking for a faster way to finish off the rest.
14:54 <timanglade> quentusrex: ah. What's the progress rate?
14:54 <quentusrex> We forgot to put a progress indicator in the script.
14:55 <timanglade> can't you figure it out roughly with the disk usage?
14:55 <quentusrex> but I think we'll finish sometime soon.
14:55 <quentusrex> within the next day or so
14:55 <timanglade> so what, about 72 hours for how much? 1PB?
14:55 <quentusrex> It wasn't my script, and I'm not on that screen.
14:56 <timanglade> ok, I was just interested to know how practical those migrations really are
14:56 <quentusrex> Not that much data I don't think
14:56 <benblack> are you running in parallel?
14:56 <quentusrex> at least not when compressed
14:56 <quentusrex> benblack, 5 parallel instances of the script
14:56 <benblack> are you saturating the network?
14:56 <quentusrex> not the cross link
14:57 <benblack> then run more
14:57 <timanglade> unless he’s overloading the CPU of the CouchDB instances ;)
14:57 <benblack> MOAR PARALLELZ
14:59 <timanglade> benblack: you got a fever?
14:59 <benblack> and the only prescription
14:59 <benblack> is more bandwidth
14:59 <timanglade> is moar parallelz
14:59 <timanglade> ;)
15:00 <timanglade> quentusrex: at any rate, make sure to let me know how that migration worked out. And if you can have your other guys point out the issues you had, we want to know.
15:01 <timanglade> benblack: NOSQL has an answer to SQL’s “buy a bigger box”. It's “buy more bandwidth”.
15:02 <timanglade> cheaper in theory but I'm not sure how practical it is…
15:04 <benblack> sql requires a ton of bandwidth and low latency to do this, too
15:04 <benblack> see clustrix
15:04 <timanglade> true. It's just that I see internal bandwidth requirements as a constant improvement factor in NOSQL systems, on a bunch of different use-cases.
15:05 <timanglade> which makes total sense considering the design
15:05 <quentusrex> timanglade, benblack I misspoke earlier. each server is 7TB of hard drive space: 8 1TB drives in RAID 5.
15:05 <timanglade> it's just another part of standard datacenter infrastructure that could evolve with NOSQL rising.
15:06 <benblack> raid5? yikes
15:06 <benblack> and that is a _lot_ of data per node
15:06 <benblack> can i suggest your hardware is optimized for something other than a dynamo system?
15:07 <quentusrex> Rough back of the napkin data calculations:
15:07 <quentusrex> 1.5 billion documents at 25kb = 35.7 TB
15:07 <quentusrex> 750 million documents at 100kb = 71.5 TB
15:07 <quentusrex> 720 million documents at 600kb = 412 TB
15:07 <quentusrex> 30 million documents at 20MB = 586 TB
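Redoing quentusrex's napkin math in plain decimal units (1 TB = 10^12 bytes) gives slightly different figures from those quoted, presumably because of mixed KB/KiB conventions in the original, but the order of magnitude holds:

```shell
# Recompute count x size in decimal terabytes; awk does the arithmetic.
tb() { awk -v n="$1" -v bytes="$2" 'BEGIN { printf "%.1f\n", n * bytes / 1e12 }'; }

tb 1.5e9 25e3     # 1.5 billion documents at 25 kB  -> 37.5 TB
tb 750e6 100e3    # 750 million documents at 100 kB -> 75.0 TB
tb 720e6 600e3    # 720 million documents at 600 kB -> 432.0 TB
tb 30e6  20e6     # 30 million documents at 20 MB   -> 600.0 TB
```

The sum comes to roughly 1.14 PB, which lines up with the "1PB?" guess earlier in the conversation.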
15:08 <benblack> your hardware selection is not a great choice, imo
15:09 <quentusrex> Agreed.
15:10 <benblack> if you want fat nodes, you should have 10G ethernet
15:10 <benblack> on 1G, you need skinny nodes
15:12 <quentusrex> Know of any good 10G switches?
15:13 <benblack> depends on your budget
15:27 <timanglade> quentusrex, benblack: I was told that it would be technically possible to hot-remap a BigCouch cluster according to an SHA-1 map
15:27 <timanglade> involves going deep in the code though
15:28 <timanglade> but could be applied to an existing cluster. Hijinks would probably ensue…
15:32 <quentusrex> The effort would probably be better spent working on import/export for riak
15:33 <timanglade> yes
15:33 <benblack> timanglade: and would have to inspect the riak ring and make it match and would require the same partition count in riak
15:34 <timanglade> at that point, I was just interested to know if it was possible
15:34 <timanglade> doesn't just having to calculate 3B object SHA-1s and mapping them make this approach not worth it?
15:36 <timanglade> you'd have to write so much code to distribute that part of the job… Then plug that into BigCouch somehow. Change the code. Wait for BigCouch to rebalance. Write a binary translator in Erlang / transfer over localhost. Configure Riak.
15:36 <timanglade> All the while praying to Saint Jude
15:41 <quentusrex> is there a riak ui similar to futon?
15:45 <benblack> not to my knowledge
16:35 <quentusrex> I'm getting a 400 bad request from the following curl request: curl -v -X PUT -d '{"bar":"baz"}' -H "Content-Type: application/json" http://riak1:8098/riak/test
16:37 <quentusrex> trying to add an object to the 'test' bucket, and have riak generate the key
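The 400 is expected: in Riak's HTTP API, PUT requires an explicit key (`/riak/<bucket>/<key>`); to have Riak generate the key you POST to the bucket itself, and the new key comes back in the `Location` response header. The `riak1:8098` host is from the log; `store_url` is an illustrative helper, and the curl calls are left commented since they need a live cluster.

```shell
#!/bin/sh
# Build the target URL: bucket-only for server-generated keys,
# bucket/key when the caller chooses the key.
RIAK=http://riak1:8098

store_url() {
    if [ -n "$2" ]; then printf '%s/riak/%s/%s' "$RIAK" "$1" "$2"
    else printf '%s/riak/%s' "$RIAK" "$1"; fi
}

# Server-generated key: POST to the bucket (note: POST, not PUT).
# curl -v -X POST -d '{"bar":"baz"}' -H "Content-Type: application/json" \
#      "$(store_url test)"

# Caller-chosen key: PUT to bucket/key.
# curl -v -X PUT -d '{"bar":"baz"}' -H "Content-Type: application/json" \
#      "$(store_url test mykey)"
```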
16:42 <quentusrex> timanglade, it sounds like the biggest issue with bigcouch was the lack of 'per bucket NRW properties'
16:43 <timanglade> quentusrex: what about the data loss you referred to?
16:44 <quentusrex> timanglade, we lost a couple servers, and the data was lost with them.
16:44 <quentusrex> bigcouch did not lose the data,
16:44 <timanglade> oh, it's your nrw that was not appropriately set then
16:44 <quentusrex> right.
16:45 <quentusrex> We have a ton of data that can be recalculated, and a 'decent amount' that must be kept no matter how bad the cluster's health.
16:46 <timanglade> hm
16:46 <quentusrex> not the only reason, but a big one.
16:46 <timanglade> you can set nrw for each database though
16:46 <timanglade> which would be the BigCouch equivalent of a “bucket”