@PharkMillups
Created October 29, 2010 23:33
09:11 <reiddraper> morning all
09:14 <reiddraper> I've got a riak search question, if anyone's around
09:22 <chids> I'm around but I'm not sure I can help - haven't really dug into search yet. But feel
free to fire away :)
09:24 <reiddraper> ok -- with solr/lucene, you have to call `commit` for added/updated documents
to be retrievable via a query, this doesn't seem to be the case with riak search, but
i wanted to verify
09:24 <chids> I'm pretty sure that's not the case
09:24 <chids> Riak Search plugs into the Riak K/V store using the post-commit hook
09:25 <chids> As to at what stage the "search engine" itself commits data to the index, I have no idea.
09:25 <reiddraper> right, ok
09:25 <chids> Info on pre/post commit hooks can be found here: https://wiki.basho.com/display/RIAK/Pre-+and+Post-Commit+Hooks
09:27 <reiddraper> thanks
09:27 <chids> I would simply assume that when you've stored a document it "immediately" becomes
available in the index.
09:27 <chids> Immediately being as soon as the analyzer/indexer has done its work
09:28 <reiddraper> that's how it appears in the riak search tutorial, but because i know
it uses lucene under the hood, I was curious about some of those inner mechanisms
09:28 <chids> Then I'm afraid I'm not able to help you
09:30 <reiddraper> no problem
09:30 <jlouis> I would expect eventual indexing. That is, after a store and some time elapsed, the new
document will be in the index.
09:31 <chids> jlouis: Absolutely. First it has to be processed. Then there's *probably* the
possibility of a delay for distribution between nodes in the cluster.
09:36 <pharkmillups> reiddraper: rlophaus would be the person to handle those type of questions.
He's not around at the moment - neck deep in some search code :) Your best bet is to use the Riak Mailing List.
09:37 <pharkmillups> s/rlophaus/rklophaus
09:37 <reiddraper> thanks
10:03 <reiddraper> is it possible to use riak for map-reduce processing, rather than queries?
for example, instead of returning the results of map, i'd like to just store the result in another key?
10:12 <justinsheehy> reiddraper: re your earlier question, the indexing is near-real-time. data
is indexed incrementally, not in batch commits.
10:12 <justinsheehy> and thus should be searchable very shortly after the KV storage operation
10:13 <reiddraper> justinsheehy: thanks. and I guess you never need to call `optimize` on the index?
10:13 <justinsheehy> nope!
10:13 <reiddraper> that's awesome
10:14 <justinsheehy> it doesn't use lucene, by the way. it can use lucene's analyzer classes,
but lucene does not power the indexing or retrieval.
10:14 <justinsheehy> C-t
10:15 <reiddraper> ah, ok. I misunderstood that.
10:15 <justinsheehy> hard to get away from batch commits with lucene
10:15 <justinsheehy> on your map/reduce question, there is no built-in functionality to store
results instead of streaming them out.
10:15 <justinsheehy> could be done, but doesn't currently exist.
10:16 <reiddraper> ok, could be cool to be able to use riak like a hadoop cluster
10:19 <justinsheehy> reiddraper: could be cool indeed. wasn't part of the original idea,
and in fact bulk-processing at hadoop-like throughput requires different compromises than many of Riak's goals.
10:19 <justinsheehy> and so Hadoop and Riak are generally complementary tech, even if each
can do some of what the other is capable of.
10:20 <reiddraper> right, makes sense
10:23 <_sri> would be really cool if java was optional for riak-search
10:23 <reiddraper> justinsheehy: thanks for answering my questions
10:24 <justinsheehy> reiddraper: happy to help
10:25 <justinsheehy> _sri: it's almost optional now. there are some built-in native erlang
analyzers, but some other issues like the logic of whether or not to try to start the JVM
needs some care before it can be fully optional.
10:26 <bingeldac> reiddraper: https://gist.github.com/1cfec81c2425e9d99d0a
10:26 <bingeldac> that is something I did with a customer, an erlang reduce phase to save the data
10:26 <bingeldac> not sure if it meets your needs, but I thought I would toss it out
10:28 <reiddraper> bingeldac: thanks. it's enough to know that you can create and save keys
with the erlang m/r api
10:28 <_sri> justinsheehy: looking forward to it :)
10:29 <_sri> thanks for riak btw. very impressed so far
10:29 <justinsheehy> _sri: I should be clear, no one that I know of is focusing on making java optional
right now. it's not fundamentally hard but also not super high priority unless someone jumps on it.
10:30 <_sri> :/
10:30 <justinsheehy> _sri: glad you're enjoying it
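
The distinction discussed at 09:24 and 10:12 (documents searchable only after an explicit `commit`, versus incremental indexing on each store) can be illustrated with a toy sketch. This is not Riak Search or Lucene code; the class names `BatchIndex` and `IncrementalIndex` are hypothetical, and whitespace splitting stands in for a real analyzer.

```python
from collections import defaultdict


class BatchIndex:
    """Toy index: added documents are searchable only after commit()."""

    def __init__(self):
        self._pending = []                  # documents waiting for commit()
        self._postings = defaultdict(set)   # term -> set of document ids

    def add(self, doc_id, text):
        # Buffer the document; it is not visible to search() yet.
        self._pending.append((doc_id, text))

    def commit(self):
        # Fold every buffered document into the postings in one batch.
        for doc_id, text in self._pending:
            for term in text.lower().split():
                self._postings[term].add(doc_id)
        self._pending.clear()

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


class IncrementalIndex:
    """Toy index: each document is indexed as part of the add() call."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index immediately; there is no separate commit step.
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


if __name__ == "__main__":
    batch = BatchIndex()
    batch.add("k1", "hello riak")
    print(batch.search("riak"))        # nothing visible before commit()
    batch.commit()
    print(batch.search("riak"))        # visible after commit()

    incremental = IncrementalIndex()
    incremental.add("k1", "hello riak")
    print(incremental.search("riak"))  # visible right after add()
```

In a distributed store there would still be a short delay between the write and the index update, which is why the log describes the behaviour as near-real-time rather than immediate.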