PharkMillups/gist:654652

## gistfile1.txt
09:11 <reiddraper> morning all

09:14 <reiddraper> I've got a riak search question, if anyone's around

09:22 <chids> I'm around but I'm not sure I can help - haven't really dug into search yet. But feel
free to fire away :)

09:24 <reiddraper> ok -- with solr/lucene, you have to call `commit` before added/updated documents
are retrievable via a query. this doesn't seem to be the case with riak search, but
i wanted to verify

09:24 <chids> I'm pretty sure that's not the case

09:24 <chids> Riak Search plugs into the Riak K/V store using the post-commit hook

09:25 <chids> As to what stage the "search engine" itself commits data to the index, I have no idea.

09:25 <reiddraper> right, ok

09:25 <chids> Info on pre/post commit hooks can be found here: https://wiki.basho.com/display/RIAK/Pre-+and+Post-Commit+Hooks

09:27 <reiddraper> thanks

09:27 <chids> I would simply assume that when you've stored a document it "immediately" becomes
available in the index.

09:27 <chids> Immediately being as soon as the analyzer/indexer has done its work

09:28 <reiddraper> that's how it appears in the riak search tutorial, but because i know
it uses lucene under the hood, I was curious about some of those inner mechanisms

09:28 <chids> Then I'm afraid I'm not able to help you

09:30 <reiddraper> no problem

09:30 <jlouis> I would expect eventual indexing. That is, after a store and some time elapsed, the new
document will be in the index.

09:31 <chids> jlouis: Absolutely. First it has to be processed. Then there's *probably* the
possibility of a delay for distribution between nodes in the cluster.
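If indexing is eventual in the sense jlouis and chids describe, client code that writes a document and immediately queries for it may want to poll briefly instead of assuming the first query sees the new document. The sketch below is a generic polling helper, not part of any Riak client; `search` is a hypothetical callable standing in for whatever search call your client library provides, assumed to return the matching keys.

```python
import time
from typing import Callable, Iterable


def wait_until_indexed(
    search: Callable[[str], Iterable[str]],
    query: str,
    key: str,
    timeout: float = 5.0,
    interval: float = 0.1,
) -> bool:
    """Poll search(query) until key appears in the results or timeout expires.

    `search` is a hypothetical stand-in for a real search client call that
    returns the keys matching `query`.
    """
    deadline = time.monotonic() + timeout
    while True:
        if key in set(search(query)):
            return True  # the document is now visible to queries
        if time.monotonic() >= deadline:
            return False  # gave up; the caller decides how to handle this
        time.sleep(interval)
```

Using `time.monotonic()` keeps the timeout unaffected by wall-clock adjustments; the timeout and interval defaults are arbitrary and would need tuning against a real cluster.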

09:36 <pharkmillups> reiddraper: rlophaus would be the person to handle those types of questions.
He's not around at the moment - neck deep in some search code :) Your best bet is to use the Riak Mailing List.

09:37 <pharkmillups> s/rlophaus/rklophaus

09:37 <reiddraper> thanks

10:03 <reiddraper> is it possible to use riak for map-reduce processing, rather than queries?
for example, instead of returning the results of map, i'd like to just store the result in another key?

10:12 <justinsheehy> reiddraper: re your earlier question, the indexing is near-real-time. data
is indexed incrementally, not in batch commits.

10:12 <justinsheehy> and thus should be searchable very shortly after the KV storage operation
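To make the contrast in this exchange concrete, here is a toy in-memory inverted index written two ways: one that indexes each write incrementally, so it is searchable as soon as `put` returns, and one that buffers writes until an explicit `commit()`, roughly the Solr/Lucene behaviour reiddraper describes. Both classes are hypothetical illustrations of the two approaches, not how Riak Search or Lucene are actually implemented.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index: every put() is searchable as soon as it returns."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def put(self, key: str, text: str) -> None:
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(key)

    def search(self, term: str) -> set[str]:
        # .get() avoids creating empty entries in the defaultdict.
        return set(self.postings.get(term.lower(), set()))


class BatchCommitIndex(IncrementalIndex):
    """Toy batch-commit index: writes stay invisible until commit()."""

    def __init__(self) -> None:
        super().__init__()
        self.pending: list[tuple[str, str]] = []

    def put(self, key: str, text: str) -> None:
        self.pending.append((key, text))  # buffered, not yet searchable

    def commit(self) -> None:
        for key, text in self.pending:
            super().put(key, text)  # apply buffered writes to the postings
        self.pending.clear()
```

With `IncrementalIndex`, `put("k1", "Riak Search")` followed by `search("riak")` returns `{"k1"}`; with `BatchCommitIndex` the same search returns an empty set until `commit()` is called.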

10:13 <reiddraper> justinsheehy: thanks. and I guess you never need to call `optimize` on the index?

10:13 <justinsheehy> nope!

10:13 <reiddraper> that's awesome

10:14 <justinsheehy> it doesn't use lucene, by the way. it can use lucene's analyzer classes,
but lucene does not power the indexing or retrieval.

10:14 <justinsheehy> C-t

10:15 <reiddraper> ah, ok. I misunderstood that.

10:15 <justinsheehy> hard to get away from batch commits with lucene

10:15 <justinsheehy> on your map/reduce question, there is no built-in functionality to store
results instead of streaming them out.

10:15 <justinsheehy> could be done, but doesn't currently exist.

10:16 <reiddraper> ok, could be cool to be able to use riak like a hadoop cluster

10:19 <justinsheehy> reiddraper: could be cool indeed. wasn't part of the original idea,
and in fact bulk-processing at hadoop-like throughput requires different compromises than many of Riak's goals.

10:19 <justinsheehy> and so Hadoop and Riak are generally complementary tech, even if each
can do some of what the other is capable of.

10:20 <reiddraper> right, makes sense

10:23 <_sri> would be really cool if java was optional for riak-search

10:23 <reiddraper> justinsheehy: thanks for answering my questions

10:24 <justinsheehy> reiddraper: happy to help

10:25 <justinsheehy> _sri: it's almost optional now. there are some built-in native erlang
analyzers, but some other issues like the logic of whether or not to try to start the JVM
needs some care before it can be fully optional.
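One way to picture the "whether or not to start the JVM" decision justinsheehy mentions is to start it only when some field is configured with a Java-side analyzer. The sketch below assumes a made-up schema representation (`FieldSpec` with an analyzer string prefixed `erlang:` or `java:`); it is not Riak Search's real schema format, and real startup logic would have more cases to handle than this.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical field definition; the analyzer string format is an assumption."""
    name: str
    analyzer: str  # e.g. "erlang:whitespace" or "java:com.example.MyAnalyzer"


def needs_jvm(fields: Iterable[FieldSpec]) -> bool:
    """Return True only if at least one field uses a Java-side analyzer."""
    return any(f.analyzer.startswith("java:") for f in fields)
```

Under these assumptions, a schema that uses only native Erlang analyzers (`needs_jvm([FieldSpec("title", "erlang:standard")])` is `False`) would never trigger a JVM start.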

10:26 <bingeldac> reiddraper: https://gist.github.com/1cfec81c2425e9d99d0a

10:26 <bingeldac> that is something I did with a customer, an erlang reduce phase to save the data

10:26 <bingeldac> not sure if it meets your needs, but I thought I would toss it out

10:28 <reiddraper> bingeldac: thanks. it's enough to know that you can create and save keys
with the erlang m/r api
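The gist bingeldac links is an Erlang reduce phase and is not reproduced here. As a language-neutral illustration of the idea (run map and reduce, then write the result under a new key instead of streaming it to the client), here is a minimal Python sketch in which a plain dict keyed by `(bucket, key)` stands in for Riak KV. All function names and the word-count job are hypothetical. Riak's MapReduce documentation advises that reduce functions may be run more than once over partial results, so the reduce step is kept associative and commutative, and the write happens once, after reducing is finished.

```python
from collections import Counter
from typing import Any, Iterable, MutableMapping

# Hypothetical stand-in for the KV store: (bucket, key) -> value.
Store = MutableMapping[tuple[str, str], Any]


def word_count_map(value: str) -> list[Counter]:
    # Map phase: emit one Counter of word frequencies per stored object.
    return [Counter(value.lower().split())]


def word_count_reduce(values: Iterable[Counter]) -> list[Counter]:
    # Reduce phase: merge Counters. Merging is associative and commutative,
    # so feeding this function its own earlier output (a re-reduce) is safe.
    total: Counter = Counter()
    for counts in values:
        total.update(counts)
    return [total]


def map_reduce_and_store(
    store: Store,
    inputs: Iterable[tuple[str, str]],
    out_bucket: str,
    out_key: str,
) -> tuple[str, str]:
    mapped = [c for bk in inputs for c in word_count_map(store[bk])]
    (result,) = word_count_reduce(mapped)
    # Instead of streaming the result back, write it under a new key.
    store[(out_bucket, out_key)] = dict(result)
    return (out_bucket, out_key)
```

For example, with `store = {("docs", "a"): "riak search riak", ("docs", "b"): "hadoop riak"}`, calling `map_reduce_and_store(store, [("docs", "a"), ("docs", "b")], "results", "word_counts")` writes `{"riak": 3, "search": 1, "hadoop": 1}` to `("results", "word_counts")`.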

10:28 <_sri> justinsheehy: looking forward to it :)

10:29 <_sri> thanks for riak btw. very impressed so far

10:29 <justinsheehy> _sri: I should be clear, no one that I know of is focusing on making java optional
right now. it's not fundamentally hard but also not super high priority unless someone jumps on it.

10:30 <_sri> :/
10:30 <justinsheehy> _sri: glad you're enjoying it