PharkMillups/gist:767341

## gistfile1.txt
14:22 <sragu> I have a question with riak search

14:22 <sragu> The default indexing of xml documents uses the
element name with parent path

14:22 <sragu> Lists of elements in an xml document are indexed under a
single name. How could I index them separately?

14:23 <sragu> how could I customize the index created for a xml
document with riak?

14:32 <rustyk> sragu: You have two choices… one is that you can
do some preprocessing on the XML document before sending it to search,
renaming fields to keep them distinct
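
The preprocessing option rustyk describes could look roughly like the sketch below. It is only an illustration, not Riak Search code: the function name `rename_repeated` and the `_1`, `_2` suffix scheme are assumptions made up for this example. It renames repeated sibling elements so that each one ends up with a distinct name before the document is sent to search.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def rename_repeated(xml_text: str) -> str:
    """Suffix repeated sibling tags with a 1-based position.

    Hypothetical helper: the suffix scheme is an assumption, chosen only
    so that each repeated element gets its own field name.
    """
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        children = list(parent)
        # Count how often each tag occurs among this parent's children.
        counts = Counter(child.tag for child in children)
        seen = Counter()
        for child in children:
            if counts[child.tag] > 1:
                seen[child.tag] += 1
                child.tag = f"{child.tag}_{seen[child.tag]}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = "<book><author>A</author><author>B</author><title>T</title></book>"
    print(rename_repeated(doc))
```

Elements that occur only once under their parent keep their original name, so only the repeated ones change.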

14:33 <rustyk> sragu: The other option is to create a custom
extractor for your documents, but if you go down this route
I would wait for the next release (0.14), as the current
release has a tricky bug around this support

14:33 <rustyk> sragu: An extractor inspects your document
and generates Field/Value pairs, so it can return whatever
field names you would like.

14:43 <sragu> rustyk: will the Lucene Analyzer do the same as the extractor?

14:46 <rustyk> sragu: no, the extractor takes a document
and extracts Field/Value pairs, the analyzers take a
Field/Value pair and convert the Value into tokens.
They are different things, a two stage process.
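
The two stages rustyk distinguishes can be sketched as follows. This is a toy model in Python, not the Riak Search extractor or analyzer interface (the real extractors are written in Erlang or Javascript, per the discussion). The names `extract`, `analyze`, and `index_terms`, the underscore-joined field paths, and the lowercase-and-split tokenizer are all assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET


def extract(xml_text):
    """Stage 1 (extractor): document -> list of (field, value) pairs."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            pairs.append((path, text))
        for child in elem:
            # Assumed naming scheme: parent path joined to the element name.
            walk(child, f"{path}_{child.tag}")

    walk(root, root.tag)
    return pairs


def analyze(value):
    """Stage 2 (analyzer): one field value -> list of tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def index_terms(xml_text):
    """Run both stages: each field's value is tokenized separately."""
    return [(field, token)
            for field, value in extract(xml_text)
            for token in analyze(value)]


if __name__ == "__main__":
    print(index_terms("<book><title>Riak Search</title></book>"))
```

The extractor alone decides the field names, which is why a custom one can keep list elements distinct; the analyzer never sees the document, only one value at a time.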

14:48 <sragu> gotcha. thanks.

14:51 <sragu> rustyk: Can I write the extractor in java,
I see that the current extractor code is in Erlang?

14:52 <rustyk> sragu: There is no support for that yet.
Currently your options are Erlang and Javascript

14:56 <sragu> rustyk: Is writing a custom extractor a
standard way of tackling this issue? Will future releases
of riak continue supporting this custom extractor feature?

14:59 <rustyk> sragu: Yes, the extractors were made
extensible for exactly this reason. Future versions of Search
will continue to support extractors, though there's always a
chance that interfaces will change as we learn more about how
people use them

15:52 <sragu> rustyk: Thanks