@PharkMillups
Created January 6, 2011 01:12
14:22 <sragu> I have a question with riak search
14:22 <sragu> The default indexing of XML documents uses the
element name prefixed with its parent path
14:22 <sragu> A list of elements in an XML document is indexed under a
single name. How could I index them separately?
14:23 <sragu> How could I customize the index created for an XML
document with Riak?
14:32 <rustyk> sragu: You have two choices… one is that you can
do some preprocessing on the XML document before sending it to search,
renaming fields to keep them distinct
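
The preprocessing option rustyk describes could look roughly like the sketch below. It is only an illustration under assumptions: the `item_1`, `item_2` naming scheme, the function names, and the sample document are all hypothetical, and nothing here calls Riak Search itself. It rewrites the XML so that repeated sibling elements get distinct names before the document is sent for indexing.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def rename_repeated(elem):
    """Give repeated sibling elements distinct names (item -> item_1, item_2, ...)."""
    # Count tags up front so the decision does not depend on renames made below.
    counts = Counter(child.tag for child in elem)
    seen = Counter()
    for child in elem:
        rename_repeated(child)  # handle nested lists as well
        tag = child.tag
        if counts[tag] > 1:
            seen[tag] += 1
            # Hypothetical naming scheme: original tag plus a 1-based position.
            child.tag = f"{tag}_{seen[tag]}"


def preprocess(xml_text):
    """Return the XML text with repeated siblings renamed."""
    root = ET.fromstring(xml_text)
    rename_repeated(root)
    return ET.tostring(root, encoding="unicode")


# Made-up sample document with a repeated <item> element.
doc = "<order><item>apple</item><item>pear</item><note>rush</note></order>"
result = preprocess(doc)
print(result)
```

Elements that occur only once under their parent (here `note`) keep their original name, so existing queries against those fields would not need to change.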
14:33 <rustyk> sragu: The other option is to create a custom
extractor for your documents, but if you go down this route
I would wait for the next release (0.14), as the current
release has a tricky bug around this support
14:33 <rustyk> sragu: An extractor inspects your document
and generates Field/Value pairs, so it can return whatever
field names you would like.
14:43 <sragu> rustyk: Will the Lucene analyzer do the same thing as the extractor?
14:46 <rustyk> sragu: no, the extractor takes a document
and extracts Field/Value pairs, the analyzers take a
Field/Value pair and convert the Value into tokens.
They are different things, a two stage process.
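
The two stages can be pictured with the conceptual sketch below. It is written in Python purely for illustration (per the conversation, real extractors are written in Erlang or Javascript), and it does not use the Riak Search extractor or analyzer interfaces. The function names, the underscore-joined field names, and the lowercase alphanumeric tokenizer are all assumptions.

```python
import re
import xml.etree.ElementTree as ET


def extract(xml_text):
    """Stage 1 (extractor): document -> list of (field, value) pairs."""
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        # Assumed naming: parent path and element name joined by "_".
        name = f"{path}_{elem.tag}" if path else elem.tag
        text = (elem.text or "").strip()
        if text:
            pairs.append((name, text))
        for child in elem:
            walk(child, name)

    walk(root, "")
    return pairs


def analyze(value):
    """Stage 2 (analyzer): one field's value -> list of tokens."""
    # Assumed tokenizer: lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", value.lower())


# Made-up sample document.
doc = "<post><title>Riak Search</title><body>Custom extractors, please!</body></post>"
pairs = extract(doc)
index_terms = [(field, token) for field, value in pairs for token in analyze(value)]
print(pairs)
print(index_terms)
```

The extractor alone decides which field names exist; the analyzer never sees the document, only one value at a time. That is why renaming or splitting fields belongs in the extractor (or in preprocessing) rather than in the analyzer.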
14:48 <sragu> gotcha. thanks.
14:51 <sragu> rustyk: Can I write the extractor in Java?
I see that the current extractor code is in Erlang.
14:52 <rustyk> sragu: There is no support for that yet.
Currently your options are Erlang and Javascript
14:56 <sragu> rustyk: Is writing a custom extractor the
standard way of tackling this issue? Will future releases
of Riak continue to support this custom extractor feature?
14:59 <rustyk> sragu: Yes, the extractors were made
extensible for exactly this reason. Future versions of Search
will continue to support extractors, though there's always a
chance that interfaces will change as we learn more about how
people use them
15:52 <sragu> rustyk: Thanks