@PharkMillups
Created December 9, 2010 01:13
10:24 <kotoko> Forgive the newbie question about Riak-search tokenizers
10:24 <kotoko> Is it possible to use the tokenizers and index to do custom Named
Entity Recognition?
10:52 <rustyk> kotoko: re: your question about Riak Search tokenizers,
the analyzers are simply responsible for taking an incoming string and breaking
it into tokens
10:52 <rustyk> kotoko: so we don't do any entity extraction or
classification out of the box, but you could certainly write your
own analyzer and fake it out
10:53 <rustyk> kotoko: in other words, you'd want to write an analyzer
to break the text into tokens that you could later search on that
included some embedded information about the entities
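
The idea rustyk describes, an analyzer that emits ordinary tokens plus tokens carrying entity information, can be sketched roughly as follows. This is only a conceptual illustration in Python, not Riak Search's analyzer API (the analyzers discussed here are part of Riak Search itself). The `GAZETTEER` table, the `analyze` function, and the `type_word` token shape are all hypothetical names chosen for the example.

```python
import re

# Hypothetical lookup table: lowercase surface form -> entity type.
# A real setup might fill this from a classifier or NER library instead.
GAZETTEER = {
    "basho": "org",
    "riak": "product",
    "erlang": "language",
}

def analyze(text, gazetteer=GAZETTEER):
    """Break text into lowercase word tokens. For every word found in the
    gazetteer, also emit a tagged token of the form 'type_word' so the
    entity type can be searched on later."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        tokens.append(word)
        entity_type = gazetteer.get(word)
        if entity_type is not None:
            # Assumed convention: embed the entity type in the token itself.
            tokens.append(f"{entity_type}_{word}")
    return tokens

print(analyze("Basho wrote Riak in Erlang"))
# ['basho', 'org_basho', 'wrote', 'riak', 'product_riak', 'in', 'erlang', 'language_erlang']
```

With tokens shaped like this, a later search for `org_basho` would match only documents where "basho" was recognized as an organization, while a search for `basho` still matches every occurrence. The underscore separator is an arbitrary choice for the sketch.
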
10:58 <kotoko> thanks rustyk, yeah that's the med-term plan, right now
I'm using python libraries through erlang
10:59 <kotoko> but, I see that I could probably write custom items
on top of the Riak-Search by digging through the ability to do customization
11:00 <kotoko> so basically that sort of customization would require
going through Riak-Search in detail
11:00 <kotoko> ?
11:04 <rustyk> kotoko: hrm... not sure what you mean by custom items
11:04 <rustyk> the primary area for customization now is in the analyzers
11:05 <rustyk> kotoko: and you don't really need to do *too* much digging
for that, just model your custom analyzer off of an existing analyzer
(which are relatively short, usually a few dozen lines or less)
11:13 <kotoko> I was thinking about the classifier and ner code
11:13 <kotoko> to integrate with Riak-Search