Created
December 9, 2010 01:13
| 10:24 <kotoko> Forgive the newbie question about Riak-search tokenizers | |
| 10:24 <kotoko> Is it possible to use the tokenizers and index to do custom Named | |
| Entity Recognition? | |
| 10:52 <rustyk> kotoko: re: your question about Riak Search tokenizers, | |
| the analyzers are simply responsible for taking an incoming string and breaking | |
| it into tokens | |
| 10:52 <rustyk> kotoko: so we don't do any entity extraction or | |
| classification out of the box, but you could certainly write your | |
| own analyzer and fake it out | |
| 10:53 <rustyk> kotoko: in other words, you'd want to write an analyzer | |
| to break the text into tokens that you could later search on that | |
| included some embedded information about the entities | |
| 10:58 <kotoko> thanks rustyk, yeah that's the med-term plan, right now | |
| I'm using python libraries through erlang | |
| 10:59 <kotoko> but, I see that I could probably write custom items | |
| on top of the Riak-Search by digging through the ability to do customization | |
| 11:00 <kotoko> so basically that sort of customization would require | |
| going through Riak-Search in detail | |
| 11:00 <kotoko> ? | |
| 11:04 <rustyk> kotoko: hrm... not sure what you mean by custom items | |
| 11:04 <rustyk> the primary area for customization now is in the analyzers | |
| 11:05 <rustyk> kotoko: and you don't really need to do *too* much digging | |
| for that, just model your custom analyzer off of an existing analyzer | |
| (which are relatively short, usually a few dozen lines or less) | |
| 11:13 <kotoko> I was thinking about the classifier and ner code | |
| 11:13 <kotoko> to integrate with Riak-Search |