The underlying intent here is to let users narrow down their search to relevant documents only. The “Apple” case is representative: the current approach returns many documents mentioning the fruit. With NER (Named Entity Recognition), we will be able to extract only the mentions of “Apple” that refer to the brand. NER also provides additional information about each extracted entity: its category. We currently support 5 categories: organisation, person, event, product, location.
In the short run, we want users to be able to retrieve mentions of a given entity and, optionally, restrict them to a given category. In the long run, we want users to be able to disambiguate the entity they are searching for. For example, there are several people named “Michael Jackson”: the singer, the soccer player, … Ideally, users should be able to retrieve only the mentions referring to one of them.
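A minimal sketch of the short-run query, written in Python. The `Mention` record, the `find_mentions` function, the sample data, and the case-insensitive matching are all hypothetical, chosen for illustration rather than taken from the existing system; only the five categories come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Category(Enum):
    # The five categories the NER step currently supports.
    ORGANISATION = "organisation"
    PERSON = "person"
    EVENT = "event"
    PRODUCT = "product"
    LOCATION = "location"


@dataclass(frozen=True)
class Mention:
    # Hypothetical record: one entity occurrence extracted by NER from a document.
    document_id: str
    entity: str
    category: Category


def find_mentions(
    mentions: Iterable[Mention],
    entity: str,
    category: Optional[Category] = None,
) -> List[Mention]:
    """Return the mentions of `entity`, optionally restricted to `category`.

    Entity matching is case-insensitive (an assumption of this sketch).
    """
    wanted = entity.casefold()
    return [
        m
        for m in mentions
        if m.entity.casefold() == wanted
        and (category is None or m.category is category)
    ]


if __name__ == "__main__":
    # Made-up sample data: the same surface form tagged with two categories.
    sample = [
        Mention("doc1", "Washington", Category.PERSON),
        Mention("doc2", "Washington", Category.LOCATION),
        Mention("doc3", "Apple", Category.ORGANISATION),
    ]
    print(len(find_mentions(sample, "washington")))  # 2
    print(len(find_mentions(sample, "Washington", Category.LOCATION)))  # 1
```

The optional `category` argument covers the short-run requirement. The long-run disambiguation (telling the singer apart from the soccer player) would need an identifier for each real-world entity on every mention, which this sketch does not model.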
The proposed map