@wpm
Created December 4, 2017 16:12
Entity Highlighting in Context
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Entity Highlighting in Context"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visualization tools in the [spaCy](https://spacy.io/) natural language toolkit can display entity annotations for an entire document.\n",
"Here we produce highlight just those sentences in the document that contain the specified entities.\n",
"\n",
"(You will have to [install the large English language model](https://spacy.io/usage/models) separately.)"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import spacy\n",
"from spacy import displacy\n",
"from itertools import groupby\n",
"\n",
"nlp = spacy.load(\"en_core_web_lg\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following function displays all the sentences in a parsed document containing the specified entity types. If no entity types are specified, all entities are highlighted. If a sentence does not contain any entities of interest, it is not displayed."
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def entities_in_context(doc, *entity_types):\n",
" def highlight_entity(entity_label):\n",
" if not entity_types:\n",
" return True\n",
" return entity_label in entity_types\n",
" \n",
" for context, group in groupby([(entity.sent, entity) for entity in doc.ents if highlight_entity(entity.label_)], \n",
" key=lambda t:t[0]):\n",
" entities = [{\"start\": (entity.start_char - context.start_char), \n",
" \"end\":entity.end_char - context.end_char, \n",
" \"label\":entity.label_} for _, entity in group]\n",
" context_document = {\"text\": str(context), \"ents\": entities, \"title\": None}\n",
" displacy.render(context_document, style=\"ent\", jupyter=True, manual=True)"
]
},
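{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both the start and end offsets of each entity must be shifted by the *start* of its sentence, because in manual mode displaCy interprets offsets relative to the `text` it is given. The following cell is a minimal, spaCy-free sketch of that arithmetic; the sample string and the variable names are illustrative assumptions and are not part of the function above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch (hypothetical names): rebase one entity's document offsets onto its sentence.\n",
"sample_text = 'Miles Davis was born on May 26, 1926. His album Kind of Blue was released on August 17, 1959.'\n",
"sent_start = sample_text.index('His album')  # where the second sentence begins\n",
"ent_start = sample_text.index('August 17, 1959')  # document-level entity offsets\n",
"ent_end = ent_start + len('August 17, 1959')\n",
"\n",
"sentence = sample_text[sent_start:]\n",
"# Subtract the sentence start from both ends; subtracting the sentence end would give offsets of zero or less.\n",
"rebased = {'start': ent_start - sent_start, 'end': ent_end - sent_start, 'label': 'DATE'}\n",
"print(sentence[rebased['start']:rebased['end']])  # -> August 17, 1959"
]
},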
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following document consists of three sentences, two of which contain dates."
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"text = u\"\"\"Miles Davis was born on May 26, 1926 and died on September 28, 1991.\n",
" He was a world-renowned musician.\n",
" His album Kind of Blue was released on August 17, 1959.\"\"\"\n",
"doc = nlp(text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Print only those sentences that contain DATE or PERSON entities. Note that the second sentence in the document is not printed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"entities_in_context(doc, \"DATE\", \"PERSON\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}