Created January 3, 2018 00:32
Jupyter notebook for NLP Building Blocks
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we show how to use the NLP Building Blocks to perform named-entity extraction from natural language text. You can launch the NLP Building Blocks in Docker containers using the docker-compose script at https://github.com/mtnfog/nlp-building-blocks.\n",
    "\n",
    "First, we import the Python json and requests libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we define a function to perform sentence extraction. This function makes an API call to the Prose Sentence Extraction Engine. The API takes natural language text and returns a JSON array containing the individual sentences in the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_sentences(text):\n",
    "    headers = {'Content-Type': 'text/plain'}\n",
    "    api_url = 'http://192.168.1.134:8060/api/sentences'\n",
    "    response = requests.post(api_url, headers=headers, data=text)\n",
    "    return json.loads(response.content.decode('utf-8'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similar to above, we now define a function to perform tokenization using the Sonnet Tokenization Engine. The API call takes an individual sentence (extracted by the function above) and returns a JSON array containing the individual tokens in the sentence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "def tokenize(sentence):\n",
    "    headers = {'Content-Type': 'text/plain'}\n",
    "    api_url = 'http://192.168.1.134:9040/api/tokenize'\n",
    "    response = requests.post(api_url, headers=headers, data=sentence)\n",
    "    return json.loads(response.content.decode('utf-8'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lastly, we define a function to extract named-entities from the text. This API call uses the Idyl E3 Entity Extraction Engine. The call posts the tokens (produced by the function above) and returns any named-entities found. Our Idyl E3 instance is running a model trained for English-language person entities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_entities(tokens):\n",
    "    headers = {'Content-Type': 'application/json'}\n",
    "    api_url = 'http://192.168.1.134:9000/api/extract'\n",
    "    response = requests.post(api_url, headers=headers, json=tokens)\n",
    "    return json.loads(response.content.decode('utf-8'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are ready to execute the API calls. We first extract the sentences from the text, then tokenize each sentence, and lastly look for named-entities in the tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{u'entities': [], u'extractionTime': 1}\n",
      "{u'entities': [{u'languageCode': u'eng', u'confidence': 0.96, u'span': {u'tokenEnd': 2, u'tokenStart': 0}, u'extractionDate': 1514939299940, u'text': u'George Washington', u'type': u'person', u'metadata': {u'x-model-filename': u'mtnfog-en-person.bin'}}], u'extractionTime': 1}\n"
     ]
    }
   ],
   "source": [
    "# Extract the sentences in the input text.\n",
    "sentences = extract_sentences('This is a sentence. George Washington was president.')\n",
    "\n",
    "for s in sentences:\n",
    "    # Tokenize each sentence.\n",
    "    tokens = tokenize(s)\n",
    "    # Extract entities from the tokens.\n",
    "    print(extract_entities(tokens))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the output we see the results for the two sentences. The first sentence did not contain any named-entities. The second sentence contained one named-entity, \"George Washington\", identified as a person. The response also includes the entity's token span in the text, the confidence that this is an entity, and the file name of the model that identified it."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
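The entity responses printed by the notebook can be post-processed without any of the services running. Below is a minimal sketch: the response structure is copied from the sample output recorded in the notebook, while `filter_entities` is a hypothetical helper of our own, not part of the Idyl E3 API.

```python
# Sample Idyl E3-style response, copied from the notebook's recorded output.
sample_response = {
    'entities': [
        {
            'languageCode': 'eng',
            'confidence': 0.96,
            'span': {'tokenStart': 0, 'tokenEnd': 2},
            'extractionDate': 1514939299940,
            'text': 'George Washington',
            'type': 'person',
            'metadata': {'x-model-filename': 'mtnfog-en-person.bin'},
        }
    ],
    'extractionTime': 1,
}

def filter_entities(response, entity_type='person', min_confidence=0.5):
    """Return (text, confidence) pairs for entities of the given type
    whose confidence meets the threshold."""
    return [
        (e['text'], e['confidence'])
        for e in response.get('entities', [])
        if e['type'] == entity_type and e['confidence'] >= min_confidence
    ]

# The loop in the notebook could pass each extract_entities() result here
# instead of printing the raw dictionary.
print(filter_entities(sample_response))  # → [('George Washington', 0.96)]
```

Raising `min_confidence` is a simple way to trade recall for precision when the model produces low-confidence spans.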