psychemedia/Quantity Parsing.ipynb

## Quantity Parsing.ipynb
{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "# Simple Tools from Extracting Quantities from Strings\n\nSuppose we have a report and we want to find the sentences that are talking about numerical things....\n\n*Originally inspired by [When you get data in sentences: how to use a spreadsheet to extract numbers from phrases](https://onlinejournalismblog.com/2019/07/29/when-you-get-data-in-sentences-how-to-use-a-spreadsheet-to-extract-numbers-from-phrases/), Paul Bradshaw, Online Journalism blog, form which some of the example sentences (sic!) are taken.*"
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "sentences = [\n    '4 years and 6 months’ imprisonment with a licence extension of 2 years and 6 months',\n    'No quantities here',\n    'I measured it as 2 meters and 30 centimeters.',\n    \"four years and six months' imprisonment with a licence extension of 2 years and 6 months\",\n    'it cost £250... bargain...',\n    'it weighs four hundred kilograms.',\n    'It weighs 400kg.',\n    'three million, two hundred & forty, you say?',\n    'it weighs four hundred and twenty kilograms.'\n    \n]",
      "execution_count": 152,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## `quantulum3`\n\n[`quantulum3`](https://github.com/nielstron/quantulum3) is a Python package *\"for information extraction of quantities from unstructured text\"*."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#!pip3 install quantulum3\nfrom quantulum3 import parser",
      "execution_count": 153,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true,
        "scrolled": false
      },
      "cell_type": "code",
      "source": "for sent in sentences:\n    print(sent)\n    p = parser.parse(sent)\n    if p:\n        print('\\tSpoken:',parser.inline_parse_and_expand(sent))\n        print('\\tNumeric elements:')\n        for q in p:\n            display(q)\n            print('\\t\\t{} :: {}'.format(q.surface, q))\n    print('\\n---------\\n')",
      "execution_count": 154,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "4 years and 6 months’ imprisonment with a licence extension of 2 years and 6 months\n\tSpoken: four years and six months’ imprisonment with a licence extension of two years and six months\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(4, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t4 years :: four years\n"
        },
        {
          "data": {
            "text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t6 months :: six months\n"
        },
        {
          "data": {
            "text/plain": "Quantity(2, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t2 years :: two years\n"
        },
        {
          "data": {
            "text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t6 months :: six months\n\n---------\n\nNo quantities here\n\n---------\n\nI measured it as 2 meters and 30 centimeters.\n\tSpoken: I measured it as two metres and thirty centimetres.\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(2, \"Unit(name=\"metre\", entity=Entity(\"length\"), uri=Metre)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t2 meters :: two metres\n"
        },
        {
          "data": {
            "text/plain": "Quantity(30, \"Unit(name=\"centimetre\", entity=Entity(\"length\"), uri=Centimetre)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t30 centimeters :: thirty centimetres\n\n---------\n\nfour years and six months' imprisonment with a licence extension of 2 years and 6 months\n\tSpoken: four years and six months imprisonment with a licence extension of two years and six months\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(4, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tfour years :: four years\n"
        },
        {
          "data": {
            "text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tsix months' :: six months\n"
        },
        {
          "data": {
            "text/plain": "Quantity(2, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t2 years :: two years\n"
        },
        {
          "data": {
            "text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t6 months :: six months\n\n---------\n\nit cost £250... bargain...\n\tSpoken: it cost two hundred and fifty pounds sterling, zero pence... bargain...\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(250, \"Unit(name=\"pound sterling\", entity=Entity(\"currency\"), uri=Pound_sterling)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t£250 :: two hundred and fifty pounds sterling, zero pence\n\n---------\n\nit weighs four hundred kilograms.\n\tSpoken: it weighs four hundred kilograms.\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(400, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tfour hundred kilograms :: four hundred kilograms\n\n---------\n\nIt weighs 400kg.\n\tSpoken: It weighs four hundred kilograms.\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(400, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t400kg :: four hundred kilograms\n\n---------\n\nthree million, two hundred & forty, you say?\n\tSpoken: three million, two hundred & forty, you say?\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(3e+06, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tthree million :: three million\n"
        },
        {
          "data": {
            "text/plain": "Quantity(200, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\ttwo hundred :: two hundred\n"
        },
        {
          "data": {
            "text/plain": "Quantity(40, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tforty :: forty\n\n---------\n\nit weighs four hundred and twenty kilograms.\n\tSpoken: it weighs four hundred and twenty kilograms.\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(420, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tfour hundred and twenty kilograms :: four hundred and twenty kilograms\n\n---------\n\n"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Finding quantity statements in large texts\n\nIf we have a large blog of text, we might want to quickly skim it for quantity containing sentences, we can do something like the following..."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import spacy\nnlp = spacy.load('en_core_web_lg', disable = ['ner'])",
      "execution_count": 155,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "text = '''\nOnce upon a time, there was a thing. The thing weighed forty kilogrammes and cost £250. \nIt was blue. It took forty five minutes to get it home. \nWhat a day that was. I didn't get back until 2.15pm. Then I had some cake for tea.\n'''",
      "execution_count": 171,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "doc = nlp(text)\nfor sent in doc.sents:\n    print(sent)",
      "execution_count": 172,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\nOnce upon a time, there was a thing.\nThe thing weighed forty kilogrammes and cost £250. \n\nIt was blue.\nIt took forty five minutes to get it home. \n\nWhat a day that was.\nI didn't get back until 2.15pm.\nThen I had some cake for tea.\n\n"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "for sent in doc.sents:\n    sent = sent.text\n    p = parser.parse(sent)\n    if p:\n        print('\\tSpoken:',parser.inline_parse_and_expand(sent))\n        print('\\tNumeric elements:')\n        for q in p:\n            display(q)\n            print('\\t\\t{} :: {}'.format(q.surface, q))\n    print('\\n---------\\n')",
      "execution_count": 173,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\tSpoken: \nOnce upon one instance, there was a thing.\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(1, \"Unit(name=\"count\", entity=Entity(\"dimensionless\"), uri=Count_data)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\ta time :: one instance\n\n---------\n\n\tSpoken: The thing weighed forty kilograms and cost two hundred and fifty pounds sterling, zero pence. \n\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(40, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tforty kilogrammes :: forty kilograms\n"
        },
        {
          "data": {
            "text/plain": "Quantity(250, \"Unit(name=\"pound sterling\", entity=Entity(\"currency\"), uri=Pound_sterling)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t£250 :: two hundred and fifty pounds sterling, zero pence\n\n---------\n\n\n---------\n\n\tSpoken: It took forty-five minutes to get it home. \n\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(45, \"Unit(name=\"minute of arc\", entity=Entity(\"angle\"), uri=Minute_and_second_of_arc)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\tforty five minutes :: forty-five minutes\n\n---------\n\n\tSpoken: What one day that was.\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(1, \"Unit(name=\"day\", entity=Entity(\"time\"), uri=Day)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\ta day :: one day\n\n---------\n\n\tSpoken: I didn't get back until two point one five picometres.\n\tNumeric elements:\n"
        },
        {
          "data": {
            "text/plain": "Quantity(2.15, \"Unit(name=\"picometre\", entity=Entity(\"length\"), uri=Picometre)\")"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "\t\t2.15pm :: two point one five picometres\n\n---------\n\n\n---------\n\n"
        }
      ]
    },
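    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "The run above also picks up a few things that are not really measurements: `a time` and `a day` are each read as a quantity of one, and `2.15pm` is read as 2.15 picometres. What follows is a minimal sketch of one way to weed these out, assuming that a couple of simple regular expressions over the matched text (`q.surface`) are good enough for skimming. The helper names `looks_spurious` and `keep_plausible`, and both patterns, are made up for this example and are not part of `quantulum3`. The article rule is blunt: it would also throw away a genuine *a day* or *an hour*. It does nothing about `forty five minutes` being labelled as minutes of arc."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import re\n\n# Clock times such as '2.15pm' or '11:30 am' (a deliberately simple, assumed pattern)\nCLOCK_TIME = re.compile(r'^\\d{1,2}([.:]\\d{2})?\\s*[ap]\\.?m\\.?$', re.IGNORECASE)\n# Matched text that starts with an indefinite article, e.g. 'a time' or 'a day'\nARTICLE = re.compile(r'^an?\\s', re.IGNORECASE)\n\ndef looks_spurious(surface):\n    # True if the matched text is probably not a real measurement\n    surface = surface.strip()\n    return bool(CLOCK_TIME.match(surface) or ARTICLE.match(surface))\n\ndef keep_plausible(surfaces):\n    # Keep only the matched strings that pass the heuristic above\n    return [s for s in surfaces if not looks_spurious(s)]\n\n# Matched strings copied from the output of the run above\nsurfaces = ['a time', 'forty kilogrammes', '£250', 'forty five minutes', 'a day', '2.15pm']\nprint(keep_plausible(surfaces))",
      "execution_count": null,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "In the sentence loop this might be used as `[q for q in parser.parse(sent) if not looks_spurious(q.surface)]`."
    },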
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Annotating a dataset\n\nCan we extract numbers from sentences in a CSV file? Yes we can..."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "url = 'https://raw.githubusercontent.com/BBC-Data-Unit/unduly-lenient-sentences/master/ULS+for+Sankey.csv'",
      "execution_count": 174,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "import pandas as pd\n\ndf = pd.read_csv(url)\ndf.head()",
      "execution_count": 175,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Year</th>\n      <th>Offence category REFINED</th>\n      <th>Original sentence (refined)</th>\n      <th>Crown Court</th>\n      <th>Outcome of Decision</th>\n      <th>Revised?</th>\n      <th>People</th>\n      <th>Top 7</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2015</td>\n      <td>Drug offence</td>\n      <td>3 years imprisonment</td>\n      <td>Bristol</td>\n      <td>Not referred</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2015</td>\n      <td>Death or serious injury - unlawful driving</td>\n      <td>6 years imprisonment - Disqualified driving - ...</td>\n      <td>Portsmouth</td>\n      <td>Not referred</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2015</td>\n      <td>Sexual offence</td>\n      <td>9 months imprisonment suspended for 2 years</td>\n      <td>Nottingham</td>\n      <td>Out of time</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2015</td>\n      <td>Theft offence</td>\n      <td>4 years and 10 months imprisonment - consecuti...</td>\n      <td>St Albans</td>\n      <td>Not referred</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2015</td>\n      <td>Theft offence</td>\n      <td>unknown</td>\n      <td>unknown</td>\n      <td>Not in scheme</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   Year                    Offence category REFINED  \\\n0  2015                                Drug offence   \n1  2015  Death or serious injury - unlawful driving   \n2  2015                              Sexual offence   \n3  2015                               Theft offence   \n4  2015                               Theft offence   \n\n                         Original sentence (refined) Crown Court  \\\n0                               3 years imprisonment     Bristol   \n1  6 years imprisonment - Disqualified driving - ...  Portsmouth   \n2        9 months imprisonment suspended for 2 years  Nottingham   \n3  4 years and 10 months imprisonment - consecuti...   St Albans   \n4                                            unknown     unknown   \n\n  Outcome of Decision Revised?  People Top 7  \n0        Not referred       No       1     Y  \n1        Not referred       No       1     Y  \n2         Out of time       No       1     Y  \n3        Not referred       No       1     Y  \n4       Not in scheme       No       1     Y  "
          },
          "execution_count": 175,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#get a row\ndf.iloc[1]",
      "execution_count": 178,
      "outputs": [
        {
          "data": {
            "text/plain": "Year                                                                        2015\nOffence category REFINED              Death or serious injury - unlawful driving\nOriginal sentence (refined)    6 years imprisonment - Disqualified driving - ...\nCrown Court                                                           Portsmouth\nOutcome of Decision                                                 Not referred\nRevised?                                                                      No\nPeople                                                                         1\nTop 7                                                                          Y\nName: 1, dtype: object"
          },
          "execution_count": 178,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "#and a, erm. sentence...\ndf.iloc[1]['Original sentence (refined)']",
      "execution_count": 179,
      "outputs": [
        {
          "data": {
            "text/plain": "'6 years imprisonment - Disqualified driving - 8 years'"
          },
          "execution_count": 179,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "parser.parse(df.iloc[1]['Original sentence (refined)'])",
      "execution_count": 180,
      "outputs": [
        {
          "data": {
            "text/plain": "[Quantity(6, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\"),\n Quantity(8, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")]"
          },
          "execution_count": 180,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "def amountify(txt):\n    try:\n        if txt:\n            p = parser.parse(txt)\n            x=[]\n            for q in p:\n                x.append( '{} {}'.format(q.value, q.unit.name))\n            return '::'.join(x)\n        return ''\n    except:\n        return",
      "execution_count": 206,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df['amounts'] = df['Original sentence (refined)'].apply(amountify)",
      "execution_count": 207,
      "outputs": []
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.head()",
      "execution_count": 208,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Year</th>\n      <th>Offence category REFINED</th>\n      <th>Original sentence (refined)</th>\n      <th>Crown Court</th>\n      <th>Outcome of Decision</th>\n      <th>Revised?</th>\n      <th>People</th>\n      <th>Top 7</th>\n      <th>amounts</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2015</td>\n      <td>Drug offence</td>\n      <td>3 years imprisonment</td>\n      <td>Bristol</td>\n      <td>Not referred</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n      <td>3.0 year</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2015</td>\n      <td>Death or serious injury - unlawful driving</td>\n      <td>6 years imprisonment - Disqualified driving - ...</td>\n      <td>Portsmouth</td>\n      <td>Not referred</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n      <td>6.0 year::8.0 year</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2015</td>\n      <td>Sexual offence</td>\n      <td>9 months imprisonment suspended for 2 years</td>\n      <td>Nottingham</td>\n      <td>Out of time</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n      <td>9.0 month::2.0 year</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2015</td>\n      <td>Theft offence</td>\n      <td>4 years and 10 months imprisonment - consecuti...</td>\n      <td>St Albans</td>\n      <td>Not referred</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n      <td>4.0 year::10.0 month</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2015</td>\n      <td>Theft offence</td>\n      <td>unknown</td>\n      <td>unknown</td>\n 
     <td>Not in scheme</td>\n      <td>No</td>\n      <td>1</td>\n      <td>Y</td>\n      <td></td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "   Year                    Offence category REFINED  \\\n0  2015                                Drug offence   \n1  2015  Death or serious injury - unlawful driving   \n2  2015                              Sexual offence   \n3  2015                               Theft offence   \n4  2015                               Theft offence   \n\n                         Original sentence (refined) Crown Court  \\\n0                               3 years imprisonment     Bristol   \n1  6 years imprisonment - Disqualified driving - ...  Portsmouth   \n2        9 months imprisonment suspended for 2 years  Nottingham   \n3  4 years and 10 months imprisonment - consecuti...   St Albans   \n4                                            unknown     unknown   \n\n  Outcome of Decision Revised?  People Top 7               amounts  \n0        Not referred       No       1     Y              3.0 year  \n1        Not referred       No       1     Y    6.0 year::8.0 year  \n2         Out of time       No       1     Y   9.0 month::2.0 year  \n3        Not referred       No       1     Y  4.0 year::10.0 month  \n4       Not in scheme       No       1     Y                        "
          },
          "execution_count": 208,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
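    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "With the amounts pulled out, a natural next step is a single comparable number per row. The cell below is a rough sketch, assuming the amounts are already available as `(value, unit_name)` pairs such as `(4.0, 'year')` and that a year counts as 12 months. The names `MONTHS_PER_UNIT` and `total_months` are invented for this example. Blindly adding up every quantity in a row is not always meaningful: in *9 months imprisonment suspended for 2 years* the two durations mean different things."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "# Months per unit; only the two units seen in the sample rows are covered (an assumption)\nMONTHS_PER_UNIT = {'year': 12, 'month': 1}\n\ndef total_months(pairs):\n    # Sum (value, unit_name) pairs as months; return None if any unit is unknown\n    total = 0.0\n    for value, unit in pairs:\n        if unit not in MONTHS_PER_UNIT:\n            return None\n        total += value * MONTHS_PER_UNIT[unit]\n    return total\n\n# '4 years and 10 months' expressed as pairs\nprint(total_months([(4.0, 'year'), (10.0, 'month')]))",
      "execution_count": null,
      "outputs": []
    },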
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "We could then do something to split mutliple amounts into mutliple rows or columns..."
    },
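    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "A minimal sketch of the splitting step, assuming the `::`-separated format that `amountify` produces (for example `6.0 year::8.0 year`). The function name `split_amounts` is made up for this example."
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "def split_amounts(amounts):\n    # Turn '6.0 year::8.0 year' into [(6.0, 'year'), (8.0, 'year')]\n    pairs = []\n    if not isinstance(amounts, str) or not amounts:\n        # Nothing was extracted for this row\n        return pairs\n    for item in amounts.split('::'):\n        # Split on the first space only, so unit names like 'pound sterling' survive\n        value, unit = item.split(' ', 1)\n        pairs.append((float(value), unit))\n    return pairs\n\nprint(split_amounts('4.0 year::10.0 month'))",
      "execution_count": null,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Applied with `df['amounts'].apply(split_amounts)`, this gives a column of lists; pandas' `DataFrame.explode` could then turn each list element into its own row."
    },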
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "",
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "language_info": {
      "name": "python",
      "version": "3.7.3",
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "nbconvert_exporter": "python",
      "file_extension": ".py"
    },
    "gist": {
      "id": "",
      "data": {
        "description": "Example of parsing quantities from sentences",
        "public": true
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
	{
	"cells": [
	{
	"metadata": {},
	"cell_type": "markdown",
	"source": "# Simple Tools from Extracting Quantities from Strings\n\nSuppose we have a report and we want to find the sentences that are talking about numerical things....\n\nOriginally inspired by [When you get data in sentences: how to use a spreadsheet to extract numbers from phrases](https://onlinejournalismblog.com/2019/07/29/when-you-get-data-in-sentences-how-to-use-a-spreadsheet-to-extract-numbers-from-phrases/), Paul Bradshaw, Online Journalism blog, form which some of the example sentences (sic!) are taken."
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "sentences = [\n '4 years and 6 months’ imprisonment with a licence extension of 2 years and 6 months',\n 'No quantities here',\n 'I measured it as 2 meters and 30 centimeters.',\n \"four years and six months' imprisonment with a licence extension of 2 years and 6 months\",\n 'it cost £250... bargain...',\n 'it weighs four hundred kilograms.',\n 'It weighs 400kg.',\n 'three million, two hundred & forty, you say?',\n 'it weighs four hundred and twenty kilograms.'\n \n]",
	"execution_count": 152,
	"outputs": []
	},
	{
	"metadata": {},
	"cell_type": "markdown",
	"source": "## `quantulum3`\n\n[`quantulum3`](https://github.com/nielstron/quantulum3) is a Python package \"for information extraction of quantities from unstructured text\"."
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "#!pip3 install quantulum3\nfrom quantulum3 import parser",
	"execution_count": 153,
	"outputs": []
	},
	{
	"metadata": {
	"trusted": true,
	"scrolled": false
	},
	"cell_type": "code",
	"source": "for sent in sentences:\n print(sent)\n p = parser.parse(sent)\n if p:\n print('\\tSpoken:',parser.inline_parse_and_expand(sent))\n print('\\tNumeric elements:')\n for q in p:\n display(q)\n print('\\t\\t{} :: {}'.format(q.surface, q))\n print('\\n---------\\n')",
	"execution_count": 154,
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "4 years and 6 months’ imprisonment with a licence extension of 2 years and 6 months\n\tSpoken: four years and six months’ imprisonment with a licence extension of two years and six months\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(4, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t4 years :: four years\n"
	},
	{
	"data": {
	"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t6 months :: six months\n"
	},
	{
	"data": {
	"text/plain": "Quantity(2, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t2 years :: two years\n"
	},
	{
	"data": {
	"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t6 months :: six months\n\n---------\n\nNo quantities here\n\n---------\n\nI measured it as 2 meters and 30 centimeters.\n\tSpoken: I measured it as two metres and thirty centimetres.\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(2, \"Unit(name=\"metre\", entity=Entity(\"length\"), uri=Metre)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t2 meters :: two metres\n"
	},
	{
	"data": {
	"text/plain": "Quantity(30, \"Unit(name=\"centimetre\", entity=Entity(\"length\"), uri=Centimetre)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t30 centimeters :: thirty centimetres\n\n---------\n\nfour years and six months' imprisonment with a licence extension of 2 years and 6 months\n\tSpoken: four years and six months imprisonment with a licence extension of two years and six months\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(4, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tfour years :: four years\n"
	},
	{
	"data": {
	"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tsix months' :: six months\n"
	},
	{
	"data": {
	"text/plain": "Quantity(2, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t2 years :: two years\n"
	},
	{
	"data": {
	"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t6 months :: six months\n\n---------\n\nit cost £250... bargain...\n\tSpoken: it cost two hundred and fifty pounds sterling, zero pence... bargain...\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(250, \"Unit(name=\"pound sterling\", entity=Entity(\"currency\"), uri=Pound_sterling)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t£250 :: two hundred and fifty pounds sterling, zero pence\n\n---------\n\nit weighs four hundred kilograms.\n\tSpoken: it weighs four hundred kilograms.\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(400, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tfour hundred kilograms :: four hundred kilograms\n\n---------\n\nIt weighs 400kg.\n\tSpoken: It weighs four hundred kilograms.\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(400, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t400kg :: four hundred kilograms\n\n---------\n\nthree million, two hundred & forty, you say?\n\tSpoken: three million, two hundred & forty, you say?\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(3e+06, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tthree million :: three million\n"
	},
	{
	"data": {
	"text/plain": "Quantity(200, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\ttwo hundred :: two hundred\n"
	},
	{
	"data": {
	"text/plain": "Quantity(40, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tforty :: forty\n\n---------\n\nit weighs four hundred and twenty kilograms.\n\tSpoken: it weighs four hundred and twenty kilograms.\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(420, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tfour hundred and twenty kilograms :: four hundred and twenty kilograms\n\n---------\n\n"
	}
	]
	},
	{
	"metadata": {},
	"cell_type": "markdown",
	"source": "## Finding quantity statements in large texts\n\nIf we have a large block of text, we might want to skim it quickly for sentences that contain quantities. We can do something like the following..."
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "import spacy\nnlp = spacy.load('en_core_web_lg', disable = ['ner'])",
	"execution_count": 155,
	"outputs": []
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "text = '''\nOnce upon a time, there was a thing. The thing weighed forty kilogrammes and cost £250. \nIt was blue. It took forty five minutes to get it home. \nWhat a day that was. I didn't get back until 2.15pm. Then I had some cake for tea.\n'''",
	"execution_count": 171,
	"outputs": []
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "doc = nlp(text)\nfor sent in doc.sents:\n    print(sent)",
	"execution_count": 172,
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\nOnce upon a time, there was a thing.\nThe thing weighed forty kilogrammes and cost £250. \n\nIt was blue.\nIt took forty five minutes to get it home. \n\nWhat a day that was.\nI didn't get back until 2.15pm.\nThen I had some cake for tea.\n\n"
	}
	]
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "for sent in doc.sents:\n    sent = sent.text\n    p = parser.parse(sent)\n    if p:\n        print('\\tSpoken:',parser.inline_parse_and_expand(sent))\n        print('\\tNumeric elements:')\n        for q in p:\n            display(q)\n            print('\\t\\t{} :: {}'.format(q.surface, q))\n    print('\\n---------\\n')",
	"execution_count": 173,
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\tSpoken: \nOnce upon one instance, there was a thing.\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(1, \"Unit(name=\"count\", entity=Entity(\"dimensionless\"), uri=Count_data)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\ta time :: one instance\n\n---------\n\n\tSpoken: The thing weighed forty kilograms and cost two hundred and fifty pounds sterling, zero pence. \n\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(40, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tforty kilogrammes :: forty kilograms\n"
	},
	{
	"data": {
	"text/plain": "Quantity(250, \"Unit(name=\"pound sterling\", entity=Entity(\"currency\"), uri=Pound_sterling)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t£250 :: two hundred and fifty pounds sterling, zero pence\n\n---------\n\n\n---------\n\n\tSpoken: It took forty-five minutes to get it home. \n\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(45, \"Unit(name=\"minute of arc\", entity=Entity(\"angle\"), uri=Minute_and_second_of_arc)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\tforty five minutes :: forty-five minutes\n\n---------\n\n\tSpoken: What one day that was.\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(1, \"Unit(name=\"day\", entity=Entity(\"time\"), uri=Day)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\ta day :: one day\n\n---------\n\n\tSpoken: I didn't get back until two point one five picometres.\n\tNumeric elements:\n"
	},
	{
	"data": {
	"text/plain": "Quantity(2.15, \"Unit(name=\"picometre\", entity=Entity(\"length\"), uri=Picometre)\")"
	},
	"metadata": {},
	"output_type": "display_data"
	},
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "\t\t2.15pm :: two point one five picometres\n\n---------\n\n\n---------\n\n"
	}
	]
	},
	{
	"metadata": {},
	"cell_type": "markdown",
	"source": "## Annotating a dataset\n\nCan we extract numbers from sentences in a CSV file? Yes we can..."
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "url = 'https://raw.githubusercontent.com/BBC-Data-Unit/unduly-lenient-sentences/master/ULS+for+Sankey.csv'",
	"execution_count": 174,
	"outputs": []
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "import pandas as pd\n\ndf = pd.read_csv(url)\ndf.head()",
	"execution_count": 175,
	"outputs": [
	{
	"data": {
	"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Year</th>\n <th>Offence category REFINED</th>\n <th>Original sentence (refined)</th>\n <th>Crown Court</th>\n <th>Outcome of Decision</th>\n <th>Revised?</th>\n <th>People</th>\n <th>Top 7</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2015</td>\n <td>Drug offence</td>\n <td>3 years imprisonment</td>\n <td>Bristol</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2015</td>\n <td>Death or serious injury - unlawful driving</td>\n <td>6 years imprisonment - Disqualified driving - ...</td>\n <td>Portsmouth</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2015</td>\n <td>Sexual offence</td>\n <td>9 months imprisonment suspended for 2 years</td>\n <td>Nottingham</td>\n <td>Out of time</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>4 years and 10 months imprisonment - consecuti...</td>\n <td>St Albans</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>unknown</td>\n <td>unknown</td>\n <td>Not in scheme</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n </tbody>\n</table>\n</div>",
	"text/plain": " Year Offence category REFINED \\\n0 2015 Drug offence \n1 2015 Death or serious injury - unlawful driving \n2 2015 Sexual offence \n3 2015 Theft offence \n4 2015 Theft offence \n\n Original sentence (refined) Crown Court \\\n0 3 years imprisonment Bristol \n1 6 years imprisonment - Disqualified driving - ... Portsmouth \n2 9 months imprisonment suspended for 2 years Nottingham \n3 4 years and 10 months imprisonment - consecuti... St Albans \n4 unknown unknown \n\n Outcome of Decision Revised? People Top 7 \n0 Not referred No 1 Y \n1 Not referred No 1 Y \n2 Out of time No 1 Y \n3 Not referred No 1 Y \n4 Not in scheme No 1 Y "
	},
	"execution_count": 175,
	"metadata": {},
	"output_type": "execute_result"
	}
	]
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "#get a row\ndf.iloc[1]",
	"execution_count": 178,
	"outputs": [
	{
	"data": {
	"text/plain": "Year 2015\nOffence category REFINED Death or serious injury - unlawful driving\nOriginal sentence (refined) 6 years imprisonment - Disqualified driving - ...\nCrown Court Portsmouth\nOutcome of Decision Not referred\nRevised? No\nPeople 1\nTop 7 Y\nName: 1, dtype: object"
	},
	"execution_count": 178,
	"metadata": {},
	"output_type": "execute_result"
	}
	]
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "#and a, erm. sentence...\ndf.iloc[1]['Original sentence (refined)']",
	"execution_count": 179,
	"outputs": [
	{
	"data": {
	"text/plain": "'6 years imprisonment - Disqualified driving - 8 years'"
	},
	"execution_count": 179,
	"metadata": {},
	"output_type": "execute_result"
	}
	]
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "parser.parse(df.iloc[1]['Original sentence (refined)'])",
	"execution_count": 180,
	"outputs": [
	{
	"data": {
	"text/plain": "[Quantity(6, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\"),\n Quantity(8, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")]"
	},
	"execution_count": 180,
	"metadata": {},
	"output_type": "execute_result"
	}
	]
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
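	"source": "def amountify(txt):\n    # Return the quantities found in txt as a '::'-separated string of '<value> <unit>' items\n    try:\n        if txt:\n            p = parser.parse(txt)\n            x = []\n            for q in p:\n                x.append('{} {}'.format(q.value, q.unit.name))\n            return '::'.join(x)\n        return ''\n    except Exception:\n        # If parsing fails (for example, on a non-string cell), treat the cell as having no amounts\n        return ''",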
	"execution_count": 206,
	"outputs": []
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "df['amounts'] = df['Original sentence (refined)'].apply(amountify)",
	"execution_count": 207,
	"outputs": []
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "df.head()",
	"execution_count": 208,
	"outputs": [
	{
	"data": {
	"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Year</th>\n <th>Offence category REFINED</th>\n <th>Original sentence (refined)</th>\n <th>Crown Court</th>\n <th>Outcome of Decision</th>\n <th>Revised?</th>\n <th>People</th>\n <th>Top 7</th>\n <th>amounts</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2015</td>\n <td>Drug offence</td>\n <td>3 years imprisonment</td>\n <td>Bristol</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>3.0 year</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2015</td>\n <td>Death or serious injury - unlawful driving</td>\n <td>6 years imprisonment - Disqualified driving - ...</td>\n <td>Portsmouth</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>6.0 year::8.0 year</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2015</td>\n <td>Sexual offence</td>\n <td>9 months imprisonment suspended for 2 years</td>\n <td>Nottingham</td>\n <td>Out of time</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>9.0 month::2.0 year</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>4 years and 10 months imprisonment - consecuti...</td>\n <td>St Albans</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>4.0 year::10.0 month</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>unknown</td>\n <td>unknown</td>\n <td>Not in scheme</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td></td>\n </tr>\n </tbody>\n</table>\n</div>",
	"text/plain": " Year Offence category REFINED \\\n0 2015 Drug offence \n1 2015 Death or serious injury - unlawful driving \n2 2015 Sexual offence \n3 2015 Theft offence \n4 2015 Theft offence \n\n Original sentence (refined) Crown Court \\\n0 3 years imprisonment Bristol \n1 6 years imprisonment - Disqualified driving - ... Portsmouth \n2 9 months imprisonment suspended for 2 years Nottingham \n3 4 years and 10 months imprisonment - consecuti... St Albans \n4 unknown unknown \n\n Outcome of Decision Revised? People Top 7 amounts \n0 Not referred No 1 Y 3.0 year \n1 Not referred No 1 Y 6.0 year::8.0 year \n2 Out of time No 1 Y 9.0 month::2.0 year \n3 Not referred No 1 Y 4.0 year::10.0 month \n4 Not in scheme No 1 Y "
	},
	"execution_count": 208,
	"metadata": {},
	"output_type": "execute_result"
	}
	]
	},
	{
	"metadata": {},
	"cell_type": "markdown",
	"source": "We could then do something to split multiple amounts into multiple rows or columns..."
	},
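	{
	"metadata": {},
	"cell_type": "markdown",
	"source": "As a rough sketch of one way to do that, the next cell splits the `::`-separated `amounts` strings into one row per amount and then separates each item into a numeric `value` and a `unit` name. It assumes pandas 0.25 or later (for `DataFrame.explode`); the names `long_df`, `source_row`, `amount`, `value` and `unit` are made up for illustration. Keeping the original row number in `source_row` means the long table can still be grouped back by sentence."
	},
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "# A sketch, assuming pandas >= 0.25 and the '::'-joined 'amounts' column created above.\n# long_df, source_row, amount, value and unit are illustrative names, not part of the dataset.\nlong_df = (df.assign(amount=df['amounts'].str.split('::'))\n             .explode('amount')\n             .reset_index()\n             .rename(columns={'index': 'source_row'}))\n\n# Separate each '<value> <unit>' string into a numeric value and a unit name;\n# rows with no amount end up with missing values in both columns\nparts = long_df['amount'].str.extract(r'^(?P<value>\\S+) (?P<unit>.+)$')\nlong_df['value'] = pd.to_numeric(parts['value'], errors='coerce')\nlong_df['unit'] = parts['unit']\n\nlong_df[['source_row', 'Original sentence (refined)', 'value', 'unit']].head()",
	"execution_count": null,
	"outputs": []
	},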
	{
	"metadata": {
	"trusted": true
	},
	"cell_type": "code",
	"source": "",
	"execution_count": null,
	"outputs": []
	}
	],
	"metadata": {
	"kernelspec": {
	"name": "python3",
	"display_name": "Python 3",
	"language": "python"
	},
	"language_info": {
	"name": "python",
	"version": "3.7.3",
	"mimetype": "text/x-python",
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"pygments_lexer": "ipython3",
	"nbconvert_exporter": "python",
	"file_extension": ".py"
	},
	"gist": {
	"id": "",
	"data": {
	"description": "Example of parsing quantities from sentences",
	"public": true
	}
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}