Created August 6, 2019 13:35
Gist: psychemedia/1b1b795c7cffbc9809da33a703842354
Example of parsing quantities from sentences
{ | |
"cells": [ | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "# Simple Tools for Extracting Quantities from Strings\n\nSuppose we have a report and want to find the sentences that mention numerical quantities, such as durations, weights or sums of money.\n\n*Originally inspired by [When you get data in sentences: how to use a spreadsheet to extract numbers from phrases](https://onlinejournalismblog.com/2019/07/29/when-you-get-data-in-sentences-how-to-use-a-spreadsheet-to-extract-numbers-from-phrases/), Paul Bradshaw, Online Journalism blog, from which some of the example sentences (sic!) are taken.*" | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "sentences = [\n '4 years and 6 months’ imprisonment with a licence extension of 2 years and 6 months',\n 'No quantities here',\n 'I measured it as 2 meters and 30 centimeters.',\n \"four years and six months' imprisonment with a licence extension of 2 years and 6 months\",\n 'it cost £250... bargain...',\n 'it weighs four hundred kilograms.',\n 'It weighs 400kg.',\n 'three million, two hundred & forty, you say?',\n 'it weighs four hundred and twenty kilograms.'\n \n]", | |
"execution_count": 152, | |
"outputs": [] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## `quantulum3`\n\n[`quantulum3`](https://github.com/nielstron/quantulum3) is a Python package *\"for information extraction of quantities from unstructured text\"*." | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "#!pip3 install quantulum3\nfrom quantulum3 import parser", | |
"execution_count": 153, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"trusted": true, | |
"scrolled": false | |
}, | |
"cell_type": "code", | |
"source": "for sent in sentences:\n print(sent)\n p = parser.parse(sent)\n if p:\n print('\\tSpoken:',parser.inline_parse_and_expand(sent))\n print('\\tNumeric elements:')\n for q in p:\n display(q)\n print('\\t\\t{} :: {}'.format(q.surface, q))\n print('\\n---------\\n')", | |
"execution_count": 154, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "4 years and 6 months’ imprisonment with a licence extension of 2 years and 6 months\n\tSpoken: four years and six months’ imprisonment with a licence extension of two years and six months\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(4, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t4 years :: four years\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t6 months :: six months\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(2, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t2 years :: two years\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t6 months :: six months\n\n---------\n\nNo quantities here\n\n---------\n\nI measured it as 2 meters and 30 centimeters.\n\tSpoken: I measured it as two metres and thirty centimetres.\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(2, \"Unit(name=\"metre\", entity=Entity(\"length\"), uri=Metre)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t2 meters :: two metres\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(30, \"Unit(name=\"centimetre\", entity=Entity(\"length\"), uri=Centimetre)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t30 centimeters :: thirty centimetres\n\n---------\n\nfour years and six months' imprisonment with a licence extension of 2 years and 6 months\n\tSpoken: four years and six months imprisonment with a licence extension of two years and six months\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(4, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tfour years :: four years\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tsix months' :: six months\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(2, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t2 years :: two years\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(6, \"Unit(name=\"month\", entity=Entity(\"time\"), uri=Month)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t6 months :: six months\n\n---------\n\nit cost £250... bargain...\n\tSpoken: it cost two hundred and fifty pounds sterling, zero pence... bargain...\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(250, \"Unit(name=\"pound sterling\", entity=Entity(\"currency\"), uri=Pound_sterling)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t£250 :: two hundred and fifty pounds sterling, zero pence\n\n---------\n\nit weighs four hundred kilograms.\n\tSpoken: it weighs four hundred kilograms.\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(400, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tfour hundred kilograms :: four hundred kilograms\n\n---------\n\nIt weighs 400kg.\n\tSpoken: It weighs four hundred kilograms.\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(400, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t400kg :: four hundred kilograms\n\n---------\n\nthree million, two hundred & forty, you say?\n\tSpoken: three million, two hundred & forty, you say?\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(3e+06, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tthree million :: three million\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(200, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\ttwo hundred :: two hundred\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(40, \"Unit(name=\"dimensionless\", entity=Entity(\"dimensionless\"), uri=Dimensionless_quantity)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tforty :: forty\n\n---------\n\nit weighs four hundred and twenty kilograms.\n\tSpoken: it weighs four hundred and twenty kilograms.\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(420, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tfour hundred and twenty kilograms :: four hundred and twenty kilograms\n\n---------\n\n" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "## Finding quantity statements in large texts\n\nIf we have a large block of text, we might want to quickly skim it for sentences that contain quantities. One way to do this is to split the text into sentences with `spaCy` and then pass each sentence in turn to the `quantulum3` parser." | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "import spacy\n\n# Only sentence splitting is needed here, so disable the named entity recogniser\nnlp = spacy.load('en_core_web_lg', disable=['ner'])", | |
"execution_count": 155, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "text = '''\nOnce upon a time, there was a thing. The thing weighed forty kilogrammes and cost £250. \nIt was blue. It took forty five minutes to get it home. \nWhat a day that was. I didn't get back until 2.15pm. Then I had some cake for tea.\n'''", | |
"execution_count": 171, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "doc = nlp(text)\nfor sent in doc.sents:\n print(sent)", | |
"execution_count": 172, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\nOnce upon a time, there was a thing.\nThe thing weighed forty kilogrammes and cost £250. \n\nIt was blue.\nIt took forty five minutes to get it home. \n\nWhat a day that was.\nI didn't get back until 2.15pm.\nThen I had some cake for tea.\n\n" | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "for sent in doc.sents:\n sent = sent.text\n p = parser.parse(sent)\n if p:\n print('\\tSpoken:',parser.inline_parse_and_expand(sent))\n print('\\tNumeric elements:')\n for q in p:\n display(q)\n print('\\t\\t{} :: {}'.format(q.surface, q))\n print('\\n---------\\n')", | |
"execution_count": 173, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\tSpoken: \nOnce upon one instance, there was a thing.\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(1, \"Unit(name=\"count\", entity=Entity(\"dimensionless\"), uri=Count_data)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\ta time :: one instance\n\n---------\n\n\tSpoken: The thing weighed forty kilograms and cost two hundred and fifty pounds sterling, zero pence. \n\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(40, \"Unit(name=\"kilogram\", entity=Entity(\"mass\"), uri=Kilogram)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tforty kilogrammes :: forty kilograms\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(250, \"Unit(name=\"pound sterling\", entity=Entity(\"currency\"), uri=Pound_sterling)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t£250 :: two hundred and fifty pounds sterling, zero pence\n\n---------\n\n\n---------\n\n\tSpoken: It took forty-five minutes to get it home. \n\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(45, \"Unit(name=\"minute of arc\", entity=Entity(\"angle\"), uri=Minute_and_second_of_arc)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\tforty five minutes :: forty-five minutes\n\n---------\n\n\tSpoken: What one day that was.\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(1, \"Unit(name=\"day\", entity=Entity(\"time\"), uri=Day)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\ta day :: one day\n\n---------\n\n\tSpoken: I didn't get back until two point one five picometres.\n\tNumeric elements:\n" | |
}, | |
{ | |
"data": { | |
"text/plain": "Quantity(2.15, \"Unit(name=\"picometre\", entity=Entity(\"length\"), uri=Picometre)\")" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": "\t\t2.15pm :: two point one five picometres\n\n---------\n\n\n---------\n\n" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
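"source": "## Annotating a dataset\n\nCan we extract quantities from the text in a CSV file? Yes we can. The example below uses the BBC Data Unit's dataset of unduly lenient sentences, whose `Original sentence (refined)` column describes each court sentence in words." | |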
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "url = 'https://raw.githubusercontent.com/BBC-Data-Unit/unduly-lenient-sentences/master/ULS+for+Sankey.csv'", | |
"execution_count": 174, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "import pandas as pd\n\ndf = pd.read_csv(url)\ndf.head()", | |
"execution_count": 175, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Year</th>\n <th>Offence category REFINED</th>\n <th>Original sentence (refined)</th>\n <th>Crown Court</th>\n <th>Outcome of Decision</th>\n <th>Revised?</th>\n <th>People</th>\n <th>Top 7</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2015</td>\n <td>Drug offence</td>\n <td>3 years imprisonment</td>\n <td>Bristol</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2015</td>\n <td>Death or serious injury - unlawful driving</td>\n <td>6 years imprisonment - Disqualified driving - ...</td>\n <td>Portsmouth</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2015</td>\n <td>Sexual offence</td>\n <td>9 months imprisonment suspended for 2 years</td>\n <td>Nottingham</td>\n <td>Out of time</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>4 years and 10 months imprisonment - consecuti...</td>\n <td>St Albans</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>unknown</td>\n <td>unknown</td>\n <td>Not in scheme</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Year Offence category REFINED \\\n0 2015 Drug offence \n1 2015 Death or serious injury - unlawful driving \n2 2015 Sexual offence \n3 2015 Theft offence \n4 2015 Theft offence \n\n Original sentence (refined) Crown Court \\\n0 3 years imprisonment Bristol \n1 6 years imprisonment - Disqualified driving - ... Portsmouth \n2 9 months imprisonment suspended for 2 years Nottingham \n3 4 years and 10 months imprisonment - consecuti... St Albans \n4 unknown unknown \n\n Outcome of Decision Revised? People Top 7 \n0 Not referred No 1 Y \n1 Not referred No 1 Y \n2 Out of time No 1 Y \n3 Not referred No 1 Y \n4 Not in scheme No 1 Y " | |
}, | |
"execution_count": 175, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "#get a row\ndf.iloc[1]", | |
"execution_count": 178, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "Year 2015\nOffence category REFINED Death or serious injury - unlawful driving\nOriginal sentence (refined) 6 years imprisonment - Disqualified driving - ...\nCrown Court Portsmouth\nOutcome of Decision Not referred\nRevised? No\nPeople 1\nTop 7 Y\nName: 1, dtype: object" | |
}, | |
"execution_count": 178, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "#and a, erm, sentence...\ndf.iloc[1]['Original sentence (refined)']", | |
"execution_count": 179, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "'6 years imprisonment - Disqualified driving - 8 years'" | |
}, | |
"execution_count": 179, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "parser.parse(df.iloc[1]['Original sentence (refined)'])", | |
"execution_count": 180, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": "[Quantity(6, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\"),\n Quantity(8, \"Unit(name=\"year\", entity=Entity(\"time\"), uri=Year)\")]" | |
}, | |
"execution_count": 180, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
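"source": "def amountify(txt):\n    # Return the quantities found in txt as 'value unit' strings joined by '::'\n    # Missing cells are read in by pandas as NaN (a float), so skip anything that is not a non-empty string\n    if not isinstance(txt, str) or not txt:\n        return ''\n    try:\n        quantities = parser.parse(txt)\n    except Exception:\n        # Don't let one unparseable cell stop the whole column from being processed\n        return ''\n    return '::'.join('{} {}'.format(q.value, q.unit.name) for q in quantities)", | |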
"execution_count": 206, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "df['amounts'] = df['Original sentence (refined)'].apply(amountify)", | |
"execution_count": 207, | |
"outputs": [] | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "df.head()", | |
"execution_count": 208, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Year</th>\n <th>Offence category REFINED</th>\n <th>Original sentence (refined)</th>\n <th>Crown Court</th>\n <th>Outcome of Decision</th>\n <th>Revised?</th>\n <th>People</th>\n <th>Top 7</th>\n <th>amounts</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>2015</td>\n <td>Drug offence</td>\n <td>3 years imprisonment</td>\n <td>Bristol</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>3.0 year</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2015</td>\n <td>Death or serious injury - unlawful driving</td>\n <td>6 years imprisonment - Disqualified driving - ...</td>\n <td>Portsmouth</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>6.0 year::8.0 year</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2015</td>\n <td>Sexual offence</td>\n <td>9 months imprisonment suspended for 2 years</td>\n <td>Nottingham</td>\n <td>Out of time</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>9.0 month::2.0 year</td>\n </tr>\n <tr>\n <th>3</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>4 years and 10 months imprisonment - consecuti...</td>\n <td>St Albans</td>\n <td>Not referred</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td>4.0 year::10.0 month</td>\n </tr>\n <tr>\n <th>4</th>\n <td>2015</td>\n <td>Theft offence</td>\n <td>unknown</td>\n <td>unknown</td>\n <td>Not in scheme</td>\n <td>No</td>\n <td>1</td>\n <td>Y</td>\n <td></td>\n </tr>\n </tbody>\n</table>\n</div>", | |
"text/plain": " Year Offence category REFINED \\\n0 2015 Drug offence \n1 2015 Death or serious injury - unlawful driving \n2 2015 Sexual offence \n3 2015 Theft offence \n4 2015 Theft offence \n\n Original sentence (refined) Crown Court \\\n0 3 years imprisonment Bristol \n1 6 years imprisonment - Disqualified driving - ... Portsmouth \n2 9 months imprisonment suspended for 2 years Nottingham \n3 4 years and 10 months imprisonment - consecuti... St Albans \n4 unknown unknown \n\n Outcome of Decision Revised? People Top 7 amounts \n0 Not referred No 1 Y 3.0 year \n1 Not referred No 1 Y 6.0 year::8.0 year \n2 Out of time No 1 Y 9.0 month::2.0 year \n3 Not referred No 1 Y 4.0 year::10.0 month \n4 Not in scheme No 1 Y " | |
}, | |
"execution_count": 208, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
] | |
}, | |
{ | |
"metadata": {}, | |
"cell_type": "markdown", | |
"source": "We could then split the multiple amounts in each cell into separate rows or columns..." | |
}, | |
{ | |
"metadata": { | |
"trusted": true | |
}, | |
"cell_type": "code", | |
"source": "", | |
"execution_count": null, | |
"outputs": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3", | |
"language": "python" | |
}, | |
"language_info": { | |
"name": "python", | |
"version": "3.7.3", | |
"mimetype": "text/x-python", | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"pygments_lexer": "ipython3", | |
"nbconvert_exporter": "python", | |
"file_extension": ".py" | |
}, | |
"gist": { | |
"id": "", | |
"data": { | |
"description": "Example of parsing quantities from sentences", | |
"public": true | |
} | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
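The `quantulum3` results for the short story in the notebook above include several misreadings: "a time" and "a day" are read as the number one, "forty five minutes" is taken to be minutes of arc, and "2.15pm" becomes 2.15 picometres. One rough way to cut down such false positives is to drop matches that start with an article and to keep only the kinds of quantity you are interested in. The sketch below is a hypothetical helper, not part of `quantulum3`. It works on plain `(surface, unit name, entity name)` tuples which, assuming the attribute names suggested by the notebook's output, could be built from parser results as `(q.surface, q.unit.name, q.unit.entity.name)`.

```python
from typing import Iterable, List, Set, Tuple

# A match is (surface text, unit name, entity name); hypothetical representation.
Match = Tuple[str, str, str]


def keep_plausible(matches: Iterable[Match], wanted_entities: Set[str]) -> List[Match]:
    """Heuristically drop likely false positives from a list of matches."""
    kept = []
    for surface, unit, entity in matches:
        words = surface.lower().split()
        # "a time", "a day": the article was read as the number one
        if words and words[0] in ("a", "an"):
            continue
        # Keep only the kinds of quantity we are looking for
        if entity not in wanted_entities:
            continue
        kept.append((surface, unit, entity))
    return kept


# Matches copied from the notebook's output for the short story text
matches = [
    ("a time", "count", "dimensionless"),
    ("forty kilogrammes", "kilogram", "mass"),
    ("£250", "pound sterling", "currency"),
    ("forty five minutes", "minute of arc", "angle"),
    ("a day", "day", "time"),
    ("2.15pm", "picometre", "length"),
]

for match in keep_plausible(matches, {"mass", "currency", "time"}):
    print(match)
```

Note that the entity filter also discards the genuine forty-five minutes, because the parser labelled it as an angle. A filter like this trades false positives for false negatives, so it is worth checking what it throws away.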
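The notebook stops at the suggestion that the `::`-joined values in the `amounts` column could be split into multiple rows or columns. A minimal sketch of the first step, assuming the `'value unit'` format that `amountify` produces, is to turn each cell back into `(value, unit)` pairs. `split_amounts` is a hypothetical helper name. Unit names can contain spaces (for example `pound sterling`), so each item is split only at its first space.

```python
from typing import List, Optional, Tuple


def split_amounts(amounts: Optional[str]) -> List[Tuple[float, str]]:
    """Turn a string such as '6.0 year::8.0 year' into [(6.0, 'year'), (8.0, 'year')]."""
    pairs = []
    if not amounts:
        # Empty or missing cells yield no amounts
        return pairs
    for item in amounts.split("::"):
        # Split at the first space only, so 'pound sterling' stays in one piece
        value, _, unit = item.partition(" ")
        pairs.append((float(value), unit))
    return pairs


# Values taken from the amounts column in the notebook's output
for cell in ["3.0 year", "6.0 year::8.0 year", "9.0 month::2.0 year", ""]:
    print(repr(cell), "->", split_amounts(cell))
```

With pandas, one row per amount can be obtained with `df['amounts'].str.split('::').explode()` (`Series.explode` was added in pandas 0.25), or one column per amount with `df['amounts'].str.split('::', expand=True)`.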