Notebook showing example NLP marking support tool for free text answers against a specimen answer
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Free Text Marking Helper\n",
"\n",
"One of the issues faced by script markers is checking free text answers against a canned answer.\n",
"\n",
"The following recipe provides a quick and dirty similarity check between a provided answer and a specimen answer using a small `spacy` natural language model, `en_core_web_sm`.\n",
"\n",
"Performance of the similarity matcher may be improved by using a larger model (e.g. the medium size `en_core_web_md` model or the full `en_core_web_lg` model), but these are rather slower to download, install and run.\n",
"\n",
"In an *nbgrader* context, we can do something like the following in a question cell defined as part of a manually graded test:\n",
"\n",
"\n",
"\n",
"```python\n",
"# Add your commentary here\n",
"\n",
"#For now, the specimen answer needs to be assigned to a specific variable\n",
"#It would make more sense to define an %%assignment_text block magic that lets you write some\n",
"# text in a code cell and then let the magic assign this to a variable that can then be automatically tested.\n",
"\n",
"answer_txt = ''' \n",
"YOUR ANSWER TEXT\n",
"'''\n",
"```\n",
"\n",
"and then in the associated test cell assign the specimen answer to `___specimen_txt`:\n",
"\n",
"```python\n",
"\n",
"### BEGIN HIDDEN TESTS\n",
"___specimen_txt = '''\n",
"As much English wine's consumed as each of Bordeaux red and Burgundy white.\n",
"The biggest spike is on the Reception Wines, and we don't really know anything\n",
" about the provenance of those wines.\n",
"\n",
"So given the global pre-eminence of French wines, maybe HoC\n",
" aren't doing too bad a job of making the case for English\n",
" and Welsh.\n",
"'''\n",
"\n",
"lang_test(___specimen_txt, answer_txt)\n",
"\n",
"### END HIDDEN TESTS\n",
"\n",
"```\n",
"\n",
"Once again, it may make more sense to define some magic to handle this, such as a `%%mark_against_specimen` block magic that could take a block of answer text, use it as the basis of comparison, and display or return the result directly.\n",
"\n",
"The marker can then review the automated marking support similarity grid and assign an appropriate manual mark.\n",
"\n",
"It might also be worth considering whether similarity marks should be fed back to the student or used in support of generating (or at least drafting) canned feedback."
]
},
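{
"cell_type": "markdown",
"metadata": {},
"source": [
"The suggested magic might be sketched along the following lines (hypothetical: the `%%mark_against_specimen` name is an assumption, and the sketch assumes the `lang_test()` function defined below is in scope):\n",
"\n",
"```python\n",
"from IPython.core.magic import register_cell_magic\n",
"\n",
"@register_cell_magic\n",
"def mark_against_specimen(line, cell):\n",
"    # line: name of a variable in the user namespace holding the specimen text\n",
"    # cell: the block of answer text to be marked\n",
"    specimen_txt = get_ipython().user_ns[line.strip()]\n",
"    lang_test(specimen_txt, cell)\n",
"```\n",
"\n",
"The answer text could then be written directly in a code cell headed by the magic, rather than assigned to a specific variable."
]
},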
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Essential Imports\n",
"\n",
"Import the necessary model (note: `spacy` is required for the similarity checker and `pandas` is used, overkill style, to display the results)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#!pip install spacy\n",
"#!pip install pandas\n",
"import spacy\n",
"try:\n",
" import en_core_web_sm\n",
"except ImportError:\n",
" import spacy.cli\n",
" spacy.cli.download(\"en_core_web_sm\")\n",
" import en_core_web_sm\n",
"\n",
"import warnings\n",
"from spacy.errors import ModelsWarning\n",
"warnings.filterwarnings(\"ignore\",category=ModelsWarning)\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"#Loading the model may take some time\n",
"nlp = en_core_web_sm.load()"
]
},
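{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the small model's similarity scores prove too coarse, a larger model can be swapped in using the same try/download/import pattern (a sketch; note that the `en_core_web_md` download is considerably larger):\n",
"\n",
"```python\n",
"try:\n",
"    import en_core_web_md\n",
"except ImportError:\n",
"    import spacy.cli\n",
"    spacy.cli.download(\"en_core_web_md\")\n",
"    import en_core_web_md\n",
"\n",
"# The rest of the notebook works unchanged against the reassigned nlp object\n",
"nlp = en_core_web_md.load()\n",
"```"
]
},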
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test and Report\n",
"\n",
"Generate a couple of reports:\n",
"\n",
"- an overall similarity score between the answer text and the provided specimen answer;\n",
"- a similarity score between each sentence in the answer text and each sentence in the specimen.\n",
"\n",
"\n",
"We can add a threshold value to the marker so that only sentence pairs whose similarity score exceeds that minimum are displayed. (This can help where a supplied answer is spread over several sentences compared to the specimen answer.)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import HTML\n",
"\n",
"def lang_test(__specimen, answer_txt, threshold=0, show_side_by_side=True, retval=False):\n",
"\n",
" ___specimen = nlp(__specimen.replace('\\n',' '))\n",
" ___answer = nlp(answer_txt.replace('\\n',' '))\n",
"\n",
" # It doesn't matter which order we present the parsed sentences for similarity scoring\n",
" display(HTML('<h3>Similarity score overall</h3><p>Overall similarity score: {}</p>'.format(___specimen.similarity(___answer))))\n",
"\n",
" # Another approach may be to split answer into sentences,\n",
" # and the compare each sentence against each sentence in the specimen,\n",
" # perhaps retaining the best match score for each answer sentence?\n",
" ___sentence_scores = pd.DataFrame(columns=['Answer', 'Similarity', 'Specimen',\n",
" 'Answer (no stopwords)', 'Specimen (no stopwords)'])\n",
" \n",
" ___sentence_full_scores = pd.DataFrame(columns=['Answer', 'Similarity Overall'])\n",
"\n",
" for ___answer_sent in ___answer.sents:\n",
" ___answer_sent_nostop = nlp(' '.join([token.text for token in ___answer_sent \n",
" if not token.is_stop and not token.is_punct]))\n",
" \n",
" ___sentence_full_scores = pd.concat([___sentence_full_scores,\n",
" pd.DataFrame({'Answer':[___answer_sent.text],\n",
" 'Similarity Overall':[___specimen.similarity(___answer_sent_nostop)]\n",
" })])\n",
" \n",
" for ___specimen_sent in ___specimen.sents:\n",
" ___specimen_sent_nostop = nlp(' '.join([token.text for token in ___specimen_sent \n",
" if not token.is_stop and not token.is_punct]))\n",
" #print('\\nComparing:', ___answer_sent, '\\nvs.\\n', ___specimen_sent,\n",
" # '\\nMark:\\t',___specimen_sent.similarity(___answer_sent) )\n",
" ___sentence_scores = pd.concat([___sentence_scores,\n",
" pd.DataFrame({'Answer':[___answer_sent.text],\n",
" 'Similarity': [___specimen_sent_nostop.similarity(___answer_sent_nostop)],\n",
" 'Specimen': [___specimen_sent.text],\n",
" 'Answer (no stopwords)': [___answer_sent_nostop.text],\n",
" 'Specimen (no stopwords)': [___specimen_sent_nostop.text]\n",
" })])\n",
" \n",
" ___pd_default_colwidth = pd.get_option('display.max_colwidth')\n",
" pd.set_option('display.max_colwidth', None)\n",
" \n",
" if show_side_by_side:\n",
" display(HTML('''<table><thead><td>Provided</td><td>Specimen</td></thead>\n",
" <tr><td>{}</td><td>{}</td></tr></table>'''.format(answer_txt,__specimen)))\n",
" \n",
" display(HTML('<h3>Sentence level similarity with full specimen answer</h3>'))\n",
" display(___sentence_full_scores[___sentence_full_scores['Similarity Overall']>threshold])\n",
" \n",
" display(HTML('<h3>Sentence level matching</h3>'))\n",
" ___sentence_scores.set_index(['Answer', 'Specimen'], inplace=True)\n",
" display(___sentence_scores[___sentence_scores['Similarity']>threshold])\n",
" \n",
" \n",
" pd.set_option('display.max_colwidth', ___pd_default_colwidth)\n",
" \n",
" if retval:\n",
" #Return the score tables rather than the pandas module\n",
" return ___sentence_full_scores, ___sentence_scores"
]
},
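{
"cell_type": "markdown",
"metadata": {},
"source": [
"One refinement suggested in the comments above is to retain just the best-matching specimen sentence for each answer sentence. A minimal sketch (a hypothetical `best_matches()` helper, assuming `nlp` has been loaded as above):\n",
"\n",
"```python\n",
"def best_matches(specimen_txt, answer_txt):\n",
"    specimen = nlp(specimen_txt.replace('\\n', ' '))\n",
"    answer = nlp(answer_txt.replace('\\n', ' '))\n",
"    # For each answer sentence, keep the highest scoring specimen sentence;\n",
"    # max() compares the (score, text) tuples on the score first\n",
"    return [(a.text, max((a.similarity(s), s.text) for s in specimen.sents))\n",
"            for a in answer.sents]\n",
"```\n",
"\n",
"This would give one row per answer sentence, which may be easier for a marker to scan than the full cross-product grid."
]
},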
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"answer_txt = '''\n",
"Reception wine was the most consumed, but we donlt know where that comes from.\n",
"\n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux.\n",
"'''"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<h3>Similarity score overall</h3><p>Overall similarity score: 0.81345909359606</p>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<table><thead><td>Provided</td><td>Specimen</td></thead>\n",
" <tr><td>\n",
"Reception wine was the most consumed, but we donlt know where that comes from.\n",
"\n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux.\n",
"</td><td>\n",
"As much English wine's consumed as each of Bordeaux red and Burgundy white.\n",
"The biggest spike is on the Reception Wines, and we don't really know anything\n",
" about the provenance of those wines.\n",
"\n",
"So given the global pre-eminence of French wines, maybe HoC\n",
" aren't doing too bad a job of making the case for English\n",
" and Welsh.\n",
"</td></tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<h3>Sentence level similarity with full specimen answer</h3>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Answer</th>\n",
" <th>Similarity Overall</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Reception wine was the most consumed, but we donlt know where that comes from.</td>\n",
" <td>0.527520</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux.</td>\n",
" <td>0.558373</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Answer \\\n",
"0 Reception wine was the most consumed, but we donlt know where that comes from. \n",
"0 English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. \n",
"\n",
" Similarity Overall \n",
"0 0.527520 \n",
"0 0.558373 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<h3>Sentence level matching</h3>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Similarity</th>\n",
" <th>Answer (no stopwords)</th>\n",
" <th>Specimen (no stopwords)</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Answer</th>\n",
" <th>Specimen</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">Reception wine was the most consumed, but we donlt know where that comes from.</th>\n",
" <th>As much English wine's consumed as each of Bordeaux red and Burgundy white.</th>\n",
" <td>0.656954</td>\n",
" <td>Reception wine consumed donlt know comes</td>\n",
" <td>English wine consumed Bordeaux red Burgundy white</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines.</th>\n",
" <td>0.637424</td>\n",
" <td>Reception wine consumed donlt know comes</td>\n",
" <td>biggest spike Reception Wines know provenance wines</td>\n",
" </tr>\n",
" <tr>\n",
" <th>So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh.</th>\n",
" <td>0.722088</td>\n",
" <td>Reception wine consumed donlt know comes</td>\n",
" <td>given global pre eminence French wines maybe HoC bad job making case English Welsh</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux.</th>\n",
" <th>As much English wine's consumed as each of Bordeaux red and Burgundy white.</th>\n",
" <td>0.835747</td>\n",
" <td>English Welsh wines consumed followed white Burgundy red Bordeaux</td>\n",
" <td>English wine consumed Bordeaux red Burgundy white</td>\n",
" </tr>\n",
" <tr>\n",
" <th>The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines.</th>\n",
" <td>0.443787</td>\n",
" <td>English Welsh wines consumed followed white Burgundy red Bordeaux</td>\n",
" <td>biggest spike Reception Wines know provenance wines</td>\n",
" </tr>\n",
" <tr>\n",
" <th>So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh.</th>\n",
" <td>0.671232</td>\n",
" <td>English Welsh wines consumed followed white Burgundy red Bordeaux</td>\n",
" <td>given global pre eminence French wines maybe HoC bad job making case English Welsh</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Similarity \\\n",
"Answer Specimen \n",
" Reception wine was the most consumed, but we donlt know where that comes from. As much English wine's consumed as each of Bordeaux red and Burgundy white. 0.656954 \n",
" The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. 0.637424 \n",
" So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. 0.722088 \n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. As much English wine's consumed as each of Bordeaux red and Burgundy white. 0.835747 \n",
" The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. 0.443787 \n",
" So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. 0.671232 \n",
"\n",
" Answer (no stopwords) \\\n",
"Answer Specimen \n",
" Reception wine was the most consumed, but we donlt know where that comes from. As much English wine's consumed as each of Bordeaux red and Burgundy white. Reception wine consumed donlt know comes \n",
" The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. Reception wine consumed donlt know comes \n",
" So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. Reception wine consumed donlt know comes \n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. As much English wine's consumed as each of Bordeaux red and Burgundy white. English Welsh wines consumed followed white Burgundy red Bordeaux \n",
" The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. English Welsh wines consumed followed white Burgundy red Bordeaux \n",
" So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. English Welsh wines consumed followed white Burgundy red Bordeaux \n",
"\n",
" Specimen (no stopwords) \n",
"Answer Specimen \n",
" Reception wine was the most consumed, but we donlt know where that comes from. As much English wine's consumed as each of Bordeaux red and Burgundy white. English wine consumed Bordeaux red Burgundy white \n",
" The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. biggest spike Reception Wines know provenance wines \n",
" So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. given global pre eminence French wines maybe HoC bad job making case English Welsh \n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. As much English wine's consumed as each of Bordeaux red and Burgundy white. English wine consumed Bordeaux red Burgundy white \n",
" The biggest spike is on the Reception Wines, and we don't really know anything about the provenance of those wines. biggest spike Reception Wines know provenance wines \n",
" So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. given global pre eminence French wines maybe HoC bad job making case English Welsh "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"___specimen_txt = '''\n",
"As much English wine's consumed as each of Bordeaux red and Burgundy white.\n",
"The biggest spike is on the Reception Wines, and we don't really know anything\n",
" about the provenance of those wines.\n",
"\n",
"So given the global pre-eminence of French wines, maybe HoC\n",
" aren't doing too bad a job of making the case for English\n",
" and Welsh.\n",
"'''\n",
"\n",
"lang_test(___specimen_txt, answer_txt)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<h3>Similarity score overall</h3><p>Overall similarity score: 0.81345909359606</p>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<table><thead><td>Provided</td><td>Specimen</td></thead>\n",
" <tr><td>\n",
"Reception wine was the most consumed, but we donlt know where that comes from.\n",
"\n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux.\n",
"</td><td>\n",
"As much English wine's consumed as each of Bordeaux red and Burgundy white.\n",
"The biggest spike is on the Reception Wines, and we don't really know anything\n",
" about the provenance of those wines.\n",
"\n",
"So given the global pre-eminence of French wines, maybe HoC\n",
" aren't doing too bad a job of making the case for English\n",
" and Welsh.\n",
"</td></tr></table>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<h3>Sentence level similarity with full specimen answer</h3>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Answer</th>\n",
" <th>Similarity Overall</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [Answer, Similarity Overall]\n",
"Index: []"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<h3>Sentence level matching</h3>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Similarity</th>\n",
" <th>Answer (no stopwords)</th>\n",
" <th>Specimen (no stopwords)</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Answer</th>\n",
" <th>Specimen</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Reception wine was the most consumed, but we donlt know where that comes from.</th>\n",
" <th>So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh.</th>\n",
" <td>0.722088</td>\n",
" <td>Reception wine consumed donlt know comes</td>\n",
" <td>given global pre eminence French wines maybe HoC bad job making case English Welsh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux.</th>\n",
" <th>As much English wine's consumed as each of Bordeaux red and Burgundy white.</th>\n",
" <td>0.835747</td>\n",
" <td>English Welsh wines consumed followed white Burgundy red Bordeaux</td>\n",
" <td>English wine consumed Bordeaux red Burgundy white</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Similarity \\\n",
"Answer Specimen \n",
" Reception wine was the most consumed, but we donlt know where that comes from. So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. 0.722088 \n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. As much English wine's consumed as each of Bordeaux red and Burgundy white. 0.835747 \n",
"\n",
" Answer (no stopwords) \\\n",
"Answer Specimen \n",
" Reception wine was the most consumed, but we donlt know where that comes from. So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. Reception wine consumed donlt know comes \n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. As much English wine's consumed as each of Bordeaux red and Burgundy white. English Welsh wines consumed followed white Burgundy red Bordeaux \n",
"\n",
" Specimen (no stopwords) \n",
"Answer Specimen \n",
" Reception wine was the most consumed, but we donlt know where that comes from. So given the global pre-eminence of French wines, maybe HoC aren't doing too bad a job of making the case for English and Welsh. given global pre eminence French wines maybe HoC bad job making case English Welsh \n",
"English and Welsh wines are the next most consumed, followed by white Burgundy and red Bordeaux. As much English wine's consumed as each of Bordeaux red and Burgundy white. English wine consumed Bordeaux red Burgundy white "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"lang_test(___specimen_txt, answer_txt, 0.7)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Create Assignment",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}