mapio/ww.ipynb

## ww.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A simple monotonic grammar for the repeated word language"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is well known that the language $L = \\left\\{ww | w \\in \\{a, b\\}^+ \\right\\}$ is *context-sensitive*, yet many examples of grammars for $L$ (provided in textbooks and on-line) are actually *unrestricted* grammars containing $\\varepsilon$-rules instead of *context-sensitive* grammars.\n",
    "\n",
    "Leveraging the ideas behind such examples, here I design a simple *monotonic* grammar for $L$ in [Kuroda normal form](https://en.wikipedia.org/wiki/Kuroda_normal_form) that can be easily translated to a *context-sensitive* grammar using the György Révész transformation.\n",
    "\n",
    "One of the simplest and more common approach to obtain $L$ is to generate $ww^R$ and then reverse the second half of the word. More precisely, the reversed part is generated as a non-terminal word $W \\in \\{A, B\\}^+$; from $wW^R$ a set of \"swapping\" productions of the kind $xX \\rightarrow Xx$ are then used to reverse the non termnial part (and $A$ and $B$ are turned to $a$ and $b$). So far, so good: the grammar is in Kuroda normal form and it would be a straightforward exercise to turn it to a *context-free* grammar.\n",
    "\n",
    "But terminal symbols from the second half of the word should not be mixed with ones from the first half by the swapping productions: a nonterminal *marker* is usually added (to generate $w\\$W^R$) so that swapping happens just to the right of it. Such marker is eventually disposed with a $\\varepsilon$-rule. Eliminating such rule (that makes the grammar *unrestricted* and not *context-free*, nor *monotone*) requires a bit more ingenuity, since there is no well know textbook transformation for the general case.\n",
    "\n",
    "Observe on passing that similar approaches put the marker at the and and reverse the swapping productions, or use two markers moving towards each other (initially at the middle and end of the reversed nonterminal word) and dispose of them when they get to touch. But shrinking the marker (or markers) seems to be ubiquitous in all suggested solutions.\n",
    " \n",
    "To cut a long story short, one of the most suggested and simple *unrestricted* grammars $G$ for $L$ is given by the set of productions\n",
    "$$\n",
    "\\begin{align*} \n",
    "S &\\rightarrow a S A | b S B | \\$ \\\\\n",
    "\\$ &\\rightarrow \\varepsilon \\\\\n",
    "\\$ A &\\rightarrow \\$ a \\\\\n",
    "\\$ B &\\rightarrow \\$ b \\\\\n",
    "a A &\\rightarrow A a \\\\\n",
    "a B &\\rightarrow B a \\\\\n",
    "b A &\\rightarrow A b \\\\\n",
    "b B &\\rightarrow B b \\\\\n",
    "\\end{align*}\n",
    "$$\n",
    "\n",
    "It is easy to check that this works, informally: \n",
    "\n",
    "* the productions involving $S$ generate $w\\$W^R$,\n",
    "* the marker can only be disposed as the last nonterminal (otherwise $A$\n",
    "  and $B$ will never be removed from the sentential form),\n",
    "* only the leftmost $A$ or $B$ can be converted to a teminal (when it is\n",
    "  next to the marker), such terminal can then only travel to the right,\n",
    "  possibly pushing other nonterminal to the left, but never corss any\n",
    "  other terminal.\n",
    "\n",
    "So how to dispose of the marker without shrinking?\n",
    "\n",
    "The idea is to turn the $\\varepsilon$-rule to $\\$\\rightarrow a|b$. To keep the final word of even length, one can use two markers starting the swap process from $w\\$W^R\\$$, but turning *both* the occurences of the marker to the *same* terminal seems too much for a *context-sensitive* grammar. \n",
    "\n",
    "On the other hand, if one substitutes the first two lines of $G$ with\n",
    "\n",
    "$$\n",
    "\\begin{align*} \n",
    "S &\\rightarrow R $ \\\\\n",
    "R &\\rightarrow a R A | b R B | \\$ \\\\\n",
    "\\$ &\\rightarrow a\n",
    "\\end{align*}\n",
    "$$\n",
    "\n",
    "obtains a grammar $G_a$ that, as it is easy to check, generates $L_a = \\left\\{wawa | w \\in \\{a, b\\}^+ \\right\\}$. Similarly one can obtain a grammar $G_b$ for $L_b = \\left\\{wbwb | w \\in \\{a, b\\}^+ \\right\\}$. Since $L = L_a \\cup L_b$ deriving a *context-free* grammar for $L$ using $G_a$ and $G_b$ is again a straightforward exercise.\n",
    "\n",
    "Putting everithing together:\n",
    "\n",
    "$$\n",
    "\\begin{align*} \n",
    "S &\\rightarrow R_a\\$_a | R_b\\$_b \\\\\n",
    "R_a &\\rightarrow a R_a A | b R_a B | \\$_a \\\\\n",
    "R_b &\\rightarrow a R_b A | b R_b B | \\$_b \\\\\n",
    "\\$_a &\\rightarrow a \\\\\n",
    "\\$_b &\\rightarrow b \\\\\n",
    "\\$_a A &\\rightarrow \\$_a a \\\\\n",
    "\\$_a B &\\rightarrow \\$_a b \\\\\n",
    "\\$_b A &\\rightarrow \\$_b a \\\\\n",
    "\\$_b B &\\rightarrow \\$_b b \\\\\n",
    "a A &\\rightarrow A a \\\\\n",
    "a B &\\rightarrow B a \\\\\n",
    "b A &\\rightarrow A b \\\\\n",
    "b B &\\rightarrow B b \\\\\n",
    "\\end{align*}\n",
    "$$\n",
    "\n",
    "This grammar generates $L = L_a \\cup L_b$ and is in Kuroda normal form."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# A simple monotonic grammar for the repeated word language"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"It is well known that the language $L = \\left\\{ww \| w \\in \\{a, b\\}^+ \\right\\}$ is context-sensitive, yet many examples of grammars for $L$ (provided in textbooks and on-line) are actually unrestricted grammars containing $\\varepsilon$-rules instead of context-sensitive grammars.\n",
	"\n",
	"Leveraging the ideas behind such examples, here I design a simple monotonic grammar for $L$ in [Kuroda normal form](https://en.wikipedia.org/wiki/Kuroda_normal_form) that can be easily translated to a context-sensitive grammar using the György Révész transformation.\n",
	"\n",
	"One of the simplest and more common approach to obtain $L$ is to generate $ww^R$ and then reverse the second half of the word. More precisely, the reversed part is generated as a non-terminal word $W \\in \\{A, B\\}^+$; from $wW^R$ a set of \"swapping\" productions of the kind $xX \\rightarrow Xx$ are then used to reverse the non termnial part (and $A$ and $B$ are turned to $a$ and $b$). So far, so good: the grammar is in Kuroda normal form and it would be a straightforward exercise to turn it to a context-free grammar.\n",
	"\n",
	"But terminal symbols from the second half of the word should not be mixed with ones from the first half by the swapping productions: a nonterminal marker is usually added (to generate $w\\$W^R$) so that swapping happens just to the right of it. Such marker is eventually disposed with a $\\varepsilon$-rule. Eliminating such rule (that makes the grammar unrestricted and not context-free, nor monotone) requires a bit more ingenuity, since there is no well know textbook transformation for the general case.\n",
	"\n",
	"Observe on passing that similar approaches put the marker at the and and reverse the swapping productions, or use two markers moving towards each other (initially at the middle and end of the reversed nonterminal word) and dispose of them when they get to touch. But shrinking the marker (or markers) seems to be ubiquitous in all suggested solutions.\n",
	" \n",
	"To cut a long story short, one of the most suggested and simple unrestricted grammars $G$ for $L$ is given by the set of productions\n",
	"$$\n",
	"\\begin{align*} \n",
	"S &\\rightarrow a S A \| b S B \| \\$ \\\\\n",
	"\\$ &\\rightarrow \\varepsilon \\\\\n",
	"\\$ A &\\rightarrow \\$ a \\\\\n",
	"\\$ B &\\rightarrow \\$ b \\\\\n",
	"a A &\\rightarrow A a \\\\\n",
	"a B &\\rightarrow B a \\\\\n",
	"b A &\\rightarrow A b \\\\\n",
	"b B &\\rightarrow B b \\\\\n",
	"\\end{align*}\n",
	"$$\n",
	"\n",
	"It is easy to check that this works, informally: \n",
	"\n",
	"* the productions involving $S$ generate $w\\$W^R$,\n",
	"* the marker can only be disposed as the last nonterminal (otherwise $A$\n",
	" and $B$ will never be removed from the sentential form),\n",
	"* only the leftmost $A$ or $B$ can be converted to a teminal (when it is\n",
	" next to the marker), such terminal can then only travel to the right,\n",
	" possibly pushing other nonterminal to the left, but never corss any\n",
	" other terminal.\n",
	"\n",
	"So how to dispose of the marker without shrinking?\n",
	"\n",
	"The idea is to turn the $\\varepsilon$-rule to $\\$\\rightarrow a\|b$. To keep the final word of even length, one can use two markers starting the swap process from $w\\$W^R\\$$, but turning both the occurences of the marker to the same terminal seems too much for a context-sensitive grammar. \n",
	"\n",
	"On the other hand, if one substitutes the first two lines of $G$ with\n",
	"\n",
	"$$\n",
	"\\begin{align*} \n",
	"S &\\rightarrow R $ \\\\\n",
	"R &\\rightarrow a R A \| b R B \| \\$ \\\\\n",
	"\\$ &\\rightarrow a\n",
	"\\end{align*}\n",
	"$$\n",
	"\n",
	"obtains a grammar $G_a$ that, as it is easy to check, generates $L_a = \\left\\{wawa \| w \\in \\{a, b\\}^+ \\right\\}$. Similarly one can obtain a grammar $G_b$ for $L_b = \\left\\{wbwb \| w \\in \\{a, b\\}^+ \\right\\}$. Since $L = L_a \\cup L_b$ deriving a context-free grammar for $L$ using $G_a$ and $G_b$ is again a straightforward exercise.\n",
	"\n",
	"Putting everithing together:\n",
	"\n",
	"$$\n",
	"\\begin{align*} \n",
	"S &\\rightarrow R_a\\$_a \| R_b\\$_b \\\\\n",
	"R_a &\\rightarrow a R_a A \| b R_a B \| \\$_a \\\\\n",
	"R_b &\\rightarrow a R_b A \| b R_b B \| \\$_b \\\\\n",
	"\\$_a &\\rightarrow a \\\\\n",
	"\\$_b &\\rightarrow b \\\\\n",
	"\\$_a A &\\rightarrow \\$_a a \\\\\n",
	"\\$_a B &\\rightarrow \\$_a b \\\\\n",
	"\\$_b A &\\rightarrow \\$_b a \\\\\n",
	"\\$_b B &\\rightarrow \\$_b b \\\\\n",
	"a A &\\rightarrow A a \\\\\n",
	"a B &\\rightarrow B a \\\\\n",
	"b A &\\rightarrow A b \\\\\n",
	"b B &\\rightarrow B b \\\\\n",
	"\\end{align*}\n",
	"$$\n",
	"\n",
	"This grammar generates $L = L_a \\cup L_b$ and is in Kuroda normal form."
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.7.2"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}