jonathaneunice/DedentDeadCode.ipynb Secret

## DedentDeadCode.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`textwrap` has almost 100% test coverage. Close, but no cigar. Several key bits of logic remain untested. I set out to fix this. I've submitted PRs closing 2 of the remaining 3 gaps ([bpo-30591](https://github.com/python/cpython/pull/1988) and [bpo-30603](https://github.com/python/cpython/pull/2014)). Just one small segment left to go! But, studying that segment, I discovered it's bogus logic. It can never execute. \n",
    "\n",
    "Below is a deductive proof of this claim, plus the results of emprical checks. The associated PR  deletes those two spurious lines."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The key `textwrap.dedent()` code in question, with logical names superimposed. Other than names, exactly as found in [textwrap in the CPython repo](https://github.com/python/cpython/blob/5edf827c8052958b9d293f75ce8d93b66c1d58da/Lib/textwrap.py#L432-L454).\n",
    "    \n",
    "![code](https://content.screencast.com/users/jonathaneunice/folders/Jing/media/78db0983-f3ef-49d2-ba56-275ea0f6e78d/00000577.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deductive Proof\n",
    "\n",
    "All of the conditions that could lead to the Clause 4 loop `else` are precluded. Here are the steps to that conclusion, using `typing` notation.\n",
    "\n",
    "* `margin == None` to start, and by the end of the first iteration of the main loop, has narrowed from `margin: str | None` to `margin: str` (namely, `indents[0]`).\n",
    "* `indents: List[str]` because it's assigned from `re.findall() -> List[str]`.\n",
    "* `indent: str` because it's assigned iteratively via `for` from `indents: List[str]`.\n",
    "* Clause 2 and Clause 3 definitively handle the cases where `margin` or `indent` are prefixes of one another.\n",
    "* Therefore by Clause 4, neither `margin` nor `indent` is a prefix of the other. \n",
    "* By Clause 4, `margin != '' and indent != ''`. Both contain characters and have a non-zero length. Had either been an empty string, it would have been considered a legitimate (zero-length) prefix of the other, and handled by either Clause 2 or Clause 3. `s.startswith('') == True` for any `s: str`. \n",
    "* Because `margin` and `indent` are both non-empty strings, `zip`  will pairwise iterate down their respective characters to the length of the shortest. Let's call that point `k` where `k = min(len(margin), len(indent))`. The Clause 4 loop will iterate `i` through `range(k)`, setting `x == margin[i] and y == indent[i]` at each step. \n",
    "* At some point, `x` and `y` **must** differ. If they did not, either `margin` would be a proper prefix of `indent`, or `indent` would be a proper prefix of `margin`, which we have already established would handled in Clause 2 or 3, and would not enter Clause 4. So there will always be an `0 <= i < k` such that `x != y`. The loop will therefore **always** exit via `break`. Because the loop exit via `break`, the `else` of the Clause 4 `for` can never be executed.\n",
    "* The only way a `for` loop `else` clause can be triggered is exhaustion of the iteration. Either (a) its iterable is empty at the outset, so the `else` is immediately triggered, or (b) the iterable is non-empty and the loop naturally exhausts it through iteration without a `break`. We have shown that neither (a) nor (b) can happen. (a) cannot happen because neither `margin` nor `indent` are empty strings. There will always be some characters to co-iterate over. But neither can (b) happen because we've shown that there must be some point at which `x` and `y` differ (else one would have been a substring, and previously handled prior to Clause 4).\n",
    "* Ergo the Clause 4 `else: margin = margin[:len(indent)]` is dead code, which onlly appears to be live code. It can never execute."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Randomized and Empirical Testing\n",
    "\n",
    "Humbly remembering that even apparent proofs are occasionally faulty, and not wanting to be caught out or surprised by any magical or oddball cases that escaped the above logic, I ran two kinds of empirical verification.\n",
    "\n",
    "I added a `raise RuntimeError` to the suspect `else`.\n",
    "\n",
    "First I randomly generated texts with varying amounts and composition of leading whitespace, and called `dedent()` on those, looking to trigger the exception. After 500 million attempts, the exception was never triggered.\n",
    "\n",
    "I separately ran `dedent` on the contents of text-ish files from my development system. These include numerous files from Web (HTML, CSS, JS, SVG, Less, SCSS, ...), JavaScript, Java, Python, Perl, PHP, Ruby, SQL, XML, Unix (shell, config, logging...), data and configuration (JSON, YAML, INI, ...), authoring (Markdown, RST, TXT...), C, and DevOps communities.\n",
    "None of those >1.2 million files triggered the exception. \n",
    "\n",
    "It's impossible to absolutely \"prove a negative\" through empirical means. One can always imagine testing against a larger random or actual corpus. But that such large bodies of likely suspects never jiggled the tripwire adds confidence that the suspect code lines simply never execute under any input conditions, just as the proof says."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "Having tried constructively to identify any cases where the `else` in the Clause 4 loop would execute, I've deductively shown it never will. Arguing in the alternative, I tried large bodies of both randomized and empirical data to find a counter-example; after extensive tests, no counter-examples were found. I therefore assert it is dead code, can never be tested or executed, and should be removed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"`textwrap` has almost 100% test coverage. Close, but no cigar. Several key bits of logic remain untested. I set out to fix this. I've submitted PRs closing 2 of the remaining 3 gaps ([bpo-30591](https://github.com/python/cpython/pull/1988) and [bpo-30603](https://github.com/python/cpython/pull/2014)). Just one small segment left to go! But, studying that segment, I discovered it's bogus logic. It can never execute. \n",
	"\n",
	"Below is a deductive proof of this claim, plus the results of emprical checks. The associated PR deletes those two spurious lines."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The key `textwrap.dedent()` code in question, with logical names superimposed. Other than names, exactly as found in [textwrap in the CPython repo](https://github.com/python/cpython/blob/5edf827c8052958b9d293f75ce8d93b66c1d58da/Lib/textwrap.py#L432-L454).\n",
	" \n",
	"![code](https://content.screencast.com/users/jonathaneunice/folders/Jing/media/78db0983-f3ef-49d2-ba56-275ea0f6e78d/00000577.png)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Deductive Proof\n",
	"\n",
	"All of the conditions that could lead to the Clause 4 loop `else` are precluded. Here are the steps to that conclusion, using `typing` notation.\n",
	"\n",
	"* `margin == None` to start, and by the end of the first iteration of the main loop, has narrowed from `margin: str \| None` to `margin: str` (namely, `indents[0]`).\n",
	"* `indents: List[str]` because it's assigned from `re.findall() -> List[str]`.\n",
	"* `indent: str` because it's assigned iteratively via `for` from `indents: List[str]`.\n",
	"* Clause 2 and Clause 3 definitively handle the cases where `margin` or `indent` are prefixes of one another.\n",
	"* Therefore by Clause 4, neither `margin` nor `indent` is a prefix of the other. \n",
	"* By Clause 4, `margin != '' and indent != ''`. Both contain characters and have a non-zero length. Had either been an empty string, it would have been considered a legitimate (zero-length) prefix of the other, and handled by either Clause 2 or Clause 3. `s.startswith('') == True` for any `s: str`. \n",
	"* Because `margin` and `indent` are both non-empty strings, `zip` will pairwise iterate down their respective characters to the length of the shortest. Let's call that point `k` where `k = min(len(margin), len(indent))`. The Clause 4 loop will iterate `i` through `range(k)`, setting `x == margin[i] and y == indent[i]` at each step. \n",
	"* At some point, `x` and `y` must differ. If they did not, either `margin` would be a proper prefix of `indent`, or `indent` would be a proper prefix of `margin`, which we have already established would handled in Clause 2 or 3, and would not enter Clause 4. So there will always be an `0 <= i < k` such that `x != y`. The loop will therefore always exit via `break`. Because the loop exit via `break`, the `else` of the Clause 4 `for` can never be executed.\n",
	"* The only way a `for` loop `else` clause can be triggered is exhaustion of the iteration. Either (a) its iterable is empty at the outset, so the `else` is immediately triggered, or (b) the iterable is non-empty and the loop naturally exhausts it through iteration without a `break`. We have shown that neither (a) nor (b) can happen. (a) cannot happen because neither `margin` nor `indent` are empty strings. There will always be some characters to co-iterate over. But neither can (b) happen because we've shown that there must be some point at which `x` and `y` differ (else one would have been a substring, and previously handled prior to Clause 4).\n",
	"* Ergo the Clause 4 `else: margin = margin[:len(indent)]` is dead code, which onlly appears to be live code. It can never execute."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Randomized and Empirical Testing\n",
	"\n",
	"Humbly remembering that even apparent proofs are occasionally faulty, and not wanting to be caught out or surprised by any magical or oddball cases that escaped the above logic, I ran two kinds of empirical verification.\n",
	"\n",
	"I added a `raise RuntimeError` to the suspect `else`.\n",
	"\n",
	"First I randomly generated texts with varying amounts and composition of leading whitespace, and called `dedent()` on those, looking to trigger the exception. After 500 million attempts, the exception was never triggered.\n",
	"\n",
	"I separately ran `dedent` on the contents of text-ish files from my development system. These include numerous files from Web (HTML, CSS, JS, SVG, Less, SCSS, ...), JavaScript, Java, Python, Perl, PHP, Ruby, SQL, XML, Unix (shell, config, logging...), data and configuration (JSON, YAML, INI, ...), authoring (Markdown, RST, TXT...), C, and DevOps communities.\n",
	"None of those >1.2 million files triggered the exception. \n",
	"\n",
	"It's impossible to absolutely \"prove a negative\" through empirical means. One can always imagine testing against a larger random or actual corpus. But that such large bodies of likely suspects never jiggled the tripwire adds confidence that the suspect code lines simply never execute under any input conditions, just as the proof says."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Conclusion\n",
	"\n",
	"Having tried constructively to identify any cases where the `else` in the Clause 4 loop would execute, I've deductively shown it never will. Arguing in the alternative, I tried large bodies of both randomized and empirical data to find a counter-example; after extensive tests, no counter-examples were found. I therefore assert it is dead code, can never be tested or executed, and should be removed."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.1"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}