Skip to content

Instantly share code, notes, and snippets.

@natec425
Last active March 7, 2017 17:47
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save natec425/10013230 to your computer and use it in GitHub Desktop.
Save natec425/10013230 to your computer and use it in GitHub Desktop.
{
"metadata": {
"name": "",
"signature": "sha256:71de6f77b06d248aca60a5b61bbff3f9715f168edb8f07fc124ce7fd39612271"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dynamic Programming with the Typesetting Problem\n",
"--------------------------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this assignment, you will write a program that breaks a paragraph of words into lines of text. It is called the type setting problem. An effective heuristic that makes the paragraph look attractive is to minimize the sum of the squares of \"slack\" for each line except the last line.\n",
"Your assignment is to use dynamic programming to do this."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To test my solutions as I go, I will be using the known problem fitting the words 'aa', 'bb', 'cc', 'dd', 'eeeeeeee', and a max line length of 8."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"words = ['aa', 'bb', 'cc', 'dd', 'eeeeeeee']\n",
"max_length = 8"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, We'll create a tool to calculate the length of a line that contains <b>words</b><sub><em>i</em></sub> through <b>words</b><sub><em>j</em></sub> where words indexed from 1 to N.\n",
"\n",
"\\begin{equation*} length(i,j) = (j - i) + \\sum\\limits_{k=i}^j len(words_k) \\end{equation*}"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def line_length(words, i, j):\n",
" # insert your code here\n",
"\n",
"print(line_length(words, 1, 1), 'should be 2')\n",
"print(line_length(words, 1, 2), 'should be 5')\n",
"print(line_length(words, 2, 4), 'should be 8')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have a simple, expressive way to calculate the line length of <b>words</b><sub><em>i</em></sub> to <b>words</b><sub><em>j</em></sub>. We can use that tool to calculate the trailing spaces produced by <b>words</b><sub><em>i</em></sub> to <b>words</b><sub><em>j</em></sub> when given a maximum line length.\n",
"\\begin{equation*}slack(i, j) = maxLength - length(i, j)\\end{equation*}"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def slack(words, max_length, i, j):\n",
" # insert your code here\n",
"\n",
"print(slack(words, max_length, 1, 1), 'should be 6')\n",
"print(slack(words, max_length, 1, 3), 'should be 0')\n",
"print(slack(words, max_length, 2, 4), 'should be 0')\n",
"print(slack(words, max_length, 5, 5), 'should be 0')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With our newly created tools, calculating the cost, according to our heuristic, of putting <b>words</b><sub><em>i</em></sub> to <b>words</b><sub><em>j</em></sub> on a single line of a given maximum length is simple.\n",
"\n",
"Our heuristic states that the cost of putting <b>words</b><sub><em>i</em></sub> to <b>words</b><sub><em>j</em></sub> on a single line of a given maximum length is as follows:\n",
"\n",
"- <em>if the length of <b>words</b><sub>i</sub> to <b>words</b><sub>j</sub> is greater than the maximum line length:\n",
" the cost is infinitely high (because it is an invalid option)</em>\n",
" \n",
"- <em>or, if the last word is contained in the line: the cost is zero (because it is the last line)</em>\n",
" \n",
"- <em>otherwise: the cost is the square of the slack of putting <b>words</b><sub><em>i</em></sub> to <b>words</b><sub><em>j</em></sub> on a single line of the provided maximum length</em>\n",
"\n",
"\\begin{equation*} cost(i, j) = \\begin{cases} \\infty & : length(i,j) > maxLength \\\\\n",
" 0 & : j = N \\\\\n",
" slack(i,j)^2 & : otherwise \\end{cases} \\end{equation*}"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def cost(words, max_length, i, j):\n",
" # insert your code here\n",
"\n",
"print(cost(words, max_length, 1, 1), 'should be 36')\n",
"print(cost(words, max_length, 2, 3), 'should be 9')\n",
"print(cost(words, max_length, 5, 5), 'should be 0')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Now we have the tools to begin creating our dynamic programming tables and reconstructing our solution from the table."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It can be tricky to see how to calculate the optimal cost of <em><b>j</b></em> words. So, let's look at a piece of our known case.\n",
"\n",
"The optimal cost of ['aa', 'bb', 'cc', 'dd'] can be thought of the minimum of the following options:\n",
"\n",
"- (starting from 4) the optimal cost of ['aa', 'bb', 'cc'] + ['dd']\n",
"- (starting from 3) the optimal cost of ['aa',' bb'] + the cost of ['cc', 'dd']\n",
"- (starting from 2) the optimal cost of ['aa'] + the cost of ['bb', 'cc', 'dd']\n",
"- (starting from 1)the cost of ['aa', 'bb', 'cc', 'dd']\n",
"\n",
"Now notice what we have done. We have set out our options as the cost of representing <b>words</b><sub><em>1..(i-1)</em></sub> + the cost of putting <b>words</b><sub><em>i..j</em></sub> on the final line. And this is exactly what we want. We just want the minimum of these options. So to put numbers to this example:\n",
"\n",
"- (starting from 4) the optimal cost of ['aa', 'bb', 'cc'] + the cost of ['dd'] = 0 + 36 = 36\n",
"- (starting from 3) the optimal cost of ['aa', 'bb'] + the cost of ['cc', 'dd'] = 9 + 9 = 18\n",
"- (starting from 2) the optimal cost of ['aa'] + the cost of ['bb', 'cc', 'dd'] = 36 + 0 = 36\n",
"- (starting from 1) the cost of ['aa', 'bb', 'cc', 'dd'] = infinity\n",
"\n",
"So in our known case, the optimal cost of ['aa', 'bb', 'cc', 'dd'] is 18 and the subproblem that produced is 3 (<b>words</b><sub><em>3..4</em></sub> on the last line). This is exactly what we wanted, and this logic extends to the other cases.\n",
"\n",
"To be as concise as possible (formula from original assignment document):\n",
" \n",
" \\begin{equation*}c(j) = \\min_{1 \\le i \\le j} \\{c(i-1) + c(i, j)\\}\\end{equation*}\n",
"\n",
"*Implementation Note: This function should return both the optimal cost, <b>and</b> the number of the subproblem that produced the optimal cost.*"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def word_wrap_cost(table, words, max_length, j):\n",
" # insert your code here\n",
"\n",
"print(word_wrap_cost([0], words, max_length, 1), 'should be (1, 36)')\n",
"print(word_wrap_cost([0, 36], words, max_length, 2), 'should be (1, 9)')\n",
"print(word_wrap_cost([0, 36, 9], words, max_length, 3), 'should be (1, 0)')\n",
"print(word_wrap_cost([0, 36, 9, 0], words, max_length, 4), 'should be (3, 18)')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It may not be trivial to see, but we have 2 tables to build. We want to optimize our costs, so we need a table for the costs. However, we also want to know where we should insert a line break. So, as we are recording the optimal cost for a given subproblem, we record the optimal cost <em><b>and</b></em> the subproblem that produced the optimal cost. Recording the optimal costs is where we really leverage dynamic programming, but it is the means of finding the optimal line breaks. Once we have the optimal line breaks (our desired information), the costs table has served its purpose, and we return the optimal line breaks."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def word_wrap_breaks(words, max_length):\n",
" # insert your code here\n",
"\n",
"print(word_wrap_breaks(words, max_length), 'should be [1, 1, 1, 3, 5]')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have our table of optimal breaks. We have to reconstruct <em><b>the breaks that produced the final solution</b></em>. We can do that by working backwards through our table. Let's look at our known case again. \n",
"\n",
"The optimal breaks for ['aa', 'bb', 'cc', 'dd', 'eeeeeeee'] are [1, 1, 1, 3, 5]. This means that the optimal breaks for each subset are as follows:\n",
"\n",
"- 1: ['aa'] = 1 (<b>words</b><sub><em>1</em></sub> to <b>words</b><sub><em>1</em></sub> on the last line)\n",
"- 2: ['aa', 'bb'] = 1 (<b>words</b><sub><em>1</em></sub> to <b>words</b><sub><em>2</em></sub> on the last line)\n",
"- 3: ['aa', 'bb', 'cc'] = 1 (<b>words</b><sub><em>1</em></sub> to <b>words</b><sub><em>3</em></sub> on the last line)\n",
"- 4: ['aa', 'bb', 'cc', 'dd' ] = 3 (<b>words</b><sub><em>3</em></sub> to <b>words</b><sub><em>4</em></sub> on the last line)\n",
"- 5: ['aa', 'bb', 'cc', 'dd', 'eeeeeeee'] = 5 (<b>words</b><sub><em>5</em></sub> to <b>words</b><sub><em>5</em></sub> on the last line)\n",
"\n",
"\n",
"To reconstruct the breaks, we move from <b>breaks</b>[x] to <b>breaks</b>[breaks[x]]. This produces the flow from 5 >> 3 >> 1. This means that our expected output should be the list [1, 3, 5]. At this point, most of our work is done. We now know:\n",
"\n",
"- the first line starts at <b>words</b><sub><em>1</em></sub>\n",
"- the second line starts at <b>words</b><sub><em>3</em></sub>\n",
"- the final line starts at <b>words</b><sub><em>5</em></sub>\n",
"\n",
"This could also be thought of as each line being\n",
"\n",
"- <b>words</b><sub><em>1..(3-1)</em></sub> or ['aa', 'bb']\n",
"- <b>words</b><sub><em>3..(5-1)</em></sub> or ['cc', 'dd']\n",
"- <b>words</b><sub><em>5..end</em></sub> or ['eeeeeeee']\n",
"\n",
"This illustrates how each break serves as a sort of bookend for line."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def solve_word_wrap_table(breaks):\n",
" # insert your code here\n",
"\n",
"print(solve_word_wrap_table(word_wrap_breaks(words, max_length)), 'should be [1, 3, 5]')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally\n",
"-------\n",
"Our problem is almost solved. Now, all we have to do is string together <b>words</b><sub><em>breaks[x]..breaks[x+1]</em></sub> with some spaces for every x where 1 <= x <= (n<sub><em>breaks</em></sub> - 1). With all of our lines in tact, we simply put them each on a line in their respective order."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def typeset(words, max_length):\n",
" # insert your code here\n",
"\n",
"print(typeset(words, max_length))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(typeset(open('DeclarationOfIndependence(part)(1).txt').read().split(), 40))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(typeset(open(\"MSUHonorCode.txt\").read().split(), 40))"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment