tungwaiyip/turnpike.ipynb

## turnpike.ipynb
{
 "metadata": {
  "language": "Julia",
  "name": "turnpike"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "import Base.Collections: heapify!, heappush!, heappop!\nusing DataStructures\ninclude(\"gray_calhoun.jl\")\ninclude(\"io_util.jl\"); import IOUtil\n#include(\"bio_util.jl\"); import BioUtil\n#include(\"graph_util.jl\"); import GraphUtil",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 149
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "## The Turnpike Problem (2i)\n\nConsider\n\nSuppose there is $N$ increasing integers $ A_1 < A_2 , ... < A_n$, we are given the difference of between every pair of points. The Turnpike reconstruction problem is to reconstruct the point set $A$ from the distances of the form $|A_i \u2212 A_j|$.\n\nConsider this example\n\n A = [0 2 4 7 10] \n \n\u0394A = [-10 -8 -7 -6 -5 -4 -3 -3 -2 -2 0 0 0 0 0 2 2 3 3 4 5 6 7 8 10]\n\nWe can oragnize the values of $\\Delta A$ into a matrix $D$, where\n\n  $D_{i,j} = A_j - A_i \\hspace{3em}\\text{for }1 \\le i, j \\le N$ \n\n     1  0 2 4 7 10\n     2    0 2 5  8\n     3      0 3  6\n     4        0  3\n     5           0\n     \n        1 2 3 4  5\n        \n        \nNote that $A + c$ for any $c$ has the same $\u0394A$. So we choose $A_1 = 0$ to constraint the result. As such the top row $D_{1,j}$ is exactly equal to $A$. The matrix diagonal $D_{i,i}$ are all 0. Instead we refer to row immediate above, $D_{i-1,i}$, as the diagonal of $D$ in the context of this algorithm. Notice the diagonal $D_{i-1,i}$ are the first order difference of $A$. It has all the information needed to reconstruct $A$.\n\n*(clarify all discussion below concern only the elements in the upper right triangle above or on the diagonal)*\n\nFrom the definition of D, we have the identity\n\n  $D_{i,j} = D_{i,k} + D_{k,j}      \\hspace{2em} \\text{for } i \\le k \\le j$\n  \nFrom this we can construct the chain rule\n\n  $D_{i,j} = \\sum_{k=i}^{j-1} {D_{k,k+1}}$\n\nwhich can be see geometically that for any element, form a triangle with this element at the upper right hand corner, then its value is equal to the sum of the diagonal elements in the triangle. Also this value is the maximum value for all elements in the triangle, i.e.\n\n  $D_{r,s} \\le D_{i,j}      \\hspace{2em} \\text{for } r \\ge i \\text{ and } s \\le j$\n\n\n### The Algorithm\n\nWe know that the top right corner $D_{1,N}$ has the maximum value of $\u0394A$. Starting from the anchor at the top right corner, it attempts to search for value to place on the top row toward the left, and the right column toward the bottom while checking for consistency to the construction of $D$. Once the top row and right column are filled, we can recover the whole $D$ and thus $A$. (Reminder, the top row equals to $A$).\n\nAt each step, we have assign a value either to the top row $D_{1, n..N}$ or the right column $D_{N,1..m}$. From the known value of the top row and right column, we will show than we can also derive the values in the rectangle $D_{1..m, n..N}$ (up to the diagonal) and the subset of the diagonal $D_{i-1,i}$ where $1 \\le i \\lt m$ or $n \\le i \\lt N$. \n\nThis condition holds at the first step where only the top right corner $D_{1,N}$ is known (i.e. $m=1, n=N$). Assume this hold for $m, n$, we will show that it also holds for $m+1, n$ and $m, n-1$.\n\nIn the first case, we place a value below the right column at $D_{m+1, N}$. From this it can derive another value at the diagonal\n\n$ D_{m, m+1} = D_{m, N} - D_{m+1, N} $\n\nWe can also filled another row below the rectangle.\n\n$ D_{m+1, k} = D_{m, k} - D_{m, m+1} \\hspace{2em}\\text{where } n \\le k \\lt N \\text{ and } k > m+1 $\n\nThus the condition continues to hold for $m+1, n$.\n\nIn the second case, we place a value to the left of the top row at $D_{1, n-1}$. From this it can derive another value at the diagonal\n\n$ D_{n-1, n} = D_{1, n} - D_{1, n-1} $\n\nWe can also filled another column to the left of the rectangle.\n\n$ D_{k, n-1} = D_{k, n} - D_{n-1, n} \\hspace{2em}\\text{where } 1 \\lt k \\le m \\text{ and } k < n-1 $\n\nThus the condition continues to hold for $m, n-1$. By induction this holds for all $m, n$ *(in the valid range)*.\n\nThis also gives us an algorithm to construct $D$. Starting from $D_{1,N}$, at each step, we find the maximum number remain in $\u0394A$. It is easy to show the maximum number can only be placed at the next location on the top row or the next location in the right column. It cannot be at any other place in the matrix. We create a search tree to attempt to put the maximum value at each place. For each derived values, we remove the corresponding value in $\u0394A$. If the value is not avilable in $\u0394A$, then the attempt fails. It backtracks to try the next possiblity. This algorithm ends when the diagonal is filled after $N$ *(or is it N-1, N-2?)* successful attempts. The maximum number of trials is $2^N$. But the search tree is expected to be pruned greatly."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "INPUT = \"\"\"\\\n-10 -8 -7 -6 -5 -4 -3 -3 -2 -2 0 0 0 0 0 2 2 3 3 4 5 6 7 8 10\n\"\"\"\nfp = IOBuffer(INPUT)\nfp = open(\"rosalind_2i.txt\");\n\n\u0394A = map(parseint, split(readline(fp)))\nclose(fp)\n\n#\u0394A = sort([18,14,16,12,12,14,7,10,10,11,4,5,8,7,6,2,2,3,5,2,4])\n#\u0394A = [\u0394A, -\u0394A, [0, 0, 0, 0, 0, 0, 0]]\n\nprintln(\"\u0394A=$(size(\u0394A))\")\n\n#------------------------------------------------------------------------\n\nN = int(sqrt(length(\u0394A)))\n@assert N^2 == length(\u0394A)\n@assert N > 2\n@assert sum(\u0394A) == 0\n@assert count(x -> x== 0, \u0394A) == N\n\npositive_\u0394A = filter(x -> x > 0, \u0394A)\nd_max = maximum(positive_\u0394A)\n\n@assert d_max < 99999",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "\u0394A=(17956,)"
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "\n"
      }
     ],
     "prompt_number": 181
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "function turnpike(positive_\u0394A, N)\n    \n    ordered_values = sort(unique(positive_\u0394A), rev=true)\n    \n    # return max unused value in \u0394A, move ptr to next \n    function find_next_unused_value(ptr)\n        # linear search for next unused, probably not most efficient\n        for i in ptr:length(ordered_values)\n            if counts[ordered_values[i]] > 0\n                return i\n            end\n        end\n        throw(ErrorException(\"no more usused value\"))\n    end        \n        \n    function try_assign(i, j, value, stack)    \n        if value <= 0 || counts[value] <= 0\n#            println(\"Fail to find $value -> $i,$j\")\n            return false\n        end\n        D[i, j] = value\n        counts[value] -= 1\n        push!(stack, (i, j, value))\n        return true\n    end        \n        \n    function turnpike_next(m, n, unused_ptr, place_right)\n        if m == n\n            # success, all diagonal values are found\n            return true\n        end\n\n        stack = (Int, Int, Int)[]        \n        max_value = ordered_values[unused_ptr]\n#=\n        println(\"m=$(m) n=$(n)\\n$D\") # $D\n        @printf(\"\\nPlace %s max_value=%d -> (%d, %d)\\n\", \n            place_right ? \"right\" : \"top\",\n            max_value, \n            place_right ? m+1 : 1,\n            place_right ? N : n-1\n        )\n=#\n        \n        if place_right\n\n            # place left\n            if m+2 >= N                \n                return false    # don't place on or below diagonal\n            end        \n            try_assign(m+1, N, max_value, stack)\n\n            # assign diagonal value\n            d_m = D[m,N] - D[m+1, N]\n#            println(d_m <= 0 ? \n#                \"diagonal value = $(d_m)?\" : \n#                \"diagonal value = $(d_m)$(counts[d_m]>0 ? '!' : '?') -> $(m),$(m+1)\")            \n\n            if try_assign(m, m+1, d_m, stack)\n                \n                # assign bottom row, empty list mean true\n                from_idx = max(n, m+2)\n#                println(\"bottom row $(m+1),$(from_idx:N-1)\")\n                if @all [try_assign(m+1, k, D[m,k] - d_m, stack) for k in from_idx:N-1]\n                \n                    # next recusion step\n                    unused_ptr = find_next_unused_value(unused_ptr)\n                    if turnpike_next(m+1, n, unused_ptr, false) ||  turnpike_next(m+1, n, unused_ptr, true)\n                        return true\n                    end                        \n                end\n            end            \n        else\n\n            # place top\n            if n-1 <= 2                \n                return false    # don't place on or below diagonal\n            end        \n            try_assign(1, n-1, max_value, stack)\n\n            # assign diagonal value\n            d_n = D[1,n] - D[1,n-1]\n#            println(d_n <= 0 ?\n#                \"diagonal value = $(d_n)?\" : \n#                \"diagonal value = $(d_n)$(counts[d_n]>0 ? '!' : '?') -> $(n-1),$(n)\")            \n            if try_assign(n-1, n, d_n, stack)\n\n                # assign left column, empty list mean true\n                to_idx = min(m, n-3)\n#                println(\"left column $(2:to_idx),$(n-1)\")\n                if @all [try_assign(k, n-1, D[k,n] - d_n, stack) for k in 2:to_idx]\n                    \n                    # next recusion step\n                    unused_ptr = find_next_unused_value(unused_ptr)\n                    if turnpike_next(m, n-1, unused_ptr, true) ||  turnpike_next(m, n-1, unused_ptr, false)\n                        return true\n                    end\n                end\n            end                            \n        end\n\n        # failed, backtrack\n        for (i,j,v) in stack\n            D[i,j] = 0\n            counts[v] += 1\n        end            \n        return false\n    end\n                        \n    #counts = Dict(counter(positive_\u0394A))\n    # use sparse array for counts rather than dictionary\n    d_max = maximum(positive_\u0394A)\n    @assert d_max < 99999\n    counts = zeros(Int, d_max)\n    for x in positive_\u0394A\n        counts[x] += 1\n    end\n    \n    D = zeros(Int, (N,N))\n    # assign d_max to top right corner\n    d_max = ordered_values[1]\n    D[1,N] = d_max\n    counts[d_max] -= 1\n    unused_ptr = 2\n    \n    result = turnpike_next(1, N, unused_ptr, true) || turnpike_next(1, N, unused_ptr, false)\n    return result, D, counts\nend\n\nfunction build_A(D, N)\n    [0, [cumsum([D[i,i+1] for i in 1:N-1])]]\nend\n",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 184,
       "text": "build_A (generic function with 1 method)"
      }
     ],
     "prompt_number": 184
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "#result, D, counts = turnpike(positive_\u0394A, N)\nresult, D, counts = turnpike(positive_\u0394A, N)\nif result\n    A = build_A(D, N)\n    IOUtil.write_vector(\"out\", A)\nelse\n    println(\"failed\")\n    D\nend    \n",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "Written 134 items to out"
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "\n"
      }
     ],
     "prompt_number": 185
    }
   ],
   "metadata": {}
  }
 ]
}
	{
	"metadata": {
	"language": "Julia",
	"name": "turnpike"
	},
	"nbformat": 3,
	"nbformat_minor": 0,
	"worksheets": [
	{
	"cells": [
	{
	"cell_type": "code",
	"collapsed": false,
	"input": "import Base.Collections: heapify!, heappush!, heappop!\nusing DataStructures\ninclude(\"gray_calhoun.jl\")\ninclude(\"io_util.jl\"); import IOUtil\n#include(\"bio_util.jl\"); import BioUtil\n#include(\"graph_util.jl\"); import GraphUtil",
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 149
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": "## The Turnpike Problem (2i)\n\nConsider\n\nSuppose there is $N$ increasing integers $ A_1 < A_2 , ... < A_n$, we are given the difference of between every pair of points. The Turnpike reconstruction problem is to reconstruct the point set $A$ from the distances of the form $\|A_i \u2212 A_j\|$.\n\nConsider this example\n\n A = [0 2 4 7 10] \n \n\u0394A = [-10 -8 -7 -6 -5 -4 -3 -3 -2 -2 0 0 0 0 0 2 2 3 3 4 5 6 7 8 10]\n\nWe can oragnize the values of $\\Delta A$ into a matrix $D$, where\n\n $D_{i,j} = A_j - A_i \\hspace{3em}\\text{for }1 \\le i, j \\le N$ \n\n 1 0 2 4 7 10\n 2 0 2 5 8\n 3 0 3 6\n 4 0 3\n 5 0\n \n 1 2 3 4 5\n \n \nNote that $A + c$ for any $c$ has the same $\u0394A$. So we choose $A_1 = 0$ to constraint the result. As such the top row $D_{1,j}$ is exactly equal to $A$. The matrix diagonal $D_{i,i}$ are all 0. Instead we refer to row immediate above, $D_{i-1,i}$, as the diagonal of $D$ in the context of this algorithm. Notice the diagonal $D_{i-1,i}$ are the first order difference of $A$. It has all the information needed to reconstruct $A$.\n\n(clarify all discussion below concern only the elements in the upper right triangle above or on the diagonal)\n\nFrom the definition of D, we have the identity\n\n $D_{i,j} = D_{i,k} + D_{k,j} \\hspace{2em} \\text{for } i \\le k \\le j$\n \nFrom this we can construct the chain rule\n\n $D_{i,j} = \\sum_{k=i}^{j-1} {D_{k,k+1}}$\n\nwhich can be see geometically that for any element, form a triangle with this element at the upper right hand corner, then its value is equal to the sum of the diagonal elements in the triangle. Also this value is the maximum value for all elements in the triangle, i.e.\n\n $D_{r,s} \\le D_{i,j} \\hspace{2em} \\text{for } r \\ge i \\text{ and } s \\le j$\n\n\n### The Algorithm\n\nWe know that the top right corner $D_{1,N}$ has the maximum value of $\u0394A$. Starting from the anchor at the top right corner, it attempts to search for value to place on the top row toward the left, and the right column toward the bottom while checking for consistency to the construction of $D$. Once the top row and right column are filled, we can recover the whole $D$ and thus $A$. (Reminder, the top row equals to $A$).\n\nAt each step, we have assign a value either to the top row $D_{1, n..N}$ or the right column $D_{N,1..m}$. From the known value of the top row and right column, we will show than we can also derive the values in the rectangle $D_{1..m, n..N}$ (up to the diagonal) and the subset of the diagonal $D_{i-1,i}$ where $1 \\le i \\lt m$ or $n \\le i \\lt N$. \n\nThis condition holds at the first step where only the top right corner $D_{1,N}$ is known (i.e. $m=1, n=N$). Assume this hold for $m, n$, we will show that it also holds for $m+1, n$ and $m, n-1$.\n\nIn the first case, we place a value below the right column at $D_{m+1, N}$. From this it can derive another value at the diagonal\n\n$ D_{m, m+1} = D_{m, N} - D_{m+1, N} $\n\nWe can also filled another row below the rectangle.\n\n$ D_{m+1, k} = D_{m, k} - D_{m, m+1} \\hspace{2em}\\text{where } n \\le k \\lt N \\text{ and } k > m+1 $\n\nThus the condition continues to hold for $m+1, n$.\n\nIn the second case, we place a value to the left of the top row at $D_{1, n-1}$. From this it can derive another value at the diagonal\n\n$ D_{n-1, n} = D_{1, n} - D_{1, n-1} $\n\nWe can also filled another column to the left of the rectangle.\n\n$ D_{k, n-1} = D_{k, n} - D_{n-1, n} \\hspace{2em}\\text{where } 1 \\lt k \\le m \\text{ and } k < n-1 $\n\nThus the condition continues to hold for $m, n-1$. By induction this holds for all $m, n$ (in the valid range).\n\nThis also gives us an algorithm to construct $D$. Starting from $D_{1,N}$, at each step, we find the maximum number remain in $\u0394A$. It is easy to show the maximum number can only be placed at the next location on the top row or the next location in the right column. It cannot be at any other place in the matrix. We create a search tree to attempt to put the maximum value at each place. For each derived values, we remove the corresponding value in $\u0394A$. If the value is not avilable in $\u0394A$, then the attempt fails. It backtracks to try the next possiblity. This algorithm ends when the diagonal is filled after $N$ (or is it N-1, N-2?) successful attempts. The maximum number of trials is $2^N$. But the search tree is expected to be pruned greatly."
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": "INPUT = \"\"\"\\\n-10 -8 -7 -6 -5 -4 -3 -3 -2 -2 0 0 0 0 0 2 2 3 3 4 5 6 7 8 10\n\"\"\"\nfp = IOBuffer(INPUT)\nfp = open(\"rosalind_2i.txt\");\n\n\u0394A = map(parseint, split(readline(fp)))\nclose(fp)\n\n#\u0394A = sort([18,14,16,12,12,14,7,10,10,11,4,5,8,7,6,2,2,3,5,2,4])\n#\u0394A = [\u0394A, -\u0394A, [0, 0, 0, 0, 0, 0, 0]]\n\nprintln(\"\u0394A=$(size(\u0394A))\")\n\n#------------------------------------------------------------------------\n\nN = int(sqrt(length(\u0394A)))\n@assert N^2 == length(\u0394A)\n@assert N > 2\n@assert sum(\u0394A) == 0\n@assert count(x -> x== 0, \u0394A) == N\n\npositive_\u0394A = filter(x -> x > 0, \u0394A)\nd_max = maximum(positive_\u0394A)\n\n@assert d_max < 99999",
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"output_type": "stream",
	"stream": "stdout",
	"text": "\u0394A=(17956,)"
	},
	{
	"output_type": "stream",
	"stream": "stdout",
	"text": "\n"
	}
	],
	"prompt_number": 181
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": "function turnpike(positive_\u0394A, N)\n \n ordered_values = sort(unique(positive_\u0394A), rev=true)\n \n # return max unused value in \u0394A, move ptr to next \n function find_next_unused_value(ptr)\n # linear search for next unused, probably not most efficient\n for i in ptr:length(ordered_values)\n if counts[ordered_values[i]] > 0\n return i\n end\n end\n throw(ErrorException(\"no more usused value\"))\n end \n \n function try_assign(i, j, value, stack) \n if value <= 0 \|\| counts[value] <= 0\n# println(\"Fail to find $value -> $i,$j\")\n return false\n end\n D[i, j] = value\n counts[value] -= 1\n push!(stack, (i, j, value))\n return true\n end \n \n function turnpike_next(m, n, unused_ptr, place_right)\n if m == n\n # success, all diagonal values are found\n return true\n end\n\n stack = (Int, Int, Int)[] \n max_value = ordered_values[unused_ptr]\n#=\n println(\"m=$(m) n=$(n)\\n$D\") # $D\n @printf(\"\\nPlace %s max_value=%d -> (%d, %d)\\n\", \n place_right ? \"right\" : \"top\",\n max_value, \n place_right ? m+1 : 1,\n place_right ? N : n-1\n )\n=#\n \n if place_right\n\n # place left\n if m+2 >= N \n return false # don't place on or below diagonal\n end \n try_assign(m+1, N, max_value, stack)\n\n # assign diagonal value\n d_m = D[m,N] - D[m+1, N]\n# println(d_m <= 0 ? \n# \"diagonal value = $(d_m)?\" : \n# \"diagonal value = $(d_m)$(counts[d_m]>0 ? '!' : '?') -> $(m),$(m+1)\") \n\n if try_assign(m, m+1, d_m, stack)\n \n # assign bottom row, empty list mean true\n from_idx = max(n, m+2)\n# println(\"bottom row $(m+1),$(from_idx:N-1)\")\n if @all [try_assign(m+1, k, D[m,k] - d_m, stack) for k in from_idx:N-1]\n \n # next recusion step\n unused_ptr = find_next_unused_value(unused_ptr)\n if turnpike_next(m+1, n, unused_ptr, false) \|\| turnpike_next(m+1, n, unused_ptr, true)\n return true\n end \n end\n end \n else\n\n # place top\n if n-1 <= 2 \n return false # don't place on or below diagonal\n end \n try_assign(1, n-1, max_value, stack)\n\n # assign diagonal value\n d_n = D[1,n] - D[1,n-1]\n# println(d_n <= 0 ?\n# \"diagonal value = $(d_n)?\" : \n# \"diagonal value = $(d_n)$(counts[d_n]>0 ? '!' : '?') -> $(n-1),$(n)\") \n if try_assign(n-1, n, d_n, stack)\n\n # assign left column, empty list mean true\n to_idx = min(m, n-3)\n# println(\"left column $(2:to_idx),$(n-1)\")\n if @all [try_assign(k, n-1, D[k,n] - d_n, stack) for k in 2:to_idx]\n \n # next recusion step\n unused_ptr = find_next_unused_value(unused_ptr)\n if turnpike_next(m, n-1, unused_ptr, true) \|\| turnpike_next(m, n-1, unused_ptr, false)\n return true\n end\n end\n end \n end\n\n # failed, backtrack\n for (i,j,v) in stack\n D[i,j] = 0\n counts[v] += 1\n end \n return false\n end\n \n #counts = Dict(counter(positive_\u0394A))\n # use sparse array for counts rather than dictionary\n d_max = maximum(positive_\u0394A)\n @assert d_max < 99999\n counts = zeros(Int, d_max)\n for x in positive_\u0394A\n counts[x] += 1\n end\n \n D = zeros(Int, (N,N))\n # assign d_max to top right corner\n d_max = ordered_values[1]\n D[1,N] = d_max\n counts[d_max] -= 1\n unused_ptr = 2\n \n result = turnpike_next(1, N, unused_ptr, true) \|\| turnpike_next(1, N, unused_ptr, false)\n return result, D, counts\nend\n\nfunction build_A(D, N)\n [0, [cumsum([D[i,i+1] for i in 1:N-1])]]\nend\n",
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"metadata": {},
	"output_type": "pyout",
	"prompt_number": 184,
	"text": "build_A (generic function with 1 method)"
	}
	],
	"prompt_number": 184
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": "#result, D, counts = turnpike(positive_\u0394A, N)\nresult, D, counts = turnpike(positive_\u0394A, N)\nif result\n A = build_A(D, N)\n IOUtil.write_vector(\"out\", A)\nelse\n println(\"failed\")\n D\nend \n",
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"output_type": "stream",
	"stream": "stdout",
	"text": "Written 134 items to out"
	},
	{
	"output_type": "stream",
	"stream": "stdout",
	"text": "\n"
	}
	],
	"prompt_number": 185
	}
	],
	"metadata": {}
	}
	]
	}