jiffyclub/Decode.ipynb

## Decode.ipynb
{
 "metadata": {
  "name": "Decode"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In a recent job interview I was given the following challenge:\n",
      "\n",
      "> Assume a character encoding a = 1, b = 2, c = 3, ... , z = 26.\n",
      "> Given a string of numbers (e.g. `'123456'`), figure out how many\n",
      "> different ways that string could be decoded.\n",
      "\n",
      "I didn't arrive at a solution in the few minutes I had in the interview. The interviewer briefly described a potential solution doing a heirarchical partitioning of the string. I kept thinking about it afterwards and I couldn't shake the feeling that there was a tricky math problem in here, not just a programming problem.\n",
      "\n",
      "The difficult part of this challenge is figuring out how many decodings there are when there are overlapping possible groupings of numbers. For example, the string `'111'` can be decoded as `['1', '1', '1']`, `['11', '1']`, or `['1', '11']`. I never figured out a formula for calculating the number of decodings but by manually figuring out the number of decodings for the strings `'1'`, `'11'`, `'111'`, `'1111'`, `'11111'`, `'111111'`, etc. I noticed that the number of decodings follow the [Fibonacci sequence](http://en.wikipedia.org/wiki/Fibonacci_number) starting with 1, 2, 3, 5, 8...! Unfortunately I don't have a good explanation for why that is the case. (Though it makes gut level sense in the way the Fibonacci sequence is a sum of everything that has come before.)\n",
      "\n",
      "So, my strategy for solving this problem follows these basic steps:\n",
      "\n",
      "1. Break the given string down into substrings such that each substring is one digit, `'10'` or `'20'`, \n",
      "   or cannot be broken down further because it has an unknown number of possible decodings.\n",
      "   These substrings can be considered independently for the purposes of this problem.\n",
      "    * For example, `'956'` becomes `['9', '5', '6']` and `'121956'` becomes `['1219', '5', '6']`.\n",
      "2. For each substring figure out the number of decodings. For single digits, `'10'`, and `'20'` this is just 1.\n",
      "   For other substrings this is the $N$th value in the Fibonacci series 1, 2, 3, 5, 8... where\n",
      "   $N$ is the length of the substring.\n",
      "3. Multiply together all of the number of decodings for the individual substrings to get the number of\n",
      "   of decodings for the original string of numbers.\n",
      "\n",
      "You can see my code solving this problem below. The functions roughly corresponding to the above steps are:\n",
      "\n",
      "1. `chunk_numbers`\n",
      "2. `calc_substring_combos` + `series`\n",
      "3. `calc_combos`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from operator import mul"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def calc_combos(nums):\n",
      "    \"\"\"\n",
      "    Calculate the number of possible ways a string of numbers can be decoded\n",
      "    using the scheme 1 -> a, 2 -> b, ... , 26 -> z.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    num : str\n",
      "        String of numeric characters. It is assumed that the string\n",
      "        has at least one valid decoding.\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    combos : int\n",
      "        The number of ways `num` can be decoded.\n",
      "\n",
      "    \"\"\"\n",
      "    return reduce(mul, (calc_substring_combos(chunk) for chunk in chunk_numbers(nums)))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def chunk_numbers(nums):\n",
      "    \"\"\"\n",
      "    Separate a string of numbers into a list of substrings.\n",
      "    Returned substrings are as small as possible (e.g. one or two characters)\n",
      "    except where numbers are grouped such that they have multiple\n",
      "    decodings.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    num : str\n",
      "        String of numeric characters. It is assumed that the string\n",
      "        has at least one valid decoding.\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    subnums : list of str\n",
      "        List of substrings broken down as much as possible.\n",
      "\n",
      "    \"\"\"\n",
      "    subnums = ['']\n",
      "    \n",
      "    for i, n in enumerate(nums[:-1]):\n",
      "        subnums[-1] += n\n",
      "        if n == '1':\n",
      "            continue\n",
      "        elif n == '2' and nums[i + 1] in '0123456':\n",
      "            continue\n",
      "        else:\n",
      "            subnums.append('')\n",
      "            \n",
      "    subnums[-1] += nums[-1]\n",
      "\n",
      "    return subnums"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def series(n):\n",
      "    \"\"\"\n",
      "    Return the n-th member of the series 1, 2, 3, 5, 8, 13...\n",
      "    (Which is actually the Fibonacci series minus the first 1.)\n",
      "\n",
      "    \"\"\"\n",
      "    r0 = 1\n",
      "    r1 = 1\n",
      "    \n",
      "    for _ in xrange(n):\n",
      "        r0, r1 = r1, r0 + r1\n",
      "    \n",
      "    return r0"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def calc_substring_combos(subnums):\n",
      "    \"\"\"\n",
      "    For a given string of numbers, return the number\n",
      "    of possible cominations for decoding that string.\n",
      "    \n",
      "    Parameters\n",
      "    ----------\n",
      "    subnums : str\n",
      "        String of digits. Should be irreducable if passed through\n",
      "        check_numbers. Assumed to have a valid decoding.\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    combos : int\n",
      "        The number of possible decodings for this substring.\n",
      "\n",
      "    \"\"\"\n",
      "    if len(subnums) == 1 or (len(subnums) == 2 and subnums[1] == '0'):\n",
      "        return 1\n",
      "    else:\n",
      "        return series(len(subnums))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# tests for calc_combos\n",
      "assert calc_combos('9') == 1, calc_combos('9')\n",
      "assert calc_combos('999') == 1, calc_combos('999')\n",
      "assert calc_combos('1111') == series(4), calc_combos('1111')\n",
      "assert calc_combos('11911') == 6, calc_combos('11911')\n",
      "assert calc_combos('119119') == 9, calc_combos('119119')\n",
      "assert calc_combos('119119119') == 27, calc_combos('119119119')\n",
      "assert calc_combos('12911') == 4, calc_combos('12911')\n",
      "assert calc_combos('1211') == series(4), calc_combos('1211')\n",
      "assert calc_combos('101010') == 1, calc_combos('101010')\n",
      "assert calc_combos('202020') == 1, calc_combos('202020')\n",
      "assert calc_combos('212121') == series(6), calc_combos('212121')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# tests for calc_substring_combos\n",
      "assert calc_substring_combos('1') == series(1), calc_substring_combos('1')\n",
      "assert calc_substring_combos('11') == series(2), calc_substring_combos('11')\n",
      "assert calc_substring_combos('212121') == series(6), calc_substring_combos('212121')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# tests for series\n",
      "assert series(1) == 1, series(1)\n",
      "assert series(2) == 2, series(2)\n",
      "assert series(3) == 3, series(3)\n",
      "assert series(4) == 5, series(4)\n",
      "assert series(5) == 8, series(5)\n",
      "assert series(6) == 13, series(6)\n",
      "assert series(7) == 21, series(7)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# tests for chunk_numbers\n",
      "assert chunk_numbers('1') == ['1'], chunk_numbers('1')\n",
      "assert chunk_numbers('10') == ['10'], chunk_numbers('10')\n",
      "assert chunk_numbers('567') == ['5', '6', '7'], chunk_numbers('567')\n",
      "assert chunk_numbers('1111') == ['1111'], chunk_numbers('1111')\n",
      "assert chunk_numbers('11911') == ['119', '11'], chunk_numbers('11911')\n",
      "assert chunk_numbers('12911') == ['12', '9', '11'], chunk_numbers('12911')\n",
      "assert chunk_numbers('123') == ['123'], chunk_numbers('123')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    }
   ],
   "metadata": {}
  }
 ]
}
	{
	"metadata": {
	"name": "Decode"
	},
	"nbformat": 3,
	"nbformat_minor": 0,
	"worksheets": [
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"In a recent job interview I was given the following challenge:\n",
	"\n",
	"> Assume a character encoding a = 1, b = 2, c = 3, ... , z = 26.\n",
	"> Given a string of numbers (e.g. `'123456'`), figure out how many\n",
	"> different ways that string could be decoded.\n",
	"\n",
	"I didn't arrive at a solution in the few minutes I had in the interview. The interviewer briefly described a potential solution doing a heirarchical partitioning of the string. I kept thinking about it afterwards and I couldn't shake the feeling that there was a tricky math problem in here, not just a programming problem.\n",
	"\n",
	"The difficult part of this challenge is figuring out how many decodings there are when there are overlapping possible groupings of numbers. For example, the string `'111'` can be decoded as `['1', '1', '1']`, `['11', '1']`, or `['1', '11']`. I never figured out a formula for calculating the number of decodings but by manually figuring out the number of decodings for the strings `'1'`, `'11'`, `'111'`, `'1111'`, `'11111'`, `'111111'`, etc. I noticed that the number of decodings follow the [Fibonacci sequence](http://en.wikipedia.org/wiki/Fibonacci_number) starting with 1, 2, 3, 5, 8...! Unfortunately I don't have a good explanation for why that is the case. (Though it makes gut level sense in the way the Fibonacci sequence is a sum of everything that has come before.)\n",
	"\n",
	"So, my strategy for solving this problem follows these basic steps:\n",
	"\n",
	"1. Break the given string down into substrings such that each substring is one digit, `'10'` or `'20'`, \n",
	" or cannot be broken down further because it has an unknown number of possible decodings.\n",
	" These substrings can be considered independently for the purposes of this problem.\n",
	" * For example, `'956'` becomes `['9', '5', '6']` and `'121956'` becomes `['1219', '5', '6']`.\n",
	"2. For each substring figure out the number of decodings. For single digits, `'10'`, and `'20'` this is just 1.\n",
	" For other substrings this is the $N$th value in the Fibonacci series 1, 2, 3, 5, 8... where\n",
	" $N$ is the length of the substring.\n",
	"3. Multiply together all of the number of decodings for the individual substrings to get the number of\n",
	" of decodings for the original string of numbers.\n",
	"\n",
	"You can see my code solving this problem below. The functions roughly corresponding to the above steps are:\n",
	"\n",
	"1. `chunk_numbers`\n",
	"2. `calc_substring_combos` + `series`\n",
	"3. `calc_combos`"
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"from operator import mul"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 1
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"def calc_combos(nums):\n",
	" \"\"\"\n",
	" Calculate the number of possible ways a string of numbers can be decoded\n",
	" using the scheme 1 -> a, 2 -> b, ... , 26 -> z.\n",
	"\n",
	" Parameters\n",
	" ----------\n",
	" num : str\n",
	" String of numeric characters. It is assumed that the string\n",
	" has at least one valid decoding.\n",
	"\n",
	" Returns\n",
	" -------\n",
	" combos : int\n",
	" The number of ways `num` can be decoded.\n",
	"\n",
	" \"\"\"\n",
	" return reduce(mul, (calc_substring_combos(chunk) for chunk in chunk_numbers(nums)))"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 2
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"def chunk_numbers(nums):\n",
	" \"\"\"\n",
	" Separate a string of numbers into a list of substrings.\n",
	" Returned substrings are as small as possible (e.g. one or two characters)\n",
	" except where numbers are grouped such that they have multiple\n",
	" decodings.\n",
	"\n",
	" Parameters\n",
	" ----------\n",
	" num : str\n",
	" String of numeric characters. It is assumed that the string\n",
	" has at least one valid decoding.\n",
	"\n",
	" Returns\n",
	" -------\n",
	" subnums : list of str\n",
	" List of substrings broken down as much as possible.\n",
	"\n",
	" \"\"\"\n",
	" subnums = ['']\n",
	" \n",
	" for i, n in enumerate(nums[:-1]):\n",
	" subnums[-1] += n\n",
	" if n == '1':\n",
	" continue\n",
	" elif n == '2' and nums[i + 1] in '0123456':\n",
	" continue\n",
	" else:\n",
	" subnums.append('')\n",
	" \n",
	" subnums[-1] += nums[-1]\n",
	"\n",
	" return subnums"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 3
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"def series(n):\n",
	" \"\"\"\n",
	" Return the n-th member of the series 1, 2, 3, 5, 8, 13...\n",
	" (Which is actually the Fibonacci series minus the first 1.)\n",
	"\n",
	" \"\"\"\n",
	" r0 = 1\n",
	" r1 = 1\n",
	" \n",
	" for _ in xrange(n):\n",
	" r0, r1 = r1, r0 + r1\n",
	" \n",
	" return r0"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 4
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"def calc_substring_combos(subnums):\n",
	" \"\"\"\n",
	" For a given string of numbers, return the number\n",
	" of possible cominations for decoding that string.\n",
	" \n",
	" Parameters\n",
	" ----------\n",
	" subnums : str\n",
	" String of digits. Should be irreducable if passed through\n",
	" check_numbers. Assumed to have a valid decoding.\n",
	"\n",
	" Returns\n",
	" -------\n",
	" combos : int\n",
	" The number of possible decodings for this substring.\n",
	"\n",
	" \"\"\"\n",
	" if len(subnums) == 1 or (len(subnums) == 2 and subnums[1] == '0'):\n",
	" return 1\n",
	" else:\n",
	" return series(len(subnums))"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 5
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"# tests for calc_combos\n",
	"assert calc_combos('9') == 1, calc_combos('9')\n",
	"assert calc_combos('999') == 1, calc_combos('999')\n",
	"assert calc_combos('1111') == series(4), calc_combos('1111')\n",
	"assert calc_combos('11911') == 6, calc_combos('11911')\n",
	"assert calc_combos('119119') == 9, calc_combos('119119')\n",
	"assert calc_combos('119119119') == 27, calc_combos('119119119')\n",
	"assert calc_combos('12911') == 4, calc_combos('12911')\n",
	"assert calc_combos('1211') == series(4), calc_combos('1211')\n",
	"assert calc_combos('101010') == 1, calc_combos('101010')\n",
	"assert calc_combos('202020') == 1, calc_combos('202020')\n",
	"assert calc_combos('212121') == series(6), calc_combos('212121')"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 6
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"# tests for calc_substring_combos\n",
	"assert calc_substring_combos('1') == series(1), calc_substring_combos('1')\n",
	"assert calc_substring_combos('11') == series(2), calc_substring_combos('11')\n",
	"assert calc_substring_combos('212121') == series(6), calc_substring_combos('212121')"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 7
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"# tests for series\n",
	"assert series(1) == 1, series(1)\n",
	"assert series(2) == 2, series(2)\n",
	"assert series(3) == 3, series(3)\n",
	"assert series(4) == 5, series(4)\n",
	"assert series(5) == 8, series(5)\n",
	"assert series(6) == 13, series(6)\n",
	"assert series(7) == 21, series(7)"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 8
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"# tests for chunk_numbers\n",
	"assert chunk_numbers('1') == ['1'], chunk_numbers('1')\n",
	"assert chunk_numbers('10') == ['10'], chunk_numbers('10')\n",
	"assert chunk_numbers('567') == ['5', '6', '7'], chunk_numbers('567')\n",
	"assert chunk_numbers('1111') == ['1111'], chunk_numbers('1111')\n",
	"assert chunk_numbers('11911') == ['119', '11'], chunk_numbers('11911')\n",
	"assert chunk_numbers('12911') == ['12', '9', '11'], chunk_numbers('12911')\n",
	"assert chunk_numbers('123') == ['123'], chunk_numbers('123')"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 9
	}
	],
	"metadata": {}
	}
	]
	}