Skip to content

Instantly share code, notes, and snippets.

@jiffyclub
Created October 4, 2012 03:41
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jiffyclub/3831343 to your computer and use it in GitHub Desktop.
Save jiffyclub/3831343 to your computer and use it in GitHub Desktop.
Counting the number of ways to decode a string of numbers such as '123456' with a standard a = 1, b = 2, c = 3 encoding.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "Decode"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a recent job interview I was given the following challenge:\n",
"\n",
"> Assume a character encoding a = 1, b = 2, c = 3, ... , z = 26.\n",
"> Given a string of numbers (e.g. `'123456'`), figure out how many\n",
"> different ways that string could be decoded.\n",
"\n",
"I didn't arrive at a solution in the few minutes I had in the interview. The interviewer briefly described a potential solution doing a heirarchical partitioning of the string. I kept thinking about it afterwards and I couldn't shake the feeling that there was a tricky math problem in here, not just a programming problem.\n",
"\n",
"The difficult part of this challenge is figuring out how many decodings there are when there are overlapping possible groupings of numbers. For example, the string `'111'` can be decoded as `['1', '1', '1']`, `['11', '1']`, or `['1', '11']`. I never figured out a formula for calculating the number of decodings but by manually figuring out the number of decodings for the strings `'1'`, `'11'`, `'111'`, `'1111'`, `'11111'`, `'111111'`, etc. I noticed that the number of decodings follow the [Fibonacci sequence](http://en.wikipedia.org/wiki/Fibonacci_number) starting with 1, 2, 3, 5, 8...! Unfortunately I don't have a good explanation for why that is the case. (Though it makes gut level sense in the way the Fibonacci sequence is a sum of everything that has come before.)\n",
"\n",
"So, my strategy for solving this problem follows these basic steps:\n",
"\n",
"1. Break the given string down into substrings such that each substring is one digit, `'10'` or `'20'`, \n",
" or cannot be broken down further because it has an unknown number of possible decodings.\n",
" These substrings can be considered independently for the purposes of this problem.\n",
" * For example, `'956'` becomes `['9', '5', '6']` and `'121956'` becomes `['1219', '5', '6']`.\n",
"2. For each substring figure out the number of decodings. For single digits, `'10'`, and `'20'` this is just 1.\n",
" For other substrings this is the $N$th value in the Fibonacci series 1, 2, 3, 5, 8... where\n",
" $N$ is the length of the substring.\n",
"3. Multiply together all of the number of decodings for the individual substrings to get the number of\n",
" of decodings for the original string of numbers.\n",
"\n",
"You can see my code solving this problem below. The functions roughly corresponding to the above steps are:\n",
"\n",
"1. `chunk_numbers`\n",
"2. `calc_substring_combos` + `series`\n",
"3. `calc_combos`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from operator import mul"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def calc_combos(nums):\n",
" \"\"\"\n",
" Calculate the number of possible ways a string of numbers can be decoded\n",
" using the scheme 1 -> a, 2 -> b, ... , 26 -> z.\n",
"\n",
" Parameters\n",
" ----------\n",
" num : str\n",
" String of numeric characters. It is assumed that the string\n",
" has at least one valid decoding.\n",
"\n",
" Returns\n",
" -------\n",
" combos : int\n",
" The number of ways `num` can be decoded.\n",
"\n",
" \"\"\"\n",
" return reduce(mul, (calc_substring_combos(chunk) for chunk in chunk_numbers(nums)))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def chunk_numbers(nums):\n",
" \"\"\"\n",
" Separate a string of numbers into a list of substrings.\n",
" Returned substrings are as small as possible (e.g. one or two characters)\n",
" except where numbers are grouped such that they have multiple\n",
" decodings.\n",
"\n",
" Parameters\n",
" ----------\n",
" num : str\n",
" String of numeric characters. It is assumed that the string\n",
" has at least one valid decoding.\n",
"\n",
" Returns\n",
" -------\n",
" subnums : list of str\n",
" List of substrings broken down as much as possible.\n",
"\n",
" \"\"\"\n",
" subnums = ['']\n",
" \n",
" for i, n in enumerate(nums[:-1]):\n",
" subnums[-1] += n\n",
" if n == '1':\n",
" continue\n",
" elif n == '2' and nums[i + 1] in '0123456':\n",
" continue\n",
" else:\n",
" subnums.append('')\n",
" \n",
" subnums[-1] += nums[-1]\n",
"\n",
" return subnums"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def series(n):\n",
" \"\"\"\n",
" Return the n-th member of the series 1, 2, 3, 5, 8, 13...\n",
" (Which is actually the Fibonacci series minus the first 1.)\n",
"\n",
" \"\"\"\n",
" r0 = 1\n",
" r1 = 1\n",
" \n",
" for _ in xrange(n):\n",
" r0, r1 = r1, r0 + r1\n",
" \n",
" return r0"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def calc_substring_combos(subnums):\n",
" \"\"\"\n",
" For a given string of numbers, return the number\n",
" of possible cominations for decoding that string.\n",
" \n",
" Parameters\n",
" ----------\n",
" subnums : str\n",
" String of digits. Should be irreducable if passed through\n",
" check_numbers. Assumed to have a valid decoding.\n",
"\n",
" Returns\n",
" -------\n",
" combos : int\n",
" The number of possible decodings for this substring.\n",
"\n",
" \"\"\"\n",
" if len(subnums) == 1 or (len(subnums) == 2 and subnums[1] == '0'):\n",
" return 1\n",
" else:\n",
" return series(len(subnums))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# tests for calc_combos\n",
"assert calc_combos('9') == 1, calc_combos('9')\n",
"assert calc_combos('999') == 1, calc_combos('999')\n",
"assert calc_combos('1111') == series(4), calc_combos('1111')\n",
"assert calc_combos('11911') == 6, calc_combos('11911')\n",
"assert calc_combos('119119') == 9, calc_combos('119119')\n",
"assert calc_combos('119119119') == 27, calc_combos('119119119')\n",
"assert calc_combos('12911') == 4, calc_combos('12911')\n",
"assert calc_combos('1211') == series(4), calc_combos('1211')\n",
"assert calc_combos('101010') == 1, calc_combos('101010')\n",
"assert calc_combos('202020') == 1, calc_combos('202020')\n",
"assert calc_combos('212121') == series(6), calc_combos('212121')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# tests for calc_substring_combos\n",
"assert calc_substring_combos('1') == series(1), calc_substring_combos('1')\n",
"assert calc_substring_combos('11') == series(2), calc_substring_combos('11')\n",
"assert calc_substring_combos('212121') == series(6), calc_substring_combos('212121')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# tests for series\n",
"assert series(1) == 1, series(1)\n",
"assert series(2) == 2, series(2)\n",
"assert series(3) == 3, series(3)\n",
"assert series(4) == 5, series(4)\n",
"assert series(5) == 8, series(5)\n",
"assert series(6) == 13, series(6)\n",
"assert series(7) == 21, series(7)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# tests for chunk_numbers\n",
"assert chunk_numbers('1') == ['1'], chunk_numbers('1')\n",
"assert chunk_numbers('10') == ['10'], chunk_numbers('10')\n",
"assert chunk_numbers('567') == ['5', '6', '7'], chunk_numbers('567')\n",
"assert chunk_numbers('1111') == ['1111'], chunk_numbers('1111')\n",
"assert chunk_numbers('11911') == ['119', '11'], chunk_numbers('11911')\n",
"assert chunk_numbers('12911') == ['12', '9', '11'], chunk_numbers('12911')\n",
"assert chunk_numbers('123') == ['123'], chunk_numbers('123')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
}
],
"metadata": {}
}
]
}
@MysticHLE
Copy link

I too had the same intuition after examining the problem - and also couldn't figure out why it followed the Fib sequence. However, this post below sheds some light on the structure and pattern of the problem as to why it follows Fib.

http://www.geeksforgeeks.org/count-possible-decodings-given-digit-sequence/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment