jeffhussmann/more_dictionaries

## more_dictionaries
{
 "metadata": {
  "name": "more_dictionaries"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": "Building dictionaries"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Dictionary comprehensions provide a powerful syntax for building dictionaries."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "ascii_value = {letter: ord(letter) for letter in 'abcdefg'}\n\nprint ascii_value['e']",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "101\n"
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "You can also build a dictionary from an iterable of pairs, but this has largely been superceded by dictionary comprehensions."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "word_number_pairs = [('one', 1), ('two', 2), ('three', 3)]\nword_to_number = dict(word_number_pairs)\nprint word_to_number['two']",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "2\n"
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": "Providing default values"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "People from Perl backgrounds may be suprised by Python's lack of auto-vivification of default values for unseen keys.\n\nA clunky way to provide a default value for previously-unseen keys is call `get()` on the dictionary and supply a second argument."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "print word_to_number.get('four', 'do not know that one yet')",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "do not know that one yet\n"
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "`defaultdict` in the `collections` module provides a more convenient way to do this."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "from collections import defaultdict",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Suppose we have a list of (gene name, transcript name) pairs, and we want to build a dictionary of (gene name, list of all transcripts associated with that gene name) pairs."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "gene_transcript_pairs = [('ENSG00000204802', 'ENST00000430302'),\n                         ('ENSG00000204802', 'ENST00000412631'),\n                         ('ENSG00000204802', 'ENST00000429818'),\n                         ('ENSG00000204802', 'ENST00000377518'),\n                         ('ENSG00000243838', 'ENST00000495689'),\n                         ('ENSG00000239600', 'ENST00000478870'),\n                         ('ENSG00000248425', 'ENST00000514401'),\n                         ('ENSG00000261970', 'ENST00000571144'),\n                         ('ENSG00000236437', 'ENST00000452629'),\n                         ('ENSG00000240567', 'ENST00000473595'),\n                        ]",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "`defaultdict` takes an argument that is the function it should use to build the default starting value of any new keys. We want our default starting values to be empty lists. "
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "gene_to_transcripts = defaultdict(list)",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Now we can just go through the list of pairs and append each transcript to it's gene's list."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "for gene, transcript in gene_transcript_pairs:\n    gene_to_transcripts[gene].append(transcript)\n    \nprint gene_to_transcripts['ENSG00000204802']",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "['ENST00000430302', 'ENST00000412631', 'ENST00000429818', 'ENST00000377518']\n"
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "If you want a `defaultdict` who's default starting value is itself a `defaultdict`, it gets a little hacky."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "dd_of_dds = defaultdict(lambda: defaultdict(int))\n\ndd_of_dds['new key']['new subkey'] += 1\n\nprint dd_of_dds['new key']['new subkey']",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "1\n"
      }
     ],
     "prompt_number": 8
    }
   ],
   "metadata": {}
  }
 ]
}
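
The `defaultdict` cells in the notebook can be pulled together into one small script. The following is a minimal sketch, not part of the original notebook: the helper names `group_pairs` and `count_pairs` and the sample pairs are assumptions made for illustration. It groups pairs with `defaultdict(list)`, counts with a nested `defaultdict`, and shows one difference from `get()` worth knowing: indexing a `defaultdict` with a missing key stores that key, while `get()` leaves the dictionary unchanged.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Map each key to the list of values paired with it (hypothetical helper)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def count_pairs(pairs):
    """Count how often each (key, subkey) pair occurs (hypothetical helper)."""
    # The lambda is the zero-argument function that builds each inner defaultdict.
    counts = defaultdict(lambda: defaultdict(int))
    for key, subkey in pairs:
        counts[key][subkey] += 1
    return counts


# Made-up sample data, shaped like the (gene, transcript) pairs in the notebook.
pairs = [('geneA', 't1'), ('geneA', 't2'), ('geneB', 't3'), ('geneA', 't1')]

grouped = group_pairs(pairs)
counts = count_pairs(pairs)

print(grouped['geneA'])
print(counts['geneA']['t1'])

# get() returns the fallback without storing anything in the dictionary...
fallback = grouped.get('geneC', [])
key_stored_by_get = 'geneC' in grouped

# ...but indexing a defaultdict creates the missing key as a side effect.
created = grouped['geneC']
key_stored_by_index = 'geneC' in grouped

print(key_stored_by_get)
print(key_stored_by_index)
```

If the side effect on lookup is unwanted once the dictionary is built, wrapping the result in `dict()` gives back a plain dictionary whose missing keys raise `KeyError` again (the inner values of a nested `defaultdict` are left as they are).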