@jeffhussmann
Last active December 17, 2015 10:39
more advanced dictionary stuff
{
"metadata": {
"name": "more_dictionaries"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Building dictionaries"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Dictionary comprehensions provide a concise syntax for building a dictionary from any iterable: `{key: value for item in iterable}`."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Map each letter to its character code.\nascii_value = {letter: ord(letter) for letter in 'abcdefg'}\n\nprint(ascii_value['e'])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "101\n"
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You can also build a dictionary by passing an iterable of (key, value) pairs to `dict()`, but this has largely been superseded by dictionary comprehensions."
},
{
"cell_type": "code",
"collapsed": false,
"input": "word_number_pairs = [('one', 1), ('two', 2), ('three', 3)]\nword_to_number = dict(word_number_pairs)\nprint(word_to_number['two'])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "2\n"
}
],
"prompt_number": 2
},
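{
"cell_type": "markdown",
"metadata": {},
"source": "As a rough illustration of how the two approaches relate, the sketch below builds the same mapping with `dict()` and with a comprehension, using `zip` to form the pairs. The names `words`, `numbers`, `from_pairs`, and `from_comprehension` are made up for this example."
},
{
"cell_type": "code",
"collapsed": false,
"input": "words = ['one', 'two', 'three']\nnumbers = [1, 2, 3]\n\n# dict() accepts any iterable of (key, value) pairs, including zip().\nfrom_pairs = dict(zip(words, numbers))\n\n# The same mapping written as a dictionary comprehension.\nfrom_comprehension = {word: number for word, number in zip(words, numbers)}\n\n# Both approaches should produce equal dictionaries.\nprint(from_pairs == from_comprehension)",
"language": "python",
"metadata": {},
"outputs": []
},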
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Providing default values"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "People from Perl backgrounds may be surprised by Python's lack of auto-vivification of default values for unseen keys.\n\nA clunky way to provide a default value for previously-unseen keys is to call `get()` on the dictionary and supply the default as a second argument."
},
{
"cell_type": "code",
"collapsed": false,
"input": "print(word_to_number.get('four', 'do not know that one yet'))",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "do not know that one yet\n"
}
],
"prompt_number": 3
},
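{
"cell_type": "markdown",
"metadata": {},
"source": "For comparison, here is a minimal sketch of what happens without a default: indexing an unseen key with square brackets raises `KeyError`, and `get()` returns the fallback without storing it. The dictionary `numbers` is a hypothetical stand-in for `word_to_number`."
},
{
"cell_type": "code",
"collapsed": false,
"input": "numbers = {'one': 1, 'two': 2}  # hypothetical stand-in for word_to_number\n\ntry:\n    numbers['four']\nexcept KeyError:\n    print('indexing an unseen key raises KeyError')\n\n# get() returns the fallback value but does not store the key.\nfallback = numbers.get('four', 4)\nprint('four' in numbers)",
"language": "python",
"metadata": {},
"outputs": []
},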
{
"cell_type": "markdown",
"metadata": {},
"source": "`defaultdict` in the `collections` module provides a more convenient way to do this."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from collections import defaultdict",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Suppose we have a list of (gene name, transcript name) pairs, and we want to build a dictionary of (gene name, list of all transcripts associated with that gene name) pairs."
},
{
"cell_type": "code",
"collapsed": false,
"input": "gene_transcript_pairs = [\n    ('ENSG00000204802', 'ENST00000430302'),\n    ('ENSG00000204802', 'ENST00000412631'),\n    ('ENSG00000204802', 'ENST00000429818'),\n    ('ENSG00000204802', 'ENST00000377518'),\n    ('ENSG00000243838', 'ENST00000495689'),\n    ('ENSG00000239600', 'ENST00000478870'),\n    ('ENSG00000248425', 'ENST00000514401'),\n    ('ENSG00000261970', 'ENST00000571144'),\n    ('ENSG00000236437', 'ENST00000452629'),\n    ('ENSG00000240567', 'ENST00000473595'),\n]",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": "`defaultdict` takes one argument: a function that it calls with no arguments to build the starting value for any new key. We want our starting values to be empty lists, so we pass `list`."
},
{
"cell_type": "code",
"collapsed": false,
"input": "gene_to_transcripts = defaultdict(list)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Now we can just go through the list of pairs and append each transcript to its gene's list."
},
{
"cell_type": "code",
"collapsed": false,
"input": "for gene, transcript in gene_transcript_pairs:\n    # The first lookup of a new gene creates an empty list automatically.\n    gene_to_transcripts[gene].append(transcript)\n\nprint(gene_to_transcripts['ENSG00000204802'])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "['ENST00000430302', 'ENST00000412631', 'ENST00000429818', 'ENST00000377518']\n"
}
],
"prompt_number": 7
},
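{
"cell_type": "markdown",
"metadata": {},
"source": "To make the convenience concrete, the sketch below does the same kind of grouping with a plain dictionary and `setdefault()`, then with a `defaultdict`. It also shows one behavior worth knowing: merely looking up a missing key in a `defaultdict` inserts it. The pairs and the names `pairs`, `plain`, and `grouped` are placeholders invented for this example, not real gene or transcript identifiers."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from collections import defaultdict\n\n# Placeholder data, not real identifiers.\npairs = [('geneA', 'tx1'), ('geneA', 'tx2'), ('geneB', 'tx3')]\n\n# Plain dict: setdefault() stores an empty list the first time a key is seen.\nplain = {}\nfor gene, transcript in pairs:\n    plain.setdefault(gene, []).append(transcript)\n\n# defaultdict: the empty list is created by the lookup itself.\ngrouped = defaultdict(list)\nfor gene, transcript in pairs:\n    grouped[gene].append(transcript)\n\n# A defaultdict compares equal to a plain dict with the same contents.\nprint(plain == grouped)\n\n# Looking up a missing key creates it as a side effect.\ngrouped['geneC']\nprint('geneC' in grouped)",
"language": "python",
"metadata": {},
"outputs": []
},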
{
"cell_type": "markdown",
"metadata": {},
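"source": "If you want a `defaultdict` whose default starting value is itself a `defaultdict`, it gets a little hacky: the factory must be callable with no arguments, so the inner `defaultdict(int)` has to be wrapped in a `lambda`."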
},
{
"cell_type": "code",
"collapsed": false,
"input": "# The lambda delays construction so each new key gets its own inner defaultdict.\ndd_of_dds = defaultdict(lambda: defaultdict(int))\n\ndd_of_dds['new key']['new subkey'] += 1\n\nprint(dd_of_dds['new key']['new subkey'])",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "1\n"
}
],
"prompt_number": 8
}
],
"metadata": {}
}
]
}