Last active
December 17, 2015 10:39
more advanced dictionary stuff
{
 "metadata": {
  "name": "more_dictionaries"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": "Building dictionaries"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Dictionary comprehensions provide a powerful syntax for building dictionaries."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "ascii_value = {letter: ord(letter) for letter in 'abcdefg'}\n\nprint(ascii_value['e'])",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "101\n"
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "You can also build a dictionary from an iterable of pairs, but this has largely been superseded by dictionary comprehensions."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "word_number_pairs = [('one', 1), ('two', 2), ('three', 3)]\nword_to_number = dict(word_number_pairs)\nprint(word_to_number['two'])",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "2\n"
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": "Providing default values"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "People from Perl backgrounds may be surprised that Python does not autovivify: looking up an unseen key raises a `KeyError` instead of creating a default value.\n\nA clunky way to provide a default value for a previously unseen key is to call `get()` on the dictionary and supply the default as a second argument."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "print(word_to_number.get('four', 'do not know that one yet'))",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "do not know that one yet\n"
      }
     ],
     "prompt_number": 3
    },
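    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "As a minimal sketch of why this is clunky (the dictionary `numbers` below is a made-up stand-in for `word_to_number`): `get()` only hands back the fallback and does not store it, so the default has to be repeated at every lookup."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "numbers = {'one': 1, 'two': 2, 'three': 3}\n\n# get() returns the fallback for a missing key...\nprint(numbers.get('four', 0))\n\n# ...but leaves the dictionary unchanged, so 'four' is still missing.\nprint('four' in numbers)",
     "language": "python",
     "metadata": {},
     "outputs": []
    },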
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "`defaultdict` in the `collections` module provides a more convenient way to do this."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "from collections import defaultdict",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Suppose we have a list of (gene name, transcript name) pairs, and we want to build a dictionary of (gene name, list of all transcripts associated with that gene name) pairs."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "gene_transcript_pairs = [\n    ('ENSG00000204802', 'ENST00000430302'),\n    ('ENSG00000204802', 'ENST00000412631'),\n    ('ENSG00000204802', 'ENST00000429818'),\n    ('ENSG00000204802', 'ENST00000377518'),\n    ('ENSG00000243838', 'ENST00000495689'),\n    ('ENSG00000239600', 'ENST00000478870'),\n    ('ENSG00000248425', 'ENST00000514401'),\n    ('ENSG00000261970', 'ENST00000571144'),\n    ('ENSG00000236437', 'ENST00000452629'),\n    ('ENSG00000240567', 'ENST00000473595'),\n]",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "`defaultdict` takes as its first argument a function (any callable that takes no arguments) that it calls to build the default starting value for each new key. We want our default starting values to be empty lists, so we pass `list`."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "gene_to_transcripts = defaultdict(list)",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Now we can just go through the list of pairs and append each transcript to its gene's list."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "for gene, transcript in gene_transcript_pairs:\n    gene_to_transcripts[gene].append(transcript)\n\nprint(gene_to_transcripts['ENSG00000204802'])",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "['ENST00000430302', 'ENST00000412631', 'ENST00000429818', 'ENST00000377518']\n"
      }
     ],
     "prompt_number": 7
    },
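    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "To see the factory at work in isolation, here is a small sketch (the names `counts` and `groups` are illustrative and not part of the gene example). The factory is called with no arguments the first time a missing key is looked up, and the result is stored under that key, which means that even a plain lookup adds the key."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "from collections import defaultdict\n\ncounts = defaultdict(int)   # int() returns 0\ngroups = defaultdict(list)  # list() returns []\n\ncounts['a'] += 1         # the missing key starts from 0\ngroups['x'].append('y')  # the missing key starts from []\nprint(counts['a'])\nprint(groups['x'])\n\n# Unlike get(), looking up a missing key stores the default.\nprint('b' in counts)\ncounts['b']\nprint('b' in counts)",
     "language": "python",
     "metadata": {},
     "outputs": []
    },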
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "If you want a `defaultdict` whose default starting value is itself a `defaultdict`, it gets a little hacky: the factory must be a callable, so we wrap the inner `defaultdict(int)` in a `lambda`."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "dd_of_dds = defaultdict(lambda: defaultdict(int))\n\ndd_of_dds['new key']['new subkey'] += 1\n\nprint(dd_of_dds['new key']['new subkey'])",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "1\n"
      }
     ],
     "prompt_number": 8
    }
   ],
   "metadata": {}
  }
 ]
}