@jeffhussmann
Last active December 17, 2015 10:49
Activity 2
{
"metadata": {
"name": "statement_2"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Activity 2"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "The file `data/names.txt` has a million lines, each of which contains a *Saccharomyces cerevisiae* gene name."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!wc -l data/names.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "1000000 data/names.txt\r\n"
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head data/names.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "YIL054W\r\nYOR366W\r\nYNL244C\r\nYNL085W\r\nYDL244W\r\nYGL204C\r\nYNL046W\r\nYIL006W\r\nYBR077C\r\nYHL029C\r\n"
}
],
"prompt_number": 2
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Main goal"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Create a dictionary that maps each gene name to the number of times it appears in the file."
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Stretch goals"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Which gene name appears most often?\n\nVisualize the distribution of counts for all names."
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Super stretch goal"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Implement the naive array-based scheme described in the lecture and compare its speed to using a dictionary."
}
],
"metadata": {}
}
]
}
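
One possible way to approach the main goal and the first stretch goal is sketched below. This is only an illustration, not the intended solution: the function name `count_names` and the small `sample` list are made up for the example, and the sample stands in for the lines of `data/names.txt` so the snippet runs without the file.

```python
def count_names(lines):
    """Return a dict mapping each gene name to the number of times it appears.

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    counts = {}
    for line in lines:
        # Drop the trailing newline (and any '\r' from Windows line endings).
        name = line.strip()
        if not name:
            # Skip blank lines rather than counting an empty name.
            continue
        # dict.get returns 0 the first time a name is seen.
        counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical stand-in for the contents of data/names.txt.
sample = ["YIL054W\n", "YOR366W\n", "YIL054W\n", "YNL244C\n", "YIL054W\n"]

counts = count_names(sample)

# The name with the largest count: max over the keys, ranked by their counts.
most_common_name = max(counts, key=counts.get)

print(counts)
print(most_common_name)
```

To run it on the real file, pass an open file handle instead of the sample list, for example `with open('data/names.txt') as fh: counts = count_names(fh)`. The standard library's `collections.Counter` does the same bookkeeping; the explicit loop is shown here because the activity asks for a dictionary built by hand.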