@jeffhussmann
Last active December 17, 2015 10:49
Activity 3
{
"metadata": {
"name": "Untitled0"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Activity 3"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You have carried out a fake experiment measuring expression levels in yeast.\n\nYour fake data is in `data/OLN_data.txt`.\n\nEach line in the file is a gene name and its expression level, separated by a tab."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head data/OLN_data.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "YGL019W\t192.924770778\r\nYCR036W\t379.209040304\r\nYMR322C\t50.0815929232\r\nYDR545C-A\t375.637949832\r\nYMR176W\t301.77533648\r\nYNR030W\t69.4445692651\r\nYCL075W\t174.385080591\r\nYKR014C\t302.691713461\r\nYOL025W\t98.5213144898\r\nYML030W\t377.692625747\r\n"
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Your fake collaborators have carried out another experiment measuring expression under a different condition, and you want to explore whether expression levels under the two conditions are related.\n\nUnfortunately, while both experiments measured expression of the same set of genes, you used the OLN naming scheme to label the genes, while your collaborators used Swiss-Prot Accession (SP_AC) names.\n\nTheir data is in `data/SP_AC_data.txt`, but their genes are listed in a different, unknown order from yours and carry different labels."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head data/SP_AC_data.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Q03629\t45.1685224209\r\nQ12207\t111.091336815\r\nP38629\t103.962701913\r\nQ05164\t125.420349007\r\nP39933\t119.327855598\r\nQ05777\t45.9788559517\r\nP38215\t94.5685010113\r\nP53309\t77.3282636025\r\nQ06091\t56.6344387824\r\nP33314\t150.651546268\r\n"
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Thankfully, you have a file that lists the mappings from the OLN naming scheme to the SP_AC naming scheme.\n\nEach line in the file consists of an OLN name and the corresponding SP_AC name for the same gene, separated by a tab."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head data/OLN_to_SP_AC.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "YNL284C-A\tQ12391\r\nYNL150W\tP53902\r\nYOL106W\tQ08241\r\nYHL039W\tP38732\r\nYCR045C\tP25381\r\nYMR132C\tP40206\r\nYNR072W\tP53631\r\nYDR237W\tP36519\r\nYIL139C\tP38927\r\nYKL148C\tQ00711\r\n"
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You can use these mappings to connect the two sets of experimental data."
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Main goal"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Produce a scatter plot showing the two expression measurements for each gene."
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Stretch goal"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "A third expression experiment has been carried out, but in a hilarious misunderstanding, yet another gene naming scheme (SGD) was used."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head data/SGD_data.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "S000003935\t73.6664201254\r\nS000004021\t49.0963787652\r\nS000002157\t100.290286672\r\nS000004185\t102.677778865\r\nS000004589\t115.631200468\r\nS000001571\t147.603579861\r\nS000004897\t64.6961743153\r\nS000003044\t96.2297902338\r\nS000001050\t56.6618543558\r\nS000003142\t145.732563992\r\n"
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Unfortunately, this time you don't have a direct mapping from OLN or SP_AC to SGD. All you have is a list of mappings from SP_AC to a fourth (!) naming scheme, SP_Entry..."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head data/SP_AC_to_SP_Entry.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "P46962\tCTK2_YEAST\r\nQ04093\tYD444_YEAST\r\nQ08955\tCSM4_YEAST\r\nP40012\tPPOX_YEAST\r\nP41808\tSMK1_YEAST\r\nP38228\tTCM62_YEAST\r\nP23638\tPSA3_YEAST\r\nQ99207\tNOP14_YEAST\r\nP36093\tPHD1_YEAST\r\nP38427\tTSL1_YEAST\r\n"
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": "... and a list of mappings from SGD to SP_Entry."
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head data/SGD_to_SP_Entry.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "S000000618\tSYNM_YEAST\r\nS000003930\tYL007_YEAST\r\nS000028580\tYO16W_YEAST\r\nS000000821\tISC1_YEAST\r\nS000003480\tSOL4_YEAST\r\nS000002236\tMDHP_YEAST\r\nS000000207\tCOQ1_YEAST\r\nS000001634\tNNRD_YEAST\r\nS000001688\tXPOT_YEAST\r\nS000005732\tNOC2_YEAST\r\n"
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Use these two sets of mappings to connect the third set of expression values to the first two.\n\nVisualize all three at the same time by using the third measured values to control the color of each point in your scatter plot."
}
],
"metadata": {}
}
]
}
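
One possible way to approach both goals is sketched below. It is not the activity's official solution. It treats every file as a two-column, tab-separated table, translates all gene names to OLN names, and keeps only the genes present in every data set. The function names (`read_pairs`, `read_levels`, `invert`, `compose`, `relabel`, `align`, `main`) are made up for this sketch, the mappings are assumed to be one-to-one, and `main` assumes matplotlib is installed and the files sit under `data/` as shown in the notebook.

```python
def read_pairs(path):
    """Read a two-column, tab-separated file into a dict: first column -> second."""
    pairs = {}
    with open(path) as handle:
        for line in handle:
            # The files shown above end lines with \r\n, so strip both characters.
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) == 2:
                pairs[fields[0]] = fields[1]
    return pairs


def read_levels(path):
    """Read a data file into a dict: gene name -> expression level (float)."""
    return {name: float(value) for name, value in read_pairs(path).items()}


def invert(mapping):
    """Swap keys and values. Assumption: the mapping is one-to-one."""
    return {value: key for key, value in mapping.items()}


def compose(first, second):
    """Chain a -> b and b -> c into a -> c, skipping names with no b -> c entry."""
    return {a: second[b] for a, b in first.items() if b in second}


def relabel(levels, mapping):
    """Re-key levels with mapping (old name -> new name), dropping unmapped genes."""
    return {mapping[name]: level for name, level in levels.items() if name in mapping}


def align(*tables):
    """Return one list per table, restricted to shared genes, all in the same order."""
    shared = sorted(set(tables[0]).intersection(*tables[1:]))
    return [[table[name] for name in shared] for table in tables]


def main():
    import matplotlib.pyplot as plt  # assumption: matplotlib is available

    oln_to_ac = read_pairs("data/OLN_to_SP_AC.txt")
    ac_to_entry = read_pairs("data/SP_AC_to_SP_Entry.txt")
    sgd_to_entry = read_pairs("data/SGD_to_SP_Entry.txt")

    # Main goal: translate SP_AC names back to OLN names.
    ac_to_oln = invert(oln_to_ac)
    # Stretch goal: SGD -> SP_Entry -> SP_AC -> OLN.
    sgd_to_oln = compose(compose(sgd_to_entry, invert(ac_to_entry)), ac_to_oln)

    oln = read_levels("data/OLN_data.txt")
    ac = relabel(read_levels("data/SP_AC_data.txt"), ac_to_oln)
    sgd = relabel(read_levels("data/SGD_data.txt"), sgd_to_oln)

    x, y, color = align(oln, ac, sgd)
    plt.scatter(x, y, c=color)
    plt.xlabel("Expression (your experiment)")
    plt.ylabel("Expression (collaborators' experiment)")
    plt.colorbar().set_label("Expression (third experiment)")
    plt.show()
```

Calling `main()` from a notebook cell would draw the three-way plot; for the main goal alone, `align(oln, ac)` gives the two lists to pass to `plt.scatter`. Translating everything to one naming scheme first means the order of lines in each file never matters.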