Skip to content

Instantly share code, notes, and snippets.

@kantale
Created December 15, 2014 22:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kantale/dda2a1bf861bd9bef91e to your computer and use it in GitHub Desktop.
Save kantale/dda2a1bf861bd9bef91e to your computer and use it in GitHub Desktop.
{
"metadata": {
"name": "",
"signature": "sha256:d5382c1d8e4afa21ab324258164968c311a06f8bb46612ce600cbce5cbb39b75"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"A scenario of bioinformatics analysis with existing articles of pypedia.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we demonstrate how [pypedia](http://www.pypedia.com) can be used to perform simple bioinformatics analysis with openly available data. We also demonstrate how IPython can be combined with pypedia.com. IPython notebook is a convenient tool not only for performimg an analysis but also to distribute and reproduce it. \n"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Prerequisites"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Initially we assume that the pypedia library has been installed locally. To do that run the following from the computer's terminal (not in python):\n",
"\n",
" git clone git://github.com/kantale/pypedia.git\n",
" \n",
"This example assumes a python 2.7 (or higher but not 3.x) environment. The first command imports the pypedia library. This library maintains a connection between the local environment and pypedia.com"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pypedia"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Downloading a dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this demonstration we can experiment with the files available in:\n",
"http://pngu.mgh.harvard.edu/~purcell/plink/res.shtml\n",
"For example the file that contains the CEU founders (release 23, 60 individuals, filtered 2.3 million SNPs) can be downloaded locally with the following commands.\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pypedia import download_link \n",
"link = 'http://pngu.mgh.harvard.edu/~purcell/plink/dist/hapmap_CEU_r23a_filtered.zip' \n",
"download_link(link)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Extracted filename from url: hapmap_CEU_r23a_filtered.zip\n",
"Saving at: hapmap_CEU_r23a_filtered.zip\n",
"Downloading: hapmap_CEU_r23a_filtered.zip Bytes: 31287683"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\r"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[ 0% ]"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" \r",
"[****************100%******************] 31285249 of 31287683 complete"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next step is to unzip the downloaded file"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pypedia import unzip\n",
"unzip('hapmap_CEU_r23a_filtered.zip', './')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Unzipping file: ./hapmap_CEU_r23a_filtered.bed\n",
"Unzipping file: ./hapmap_CEU_r23a_filtered.bim"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Unzipping file: ./hapmap_CEU_r23a_filtered.fam"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Reading a dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data are in binary BED / BIM / FAM format. We can apply a reader to this data. Readers are python generators. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pypedia import BED_PLINK_reader\n",
"reader = BED_PLINK_reader(\n",
" \t'hapmap_CEU_r23a_filtered.bed',\n",
" \t'hapmap_CEU_r23a_filtered.bim',\n",
" \t'hapmap_CEU_r23a_filtered.fam')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see from the documentation (http://www.pypedia.com/index.php/BED_PLINK_reader), BED_PLINK_reader is a python generator. The first item that it generates is a dictionary that contains the family, sample and phenotype information of the dataset:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"header = reader.next()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Loading: hapmap_CEU_r23a_filtered.bim\n",
"Read 2333521 SNPs"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Loading: hapmap_CEU_r23a_filtered.fam\n",
"Read 60 Individuals\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The subsequent items of this generator contain genotype information. Suppose that we are interested on two particular SNPs: rs16839450 and rs16839451. We can extract the genotypes of these snps:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"interesting_snps = [x for x in reader if x['rs_id'] in ['rs16839450', 'rs16839451']]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"BED file type: SNP_major\n",
"\r",
"[ 0% ]"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" \r",
"[****************100%******************] 2330001 of 2333521 complete"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Basic genetics statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One experiment we can perform is to check if the SNPs that we selected are in LD (Linkage Disequilibrium):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pypedia import Pairwise_linkage_disequilibrium as PLD\n",
"ld = PLD(interesting_snps[0]['genotypes'], interesting_snps[1]['genotypes'])\n",
"ld"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"{'Dprime': 1.0,\n",
" 'R_sq': 0.8896046482253379,\n",
" 'haplotypes': [('CA', 0.7807017543859649, 0.5986842105263158),\n",
" ('CG', 0.00877192982456141, 0.19078947368421056),\n",
" ('TA', 6.268997962918469e-18, 0.15964912280701754),\n",
" ('TG', 0.21052631578947367, 0.050877192982456146)]}"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result ('R_sq' ~= 0.89) shows that these SNPs are in LD. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Subsequently we can measure the Hardy Weinberg Equilibrium of all SNPs:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"reader = BED_PLINK_reader(\n",
" \t'hapmap_CEU_r23a_filtered.bed',\n",
" \t'hapmap_CEU_r23a_filtered.bim',\n",
" \t'hapmap_CEU_r23a_filtered.fam')\n",
"header = reader.next()\n",
"\n",
"from pypedia import hardy_weinberg_equilibrium as hwe\n",
"hwe_values = [hwe(x['genotypes']) for x in reader]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Loading: hapmap_CEU_r23a_filtered.bim\n",
"Read 2333521 SNPs"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Loading: hapmap_CEU_r23a_filtered.fam\n",
"Read 60 Individuals\n",
"BED file type:"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" SNP_major\n",
"\r",
"[ 0% ]"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" \r",
"[****************100%******************] 2330001 of 2333521 complete"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also create a Q-Q plot that will show if the p-values are normally distributed, or if an inflation is taking place:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%matplotlib inline"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pypedia import qq_plot\n",
"qq_plot(hwe_values, show_plot='on_screen')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"lambda: 0.483347\n"
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEPCAYAAABP1MOPAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGzRJREFUeJzt3X+0XGV97/H3JxAwJGgEJEkpmJYFYigxMZDaBurUGhZS\nsbSrgWVsF6U/4opeyF2990q8t72k6lpFXGVRyhJKBYqVH4bWILG15ZcjwVQxmEASIDG3BFFIKJJg\nsBqEfO8fe59kODkzZ+ac2bP3zPN5rXVWZu+ZPft7JrP293yf59nPo4jAzMzSNaHsAMzMrFxOBGZm\niXMiMDNLnBOBmVninAjMzBLnRGBmlrjCE4Gk4yV9TdJmSZskXZrvP0rSvZK2SrpH0tSiYzEzs4Op\n6PsIJE0HpkfEBklTgEeA84GLgRci4kpJlwFvjojlhQZjZmYHKbwiiIgdEbEhf/wy8ARwHPAB4Jb8\nZbeQJQczM+uxnvYRSJoJzAW+BUyLiJ35UzuBab2MxczMMj1LBHmz0D8ByyJiT+NzkbVPea4LM7MS\nHNqLk0iaSJYE/iEi7sp375Q0PSJ2SJoBPD/CcU4OZmZjEBFq97W9GDUk4Ebg8Yi4uuGpu4GL8scX\nAXcNPxYgIvwTweWXX156DFX58Wfhz8KfReufTvWiIlgA/B7wmKT1+b6PA1cAKyX9EbAduKAHsZiZ\n2TCFJ4KIeIjmlcd7iz6/mZm15juL+0StVis7hMrwZ3GAP4sD/FmMXeE3lI2HpKhyfGZmVSSJqFJn\nsZmZVZsTgZlZ4pwIzMyqZN8+uOYa+NjHenZK9xGYmVXFtm3wh38Ia9Zk25s2wamndvw27iMwM+s3\nQ1XA7NlZEpg2DVatGlMSGIueTDFhZmZNDK8CFi/OksLRR/csBFcEZmZlaFYF3HprT5MAuCIwM+u9\nClQBjVwRmJn1SoWqgEauCMzMeqFiVUAjVwRmZkWqaBXQyBWBJWXJEti6FY44Am67DaZObb3fbFwq\nXAU0ckVgSdm6Fb7+dfjqV7OL/2j7zcakD6qARq4ILClHHJH9e/rpcMMNo+8361ifVAGNPMWEJWX3\n7uwv/htueH3zT7P9Zm3btw+uvRaWL4ef/CSrAq6/Hs4/v+ehdDrFhBOBDTy3/1vhKlYFeK4hs2Hc\n/m+F6bO+gGbcR2ADz+3/VoiKVQHj4aYhG3hu/7euqlBfQDPuIzAzK0qfVAHuIzAz67YB6Qtoxn0E\nZmat9EkVMB6uCMzMRjLgVUAjVwRmZsMlUAU0ckVgZjYkoSqgkSsCMzNIrgpo5IrAzNKWaBXQyBWB\nmaUr4SqgkSsCM0uPq4DXcUVgZmlxFXAQVwRmlgZXAU25IrCB5XUIbD9XAS25IrCB5XUIzFVAe1wR\n2MDyOgSJcxXQtsIrAkk3SdopaWPDvhWSvi9pff5zTtFxWHpuuw0WLYJ773WzUFJcBXSs8PUIJJ0F\nvAx8PiJOy/ddDuyJiKtGOdbrEZhZ+1wFABVcjyAi1gC7Rniq7SDNzFpyFTAuZXYWXyLpUUk3SnLh\nbmZjs20b1GqwbFm2dOTixbB5c6WWjqy6sjqLrwM+kT/+JPBXwB+N9MIVK1bsf1yr1ajVagWHZv1q\nyRJYvRpefBEmTYIzzoA773T/wMDqg7WDe6Ver1Ov18d8fE/WLJY0E1g91EfQwXPuI7C21WrZcNFG\nixbBypWlhGNFcl9AS5XrIxiJpBkNm78NbGz2WrN2DQ0XHTJnjoeNDhz3BRSiF6OGbgfeDRwD7AQu\nB2rAHCCAp4APR8TOEY51RWBt270bLr4YXnkFJk6Ev/97NwsNFFcBbeu0IuhJ09BYORGYmfsCOtdp\nIvCdxWZWXa4CesJzDZlZ9bgvoKdcEZhZtbgK6DlXBGZWDa4CSuOKwMzK5yqgVK4IzKw8rgIqwRWB\nmZXDVUBluCIws95yFVA5rghsIHh94j7hKqCSXBHYQPD6xBXnKqDSXBHYQPD6xBXmKqDyPNeQDYTd\nu7NK4IYb3CxUGZ4jqDSedM7MyucqoFR9sR6BWbctWZItTHPuuVl1YCVxX0Bfch+BDYShzmLIkoJX\nJSuBq4C+5YrABoI7i0vkKqDvuY/ABoI7i0viKqCS3FlsZsXziKBK8wplZlYsVwEDx30E1teWLMn6\nByZOzK5DTz9ddkQDzH0BA6tl05CkY4FFwK8BM4EAngYeBO6MiOcLDc5NQzaKGTNgx44D2z//8/DM\nM+XFM7BcBfSVrjUNSboROBH4KnA98BwgYAYwH1gpaVtE/PH4QjYbu717DzyeMAEeeqi8WAaS+wKS\n0LQikDQ7Ih5reXAbrxkPVwQ2moUL4b77sqahdeuyVgvrElcBfcujhiwpHjZaAFcBfa9riUDSVGA5\ncD4wjax/4HngLuCKiCj8Rn4nArMecxUwELo519BKYBdQA46KiKOAXwd258+Z2aDwiKCktaoItkbE\nyZ0+102uCKwVr0rWJa4CBk43K4KnJX1M0rSGN58u6TLge+MJ0qwbvCrZOLkKsFyrRHAhcAzwdUm7\nJO0C6sDRwAU9iM2sJU80Nw7btmXzdi9blnUIL14Mmze7QzhRHjVkfcsjhsbAI4KS0JPho5Iujoib\nOz6w8/M4EZh1i/sCktGrRPBMRBzf8YGdn8eJwGy8XAUkp5tTTGxscdyxHUVlZuVwFWBtaDUN9bHA\nOWT3Egy3tphwzKwrXAVYB1olgn8GpkTE+uFPSPp6cSGZ2bi4CrAOFT5qSNJNwG8Cz0fEafm+o4Av\nAm8FtgMXjDRlhfsIzDrgKsByXbuhTNKRbZxs1NcAN5M1MTVaDtyb3518f75t1pElS7Kh8Oeemw0l\nTZrvC7BxaDXFxH3AFuDLwLqIeDHffzRwOtlkdCdFxHtHPYk0E1jdUBE8Cbw7InZKmg7UI+KUEY5z\nRWAjOuWU7Nr32mvZ9qJFsDLFGbBcBdgIujp8VNJ7gMXAAuDn8t3PAg8Bt0ZEvc2gZvL6RLArIt6c\nPxbw4tD2sOOcCGxEU6fCSy9ljydOhOefT/CmMvcFWBNdXbw+Ih4AHhh3VK3PEZJ8tbdRLVkCf/d3\nB+9/4IHEkoCrAOuylokAQNI8srUIGr0EPB0Rr47xvDslTY+IHZJmkK1zMKIVK1bsf1yr1ajVamM8\npfW71atH3n/NNXDmmb2NpTSuAmwE9Xqder0+5uNHHTUk6ZvAPGBoScrTgM3Am4ClEfFvo57k4Kah\nK4EfRsSnJS0HpkbEQR3GbhqyRocckv0x3Ojtb4e1axOoCFwFWAe6OQ31kGeBORExLyLmAXOA/wAW\nAle2EdDtZDegvU3SM5IuBq4AFkraCrwn3zZrScO+1gsXJpIEPCLICtZORbA5Ik4daZ+kDRExp7Dg\nXBFYgze8AfbuhQkTYP36BBaqdxVgY1RERbBZ0nWS3i2pJumzwOOSDgd+NuZIzdo0dWpWDezdm23v\n2wef+lS5MRXOVYD1UDsVwRHAR8iGkAJ8A/gs8FNgckTsKSw4VwTJG94cNGTXrgFtEnIVYF1QyDTU\n+V//Q2sUb4mIV8YYX0ecCNLWLAmsXJndQDZwPCLIuqSr9xHkb1gDbgGeznedIOmiiPDEc9Z1zS7+\nQz7zmQFMAq4CrGSjJgLgKuDsiNgCIOlk4A7gnUUGZuk55aBJRl5vICsBVwFWAe10Fh86lAQAImIr\n7SUQs45s2dL8uX/5lwFLAvv2ZRf82bOzJDBtGqxaBbfe6iRgPdfOBf0RSZ8DvgAI+BCwrtCoLDmt\nmoQefXTAhoq6CrCKaWfU0BuAj3Jg1NAa4LMRsbfg2NxZnJDhiWDBAnjooXJiKYz7AqxHerJ4fa84\nEaRhpGpg4P7bXQVYD/Vq8fqIiEEq1q0kI90LMGVK7+MojKsA6wOt+gjO61kUlqyhNQUabdrU+zgK\n4SrA+kTTRBAR24fvk/T+iPhKoRFZMkZqEjr3XHjrW3sfS1e5CrA+01EfgaT1ETG3wHiGn899BAOq\n2Sihvv/vdhVgFVDEpHNmXdUsCSxd2ts4usr3BVgf67QimB8RDxcYz/DzuSIYMM2SQF8PF3UVYBXj\n4aNWWRMmjNz0c/TR8MILvY9n3NwXYBXV9UnnzLqhWSUwbRrs2NHbWLrCVYANEPcRWGmWLu3DJOC+\nABtATZuGJM0Dgmx+oYNeFBHfKTY0Nw0NioEZIeQqwPpE1/oIJNXJEsAkYB7wWP7UbGBdRPzK+EJt\nIzgngr43EEnAfQHWZ7o2fDQiahHx68CzwDsjYl5EzAPm5vvMWmqWBCZO7G0c4+K1gy0B7fQRnBIR\n++cdiohNwNuLC8kG3Ss9Weh0nNwXYAlpZ9TQY8PWI1gMPFpoVNbXWq0t0BdNQu4LsMS0sx7BJGAp\ncFa+60Hguoj4acGxuY+gT/Vtv4D7AmxAFHJDmaQjgBMi4snxBNcpJ4L+0ywJTJkCe/b0NpaOuAqw\nAdL1uYYkfQBYD/xrvj1X0t1jD9EGVV8mAfcFmLXVWbwC+GVgF0BErAd+scCYbMBUNgl4RJAZ0F4i\n+FlE7B62b18RwVj/6qt+AVcBZq/TzqihzZI+BBwq6STgUmBtsWFZP+mrJOC+ALODtFMRXAKcCuwF\nbgd+BPz3IoOy/tFqqGiluAowa6qd4aPv7MW8Qk3O7VFDFdY39wu4CrDEFLFC2VWSnpT0SUm/NI7Y\nLBGVSQKuAsza0u59BDOAC/KfNwIrI+KTBcfmiqDCKt8v4CrAElboCmWSTgMuAy6MiMKnDnMiqK6R\nEkEl/qt8d7BZ91cokzSLrBL4XeCHwBeBPx1zhNb3KttB7CrAbEza6Sz+d7KL/8qI6On0064Iqqly\n1YCrALPX6WpFIOlQ4KmIuHrckY38/tvJhqO+Rnbj2vwizmPdU7lqwFWA2bi1TAQR8aqkEyQdHhF7\nCzh/ALWIeLGA97Yuq1QHsasAs65p587ip4CH8onm/ivfFxFxVZdiqNrfmNaBd7yjhJO6CjDrqnbu\nI/h/wD/nr52S/xzZpfMHcJ+kdZL+pEvvaQVoVg1s2NDDIHxfgFkh2h4+KmlyRPy4qyeXZkTEc5Le\nAtwLXBIRaxqed2dxRZTeQewqwKxtRQwf/VXgc2RVwPGS3gF8OCI+MvYwMxHxXP7vf0paBcwH1jS+\nZsWKFfsf12o1arXaeE9rHSq1g9h9AWajqtfr1Ov1MR/fzvDRh8nuIfhyRMzN922OiFPHfFb2r3p2\nSETskTQZuAf4i4i4p+E1rghKVmoHsasAszHpekUAEBHf0+uvCK92GtgIpgGr8vc9FLi1MQlYwlwF\nmPVUO4nge5IWAEg6jGw9gifGe+KIeAqYM973seKUUg24CjDruXZGDS0FPgocB/wAmJtvW4IKSwIe\nEWRWmo4mnes19xGUp6fVgKsAs67q+noEkj4j6Y2SJkq6X9ILkn5/fGFaP+p6EnAVYFYJ7TQNnR0R\nPwLeD2wHTgT+V5FBWbl6Mlx02zao1WDZsqxDePFi2LzZHcJmJWgnEQx1KL8f+MeIeInsjmBLSNeq\nAVcBZpXTzqih1ZKeBH4KLJV0bP7YBlCh1YD7Aswqqd2lKo8CXoqI1/Kbv944dFdwocG5s7jnCplK\nwvcFmPVUEVNMTAIuBs6UFGRTQFw39hAtKa4CzCqvnSkm7iRbPOYLZFNGLwbeFBGLCg/OFUFPdbUa\ncBVgVpoippg4NSJmNWw/IOnxzkOzZLgKMOsr7Ywa+o6kXxnakPQu4JHiQrK+5RFBZn2paUUgaWPD\na74h6RmyYaMnAFt6EJv10LibhVwFmPWtVk1D5w3bHroseGlJO8B9AWZ9r93ho3OAs8iSwZqIeLTo\nwPLzurO4B8Y8r5CrALNKKmKuoWVkI4beQraGwBckXTr2EK0ftEwC7gswGyjtDB/dCLxraL3i/Iay\nb0bEaYUH54qgJzrqH3AVYFZ5Xa8IcvuaPLY+13YScBVgNrDauY/gZuBbkr5E1lF8PnBToVFZtbgK\nMBto7XYWzwPO5EBn8fqiA8vP66ahgrWsCDwiyKwvddo05BXKEtYyCbgKMOtbRfURWCrcF2CWnHb6\nCGwAjXzvwEtQO89VgFli3DSUqIMTQRCTJrsvwGwAuGnIRjVSEoAfee1gs0S5aSgxTaeTmPY2uH6V\nE4BZgtw0lJhm1UC88Kr7AswGRBEL09iAkPaR3RM49P3Ik0C8qbygzKx07iNIxMFJIOMkYGZOBINu\n376mScBLS5gZOBEMtm3b0CEwchIYx8L0ZjZQnAgGUX53sE76RZwEzGw07iweNPkcQVpTx0nAzNrh\nimBQNMwR5CRgZp3wfQSD4KAqAJwEzNLlKSZS0rQKcBIws/a5IuhXbVYB4CRglpq+qggknSPpSUnf\nlXRZmbH0jQ6qAHASMLPRlVYRSDoE2AK8F/gB8G3ggxHxRMNrXBE0chVgZm3op4pgPrAtIrZHxM+A\nO4DfKjGe6nIVYGYFKjMRHAc807D9/XyfNcrvDtayS9BPfsxoCcBJwMw6VeYNZW1dslasWLH/ca1W\no1arFRROxezbB9deC8uXA0MJoDknALN01et16vX6mI8vs4/gXcCKiDgn3/44sC8iPt3wmjT7CF55\nBRYuhAcfBEAMTRp3sBQ/HjNrrZ/6CNYBJ0maKekw4ELg7hLjqY7DDoNZs7K1g1etwknAzIpU6n0E\nkt4HXA0cAtwYEX857Pk0KwKAPXuyysCrhplZhzqtCHxDmZnZgOmnpiEzM6sAJwIzs8Q5EZiZJc6J\nwMwscU4EZmaJcyIwM0ucE4GZWeKcCMzMEudEYGaWOCcCM7PEORGYmSXOicDMLHFOBGZmiXMiMDNL\nnBOBmVninAjMzBLnRGBmljgnAjOzxDkRmJklzonAzCxxTgRmZolzIjAzS5wTgZlZ4pwIzMwS50Rg\nZpY4JwIzs8Q5EZiZJc6JwMwscU4EZmaJcyIwM0ucE4GZWeKcCMzMEudEYGaWOCcCM7PEORGYmSWu\nlEQgaYWk70tan/+cU0YcZmZWXkUQwFURMTf/+deS4ugb9Xq97BAqw5/FAf4sDvBnMXZlNg2pxHP3\nHX/JD/BncYA/iwP8WYxdmYngEkmPSrpR0tQS4zAzS1phiUDSvZI2jvDzAeA64BeAOcBzwF8VFYeZ\nmbWmiCg3AGkmsDoiThvhuXKDMzPrUxHRdvP7oUUG0oykGRHxXL7528DGkV7XyS9iZmZjU0oiAD4t\naQ7Z6KGngA+XFIeZWfJKbxoyM7NyVf7O4tRvPpN0jqQnJX1X0mVlx1M2SdslPZZ/Fx4uO55ekXST\npJ2SNjbsOyoflLFV0j2pjL5r8lkkeZ2QdLykr0naLGmTpEvz/R19NypfEUi6HNgTEVeVHUuvSToE\n2AK8F/gB8G3ggxHxRKmBlUjSU8C8iHix7Fh6SdJZwMvA54cGVki6EnghIq7M/0h4c0QsLzPOXmjy\nWSR5nZA0HZgeERskTQEeAc4HLqaD70blK4Jcqp3G84FtEbE9In4G3AH8VskxVUFy34eIWAPsGrb7\nA8At+eNbyC4AA6/JZwFpfi92RMSG/PHLwBPAcXT43eiXRJDqzWfHAc80bH8/35eyAO6TtE7Sn5Qd\nTMmmRcTO/PFOYFqZwVRAqtcJYP9Q/LnAt+jwu1GJROCbz5qqdrtdORZExFzgfcBH82aC5EXWxpvy\n9yXl6wR5s9A/AcsiYk/jc+18N8oaPvo6EbGwnddJ+hywuuBwquQHwPEN28eTVQXJGrr/JCL+U9Iq\nsuazNeVGVZqdkqZHxA5JM4Dnyw6oLBGx/3dP7TohaSJZEviHiLgr393Rd6MSFUEr+S8xpOnNZwNq\nHXCSpJmSDgMuBO4uOabSSDpC0pH548nA2aT1fRjubuCi/PFFwF0tXjvQUr1OSBJwI/B4RFzd8FRH\n341+GDX0ebJyb//NZw1tXwNP0vuAq4FDgBsj4i9LDqk0kn4BWJVvHgrcmsrnIel24N3AMWRtvv8X\n+DKwEjgB2A5cEBG7y4qxV0b4LC4HaiR4nZB0JvAg8BgHmn8+DjxMB9+NyicCMzMrVuWbhszMrFhO\nBGZmiXMiMDNLnBOBmVninAjMzBLnRGBmljgnAkuapHfk92p0elxd0rxRXjOzcarksZB0X8NNdAdN\nv5zvH3HKYUmzJd04nvNbGpwILHVzgXPHcFzhc/tIeg+wpWHumJuBkebZXw7cGxEnA/fn20TEY8CJ\nko4tMk7rf04EVhmSfk/St/KFRa6XNEHSGfmMkodLmpwvvjFLUk3Sg5K+ki/cc11+uz2Szpa0VtIj\nklbm01GQv9c3JG2Q9E1JbwQ+AVyYn3NRfo6b8ji+k098iKRJku6Q9LikLwGT6GDaY0lvkHSzskV1\nviOplu8/Io9xs6Qv5XENVRqLye4eBlpOv9xqyuGvAovajdPS5ERglSDp7cAFwK/ms4vuAz4UEd8m\nmzflU8CnySbWejw/7AzgvwGzgBOB35F0DPB/gN+IiHlkC3X8aT4x1xeBSyNiDtliPz8G/hy4IyLm\nRsSd+bH3R8QvA+8BPiPpCGAp8HJEzCKb0mAenVUEHwVei4jZwAeBWyQdDnwE+GFEnJrH0vi+C8jm\nmxpNqymHHwZ+rYM4LUGVmH3UDPgNsovguvwP+0nAjvy5T5BdEH8CXNJwzMMRsR32zz9zJvBTssSw\nNn+fw4C1wNuAZyPiEdi/iMfQpF2Nf9mfDZwn6X/m24eTzddyFvDX+bEbJT3W4e+3ALgmP36LpKeB\nk/P9V+f7Nw9735/rdCW2iAhJjQnqOWBmh7FaYpwIrEpuiYj/PcL+Y4DJZBPvTQL+K9/feMFTvi2y\n9vLFjW8g6bQm5xzpr/rfiYjvDjt+6BxNSZoP/G2++efApuEvaXZoq/dtQ6sph4c+F7Om3DRkVXE/\n8LuS3gL7R8KckD/3t8CfAbeRNQ8NmZ+PzJlA1qy0BvgmsEDSifn7TJZ0EvAkMEPS6fn+I5WtCb0H\nOLLhPf8NuHRoQ9Lc/OGDZG32SPolYPbwXyAiHs6bmOZGxFeGPb0G+FB+/MlkVcYW4Bt57EiaBTQm\nrGclHd3iMxvSasrhGcDTbbyHJcyJwCohIp4gu9jfI+lR4B6yC/fvA3sj4g7gCuCMvKM1gG8D1wKP\nA/8REasi4gXgD4Db8/dZC7wtX/P5QuBvJG0gu+AfDnwNmDXUWQx8EpiYd+puAv4iD/E6YIqkx/N9\n7bTdw4G/xj8LTMibfu4ALoqIV/L9b5G0OT/3ZuCl/JiHgNOH3ihv/loLnCzpGUkX509dASyUtJWs\nX+OKhvPPJ0tiZk15GmrrS3ky+B8RcV7ZsYxHXs1MjIi9eRVzL3ByRLya/44XRsTScbx/nWwu+mRX\nL7PRuY/A+tWgrNE7GXggH9UkYGlEvAoQEXVJfybpyOHr0LZD0mxgm5OAjcYVgZlZ4txHYGaWOCcC\nM7PEORGYmSXOicDMLHFOBGZmiXMiMDNL3P8HzkqHYxMk3IAAAAAASUVORK5CYII=\n",
"text": [
"<matplotlib.figure.Figure at 0x10666b150>"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"<module 'matplotlib.pyplot' from '/Users/alexandroskanterakis/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc'>"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is clearly an inlfation of the hardy weinberg statistic in this dataset. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally we can also perform a genetic association test. For demonstration purposes we can assign random values as phenotypes:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Get the number of samples \n",
"samples = len(header['sex_ids']) \n",
"\n",
"#Split cases and controls equally \n",
"controls = samples/2 \n",
"cases = samples - controls\n",
"\n",
"#Create random phenotype\n",
"pheno = [1] * cases + [2] * controls\n",
"from random import shuffle\n",
"shuffle(pheno)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [genetic_association_test](http://www.pypedia.com/index.php/genetic_association_test) method contains a collection of genetic association tests."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pypedia import genetic_association_test as gat "
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can create a more complex analysis, for example report all SNPs where they pass the hardy weinberg statistic (p-value>0.0001) and the p-value of the genotypic test for association with the phenotype is lower than 10^-4. Of course, since this is a random assigned phenotype we expect the method to find a few SNPs."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# \"rewind\" reader\n",
"reader = BED_PLINK_reader(\n",
" \t'hapmap_CEU_r23a_filtered.bed',\n",
" \t'hapmap_CEU_r23a_filtered.bim',\n",
" \t'hapmap_CEU_r23a_filtered.fam')\n",
"header = reader.next()\n",
"\n",
"reported_snps = []\n",
"\n",
"#Iterate through all markers \n",
"for marker in reader:\n",
" \t#Get the genotype\n",
" \tgenotypes = marker['genotypes']\n",
"\n",
" \t#Discard markers with low hardy weinberg test values\n",
" \thwe_value = hwe(genotypes)\n",
" \tif hwe_value < 1e-4:\n",
" \t\tcontinue\n",
"\n",
" \t#On the remaining snps perform an association test\n",
" \tassoc = gat(genotypes, pheno, tests=['GENO'])\n",
" \tp_value = assoc['GENO']['P']\n",
"\n",
" \t#Report SNPs with p_values < 10^-4 \n",
"\tif p_value < 1e-4:\n",
" \t\treported_snps += [(marker['rs_id'], str(p_value))]\n",
"\n",
"#Print first 10 reported SNPs\n",
"for s in reported_snps[0:10]:\n",
" print 'SNP: %s p_value: %s' % (s[0], s[1])\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Loading: hapmap_CEU_r23a_filtered.bim\n",
"Read 2333521 SNPs"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Loading: hapmap_CEU_r23a_filtered.fam\n",
"Read 60 Individuals\n",
"BED file type: SNP_major\n",
"\r",
"[ 0% ]"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" \r",
"[****************100%******************] 2330001 of 2333521 complete"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" SNP: rs9729887 p_value: 7.41715632301e-05\n",
"SNP: rs4650608 p_value: 4.25942698892e-05\n",
"SNP: rs12120254 p_value: 5.56259796442e-05\n",
"SNP: rs12022730 p_value: 8.07413948606e-05\n",
"SNP: rs3806412 p_value: 5.29801197352e-05\n",
"SNP: rs6693754 p_value: 6.08087642015e-05\n",
"SNP: rs17380246 p_value: 5.9213847799e-05\n",
"SNP: rs17371791 p_value: 6.72423732165e-05\n",
"SNP: rs11124566 p_value: 8.67876069542e-05\n",
"SNP: rs2540973 p_value: 4.51223345855e-05\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Saving a method in pypedia.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose that we have developed a function (or class) and we want to make it part of the pypedia collection. We can create and edit an article in the pypedia.com website as with any other wiki. Alternativelly we can use the pypedia library to complete this task. To do that we should already have an account in pypedia.com. Then we must declare our username and password locally:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pypedia.username='JohnDoe'\n",
"pypedia.password='secretPassword'"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we can create a new article in pypedia.com by using our account.\n",
"Some notes:\n",
"* It is very important the name of the article (and the name of the function or class) to end in \"user_< Username >\". (See supplementary file 2)\n",
"* The first letter of the username should be capital.\n",
"* In the [documentation](http://www.pypedia.com/index.php/PyPedia:Documentation) page we describe how to universally define these values (for security purposes).\n",
"\n",
"For example:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pypedia.add('foo_user_JohnDoe')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Article foo_user_JohnDoe saved\n",
"Next: \n",
" Edit the article online: http://www.pypedia.com/index.php/foo_user_JohnDoe\n",
" Or edit the article locally: /Users/alexandroskanterakis/del/pypedia/pypCode/pyp_foo_user_JohnDoe.py\n",
" To push the changes to pypedia.com run: pypedia.push()\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As it is indicated in the output messages there two ways to continue. Either to edit the article online (http://www.pypedia.com/index.php/foo_user_JohnDoe) or open the the file pyp_foo_user_JohnDoe.py with your favorite text editor, add the desired functionality and then type:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pypedia.push()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Additional information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Additional documentation and functionality of the pypedia library can be found here: http://www.pypedia.com/index.php/PyPedia:Documentation"
]
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment