@yoavram
Created October 9, 2012 17:17
{
"metadata": {
"name": "ma_analysis"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Converting MA data from MongoDB to an R Data Frame\n",
"All the relevant data was created using [ma.py](https://bitbucket.org/yoavram/masim/src/tip/ma.py).\n",
"## Setup environment\n",
"Change to the default working directory and import *plot* from *matplotlib* explicitly, so that the notebook still works when it is saved as a *py* file."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"os.chdir(\"d:\\\\workspace\\\\MaSim\")\n",
"from matplotlib.pyplot import plot"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a MongoDB interface\n",
"The host is *britanya409-5.tau.ac.il*. From the PC I use *localhost* and a PuTTY tunnel.\n",
"The db is *ma* and the collection is *results*. The code uses the [pymongo](http://api.mongodb.org/python/current/) package to interface with MongoDB. The current version is 2.3."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pymongo\n",
"print \"pymongo version\", pymongo.version"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pymongo version 2.3\n"
]
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def ma_collection():\n",
"    con = pymongo.Connection(\"localhost\")\n",
"    col = con.ma.results\n",
"    return col"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"col = ma_collection()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 23
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a list of all available specs.\n",
"Specs are the parameter sets. \n",
"Currently most of the parameters are constants, and only *s*, *U*, and *$\\pi$* are variables."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"specs = []\n",
"for s in col.distinct('s'):\n",
"    for U in col.find({'s':s}).distinct('U'):\n",
"        for pi in col.find({'s':s,'U':U}).distinct('pi'):\n",
"            specs.append( {'s':s,'U':U,'pi':pi} )"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get a cursor over the data \n",
"Get only the required data: the parameters and *w*, which is the _mean fitness after the bottleneck_ (if *b==1*, it is the fitness of the single individual after the bottleneck)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"cur = col.find({},['w','tau','B','genes','epistasis','s','b','U','pop','pi'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Look at a sample record\n",
"*d* is some record. We show the parameter values and the plot of the mean fitness. Note that *w* and *_id* do not count as parameters because the first is the fitness time series and the second is the Mongo ID of the record."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"d = cur.next()\n",
"params = [(k,v) for k,v in d.items() if (k!='_id' and k!='w')]\n",
"for k,v in params: print k,'=',v"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"tau = 1\n",
"B = 300\n",
"genes = 100\n",
"pop = 100000000.0\n",
"epistasis = 1.0\n",
"s = 0.01\n",
"b = 1\n",
"U = 0.003\n",
"pi = 200\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"w = d['w']\n",
"plot(w);"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAD9CAYAAABUS3cAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHT5JREFUeJzt3XtwVGWexvFvczfABEkjIEIAzaQ7EJJWSMhuTHpwTFIi\nt8ERUrWWImN1uUpQArOlU7vi7rpo6Wgw4hhrZJcZhamddRViqZmwbhNEE6IGRBLG4RIYissGGNpg\nwhjM2T9aWgK50pfTl+dTlRq6z+331nGenLznPe+xGIZhICIiMaWf2QWIiEjoKfxFRGKQwl9EJAYp\n/EVEYpDCX0QkBin8RURiULfhf//99zN69GhSU1O7XOexxx5j8uTJ3HLLLezbt8/3fVVVFXa7naSk\nJEpLSwNXsYiI+K3b8F+yZAnvv/9+l8t37tzJ9u3b+eSTT1i5ciUrV670LVu+fDllZWVs3bqVdevW\ncerUqcBVLSIifuk2/G+99VauvfbaLpfX1NRw1113MXLkSAoLC2loaADA4/EAkJOTQ2JiInl5edTU\n1ASwbBER8Ydfff47d+4kJSXF93nUqFEcOHCA2tpabDab7/uUlBSqq6v9OZSIiATQAH82NgyDy2eH\nsFgsfdpHX9cXEREvf2bn8evKPzMzk/r6et/npqYmJk+ezPTp0zvc/N27dy8zZ87scj8Xf4lc+vP0\n0warVl35faT9PPHEE6bXoPapfbHWtlhon7/8Dv8333yT06dPs3HjRux2OwAjRowAvCN+Ghsbqays\nJDMzs0/7TkiA06f9qU5ERLrSbbdPYWEh27Zt49SpU4wfP54nn3yStrY2AFwuFxkZGWRnZzN9+nRG\njhzJ66+/7tu2pKQEl8tFW1sbRUVFWK3WPhVmtYIGCImIBEe34b9p06Yed/D000/z9NNPX/F9bm6u\nb/TP1YiWK3+n02l2CUGl9kWuaG4bRH/7/GUxAtF55E8BFkun/Vf19bBwIfjx+0NEJGp1lZ29FbbT\nOyQkqNtHRCRYwvbKv60N4uLgr3+FfmH7K0pExBxRe+U/cCAMHQrfPSwsIiIBFLbhD+r6EREJlrAP\n/2gY8SMiEm7COvytVoW/iEgwhHX4q9tHRCQ4/JrYLdgSEqCxEU6c6N26AwcGvSQRkagQtkM9Adav\nh8cf73kfra2wdCk8/3yAixMRCVP+DvUM6/Dvrffe8wZ/ZWWAihIRCXNRO86/L5KT4Y9/NLsKEZHI\nERVX/t9+C8OGeW8ODx0aoMJERMKYrvyB/v3hxhvhT38yuxIRkcgQFeEPYLOp60dEpLeiJvyTk+GS\nN0eKiEg3wnqcf1+kpMAjj8CWLV2v89hjcNddoatJRCRcRcUNX/BOAb1nD3S1q82b4dgx+PWv/T6U\niIjp/M3OqLnyHzgQbr656+XnzsEvfhG6ekREwlnU9Pn3RPcERES+FzPhP3o0XLigieJERCCGwt9i\n0XBQEZGLYib8QdNAiIhcFDU3fHvDZoNXXoFPPw38vm+4wTuUVEQkEkTNUM/eOHYM3nor8Ps1DFi5\nEs6ehSFDAr9/EZHLaUrnMGG3w+9/D1Onml2JiMQCTewWJmw2DSUVkcih8A8Qhb+IRJIew7+qqgq7\n3U5SUhKlpaVXLG9ubqa4uJj09HSysrI4cOCAb9nEiROZNm0aDoeDjIyMwFYeZhT+IhJJegz/5cuX\nU1ZWxtatW1m3bh2nLntKatOmTbS1tbFr1y6ef/55fv7zn/uWWSwW3G43dXV17Ny5M/DVhxGFv4hE\nkm6Heno8HgBycnIAyMvLo6amhtmzZ/vW+eCDD1iyZAkAWVlZ7N+/v8M+ouFmbm8kJ0NDA5SUfP/d\nT38K48aZV5OISFe6vfKvra3FZrP5PqekpFBdXd1hnfz8fDZt2kRraytbtmxhz549HDp0CPBe+c+a\nNYv58+ezpbu5lqPAiBHeieMaG70/Gzd6f0REwpHfD3
ktWrSIo0ePkpubS3JyMklJSQwePBiAHTt2\nMHbsWBoaGpgzZw4ZGRmMGTPmin2sXr3a92+n04nT6fS3LFM8/vj3/371Vbjs96SIyFVzu9243e6A\n7a/bcf4ejwen00ldXR0Ay5Yto6CgoEO3z6XOnTtHdnY2u3btumLZihUrsNvtPPDAAx0LiJJx/pfb\nvh3+4R/go4/MrkREolFQx/nHx8cD3hE/jY2NVFZWkpmZ2WEdj8fDN998Q0tLC2vWrOH2228HoKWl\nhebmZgCampqoqKigoKDgqguNNBdvAEfh7zURiQI9dvuUlJTgcrloa2ujqKgIq9VKWVkZAC6Xi/r6\neu677z7a29vJysrilVdeAeDkyZMsWLAAgISEBIqLixk/fnwQmxJeRo2Cfv2gqQmuu87sakREOtL0\nDkGUnQ1PPQW5uWZXIiLRRq9xDGM2G7zxBvz5z8E7xrBhMH9+8PYvItFJV/5B9L//C6+9FtxjvPkm\nHDni7WYSkdihWT1jnM3mnababje7EhEJJc3qGeOsVu9NZRGRvlD4R7hRo/RSehHpO4V/hNOVv4hc\nDYV/hNOVv4hcDYV/hNOVv4hcDYV/hNOVv4hcDYV/hNOVv4hcDYV/hNOVv4hcDYV/hNOVv4hcDYV/\nhNOVv4hcDYV/hIuL874z4Ouvza5ERCKJZvWMcBaL9+r/hRe87xHuq+nTYebMwNclIuFNE7tFgZdf\nhvr6vm934oS3yyiArwUVkRDRrJ5y1Y4dA4cDTp40uxIR6SvN6ilXbexYOH9eN4xFYpHCP4ZZLJCS\nAg0NZlciIqGm8I9xdvvV3S8Qkcim8I9xuvIXiU0a6hnjUlLgV7/ydgF1JysL7r47NDWJSPBptE+M\n+/prWL8eLlzoep2TJ+G992D37tDVJSLd01BPCbqWFu8cQl99BQP0t6JIWNBQTwm6uDgYMwYOHTK7\nEhEJFIW/9EpKikYFiUQThb/0isJfJLoo/KVXFP4i0aXH8K+qqsJut5OUlERpaekVy5ubmykuLiY9\nPZ2srCwOHDjQ620lckyZAlu2wK23fv+TkwN79phdmYhcjR5H+zgcDtauXUtiYiL5+fl8+OGHWK1W\n3/JXX32VL774ghdffJGPP/6Y5557jjfffLNX24JG+0QKw4CaGmhr+/67F1+E7GxYvty8ukRilb/Z\n2e3APY/HA0BOTg4AeXl51NTUMHv2bN86H3zwAUuWLAEgKyuL/fv393pbiRwWy5Xz/n/+ufdHRCJP\nt90+tbW12Gw23+eUlBSqq6s7rJOfn8+mTZtobW1ly5Yt7Nmzh0OHDvVqW4lsug8gErn8fmRn0aJF\nHD16lNzcXJKTk0lKSmLw4MF92sfq1at9/3Y6nTidTn/LkhCYMgX27vV2CfU0PYSI+MftduMO4JuX\nuu3z93g8OJ1O6urqAFi2bBkFBQVddt2cO3eO7Oxsdu3axdmzZ/nRj37U47bq849chuF9heQXX3gf\nAhOR0AnqE77x8fGAd9ROY2MjlZWVZGZmdljH4/HwzTff0NLSwpo1a7j99tsBGPHdC2W721Yim8Xi\nvfrftcv7i0BEIkeP3T4lJSW4XC7a2tooKirCarVSVlYGgMvlor6+nvvuu4/29naysrJ45ZVXut1W\nosvf/i3ceSc8/jj88z+bXY2I9JYmdhO/bdkCr7wC775rdiUisUMTu4npLt74FZHIoSt/8Vt7Owwf\nDidOeP9XRIJPV/5iun79wGbTmH+RSKLwl4CYMkXhLxJJ9F4mCYgpU+C//9v7V0BXfvxjGDcudDWJ\nSNcU/hIQc+ZAQwN88EHnyxsavD9PPx3aukSkc7rhKyHx1lveF8WXl5tdiUh00A1fiQgaDioSXnTl\nLyHx7bfeYaD/938wbJjZ1YhEPl35S0To3x9++ENvv7+ImE/hLyEzdaq6fkTChcJfQiY1Ff7xH+Gp\np8yuRETU5y8h8/
XXsHkz/Nu/ed8BICJXz9/sVPhLSJ0/D9deCx4PDBpkdjUikUs3fCWiDBkCiYnw\n5ZdmVyIS2xT+EnJTp6rbR8RsCn8JOYW/iPk0t4+E3NSp8NJLsG2b2ZV43XKLHjyT2KMbvhJyx47B\nPffAhQtmVwKNjVBUBMXFZlci0jf+Zqeu/CXkrr8e/ud/zK7C69e/hg8/NLsKkdBTn7/EtNRU2LPH\n7CpEQk/dPhLTmpth9Gjv//bvb3Y1Ir2ncf4ifhg+3Bv+Bw+aXYlIaCn8JeZp6KnEInX7SMz7xS/g\n9ddhzJjvv0tIgHfe6f6dxCJm0tw+In5qbob6+o7fLVgAO3bApEnm1CTSEw31FPHT8OGQmdnxu7Q0\nb1eQwl+ilf6oFenE1KkaAirRrcfwr6qqwm63k5SURGlp6RXLW1tbuffee3E4HOTm5rJ582bfsokT\nJzJt2jQcDgcZGRmBrVwkiFJTdRNYoluP3T7Lly+nrKyMxMRE8vPzKSwsxGq1+pZv2LCBoUOHUldX\nx+HDh5k1axZz587FYrFgsVhwu92MHDkyqI0QCbTUVHj2WbOrEAmebsPf4/EAkJOTA0BeXh41NTXM\nnj3bt058fDzNzc20tbVx5swZ4uLisFgsvuW6mSuRyG6H/fvhtdfgkv+cr/CDH8Bdd4WuLpFA6Tb8\na2trsdlsvs8pKSlUV1d3CP/CwkLKy8uxWq1cuHCBjz76yLfMYrEwa9YsJk2axP3338/cuXOD0ASR\nwBsyBB57zDvipzu//z1kZcG4caGpSyRQ/B7t89JLLzFgwACOHz/Onj17uPPOOzly5AgWi4UdO3Yw\nduxYGhoamDNnDhkZGYy5dDD1d1avXu37t9PpxOl0+luWiN/+6Z96XufwYe+NYYW/BJvb7cbtdgds\nf92O8/d4PDidTurq6gBYtmwZBQUFHa787777bpYuXUp+fj4AmZmZbNiwocNfDAArVqzAbrfzwAMP\ndCxA4/wlgj36qHeW0lWrzK5EYk1Q5/aJj48HvCN+GhsbqaysJPOyAdG33XYb5eXltLe3c/DgQc6c\nOYPNZqOlpYXm5mYAmpqaqKiooKCg4KoLFQlHmhVUIlWP3T4lJSW4XC7a2tooKirCarVSVlYGgMvl\nYvHixdTX1zN9+nRGjRrF2rVrAThx4gQ/+clPAEhISKC4uJjx48cHsSkioZeaCi++aHYVIn2n6R1E\n/NDSAlYreDwwcKDZ1Ugs0ZTOIiaKi4MbboA//cnsSkT6RuEv4qfUVPj8c7OrEOkbhb+In6ZN001f\niTwKfxE/acSPRCKFv4if1O0jkUijfUT89O23EB8Px4555/oRCQWN9hExWf/+3vn/b73VO8/Pf/2X\n2RWJ9ExX/iIB8Oc/w9Gj8P77cPAg/Pa3Zlck0U6vcRQJA+PHe38GDYIlS8yuRqRnuvIXCaDWVhg5\n0vvE76BBZlcj0Ux9/iJh5JprYOJE2LfP7EpEuqfwFwmwadM09FPCn/r8RQIsLQ1eegl27rxymcUC\nxcUwYULo6xK5lMJfJMDuvReGDet82ZtvwjvvwN//fWhrErmcwl8kwMaNg6Kizpf16we7d4e2HpHO\nqM9fJITS0nQ/QMKDhnqKhNDZs97nATwe718BIldLQz1FIsiIEZCQ4H0KWMRMuvIXCbG5c71v/0pJ\n+f678eNh3jzzapLI4292KvxFQqyyEjZv/v6zYcCGDd6uoP79zatLIovCXyQKTJ4M770HyclmVyKR\nQn3+IlEgLU1DQCW0FP4iYUDhL6Gm8BcJAwp/CTU94SsSBtLS4JNP4D//s+t1Jk6EjIyQlSRRTjd8\nRcJAezs8+CD85S+dL//rX+Gzz7xvDBMBjfYRiQnt7d6XxB8+7H1ZjIhG+4jEgH79dF9AAqvH8K+q\nqsJut5OUlERpaekVy1tbW7n33ntxOBzk5uay+ZKnV3raVkR6T+EvgdRj+C9fvpyy
sjK2bt3KunXr\nOHXqVIflGzZsYOjQodTV1fGb3/yGFStW+P4U6WlbEem99HTYtcvsKiRadDvax+PxAJCTkwNAXl4e\nNTU1zJ4927dOfHw8zc3NtLW1cebMGeLi4rBYLL3aVkR6Ly0Nnn0W3n47tMe1WiE7O7THlODrNvxr\na2ux2Wy+zykpKVRXV3cI8MLCQsrLy7FarVy4cIGPP/6419uKSO9NmwbTp8N//Edoj1tR4R2FNGRI\naI8rweX3OP+XXnqJAQMGcPz4cfbs2cPs2bM5fPhwn/axevVq37+dTidOp9PfskSizpAhsHFj6I87\nbRrU18PNN4f+2PI9t9uN2+0O2P66Df8ZM2awatUq3+e9e/dSUFDQYZ2qqiqWLl1KXFwcmZmZXH/9\n9Xz55Ze92vaiS8NfRMLLxXsNCn9zXX5h/OSTT/q1v25v+MbHxwPegG9sbKSyspLMzMwO69x2222U\nl5fT3t7OwYMHOXPmDDabrVfbikj4043m6NRjt09JSQkul4u2tjaKioqwWq2UlZUB4HK5WLx4MfX1\n9UyfPp1Ro0axdu3abrcVkciSnh76m8wSfHrCV0S6dfo0TJoE313zhYTVCrffHrrjRSJN7yAiQVdc\nDMeOhe54b78NTU0wbFjojhlpFP4iEnWmT4fSUsjKMruS8KW5fUQk6jgcUFdndhXRTeEvImFHI4yC\nT+EvImFHV/7Bpz5/EQk7zc0wejT867+G/tj9+8PPfgZDh4b+2H3hb3bqNY4iEnaGD4d/+Rc4ejT0\nx37nHZg8GebMCf2xQ0nhLyJhqbjYnOMOGuTtcor28Fefv4jIJWLlfoPCX0TkEgp/EZEYdNNN3ikt\nzpwxu5LgUviLiFyiXz/vW9N+9jP49FOzqwkehb+IyGVeeAEGD4bf/tbsSoJH4S8icpkZM7xX/p99\nZnYlwaOHvEREOnHmjHcq67/8xdsVFG40sZuISBCMHAkjRsDBg2ZXEhwKfxGRLtx8c/R2/ajbR0Sk\nC089BevWwXXXhe6YTzwBCxb0vJ5e5iIiEiStrfDHP4bueJs3Q2Mj/Pu/97yuJnYTEQmSa67xvlsg\nVC5cgKVLQ3MsXfmLiISJ8+e9N5rPnIEhQ7pfV6N9RESixJAh8MMfwp49wT+Wwl9EJIzcfDNs3w5N\nTd57DsGi8BcRCSM//jGsWQPJyfA3fxO846jPX0QkDH3zjfchs1OnIC7uyuXq8xcRiUKDBkFKCuze\nHZz9K/xFRMJUMJ8wVviLiISpW24J3jsF1OcvIhKmamth0SJYtcr7uV8/KCyEH/wgBH3+VVVV2O12\nkpKSKC0tvWL5c889h8PhwOFwkJqayoABAzh79iwAEydOZNq0aTgcDjIyMq66SBGRWJSWBvPnw+ef\ne39KSmDLlsDsu8crf4fDwdq1a0lMTCQ/P58PP/wQq9Xa6brvvPMOJSUlbN26FYBJkybx6aefMnLk\nyK4L0JW/iEivPPMMnDjhfdNYUK/8PR4PADk5OSQmJpKXl0dNTU2X62/cuJHCwsIO3ynYRUQCI5D3\nALoN/9raWmw2m+9zSkoK1dXVna7b0tJCRUUFCxcu9H1nsViYNWsW8+fPZ0ug/lYREYlRN98Mu3ZB\ne7v/+wrYrJ7l5eVkZ2czYsQI33c7duxg7NixNDQ0MGfOHDIyMhgzZswV265evdr3b6fTidPpDFRZ\nIiJRwe1243a76d8fior831+3ff4ejwen00ldXR0Ay5Yto6CggNmzZ1+x7oIFC1i0aBGLFy/udF8r\nVqzAbrfzwAMPdCxAff4iIr320596bwL/3d8F+WUuF2/4TpgwgYKCgk5v+Ho8HiZPnszRo0e55ppr\nAG830Lfffsvw4cNpamrC6XTy/vvvM378+I4FKPxFRHrt4EHvtA8JCUF+mUtJSQkul4u2tjaKioqw\nWq2UlZUB4HK5AHj77bfJz8/3BT/AyZMnWfDd
u8gSEhIoLi6+IvhFRKRvJk8OzH70kJeISATSxG4i\nItJnCn8RkRik8BcRiUEKfxGRGKTwFxGJQQp/EZEYpPAXEYlBCn8RkRik8BcRiUEKfxGRGKTwFxGJ\nQQp/EZEYpPAXEYlBCn8RkRik8BcRiUEKfxGRGKTwFxGJQQp/EZEYpPAXEYlBCn8RkRik8BcRiUEK\nfxGRGKTwFxGJQQp/EZEYpPAXEYlBCn8RkRik8BcRiUEKfxGRGNRj+FdVVWG320lKSqK0tPSK5c89\n9xwOhwOHw0FqaioDBgzg7Nmzvdo2FrjdbrNLCCq1L3JFc9sg+tvnrx7Df/ny5ZSVlbF161bWrVvH\nqVOnOixfuXIldXV11NXVsWbNGpxOJyNGjOjVtrEg2v8DVPsiVzS3DaK/ff7qNvw9Hg8AOTk5JCYm\nkpeXR01NTZfrb9y4kcLCwqvaVkREQqfb8K+trcVms/k+p6SkUF1d3em6LS0tVFRUsHDhwj5vKyIi\noTUgUDsqLy8nOzvb1+XTFxaLJVBlhKUnn3zS7BKCSu2LXNHcNoj+9vmj2/CfMWMGq1at8n3eu3cv\nBQUFna77u9/9ztfl05dtDcPoc9EiIuKfbrt94uPjAe+oncbGRiorK8nMzLxiPY/HQ1VVFfPmzevz\ntiIiEno9dvuUlJTgcrloa2ujqKgIq9VKWVkZAC6XC4C3336b/Px8rrnmmh63FRGRMGCYZNu2bYbN\nZjNuuukm48UXXzSrjIBKTEw0UlNTjfT0dGPGjBmGYRjGV199ZcydO9cYP368MW/ePKO5udnkKntv\nyZIlxnXXXWdMnTrV91137Vm7dq1x0003GXa73di+fbsZJfdaZ2174oknjHHjxhnp6elGenq68e67\n7/qWRVLbDMMwjhw5YjidTiMlJcXIzc013njjDcMwouf8ddW+aDmHra2tRkZGhpGWlmZkZmYazz//\nvGEYgT1/poV/enq6sW3bNqOxsdFITk42mpqazColYCZOnGicPn26w3fPPPOM8fDDDxvnz583Hnro\nIePZZ581qbq+q6qqMj777LMOAdlVe06ePGkkJycbhw8fNtxut+FwOMwqu1c6a9vq1auNX/7yl1es\nG2ltMwzDOH78uFFXV2cYhmE0NTUZkyZNMr766quoOX9dtS+azuHXX39tGIZhnD9/3pgyZYrx5Zdf\nBvT8mTK9QzQ/A2BcdgN7586dLF26lMGDB3P//fdHVDtvvfVWrr322g7fddWempoaCgoKmDBhArm5\nuRiGQXNzsxll90pnbYPOByBEWtsAxowZQ3p6OgBWq5UpU6ZQW1sbNeevq/ZB9JzDuLg4AM6dO8eF\nCxcYPHhwQM+fKeEfrc8AWCwWZs2axfz589myZQvQsa02m42dO3eaWaLfumpPTU0Ndrvdt15ycnJE\ntrW0tJSZM2fyzDPP+P7Ps3Pnzohu2/79+9m7dy8ZGRlRef4utu/igJJoOYft7e2kpaUxevRoHn74\nYSZMmBDQ86eJ3QJox44d7N69mzVr1rBixQpOnDgRdUNZ+9KeSHt+48EHH+TQoUNUVFRw4MAB38CG\nztocKW1rbm5m0aJFvPDCCwwbNizqzt+l7Rs6dGhUncN+/fqxe/du9u/fz8svv0xdXV1Az58p4T9j\nxgz27dvn+7x3715mzpxpRikBNXbsWADsdjtz586lvLycGTNm0NDQAEBDQwMzZswws0S/ddWezMxM\n6uvrfevt27cv4tp63XXXYbFYiI+P56GHHuKtt94CIrdtbW1tLFy4kHvuucc3DDuazl9n7Yu2cwgw\nceJE7rjjDmpqagJ6/kwJ/2h8BqClpcX3J2ZTUxMVFRUUFBSQmZnJ+vXraW1tZf369RH/S66r9mRk\nZFBRUcGRI0dwu93069eP4cOHm1xt3xw/fhyACxcusHHjRu644w4gMttmGAZLly5l6tSpPPLII77v\no+X8ddW+
aDmHp06d8s2OfPr0af7whz8wb968wJ6/gN6e7gO3223YbDbjxhtvNNauXWtWGQFz8OBB\nIy0tzUhLSzNmzZplvPbaa4ZhRPZQz8WLFxtjx441Bg0aZNxwww3G+vXru21PSUmJceONNxp2u92o\nqqoysfKeXWzbwIEDjRtuuMF47bXXjHvuucdITU01brnlFuPRRx/tMHIrktpmGIaxfft2w2KxGGlp\nab5hj++9917UnL/O2vfuu+9GzTn8/PPPDYfDYUybNs3Iy8szNmzYYBhG93nS1/ZZDCPKOqVFRKRH\nuuErIhKDFP4iIjFI4S8iEoMU/iIiMUjhLyISgxT+IiIx6P8B2Noc8gqcm7UAAAAASUVORK5CYII=\n"
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convert a single record to a data frame\n",
"First we convert a *dict* with a *list* value in key *w* to a *list* of *dict*s, each with a single value for key *w*."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"time_series = [None]*len(w)\n",
"for t in range(len(w)):\n",
"    time_point = {'w':w[t]}\n",
"    time_point.update(params)\n",
"    time_series[t] = time_point"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, convert this *list* to a data frame using the [pandas](http://pandas.pydata.org/) package (current version 0.9.0). The relevant documentation for this usage is [here](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#from-a-list-of-dicts)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"print \"pandas version\", pd.__version__"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pandas version 0.9.0\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = pd.DataFrame(time_series)\n",
"df.to_csv(\"test.csv\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convert the entire record set to a data frame\n",
"Start by writing a function that does what we did before, converting a record from MongoDB to a *list* of *dict*s, and that also adds the time point *t* (starting at 1) to each *dict*."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def record_to_data_list_of_dicts(record):\n",
"    # use this record's own time series, not the global w of the sample record\n",
"    w = record['w']\n",
"    params = [(k,v) for k,v in record.items() if (k!='_id' and k!='w')]\n",
"    time_series = [None]*len(w)\n",
"    for t in range(len(w)):\n",
"        time_point = {'w':w[t],'t':t+1}\n",
"        time_point.update(params)\n",
"        time_series[t] = time_point\n",
"    return time_series"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, here is a function that takes a cursor and converts all its records to one big *list* of *dict*s (despite its name, it returns a plain *list*; the *DataFrame* is built from it later). Use a *limit* of 10 when testing."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def cursor_to_data_frame(cur):\n",
"    time_series = []\n",
"    for record in cur:\n",
"        time_series.extend( record_to_data_list_of_dicts(record) )\n",
"    return time_series"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This function creates a cursor over at most *limit* records (10 for testing), converts the records to a *DataFrame*, and saves it as a *csv* file.\n",
"\n",
"The *csv* file can be opened in *R* with the following:\n",
"<pre class=\"prettyprint lang-r\">\n",
"data<-read.csv('test.csv')\n",
"dim(data) # should be 3000 12\n",
"</pre>"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def mongo_to_csv(csv_fname, limit):\n",
"    col = ma_collection()\n",
"    cur = col.find({},['w','tau','B','genes','epistasis','s','b','U','pop','pi'], limit=limit)\n",
"    time_series = cursor_to_data_frame(cur)\n",
"    cur.close()\n",
"    df = pd.DataFrame(time_series)\n",
"    df.to_csv(csv_fname)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"mongo_to_csv(\"test.csv\", 10)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 25
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's it: all the functions are in place, and the call above is a test case. To run on all the data, use:\n",
"\n",
"<pre class=\"prettyprint\">\n",
"mongo_to_csv(\"ma_output.csv\", 0)\n",
"</pre>\n",
"\n",
"Note: *limit=0* gets all the records."
]
}
],
"metadata": {}
}
]
}
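
The record-to-rows step in the notebook above (one row per time point of *w*, with the parameters repeated on every row) can also be sketched without MongoDB or pandas. This is a minimal Python 3 sketch under stated assumptions, not the notebook's own code: `record_to_rows`, `records_to_csv`, and `fake_records` are hypothetical names, the record values are made up, and the standard-library `csv` module stands in for `DataFrame.to_csv` so that rows are streamed instead of collected in one big list first.

```python
import csv
import io


def record_to_rows(record):
    """Yield one dict per time point of record['w'], repeating the parameters."""
    # Everything except the Mongo ID and the time series counts as a parameter.
    params = {k: v for k, v in record.items() if k not in ('_id', 'w')}
    for t, w_t in enumerate(record['w'], start=1):
        row = {'t': t, 'w': w_t}
        row.update(params)
        yield row


def records_to_csv(records, out, fieldnames):
    """Stream the rows of every record into a CSV file-like object."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    n_rows = 0
    for record in records:
        for row in record_to_rows(record):
            writer.writerow(row)
            n_rows += 1
    return n_rows


# Hypothetical stand-ins for MongoDB documents (values are made up).
fake_records = [
    {'_id': 1, 'w': [1.0, 0.99, 0.98], 's': 0.01, 'U': 0.003, 'pi': 200},
    {'_id': 2, 'w': [1.0, 0.97], 's': 0.1, 'U': 0.003, 'pi': 0},
]

buf = io.StringIO()
n = records_to_csv(fake_records, buf, fieldnames=['t', 'w', 's', 'U', 'pi'])
csv_text = buf.getvalue()
```

Any iterable of dicts can be passed as `records`, so a pymongo cursor should work in place of `fake_records`. Because rows are written as they are produced, memory use stays roughly constant even when *limit=0* returns every record.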