olgabot/data_cleaning.ipynb

## data_cleaning.ipynb
{
 "metadata": {
  "name": "",
  "signature": "sha256:cee0de1ec12d856434dfc5e5496bd627b1714cee7e3862b892477cb08786855f"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "BIOM262 January 29th, 2015: Cleaning data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Olga Botvinnik\n",
      "* 3rd year Bioinformatics PhD Student\n",
      "* Email: obotvinn@ucsd.edu\n",
      "* Twitter: @olgabot\n",
      "* Website: http://www.olgabotvinnik.com\n",
      "\n",
      "Instructions:\n",
      "1. Download this IPython notebook from NBViewer (upper right corner)\n",
      "2. Download the `biom262_2015_01_29_cleaning_data.zip` file emailed out\n",
      "3. Unzip the `zip` file (this will create a folder)\n",
      "4. Open up the terminal and navigate to the folder `biom262_2015_01_29_cleaning_data`\n",
      "5. Start an IPython notebook server by typing `ipython notebook` into the terminal\n",
      "\n",
      "For this activity, work in pairs. I will give you a blue and a pink sticky note. Put the pink one on one of your laptops to show that you're stuck or have a question. Put the blue one on your laptop if you and your partner are cruisin' through the exercises and don't want to be bothered."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Background"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Whenever you work with any kind of data, such as your own Excel spreadsheets or data downloaded from a paper that you'd like to analyze, 99.9999999999% of the time it's not formatted well, and you have to do a bunch of manual cleaning. This is a real-world example of four different metadata files describing post-mortem RNA-sequencing data from 212 post-mortem subjects and 2 cell lines, from 32 different tissues, for a total of 4500 samples. In other words, this is **not** a dataset you would enjoy cleaning up in Excel! You can read more about this project at the [GTEx portal](http://www.gtexportal.org/home/documentationPage).\n",
      "\n",
      "The goal of this exercise is to create a single, easily human-readable table from two `.txt` files, and two Excel files. Because this is such a big project, they had to standardize everything and their tables are at least consistently formatted. But there's a lot of jargon in the data that's specific to this session, which we will replace with human-readable terms. And in 2 main tables, `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt` and `GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt`, the sample identifiers aren't exactly the same, so we'll have to do extra work to merge them.\n",
      "\n",
      "One package that doesn't come with the Anaconda Python distribution is `seaborn`, so to install it, run this command.\n",
      " \n",
      "### IPython Tips:\n",
      "* When it says \"run this command,\" it means to press \"Shift\" and \"Enter\" together. If there's a cell with some code in it, that implies to run the cell and look at the output.\n",
      "* If you want help on any Python function or object, you can type \"list??\" and it will pop up help at the bottom of the screen.\n",
      "* IF you want help on any Python function, like `pd.read_table()`, i.e. the ones that have parentheses, you can get it by moving your cursor in between the parentheses, and pressing \"Shift\" and \"Tab\". This will pop up a help window next to the parentheses. Press \"Tab\" once more, and the window will get bigger. Press \"Tab\" a third time, and it will pop up a big help screen at the bottom."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "! pip install seaborn"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's import everything we need for this session."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Pandas or \"Panel Data Analysis\" toolkit for Data Frames/Data Tables\n",
      "import pandas as pd\n",
      "\n",
      "# Numpy or \"Numerical Python\"\n",
      "import numpy as np\n",
      "\n",
      "# Powerful R-style/statistical plotting\n",
      "import seaborn as sns\n",
      "\n",
      "# These styles are my personal preferences\n",
      "# For more options, see this page: http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html\n",
      "sns.set(style='whitegrid', context='notebook')\n",
      "\n",
      "# Show the figures directly in the IPython notebok\n",
      "%matplotlib inline"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Initial inspection of files with Unix commands"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Note: all these commands are meant to be run ***within*** the notebook, not on the terminal**\n",
      "\n",
      "The unix command `cd` moves where the notebook looks for data, and `ls` will also list the files. These are some key Unix commands that can be used without the exclamation point \"`!`\", as we will use later.\n",
      "\n",
      "Change directories to where you downloaded the example data. For me, that's the directory \"`~/Downloads/biom262_2015_01_29_cleaning_data`\". Remember that the character \"`~`\" (pronounced \"tilde\") indicates your home directory, which for me is `/Users/olga`, but I didn't feel like typing that out, so I used the tilde instead."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cd ~/Downloads/biom262_2015_01_29_cleaning_data"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And make sure you have all the right files around"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ls"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's start with the `SubjectPhenotypes` file, `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt`. The first step is to look at the first 10 lines of the file with the Unix command `head`. In the IPython notebook, you can call unix/bash commands by starting the line with an exclamation point, `!`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "! head GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So there's some generic sample, a gender of `1` or `2`, a range of years, and some kind of `DTTHDY` thing. I don't know what that means, but we'll  deal with that once we open the file in `pandas`. It looks like it ranges from `0` to `4`, but I don't see any entries of `1`. By default, `head` outputs the first 10 lines. We can modify the number of lines with the flag `-n` and then provide a number. For example, let's look at the first 23 lines of the file instead."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "! head -n 23 GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Still no entries with a `DTHHRDY` of 1. Maybe we need to look at the end of the file. We can do that with the command `tail`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "! tail GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Ooh, our first `1` in `DTHHRDY`! Hmm, some of the rows don't have a value for the `DTHHRDY` column! This will come into play in the future."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Look at the last 17 lines of the file."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's stop for a brief exercise. How would you look at the last 17 lines of the file?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Exercise: modify the `tail` command below to look at the last 17 lines of the file.\n",
      "\n",
      "! tail GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Another question we may have about this file is how many lines are in it. We can do this with the Unix command `wc`, or \"word, line, character and byte counter.\" Specifically `wc -l` will count the number of lines in the file."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "! wc -l GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: How do you count the number of columns in a file?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Exercise: do a web search for \"unix count number of columns\" and \n",
      "# check the command on the file, GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt\n",
      "\n",
      "! # Column-counting code goes here"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Reading tabular data with the `pandas` library in Python"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We will make heavy use of the `pandas` library, which is a godsend to Pythonista Data Scientists. It makes working with weirdly formatted data much easier, as you will soon see."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "subject_phenotypes = pd.read_table('GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This has created a `pandas` `DataFrame` variable called `subject_phenotypes`. Python's standards are to name variables `lowercase_with_underscores`, so we'll stick to that :)\n",
      "\n",
      "Let's look at the top of `subject_phenotypes`, again with a command called `head`, but we call it a little differently now that we're in Python and not Unix (notice no \"`!`\" at the beginnings of the lines anymore)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "subject_phenotypes.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `pandas` head shows the first 5 rows, instead the first 10 rows like Unix `head`."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can also access individual columns in two ways. The first way is by using square brackets around the string of the column name, like this for `SUBJID`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Run this cell. Try entering 'subjid' and 'subject id' as well.\n",
      "subject_phenotypes['SUBJID']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If a column name consists of only letters, numbers, and underscores, and starts with a letter, you can also access it with the column name, no quotes, after the dataframe name and a dot."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "subject_phenotypes.SUBJID"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You may hear the word \"series\" get tossed around. A \"series\" is the `pandas`-specific technical name for a column of a dataframe."
     ]
    },
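    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the two access styles and the word \"series\" concrete, here is a minimal sketch on a tiny made-up dataframe. The variable `toy` and its column names `subject` and `tissue` are invented for illustration and are not part of the GTEx files."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "\n",
      "# Hypothetical toy data, not taken from the GTEx tables\n",
      "toy = pd.DataFrame({'subject': ['s1', 's2', 's3'],\n",
      "                    'tissue': ['Blood', 'Skin', 'Brain']})\n",
      "\n",
      "# Square brackets and dot notation return the same column\n",
      "by_brackets = toy['tissue']\n",
      "by_dot = toy.tissue\n",
      "\n",
      "# Each column is a pandas Series\n",
      "print(type(by_brackets))\n",
      "print(by_brackets.equals(by_dot))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },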
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: How would you look at the last 8 rows of the column `AGE`?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Hint: `head` and `tail` work for series as well as whole dataframes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Exercise: Show the last 8 rows.\n",
      "\n",
      "# Code for looking at the last 8 rows goes here."
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Converting data types from one to another"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's get to these `GENDER` and `DTHHRDY` columns. Open up the file, `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DD.xlsx` in Excel, and see what a gender of `1` and `2` means."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# A python \"dictionary\", or mapping from one thing to the next.\n",
      "# In this case, we're mapping the numbers 1 and 2 to strings\n",
      "# indicating the gender.\n",
      "gender = {1: 'fillmein',\n",
      "          2: 'fillmein'}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can access items of a dictionary with square brackets, much like the columns of a dataframe."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Run this cell. How do you access the other gender?\n",
      "gender[2]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, we can create a new column called `'gender'` (all lowercase, because I feel like ALL CAPS COLUMNS ARE YELLING AT ME), using the `gender` dictionary.\n",
      "\n",
      "In `pandas`, you can create a new column by pretending to access an existing column in the dataframe, and assigning it to some value. Here's an example of creating a new column called `\"don't worry\"` with the value `\"be yonce\"` in every cell."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "subject_phenotypes[\"don't worry\"] = \"be yonce\""
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Look at the top of the dataframe to see what that did."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to look at the top of the dataframe goes here \n",
      "# hint: remember the command \"head\"? How did we use it to look at the dataframe when we first loaded it?\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 199
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Add a column with a name and value of your choice"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to add a column with programmer's choice of name and value goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Another convenient operation is `map` on a series, which performs the operation specified on every element of a column. It's as if you wrote a `for`-loop to access every item of the `GENDER` column, and use that item to access the `gender` dictionary, and replace the value."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "subject_phenotypes.GENDER.map(gender)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
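    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If the `gender` dictionary still says `'fillmein'`, the result above won't be very informative yet. As a self-contained sketch of the same idea, here is `map` with a dictionary on a small made-up series. The names `toy_codes` and `toy_labels` and their values are assumptions for illustration only, not GTEx data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "\n",
      "# Hypothetical codes and labels, purely for illustration\n",
      "toy_codes = pd.Series([1, 2, 2, 1])\n",
      "toy_labels = {1: 'low', 2: 'high'}\n",
      "\n",
      "# `map` looks up each element of the series in the dictionary\n",
      "# and returns the looked-up values as a new series\n",
      "toy_codes.map(toy_labels)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },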
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Using `map`, it's as if we wrote the following `for`-loop, but `map` is less code, and more concise."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for g in subject_phenotypes.GENDER:\n",
      "    print gender[g]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Has this changed the dataframe `subject_phenotypes`?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to check if the dataframe subject_phenotypes has changed goes here\n",
      "# You can check if it has changed by looking at it using your favorite body endpoint\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The dataframe shouldn't have changed."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: What happens when you `map` the dictionary `gender` onto the column `DTHHRDY`?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to `map` `gender` onto DTHHRDY goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 197
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Combine bracket-based column creation and `map` on a series"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Your next exercise is to combine these previous two concepts of creating a column and using `map`, to create a column called `\"gender\"` (all lowercase) which is the result of using `map` with the dictionary `gender` on the `GENDER` column of `subject_phenotypes`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to create a new column called \"gender\" in `subject_phenotypes` that is the result of using `map` \n",
      "# with the `gender` dictionary on the \"GENDER\" column of `subject_phenotypes`.\n",
      "\n",
      "\n",
      "# Code to check if the dataframe has changed goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Great! Now we have a column called `gender`, that makes sense to a human without having to look something up in some other table."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Convert the `DTHHRDY` column values into human-readable values"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Using the `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DD.xlsx` spreadsheet again, find out what the different numbers in `DTHHRDY` mean, make a dictionary mapping numbers to words like we did with `gender`, and create a new column with a human-understandable name. \n",
      "\n",
      "* Test for human-understandable: you could show the column name to someone who doesn't know the data, and they understand it without you doing any extra explaining"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to create a new column in `subject_phenotypes` that is a human-readable version of the column `DTHHRDY` goes here\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now that we have our cleaned-up dataframe, let's do some plotting!\n",
      "\n",
      "Let's use `seaborn` (which we imported as the variable `sns` for brevity) to plot this. We will use the function [`factorplot`](http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.factorplot.html) which has a bunch of options, but for now we'll just focus on two. \n",
      "\n",
      "The first argument is the name of the column you want to plot, and then we provide the keyword argument `data=subject_phenotypes`, to specify the dataframe we want to get this column from."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sns.factorplot('gender', data=subject_phenotypes)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Nice, so we can now see the distribution of the number of subjects of these two genders.\n",
      "\n",
      "What if we also want to see how many people of the two genders, have different `DTHHRDY` categorizations?"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Plot the distribution of both gender and `DTHHRDY`"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Change `human_variable_DTHHRDY` to the new column you created."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Edit the argument `hue=...`\n",
      "sns.factorplot('gender', data=subject_phenotypes, hue=human_readable_DTHHRDY)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can break this down even more by also plotting the age of the subjects, and showing a separate plot, as below. The argument `col='AGE'` means to plot each age group onto a separate column of plots."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sns.factorplot('gender', data=subject_phenotypes, hue='DTHHRDY', col='AGE')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Plot Age as the main `x` variable, and `'gender'` as each column"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code goes below\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Combining information across dataframes"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So far, we've been working with one `pandas` dataframe, and an external `*.xlsx` file. Now we're going to work on combining `sample_phenotypes` with a new dataframe, which we will call `sample_attributes`."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Inspect the new data table with Unix, and read it in using Python"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to look at the top of GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to count the number of lines in GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to count the number of columns in GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to read the table GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n",
      "pd.read_table('GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt')\n",
      "\n",
      "# Code to look at the top of the `DataFrame` you just created\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>SAMPID</th>\n",
        "      <th>SMATSSCR</th>\n",
        "      <th>SMCENTER</th>\n",
        "      <th>SMPTHNTS</th>\n",
        "      <th>SMRIN</th>\n",
        "      <th>SMTS</th>\n",
        "      <th>SMTSD</th>\n",
        "      <th>SMTSISCH</th>\n",
        "      <th>SMNABTCH</th>\n",
        "      <th>SMNABTCHT</th>\n",
        "      <th>...</th>\n",
        "      <th>SME1ANTI</th>\n",
        "      <th>SMSPLTRD</th>\n",
        "      <th>SMBSMMRT</th>\n",
        "      <th>SME1SNSE</th>\n",
        "      <th>SME1PCTS</th>\n",
        "      <th>SMRRNART</th>\n",
        "      <th>SME1MPRT</th>\n",
        "      <th>SMNUM5CD</th>\n",
        "      <th>SMDPMPRT</th>\n",
        "      <th>SME2PCTS</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0   </th>\n",
        "      <td>      GTEX-N7MS-0007-SM-26GME</td>\n",
        "      <td>NaN</td>\n",
        "      <td>     C1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  8.2</td>\n",
        "      <td>       Blood</td>\n",
        "      <td>                               Whole Blood</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-16653</td>\n",
        "      <td>          RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1   </th>\n",
        "      <td>      GTEX-N7MS-0007-SM-26GMV</td>\n",
        "      <td>NaN</td>\n",
        "      <td>     C1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  8.2</td>\n",
        "      <td>       Blood</td>\n",
        "      <td>                               Whole Blood</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-16653</td>\n",
        "      <td>          RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2   </th>\n",
        "      <td>      GTEX-N7MS-0007-SM-2D43E</td>\n",
        "      <td>NaN</td>\n",
        "      <td>     C1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  8.2</td>\n",
        "      <td>       Blood</td>\n",
        "      <td>                               Whole Blood</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-16653</td>\n",
        "      <td>          RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3   </th>\n",
        "      <td>      GTEX-N7MS-0007-SM-2D7W1</td>\n",
        "      <td>NaN</td>\n",
        "      <td>     C1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  8.2</td>\n",
        "      <td>       Blood</td>\n",
        "      <td>                               Whole Blood</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-16653</td>\n",
        "      <td>          RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
        "      <td>...</td>\n",
        "      <td> 13705136</td>\n",
        "      <td> 18432744</td>\n",
        "      <td> 0.002456</td>\n",
        "      <td> 13447728</td>\n",
        "      <td> 49.526005</td>\n",
        "      <td> 0.041526</td>\n",
        "      <td> 0.835199</td>\n",
        "      <td> 840</td>\n",
        "      <td> 0.563503</td>\n",
        "      <td> 51.361324</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4   </th>\n",
        "      <td>      GTEX-N7MS-0008-SM-4E3JI</td>\n",
        "      <td>NaN</td>\n",
        "      <td>     C1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td> 10.0</td>\n",
        "      <td>        Skin</td>\n",
        "      <td>           Cells - Transformed fibroblasts</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-37581</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 17962165</td>\n",
        "      <td> 20910366</td>\n",
        "      <td> 0.004087</td>\n",
        "      <td> 18012435</td>\n",
        "      <td> 50.069874</td>\n",
        "      <td> 0.028395</td>\n",
        "      <td> 0.948329</td>\n",
        "      <td> 879</td>\n",
        "      <td> 0.226835</td>\n",
        "      <td> 50.270794</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5   </th>\n",
        "      <td>      GTEX-N7MS-0009-SM-2BWY4</td>\n",
        "      <td>NaN</td>\n",
        "      <td>     C1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  NaN</td>\n",
        "      <td>       Blood</td>\n",
        "      <td>                               Whole Blood</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-16657</td>\n",
        "      <td> DNA isolation_Whole Blood _QIAGEN Puregene (Ma...</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6   </th>\n",
        "      <td>      GTEX-N7MS-0009-SM-2XK1D</td>\n",
        "      <td>NaN</td>\n",
        "      <td>     C1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  NaN</td>\n",
        "      <td>       Blood</td>\n",
        "      <td>                               Whole Blood</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-16657</td>\n",
        "      <td> DNA isolation_Whole Blood _QIAGEN Puregene (Ma...</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7   </th>\n",
        "      <td> GTEX-N7MS-0011-R10A-SM-2HMJK</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.1</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>              Brain - Frontal Cortex (BA9)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-19253</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 18948398</td>\n",
        "      <td> 12221905</td>\n",
        "      <td> 0.004294</td>\n",
        "      <td> 18747238</td>\n",
        "      <td> 49.733180</td>\n",
        "      <td> 0.051237</td>\n",
        "      <td> 0.875680</td>\n",
        "      <td> 859</td>\n",
        "      <td> 0.330709</td>\n",
        "      <td> 50.619534</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>8   </th>\n",
        "      <td> GTEX-N7MS-0011-R10A-SM-2IZJW</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.1</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>              Brain - Frontal Cortex (BA9)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-19253</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9   </th>\n",
        "      <td> GTEX-N7MS-0011-R11A-SM-2HMJS</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.6</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>             Brain - Cerebellar Hemisphere</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-19253</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 19024292</td>\n",
        "      <td> 12200496</td>\n",
        "      <td> 0.003643</td>\n",
        "      <td> 18954711</td>\n",
        "      <td> 49.908398</td>\n",
        "      <td> 0.016711</td>\n",
        "      <td> 0.893391</td>\n",
        "      <td> 851</td>\n",
        "      <td> 0.193112</td>\n",
        "      <td> 50.387028</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>10  </th>\n",
        "      <td> GTEX-N7MS-0011-R11A-SM-2IZJZ</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.6</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>             Brain - Cerebellar Hemisphere</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-19253</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>11  </th>\n",
        "      <td>  GTEX-N7MS-0011-R1a-SM-2AXVJ</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.3</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                       Brain - Hippocampus</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>12  </th>\n",
        "      <td>  GTEX-N7MS-0011-R1a-SM-2HMJG</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.3</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                       Brain - Hippocampus</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 15226514</td>\n",
        "      <td>  8806379</td>\n",
        "      <td> 0.004332</td>\n",
        "      <td> 15122489</td>\n",
        "      <td> 49.828620</td>\n",
        "      <td> 0.041028</td>\n",
        "      <td> 0.790736</td>\n",
        "      <td> 835</td>\n",
        "      <td> 0.324148</td>\n",
        "      <td> 50.618090</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>13  </th>\n",
        "      <td>  GTEX-N7MS-0011-R2a-SM-2HML6</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.0</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                  Brain - Substantia nigra</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 11678187</td>\n",
        "      <td>  8637924</td>\n",
        "      <td> 0.004073</td>\n",
        "      <td> 11575007</td>\n",
        "      <td> 49.778137</td>\n",
        "      <td> 0.028304</td>\n",
        "      <td> 0.628574</td>\n",
        "      <td> 837</td>\n",
        "      <td> 0.275110</td>\n",
        "      <td> 50.561234</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>14  </th>\n",
        "      <td>  GTEX-N7MS-0011-R2a-SM-2IZK7</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.0</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                  Brain - Substantia nigra</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>15  </th>\n",
        "      <td>  GTEX-N7MS-0011-R3a-SM-2AXVU</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.6</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>  Brain - Anterior cingulate cortex (BA24)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>16  </th>\n",
        "      <td>  GTEX-N7MS-0011-R3a-SM-2HMKD</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.6</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>  Brain - Anterior cingulate cortex (BA24)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>  8114816</td>\n",
        "      <td>  4205357</td>\n",
        "      <td> 0.004996</td>\n",
        "      <td>  8085575</td>\n",
        "      <td> 49.909750</td>\n",
        "      <td> 0.004037</td>\n",
        "      <td> 0.151460</td>\n",
        "      <td> 806</td>\n",
        "      <td> 0.364988</td>\n",
        "      <td> 50.379780</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>17  </th>\n",
        "      <td>  GTEX-N7MS-0011-R3a-SM-33HC6</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.6</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>  Brain - Anterior cingulate cortex (BA24)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 36663415</td>\n",
        "      <td> 22956649</td>\n",
        "      <td> 0.003256</td>\n",
        "      <td> 36619121</td>\n",
        "      <td> 49.969780</td>\n",
        "      <td> 0.033465</td>\n",
        "      <td> 0.916319</td>\n",
        "      <td> 875</td>\n",
        "      <td> 0.341985</td>\n",
        "      <td> 50.874683</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>18  </th>\n",
        "      <td>  GTEX-N7MS-0011-R4a-SM-2AXW2</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.2</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                          Brain - Amygdala</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>19  </th>\n",
        "      <td>  GTEX-N7MS-0011-R4a-SM-2HMKW</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.2</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                          Brain - Amygdala</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 11651563</td>\n",
        "      <td>  8137082</td>\n",
        "      <td> 0.004180</td>\n",
        "      <td> 11545498</td>\n",
        "      <td> 49.771380</td>\n",
        "      <td> 0.035598</td>\n",
        "      <td> 0.657749</td>\n",
        "      <td> 811</td>\n",
        "      <td> 0.334207</td>\n",
        "      <td> 50.741714</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>20  </th>\n",
        "      <td>  GTEX-N7MS-0011-R5a-SM-2AXW7</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.8</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>           Brain - Caudate (basal ganglia)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>21  </th>\n",
        "      <td>  GTEX-N7MS-0011-R5a-SM-2HMK8</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.8</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>           Brain - Caudate (basal ganglia)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 21641531</td>\n",
        "      <td> 13525503</td>\n",
        "      <td> 0.004141</td>\n",
        "      <td> 21325223</td>\n",
        "      <td> 49.631920</td>\n",
        "      <td> 0.050445</td>\n",
        "      <td> 0.889377</td>\n",
        "      <td> 891</td>\n",
        "      <td> 0.312833</td>\n",
        "      <td> 50.695858</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>22  </th>\n",
        "      <td>  GTEX-N7MS-0011-R6a-SM-2AXWD</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.6</td>\n",
        "      <td>       Brain</td>\n",
        "      <td> Brain - Nucleus accumbens (basal ganglia)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>23  </th>\n",
        "      <td>  GTEX-N7MS-0011-R6a-SM-2HMJ4</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  7.6</td>\n",
        "      <td>       Brain</td>\n",
        "      <td> Brain - Nucleus accumbens (basal ganglia)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 17668410</td>\n",
        "      <td> 12313288</td>\n",
        "      <td> 0.003942</td>\n",
        "      <td> 17447282</td>\n",
        "      <td> 49.685143</td>\n",
        "      <td> 0.040960</td>\n",
        "      <td> 0.883223</td>\n",
        "      <td> 854</td>\n",
        "      <td> 0.262580</td>\n",
        "      <td> 50.615242</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>24  </th>\n",
        "      <td>  GTEX-N7MS-0011-R7a-SM-2AXV5</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.4</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>           Brain - Putamen (basal ganglia)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>25  </th>\n",
        "      <td>  GTEX-N7MS-0011-R7a-SM-2HMKN</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.4</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>           Brain - Putamen (basal ganglia)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td> 10366169</td>\n",
        "      <td>  5352672</td>\n",
        "      <td> 0.004739</td>\n",
        "      <td>  9997472</td>\n",
        "      <td> 49.094720</td>\n",
        "      <td> 0.058973</td>\n",
        "      <td> 0.664810</td>\n",
        "      <td> 791</td>\n",
        "      <td> 0.400345</td>\n",
        "      <td> 51.510887</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>26  </th>\n",
        "      <td>  GTEX-N7MS-0011-R8a-SM-2AXVD</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.8</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                      Brain - Hypothalamus</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>                     RNA isolation_QIAGEN miRNeasy</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>27  </th>\n",
        "      <td>  GTEX-N7MS-0011-R8a-SM-2YUMK</td>\n",
        "      <td>NaN</td>\n",
        "      <td> C1, A1</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  6.8</td>\n",
        "      <td>       Brain</td>\n",
        "      <td>                      Brain - Hypothalamus</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17395</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 12317489</td>\n",
        "      <td>  9750793</td>\n",
        "      <td> 0.002769</td>\n",
        "      <td> 12167817</td>\n",
        "      <td> 49.694363</td>\n",
        "      <td> 0.054535</td>\n",
        "      <td> 0.817677</td>\n",
        "      <td> 820</td>\n",
        "      <td> 0.366260</td>\n",
        "      <td> 50.655490</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>28  </th>\n",
        "      <td>      GTEX-N7MS-0126-SM-3TW8O</td>\n",
        "      <td>  3</td>\n",
        "      <td>     C1</td>\n",
        "      <td>              OK</td>\n",
        "      <td>  9.1</td>\n",
        "      <td>      Testis</td>\n",
        "      <td>                                    Testis</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-16740</td>\n",
        "      <td>                RNA isolation_PAXgene Tissue miRNA</td>\n",
        "      <td>...</td>\n",
        "      <td> 19539848</td>\n",
        "      <td> 15703873</td>\n",
        "      <td> 0.002877</td>\n",
        "      <td> 19525988</td>\n",
        "      <td> 49.982260</td>\n",
        "      <td> 0.051392</td>\n",
        "      <td> 0.934544</td>\n",
        "      <td> 937</td>\n",
        "      <td> 0.185286</td>\n",
        "      <td> 50.262527</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>29  </th>\n",
        "      <td>      GTEX-N7MS-0225-SM-4E3HO</td>\n",
        "      <td>  1</td>\n",
        "      <td>     C1</td>\n",
        "      <td> OK for analysis</td>\n",
        "      <td>  7.7</td>\n",
        "      <td>        Skin</td>\n",
        "      <td>            Skin - Sun Exposed (Lower leg)</td>\n",
        "      <td> 16-19 hours</td>\n",
        "      <td> BP-36182</td>\n",
        "      <td>                RNA isolation_PAXgene Tissue miRNA</td>\n",
        "      <td>...</td>\n",
        "      <td> 17258030</td>\n",
        "      <td> 13502100</td>\n",
        "      <td> 0.004377</td>\n",
        "      <td> 17381617</td>\n",
        "      <td> 50.178387</td>\n",
        "      <td> 0.010061</td>\n",
        "      <td> 0.937283</td>\n",
        "      <td> 862</td>\n",
        "      <td> 0.223826</td>\n",
        "      <td> 50.288605</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>...</th>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "      <td>...</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4471</th>\n",
        "      <td>               K-562-SM-3GADY</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 20293042</td>\n",
        "      <td> 21127232</td>\n",
        "      <td> 0.003853</td>\n",
        "      <td> 20085667</td>\n",
        "      <td> 49.743217</td>\n",
        "      <td> 0.010052</td>\n",
        "      <td> 0.912978</td>\n",
        "      <td> 877</td>\n",
        "      <td> 0.323958</td>\n",
        "      <td> 50.618893</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4472</th>\n",
        "      <td>               K-562-SM-3GAFC</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 19387186</td>\n",
        "      <td> 19565057</td>\n",
        "      <td> 0.003505</td>\n",
        "      <td> 19503152</td>\n",
        "      <td> 50.149097</td>\n",
        "      <td> 0.011422</td>\n",
        "      <td> 0.890707</td>\n",
        "      <td> 854</td>\n",
        "      <td> 0.420687</td>\n",
        "      <td> 50.387650</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4473</th>\n",
        "      <td>               K-562-SM-3GIKB</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 12498139</td>\n",
        "      <td> 12769804</td>\n",
        "      <td> 0.003243</td>\n",
        "      <td> 12654894</td>\n",
        "      <td> 50.311604</td>\n",
        "      <td> 0.010686</td>\n",
        "      <td> 0.852001</td>\n",
        "      <td> 849</td>\n",
        "      <td> 0.388517</td>\n",
        "      <td> 50.260590</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4474</th>\n",
        "      <td>               K-562-SM-3GILO</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 18089388</td>\n",
        "      <td> 18151281</td>\n",
        "      <td> 0.004954</td>\n",
        "      <td> 17946903</td>\n",
        "      <td> 49.802307</td>\n",
        "      <td> 0.010809</td>\n",
        "      <td> 0.865095</td>\n",
        "      <td> 876</td>\n",
        "      <td> 0.278859</td>\n",
        "      <td> 50.707745</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4475</th>\n",
        "      <td>               K-562-SM-3K2BF</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 10709316</td>\n",
        "      <td> 11153056</td>\n",
        "      <td> 0.005191</td>\n",
        "      <td> 10650786</td>\n",
        "      <td> 49.862990</td>\n",
        "      <td> 0.017607</td>\n",
        "      <td> 0.801476</td>\n",
        "      <td> 841</td>\n",
        "      <td> 0.225620</td>\n",
        "      <td> 50.603752</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4476</th>\n",
        "      <td>               K-562-SM-3LK7S</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 19775330</td>\n",
        "      <td> 21833559</td>\n",
        "      <td> 0.002277</td>\n",
        "      <td> 19642715</td>\n",
        "      <td> 49.831787</td>\n",
        "      <td> 0.018965</td>\n",
        "      <td> 0.922582</td>\n",
        "      <td> 882</td>\n",
        "      <td> 0.312202</td>\n",
        "      <td> 50.353050</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4477</th>\n",
        "      <td>               K-562-SM-3MJHH</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 22850569</td>\n",
        "      <td> 25074169</td>\n",
        "      <td> 0.002691</td>\n",
        "      <td> 22654052</td>\n",
        "      <td> 49.784070</td>\n",
        "      <td> 0.014306</td>\n",
        "      <td> 0.938719</td>\n",
        "      <td> 887</td>\n",
        "      <td> 0.299666</td>\n",
        "      <td> 50.487100</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4478</th>\n",
        "      <td>               K-562-SM-3NB3I</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 21923991</td>\n",
        "      <td> 23322261</td>\n",
        "      <td> 0.003130</td>\n",
        "      <td> 22010308</td>\n",
        "      <td> 50.098236</td>\n",
        "      <td> 0.022786</td>\n",
        "      <td> 0.928603</td>\n",
        "      <td> 872</td>\n",
        "      <td> 0.299371</td>\n",
        "      <td> 50.367115</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4479</th>\n",
        "      <td>               K-562-SM-3NMAP</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 22210968</td>\n",
        "      <td> 24363787</td>\n",
        "      <td> 0.003103</td>\n",
        "      <td> 22339747</td>\n",
        "      <td> 50.144530</td>\n",
        "      <td> 0.018574</td>\n",
        "      <td> 0.936381</td>\n",
        "      <td> 890</td>\n",
        "      <td> 0.284208</td>\n",
        "      <td> 50.140530</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4480</th>\n",
        "      <td>               K-562-SM-3NMDG</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 20447451</td>\n",
        "      <td> 22524234</td>\n",
        "      <td> 0.002659</td>\n",
        "      <td> 20583281</td>\n",
        "      <td> 50.165520</td>\n",
        "      <td> 0.017522</td>\n",
        "      <td> 0.950120</td>\n",
        "      <td> 873</td>\n",
        "      <td> 0.267642</td>\n",
        "      <td> 50.144337</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4481</th>\n",
        "      <td>               K-562-SM-3P61Y</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 23221097</td>\n",
        "      <td> 25541338</td>\n",
        "      <td> 0.002788</td>\n",
        "      <td> 23358263</td>\n",
        "      <td> 50.147243</td>\n",
        "      <td> 0.022186</td>\n",
        "      <td> 0.946919</td>\n",
        "      <td> 894</td>\n",
        "      <td> 0.257213</td>\n",
        "      <td> 50.123116</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4482</th>\n",
        "      <td>               K-562-SM-46MWI</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 12960037</td>\n",
        "      <td> 14028074</td>\n",
        "      <td> 0.002139</td>\n",
        "      <td> 13042438</td>\n",
        "      <td> 50.158447</td>\n",
        "      <td> 0.024899</td>\n",
        "      <td> 0.956672</td>\n",
        "      <td> 873</td>\n",
        "      <td> 0.188689</td>\n",
        "      <td> 50.133953</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4483</th>\n",
        "      <td>               K-562-SM-47JYY</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 15251988</td>\n",
        "      <td> 17187136</td>\n",
        "      <td> 0.002230</td>\n",
        "      <td> 15395095</td>\n",
        "      <td> 50.233475</td>\n",
        "      <td> 0.022935</td>\n",
        "      <td> 0.957918</td>\n",
        "      <td> 862</td>\n",
        "      <td> 0.218666</td>\n",
        "      <td> 49.976242</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4484</th>\n",
        "      <td>               K-562-SM-48FEU</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 11679926</td>\n",
        "      <td> 13207067</td>\n",
        "      <td> 0.002256</td>\n",
        "      <td> 11762756</td>\n",
        "      <td> 50.176662</td>\n",
        "      <td> 0.028136</td>\n",
        "      <td> 0.953367</td>\n",
        "      <td> 832</td>\n",
        "      <td> 0.393895</td>\n",
        "      <td> 50.098743</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4485</th>\n",
        "      <td>               K-562-SM-48TE3</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 14556993</td>\n",
        "      <td> 16245672</td>\n",
        "      <td> 0.001645</td>\n",
        "      <td> 14501457</td>\n",
        "      <td> 49.904438</td>\n",
        "      <td> 0.013772</td>\n",
        "      <td> 0.964781</td>\n",
        "      <td> 873</td>\n",
        "      <td> 0.197786</td>\n",
        "      <td> 50.232384</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4486</th>\n",
        "      <td>               K-562-SM-4AD4F</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 14758394</td>\n",
        "      <td> 16187369</td>\n",
        "      <td> 0.002012</td>\n",
        "      <td> 14775511</td>\n",
        "      <td> 50.028980</td>\n",
        "      <td> 0.013740</td>\n",
        "      <td> 0.962448</td>\n",
        "      <td> 863</td>\n",
        "      <td> 0.197407</td>\n",
        "      <td> 50.208282</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4487</th>\n",
        "      <td>               K-562-SM-4AT3W</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 10916665</td>\n",
        "      <td> 12078403</td>\n",
        "      <td> 0.005620</td>\n",
        "      <td> 10964475</td>\n",
        "      <td> 50.109250</td>\n",
        "      <td> 0.013850</td>\n",
        "      <td> 0.950125</td>\n",
        "      <td> 828</td>\n",
        "      <td> 0.172564</td>\n",
        "      <td> 50.208042</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4488</th>\n",
        "      <td>               K-562-SM-4B66B</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 22550061</td>\n",
        "      <td> 24921589</td>\n",
        "      <td> 0.002292</td>\n",
        "      <td> 22611795</td>\n",
        "      <td> 50.068350</td>\n",
        "      <td> 0.022443</td>\n",
        "      <td> 0.956932</td>\n",
        "      <td> 892</td>\n",
        "      <td> 0.230228</td>\n",
        "      <td> 50.232353</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4489</th>\n",
        "      <td>               K-562-SM-4BONS</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 15745843</td>\n",
        "      <td> 17474655</td>\n",
        "      <td> 0.003985</td>\n",
        "      <td> 15727040</td>\n",
        "      <td> 49.970127</td>\n",
        "      <td> 0.020865</td>\n",
        "      <td> 0.956510</td>\n",
        "      <td> 876</td>\n",
        "      <td> 0.215475</td>\n",
        "      <td> 50.267246</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4490</th>\n",
        "      <td>               K-562-SM-4BRWK</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 12423333</td>\n",
        "      <td> 13669729</td>\n",
        "      <td> 0.002084</td>\n",
        "      <td> 12383723</td>\n",
        "      <td> 49.920166</td>\n",
        "      <td> 0.011200</td>\n",
        "      <td> 0.958494</td>\n",
        "      <td> 861</td>\n",
        "      <td> 0.184230</td>\n",
        "      <td> 50.298664</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4491</th>\n",
        "      <td>               K-562-SM-4DM4W</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 11350047</td>\n",
        "      <td> 12454038</td>\n",
        "      <td> 0.005404</td>\n",
        "      <td> 11400063</td>\n",
        "      <td> 50.109924</td>\n",
        "      <td> 0.029977</td>\n",
        "      <td> 0.945595</td>\n",
        "      <td> 828</td>\n",
        "      <td> 0.181774</td>\n",
        "      <td> 50.162186</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4492</th>\n",
        "      <td>               K-562-SM-4EDPU</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 19954703</td>\n",
        "      <td> 21371627</td>\n",
        "      <td> 0.003999</td>\n",
        "      <td> 20055440</td>\n",
        "      <td> 50.125885</td>\n",
        "      <td> 0.042664</td>\n",
        "      <td> 0.944981</td>\n",
        "      <td> 890</td>\n",
        "      <td> 0.243209</td>\n",
        "      <td> 50.225540</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4493</th>\n",
        "      <td>               K-562-SM-4GICD</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 21863245</td>\n",
        "      <td> 24097377</td>\n",
        "      <td> 0.003084</td>\n",
        "      <td> 21979888</td>\n",
        "      <td> 50.133026</td>\n",
        "      <td> 0.012700</td>\n",
        "      <td> 0.929063</td>\n",
        "      <td> 893</td>\n",
        "      <td> 0.242783</td>\n",
        "      <td> 50.196144</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4494</th>\n",
        "      <td>               K-562-SM-4IHK7</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 14231410</td>\n",
        "      <td> 15796005</td>\n",
        "      <td> 0.004426</td>\n",
        "      <td> 14188282</td>\n",
        "      <td> 49.924120</td>\n",
        "      <td> 0.022133</td>\n",
        "      <td> 0.949039</td>\n",
        "      <td> 870</td>\n",
        "      <td> 0.214134</td>\n",
        "      <td> 50.295345</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4495</th>\n",
        "      <td>               K-562-SM-4JBIQ</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 18279772</td>\n",
        "      <td> 20403076</td>\n",
        "      <td> 0.004726</td>\n",
        "      <td> 18332662</td>\n",
        "      <td> 50.072230</td>\n",
        "      <td> 0.014611</td>\n",
        "      <td> 0.954230</td>\n",
        "      <td> 871</td>\n",
        "      <td> 0.226838</td>\n",
        "      <td> 50.212060</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4496</th>\n",
        "      <td>               K-562-SM-4KKZ9</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 17621411</td>\n",
        "      <td> 19649733</td>\n",
        "      <td> 0.003199</td>\n",
        "      <td> 17603160</td>\n",
        "      <td> 49.974094</td>\n",
        "      <td> 0.017336</td>\n",
        "      <td> 0.951078</td>\n",
        "      <td> 855</td>\n",
        "      <td> 0.208081</td>\n",
        "      <td> 50.299175</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4497</th>\n",
        "      <td>               K-562-SM-4LMI2</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 16494770</td>\n",
        "      <td> 18425439</td>\n",
        "      <td> 0.003787</td>\n",
        "      <td> 16498971</td>\n",
        "      <td> 50.006367</td>\n",
        "      <td> 0.011050</td>\n",
        "      <td> 0.952641</td>\n",
        "      <td> 866</td>\n",
        "      <td> 0.197396</td>\n",
        "      <td> 50.248615</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4498</th>\n",
        "      <td>               K-562-SM-4LVKX</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  9.5</td>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>          Cells - Leukemia cell line (CML)</td>\n",
        "      <td>         NaN</td>\n",
        "      <td> BP-17177</td>\n",
        "      <td>         RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
        "      <td>...</td>\n",
        "      <td> 13735552</td>\n",
        "      <td> 15244841</td>\n",
        "      <td> 0.002508</td>\n",
        "      <td> 13719276</td>\n",
        "      <td> 49.970356</td>\n",
        "      <td> 0.011353</td>\n",
        "      <td> 0.952748</td>\n",
        "      <td> 870</td>\n",
        "      <td> 0.185469</td>\n",
        "      <td> 50.244150</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4499</th>\n",
        "      <td>             NA12878-SM-2XJZN</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  NaN</td>\n",
        "      <td>         NaN</td>\n",
        "      <td>                                       NaN</td>\n",
        "      <td>         NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>          Cell Line DNA (Derived from Blood Cells)</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4500</th>\n",
        "      <td>           NA12878_C-SM-2VCTR</td>\n",
        "      <td>NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>             NaN</td>\n",
        "      <td>  NaN</td>\n",
        "      <td>         NaN</td>\n",
        "      <td>                                       NaN</td>\n",
        "      <td>         NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>          Cell Line DNA (Derived from Blood Cells)</td>\n",
        "      <td>...</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>       NaN</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>4501 rows \u00d7 59 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 196,
       "text": [
        "                            SAMPID  SMATSSCR SMCENTER         SMPTHNTS  SMRIN  \\\n",
        "0          GTEX-N7MS-0007-SM-26GME       NaN       C1              NaN    8.2   \n",
        "1          GTEX-N7MS-0007-SM-26GMV       NaN       C1              NaN    8.2   \n",
        "2          GTEX-N7MS-0007-SM-2D43E       NaN       C1              NaN    8.2   \n",
        "3          GTEX-N7MS-0007-SM-2D7W1       NaN       C1              NaN    8.2   \n",
        "4          GTEX-N7MS-0008-SM-4E3JI       NaN       C1              NaN   10.0   \n",
        "5          GTEX-N7MS-0009-SM-2BWY4       NaN       C1              NaN    NaN   \n",
        "6          GTEX-N7MS-0009-SM-2XK1D       NaN       C1              NaN    NaN   \n",
        "7     GTEX-N7MS-0011-R10A-SM-2HMJK       NaN   C1, A1              NaN    7.1   \n",
        "8     GTEX-N7MS-0011-R10A-SM-2IZJW       NaN   C1, A1              NaN    7.1   \n",
        "9     GTEX-N7MS-0011-R11A-SM-2HMJS       NaN   C1, A1              NaN    6.6   \n",
        "10    GTEX-N7MS-0011-R11A-SM-2IZJZ       NaN   C1, A1              NaN    6.6   \n",
        "11     GTEX-N7MS-0011-R1a-SM-2AXVJ       NaN   C1, A1              NaN    7.3   \n",
        "12     GTEX-N7MS-0011-R1a-SM-2HMJG       NaN   C1, A1              NaN    7.3   \n",
        "13     GTEX-N7MS-0011-R2a-SM-2HML6       NaN   C1, A1              NaN    7.0   \n",
        "14     GTEX-N7MS-0011-R2a-SM-2IZK7       NaN   C1, A1              NaN    7.0   \n",
        "15     GTEX-N7MS-0011-R3a-SM-2AXVU       NaN   C1, A1              NaN    7.6   \n",
        "16     GTEX-N7MS-0011-R3a-SM-2HMKD       NaN   C1, A1              NaN    7.6   \n",
        "17     GTEX-N7MS-0011-R3a-SM-33HC6       NaN   C1, A1              NaN    7.6   \n",
        "18     GTEX-N7MS-0011-R4a-SM-2AXW2       NaN   C1, A1              NaN    6.2   \n",
        "19     GTEX-N7MS-0011-R4a-SM-2HMKW       NaN   C1, A1              NaN    6.2   \n",
        "20     GTEX-N7MS-0011-R5a-SM-2AXW7       NaN   C1, A1              NaN    7.8   \n",
        "21     GTEX-N7MS-0011-R5a-SM-2HMK8       NaN   C1, A1              NaN    7.8   \n",
        "22     GTEX-N7MS-0011-R6a-SM-2AXWD       NaN   C1, A1              NaN    7.6   \n",
        "23     GTEX-N7MS-0011-R6a-SM-2HMJ4       NaN   C1, A1              NaN    7.6   \n",
        "24     GTEX-N7MS-0011-R7a-SM-2AXV5       NaN   C1, A1              NaN    6.4   \n",
        "25     GTEX-N7MS-0011-R7a-SM-2HMKN       NaN   C1, A1              NaN    6.4   \n",
        "26     GTEX-N7MS-0011-R8a-SM-2AXVD       NaN   C1, A1              NaN    6.8   \n",
        "27     GTEX-N7MS-0011-R8a-SM-2YUMK       NaN   C1, A1              NaN    6.8   \n",
        "28         GTEX-N7MS-0126-SM-3TW8O         3       C1               OK    9.1   \n",
        "29         GTEX-N7MS-0225-SM-4E3HO         1       C1  OK for analysis    7.7   \n",
        "...                            ...       ...      ...              ...    ...   \n",
        "4471                K-562-SM-3GADY       NaN      NaN              NaN    9.5   \n",
        "4472                K-562-SM-3GAFC       NaN      NaN              NaN    9.5   \n",
        "4473                K-562-SM-3GIKB       NaN      NaN              NaN    9.5   \n",
        "4474                K-562-SM-3GILO       NaN      NaN              NaN    9.5   \n",
        "4475                K-562-SM-3K2BF       NaN      NaN              NaN    9.5   \n",
        "4476                K-562-SM-3LK7S       NaN      NaN              NaN    9.5   \n",
        "4477                K-562-SM-3MJHH       NaN      NaN              NaN    9.5   \n",
        "4478                K-562-SM-3NB3I       NaN      NaN              NaN    9.5   \n",
        "4479                K-562-SM-3NMAP       NaN      NaN              NaN    9.5   \n",
        "4480                K-562-SM-3NMDG       NaN      NaN              NaN    9.5   \n",
        "4481                K-562-SM-3P61Y       NaN      NaN              NaN    9.5   \n",
        "4482                K-562-SM-46MWI       NaN      NaN              NaN    9.5   \n",
        "4483                K-562-SM-47JYY       NaN      NaN              NaN    9.5   \n",
        "4484                K-562-SM-48FEU       NaN      NaN              NaN    9.5   \n",
        "4485                K-562-SM-48TE3       NaN      NaN              NaN    9.5   \n",
        "4486                K-562-SM-4AD4F       NaN      NaN              NaN    9.5   \n",
        "4487                K-562-SM-4AT3W       NaN      NaN              NaN    9.5   \n",
        "4488                K-562-SM-4B66B       NaN      NaN              NaN    9.5   \n",
        "4489                K-562-SM-4BONS       NaN      NaN              NaN    9.5   \n",
        "4490                K-562-SM-4BRWK       NaN      NaN              NaN    9.5   \n",
        "4491                K-562-SM-4DM4W       NaN      NaN              NaN    9.5   \n",
        "4492                K-562-SM-4EDPU       NaN      NaN              NaN    9.5   \n",
        "4493                K-562-SM-4GICD       NaN      NaN              NaN    9.5   \n",
        "4494                K-562-SM-4IHK7       NaN      NaN              NaN    9.5   \n",
        "4495                K-562-SM-4JBIQ       NaN      NaN              NaN    9.5   \n",
        "4496                K-562-SM-4KKZ9       NaN      NaN              NaN    9.5   \n",
        "4497                K-562-SM-4LMI2       NaN      NaN              NaN    9.5   \n",
        "4498                K-562-SM-4LVKX       NaN      NaN              NaN    9.5   \n",
        "4499              NA12878-SM-2XJZN       NaN      NaN              NaN    NaN   \n",
        "4500            NA12878_C-SM-2VCTR       NaN      NaN              NaN    NaN   \n",
        "\n",
        "             SMTS                                      SMTSD     SMTSISCH  \\\n",
        "0           Blood                                Whole Blood  16-19 hours   \n",
        "1           Blood                                Whole Blood  16-19 hours   \n",
        "2           Blood                                Whole Blood  16-19 hours   \n",
        "3           Blood                                Whole Blood  16-19 hours   \n",
        "4            Skin            Cells - Transformed fibroblasts          NaN   \n",
        "5           Blood                                Whole Blood  16-19 hours   \n",
        "6           Blood                                Whole Blood  16-19 hours   \n",
        "7           Brain               Brain - Frontal Cortex (BA9)          NaN   \n",
        "8           Brain               Brain - Frontal Cortex (BA9)          NaN   \n",
        "9           Brain              Brain - Cerebellar Hemisphere          NaN   \n",
        "10          Brain              Brain - Cerebellar Hemisphere          NaN   \n",
        "11          Brain                        Brain - Hippocampus          NaN   \n",
        "12          Brain                        Brain - Hippocampus          NaN   \n",
        "13          Brain                   Brain - Substantia nigra          NaN   \n",
        "14          Brain                   Brain - Substantia nigra          NaN   \n",
        "15          Brain   Brain - Anterior cingulate cortex (BA24)          NaN   \n",
        "16          Brain   Brain - Anterior cingulate cortex (BA24)          NaN   \n",
        "17          Brain   Brain - Anterior cingulate cortex (BA24)          NaN   \n",
        "18          Brain                           Brain - Amygdala          NaN   \n",
        "19          Brain                           Brain - Amygdala          NaN   \n",
        "20          Brain            Brain - Caudate (basal ganglia)          NaN   \n",
        "21          Brain            Brain - Caudate (basal ganglia)          NaN   \n",
        "22          Brain  Brain - Nucleus accumbens (basal ganglia)          NaN   \n",
        "23          Brain  Brain - Nucleus accumbens (basal ganglia)          NaN   \n",
        "24          Brain            Brain - Putamen (basal ganglia)          NaN   \n",
        "25          Brain            Brain - Putamen (basal ganglia)          NaN   \n",
        "26          Brain                       Brain - Hypothalamus          NaN   \n",
        "27          Brain                       Brain - Hypothalamus          NaN   \n",
        "28         Testis                                     Testis  16-19 hours   \n",
        "29           Skin             Skin - Sun Exposed (Lower leg)  16-19 hours   \n",
        "...           ...                                        ...          ...   \n",
        "4471  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4472  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4473  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4474  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4475  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4476  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4477  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4478  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4479  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4480  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4481  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4482  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4483  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4484  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4485  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4486  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4487  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4488  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4489  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4490  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4491  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4492  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4493  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4494  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4495  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4496  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4497  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4498  Bone Marrow           Cells - Leukemia cell line (CML)          NaN   \n",
        "4499          NaN                                        NaN          NaN   \n",
        "4500          NaN                                        NaN          NaN   \n",
        "\n",
        "      SMNABTCH                                          SMNABTCHT    ...      \\\n",
        "0     BP-16653           RNA isolation_PAXgene Blood RNA (Manual)    ...       \n",
        "1     BP-16653           RNA isolation_PAXgene Blood RNA (Manual)    ...       \n",
        "2     BP-16653           RNA isolation_PAXgene Blood RNA (Manual)    ...       \n",
        "3     BP-16653           RNA isolation_PAXgene Blood RNA (Manual)    ...       \n",
        "4     BP-37581          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "5     BP-16657  DNA isolation_Whole Blood _QIAGEN Puregene (Ma...    ...       \n",
        "6     BP-16657  DNA isolation_Whole Blood _QIAGEN Puregene (Ma...    ...       \n",
        "7     BP-19253                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "8     BP-19253                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "9     BP-19253                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "10    BP-19253                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "11    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "12    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "13    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "14    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "15    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "16    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "17    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "18    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "19    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "20    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "21    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "22    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "23    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "24    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "25    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "26    BP-17395                      RNA isolation_QIAGEN miRNeasy    ...       \n",
        "27    BP-17395          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "28    BP-16740                 RNA isolation_PAXgene Tissue miRNA    ...       \n",
        "29    BP-36182                 RNA isolation_PAXgene Tissue miRNA    ...       \n",
        "...        ...                                                ...    ...       \n",
        "4471  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4472  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4473  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4474  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4475  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4476  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4477  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4478  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4479  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4480  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4481  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4482  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4483  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4484  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4485  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4486  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4487  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4488  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4489  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4490  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4491  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4492  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4493  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4494  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4495  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4496  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4497  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4498  BP-17177          RNA isolation_Trizol Manual (Cell Pellet)    ...       \n",
        "4499       NaN           Cell Line DNA (Derived from Blood Cells)    ...       \n",
        "4500       NaN           Cell Line DNA (Derived from Blood Cells)    ...       \n",
        "\n",
        "      SME1ANTI  SMSPLTRD  SMBSMMRT  SME1SNSE   SME1PCTS  SMRRNART  SME1MPRT  \\\n",
        "0          NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "1          NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "2          NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "3     13705136  18432744  0.002456  13447728  49.526005  0.041526  0.835199   \n",
        "4     17962165  20910366  0.004087  18012435  50.069874  0.028395  0.948329   \n",
        "5          NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "6          NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "7     18948398  12221905  0.004294  18747238  49.733180  0.051237  0.875680   \n",
        "8          NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "9     19024292  12200496  0.003643  18954711  49.908398  0.016711  0.893391   \n",
        "10         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "11         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "12    15226514   8806379  0.004332  15122489  49.828620  0.041028  0.790736   \n",
        "13    11678187   8637924  0.004073  11575007  49.778137  0.028304  0.628574   \n",
        "14         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "15         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "16     8114816   4205357  0.004996   8085575  49.909750  0.004037  0.151460   \n",
        "17    36663415  22956649  0.003256  36619121  49.969780  0.033465  0.916319   \n",
        "18         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "19    11651563   8137082  0.004180  11545498  49.771380  0.035598  0.657749   \n",
        "20         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "21    21641531  13525503  0.004141  21325223  49.631920  0.050445  0.889377   \n",
        "22         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "23    17668410  12313288  0.003942  17447282  49.685143  0.040960  0.883223   \n",
        "24         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "25    10366169   5352672  0.004739   9997472  49.094720  0.058973  0.664810   \n",
        "26         NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "27    12317489   9750793  0.002769  12167817  49.694363  0.054535  0.817677   \n",
        "28    19539848  15703873  0.002877  19525988  49.982260  0.051392  0.934544   \n",
        "29    17258030  13502100  0.004377  17381617  50.178387  0.010061  0.937283   \n",
        "...        ...       ...       ...       ...        ...       ...       ...   \n",
        "4471  20293042  21127232  0.003853  20085667  49.743217  0.010052  0.912978   \n",
        "4472  19387186  19565057  0.003505  19503152  50.149097  0.011422  0.890707   \n",
        "4473  12498139  12769804  0.003243  12654894  50.311604  0.010686  0.852001   \n",
        "4474  18089388  18151281  0.004954  17946903  49.802307  0.010809  0.865095   \n",
        "4475  10709316  11153056  0.005191  10650786  49.862990  0.017607  0.801476   \n",
        "4476  19775330  21833559  0.002277  19642715  49.831787  0.018965  0.922582   \n",
        "4477  22850569  25074169  0.002691  22654052  49.784070  0.014306  0.938719   \n",
        "4478  21923991  23322261  0.003130  22010308  50.098236  0.022786  0.928603   \n",
        "4479  22210968  24363787  0.003103  22339747  50.144530  0.018574  0.936381   \n",
        "4480  20447451  22524234  0.002659  20583281  50.165520  0.017522  0.950120   \n",
        "4481  23221097  25541338  0.002788  23358263  50.147243  0.022186  0.946919   \n",
        "4482  12960037  14028074  0.002139  13042438  50.158447  0.024899  0.956672   \n",
        "4483  15251988  17187136  0.002230  15395095  50.233475  0.022935  0.957918   \n",
        "4484  11679926  13207067  0.002256  11762756  50.176662  0.028136  0.953367   \n",
        "4485  14556993  16245672  0.001645  14501457  49.904438  0.013772  0.964781   \n",
        "4486  14758394  16187369  0.002012  14775511  50.028980  0.013740  0.962448   \n",
        "4487  10916665  12078403  0.005620  10964475  50.109250  0.013850  0.950125   \n",
        "4488  22550061  24921589  0.002292  22611795  50.068350  0.022443  0.956932   \n",
        "4489  15745843  17474655  0.003985  15727040  49.970127  0.020865  0.956510   \n",
        "4490  12423333  13669729  0.002084  12383723  49.920166  0.011200  0.958494   \n",
        "4491  11350047  12454038  0.005404  11400063  50.109924  0.029977  0.945595   \n",
        "4492  19954703  21371627  0.003999  20055440  50.125885  0.042664  0.944981   \n",
        "4493  21863245  24097377  0.003084  21979888  50.133026  0.012700  0.929063   \n",
        "4494  14231410  15796005  0.004426  14188282  49.924120  0.022133  0.949039   \n",
        "4495  18279772  20403076  0.004726  18332662  50.072230  0.014611  0.954230   \n",
        "4496  17621411  19649733  0.003199  17603160  49.974094  0.017336  0.951078   \n",
        "4497  16494770  18425439  0.003787  16498971  50.006367  0.011050  0.952641   \n",
        "4498  13735552  15244841  0.002508  13719276  49.970356  0.011353  0.952748   \n",
        "4499       NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "4500       NaN       NaN       NaN       NaN        NaN       NaN       NaN   \n",
        "\n",
        "      SMNUM5CD  SMDPMPRT   SME2PCTS  \n",
        "0          NaN       NaN        NaN  \n",
        "1          NaN       NaN        NaN  \n",
        "2          NaN       NaN        NaN  \n",
        "3          840  0.563503  51.361324  \n",
        "4          879  0.226835  50.270794  \n",
        "5          NaN       NaN        NaN  \n",
        "6          NaN       NaN        NaN  \n",
        "7          859  0.330709  50.619534  \n",
        "8          NaN       NaN        NaN  \n",
        "9          851  0.193112  50.387028  \n",
        "10         NaN       NaN        NaN  \n",
        "11         NaN       NaN        NaN  \n",
        "12         835  0.324148  50.618090  \n",
        "13         837  0.275110  50.561234  \n",
        "14         NaN       NaN        NaN  \n",
        "15         NaN       NaN        NaN  \n",
        "16         806  0.364988  50.379780  \n",
        "17         875  0.341985  50.874683  \n",
        "18         NaN       NaN        NaN  \n",
        "19         811  0.334207  50.741714  \n",
        "20         NaN       NaN        NaN  \n",
        "21         891  0.312833  50.695858  \n",
        "22         NaN       NaN        NaN  \n",
        "23         854  0.262580  50.615242  \n",
        "24         NaN       NaN        NaN  \n",
        "25         791  0.400345  51.510887  \n",
        "26         NaN       NaN        NaN  \n",
        "27         820  0.366260  50.655490  \n",
        "28         937  0.185286  50.262527  \n",
        "29         862  0.223826  50.288605  \n",
        "...        ...       ...        ...  \n",
        "4471       877  0.323958  50.618893  \n",
        "4472       854  0.420687  50.387650  \n",
        "4473       849  0.388517  50.260590  \n",
        "4474       876  0.278859  50.707745  \n",
        "4475       841  0.225620  50.603752  \n",
        "4476       882  0.312202  50.353050  \n",
        "4477       887  0.299666  50.487100  \n",
        "4478       872  0.299371  50.367115  \n",
        "4479       890  0.284208  50.140530  \n",
        "4480       873  0.267642  50.144337  \n",
        "4481       894  0.257213  50.123116  \n",
        "4482       873  0.188689  50.133953  \n",
        "4483       862  0.218666  49.976242  \n",
        "4484       832  0.393895  50.098743  \n",
        "4485       873  0.197786  50.232384  \n",
        "4486       863  0.197407  50.208282  \n",
        "4487       828  0.172564  50.208042  \n",
        "4488       892  0.230228  50.232353  \n",
        "4489       876  0.215475  50.267246  \n",
        "4490       861  0.184230  50.298664  \n",
        "4491       828  0.181774  50.162186  \n",
        "4492       890  0.243209  50.225540  \n",
        "4493       893  0.242783  50.196144  \n",
        "4494       870  0.214134  50.295345  \n",
        "4495       871  0.226838  50.212060  \n",
        "4496       855  0.208081  50.299175  \n",
        "4497       866  0.197396  50.248615  \n",
        "4498       870  0.185469  50.244150  \n",
        "4499       NaN       NaN        NaN  \n",
        "4500       NaN       NaN        NaN  \n",
        "\n",
        "[4501 rows x 59 columns]"
       ]
      }
     ],
     "prompt_number": 196
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Ugh, this has the UGLIEST column names. What does `SMTSISCH` really mean, anyway? Plus, there are 59 columns and I don't want to have to go through and copy/paste something 59 different times.\n",
      "\n",
      "Turns out the Excel file `GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx` has the mapping between these weird names and human-readable descriptions. Open it in Excel.\n",
      "\n",
      "Since there are so many things to rename, we'll do it programmatically rather than copy/pasting by hand."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Read the Excel file `GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx` using `pandas`, and set the first column as the \"index\" (the row names)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Hint 1: `index_col` is the argument you want for setting the column number that should be the index\n",
      "\n",
      "Hint 2: In computer science, we count from 0, so the third column is indicated by the number `2`\n",
      "\n",
      "Hint 3: \"how to open excel in pandas\" is a great search term :)"
     ]
    },
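    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here's a tiny sketch of what `index_col` does, in case the hints are still fuzzy. It uses `pd.read_csv` on a made-up two-column table instead of `pd.read_excel`, only so that it runs without a spreadsheet on disk; both readers accept an `index_col` argument. The names `csv_text`, `toy`, `short_name` and `description` are invented for this illustration and are not part of the GTEx files.\n",
      "\n",
      "```python\n",
      "import io\n",
      "\n",
      "import pandas as pd\n",
      "\n",
      "# Made-up table standing in for a data dictionary (not real GTEx content)\n",
      "csv_text = '''short_name,description\n",
      "AGE,Age in years\n",
      "HT,Height in cm\n",
      "'''\n",
      "\n",
      "# index_col=0 means: use the first column (counting from 0) as the row names\n",
      "toy = pd.read_csv(io.StringIO(csv_text), index_col=0)\n",
      "\n",
      "print(toy.index.name)\n",
      "print(toy.columns.tolist())\n",
      "print(toy.loc['AGE', 'description'])\n",
      "```\n",
      "\n",
      "With `index_col=0`, the first column becomes the row names instead of a regular column, which is what makes label lookups like `toy.loc['AGE', 'description']` possible."
     ]
    },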
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to read the excel file GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx, and set the first column as the \"index\" goes here\n",
      "sample_attributes_dd = pd.read_excel('GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx', index_col=0)\n",
      "\n",
      "# Code to look at the top of the file goes here\n",
      "sample_attributes_dd.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>VARDESC</th>\n",
        "      <th>DOCFILE</th>\n",
        "      <th>TYPE</th>\n",
        "      <th>UNITS</th>\n",
        "      <th>COMMENT1</th>\n",
        "      <th>COMMENT2</th>\n",
        "      <th>VALUES</th>\n",
        "      <th>Unnamed: 8</th>\n",
        "      <th>Unnamed: 9</th>\n",
        "      <th>Unnamed: 10</th>\n",
        "      <th>...</th>\n",
        "      <th>Unnamed: 47</th>\n",
        "      <th>Unnamed: 48</th>\n",
        "      <th>Unnamed: 49</th>\n",
        "      <th>Unnamed: 50</th>\n",
        "      <th>Unnamed: 51</th>\n",
        "      <th>Unnamed: 52</th>\n",
        "      <th>Unnamed: 53</th>\n",
        "      <th>Unnamed: 54</th>\n",
        "      <th>Unnamed: 55</th>\n",
        "      <th>Unnamed: 56</th>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>VARNAME</th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>SAMPID</th>\n",
        "      <td>     Sample ID, GTEx Public Sample ID</td>\n",
        "      <td>                     NaN</td>\n",
        "      <td>                 string</td>\n",
        "      <td> NaN</td>\n",
        "      <td>                NaN</td>\n",
        "      <td>                                               NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>        NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>...</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>SMATSSCR</th>\n",
        "      <td>                      Autolysis Score</td>\n",
        "      <td> PRC Case Summary Report</td>\n",
        "      <td> integer, encoded value</td>\n",
        "      <td> NaN</td>\n",
        "      <td>          Autolysis</td>\n",
        "      <td> The destruction of organism cells or tissues b...</td>\n",
        "      <td> 0=None</td>\n",
        "      <td> 1=Mild</td>\n",
        "      <td> 2=Moderate</td>\n",
        "      <td> 3=Severe</td>\n",
        "      <td>...</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>SMNABTCH</th>\n",
        "      <td>      Nucleic Acid Isolation Batch ID</td>\n",
        "      <td>                   LDACC</td>\n",
        "      <td>                 string</td>\n",
        "      <td> NaN</td>\n",
        "      <td> Generated at LDACC</td>\n",
        "      <td> Batch when DNA/RNA was isolated and extracted ...</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>        NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>...</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>SMNABTCHT</th>\n",
        "      <td> Type of nucleic acid isolation batch</td>\n",
        "      <td>                   LDACC</td>\n",
        "      <td>                 string</td>\n",
        "      <td> NaN</td>\n",
        "      <td> Generated at LDACC</td>\n",
        "      <td>         The process by which DNA/RNA was isolated</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>        NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>...</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>SMNABTCHD</th>\n",
        "      <td> Date of nucleic acid isolation batch</td>\n",
        "      <td>                   LDACC</td>\n",
        "      <td>                 string</td>\n",
        "      <td> NaN</td>\n",
        "      <td> Generated at LDACC</td>\n",
        "      <td>            The date on which DNA/RNA was isolated</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>    NaN</td>\n",
        "      <td>        NaN</td>\n",
        "      <td>      NaN</td>\n",
        "      <td>...</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "      <td> NaN</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 56 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 195,
       "text": [
        "                                        VARDESC                  DOCFILE  \\\n",
        "VARNAME                                                                    \n",
        "SAMPID         Sample ID, GTEx Public Sample ID                      NaN   \n",
        "SMATSSCR                        Autolysis Score  PRC Case Summary Report   \n",
        "SMNABTCH        Nucleic Acid Isolation Batch ID                    LDACC   \n",
        "SMNABTCHT  Type of nucleic acid isolation batch                    LDACC   \n",
        "SMNABTCHD  Date of nucleic acid isolation batch                    LDACC   \n",
        "\n",
        "                             TYPE UNITS            COMMENT1  \\\n",
        "VARNAME                                                       \n",
        "SAMPID                     string   NaN                 NaN   \n",
        "SMATSSCR   integer, encoded value   NaN           Autolysis   \n",
        "SMNABTCH                   string   NaN  Generated at LDACC   \n",
        "SMNABTCHT                  string   NaN  Generated at LDACC   \n",
        "SMNABTCHD                  string   NaN  Generated at LDACC   \n",
        "\n",
        "                                                    COMMENT2  VALUES  \\\n",
        "VARNAME                                                                \n",
        "SAMPID                                                   NaN     NaN   \n",
        "SMATSSCR   The destruction of organism cells or tissues b...  0=None   \n",
        "SMNABTCH   Batch when DNA/RNA was isolated and extracted ...     NaN   \n",
        "SMNABTCHT          The process by which DNA/RNA was isolated     NaN   \n",
        "SMNABTCHD             The date on which DNA/RNA was isolated     NaN   \n",
        "\n",
        "          Unnamed: 8  Unnamed: 9 Unnamed: 10     ...     Unnamed: 47  \\\n",
        "VARNAME                                          ...                   \n",
        "SAMPID           NaN         NaN         NaN     ...             NaN   \n",
        "SMATSSCR      1=Mild  2=Moderate    3=Severe     ...             NaN   \n",
        "SMNABTCH         NaN         NaN         NaN     ...             NaN   \n",
        "SMNABTCHT        NaN         NaN         NaN     ...             NaN   \n",
        "SMNABTCHD        NaN         NaN         NaN     ...             NaN   \n",
        "\n",
        "          Unnamed: 48 Unnamed: 49 Unnamed: 50 Unnamed: 51 Unnamed: 52  \\\n",
        "VARNAME                                                                 \n",
        "SAMPID            NaN         NaN         NaN         NaN         NaN   \n",
        "SMATSSCR          NaN         NaN         NaN         NaN         NaN   \n",
        "SMNABTCH          NaN         NaN         NaN         NaN         NaN   \n",
        "SMNABTCHT         NaN         NaN         NaN         NaN         NaN   \n",
        "SMNABTCHD         NaN         NaN         NaN         NaN         NaN   \n",
        "\n",
        "          Unnamed: 53 Unnamed: 54 Unnamed: 55 Unnamed: 56  \n",
        "VARNAME                                                    \n",
        "SAMPID            NaN         NaN         NaN         NaN  \n",
        "SMATSSCR          NaN         NaN         NaN         NaN  \n",
        "SMNABTCH          NaN         NaN         NaN         NaN  \n",
        "SMNABTCHT         NaN         NaN         NaN         NaN  \n",
        "SMNABTCHD         NaN         NaN         NaN         NaN  \n",
        "\n",
        "[5 rows x 56 columns]"
       ]
      }
     ],
     "prompt_number": 195
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Which column do we want to use to rename the weird column names in the other dataframe? Let's print it."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Print the column in `sample_attributes_dd` that describes the column names in `sample_attributes`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code for showing the descriptive column in the dataframe `sample_attributes_dd`\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Since the `index` of `sample_attributes_dd` is the column names of `sample_attributes`, any column from `sample_attributes_dd` is a **mapping** from column names to some other values, depending on what column you use.\n",
      "\n",
      "What's really nice about this is that we can then use one of these columns in `sample_attributes_dd` to rename the column names in `sample_attributes`. Here's an example of renaming the columns of `sample_attributes` using the column `'TYPE'` in `sample_attributes_dd`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sample_attributes.rename(columns=sample_attributes_dd['TYPE'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
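    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If the idea of a column acting as a **mapping** feels abstract, here is a minimal sketch on toy data. Everything in it (`toy`, `toy_dd`, and the column names `AGE`, `HT` and `ID`) is made up for illustration and does not come from the GTEx files. It shows that `rename` looks up each old column name in the index of the `Series` and swaps in the matching value, and that names missing from the mapping are left alone.\n",
      "\n",
      "```python\n",
      "import pandas as pd\n",
      "\n",
      "# Toy dataframe with cryptic column names (hypothetical data)\n",
      "toy = pd.DataFrame({'ID': ['a', 'b'], 'AGE': [34, 51], 'HT': [170, 182]},\n",
      "                   columns=['ID', 'AGE', 'HT'])\n",
      "\n",
      "# Toy data dictionary: index = cryptic name, value = readable name\n",
      "# Note that 'ID' is deliberately not in the mapping\n",
      "toy_dd = pd.Series({'AGE': 'Age in years', 'HT': 'Height in cm'})\n",
      "\n",
      "# rename returns a new dataframe; `toy` itself is not modified\n",
      "renamed = toy.rename(columns=toy_dd)\n",
      "\n",
      "print(renamed.columns.tolist())\n",
      "print(toy.columns.tolist())\n",
      "```\n",
      "\n",
      "Because `rename` returns a new dataframe, you need to assign the result to a variable if you want to keep the new names."
     ]
    },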
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Rename the columns of `sample_attributes` using the variable description column from `sample_attributes_dd`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to rename the columns of sample_attributes using a column from sample_attributes_dd\n",
      "sample_attributes = sample_attributes.rename(columns=sample_attributes_dd.VARDESC)\n",
      "\n",
      "# Code for looking at the top of the new sample_attributes goes here\n",
      "sample_attributes.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
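    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Peek at the first 5 rows and first 20 columns by position\n",
      "# (`.iloc` replaces `.ix`, which has been removed from recent versions of pandas)\n",
      "sample_attributes.iloc[:5, :20]"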
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Excellent! Now we have a dataframe with human-readable column names.\n",
      "\n",
      "Remember that first dataframe that we created, `subject_phenotypes`? We want to unify that first dataframe with this new one. Let's take a gander at it to remember what's in it in the first place."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code for looking at the top of `subject_phenotypes` goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Which columns of `subject_phenotypes` and `sample_attributes` look like they could be matched up?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note: they don't have to be exactly the same values, because we can modify them, but look for commonalities."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Exercise: Code to show a column of `subject_phenotypes` goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Exercise: Code to show a column of `sample_attributes` goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Detour: working with strings in Python"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Before we can make a new column in `sample_attributes` with the corresponding subject ID from `subject_phenotypes`, we need to go over some string manipulation techniques.\n",
      "\n",
      "For example, we can take a string and `split` it. By default, it is split on whitespace (like spaces, tabs and new lines)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "s = 'you have as many hours in a day as beyonce'\n",
      "s.split()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If you don't want to split on whitespace, you can specify a particular character to split on, too."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "s.split('d')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: What happened to the letter \"d\"? Write a complete sentence below."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Complete sentence = it has a subject, object and a verb."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Split the string `s` above on the letter \"a\""
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to split `s` goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The result of `s.split()` is a `list`, which is a built-in type in Python. Look, it even comes in a special color, different from black, so you can see how special it is:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "list"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can access elements from the split using a number in square brackets."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "s.split()[4]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
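    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Python counts positions from 0, which is why `[4]` above returns the fifth word rather than the fourth. Here is a minimal sketch using a made-up list (`fruits` is only for illustration, it isn't part of the GTEx data):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# `fruits` is a made-up example list, not part of the dataset\n",
      "fruits = ['apple', 'banana', 'cherry', 'date', 'elderberry']\n",
      "\n",
      "# Position 0 is the first item and position 4 is the fifth item\n",
      "fruits[0], fruits[4]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },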
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Get the element `'day'` from `s.split()`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to get \"day\" from s.split() goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "What if we want to access multiple items at once? We can use the colon \"`:`\" to indicate we want everything up to (but not including) position N, counting from 0. For example, if we want the first 5 words, we can do:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "s.split()[:5]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Split the string `s` on \"a\", and get the first 3 elements"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to split `s` on \"a\", and get the first 3 elements\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we're able to split a string, but what if we want to put it back together? We can `join` the results."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "' '.join(s.split()[:5])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Notice that we used a space to join the words. We could have used any character to `join` them (as well as any character to `split`):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "'!'.join(s.split('e')[:4])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Split the string `s` on 3 different characters, and join on 3 different characters."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code for split 1, join 1 goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code for split 2, join 2 goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code for split 3, join 3 goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Back to dataframes!"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can apply what we just learned about manipulating strings to the columns of a dataframe using `map` and `lambda`. `lambda` lets us create small functions, and `map` applies a function to every item in a column. For example, if we wanted to take column `'SUBJID'` from the dataframe `subject_phenotypes`, split every item on the dash character `'-'`, and get the first item, we would do this:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "subject_phenotypes['SUBJID'].map(lambda x: x.split('-')[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
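    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A `lambda` is just shorthand for a small function without a name. As a rough sketch, the cell below uses a made-up `Series` (`toy_ids` and `first_piece` are hypothetical names for illustration, and `pandas` is assumed to be imported as `pd`, as in the rest of this notebook) to show that a named function passed to `map` gives the same result as the equivalent `lambda`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd  # harmless if already imported\n",
      "\n",
      "# Made-up IDs for illustration only, not from the dataset\n",
      "toy_ids = pd.Series(['AB-001', 'CD-002', 'EF-003'])\n",
      "\n",
      "def first_piece(x):\n",
      "    # Split on the dash and keep the first piece\n",
      "    return x.split('-')[0]\n",
      "\n",
      "# Both lines do the same thing to every item of the Series\n",
      "with_def = toy_ids.map(first_piece)\n",
      "with_lambda = toy_ids.map(lambda x: x.split('-')[0])\n",
      "\n",
      "# Check whether the two results are identical\n",
      "with_def.equals(with_lambda)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },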
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Split the column `\"Type of nucleic acid isolation batch\"` in `sample_attributes` on whitespace, and get the 3rd element."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to split the column 'Type of nucleic acid isolation batch' on whitespace and get the third element goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The data may not have a string in every element of a column. For example, this code produces an `AttributeError`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: x.split('-')[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The above code produces the error,\n",
      "\n",
      "    <ipython-input-157-6714d86271d7> in <lambda>(x)\n",
      "    ----> 1 sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: x.split('-')[0])\n",
      "\n",
      "    AttributeError: 'float' object has no attribute 'split'\n",
      "    \n",
      "This happens because instead of a nice string, there is a `float` there, and `float`s don't know how to be `split`. Why is that? Well, NAs are of type `float`, so this indicates that there's an NA there.\n",
      "\n",
      "To deal with this, we can add an `if` statement to our `lambda` to make it handle these situations. We will use the function `isinstance` to check if `x` is a string (the special word for a string in Python is `str`). If it is, we split it as before. If it isn't, we return an NA, `np.nan`, from the `numpy` library (which we imported as `np` for shorthand)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: x.split('-')[0] if isinstance(x, str) else np.nan)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
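    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To see why the `isinstance` check works, we can look at the NA value directly. This is a small sketch on a made-up `Series` (`toy_labels` is a hypothetical name, not a column from the data), assuming `numpy` and `pandas` are imported as `np` and `pd`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import pandas as pd\n",
      "\n",
      "# An NA is a float, not a string, so it fails the `str` check\n",
      "print(type(np.nan))\n",
      "print(isinstance(np.nan, str))\n",
      "\n",
      "# Made-up values for illustration, with one missing entry\n",
      "toy_labels = pd.Series(['red-apple', np.nan, 'green-pear'])\n",
      "\n",
      "# Strings get split; anything else (like the NA) stays an NA\n",
      "toy_labels.map(lambda x: x.split('-')[0] if isinstance(x, str) else np.nan)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },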
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Split the column `'Code for BSS collection site'` on a comma \"`,`\" and get the first two items of the split, accounting for NAs"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In addition to `split`-ing elements, we can `join` within the `map`/`lambda` combo too!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: '_'.join(x.split()[:3]) \n",
      "                                                                          if isinstance(x, str) else np.nan)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Split items in the column `\"Type of nucleic acid isolation batch\"` in `sample_attributes` on underscores, and join the first two elements using a space"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we have all the tools to add a `\"subject_id\"` column to `sample_attributes`! Remember, we want to create a column which has **exactly** the same entries as the column `\"SUBJID\"` in `subject_phenotypes`. What was that again? It's been so long that I forgot what those IDs look like. To remind yourself what the subject IDs in `subject_phenotypes` and sample ids in `sample_attributes` look like, take a look at the top of each of those dataframes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to look at the top of `subject_phenotypes`\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code to look at the top of `sample_attributes`\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Add a column to `sample_attributes` called `\"subject_id\"`, using one of its existing columns, that matches the `\"SUBJID\"` in `subject_phenotypes` *exactly*"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Merging dataframes"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Excellent! Now we have a column which exactly matches the rows of `subject_phenotypes` to the rows of `sample_attributes`. Now we want to merge these two dataframes together. How do we do that? We will use `merge`, which is a method of the dataframe. Merge is a little complicated, so let's break it down with a few examples.\n",
      "\n",
      "First, we'll create a couple of example dataframes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dataframe1 = pd.DataFrame([['cucumber', 'watery'], ['broccoli', 'crunchy'], ['kale', 'chewy'], \n",
      "                           ['mango', 'sweet'] ], columns=['vegetable', 'description'])\n",
      "dataframe1"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dataframe2 = pd.DataFrame([['broccoli', 'harvested', 8], ['broccoli', 'planted', 5],\n",
      "                           ['kale', 'planted', 6], ['kale', 'harvested', 9],\n",
      "                           ['cucumber', 'harvested', 7], ['cucumber', 'planted', 4],\n",
      "                           ['strawberry', 'planted', 10], ['strawberry', 'harvested', 2]], \n",
      "                          columns=['crop', 'action', 'number'])\n",
      "dataframe2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We want to `merge` these two dataframes on their common column, which is `\"vegetable\"` in `dataframe1` and `\"crop\"` in `dataframe2`. We can do this using `merge`, and specifying what we want to merge the **left** and **right** dataframes **on**."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dataframe1.merge(dataframe2, left_on='vegetable', right_on='crop')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: What happened to the row \"cucumber\"? What about \"mango\" and \"strawberry\"? Why?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Use \"Shift\"+\"Tab\" to read the documentation behind `merge`. What's the default way that two dataframes are merged? "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Merge `dataframe1` and `dataframe2`, but use `dataframe2` as the \"left\" dataframe, and merge using `\"outer\"`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we're ready to try this with real data! Let's go back to our `sample_attributes` and `subject_phenotypes` dataframes. To recap, we added a column to `sample_attributes` to match up with `subject_phenotypes`. Now, merge the two dataframes together using their columns with common values."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Exercise: Merge `sample_attributes` and `subject_phenotypes` on their common column."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Code goes here\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Congratulations! You have now performed a DATABASE MERGE!!! Now you can't be afraid of databases! Mwahahahah! \n",
      "\n",
      "All a \"database\" really is, is a bunch of tables linked together by certain 'keys', aka the values in the columns. What you've done today is manipulate some tables (aka databases), changing column names and adding columns so they're mergeable, and merging them together.\n",
      "\n",
      "Concepts from today (there's a lot!):\n",
      "\n",
      "* Unix\n",
      "    * Using `head` and `tail` to look at the beginnings and ends of files\n",
      "        * Using `-n N` to modify the number of lines output by `head` and `tail`\n",
      "    * Using `wc -l` to count the number of lines in a file\n",
      "    * Searching the web and finding millions of results for a seemingly simple Unix question\n",
      "    * Finding a method to count the number of columns in a file\n",
      "* Python\n",
      "    * Pandas\n",
      "        * Reading a tabular file using `pandas`, specifically `pd.read_table`\n",
      "        * Using `.head()` and `.tail()` to look at the tops and bottoms of dataframes\n",
      "            * Using `.head(N)` and `.tail(N)` to look at the top and bottom `N` rows\n",
      "        * Accessing columns in `pandas` `DataFrames`\n",
      "        * Python dictionaries as a way of mapping one item to another\n",
      "        * Using `map` instead of `for`-loops to operate on every item of a column\n",
      "        * Creating new columns in `pandas` `DataFrames`\n",
      "            * Creating new columns as a result of operating on other columns\n",
      "        * Searching the web for help with `pandas`\n",
      "        * Reading an Excel file using `pandas`\n",
      "        * Setting one of the columns as the `index` (aka row names) when you read in the file\n",
      "        * A column in a DataFrame can be used as a mapping from the row name to the item in the column\n",
      "        * Renaming column names in one table based on a mapping\n",
      "    * String operations\n",
      "        * A string is anything between quotes\n",
      "        * Strings can be `split` on any characters\n",
      "        * The result of a `split` is a list\n",
      "        * Lists (and everything else in Python) start counting from 0 (aka \"0-based\")\n",
      "        * Get individual elements of a list using square brackets and a number, e.g. `[3]` shows the 4th element\n",
      "        * Access the first `N` elements of a list using square brackets, a colon, and the number, e.g. `[:5]` shows up to, but not including, the 6th element\n",
      "        * Strings can be glued together using `join`\n",
      "            * The `join` can be on any character\n",
      "    * Pandas (continued)\n",
      "        * Use `lambda` to create an \"anonymous\" function to use within `map`\n",
      "            * Use `lambda` to split and join strings within a column\n",
      "        * NAs are of type `float`\n",
      "        * To check if a thing is of a certain type, use `isinstance`\n",
      "        * A `lambda` can contain an `if` statement for alternative outputs\n",
      "            * But it must also contain an `else`\n",
      "        * Create a new column by combining `map` and `lambda` to do a complicated operation on each item of the column\n",
      "        * Two dataframes can be merged together if they have columns with the same elements\n",
      "            * For a merge, you need to specify the columns to merge on in both dataframes\n",
      "        * Shift-tab to read documentation for a function\n",
      "        * Reading documentation is fun!\n",
      "* Databases are just tables!"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}