@olgabot
Created November 21, 2015 18:58
{
"metadata": {
"name": "",
"signature": "sha256:cee0de1ec12d856434dfc5e5496bd627b1714cee7e3862b892477cb08786855f"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"BIOM262 January 29th, 2015: Cleaning data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Olga Botvinnik\n",
"* 3rd year Bioinformatics PhD Student\n",
"* Email: obotvinn@ucsd.edu\n",
"* Twitter: @olgabot\n",
"* Website: http://www.olgabotvinnik.com\n",
"\n",
"Instructions:\n",
"1. Download this IPython notebook from NBViewer (upper right corner)\n",
"2. Download the `biom262_2015_01_29_cleaning_data.zip` file emailed out\n",
"3. Unzip the `zip` file (this will create a folder)\n",
"4. Open up the terminal and navigate to the folder `biom262_2015_01_29_cleaning_data`\n",
"5. Start an IPython notebook server by typing `ipython notebook` into the terminal\n",
"\n",
"For this activity, work in pairs. I will give you a blue and a pink sticky note. Put the pink one on one of your laptops to show that you're stuck or have a question. Put the blue one on your laptop if you and your partner are cruisin' through the exercises and don't want to be bothered."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Background"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever you work with any kind of data, such as your own Excel spreadsheets or data downloaded from a paper that you'd like to analyze, 99.9999999999% of the time it's not formatted well, and you have to do a bunch of manual cleaning. This is a real-world example: four different metadata files describing RNA-sequencing data from 212 post-mortem subjects and 2 cell lines, across 32 different tissues, for a total of 4500 samples. In other words, this is **not** a dataset you would enjoy cleaning up in Excel! You can read more about this project at the [GTEx portal](http://www.gtexportal.org/home/documentationPage).\n",
"\n",
"The goal of this exercise is to create a single, easily human-readable table from two `.txt` files and two Excel files. Because this is such a big project, they had to standardize everything, so their tables are at least consistently formatted. But there's a lot of project-specific jargon in the data, which we will replace with human-readable terms. And in the two main tables, `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt` and `GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt`, the identifiers aren't exactly the same (one uses subject IDs, `SUBJID`, and the other uses sample IDs, `SAMPID`), so we'll have to do extra work to merge them.\n",
"\n",
"One package that doesn't come with the Anaconda Python distribution is `seaborn`, so to install it, run the `pip install` command in the code cell below.\n",
" \n",
"### IPython Tips:\n",
"* When it says \"run this command,\" it means to press \"Shift\" and \"Enter\" together. Whenever there's a cell with code in it, run it and look at the output.\n",
"* If you want help on any Python function or object, type its name followed by two question marks, like \"list??\", and run it; the help will pop up at the bottom of the screen.\n",
"* If you want help on a Python function that you're calling, like `pd.read_table()` (i.e. one with parentheses), move your cursor in between the parentheses and press \"Shift\" and \"Tab\". This will pop up a help window next to the parentheses. Press \"Tab\" once more, and the window will get bigger. Press \"Tab\" a third time, and it will pop up a big help screen at the bottom."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"! pip install seaborn"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's import everything we need for this session."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# pandas (named after \"panel data\") toolkit for Data Frames/Data Tables\n",
"import pandas as pd\n",
"\n",
"# Numpy or \"Numerical Python\"\n",
"import numpy as np\n",
"\n",
"# Powerful R-style/statistical plotting\n",
"import seaborn as sns\n",
"\n",
"# These styles are my personal preferences\n",
"# For more options, see this page: http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/aesthetics.html\n",
"sns.set(style='whitegrid', context='notebook')\n",
"\n",
"# Show the figures directly in the IPython notebook\n",
"%matplotlib inline"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Initial inspection of files with Unix commands"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note: all these commands are meant to be run *within* the notebook, not in the terminal.**\n",
"\n",
"The Unix command `cd` changes the directory where the notebook looks for files, and `ls` lists the files in that directory. These are among the few Unix commands that IPython lets you use without the exclamation point \"`!`\" prefix, which we will need later for other commands.\n",
"\n",
"Change directories to where you downloaded the example data. For me, that's the directory \"`~/Downloads/biom262_2015_01_29_cleaning_data`\". Remember that the character \"`~`\" (pronounced \"tilde\") indicates your home directory, which for me is `/Users/olga`, but I didn't feel like typing that out, so I used the tilde instead."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"cd ~/Downloads/biom262_2015_01_29_cleaning_data"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And make sure you have all the right files around"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ls"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start with the `SubjectPhenotypes` file, `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt`. The first step is to look at the first 10 lines of the file with the Unix command `head`. In the IPython notebook, you can call unix/bash commands by starting the line with an exclamation point, `!`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"! head GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So there's some kind of subject ID, a gender of `1` or `2`, an age range in years, and some kind of `DTHHRDY` thing. I don't know what that means, but we'll deal with it once we open the file in `pandas`. It looks like it ranges from `0` to `4`, but I don't see any entries of `1`. By default, `head` outputs the first 10 lines. We can change the number of lines with the flag `-n` followed by a number. For example, let's look at the first 23 lines of the file instead."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"! head -n 23 GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Still no entries with a `DTHHRDY` of 1. Maybe we need to look at the end of the file. We can do that with the command `tail`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"! tail GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ooh, our first `1` in `DTHHRDY`! Hmm, some of the rows don't have a value for the `DTHHRDY` column! This will come into play in the future."
]
},
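{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a small sketch of what `pandas` does with an empty field like that. It uses a made-up three-row table: the IDs and values are invented for illustration, and the code assumes Python 3. An empty field is read in as `NaN` (short for not a number). Because `NaN` is a floating-point value, an integer column that contains one is stored as floats."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# A made-up tab-separated table where subject B has an empty DTHHRDY field\n",
"text = 'SUBJID\\tGENDER\\tDTHHRDY\\nA\\t1\\t2\\nB\\t2\\t\\nC\\t1\\t0\\n'\n",
"toy = pd.read_table(io.StringIO(text))\n",
"\n",
"# The empty field shows up as NaN\n",
"print(toy)\n",
"\n",
"# Count the missing values in each column\n",
"print(toy.isnull().sum())\n",
"\n",
"# NaN is a float, so the whole DTHHRDY column is stored as floats\n",
"print(toy['DTHHRDY'].dtype)"
],
"language": "python",
"metadata": {},
"outputs": []
},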
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Look at the last 17 lines of the file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's stop for a brief exercise. How would you look at the last 17 lines of the file?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Exercise: modify the `tail` command below to look at the last 17 lines of the file.\n",
"\n",
"! tail GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another question we may have about this file is how many lines are in it. We can answer this with the Unix command `wc`, the \"word, line, character, and byte count\" utility. Specifically, `wc -l` counts the number of lines in the file. Keep in mind that this includes the header line, so the file has one fewer data row than the number `wc -l` reports."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"! wc -l GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt"
],
"language": "python",
"metadata": {},
"outputs": []
},
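{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same count can be sketched in Python, which also makes the header line explicit. This example writes a made-up four-line file to a temporary location; the file contents and the helper name `count_lines` are hypothetical. Note that `wc -l` strictly counts newline characters. If the last line of a file has no trailing newline, `wc -l` therefore reports one fewer line than this loop."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import os\n",
"import tempfile\n",
"\n",
"import pandas as pd\n",
"\n",
"\n",
"def count_lines(path):\n",
"    # Count the lines in a text file, similar to `wc -l`\n",
"    with open(path) as handle:\n",
"        return sum(1 for _ in handle)\n",
"\n",
"\n",
"# Write a made-up file: one header line plus three data lines\n",
"with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as handle:\n",
"    handle.write('SUBJID\\tGENDER\\nA\\t1\\nB\\t2\\nC\\t1\\n')\n",
"    path = handle.name\n",
"\n",
"n_lines = count_lines(path)\n",
"# read_table uses the first line as column names, so there is one fewer row\n",
"n_rows = len(pd.read_table(path))\n",
"print(n_lines, n_rows)\n",
"\n",
"# Clean up the temporary file\n",
"os.remove(path)"
],
"language": "python",
"metadata": {},
"outputs": []
},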
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: How do you count the number of columns in a file?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Exercise: do a web search for \"unix count number of columns\" and \n",
"# check the command on the file, GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt\n",
"\n",
"! # Column-counting code goes here"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Reading tabular data with the `pandas` library in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will make heavy use of the `pandas` library, which is a godsend to Pythonista Data Scientists. It makes working with weirdly formatted data much easier, as you will soon see."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subject_phenotypes = pd.read_table('GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DS.txt')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This has created a `pandas` `DataFrame` variable called `subject_phenotypes`. Python's standards are to name variables `lowercase_with_underscores`, so we'll stick to that :)\n",
"\n",
"Let's look at the top of `subject_phenotypes`, again with a command called `head`, but we call it a little differently now that we're in Python and not Unix (notice no \"`!`\" at the beginnings of the lines anymore)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subject_phenotypes.head()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, the `pandas` `head` shows the first 5 rows, instead of the first 10 lines like Unix `head`."
]
},
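{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a quick check on a made-up 12-row dataframe; the `AGE` values are placeholders. `head()` returns 5 rows unless you pass a different number, and `tail()` works the same way from the bottom."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# A made-up dataframe with 12 rows\n",
"df = pd.DataFrame({'AGE': ['20-29', '30-39', '40-49', '50-59', '60-69', '70-79'] * 2})\n",
"\n",
"print(len(df.head()))    # 5, the pandas default\n",
"print(len(df.head(10)))  # 10, matching the Unix head default\n",
"print(len(df.tail(3)))   # 3 rows from the bottom"
],
"language": "python",
"metadata": {},
"outputs": []
},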
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can access individual columns in two ways. The first way is to put the column name, as a string, inside square brackets, like this for `SUBJID`:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Run this cell. Try entering 'subjid' and 'subject id' as well.\n",
"subject_phenotypes['SUBJID']"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a column name consists of only letters, numbers, and underscores, starts with a letter, and isn't already the name of a dataframe method or attribute (such as `count` or `shape`), you can also access it by writing the column name, without quotes, after the dataframe name and a dot."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subject_phenotypes.SUBJID"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may hear the word \"series\" get tossed around. A `Series` is the `pandas` name for a one-dimensional, labeled array of values, which is exactly what you get when you pull a single column out of a dataframe."
]
},
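{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a sketch on a made-up dataframe of the two access styles and their limits. The column names `'death class'` and `'count'` are hypothetical, chosen to show when the dot form stops working."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# A made-up dataframe; 'death class' contains a space on purpose\n",
"df = pd.DataFrame({'SUBJID': ['A', 'B'], 'death class': [0, 2]})\n",
"\n",
"by_brackets = df['SUBJID']\n",
"by_dot = df.SUBJID\n",
"\n",
"# Both styles return a Series with the same values\n",
"print(type(by_brackets))\n",
"print(by_brackets.equals(by_dot))\n",
"\n",
"# A name with a space only works with square brackets\n",
"print(df['death class'])\n",
"\n",
"# A column named like a DataFrame method is hidden by the method in dot form\n",
"counts = pd.DataFrame({'count': [1, 2]})\n",
"print(callable(counts.count))  # True: this is the method, not the column\n",
"print(counts['count'].tolist())"
],
"language": "python",
"metadata": {},
"outputs": []
},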
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: How would you look at the last 8 rows of the column `AGE`?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hint: `head` and `tail` work for series as well as whole dataframes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Exercise: Show the last 8 rows of the `AGE` column.\n",
"\n",
"# Code for looking at the last 8 rows goes here."
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Converting data types from one to another"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's get to these `GENDER` and `DTHHRDY` columns. Open up the file, `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DD.xlsx` in Excel, and see what a gender of `1` and `2` means."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# A python \"dictionary\", or mapping from one thing to the next.\n",
"# In this case, we're mapping the numbers 1 and 2 to strings\n",
"# indicating the gender.\n",
"gender = {1: 'fillmein',\n",
" 2: 'fillmein'}"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can access items of a dictionary with square brackets, much like the columns of a dataframe."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Run this cell. How do you access the other gender?\n",
"gender[2]"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can create a new column called `'gender'` (all lowercase, because I feel like ALL CAPS COLUMNS ARE YELLING AT ME), using the `gender` dictionary.\n",
"\n",
"In `pandas`, you can create a new column by indexing the dataframe with the new column's name, as if that column already existed, and assigning a value to it. Here's an example of creating a new column called `\"don't worry\"` with the value `\"be yonce\"` in every cell."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subject_phenotypes[\"don't worry\"] = \"be yonce\""
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Look at the top of the dataframe to see what that did."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to look at the top of the dataframe goes here \n",
"# hint: remember the command \"head\"? How did we use it to look at the dataframe when we first loaded it?\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 199
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Add a column with a name and value of your choice"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to add a column with programmer's choice of name and value goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
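{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a sketch on a made-up dataframe of the two most common ways to fill a new column. A single value is repeated in every row. A list of the same length as the dataframe is assigned row by row. The column names here are hypothetical."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# A made-up dataframe with three rows\n",
"df = pd.DataFrame({'GENDER': [1, 2, 1]})\n",
"\n",
"# A single value is repeated in every row\n",
"df['note'] = 'same for all'\n",
"\n",
"# A list with one entry per row is assigned in order\n",
"df['row_number'] = [0, 1, 2]\n",
"\n",
"print(df)"
],
"language": "python",
"metadata": {},
"outputs": []
},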
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another convenient operation is `map` on a series. Given a dictionary, `map` looks up every element of the series as a key and returns a new series of the corresponding values. It's as if you wrote a `for`-loop over every item of the `GENDER` column, used each item as a key into the `gender` dictionary, and collected the results."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subject_phenotypes.GENDER.map(gender)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `map` looks up the same values as the following `for`-loop, but in a single, concise line."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for g in subject_phenotypes.GENDER:\n",
"    print(gender[g])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Has this changed the dataframe `subject_phenotypes`?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to check if the dataframe subject_phenotypes has changed goes here\n",
"# You can check if it has changed by looking at it using your favorite body endpoint\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dataframe shouldn't have changed, because `map` returns a new series instead of modifying the `GENDER` column in place."
]
},
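{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both points can be checked on a small, made-up example. The dictionary `codes` and its labels are placeholders, not the real GTEx codes. `map` gives the same values as the explicit `for`-loop, and the original series is left untouched."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# A made-up mapping and column (placeholder labels, not GTEx codes)\n",
"codes = {1: 'first', 2: 'second'}\n",
"column = pd.Series([1, 2, 2, 1])\n",
"\n",
"# map builds a new Series by looking up each value in the dictionary\n",
"mapped = column.map(codes)\n",
"\n",
"# The same lookup written as an explicit for-loop\n",
"looped_values = []\n",
"for value in column:\n",
"    looped_values.append(codes[value])\n",
"looped = pd.Series(looped_values)\n",
"\n",
"print(mapped.equals(looped))\n",
"\n",
"# The original Series still holds the numeric codes\n",
"print(column.tolist())"
],
"language": "python",
"metadata": {},
"outputs": []
},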
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: What happens when you `map` the dictionary `gender` onto the column `DTHHRDY`?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to `map` `gender` onto DTHHRDY goes here\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 197
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Combine bracket-based column creation and `map` on a series"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your next exercise is to combine these previous two concepts of creating a column and using `map`, to create a column called `\"gender\"` (all lowercase) which is the result of using `map` with the dictionary `gender` on the `GENDER` column of `subject_phenotypes`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to create a new column called \"gender\" in `subject_phenotypes` that is the result of using `map` \n",
"# with the `gender` dictionary on the \"GENDER\" column of `subject_phenotypes`.\n",
"\n",
"\n",
"# Code to check if the dataframe has changed goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! Now we have a column called `gender` that makes sense to a human without having to look anything up in another table."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Convert the `DTHHRDY` column values into human-readable values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using the `GTEx_Data_2014-01-17_Annotations_SubjectPhenotypes_DD.xlsx` spreadsheet again, find out what the different numbers in `DTHHRDY` mean, make a dictionary mapping numbers to words like we did with `gender`, and create a new column with a human-understandable name. \n",
"\n",
"* Test for human-understandable: you could show the column name to someone who doesn't know the data, and they understand it without you doing any extra explaining"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to create a new column in `subject_phenotypes` that is a human-readable version of the column `DTHHRDY` goes here\n",
"\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have our cleaned-up dataframe, let's do some plotting!\n",
"\n",
"Let's use `seaborn` (which we imported as the variable `sns` for brevity) to plot this. We will use the function [`factorplot`](http://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.factorplot.html), which has a bunch of options, but for now we'll just focus on two.\n",
"\n",
"The first argument is the name of the column you want to plot, and then we provide the keyword argument `data=subject_phenotypes` to specify the dataframe we want to get this column from. (In `seaborn` version 0.9 and later, `factorplot` has been renamed to `catplot`.)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sns.factorplot('gender', data=subject_phenotypes)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nice, so we can now see how many subjects there are of each gender.\n",
"\n",
"What if we also want to see how many people of each gender fall into each `DTHHRDY` category?"
]
},
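{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before plotting, it can help to see the same counts as a table. Here is a sketch with `value_counts` and `pd.crosstab` on made-up data. The column `death_class` is a hypothetical stand-in for your human-readable `DTHHRDY` column, and the labels are placeholders."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Made-up subjects with placeholder labels\n",
"toy = pd.DataFrame({\n",
"    'gender': ['a', 'b', 'a', 'b', 'a'],\n",
"    'death_class': ['x', 'x', 'y', 'y', 'y'],\n",
"})\n",
"\n",
"# Number of subjects per gender, the numbers behind a count plot\n",
"print(toy['gender'].value_counts())\n",
"\n",
"# Number of subjects for each combination of gender and death class\n",
"table = pd.crosstab(toy['gender'], toy['death_class'])\n",
"print(table)"
],
"language": "python",
"metadata": {},
"outputs": []
},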
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Plot the distribution of both gender and `DTHHRDY`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Change `human_readable_DTHHRDY` in the `hue=` argument below to the name of the new column you created, as a string in quotes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Edit the argument `hue=...`\n",
"sns.factorplot('gender', data=subject_phenotypes, hue='human_readable_DTHHRDY')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can break this down even further by the age of the subjects, with a separate plot for each age group, as below. The argument `col='AGE'` puts each age group in its own column of plots."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sns.factorplot('gender', data=subject_phenotypes, hue='DTHHRDY', col='AGE')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Plot Age as the main `x` variable, and `'gender'` as each column"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code goes below\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Combining information across dataframes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far, we've been working with one `pandas` dataframe, `subject_phenotypes`, and an external `*.xlsx` file. Now we're going to combine `subject_phenotypes` with a new dataframe, which we will call `sample_attributes`."
]
},
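{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two tables use different identifiers, `SUBJID` and `SAMPID`, so before merging we need a shared column. Here is a sketch on made-up tables, *assuming* that a subject ID is the first two dash-separated pieces of a sample ID. Under that assumption, `GTEX-N7MS-0007-SM-26GME` would belong to subject `GTEX-N7MS`. Check this against the real files before relying on it. The IDs and values below are invented."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Made-up tables shaped like the GTEx ones (IDs and values are invented)\n",
"subjects = pd.DataFrame({'SUBJID': ['GTEX-AAAA', 'GTEX-BBBB'],\n",
"                         'AGE': ['20-29', '60-69']})\n",
"samples = pd.DataFrame({'SAMPID': ['GTEX-AAAA-0001-SM-X1',\n",
"                                   'GTEX-AAAA-0002-SM-X2',\n",
"                                   'GTEX-BBBB-0001-SM-X3'],\n",
"                        'SMTS': ['Blood', 'Brain', 'Skin']})\n",
"\n",
"# Assumption: the subject ID is the first two dash-separated pieces of the sample ID\n",
"samples['SUBJID'] = samples['SAMPID'].str.split('-').str[:2].str.join('-')\n",
"\n",
"# Attach each subject's attributes to every one of their samples\n",
"merged = samples.merge(subjects, on='SUBJID', how='left')\n",
"print(merged)"
],
"language": "python",
"metadata": {},
"outputs": []
},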
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Inspect the new data table with Unix, and read it in using Python"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to look at the top of GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to count the number of lines in GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to count the number of columns in GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to read the table GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt goes here\n",
"sample_attributes = pd.read_table('GTEx_Data_2014-01-17_Annotations_SampleAttributesDS.txt')\n",
"\n",
"# Code to look at the top of the `DataFrame` you just created\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SAMPID</th>\n",
" <th>SMATSSCR</th>\n",
" <th>SMCENTER</th>\n",
" <th>SMPTHNTS</th>\n",
" <th>SMRIN</th>\n",
" <th>SMTS</th>\n",
" <th>SMTSD</th>\n",
" <th>SMTSISCH</th>\n",
" <th>SMNABTCH</th>\n",
" <th>SMNABTCHT</th>\n",
" <th>...</th>\n",
" <th>SME1ANTI</th>\n",
" <th>SMSPLTRD</th>\n",
" <th>SMBSMMRT</th>\n",
" <th>SME1SNSE</th>\n",
" <th>SME1PCTS</th>\n",
" <th>SMRRNART</th>\n",
" <th>SME1MPRT</th>\n",
" <th>SMNUM5CD</th>\n",
" <th>SMDPMPRT</th>\n",
" <th>SME2PCTS</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0 </th>\n",
" <td> GTEX-N7MS-0007-SM-26GME</td>\n",
" <td>NaN</td>\n",
" <td> C1</td>\n",
" <td> NaN</td>\n",
" <td> 8.2</td>\n",
" <td> Blood</td>\n",
" <td> Whole Blood</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-16653</td>\n",
" <td> RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1 </th>\n",
" <td> GTEX-N7MS-0007-SM-26GMV</td>\n",
" <td>NaN</td>\n",
" <td> C1</td>\n",
" <td> NaN</td>\n",
" <td> 8.2</td>\n",
" <td> Blood</td>\n",
" <td> Whole Blood</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-16653</td>\n",
" <td> RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2 </th>\n",
" <td> GTEX-N7MS-0007-SM-2D43E</td>\n",
" <td>NaN</td>\n",
" <td> C1</td>\n",
" <td> NaN</td>\n",
" <td> 8.2</td>\n",
" <td> Blood</td>\n",
" <td> Whole Blood</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-16653</td>\n",
" <td> RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3 </th>\n",
" <td> GTEX-N7MS-0007-SM-2D7W1</td>\n",
" <td>NaN</td>\n",
" <td> C1</td>\n",
" <td> NaN</td>\n",
" <td> 8.2</td>\n",
" <td> Blood</td>\n",
" <td> Whole Blood</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-16653</td>\n",
" <td> RNA isolation_PAXgene Blood RNA (Manual)</td>\n",
" <td>...</td>\n",
" <td> 13705136</td>\n",
" <td> 18432744</td>\n",
" <td> 0.002456</td>\n",
" <td> 13447728</td>\n",
" <td> 49.526005</td>\n",
" <td> 0.041526</td>\n",
" <td> 0.835199</td>\n",
" <td> 840</td>\n",
" <td> 0.563503</td>\n",
" <td> 51.361324</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4 </th>\n",
" <td> GTEX-N7MS-0008-SM-4E3JI</td>\n",
" <td>NaN</td>\n",
" <td> C1</td>\n",
" <td> NaN</td>\n",
" <td> 10.0</td>\n",
" <td> Skin</td>\n",
" <td> Cells - Transformed fibroblasts</td>\n",
" <td> NaN</td>\n",
" <td> BP-37581</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 17962165</td>\n",
" <td> 20910366</td>\n",
" <td> 0.004087</td>\n",
" <td> 18012435</td>\n",
" <td> 50.069874</td>\n",
" <td> 0.028395</td>\n",
" <td> 0.948329</td>\n",
" <td> 879</td>\n",
" <td> 0.226835</td>\n",
" <td> 50.270794</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5 </th>\n",
" <td> GTEX-N7MS-0009-SM-2BWY4</td>\n",
" <td>NaN</td>\n",
" <td> C1</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> Blood</td>\n",
" <td> Whole Blood</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-16657</td>\n",
" <td> DNA isolation_Whole Blood _QIAGEN Puregene (Ma...</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6 </th>\n",
" <td> GTEX-N7MS-0009-SM-2XK1D</td>\n",
" <td>NaN</td>\n",
" <td> C1</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> Blood</td>\n",
" <td> Whole Blood</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-16657</td>\n",
" <td> DNA isolation_Whole Blood _QIAGEN Puregene (Ma...</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7 </th>\n",
" <td> GTEX-N7MS-0011-R10A-SM-2HMJK</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.1</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Frontal Cortex (BA9)</td>\n",
" <td> NaN</td>\n",
" <td> BP-19253</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 18948398</td>\n",
" <td> 12221905</td>\n",
" <td> 0.004294</td>\n",
" <td> 18747238</td>\n",
" <td> 49.733180</td>\n",
" <td> 0.051237</td>\n",
" <td> 0.875680</td>\n",
" <td> 859</td>\n",
" <td> 0.330709</td>\n",
" <td> 50.619534</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8 </th>\n",
" <td> GTEX-N7MS-0011-R10A-SM-2IZJW</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.1</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Frontal Cortex (BA9)</td>\n",
" <td> NaN</td>\n",
" <td> BP-19253</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9 </th>\n",
" <td> GTEX-N7MS-0011-R11A-SM-2HMJS</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.6</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Cerebellar Hemisphere</td>\n",
" <td> NaN</td>\n",
" <td> BP-19253</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 19024292</td>\n",
" <td> 12200496</td>\n",
" <td> 0.003643</td>\n",
" <td> 18954711</td>\n",
" <td> 49.908398</td>\n",
" <td> 0.016711</td>\n",
" <td> 0.893391</td>\n",
" <td> 851</td>\n",
" <td> 0.193112</td>\n",
" <td> 50.387028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10 </th>\n",
" <td> GTEX-N7MS-0011-R11A-SM-2IZJZ</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.6</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Cerebellar Hemisphere</td>\n",
" <td> NaN</td>\n",
" <td> BP-19253</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11 </th>\n",
" <td> GTEX-N7MS-0011-R1a-SM-2AXVJ</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.3</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Hippocampus</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12 </th>\n",
" <td> GTEX-N7MS-0011-R1a-SM-2HMJG</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.3</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Hippocampus</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 15226514</td>\n",
" <td> 8806379</td>\n",
" <td> 0.004332</td>\n",
" <td> 15122489</td>\n",
" <td> 49.828620</td>\n",
" <td> 0.041028</td>\n",
" <td> 0.790736</td>\n",
" <td> 835</td>\n",
" <td> 0.324148</td>\n",
" <td> 50.618090</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13 </th>\n",
" <td> GTEX-N7MS-0011-R2a-SM-2HML6</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.0</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Substantia nigra</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 11678187</td>\n",
" <td> 8637924</td>\n",
" <td> 0.004073</td>\n",
" <td> 11575007</td>\n",
" <td> 49.778137</td>\n",
" <td> 0.028304</td>\n",
" <td> 0.628574</td>\n",
" <td> 837</td>\n",
" <td> 0.275110</td>\n",
" <td> 50.561234</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14 </th>\n",
" <td> GTEX-N7MS-0011-R2a-SM-2IZK7</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.0</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Substantia nigra</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15 </th>\n",
" <td> GTEX-N7MS-0011-R3a-SM-2AXVU</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.6</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Anterior cingulate cortex (BA24)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16 </th>\n",
" <td> GTEX-N7MS-0011-R3a-SM-2HMKD</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.6</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Anterior cingulate cortex (BA24)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 8114816</td>\n",
" <td> 4205357</td>\n",
" <td> 0.004996</td>\n",
" <td> 8085575</td>\n",
" <td> 49.909750</td>\n",
" <td> 0.004037</td>\n",
" <td> 0.151460</td>\n",
" <td> 806</td>\n",
" <td> 0.364988</td>\n",
" <td> 50.379780</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17 </th>\n",
" <td> GTEX-N7MS-0011-R3a-SM-33HC6</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.6</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Anterior cingulate cortex (BA24)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 36663415</td>\n",
" <td> 22956649</td>\n",
" <td> 0.003256</td>\n",
" <td> 36619121</td>\n",
" <td> 49.969780</td>\n",
" <td> 0.033465</td>\n",
" <td> 0.916319</td>\n",
" <td> 875</td>\n",
" <td> 0.341985</td>\n",
" <td> 50.874683</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18 </th>\n",
" <td> GTEX-N7MS-0011-R4a-SM-2AXW2</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.2</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Amygdala</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19 </th>\n",
" <td> GTEX-N7MS-0011-R4a-SM-2HMKW</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.2</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Amygdala</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 11651563</td>\n",
" <td> 8137082</td>\n",
" <td> 0.004180</td>\n",
" <td> 11545498</td>\n",
" <td> 49.771380</td>\n",
" <td> 0.035598</td>\n",
" <td> 0.657749</td>\n",
" <td> 811</td>\n",
" <td> 0.334207</td>\n",
" <td> 50.741714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20 </th>\n",
" <td> GTEX-N7MS-0011-R5a-SM-2AXW7</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.8</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Caudate (basal ganglia)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21 </th>\n",
" <td> GTEX-N7MS-0011-R5a-SM-2HMK8</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.8</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Caudate (basal ganglia)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 21641531</td>\n",
" <td> 13525503</td>\n",
" <td> 0.004141</td>\n",
" <td> 21325223</td>\n",
" <td> 49.631920</td>\n",
" <td> 0.050445</td>\n",
" <td> 0.889377</td>\n",
" <td> 891</td>\n",
" <td> 0.312833</td>\n",
" <td> 50.695858</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22 </th>\n",
" <td> GTEX-N7MS-0011-R6a-SM-2AXWD</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.6</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Nucleus accumbens (basal ganglia)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23 </th>\n",
" <td> GTEX-N7MS-0011-R6a-SM-2HMJ4</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 7.6</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Nucleus accumbens (basal ganglia)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 17668410</td>\n",
" <td> 12313288</td>\n",
" <td> 0.003942</td>\n",
" <td> 17447282</td>\n",
" <td> 49.685143</td>\n",
" <td> 0.040960</td>\n",
" <td> 0.883223</td>\n",
" <td> 854</td>\n",
" <td> 0.262580</td>\n",
" <td> 50.615242</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24 </th>\n",
" <td> GTEX-N7MS-0011-R7a-SM-2AXV5</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.4</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Putamen (basal ganglia)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25 </th>\n",
" <td> GTEX-N7MS-0011-R7a-SM-2HMKN</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.4</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Putamen (basal ganglia)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> 10366169</td>\n",
" <td> 5352672</td>\n",
" <td> 0.004739</td>\n",
" <td> 9997472</td>\n",
" <td> 49.094720</td>\n",
" <td> 0.058973</td>\n",
" <td> 0.664810</td>\n",
" <td> 791</td>\n",
" <td> 0.400345</td>\n",
" <td> 51.510887</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26 </th>\n",
" <td> GTEX-N7MS-0011-R8a-SM-2AXVD</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.8</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Hypothalamus</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_QIAGEN miRNeasy</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27 </th>\n",
" <td> GTEX-N7MS-0011-R8a-SM-2YUMK</td>\n",
" <td>NaN</td>\n",
" <td> C1, A1</td>\n",
" <td> NaN</td>\n",
" <td> 6.8</td>\n",
" <td> Brain</td>\n",
" <td> Brain - Hypothalamus</td>\n",
" <td> NaN</td>\n",
" <td> BP-17395</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 12317489</td>\n",
" <td> 9750793</td>\n",
" <td> 0.002769</td>\n",
" <td> 12167817</td>\n",
" <td> 49.694363</td>\n",
" <td> 0.054535</td>\n",
" <td> 0.817677</td>\n",
" <td> 820</td>\n",
" <td> 0.366260</td>\n",
" <td> 50.655490</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28 </th>\n",
" <td> GTEX-N7MS-0126-SM-3TW8O</td>\n",
" <td> 3</td>\n",
" <td> C1</td>\n",
" <td> OK</td>\n",
" <td> 9.1</td>\n",
" <td> Testis</td>\n",
" <td> Testis</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-16740</td>\n",
" <td> RNA isolation_PAXgene Tissue miRNA</td>\n",
" <td>...</td>\n",
" <td> 19539848</td>\n",
" <td> 15703873</td>\n",
" <td> 0.002877</td>\n",
" <td> 19525988</td>\n",
" <td> 49.982260</td>\n",
" <td> 0.051392</td>\n",
" <td> 0.934544</td>\n",
" <td> 937</td>\n",
" <td> 0.185286</td>\n",
" <td> 50.262527</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29 </th>\n",
" <td> GTEX-N7MS-0225-SM-4E3HO</td>\n",
" <td> 1</td>\n",
" <td> C1</td>\n",
" <td> OK for analysis</td>\n",
" <td> 7.7</td>\n",
" <td> Skin</td>\n",
" <td> Skin - Sun Exposed (Lower leg)</td>\n",
" <td> 16-19 hours</td>\n",
" <td> BP-36182</td>\n",
" <td> RNA isolation_PAXgene Tissue miRNA</td>\n",
" <td>...</td>\n",
" <td> 17258030</td>\n",
" <td> 13502100</td>\n",
" <td> 0.004377</td>\n",
" <td> 17381617</td>\n",
" <td> 50.178387</td>\n",
" <td> 0.010061</td>\n",
" <td> 0.937283</td>\n",
" <td> 862</td>\n",
" <td> 0.223826</td>\n",
" <td> 50.288605</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4471</th>\n",
" <td> K-562-SM-3GADY</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 20293042</td>\n",
" <td> 21127232</td>\n",
" <td> 0.003853</td>\n",
" <td> 20085667</td>\n",
" <td> 49.743217</td>\n",
" <td> 0.010052</td>\n",
" <td> 0.912978</td>\n",
" <td> 877</td>\n",
" <td> 0.323958</td>\n",
" <td> 50.618893</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4472</th>\n",
" <td> K-562-SM-3GAFC</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 19387186</td>\n",
" <td> 19565057</td>\n",
" <td> 0.003505</td>\n",
" <td> 19503152</td>\n",
" <td> 50.149097</td>\n",
" <td> 0.011422</td>\n",
" <td> 0.890707</td>\n",
" <td> 854</td>\n",
" <td> 0.420687</td>\n",
" <td> 50.387650</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4473</th>\n",
" <td> K-562-SM-3GIKB</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 12498139</td>\n",
" <td> 12769804</td>\n",
" <td> 0.003243</td>\n",
" <td> 12654894</td>\n",
" <td> 50.311604</td>\n",
" <td> 0.010686</td>\n",
" <td> 0.852001</td>\n",
" <td> 849</td>\n",
" <td> 0.388517</td>\n",
" <td> 50.260590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4474</th>\n",
" <td> K-562-SM-3GILO</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 18089388</td>\n",
" <td> 18151281</td>\n",
" <td> 0.004954</td>\n",
" <td> 17946903</td>\n",
" <td> 49.802307</td>\n",
" <td> 0.010809</td>\n",
" <td> 0.865095</td>\n",
" <td> 876</td>\n",
" <td> 0.278859</td>\n",
" <td> 50.707745</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4475</th>\n",
" <td> K-562-SM-3K2BF</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 10709316</td>\n",
" <td> 11153056</td>\n",
" <td> 0.005191</td>\n",
" <td> 10650786</td>\n",
" <td> 49.862990</td>\n",
" <td> 0.017607</td>\n",
" <td> 0.801476</td>\n",
" <td> 841</td>\n",
" <td> 0.225620</td>\n",
" <td> 50.603752</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4476</th>\n",
" <td> K-562-SM-3LK7S</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 19775330</td>\n",
" <td> 21833559</td>\n",
" <td> 0.002277</td>\n",
" <td> 19642715</td>\n",
" <td> 49.831787</td>\n",
" <td> 0.018965</td>\n",
" <td> 0.922582</td>\n",
" <td> 882</td>\n",
" <td> 0.312202</td>\n",
" <td> 50.353050</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4477</th>\n",
" <td> K-562-SM-3MJHH</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 22850569</td>\n",
" <td> 25074169</td>\n",
" <td> 0.002691</td>\n",
" <td> 22654052</td>\n",
" <td> 49.784070</td>\n",
" <td> 0.014306</td>\n",
" <td> 0.938719</td>\n",
" <td> 887</td>\n",
" <td> 0.299666</td>\n",
" <td> 50.487100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4478</th>\n",
" <td> K-562-SM-3NB3I</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 21923991</td>\n",
" <td> 23322261</td>\n",
" <td> 0.003130</td>\n",
" <td> 22010308</td>\n",
" <td> 50.098236</td>\n",
" <td> 0.022786</td>\n",
" <td> 0.928603</td>\n",
" <td> 872</td>\n",
" <td> 0.299371</td>\n",
" <td> 50.367115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4479</th>\n",
" <td> K-562-SM-3NMAP</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 22210968</td>\n",
" <td> 24363787</td>\n",
" <td> 0.003103</td>\n",
" <td> 22339747</td>\n",
" <td> 50.144530</td>\n",
" <td> 0.018574</td>\n",
" <td> 0.936381</td>\n",
" <td> 890</td>\n",
" <td> 0.284208</td>\n",
" <td> 50.140530</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4480</th>\n",
" <td> K-562-SM-3NMDG</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 20447451</td>\n",
" <td> 22524234</td>\n",
" <td> 0.002659</td>\n",
" <td> 20583281</td>\n",
" <td> 50.165520</td>\n",
" <td> 0.017522</td>\n",
" <td> 0.950120</td>\n",
" <td> 873</td>\n",
" <td> 0.267642</td>\n",
" <td> 50.144337</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4481</th>\n",
" <td> K-562-SM-3P61Y</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 23221097</td>\n",
" <td> 25541338</td>\n",
" <td> 0.002788</td>\n",
" <td> 23358263</td>\n",
" <td> 50.147243</td>\n",
" <td> 0.022186</td>\n",
" <td> 0.946919</td>\n",
" <td> 894</td>\n",
" <td> 0.257213</td>\n",
" <td> 50.123116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4482</th>\n",
" <td> K-562-SM-46MWI</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 12960037</td>\n",
" <td> 14028074</td>\n",
" <td> 0.002139</td>\n",
" <td> 13042438</td>\n",
" <td> 50.158447</td>\n",
" <td> 0.024899</td>\n",
" <td> 0.956672</td>\n",
" <td> 873</td>\n",
" <td> 0.188689</td>\n",
" <td> 50.133953</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4483</th>\n",
" <td> K-562-SM-47JYY</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 15251988</td>\n",
" <td> 17187136</td>\n",
" <td> 0.002230</td>\n",
" <td> 15395095</td>\n",
" <td> 50.233475</td>\n",
" <td> 0.022935</td>\n",
" <td> 0.957918</td>\n",
" <td> 862</td>\n",
" <td> 0.218666</td>\n",
" <td> 49.976242</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4484</th>\n",
" <td> K-562-SM-48FEU</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 11679926</td>\n",
" <td> 13207067</td>\n",
" <td> 0.002256</td>\n",
" <td> 11762756</td>\n",
" <td> 50.176662</td>\n",
" <td> 0.028136</td>\n",
" <td> 0.953367</td>\n",
" <td> 832</td>\n",
" <td> 0.393895</td>\n",
" <td> 50.098743</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4485</th>\n",
" <td> K-562-SM-48TE3</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 14556993</td>\n",
" <td> 16245672</td>\n",
" <td> 0.001645</td>\n",
" <td> 14501457</td>\n",
" <td> 49.904438</td>\n",
" <td> 0.013772</td>\n",
" <td> 0.964781</td>\n",
" <td> 873</td>\n",
" <td> 0.197786</td>\n",
" <td> 50.232384</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4486</th>\n",
" <td> K-562-SM-4AD4F</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 14758394</td>\n",
" <td> 16187369</td>\n",
" <td> 0.002012</td>\n",
" <td> 14775511</td>\n",
" <td> 50.028980</td>\n",
" <td> 0.013740</td>\n",
" <td> 0.962448</td>\n",
" <td> 863</td>\n",
" <td> 0.197407</td>\n",
" <td> 50.208282</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4487</th>\n",
" <td> K-562-SM-4AT3W</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 10916665</td>\n",
" <td> 12078403</td>\n",
" <td> 0.005620</td>\n",
" <td> 10964475</td>\n",
" <td> 50.109250</td>\n",
" <td> 0.013850</td>\n",
" <td> 0.950125</td>\n",
" <td> 828</td>\n",
" <td> 0.172564</td>\n",
" <td> 50.208042</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4488</th>\n",
" <td> K-562-SM-4B66B</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 22550061</td>\n",
" <td> 24921589</td>\n",
" <td> 0.002292</td>\n",
" <td> 22611795</td>\n",
" <td> 50.068350</td>\n",
" <td> 0.022443</td>\n",
" <td> 0.956932</td>\n",
" <td> 892</td>\n",
" <td> 0.230228</td>\n",
" <td> 50.232353</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4489</th>\n",
" <td> K-562-SM-4BONS</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 15745843</td>\n",
" <td> 17474655</td>\n",
" <td> 0.003985</td>\n",
" <td> 15727040</td>\n",
" <td> 49.970127</td>\n",
" <td> 0.020865</td>\n",
" <td> 0.956510</td>\n",
" <td> 876</td>\n",
" <td> 0.215475</td>\n",
" <td> 50.267246</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4490</th>\n",
" <td> K-562-SM-4BRWK</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 12423333</td>\n",
" <td> 13669729</td>\n",
" <td> 0.002084</td>\n",
" <td> 12383723</td>\n",
" <td> 49.920166</td>\n",
" <td> 0.011200</td>\n",
" <td> 0.958494</td>\n",
" <td> 861</td>\n",
" <td> 0.184230</td>\n",
" <td> 50.298664</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4491</th>\n",
" <td> K-562-SM-4DM4W</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 11350047</td>\n",
" <td> 12454038</td>\n",
" <td> 0.005404</td>\n",
" <td> 11400063</td>\n",
" <td> 50.109924</td>\n",
" <td> 0.029977</td>\n",
" <td> 0.945595</td>\n",
" <td> 828</td>\n",
" <td> 0.181774</td>\n",
" <td> 50.162186</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4492</th>\n",
" <td> K-562-SM-4EDPU</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 19954703</td>\n",
" <td> 21371627</td>\n",
" <td> 0.003999</td>\n",
" <td> 20055440</td>\n",
" <td> 50.125885</td>\n",
" <td> 0.042664</td>\n",
" <td> 0.944981</td>\n",
" <td> 890</td>\n",
" <td> 0.243209</td>\n",
" <td> 50.225540</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4493</th>\n",
" <td> K-562-SM-4GICD</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 21863245</td>\n",
" <td> 24097377</td>\n",
" <td> 0.003084</td>\n",
" <td> 21979888</td>\n",
" <td> 50.133026</td>\n",
" <td> 0.012700</td>\n",
" <td> 0.929063</td>\n",
" <td> 893</td>\n",
" <td> 0.242783</td>\n",
" <td> 50.196144</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4494</th>\n",
" <td> K-562-SM-4IHK7</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 14231410</td>\n",
" <td> 15796005</td>\n",
" <td> 0.004426</td>\n",
" <td> 14188282</td>\n",
" <td> 49.924120</td>\n",
" <td> 0.022133</td>\n",
" <td> 0.949039</td>\n",
" <td> 870</td>\n",
" <td> 0.214134</td>\n",
" <td> 50.295345</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4495</th>\n",
" <td> K-562-SM-4JBIQ</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 18279772</td>\n",
" <td> 20403076</td>\n",
" <td> 0.004726</td>\n",
" <td> 18332662</td>\n",
" <td> 50.072230</td>\n",
" <td> 0.014611</td>\n",
" <td> 0.954230</td>\n",
" <td> 871</td>\n",
" <td> 0.226838</td>\n",
" <td> 50.212060</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4496</th>\n",
" <td> K-562-SM-4KKZ9</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 17621411</td>\n",
" <td> 19649733</td>\n",
" <td> 0.003199</td>\n",
" <td> 17603160</td>\n",
" <td> 49.974094</td>\n",
" <td> 0.017336</td>\n",
" <td> 0.951078</td>\n",
" <td> 855</td>\n",
" <td> 0.208081</td>\n",
" <td> 50.299175</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4497</th>\n",
" <td> K-562-SM-4LMI2</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 16494770</td>\n",
" <td> 18425439</td>\n",
" <td> 0.003787</td>\n",
" <td> 16498971</td>\n",
" <td> 50.006367</td>\n",
" <td> 0.011050</td>\n",
" <td> 0.952641</td>\n",
" <td> 866</td>\n",
" <td> 0.197396</td>\n",
" <td> 50.248615</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4498</th>\n",
" <td> K-562-SM-4LVKX</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 9.5</td>\n",
" <td> Bone Marrow</td>\n",
" <td> Cells - Leukemia cell line (CML)</td>\n",
" <td> NaN</td>\n",
" <td> BP-17177</td>\n",
" <td> RNA isolation_Trizol Manual (Cell Pellet)</td>\n",
" <td>...</td>\n",
" <td> 13735552</td>\n",
" <td> 15244841</td>\n",
" <td> 0.002508</td>\n",
" <td> 13719276</td>\n",
" <td> 49.970356</td>\n",
" <td> 0.011353</td>\n",
" <td> 0.952748</td>\n",
" <td> 870</td>\n",
" <td> 0.185469</td>\n",
" <td> 50.244150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4499</th>\n",
" <td> NA12878-SM-2XJZN</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> Cell Line DNA (Derived from Blood Cells)</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4500</th>\n",
" <td> NA12878_C-SM-2VCTR</td>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> Cell Line DNA (Derived from Blood Cells)</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4501 rows \u00d7 59 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 196,
"text": [
" SAMPID SMATSSCR SMCENTER SMPTHNTS SMRIN \\\n",
"0 GTEX-N7MS-0007-SM-26GME NaN C1 NaN 8.2 \n",
"1 GTEX-N7MS-0007-SM-26GMV NaN C1 NaN 8.2 \n",
"2 GTEX-N7MS-0007-SM-2D43E NaN C1 NaN 8.2 \n",
"3 GTEX-N7MS-0007-SM-2D7W1 NaN C1 NaN 8.2 \n",
"4 GTEX-N7MS-0008-SM-4E3JI NaN C1 NaN 10.0 \n",
"5 GTEX-N7MS-0009-SM-2BWY4 NaN C1 NaN NaN \n",
"6 GTEX-N7MS-0009-SM-2XK1D NaN C1 NaN NaN \n",
"7 GTEX-N7MS-0011-R10A-SM-2HMJK NaN C1, A1 NaN 7.1 \n",
"8 GTEX-N7MS-0011-R10A-SM-2IZJW NaN C1, A1 NaN 7.1 \n",
"9 GTEX-N7MS-0011-R11A-SM-2HMJS NaN C1, A1 NaN 6.6 \n",
"10 GTEX-N7MS-0011-R11A-SM-2IZJZ NaN C1, A1 NaN 6.6 \n",
"11 GTEX-N7MS-0011-R1a-SM-2AXVJ NaN C1, A1 NaN 7.3 \n",
"12 GTEX-N7MS-0011-R1a-SM-2HMJG NaN C1, A1 NaN 7.3 \n",
"13 GTEX-N7MS-0011-R2a-SM-2HML6 NaN C1, A1 NaN 7.0 \n",
"14 GTEX-N7MS-0011-R2a-SM-2IZK7 NaN C1, A1 NaN 7.0 \n",
"15 GTEX-N7MS-0011-R3a-SM-2AXVU NaN C1, A1 NaN 7.6 \n",
"16 GTEX-N7MS-0011-R3a-SM-2HMKD NaN C1, A1 NaN 7.6 \n",
"17 GTEX-N7MS-0011-R3a-SM-33HC6 NaN C1, A1 NaN 7.6 \n",
"18 GTEX-N7MS-0011-R4a-SM-2AXW2 NaN C1, A1 NaN 6.2 \n",
"19 GTEX-N7MS-0011-R4a-SM-2HMKW NaN C1, A1 NaN 6.2 \n",
"20 GTEX-N7MS-0011-R5a-SM-2AXW7 NaN C1, A1 NaN 7.8 \n",
"21 GTEX-N7MS-0011-R5a-SM-2HMK8 NaN C1, A1 NaN 7.8 \n",
"22 GTEX-N7MS-0011-R6a-SM-2AXWD NaN C1, A1 NaN 7.6 \n",
"23 GTEX-N7MS-0011-R6a-SM-2HMJ4 NaN C1, A1 NaN 7.6 \n",
"24 GTEX-N7MS-0011-R7a-SM-2AXV5 NaN C1, A1 NaN 6.4 \n",
"25 GTEX-N7MS-0011-R7a-SM-2HMKN NaN C1, A1 NaN 6.4 \n",
"26 GTEX-N7MS-0011-R8a-SM-2AXVD NaN C1, A1 NaN 6.8 \n",
"27 GTEX-N7MS-0011-R8a-SM-2YUMK NaN C1, A1 NaN 6.8 \n",
"28 GTEX-N7MS-0126-SM-3TW8O 3 C1 OK 9.1 \n",
"29 GTEX-N7MS-0225-SM-4E3HO 1 C1 OK for analysis 7.7 \n",
"... ... ... ... ... ... \n",
"4471 K-562-SM-3GADY NaN NaN NaN 9.5 \n",
"4472 K-562-SM-3GAFC NaN NaN NaN 9.5 \n",
"4473 K-562-SM-3GIKB NaN NaN NaN 9.5 \n",
"4474 K-562-SM-3GILO NaN NaN NaN 9.5 \n",
"4475 K-562-SM-3K2BF NaN NaN NaN 9.5 \n",
"4476 K-562-SM-3LK7S NaN NaN NaN 9.5 \n",
"4477 K-562-SM-3MJHH NaN NaN NaN 9.5 \n",
"4478 K-562-SM-3NB3I NaN NaN NaN 9.5 \n",
"4479 K-562-SM-3NMAP NaN NaN NaN 9.5 \n",
"4480 K-562-SM-3NMDG NaN NaN NaN 9.5 \n",
"4481 K-562-SM-3P61Y NaN NaN NaN 9.5 \n",
"4482 K-562-SM-46MWI NaN NaN NaN 9.5 \n",
"4483 K-562-SM-47JYY NaN NaN NaN 9.5 \n",
"4484 K-562-SM-48FEU NaN NaN NaN 9.5 \n",
"4485 K-562-SM-48TE3 NaN NaN NaN 9.5 \n",
"4486 K-562-SM-4AD4F NaN NaN NaN 9.5 \n",
"4487 K-562-SM-4AT3W NaN NaN NaN 9.5 \n",
"4488 K-562-SM-4B66B NaN NaN NaN 9.5 \n",
"4489 K-562-SM-4BONS NaN NaN NaN 9.5 \n",
"4490 K-562-SM-4BRWK NaN NaN NaN 9.5 \n",
"4491 K-562-SM-4DM4W NaN NaN NaN 9.5 \n",
"4492 K-562-SM-4EDPU NaN NaN NaN 9.5 \n",
"4493 K-562-SM-4GICD NaN NaN NaN 9.5 \n",
"4494 K-562-SM-4IHK7 NaN NaN NaN 9.5 \n",
"4495 K-562-SM-4JBIQ NaN NaN NaN 9.5 \n",
"4496 K-562-SM-4KKZ9 NaN NaN NaN 9.5 \n",
"4497 K-562-SM-4LMI2 NaN NaN NaN 9.5 \n",
"4498 K-562-SM-4LVKX NaN NaN NaN 9.5 \n",
"4499 NA12878-SM-2XJZN NaN NaN NaN NaN \n",
"4500 NA12878_C-SM-2VCTR NaN NaN NaN NaN \n",
"\n",
" SMTS SMTSD SMTSISCH \\\n",
"0 Blood Whole Blood 16-19 hours \n",
"1 Blood Whole Blood 16-19 hours \n",
"2 Blood Whole Blood 16-19 hours \n",
"3 Blood Whole Blood 16-19 hours \n",
"4 Skin Cells - Transformed fibroblasts NaN \n",
"5 Blood Whole Blood 16-19 hours \n",
"6 Blood Whole Blood 16-19 hours \n",
"7 Brain Brain - Frontal Cortex (BA9) NaN \n",
"8 Brain Brain - Frontal Cortex (BA9) NaN \n",
"9 Brain Brain - Cerebellar Hemisphere NaN \n",
"10 Brain Brain - Cerebellar Hemisphere NaN \n",
"11 Brain Brain - Hippocampus NaN \n",
"12 Brain Brain - Hippocampus NaN \n",
"13 Brain Brain - Substantia nigra NaN \n",
"14 Brain Brain - Substantia nigra NaN \n",
"15 Brain Brain - Anterior cingulate cortex (BA24) NaN \n",
"16 Brain Brain - Anterior cingulate cortex (BA24) NaN \n",
"17 Brain Brain - Anterior cingulate cortex (BA24) NaN \n",
"18 Brain Brain - Amygdala NaN \n",
"19 Brain Brain - Amygdala NaN \n",
"20 Brain Brain - Caudate (basal ganglia) NaN \n",
"21 Brain Brain - Caudate (basal ganglia) NaN \n",
"22 Brain Brain - Nucleus accumbens (basal ganglia) NaN \n",
"23 Brain Brain - Nucleus accumbens (basal ganglia) NaN \n",
"24 Brain Brain - Putamen (basal ganglia) NaN \n",
"25 Brain Brain - Putamen (basal ganglia) NaN \n",
"26 Brain Brain - Hypothalamus NaN \n",
"27 Brain Brain - Hypothalamus NaN \n",
"28 Testis Testis 16-19 hours \n",
"29 Skin Skin - Sun Exposed (Lower leg) 16-19 hours \n",
"... ... ... ... \n",
"4471 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4472 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4473 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4474 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4475 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4476 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4477 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4478 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4479 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4480 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4481 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4482 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4483 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4484 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4485 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4486 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4487 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4488 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4489 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4490 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4491 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4492 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4493 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4494 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4495 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4496 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4497 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4498 Bone Marrow Cells - Leukemia cell line (CML) NaN \n",
"4499 NaN NaN NaN \n",
"4500 NaN NaN NaN \n",
"\n",
" SMNABTCH SMNABTCHT ... \\\n",
"0 BP-16653 RNA isolation_PAXgene Blood RNA (Manual) ... \n",
"1 BP-16653 RNA isolation_PAXgene Blood RNA (Manual) ... \n",
"2 BP-16653 RNA isolation_PAXgene Blood RNA (Manual) ... \n",
"3 BP-16653 RNA isolation_PAXgene Blood RNA (Manual) ... \n",
"4 BP-37581 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"5 BP-16657 DNA isolation_Whole Blood _QIAGEN Puregene (Ma... ... \n",
"6 BP-16657 DNA isolation_Whole Blood _QIAGEN Puregene (Ma... ... \n",
"7 BP-19253 RNA isolation_QIAGEN miRNeasy ... \n",
"8 BP-19253 RNA isolation_QIAGEN miRNeasy ... \n",
"9 BP-19253 RNA isolation_QIAGEN miRNeasy ... \n",
"10 BP-19253 RNA isolation_QIAGEN miRNeasy ... \n",
"11 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"12 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"13 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"14 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"15 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"16 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"17 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"18 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"19 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"20 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"21 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"22 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"23 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"24 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"25 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"26 BP-17395 RNA isolation_QIAGEN miRNeasy ... \n",
"27 BP-17395 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"28 BP-16740 RNA isolation_PAXgene Tissue miRNA ... \n",
"29 BP-36182 RNA isolation_PAXgene Tissue miRNA ... \n",
"... ... ... ... \n",
"4471 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4472 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4473 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4474 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4475 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4476 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4477 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4478 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4479 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4480 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4481 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4482 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4483 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4484 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4485 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4486 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4487 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4488 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4489 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4490 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4491 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4492 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4493 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4494 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4495 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4496 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4497 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4498 BP-17177 RNA isolation_Trizol Manual (Cell Pellet) ... \n",
"4499 NaN Cell Line DNA (Derived from Blood Cells) ... \n",
"4500 NaN Cell Line DNA (Derived from Blood Cells) ... \n",
"\n",
" SME1ANTI SMSPLTRD SMBSMMRT SME1SNSE SME1PCTS SMRRNART SME1MPRT \\\n",
"0 NaN NaN NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN NaN NaN \n",
"3 13705136 18432744 0.002456 13447728 49.526005 0.041526 0.835199 \n",
"4 17962165 20910366 0.004087 18012435 50.069874 0.028395 0.948329 \n",
"5 NaN NaN NaN NaN NaN NaN NaN \n",
"6 NaN NaN NaN NaN NaN NaN NaN \n",
"7 18948398 12221905 0.004294 18747238 49.733180 0.051237 0.875680 \n",
"8 NaN NaN NaN NaN NaN NaN NaN \n",
"9 19024292 12200496 0.003643 18954711 49.908398 0.016711 0.893391 \n",
"10 NaN NaN NaN NaN NaN NaN NaN \n",
"11 NaN NaN NaN NaN NaN NaN NaN \n",
"12 15226514 8806379 0.004332 15122489 49.828620 0.041028 0.790736 \n",
"13 11678187 8637924 0.004073 11575007 49.778137 0.028304 0.628574 \n",
"14 NaN NaN NaN NaN NaN NaN NaN \n",
"15 NaN NaN NaN NaN NaN NaN NaN \n",
"16 8114816 4205357 0.004996 8085575 49.909750 0.004037 0.151460 \n",
"17 36663415 22956649 0.003256 36619121 49.969780 0.033465 0.916319 \n",
"18 NaN NaN NaN NaN NaN NaN NaN \n",
"19 11651563 8137082 0.004180 11545498 49.771380 0.035598 0.657749 \n",
"20 NaN NaN NaN NaN NaN NaN NaN \n",
"21 21641531 13525503 0.004141 21325223 49.631920 0.050445 0.889377 \n",
"22 NaN NaN NaN NaN NaN NaN NaN \n",
"23 17668410 12313288 0.003942 17447282 49.685143 0.040960 0.883223 \n",
"24 NaN NaN NaN NaN NaN NaN NaN \n",
"25 10366169 5352672 0.004739 9997472 49.094720 0.058973 0.664810 \n",
"26 NaN NaN NaN NaN NaN NaN NaN \n",
"27 12317489 9750793 0.002769 12167817 49.694363 0.054535 0.817677 \n",
"28 19539848 15703873 0.002877 19525988 49.982260 0.051392 0.934544 \n",
"29 17258030 13502100 0.004377 17381617 50.178387 0.010061 0.937283 \n",
"... ... ... ... ... ... ... ... \n",
"4471 20293042 21127232 0.003853 20085667 49.743217 0.010052 0.912978 \n",
"4472 19387186 19565057 0.003505 19503152 50.149097 0.011422 0.890707 \n",
"4473 12498139 12769804 0.003243 12654894 50.311604 0.010686 0.852001 \n",
"4474 18089388 18151281 0.004954 17946903 49.802307 0.010809 0.865095 \n",
"4475 10709316 11153056 0.005191 10650786 49.862990 0.017607 0.801476 \n",
"4476 19775330 21833559 0.002277 19642715 49.831787 0.018965 0.922582 \n",
"4477 22850569 25074169 0.002691 22654052 49.784070 0.014306 0.938719 \n",
"4478 21923991 23322261 0.003130 22010308 50.098236 0.022786 0.928603 \n",
"4479 22210968 24363787 0.003103 22339747 50.144530 0.018574 0.936381 \n",
"4480 20447451 22524234 0.002659 20583281 50.165520 0.017522 0.950120 \n",
"4481 23221097 25541338 0.002788 23358263 50.147243 0.022186 0.946919 \n",
"4482 12960037 14028074 0.002139 13042438 50.158447 0.024899 0.956672 \n",
"4483 15251988 17187136 0.002230 15395095 50.233475 0.022935 0.957918 \n",
"4484 11679926 13207067 0.002256 11762756 50.176662 0.028136 0.953367 \n",
"4485 14556993 16245672 0.001645 14501457 49.904438 0.013772 0.964781 \n",
"4486 14758394 16187369 0.002012 14775511 50.028980 0.013740 0.962448 \n",
"4487 10916665 12078403 0.005620 10964475 50.109250 0.013850 0.950125 \n",
"4488 22550061 24921589 0.002292 22611795 50.068350 0.022443 0.956932 \n",
"4489 15745843 17474655 0.003985 15727040 49.970127 0.020865 0.956510 \n",
"4490 12423333 13669729 0.002084 12383723 49.920166 0.011200 0.958494 \n",
"4491 11350047 12454038 0.005404 11400063 50.109924 0.029977 0.945595 \n",
"4492 19954703 21371627 0.003999 20055440 50.125885 0.042664 0.944981 \n",
"4493 21863245 24097377 0.003084 21979888 50.133026 0.012700 0.929063 \n",
"4494 14231410 15796005 0.004426 14188282 49.924120 0.022133 0.949039 \n",
"4495 18279772 20403076 0.004726 18332662 50.072230 0.014611 0.954230 \n",
"4496 17621411 19649733 0.003199 17603160 49.974094 0.017336 0.951078 \n",
"4497 16494770 18425439 0.003787 16498971 50.006367 0.011050 0.952641 \n",
"4498 13735552 15244841 0.002508 13719276 49.970356 0.011353 0.952748 \n",
"4499 NaN NaN NaN NaN NaN NaN NaN \n",
"4500 NaN NaN NaN NaN NaN NaN NaN \n",
"\n",
" SMNUM5CD SMDPMPRT SME2PCTS \n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 NaN NaN NaN \n",
"3 840 0.563503 51.361324 \n",
"4 879 0.226835 50.270794 \n",
"5 NaN NaN NaN \n",
"6 NaN NaN NaN \n",
"7 859 0.330709 50.619534 \n",
"8 NaN NaN NaN \n",
"9 851 0.193112 50.387028 \n",
"10 NaN NaN NaN \n",
"11 NaN NaN NaN \n",
"12 835 0.324148 50.618090 \n",
"13 837 0.275110 50.561234 \n",
"14 NaN NaN NaN \n",
"15 NaN NaN NaN \n",
"16 806 0.364988 50.379780 \n",
"17 875 0.341985 50.874683 \n",
"18 NaN NaN NaN \n",
"19 811 0.334207 50.741714 \n",
"20 NaN NaN NaN \n",
"21 891 0.312833 50.695858 \n",
"22 NaN NaN NaN \n",
"23 854 0.262580 50.615242 \n",
"24 NaN NaN NaN \n",
"25 791 0.400345 51.510887 \n",
"26 NaN NaN NaN \n",
"27 820 0.366260 50.655490 \n",
"28 937 0.185286 50.262527 \n",
"29 862 0.223826 50.288605 \n",
"... ... ... ... \n",
"4471 877 0.323958 50.618893 \n",
"4472 854 0.420687 50.387650 \n",
"4473 849 0.388517 50.260590 \n",
"4474 876 0.278859 50.707745 \n",
"4475 841 0.225620 50.603752 \n",
"4476 882 0.312202 50.353050 \n",
"4477 887 0.299666 50.487100 \n",
"4478 872 0.299371 50.367115 \n",
"4479 890 0.284208 50.140530 \n",
"4480 873 0.267642 50.144337 \n",
"4481 894 0.257213 50.123116 \n",
"4482 873 0.188689 50.133953 \n",
"4483 862 0.218666 49.976242 \n",
"4484 832 0.393895 50.098743 \n",
"4485 873 0.197786 50.232384 \n",
"4486 863 0.197407 50.208282 \n",
"4487 828 0.172564 50.208042 \n",
"4488 892 0.230228 50.232353 \n",
"4489 876 0.215475 50.267246 \n",
"4490 861 0.184230 50.298664 \n",
"4491 828 0.181774 50.162186 \n",
"4492 890 0.243209 50.225540 \n",
"4493 893 0.242783 50.196144 \n",
"4494 870 0.214134 50.295345 \n",
"4495 871 0.226838 50.212060 \n",
"4496 855 0.208081 50.299175 \n",
"4497 866 0.197396 50.248615 \n",
"4498 870 0.185469 50.244150 \n",
"4499 NaN NaN NaN \n",
"4500 NaN NaN NaN \n",
"\n",
"[4501 rows x 59 columns]"
]
}
],
"prompt_number": 196
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ugh, this has the UGLIEST column names. What does `SMTSISCH` really mean, anyway? Plus, there are 59 columns, and I don't want to copy/paste something 59 different times.\n",
"\n",
"Turns out the Excel file `GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx` has the mapping between these weird names and human-readable concepts. Open it in Excel.\n",
"\n",
"Since there are so many things to rename, we'll do it programmatically rather than copy/pasting by hand."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Read the Excel file `GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx` using `pandas`, and set the first column as the \"index\" or row names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hint 1: `index_col` is the argument you want for setting the column number that should be the index\n",
"\n",
"Hint 2: In computer science, we count from 0, so the third column is indicated by the number `2`\n",
"\n",
"Hint 3: \"how to open excel in pandas\" is a great search term :)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to read the excel file GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx, and set the first column as the \"index\" goes here\n",
"sample_attributes_dd = pd.read_excel('GTEx_Data_2014-01-17_Annotations_SampleAttributesDD.xlsx', index_col=0)\n",
"\n",
"# Code to look at the top of the file goes here\n",
"sample_attributes_dd.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>VARDESC</th>\n",
" <th>DOCFILE</th>\n",
" <th>TYPE</th>\n",
" <th>UNITS</th>\n",
" <th>COMMENT1</th>\n",
" <th>COMMENT2</th>\n",
" <th>VALUES</th>\n",
" <th>Unnamed: 8</th>\n",
" <th>Unnamed: 9</th>\n",
" <th>Unnamed: 10</th>\n",
" <th>...</th>\n",
" <th>Unnamed: 47</th>\n",
" <th>Unnamed: 48</th>\n",
" <th>Unnamed: 49</th>\n",
" <th>Unnamed: 50</th>\n",
" <th>Unnamed: 51</th>\n",
" <th>Unnamed: 52</th>\n",
" <th>Unnamed: 53</th>\n",
" <th>Unnamed: 54</th>\n",
" <th>Unnamed: 55</th>\n",
" <th>Unnamed: 56</th>\n",
" </tr>\n",
" <tr>\n",
" <th>VARNAME</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>SAMPID</th>\n",
" <td> Sample ID, GTEx Public Sample ID</td>\n",
" <td> NaN</td>\n",
" <td> string</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SMATSSCR</th>\n",
" <td> Autolysis Score</td>\n",
" <td> PRC Case Summary Report</td>\n",
" <td> integer, encoded value</td>\n",
" <td> NaN</td>\n",
" <td> Autolysis</td>\n",
" <td> The destruction of organism cells or tissues b...</td>\n",
" <td> 0=None</td>\n",
" <td> 1=Mild</td>\n",
" <td> 2=Moderate</td>\n",
" <td> 3=Severe</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SMNABTCH</th>\n",
" <td> Nucleic Acid Isolation Batch ID</td>\n",
" <td> LDACC</td>\n",
" <td> string</td>\n",
" <td> NaN</td>\n",
" <td> Generated at LDACC</td>\n",
" <td> Batch when DNA/RNA was isolated and extracted ...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SMNABTCHT</th>\n",
" <td> Type of nucleic acid isolation batch</td>\n",
" <td> LDACC</td>\n",
" <td> string</td>\n",
" <td> NaN</td>\n",
" <td> Generated at LDACC</td>\n",
" <td> The process by which DNA/RNA was isolated</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SMNABTCHD</th>\n",
" <td> Date of nucleic acid isolation batch</td>\n",
" <td> LDACC</td>\n",
" <td> string</td>\n",
" <td> NaN</td>\n",
" <td> Generated at LDACC</td>\n",
" <td> The date on which DNA/RNA was isolated</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td>...</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 56 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 195,
"text": [
" VARDESC DOCFILE \\\n",
"VARNAME \n",
"SAMPID Sample ID, GTEx Public Sample ID NaN \n",
"SMATSSCR Autolysis Score PRC Case Summary Report \n",
"SMNABTCH Nucleic Acid Isolation Batch ID LDACC \n",
"SMNABTCHT Type of nucleic acid isolation batch LDACC \n",
"SMNABTCHD Date of nucleic acid isolation batch LDACC \n",
"\n",
" TYPE UNITS COMMENT1 \\\n",
"VARNAME \n",
"SAMPID string NaN NaN \n",
"SMATSSCR integer, encoded value NaN Autolysis \n",
"SMNABTCH string NaN Generated at LDACC \n",
"SMNABTCHT string NaN Generated at LDACC \n",
"SMNABTCHD string NaN Generated at LDACC \n",
"\n",
" COMMENT2 VALUES \\\n",
"VARNAME \n",
"SAMPID NaN NaN \n",
"SMATSSCR The destruction of organism cells or tissues b... 0=None \n",
"SMNABTCH Batch when DNA/RNA was isolated and extracted ... NaN \n",
"SMNABTCHT The process by which DNA/RNA was isolated NaN \n",
"SMNABTCHD The date on which DNA/RNA was isolated NaN \n",
"\n",
" Unnamed: 8 Unnamed: 9 Unnamed: 10 ... Unnamed: 47 \\\n",
"VARNAME ... \n",
"SAMPID NaN NaN NaN ... NaN \n",
"SMATSSCR 1=Mild 2=Moderate 3=Severe ... NaN \n",
"SMNABTCH NaN NaN NaN ... NaN \n",
"SMNABTCHT NaN NaN NaN ... NaN \n",
"SMNABTCHD NaN NaN NaN ... NaN \n",
"\n",
" Unnamed: 48 Unnamed: 49 Unnamed: 50 Unnamed: 51 Unnamed: 52 \\\n",
"VARNAME \n",
"SAMPID NaN NaN NaN NaN NaN \n",
"SMATSSCR NaN NaN NaN NaN NaN \n",
"SMNABTCH NaN NaN NaN NaN NaN \n",
"SMNABTCHT NaN NaN NaN NaN NaN \n",
"SMNABTCHD NaN NaN NaN NaN NaN \n",
"\n",
" Unnamed: 53 Unnamed: 54 Unnamed: 55 Unnamed: 56 \n",
"VARNAME \n",
"SAMPID NaN NaN NaN NaN \n",
"SMATSSCR NaN NaN NaN NaN \n",
"SMNABTCH NaN NaN NaN NaN \n",
"SMNABTCHT NaN NaN NaN NaN \n",
"SMNABTCHD NaN NaN NaN NaN \n",
"\n",
"[5 rows x 56 columns]"
]
}
],
"prompt_number": 195
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which column do we want to use to rename the weird column names in the other dataframe? Let's print it."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Print the descriptive column in `sample_attributes_dd` that describes the column names in `sample_attributes`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code for showing the descriptive column in the dataframe `sample_attributes_dd`\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the `index` of `sample_attributes_dd` consists of the column names of `sample_attributes`, any column of `sample_attributes_dd` is a **mapping** from those column names to some other values, depending on which column you use.\n",
"\n",
"What's really nice about this is that we can then use one of these columns in `sample_attributes_dd` to rename the column names in `sample_attributes`. Here's an example of renaming the columns of `sample_attributes` using the column `'TYPE'` in `sample_attributes_dd`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sample_attributes.rename(columns=sample_attributes_dd['TYPE'])"
],
"language": "python",
"metadata": {},
"outputs": []
},
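{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the same idea on a tiny, made-up example. This is a sketch: the column names and descriptions below are hypothetical stand-ins, not the real GTEx data dictionary. When a `Series` is passed as `columns=`, its index supplies the old names and its values supply the new names; columns missing from the index are left unchanged. `rename` also returns a new dataframe rather than modifying the original, so assign the result if you want to keep it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy table: two coded column names plus one column the dictionary doesn't describe\n",
"data = pd.DataFrame([['Blood', 'BP-1', 1], ['Brain', 'BP-2', 2]],\n",
"                    columns=['SMTS', 'SMNABTCH', 'extra'])\n",
"\n",
"# Hypothetical toy data dictionary: index = coded name, value = readable name\n",
"descriptions = pd.Series(['Tissue Type', 'Nucleic Acid Isolation Batch ID'],\n",
"                         index=['SMTS', 'SMNABTCH'])\n",
"\n",
"# Columns found in the Series index are renamed; 'extra' is not in it, so it is kept as-is\n",
"renamed = data.rename(columns=descriptions)\n",
"print(list(renamed.columns))  # ['Tissue Type', 'Nucleic Acid Isolation Batch ID', 'extra']\n",
"print(list(data.columns))     # the original is unchanged: ['SMTS', 'SMNABTCH', 'extra']"
],
"language": "python",
"metadata": {},
"outputs": []
},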
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Rename the columns of `sample_attributes` using the variable description column from `sample_attributes_dd`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to rename the columns of sample_attributes using a column from sample_attributes_dd\n",
"sample_attributes = sample_attributes.rename(columns=sample_attributes_dd.VARDESC)\n",
"\n",
"# Code for looking at the top of the new sample_attributes goes here\n",
"sample_attributes.head()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Show the first 5 rows and first 20 columns, selecting by position\n",
"sample_attributes.iloc[:5, :20]"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Excellent! Now we have a dataframe with human-readable column names.\n",
"\n",
"Remember that first dataframe that we created, `subject_phenotypes`? We want to unify that first dataframe with this new one. Let's take a gander at it to remember what's in it in the first place."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code for looking at the top of `subject_phenotypes` goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Which columns of `subject_phenotypes` and `sample_attributes` look like they could be matched up?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: they don't have to be exactly the same values, because we can modify them, but look for commonalities."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Exercise: Code to show a column of `subject_phenotypes` goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Exercise: Code to show a column of `sample_attributes` goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Detour: working with strings in Python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can make a new column in `sample_attributes` with the corresponding subject ID from `subject_phenotypes`, we need to go over some string manipulation techniques.\n",
"\n",
"For example, we can take a string and `split` it. By default, the string is split on whitespace (spaces, tabs and newlines)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s = 'you have as many hours in a day as beyonce'\n",
"s.split()"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't want to split on whitespace, you can specify a particular letter or character to split on, too."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.split('d')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: What happened to the letter \"d\"? Write a complete sentence below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Complete sentence = it has a subject, object and a verb."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Split the string `s` above on the letter \"a\""
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to split `s` goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of `s.split()` is a `list`, which is a special name in Python. Look, it even comes in a special color, different from black, so you can see how special it is:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"list"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can access elements from the split using a number in square brackets. Remember that Python counts from 0, so `[4]` gets the fifth word."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.split()[4]"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Get the element `'day'` from `s.split()`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to get \"day\" from s.split() goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if we want to access multiple items at once? We can use the colon \"`:`\" to get everything up to (but not including) the item at position N. For example, to get the first 5 words, we can do:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s.split()[:5]"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Split the string `s` on \"a\", and get the first 3 elements"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to split `s` on \"a\", and get the first 3 elements\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we're able to split a string, but what if we want to put it back together? We can `join` the results."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"' '.join(s.split()[:5])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that we used a space to join the words. We could have used any character to `join` them (as well as any character to `split`):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'!'.join(s.split('e')[:4])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Split the string `s` on 3 different characters, and join on 3 different characters."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code for split 1, join 1 goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code for split 2, join 2 goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code for split 3, join 3 goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Back to dataframes!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can apply what we just learned about manipulating strings to the columns of a dataframe using `map` together with `lambda`, which lets us create small, unnamed functions. For example, if we wanted to take the column `'SUBJID'` from the dataframe `subject_phenotypes`, split every item on the dash character `'-'`, and get the first item, we would do this:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"subject_phenotypes['SUBJID'].map(lambda x: x.split('-')[0])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Split the column `\"Type of nucleic acid isolation batch\"` in `sample_attributes` on whitespace, and get the 3rd element."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to split the column 'Type of nucleic acid isolation batch' on whitespace and get the third element goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data may not have a string in every element of a column. For example, this code produces an `AttributeError`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: x.split('-')[0])"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code produces the error,\n",
"\n",
" <ipython-input-157-6714d86271d7> in <lambda>(x)\n",
" ----> 1 sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: x.split('-')[0])\n",
"\n",
" AttributeError: 'float' object has no attribute 'split'\n",
" \n",
"This happens because instead of a nice string, there is a `float` there, and `float`s don't know how to be `split`. Why a `float`? Missing values (NaN) are stored as `float`s, so this tells us there's an NA in the column.\n",
"\n",
"To deal with this, we can add an `if` statement to our `lambda`. We will use the function `isinstance` to check whether `x` is a string (the special word for a string in Python is `str`), and if it isn't, return an NA using `np.nan` from the `numpy` library (which we imported as `np` for shorthand)."
]
},
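{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check the claim about NAs directly. A quick sketch: `np.nan`, the value that marks a missing entry in a column like this one, is a `float`, so `isinstance(x, str)` is `False` for it and it has no `split` method."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"# A missing value is a float, not a string\n",
"print(type(np.nan))               # float\n",
"print(isinstance(np.nan, float))  # True\n",
"print(isinstance(np.nan, str))    # False\n",
"print(hasattr(np.nan, 'split'))   # False, hence the AttributeError"
],
"language": "python",
"metadata": {},
"outputs": []
},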
{
"cell_type": "code",
"collapsed": false,
"input": [
"sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: x.split('-')[0] if isinstance(x, str) else np.nan)"
],
"language": "python",
"metadata": {},
"outputs": []
},
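{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to `map` plus an `isinstance` check, pandas has a vectorized `.str` accessor for string columns that skips missing values on its own, leaving them as NaN. Here is a minimal sketch on a made-up `Series` (the tissue names are just examples) showing that both approaches give the same result. Both keep the trailing space in `'Brain '` because the split is on the dash alone; chaining `.str.strip()` would remove it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up example column with a missing value in the middle\n",
"tissues = pd.Series(['Brain - Cerebellum', np.nan, 'Skin - Sun Exposed (Lower leg)'],\n",
"                    dtype=object)\n",
"\n",
"# Approach 1: map + lambda, guarding against the float NaN ourselves\n",
"with_lambda = tissues.map(lambda x: x.split('-')[0] if isinstance(x, str) else np.nan)\n",
"\n",
"# Approach 2: the .str accessor, which leaves NaN entries as NaN automatically\n",
"with_str = tissues.str.split('-').str[0]\n",
"\n",
"print(with_str.tolist())             # ['Brain ', nan, 'Skin ']\n",
"print(with_lambda.equals(with_str))  # True"
],
"language": "python",
"metadata": {},
"outputs": []
},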
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Split the column `'Code for BSS collection site'` on a comma \"`,`\" and get the first two items of the split, accounting for NAs"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to `split`-ing elements, we can `join` within the `map`/`lambda` combo too!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sample_attributes['Tissue Type, more specific detail of tissue type'].map(lambda x: '_'.join(x.split()[:3]) \n",
" if isinstance(x, str) else np.nan)"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Split items in the column `\"Type of nucleic acid isolation batch\"` in `sample_attributes` on underscores, and join the first two elements using a space"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we have all the tools to add a `\"subject_id\"` column to `sample_attributes`! Remember, we want to create a column which has **exactly** the same entries as the column `\"SUBJID\"` in `subject_phenotypes`. What was that again? It's been so long that I forgot what those IDs look like. To remind yourself what the subject IDs in `subject_phenotypes` and sample IDs in `sample_attributes` look like, take a look at the top of each of those dataframes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to look at the top of `subject_phenotypes`\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code to look at the top of `sample_attributes`\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Add a column to `sample_attributes` called `\"subject_id\"`, using one of its existing columns, that matches the `\"SUBJID\"` in `subject_phenotypes` *exactly*"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Merging dataframes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Excellent! Now we have a column that exactly matches the rows of `sample_attributes` to the rows of `subject_phenotypes`. Next, we want to merge these two dataframes together. How do we do that? We will use `merge`, which is a method of the dataframe. `merge` is a little complicated, so let's break it down with a few examples.\n",
"\n",
"First, we'll create a couple example dataframes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dataframe1 = pd.DataFrame([['cucumber', 'watery'], ['broccoli', 'crunchy'], ['kale', 'chewy'], \n",
" ['mango', 'sweet'] ], columns=['vegetable', 'description'])\n",
"dataframe1"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dataframe2 = pd.DataFrame([['broccoli', 'harvested', 8], ['broccoli', 'planted', 5],\n",
" ['kale', 'planted', 6], ['kale', 'harvested', 9],\n",
" ['cucumber', 'harvested', 7], ['cucumber', 'planted', 4],\n",
" ['strawberry', 'planted', 10], ['strawberry', 'harvested', 2]], \n",
" columns=['crop', 'action', 'number'])\n",
"dataframe2"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to `merge` these two dataframes on their common column, which is `\"vegetable\"` in `dataframe1` and `\"crop\"` in `dataframe2`. We can do this using `merge`, and specifying what we want to merge the **left** and **right** dataframes **on**."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dataframe1.merge(dataframe2, left_on='vegetable', right_on='crop')"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: What happened to the row \"cucumber\"? What about to \"mango\" and \"strawberry\"? Why?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use \"Shift\"+\"Tab\" to read the documentation behind `merge`. What's the default way that two dataframes are merged? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Merge `dataframe1` and `dataframe2`, but use `dataframe2` as the \"left\" dataframe, and perform an `\"outer\"` merge (see the `how` argument)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we're ready to try this with real data! Let's go back to our `sample_attributes` and `subject_phenotypes` dataframes. To recap, we added a column to `sample_attributes` that matches up with `subject_phenotypes`. Now, merge the two dataframes together using their columns with common values."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Exercise: Merge `sample_attributes` and `subject_phenotypes` on their common column."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Code goes here\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! You have now performed a DATABASE MERGE!!! Now you can't be afraid of databases! Mwahahahah! \n",
"\n",
"A \"database\" is really just a bunch of tables linked together by certain 'keys', aka the values in shared columns. What you've done today is manipulate some tables (the building blocks of databases), changing column names and adding columns so they're mergeable, and then merging them together.\n",
"\n",
"Concepts from today (there's a lot!):\n",
"\n",
"* Unix\n",
" * Using `head` and `tail` to look at the beginnings and ends of files\n",
" * Using `-n N` to modify the number of lines output by `head` and `tail`\n",
" * Using `wc -l` to count the number of lines in a file\n",
" * Searching the web and finding millions of results for a seemingly simple Unix question\n",
" * Finding a method to count the number of columns in a file\n",
"* Python\n",
" * Pandas\n",
" * Reading a tabular file using `pandas`, specifically `pd.read_table`\n",
" * Using `.head()` and `.tail()` to look at the tops and bottoms of dataframes\n",
" * Using `.head(N)` and `.tail(N)` to look at the top and bottom `N` rows\n",
" * Accessing columns in `pandas` `DataFrames`\n",
" * Python dictionaries as a way of mapping one item to another\n",
" * Using `map` instead of `for`-loops to operate on every item of a column\n",
" * Creating new columns in `pandas` `DataFrames`\n",
" * Creating new columns as a result of operating on other columns\n",
" * Searching the web for help with `pandas`\n",
" * Reading an Excel file using `pandas`\n",
" * Setting one of the columns as the `index` (aka row names) when you read in the file\n",
" * A column in a DataFrame can be used as a mapping from the row name to the item in the column\n",
" * Renaming column names in one table based on a mapping\n",
" * String operations\n",
" * A string is anything between quotes\n",
" * Strings can be `split` on any characters\n",
" * The result of a `split` is a list\n",
" * Lists (and everything else in Python) start counting from 0 (aka \"0-based\")\n",
" * Get individual elements of a list using square brackets and a number, e.g. `[3]` shows the 4th element\n",
" * Access the first `N` elements of a list using square brackets, a colon, and the number, e.g. `[:5]` shows the first 5 elements (indices 0 through 4), up to but not including index 5\n",
" * Strings can be glued together using `join`\n",
" * The `join` can be on any character\n",
" * Pandas\n",
" * Use `lambda` to create an \"anonymous\" function to use within `map`\n",
" * Use `lambda` to split and join strings within a column\n",
" * NAs are of type `float`\n",
" * To check if a thing is of a certain type, use `isinstance`\n",
" * A `lambda` can contain an `if` statement for alternative outputs\n",
" * But it must also contain an `else` statement\n",
" * Create a new column by combining `map` and `lambda` to do a complicated operation on each item of the column\n",
" * Two dataframes can be merged together if they have columns with the same elements\n",
" * For a merge, you need to specify the columns to merge on in both dataframes (e.g. with `left_on` and `right_on`)\n",
" * Shift-tab to read documentation for a function\n",
" * Reading documentation is fun!\n",
"* Databases are just tables!"
]
}
],
"metadata": {}
}
]
}
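Here is a minimal sketch of how the `how` argument of `merge` decides which rows survive. It is an addition and not part of the original exercises. It reuses the toy `dataframe1` and `dataframe2` from the merge examples in the notebook and loops over the four join types.

It assumes a pandas version recent enough to support the `indicator` argument of `merge`, which was added around pandas 0.17. On older versions, drop the second half.

```python
import pandas as pd

# The two toy dataframes from the merge examples in the notebook
dataframe1 = pd.DataFrame([['cucumber', 'watery'], ['broccoli', 'crunchy'],
                           ['kale', 'chewy'], ['mango', 'sweet']],
                          columns=['vegetable', 'description'])
dataframe2 = pd.DataFrame([['broccoli', 'harvested', 8], ['broccoli', 'planted', 5],
                           ['kale', 'planted', 6], ['kale', 'harvested', 9],
                           ['cucumber', 'harvested', 7], ['cucumber', 'planted', 4],
                           ['strawberry', 'planted', 10], ['strawberry', 'harvested', 2]],
                          columns=['crop', 'action', 'number'])

# Count how many rows survive each kind of merge
row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = dataframe1.merge(dataframe2, left_on='vegetable',
                              right_on='crop', how=how)
    row_counts[how] = len(merged)
print(row_counts)

# Assumption: this pandas version supports indicator=True, which adds a
# '_merge' column saying whether each row came from the left dataframe
# only, the right dataframe only, or both
outer = dataframe2.merge(dataframe1, left_on='crop', right_on='vegetable',
                         how='outer', indicator=True)
print(outer[['crop', 'vegetable', '_merge']])
```

With the default `how='inner'`, keys that appear in only one dataframe ("mango" and "strawberry") are dropped. Each "cucumber", "broccoli" and "kale" row of `dataframe1` is matched once per `action` in `dataframe2`.

With `how='outer'`, every row is kept and the missing side is filled with `NaN`. Swapping which dataframe is on the left only swaps `left_only` and `right_only` in the `_merge` column.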