@Jessime
Created July 9, 2015 19:32
Python Lecture Notebook
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#How to approach a real world problem from the beginning "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Things to mention about this talk:\n",
"\n",
"1. We are going to integrate a lot of the pieces we've been talking about.\n",
"2. Input problem, output functional script\n",
"3. The script is going to be written in Python, but don't worry about syntax. It doesn't matter.\n",
"4. Don't bother taking notes. Just think."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#The Problem "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here's what your boss tells you:\n",
"\n",
"\"I need to know the genomic spans of all the transcripts in this file. More specifically, I want only canonical transcripts, which are designated in the transcript name by ending in 01. Also, the graph the spans sorted by length.\n",
"\n",
"The file is attached.\""
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"It sounds like a reasonable and fairly straightforward task, but how do you actually go about tackling the problem?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Opening the file's a good start:"
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"##gff-version 3\n",
"#description: evidence-based annotation of the mouse genome (GRCm38), version 5 (Ensembl 80) - long non-coding RNAs\n",
"#provider: GENCODE\n",
"#contact: gencode-help@sanger.ac.uk\n",
"#format: gff3\n",
"#date: 2015-05-13\n",
"##sequence-region chr1 1 195471971\n",
"chr1\tHAVANA\tgene\t3073253\t3074322\t.\t+\t.\tID=ENSMUSG00000102693.1;gene_id=ENSMUSG00000102693.1;gene_type=TEC;gene_status=KNOWN;gene_name=4933401J01Rik;level=2;havana_gene=OTTMUSG00000049935.1\n",
"chr1\tHAVANA\ttranscript\t3073253\t3074322\t.\t+\t.\tID=ENSMUST00000193812.1;Parent=ENSMUSG00000102693.1;gene_id=ENSMUSG00000102693.1;transcript_id=ENSMUST00000193812.1;gene_type=TEC;gene_status=KNOWN;gene_name=4933401J01Rik;transcript_type=TEC;transcript_status=KNOWN;transcript_name=4933401J01Rik-001;level=2;tag=basic;havana_gene=OTTMUSG00000049935.1;havana_transcript=OTTMUST00000127109.1\n",
"chr1\tHAVANA\texon\t3073253\t3074322\t.\t+\t.\tID=exon:ENSMUST00000193812.1:1;Parent=ENSMUST00000193812.1;gene_id=ENSMUSG00000102693.1;transcript_id=ENSMUST00000193812.1;gene_type=TEC;gene_status=KNOWN;gene_name=4933401J01Rik;transcript_type=TEC;transcript_status=KNOWN;transcript_name=4933401J01Rik-001;exon_number=1;exon_id=ENSMUSE00001343744.1;level=2;havana_gene=OTTMUSG00000049935.1;havana_transcript=OTTMUST00000127109.1;tag=basic\t"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"It's pretty ugly, right? It's a bit nicer if you open is in something like excel, but there are still 126,218 lines to go through. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The first place to start is by breaking up the task into bite size chucks, and turning those chunks into pseduocode."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"What specific steps, and in what order, are we going to need to complete for this task?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"1. Read this file to get information about the transcripts\n",
"2. Filter transcripts for 01 endings\n",
"3. Find the span of each transcript\n",
"5. Plot the spans by length"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Warning: Moving forward, we aren't focusing on optimization or doing things the way a professional developer would. The point of this exercise is to get the job done with the tools we have, and do it in a timely fashion (i.e. don't spend forever researching how to do each step). "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Read this file to get information about the transcripts"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Each of these high level ideas should be further broken down into low level pseudocode. Write these ideas out before you touch the keyboard, and then again as comments as you're going through the script."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"1. Open the file\n",
"2. Ignore the header lines\n",
"3. Loop through the rest of the lines\n",
"4. Check to see if the 3rd column is a transcript\n",
"5. If it is, save that line"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Where are we going to save that line? "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"If you wanted, you could write it to a new file, and check to make sure everything looks good there. We want to keep the program running though, so we need to have a place to store the lines we want to keep ready to go. Let's add that to our to-do list. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"1. Open the file\n",
"2. Ignore the header lines\n",
"3. Place to store transcript lines\n",
"4. Loop through the rest of the lines\n",
"5. Check to see if the 3rd column is a transcript\n",
"6. If it is, save that line"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#Open the file\n",
"folder = \"C:/Users/Jessime/Research/\"\n",
"file_path = folder+\"gencode.v22.long_noncoding_RNAs.gff3\"\n",
"gff3 = open(file_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"That's pretty straightforward, just open the file designated by the path. Don't get tripped up by the string concatenation. That's just a preference of mine to make things more readable. "
]
},
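As an aside, here is a minimal sketch of another common way to open a file: a `with` block, which closes the file for you once the block ends. The folder and file below are hypothetical stand-ins created on the fly so the sketch runs anywhere; they are not the real GENCODE file, and `os.path.join` is just one option for building the path.

```python
import os
import tempfile

# Hypothetical stand-in for the real GFF3 file, so the sketch runs anywhere.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "example.gff3")
with open(file_path, "w") as handle:
    handle.write("##gff-version 3\n")
    handle.write("chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tID=gene1\n")

# The with block closes the file automatically, even if an error occurs.
with open(file_path) as gff3_example:
    first_line = gff3_example.readline()

print(first_line.strip())
```

Nothing later in the notebook depends on this; the plain `open` call does the job for a one-off script.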
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"##gff-version 3\n",
"\n",
"#description: evidence-based annotation of the human genome (GRCh38), version 22 (Ensembl 79) - long non-coding RNAs\n",
"\n",
"#provider: GENCODE\n",
"\n",
"#contact: gencode@sanger.ac.uk\n",
"\n",
"#format: gff3\n",
"\n",
"#date: 2015-03-06\n",
"\n",
"##sequence-region chr1 1 248956422\n",
"\n"
]
}
],
"source": [
"#Ignore the header lines\n",
"for i in range(7):\n",
" print gff3.readline()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"You don't have to print them, but it lets you know that you've moved past all of the header lines and you're where you want to be. "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#Place to store transcript lines\n",
"transcripts = []"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"A simple list of the lines we want to keep will do the trick"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Here's where things start to become a little more complicated. The next 3 comments are going to be executed in a for loop."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#Loop through the rest of the lines\n",
"for line in gff3:\n",
" columns = line.split() #Google 'get column of text python'\n",
" line_type = columns[2]\n",
" \n",
" #Check to see if the 3rd column is a transcript\n",
" if line_type == \"transcript\":\n",
" #If it is, save that line\n",
" transcripts.append(line)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's make sure our list has been populated."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"27670"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(transcripts)"
]
},
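Counting exactly seven header lines works for this file, but a different file might have more or fewer. Here is a rough sketch of one alternative, assuming every header or comment line starts with `#` and the columns are separated by tabs. The sample lines are made up for illustration.

```python
# Made-up sample lines in GFF3 layout; the real file has many more.
sample_lines = [
    "##gff-version 3\n",
    "#provider: GENCODE\n",
    "chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1\n",
    "chr1\tHAVANA\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1\n",
]

kept = []
for line in sample_lines:
    # Header and comment lines start with '#', however many there are.
    if line.startswith("#"):
        continue
    # Split on tabs only, so a space inside a column can't shift the columns.
    columns = line.rstrip("\n").split("\t")
    if columns[2] == "transcript":
        kept.append(line)

print(len(kept))
```

Only the one transcript line is kept here; the gene and exon lines are skipped just like in the loop above.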
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Great, now we have our transcripts. What was the next step again?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#Filter transcripts for 01 endings"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Again, we need to break this down into simpler steps. My initial thought is, \"This shouldn't be too hard, just:\"\n",
"\n",
"1. Loop through each transcript\n",
"2. Check if the name ends in 01\n",
"3. If it does, save that line"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This structure looks really familiar. Let's go ahead and just add in our container."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"1. Place to store 01 transcript lines\n",
"2. Loop through each transcript\n",
"3. Check if the name ends in 01\n",
"4. If it does, save that line"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"But wait, how do you go about checking if the name ends in 01? Let's print a couple lines to check what they look like again."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"chr1\tHAVANA\ttranscript\t29554\t31097\t.\t+\t.\tID=ENST00000473358.1;Parent=ENSG00000243485.3;gene_id=ENSG00000243485.3;transcript_id=ENST00000473358.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-34P13.3;transcript_type=lincRNA;transcript_status=KNOWN;transcript_name=RP11-34P13.3-001;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence,basic;havana_gene=OTTHUMG00000000959.2;havana_transcript=OTTHUMT00000002840.1\n",
"\n",
"chr1\tHAVANA\ttranscript\t30267\t31109\t.\t+\t.\tID=ENST00000469289.1;Parent=ENSG00000243485.3;gene_id=ENSG00000243485.3;transcript_id=ENST00000469289.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-34P13.3;transcript_type=lincRNA;transcript_status=KNOWN;transcript_name=RP11-34P13.3-002;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence;havana_gene=OTTHUMG00000000959.2;havana_transcript=OTTHUMT00000002841.2\n",
"\n"
]
}
],
"source": [
"print transcripts[0]\n",
"print transcripts[1]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"It looks like transcript_name is the section we're after, but how do we \"get to it\"?\n",
"\n",
"WARNING: I probably don't have to say this, but I'm going to anyway: in real life, don't just guess about what you're processing. You'll eventually get it wrong and potentially waste a lot of time. Save yourself the trouble and ask someone if \"transcript_name\" is the field you need to process. \n",
"\n",
"Grabbing the index of those values isn't going to work because you have no way of knowing that they're going to stay the same for other lines. The best way to go about this is whittle away pieces until you have only the part you want. Let's give it a shot:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"chr1\tHAVANA\ttranscript\t29554\t31097\t.\t+\t.\tID=ENST00000473358.1;Parent=ENSG00000243485.3;gene_id=ENSG00000243485.3;transcript_id=ENST00000473358.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-34P13.3;transcript_type=lincRNA;transcript_status=KNOWN;transcript_name=RP11-34P13.3-001;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence,basic;havana_gene=OTTHUMG00000000959.2;havana_transcript=OTTHUMT00000002840.1\n",
"\n"
]
}
],
"source": [
"practice_line = transcripts[0]\n",
"print practice_line"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['chr1', 'HAVANA', 'transcript', '29554', '31097', '.', '+', '.', 'ID=ENST00000473358.1;Parent=ENSG00000243485.3;gene_id=ENSG00000243485.3;transcript_id=ENST00000473358.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-34P13.3;transcript_type=lincRNA;transcript_status=KNOWN;transcript_name=RP11-34P13.3-001;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence,basic;havana_gene=OTTHUMG00000000959.2;havana_transcript=OTTHUMT00000002840.1']\n",
"\n",
"9\n"
]
}
],
"source": [
"p_columns = practice_line.split()\n",
"print p_columns\n",
"print \"\"\n",
"print len(p_columns)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For whatever reason, this last pieces of information is seperated by semi-colons, and didn't get split up properly. We can grab that for further processing."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ID=ENST00000473358.1;Parent=ENSG00000243485.3;gene_id=ENSG00000243485.3;transcript_id=ENST00000473358.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-34P13.3;transcript_type=lincRNA;transcript_status=KNOWN;transcript_name=RP11-34P13.3-001;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence,basic;havana_gene=OTTHUMG00000000959.2;havana_transcript=OTTHUMT00000002840.1\n"
]
}
],
"source": [
"info_column = p_columns[-1]\n",
"print info_column"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['ID=ENST00000473358.1', 'Parent=ENSG00000243485.3', 'gene_id=ENSG00000243485.3', 'transcript_id=ENST00000473358.1', 'gene_type=lincRNA', 'gene_status=NOVEL', 'gene_name=RP11-34P13.3', 'transcript_type=lincRNA', 'transcript_status=KNOWN', 'transcript_name=RP11-34P13.3-001', 'level=2', 'transcript_support_level=5', 'tag=not_best_in_genome_evidence,basic', 'havana_gene=OTTHUMG00000000959.2', 'havana_transcript=OTTHUMT00000002840.1']\n"
]
}
],
"source": [
"semi_columns = info_column.split(\";\")\n",
"print semi_columns"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We're making progress..."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"transcript_name=RP11-34P13.3-001\n"
]
}
],
"source": [
"transcript_name_column = semi_columns[9]\n",
"print transcript_name_column"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We're almost done now..."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['transcript_name', 'RP11-34P13.3-001']\n"
]
}
],
"source": [
"name_split = transcript_name_column.split(\"=\")\n",
"print name_split"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RP11-34P13.3-001\n"
]
}
],
"source": [
"name = name_split[1]\n",
"print name"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"So what does all of that look like together?"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"chr1\tHAVANA\ttranscript\t29554\t31097\t.\t+\t.\tID=ENST00000473358.1;Parent=ENSG00000243485.3;gene_id=ENSG00000243485.3;transcript_id=ENST00000473358.1;gene_type=lincRNA;gene_status=NOVEL;gene_name=RP11-34P13.3;transcript_type=lincRNA;transcript_status=KNOWN;transcript_name=RP11-34P13.3-001;level=2;transcript_support_level=5;tag=not_best_in_genome_evidence,basic;havana_gene=OTTHUMG00000000959.2;havana_transcript=OTTHUMT00000002840.1\n",
"\n",
"RP11-34P13.3-001\n"
]
}
],
"source": [
"def get_name(line):\n",
" \"\"\"returns the transcript name of an RNA from a line of a GFF3 file\"\"\"\n",
" columns = line.split()\n",
" info_column = columns[-1]\n",
" semi_columns = info_column.split(\";\")\n",
" transcript_name_column = semi_columns[9]\n",
" name_split = transcript_name_column.split(\"=\")\n",
" name = name_split[1]\n",
" return name\n",
"\n",
"print practice_line\n",
"name = get_name(practice_line)\n",
"print name"
]
},
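One thing to keep in mind: position 9 happens to hold `transcript_name` on our practice line, but nothing guarantees it does on every line. Here is a small sketch of a lookup by key instead of by position. The helper name `get_attribute` is made up for this example, and the sample line is a shortened version of the practice line.

```python
def get_attribute(line, key):
    """Return the value stored under key in the last column, or None if absent."""
    info_column = line.rstrip("\n").split("\t")[-1]
    for field in info_column.split(";"):
        # partition splits on the first '=' only: (before, '=', after)
        field_key, _, value = field.partition("=")
        if field_key == key:
            return value
    return None

# Shortened version of the practice line, for illustration only.
example_line = (
    "chr1\tHAVANA\ttranscript\t29554\t31097\t.\t+\t.\t"
    "ID=ENST00000473358.1;transcript_name=RP11-34P13.3-001;level=2\n"
)
print(get_attribute(example_line, "transcript_name"))
```

This does a bit more work per line than indexing, but it keeps working if a line has its fields in a different order.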
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"We finally have the name, what do we want to do with it again? We can integrate these steps into our original pseudocode."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"1. Place to store 01 transcript lines\n",
"2. Loop through each transcript\n",
"3. Extract the name from the line\n",
" 1. Split the line into a list\n",
" 2. Keep the last element/column\n",
" 3. Split that column up by semi-colons\n",
" 4. Keep the nineth element/column, that has the transcript name\n",
" 5. Split that column by the equal sign\n",
" 6. Keep the name on the right side of the equal sign\n",
"4. Check if the name ends in 01\n",
"5. If it does, save that line"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#Place to store 01 transcript lines\n",
"transcripts01 = []"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#Loop through each transcript\n",
"for line in transcripts:\n",
" \n",
" #Extract the name from the line\n",
" columns = line.split()\n",
" info_column = columns[-1]\n",
" semi_columns = info_column.split(\";\")\n",
" transcript_name_column = semi_columns[9]\n",
" name_split = transcript_name_column.split(\"=\")\n",
" name = name_split[1]\n",
" \n",
" #Check if name ends in 01\n",
" if name[-2:] == \"01\":\n",
" #If it does, save that line\n",
" transcripts01.append(line)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"15954"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(transcripts01)"
]
},
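The slice `name[-2:] == "01"` can also be written with the string method `endswith`. The sketch below uses made-up names to show what each check lets through. Note that any name ending in 01 passes the first check, including a hypothetical one ending in -201; whether that's what the boss wants is a question to ask rather than guess.

```python
# Made-up transcript names for illustration.
names = ["RP11-34P13.3-001", "RP11-34P13.3-002", "EXAMPLE-201", "EXAMPLE-010"]

# Same test as name[-2:] == "01", written with a string method.
ends_in_01 = [name for name in names if name.endswith("01")]

# A stricter check, in case only the -001 names were meant (an assumption).
ends_in_dash_001 = [name for name in names if name.endswith("-001")]

print(ends_in_01)
print(ends_in_dash_001)
```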
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We're making some significant progress, since we now have only the lines we care about. Now let's get those spans. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Find the span of each transcript"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Knowing or getting the span of each transcript is a little vaugue. To clarify further, 'span' is just going to mean the end-start chromosome positions which are found in the 4th and 5th columns. We're going to want the information in two forms. First, it would be nice to be able to type the name of a given transcript and immediately know it's span. Next, we're going to need a sorted list of the spans that can be used for our plot. Given these clearer instructions, we can make some psuedocode. \n",
"\n",
"1. Make containers for the spans\n",
"2. Loop through all of the 01 transcripts\n",
"3. Find the span\n",
"4. Put the name&span in one container and just the span in the other"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#Make containers for the spans\n",
"spans = [] \n",
"span_lookup = {} #A dictionary will be perfect for our purposes here"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#Loop through all of the 01 transcripts\n",
"for line in transcripts01:\n",
" #Find the span\n",
" columns = line.split()\n",
" start = int(columns[3])\n",
" end = int(columns[4])\n",
" span = end - start\n",
" \n",
" #Populate the dictionary\n",
" name = get_name(line)\n",
" span_lookup[name] = span\n",
" \n",
" #Populate the list\n",
" spans.append(span)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let's double check to make sure our lookup table is working by inputting some of the transcript names"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"170391\n",
"32097\n"
]
}
],
"source": [
"print span_lookup[\"RP11-72L22.1-001\"]\n",
"print span_lookup[\"XIST-001\"]"
]
},
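Once the lookup table exists, it can also tell you which transcripts are the shortest and longest. Here is a small sketch with made-up names and coordinates; `records` and `example_lookup` are hypothetical stand-ins for the parsed lines and for `span_lookup`. It keeps the same definition of span, end minus start. If I'm reading the GFF3 convention correctly, both coordinates are inclusive, so the number of bases covered would be one more than that.

```python
# Made-up (name, start, end) values standing in for parsed transcript lines.
records = [
    ("A-001", 100, 900),
    ("B-001", 50, 250),
    ("C-001", 1000, 4000),
]

example_lookup = {}
for name, start, end in records:
    example_lookup[name] = end - start  # same definition of span as above

# Sort the names by their span, shortest first.
names_by_span = sorted(example_lookup, key=example_lookup.get)
sorted_spans = [example_lookup[name] for name in names_by_span]

print(names_by_span)
print(sorted_spans)
```

The second list is exactly what `sorted(spans)` would give for these three values; the first one just keeps the names attached.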
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We've just completed one of the two part's of the job. All that's left now to do is plot the spans. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#Plot the spans by length "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Python is a general purpose language used for everything from controlling microdevices to web development. A vast majority of it's users are never going to have to plot anything, ever. That's why libraries (or modules) exist. At this point, you're probably going to have to do a bit of Googling to see what plotting libraries Python as, and what fits your needs best"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Simply searching \"scatter plot python\" will give you:\n",
" \n",
"http://matplotlib.org/examples/shapes_and_collections/scatter_demo.html\n",
" \n",
"as the first link. This resource will tell you what you need to make a scatter plot."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The first thing we have to do is import the library."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"#Ignore this line, it's just for the presentation\n",
"%matplotlib inline "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"If we look at the link I mentioned above, we'll see that we need a list of x coordinates and another list of y coordinates. Which makes sense for a scatter plot, right?"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"scrolled": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"#Coordinate lists for the scatter plot\n",
"x = range(len(spans))\n",
"y = sorted(spans)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"image/png": [
"iVBORw0KGgoAAAANSUhEUgAAAZ0AAAEACAYAAABoJ6s/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n",
"AAALEgAACxIB0t1+/AAAGIdJREFUeJzt3X+sX3Wd5/HnqxSYOjCUy0yQHxVIrInduJFhA5voZGpU\n",
"Wv4BTFSYRG2UTGaH2cFdMcuPbABXY2SywMAfMH+IWshKIRJ/TJaR1h9NnD+goiI4yAC7sqFFqttS\n",
"0BWxpe/943yu/XK5Ld97e+/53u+9z0dycs/3fX58P9/T7z2ve875nNNUFZIk9WHZqBsgSVo6DB1J\n",
"Um8MHUlSbwwdSVJvDB1JUm8MHUlSb4YKnSSrknw3yb8k+UmSy1p9IsmWJE8k2Zxk5cAyVyV5Msnj\n",
"Sc4dqJ+V5NE27eaB+tFJ7m71B5KcNjBtQ3uPJ5J8ZKB+RpIH2zKbkhx5uBtEkjR/hj3S2Qv856r6\n",
"N8C/B/4myVuBK4EtVfUW4NvtNUnWABcBa4D1wK1J0tZ1G3BJVa0GVidZ3+qXALta/Sbg+rauCeAa\n",
"4Ow2XJvkuLbM9cANbZnn2zokSQvUUKFTVc9V1cNt/NfAT4FTgPOBjW22jcCFbfwC4K6q2ltVTwNP\n",
"AeckOQk4tqq2tfnuGFhmcF33Au9u4+uAzVW1p6r2AFuA81qIvQv4yjTvL0lagGZ8TSfJ6cCZwIPA\n",
"iVW1s03aCZzYxk8Gtg8stp0upKbWd7Q67eczAFW1D3ghyQmHWNcEsKeq9k+zLknSAjSj0ElyDN1R\n",
"yMer6leD06p7nk5fz9Tx2T2SNIaWDztju0h/L3BnVX2tlXcmeWNVPddOnf2i1XcAqwYWP5XuCGVH\n",
"G59an1zmTcCzSZYDx1XVriQ7gLUDy6wCvgPsBlYmWdaOdk5t65jabgNKkmahqvL6c818pa87AKG7\n",
"/nLTlPrfAVe08SuBz7XxNcDDwFHAGcD/AtKmPQic09Z5H7C+1S8FbmvjFwOb2vgE8L+BlcDxk+Nt\n",
"2j3ARW38H4D/ME3ba5jPuBQG4LpRt2GhDG4Lt4Xb4nW3Rc3Heoc90nkH8CHgkSQ/arWrgM8B9yS5\n",
"BHga+GBr6WNJ7gEeA/YBl1b7FC1cvgSsAO6rqm+2+u3AnUmeBHa14KGqdif5NPD9Nt+nqutQAHAF\n",
"sCnJZ4AftnVIkhaooUKnqv6Zg1//ec9Blvks8Nlp6j8A3jZN/WVaaE0z7YvAF6ep/4zuqEmSNAZ8\n",
"IsHSsnXUDVhAto66AQvI1lE3YAHZOuoGLHY5cNZrcUpSNR8XwyRpEZuvfadHOpKk3hg6kqTeGDqS\n",
"pN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTe\n",
"GDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6\n",
"kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSNAtJ1iUnbO6GrBt1e8ZFqmrUbZhX\n",
"SaqqMup2SFo8upD5o6/CLSu6ymUvwYvvq6r7R9uyuTNf+86hjnSSfCHJziSPDtSuS7I9yY/acN7A\n",
"tKuSPJnk8STnDtTPSvJom3bzQP3oJHe3+gNJThuYtiHJE234yED9jCQPtmU2JTnycDaEJA1v4vIu\n",
"cDbQDbes6Gp6PcOeXvsisH5KrYAbq+rMNvwTQJI1wEXAmrbMrUkm0/I24JKqWg2sTjK5zkuAXa1+\n",
"E3B9W9cEcA1wdhuuTXJcW+Z64Ia2zPNtHZKkBWyo0Kmq79Ht2Kea7tDrAuCuqtpbVU8DTwHnJDkJ\n",
"OLaqtrX57gAubOPnAxvb+L3Au9v4OmBzVe2pqj3AFuC8FmLvAr7S5ts4sC5Jmme7b+hOqW2kGy57\n",
"qavp9RxuR4K/TfLjJLcnWdlqJwPbB+bZDpwyTX1Hq9N+PgNQVfuAF5KccIh1TQB7qmr/NOuSpHnV\n",
"Xbt58X3wiS3dsLiu58yn5Yex7G3Af2vjnwZuoJ9TXDPu+ZDkuoGXW6tq65y1RtKS1EJm0QRNkrXA\n",
"2vl+n1mHTlX9YnI8yeeBf2wvdwCrBmY9le4IZUcbn1qfXOZNwLNJlgPHVdWuJDt49UZYBXwH2A2s\n",
"TLKsHe2c2tZxsLZeN9PPJ0lLSftjfOvk6yTXzsf7zPr0WrtGM+l9wGTPtm8AFyc5KskZwGpgW1U9\n",
"B7yY5Jx2TebDwNcHltnQxt8PfLuNbwbOTbIyyfHAe4H7q+vn/V3gA22+DcDXZvtZJEn9GOpIJ8ld\n",
"wJ8Df5zkGeBaYG2St9Od7voZ8FcAVfVYknuAx4B9wKV14GagS4EvASuA+6rqm61+O3BnkieBXcDF\n",
"bV27k3wa+H6b71OtQwHAFcCmJJ8BftjWIUlawLw5VJL0GiO9OVSSpLlg6EiSemPoSJJ6Y+hI0iz4\n",
"lOnZsSOBJM2QT5mevcN5IoEkLVETl8ONKw7cXsgK+MTlLKInFMwXT69JknrjkY4kzdjuG+Cyd9Ld\n",
"6E47veZTpofgNR1JmoXuus7kf9y2+4bFdD0H5m/faehIkl7DJxJIksaeoSNJ6o2hI0nqjaEjSeqN\n",
"oSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEj\n",
"SeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSerNUKGT5AtJdiZ5\n",
"dKA2kWRLkieSbE6ycmDaVUmeTPJ4knMH6mclebRNu3mgfnSSu1v9gSSnDUzb0N7jiSQfGaifkeTB\n",
"tsymJEcezoaQJM2/YY90vgisn1K7EthSVW8Bvt1ek2QNcBGwpi1za5K0ZW4DLqmq1cDqJJPrvATY\n",
"1eo3Ade3dU0A1wBnt+HaJMe1Za4HbmjLPN/WIUlawIYKnar6Ht2OfdD5wMY2vhG4sI1fANxVVXur\n",
"6mngKeCcJCcBx1bVtjbfHQPLDK7rXuDdbXwdsLmq9lTVHmALcF4LsXcBX5nm/SVJC9ThXNM5sap2\n",
"tvGdwIlt/GRg+8B824FTpqnvaHXaz2cAqmof8EKSEw6xrglgT1Xtn2ZdkqQFak46ElRVATUX6xrm\n",
"7Xp6H0nSHFt+GMvuTPLGqnqunTr7RavvAFYNzHcq3RHKjjY+tT65zJuAZ5MsB46rql1JdgBrB5ZZ\n",
"BXwH2A2sTLKsHe2c2tYxrSTXDbzcWlVbZ/JBJWmxS7KWV+9v58XhhM43gA10F/Q3AF8bqH85yY10\n",
"p7xWA9uqqpK8mOQcYBvwYeCWKet6AHg/XccEgM3AZ1vPuADvBa5o6/ou8AHg7inv/xpVdd1hfE5J\n",
"WvTaH+NbJ18nuXY+3ifdmbHXmSm5C/hz4I/prt9cA3wduIfuCOVp4IPtYj9JrgY+BuwDPl5V97f6\n",
"WcCXgBXAfVV1WasfDdwJnAnsAi5unRBI8lHg6taUz1TVxlY/A9hEd33nh8CHqmrvNG2vqsrUuiTp\n",
"4OZr3zlU6IwzQ0eSZm6+9p0+kUCS1BtDR5LUG0NHktQbQ0eSZiHJuuSEzd2QdaNuz7iwI4EkzVAX\n",
"Mn/0VbhlRVe57CV48X2TPXUXg/nadx7OfTqStERNXA43ruhuEQRgBXzicmDRhM588fSaJKk3HulI\n",
"0oztvgEueyfdje6002s3jLRJY8JrOpI0C911nYnLu1e7b1hM13PAJxLMmqEjSTPnEwkkSWPP0JEk\n",
"9cbQkST1xtCRJPXG0JEk9cbQkaRZ8Nlrs2OXaUmaIZ+9Nns+kUCSZsxnr82Wp9ckSb3xSEeSZsxn\n",
"r82W13QkaRZ89trseHpNktQbj3QkaYaSXA1/9Bm4pe1bLnsZXrxgMR3t+JTpWTJ0JM2l7rTacffB\n",
"zcsO9F7bCPynH1Y9f9Yo2zaX7DItSQvCxOXwhmkuTSw7rf+2jB9DR5Jm7BjgkwOvPwm8/H9G1Jix\n",
"YuhI0ozs3gq/fW+3+/yHVvvNPvjN1SNs1Niw95okzcjEWvhr4G3AL4FfA3lkMXUimE+GjiTNyP4T\n",
"usD5Z+ApulNrR462SWPE02uSNCMvv3ma6zlvHlVrxo1dpiVpBpKJgo8BP2uVM4AvULV7Ue1nvE9n\n",
"lgwdSXMpObbgD4D/3iqfBH5L1a8W1X7G+3QkaUF46RXYf8RAzzXg5VdG2aJxYuhI0owcuQ9ePgKe\n",
"aK9fbjUNw95rkjQjK5bDhcDxbbiw1TQMr+lI0gwkK34HRx0Jt7TKZcDv9la9dNQo2zXXFux/bZDk\n",
"6SSPJPlRkm2tNpFkS5InkmxOsnJg/quSPJnk8STnDtTPSvJom3bzQP3oJHe3+gNJThuYtqG9xxNJ\n",
"PnK4n0WSXt/yI2Af8F/bsK/VNIy5OL1WwNqqOrOqzm61K4EtVfUW4NvtNUnWABcBa4D1wK1JJpP0\n",
"NuCSqloNrE6yvtUvAXa1+k3A9W1dE8A1wNltuHYw3CRpnizrLoef2oblraZhzNWGmnoIdj7ds75p\n",
"Py9s4xcAd1XV3qp6mu523nOSnAQcW1Xb2nx3DCwzuK57gXe38XXA5qraU1V7gC10QSZJ82hfGw72\n",
"WocyV0c630ryUJK/bLUTq2pnG98JnNjGTwa2Dyy7HThlmvqOVqf9fAagqvYBLyQ54RDrkqR59Mp+\n",
"eIVul7OdbvyV/aNt0/iYix4X76iqnyf5E2BLkscHJ1ZVJVncvRUkLSF5DvafDL9tr/e3moZx2KFT\n",
"VT9vP3+Z5Kt011d2JnljVT3XTp39os2+A1g1sPipdH8q7GjjU+uTy7wJeDbJcuC4qtqVZAewdmCZ\n",
"VcB3pmtjkusGXm6tqq0z/ZyS1Jncbf5B+/mbUTVkTiVZy6v3qfPzPofTZTrJG4AjqupXSf4Q2Ax8\n",
"CngP3cX/65NcCaysqitbR4Iv0wXTKcC3gDe3o6EH6foebgP+J3BLVX0zyaXA26rqr5NcDFxYVRe3\n",
"jgQPAX9Kd03pB8Cftus7g220y7SkOZNM7IOPHTHl2WuvVO1eVPfqLNTH4JwIfLV1QFsO/I+q2pzk\n",
"IeCeJJcATwMfBKiqx5LcAzxGd+Xt0jqQepcCXwJWAPdV1Tdb/XbgziRPAruAi9u6dif5NPD9Nt+n\n",
"pgaOJM29ven6Ng0+e22vf9gOyZtDJWkGkhWvwLJl8G9b5RFg//6qlxbVvToL9UhHkpaY5cu6zgPb\n",
"p9Q0DENHkmZkH93dJpN9n3bjfTrDM3QkacYm79OZHPcM/rA8JJSkGZnuv87xv9MZlkc6kjQjkwEz\n",
"eHOooTMsQ0eSZqToQmbyyTevtJqG4ek1SZqR8OqQKbymMzxDR5KGlOT+14bM1BDSoRg6kjS048/t\n",
"AmbqkY6hMyxDR5JmbGroaFiGjiQN7UUOstv8Qc8NGVuGjiQNbT/dNZzJXWd3lFNV/25ULRo3ho4k\n",
"Da3ogmdle+2D7WfK0JGkGfF6zuEwdCRpaNPdj+M9OjPhEwkkaQhJ1rWxwSoe7cyMRzqSNJRj7+t+\n",
"enrtcBg6kjQU/6O2ueBGlKSh7D3YhN8ebIJey9CRpKH8GnjDwOvf36OzYiTNGVOGjiS9jiS/68Z+\n",
"A0y06vOjas5YM3Qk6XUdf+SoW7BYGDqSpN54n44kHUKSQ/1f1P+vt4YsEh7pSNIhHT/NfvL3nQiO\n",
"6bkxY8/QkaSDeO1Rzqs6EXhn6CwYOpI0jSR7u6Oc54EjpptlS89NWhQMHUmaIsnVcPzyA89Ze4UD\n",
"wVMA+6tq3UgaN+ZStbiPEJNUVfkYWElDS46v7m/y3XQ3hA7en7MbYH1V3T+i5vVivvadho4kNe0a\n",
"zjI4nlff/HkMcBTwO+DX+6tq2vNti8l87TvtMi1pyTsQNtAFzktT5jiK7rTa0gic+WToSFqyXhs2\n",
"k3/Y/xY4GniZ7rRaAc9j4Bw+Q0fSkvHqkJk0GDbQrtlwIHCgnWr7v/PbuqXB0JG0KE0fMIMOFjZT\n",
"/b632p/MVduWMkNH0lh6/VCZamrITDpY2BxPO632QlWtnGn7ND1DR9JIJdkDHDe3az1YwAw6VNgA\n",
"PG+ngXkw9qGTZD3w93R3bn2+qq4fcZOksTLzI4aFYJhQmepgITO4TjBs5tdY36eT5AjgX4H3ADuA\n",
"7wN/UVU/HZhnyd+nM547FS0NswmPYbxewExtAxg2r+Z9OtM7G3iqqp4GSLIJuAD46aEWWkoMHB3a\n",
"fO3059tMQmUqQ2aUxj10TgGeGXi9HThnRG1ZoI5fNp47FS0NhxMewzBgFppxD52hzg0muW7g5daq\n",
"2jovrZHGznzv9OeboTJXkqwF1s73+4x76OwAVg28XkV3tPMqVXVdXw1aeJ7fj6fXtOAZHqPW/hjf\n",
"Ovk6ybXz8T7jHjoPAauTnA48C1wE/MUoG7TQVNURXtfR63Onr36MdehU1b4k/xG4n67L9O2DPdfU\n",
"cSciaaEY6y7Tw7DLtCTN3HztOz3lIknqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEj\n",
"SeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nq\n",
"jaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2h\n",
"I0nqjaEjSeqNoSNJ6o2hI0nqjaEjSerNrEMnyXVJtif5URvOG5h2VZInkzye5NyB+llJHm3Tbh6o\n",
"H53k7lZ/IMlpA9M2JHmiDR8ZqJ+R5MG2zKYkR872s0iS+nE4RzoF3FhVZ7bhnwCSrAEuAtYA64Fb\n",
"k6QtcxtwSVWtBlYnWd/qlwC7Wv0m4Pq2rgngGuDsNlyb5Li2zPXADW2Z59s6dAhJ1o66DQuF2+IA\n",
"t8UBbov5d7in1zJN7QLgrqraW1VPA08B5yQ5CTi2qra1+e4ALmzj5wMb2/i9wLvb+Dpgc1Xtqao9\n",
"wBbgvBZi7wK+0ubbOLAuHdzaUTdgAVk76gYsIGtH3YAFZO2oG7DYHW7o/G2SHye5PcnKVjsZ2D4w\n",
"z3bglGnqO1qd9vMZgKraB7yQ5IRDrGsC2FNV+6dZlyRpgTpk6CTZ0q7BTB3OpztVdgbwduDnwA09\n",
"tBe603qSpDG0/FATq+q9w6wkyeeBf2wvdwCrBiafSneEsqONT61PLvMm4Nkky4HjqmpXkh28+nB3\n",
"FfAdYDewMsmydrRzalvHwdpnUDVJrh11GxYKt8UBbosD3Bbz65ChcyhJTqqqn7eX7wMebePfAL6c\n",
"5Ea6U16rgW1VVUleTHIOsA34MHDLwDIbgAeA9wPfbvXNwGfbqbsA7wWuaOv6LvAB4O627Nema2dV\n",
"TXfdSZI0Aqma3UFAkjvoTq0V8DPgr6pqZ5t2NfAxYB/w8aq6v9XPAr4ErADuq6rLWv1o4E7gTGAX\n",
"cHHrhECSjwJXt7f9TFVtbPUzgE1013d+CHyoqvbO6sNIknox69CRJGmmxvqJBH3doDrukqxv2+HJ\n",
"JFeMuj3zJcnTSR5p34VtrTbROsQ8kWTzQC/LGX9HFrIkX0iyM8mjA7U5++zj9PtxkG2xJPcVSVYl\n",
"+W6Sf0nykySTZ5dG992oqrEdgGuBT0xTXwM8DBwJnE53r9DkUd024Ow2fh+wvo1fCtzaxi8CNo36\n",
"883RNjqiff7T2/Z4GHjrqNs1T5/1Z8DElNrfAf+ljV8BfG6235GFPAB/Rnd6+tH5+Ozj9PtxkG2x\n",
"JPcVwBuBt7fxY4B/Bd46yu/GWB/pNPN9g+q4Oxt4qqqeru6a1ya67bNYTf0+DP67Dt5EPJvvyIJV\n",
"Vd+jezLHoLn87GPz+3GQbQFLcF9RVc9V1cNt/NfAT+k6eI3su7EYQmc+b1CdmNeW9+P3n6uZ3BaL\n",
"UQHfSvJQkr9stROrdXABdgIntvHZfEfGzVx+9sXw+7Gk9xVJTqc7AnyQEX43FnzoZGHeoDpOllJP\n",
"kXdU1ZnAecDfJPmzwYnVHf8vpe3xe0v5szdLel+R5Bi6o5CPV9WvBqf1/d2Y9X06fanR3qC6+zCa\n",
"vlBM3RarePVfLItGtfvGquqXSb5Kd2pxZ5I3VtVz7RTBL9rsM/mOHPTG4wVuLj77ovj9qKrJz77k\n",
"9hXpnsB/L3BnVU3ezziy78aCP9I5lLaxJk29QfXiJEelu59n8gbV54AXk5yTJHQ3qH59YJkNbXzw\n",
"BtVx9xDdE71PT3IU3YW+b4y4TXMuyRuSHNvG/xA4l+77MPjvOngT8Uy+I9PeeDwG5uKzL4rfj6W6\n",
"r2htvx14rKr+fmDS6L4bo+5dcZg9M+4AHgF+3DbaiQPTrqa7CPY4sG6gfhbdF+4p4JaB+tHAPcCT\n",
"dE9GOH3Un28Ot9N5dL1WngKuGnV75ukznkHX6+Zh4CeTn5Pu5uFvAU/QPeFi5Wy/Iwt5AO4CngV+\n",
"R3d+/aNz+dnH6fdjmm3xsaW6rwDeCexvvxc/asP6UX43vDlUktSbsT69JkkaL4aOJKk3ho4kqTeG\n",
"jiSpN4aOJKk3ho4kqTeGjiSpN4aOJKk3/x9oiq1gfMdI4AAAAABJRU5ErkJggg==\n"
],
"text/plain": [
"<matplotlib.figure.Figure at 0xb3dcf60>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#Generate scatter plot\n",
"plt.scatter(x,y)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#Putting all the pieces together "
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"###################################Header###################################\n",
"##gff-version 3\n",
"\n",
"#description: evidence-based annotation of the human genome (GRCh38), version 22 (Ensembl 79) - long non-coding RNAs\n",
"\n",
"#provider: GENCODE\n",
"\n",
"#contact: gencode@sanger.ac.uk\n",
"\n",
"#format: gff3\n",
"\n",
"#date: 2015-03-06\n",
"\n",
"##sequence-region chr1 1 248956422\n",
"\n",
"###################################End Header###################################\n",
"\n",
"\n",
"\n",
"Number of transcripts: 27670\n",
"Number of 01 transcripts: 15954 \n",
"\n",
"\n",
"\n",
"Number of spans: 15954\n",
"The span of RP11-72L22.1-001 is: 170391\n",
"The span of XIST-001 is: 32097 \n",
"\n",
"\n",
"\n"
]
},
{
"data": {
"image/png": [
"iVBORw0KGgoAAAANSUhEUgAAAZ0AAAEACAYAAABoJ6s/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n",
"AAALEgAACxIB0t1+/AAAGIdJREFUeJzt3X+sX3Wd5/HnqxSYOjCUy0yQHxVIrInduJFhA5voZGpU\n",
"Wv4BTFSYRG2UTGaH2cFdMcuPbABXY2SywMAfMH+IWshKIRJ/TJaR1h9NnD+goiI4yAC7sqFFqttS\n",
"0BWxpe/943yu/XK5Ld97e+/53u+9z0dycs/3fX58P9/T7z2ve875nNNUFZIk9WHZqBsgSVo6DB1J\n",
"Um8MHUlSbwwdSVJvDB1JUm8MHUlSb4YKnSSrknw3yb8k+UmSy1p9IsmWJE8k2Zxk5cAyVyV5Msnj\n",
"Sc4dqJ+V5NE27eaB+tFJ7m71B5KcNjBtQ3uPJ5J8ZKB+RpIH2zKbkhx5uBtEkjR/hj3S2Qv856r6\n",
"N8C/B/4myVuBK4EtVfUW4NvtNUnWABcBa4D1wK1J0tZ1G3BJVa0GVidZ3+qXALta/Sbg+rauCeAa\n",
"4Ow2XJvkuLbM9cANbZnn2zokSQvUUKFTVc9V1cNt/NfAT4FTgPOBjW22jcCFbfwC4K6q2ltVTwNP\n",
"AeckOQk4tqq2tfnuGFhmcF33Au9u4+uAzVW1p6r2AFuA81qIvQv4yjTvL0lagGZ8TSfJ6cCZwIPA\n",
"iVW1s03aCZzYxk8Gtg8stp0upKbWd7Q67eczAFW1D3ghyQmHWNcEsKeq9k+zLknSAjSj0ElyDN1R\n",
"yMer6leD06p7nk5fz9Tx2T2SNIaWDztju0h/L3BnVX2tlXcmeWNVPddOnf2i1XcAqwYWP5XuCGVH\n",
"G59an1zmTcCzSZYDx1XVriQ7gLUDy6wCvgPsBlYmWdaOdk5t65jabgNKkmahqvL6c818pa87AKG7\n",
"/nLTlPrfAVe08SuBz7XxNcDDwFHAGcD/AtKmPQic09Z5H7C+1S8FbmvjFwOb2vgE8L+BlcDxk+Nt\n",
"2j3ARW38H4D/ME3ba5jPuBQG4LpRt2GhDG4Lt4Xb4nW3Rc3Heoc90nkH8CHgkSQ/arWrgM8B9yS5\n",
"BHga+GBr6WNJ7gEeA/YBl1b7FC1cvgSsAO6rqm+2+u3AnUmeBHa14KGqdif5NPD9Nt+nqutQAHAF\n",
"sCnJZ4AftnVIkhaooUKnqv6Zg1//ec9Blvks8Nlp6j8A3jZN/WVaaE0z7YvAF6ep/4zuqEmSNAZ8\n",
"IsHSsnXUDVhAto66AQvI1lE3YAHZOuoGLHY5cNZrcUpSNR8XwyRpEZuvfadHOpKk3hg6kqTeGDqS\n",
"pN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTe\n",
"GDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6\n",
"kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSNAtJ1iUnbO6GrBt1e8ZFqmrUbZhX\n",
"SaqqMup2SFo8upD5o6/CLSu6ymUvwYvvq6r7R9uyuTNf+86hjnSSfCHJziSPDtSuS7I9yY/acN7A\n",
"tKuSPJnk8STnDtTPSvJom3bzQP3oJHe3+gNJThuYtiHJE234yED9jCQPtmU2JTnycDaEJA1v4vIu\n",
"cDbQDbes6Gp6PcOeXvsisH5KrYAbq+rMNvwTQJI1wEXAmrbMrUkm0/I24JKqWg2sTjK5zkuAXa1+\n",
"E3B9W9cEcA1wdhuuTXJcW+Z64Ia2zPNtHZKkBWyo0Kmq79Ht2Kea7tDrAuCuqtpbVU8DTwHnJDkJ\n",
"OLaqtrX57gAubOPnAxvb+L3Au9v4OmBzVe2pqj3AFuC8FmLvAr7S5ts4sC5Jmme7b+hOqW2kGy57\n",
"qavp9RxuR4K/TfLjJLcnWdlqJwPbB+bZDpwyTX1Hq9N+PgNQVfuAF5KccIh1TQB7qmr/NOuSpHnV\n",
"Xbt58X3wiS3dsLiu58yn5Yex7G3Af2vjnwZuoJ9TXDPu+ZDkuoGXW6tq65y1RtKS1EJm0QRNkrXA\n",
"2vl+n1mHTlX9YnI8yeeBf2wvdwCrBmY9le4IZUcbn1qfXOZNwLNJlgPHVdWuJDt49UZYBXwH2A2s\n",
"TLKsHe2c2tZxsLZeN9PPJ0lLSftjfOvk6yTXzsf7zPr0WrtGM+l9wGTPtm8AFyc5KskZwGpgW1U9\n",
"B7yY5Jx2TebDwNcHltnQxt8PfLuNbwbOTbIyyfHAe4H7q+vn/V3gA22+DcDXZvtZJEn9GOpIJ8ld\n",
"wJ8Df5zkGeBaYG2St9Od7voZ8FcAVfVYknuAx4B9wKV14GagS4EvASuA+6rqm61+O3BnkieBXcDF\n",
"bV27k3wa+H6b71OtQwHAFcCmJJ8BftjWIUlawLw5VJL0GiO9OVSSpLlg6EiSemPoSJJ6Y+hI0iz4\n",
"lOnZsSOBJM2QT5mevcN5IoEkLVETl8ONKw7cXsgK+MTlLKInFMwXT69JknrjkY4kzdjuG+Cyd9Ld\n",
"6E47veZTpofgNR1JmoXuus7kf9y2+4bFdD0H5m/faehIkl7DJxJIksaeoSNJ6o2hI0nqjaEjSeqN\n",
"oSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEj\n",
"SeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSerNUKGT5AtJdiZ5\n",
"dKA2kWRLkieSbE6ycmDaVUmeTPJ4knMH6mclebRNu3mgfnSSu1v9gSSnDUzb0N7jiSQfGaifkeTB\n",
"tsymJEcezoaQJM2/YY90vgisn1K7EthSVW8Bvt1ek2QNcBGwpi1za5K0ZW4DLqmq1cDqJJPrvATY\n",
"1eo3Ade3dU0A1wBnt+HaJMe1Za4HbmjLPN/WIUlawIYKnar6Ht2OfdD5wMY2vhG4sI1fANxVVXur\n",
"6mngKeCcJCcBx1bVtjbfHQPLDK7rXuDdbXwdsLmq9lTVHmALcF4LsXcBX5nm/SVJC9ThXNM5sap2\n",
"tvGdwIlt/GRg+8B824FTpqnvaHXaz2cAqmof8EKSEw6xrglgT1Xtn2ZdkqQFak46ElRVATUX6xrm\n",
"7Xp6H0nSHFt+GMvuTPLGqnqunTr7RavvAFYNzHcq3RHKjjY+tT65zJuAZ5MsB46rql1JdgBrB5ZZ\n",
"BXwH2A2sTLKsHe2c2tYxrSTXDbzcWlVbZ/JBJWmxS7KWV+9v58XhhM43gA10F/Q3AF8bqH85yY10\n",
"p7xWA9uqqpK8mOQcYBvwYeCWKet6AHg/XccEgM3AZ1vPuADvBa5o6/ou8AHg7inv/xpVdd1hfE5J\n",
"WvTaH+NbJ18nuXY+3ifdmbHXmSm5C/hz4I/prt9cA3wduIfuCOVp4IPtYj9JrgY+BuwDPl5V97f6\n",
"WcCXgBXAfVV1WasfDdwJnAnsAi5unRBI8lHg6taUz1TVxlY/A9hEd33nh8CHqmrvNG2vqsrUuiTp\n",
"4OZr3zlU6IwzQ0eSZm6+9p0+kUCS1BtDR5LUG0NHktQbQ0eSZiHJuuSEzd2QdaNuz7iwI4EkzVAX\n",
"Mn/0VbhlRVe57CV48X2TPXUXg/nadx7OfTqStERNXA43ruhuEQRgBXzicmDRhM588fSaJKk3HulI\n",
"0oztvgEueyfdje6002s3jLRJY8JrOpI0C911nYnLu1e7b1hM13PAJxLMmqEjSTPnEwkkSWPP0JEk\n",
"9cbQkST1xtCRJPXG0JEk9cbQkaRZ8Nlrs2OXaUmaIZ+9Nns+kUCSZsxnr82Wp9ckSb3xSEeSZsxn\n",
"r82W13QkaRZ89trseHpNktQbj3QkaYaSXA1/9Bm4pe1bLnsZXrxgMR3t+JTpWTJ0JM2l7rTacffB\n",
"zcsO9F7bCPynH1Y9f9Yo2zaX7DItSQvCxOXwhmkuTSw7rf+2jB9DR5Jm7BjgkwOvPwm8/H9G1Jix\n",
"YuhI0ozs3gq/fW+3+/yHVvvNPvjN1SNs1Niw95okzcjEWvhr4G3AL4FfA3lkMXUimE+GjiTNyP4T\n",
"usD5Z+ApulNrR462SWPE02uSNCMvv3ma6zlvHlVrxo1dpiVpBpKJgo8BP2uVM4AvULV7Ue1nvE9n\n",
"lgwdSXMpObbgD4D/3iqfBH5L1a8W1X7G+3QkaUF46RXYf8RAzzXg5VdG2aJxYuhI0owcuQ9ePgKe\n",
"aK9fbjUNw95rkjQjK5bDhcDxbbiw1TQMr+lI0gwkK34HRx0Jt7TKZcDv9la9dNQo2zXXFux/bZDk\n",
"6SSPJPlRkm2tNpFkS5InkmxOsnJg/quSPJnk8STnDtTPSvJom3bzQP3oJHe3+gNJThuYtqG9xxNJ\n",
"PnK4n0WSXt/yI2Af8F/bsK/VNIy5OL1WwNqqOrOqzm61K4EtVfUW4NvtNUnWABcBa4D1wK1JJpP0\n",
"NuCSqloNrE6yvtUvAXa1+k3A9W1dE8A1wNltuHYw3CRpnizrLoef2oblraZhzNWGmnoIdj7ds75p\n",
"Py9s4xcAd1XV3qp6mu523nOSnAQcW1Xb2nx3DCwzuK57gXe38XXA5qraU1V7gC10QSZJ82hfGw72\n",
"WocyV0c630ryUJK/bLUTq2pnG98JnNjGTwa2Dyy7HThlmvqOVqf9fAagqvYBLyQ54RDrkqR59Mp+\n",
"eIVul7OdbvyV/aNt0/iYix4X76iqnyf5E2BLkscHJ1ZVJVncvRUkLSF5DvafDL9tr/e3moZx2KFT\n",
"VT9vP3+Z5Kt011d2JnljVT3XTp39os2+A1g1sPipdH8q7GjjU+uTy7wJeDbJcuC4qtqVZAewdmCZ\n",
"VcB3pmtjkusGXm6tqq0z/ZyS1Jncbf5B+/mbUTVkTiVZy6v3qfPzPofTZTrJG4AjqupXSf4Q2Ax8\n",
"CngP3cX/65NcCaysqitbR4Iv0wXTKcC3gDe3o6EH6foebgP+J3BLVX0zyaXA26rqr5NcDFxYVRe3\n",
"jgQPAX9Kd03pB8Cftus7g220y7SkOZNM7IOPHTHl2WuvVO1eVPfqLNTH4JwIfLV1QFsO/I+q2pzk\n",
"IeCeJJcATwMfBKiqx5LcAzxGd+Xt0jqQepcCXwJWAPdV1Tdb/XbgziRPAruAi9u6dif5NPD9Nt+n\n",
"pgaOJM29ven6Ng0+e22vf9gOyZtDJWkGkhWvwLJl8G9b5RFg//6qlxbVvToL9UhHkpaY5cu6zgPb\n",
"p9Q0DENHkmZkH93dJpN9n3bjfTrDM3QkacYm79OZHPcM/rA8JJSkGZnuv87xv9MZlkc6kjQjkwEz\n",
"eHOooTMsQ0eSZqToQmbyyTevtJqG4ek1SZqR8OqQKbymMzxDR5KGlOT+14bM1BDSoRg6kjS048/t\n",
"AmbqkY6hMyxDR5JmbGroaFiGjiQN7UUOstv8Qc8NGVuGjiQNbT/dNZzJXWd3lFNV/25ULRo3ho4k\n",
"Da3ogmdle+2D7WfK0JGkGfF6zuEwdCRpaNPdj+M9OjPhEwkkaQhJ1rWxwSoe7cyMRzqSNJRj7+t+\n",
"enrtcBg6kjQU/6O2ueBGlKSh7D3YhN8ebIJey9CRpKH8GnjDwOvf36OzYiTNGVOGjiS9jiS/68Z+\n",
"A0y06vOjas5YM3Qk6XUdf+SoW7BYGDqSpN54n44kHUKSQ/1f1P+vt4YsEh7pSNIhHT/NfvL3nQiO\n",
"6bkxY8/QkaSDeO1Rzqs6EXhn6CwYOpI0jSR7u6Oc54EjpptlS89NWhQMHUmaIsnVcPzyA89Ze4UD\n",
"wVMA+6tq3UgaN+ZStbiPEJNUVfkYWElDS46v7m/y3XQ3hA7en7MbYH1V3T+i5vVivvadho4kNe0a\n",
"zjI4nlff/HkMcBTwO+DX+6tq2vNti8l87TvtMi1pyTsQNtAFzktT5jiK7rTa0gic+WToSFqyXhs2\n",
"k3/Y/xY4GniZ7rRaAc9j4Bw+Q0fSkvHqkJk0GDbQrtlwIHCgnWr7v/PbuqXB0JG0KE0fMIMOFjZT\n",
"/b632p/MVduWMkNH0lh6/VCZamrITDpY2BxPO632QlWtnGn7ND1DR9JIJdkDHDe3az1YwAw6VNgA\n",
"PG+ngXkw9qGTZD3w93R3bn2+qq4fcZOksTLzI4aFYJhQmepgITO4TjBs5tdY36eT5AjgX4H3ADuA\n",
"7wN/UVU/HZhnyd+nM547FS0NswmPYbxewExtAxg2r+Z9OtM7G3iqqp4GSLIJuAD46aEWWkoMHB3a\n",
"fO3059tMQmUqQ2aUxj10TgGeGXi9HThnRG1ZoI5fNp47FS0NhxMewzBgFppxD52hzg0muW7g5daq\n",
"2jovrZHGznzv9OeboTJXkqwF1s73+4x76OwAVg28XkV3tPMqVXVdXw1aeJ7fj6fXtOAZHqPW/hjf\n",
"Ovk6ybXz8T7jHjoPAauTnA48C1wE/MUoG7TQVNURXtfR63Onr36MdehU1b4k/xG4n67L9O2DPdfU\n",
"cSciaaEY6y7Tw7DLtCTN3HztOz3lIknqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEj\n",
"SeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nq\n",
"jaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2h\n",
"I0nqjaEjSeqNoSNJ6o2hI0nqjaEjSerNrEMnyXVJtif5URvOG5h2VZInkzye5NyB+llJHm3Tbh6o\n",
"H53k7lZ/IMlpA9M2JHmiDR8ZqJ+R5MG2zKYkR872s0iS+nE4RzoF3FhVZ7bhnwCSrAEuAtYA64Fb\n",
"k6QtcxtwSVWtBlYnWd/qlwC7Wv0m4Pq2rgngGuDsNlyb5Li2zPXADW2Z59s6dAhJ1o66DQuF2+IA\n",
"t8UBbov5d7in1zJN7QLgrqraW1VPA08B5yQ5CTi2qra1+e4ALmzj5wMb2/i9wLvb+Dpgc1Xtqao9\n",
"wBbgvBZi7wK+0ubbOLAuHdzaUTdgAVk76gYsIGtH3YAFZO2oG7DYHW7o/G2SHye5PcnKVjsZ2D4w\n",
"z3bglGnqO1qd9vMZgKraB7yQ5IRDrGsC2FNV+6dZlyRpgTpk6CTZ0q7BTB3OpztVdgbwduDnwA09\n",
"tBe603qSpDG0/FATq+q9w6wkyeeBf2wvdwCrBiafSneEsqONT61PLvMm4Nkky4HjqmpXkh28+nB3\n",
"FfAdYDewMsmydrRzalvHwdpnUDVJrh11GxYKt8UBbosD3Bbz65ChcyhJTqqqn7eX7wMebePfAL6c\n",
"5Ea6U16rgW1VVUleTHIOsA34MHDLwDIbgAeA9wPfbvXNwGfbqbsA7wWuaOv6LvAB4O627Nema2dV\n",
"TXfdSZI0Aqma3UFAkjvoTq0V8DPgr6pqZ5t2NfAxYB/w8aq6v9XPAr4ErADuq6rLWv1o4E7gTGAX\n",
"cHHrhECSjwJXt7f9TFVtbPUzgE1013d+CHyoqvbO6sNIknox69CRJGmmxvqJBH3doDrukqxv2+HJ\n",
"JFeMuj3zJcnTSR5p34VtrTbROsQ8kWTzQC/LGX9HFrIkX0iyM8mjA7U5++zj9PtxkG2xJPcVSVYl\n",
"+W6Sf0nykySTZ5dG992oqrEdgGuBT0xTXwM8DBwJnE53r9DkUd024Ow2fh+wvo1fCtzaxi8CNo36\n",
"883RNjqiff7T2/Z4GHjrqNs1T5/1Z8DElNrfAf+ljV8BfG6235GFPAB/Rnd6+tH5+Ozj9PtxkG2x\n",
"JPcVwBuBt7fxY4B/Bd46yu/GWB/pNPN9g+q4Oxt4qqqeru6a1ya67bNYTf0+DP67Dt5EPJvvyIJV\n",
"Vd+jezLHoLn87GPz+3GQbQFLcF9RVc9V1cNt/NfAT+k6eI3su7EYQmc+b1CdmNeW9+P3n6uZ3BaL\n",
"UQHfSvJQkr9stROrdXABdgIntvHZfEfGzVx+9sXw+7Gk9xVJTqc7AnyQEX43FnzoZGHeoDpOllJP\n",
"kXdU1ZnAecDfJPmzwYnVHf8vpe3xe0v5szdLel+R5Bi6o5CPV9WvBqf1/d2Y9X06fanR3qC6+zCa\n",
"vlBM3RarePVfLItGtfvGquqXSb5Kd2pxZ5I3VtVz7RTBL9rsM/mOHPTG4wVuLj77ovj9qKrJz77k\n",
"9hXpnsB/L3BnVU3ezziy78aCP9I5lLaxJk29QfXiJEelu59n8gbV54AXk5yTJHQ3qH59YJkNbXzw\n",
"BtVx9xDdE71PT3IU3YW+b4y4TXMuyRuSHNvG/xA4l+77MPjvOngT8Uy+I9PeeDwG5uKzL4rfj6W6\n",
"r2htvx14rKr+fmDS6L4bo+5dcZg9M+4AHgF+3DbaiQPTrqa7CPY4sG6gfhbdF+4p4JaB+tHAPcCT\n",
"dE9GOH3Un28Ot9N5dL1WngKuGnV75ukznkHX6+Zh4CeTn5Pu5uFvAU/QPeFi5Wy/Iwt5AO4CngV+\n",
"R3d+/aNz+dnH6fdjmm3xsaW6rwDeCexvvxc/asP6UX43vDlUktSbsT69JkkaL4aOJKk3ho4kqTeG\n",
"jiSpN4aOJKk3ho4kqTeGjiSpN4aOJKk3/x9oiq1gfMdI4AAAAABJRU5ErkJggg==\n"
],
"text/plain": [
"<matplotlib.figure.Figure at 0x37b6d30>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"#Ignore this line, it's just for the presentation\n",
"%matplotlib inline \n",
"\n",
"def get_name(line):\n",
" \"\"\"returns the transcript name of an RNA from a line of a GFF3 file\"\"\"\n",
" columns = line.split()\n",
" info_column = columns[-1]\n",
" semi_columns = info_column.split(\";\")\n",
" transcript_name_column = semi_columns[9]\n",
" name_split = transcript_name_column.split(\"=\")\n",
" name = name_split[1]\n",
" return name\n",
"\n",
"#Open the file\n",
"folder = \"C:/Users/Jessime/Research/\"\n",
"file_path = folder+\"gencode.v22.long_noncoding_RNAs.gff3\"\n",
"gff3 = open(file_path)\n",
"\n",
"#Ignore the header lines\n",
"print \"#\"*35+\"Header\"+\"#\"*35\n",
"for i in range(7):\n",
" print gff3.readline()\n",
"print \"#\"*35+\"End Header\"+\"#\"*35+\"\\n\"*3\n",
"\n",
"#Place to store transcript lines\n",
"transcripts = []\n",
"\n",
"#Loop through the rest of the lines\n",
"for line in gff3:\n",
" columns = line.split() #Google 'get column of text python'\n",
" line_type = columns[2]\n",
" \n",
" #Check to see if the 3rd column is a transcript\n",
" if line_type == \"transcript\":\n",
" #If it is, save that line\n",
" transcripts.append(line)\n",
"print \"Number of transcripts: \", len(transcripts)\n",
" \n",
"#Place to store 01 transcript lines\n",
"transcripts01 = []\n",
"\n",
"#Loop through each transcript\n",
"for line in transcripts:\n",
" \n",
" #Extract the name from the line\n",
" name = get_name(line)\n",
" \n",
" #Check if name ends in 01\n",
" if name[-2:] == \"01\":\n",
" #If it does, save that line\n",
" transcripts01.append(line)\n",
"print \"Number of 01 transcripts: \", len(transcripts01), \"\\n\"*3\n",
" \n",
"#Make containers for the spans\n",
"spans = [] \n",
"span_lookup = {} #A dictionary will be perfect for our purposes here\n",
"\n",
"#Loop through all of the 01 transcripts\n",
"for line in transcripts01:\n",
" #Find the span\n",
" columns = line.split()\n",
" start = int(columns[3])\n",
" end = int(columns[4])\n",
" span = end - start\n",
" \n",
" #Populate the dictionary\n",
" name = get_name(line)\n",
" span_lookup[name] = span\n",
" \n",
" #Populate the list\n",
" spans.append(span)\n",
"\n",
"\n",
"print \"Number of spans: \", len(spans)\n",
"print \"The span of RP11-72L22.1-001 is: \", span_lookup[\"RP11-72L22.1-001\"]\n",
"print \"The span of XIST-001 is: \", span_lookup[\"XIST-001\"], \"\\n\"*3\n",
"\n",
"#Coordinate lists for the scatter plot\n",
"x = range(len(spans))\n",
"y = sorted(spans)\n",
"\n",
"#Generate scatter plot\n",
"plt.scatter(x,y)\n",
"plt.show()\n",
"\n",
"gff3.close()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Congratulations! We have working code, that we created from scratch. \n",
"\n",
"Now, if time allows, we can go back through and make edits. Edits should focus on keeping code well documents, easy to read, computationally efficient, and reasonably generalized."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"What improvements could we make to our scripts?"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"image/png": [
"iVBORw0KGgoAAAANSUhEUgAAAZ0AAAEACAYAAABoJ6s/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n",
"AAALEgAACxIB0t1+/AAAGIdJREFUeJzt3X+sX3Wd5/HnqxSYOjCUy0yQHxVIrInduJFhA5voZGpU\n",
"Wv4BTFSYRG2UTGaH2cFdMcuPbABXY2SywMAfMH+IWshKIRJ/TJaR1h9NnD+goiI4yAC7sqFFqttS\n",
"0BWxpe/943yu/XK5Ld97e+/53u+9z0dycs/3fX58P9/T7z2ve875nNNUFZIk9WHZqBsgSVo6DB1J\n",
"Um8MHUlSbwwdSVJvDB1JUm8MHUlSb4YKnSSrknw3yb8k+UmSy1p9IsmWJE8k2Zxk5cAyVyV5Msnj\n",
"Sc4dqJ+V5NE27eaB+tFJ7m71B5KcNjBtQ3uPJ5J8ZKB+RpIH2zKbkhx5uBtEkjR/hj3S2Qv856r6\n",
"N8C/B/4myVuBK4EtVfUW4NvtNUnWABcBa4D1wK1J0tZ1G3BJVa0GVidZ3+qXALta/Sbg+rauCeAa\n",
"4Ow2XJvkuLbM9cANbZnn2zokSQvUUKFTVc9V1cNt/NfAT4FTgPOBjW22jcCFbfwC4K6q2ltVTwNP\n",
"AeckOQk4tqq2tfnuGFhmcF33Au9u4+uAzVW1p6r2AFuA81qIvQv4yjTvL0lagGZ8TSfJ6cCZwIPA\n",
"iVW1s03aCZzYxk8Gtg8stp0upKbWd7Q67eczAFW1D3ghyQmHWNcEsKeq9k+zLknSAjSj0ElyDN1R\n",
"yMer6leD06p7nk5fz9Tx2T2SNIaWDztju0h/L3BnVX2tlXcmeWNVPddOnf2i1XcAqwYWP5XuCGVH\n",
"G59an1zmTcCzSZYDx1XVriQ7gLUDy6wCvgPsBlYmWdaOdk5t65jabgNKkmahqvL6c818pa87AKG7\n",
"/nLTlPrfAVe08SuBz7XxNcDDwFHAGcD/AtKmPQic09Z5H7C+1S8FbmvjFwOb2vgE8L+BlcDxk+Nt\n",
"2j3ARW38H4D/ME3ba5jPuBQG4LpRt2GhDG4Lt4Xb4nW3Rc3Heoc90nkH8CHgkSQ/arWrgM8B9yS5\n",
"BHga+GBr6WNJ7gEeA/YBl1b7FC1cvgSsAO6rqm+2+u3AnUmeBHa14KGqdif5NPD9Nt+nqutQAHAF\n",
"sCnJZ4AftnVIkhaooUKnqv6Zg1//ec9Blvks8Nlp6j8A3jZN/WVaaE0z7YvAF6ep/4zuqEmSNAZ8\n",
"IsHSsnXUDVhAto66AQvI1lE3YAHZOuoGLHY5cNZrcUpSNR8XwyRpEZuvfadHOpKk3hg6kqTeGDqS\n",
"pN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTe\n",
"GDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6\n",
"kqTeGDqSpN4YOpKk3hg6kqTeGDqSpN4YOpKk3hg6kqTeGDqSNAtJ1iUnbO6GrBt1e8ZFqmrUbZhX\n",
"SaqqMup2SFo8upD5o6/CLSu6ymUvwYvvq6r7R9uyuTNf+86hjnSSfCHJziSPDtSuS7I9yY/acN7A\n",
"tKuSPJnk8STnDtTPSvJom3bzQP3oJHe3+gNJThuYtiHJE234yED9jCQPtmU2JTnycDaEJA1v4vIu\n",
"cDbQDbes6Gp6PcOeXvsisH5KrYAbq+rMNvwTQJI1wEXAmrbMrUkm0/I24JKqWg2sTjK5zkuAXa1+\n",
"E3B9W9cEcA1wdhuuTXJcW+Z64Ia2zPNtHZKkBWyo0Kmq79Ht2Kea7tDrAuCuqtpbVU8DTwHnJDkJ\n",
"OLaqtrX57gAubOPnAxvb+L3Au9v4OmBzVe2pqj3AFuC8FmLvAr7S5ts4sC5Jmme7b+hOqW2kGy57\n",
"qavp9RxuR4K/TfLjJLcnWdlqJwPbB+bZDpwyTX1Hq9N+PgNQVfuAF5KccIh1TQB7qmr/NOuSpHnV\n",
"Xbt58X3wiS3dsLiu58yn5Yex7G3Af2vjnwZuoJ9TXDPu+ZDkuoGXW6tq65y1RtKS1EJm0QRNkrXA\n",
"2vl+n1mHTlX9YnI8yeeBf2wvdwCrBmY9le4IZUcbn1qfXOZNwLNJlgPHVdWuJDt49UZYBXwH2A2s\n",
"TLKsHe2c2tZxsLZeN9PPJ0lLSftjfOvk6yTXzsf7zPr0WrtGM+l9wGTPtm8AFyc5KskZwGpgW1U9\n",
"B7yY5Jx2TebDwNcHltnQxt8PfLuNbwbOTbIyyfHAe4H7q+vn/V3gA22+DcDXZvtZJEn9GOpIJ8ld\n",
"wJ8Df5zkGeBaYG2St9Od7voZ8FcAVfVYknuAx4B9wKV14GagS4EvASuA+6rqm61+O3BnkieBXcDF\n",
"bV27k3wa+H6b71OtQwHAFcCmJJ8BftjWIUlawLw5VJL0GiO9OVSSpLlg6EiSemPoSJJ6Y+hI0iz4\n",
"lOnZsSOBJM2QT5mevcN5IoEkLVETl8ONKw7cXsgK+MTlLKInFMwXT69JknrjkY4kzdjuG+Cyd9Ld\n",
"6E47veZTpofgNR1JmoXuus7kf9y2+4bFdD0H5m/faehIkl7DJxJIksaeoSNJ6o2hI0nqjaEjSeqN\n",
"oSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEj\n",
"SeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSerNUKGT5AtJdiZ5\n",
"dKA2kWRLkieSbE6ycmDaVUmeTPJ4knMH6mclebRNu3mgfnSSu1v9gSSnDUzb0N7jiSQfGaifkeTB\n",
"tsymJEcezoaQJM2/YY90vgisn1K7EthSVW8Bvt1ek2QNcBGwpi1za5K0ZW4DLqmq1cDqJJPrvATY\n",
"1eo3Ade3dU0A1wBnt+HaJMe1Za4HbmjLPN/WIUlawIYKnar6Ht2OfdD5wMY2vhG4sI1fANxVVXur\n",
"6mngKeCcJCcBx1bVtjbfHQPLDK7rXuDdbXwdsLmq9lTVHmALcF4LsXcBX5nm/SVJC9ThXNM5sap2\n",
"tvGdwIlt/GRg+8B824FTpqnvaHXaz2cAqmof8EKSEw6xrglgT1Xtn2ZdkqQFak46ElRVATUX6xrm\n",
"7Xp6H0nSHFt+GMvuTPLGqnqunTr7RavvAFYNzHcq3RHKjjY+tT65zJuAZ5MsB46rql1JdgBrB5ZZ\n",
"BXwH2A2sTLKsHe2c2tYxrSTXDbzcWlVbZ/JBJWmxS7KWV+9v58XhhM43gA10F/Q3AF8bqH85yY10\n",
"p7xWA9uqqpK8mOQcYBvwYeCWKet6AHg/XccEgM3AZ1vPuADvBa5o6/ou8AHg7inv/xpVdd1hfE5J\n",
"WvTaH+NbJ18nuXY+3ifdmbHXmSm5C/hz4I/prt9cA3wduIfuCOVp4IPtYj9JrgY+BuwDPl5V97f6\n",
"WcCXgBXAfVV1WasfDdwJnAnsAi5unRBI8lHg6taUz1TVxlY/A9hEd33nh8CHqmrvNG2vqsrUuiTp\n",
"4OZr3zlU6IwzQ0eSZm6+9p0+kUCS1BtDR5LUG0NHktQbQ0eSZiHJuuSEzd2QdaNuz7iwI4EkzVAX\n",
"Mn/0VbhlRVe57CV48X2TPXUXg/nadx7OfTqStERNXA43ruhuEQRgBXzicmDRhM588fSaJKk3HulI\n",
"0oztvgEueyfdje6002s3jLRJY8JrOpI0C911nYnLu1e7b1hM13PAJxLMmqEjSTPnEwkkSWPP0JEk\n",
"9cbQkST1xtCRJPXG0JEk9cbQkaRZ8Nlrs2OXaUmaIZ+9Nns+kUCSZsxnr82Wp9ckSb3xSEeSZsxn\n",
"r82W13QkaRZ89trseHpNktQbj3QkaYaSXA1/9Bm4pe1bLnsZXrxgMR3t+JTpWTJ0JM2l7rTacffB\n",
"zcsO9F7bCPynH1Y9f9Yo2zaX7DItSQvCxOXwhmkuTSw7rf+2jB9DR5Jm7BjgkwOvPwm8/H9G1Jix\n",
"YuhI0ozs3gq/fW+3+/yHVvvNPvjN1SNs1Niw95okzcjEWvhr4G3AL4FfA3lkMXUimE+GjiTNyP4T\n",
"usD5Z+ApulNrR462SWPE02uSNCMvv3ma6zlvHlVrxo1dpiVpBpKJgo8BP2uVM4AvULV7Ue1nvE9n\n",
"lgwdSXMpObbgD4D/3iqfBH5L1a8W1X7G+3QkaUF46RXYf8RAzzXg5VdG2aJxYuhI0owcuQ9ePgKe\n",
"aK9fbjUNw95rkjQjK5bDhcDxbbiw1TQMr+lI0gwkK34HRx0Jt7TKZcDv9la9dNQo2zXXFux/bZDk\n",
"6SSPJPlRkm2tNpFkS5InkmxOsnJg/quSPJnk8STnDtTPSvJom3bzQP3oJHe3+gNJThuYtqG9xxNJ\n",
"PnK4n0WSXt/yI2Af8F/bsK/VNIy5OL1WwNqqOrOqzm61K4EtVfUW4NvtNUnWABcBa4D1wK1JJpP0\n",
"NuCSqloNrE6yvtUvAXa1+k3A9W1dE8A1wNltuHYw3CRpnizrLoef2oblraZhzNWGmnoIdj7ds75p\n",
"Py9s4xcAd1XV3qp6mu523nOSnAQcW1Xb2nx3DCwzuK57gXe38XXA5qraU1V7gC10QSZJ82hfGw72\n",
"WocyV0c630ryUJK/bLUTq2pnG98JnNjGTwa2Dyy7HThlmvqOVqf9fAagqvYBLyQ54RDrkqR59Mp+\n",
"eIVul7OdbvyV/aNt0/iYix4X76iqnyf5E2BLkscHJ1ZVJVncvRUkLSF5DvafDL9tr/e3moZx2KFT\n",
"VT9vP3+Z5Kt011d2JnljVT3XTp39os2+A1g1sPipdH8q7GjjU+uTy7wJeDbJcuC4qtqVZAewdmCZ\n",
"VcB3pmtjkusGXm6tqq0z/ZyS1Jncbf5B+/mbUTVkTiVZy6v3qfPzPofTZTrJG4AjqupXSf4Q2Ax8\n",
"CngP3cX/65NcCaysqitbR4Iv0wXTKcC3gDe3o6EH6foebgP+J3BLVX0zyaXA26rqr5NcDFxYVRe3\n",
"jgQPAX9Kd03pB8Cftus7g220y7SkOZNM7IOPHTHl2WuvVO1eVPfqLNTH4JwIfLV1QFsO/I+q2pzk\n",
"IeCeJJcATwMfBKiqx5LcAzxGd+Xt0jqQepcCXwJWAPdV1Tdb/XbgziRPAruAi9u6dif5NPD9Nt+n\n",
"pgaOJM29ven6Ng0+e22vf9gOyZtDJWkGkhWvwLJl8G9b5RFg//6qlxbVvToL9UhHkpaY5cu6zgPb\n",
"p9Q0DENHkmZkH93dJpN9n3bjfTrDM3QkacYm79OZHPcM/rA8JJSkGZnuv87xv9MZlkc6kjQjkwEz\n",
"eHOooTMsQ0eSZqToQmbyyTevtJqG4ek1SZqR8OqQKbymMzxDR5KGlOT+14bM1BDSoRg6kjS048/t\n",
"AmbqkY6hMyxDR5JmbGroaFiGjiQN7UUOstv8Qc8NGVuGjiQNbT/dNZzJXWd3lFNV/25ULRo3ho4k\n",
"Da3ogmdle+2D7WfK0JGkGfF6zuEwdCRpaNPdj+M9OjPhEwkkaQhJ1rWxwSoe7cyMRzqSNJRj7+t+\n",
"enrtcBg6kjQU/6O2ueBGlKSh7D3YhN8ebIJey9CRpKH8GnjDwOvf36OzYiTNGVOGjiS9jiS/68Z+\n",
"A0y06vOjas5YM3Qk6XUdf+SoW7BYGDqSpN54n44kHUKSQ/1f1P+vt4YsEh7pSNIhHT/NfvL3nQiO\n",
"6bkxY8/QkaSDeO1Rzqs6EXhn6CwYOpI0jSR7u6Oc54EjpptlS89NWhQMHUmaIsnVcPzyA89Ze4UD\n",
"wVMA+6tq3UgaN+ZStbiPEJNUVfkYWElDS46v7m/y3XQ3hA7en7MbYH1V3T+i5vVivvadho4kNe0a\n",
"zjI4nlff/HkMcBTwO+DX+6tq2vNti8l87TvtMi1pyTsQNtAFzktT5jiK7rTa0gic+WToSFqyXhs2\n",
"k3/Y/xY4GniZ7rRaAc9j4Bw+Q0fSkvHqkJk0GDbQrtlwIHCgnWr7v/PbuqXB0JG0KE0fMIMOFjZT\n",
"/b632p/MVduWMkNH0lh6/VCZamrITDpY2BxPO632QlWtnGn7ND1DR9JIJdkDHDe3az1YwAw6VNgA\n",
"PG+ngXkw9qGTZD3w93R3bn2+qq4fcZOksTLzI4aFYJhQmepgITO4TjBs5tdY36eT5AjgX4H3ADuA\n",
"7wN/UVU/HZhnyd+nM547FS0NswmPYbxewExtAxg2r+Z9OtM7G3iqqp4GSLIJuAD46aEWWkoMHB3a\n",
"fO3059tMQmUqQ2aUxj10TgGeGXi9HThnRG1ZoI5fNp47FS0NhxMewzBgFppxD52hzg0muW7g5daq\n",
"2jovrZHGznzv9OeboTJXkqwF1s73+4x76OwAVg28XkV3tPMqVXVdXw1aeJ7fj6fXtOAZHqPW/hjf\n",
"Ovk6ybXz8T7jHjoPAauTnA48C1wE/MUoG7TQVNURXtfR63Onr36MdehU1b4k/xG4n67L9O2DPdfU\n",
"cSciaaEY6y7Tw7DLtCTN3HztOz3lIknqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEj\n",
"SeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nq\n",
"jaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2hI0nqjaEjSeqNoSNJ6o2h\n",
"I0nqjaEjSeqNoSNJ6o2hI0nqjaEjSerNrEMnyXVJtif5URvOG5h2VZInkzye5NyB+llJHm3Tbh6o\n",
"H53k7lZ/IMlpA9M2JHmiDR8ZqJ+R5MG2zKYkR872s0iS+nE4RzoF3FhVZ7bhnwCSrAEuAtYA64Fb\n",
"k6QtcxtwSVWtBlYnWd/qlwC7Wv0m4Pq2rgngGuDsNlyb5Li2zPXADW2Z59s6dAhJ1o66DQuF2+IA\n",
"t8UBbov5d7in1zJN7QLgrqraW1VPA08B5yQ5CTi2qra1+e4ALmzj5wMb2/i9wLvb+Dpgc1Xtqao9\n",
"wBbgvBZi7wK+0ubbOLAuHdzaUTdgAVk76gYsIGtH3YAFZO2oG7DYHW7o/G2SHye5PcnKVjsZ2D4w\n",
"z3bglGnqO1qd9vMZgKraB7yQ5IRDrGsC2FNV+6dZlyRpgTpk6CTZ0q7BTB3OpztVdgbwduDnwA09\n",
"tBe603qSpDG0/FATq+q9w6wkyeeBf2wvdwCrBiafSneEsqONT61PLvMm4Nkky4HjqmpXkh28+nB3\n",
"FfAdYDewMsmydrRzalvHwdpnUDVJrh11GxYKt8UBbosD3Bbz65ChcyhJTqqqn7eX7wMebePfAL6c\n",
"5Ea6U16rgW1VVUleTHIOsA34MHDLwDIbgAeA9wPfbvXNwGfbqbsA7wWuaOv6LvAB4O627Nema2dV\n",
"TXfdSZI0Aqma3UFAkjvoTq0V8DPgr6pqZ5t2NfAxYB/w8aq6v9XPAr4ErADuq6rLWv1o4E7gTGAX\n",
"cHHrhECSjwJXt7f9TFVtbPUzgE1013d+CHyoqvbO6sNIknox69CRJGmmxvqJBH3doDrukqxv2+HJ\n",
"JFeMuj3zJcnTSR5p34VtrTbROsQ8kWTzQC/LGX9HFrIkX0iyM8mjA7U5++zj9PtxkG2xJPcVSVYl\n",
"+W6Sf0nykySTZ5dG992oqrEdgGuBT0xTXwM8DBwJnE53r9DkUd024Ow2fh+wvo1fCtzaxi8CNo36\n",
"883RNjqiff7T2/Z4GHjrqNs1T5/1Z8DElNrfAf+ljV8BfG6235GFPAB/Rnd6+tH5+Ozj9PtxkG2x\n",
"JPcVwBuBt7fxY4B/Bd46yu/GWB/pNPN9g+q4Oxt4qqqeru6a1ya67bNYTf0+DP67Dt5EPJvvyIJV\n",
"Vd+jezLHoLn87GPz+3GQbQFLcF9RVc9V1cNt/NfAT+k6eI3su7EYQmc+b1CdmNeW9+P3n6uZ3BaL\n",
"UQHfSvJQkr9stROrdXABdgIntvHZfEfGzVx+9sXw+7Gk9xVJTqc7AnyQEX43FnzoZGHeoDpOllJP\n",
"kXdU1ZnAecDfJPmzwYnVHf8vpe3xe0v5szdLel+R5Bi6o5CPV9WvBqf1/d2Y9X06fanR3qC6+zCa\n",
"vlBM3RarePVfLItGtfvGquqXSb5Kd2pxZ5I3VtVz7RTBL9rsM/mOHPTG4wVuLj77ovj9qKrJz77k\n",
"9hXpnsB/L3BnVU3ezziy78aCP9I5lLaxJk29QfXiJEelu59n8gbV54AXk5yTJHQ3qH59YJkNbXzw\n",
"BtVx9xDdE71PT3IU3YW+b4y4TXMuyRuSHNvG/xA4l+77MPjvOngT8Uy+I9PeeDwG5uKzL4rfj6W6\n",
"r2htvx14rKr+fmDS6L4bo+5dcZg9M+4AHgF+3DbaiQPTrqa7CPY4sG6gfhbdF+4p4JaB+tHAPcCT\n",
"dE9GOH3Un28Ot9N5dL1WngKuGnV75ukznkHX6+Zh4CeTn5Pu5uFvAU/QPeFi5Wy/Iwt5AO4CngV+\n",
"R3d+/aNz+dnH6fdjmm3xsaW6rwDeCexvvxc/asP6UX43vDlUktSbsT69JkkaL4aOJKk3ho4kqTeG\n",
"jiSpN4aOJKk3ho4kqTeGjiSpN4aOJKk3/x9oiq1gfMdI4AAAAABJRU5ErkJggg==\n"
],
"text/plain": [
"<matplotlib.figure.Figure at 0xd35d518>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"#Ignore this line, it's just for the presentation\n",
"%matplotlib inline \n",
"\n",
"def main(infile, outfile): \n",
" #Open the file\n",
" with open(infile) as gff3, open(outfile, \"w\") as outfile:\n",
" \n",
" #Ignore the header lines\n",
" for i in range(7):\n",
" gff3.readline()\n",
"\n",
" #Make containers for the spans\n",
" spans = [] \n",
" span_lookup = {}\n",
" \n",
" #Loop through the rest of the lines\n",
" for line in gff3:\n",
" columns = line.split() \n",
" line_type = columns[2]\n",
" \n",
" #Check to see if the 3rd column is a transcript\n",
" if line_type == \"transcript\": \n",
" #Extract the name from the line\n",
" name = line.split()[-1].split(\";\")[9].split(\"=\")[1]\n",
" \n",
" #Check if name ends in 01\n",
" if name[-2:] == \"01\":\n",
" #Populate the dictionary\n",
" span = int(columns[4]) - int(columns[3])\n",
" span_lookup[name] = span\n",
" #Populate the list\n",
" spans.append(span)\n",
" \n",
" #Save span lookup table\n",
" outfile.write(\"{}\".format(span_lookup))\n",
" \n",
" #Generate scatter plot\n",
" plt.scatter(range(len(spans)), sorted(spans))\n",
" plt.show()\n",
" \n",
"if __name__ == \"__main__\":\n",
" folder = \"C:/Users/Jessime/Research/\"\n",
" file_path = folder+\"gencode.v22.long_noncoding_RNAs.gff3\"\n",
" outfile = folder+\"transcript_spans.txt\"\n",
" main(file_path, outfile)"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 0
}