@nchelaru
Created November 6, 2019 01:30
7a. Drosophila CNS transcriptome analyses.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"# Index reference transcripts \n",
"\n",
"Reference transcripts were obtained from Ensembl, which imported them from FlyBase release dmel_6.17 (FB2017_04)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## Combine coding & non-coding RNAs\n",
"cat DM_cDNA.fa DM_ncRNA.fa > DM_all_RNA.fa\n",
"\n",
"## Create Salmon index (k=31) \n",
"salmon index -t DM_all_RNA.fa -i ~/DM_all_RNA_k31_Jun28_salmon_index --type quasi -k 31"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true,
"kernel": "SoS"
},
"source": [
"# Pre-processing & mapping"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"kernel": "SoS"
},
"source": [
"## SRR3478195 (15-76 bp)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true,
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## FastQC\n",
"fastqc -o . -f fastq --extract SRR3478195.fastq.gz -t 8\n",
"\n",
"## rCorrector \n",
"perl ~/install/Rcorrector-master/run_rcorrector.pl -t 10 -s ./SRR3478195.fastq.gz\n",
"\n",
"## Filter\n",
"python ~/FilterUncorrectabledSEfastq.py -i SRR3478195.cor.fq.gz -o filtered\n",
"\n",
"## fastp\n",
"fastp -i filtered_SRR3478195.cor.fq -o filtered_SRR3478195_fastp.cor.fq \\\n",
"-q 5 -c -p -w 10 -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTAT \\\n",
"-j filtered_SRR3478195_fastp.json -h filtered_SRR3478195_fastp.html \\\n",
"-R \"filtered_SRR3478195_fastp report\"\n",
" \n",
"## FastQC \n",
"fastqc -o . -f fastq --extract filtered_SRR3478195_fastp.cor.fq -t 6\n",
" \n",
"## Mapping to k=31 index ---> 97.3689% reads mapped \n",
"salmon quant -i ~/DM_all_RNA_k31_Jun28_salmon_index -l A \\\n",
"-r filtered_SRR3478195_fastp.cor.fq \\\n",
"-o ~/filtered_SRR3478195_fastp_DM_all_RNA_k31_Jun28_salmon_quant"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"kernel": "SoS"
},
"source": [
"## SRR3478196 (76 bp)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true,
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## FastQC\n",
"fastqc -o . -f fastq --extract SRR3478196.fastq.gz -t 8\n",
"\n",
"## rCorrector \n",
"perl ~/install/Rcorrector-master/run_rcorrector.pl -t 10 -s ./SRR3478196.fastq.gz\n",
"\n",
"## Filter \n",
"python ~/FilterUncorrectabledSEfastq.py -i SRR3478196.cor.fq.gz -o filtered\n",
"\n",
"## fastp \n",
"fastp -i filtered_SRR3478196.cor.fq -o filtered_SRR3478196_fastp.cor.fq \\\n",
"-q 5 -c -p -w 6 -A \\\n",
"-j filtered_fastp_SRR3478196.json -h filtered_fastp_SRR3478196.html \\\n",
"-R \"filtered_fastp_SRR3478196 report\"\n",
" \n",
"## FastQC \n",
"fastqc -o . -f fastq --extract filtered_SRR3478196_fastp.cor.fq -t 6\n",
"\n",
"## Mapping (k=31) ---> 97.1142% reads mapped \n",
"salmon quant -i ~/DM_all_RNA_k31_Jun28_salmon_index -l A \\\n",
"-r filtered_SRR3478196_fastp.cor.fq \\\n",
"-o ~/filtered_SRR3478196_fastp_DM_all_RNA_k31_Jun28_salmon_quant"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"kernel": "SoS"
},
"source": [
"## SRR3478197 (76 bp)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true,
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## FastQC\n",
"fastqc -o . -f fastq --extract SRR3478197.fastq.gz -t 8\n",
"\n",
"## rCorrector \n",
"perl ~/install/Rcorrector-master/run_rcorrector.pl -t 10 -s ./SRR3478197.fastq.gz\n",
"\n",
"## Filter \n",
"python ~/FilterUncorrectabledSEfastq.py -i SRR3478197.cor.fq.gz -o filtered\n",
"\n",
"## fastp \n",
"fastp -i filtered_SRR3478197.cor.fq -o filtered_SRR3478197_fastp.cor.fq \\\n",
"-q 5 -c -p -w 6 -A \\\n",
"-j filtered_fastp_SRR3478197.json -h filtered_fastp_SRR3478197.html \\\n",
"-R \"filtered_fastp_SRR3478197 report\"\n",
" \n",
"## FastQC \n",
"fastqc -o . -f fastq --extract filtered_SRR3478197_fastp.cor.fq -t 6\n",
"\n",
"## Mapping (k=31) ---> 97.148% reads mapped \n",
"salmon quant -i ~/DM_all_RNA_k31_Jun28_salmon_index -l A \\\n",
"-r filtered_SRR3478197_fastp.cor.fq \\\n",
"-o ~/filtered_SRR3478197_fastp_DM_all_RNA_k31_Jun28_salmon_quant"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"kernel": "SoS"
},
"source": [
"## SRR3478217 (76 bp)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true,
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## FastQC\n",
"fastqc -o . -f fastq --extract SRR3478217.fastq.gz -t 8\n",
"\n",
"## rCorrector \n",
"perl ~/install/Rcorrector-master/run_rcorrector.pl -t 10 -s ./SRR3478217.fastq.gz\n",
"\n",
"## Filter \n",
"python ~/FilterUncorrectabledSEfastq.py -i SRR3478217.cor.fq.gz -o filtered\n",
"\n",
"## fastp \n",
"fastp -i filtered_SRR3478217.cor.fq -o filtered_SRR3478217_fastp.cor.fq \\\n",
"-q 5 -c -p -w 6 \\\n",
"-j filtered_fastp_SRR3478217.json -h filtered_fastp_SRR3478217.html \\\n",
"-R \"filtered_fastp_SRR3478217 report\"\n",
"\n",
"## FastQC \n",
"fastqc -o . -f fastq --extract filtered_SRR3478217_fastp.cor.fq -t 6\n",
" \n",
"## Mapping (k=31) ---> 96.9379% reads mapped \n",
"salmon quant -i ~/DM_all_RNA_k31_Jun28_salmon_index -l A \\\n",
"-r filtered_SRR3478217_fastp.cor.fq \\\n",
"-o ~/filtered_SRR3478217_fastp_DM_all_RNA_k31_Jun28_salmon_quant"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"kernel": "SoS"
},
"source": [
"## SRR3478218 (76 bp)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true,
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## FastQC\n",
"fastqc -o . -f fastq --extract SRR3478218.fastq.gz -t 8\n",
"\n",
"## rCorrector \n",
"perl ~/install/Rcorrector-master/run_rcorrector.pl -t 10 -s ./SRR3478218.fastq.gz\n",
"\n",
"## Filter \n",
"python ~/FilterUncorrectabledSEfastq.py -i SRR3478218.cor.fq.gz -o filtered\n",
"\n",
"# fastp \n",
"fastp -i filtered_SRR3478218.cor.fq -o filtered_SRR3478218_fastp.cor.fq \\\n",
"-q 5 -c -p -A -w 6 \\\n",
"-j filtered_SRR3478218_fastp.json -h filtered_SRR3478218_fastp.html \\\n",
"-R \"filtered_SRR3478218_fastp report\"\n",
"\n",
"## FastQC \n",
"fastqc -o . -f fastq --extract filtered_SRR3478218_fastp.cor.fq -t 6\n",
"\n",
"## Mapping (k=31) ---> 97.2746% reads mapped \n",
"salmon quant -i ~/DM_all_RNA_k31_Jun28_salmon_index -l A \\\n",
"-r filtered_SRR3478218_fastp.cor.fq \\\n",
"-o ~/filtered_SRR3478218_fastp_DM_all_RNA_k31_Jun28_salmon_quant"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"kernel": "SoS"
},
"source": [
"## SRR3478219 (76 bp)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true,
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## FastQC\n",
"fastqc -o . -f fastq --extract SRR3478219.fastq.gz -t 8\n",
"\n",
"## rCorrector \n",
"perl ~/install/Rcorrector-master/run_rcorrector.pl -t 10 -s ./SRR3478219.fastq.gz\n",
"\n",
"## Filter \n",
"python ~/FilterUncorrectabledSEfastq.py -i SRR3478219.cor.fq.gz -o filtered\n",
"\n",
"## fastp \n",
"fastp -i filtered_SRR3478219.cor.fq -o filtered_SRR3478219_fastp.cor.fq \\\n",
"-q 5 -c -p -w 6 \\\n",
"-j filtered_SRR3478219_fastp.json -h filtered_SRR3478219_fastp.html \\\n",
"-R \"filtered_SRR3478219_fastp report\"\n",
"\n",
"## FastQC \n",
"fastqc -o . -f fastq --extract filtered_SRR3478219_fastp.cor.fq -t 6\n",
" \n",
"## Mapping (k=31) ---> 97.2507% reads mapped \n",
"salmon quant -i ~/DM_all_RNA_k31_Jun28_salmon_index -l A \\\n",
"-r filtered_SRR3478219_fastp.cor.fq \\\n",
"-o ~/filtered_SRR3478219_fastp_DM_all_RNA_k31_Jun28_salmon_quant"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true,
"kernel": "SoS"
},
"source": [
"# MultiQC summary of QC & mapping \n",
"\n",
"The report files are named Jun29 but summarize the data analyzed on Jun28 (k=31 mapping).\n",
"\n",
"- [DM_multiqc_report_Jun29.html](https://www.dropbox.com/s/9wwtw1fybkh9ss9/DM_multiqc_report_Jun29.html?dl=0)\n",
"- [DM multiQC summary - Jun29.xlsx](https://www.dropbox.com/s/csy8aaw6cacdxfq/DM%20multiQC%20summary%20-%20Jun29.xlsx?dl=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"## MultiQC command to summarize all FastQC and Salmon output logs in the current folder\n",
"multiqc ."
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"# Extract expressed transcripts"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "calysto_bash"
},
"source": [
"## Extract IDs and TPM values of transcripts with TPM>0 in each library"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"kernel": "Python3"
},
"outputs": [],
"source": [
"## Import libraries\n",
"import pandas as pd\n",
"import os\n",
"os.chdir(\"/home/zhanglab1/ndong/Lymnaea_CNS_transcriptome_files/7_Interspecies_comparison/7a_Drosophila\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"kernel": "Python3"
},
"outputs": [],
"source": [
"## Define function \n",
"def extract_non0(salmon_output_filename, library_ID):\n",
" with open(salmon_output_filename, \"r\") as infile:\n",
" lib = pd.read_csv(infile, sep='\\t')\n",
" lib_non0 = lib.loc[lib[\"TPM\"]>0] \n",
" lib_non0_counts = lib_non0[[\"Name\", \"TPM\"]] \n",
" print(\"There are\", lib_non0_counts.shape[0], \"transcripts with TPM>0 in the reads library\", library_ID)\n",
" return(lib_non0_counts)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"kernel": "Python3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 22356 transcripts with TPM>0 in the reads library SRR3478195\n",
"There are 22889 transcripts with TPM>0 in the reads library SRR3478196\n",
"There are 21790 transcripts with TPM>0 in the reads library SRR3478197\n",
"There are 22697 transcripts with TPM>0 in the reads library SRR3478217\n",
"There are 19936 transcripts with TPM>0 in the reads library SRR3478218\n",
"There are 20369 transcripts with TPM>0 in the reads library SRR3478219\n"
]
}
],
"source": [
"## Analyses\n",
"lib95 = extract_non0(\"./filtered_SRR3478195_fastp_DM_all_RNA_k31_Jun28_salmon_quant/quant.sf\", \"SRR3478195\")\n",
"lib96 = extract_non0(\"./filtered_SRR3478196_fastp_DM_all_RNA_k31_Jun28_salmon_quant/quant.sf\", \"SRR3478196\")\n",
"lib97 = extract_non0(\"./filtered_SRR3478197_fastp_DM_all_RNA_k31_Jun28_salmon_quant/quant.sf\", \"SRR3478197\")\n",
"lib17 = extract_non0(\"./filtered_SRR3478217_fastp_DM_all_RNA_k31_Jun28_salmon_quant/quant.sf\", \"SRR3478217\")\n",
"lib18 = extract_non0(\"./filtered_SRR3478218_fastp_DM_all_RNA_k31_Jun28_salmon_quant/quant.sf\", \"SRR3478218\")\n",
"lib19 = extract_non0(\"./filtered_SRR3478219_fastp_DM_all_RNA_k31_Jun28_salmon_quant/quant.sf\", \"SRR3478219\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true,
"kernel": "R"
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th></th><th scope=col>Name</th><th scope=col>TPM</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>0</th><td>FBtr0075502</td><td> 0.813066 </td></tr>\n",
"\t<tr><th scope=row>1</th><td>FBtr0300738</td><td> 0.102797 </td></tr>\n",
"\t<tr><th scope=row>2</th><td>FBtr0300739</td><td>24.321500 </td></tr>\n",
"\t<tr><th scope=row>3</th><td>FBtr0300737</td><td> 5.016550 </td></tr>\n",
"\t<tr><th scope=row>4</th><td>FBtr0300736</td><td> 0.416568 </td></tr>\n",
"\t<tr><th scope=row>5</th><td>FBtr0078628</td><td> 0.059027 </td></tr>\n",
"\t<tr><th scope=row>6</th><td>FBtr0300740</td><td> 0.015383 </td></tr>\n",
"\t<tr><th scope=row>7</th><td>FBtr0078627</td><td> 2.365450 </td></tr>\n",
"\t<tr><th scope=row>8</th><td>FBtr0300741</td><td> 0.558595 </td></tr>\n",
"\t<tr><th scope=row>9</th><td>FBtr0072762</td><td>18.382800 </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" & Name & TPM\\\\\n",
"\\hline\n",
"\t0 & FBtr0075502 & 0.813066 \\\\\n",
"\t1 & FBtr0300738 & 0.102797 \\\\\n",
"\t2 & FBtr0300739 & 24.321500 \\\\\n",
"\t3 & FBtr0300737 & 5.016550 \\\\\n",
"\t4 & FBtr0300736 & 0.416568 \\\\\n",
"\t5 & FBtr0078628 & 0.059027 \\\\\n",
"\t6 & FBtr0300740 & 0.015383 \\\\\n",
"\t7 & FBtr0078627 & 2.365450 \\\\\n",
"\t8 & FBtr0300741 & 0.558595 \\\\\n",
"\t9 & FBtr0072762 & 18.382800 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| <!--/--> | Name | TPM | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | FBtr0075502 | 0.813066 | \n",
"| 1 | FBtr0300738 | 0.102797 | \n",
"| 2 | FBtr0300739 | 24.321500 | \n",
"| 3 | FBtr0300737 | 5.016550 | \n",
"| 4 | FBtr0300736 | 0.416568 | \n",
"| 5 | FBtr0078628 | 0.059027 | \n",
"| 6 | FBtr0300740 | 0.015383 | \n",
"| 7 | FBtr0078627 | 2.365450 | \n",
"| 8 | FBtr0300741 | 0.558595 | \n",
"| 9 | FBtr0072762 | 18.382800 | \n",
"\n",
"\n"
],
"text/plain": [
" Name TPM \n",
"0 FBtr0075502 0.813066\n",
"1 FBtr0300738 0.102797\n",
"2 FBtr0300739 24.321500\n",
"3 FBtr0300737 5.016550\n",
"4 FBtr0300736 0.416568\n",
"5 FBtr0078628 0.059027\n",
"6 FBtr0300740 0.015383\n",
"7 FBtr0078627 2.365450\n",
"8 FBtr0300741 0.558595\n",
"9 FBtr0072762 18.382800"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th></th><th scope=col>Name</th><th scope=col>TPM</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>2</th><td>FBtr0300739</td><td>20.8008000 </td></tr>\n",
"\t<tr><th scope=row>4</th><td>FBtr0300736</td><td> 0.8957440 </td></tr>\n",
"\t<tr><th scope=row>5</th><td>FBtr0078628</td><td> 0.0303317 </td></tr>\n",
"\t<tr><th scope=row>7</th><td>FBtr0078627</td><td> 0.3727980 </td></tr>\n",
"\t<tr><th scope=row>9</th><td>FBtr0072762</td><td>16.6405000 </td></tr>\n",
"\t<tr><th scope=row>10</th><td>FBtr0346821</td><td> 0.9219090 </td></tr>\n",
"\t<tr><th scope=row>13</th><td>FBtr0086122</td><td> 2.2962300 </td></tr>\n",
"\t<tr><th scope=row>15</th><td>FBtr0081459</td><td> 3.5324000 </td></tr>\n",
"\t<tr><th scope=row>16</th><td>FBtr0305321</td><td> 4.4229800 </td></tr>\n",
"\t<tr><th scope=row>17</th><td>FBtr0083716</td><td> 1.4543400 </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" & Name & TPM\\\\\n",
"\\hline\n",
"\t2 & FBtr0300739 & 20.8008000 \\\\\n",
"\t4 & FBtr0300736 & 0.8957440 \\\\\n",
"\t5 & FBtr0078628 & 0.0303317 \\\\\n",
"\t7 & FBtr0078627 & 0.3727980 \\\\\n",
"\t9 & FBtr0072762 & 16.6405000 \\\\\n",
"\t10 & FBtr0346821 & 0.9219090 \\\\\n",
"\t13 & FBtr0086122 & 2.2962300 \\\\\n",
"\t15 & FBtr0081459 & 3.5324000 \\\\\n",
"\t16 & FBtr0305321 & 4.4229800 \\\\\n",
"\t17 & FBtr0083716 & 1.4543400 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| <!--/--> | Name | TPM | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 2 | FBtr0300739 | 20.8008000 | \n",
"| 4 | FBtr0300736 | 0.8957440 | \n",
"| 5 | FBtr0078628 | 0.0303317 | \n",
"| 7 | FBtr0078627 | 0.3727980 | \n",
"| 9 | FBtr0072762 | 16.6405000 | \n",
"| 10 | FBtr0346821 | 0.9219090 | \n",
"| 13 | FBtr0086122 | 2.2962300 | \n",
"| 15 | FBtr0081459 | 3.5324000 | \n",
"| 16 | FBtr0305321 | 4.4229800 | \n",
"| 17 | FBtr0083716 | 1.4543400 | \n",
"\n",
"\n"
],
"text/plain": [
" Name TPM \n",
"2 FBtr0300739 20.8008000\n",
"4 FBtr0300736 0.8957440\n",
"5 FBtr0078628 0.0303317\n",
"7 FBtr0078627 0.3727980\n",
"9 FBtr0072762 16.6405000\n",
"10 FBtr0346821 0.9219090\n",
"13 FBtr0086122 2.2962300\n",
"15 FBtr0081459 3.5324000\n",
"16 FBtr0305321 4.4229800\n",
"17 FBtr0083716 1.4543400"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%get lib95 --from Python3\n",
"%get lib96 --from Python3 \n",
"\n",
"head(lib95, 10)\n",
"head(lib96, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "Python3"
},
"source": [
"## Create median-sorted lookup table for TPM>0 transcripts in all six libraries"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"kernel": "Python3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataframe dimensions: (15379, 8)\n"
]
}
],
"source": [
"## Define function\n",
"def sorted_merged_table(df1, df2, df3, df4, df5, df6, \n",
"                        lib_name1, lib_name2, lib_name3, lib_name4, lib_name5, lib_name6):\n",
"    dfs = (df1, df2, df3, df4, df5, df6)\n",
"    lib_names = [lib_name1, lib_name2, lib_name3, lib_name4, lib_name5, lib_name6]\n",
"    # Name each TPM column after its library so successive merges cannot produce clashing suffixes\n",
"    renamed = [df.rename(columns={'TPM': name}) for df, name in zip(dfs, lib_names)]\n",
"    merged = renamed[0]\n",
"    for df in renamed[1:]:\n",
"        merged = merged.merge(df, on='Name')  # Inner join: keep transcripts with TPM>0 in every library\n",
"    merged['Median'] = merged[lib_names].median(axis=1)  # Median TPM of each transcript across all libraries\n",
"    return merged.sort_values(by=\"Median\", ascending=False)  # Sort transcripts by median TPM, highest first\n",
"\n",
"## Create median-sorted lookup table\n",
"DM_sorted_merged = sorted_merged_table(lib95, lib96, lib97, lib17, lib18, lib19, \n",
" \"SRR3478195\", \"SRR3478196\", \"SRR3478197\", \"SRR3478217\", \"SRR3478218\", \"SRR3478219\")\n",
"print(\"Dataframe dimensions:\", DM_sorted_merged.shape)\n",
"\n",
"## Output non-0 transcript IDs to file\n",
"DM_sorted_merged[\"Name\"].to_csv(\"./DM_567789_non0_Jun28.txt\", index=None)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"kernel": "R"
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th></th><th scope=col>Name</th><th scope=col>SRR3478195</th><th scope=col>SRR3478196</th><th scope=col>SRR3478197</th><th scope=col>SRR3478217</th><th scope=col>SRR3478218</th><th scope=col>SRR3478219</th><th scope=col>Median</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>14866</th><td>FBtr0100888</td><td>72407.20 </td><td>81838.0 </td><td>61424.00 </td><td>109216.0 </td><td>103100.0 </td><td>91296.2 </td><td>86567.10 </td></tr>\n",
"\t<tr><th scope=row>15247</th><td>FBtr0307364</td><td>67079.00 </td><td>91910.5 </td><td>71111.50 </td><td> 38351.2 </td><td> 33338.0 </td><td>34928.0 </td><td>52715.10 </td></tr>\n",
"\t<tr><th scope=row>7880</th><td>FBtr0100868</td><td>23731.20 </td><td>30708.1 </td><td>26425.30 </td><td> 32645.9 </td><td> 30841.3 </td><td>28208.7 </td><td>29458.40 </td></tr>\n",
"\t<tr><th scope=row>8570</th><td>FBtr0100861</td><td>26776.40 </td><td>33453.8 </td><td>28996.10 </td><td> 35745.4 </td><td> 29645.1 </td><td>27438.0 </td><td>29320.60 </td></tr>\n",
"\t<tr><th scope=row>12070</th><td>FBtr0082158</td><td>16639.90 </td><td>24308.1 </td><td>20152.30 </td><td> 14051.7 </td><td> 18095.5 </td><td>18878.5 </td><td>18487.00 </td></tr>\n",
"\t<tr><th scope=row>6743</th><td>FBtr0100863</td><td>14372.10 </td><td>18408.6 </td><td>14199.80 </td><td> 21887.3 </td><td> 18524.1 </td><td>16917.7 </td><td>17663.15 </td></tr>\n",
"\t<tr><th scope=row>14915</th><td>FBtr0346885</td><td> 7351.36 </td><td>20358.5 </td><td> 8762.40 </td><td> 10896.8 </td><td> 18878.2 </td><td>23953.6 </td><td>14887.50 </td></tr>\n",
"\t<tr><th scope=row>6257</th><td>FBtr0433502</td><td>11658.80 </td><td>14917.7 </td><td>14051.10 </td><td> 15637.7 </td><td> 14516.7 </td><td>13962.5 </td><td>14283.90 </td></tr>\n",
"\t<tr><th scope=row>14800</th><td>FBtr0346903</td><td> 7757.37 </td><td>16773.8 </td><td> 7570.16 </td><td> 10930.5 </td><td> 21163.4 </td><td>22669.6 </td><td>13852.15 </td></tr>\n",
"\t<tr><th scope=row>11967</th><td>FBtr0072185</td><td>12502.60 </td><td>15227.6 </td><td>12987.80 </td><td> 14468.9 </td><td> 12239.2 </td><td>11509.1 </td><td>12745.20 </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllll}\n",
" & Name & SRR3478195 & SRR3478196 & SRR3478197 & SRR3478217 & SRR3478218 & SRR3478219 & Median\\\\\n",
"\\hline\n",
"\t14866 & FBtr0100888 & 72407.20 & 81838.0 & 61424.00 & 109216.0 & 103100.0 & 91296.2 & 86567.10 \\\\\n",
"\t15247 & FBtr0307364 & 67079.00 & 91910.5 & 71111.50 & 38351.2 & 33338.0 & 34928.0 & 52715.10 \\\\\n",
"\t7880 & FBtr0100868 & 23731.20 & 30708.1 & 26425.30 & 32645.9 & 30841.3 & 28208.7 & 29458.40 \\\\\n",
"\t8570 & FBtr0100861 & 26776.40 & 33453.8 & 28996.10 & 35745.4 & 29645.1 & 27438.0 & 29320.60 \\\\\n",
"\t12070 & FBtr0082158 & 16639.90 & 24308.1 & 20152.30 & 14051.7 & 18095.5 & 18878.5 & 18487.00 \\\\\n",
"\t6743 & FBtr0100863 & 14372.10 & 18408.6 & 14199.80 & 21887.3 & 18524.1 & 16917.7 & 17663.15 \\\\\n",
"\t14915 & FBtr0346885 & 7351.36 & 20358.5 & 8762.40 & 10896.8 & 18878.2 & 23953.6 & 14887.50 \\\\\n",
"\t6257 & FBtr0433502 & 11658.80 & 14917.7 & 14051.10 & 15637.7 & 14516.7 & 13962.5 & 14283.90 \\\\\n",
"\t14800 & FBtr0346903 & 7757.37 & 16773.8 & 7570.16 & 10930.5 & 21163.4 & 22669.6 & 13852.15 \\\\\n",
"\t11967 & FBtr0072185 & 12502.60 & 15227.6 & 12987.80 & 14468.9 & 12239.2 & 11509.1 & 12745.20 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| <!--/--> | Name | SRR3478195 | SRR3478196 | SRR3478197 | SRR3478217 | SRR3478218 | SRR3478219 | Median | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 14866 | FBtr0100888 | 72407.20 | 81838.0 | 61424.00 | 109216.0 | 103100.0 | 91296.2 | 86567.10 | \n",
"| 15247 | FBtr0307364 | 67079.00 | 91910.5 | 71111.50 | 38351.2 | 33338.0 | 34928.0 | 52715.10 | \n",
"| 7880 | FBtr0100868 | 23731.20 | 30708.1 | 26425.30 | 32645.9 | 30841.3 | 28208.7 | 29458.40 | \n",
"| 8570 | FBtr0100861 | 26776.40 | 33453.8 | 28996.10 | 35745.4 | 29645.1 | 27438.0 | 29320.60 | \n",
"| 12070 | FBtr0082158 | 16639.90 | 24308.1 | 20152.30 | 14051.7 | 18095.5 | 18878.5 | 18487.00 | \n",
"| 6743 | FBtr0100863 | 14372.10 | 18408.6 | 14199.80 | 21887.3 | 18524.1 | 16917.7 | 17663.15 | \n",
"| 14915 | FBtr0346885 | 7351.36 | 20358.5 | 8762.40 | 10896.8 | 18878.2 | 23953.6 | 14887.50 | \n",
"| 6257 | FBtr0433502 | 11658.80 | 14917.7 | 14051.10 | 15637.7 | 14516.7 | 13962.5 | 14283.90 | \n",
"| 14800 | FBtr0346903 | 7757.37 | 16773.8 | 7570.16 | 10930.5 | 21163.4 | 22669.6 | 13852.15 | \n",
"| 11967 | FBtr0072185 | 12502.60 | 15227.6 | 12987.80 | 14468.9 | 12239.2 | 11509.1 | 12745.20 | \n",
"\n",
"\n"
],
"text/plain": [
" Name SRR3478195 SRR3478196 SRR3478197 SRR3478217 SRR3478218\n",
"14866 FBtr0100888 72407.20 81838.0 61424.00 109216.0 103100.0 \n",
"15247 FBtr0307364 67079.00 91910.5 71111.50 38351.2 33338.0 \n",
"7880 FBtr0100868 23731.20 30708.1 26425.30 32645.9 30841.3 \n",
"8570 FBtr0100861 26776.40 33453.8 28996.10 35745.4 29645.1 \n",
"12070 FBtr0082158 16639.90 24308.1 20152.30 14051.7 18095.5 \n",
"6743 FBtr0100863 14372.10 18408.6 14199.80 21887.3 18524.1 \n",
"14915 FBtr0346885 7351.36 20358.5 8762.40 10896.8 18878.2 \n",
"6257 FBtr0433502 11658.80 14917.7 14051.10 15637.7 14516.7 \n",
"14800 FBtr0346903 7757.37 16773.8 7570.16 10930.5 21163.4 \n",
"11967 FBtr0072185 12502.60 15227.6 12987.80 14468.9 12239.2 \n",
" SRR3478219 Median \n",
"14866 91296.2 86567.10\n",
"15247 34928.0 52715.10\n",
"7880 28208.7 29458.40\n",
"8570 27438.0 29320.60\n",
"12070 18878.5 18487.00\n",
"6743 16917.7 17663.15\n",
"14915 23953.6 14887.50\n",
"6257 13962.5 14283.90\n",
"14800 22669.6 13852.15\n",
"11967 11509.1 12745.20"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%get DM_sorted_merged --from Python3 \n",
"head(DM_sorted_merged, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "Python3"
},
"source": [
"## Retrieve gene names and biotype info of all expressed transcripts from BioMart"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "Python3"
},
"source": [
"### Extract IDs of all expressed transcripts"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"kernel": "Python3"
},
"outputs": [],
"source": [
"DM_median_sorted_IDs = DM_sorted_merged[\"Name\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "R"
},
"source": [
"### Load Drosophila BioMart dataset"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"kernel": "R"
},
"outputs": [],
"source": [
"library(biomaRt)\n",
"DM_ensembl = useMart(\"ensembl\", dataset=\"dmelanogaster_gene_ensembl\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "R"
},
"source": [
"### Create BioMart query"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"kernel": "R",
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Batch submitting query [===============================] 100% eta: 0s\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th scope=col>flybase_transcript_id</th><th scope=col>ensembl_gene_id</th><th scope=col>external_gene_name</th><th scope=col>description</th><th scope=col>transcript_biotype</th><th scope=col>gene_biotype</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><td>FBtr0070041 </td><td>FBgn0052230 </td><td>ND-MLRQ </td><td>NADH dehydrogenase (ubiquinone) MLRQ subunit [Source:FlyBase;Acc:FBgn0052230] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070135 </td><td>FBgn0040382 </td><td>CG5273 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070148 </td><td>FBgn0015288 </td><td>RpL22 </td><td>Ribosomal protein L22 [Source:FlyBase;Acc:FBgn0015288] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070159 </td><td>FBgn0026879 </td><td>CG13364 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070364 </td><td>FBgn0026088 </td><td>CG14818 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070386 </td><td>FBgn0025839 </td><td>ND-B14.5A </td><td>NADH dehydrogenase (ubiquinone) B14.5 A subunit [Source:FlyBase;Acc:FBgn0025839]</td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070611 </td><td>FBgn0285910 </td><td>VhaAC39-1 </td><td>Vacuolar H[+] ATPase AC39 subunit 1 [Source:FlyBase;Acc:FBgn0285910] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070655 </td><td>FBgn0040907 </td><td>mRpL33 </td><td>mitochondrial ribosomal protein L33 [Source:FlyBase;Acc:FBgn0040907] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070800 </td><td>FBgn0029785 </td><td>RpL35 </td><td>Ribosomal protein L35 [Source:FlyBase;Acc:FBgn0029785] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070801 </td><td>FBgn0029785 </td><td>RpL35 </td><td>Ribosomal protein L35 [Source:FlyBase;Acc:FBgn0029785] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070814 </td><td>FBgn0029810 </td><td>CG12239 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070907 </td><td>FBgn0261955 </td><td>kdn </td><td>knockdown [Source:FlyBase;Acc:FBgn0261955] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0070933 </td><td>FBgn0086558 </td><td>Ubi-p5E </td><td>Ubiquitin-5E [Source:FlyBase;Acc:FBgn0086558] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0071094 </td><td>FBgn0004403 </td><td>RpS14a </td><td>Ribosomal protein S14a [Source:FlyBase;Acc:FBgn0004403] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0071123 </td><td>FBgn0029990 </td><td>CG2233 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0071135 </td><td>FBgn0261592 </td><td>RpS6 </td><td>Ribosomal protein S6 [Source:FlyBase;Acc:FBgn0261592] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0071343 </td><td>FBgn0040931 </td><td>CG9034 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0071360 </td><td>FBgn0030136 </td><td>RpS28b </td><td>Ribosomal protein S28b [Source:FlyBase;Acc:FBgn0030136] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0071389 </td><td>FBgn0030158 </td><td>CG9686 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><td>FBtr0071393 </td><td>FBgn0026415 </td><td>Idgf4 </td><td>Imaginal disc growth factor 4 [Source:FlyBase;Acc:FBgn0026415] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|llllll}\n",
" flybase\\_transcript\\_id & ensembl\\_gene\\_id & external\\_gene\\_name & description & transcript\\_biotype & gene\\_biotype\\\\\n",
"\\hline\n",
"\t FBtr0070041 & FBgn0052230 & ND-MLRQ & NADH dehydrogenase (ubiquinone) MLRQ subunit {[}Source:FlyBase;Acc:FBgn0052230{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070135 & FBgn0040382 & CG5273 & & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070148 & FBgn0015288 & RpL22 & Ribosomal protein L22 {[}Source:FlyBase;Acc:FBgn0015288{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070159 & FBgn0026879 & CG13364 & & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070364 & FBgn0026088 & CG14818 & & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070386 & FBgn0025839 & ND-B14.5A & NADH dehydrogenase (ubiquinone) B14.5 A subunit {[}Source:FlyBase;Acc:FBgn0025839{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070611 & FBgn0285910 & VhaAC39-1 & Vacuolar H{[}+{]} ATPase AC39 subunit 1 {[}Source:FlyBase;Acc:FBgn0285910{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070655 & FBgn0040907 & mRpL33 & mitochondrial ribosomal protein L33 {[}Source:FlyBase;Acc:FBgn0040907{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070800 & FBgn0029785 & RpL35 & Ribosomal protein L35 {[}Source:FlyBase;Acc:FBgn0029785{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070801 & FBgn0029785 & RpL35 & Ribosomal protein L35 {[}Source:FlyBase;Acc:FBgn0029785{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070814 & FBgn0029810 & CG12239 & & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070907 & FBgn0261955 & kdn & knockdown {[}Source:FlyBase;Acc:FBgn0261955{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0070933 & FBgn0086558 & Ubi-p5E & Ubiquitin-5E {[}Source:FlyBase;Acc:FBgn0086558{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0071094 & FBgn0004403 & RpS14a & Ribosomal protein S14a {[}Source:FlyBase;Acc:FBgn0004403{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0071123 & FBgn0029990 & CG2233 & & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0071135 & FBgn0261592 & RpS6 & Ribosomal protein S6 {[}Source:FlyBase;Acc:FBgn0261592{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0071343 & FBgn0040931 & CG9034 & & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0071360 & FBgn0030136 & RpS28b & Ribosomal protein S28b {[}Source:FlyBase;Acc:FBgn0030136{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0071389 & FBgn0030158 & CG9686 & & protein\\_coding & protein\\_coding \\\\\n",
"\t FBtr0071393 & FBgn0026415 & Idgf4 & Imaginal disc growth factor 4 {[}Source:FlyBase;Acc:FBgn0026415{]} & protein\\_coding & protein\\_coding \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"flybase_transcript_id | ensembl_gene_id | external_gene_name | description | transcript_biotype | gene_biotype | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| FBtr0070041 | FBgn0052230 | ND-MLRQ | NADH dehydrogenase (ubiquinone) MLRQ subunit [Source:FlyBase;Acc:FBgn0052230] | protein_coding | protein_coding | \n",
"| FBtr0070135 | FBgn0040382 | CG5273 | | protein_coding | protein_coding | \n",
"| FBtr0070148 | FBgn0015288 | RpL22 | Ribosomal protein L22 [Source:FlyBase;Acc:FBgn0015288] | protein_coding | protein_coding | \n",
"| FBtr0070159 | FBgn0026879 | CG13364 | | protein_coding | protein_coding | \n",
"| FBtr0070364 | FBgn0026088 | CG14818 | | protein_coding | protein_coding | \n",
"| FBtr0070386 | FBgn0025839 | ND-B14.5A | NADH dehydrogenase (ubiquinone) B14.5 A subunit [Source:FlyBase;Acc:FBgn0025839] | protein_coding | protein_coding | \n",
"| FBtr0070611 | FBgn0285910 | VhaAC39-1 | Vacuolar H[+] ATPase AC39 subunit 1 [Source:FlyBase;Acc:FBgn0285910] | protein_coding | protein_coding | \n",
"| FBtr0070655 | FBgn0040907 | mRpL33 | mitochondrial ribosomal protein L33 [Source:FlyBase;Acc:FBgn0040907] | protein_coding | protein_coding | \n",
"| FBtr0070800 | FBgn0029785 | RpL35 | Ribosomal protein L35 [Source:FlyBase;Acc:FBgn0029785] | protein_coding | protein_coding | \n",
"| FBtr0070801 | FBgn0029785 | RpL35 | Ribosomal protein L35 [Source:FlyBase;Acc:FBgn0029785] | protein_coding | protein_coding | \n",
"| FBtr0070814 | FBgn0029810 | CG12239 | | protein_coding | protein_coding | \n",
"| FBtr0070907 | FBgn0261955 | kdn | knockdown [Source:FlyBase;Acc:FBgn0261955] | protein_coding | protein_coding | \n",
"| FBtr0070933 | FBgn0086558 | Ubi-p5E | Ubiquitin-5E [Source:FlyBase;Acc:FBgn0086558] | protein_coding | protein_coding | \n",
"| FBtr0071094 | FBgn0004403 | RpS14a | Ribosomal protein S14a [Source:FlyBase;Acc:FBgn0004403] | protein_coding | protein_coding | \n",
"| FBtr0071123 | FBgn0029990 | CG2233 | | protein_coding | protein_coding | \n",
"| FBtr0071135 | FBgn0261592 | RpS6 | Ribosomal protein S6 [Source:FlyBase;Acc:FBgn0261592] | protein_coding | protein_coding | \n",
"| FBtr0071343 | FBgn0040931 | CG9034 | | protein_coding | protein_coding | \n",
"| FBtr0071360 | FBgn0030136 | RpS28b | Ribosomal protein S28b [Source:FlyBase;Acc:FBgn0030136] | protein_coding | protein_coding | \n",
"| FBtr0071389 | FBgn0030158 | CG9686 | | protein_coding | protein_coding | \n",
"| FBtr0071393 | FBgn0026415 | Idgf4 | Imaginal disc growth factor 4 [Source:FlyBase;Acc:FBgn0026415] | protein_coding | protein_coding | \n",
"\n",
"\n"
],
"text/plain": [
" flybase_transcript_id ensembl_gene_id external_gene_name\n",
"1 FBtr0070041 FBgn0052230 ND-MLRQ \n",
"2 FBtr0070135 FBgn0040382 CG5273 \n",
"3 FBtr0070148 FBgn0015288 RpL22 \n",
"4 FBtr0070159 FBgn0026879 CG13364 \n",
"5 FBtr0070364 FBgn0026088 CG14818 \n",
"6 FBtr0070386 FBgn0025839 ND-B14.5A \n",
"7 FBtr0070611 FBgn0285910 VhaAC39-1 \n",
"8 FBtr0070655 FBgn0040907 mRpL33 \n",
"9 FBtr0070800 FBgn0029785 RpL35 \n",
"10 FBtr0070801 FBgn0029785 RpL35 \n",
"11 FBtr0070814 FBgn0029810 CG12239 \n",
"12 FBtr0070907 FBgn0261955 kdn \n",
"13 FBtr0070933 FBgn0086558 Ubi-p5E \n",
"14 FBtr0071094 FBgn0004403 RpS14a \n",
"15 FBtr0071123 FBgn0029990 CG2233 \n",
"16 FBtr0071135 FBgn0261592 RpS6 \n",
"17 FBtr0071343 FBgn0040931 CG9034 \n",
"18 FBtr0071360 FBgn0030136 RpS28b \n",
"19 FBtr0071389 FBgn0030158 CG9686 \n",
"20 FBtr0071393 FBgn0026415 Idgf4 \n",
" description \n",
"1 NADH dehydrogenase (ubiquinone) MLRQ subunit [Source:FlyBase;Acc:FBgn0052230] \n",
"2 \n",
"3 Ribosomal protein L22 [Source:FlyBase;Acc:FBgn0015288] \n",
"4 \n",
"5 \n",
"6 NADH dehydrogenase (ubiquinone) B14.5 A subunit [Source:FlyBase;Acc:FBgn0025839]\n",
"7 Vacuolar H[+] ATPase AC39 subunit 1 [Source:FlyBase;Acc:FBgn0285910] \n",
"8 mitochondrial ribosomal protein L33 [Source:FlyBase;Acc:FBgn0040907] \n",
"9 Ribosomal protein L35 [Source:FlyBase;Acc:FBgn0029785] \n",
"10 Ribosomal protein L35 [Source:FlyBase;Acc:FBgn0029785] \n",
"11 \n",
"12 knockdown [Source:FlyBase;Acc:FBgn0261955] \n",
"13 Ubiquitin-5E [Source:FlyBase;Acc:FBgn0086558] \n",
"14 Ribosomal protein S14a [Source:FlyBase;Acc:FBgn0004403] \n",
"15 \n",
"16 Ribosomal protein S6 [Source:FlyBase;Acc:FBgn0261592] \n",
"17 \n",
"18 Ribosomal protein S28b [Source:FlyBase;Acc:FBgn0030136] \n",
"19 \n",
"20 Imaginal disc growth factor 4 [Source:FlyBase;Acc:FBgn0026415] \n",
" transcript_biotype gene_biotype \n",
"1 protein_coding protein_coding\n",
"2 protein_coding protein_coding\n",
"3 protein_coding protein_coding\n",
"4 protein_coding protein_coding\n",
"5 protein_coding protein_coding\n",
"6 protein_coding protein_coding\n",
"7 protein_coding protein_coding\n",
"8 protein_coding protein_coding\n",
"9 protein_coding protein_coding\n",
"10 protein_coding protein_coding\n",
"11 protein_coding protein_coding\n",
"12 protein_coding protein_coding\n",
"13 protein_coding protein_coding\n",
"14 protein_coding protein_coding\n",
"15 protein_coding protein_coding\n",
"16 protein_coding protein_coding\n",
"17 protein_coding protein_coding\n",
"18 protein_coding protein_coding\n",
"19 protein_coding protein_coding\n",
"20 protein_coding protein_coding"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>15379</li>\n",
"\t<li>6</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 15379\n",
"\\item 6\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 15379\n",
"2. 6\n",
"\n",
"\n"
],
"text/plain": [
"[1] 15379 6"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%get DM_median_sorted_IDs --from Python3 \n",
"\n",
"## Create getBM() query for converting transcript IDs to gene IDs \n",
"DM_biomart_output <- getBM(attributes = c('flybase_transcript_id', 'ensembl_gene_id', 'external_gene_name', 'description', \n",
" 'transcript_biotype', 'gene_biotype'), \n",
" filters = 'flybase_transcript_id', \n",
" values = DM_median_sorted_IDs, \n",
" mart = DM_ensembl)\n",
"\n",
"## Preview results\n",
"head(DM_biomart_output, 20)\n",
"dim(DM_biomart_output)"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "R"
},
"source": [
"## Merge with Biomart output with TPM>0 read counts table"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "R"
},
"source": [
"### Import BiomaRt output into Python & rename transcript ID column header"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"kernel": "Python3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataframe dimensions: (15379, 6) \n",
"\n"
]
}
],
"source": [
"%get DM_biomart_output --from R\n",
"\n",
"## Check dataframe dimension after import\n",
"print(\"Dataframe dimensions:\", DM_biomart_output.shape, \"\\n\")\n",
"\n",
"## Change transcript ID column header to match TPM>0 read counts table for merging \n",
"DM_biomart_output_df = DM_biomart_output.rename(columns = {'flybase_transcript_id':'Name'})"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "Python3"
},
"source": [
"### Create lookup table & extract only entries of protein-coding transcripts"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"kernel": "Python3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(14789, 13)\n"
]
}
],
"source": [
"## Merge Biomart output & read counts table sorted by median read count\n",
"DM_sorted_counts_BM = DM_sorted_merged.merge(DM_biomart_output_df, on='Name')\n",
"\n",
"## Extract entries of protein-coding transcripts\n",
"DM_PC_transcripts = DM_sorted_counts_BM.loc[DM_sorted_counts_BM[\"transcript_biotype\"].str.contains(\"protein_coding\")] \n",
"\n",
"## Check dataframe dimensions\n",
"print(DM_PC_transcripts.shape)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"kernel": "R"
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th></th><th scope=col>Name</th><th scope=col>SRR3478195</th><th scope=col>SRR3478196</th><th scope=col>SRR3478197</th><th scope=col>SRR3478217</th><th scope=col>SRR3478218</th><th scope=col>SRR3478219</th><th scope=col>Median</th><th scope=col>ensembl_gene_id</th><th scope=col>external_gene_name</th><th scope=col>description</th><th scope=col>transcript_biotype</th><th scope=col>gene_biotype</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>0</th><td>FBtr0100888 </td><td>7.24072e+04 </td><td>8.18380e+04 </td><td>6.14240e+04 </td><td>109216.00 </td><td>103100.00 </td><td>91296.20 </td><td>86567.100 </td><td>FBgn0013686 </td><td>mt:lrRNA </td><td>mitochondrial large ribosomal RNA [Source:FlyBase;Acc:FBgn0013686] </td><td>rRNA </td><td>rRNA </td></tr>\n",
"\t<tr><th scope=row>1</th><td>FBtr0307364 </td><td>6.70790e+04 </td><td>9.19105e+04 </td><td>7.11115e+04 </td><td> 38351.20 </td><td> 33338.00 </td><td>34928.00 </td><td>52715.100 </td><td>FBgn0058469 </td><td>CR40469 </td><td> </td><td>ncRNA </td><td>ncRNA </td></tr>\n",
"\t<tr><th scope=row>2</th><td>FBtr0100868 </td><td>2.37312e+04 </td><td>3.07081e+04 </td><td>2.64253e+04 </td><td> 32645.90 </td><td> 30841.30 </td><td>28208.70 </td><td>29458.400 </td><td>FBgn0013676 </td><td>mt:CoIII </td><td>mitochondrial Cytochrome c oxidase subunit III [Source:FlyBase;Acc:FBgn0013676]</td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>3</th><td>FBtr0100861 </td><td>2.67764e+04 </td><td>3.34538e+04 </td><td>2.89961e+04 </td><td> 35745.40 </td><td> 29645.10 </td><td>27438.00 </td><td>29320.600 </td><td>FBgn0013674 </td><td>mt:CoI </td><td>mitochondrial Cytochrome c oxidase subunit I [Source:FlyBase;Acc:FBgn0013674] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>4</th><td>FBtr0082158 </td><td>1.66399e+04 </td><td>2.43081e+04 </td><td>2.01523e+04 </td><td> 14051.70 </td><td> 18095.50 </td><td>18878.50 </td><td>18487.000 </td><td>FBgn0002868 </td><td>MtnA </td><td>Metallothionein A [Source:FlyBase;Acc:FBgn0002868] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>5</th><td>FBtr0100863 </td><td>1.43721e+04 </td><td>1.84086e+04 </td><td>1.41998e+04 </td><td> 21887.30 </td><td> 18524.10 </td><td>16917.70 </td><td>17663.150 </td><td>FBgn0013675 </td><td>mt:CoII </td><td>mitochondrial Cytochrome c oxidase subunit II [Source:FlyBase;Acc:FBgn0013675] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>6</th><td>FBtr0346885 </td><td>7.35136e+03 </td><td>2.03585e+04 </td><td>8.76240e+03 </td><td> 10896.80 </td><td> 18878.20 </td><td>23953.60 </td><td>14887.500 </td><td>FBgn0267504 </td><td>28SrRNA:CR45844 </td><td>28S ribosomal RNA:CR45844 [Source:FlyBase;Acc:FBgn0267504] </td><td>rRNA </td><td>rRNA </td></tr>\n",
"\t<tr><th scope=row>7</th><td>FBtr0433502 </td><td>1.16588e+04 </td><td>1.49177e+04 </td><td>1.40511e+04 </td><td> 15637.70 </td><td> 14516.70 </td><td>13962.50 </td><td>14283.900 </td><td>FBgn0013678 </td><td>mt:Cyt-b </td><td>mitochondrial Cytochrome b [Source:FlyBase;Acc:FBgn0013678] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>8</th><td>FBtr0346903 </td><td>7.75737e+03 </td><td>1.67738e+04 </td><td>7.57016e+03 </td><td> 10930.50 </td><td> 21163.40 </td><td>22669.60 </td><td>13852.150 </td><td>FBgn0267520 </td><td>28SrRNA-Psi:CR45860 </td><td>28S ribosomal RNA pseudogene:CR45860 [Source:FlyBase;Acc:FBgn0267520] </td><td>pseudogene </td><td>pseudogene </td></tr>\n",
"\t<tr><th scope=row>9</th><td>FBtr0072185 </td><td>1.25026e+04 </td><td>1.52276e+04 </td><td>1.29878e+04 </td><td> 14468.90 </td><td> 12239.20 </td><td>11509.10 </td><td>12745.200 </td><td>FBgn0023170 </td><td>RpL39 </td><td>Ribosomal protein L39 [Source:FlyBase;Acc:FBgn0023170] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>10</th><td>FBtr0100231 </td><td>1.22792e+04 </td><td>1.64489e+04 </td><td>1.14223e+04 </td><td> 14541.60 </td><td> 12367.90 </td><td>11503.70 </td><td>12323.550 </td><td>FBgn0066084 </td><td>RpL41 </td><td>Ribosomal protein L41 [Source:FlyBase;Acc:FBgn0066084] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>11</th><td>FBtr0433498 </td><td>1.02397e+04 </td><td>1.27918e+04 </td><td>1.13638e+04 </td><td> 15365.80 </td><td> 12053.50 </td><td>11041.70 </td><td>11708.650 </td><td>FBgn0013672 </td><td>mt:ATPase6 </td><td>mitochondrial ATPase subunit 6 [Source:FlyBase;Acc:FBgn0013672] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>12</th><td>FBtr0305669 </td><td>1.14361e+04 </td><td>1.18531e+04 </td><td>1.12036e+04 </td><td> 12985.30 </td><td> 10985.70 </td><td> 9710.47 </td><td>11319.850 </td><td>FBgn0016726 </td><td>RpL29 </td><td>Ribosomal protein L29 [Source:FlyBase;Acc:FBgn0016726] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>13</th><td>FBtr0346874 </td><td>5.15395e-02 </td><td>1.99499e-07 </td><td>1.65381e-04 </td><td> 19199.50 </td><td> 28094.00 </td><td>33610.30 </td><td> 9599.776 </td><td>FBgn0085802 </td><td>18SrRNA:CR41548 </td><td>18S ribosomal RNA:CR41548 [Source:FlyBase;Acc:FBgn0085802] </td><td>rRNA </td><td>rRNA </td></tr>\n",
"\t<tr><th scope=row>14</th><td>FBtr0346878 </td><td>1.72844e+04 </td><td>2.39737e+04 </td><td>2.04463e+04 </td><td> 703.08 </td><td> 1571.66 </td><td> 1590.72 </td><td> 9437.560 </td><td>FBgn0267498 </td><td>18SrRNA:CR45838 </td><td>18S ribosomal RNA:CR45838 [Source:FlyBase;Acc:FBgn0267498] </td><td>rRNA </td><td>rRNA </td></tr>\n",
"\t<tr><th scope=row>15</th><td>FBtr0088816 </td><td>1.08732e+04 </td><td>7.07863e+03 </td><td>1.20603e+04 </td><td> 10118.20 </td><td> 8431.46 </td><td> 7656.14 </td><td> 9274.830 </td><td>FBgn0033268 </td><td>Obp44a </td><td>Odorant-binding protein 44a [Source:FlyBase;Acc:FBgn0033268] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>16</th><td>FBtr0346872 </td><td>8.09745e+03 </td><td>7.66093e+03 </td><td>8.50554e+03 </td><td> 7512.93 </td><td> 12967.60 </td><td>12817.40 </td><td> 8301.495 </td><td>FBgn0267511 </td><td>28SrRNA-Psi:CR45851 </td><td>28S ribosomal RNA pseudogene:CR45851 [Source:FlyBase;Acc:FBgn0267511] </td><td>pseudogene </td><td>pseudogene </td></tr>\n",
"\t<tr><th scope=row>17</th><td>FBtr0346876 </td><td>3.77795e+03 </td><td>7.52062e+03 </td><td>4.48155e+03 </td><td> 4126.89 </td><td> 7462.42 </td><td> 8302.33 </td><td> 5971.985 </td><td>FBgn0267497 </td><td>28SrRNA:CR45837 </td><td>28S ribosomal RNA:CR45837 [Source:FlyBase;Acc:FBgn0267497] </td><td>rRNA </td><td>rRNA </td></tr>\n",
"\t<tr><th scope=row>18</th><td>FBtr0081920 </td><td>5.27758e+03 </td><td>6.44507e+03 </td><td>5.53297e+03 </td><td> 4731.72 </td><td> 5239.29 </td><td> 4888.94 </td><td> 5258.435 </td><td>FBgn0040532 </td><td>CG8369 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>19</th><td>FBtr0345321 </td><td>4.94450e+03 </td><td>5.63542e+03 </td><td>4.77491e+03 </td><td> 5218.73 </td><td> 4927.06 </td><td> 4362.67 </td><td> 4935.780 </td><td>FBgn0002579 </td><td>RpL36 </td><td>Ribosomal protein L36 [Source:FlyBase;Acc:FBgn0002579] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllllllllll}\n",
" & Name & SRR3478195 & SRR3478196 & SRR3478197 & SRR3478217 & SRR3478218 & SRR3478219 & Median & ensembl\\_gene\\_id & external\\_gene\\_name & description & transcript\\_biotype & gene\\_biotype\\\\\n",
"\\hline\n",
"\t0 & FBtr0100888 & 7.24072e+04 & 8.18380e+04 & 6.14240e+04 & 109216.00 & 103100.00 & 91296.20 & 86567.100 & FBgn0013686 & mt:lrRNA & mitochondrial large ribosomal RNA {[}Source:FlyBase;Acc:FBgn0013686{]} & rRNA & rRNA \\\\\n",
"\t1 & FBtr0307364 & 6.70790e+04 & 9.19105e+04 & 7.11115e+04 & 38351.20 & 33338.00 & 34928.00 & 52715.100 & FBgn0058469 & CR40469 & & ncRNA & ncRNA \\\\\n",
"\t2 & FBtr0100868 & 2.37312e+04 & 3.07081e+04 & 2.64253e+04 & 32645.90 & 30841.30 & 28208.70 & 29458.400 & FBgn0013676 & mt:CoIII & mitochondrial Cytochrome c oxidase subunit III {[}Source:FlyBase;Acc:FBgn0013676{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t3 & FBtr0100861 & 2.67764e+04 & 3.34538e+04 & 2.89961e+04 & 35745.40 & 29645.10 & 27438.00 & 29320.600 & FBgn0013674 & mt:CoI & mitochondrial Cytochrome c oxidase subunit I {[}Source:FlyBase;Acc:FBgn0013674{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t4 & FBtr0082158 & 1.66399e+04 & 2.43081e+04 & 2.01523e+04 & 14051.70 & 18095.50 & 18878.50 & 18487.000 & FBgn0002868 & MtnA & Metallothionein A {[}Source:FlyBase;Acc:FBgn0002868{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t5 & FBtr0100863 & 1.43721e+04 & 1.84086e+04 & 1.41998e+04 & 21887.30 & 18524.10 & 16917.70 & 17663.150 & FBgn0013675 & mt:CoII & mitochondrial Cytochrome c oxidase subunit II {[}Source:FlyBase;Acc:FBgn0013675{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t6 & FBtr0346885 & 7.35136e+03 & 2.03585e+04 & 8.76240e+03 & 10896.80 & 18878.20 & 23953.60 & 14887.500 & FBgn0267504 & 28SrRNA:CR45844 & 28S ribosomal RNA:CR45844 {[}Source:FlyBase;Acc:FBgn0267504{]} & rRNA & rRNA \\\\\n",
"\t7 & FBtr0433502 & 1.16588e+04 & 1.49177e+04 & 1.40511e+04 & 15637.70 & 14516.70 & 13962.50 & 14283.900 & FBgn0013678 & mt:Cyt-b & mitochondrial Cytochrome b {[}Source:FlyBase;Acc:FBgn0013678{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t8 & FBtr0346903 & 7.75737e+03 & 1.67738e+04 & 7.57016e+03 & 10930.50 & 21163.40 & 22669.60 & 13852.150 & FBgn0267520 & 28SrRNA-Psi:CR45860 & 28S ribosomal RNA pseudogene:CR45860 {[}Source:FlyBase;Acc:FBgn0267520{]} & pseudogene & pseudogene \\\\\n",
"\t9 & FBtr0072185 & 1.25026e+04 & 1.52276e+04 & 1.29878e+04 & 14468.90 & 12239.20 & 11509.10 & 12745.200 & FBgn0023170 & RpL39 & Ribosomal protein L39 {[}Source:FlyBase;Acc:FBgn0023170{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t10 & FBtr0100231 & 1.22792e+04 & 1.64489e+04 & 1.14223e+04 & 14541.60 & 12367.90 & 11503.70 & 12323.550 & FBgn0066084 & RpL41 & Ribosomal protein L41 {[}Source:FlyBase;Acc:FBgn0066084{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t11 & FBtr0433498 & 1.02397e+04 & 1.27918e+04 & 1.13638e+04 & 15365.80 & 12053.50 & 11041.70 & 11708.650 & FBgn0013672 & mt:ATPase6 & mitochondrial ATPase subunit 6 {[}Source:FlyBase;Acc:FBgn0013672{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t12 & FBtr0305669 & 1.14361e+04 & 1.18531e+04 & 1.12036e+04 & 12985.30 & 10985.70 & 9710.47 & 11319.850 & FBgn0016726 & RpL29 & Ribosomal protein L29 {[}Source:FlyBase;Acc:FBgn0016726{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t13 & FBtr0346874 & 5.15395e-02 & 1.99499e-07 & 1.65381e-04 & 19199.50 & 28094.00 & 33610.30 & 9599.776 & FBgn0085802 & 18SrRNA:CR41548 & 18S ribosomal RNA:CR41548 {[}Source:FlyBase;Acc:FBgn0085802{]} & rRNA & rRNA \\\\\n",
"\t14 & FBtr0346878 & 1.72844e+04 & 2.39737e+04 & 2.04463e+04 & 703.08 & 1571.66 & 1590.72 & 9437.560 & FBgn0267498 & 18SrRNA:CR45838 & 18S ribosomal RNA:CR45838 {[}Source:FlyBase;Acc:FBgn0267498{]} & rRNA & rRNA \\\\\n",
"\t15 & FBtr0088816 & 1.08732e+04 & 7.07863e+03 & 1.20603e+04 & 10118.20 & 8431.46 & 7656.14 & 9274.830 & FBgn0033268 & Obp44a & Odorant-binding protein 44a {[}Source:FlyBase;Acc:FBgn0033268{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t16 & FBtr0346872 & 8.09745e+03 & 7.66093e+03 & 8.50554e+03 & 7512.93 & 12967.60 & 12817.40 & 8301.495 & FBgn0267511 & 28SrRNA-Psi:CR45851 & 28S ribosomal RNA pseudogene:CR45851 {[}Source:FlyBase;Acc:FBgn0267511{]} & pseudogene & pseudogene \\\\\n",
"\t17 & FBtr0346876 & 3.77795e+03 & 7.52062e+03 & 4.48155e+03 & 4126.89 & 7462.42 & 8302.33 & 5971.985 & FBgn0267497 & 28SrRNA:CR45837 & 28S ribosomal RNA:CR45837 {[}Source:FlyBase;Acc:FBgn0267497{]} & rRNA & rRNA \\\\\n",
"\t18 & FBtr0081920 & 5.27758e+03 & 6.44507e+03 & 5.53297e+03 & 4731.72 & 5239.29 & 4888.94 & 5258.435 & FBgn0040532 & CG8369 & & protein\\_coding & protein\\_coding \\\\\n",
"\t19 & FBtr0345321 & 4.94450e+03 & 5.63542e+03 & 4.77491e+03 & 5218.73 & 4927.06 & 4362.67 & 4935.780 & FBgn0002579 & RpL36 & Ribosomal protein L36 {[}Source:FlyBase;Acc:FBgn0002579{]} & protein\\_coding & protein\\_coding \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| <!--/--> | Name | SRR3478195 | SRR3478196 | SRR3478197 | SRR3478217 | SRR3478218 | SRR3478219 | Median | ensembl_gene_id | external_gene_name | description | transcript_biotype | gene_biotype | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | FBtr0100888 | 7.24072e+04 | 8.18380e+04 | 6.14240e+04 | 109216.00 | 103100.00 | 91296.20 | 86567.100 | FBgn0013686 | mt:lrRNA | mitochondrial large ribosomal RNA [Source:FlyBase;Acc:FBgn0013686] | rRNA | rRNA | \n",
"| 1 | FBtr0307364 | 6.70790e+04 | 9.19105e+04 | 7.11115e+04 | 38351.20 | 33338.00 | 34928.00 | 52715.100 | FBgn0058469 | CR40469 | | ncRNA | ncRNA | \n",
"| 2 | FBtr0100868 | 2.37312e+04 | 3.07081e+04 | 2.64253e+04 | 32645.90 | 30841.30 | 28208.70 | 29458.400 | FBgn0013676 | mt:CoIII | mitochondrial Cytochrome c oxidase subunit III [Source:FlyBase;Acc:FBgn0013676] | protein_coding | protein_coding | \n",
"| 3 | FBtr0100861 | 2.67764e+04 | 3.34538e+04 | 2.89961e+04 | 35745.40 | 29645.10 | 27438.00 | 29320.600 | FBgn0013674 | mt:CoI | mitochondrial Cytochrome c oxidase subunit I [Source:FlyBase;Acc:FBgn0013674] | protein_coding | protein_coding | \n",
"| 4 | FBtr0082158 | 1.66399e+04 | 2.43081e+04 | 2.01523e+04 | 14051.70 | 18095.50 | 18878.50 | 18487.000 | FBgn0002868 | MtnA | Metallothionein A [Source:FlyBase;Acc:FBgn0002868] | protein_coding | protein_coding | \n",
"| 5 | FBtr0100863 | 1.43721e+04 | 1.84086e+04 | 1.41998e+04 | 21887.30 | 18524.10 | 16917.70 | 17663.150 | FBgn0013675 | mt:CoII | mitochondrial Cytochrome c oxidase subunit II [Source:FlyBase;Acc:FBgn0013675] | protein_coding | protein_coding | \n",
"| 6 | FBtr0346885 | 7.35136e+03 | 2.03585e+04 | 8.76240e+03 | 10896.80 | 18878.20 | 23953.60 | 14887.500 | FBgn0267504 | 28SrRNA:CR45844 | 28S ribosomal RNA:CR45844 [Source:FlyBase;Acc:FBgn0267504] | rRNA | rRNA | \n",
"| 7 | FBtr0433502 | 1.16588e+04 | 1.49177e+04 | 1.40511e+04 | 15637.70 | 14516.70 | 13962.50 | 14283.900 | FBgn0013678 | mt:Cyt-b | mitochondrial Cytochrome b [Source:FlyBase;Acc:FBgn0013678] | protein_coding | protein_coding | \n",
"| 8 | FBtr0346903 | 7.75737e+03 | 1.67738e+04 | 7.57016e+03 | 10930.50 | 21163.40 | 22669.60 | 13852.150 | FBgn0267520 | 28SrRNA-Psi:CR45860 | 28S ribosomal RNA pseudogene:CR45860 [Source:FlyBase;Acc:FBgn0267520] | pseudogene | pseudogene | \n",
"| 9 | FBtr0072185 | 1.25026e+04 | 1.52276e+04 | 1.29878e+04 | 14468.90 | 12239.20 | 11509.10 | 12745.200 | FBgn0023170 | RpL39 | Ribosomal protein L39 [Source:FlyBase;Acc:FBgn0023170] | protein_coding | protein_coding | \n",
"| 10 | FBtr0100231 | 1.22792e+04 | 1.64489e+04 | 1.14223e+04 | 14541.60 | 12367.90 | 11503.70 | 12323.550 | FBgn0066084 | RpL41 | Ribosomal protein L41 [Source:FlyBase;Acc:FBgn0066084] | protein_coding | protein_coding | \n",
"| 11 | FBtr0433498 | 1.02397e+04 | 1.27918e+04 | 1.13638e+04 | 15365.80 | 12053.50 | 11041.70 | 11708.650 | FBgn0013672 | mt:ATPase6 | mitochondrial ATPase subunit 6 [Source:FlyBase;Acc:FBgn0013672] | protein_coding | protein_coding | \n",
"| 12 | FBtr0305669 | 1.14361e+04 | 1.18531e+04 | 1.12036e+04 | 12985.30 | 10985.70 | 9710.47 | 11319.850 | FBgn0016726 | RpL29 | Ribosomal protein L29 [Source:FlyBase;Acc:FBgn0016726] | protein_coding | protein_coding | \n",
"| 13 | FBtr0346874 | 5.15395e-02 | 1.99499e-07 | 1.65381e-04 | 19199.50 | 28094.00 | 33610.30 | 9599.776 | FBgn0085802 | 18SrRNA:CR41548 | 18S ribosomal RNA:CR41548 [Source:FlyBase;Acc:FBgn0085802] | rRNA | rRNA | \n",
"| 14 | FBtr0346878 | 1.72844e+04 | 2.39737e+04 | 2.04463e+04 | 703.08 | 1571.66 | 1590.72 | 9437.560 | FBgn0267498 | 18SrRNA:CR45838 | 18S ribosomal RNA:CR45838 [Source:FlyBase;Acc:FBgn0267498] | rRNA | rRNA | \n",
"| 15 | FBtr0088816 | 1.08732e+04 | 7.07863e+03 | 1.20603e+04 | 10118.20 | 8431.46 | 7656.14 | 9274.830 | FBgn0033268 | Obp44a | Odorant-binding protein 44a [Source:FlyBase;Acc:FBgn0033268] | protein_coding | protein_coding | \n",
"| 16 | FBtr0346872 | 8.09745e+03 | 7.66093e+03 | 8.50554e+03 | 7512.93 | 12967.60 | 12817.40 | 8301.495 | FBgn0267511 | 28SrRNA-Psi:CR45851 | 28S ribosomal RNA pseudogene:CR45851 [Source:FlyBase;Acc:FBgn0267511] | pseudogene | pseudogene | \n",
"| 17 | FBtr0346876 | 3.77795e+03 | 7.52062e+03 | 4.48155e+03 | 4126.89 | 7462.42 | 8302.33 | 5971.985 | FBgn0267497 | 28SrRNA:CR45837 | 28S ribosomal RNA:CR45837 [Source:FlyBase;Acc:FBgn0267497] | rRNA | rRNA | \n",
"| 18 | FBtr0081920 | 5.27758e+03 | 6.44507e+03 | 5.53297e+03 | 4731.72 | 5239.29 | 4888.94 | 5258.435 | FBgn0040532 | CG8369 | | protein_coding | protein_coding | \n",
"| 19 | FBtr0345321 | 4.94450e+03 | 5.63542e+03 | 4.77491e+03 | 5218.73 | 4927.06 | 4362.67 | 4935.780 | FBgn0002579 | RpL36 | Ribosomal protein L36 [Source:FlyBase;Acc:FBgn0002579] | protein_coding | protein_coding | \n",
"\n",
"\n"
],
"text/plain": [
" Name SRR3478195 SRR3478196 SRR3478197 SRR3478217 SRR3478218\n",
"0 FBtr0100888 7.24072e+04 8.18380e+04 6.14240e+04 109216.00 103100.00 \n",
"1 FBtr0307364 6.70790e+04 9.19105e+04 7.11115e+04 38351.20 33338.00 \n",
"2 FBtr0100868 2.37312e+04 3.07081e+04 2.64253e+04 32645.90 30841.30 \n",
"3 FBtr0100861 2.67764e+04 3.34538e+04 2.89961e+04 35745.40 29645.10 \n",
"4 FBtr0082158 1.66399e+04 2.43081e+04 2.01523e+04 14051.70 18095.50 \n",
"5 FBtr0100863 1.43721e+04 1.84086e+04 1.41998e+04 21887.30 18524.10 \n",
"6 FBtr0346885 7.35136e+03 2.03585e+04 8.76240e+03 10896.80 18878.20 \n",
"7 FBtr0433502 1.16588e+04 1.49177e+04 1.40511e+04 15637.70 14516.70 \n",
"8 FBtr0346903 7.75737e+03 1.67738e+04 7.57016e+03 10930.50 21163.40 \n",
"9 FBtr0072185 1.25026e+04 1.52276e+04 1.29878e+04 14468.90 12239.20 \n",
"10 FBtr0100231 1.22792e+04 1.64489e+04 1.14223e+04 14541.60 12367.90 \n",
"11 FBtr0433498 1.02397e+04 1.27918e+04 1.13638e+04 15365.80 12053.50 \n",
"12 FBtr0305669 1.14361e+04 1.18531e+04 1.12036e+04 12985.30 10985.70 \n",
"13 FBtr0346874 5.15395e-02 1.99499e-07 1.65381e-04 19199.50 28094.00 \n",
"14 FBtr0346878 1.72844e+04 2.39737e+04 2.04463e+04 703.08 1571.66 \n",
"15 FBtr0088816 1.08732e+04 7.07863e+03 1.20603e+04 10118.20 8431.46 \n",
"16 FBtr0346872 8.09745e+03 7.66093e+03 8.50554e+03 7512.93 12967.60 \n",
"17 FBtr0346876 3.77795e+03 7.52062e+03 4.48155e+03 4126.89 7462.42 \n",
"18 FBtr0081920 5.27758e+03 6.44507e+03 5.53297e+03 4731.72 5239.29 \n",
"19 FBtr0345321 4.94450e+03 5.63542e+03 4.77491e+03 5218.73 4927.06 \n",
" SRR3478219 Median ensembl_gene_id external_gene_name \n",
"0 91296.20 86567.100 FBgn0013686 mt:lrRNA \n",
"1 34928.00 52715.100 FBgn0058469 CR40469 \n",
"2 28208.70 29458.400 FBgn0013676 mt:CoIII \n",
"3 27438.00 29320.600 FBgn0013674 mt:CoI \n",
"4 18878.50 18487.000 FBgn0002868 MtnA \n",
"5 16917.70 17663.150 FBgn0013675 mt:CoII \n",
"6 23953.60 14887.500 FBgn0267504 28SrRNA:CR45844 \n",
"7 13962.50 14283.900 FBgn0013678 mt:Cyt-b \n",
"8 22669.60 13852.150 FBgn0267520 28SrRNA-Psi:CR45860\n",
"9 11509.10 12745.200 FBgn0023170 RpL39 \n",
"10 11503.70 12323.550 FBgn0066084 RpL41 \n",
"11 11041.70 11708.650 FBgn0013672 mt:ATPase6 \n",
"12 9710.47 11319.850 FBgn0016726 RpL29 \n",
"13 33610.30 9599.776 FBgn0085802 18SrRNA:CR41548 \n",
"14 1590.72 9437.560 FBgn0267498 18SrRNA:CR45838 \n",
"15 7656.14 9274.830 FBgn0033268 Obp44a \n",
"16 12817.40 8301.495 FBgn0267511 28SrRNA-Psi:CR45851\n",
"17 8302.33 5971.985 FBgn0267497 28SrRNA:CR45837 \n",
"18 4888.94 5258.435 FBgn0040532 CG8369 \n",
"19 4362.67 4935.780 FBgn0002579 RpL36 \n",
" description \n",
"0 mitochondrial large ribosomal RNA [Source:FlyBase;Acc:FBgn0013686] \n",
"1 \n",
"2 mitochondrial Cytochrome c oxidase subunit III [Source:FlyBase;Acc:FBgn0013676]\n",
"3 mitochondrial Cytochrome c oxidase subunit I [Source:FlyBase;Acc:FBgn0013674] \n",
"4 Metallothionein A [Source:FlyBase;Acc:FBgn0002868] \n",
"5 mitochondrial Cytochrome c oxidase subunit II [Source:FlyBase;Acc:FBgn0013675] \n",
"6 28S ribosomal RNA:CR45844 [Source:FlyBase;Acc:FBgn0267504] \n",
"7 mitochondrial Cytochrome b [Source:FlyBase;Acc:FBgn0013678] \n",
"8 28S ribosomal RNA pseudogene:CR45860 [Source:FlyBase;Acc:FBgn0267520] \n",
"9 Ribosomal protein L39 [Source:FlyBase;Acc:FBgn0023170] \n",
"10 Ribosomal protein L41 [Source:FlyBase;Acc:FBgn0066084] \n",
"11 mitochondrial ATPase subunit 6 [Source:FlyBase;Acc:FBgn0013672] \n",
"12 Ribosomal protein L29 [Source:FlyBase;Acc:FBgn0016726] \n",
"13 18S ribosomal RNA:CR41548 [Source:FlyBase;Acc:FBgn0085802] \n",
"14 18S ribosomal RNA:CR45838 [Source:FlyBase;Acc:FBgn0267498] \n",
"15 Odorant-binding protein 44a [Source:FlyBase;Acc:FBgn0033268] \n",
"16 28S ribosomal RNA pseudogene:CR45851 [Source:FlyBase;Acc:FBgn0267511] \n",
"17 28S ribosomal RNA:CR45837 [Source:FlyBase;Acc:FBgn0267497] \n",
"18 \n",
"19 Ribosomal protein L36 [Source:FlyBase;Acc:FBgn0002579] \n",
" transcript_biotype gene_biotype \n",
"0 rRNA rRNA \n",
"1 ncRNA ncRNA \n",
"2 protein_coding protein_coding\n",
"3 protein_coding protein_coding\n",
"4 protein_coding protein_coding\n",
"5 protein_coding protein_coding\n",
"6 rRNA rRNA \n",
"7 protein_coding protein_coding\n",
"8 pseudogene pseudogene \n",
"9 protein_coding protein_coding\n",
"10 protein_coding protein_coding\n",
"11 protein_coding protein_coding\n",
"12 protein_coding protein_coding\n",
"13 rRNA rRNA \n",
"14 rRNA rRNA \n",
"15 protein_coding protein_coding\n",
"16 pseudogene pseudogene \n",
"17 rRNA rRNA \n",
"18 protein_coding protein_coding\n",
"19 protein_coding protein_coding"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%get DM_sorted_counts_BM --from Python3\n",
"\n",
"head(DM_sorted_counts_BM, 20)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"kernel": "R"
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th></th><th scope=col>Name</th><th scope=col>SRR3478195</th><th scope=col>SRR3478196</th><th scope=col>SRR3478197</th><th scope=col>SRR3478217</th><th scope=col>SRR3478218</th><th scope=col>SRR3478219</th><th scope=col>Median</th><th scope=col>ensembl_gene_id</th><th scope=col>external_gene_name</th><th scope=col>description</th><th scope=col>transcript_biotype</th><th scope=col>gene_biotype</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>2</th><td>FBtr0100868 </td><td>23731.20 </td><td>30708.10 </td><td>26425.30 </td><td>32645.90 </td><td>30841.30 </td><td>28208.70 </td><td>29458.400 </td><td>FBgn0013676 </td><td>mt:CoIII </td><td>mitochondrial Cytochrome c oxidase subunit III [Source:FlyBase;Acc:FBgn0013676] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>3</th><td>FBtr0100861 </td><td>26776.40 </td><td>33453.80 </td><td>28996.10 </td><td>35745.40 </td><td>29645.10 </td><td>27438.00 </td><td>29320.600 </td><td>FBgn0013674 </td><td>mt:CoI </td><td>mitochondrial Cytochrome c oxidase subunit I [Source:FlyBase;Acc:FBgn0013674] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>4</th><td>FBtr0082158 </td><td>16639.90 </td><td>24308.10 </td><td>20152.30 </td><td>14051.70 </td><td>18095.50 </td><td>18878.50 </td><td>18487.000 </td><td>FBgn0002868 </td><td>MtnA </td><td>Metallothionein A [Source:FlyBase;Acc:FBgn0002868] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>5</th><td>FBtr0100863 </td><td>14372.10 </td><td>18408.60 </td><td>14199.80 </td><td>21887.30 </td><td>18524.10 </td><td>16917.70 </td><td>17663.150 </td><td>FBgn0013675 </td><td>mt:CoII </td><td>mitochondrial Cytochrome c oxidase subunit II [Source:FlyBase;Acc:FBgn0013675] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>7</th><td>FBtr0433502 </td><td>11658.80 </td><td>14917.70 </td><td>14051.10 </td><td>15637.70 </td><td>14516.70 </td><td>13962.50 </td><td>14283.900 </td><td>FBgn0013678 </td><td>mt:Cyt-b </td><td>mitochondrial Cytochrome b [Source:FlyBase;Acc:FBgn0013678] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>9</th><td>FBtr0072185 </td><td>12502.60 </td><td>15227.60 </td><td>12987.80 </td><td>14468.90 </td><td>12239.20 </td><td>11509.10 </td><td>12745.200 </td><td>FBgn0023170 </td><td>RpL39 </td><td>Ribosomal protein L39 [Source:FlyBase;Acc:FBgn0023170] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>10</th><td>FBtr0100231 </td><td>12279.20 </td><td>16448.90 </td><td>11422.30 </td><td>14541.60 </td><td>12367.90 </td><td>11503.70 </td><td>12323.550 </td><td>FBgn0066084 </td><td>RpL41 </td><td>Ribosomal protein L41 [Source:FlyBase;Acc:FBgn0066084] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>11</th><td>FBtr0433498 </td><td>10239.70 </td><td>12791.80 </td><td>11363.80 </td><td>15365.80 </td><td>12053.50 </td><td>11041.70 </td><td>11708.650 </td><td>FBgn0013672 </td><td>mt:ATPase6 </td><td>mitochondrial ATPase subunit 6 [Source:FlyBase;Acc:FBgn0013672] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>12</th><td>FBtr0305669 </td><td>11436.10 </td><td>11853.10 </td><td>11203.60 </td><td>12985.30 </td><td>10985.70 </td><td> 9710.47 </td><td>11319.850 </td><td>FBgn0016726 </td><td>RpL29 </td><td>Ribosomal protein L29 [Source:FlyBase;Acc:FBgn0016726] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>15</th><td>FBtr0088816 </td><td>10873.20 </td><td> 7078.63 </td><td>12060.30 </td><td>10118.20 </td><td> 8431.46 </td><td> 7656.14 </td><td> 9274.830 </td><td>FBgn0033268 </td><td>Obp44a </td><td>Odorant-binding protein 44a [Source:FlyBase;Acc:FBgn0033268] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>18</th><td>FBtr0081920 </td><td> 5277.58 </td><td> 6445.07 </td><td> 5532.97 </td><td> 4731.72 </td><td> 5239.29 </td><td> 4888.94 </td><td> 5258.435 </td><td>FBgn0040532 </td><td>CG8369 </td><td> </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>19</th><td>FBtr0345321 </td><td> 4944.50 </td><td> 5635.42 </td><td> 4774.91 </td><td> 5218.73 </td><td> 4927.06 </td><td> 4362.67 </td><td> 4935.780 </td><td>FBgn0002579 </td><td>RpL36 </td><td>Ribosomal protein L36 [Source:FlyBase;Acc:FBgn0002579] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>20</th><td>FBtr0100870 </td><td> 4460.08 </td><td> 5932.88 </td><td> 4120.85 </td><td> 5482.12 </td><td> 5074.41 </td><td> 4508.94 </td><td> 4791.675 </td><td>FBgn0013681 </td><td>mt:ND3 </td><td>mitochondrial NADH-ubiquinone oxidoreductase chain 3 [Source:FlyBase;Acc:FBgn0013681]</td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>21</th><td>FBtr0083970 </td><td> 4418.53 </td><td> 3542.62 </td><td> 4442.73 </td><td> 4719.67 </td><td> 3858.70 </td><td> 3558.81 </td><td> 4138.615 </td><td>FBgn0038834 </td><td>RpS30 </td><td>Ribosomal protein S30 [Source:FlyBase;Acc:FBgn0038834] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>22</th><td>FBtr0111120 </td><td> 3576.76 </td><td> 5052.31 </td><td> 4145.02 </td><td> 4539.37 </td><td> 3643.21 </td><td> 3369.58 </td><td> 3894.115 </td><td>FBgn0040007 </td><td>RpL38 </td><td>Ribosomal protein L38 [Source:FlyBase;Acc:FBgn0040007] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>23</th><td>FBtr0433499 </td><td> 3004.90 </td><td> 4096.07 </td><td> 4034.34 </td><td> 3941.94 </td><td> 3679.44 </td><td> 3526.38 </td><td> 3810.690 </td><td>FBgn0013679 </td><td>mt:ND1 </td><td>mitochondrial NADH-ubiquinone oxidoreductase chain 1 [Source:FlyBase;Acc:FBgn0013679]</td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>27</th><td>FBtr0071897 </td><td> 3474.22 </td><td> 4066.15 </td><td> 3227.59 </td><td> 3872.37 </td><td> 3135.34 </td><td> 2873.13 </td><td> 3350.905 </td><td>FBgn0010078 </td><td>RpL23 </td><td>Ribosomal protein L23 [Source:FlyBase;Acc:FBgn0010078] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>29</th><td>FBtr0111132 </td><td> 3204.97 </td><td> 2952.79 </td><td> 2982.18 </td><td> 3438.57 </td><td> 3133.56 </td><td> 3050.26 </td><td> 3091.910 </td><td>FBgn0064225 </td><td>RpL5 </td><td>Ribosomal protein L5 [Source:FlyBase;Acc:FBgn0064225] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>30</th><td>FBtr0082136 </td><td> 3226.80 </td><td> 3457.42 </td><td> 2830.81 </td><td> 3281.44 </td><td> 2926.15 </td><td> 2783.21 </td><td> 3076.475 </td><td>FBgn0261599 </td><td>RpS29 </td><td>Ribosomal protein S29 [Source:FlyBase;Acc:FBgn0261599] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"\t<tr><th scope=row>34</th><td>FBtr0085961 </td><td> 2927.31 </td><td> 3363.11 </td><td> 2942.27 </td><td> 3084.86 </td><td> 2589.26 </td><td> 2358.32 </td><td> 2934.790 </td><td>FBgn0032987 </td><td>RpL21 </td><td>Ribosomal protein L21 [Source:FlyBase;Acc:FBgn0032987] </td><td>protein_coding </td><td>protein_coding </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllllllllll}\n",
" & Name & SRR3478195 & SRR3478196 & SRR3478197 & SRR3478217 & SRR3478218 & SRR3478219 & Median & ensembl\\_gene\\_id & external\\_gene\\_name & description & transcript\\_biotype & gene\\_biotype\\\\\n",
"\\hline\n",
"\t2 & FBtr0100868 & 23731.20 & 30708.10 & 26425.30 & 32645.90 & 30841.30 & 28208.70 & 29458.400 & FBgn0013676 & mt:CoIII & mitochondrial Cytochrome c oxidase subunit III {[}Source:FlyBase;Acc:FBgn0013676{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t3 & FBtr0100861 & 26776.40 & 33453.80 & 28996.10 & 35745.40 & 29645.10 & 27438.00 & 29320.600 & FBgn0013674 & mt:CoI & mitochondrial Cytochrome c oxidase subunit I {[}Source:FlyBase;Acc:FBgn0013674{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t4 & FBtr0082158 & 16639.90 & 24308.10 & 20152.30 & 14051.70 & 18095.50 & 18878.50 & 18487.000 & FBgn0002868 & MtnA & Metallothionein A {[}Source:FlyBase;Acc:FBgn0002868{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t5 & FBtr0100863 & 14372.10 & 18408.60 & 14199.80 & 21887.30 & 18524.10 & 16917.70 & 17663.150 & FBgn0013675 & mt:CoII & mitochondrial Cytochrome c oxidase subunit II {[}Source:FlyBase;Acc:FBgn0013675{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t7 & FBtr0433502 & 11658.80 & 14917.70 & 14051.10 & 15637.70 & 14516.70 & 13962.50 & 14283.900 & FBgn0013678 & mt:Cyt-b & mitochondrial Cytochrome b {[}Source:FlyBase;Acc:FBgn0013678{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t9 & FBtr0072185 & 12502.60 & 15227.60 & 12987.80 & 14468.90 & 12239.20 & 11509.10 & 12745.200 & FBgn0023170 & RpL39 & Ribosomal protein L39 {[}Source:FlyBase;Acc:FBgn0023170{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t10 & FBtr0100231 & 12279.20 & 16448.90 & 11422.30 & 14541.60 & 12367.90 & 11503.70 & 12323.550 & FBgn0066084 & RpL41 & Ribosomal protein L41 {[}Source:FlyBase;Acc:FBgn0066084{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t11 & FBtr0433498 & 10239.70 & 12791.80 & 11363.80 & 15365.80 & 12053.50 & 11041.70 & 11708.650 & FBgn0013672 & mt:ATPase6 & mitochondrial ATPase subunit 6 {[}Source:FlyBase;Acc:FBgn0013672{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t12 & FBtr0305669 & 11436.10 & 11853.10 & 11203.60 & 12985.30 & 10985.70 & 9710.47 & 11319.850 & FBgn0016726 & RpL29 & Ribosomal protein L29 {[}Source:FlyBase;Acc:FBgn0016726{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t15 & FBtr0088816 & 10873.20 & 7078.63 & 12060.30 & 10118.20 & 8431.46 & 7656.14 & 9274.830 & FBgn0033268 & Obp44a & Odorant-binding protein 44a {[}Source:FlyBase;Acc:FBgn0033268{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t18 & FBtr0081920 & 5277.58 & 6445.07 & 5532.97 & 4731.72 & 5239.29 & 4888.94 & 5258.435 & FBgn0040532 & CG8369 & & protein\\_coding & protein\\_coding \\\\\n",
"\t19 & FBtr0345321 & 4944.50 & 5635.42 & 4774.91 & 5218.73 & 4927.06 & 4362.67 & 4935.780 & FBgn0002579 & RpL36 & Ribosomal protein L36 {[}Source:FlyBase;Acc:FBgn0002579{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t20 & FBtr0100870 & 4460.08 & 5932.88 & 4120.85 & 5482.12 & 5074.41 & 4508.94 & 4791.675 & FBgn0013681 & mt:ND3 & mitochondrial NADH-ubiquinone oxidoreductase chain 3 {[}Source:FlyBase;Acc:FBgn0013681{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t21 & FBtr0083970 & 4418.53 & 3542.62 & 4442.73 & 4719.67 & 3858.70 & 3558.81 & 4138.615 & FBgn0038834 & RpS30 & Ribosomal protein S30 {[}Source:FlyBase;Acc:FBgn0038834{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t22 & FBtr0111120 & 3576.76 & 5052.31 & 4145.02 & 4539.37 & 3643.21 & 3369.58 & 3894.115 & FBgn0040007 & RpL38 & Ribosomal protein L38 {[}Source:FlyBase;Acc:FBgn0040007{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t23 & FBtr0433499 & 3004.90 & 4096.07 & 4034.34 & 3941.94 & 3679.44 & 3526.38 & 3810.690 & FBgn0013679 & mt:ND1 & mitochondrial NADH-ubiquinone oxidoreductase chain 1 {[}Source:FlyBase;Acc:FBgn0013679{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t27 & FBtr0071897 & 3474.22 & 4066.15 & 3227.59 & 3872.37 & 3135.34 & 2873.13 & 3350.905 & FBgn0010078 & RpL23 & Ribosomal protein L23 {[}Source:FlyBase;Acc:FBgn0010078{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t29 & FBtr0111132 & 3204.97 & 2952.79 & 2982.18 & 3438.57 & 3133.56 & 3050.26 & 3091.910 & FBgn0064225 & RpL5 & Ribosomal protein L5 {[}Source:FlyBase;Acc:FBgn0064225{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t30 & FBtr0082136 & 3226.80 & 3457.42 & 2830.81 & 3281.44 & 2926.15 & 2783.21 & 3076.475 & FBgn0261599 & RpS29 & Ribosomal protein S29 {[}Source:FlyBase;Acc:FBgn0261599{]} & protein\\_coding & protein\\_coding \\\\\n",
"\t34 & FBtr0085961 & 2927.31 & 3363.11 & 2942.27 & 3084.86 & 2589.26 & 2358.32 & 2934.790 & FBgn0032987 & RpL21 & Ribosomal protein L21 {[}Source:FlyBase;Acc:FBgn0032987{]} & protein\\_coding & protein\\_coding \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| <!--/--> | Name | SRR3478195 | SRR3478196 | SRR3478197 | SRR3478217 | SRR3478218 | SRR3478219 | Median | ensembl_gene_id | external_gene_name | description | transcript_biotype | gene_biotype | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 2 | FBtr0100868 | 23731.20 | 30708.10 | 26425.30 | 32645.90 | 30841.30 | 28208.70 | 29458.400 | FBgn0013676 | mt:CoIII | mitochondrial Cytochrome c oxidase subunit III [Source:FlyBase;Acc:FBgn0013676] | protein_coding | protein_coding | \n",
"| 3 | FBtr0100861 | 26776.40 | 33453.80 | 28996.10 | 35745.40 | 29645.10 | 27438.00 | 29320.600 | FBgn0013674 | mt:CoI | mitochondrial Cytochrome c oxidase subunit I [Source:FlyBase;Acc:FBgn0013674] | protein_coding | protein_coding | \n",
"| 4 | FBtr0082158 | 16639.90 | 24308.10 | 20152.30 | 14051.70 | 18095.50 | 18878.50 | 18487.000 | FBgn0002868 | MtnA | Metallothionein A [Source:FlyBase;Acc:FBgn0002868] | protein_coding | protein_coding | \n",
"| 5 | FBtr0100863 | 14372.10 | 18408.60 | 14199.80 | 21887.30 | 18524.10 | 16917.70 | 17663.150 | FBgn0013675 | mt:CoII | mitochondrial Cytochrome c oxidase subunit II [Source:FlyBase;Acc:FBgn0013675] | protein_coding | protein_coding | \n",
"| 7 | FBtr0433502 | 11658.80 | 14917.70 | 14051.10 | 15637.70 | 14516.70 | 13962.50 | 14283.900 | FBgn0013678 | mt:Cyt-b | mitochondrial Cytochrome b [Source:FlyBase;Acc:FBgn0013678] | protein_coding | protein_coding | \n",
"| 9 | FBtr0072185 | 12502.60 | 15227.60 | 12987.80 | 14468.90 | 12239.20 | 11509.10 | 12745.200 | FBgn0023170 | RpL39 | Ribosomal protein L39 [Source:FlyBase;Acc:FBgn0023170] | protein_coding | protein_coding | \n",
"| 10 | FBtr0100231 | 12279.20 | 16448.90 | 11422.30 | 14541.60 | 12367.90 | 11503.70 | 12323.550 | FBgn0066084 | RpL41 | Ribosomal protein L41 [Source:FlyBase;Acc:FBgn0066084] | protein_coding | protein_coding | \n",
"| 11 | FBtr0433498 | 10239.70 | 12791.80 | 11363.80 | 15365.80 | 12053.50 | 11041.70 | 11708.650 | FBgn0013672 | mt:ATPase6 | mitochondrial ATPase subunit 6 [Source:FlyBase;Acc:FBgn0013672] | protein_coding | protein_coding | \n",
"| 12 | FBtr0305669 | 11436.10 | 11853.10 | 11203.60 | 12985.30 | 10985.70 | 9710.47 | 11319.850 | FBgn0016726 | RpL29 | Ribosomal protein L29 [Source:FlyBase;Acc:FBgn0016726] | protein_coding | protein_coding | \n",
"| 15 | FBtr0088816 | 10873.20 | 7078.63 | 12060.30 | 10118.20 | 8431.46 | 7656.14 | 9274.830 | FBgn0033268 | Obp44a | Odorant-binding protein 44a [Source:FlyBase;Acc:FBgn0033268] | protein_coding | protein_coding | \n",
"| 18 | FBtr0081920 | 5277.58 | 6445.07 | 5532.97 | 4731.72 | 5239.29 | 4888.94 | 5258.435 | FBgn0040532 | CG8369 | | protein_coding | protein_coding | \n",
"| 19 | FBtr0345321 | 4944.50 | 5635.42 | 4774.91 | 5218.73 | 4927.06 | 4362.67 | 4935.780 | FBgn0002579 | RpL36 | Ribosomal protein L36 [Source:FlyBase;Acc:FBgn0002579] | protein_coding | protein_coding | \n",
"| 20 | FBtr0100870 | 4460.08 | 5932.88 | 4120.85 | 5482.12 | 5074.41 | 4508.94 | 4791.675 | FBgn0013681 | mt:ND3 | mitochondrial NADH-ubiquinone oxidoreductase chain 3 [Source:FlyBase;Acc:FBgn0013681] | protein_coding | protein_coding | \n",
"| 21 | FBtr0083970 | 4418.53 | 3542.62 | 4442.73 | 4719.67 | 3858.70 | 3558.81 | 4138.615 | FBgn0038834 | RpS30 | Ribosomal protein S30 [Source:FlyBase;Acc:FBgn0038834] | protein_coding | protein_coding | \n",
"| 22 | FBtr0111120 | 3576.76 | 5052.31 | 4145.02 | 4539.37 | 3643.21 | 3369.58 | 3894.115 | FBgn0040007 | RpL38 | Ribosomal protein L38 [Source:FlyBase;Acc:FBgn0040007] | protein_coding | protein_coding | \n",
"| 23 | FBtr0433499 | 3004.90 | 4096.07 | 4034.34 | 3941.94 | 3679.44 | 3526.38 | 3810.690 | FBgn0013679 | mt:ND1 | mitochondrial NADH-ubiquinone oxidoreductase chain 1 [Source:FlyBase;Acc:FBgn0013679] | protein_coding | protein_coding | \n",
"| 27 | FBtr0071897 | 3474.22 | 4066.15 | 3227.59 | 3872.37 | 3135.34 | 2873.13 | 3350.905 | FBgn0010078 | RpL23 | Ribosomal protein L23 [Source:FlyBase;Acc:FBgn0010078] | protein_coding | protein_coding | \n",
"| 29 | FBtr0111132 | 3204.97 | 2952.79 | 2982.18 | 3438.57 | 3133.56 | 3050.26 | 3091.910 | FBgn0064225 | RpL5 | Ribosomal protein L5 [Source:FlyBase;Acc:FBgn0064225] | protein_coding | protein_coding | \n",
"| 30 | FBtr0082136 | 3226.80 | 3457.42 | 2830.81 | 3281.44 | 2926.15 | 2783.21 | 3076.475 | FBgn0261599 | RpS29 | Ribosomal protein S29 [Source:FlyBase;Acc:FBgn0261599] | protein_coding | protein_coding | \n",
"| 34 | FBtr0085961 | 2927.31 | 3363.11 | 2942.27 | 3084.86 | 2589.26 | 2358.32 | 2934.790 | FBgn0032987 | RpL21 | Ribosomal protein L21 [Source:FlyBase;Acc:FBgn0032987] | protein_coding | protein_coding | \n",
"\n",
"\n"
],
"text/plain": [
" Name SRR3478195 SRR3478196 SRR3478197 SRR3478217 SRR3478218\n",
"2 FBtr0100868 23731.20 30708.10 26425.30 32645.90 30841.30 \n",
"3 FBtr0100861 26776.40 33453.80 28996.10 35745.40 29645.10 \n",
"4 FBtr0082158 16639.90 24308.10 20152.30 14051.70 18095.50 \n",
"5 FBtr0100863 14372.10 18408.60 14199.80 21887.30 18524.10 \n",
"7 FBtr0433502 11658.80 14917.70 14051.10 15637.70 14516.70 \n",
"9 FBtr0072185 12502.60 15227.60 12987.80 14468.90 12239.20 \n",
"10 FBtr0100231 12279.20 16448.90 11422.30 14541.60 12367.90 \n",
"11 FBtr0433498 10239.70 12791.80 11363.80 15365.80 12053.50 \n",
"12 FBtr0305669 11436.10 11853.10 11203.60 12985.30 10985.70 \n",
"15 FBtr0088816 10873.20 7078.63 12060.30 10118.20 8431.46 \n",
"18 FBtr0081920 5277.58 6445.07 5532.97 4731.72 5239.29 \n",
"19 FBtr0345321 4944.50 5635.42 4774.91 5218.73 4927.06 \n",
"20 FBtr0100870 4460.08 5932.88 4120.85 5482.12 5074.41 \n",
"21 FBtr0083970 4418.53 3542.62 4442.73 4719.67 3858.70 \n",
"22 FBtr0111120 3576.76 5052.31 4145.02 4539.37 3643.21 \n",
"23 FBtr0433499 3004.90 4096.07 4034.34 3941.94 3679.44 \n",
"27 FBtr0071897 3474.22 4066.15 3227.59 3872.37 3135.34 \n",
"29 FBtr0111132 3204.97 2952.79 2982.18 3438.57 3133.56 \n",
"30 FBtr0082136 3226.80 3457.42 2830.81 3281.44 2926.15 \n",
"34 FBtr0085961 2927.31 3363.11 2942.27 3084.86 2589.26 \n",
" SRR3478219 Median ensembl_gene_id external_gene_name\n",
"2 28208.70 29458.400 FBgn0013676 mt:CoIII \n",
"3 27438.00 29320.600 FBgn0013674 mt:CoI \n",
"4 18878.50 18487.000 FBgn0002868 MtnA \n",
"5 16917.70 17663.150 FBgn0013675 mt:CoII \n",
"7 13962.50 14283.900 FBgn0013678 mt:Cyt-b \n",
"9 11509.10 12745.200 FBgn0023170 RpL39 \n",
"10 11503.70 12323.550 FBgn0066084 RpL41 \n",
"11 11041.70 11708.650 FBgn0013672 mt:ATPase6 \n",
"12 9710.47 11319.850 FBgn0016726 RpL29 \n",
"15 7656.14 9274.830 FBgn0033268 Obp44a \n",
"18 4888.94 5258.435 FBgn0040532 CG8369 \n",
"19 4362.67 4935.780 FBgn0002579 RpL36 \n",
"20 4508.94 4791.675 FBgn0013681 mt:ND3 \n",
"21 3558.81 4138.615 FBgn0038834 RpS30 \n",
"22 3369.58 3894.115 FBgn0040007 RpL38 \n",
"23 3526.38 3810.690 FBgn0013679 mt:ND1 \n",
"27 2873.13 3350.905 FBgn0010078 RpL23 \n",
"29 3050.26 3091.910 FBgn0064225 RpL5 \n",
"30 2783.21 3076.475 FBgn0261599 RpS29 \n",
"34 2358.32 2934.790 FBgn0032987 RpL21 \n",
" description \n",
"2 mitochondrial Cytochrome c oxidase subunit III [Source:FlyBase;Acc:FBgn0013676] \n",
"3 mitochondrial Cytochrome c oxidase subunit I [Source:FlyBase;Acc:FBgn0013674] \n",
"4 Metallothionein A [Source:FlyBase;Acc:FBgn0002868] \n",
"5 mitochondrial Cytochrome c oxidase subunit II [Source:FlyBase;Acc:FBgn0013675] \n",
"7 mitochondrial Cytochrome b [Source:FlyBase;Acc:FBgn0013678] \n",
"9 Ribosomal protein L39 [Source:FlyBase;Acc:FBgn0023170] \n",
"10 Ribosomal protein L41 [Source:FlyBase;Acc:FBgn0066084] \n",
"11 mitochondrial ATPase subunit 6 [Source:FlyBase;Acc:FBgn0013672] \n",
"12 Ribosomal protein L29 [Source:FlyBase;Acc:FBgn0016726] \n",
"15 Odorant-binding protein 44a [Source:FlyBase;Acc:FBgn0033268] \n",
"18 \n",
"19 Ribosomal protein L36 [Source:FlyBase;Acc:FBgn0002579] \n",
"20 mitochondrial NADH-ubiquinone oxidoreductase chain 3 [Source:FlyBase;Acc:FBgn0013681]\n",
"21 Ribosomal protein S30 [Source:FlyBase;Acc:FBgn0038834] \n",
"22 Ribosomal protein L38 [Source:FlyBase;Acc:FBgn0040007] \n",
"23 mitochondrial NADH-ubiquinone oxidoreductase chain 1 [Source:FlyBase;Acc:FBgn0013679]\n",
"27 Ribosomal protein L23 [Source:FlyBase;Acc:FBgn0010078] \n",
"29 Ribosomal protein L5 [Source:FlyBase;Acc:FBgn0064225] \n",
"30 Ribosomal protein S29 [Source:FlyBase;Acc:FBgn0261599] \n",
"34 Ribosomal protein L21 [Source:FlyBase;Acc:FBgn0032987] \n",
" transcript_biotype gene_biotype \n",
"2 protein_coding protein_coding\n",
"3 protein_coding protein_coding\n",
"4 protein_coding protein_coding\n",
"5 protein_coding protein_coding\n",
"7 protein_coding protein_coding\n",
"9 protein_coding protein_coding\n",
"10 protein_coding protein_coding\n",
"11 protein_coding protein_coding\n",
"12 protein_coding protein_coding\n",
"15 protein_coding protein_coding\n",
"18 protein_coding protein_coding\n",
"19 protein_coding protein_coding\n",
"20 protein_coding protein_coding\n",
"21 protein_coding protein_coding\n",
"22 protein_coding protein_coding\n",
"23 protein_coding protein_coding\n",
"27 protein_coding protein_coding\n",
"29 protein_coding protein_coding\n",
"30 protein_coding protein_coding\n",
"34 protein_coding protein_coding"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%get DM_PC_transcripts --from Python3\n",
"\n",
"## Preview the dataframe\n",
"head(DM_PC_transcripts, 20)\n",
"\n",
"## Write to table\n",
"write.csv(head(DM_PC_transcripts, 20), file = \"DM_top20_PC_transcripts.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"## Extract amino acid sequences of protein-coding sequences from Biomart"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"kernel": "R"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Batch submitting query [=>-----------------------------] 7% eta: 35s\n",
"Batch submitting query [==>----------------------------] 10% eta: 49s\n",
"Batch submitting query [===>---------------------------] 13% eta: 1m\n",
"Batch submitting query [====>--------------------------] 17% eta: 1m\n",
"Batch submitting query [=====>-------------------------] 20% eta: 1m\n",
"Batch submitting query [======>------------------------] 23% eta: 1m\n",
"Batch submitting query [=======>-----------------------] 27% eta: 1m\n",
"Batch submitting query [========>----------------------] 30% eta: 1m\n",
"Batch submitting query [=========>---------------------] 33% eta: 1m\n",
"Batch submitting query [==========>--------------------] 37% eta: 1m\n",
"Batch submitting query [===========>-------------------] 40% eta: 1m\n",
"Batch submitting query [============>------------------] 43% eta: 49s\n",
"Batch submitting query [=============>-----------------] 47% eta: 47s\n",
"Batch submitting query [===============>---------------] 50% eta: 45s\n",
"Batch submitting query [================>--------------] 53% eta: 42s\n",
"Batch submitting query [=================>-------------] 57% eta: 40s\n",
"Batch submitting query [==================>------------] 60% eta: 37s\n",
"Batch submitting query [===================>-----------] 63% eta: 34s\n",
"Batch submitting query [====================>----------] 67% eta: 31s\n",
"Batch submitting query [=====================>---------] 70% eta: 29s\n",
"Batch submitting query [======================>--------] 73% eta: 26s\n",
"Batch submitting query [=======================>-------] 77% eta: 23s\n",
"Batch submitting query [========================>------] 80% eta: 20s\n",
"Batch submitting query [=========================>-----] 83% eta: 16s\n",
"Batch submitting query [==========================>----] 87% eta: 13s\n",
"Batch submitting query [===========================>---] 90% eta: 10s\n",
"Batch submitting query [============================>--] 93% eta: 7s\n",
"Batch submitting query [=============================>-] 97% eta: 3s\n",
"Batch submitting query [===============================] 100% eta: 0s\n"
]
}
],
"source": [
"DM_PC_transcripts_ID <- DM_PC_transcripts$Name\n",
"\n",
"## Create getBM() query for obtaining peptide sequences\n",
"DM_BM_peptide_seqs <- getSequence(id = DM_PC_transcripts_ID, \n",
" type = 'flybase_transcript_id', \n",
" seqType = 'peptide', \n",
" mart = DM_ensembl)\n",
"\n",
"## Export to FASTA\n",
"exportFASTA(DM_BM_peptide_seqs, file='./DM_non0_pep_Nov11.fasta')"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"## KOG"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"##[28617542.bc]\n",
"rpsblast -query ~/Lymnaea_CNS_transcriptome_files/7_Interspecies_comparison/7a_Drosophila/DM_non0_pep_Nov11.fasta -db Kog \\\n",
"-out ~/Lymnaea_CNS_transcriptome_files/7_Interspecies_comparison/7a_Drosophila/DM_CNS_pep_Nov11_KOG.txt -evalue 1E-5 \\\n",
"-outfmt \"6 qseqid sseqid stitle pident length mismatch gapopen qlen qstart qend slen sstart send evalue bitscore qcovhsp qcovs\" \\\n",
"-max_hsps 1 -max_target_seqs 1"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"kernel": "calysto_bash"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"## Replace \"[\" and \"]\" with \"#\" for later import as dataframe\n",
"sed -i 's/\\[/#/g' DM_CNS_pep_Nov11_KOG.txt \n",
"sed -i 's/\\]./#/g' DM_CNS_pep_Nov11_KOG.txt "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"kernel": "Python3"
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import os\n",
"os.chdir(\"/home/zhanglab1/ndong/Lymnaea_CNS_transcriptome_files/7_Interspecies_comparison/7a_Drosophila\")\n",
"\n",
"DM_KOG = pd.read_csv(\"DM_CNS_pep_Nov11_KOG.txt\", sep='#', header=None, engine=\"python\")\n",
"type(DM_KOG)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"kernel": "R"
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th></th><th scope=col>0</th><th scope=col>1</th><th scope=col>2</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><th scope=row>0</th><td>FBtr0070135\tgnl|CDD|229923\tKOG1984, KOG1984, KOG1984, Vesicle coat complex COPII, subunit SFB3 </td><td>Intracellular trafficking, secretion, and vesicular transport </td><td>\t24.057\t212\t146\t5\t469\t79\t279\t1007\t19\t226\t1.30e-08\t54.8\t43\t43 </td></tr>\n",
"\t<tr><th scope=row>1</th><td>FBtr0070611\tgnl|CDD|230896\tKOG2957, KOG2957, KOG2957, Vacuolar H+-ATPase V0 sector, subunit d </td><td>Energy production and conversion </td><td>\t78.000\t350\t76\t1\t350\t1\t350\t350\t2\t350\t0.0\t642\t100\t100 </td></tr>\n",
"\t<tr><th scope=row>2</th><td>FBtr0070159\tgnl|CDD|232691\tKOG4766, KOG4766, KOG4766, Uncharacterized conserved protein </td><td>Function unknown </td><td>\t75.926\t54\t13\t0\t64\t1\t54\t64\t1\t54\t9.64e-07\t39.1\t84\t84 </td></tr>\n",
"\t<tr><th scope=row>3</th><td>FBtr0071498\tgnl|CDD|229595\tKOG1654, KOG1654, KOG1654, Microtubule-associated anchor protein involved in autophagy and membrane trafficking </td><td>Cytoskeleton </td><td>\t66.379\t116\t39\t0\t121\t1\t116\t116\t1\t116\t4.27e-72\t208\t96\t96 </td></tr>\n",
"\t<tr><th scope=row>4</th><td>FBtr0072188\tgnl|CDD|229674\tKOG1735, KOG1735, KOG1735, Actin depolymerizing factor </td><td>Cytoskeleton </td><td>\t46.309\t149\t76\t2\t148\t1\t148\t146\t1\t146\t2.46e-63\t188\t100\t100 </td></tr>\n",
"\t<tr><th scope=row>5</th><td>FBtr0070148\tgnl|CDD|231372\tKOG3434, KOG3434, KOG3434, 60S ribosomal protein L22 </td><td>Translation, ribosomal structure and biogenesis </td><td>\t66.400\t125\t41\t1\t299\t173\t297\t125\t1\t124\t4.20e-44\t144\t42\t42 </td></tr>\n",
"\t<tr><th scope=row>6</th><td>FBtr0071094\tgnl|CDD|228356\tKOG0407, KOG0407, KOG0407, 40S ribosomal protein S14 </td><td>Translation, ribosomal structure and biogenesis </td><td>\t89.062\t128\t14\t0\t151\t13\t140\t139\t1\t128\t3.71e-82\t235\t85\t85 </td></tr>\n",
"\t<tr><th scope=row>7</th><td>FBtr0071444\tgnl|CDD|232489\tKOG4561, KOG4561, KOG4561, Uncharacterized conserved protein, contains TBC domain </td><td>Signal transduction mechanisms, General function prediction only </td><td>\t29.240\t342\t178\t6\t429\t73\t411\t281\t1\t281\t3.73e-91\t275\t79\t79 </td></tr>\n",
"\t<tr><th scope=row>8</th><td>FBtr0070800\tgnl|CDD|231374\tKOG3436, KOG3436, KOG3436, 60S ribosomal protein L35 </td><td>Translation, ribosomal structure and biogenesis </td><td>\t61.789\t123\t47\t0\t123\t1\t123\t123\t1\t123\t1.59e-29\t101\t100\t100 </td></tr>\n",
"\t<tr><th scope=row>9</th><td>FBtr0072187\tgnl|CDD|229526\tKOG1585, KOG1585, KOG1585, Protein required for fusion of vesicles in vesicular transport, gamma-SNAP </td><td>Intracellular trafficking, secretion, and vesicular transport </td><td>\t38.558\t319\t168\t4\t302\t1\t302\t308\t1\t308\t1.91e-101\t297\t100\t100 </td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" & 0 & 1 & 2\\\\\n",
"\\hline\n",
"\t0 & FBtr0070135\tgnl\\textbar{}CDD\\textbar{}229923\tKOG1984, KOG1984, KOG1984, Vesicle coat complex COPII, subunit SFB3 & Intracellular trafficking, secretion, and vesicular transport & \t24.057\t212\t146\t5\t469\t79\t279\t1007\t19\t226\t1.30e-08\t54.8\t43\t43 \\\\\n",
"\t1 & FBtr0070611\tgnl\\textbar{}CDD\\textbar{}230896\tKOG2957, KOG2957, KOG2957, Vacuolar H+-ATPase V0 sector, subunit d & Energy production and conversion & \t78.000\t350\t76\t1\t350\t1\t350\t350\t2\t350\t0.0\t642\t100\t100 \\\\\n",
"\t2 & FBtr0070159\tgnl\\textbar{}CDD\\textbar{}232691\tKOG4766, KOG4766, KOG4766, Uncharacterized conserved protein & Function unknown & \t75.926\t54\t13\t0\t64\t1\t54\t64\t1\t54\t9.64e-07\t39.1\t84\t84 \\\\\n",
"\t3 & FBtr0071498\tgnl\\textbar{}CDD\\textbar{}229595\tKOG1654, KOG1654, KOG1654, Microtubule-associated anchor protein involved in autophagy and membrane trafficking & Cytoskeleton & \t66.379\t116\t39\t0\t121\t1\t116\t116\t1\t116\t4.27e-72\t208\t96\t96 \\\\\n",
"\t4 & FBtr0072188\tgnl\\textbar{}CDD\\textbar{}229674\tKOG1735, KOG1735, KOG1735, Actin depolymerizing factor & Cytoskeleton & \t46.309\t149\t76\t2\t148\t1\t148\t146\t1\t146\t2.46e-63\t188\t100\t100 \\\\\n",
"\t5 & FBtr0070148\tgnl\\textbar{}CDD\\textbar{}231372\tKOG3434, KOG3434, KOG3434, 60S ribosomal protein L22 & Translation, ribosomal structure and biogenesis & \t66.400\t125\t41\t1\t299\t173\t297\t125\t1\t124\t4.20e-44\t144\t42\t42 \\\\\n",
"\t6 & FBtr0071094\tgnl\\textbar{}CDD\\textbar{}228356\tKOG0407, KOG0407, KOG0407, 40S ribosomal protein S14 & Translation, ribosomal structure and biogenesis & \t89.062\t128\t14\t0\t151\t13\t140\t139\t1\t128\t3.71e-82\t235\t85\t85 \\\\\n",
"\t7 & FBtr0071444\tgnl\\textbar{}CDD\\textbar{}232489\tKOG4561, KOG4561, KOG4561, Uncharacterized conserved protein, contains TBC domain & Signal transduction mechanisms, General function prediction only & \t29.240\t342\t178\t6\t429\t73\t411\t281\t1\t281\t3.73e-91\t275\t79\t79 \\\\\n",
"\t8 & FBtr0070800\tgnl\\textbar{}CDD\\textbar{}231374\tKOG3436, KOG3436, KOG3436, 60S ribosomal protein L35 & Translation, ribosomal structure and biogenesis & \t61.789\t123\t47\t0\t123\t1\t123\t123\t1\t123\t1.59e-29\t101\t100\t100 \\\\\n",
"\t9 & FBtr0072187\tgnl\\textbar{}CDD\\textbar{}229526\tKOG1585, KOG1585, KOG1585, Protein required for fusion of vesicles in vesicular transport, gamma-SNAP & Intracellular trafficking, secretion, and vesicular transport & \t38.558\t319\t168\t4\t302\t1\t302\t308\t1\t308\t1.91e-101\t297\t100\t100 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| <!--/--> | 0 | 1 | 2 | \n",
"|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | FBtr0070135\tgnl|CDD|229923\tKOG1984, KOG1984, KOG1984, Vesicle coat complex COPII, subunit SFB3 | Intracellular trafficking, secretion, and vesicular transport | \t24.057\t212\t146\t5\t469\t79\t279\t1007\t19\t226\t1.30e-08\t54.8\t43\t43 | \n",
"| 1 | FBtr0070611\tgnl|CDD|230896\tKOG2957, KOG2957, KOG2957, Vacuolar H+-ATPase V0 sector, subunit d | Energy production and conversion | \t78.000\t350\t76\t1\t350\t1\t350\t350\t2\t350\t0.0\t642\t100\t100 | \n",
"| 2 | FBtr0070159\tgnl|CDD|232691\tKOG4766, KOG4766, KOG4766, Uncharacterized conserved protein | Function unknown | \t75.926\t54\t13\t0\t64\t1\t54\t64\t1\t54\t9.64e-07\t39.1\t84\t84 | \n",
"| 3 | FBtr0071498\tgnl|CDD|229595\tKOG1654, KOG1654, KOG1654, Microtubule-associated anchor protein involved in autophagy and membrane trafficking | Cytoskeleton | \t66.379\t116\t39\t0\t121\t1\t116\t116\t1\t116\t4.27e-72\t208\t96\t96 | \n",
"| 4 | FBtr0072188\tgnl|CDD|229674\tKOG1735, KOG1735, KOG1735, Actin depolymerizing factor | Cytoskeleton | \t46.309\t149\t76\t2\t148\t1\t148\t146\t1\t146\t2.46e-63\t188\t100\t100 | \n",
"| 5 | FBtr0070148\tgnl|CDD|231372\tKOG3434, KOG3434, KOG3434, 60S ribosomal protein L22 | Translation, ribosomal structure and biogenesis | \t66.400\t125\t41\t1\t299\t173\t297\t125\t1\t124\t4.20e-44\t144\t42\t42 | \n",
"| 6 | FBtr0071094\tgnl|CDD|228356\tKOG0407, KOG0407, KOG0407, 40S ribosomal protein S14 | Translation, ribosomal structure and biogenesis | \t89.062\t128\t14\t0\t151\t13\t140\t139\t1\t128\t3.71e-82\t235\t85\t85 | \n",
"| 7 | FBtr0071444\tgnl|CDD|232489\tKOG4561, KOG4561, KOG4561, Uncharacterized conserved protein, contains TBC domain | Signal transduction mechanisms, General function prediction only | \t29.240\t342\t178\t6\t429\t73\t411\t281\t1\t281\t3.73e-91\t275\t79\t79 | \n",
"| 8 | FBtr0070800\tgnl|CDD|231374\tKOG3436, KOG3436, KOG3436, 60S ribosomal protein L35 | Translation, ribosomal structure and biogenesis | \t61.789\t123\t47\t0\t123\t1\t123\t123\t1\t123\t1.59e-29\t101\t100\t100 | \n",
"| 9 | FBtr0072187\tgnl|CDD|229526\tKOG1585, KOG1585, KOG1585, Protein required for fusion of vesicles in vesicular transport, gamma-SNAP | Intracellular trafficking, secretion, and vesicular transport | \t38.558\t319\t168\t4\t302\t1\t302\t308\t1\t308\t1.91e-101\t297\t100\t100 | \n",
"\n",
"\n"
],
"text/plain": [
" 0 \n",
"0 FBtr0070135\\tgnl|CDD|229923\\tKOG1984, KOG1984, KOG1984, Vesicle coat complex COPII, subunit SFB3 \n",
"1 FBtr0070611\\tgnl|CDD|230896\\tKOG2957, KOG2957, KOG2957, Vacuolar H+-ATPase V0 sector, subunit d \n",
"2 FBtr0070159\\tgnl|CDD|232691\\tKOG4766, KOG4766, KOG4766, Uncharacterized conserved protein \n",
"3 FBtr0071498\\tgnl|CDD|229595\\tKOG1654, KOG1654, KOG1654, Microtubule-associated anchor protein involved in autophagy and membrane trafficking \n",
"4 FBtr0072188\\tgnl|CDD|229674\\tKOG1735, KOG1735, KOG1735, Actin depolymerizing factor \n",
"5 FBtr0070148\\tgnl|CDD|231372\\tKOG3434, KOG3434, KOG3434, 60S ribosomal protein L22 \n",
"6 FBtr0071094\\tgnl|CDD|228356\\tKOG0407, KOG0407, KOG0407, 40S ribosomal protein S14 \n",
"7 FBtr0071444\\tgnl|CDD|232489\\tKOG4561, KOG4561, KOG4561, Uncharacterized conserved protein, contains TBC domain \n",
"8 FBtr0070800\\tgnl|CDD|231374\\tKOG3436, KOG3436, KOG3436, 60S ribosomal protein L35 \n",
"9 FBtr0072187\\tgnl|CDD|229526\\tKOG1585, KOG1585, KOG1585, Protein required for fusion of vesicles in vesicular transport, gamma-SNAP \n",
" 1 \n",
"0 Intracellular trafficking, secretion, and vesicular transport \n",
"1 Energy production and conversion \n",
"2 Function unknown \n",
"3 Cytoskeleton \n",
"4 Cytoskeleton \n",
"5 Translation, ribosomal structure and biogenesis \n",
"6 Translation, ribosomal structure and biogenesis \n",
"7 Signal transduction mechanisms, General function prediction only\n",
"8 Translation, ribosomal structure and biogenesis \n",
"9 Intracellular trafficking, secretion, and vesicular transport \n",
" 2 \n",
"0 \\t24.057\\t212\\t146\\t5\\t469\\t79\\t279\\t1007\\t19\\t226\\t1.30e-08\\t54.8\\t43\\t43\n",
"1 \\t78.000\\t350\\t76\\t1\\t350\\t1\\t350\\t350\\t2\\t350\\t0.0\\t642\\t100\\t100 \n",
"2 \\t75.926\\t54\\t13\\t0\\t64\\t1\\t54\\t64\\t1\\t54\\t9.64e-07\\t39.1\\t84\\t84 \n",
"3 \\t66.379\\t116\\t39\\t0\\t121\\t1\\t116\\t116\\t1\\t116\\t4.27e-72\\t208\\t96\\t96 \n",
"4 \\t46.309\\t149\\t76\\t2\\t148\\t1\\t148\\t146\\t1\\t146\\t2.46e-63\\t188\\t100\\t100 \n",
"5 \\t66.400\\t125\\t41\\t1\\t299\\t173\\t297\\t125\\t1\\t124\\t4.20e-44\\t144\\t42\\t42 \n",
"6 \\t89.062\\t128\\t14\\t0\\t151\\t13\\t140\\t139\\t1\\t128\\t3.71e-82\\t235\\t85\\t85 \n",
"7 \\t29.240\\t342\\t178\\t6\\t429\\t73\\t411\\t281\\t1\\t281\\t3.73e-91\\t275\\t79\\t79 \n",
"8 \\t61.789\\t123\\t47\\t0\\t123\\t1\\t123\\t123\\t1\\t123\\t1.59e-29\\t101\\t100\\t100 \n",
"9 \\t38.558\\t319\\t168\\t4\\t302\\t1\\t302\\t308\\t1\\t308\\t1.91e-101\\t297\\t100\\t100 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%get DM_KOG --from Python3\n",
"head(DM_KOG, 10)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"kernel": "Python3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"RNA processing and modification 518\n",
"Chromatin structure and dynamics 229\n",
"Energy production and conversion 448\n",
"Cell cycle control 292\n",
"Amino acid transport and metabolism 435\n",
"Nucleotide transport and metabolism 148\n",
"Carbohydrate transport and metabolism 416\n",
"Coenzyme transport and metabolism 113\n",
"Lipid transport and metabolism 467\n",
"Translation, ribosomal structure and biogenesis 529\n",
"Transcription 1085\n",
"Replication, recombination and repair 231\n",
"Cell wall/membrane/envelope biogenesis 143\n",
"Cell motility 24\n",
"Posttranslational modification 1037\n",
"Inorganic ion transport and metabolism 384\n",
"Secondary metabolites 195\n",
"General function prediction only 1742\n",
"Function unknown 973\n",
"Signal transduction mechanisms 2239\n",
"Intracellular trafficking 702\n",
"Defense mechanisms 90\n",
"Extracellular structures 163\n",
"Nuclear structure 52\n",
"Cytoskeleton 532\n",
" KOG Count DM_Percentage\n",
"0 RNA processing and modification 518 3.928111\n",
"1 Chromatin structure and dynamics 229 1.736559\n",
"2 Energy production and conversion 448 3.397285\n",
"3 Cell cycle control 292 2.214302\n",
"4 Amino acid transport and metabolism 435 3.298703\n",
"5 Nucleotide transport and metabolism 148 1.122317\n",
"6 Carbohydrate transport and metabolism 416 3.154622\n",
"7 Coenzyme transport and metabolism 113 0.856905\n",
"8 Lipid transport and metabolism 467 3.541366\n",
"9 Translation, ribosomal structure and biogenesis 529 4.011527\n",
"10 Transcription 1085 8.227800\n",
"11 Replication, recombination and repair 231 1.751725\n",
"12 Cell wall/membrane/envelope biogenesis 143 1.084401\n",
"13 Cell motility 24 0.181997\n",
"14 Posttranslational modification 1037 7.863805\n",
"15 Inorganic ion transport and metabolism 384 2.911959\n",
"16 Secondary metabolites 195 1.478729\n",
"17 General function prediction only 1742 13.209980\n",
"18 Function unknown 973 7.378479\n",
"19 Signal transduction mechanisms 2239 16.978843\n",
"20 Intracellular trafficking 702 5.323425\n",
"21 Defense mechanisms 90 0.682490\n",
"22 Extracellular structures 163 1.236066\n",
"23 Nuclear structure 52 0.394328\n",
"24 Cytoskeleton 532 4.034276\n"
]
}
],
"source": [
"## Count the number of occurrences of each category\n",
"KOGs= [\"RNA processing and modification\", \"Chromatin structure and dynamics\", \"Energy production and conversion\", \"Cell cycle control\", \n",
" \"Amino acid transport and metabolism\", \"Nucleotide transport and metabolism\", \"Carbohydrate transport and metabolism\", \"Coenzyme transport and metabolism\", \n",
" \"Lipid transport and metabolism\", \"Translation, ribosomal structure and biogenesis\", \"Transcription\", \"Replication, recombination and repair\", \n",
" \"Cell wall/membrane/envelope biogenesis\", \"Cell motility\", \"Posttranslational modification\", \"Inorganic ion transport and metabolism\", \n",
" \"Secondary metabolites\", \"General function prediction only\", \"Function unknown\", \"Signal transduction mechanisms\", \"Intracellular trafficking\", \n",
" \"Defense mechanisms\", \"Extracellular structures\", \"Nuclear structure\", \"Cytoskeleton\"]\n",
"\n",
"data = []\n",
"for KOG in KOGs:\n",
" print(KOG, DM_KOG[1].str.contains(KOG).sum())\n",
" data.append([KOG, DM_KOG[1].str.contains(KOG).sum()])\n",
" \n",
"df = pd.DataFrame(data)\n",
"df.columns = [\"KOG\", \"Count\"]\n",
"df[\"DM_Percentage\"] = df[\"Count\"]/df[\"Count\"].sum()*100\n",
"print(df)\n",
"\n",
"df[[\"KOG\", \"DM_Percentage\"]].to_csv(\"DM_KOG_summary.txt\", sep=\"\\t\", index=None)"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"# Archive"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"kernel": "SoS"
},
"source": [
"## Extract protein sequences of expressed transcripts from Ensembl reference protein sequences\n",
"\n",
"- `DM_CNS_pep_Jun28.fa` ---> 14,789 sequences"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true,
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"filterbyname.sh \\\n",
"in=DM_pep.fa \\\n",
"out=DM_CNS_pep_Jun28.fa \\\n",
"names=DM_567789_non0_Jun28.txt \\\n",
"include=t substring "
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "calysto_bash"
},
"source": [
"## KOG analysis of extracted protein sequences"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "calysto_bash"
},
"outputs": [],
"source": [
"rpsblast -query ~/CNS-transcriptomes/DM/DM_CNS_pep_Jun28.fa -db Kog \\\n",
"-out ~/CNS-transcriptomes/DM/DM_CNS_pep_Jun28_KOG.txt -evalue 1E-5 \\\n",
"-outfmt \"6 qseqid sseqid stitle pident length mismatch gapopen qlen qstart qend slen sstart send evalue bitscore qcovhsp qcovs\" \\\n",
"-max_hsps 1 -max_target_seqs 1"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "SoS",
"language": "sos",
"name": "sos"
},
"language_info": {
"codemirror_mode": "sos",
"file_extension": ".sos",
"mimetype": "text/x-sos",
"name": "sos",
"nbconvert_exporter": "sos_notebook.converter.SoS_Exporter",
"pygments_lexer": "sos"
},
"sos": {
"kernels": [
[
"Python3",
"python3",
"Python3",
"#FFD91A"
],
[
"R",
"ir",
"R",
"#DCDCDA"
],
[
"calysto_bash",
"calysto_bash",
"",
""
]
],
"panel": {
"displayed": true,
"height": 0,
"style": "side"
},
"version": "0.9.15.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "316.391px"
},
"toc_section_display": true,
"toc_window_display": true
},
"toc-autonumbering": true,
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}