Last active
August 29, 2015 13:56
Create and manipulate molecules with PyRNA
{ | |
"metadata": { | |
"name": "Create and manipulate molecules." | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": "Create molecules from scratch" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "PyRNA allows you to construct easily DNA and RNA molecules. An RNA molecule will automatically convert T residues into U." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "from pyrna.features import DNA, RNA\nrna = RNA(name = 'my_rna', sequence = 'GGGGGATTAACCCC')\nprint \"%s: %s\"%(rna.name, rna.sequence)\ndna = DNA(name = 'my_dna', sequence = 'GGGGGATTAACCCC')\nprint \"%s: %s\"%(dna.name, dna.sequence)", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "my_rna: GGGGGAUUAACCCC\nmy_dna: GGGGGATTAACCCC\n" | |
} | |
], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "RNA and DNA molecules can return their length, are slicable and iterable:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "print \"slice: %s\"%rna[0:2]\nprint \"length: %i\"%len(rna)", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "slice: GG\nlength: 14\n" | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "You can easily get a single residue:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "print rna[3]", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "G\n" | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "The sequence can be easily changed by adding a new string at the end:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "rna +'AAA'\nprint rna.sequence", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "GGGGGAUUAACCCCAAA\n" | |
} | |
], | |
"prompt_number": 13 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "Or by removing some residues from the end:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "rna-3\nprint rna.sequence", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "GGGGGAUUAACCCC\n" | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "An RNA molecule is iterable over its primary sequence:" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "for index, residue in enumerate(rna):\n print \"residue n%i: %s\"%(index+1, residue)", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "residue n1: G\nresidue n2: G\nresidue n3: G\nresidue n4: G\nresidue n5: G\nresidue n6: A\nresidue n7: U\nresidue n8: U\nresidue n9: A\nresidue n10: A\nresidue n11: C\nresidue n12: C\nresidue n13: C\nresidue n14: C\n" | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": "Create molecules from files" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "With PyRNA, an object pyrna.features.TertiaryStructure is made with a single molecular chain. Since a PDB file can contains several molecules, the function parse_pdb() returns a list of such objects." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "h = open('data/1ehz.pdb')\npdb_content = h.read()\nh.close()\n\nfrom pyrna.parsers import parse_pdb\ntertiary_structures = parse_pdb(pdb_content)", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 16 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "RNA molecules extracted from PDB files can contain modified residues. PyRNA converts them automatically into unmodified residues, and stores the modification in a dictionary." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "for ts in tertiary_structures:\n print ts.rna.name\n print ts.rna.sequence\n print ts.rna.modified_residues", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "A\nGCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA\n[('2MG', 10), ('H2U', 16), ('H2U', 17), ('M2G', 26), ('OMC', 32), ('OMG', 34), ('YYG', 37), ('PSU', 39), ('5MC', 40), ('7MG', 46), ('5MC', 49), ('5MU', 54), ('PSU', 55), ('1MA', 58)]\n" | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "If you want to parse a FASTA file, you have to precise the type of molecules stored. DNA molecules are faster to create since PyRNA will not try to identify modified residues." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "h = open('data/telomerases.fasta')\nfasta_content = h.read()\nh.close()\n\nfrom pyrna.parsers import parse_fasta\n#the default type is RNA\nfor rna in parse_fasta(fasta_content):\n print \"sequence of %s:\"%rna.name\n print \"%s\\n\"%rna.sequence", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "sequence of telomerase 1:\nAGUUUCUCGAUAAUUGAUCUGUAGAAUCUGUCAAGCAAAACCCCAAAACCUUACACUGAGAGCAUUUAGCCUGAUUACUCUUUAAAUCAAAUCAGGCAAUAGAGAGAAACUCGAGAGGUGAAAACCCCACAGCAUUCUGAAAUGUAUUUGGGAGUAAUCUCAUAUUAGUUUGCUGUCCUCUCAUCUUUU\n\nsequence of telomerase 2:\nAUCCCCGCAAAUUCAUUCUGUUUGCAUUCAAACAGUCAUUCAACCCCAAAAAUCUAGACCAAAUAUUGUCUUCCCUUCUUGGCACAAACAAAGAAGAGACGCGGGAUAAAGAUACUCCGACGAUUGAUACAAUAUUUAUCAACGGGAGGUCUUACUUUU\n\nsequence of telomerase 3:\nUACCUCCUGUGGAUCCAUUCAGGAUUAAUGAAAUCCUGUCAUUCAACCCCAAAAAUCUUGUCAAAUUAUUGCCUCGUCUUUUGGGCACAAACAAAAGUCACGCAGGAGGUUCAGACAUUCGACAUAAGAUACACUAUUUAUCUUAUGGAAGGUCUAGUUUUU\n\n" | |
} | |
], | |
"prompt_number": 21 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "An object RNA will automatically convert T residues into U." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "h = open('data/ft3100_from_FANTOM3_project.fasta')\nfasta_content = h.read()\nh.close()\n\nfor dna in parse_fasta(fasta_content, 'DNA'):\n print \"sequence as a DNA:\"\n print \"%s\\n\"%dna.sequence\n\nfor rna in parse_fasta(fasta_content):\n print \"sequence as an RNA:\"\n print rna.sequence", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "sequence as a DNA:\nTAACAATCTGCTGAAAGGTACCGTCGGAGGGAGCTTTGTTGCCAGCGCCAGAAACGCCGGTTTAACCAGCGCCGAAGTGAGCGCAGTGATTAAAGCCATGCAGTGGCAAATGGATTTCCGCAAACTGAAAAAAGGCGATGAATTTGCGGT\n\nsequence as an RNA:\nUAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGGUUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCGCAAACUGAAAAAAGGCGAUGAAUUUGCGGU\n" | |
} | |
], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "DNA and RNA objects have a rich textual representation in IPython notebooks. " | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "parse_fasta(fasta_content)[0]", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": "<pre>1\t<font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font>\n61\t<font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font 
color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font>\n121\t<font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font 
color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"green\">U</font>\n</pre>", | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 15, | |
"text": "<pyrna.features.RNA instance at 0x109c5be60>" | |
} | |
], | |
"prompt_number": 15 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": "Create molecules from databases" | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "You can load 3D structures directly from the Protein Databank" | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "from pyrna.db import PDB\npdb = PDB()\npdb_content = pdb.get_entry('1GID')", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": "With PyRNA, a pyrna.features.TertiaryStructure object is made with a single molecular chain. Since a PDB file can contains several molecules, the function parse_pdb returns a list of pyrna.features.TertiaryStructure." | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": "from pyrna.parsers import parse_pdb\n\nfor tertiary_structure in parse_pdb(pdb_content):\n print \"molecular chain %s: %s\"%(tertiary_structure.rna.name, tertiary_structure.rna.sequence)", | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": "molecular chain A: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\nmolecular chain B: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\n" | |
} | |
], | |
"prompt_number": 10 | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
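The sequence behaviours shown in the first part of the notebook (T to U conversion, length, slicing, iteration, and the in-place + and - operators) can be imitated without PyRNA installed. The sketch below is only an illustration written in plain Python 3: the class name ToyRNA and all of its internals are assumptions made for this example and are not PyRNA's actual implementation.

```python
class ToyRNA:
    """Toy stand-in for an RNA molecule (hypothetical, NOT PyRNA's code)."""

    def __init__(self, name, sequence):
        self.name = name
        # An RNA stores U wherever the input sequence has T.
        self.sequence = sequence.replace("T", "U")

    def __len__(self):
        return len(self.sequence)

    def __getitem__(self, key):
        # Handles both a single index and a slice.
        return self.sequence[key]

    def __iter__(self):
        return iter(self.sequence)

    def __add__(self, suffix):
        # Mirrors the notebook: '+' extends the molecule in place.
        self.sequence += suffix.replace("T", "U")
        return self

    def __sub__(self, count):
        # Mirrors the notebook: '-' drops `count` residues from the end.
        if count > 0:
            self.sequence = self.sequence[:-count]
        return self


rna = ToyRNA(name="my_rna", sequence="GGGGGATTAACCCC")
print("%s: %s" % (rna.name, rna.sequence))  # my_rna: GGGGGAUUAACCCC
print(rna[0:2], len(rna))                   # GG 14
rna + "AAA"
print(rna.sequence)                         # GGGGGAUUAACCCCAAA
rna - 3
print(rna.sequence)                         # GGGGGAUUAACCCC
```

Having + and - mutate the object is unusual for Python operators; it is done here only because the notebook output shows the molecule changing without an assignment.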