Skip to content

Instantly share code, notes, and snippets.

@fjossinet
Last active August 29, 2015 13:56
Show Gist options
  • Save fjossinet/9033572 to your computer and use it in GitHub Desktop.
Save fjossinet/9033572 to your computer and use it in GitHub Desktop.
Create and manipulate molecules with PyRNA
{
"metadata": {
"name": "Create and manipulate molecules."
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Create molecules from scratch"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "PyRNA allows you to construct easily DNA and RNA molecules. An RNA molecule will automatically convert T residues into U."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from pyrna.features import DNA, RNA\nrna = RNA(name = 'my_rna', sequence = 'GGGGGATTAACCCC')\nprint \"%s: %s\"%(rna.name, rna.sequence)\ndna = DNA(name = 'my_dna', sequence = 'GGGGGATTAACCCC')\nprint \"%s: %s\"%(dna.name, dna.sequence)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "my_rna: GGGGGAUUAACCCC\nmy_dna: GGGGGATTAACCCC\n"
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": "RNA and DNA molecules can return their length, are slicable and iterable:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "print \"slice: %s\"%rna[0:2]\nprint \"length: %i\"%len(rna)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "slice: GG\nlength: 14\n"
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You can easily get a single residue:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "print rna[3]",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "G\n"
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": "The sequence can be easily changed by adding a new string at the end:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "rna +'AAA'\nprint rna.sequence",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "GGGGGAUUAACCCCAAA\n"
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Or by removing some residues from the end:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "rna-3\nprint rna.sequence",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "GGGGGAUUAACCCC\n"
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": "An RNA molecule is iterable over its primary sequence:"
},
{
"cell_type": "code",
"collapsed": false,
"input": "for index, residue in enumerate(rna):\n print \"residue n%i: %s\"%(index+1, residue)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "residue n1: G\nresidue n2: G\nresidue n3: G\nresidue n4: G\nresidue n5: G\nresidue n6: A\nresidue n7: U\nresidue n8: U\nresidue n9: A\nresidue n10: A\nresidue n11: C\nresidue n12: C\nresidue n13: C\nresidue n14: C\n"
}
],
"prompt_number": 6
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Create molecules from files"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "With PyRNA, an object pyrna.features.TertiaryStructure is made with a single molecular chain. Since a PDB file can contains several molecules, the function parse_pdb() returns a list of such objects."
},
{
"cell_type": "code",
"collapsed": false,
"input": "h = open('data/1ehz.pdb')\npdb_content = h.read()\nh.close()\n\nfrom pyrna.parsers import parse_pdb\ntertiary_structures = parse_pdb(pdb_content)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": "RNA molecules extracted from PDB files can contain modified residues. PyRNA converts them automatically into unmodified residues, and stores the modification in a dictionary."
},
{
"cell_type": "code",
"collapsed": false,
"input": "for ts in tertiary_structures:\n print ts.rna.name\n print ts.rna.sequence\n print ts.rna.modified_residues",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "A\nGCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAGAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA\n[('2MG', 10), ('H2U', 16), ('H2U', 17), ('M2G', 26), ('OMC', 32), ('OMG', 34), ('YYG', 37), ('PSU', 39), ('5MC', 40), ('7MG', 46), ('5MC', 49), ('5MU', 54), ('PSU', 55), ('1MA', 58)]\n"
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": "If you want to parse a FASTA file, you have to precise the type of molecules stored. DNA molecules are faster to create since PyRNA will not try to identify modified residues."
},
{
"cell_type": "code",
"collapsed": false,
"input": "h = open('data/telomerases.fasta')\nfasta_content = h.read()\nh.close()\n\nfrom pyrna.parsers import parse_fasta\n#the default type is RNA\nfor rna in parse_fasta(fasta_content):\n print \"sequence of %s:\"%rna.name\n print \"%s\\n\"%rna.sequence",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "sequence of telomerase 1:\nAGUUUCUCGAUAAUUGAUCUGUAGAAUCUGUCAAGCAAAACCCCAAAACCUUACACUGAGAGCAUUUAGCCUGAUUACUCUUUAAAUCAAAUCAGGCAAUAGAGAGAAACUCGAGAGGUGAAAACCCCACAGCAUUCUGAAAUGUAUUUGGGAGUAAUCUCAUAUUAGUUUGCUGUCCUCUCAUCUUUU\n\nsequence of telomerase 2:\nAUCCCCGCAAAUUCAUUCUGUUUGCAUUCAAACAGUCAUUCAACCCCAAAAAUCUAGACCAAAUAUUGUCUUCCCUUCUUGGCACAAACAAAGAAGAGACGCGGGAUAAAGAUACUCCGACGAUUGAUACAAUAUUUAUCAACGGGAGGUCUUACUUUU\n\nsequence of telomerase 3:\nUACCUCCUGUGGAUCCAUUCAGGAUUAAUGAAAUCCUGUCAUUCAACCCCAAAAAUCUUGUCAAAUUAUUGCCUCGUCUUUUGGGCACAAACAAAAGUCACGCAGGAGGUUCAGACAUUCGACAUAAGAUACACUAUUUAUCUUAUGGAAGGUCUAGUUUUU\n\n"
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": "An object RNA will automatically convert T residues into U."
},
{
"cell_type": "code",
"collapsed": false,
"input": "h = open('data/ft3100_from_FANTOM3_project.fasta')\nfasta_content = h.read()\nh.close()\n\nfor dna in parse_fasta(fasta_content, 'DNA'):\n print \"sequence as a DNA:\"\n print \"%s\\n\"%dna.sequence\n\nfor rna in parse_fasta(fasta_content):\n print \"sequence as an RNA:\"\n print rna.sequence",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "sequence as a DNA:\nTAACAATCTGCTGAAAGGTACCGTCGGAGGGAGCTTTGTTGCCAGCGCCAGAAACGCCGGTTTAACCAGCGCCGAAGTGAGCGCAGTGATTAAAGCCATGCAGTGGCAAATGGATTTCCGCAAACTGAAAAAAGGCGATGAATTTGCGGT\n\nsequence as an RNA:\nUAACAAUCUGCUGAAAGGUACCGUCGGAGGGAGCUUUGUUGCCAGCGCCAGAAACGCCGGUUUAACCAGCGCCGAAGUGAGCGCAGUGAUUAAAGCCAUGCAGUGGCAAAUGGAUUUCCGCAAACUGAAAAAAGGCGAUGAAUUUGCGGU\n"
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": "DNA and RNA objects have a rich textual representation in IPython notebooks. "
},
{
"cell_type": "code",
"collapsed": false,
"input": "parse_fasta(fasta_content)[0]",
"language": "python",
"metadata": {},
"outputs": [
{
"html": "<pre>1\t<font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font>\n61\t<font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"orange\">C</font><font color=\"orange\">C</font><font color=\"red\">G</font>\n121\t<font color=\"orange\">C</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"orange\">C</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"blue\">A</font><font color=\"blue\">A</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"green\">U</font><font color=\"red\">G</font><font color=\"orange\">C</font><font color=\"red\">G</font><font color=\"red\">G</font><font color=\"green\">U</font>\n</pre>",
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": "<pyrna.features.RNA instance at 0x109c5be60>"
}
],
"prompt_number": 15
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Create molecules from databases"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You can load 3D structures directly from the Protein Databank"
},
{
"cell_type": "code",
"collapsed": false,
"input": "from pyrna.db import PDB\npdb = PDB()\npdb_content = pdb.get_entry('1GID')",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": "With PyRNA, a pyrna.features.TertiaryStructure object is made with a single molecular chain. Since a PDB file can contains several molecules, the function parse_pdb returns a list of pyrna.features.TertiaryStructure."
},
{
"cell_type": "code",
"collapsed": false,
"input": "from pyrna.parsers import parse_pdb\n\nfor tertiary_structure in parse_pdb(pdb_content):\n print \"molecular chain %s: %s\"%(tertiary_structure.rna.name, tertiary_structure.rna.sequence)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "molecular chain A: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\nmolecular chain B: GAAUUGCGGGAAAGGGGUCAACAGCCGUUCAGUACCAAGUCUCAGGGGAAACUUUGAGAUGGCCUUGCAAAGGGUAUGGUAAUAAGCUGACGGACAUGGUCCUAACCACGCAGCCAAGUCCUAAGUCAACAGAUCUUCUGUUGAUAUGGAUGCAGUUC\n"
}
],
"prompt_number": 10
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment