@sandutsar
Last active April 3, 2023 15:04
(WIP) Solution for Rosalind Bioinformatics Stronghold course
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Rosalind Bioinformatics Stronghold"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"DNA_ALPHABET = 'ACGT'\n",
"RNA_ALPHABET = 'ACGU'\n",
"\n",
"# Complement of each DNA nucleotide\n",
"COMPLEMENT = {\n",
"    'A': 'T',\n",
"    'C': 'G',\n",
"    'G': 'C',\n",
"    'T': 'A'\n",
"}\n",
"\n",
"def is_DNA(nt):\n",
"    # True if the single nucleotide nt belongs to the DNA alphabet\n",
"    return nt in DNA_ALPHABET\n",
"\n",
"def is_RNA(nt):\n",
"    # True if the single nucleotide nt belongs to the RNA alphabet\n",
"    return nt in RNA_ALPHABET"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## DNA\n",
"\n",
"### Counting DNA Nucleotides\n",
"\n",
"**Problem**\n",
"\n",
"A `string` is simply an ordered collection of symbols selected from some `alphabet` and formed into a word; the `length` of a string is the number of symbols that it contains.\n",
"\n",
"An example of a length-21 `DNA string` (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is \"ATGCTTCAGAAAGGTCTTACG\".\n",
"\n",
"<span style=\"color:green\">Given:</span> A DNA string **s** of length at most 1000 nt.\n",
"\n",
"<span style=\"color:green\">Return:</span> Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in **s**.\n",
"\n",
"**Sample Dataset**\n",
"\n",
"```\n",
"AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC\n",
"```\n",
"\n",
"**Sample Output**\n",
"\n",
"```\n",
"20 12 17 21\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'CTAGTCCGAGCGTAGCCCTTTCCCCAAGTTTATCTGCACGGCCCGAGACCCAAGCCGGTTCTTTTCAATGCCTAGGTTCTTCTTCGGCTTCGCCCGCTATGACCGCATTATTACTTACCATCCCAGCACCGGCGCCAGCGCCTGACGACTTAGATCTATTGTGTTACTGGAGTCAATAAAGTCACTTGCACGCAATTAACAGAAATTATCACGAGCGTCTAGGGCTCCACGCACAATCACGCTATCCTCCAATGCGCCATTTTGACCCGTCGTGAAGGCATTAGAACATTGGTATAGTTGCTTTCGCGACTATCCAACCGCTAGGGTCTACTCATGACTAGTGTAGACGCAGCTAGTGGAGTAGCTATTGGAATTTCCACTCACAGCGTTGCCGGTCTCACACCTGATACGCGGTTGGTCCCGCTTGAGCGAGCCGTCCTGACGGGTAGATGCGACCCCACTTAACGTTTCACCAGGAAATGGCGTCAGTCGTAAGCACTAGCTACGCTTAGGATTCTTCAGTGCGCGGGGCCGCATCCAAGTGCGGGGACTTCGAATGCTGCTTCAGAGTAATTCGGTACATTCCAAGAAGCAGGGCGGCTCACACACTCTGTACTCCGTCTAGGTGGCCGCGCGCACCGCCGAGCCTTGTGCTATTTCATGCGAGAGAAAACAATTTCTTCGGACAGTTGTTTAATCCAGCCAATTTGATATTAACAGAGCCTACTATGACGGAAACTCGTGCCATAATACCCAACTGGGGTTCATTTCTGGGGACTCCGCTGCGAGGTGCGTTTCGCGGTGATAGTGACCTGACGGCTCGCAAGTAGCTGTAACAATACCCGACGTCGGT'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open('rosalind_dna_1_dataset.txt', 'r') as file:\n",
"    # strip() removes the trailing newline without cutting off a nucleotide if there is none\n",
"    s = file.readline().strip()\n",
"s"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"assert isinstance(s, str), f'Error: type(s) = {type(s).__name__} must be str!'\n",
"assert all(is_DNA(nt) for nt in s), (\n",
"    'Error: Your string is not a valid DNA string! '\n",
"    f'It must be composed using the {DNA_ALPHABET} alphabet!'\n",
")\n",
"assert len(s) <= 1000, f'Error: len(s) = {len(s)} must be at most 1000!'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"192 239 209 213\n"
]
}
],
"source": [
"result = [s.count(nt) for nt in DNA_ALPHABET]\n",
"print(*result)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"with open('rosalind_dna_1_output.txt', 'w') as file:\n",
" file.write(' '.join([str(x) for x in result]))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## RNA\n",
"\n",
"### Transcribing DNA into RNA \n",
"\n",
"**Problem**\n",
"\n",
"An `RNA string` is a `string` formed from the `alphabet` containing 'A', 'C', 'G', and 'U'.\n",
"\n",
"Given a `DNA string` **t** corresponding to a coding strand, its transcribed `RNA string` **u** is formed by replacing all occurrences of 'T' in **t** with 'U' in **u**.\n",
"\n",
"<span style=\"color:green\">Given:</span> A `DNA string` **t** having `length` at most 1000 `nt`.\n",
"\n",
"<span style=\"color:green\">Return:</span> The transcribed RNA string of **t**.\n",
"\n",
"**Sample Dataset**\n",
"\n",
"```\n",
"GATGGAACTTGACTACGTAAATT\n",
"```\n",
"\n",
"**Sample Output**\n",
"\n",
"```\n",
"GAUGGAACUUGACUACGUAAAUU\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'ACTGTTCTCACTGGAAACACACGCGAGGATTTGTGGCGTAACCGTGCCTCGGTCGCATACAAGCGACTCGACCTCCGAATTGTCATATAGCCACGTGCCACCTCTCCTAAGCAAAGGCGTAGATAGAGAATTGGCTCCATGTTATGACAACTTAACTACGTATTGGATCCCAATTGGCGTTTTGAGGGCCTATGGGATAGAGCATGCTGCATCATAGTCATCGAAACCGTTAAGCCATGGGCCCTTAAGAACAGAAAATAGTGGCCCTGGAACGGCCCAACATTAGAGAAGTCGCCCTTGACCCGCACCGAATCGGCTGCCAGCAAAATGGGCATCTACTATAGATTGAAGACCAGTCTTTGCTAGCTACATCCGAGCCGTACGCTACTAATAGCCTCACGATTTGCGCCCGTTTATAATCAGCCTGCCCGACGGTTGATGACGTCCAATTTCCTCGCCTAATAGCACTCTCAGGGAGTAATTGGATCGGCTACGGCAACGTCGATATTATGTAGGTTATCCACGTGAACTTCGCGTCGCAGTACCGACAGCGTGGTTTTACGGGAGGTTCATTCGCTTCTTTTCTGTTTCATCGGCGTTCCGATCGGACTTCAGTACAAAACTGACCTCGGTGACAAAACCGCCAATCTGAGGGGGAAATCACGAATTCAATTGTGTGAGCACTCTGTCGGCTCACACTATTGATTTTTCTTCTAGAATGTAGATACTCCTTAACTCACTCACACGCGGTGGAGGGCCTAGCAACCGTGCGTCTAACCACGATTTCTCACGACACGAGGGTGGTCATGCCACGCATAGTAACTAGCTACATTCATCGTTCTATGTTGGCAGCAGGTTAGGGACTTCCCCGATGCGTTAGCTACAAGAGCGAGAGGTTTTTCACCGGAACGAAGGAAACTTCTAATCACGTACT'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open('rosalind_rna_1_dataset.txt', 'r') as file:\n",
"    # strip() removes the trailing newline without cutting off a nucleotide if there is none\n",
"    t = file.readline().strip()\n",
"t"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"assert isinstance(t, str), f'Error: type(t) = {type(t).__name__} must be str!'\n",
"assert all(is_DNA(nt) for nt in t), (\n",
"    'Error: Your string is not a valid DNA string! '\n",
"    f'It must be composed using the {DNA_ALPHABET} alphabet!'\n",
")\n",
"assert len(t) <= 1000, f'Error: len(t) = {len(t)} must be at most 1000!'"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'ACUGUUCUCACUGGAAACACACGCGAGGAUUUGUGGCGUAACCGUGCCUCGGUCGCAUACAAGCGACUCGACCUCCGAAUUGUCAUAUAGCCACGUGCCACCUCUCCUAAGCAAAGGCGUAGAUAGAGAAUUGGCUCCAUGUUAUGACAACUUAACUACGUAUUGGAUCCCAAUUGGCGUUUUGAGGGCCUAUGGGAUAGAGCAUGCUGCAUCAUAGUCAUCGAAACCGUUAAGCCAUGGGCCCUUAAGAACAGAAAAUAGUGGCCCUGGAACGGCCCAACAUUAGAGAAGUCGCCCUUGACCCGCACCGAAUCGGCUGCCAGCAAAAUGGGCAUCUACUAUAGAUUGAAGACCAGUCUUUGCUAGCUACAUCCGAGCCGUACGCUACUAAUAGCCUCACGAUUUGCGCCCGUUUAUAAUCAGCCUGCCCGACGGUUGAUGACGUCCAAUUUCCUCGCCUAAUAGCACUCUCAGGGAGUAAUUGGAUCGGCUACGGCAACGUCGAUAUUAUGUAGGUUAUCCACGUGAACUUCGCGUCGCAGUACCGACAGCGUGGUUUUACGGGAGGUUCAUUCGCUUCUUUUCUGUUUCAUCGGCGUUCCGAUCGGACUUCAGUACAAAACUGACCUCGGUGACAAAACCGCCAAUCUGAGGGGGAAAUCACGAAUUCAAUUGUGUGAGCACUCUGUCGGCUCACACUAUUGAUUUUUCUUCUAGAAUGUAGAUACUCCUUAACUCACUCACACGCGGUGGAGGGCCUAGCAACCGUGCGUCUAACCACGAUUUCUCACGACACGAGGGUGGUCAUGCCACGCAUAGUAACUAGCUACAUUCAUCGUUCUAUGUUGGCAGCAGGUUAGGGACUUCCCCGAUGCGUUAGCUACAAGAGCGAGAGGUUUUUCACCGGAACGAAGGAAACUUCUAAUCACGUACU'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"u = t.replace('T', 'U')\n",
"u"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"with open('rosalind_rna_1_output.txt', 'w') as file:\n",
" file.write(u)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## REVC\n",
"\n",
"### Complementing a Strand of DNA\n",
"\n",
"**Problem**\n",
"\n",
"In `DNA strings`, `symbols` 'A' and 'T' are complements of each other, as are 'C' and 'G'.\n",
"\n",
"The `reverse complement` of a `DNA string` **s** is the string **s<sup>c</sup>** formed by reversing the symbols of **s**, then taking the complement of each symbol (e.g., the reverse complement of \"GTCA\" is \"TGAC\").\n",
"\n",
"<span style=\"color:green\">Given:</span> A DNA string **s** of length at most 1000 `bp`.\n",
"\n",
"<span style=\"color:green\">Return:</span> The reverse complement **s<sup>c</sup>** of **s**.\n",
"\n",
"**Sample Dataset**\n",
"\n",
"```\n",
"AAAACCCGGT\n",
"```\n",
"\n",
"**Sample Output**\n",
"\n",
"```\n",
"ACCGGGTTTT\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'CGCCCCGCAGGTTAACTCTCTTGACTTGGGCAGGGAGGTCTCAAACTTATGGTGCGCGGATTATGTTCAGGGTGATAATTCCGACCCTGGCGGTGCGGCGATATGTAGACCGCAAACTCGCGTCGGTGGGAACAAAGACCCCTTGCTTTTTCTCTAGAGCTTCTACGAGGAGTTTAATACTCCGAGCACCCTGTGTAATTGCGTTCCCGTCCCGTGCCGGGTATACCAGTGCGTATTTGCTTTACGTCCTACAATGATGCTATGTTTACATCATTAACCAGCTGACTGCTTTATGGTGCGAACTTATTCGCGGCCGATAGAATCCAGTAGCTAGCCGTCGAGTTATTATCAAAAGATACACCAGTTTTAAGTTATCATCGTAGTCAAACCCTTGTCCCGGCCCTCTTAAACCCATCGCCGGAAGTCAGGATCCTTACGCAGTAATGGCTAACGTACCTGGGAACTTTGCTTTATGGCATAGGCCATACTGGTCTTACGAGAGGGGAACCGGCTTTTCAATGCTGCCTCCGCTAATGTTTATCGATATTAATCCAGCTTGTAGTCCAGATGTGGAATAAATTCACGCCCCCCCCCTTCGATACGTTCTCTTAAGCTACAAGCGAACTGACAACCCTATGCGAGGAGCCTTGCATTCTACTGATTCTGTACTGCTCATGAATTCGTCGGGGCTGGCGGTAAGTTCTCGGAACCATACCGTTACATACCTACGACTTTGCAAAGGGGAATTAATAGGCGCTTGTTACTCTTAGCTTCGCGCTCGTCACATGATATGAACTTCCGCAACGCGGACCCATTGGCATTGCGTGCGCTGATGTAAGTCGGAATCCAAAGTATAGGGCCCTATACTGCGGTTACTGCAATGCTGTAGGCTGTTTACAGTGGTTCTTACGGAACAGCACGCCCGCAAACTTTTCTACTGTGATATTTTCATGTAACGGAAACAGCTCGACTGAAAATGTGCTTCAC'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open('rosalind_revc_1_dataset.txt', 'r') as file:\n",
"    # strip() removes the trailing newline without cutting off a nucleotide if there is none\n",
"    s = file.readline().strip()\n",
"s"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"assert isinstance(s, str), f'Error: type(s) = {type(s).__name__} must be str!'\n",
"assert all(is_DNA(nt) for nt in s), (\n",
"    'Error: Your string is not a valid DNA string! '\n",
"    f'It must be composed using the {DNA_ALPHABET} alphabet!'\n",
")\n",
"assert len(s) <= 1000, f'Error: len(s) = {len(s)} must be at most 1000!'"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'GTGAAGCACATTTTCAGTCGAGCTGTTTCCGTTACATGAAAATATCACAGTAGAAAAGTTTGCGGGCGTGCTGTTCCGTAAGAACCACTGTAAACAGCCTACAGCATTGCAGTAACCGCAGTATAGGGCCCTATACTTTGGATTCCGACTTACATCAGCGCACGCAATGCCAATGGGTCCGCGTTGCGGAAGTTCATATCATGTGACGAGCGCGAAGCTAAGAGTAACAAGCGCCTATTAATTCCCCTTTGCAAAGTCGTAGGTATGTAACGGTATGGTTCCGAGAACTTACCGCCAGCCCCGACGAATTCATGAGCAGTACAGAATCAGTAGAATGCAAGGCTCCTCGCATAGGGTTGTCAGTTCGCTTGTAGCTTAAGAGAACGTATCGAAGGGGGGGGGCGTGAATTTATTCCACATCTGGACTACAAGCTGGATTAATATCGATAAACATTAGCGGAGGCAGCATTGAAAAGCCGGTTCCCCTCTCGTAAGACCAGTATGGCCTATGCCATAAAGCAAAGTTCCCAGGTACGTTAGCCATTACTGCGTAAGGATCCTGACTTCCGGCGATGGGTTTAAGAGGGCCGGGACAAGGGTTTGACTACGATGATAACTTAAAACTGGTGTATCTTTTGATAATAACTCGACGGCTAGCTACTGGATTCTATCGGCCGCGAATAAGTTCGCACCATAAAGCAGTCAGCTGGTTAATGATGTAAACATAGCATCATTGTAGGACGTAAAGCAAATACGCACTGGTATACCCGGCACGGGACGGGAACGCAATTACACAGGGTGCTCGGAGTATTAAACTCCTCGTAGAAGCTCTAGAGAAAAAGCAAGGGGTCTTTGTTCCCACCGACGCGAGTTTGCGGTCTACATATCGCCGCACCGCCAGGGTCGGAATTATCACCCTGAACATAATCCGCGCACCATAAGTTTGAGACCTCCCTGCCCAAGTCAAGAGAGTTAACCTGCGGGGCG'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Walk s backwards and complement each nucleotide\n",
"sc = ''.join(COMPLEMENT[nt] for nt in reversed(s))\n",
"sc"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"with open('rosalind_revc_1_output.txt', 'w') as file:\n",
" file.write(sc)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}