@murrellb
Created June 2, 2017 06:19
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 1: Construct a function called simSeq() that takes an integer argument and generates a random DNA string of that length."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Template: replace the placeholder comments with your own code.\n",
"function simSeq(len)\n",
"    # some code\n",
"    # return the generated DNA string\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"simSeq (generic function with 1 method)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function simSeq(arg1)\n",
"    alp = [\"A\",\"C\",\"G\",\"T\"]\n",
"    # Draw arg1 bases uniformly at random from the alphabet.\n",
"    DNAarray = [alp[rand(1:4)] for i in 1:arg1]\n",
"    retVal = join(DNAarray)\n",
"    return retVal\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\"CTTGAACCGA\""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"simSeq(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 2: Write a function evolveSeq() that takes a DNA sequence String and a mutation probability, and replaces each base, with that probability, by a base drawn at random from {A,C,G,T}."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\"joined\""
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Hint:\n",
"join([\"j\",\"o\",\"i\",\"n\",\"e\",\"d\"])"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"evolveSeq (generic function with 1 method)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function evolveSeq(seq,mutProb)\n",
"    alp = [\"A\",\"C\",\"G\",\"T\"]\n",
"    arr = []\n",
"    for i in 1:length(seq)\n",
"        if rand()<mutProb\n",
"            # Mutate: redraw from all four bases (may redraw the same base).\n",
"            push!(arr,alp[rand(1:4)])\n",
"        else\n",
"            # Keep the original character.\n",
"            push!(arr,seq[i])\n",
"        end\n",
"    end\n",
"    return join(arr)\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\"THISIAADAG\""
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"evolveSeq(\"THISISADOG\",0.2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 3: Create a function hammingProportion() that takes two strings of the same length and returns the number of positions at which they differ, divided by their length."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"hammingProportion (generic function with 1 method)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function hammingProportion(s1,s2)\n",
" diffs = sum([s1[i]!=s2[i] for i in 1:length(s1)])\n",
" return diffs/length(s1)\n",
"end"
]
},
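{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check, as a minimal sketch: the two short strings below are made-up inputs (not part of the exercise) that differ at exactly one of four positions, so `hammingProportion()` as defined above should return 0.25."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Hypothetical test inputs: only the last base differs (T vs A).\n",
"a = \"ACGT\"\n",
"b = \"ACGA\"\n",
"@assert hammingProportion(a,b) == 0.25\n",
"# Identical strings have no differences.\n",
"@assert hammingProportion(a,a) == 0.0"
]
},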
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"longseq = simSeq(10000);"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.0371"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mutlongseq = evolveSeq(longseq,0.05)\n",
"hammingProportion(longseq,mutlongseq)"
]
},
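{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to read the result above, offered as a sketch rather than a definitive answer: `evolveSeq()` redraws a mutated base uniformly from all four letters, so one time in four it redraws the base that was already there. Under that reading, the expected proportion of differences after a single round is the mutation probability times 3/4, which is about 0.0375 here. The name `expectedProportion` is introduced only for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# mutProb matches the value passed to evolveSeq() above.\n",
"mutProb = 0.05\n",
"# A redrawn base differs from the original with probability 3/4.\n",
"expectedProportion = mutProb * 3/4"
]
},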
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 4: Create a printFasta() function that takes an array of sequences and an array of names, and prints them in FASTA (.fasta) format."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"printFasta (generic function with 1 method)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function printFasta(seqs,names)\n",
" for i in 1:length(seqs)\n",
" println(\">\",names[i])\n",
" println(seqs[i])\n",
" end\n",
"end"
]
},
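{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small usage sketch for `printFasta()`; the two sequences and names are invented for illustration. Each record should print as a `>` header line followed by its sequence on the next line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Made-up example records, not real data.\n",
"exampleSeqs = [\"ACGT\",\"ACGA\"]\n",
"exampleNames = [\"seq1\",\"seq2\"]\n",
"printFasta(exampleSeqs,exampleNames)"
]
},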
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#COMPARE CASES:\n",
"#rate = 0.1, numseqs = 20\n",
"#rate = 0.6, numseqs = 20\n",
"#rate = 0.1, numseqs = 200\n",
"#rate = 0.01, numseqs = 200\n",
"\n",
"founder = simSeq(500);\n",
"current = founder;\n",
"seqArr = String[]\n",
"for i in 1:200\n",
" push!(seqArr,current)\n",
" current = evolveSeq(current,0.01)\n",
"end\n",
"namesArr = [\"seq\"*string(i) for i in 1:length(seqArr)];\n",
"printFasta(seqArr,namesArr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 5: Predict the output of the following code. If you can't, run it and try to explain the output."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"using PyPlot\n",
"founder = simSeq(5000);\n",
"current = founder;\n",
"seqArr = String[]\n",
"for i in 1:1000\n",
" push!(seqArr,current)\n",
" current = evolveSeq(current,0.01)\n",
"end\n",
"\n",
"plot([hammingProportion(founder,i) for i in seqArr],\".\")\n",
"plot([1,length(seqArr)],[0.75,0.75])"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Julia 0.5.0",
"language": "julia",
"name": "julia-0.5"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "0.5.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}