Swarchal/finding_a_protein_motif.ipynb

## finding_a_protein_motif.ipynb

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              finding_a_protein_motif.ipynb
            
          
      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.
      
    
## finding_a_shared_motif.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Problem:\n",
    "\n",
    "A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, \"CG\" is a common substring of \"ACGTACGT\" and \"AACCGGTATA\", but it is not as long as possible; in this case, \"GTA\" is a longest common substring of \"ACGTACGT\" and \"AACCGTATA\".\n",
    "\n",
    "Note that the longest common substring is not necessarily unique; for a simple example, \"AA\" and \"CC\" are both longest common substrings of \"AACC\" and \"CCAA\".\n",
    "\n",
    "**Given:** A collection of $k$ ($k\\leqslant 100$) DNA strings of length at most 1 kbp each in FASTA format.\n",
    "\n",
    "**Return:** A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)\n",
    "\n",
    "\n",
    "\n",
    "### Not working:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "from Bio import SeqIO # FASTAAA!\n",
    "import re\n",
    "\n",
    "# create dictionary of fasta sequences\n",
    "data = open(\"/home/scott/Dropbox/rosalind/rosalind_lcsm.txt\")\n",
    "fasta_dict = SeqIO.to_dict(SeqIO.parse(data, 'fasta'))\n",
    "\n",
    "# single sequence\n",
    "s0 = str(fasta_dict[fasta_dict.keys()[0]].seq)\n",
    " \n",
    "def longest_common_substring():\n",
    "    # decreasing window length\n",
    "    for i in xrange(len(s0)):\n",
    "        win_len = len(s0[:len(s0) - i])\n",
    "    \n",
    "        # sliding window across s0 of decreasing window length\n",
    "        for j in xrange(len(s0) - win_len + 1):\n",
    "            pattern = re.compile(s0[j:j + win_len])\n",
    "        \n",
    "            total = 0\n",
    "            \n",
    "            for k in fasta_dict:\n",
    "                s = str(fasta_dict[k].seq)\n",
    "                total += int(bool(pattern.match(s)))\n",
    "            \n",
    "            if total == len(fasta_dict):\n",
    "                return str(s0[j:j + win_len])\n",
    "                \n",
    "print longest_common_substring()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1000\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n"
     ]
    }
   ],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Problem:\n",
	"\n",
	"A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, \"CG\" is a common substring of \"ACGTACGT\" and \"AACCGGTATA\", but it is not as long as possible; in this case, \"GTA\" is a longest common substring of \"ACGTACGT\" and \"AACCGTATA\".\n",
	"\n",
	"Note that the longest common substring is not necessarily unique; for a simple example, \"AA\" and \"CC\" are both longest common substrings of \"AACC\" and \"CCAA\".\n",
	"\n",
	"Given: A collection of $k$ ($k\\leqslant 100$) DNA strings of length at most 1 kbp each in FASTA format.\n",
	"\n",
	"Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)\n",
	"\n",
	"\n",
	"\n",
	"### Not working:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"None\n"
	]
	}
	],
	"source": [
	"from Bio import SeqIO # FASTAAA!\n",
	"import re\n",
	"\n",
	"# create dictionary of fasta sequences\n",
	"data = open(\"/home/scott/Dropbox/rosalind/rosalind_lcsm.txt\")\n",
	"fasta_dict = SeqIO.to_dict(SeqIO.parse(data, 'fasta'))\n",
	"\n",
	"# single sequence\n",
	"s0 = str(fasta_dict[fasta_dict.keys()[0]].seq)\n",
	" \n",
	"def longest_common_substring():\n",
	" # decreasing window length\n",
	" for i in xrange(len(s0)):\n",
	" win_len = len(s0[:len(s0) - i])\n",
	" \n",
	" # sliding window across s0 of decreasing window length\n",
	" for j in xrange(len(s0) - win_len + 1):\n",
	" pattern = re.compile(s0[j:j + win_len])\n",
	" \n",
	" total = 0\n",
	" \n",
	" for k in fasta_dict:\n",
	" s = str(fasta_dict[k].seq)\n",
	" total += int(bool(pattern.match(s)))\n",
	" \n",
	" if total == len(fasta_dict):\n",
	" return str(s0[j:j + win_len])\n",
	" \n",
	"print longest_common_substring()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"1000\n"
	]
	}
	],
	"source": []
	},
	{
	"cell_type": "code",
	"execution_count": 19,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"1\n"
	]
	}
	],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 2",
	"language": "python",
	"name": "python2"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 2
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython2",
	"version": "2.7.6"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 0
	}