Skip to content

Instantly share code, notes, and snippets.

@Swarchal
Created March 11, 2016 13:48
Show Gist options
  • Save Swarchal/6e06beeb2857f3b21fc8 to your computer and use it in GitHub Desktop.
Save Swarchal/6e06beeb2857f3b21fc8 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Problem:\n",
"\n",
"A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, \"CG\" is a common substring of \"ACGTACGT\" and \"AACCGGTATA\", but it is not as long as possible; in this case, \"GTA\" is a longest common substring of \"ACGTACGT\" and \"AACCGTATA\".\n",
"\n",
"Note that the longest common substring is not necessarily unique; for a simple example, \"AA\" and \"CC\" are both longest common substrings of \"AACC\" and \"CCAA\".\n",
"\n",
"**Given:** A collection of $k$ ($k\\leqslant 100$) DNA strings of length at most 1 kbp each in FASTA format.\n",
"\n",
"**Return:** A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)\n",
"\n",
"\n",
"\n",
"### Not working:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"None\n"
]
}
],
"source": [
"from Bio import SeqIO # FASTAAA!\n",
"import re\n",
"\n",
"# create dictionary of fasta sequences\n",
"data = open(\"/home/scott/Dropbox/rosalind/rosalind_lcsm.txt\")\n",
"fasta_dict = SeqIO.to_dict(SeqIO.parse(data, 'fasta'))\n",
"\n",
"# single sequence\n",
"s0 = str(fasta_dict[fasta_dict.keys()[0]].seq)\n",
" \n",
"def longest_common_substring():\n",
" # decreasing window length\n",
" for i in xrange(len(s0)):\n",
" win_len = len(s0[:len(s0) - i])\n",
" \n",
" # sliding window across s0 of decreasing window length\n",
" for j in xrange(len(s0) - win_len + 1):\n",
" pattern = re.compile(s0[j:j + win_len])\n",
" \n",
" total = 0\n",
" \n",
" for k in fasta_dict:\n",
" s = str(fasta_dict[k].seq)\n",
" total += int(bool(pattern.match(s)))\n",
" \n",
" if total == len(fasta_dict):\n",
" return str(s0[j:j + win_len])\n",
" \n",
"print longest_common_substring()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1000\n"
]
}
],
"source": []
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n"
]
}
],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment