@mictadlo
Last active August 29, 2015 13:55
Little snpEff example
python test.py
{'STOP_LOST': ('295', '0.205%'), 'NON_SYNONYMOUS_CODING': ('6,800', '4.732%'), 'DOWNSTREAM': ('45,227', '31.475%'), 'SPLICE_SITE_ACCEPTOR': ('18', '0.013%'), 'NON_SYNONYMOUS_START': ('1', '0.001%'), 'INTERGENIC': ('37,384', '26.016%'), 'START_LOST': ('1', '0.001%'), 'SYNONYMOUS_STOP': ('63', '0.044%'), 'EXON': ('11,134', '7.748%'), 'UPSTREAM': ('43,593', '30.337%'), 'SPLICE_SITE_DONOR': ('31', '0.022%'), 'SYNONYMOUS_CODING': ('3,619', '2.519%'), 'INTRON': ('6,307', '4.389%'), 'STOP_GAINED': ('355', '0.247%')}
#Intron = 6,307
<html><head>
<meta charset="utf-8">
<title>SnpEff example</title>
</head>
<body>
<a name="effects">
<center>
<b> Number of effects by type and region </b> <p>
<table border="0">
<tbody><tr>
<th> Type </th>
<th> Region </th>
</tr>
<tr>
<td> <table border="0">
<thead>
<tr>
<th><b> Type (alphabetical order) </b></th>
<th> &nbsp; </th>
<th> Count </th>
<th> Percent </th>
</tr>
</thead>
<tbody><tr>
<td> <b> DOWNSTREAM </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#ff0000"> 45,227 </td>
<td class="numeric" bgcolor="#ff0000"> 31.475% </td>
</tr>
<tr>
<td> <b> INTERGENIC </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#d22c00"> 37,384 </td>
<td class="numeric" bgcolor="#d22c00"> 26.016% </td>
</tr>
<tr>
<td> <b> INTRON </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#23db00"> 6,307 </td>
<td class="numeric" bgcolor="#23db00"> 4.389% </td>
</tr>
<tr>
<td> <b> NON_SYNONYMOUS_CODING </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#26d800"> 6,800 </td>
<td class="numeric" bgcolor="#26d800"> 4.732% </td>
</tr>
<tr>
<td> <b> NON_SYNONYMOUS_START </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#00ff00"> 1 </td>
<td class="numeric" bgcolor="#00ff00"> 0.001% </td>
</tr>
<tr>
<td> <b> SPLICE_SITE_ACCEPTOR </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#00fe00"> 18 </td>
<td class="numeric" bgcolor="#00fe00"> 0.013% </td>
</tr>
<tr>
<td> <b> SPLICE_SITE_DONOR </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#00fe00"> 31 </td>
<td class="numeric" bgcolor="#00fe00"> 0.022% </td>
</tr>
<tr>
<td> <b> START_LOST </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#00ff00"> 1 </td>
<td class="numeric" bgcolor="#00ff00"> 0.001% </td>
</tr>
<tr>
<td> <b> STOP_GAINED </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#01fd00"> 355 </td>
<td class="numeric" bgcolor="#01fd00"> 0.247% </td>
</tr>
<tr>
<td> <b> STOP_LOST </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#01fd00"> 295 </td>
<td class="numeric" bgcolor="#01fd00"> 0.205% </td>
</tr>
<tr>
<td> <b> SYNONYMOUS_CODING </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#14ea00"> 3,619 </td>
<td class="numeric" bgcolor="#14ea00"> 2.519% </td>
</tr>
<tr>
<td> <b> SYNONYMOUS_STOP </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#00fe00"> 63 </td>
<td class="numeric" bgcolor="#00fe00"> 0.044% </td>
</tr>
<tr>
<td> <b> UPSTREAM </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#f50900"> 43,593 </td>
<td class="numeric" bgcolor="#f50900"> 30.337% </td>
</tr>
</tbody></table><br>
</td>
<td> <table border="0">
<thead>
<tr>
<th><b> Type (alphabetical order) </b></th>
<th> &nbsp; </th>
<th> Count </th>
<th> Percent </th>
</tr>
</thead>
<tbody><tr>
<td> <b> DOWNSTREAM </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#ff0000"> 45,227 </td>
<td class="numeric" bgcolor="#ff0000"> 31.475% </td>
</tr>
<tr>
<td> <b> EXON </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#3ec000"> 11,134 </td>
<td class="numeric" bgcolor="#3ec000"> 7.748% </td>
</tr>
<tr>
<td> <b> INTERGENIC </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#d22c00"> 37,384 </td>
<td class="numeric" bgcolor="#d22c00"> 26.016% </td>
</tr>
<tr>
<td> <b> INTRON </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#23db00"> 6,307 </td>
<td class="numeric" bgcolor="#23db00"> 4.389% </td>
</tr>
<tr>
<td> <b> SPLICE_SITE_ACCEPTOR </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#00ff00"> 18 </td>
<td class="numeric" bgcolor="#00ff00"> 0.013% </td>
</tr>
<tr>
<td> <b> SPLICE_SITE_DONOR </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#00fe00"> 31 </td>
<td class="numeric" bgcolor="#00fe00"> 0.022% </td>
</tr>
<tr>
<td> <b> UPSTREAM </b> </td>
<th> &nbsp; </th>
<td class="numeric" bgcolor="#f50900"> 43,593 </td>
<td class="numeric" bgcolor="#f50900"> 30.337% </td>
</tr>
</tbody></table><br>
</td>
</tr>
</tbody></table>
</p>
</a>
</body></html>
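# test.py: collect the effect counts from the "effects" section of SnpEff.html
from __future__ import print_function  # lets print() behave the same on Python 2 and 3

from bs4 import BeautifulSoup

test = {}
with open("SnpEff.html") as f:
    soup = BeautifulSoup(f, "lxml")

for a in soup.find_all('a', attrs={'name': 'effects'}):
    # The first three rows are the outer header row, the outer wrapper row
    # and the first inner header row; skip them.
    for tr in a.find_all('tr')[3:]:
        tds = tr.find_all('td')
        # The second table's header row holds only <th> cells, so it has no <td>.
        if len(tds) > 0:
            # effect name -> (count, percent), both kept as strings
            test[tds[0].text.strip()] = (tds[1].text.strip(),
                                         tds[2].text.strip())

print(test)
print()
print("#Intron = " + test['INTRON'][0])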
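The script stores each count and percentage as a string exactly as it appears in the report ('6,307', '4.389%'). If the values are needed for arithmetic, they could be converted along these lines. This is only a sketch: `sample`, `to_numbers` and `numeric` are hypothetical names that are not part of the gist, and `sample` holds just a few entries copied from the output above.

```python
# A few entries copied from the output above: effect name -> (count, percent).
sample = {
    'INTRON': ('6,307', '4.389%'),
    'EXON': ('11,134', '7.748%'),
    'START_LOST': ('1', '0.001%'),
}


def to_numbers(entry):
    """Turn a pair such as ('6,307', '4.389%') into (6307, 4.389)."""
    count, percent = entry
    # Drop the thousands separator and the trailing percent sign.
    return int(count.replace(',', '')), float(percent.rstrip('%'))


# Hypothetical numeric version of the dictionary built by the script.
numeric = {name: to_numbers(entry) for name, entry in sample.items()}

print(numeric['INTRON'][0] + numeric['EXON'][0])  # prints 17441
```

Note that the dictionary built by the script merges the rows of both tables (type and region) under one set of keys, so summing every count in it would count the variants in shared rows such as INTRON only once but add rows like EXON on top of the per-type totals.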