@SamStudio8
Last active October 23, 2018 15:43
import sys

THRESHOLD = 0.25  # reads must have at least 25% of their k-mers assigned

for line in sys.stdin:
    fields = line.strip().split()
    # Columns from the fifth onward are "label:count" k-mer fields
    kmers_fields = fields[4:]
    total_kmers = sum(int(x.split(":")[1]) for x in kmers_fields)
    # Label "0" marks unassigned k-mers
    unassigned_kmers = sum(int(x.split(":")[1]) for x in kmers_fields if x.split(":")[0] == "0")
    if total_kmers == 0:
        # Drop reads with no k-mer counts at all
        continue
    elif unassigned_kmers / float(total_kmers) > (1.0 - THRESHOLD):
        # Drop reads with many unassigned k-mers
        continue
    # Otherwise, print the read to stdout
    print(line.strip())
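The gist does not say what its input should look like. Judging only from the parsing, each line carries four whitespace-separated columns followed by `label:count` pairs, which resembles Kraken-style per-read classification output; that reading is an assumption, not something the author states. The sketch below wraps the same test in a hypothetical helper, `keep_read`, and runs it on made-up sample lines. Unlike the script, it compares the whole label to `"0"` rather than only its first character.

```python
THRESHOLD = 0.25  # minimum fraction of k-mers that must be assigned


def keep_read(line, threshold=THRESHOLD):
    """Return True if a line passes the same filter as the script (hypothetical helper)."""
    fields = line.strip().split()
    # Assumed layout: four leading columns, then "label:count" pairs
    pairs = [field.split(":") for field in fields[4:]]
    total = sum(int(count) for _, count in pairs)
    if total == 0:
        # No k-mer counts on this line
        return False
    unassigned = sum(int(count) for label, count in pairs if label == "0")
    # Keep the read unless more than (1 - threshold) of its k-mers are unassigned
    return unassigned / float(total) <= (1.0 - threshold)


# Made-up sample lines, not real classifier output
sample = [
    "C\tread1\t562\t100\t562:40 0:30",  # 30 of 70 k-mers unassigned
    "C\tread2\t562\t100\t562:5 0:65",   # 65 of 70 k-mers unassigned
    "U\tread3\t0\t100\t0:70",           # every k-mer unassigned
]
kept = [line.split("\t")[1] for line in sample if keep_read(line)]
print(kept)  # ['read1']
```

The script itself reads from standard input and writes the surviving lines to standard output, so it would be run with the classifier's per-read output redirected in and the filtered result redirected out.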
@DesmondoDekker

Hey hey,
your script seems really useful.
My only question is what to use as the input file. Do you have any further instructions for this script?

All the best

Luigi
