Authors(?): David Alexander, Patrick Marks, Jim Bullard, Jonathan Bingham
p4 edit
- Forgetting to do p4 edit
- Having to tell p4 when you move/rename files
- Bulky p4 clients
"""
Find coverage in [winStart, winEnd) implied by tStart, tEnd
vectors.
Original from rangeQueries, and two attempts at speeding it up.
In the common case I could imagine projectIntoRangeFast2 being fastest,
but on my amplicons test case projectIntoRangeFast1 wins.
"""
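The docstring above describes computing per-position coverage inside a window from vectors of interval starts and ends. As a rough illustration only (this is not the rangeQueries original nor either of the "Fast" variants, whose bodies are not shown), here is a pure-Python sketch using a difference array. It assumes each (tStart, tEnd) pair is a half-open interval like the window itself; the name project_into_range_sketch is hypothetical.

```python
def project_into_range_sketch(t_start, t_end, win_start, win_end):
    """Coverage at each position of [win_start, win_end).

    Assumption: every (t_start[i], t_end[i]) is a half-open interval,
    matching the half-open window in the docstring.
    """
    width = win_end - win_start
    if width <= 0:
        return []
    # delta[k] records the change in coverage when stepping onto window
    # offset k; one extra slot absorbs intervals ending at win_end.
    delta = [0] * (width + 1)
    for start, end in zip(t_start, t_end):
        # Clip the interval to the window before recording it.
        start = max(start, win_start)
        end = min(end, win_end)
        if start < end:
            delta[start - win_start] += 1
            delta[end - win_start] -= 1
    # A running sum of the deltas gives the coverage at each offset.
    coverage = []
    running = 0
    for change in delta[:width]:
        running += change
        coverage.append(running)
    return coverage
```

The cost is linear in the number of intervals plus the window width, which is the usual reason to prefer a difference array over incrementing every covered position per interval.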
Using the BasH5Reader is simple:
>>> b = BasH5Reader("m1122...bas.h5") # Load the file
>>> zmw = b[9] # Get Zmw object(s) by slicing on holenumber(s)
>>> myRead = zmw.subreads[0] # Get ZmwRead object
>>> myRead.basecalls()
"GATTACA"
>>> myRead.QualityValue()
array([5, 6, 3, 4, 8, 8, 1])
#!/usr/bin/env python
from pbcore.io.FastaIO import FastaReader, FastaWriter, FastaRecord
import shlex
import sys
import subprocess
import os
import re
usage = "usage: circulization.py initial_contigs.fastq 20000 /tmp circulaized_contigs.fastq"
In pbcore for 2.2 I'm introducing a new class, FastaTable, which gives easy random-access FASTA reading. It requires a FASTA index (.fai) file sitting next to the FASTA on the filesystem, and it requires a constant wrapping length in the FASTA (note that all PacBio reference repository FASTAs already meet both requirements).
Internally the class works by mmap'ing the file contents into virtual memory.
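To make the two requirements concrete, here is a minimal standalone sketch of random access through a .fai index and mmap. It is not pbcore's FastaTable code; load_fai and fetch are hypothetical names. It assumes the standard five-column .fai layout written by samtools faidx (name, length, byte offset, bases per line, bytes per line), and it shows why a constant wrapping length matters: the byte position of any base can then be computed arithmetically.

```python
import mmap


def load_fai(fai_path):
    """Parse a .fai index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            name = fields[0]
            length, offset, linebases, linewidth = (int(x) for x in fields[1:5])
            index[name] = (length, offset, linebases, linewidth)
    return index


def fetch(mm, index, name, start, end):
    """Return bases [start, end) of contig `name` from an mmap'ed FASTA."""
    length, offset, linebases, linewidth = index[name]
    if not 0 <= start <= end <= length:
        raise IndexError("range outside contig")

    def byte_pos(pos):
        # Constant line wrapping lets us jump straight to a base:
        # skip whole lines (newline bytes included), then step within the line.
        return offset + (pos // linebases) * linewidth + pos % linebases

    raw = mm[byte_pos(start):byte_pos(end)]
    # The slice may span line breaks; drop them to get the bare sequence.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


def open_fasta_mmap(fasta_path):
    """Map the whole FASTA read-only; the OS pages data in on demand."""
    with open(fasta_path, "rb") as fh:
        return mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
```

Because the file is memory-mapped, fetching a short window from a large reference touches only the pages that hold those bytes rather than reading the whole file.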
from pbcore.io import FastaTable
from nose.tools import eq_
def chunk(keysAndSizes, numChunks):
    """
    Heuristically attempt to split the keys up into sublists such that
    the total of sizes of each sublist is near the targetSize, and
    the chunks are well balanced. Better to go over than under.
    """
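The body of chunk is not shown, so the following is only one plausible greedy reading of the docstring, not the author's implementation. It assumes targetSize is the total size divided by numChunks, and it closes a chunk as soon as its running total reaches the target, which is one way to honor "better to go over than under". The name chunk_sketch is hypothetical.

```python
def chunk_sketch(keys_and_sizes, num_chunks):
    """Greedily split keys into at most num_chunks sublists of similar total size.

    keys_and_sizes is an iterable of (key, size) pairs, taken in order.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    pairs = list(keys_and_sizes)
    # Assumed definition of the target: an even share of the total size.
    target = sum(size for _, size in pairs) / num_chunks
    chunks = []
    current = []
    current_size = 0
    for key, size in pairs:
        current.append(key)
        current_size += size
        # Close the chunk once it meets or exceeds the target (going over
        # rather than under), but keep the last chunk open for the remainder.
        if current_size >= target and len(chunks) < num_chunks - 1:
            chunks.append(current)
            current = []
            current_size = 0
    if current:
        chunks.append(current)
    return chunks
```

This keeps the input order and may return fewer than num_chunks sublists when a few large items dominate; a better-balanced split would need sorting or a bin-packing heuristic.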
#!/usr/bin/env python
from pbcore.io import BasH5Reader, M4Reader
from pbcore.util.Process import backticks
import sys, os.path as osp
def totalReadLength(m4r):
    #return m4r.qseqlength
    # qseqlength is bogus! use the offsets from the query string
    extent = map(int, m4r.qName.split("/")[-1].split("_"))
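The rest of totalReadLength is not shown. As a sketch of the parsing step only, the helper below pulls the start and end offsets out of the last "/"-separated field of a query name. It assumes that field has the form start_end and that the length of interest is end minus start; both helper names and the example name are hypothetical. Unlike the Python 2 map call above, it builds a tuple explicitly, since map returns a lazy iterator in Python 3.

```python
def subread_extent(q_name):
    """Return (start, end) parsed from the trailing 'start_end' field."""
    start, end = (int(x) for x in q_name.split("/")[-1].split("_"))
    return start, end


def subread_length(q_name):
    """Length implied by the offsets in the query name (assumed end - start)."""
    start, end = subread_extent(q_name)
    return end - start
```

For a made-up name such as "movie/42/100_350", subread_extent gives (100, 350) and subread_length gives 250.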
#!/usr/bin/env python
import sys
import h5py
fname = sys.argv[1]
f = h5py.File(fname, "r")
if fname.endswith("bax.h5"):
    ri = f["/ScanData/RunInfo"]
try:
#
# Push gaps forward in homopolymers
#
# Rewrite rule 1:  XX  ===>  XX
#                  X-        -X
#
# Rewrite rule 2:  X-  ===>  -X
#                  XX        XX
#
# Iterate until convergence.
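The two rewrite rules above can be sketched as a small fixed-point loop over a pair of aligned strings. This is an illustration written from the comment alone, not the original function; push_gaps_sketch is a hypothetical name, and it assumes both rows have equal length with "-" as the gap character. As the rules are drawn, each rewrite moves a gap one column toward the start of a homopolymer run.

```python
def push_gaps_sketch(top, bottom, gap="-"):
    """Apply the two homopolymer gap rewrite rules until nothing changes.

    Assumption: top and bottom are equal-length rows of one alignment.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    top, bottom = list(top), list(bottom)
    changed = True
    while changed:
        changed = False
        for i in range(len(top) - 1):
            x = top[i]
            y = bottom[i]
            # Rule 1: top XX over bottom X-  becomes  top XX over bottom -X
            if (x != gap and top[i + 1] == x
                    and bottom[i] == x and bottom[i + 1] == gap):
                bottom[i], bottom[i + 1] = gap, x
                changed = True
            # Rule 2: top X- over bottom XX  becomes  top -X over bottom XX
            elif (y != gap and bottom[i + 1] == y
                    and top[i] == y and top[i + 1] == gap):
                top[i], top[i + 1] = gap, y
                changed = True
    return "".join(top), "".join(bottom)
```

The loop terminates because every rewrite moves one gap a column in the same direction and no rule moves it back, so only finitely many rewrites are possible.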