@dalexander
dalexander / GitReconsidered.md
Created December 14, 2011 17:33
Git Reconsidered

Git Reconsidered: Limitations we have encountered, and a new model for sharing with GitHub

Authors(?): David Alexander, Patrick Marks, Jim Bullard, Jonathan Bingham

P4 annoyances can be a developer headache

  • Having to run p4 edit before changing a file
  • Forgetting to run p4 edit
  • Having to tell p4 when you move/rename files
  • Bulky p4 clients
dalexander / coverage.py
Last active December 15, 2015 11:49
projectIntoRange speedups
"""
Find the coverage in [winStart, winEnd) implied by the tStart and tEnd
vectors.
Original from rangeQueries, and two attempts at speeding it up.
In the common case I could imagine projectIntoRangeFast2 being fastest,
but on my amplicons test case projectIntoRangeFast1 wins.
"""
dalexander / minor BasH5Reader change proposal
Created April 13, 2013 19:48
minor BasH5Reader change proposal
Using the BasH5Reader is simple:
>>> b = BasH5Reader("m1122...bas.h5") # Load the file
>>> zmw = b[9] # Get Zmw object(s) by indexing (or slicing) on hole number(s)
>>> myRead = zmw.subreads[0] # Get ZmwRead object
>>> myRead.basecalls()
'GATTACA'
>>> myRead.QualityValue()
array([5, 6, 3, 4, 8, 8, 1])
dalexander / circularization.py
Created April 18, 2013 02:24
Jason's circularization script, using FASTA files
#!/usr/bin/env python
from pbcore.io.FastaIO import FastaReader, FastaWriter, FastaRecord
import shlex
import sys
import subprocess
import os
import re
usage = "usage: circulization.py initial_contigs.fastq 20000 /tmp circulaized_contigs.fastq"
dalexander / FastaTable.md
Last active December 22, 2015 01:09
FastaTable

pbcore in 2.2: new class FastaTable

In pbcore 2.2 I'm introducing a new class, FastaTable, which provides easy random-access FASTA reading. It requires a FASTA index (.fai) file sitting next to the FASTA on the filesystem, and it requires the FASTA to use a constant line-wrapping length (note that all PacBio reference repository FASTAs already meet these requirements).

Internally the class works by mmap'ing the file contents into virtual
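To make the random-access idea concrete, here is a minimal sketch of .fai-driven lookup over an mmap'd FASTA. It is not pbcore's FastaTable: the class name SimpleFastaTable and its fetch method are hypothetical. It assumes the samtools-style .fai layout (name, sequence length, byte offset of the first base, bases per line, bytes per line) and relies on the constant wrapping length described above, which is what turns a base coordinate into a byte offset with simple arithmetic.

```python
import mmap


def read_fai(fai_path):
    """Parse a samtools-style .fai index into
    {name: (length, offset, line_bases, line_bytes)}."""
    index = {}
    with open(fai_path) as f:
        for line in f:
            name, length, offset, line_bases, line_bytes = line.split("\t")[:5]
            index[name] = (int(length), int(offset), int(line_bases), int(line_bytes))
    return index


class SimpleFastaTable(object):
    """Hypothetical random-access FASTA reader (not pbcore's FastaTable)."""

    def __init__(self, fasta_path):
        self._index = read_fai(fasta_path + ".fai")
        self._file = open(fasta_path, "rb")
        # Map the whole file read-only; the OS pages in only what we touch.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def fetch(self, name, start, end):
        """Return bases [start, end) of record `name` as a str."""
        length, offset, line_bases, line_bytes = self._index[name]
        start, end = max(0, start), min(end, length)
        if start >= end:
            return ""

        def byte_pos(i):
            # Full lines before base i, plus i's position within its line.
            return offset + (i // line_bases) * line_bytes + i % line_bases

        # Slice from the first base's byte through the last base's byte,
        # then drop the line terminators that fall in between.
        raw = self._mm[byte_pos(start):byte_pos(end - 1) + 1]
        return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")

    def close(self):
        self._mm.close()
        self._file.close()
```

An index in this layout can be produced with samtools faidx. Mapping rather than reading keeps opening even a large reference cheap.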

dalexander / genScatter.py
Created December 4, 2013 06:52
Scatter-gather help for GenomicConsensus
from pbcore.io import FastaTable
from nose.tools import eq_

def chunk(keysAndSizes, numChunks):
    """
    Heuristically attempt to split the keys up into sublists such that
    the total of sizes of each sublist is near the targetSize, and
    the chunks are well balanced. Better to go over than under.
    """
dalexander / AlignmentFormat.md
Last active December 30, 2015 20:09
Alignment file format proposal

Custom Alignment File Format Proposal

Here's a starting point for a file format; we can discuss and negotiate from here.

Let's use the extension .aln for a text file like this:

## Custom Alignment Format v0.1
dalexander / findBadReads.py
Last active January 3, 2016 13:19
Find reads where the basecalling went haywire
#!/usr/bin/env python
from pbcore.io import BasH5Reader, M4Reader
from pbcore.util.Process import backticks
import sys, os.path as osp

def totalReadLength(m4r):
    #return m4r.qseqlength
    # qseqlength is bogus! Use the start_end offsets encoded in the query name instead.
    extent = [int(x) for x in m4r.qName.split("/")[-1].split("_")]
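The preview stops right after the query name is parsed. A small standalone sketch of that parsing, assuming the usual PacBio read-name shape movie/holeNumber/start_end and assuming the length is simply end minus start; both helper names are hypothetical.

```python
def read_extent_from_name(q_name):
    """Parse the trailing "start_end" field of a PacBio-style read name,
    e.g. "movie/holeNumber/start_end", into integers (start, end)."""
    start, end = (int(x) for x in q_name.split("/")[-1].split("_"))
    return start, end


def read_length_from_name(q_name):
    """Read length implied by the name (end - start), used here in place
    of the M4 qseqlength column, per the comment in findBadReads.py."""
    start, end = read_extent_from_name(q_name)
    return end - start
```

Underscores inside the movie name are harmless because only the last "/"-separated field is split.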
dalexander / showChem.py
Last active August 29, 2015 13:55
Script for determining which PacBio sequencing chemistry was used
#!/usr/bin/env python
import sys
import h5py

fname = sys.argv[1]
f = h5py.File(fname, "r")
if fname.endswith("bax.h5"):
    ri = f["/ScanData/RunInfo"]
    try:
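The preview ends at the try block. A hedged sketch of the kind of lookup that likely follows: pulling kit identifiers out of the RunInfo group's attributes. The attribute names BindingKit and SequencingKit are assumptions about the bax.h5 layout, and read_chemistry_kits is a hypothetical helper. It takes any mapping, so it works on an h5py attribute manager or a plain dict.

```python
def _to_text(value):
    """h5py may return string attributes as bytes (numpy.bytes_ is a
    bytes subclass); normalize those to str and pass other values through."""
    if isinstance(value, bytes):
        return value.decode("ascii")
    return value


def read_chemistry_kits(run_info_attrs):
    """Return (binding_kit, sequencing_kit) from a RunInfo attribute
    mapping; a missing attribute comes back as None.
    ASSUMPTION: attribute names "BindingKit" and "SequencingKit"."""
    return (_to_text(run_info_attrs.get("BindingKit")),
            _to_text(run_info_attrs.get("SequencingKit")))


# Usage with h5py (not executed here):
#   with h5py.File(fname, "r") as f:
#       print(read_chemistry_kits(f["/ScanData/RunInfo"].attrs))
```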
dalexander / realign.py
Last active August 29, 2015 13:59
basic gap realignment for better visualization
#
# Push gaps forward in homopolymers
#
# Rewrite rule 1:  XX  ===>  XX
#                  X-        -X
#
# Rewrite rule 2:  X-  ===>  -X
#                  XX        XX
#
# Iterate until convergence.
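A minimal sketch of the two rewrite rules applied to a pair of equal-length aligned rows ('-' marks a gap), iterated until no rule fires. The function name push_gaps is hypothetical. Each application moves a gap one column left within a homopolymer, so gaps only ever move in one direction and the loop is guaranteed to terminate.

```python
def push_gaps(top, bottom):
    """Apply the homopolymer gap-shifting rules until convergence.
    top and bottom are equal-length aligned rows; '-' is a gap."""
    top, bottom = list(top), list(bottom)
    changed = True
    while changed:
        changed = False
        for i in range(len(top) - 1):
            # Rule 1:  XX / X-  ===>  XX / -X   (gap in the bottom row)
            if top[i] == top[i + 1] == bottom[i] != "-" and bottom[i + 1] == "-":
                bottom[i], bottom[i + 1] = "-", bottom[i]
                changed = True
            # Rule 2:  X- / XX  ===>  -X / XX   (gap in the top row)
            elif bottom[i] == bottom[i + 1] == top[i] != "-" and top[i + 1] == "-":
                top[i], top[i + 1] = "-", top[i]
                changed = True
    return "".join(top), "".join(bottom)
```

For example, a bottom-row gap inside an A homopolymer, push_gaps("AAAT", "AA-T"), ends up at the start of the run, while gaps outside homopolymers are left alone.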