Brent Pedersen brentp

@brentp
brentp / README.md
Created January 6, 2017 23:34
comparing interval trees

Compare different interval tree methods in Go.

The parameters are:

// Number of intervals in the tree
var n_intervals = 30000
// length of intervals in the tree
var i_length = 1000
// length of the chromosome (max start of simulated intervals)
var chrom_length = 500000000
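
To make the parameters concrete, here is a minimal sketch of how such a benchmark input could be simulated. It is written in Python for brevity rather than Go. The function names (`simulate_intervals`, `count_overlaps`), the fixed seed, and the uniform random starts are assumptions for illustration and are not taken from the gist. The brute-force scan is only a baseline that a tree's results could be checked against.

```python
import random

# Same values as the Go parameters above.
N_INTERVALS = 30000
I_LENGTH = 1000
CHROM_LENGTH = 500000000


def simulate_intervals(n, length, chrom_length, seed=42):
    """Return n half-open (start, end) intervals with uniform random starts."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n):
        start = rng.randrange(chrom_length)  # largest start is chrom_length - 1
        intervals.append((start, start + length))
    return intervals


def count_overlaps(intervals, qstart, qend):
    """Brute-force O(n) count of intervals overlapping [qstart, qend)."""
    return sum(1 for s, e in intervals if s < qend and e > qstart)


if __name__ == "__main__":
    ivs = simulate_intervals(N_INTERVALS, I_LENGTH, CHROM_LENGTH)
    # A query spanning the whole simulated range should hit every interval.
    print(len(ivs), count_overlaps(ivs, 0, CHROM_LENGTH + I_LENGTH))
```

With 30000 intervals of length 1000 spread over 500 Mb, the intervals are sparse, so most point queries would be expected to hit few or no intervals.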

compile

CFLAGS+=-fno-omit-frame-pointer -g

google-perftools

usage

cat example-data.txt | go run hist.go -c 1 -g 2 -b 30

"""
usage is:
$ bgz-extract.py some.vcf.gz 14 20
and in another thread:
$ bgz-extract.py some.vcf.gz 15 20
...
The first number is the requested *chunk*; the second is the total number of chunks, which can be any value the user chooses.
"""
import sys
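
The preview of bgz-extract.py ends here. As a rough illustration of how a chunk/total pair can map onto a region of a file, the sketch below splits a file of a given size into near-equal byte ranges. The function name `chunk_range`, the 0-based chunk index, and the split by raw byte offset are all assumptions and are not confirmed by the excerpt. A real implementation for a bgzipped file would also have to move each boundary to the start of a compressed block and of a line, which is omitted here.

```python
def chunk_range(file_size, chunk, n_chunks):
    """Return the half-open byte range [start, end) for a 0-based chunk.

    Integer arithmetic keeps consecutive chunks contiguous, so the
    ranges cover the whole file with no gaps or overlaps.
    """
    if not 0 <= chunk < n_chunks:
        raise ValueError("chunk must be in [0, n_chunks)")
    start = file_size * chunk // n_chunks
    end = file_size * (chunk + 1) // n_chunks
    return start, end


if __name__ == "__main__":
    # Hypothetical 1000-byte file, chunks 14 and 15 of 20.
    print(chunk_range(1000, 14, 20))
    print(chunk_range(1000, 15, 20))
```

Because each chunk depends only on its own index and the total, separate processes can extract different chunks without coordinating.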
@brentp
brentp / h47r.py
Last active August 9, 2016 03:54
# 0 or 1 indicates absence/presence of variant
data = [
[0, 1, 0, 1, 0, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
@brentp
brentp / READMD.md
Created July 23, 2016 13:18
testing gargs

Trying to see whether output from concurrent processes gets mixed (interleaved). We pipe to uniq and then check that we get the same number of lines of output as input (each call to t.py produces many copies of the same line, so they should all get uniqed to 1).

seq 1 20 | ./gargs -p 10 "python t.py {}" | uniq | awk '{ print substr($0, 0, 10) }'

Each call to t.py prints N lines of "1|1|1|..." or "2|2|2|..." based on sys.argv[1], where N is chosen randomly and we have random ints between each line.
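
t.py itself is not shown in this excerpt. A minimal sketch of a script with the behaviour described might look like the following. The function name `make_lines`, the line width, and the upper bound on N are assumptions, and the sketch covers only the repeated lines, not the random ints mentioned above.

```python
import random
import sys


def make_lines(tag, rng, width=20, max_lines=50):
    """Return N identical lines like '1|1|1|...', with N chosen at random."""
    n = rng.randint(1, max_lines)
    line = "|".join([tag] * width)
    return [line] * n


if __name__ == "__main__":
    # The tag comes from the command line, e.g. `python t.py 7`.
    tag = sys.argv[1] if len(sys.argv) > 1 else "1"
    for line in make_lines(tag, random.Random()):
        print(line)
```

If each process's output arrives as one contiguous block, `uniq` collapses it to a single line per input; interleaved output from different processes would leave extra lines.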

save MC MQ tags to an LMDB

pip install lmdb

create a database

keys are qname///flag

@brentp
brentp / gnu-parallel-xargs.md
Last active February 18, 2016 20:50
remember gnu parallel and / or xargs usage
from gemini.gim import AutoDom, AutoRec, DeNovo, MendelViolations, CompoundHet
# turn a dictionary into something that can be accessed by attribute.
class Args(object):
"""
>>> args = Args(db='some.db')
>>> args.db
'some.db'
"""
_opts = ("columns", "db", "filter", "min_kindreds", "families",

Combine ExAC with non-psych and nonTCGA

ExAC originally released a VCF that contained the aggregate data for all samples. It has INFO fields for the AFR, AMR, EAS, FIN, NFE, OTH, and SAS populations along with the combined Adj count. Recently, two additional VCFs have been released: one containing only non-psychiatric samples and one containing only non-TCGA samples.

Here, we will use vcfanno to decorate the original VCF with the alternate counts,