Skip to content

Instantly share code, notes, and snippets.

import numpy as np
from numpy.random import beta, binomial
#%pylab inline
def get_random_corr(size=1,
same_modulo=True,
is_null_corr = False,
prob_pos_corr=0.5,
corr_neg=[40,10],
@inti
inti / mutual_info.py
Created May 5, 2020 22:00 — forked from naught101/mutual_info.py
Estimating entropy and mutual information with scikit-learn
'''
Non-parametric computation of entropy and mutual-information
Adapted by G Varoquaux for code created by R Brette, itself
from several papers (see in the code).
These computations rely on nearest-neighbor statistics
'''
import numpy as np
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20161220
##source=freeBayes v1.1.0
##reference=/home/shared/app/bcbio/genomes/Ptrichocarpa/Ptri_v3/bwa/Ptri_v3.fa
##contig=<ID=Chr01,length=50495391>
##contig=<ID=Chr02,length=25263035>
##contig=<ID=Chr03,length=21816808>
##contig=<ID=Chr04,length=24267051>
##contig=<ID=Chr05,length=25890704>
@inti
inti / README.md
Last active November 30, 2016 18:19
filter sam file and select close pairs

This scripts takes a sam formatted file sorted by read name produced from the alignment of a interleave fastq file (that is read1/1 and read1/2 format). For each read pair it selects alignment that are close (<5000 kb, see line 22) in the same chromosome and it prints them out. It sets flags to pair both reads (see lines 31-45) and if alignment is correct it sets the pair as proper read (see line 48). If there are more than 1 alignment per read it set flag as 'secondary alignment' except for the one with the highest score (some of MAPQ scores of both reads)

bash -c "python interleave-fastq.py <( zcat fastq/150_R1.fastq.gz) <( zcat fastq/150_R2.fastq.gz)" | sed "s/ 1:N/\/1 1:N/" | sed "s/ 2:N/\/2 2:N/" | gmap -D /home/shared/ref_genome/Poplar/poplarv3_gsnap/ -d poplarv3_gsnap --nthreads=16 --nosplicing -B 5 -f samse -O --read-group-id=test --read-group-name=test /dev/stdin 2> /dev/null | python filter_sam_file.py | sambamba view --format bam /dev/stdin > test.bam

the interleave-fastq.py was obtained from

featureName index alleleOne alleleTwo
Potri.001G000700.v3.0 Chr01_56218_T_G 7 4
Potri.001G000800.v3.0 Chr01_61445_A_C 3 10
Potri.001G000800.v3.0 Chr01_61525_T_A 3 5
Potri.001G000800.v3.0 Chr01_61540_C_T 2 12
Potri.001G000800.v3.0 Chr01_61542_A_G 2 12
Potri.001G000800.v3.0 Chr01_61550_T_A 5 19
Potri.001G000800.v3.0 Chr01_61574_G_T 6 8
Potri.001G000800.v3.0 Chr01_62252_T_C 3 1
Potri.001G001300.v3.0 Chr01_103965_C_T 2 5
@inti
inti / Populus_trichicarpa_example.bed
Created March 10, 2016 18:08
Example bed file for VarDictJava issue #23
Chr03 1671030 1678847
Chr03 1680122 1683702
Chr03 1684822 1686872
Chr03 1689404 1693397
Chr03 1723476 1727153
Chr03 1774605 1776831
Chr03 1778862 1780912
Chr03 1796910 1798963
Chr03 1819329 1823273
Chr03 1828476 1831732
@inti
inti / README.md
Last active March 4, 2016 18:37
Stan implementation of the bayesian model Skelly et al 2011 Genome Research
@inti
inti / snakemake_output.txt
Created June 10, 2015 13:53
SnakeMake: Unexpectedly present output files
Provided cores: 100
Provided resources: java_re=3
Ignored resources: large_n_io, gnu_parallel
Job counts:
count jobs
2240 index_vcf
2100 index_bam
2000 SplitNCigarReads_SampleRegionWise
2000 recode_CIGAR_N_to_D
100 freebayes_joint_regionwise
[ipedroso@jimi log]$ cat bcbio-nextgen-commands.log
[2014-12-17T14:05Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/BESC-015_1.fastq.gz | /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmp2vQ808/BESC-015_1.fastq.gz
[2014-12-17T14:05Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/93-968_1.fastq.gz | /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmpgNAsco/93-968_1.fastq.gz
[2014-12-17T16:51Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/BESC-015_2.fastq.gz | /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmptFim84/BESC-015_2.fastq.gz
[
[ipedroso@jimi log]$ cat bcbio-nextgen.log
[2014-12-17T14:05Z] Timing: organize samples
[2014-12-17T14:05Z] multiprocessing: organize_samples
[2014-12-17T14:05Z] Using input YAML configuration: /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/config/populus_trichocarpa_diversity_1.yaml
[2014-12-17T14:05Z] Checking sample YAML configuration: /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/config/populus_trichocarpa_diversity_1.yaml
[2014-12-17T14:05Z] Testing minimum versions of installed programs
[2014-12-17T14:05Z] Timing: alignment preparation
[2014-12-17T14:05Z] multiprocessing: prep_align_inputs
[2014-12-17T19:37Z] multiprocessing: disambiguate_split
[2014-12-17T19:37Z] Timing: alignment