Inti Pedroso inti

## simulate_coexpression.py
import numpy as np
from numpy.random import beta, binomial

#%pylab inline

def get_random_corr(size=1,
                    same_modulo=True,
                    is_null_corr = False,
                    prob_pos_corr=0.5,
                    corr_neg=[40,10],

## mutual_info.py
'''
Non-parametric computation of entropy and mutual-information

Adapted by G Varoquaux for code created by R Brette, itself
from several papers (see in the code).

These computations rely on nearest-neighbor statistics
'''
import numpy as np

## test.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20161220
##source=freeBayes v1.1.0
##reference=/home/shared/app/bcbio/genomes/Ptrichocarpa/Ptri_v3/bwa/Ptri_v3.fa
##contig=<ID=Chr01,length=50495391>
##contig=<ID=Chr02,length=25263035>
##contig=<ID=Chr03,length=21816808>
##contig=<ID=Chr04,length=24267051>
##contig=<ID=Chr05,length=25890704>

## README.md

      
              2 files
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                inti
                / README.md
            
            
              Last active
              November 30, 2016 18:19
            
              
                filter sam file and select close pairs 
              
          
    This scripts takes a sam formatted file sorted by read name produced from the alignment of a interleave fastq file (that is read1/1 and read1/2 format). For each read pair it selects alignment that are close (<5000 kb, see line 22) in the same chromosome and it prints them out. It sets flags to pair both reads (see lines 31-45) and if alignment is correct it sets the pair as proper read (see line 48). If there are more than 1 alignment per read it set flag as 'secondary alignment' except for the one with the highest score (some of MAPQ scores of both reads)
bash -c "python interleave-fastq.py <( zcat fastq/150_R1.fastq.gz) <( zcat fastq/150_R2.fastq.gz)" | sed "s/ 1:N/\/1 1:N/" | sed "s/ 2:N/\/2 2:N/" | gmap -D /home/shared/ref_genome/Poplar/poplarv3_gsnap/ -d poplarv3_gsnap --nthreads=16 --nosplicing -B 5 -f samse -O --read-group-id=test --read-group-name=test /dev/stdin 2> /dev/null | python filter_sam_file.py | sambamba view --format bam /dev/stdin > test.bam
the interleave-fastq.py was obtained from

  
## counts.txt
featureName	index	alleleOne	alleleTwo
Potri.001G000700.v3.0	Chr01_56218_T_G	7	4
Potri.001G000800.v3.0	Chr01_61445_A_C	3	10
Potri.001G000800.v3.0	Chr01_61525_T_A	3	5
Potri.001G000800.v3.0	Chr01_61540_C_T	2	12
Potri.001G000800.v3.0	Chr01_61542_A_G	2	12
Potri.001G000800.v3.0	Chr01_61550_T_A	5	19
Potri.001G000800.v3.0	Chr01_61574_G_T	6	8
Potri.001G000800.v3.0	Chr01_62252_T_C	3	1
Potri.001G001300.v3.0	Chr01_103965_C_T	2	5

## Populus_trichicarpa_example.bed
Chr03	1671030	1678847
Chr03	1680122	1683702
Chr03	1684822	1686872
Chr03	1689404	1693397
Chr03	1723476	1727153
Chr03	1774605	1776831
Chr03	1778862	1780912
Chr03	1796910	1798963
Chr03	1819329	1823273
Chr03	1828476	1831732

## README.md

      
              3 files
            
          
              0 forks
            
          
                0 comments
              
            
              0 stars
            
          
                inti
                / README.md
            
            
              Last active
              March 4, 2016 18:37
            
              
                Stan implementation of the bayesian model Skelly et al 2011 Genome Research
              
          
    Stan implementation of the bayesian model Skelly et al 2011 Genome Research
Download cmdstan-2.9.0.tar.gz, https://github.com/stan-dev/cmdstan/releases/download/v2.9.0/cmdstan-2.9.0.tar.gz
tar xvzf cmdstan-2.9.0.tar.gz
cd cmdstan-2.9.0


## snakemake_output.txt
Provided cores: 100
Provided resources: java_re=3
Ignored resources: large_n_io, gnu_parallel
Job counts:
	count	jobs
	2240	index_vcf
	2100	index_bam
	2000	SplitNCigarReads_SampleRegionWise
	2000	recode_CIGAR_N_to_D
	100	freebayes_joint_regionwise

## bcbio-nextgen-commands.log
[ipedroso@jimi log]$ cat bcbio-nextgen-commands.log
[2014-12-17T14:05Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/BESC-015_1.fastq.gz | /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmp2vQ808/BESC-015_1.fastq.gz
[2014-12-17T14:05Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/93-968_1.fastq.gz | /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmpgNAsco/93-968_1.fastq.gz
[2014-12-17T16:51Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/BESC-015_2.fastq.gz | /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmptFim84/BESC-015_2.fastq.gz
[

## bcbio-nextgen.log
[ipedroso@jimi log]$ cat bcbio-nextgen.log
[2014-12-17T14:05Z] Timing: organize samples
[2014-12-17T14:05Z] multiprocessing: organize_samples
[2014-12-17T14:05Z] Using input YAML configuration: /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/config/populus_trichocarpa_diversity_1.yaml
[2014-12-17T14:05Z] Checking sample YAML configuration: /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/config/populus_trichocarpa_diversity_1.yaml
[2014-12-17T14:05Z] Testing minimum versions of installed programs
[2014-12-17T14:05Z] Timing: alignment preparation
[2014-12-17T14:05Z] multiprocessing: prep_align_inputs
[2014-12-17T19:37Z] multiprocessing: disambiguate_split
[2014-12-17T19:37Z] Timing: alignment
	import numpy as np
	from numpy.random import beta, binomial

	#%pylab inline

	def get_random_corr(size=1,
	same_modulo=True,
	is_null_corr = False,
	prob_pos_corr=0.5,
	corr_neg=[40,10],
	'''
	Non-parametric computation of entropy and mutual-information

	Adapted by G Varoquaux for code created by R Brette, itself
	from several papers (see in the code).

	These computations rely on nearest-neighbor statistics
	'''
	import numpy as np
	##fileformat=VCFv4.2
	##FILTER=<ID=PASS,Description="All filters passed">
	##fileDate=20161220
	##source=freeBayes v1.1.0
	##reference=/home/shared/app/bcbio/genomes/Ptrichocarpa/Ptri_v3/bwa/Ptri_v3.fa
	##contig=<ID=Chr01,length=50495391>
	##contig=<ID=Chr02,length=25263035>
	##contig=<ID=Chr03,length=21816808>
	##contig=<ID=Chr04,length=24267051>
	##contig=<ID=Chr05,length=25890704>
	featureName index alleleOne alleleTwo
	Potri.001G000700.v3.0 Chr01_56218_T_G 7 4
	Potri.001G000800.v3.0 Chr01_61445_A_C 3 10
	Potri.001G000800.v3.0 Chr01_61525_T_A 3 5
	Potri.001G000800.v3.0 Chr01_61540_C_T 2 12
	Potri.001G000800.v3.0 Chr01_61542_A_G 2 12
	Potri.001G000800.v3.0 Chr01_61550_T_A 5 19
	Potri.001G000800.v3.0 Chr01_61574_G_T 6 8
	Potri.001G000800.v3.0 Chr01_62252_T_C 3 1
	Potri.001G001300.v3.0 Chr01_103965_C_T 2 5
	Chr03 1671030 1678847
	Chr03 1680122 1683702
	Chr03 1684822 1686872
	Chr03 1689404 1693397
	Chr03 1723476 1727153
	Chr03 1774605 1776831
	Chr03 1778862 1780912
	Chr03 1796910 1798963
	Chr03 1819329 1823273
	Chr03 1828476 1831732
	Provided cores: 100
	Provided resources: java_re=3
	Ignored resources: large_n_io, gnu_parallel
	Job counts:
	count jobs
	2240 index_vcf
	2100 index_bam
	2000 SplitNCigarReads_SampleRegionWise
	2000 recode_CIGAR_N_to_D
	100 freebayes_joint_regionwise
	[ipedroso@jimi log]$ cat bcbio-nextgen-commands.log
	[2014-12-17T14:05Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/BESC-015_1.fastq.gz \| /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmp2vQ808/BESC-015_1.fastq.gz
	[2014-12-17T14:05Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/93-968_1.fastq.gz \| /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmpgNAsco/93-968_1.fastq.gz
	[2014-12-17T16:51Z] gunzip -c /media/TeraData/ipedroso/globus_endpoint/Ptrichocarpa_phytozome/v3.0/diversity/bam/BESC-015_2.fastq.gz \| /home/shared/app/bcbio/tool/bin/bgzip -c /dev/stdin > /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/work2/align_prep/tx/tmptFim84/BESC-015_2.fastq.gz
	[
	[ipedroso@jimi log]$ cat bcbio-nextgen.log
	[2014-12-17T14:05Z] Timing: organize samples
	[2014-12-17T14:05Z] multiprocessing: organize_samples
	[2014-12-17T14:05Z] Using input YAML configuration: /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/config/populus_trichocarpa_diversity_1.yaml
	[2014-12-17T14:05Z] Checking sample YAML configuration: /media/TeraData/ipedroso/ANALYSES/Populus/Diversity_1/populus_trichocarpa_diversity_1/config/populus_trichocarpa_diversity_1.yaml
	[2014-12-17T14:05Z] Testing minimum versions of installed programs
	[2014-12-17T14:05Z] Timing: alignment preparation
	[2014-12-17T14:05Z] multiprocessing: prep_align_inputs
	[2014-12-17T19:37Z] multiprocessing: disambiguate_split
	[2014-12-17T19:37Z] Timing: alignment