pythseq pythseq

\documentclass{article}
\usepackage[
paperwidth=27cm,paperheight=13cm,
margin=1cm,
]{geometry}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
Wheeler graphs
Gagie, Manzini, Siren
Theoretical Computer Science, 2017
https://www.sciencedirect.com/science/article/pii/S0304397517305285
Notes of a whiteboard presentation to the Bonsai team in Lille.
These notes largely follow the paper.
Rayan Chikhi, 2019
@pythseq
pythseq / README.md
Created June 28, 2019 12:04 — forked from lindenb/README.md
https://twitter.com/sjackman/status/748259285151293440 ( flex ) Who's got a command for me to output coordinates of FASTA scaffolds gaps as a BED file? Looking for a one-liner.
$ flex input.l && gcc -O3 lex.yy.c && curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr22.fa.gz" | gunzip -c | ./a.out 
chr22	0	16050000
chr22	16697850	16847850
chr22	19178139	19178140
chr22	19178159	19178160
chr22	19178161	19178164
chr22	19178165	19178167
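The output above lists runs of `N` bases as 0-based, half-open BED intervals. As a rough illustration of the same idea (not the flex scanner from the gist), here is a minimal Python sketch; the function name `fasta_gaps` is hypothetical, and it assumes gaps are written as `N` or `n`.

```python
import re
from typing import Iterable, Iterator, Tuple


def fasta_gaps(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (sequence name, start, end) for every run of N/n in a FASTA stream.

    Coordinates are 0-based and half-open, as in BED.
    """
    name = ""
    offset = 0       # bases of the current sequence consumed so far
    pending = None   # (start, end) of a gap that may continue on the next line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if pending:
                yield (name, pending[0], pending[1])
            header = line[1:].split()
            name = header[0] if header else ""
            offset, pending = 0, None
            continue
        for match in re.finditer(r"[Nn]+", line):
            start, end = offset + match.start(), offset + match.end()
            if pending and pending[1] == start:
                # The gap continues across a line break: extend it.
                pending = (pending[0], end)
            else:
                if pending:
                    yield (name, pending[0], pending[1])
                pending = (start, end)
        offset += len(line)
    if pending:
        yield (name, pending[0], pending[1])


if __name__ == "__main__":
    import sys

    for chrom, start, end in fasta_gaps(sys.stdin):
        print(chrom, start, end, sep="\t")
```

Assuming the sketch is saved as `gaps.py` (a hypothetical filename), it reads FASTA on standard input, for example `gunzip -c chr22.fa.gz | python gaps.py`.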
@pythseq
pythseq / Genomics_A_Programmers_Guide.md
Created May 17, 2019 16:37 — forked from andy-thomason/Genomics_A_Programmers_Guide.md
Genomics: a programmer's introduction

Genomics - A programmer's guide.

Andy Thomason is a Senior Programmer at Genomics PLC. He has been writing graphics systems, games and compilers since the '70s and specialises in code performance.

https://www.genomicsplc.com

Bedtools Cheatsheet

General:

| Tool | Description |
|------|-------------|
| flank | Create new intervals from the flanks of existing intervals. |
| slop | Adjust the size of intervals. |
| shift | Adjust the position of intervals. |
| subtract | Remove intervals based on overlaps between two files. |
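To make the four operations concrete, here is a minimal Python sketch on plain `(start, end)` tuples with 0-based, half-open coordinates. It only illustrates the interval arithmetic and is not how bedtools is implemented; the function names mirror the tool names, while the parameters `left`, `right`, `offset` and `chrom_len` are assumptions of this sketch, as is the choice to drop empty flanks.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def slop(iv: Interval, left: int, right: int, chrom_len: int) -> Interval:
    """Grow an interval on each side, clamped to the chromosome bounds."""
    start, end = iv
    return (max(0, start - left), min(chrom_len, end + right))


def flank(iv: Interval, left: int, right: int, chrom_len: int) -> List[Interval]:
    """Return the regions just outside an interval (empty flanks are dropped)."""
    start, end = iv
    candidates = [(max(0, start - left), start), (end, min(chrom_len, end + right))]
    return [c for c in candidates if c[0] < c[1]]


def shift(iv: Interval, offset: int, chrom_len: int) -> Interval:
    """Move an interval by a signed offset, clamped to the chromosome bounds."""
    start, end = iv
    return (max(0, start + offset), min(chrom_len, end + offset))


def subtract(iv: Interval, others: List[Interval]) -> List[Interval]:
    """Return the parts of iv that are not covered by any interval in others."""
    cursor, end = iv
    pieces = []
    for o_start, o_end in sorted(others):
        if o_end <= cursor or o_start >= end:
            continue  # no overlap with what is left of iv
        if o_start > cursor:
            pieces.append((cursor, o_start))
        cursor = max(cursor, o_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces
```

For instance, `subtract((10, 50), [(0, 15), (20, 30), (45, 60)])` leaves the two uncovered pieces `(15, 20)` and `(30, 45)`.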
@pythseq
pythseq / highcov.c
Created April 23, 2019 11:39 — forked from lindenb/highcov.c
calculates mean bam coverage and prints a bed of the regions covered more than FACTOR*(mean-coverage) : htslib sam bam coverage depth mask
/**
Author: Pierre Lindenbaum PhD. 2019
This tool calculates the mean depth of a BAM (ignoring the non-covered bases)
and outputs a BED file of the regions covered more than mean-depth*FACTOR
compilation: gcc -o a.out -O3 -Wall -I../htslib highcov.c -L../htslib -lm -lpthread -lhts -lz -llzma -lbz2
*/
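The logic described in the comment can be sketched without htslib. The following Python example is an illustration only, not the author's C code: it assumes the per-base depths of one contig are already available as a list, and the name `high_coverage_regions` is hypothetical.

```python
from typing import List, Sequence, Tuple


def high_coverage_regions(
    depths: Sequence[int], factor: float
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (mean depth over covered bases, regions with depth > factor * mean).

    Regions are 0-based, half-open (start, end) pairs, as in BED.
    """
    covered = [d for d in depths if d > 0]  # ignore non-covered bases
    if not covered:
        return 0.0, []
    mean = sum(covered) / len(covered)
    threshold = factor * mean

    regions = []
    start = None  # start of the currently open high-coverage run, if any
    for pos, depth in enumerate(depths):
        if depth > threshold:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return mean, regions
```

With depths `[0, 0, 2, 2, 2, 10, 10, 2, 0, 12]` and a factor of 1.5, the mean over the seven covered bases is 40/7, so only positions with depth above roughly 8.57 are reported.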
@pythseq
pythseq / makeSummaryTable.py
Created April 2, 2019 08:53 — forked from philippbayer/makeSummaryTable.py
Two scripts to plot BLINK results using rMVP
# assuming that all results tables of BLINK are in the current folder, and assuming that all results tables were gzipped
# usage: python makeSummaryTable.py > Summary_Table.tsv
# this script will make one large summary table in the format rMVP requires, SNP, chrom, position, and then x columns for x phenotypes - each cell one p-value
# use grep -v to kick out regions of the genome you don't want to plot (unplaced contigs etc.)
import glob
import gzip
from collections import OrderedDict
all_phenos = ['SNP','chr','pos']
  1. Select which database you want to download, here I will use the nucleotide database: nt.

  2. Using rsync we will retrieve the name of the files composing the database from the NCBI server

rsync --list-only rsync://ftp.ncbi.nlm.nih.gov/blast/db/nt*.gz

  3. Using grep we filter out the Warning/Welcome message and retain only the compressed files

rsync --list-only rsync://ftp.ncbi.nlm.nih.gov/blast/db/nt*.gz | grep '\.tar\.gz$'
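The filtering step can also be done in Python once the listing has been captured as text. This is a minimal sketch; it assumes each file line of the `rsync --list-only` output ends with the file name as its last whitespace-separated field, and the function name `tarballs_from_listing` is hypothetical.

```python
from typing import List


def tarballs_from_listing(listing: str) -> List[str]:
    """Extract the .tar.gz file names from the text of an rsync listing.

    Banner lines (welcome or warning messages) and other files are skipped.
    """
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Assumption: the file name is the last field of each listing line.
        if fields and fields[-1].endswith(".tar.gz"):
            names.append(fields[-1])
    return names
```

The listing text could come from `subprocess.run([...], capture_output=True, text=True).stdout` with the rsync command shown above.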

@pythseq
pythseq / My_data.tsv
Created January 2, 2019 10:14 — forked from philippbayer/My_data.tsv
This is the data and the plot I used for my PAG poster on repeatmasking problems in R-gene prediction.
Class	Repeatmasking	Assembly	Count
CNL	before	B. napus	23
CNL	after	B. napus	13
CNL	before	B. rapa	20
CNL	after	B. rapa	10
CN	before	B. napus	47
CN	after	B. napus	25
CN	before	B. rapa	16
CN	after	B. rapa	5
N	before	B. napus	92
@pythseq
pythseq / extract_transcript_intron.sh
Created December 10, 2018 08:57 — forked from hiraksarkar/extract_transcript_intron.sh
3 line script to extract intron boundaries per transcript
## requires bedtools
BIN='/home/hirak/bedtools2/bin'
## Gencode
## gencode.v29.chr_patch_hapl_scaff.annotation.gtf
GTF_FILE="gencode.v29.chr_patch_hapl_scaff.annotation.gtf"
# extract transcript boundaries
awk 'BEGIN{OFS="\t"} $3=="transcript" {print $1,$4-1,$5,$12}' "$GTF_FILE" | tr -d '";' | "$BIN/sortBed" > gencode_transcript_intervals.bed
# merge exon boundaries
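
The preview is cut off here, so the rest of the bedtools pipeline is not shown. One common way to obtain introns per transcript is to subtract the merged exons from the transcript interval. The Python sketch below illustrates that idea only and is not the author's script; it assumes 0-based, half-open coordinates and exons that lie within the transcript, and the function name `introns` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def introns(transcript: Interval, exons: List[Interval]) -> List[Interval]:
    """Return the parts of a transcript interval not covered by any exon.

    Overlapping exons are merged implicitly by tracking the furthest exon end.
    Assumption: every exon lies within the transcript interval.
    """
    t_start, t_end = transcript
    result = []
    cursor = t_start  # end of the exonic region seen so far
    for e_start, e_end in sorted(exons):
        if e_start > cursor:
            result.append((cursor, e_start))  # gap between exons is an intron
        cursor = max(cursor, e_end)
    if cursor < t_end:
        result.append((cursor, t_end))
    return result
```

For a transcript spanning `(100, 1000)` with exons `(100, 200)`, `(400, 500)`, `(450, 600)` and `(900, 1000)`, the sketch returns `(200, 400)` and `(600, 900)`.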