Nikolas Pontikos pontikos

@pontikos
pontikos / CADD.sh
Last active January 15, 2019 16:28
A nicer CADD script than the one provided
#!/bin/bash
#CADDpath="`dirname \"$0\"`"
#CADDpath="`( cd \"$CADDpath/..\" && pwd )`"
set +x
CADDpath=/share/apps/genomics/CADD_v1.3/
if [ -z "$VEPpath" ] ; then source "${CADDpath}/bin/config.sh"; fi
echo "$VEPpath"
# use all CPUs reported by lscpu except two
NCORES=$(( $(lscpu | grep '^CPU(s):' | cut -f2 -d: | tr -d ' ') - 2 ))
pontikos / three-column.R
Created May 14, 2017 19:38
Convert distance matrix to 3 column matrix
library(reshape2)  # assumed source of melt(); the preview does not show which package is loaded
m2 <- melt(data.matrix(X))
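For readers working outside R, the same wide-to-long reshaping can be sketched in plain Python. This is an illustration only, not part of the gist: the function name `matrix_to_triples`, the labels, and the toy distances are all hypothetical.

```python
def matrix_to_triples(labels, matrix):
    """Flatten a square matrix into (row_label, col_label, value) rows."""
    triples = []
    # Walk the matrix column by column, so the row label varies fastest.
    for j, col in enumerate(labels):
        for i, row in enumerate(labels):
            triples.append((row, col, matrix[i][j]))
    return triples


# Toy 3x3 distance matrix (made-up values).
labels = ["a", "b", "c"]
dist = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]

for row in matrix_to_triples(labels, dist):
    print(*row)
```

Each output line holds one cell of the matrix as row label, column label, value.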
pontikos / bs_download.py
Created February 1, 2017 16:42
BaseSpace downloader: obtains URLs which you can then fetch with wget
import sys
import requests
# print urls of fastq files to download
# obtain access token by following instructions here:
# https://support.basespace.illumina.com/knowledgebase/articles/403618-python-run-downloader
AccessToken=sys.argv[1]
# user
pontikos / genecards_scraper.py
Created July 26, 2016 10:38
GeneCards Python scraper; uses Selenium and PhantomJS to circumvent Incapsula
from __future__ import print_function
import sys
import re
from selenium import webdriver
from random import randint
from time import sleep
dr = webdriver.PhantomJS()
#dr.get('http://www.genecards.org')
pontikos / tabix-kaviar.R
Created June 9, 2016 18:02
Add Kaviar annotation to an annotation CSV file.
#!/usr/bin/env Rscript
library(Rsamtools)
# '/cluster/scratch3/vyp-scratch2/reference_datasets/Kaviar/Kaviar-160204-Public/vcfs/Kaviar-160204-Public-hg38.vcf.gz'
#f <- '/cluster/scratch3/vyp-scratch2/reference_datasets/Kaviar/Kaviar-160204-Public/vcfs/Kaviar-160204-Public-hg19.vcf.gz'
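# read.csv assumed here (base R has no read()); keep VARIANT_ID as character so strsplit() works
d <- read.csv('rare_shared_2006_2006A.csv', stringsAsFactors=FALSE)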
x <- do.call('rbind', strsplit(d$VARIANT_ID, '_'))
pontikos / check_chrom_size.sh
Last active June 8, 2016 18:24
Check that your VCFs are not truncated.
# chrom sizes in hg19
declare -A sizes
sizes["chr1"]=249250621
sizes["chr2"]=243199373
sizes["chr3"]=198022430
sizes["chr4"]=191154276
sizes["chr5"]=180915260
sizes["chr6"]=171115067
sizes["chr7"]=159138663
pontikos / michigan_impute_server_download.md
Last active May 30, 2021 16:49
Retrieve download URLs from Michigan impute server

On the results page of the imputation, in Chrome, open your JavaScript console and run this:

copy(document.body.innerHTML);

This will copy the JavaScript-rendered page to your clipboard. Now paste it into a document, say download_page.html.

Then run this Python script to extract the URLs:

from __future__ import print_function
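The script is cut off in this listing. As a minimal sketch of the extraction step, assuming the saved page exposes its downloads as ordinary `<a href="...">` links, the standard-library HTML parser is enough. The names `LinkCollector` and `extract_links`, the `suffix` filter, and the sample HTML are hypothetical and not taken from the original script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, suffix=""):
    """Return hrefs found in html, optionally keeping only those ending in suffix."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if link.endswith(suffix)]


# Made-up stand-in for the pasted page; real pages will look different.
sample = (
    '<div><a href="https://example.org/results/chr_1.zip">chr 1</a>'
    '<a name="anchor">no href</a>'
    '<a href="/logout">log out</a></div>'
)

for url in extract_links(sample, suffix=".zip"):
    print(url)
```

To apply it to the real page, read download_page.html into a string and pass it to `extract_links`; the `.zip` suffix is a guess at what the result files look like and may need adjusting.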
pontikos / bars_and_stars.py
Created March 10, 2015 13:43
Bars and stars algorithm with three bins. The goal is to extend it to N bins; ideas are welcome.
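# bars and stars algorithm
N = 5
for n in range(0, N):
    # n identical items to distribute over three bins
    x = [1] * n
    # i and j are the positions of the two bars, with i <= j
    for i in range(0, len(x) + 1):
        for j in range(i, len(x) + 1):
            print(100 - n, sum(x[0:i]), sum(x[i:j]), sum(x[j:len(x)]))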
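One possible way to extend the three-bin loop to N bins is to choose the bar positions with `itertools.combinations` instead of nesting one loop per bar. The sketch below is a suggestion rather than the author's solution; the function name `compositions` and its parameters are hypothetical.

```python
from itertools import combinations


def compositions(total, bins):
    """Yield every way to split `total` identical items into `bins` ordered bins.

    Bins may be empty. Requires bins >= 1.
    """
    slots = total + bins - 1
    # Pick which of the slots hold the bins - 1 bars; the rest hold items.
    for bars in combinations(range(slots), bins - 1):
        edges = (-1,) + bars + (slots,)
        # The number of items in a bin is the gap between consecutive bars.
        yield tuple(edges[k + 1] - edges[k] - 1 for k in range(bins))


# Three bins reproduce the splits of the double loop above; any other
# bin count works the same way.
for split in compositions(2, 3):
    print(*split)
```

The number of splits produced is the binomial coefficient C(total + bins - 1, bins - 1), so the output grows quickly with both arguments.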
pontikos / pop-pca.R
Last active August 29, 2015 14:15
Population PCA of onekg and AJ samples using snpStats. Samples have been LD-trimmed using plink.
library(snpStats)
d <- read.plink('all.bed','all.bim','all.fam')
print(dim(X <- d$genotypes))
# SNPs where everyone has the same genotype are uninformative
#snp.qc <- col.summary(X)
#X <- X[,snp.qc$MAF > 0]
# should also remove singleton variants, i.e. those present in only a single person
pontikos / vcf-samples.sh
Created February 14, 2015 12:36
Get sample names from a VCF.
#!/usr/bin/env bash
function error() { >&2 echo -e "\033[31m$*\033[0m"; }
function stop() { error "$*"; exit 1; }
try() { "$@" || stop "cannot $*"; }
file=$1
#doesn't work for double extension .gvcf.gz
ext="${file##*.}"
search=
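The script is cut off in this listing. As a rough sketch of the same task in Python: a VCF lists its samples on the `#CHROM` header line, after the nine fixed columns. The function name `vcf_samples` and the demo file contents are hypothetical; testing only for a trailing `.gz` also sidesteps the double-extension problem noted in the comment above.

```python
import gzip
import os
import tempfile


def vcf_samples(path):
    """Return the sample names from the #CHROM header line of a VCF."""
    # Any file ending in .gz (including .gvcf.gz) is read through gzip.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a header line
    return []


# Demo on a throwaway VCF with two made-up samples.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1", "S2"]
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.vcf")
    with open(demo, "w") as fh:
        fh.write("##fileformat=VCFv4.2\n")
        fh.write("\t".join(header) + "\n")
    print("\n".join(vcf_samples(demo)))
```

A VCF with no genotype columns yields an empty list.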