@johnbowes
johnbowes / positional_duplicates.py
Last active January 28, 2016 09:30
Identify duplicate SNPs based on chromosome and base position. Identify duplicates with missing data to exclude.
#!/usr/bin/python
import sys
import pandas as pd
import numpy as np
# arguments are fixed in the following format:
lmiss_file = sys.argv[1]
bim_file = sys.argv[2]
output_file = sys.argv[3]
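The preview above stops before the script body, so the author's actual logic is not shown. As a minimal sketch of the idea in the description, assuming the `.bim` file supplies chromosome, SNP ID and base position and the `.lmiss` file supplies a per-SNP missing rate, duplicates could be selected as below. The function name `positional_duplicates_to_exclude`, its plain-Python inputs, and the rule of keeping the SNP with the lowest missing rate are assumptions for illustration, not the original pandas code.

```python
from collections import defaultdict


def positional_duplicates_to_exclude(bim_rows, f_miss):
    """Return SNP IDs to exclude among positional duplicates.

    bim_rows: iterable of (chrom, snp_id, bp) tuples (hypothetical input shape).
    f_miss:   dict mapping snp_id -> missing rate (hypothetical input shape).
    """
    # Group SNP IDs by (chromosome, base position).
    by_position = defaultdict(list)
    for chrom, snp_id, bp in bim_rows:
        by_position[(chrom, bp)].append(snp_id)

    exclude = []
    for snps in by_position.values():
        if len(snps) < 2:
            continue  # not a duplicate
        # Assumption: a SNP with no missingness record is treated as fully missing.
        ranked = sorted(snps, key=lambda s: f_miss.get(s, 1.0))
        # Keep the SNP with the least missing data, exclude the others.
        exclude.extend(ranked[1:])
    return sorted(exclude)
```

With two SNPs sharing chromosome 1, position 100, only the one with the higher missing rate is returned for exclusion; SNPs at the same position on different chromosomes are not treated as duplicates.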
@johnbowes
johnbowes / king.py
Last active March 17, 2024 20:04
Select related individuals for exclusion based on output from KING
#!/usr/bin/python
# Run KING to generate sample QC and IBD summary stats
# ./king_1.9 -b <data>.bed --bysample --prefix <prefix_for_output>
# ./king_1.9 -b <data>.bed --kinship --related --degree 2 --prefix <prefix_for_output>
#
# Run this script to create a list of exclusions (member of pair with least data will be excluded)
# python king.py --prefix <prefix_for_output> --out <output_file_name>
# Add error if no samples in kinship file.
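The header comments describe the selection rule (the member of each related pair with the least data is excluded) but the preview does not show the code. The following is a rough sketch of that rule, assuming the related pairs and per-sample missing counts have already been read from the KING output. The function name `select_exclusions`, the greedy pair-by-pair order, and the tie-breaking choice are assumptions, not the author's implementation.

```python
def select_exclusions(related_pairs, n_missing):
    """Pick one sample per related pair for exclusion.

    related_pairs: list of (id1, id2) tuples (hypothetical input shape).
    n_missing:     dict mapping sample ID -> number of missing genotypes.
    """
    if not related_pairs:
        # Mirrors the TODO in the header: fail loudly on an empty kinship file.
        raise ValueError("no related pairs found in kinship input")

    exclude = set()
    for id1, id2 in related_pairs:
        if id1 in exclude or id2 in exclude:
            continue  # this pair is already broken by an earlier exclusion
        # Exclude the member with more missing data; ties exclude id2 (assumption).
        worse = id1 if n_missing[id1] > n_missing[id2] else id2
        exclude.add(worse)
    return sorted(exclude)
```

Skipping pairs that already contain an excluded sample avoids removing more individuals than necessary when one sample is related to several others.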
@johnbowes
johnbowes / effect_estimate_comp.R
Created November 30, 2015 15:24
Plot of effect estimate comparisons as used in PsA immunochip paper
setwd("C:/Users/mdeasjdb/Dropbox/PsA_immunochip/effect_comparison")
# format known associations table - this contains a link between the two results sets and r2
xref <- read.csv(file="psoriasis_known_loci.txt", header=TRUE)
colnames(xref)[6] <- 'xref_study_snp'
colnames(xref)[1] <- 'xref_tsoi_snp'
xref <- subset(xref, select=c('xref_study_snp', 'xref_tsoi_snp', 'ld_rs', 'gene'))
xref$ld_rs[is.na(xref$ld_rs)] <- 1
xref$ld <- as.numeric(as.character(xref$ld_rs))
#!/usr/bin/python
import sys
import os
import glob
import pandas as pd
import numpy as np
import argparse
# Future improvements
@johnbowes
johnbowes / HumanCoreExome_array_qc.sh
Last active January 21, 2016 15:20
QC process for Illumina Human Core Exome array
#!/bin/bash
module load apps/binapps/anaconda/2.1.0
module load apps/gcc/R/3.1.0
# clean data threshold variables.
SNP_MISS_THRESH=0.02 # threshold for SNP missing rate
SNP_MAF_THRESH=0.01 # threshold for minor allele frequency
SNP_HWE_THRESH=1e-3 # threshold for HWE
SNP_HWE_GROUP='ALL' # define within which group to calculate HWE
@johnbowes
johnbowes / shapeit2_phasing_array.sh
Created February 9, 2016 15:57
SGE job array for phasing GWAS data with shapeit2
#!/bin/bash
## Inherit user environment from the login node
#$ -V
## Use the current directory as the working directory for SGE output and determining paths to files
#$ -cwd
## Request parallel environment and a number of cores
#$ -pe smp.pe 12
##
## create an array
#$ -t 1-22
@johnbowes
johnbowes / impute2_job_array.sh
Created February 9, 2016 15:59
SGE job array for submitting imputation interval jobs
#!/bin/bash
#$ -S /bin/bash
##
## Inherit user environment from the login node
#$ -V
##
## Use the current directory as the working directory for SGE output and determining paths to files
#$ -cwd
##
@johnbowes
johnbowes / impute2_job.sh
Created February 9, 2016 16:00
bash script for imputing a genomic interval. Used with the SGE job array script.
#!/bin/bash
CHR=$1
INTERVAL_START=`printf "%.0f" $2`
INTERVAL_END=`printf "%.0f" $3`
INTERVAL=`printf "%03d" $4`
# directories
NAME="impute_test_data_hg19"
OUTPUT_DIR="/mnt/iusers01/jw01/mdeasjdb/scratch/impute_practice/"
@johnbowes
johnbowes / hg19_5mb_interval.list
Created February 9, 2016 16:03
5Mb interval list for use with impute2.
1 10583 5010583 1
1 5010584 10010584 2
1 10010585 15010585 3
1 15010586 20010586 4
1 20010587 25010587 5
1 25010588 30010588 6
1 30010589 35010589 7
1 35010590 40010590 8
1 40010591 45010591 9
1 45010592 50010592 10
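The rows shown follow a regular pattern: each interval spans 5,000,000 bp from its start, and the next interval starts one base after the previous end. As a sketch of how such a list could be generated, assuming a known first and last position per chromosome, see below. The function name `make_intervals` and the `last_bp` parameter are hypothetical, and whether the original list clips the final interval at the chromosome end is not visible in the preview; this sketch does not clip.

```python
def make_intervals(chrom, first_bp, last_bp, width=5_000_000):
    """Build (chrom, start, end, index) rows in the layout of the list above."""
    rows = []
    start = first_bp
    index = 1
    while start <= last_bp:
        end = start + width  # interval end is start plus the fixed width
        rows.append((chrom, start, end, index))
        start = end + 1      # next interval begins one base after this end
        index += 1
    return rows


if __name__ == "__main__":
    # Print rows space-separated, one interval per line.
    for row in make_intervals(1, 10583, 50010592):
        print(*row)
```

Calling `make_intervals(1, 10583, 50010592)` reproduces the ten chromosome 1 rows shown in the preview.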
@johnbowes
johnbowes / align_to_reference.sh
Created February 9, 2016 16:07
Align GWAS dataset to reference panel prior to phasing.
#!/bin/bash
# study data details
STUDY=''
DATA_DIR='prepare_data/dataset/'
DATA=${DATA_DIR}${STUDY}"_chr"
# reference data details
REF_DIR='/mnt/iusers01/jw01/jw01-shared-resources/impute2/ref_panel/ALL_1000G_phase1integrated_v3_impute_macGT1/'