@johnbowes
Created February 9, 2016 16:07
Align GWAS dataset to reference panel prior to phasing.
#!/bin/bash
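# study data details
# STUDY is the filename prefix of the per-chromosome PLINK filesets,
# i.e. input files are expected at ${DATA_DIR}${STUDY}_chr<N>.bed/.bim/.fam
STUDY=''
DATA_DIR='prepare_data/dataset/'
DATA="${DATA_DIR}${STUDY}_chr"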
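# Optional sanity check (a sketch, not part of the original gist): confirm that a
# PLINK binary fileset (.bed/.bim/.fam) exists for a given prefix before the
# long-running steps start. check_plink_prefix is a hypothetical helper name.
check_plink_prefix() {
    for ext in bed bim fam; do
        if [ ! -f "$1.$ext" ]; then
            echo "missing input file: $1.$ext" >&2
            return 1
        fi
    done
}
# Example use (uncomment to enforce):
# for chr in $(seq 1 22); do check_plink_prefix "${DATA}${chr}" || exit 1; done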
# reference data details
REF_DIR='/mnt/iusers01/jw01/jw01-shared-resources/impute2/ref_panel/ALL_1000G_phase1integrated_v3_impute_macGT1/'
# shortcuts to executables
shapeit2="/mnt/iusers01/jw01/jw01-shared-resources/shapeit2/bin/shapeit.v2.r727.linux.x64"
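# Optional check (a sketch, not part of the original gist): fail early if a
# required executable is missing. require_cmd is a hypothetical helper name;
# `command -v` handles both PATH lookups (plink) and full paths ($shapeit2).
require_cmd() {
    command -v "$1" > /dev/null 2>&1 || { echo "executable not found: $1" >&2; return 1; }
}
# Example use (uncomment to enforce):
# require_cmd plink && require_cmd "$shapeit2" || exit 1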
# initial align to reference panel
mkdir -p prepare_data/strand_issues
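# run up to 6 chromosomes in parallel; -I {} already implies one chromosome per command,
# so the redundant -n 1 (which GNU xargs ignores with a warning) is dropped
seq 1 22 | xargs -P 6 -I {} "$shapeit2" -check -B "${DATA}{}" \
    --input-ref "${REF_DIR}ALL_1000G_phase1integrated_v3_chr{}_impute_macGT1.hap.gz" \
                "${REF_DIR}ALL_1000G_phase1integrated_v3_chr{}_impute_macGT1.legend.gz" \
                "${REF_DIR}ALL_1000G_phase1integrated_v3.sample" \
    --output-log "prepare_data/strand_issues/chr{}.alignments" \
    &> prepare_data/strand_issues/alignments.stdout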
# create flip file
grep -h strand prepare_data/strand_issues/chr*.alignments.snp.strand | cut -f 3 > prepare_data/strand_issues/1kg_alignments.flip
# flip data
seq 1 22 | xargs -P 6 -I {} plink --bfile "${DATA}{}" \
    --flip prepare_data/strand_issues/1kg_alignments.flip \
    --make-bed --out "prepare_data/dataset/temp_chr{}" --noweb --silent
# repeat alignment check on flipped dataset
seq 1 22 | xargs -P 6 -I {} "$shapeit2" -check -B "prepare_data/dataset/temp_chr{}" \
    --input-ref "${REF_DIR}ALL_1000G_phase1integrated_v3_chr{}_impute_macGT1.hap.gz" \
                "${REF_DIR}ALL_1000G_phase1integrated_v3_chr{}_impute_macGT1.legend.gz" \
                "${REF_DIR}ALL_1000G_phase1integrated_v3.sample" \
    --output-log "prepare_data/strand_issues/chr{}.flip.alignments" \
    &> prepare_data/strand_issues/flip.alignments.stdout
# create exclusion list based on persistent alignment errors
grep -h strand prepare_data/strand_issues/chr*.flip.alignments.snp.strand | cut -f 3 > prepare_data/strand_issues/1kg_alignments.exclude
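# Optional summary (a sketch, not part of the original gist): report how many
# flagged SNPs were resolved by flipping and how many remain to be excluded.
# Assumes one SNP ID per line in both lists; summarise_alignment is a
# hypothetical helper name.
summarise_alignment() {
    # $1 = flip list, $2 = exclusion list
    echo "flagged: $(sort -u "$1" | wc -l | tr -d ' ')"
    # IDs in the flip list that do not appear in the exclusion list
    echo "resolved by flipping: $(sort -u "$1" | grep -v -x -F -f "$2" | wc -l | tr -d ' ')"
    echo "excluded: $(sort -u "$2" | wc -l | tr -d ' ')"
}
if [ -f prepare_data/strand_issues/1kg_alignments.flip ] && [ -f prepare_data/strand_issues/1kg_alignments.exclude ]; then
    summarise_alignment prepare_data/strand_issues/1kg_alignments.flip prepare_data/strand_issues/1kg_alignments.exclude
fi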
# create an aligned dataset excluding alignment errors
# start again from the original data: flip the flagged SNPs, then drop those still misaligned
seq 1 22 | xargs -P 6 -I {} plink --bfile "${DATA}{}" --chr {} \
    --flip prepare_data/strand_issues/1kg_alignments.flip \
    --exclude prepare_data/strand_issues/1kg_alignments.exclude \
    --make-bed --out "prepare_data/dataset/${STUDY}_refAlign_chr{}" --noweb --silent
# clean up
# remove only the intermediate flipped filesets written by the flip step
rm prepare_data/dataset/temp_chr*