Ryan Schubert RyanSchu

## dbgap.md

      
              1 file
            
          
              1 fork
            
          
              0 comments
            
          
              0 stars
            
          
                RyanSchu
                / dbgap.md
            
            
              Last active
              January 13, 2022 22:34
            
              
                Downloading Data from dbgap
              
          
    Install the aspera cli client required by dbgap

aspera installation manual
download page
wget https://download.asperasoft.com/download/sw/cli/3.7.7/aspera-cli-3.7.7.608.927cce8-linux-64-release.sh

update permissions of the installation executable and execute
chmod a+x ./aspera-cli-3.7.7.608.927cce8-linux-64-release.sh
./aspera-cli-3.7.7.608.927cce8-linux-64-release.sh


## GwasQCExample.md

      
              2 files
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                RyanSchu
                / GwasQCExample.md
            
            
              Last active
              September 21, 2018 20:57
            
              
                GWAS QC PIPELINE EXAMPLE
              
          
    GWAS QC

Each and every gwas is different so there is no perfect standard of how to run GWAS. However many of the steps we must take are predictable. Most commonly, we find ourselves filtering by missingness, removing related individuals and removing snps that are in linkage disequilibrium with each other. Finally, we usually want to run PCA on our population to validate that they have been adequately filtered. This pipeline combines these general steps and more into three coherent scripts to easily perform gwas qc.
Running the Pipeline

The pipeline is broken into three scripts main scripts that performs one of the steps mentioned above. Each has a handful of options and produces multiple output files for the user. I would encourage you to look at the project wiki and get an understanding of what each option does and what files are produced. While the pipeline produces predicatble results, it is largely up to the user to give direction and decide on wh

  
## PCA.md

      
              2 files
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                RyanSchu
                / PCA.md
            
            
              Last active
              September 24, 2018 17:30
            
              
                PCA w/out hapmap
              
          
    These scripts are meant to be run from the command line. Save the eigenplot.R script attached. You can then run it from the command line as such
Rscript eigenplot.R --val Path/to/pca.eigencal --vec Path/to/pca.eigenvec --o /directory/to/output

This will make the pdf file pca_plots in the specified output directory.

  
## Illumina to Lgen.md

      
              3 files
            
          
              0 forks
            
          
              0 comments
            
          
              5 stars
            
          
                RyanSchu
                / Illumina to Lgen.md
            
            
              Last active
              February 8, 2024 01:38
            
              
                Convert FinalReport.txt files to plink lgen format
              
          
    About

This gist details how to convert illumina final report genotype data to an input useable by PLINK. From there the genotype can be quality controlled and the end results exported to a .vcf file or otherwise parsed.
Lgen/PLINK format

lgen from the PLINK documentation
"A text file with no header line, and one line per genotype call (or just not-homozygous-major calls if 'lgen-ref' was invoked) usually with the following five fields:
Family ID

  
## dosage_extract.py
#origianlly written by Brianne Coffey Github@breecoffey
#modified by Ryan Schubert

import argparse
import gzip
import re
from os.path import expanduser
current = expanduser("~")

parser = argparse.ArgumentParser(description='Input & Output Files') #create the argument parser

## How To: Sanger Imputation.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              1 star
            
          
                RyanSchu
                / How To: Sanger Imputation.md
            
            
              Last active
              January 16, 2020 13:17
            
          
    Preimputation steps

Read the Sanger How-to page to make sure your vcf file meets all the requirements for Sanger imputation.
The preimputation steps for UMichigan and Sanger are relatively similar. The main difference is that VCF files are not split by chromosome under sanger imputation.  I recommend carrying out the Michigan preimputation steps anyways. If you have already carried out the preimputation steps for U Mich imputation then the only other requirement will be to merge the chr.vcf files into one vcf file and sort the result.
#merge vcf files from list of files
bcftools concat --file-list ~/preimputation/vcf_list.txt -o merged.vcf
gzip merged.vcf


## aliases.md

      
              2 files
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                RyanSchu
                / aliases.md
            
            
              Last active
              January 7, 2019 16:52
            
          
    About Aliases

aliases are a way to create shortcuts for a given command. While their scope is limited they are useful for saving time on tedious typing. For example, using the bash command less there is the useful flag -S that prevents text from wrapping on the display. This is quite useful for readability but it can be tedious to write less -S every time. Instead what you can do is use alias to create a shortcut for that specific command.
alias less='less -S'

Now whenever you ype less it instead performs less -S by default. However this alias will go away once you leave the session. If you want the alias to be permanent then we will need to work with two hidden files
Making a permanent alias

first lets create a hidden file that will contain all the aliases you want. hidden files follow the syntax .file_name. To create a hidden file do


## pi1_calc.R
library(data.table)
library(dplyr)
library(ggplot2)
library(qvalue)
"%&%" = function(a,b) paste(a,b,sep="")

#List METS and MESA pops
discovery_cohorts<-c("METS_FDR0.05_PC0_PF0.txt.gz","METS_FDR0.05_PC0_PF10.txt.gz","METS_FDR0.05_PC0_PF20.txt.gz","METS_FDR0.05_PC10_PF0.txt.gz","METS_FDR0.05_PC10_PF10.txt.gz","METS_FDR0.05_PC10_PF20.txt.gz","METS_FDR0.05_PC3_PF0.txt.gz","METS_FDR0.05_PC3_PF10.txt.gz","METS_FDR0.05_PC3_PF20.txt.gz")
replication_cohorts<-as.matrix(read.table(file="~/mets_analysis/meqtl/replication_pops/pop_list"))
discovery_dir<-"~/mets_analysis/meqtl/combined_pop/"

## MEQTLvsGEMMA.R
library(dplyr)
library(ggplot2)
library(tidyr)
library(data.table)
METS<-as.data.frame(fread("zcat ~/mets_analysis/meqtl/combined_pop/METS_FDR0.05_PC0_PF0.txt.gz", colClasses = "character"))
colnames(METS)<-c("snps", "gene", "statistic", "pvalue", "FDR", "beta")

gemma<-as.data.frame(fread("zcat ~/mets_analysis/for_gemma/gemma_whole_genome_FDR0.05.txt.gz", header = F, sep = "\t"))
colnames(gemma)<-c("chr", "rs", "ps", "n_miss", "allele1", "allele0", "af", "beta", "se", "l_remle", "l_mle", "p_wald", "p_lrt", "p_score", "gene_inf", "FDR")
info<-separate(gemma, gene_inf, into = c("gene1","gene2",NA,NA,NA,NA), sep = "\\.")

## QQplotHowTo.md

      
              2 files
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                RyanSchu
                / QQplotHowTo.md
            
            
              Last active
              February 19, 2019 21:24
            
              
                QQplot_Tophits.R
              
          
    Here is an rscript to generate a qqplot that comes with five basic flags. It is designed to be run through the command line using these flags as necessary. This script can be used to generate either partial or complete qqplots, with partial qqplots displaying only the top proportion of hits according to user input. The script is designed to take in a variety of types described as follows:

Data with or without a header (Default is with)

The script assumes by default that the input file has a header. If the input file does not have a header, then the user may signal this by supplying the --noheader flag at runtime

Data in .gz format

The script in utilizing the fread function from data.table interprets whether or not the data is in .gz format based on the file name ending with .gz or not

Single or multi column data
	#origianlly written by Brianne Coffey Github@breecoffey
	#modified by Ryan Schubert

	import argparse
	import gzip
	import re
	from os.path import expanduser
	current = expanduser("~")

	parser = argparse.ArgumentParser(description='Input & Output Files') #create the argument parser
	library(data.table)
	library(dplyr)
	library(ggplot2)
	library(qvalue)
	"%&%" = function(a,b) paste(a,b,sep="")

	#List METS and MESA pops
	discovery_cohorts<-c("METS_FDR0.05_PC0_PF0.txt.gz","METS_FDR0.05_PC0_PF10.txt.gz","METS_FDR0.05_PC0_PF20.txt.gz","METS_FDR0.05_PC10_PF0.txt.gz","METS_FDR0.05_PC10_PF10.txt.gz","METS_FDR0.05_PC10_PF20.txt.gz","METS_FDR0.05_PC3_PF0.txt.gz","METS_FDR0.05_PC3_PF10.txt.gz","METS_FDR0.05_PC3_PF20.txt.gz")
	replication_cohorts<-as.matrix(read.table(file="~/mets_analysis/meqtl/replication_pops/pop_list"))
	discovery_dir<-"~/mets_analysis/meqtl/combined_pop/"
	library(dplyr)
	library(ggplot2)
	library(tidyr)
	library(data.table)
	METS<-as.data.frame(fread("zcat ~/mets_analysis/meqtl/combined_pop/METS_FDR0.05_PC0_PF0.txt.gz", colClasses = "character"))
	colnames(METS)<-c("snps", "gene", "statistic", "pvalue", "FDR", "beta")

	gemma<-as.data.frame(fread("zcat ~/mets_analysis/for_gemma/gemma_whole_genome_FDR0.05.txt.gz", header = F, sep = "\t"))
	colnames(gemma)<-c("chr", "rs", "ps", "n_miss", "allele1", "allele0", "af", "beta", "se", "l_remle", "l_mle", "p_wald", "p_lrt", "p_score", "gene_inf", "FDR")
	info<-separate(gemma, gene_inf, into = c("gene1","gene2",NA,NA,NA,NA), sep = "\\.")