Chris Miller chrisamiller

Germline Variant Calling and Filtering

Module objectives

  • Perform single-sample germline variant calling with GATK HaplotypeCaller and the GATK GVCF workflow on exome data
  • Perform joint genotype calling on exome data, including additional exomes from the 1000 Genomes Project
  • Manipulate and filter VCFs to remove artifacts and identify variants of interest

In this module we will use the GATK HaplotypeCaller to call germline variants from "normal" BAMs. For a more in-depth look, see this excellent GATK tutorial, provided by the Broad Institute.

chrisamiller / long_read_alignment.md
Last active November 10, 2023 23:31
Long Read Alignment

Long Read Alignment

Let's start by looking at some output from long-read sequencing on the Oxford Nanopore (ONT) platform. These are sequences from the K562 cell line, prepared with the ONT cDNA sequencing kit (poly-A selected). Off the machine, the data consist of FAST5 or POD5 files, which are compressed representations of the raw signal. These are then run through a basecalling algorithm (such as Dorado) to generate FASTQ files.

The choice of basecalling algorithm and parameters goes pretty deep, so we'll assume that reasonable choices have been made. For simplicity, we've also subset the data to include just small portions of the genome, including a few genes of interest.

Go ahead and pull down this FASTQ file:

wget https://storage.googleapis.com/bfx_workshop_tmp/k562_ont_raw.fastq.gz
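Once the download finishes, a quick sanity check is to count the reads. The sketch below assumes the usual four-line FASTQ record layout (header, sequence, `+`, qualities) with no line-wrapped sequences; `count_fastq_reads` is a hypothetical helper name, not part of any toolkit.

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# Assumes each record occupies exactly four lines (no wrapped sequences).
count_fastq_reads() {
    # Decompress to stdout, then divide the total line count by four.
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}
```

For example, `count_fastq_reads k562_ont_raw.fastq.gz` prints the number of reads in the file downloaded above.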
chrisamiller / somatic_calling.md
Last active November 11, 2023 13:20
Somatic Variant Calling exercise

Somatic Variant calling

Gather your inputs.

Start by gathering some data. Navigate to a somatic folder and pull down a set of input data (human build 38) from this location:

wget https://storage.googleapis.com/bfx_workshop_tmp/inputs.tar.gz

There are three inputs you need to run a workflow:

  1. A .cwl file that contains the steps to be run
  2. A .yaml file that gives the inputs to that CWL
  3. A config file that tells Cromwell about its environment, how to submit jobs to the cluster, and where to stick the results
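Before launching anything, it can help to confirm that all three inputs are actually in place. This is a minimal sketch, not part of the original exercise: `check_workflow_inputs` is a hypothetical helper, and the file names you pass to it are whatever you chose for your own run.

```shell
# Hypothetical pre-flight check: confirm the three workflow inputs exist.
# Arguments: the .cwl file, the .yaml inputs file, and the Cromwell config file.
check_workflow_inputs() {
    missing=0
    for f in "$1" "$2" "$3"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    # Returns 0 when all three files are present, 1 otherwise.
    return "$missing"
}
```

A typical launch, assuming a local `cromwell.jar` and placeholder file names, would then look like `check_workflow_inputs workflow.cwl inputs.yaml cromwell.config && java -Dconfig.file=cromwell.config -jar cromwell.jar run -t cwl -i inputs.yaml workflow.cwl`.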

Let's start with #3 - the config file. I've made this easy for you. Create a directory where you want to run things, then inside of it, run the following command:

/storage1/fs1/timley/Active/aml_ppg/src/utilities/create_cromwell_config -o cromwell.config -l logs -d output -q timley -G compute-timley

## New employee setup
### Tasks
- **Join the mgibio Slack**. Ask any user to invite you using your WUSTL address, or email [c.a.miller@wustl.edu](mailto:c.a.miller@wustl.edu). It is an excellent place to post questions about anything and get answers. Useful channels include #bfx-workshop, #analysis-workflows, #cancergenomics, and #docker.
- **Get compute1 access set up**. This requires a ticket to the [RIS Servicedesk](https://jira.ris.wustl.edu/servicedesk/customer/portal/1) requesting to be added to the appropriate compute and storage groups
- **VPN access**. Connect to msvpn.wusm.wustl.edu through Cisco AnyConnect. Log in with your WUSTL key and submit a request at [https://it.wustl.edu/items/connect/](https://it.wustl.edu/items/connect/)
- **Set up compute1 config files** (_Need a link for this - env variables, etc_)
- **Sign up for the bfx_workshop** get on the [email list](https://outlook.office365.com/owa/bioinformatics@gowustl.onmicrosoft.com/groupsubscription.ashx?action=join&source=MSExchange/LokiServer&guid=
chrisamiller / oct_2020.md
Last active October 6, 2020 14:47
Oct 2020 Hackathon
chrisamiller / lsf_and_docker_tutorial.md
Last active May 31, 2023 02:45
LSF and Docker basics - bfx_workshop

Compute cluster basics

Logging in

Open up a terminal (Terminal/iTerm on Mac, PuTTY or WSL on Windows) and SSH into the cluster, replacing c.a.miller in the command below with your own WUSTL key.

ssh c.a.miller@compute1-client-3.ris.wustl.edu
library(edgeR)
library(gplots)
library(RColorBrewer)
library(tximport)
# takes three arguments - config file, transcript to gene table, and output directory
# config file specifies the samples to import, groupings, and paths to abundance.tsv files from kallisto
# groups should be either 0 or 1
# header: sample \t group \t /path/to/abundance.tsv
#!/bin/bash
#auto update the build time
grep -v "^Packaged:" sciClone/DESCRIPTION >zz
mv -f zz sciClone/DESCRIPTION
thedate=$(date +"%F %r")
echo "Packaged: $thedate; cmiller" >>sciClone/DESCRIPTION
version=$(grep "Version" sciClone/DESCRIPTION | awk '{print $2}')