On your laptop:
```
docker pull ubuntu
```
What is that doing? It contacts Docker Hub (https://hub.docker.com/_/ubuntu) and downloads the image tagged `latest`, because no tag was specified.
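`docker pull ubuntu` names neither a registry nor a tag, so Docker fills in defaults. As a rough illustration, the hypothetical helper `expand_image_ref` below mimics, in simplified form, how a short name like `ubuntu` becomes the fully qualified `docker.io/library/ubuntu:latest` (Docker Hub's official images live under `library/`). It is a sketch of the naming rules, not Docker's own code, and it skips validation and some edge cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a short image reference roughly the way Docker does.
# Simplified sketch: no validation, and some edge cases of the real grammar are ignored.
expand_image_ref() {
  local ref="$1"
  local last="${ref##*/}"   # final path component, e.g. "ubuntu" or "name:tag"

  # No digest ("@sha256:...") and no ":tag" in the last component -> default tag "latest"
  if [[ "$ref" != *@* && "$last" != *:* ]]; then
    ref="${ref}:latest"
  fi

  local first="${ref%%/*}"
  if [[ "$ref" != */* ]]; then
    # Single-component names are Docker Hub "official" images under library/
    ref="docker.io/library/${ref}"
  elif [[ "$first" != *.* && "$first" != *:* && "$first" != localhost ]]; then
    # First component is not a registry host (no "." or ":", not localhost) -> Docker Hub
    ref="docker.io/${ref}"
  fi
  printf '%s\n' "$ref"
}
```

The practical upshot: `docker pull ubuntu` and `docker pull ubuntu:latest` fetch the same image. Because `latest` can point to a different image over time, pinning a tag such as `ubuntu:22.04` makes an analysis easier to reproduce.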
```
docker run ubuntu
```
In this module we will use the GATK HaplotypeCaller to call germline variants from "normal" BAM files. For a more in-depth look, see this excellent GATK tutorial, provided by the Broad Institute.
Let's start by looking at some output of long-read sequencing from the Oxford Nanopore Technologies (ONT) platform. These are sequences from the K562 cell line, prepared with the ONT cDNA sequencing kit (poly-A selected). Off the machine, the data consist of FAST5 or POD5 files, which store a compressed representation of the raw signal. These are then run through a basecalling algorithm (such as Dorado) to generate FASTQ files.
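A common first look at basecalled long reads is the read-length distribution. The hypothetical shell function `read_length_summary` below is a minimal sketch (not part of any ONT tool) that reports the read count, total bases, mean and maximum read length, and N50 (the length L such that reads of length >= L contain at least half of all bases). It assumes standard 4-line FASTQ records and GNU `gzip` and `awk`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: summarize read lengths in a (optionally gzipped) FASTQ file.
# Assumes standard 4-line records (no line-wrapped sequences).
read_length_summary() {
  local fq="$1"
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  gzip -cdf "$fq" \
    | awk 'NR % 4 == 2 { print length($0) }' \
    | sort -rn \
    | awk '
        { len[NR] = $1; total += $1 }   # lengths arrive sorted longest first
        END {
          if (NR == 0) { print "reads=0"; exit }
          half = total / 2; cum = 0
          # N50: walk down from the longest read until half of all bases are covered
          for (i = 1; i <= NR; i++) {
            cum += len[i]
            if (cum >= half) { n50 = len[i]; break }
          }
          printf "reads=%d bases=%d mean=%.1f max=%d N50=%d\n", NR, total, total / NR, len[1], n50
        }'
}
```

For example, `read_length_summary k562_ont_raw.fastq.gz` prints a single `reads=... bases=... mean=... max=... N50=...` line.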
The choice of basecalling algorithm and parameters goes pretty deep, so we'll assume that reasonable choices have been made. For simplicity, we've also subset the data to include just small portions of the genome, including a few genes of interest.
Go ahead and pull down this FASTQ file:

```
wget https://storage.googleapis.com/bfx_workshop_tmp/k562_ont_raw.fastq.gz
```
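Before going further, it can be worth confirming that the download is intact. The hypothetical `check_fastq` function below is a rough sketch, assuming plain 4-line records: it checks that each header starts with `@`, each separator starts with `+`, quality and sequence lengths match, and the line count is a multiple of 4. It reports every problem it finds and returns non-zero if there were any.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a (optionally gzipped) 4-line-per-record FASTQ file.
# Prints every problem found; returns 1 if any were found, 0 otherwise.
check_fastq() {
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  gzip -cdf "$1" | awk '
    BEGIN { bad = 0 }
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "line %d: header does not start with @\n", NR; bad = 1
    }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "line %d: separator does not start with +\n", NR; bad = 1
    }
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen; bad = 1
    }
    END {
      if (NR % 4 != 0) {
        printf "truncated file: %d lines is not a multiple of 4\n", NR; bad = 1
      }
      if (!bad) printf "OK: %d reads\n", NR / 4
      exit bad
    }'
}
```

Running `check_fastq k562_ont_raw.fastq.gz` should print `OK: <n> reads` for a well-formed file.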
There are three inputs you need to run a workflow:
Let's start with #3 - the config file. I've made this easy for you. Create a directory where you want to run things, then inside of it, run the following command:
```
/storage1/fs1/timley/Active/aml_ppg/src/utilities/create_cromwell_config -o cromwell.config -l logs -d output -q timley -G compute-timley
```
## New employee setup
### Tasks
- **Join the mgibio Slack**. Ask any user to invite you using your WUSTL address, or email [c.a.miller@wustl.edu](mailto:c.a.miller@wustl.edu). It's an excellent place to post questions about anything and get answers. Useful channels include #bfx-workshop, #analysis-workflows, #cancergenomics, and #docker.
- **Get compute1 access set up**. This requires a ticket to the [RIS Servicedesk](https://jira.ris.wustl.edu/servicedesk/customer/portal/1) requesting to be added to the appropriate compute and storage groups.
- **VPN access**. Connect to msvpn.wusm.wustl.edu through Cisco AnyConnect. Log in with your WUSTL Key and submit a request at [https://it.wustl.edu/items/connect/](https://it.wustl.edu/items/connect/).
- **Set up compute1 config files** (_Need a link for this - env variables, etc_)
- **Sign up for the bfx_workshop**. Get on the [email list](https://outlook.office365.com/owa/bioinformatics@gowustl.onmicrosoft.com/groupsubscription.ashx?action=join&source=MSExchange/LokiServer&guid=)
library(edgeR)
library(gplots)
library(RColorBrewer)
library(tximport)
# takes three arguments - config file, transcript to gene table, and output directory
# config file specifies the samples to import, groupings, and paths to abundance.tsv files from kallisto
# groups should be either 0 or 1
# header: sample \t group \t /path/to/abundance.tsv