@lmtani
Last active December 30, 2020 20:25
FunGAP docker image

FunGAP in Docker Container

This gist explains how to run the FunGAP pipeline inside a Docker container.

Requirements:

  • Docker
  • 16 GB of available disk space
  • A GeneMark-ES/ET release and its key (gm_et_linux_64.tar.gz and gm_key_64.gz)
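
The disk-space requirement can be checked before starting. The sketch below is only an illustration: has_free_space is a hypothetical helper, not part of FunGAP or Docker, and it inspects whichever directory you pass it. Docker usually keeps images in its own data directory (/var/lib/docker by default on Linux), so pointing the check at that directory is an assumption you may need to adjust.

```shell
#!/bin/sh
# has_free_space DIR MIN_GB
# Hypothetical helper (not part of FunGAP or Docker): succeeds when the
# filesystem that holds DIR has at least MIN_GB gigabytes available.
has_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example: check the current directory against the 16 GB requirement.
if has_free_space . 16; then
    echo "at least 16 GB available"
else
    echo "less than 16 GB available" >&2
fi
```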

Steps

Build FunGAP docker image

Be sure you have the following files in the working directory:

Dockerfile fungap.conf gm_et_linux_64.tar.gz gm_key_64.gz

GeneMark is not free for everyone, so you need to register to obtain the gm_* files. If it were not for that, I could have pushed a ready-to-use FunGAP Docker image to Docker Hub.

# 1. Download the Dockerfile and fungap.conf from this gist to an empty directory
mkdir fungap
cd fungap
wget https://gist.githubusercontent.com/lmtani/d37343a40e143b59336e4606055d1723/raw/Dockerfile
wget https://gist.githubusercontent.com/lmtani/d37343a40e143b59336e4606055d1723/raw/fungap.conf

# 2. Download gm_et_linux_64.tar.gz and gm_key_64.gz and put them in the same directory
# 3. Build the image
docker build -t fungap .
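
Because the build copies the GeneMark archives from the working directory, a missing file only surfaces partway through a long build. A minimal sketch of a pre-build check follows; check_build_context is a hypothetical helper name, and the file list is simply the four files named above.

```shell
#!/bin/sh
# check_build_context [DIR]
# Hypothetical helper (not part of FunGAP): prints every required file
# that is missing or empty in DIR and returns non-zero if any is absent.
check_build_context() {
    dir=${1:-.}
    missing=0
    for f in Dockerfile fungap.conf gm_et_linux_64.tar.gz gm_key_64.gz; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: build only when every required file is in place.
# check_build_context . && docker build -t fungap .
```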

Enter the Docker container and run the FunGAP pipeline

  1. Go to the directory that contains your RNA-seq reads and genome FASTA file.

  2. Start a container from the fungap image:

    docker run -it -w /fungap_workspace --rm -v $(pwd):/fungap_workspace fungap bash
  3. In /fungap_workspace, use the helper script to get the Augustus species.

    python /workspace/FunGAP/get_augustus_species.py \
      --genus_name "Saccharomyces" \
      --email_address byoungnammin@lbl.gov
  4. Make protein database

    python /workspace/FunGAP/download_sister_orgs.py \
      --taxon "Saccharomyces" \
      --email_address byoungnammin@lbl.gov
    zcat sister_orgs/*faa.gz > prot_db.faa
  5. Run FunGAP

    python /workspace/FunGAP/fungap.py \
      --output_dir fungap_out \
      --trans_read_1 SRR1198667_sampled_1.fastq \
      --trans_read_2 SRR1198667_sampled_2.fastq \
      --genome_assembly GCF_000146045.2_R64_genomic.fna  \
      --augustus_species saccharomyces_cerevisiae_S288C  \
      --sister_proteome prot_db.faa  \
      --num_cores 8

Now you can exit the Docker container. Your current working directory was mounted inside the FunGAP container (at /fungap_workspace), so all output files are available on your host system.
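
A FunGAP run can take a long time, so it may be worth sanity-checking the inputs before step 5. The helpers below are a sketch under stated assumptions: the function names are hypothetical (not part of FunGAP), the FASTQ files are assumed to be uncompressed with four lines per read, and a sequence count is only a rough signal that the protein database was built.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking FunGAP inputs; not part of FunGAP.

# Print the number of sequences in a FASTA file (lines starting with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the number of reads in an uncompressed FASTQ file,
# assuming the usual four lines per read.
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Succeed when both mate files hold the same number of reads.
reads_are_paired() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}

# Usage, with the file names from the steps above:
# count_fasta_records prot_db.faa
# reads_are_paired SRR1198667_sampled_1.fastq SRR1198667_sampled_2.fastq
```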

Dockerfile

FROM continuumio/miniconda:4.6.14
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y build-essential postgresql postgresql-contrib
RUN conda config --add channels bioconda/label/cf201901 \
&& conda config --add channels conda-forge/label/cf201901 \
&& conda install augustus rmblast maker hisat2 braker busco=3.0.2 blast pfam_scan \
&& pip install biopython bcbio-gff networkx markdown2 matplotlib \
&& cpanm Hash::Merge Logger::Simple Parallel::ForkManager YAML
ENV FUNGAP_DIR=/workspace/FunGAP
WORKDIR /workspace
RUN git clone https://github.com/CompSynBioLab-KoreaUniv/FunGAP.git \
&& cd FunGAP/ \
&& mkdir -p db/pfam \
&& cd db/pfam \
&& wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz \
&& wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz \
&& gunzip Pfam-A.hmm.gz \
&& gunzip Pfam-A.hmm.dat.gz \
&& hmmpress Pfam-A.hmm
# BuscoDB download
RUN cd $FUNGAP_DIR \
&& mkdir -p db/busco \
&& cd db/busco \
&& wget https://busco-archive.ezlab.org/v3/datasets/fungi_odb9.tar.gz \
&& wget https://busco-archive.ezlab.org/v3/datasets/ascomycota_odb9.tar.gz \
&& wget https://busco-archive.ezlab.org/v3/datasets/basidiomycota_odb9.tar.gz \
&& tar -zxvf fungi_odb9.tar.gz \
&& tar -zxvf ascomycota_odb9.tar.gz \
&& tar -zxvf basidiomycota_odb9.tar.gz
# Install GeneMark
COPY gm_et_linux_64.tar.gz .
COPY gm_key_64.gz .
RUN mkdir $FUNGAP_DIR/external/ \
&& mv gm_et_linux_64.tar.gz gm_key_64.gz $FUNGAP_DIR/external/ \
&& cd $FUNGAP_DIR/external/ \
&& tar -zxvf gm_et_linux_64.tar.gz \
&& gunzip gm_key_64.gz \
&& cp gm_key_64 ~/.gm_key \
&& cd $FUNGAP_DIR/external/gm_et_linux_64/ \
&& cp other/reformat_fasta.pl . \
&& perl change_path_in_perl_scripts.pl "/usr/bin/env perl"
# Install RECON
RUN cd $FUNGAP_DIR/external/ \
&& wget http://www.repeatmasker.org/RepeatModeler/RECON-1.08.tar.gz \
&& tar -zxvf RECON-1.08.tar.gz \
&& cd RECON-1.08/src/ \
&& make \
&& make install
# Install RepeatScout 1.0.5
RUN cd $FUNGAP_DIR/external/ \
&& wget http://www.repeatmasker.org/RepeatScout-1.0.5.tar.gz \
&& tar -zxvf RepeatScout-1.0.5.tar.gz \
&& cd RepeatScout-1 \
&& make
# Install NSEG
RUN cd $FUNGAP_DIR/external/ \
&& mkdir nseg \
&& cd nseg \
&& wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/genwin.c \
&& wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/genwin.h \
&& wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/lnfac.h \
&& wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/makefile \
&& wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/nmerge.c \
&& wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/nseg.c \
&& wget ftp://ftp.ncbi.nih.gov/pub/seg/nseg/runnseg \
&& make
ENV TERM xterm
# Install RepeatMasker 4.0.8
RUN cd $FUNGAP_DIR/external/ \
&& wget http://www.repeatmasker.org/RepeatMasker-open-4-0-8.tar.gz \
&& tar -zxvf RepeatMasker-open-4-0-8.tar.gz
RUN cd $FUNGAP_DIR/external/RepeatMasker \
&& echo -e "\n/opt/conda/bin/perl\n$FUNGAP_DIR/external/RepeatMasker\n/opt/conda/bin/trf\n2\n/opt/conda/bin\nY\n5\n" > tmp \
&& perl ./configure < tmp
# Install RepeatModeler 1.0.11
RUN cd $FUNGAP_DIR/external/ \
&& wget http://www.repeatmasker.org/RepeatModeler/RepeatModeler-open-1.0.11.tar.gz \
&& tar -zxvf RepeatModeler-open-1.0.11.tar.gz \
&& cd RepeatModeler-open-1.0.11 \
&& echo -e "\n/opt/conda/bin/perl\n$FUNGAP_DIR/external/RepeatModeler-open-1.0.11\n$FUNGAP_DIR/external/RepeatMasker\n$FUNGAP_DIR/external/RECON-1.08/bin\n$FUNGAP_DIR/external/RepeatScout-1\n$FUNGAP_DIR/external/nseg\n/opt/conda/bin\n1\n/opt/conda/bin\nY\n3\n" > tmp \
&& perl ./configure < tmp \
&& cd ..
# Add fungap.conf
ADD https://gist.githubusercontent.com/lmtani/d37343a40e143b59336e4606055d1723/raw/fungap.conf \
$FUNGAP_DIR/
##########
## Trinity
ENV TRINITY_VERSION="2.8.5"
ENV TRINITY_CO="d35f3c1149bab077ca7c83f209627784469c41c6"
RUN apt-get update && apt-get install -y cmake build-essential gcc g++ bowtie2 jellyfish default-jre curl libdb-dev zlib1g-dev bzip2 libncurses5-dev \
&& cd $FUNGAP_DIR/external \
&& git clone https://github.com/trinityrnaseq/trinityrnaseq.git \
&& cd trinityrnaseq \
&& git checkout $TRINITY_CO \
&& make && make plugins
## Jellyfish
RUN cd $FUNGAP_DIR/external \
&& wget https://github.com/gmarcais/Jellyfish/releases/download/v2.2.7/jellyfish-2.2.7.tar.gz \
&& tar xvf jellyfish-2.2.7.tar.gz \
&& cd jellyfish-2.2.7/ \
&& ./configure \
&& make
## Salmon
RUN cd $FUNGAP_DIR/external \
&& wget https://github.com/COMBINE-lab/salmon/releases/download/v0.9.1/Salmon-0.9.1_linux_x86_64.tar.gz \
&& tar xvf Salmon-0.9.1_linux_x86_64.tar.gz
ENV PATH=${PATH}:$FUNGAP_DIR/external/trinityrnaseq:$FUNGAP_DIR/external/Salmon-latest_linux_x86_64/bin/:$FUNGAP_DIR/external/jellyfish-2.2.7/bin/
# Need to enter the container and configure RepeatMasker and RepeatModeler manually.
#python /workspace/FunGAP/fungap.py \
# --output_dir fungap_out \
# --trans_read_1 sscita_1.fastq \
# --trans_read_2 sscita_2.fastq \
# --genome_assembly genome/pilon.fasta \
# --augustus_species ustilago_maydis \
# --sister_proteome sister_prot/prot_db.faa \
# --num_cores 10

fungap.conf

PFAM_DB_PATH=/workspace/FunGAP/db/pfam
BUSCO_DB_PATH=/workspace/FunGAP/db/busco/basidiomycota_odb9
GENEMARK_PATH=/workspace/FunGAP/external/gm_et_linux_64/gmes_petap.pl
GMHMME3_PATH=/workspace/FunGAP/external/gm_et_linux_64/gmhmme3
PROBUILD_PATH=/workspace/FunGAP/external/gm_et_linux_64/probuild
BUILDDATABASE_PATH=/workspace/FunGAP/external/RepeatModeler-open-1.0.11/BuildDatabase
REPEATMODELER_PATH=/workspace/FunGAP/external/RepeatModeler-open-1.0.11/RepeatModeler
HISAT2_PATH=/opt/conda/bin/hisat2
TRINITY_PATH=/workspace/FunGAP/external/trinityrnaseq/Trinity
MAKER_PATH=/opt/conda/bin/maker
GFF3_MERGE_PATH=/opt/conda/bin/gff3_merge
FASTA_MERGE_PATH=/opt/conda/bin/fasta_merge
MAKER2ZFF_PATH=/opt/conda/bin/maker2zff
FATHOM_PATH=/opt/conda/bin/fathom
FORGE_PATH=/opt/conda/bin/forge
HMM_ASSEMBLER_PATH=/opt/conda/bin/hmm-assembler.pl
BRAKER1_PATH=/opt/conda/bin/braker.pl
BUSCO_PATH=/opt/conda/bin/run_busco
PFAM_SCAN_PATH=/opt/conda/bin/pfam_scan.pl
BLASTP_PATH=/opt/conda/bin/blastp
BLASTN_PATH=/opt/conda/bin/blastn
BLASTX_PATH=/opt/conda/bin/blastx
MAKEBLASTDB_PATH=/opt/conda/bin/makeblastdb
SAMTOOLS_PATH=/opt/conda/bin/samtools
BAMTOOLS_PATH=/opt/conda/bin/bamtools
AUGUSTUS_PATH=/opt/conda/bin/augustus
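
Every entry in fungap.conf is an absolute path inside the image, so a tool that moved or failed to install shows up as a dangling path. The sketch below checks for that; check_conf_paths is a hypothetical helper, not part of FunGAP, and it assumes the file holds only KEY=VALUE lines, blank lines, and # comments.

```shell
#!/bin/sh
# check_conf_paths CONF
# Hypothetical helper (not part of FunGAP): reads KEY=VALUE lines and
# reports every value that does not exist on disk. Returns non-zero if
# at least one path is missing.
check_conf_paths() {
    conf=$1
    status=0
    # The "|| [ -n "$key" ]" part keeps a final line that lacks a newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Skip blank lines and comments.
        case $key in
            ''|'#'*) continue ;;
        esac
        if [ ! -e "$value" ]; then
            echo "$key points to a missing path: $value" >&2
            status=1
        fi
    done < "$conf"
    return "$status"
}

# Usage, inside the container:
# check_conf_paths /workspace/FunGAP/fungap.conf
```
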
@rwmurdoch

rwmurdoch commented Dec 30, 2020

Current Biopython version (1.78) is not compatible with Python 2.7. The Dockerfile needs to specify "biopython==1.76" (pip syntax) for this build to work. Thanks for all of your efforts :)

Edit: BUSCO version 3.0.2 does not accept the "--list-datasets" command, thus the initial "check_inputs.py" script terminates the pipeline. Maybe lock all programs to whatever versions were used during initial FunGAP development?

@lmtani
Author

lmtani commented Dec 30, 2020

Hi! This gist is out of date, sorry about that. Could you try following these instructions?

https://github.com/CompSynBioLab-KoreaUniv/FunGAP/tree/master/docker

Please let me know if you have more problems. A while ago I prepared this video, and today I uploaded it to YouTube.

https://youtu.be/naWbozG_6b4

@rwmurdoch

Thanks so much for redirecting me. I'm glad this project is being maintained and that you guys are on top of things! The new instructions led to a successful build and my test set is running already. Not upset, I learned a lot during my patchwork repair attempts.