NOTE: These instructions are for working from the dev branch of KwanLab/Autometa
- Install Autometa environment and commands
- Configure nextflow so Autometa commands can be run through your scheduler
- Configure run parameters (Set metagenome filepath and output directories)
- Run autometa pipeline using nextflow
cd $HOME
git clone --branch dev https://github.com/KwanLab/Autometa
cd Autometa
# NOTE: For a list of all available make options just type `make` with no arguments
# Build Autometa image (requires docker)
# This will create the docker image --> jason-c-kwan/autometa:dev
make image
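After the build finishes, you can sanity-check that the image is present locally; a minimal sketch (the tag jason-c-kwan/autometa:dev matches the one noted above):

```shell
# Check whether the jason-c-kwan/autometa:dev image exists locally;
# prints one status line either way
if docker image inspect jason-c-kwan/autometa:dev >/dev/null 2>&1; then
    status="found"
else
    status="missing"
fi
echo "autometa:dev image ${status}"
```

If the image is missing, re-run `make image` and check the docker build output for errors.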
cd $HOME
git clone --branch dev https://github.com/KwanLab/Autometa
cd Autometa
# NOTE: For a list of all available make options just type `make` with no arguments
# Create the Autometa conda environment (named autometa)
make create_environment
# Activate the Autometa conda environment
conda activate autometa
# Install the Autometa commands within the environment
make install
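A quick way to confirm the install landed on your PATH is to probe for an entry point; a sketch using `autometa-config`, which the next step relies on:

```shell
# Report whether the autometa-config entry point is on PATH
for cmd in autometa-config; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "ok: $cmd"
    else
        echo "not found: $cmd (is the autometa conda env active?)"
    fi
done
```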
# hmmpress markers for single-copy marker gene guided binning
DB_DIR="$HOME/Autometa/autometa/databases"
hmmpress -f "${DB_DIR}/markers/bacteria.single_copy.hmm" \
&& hmmpress -f "${DB_DIR}/markers/archaea.single_copy.hmm" \
&& autometa-config --section databases --option base --value "${DB_DIR}" \
&& echo "databases base directory set to ${DB_DIR}/"
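hmmpress writes four binary index files (.h3f, .h3i, .h3m, .h3p) next to each .hmm file; this sketch lists the paths you should see after the commands above succeed:

```shell
# List the index files hmmpress is expected to produce for each marker HMM
DB_DIR="${HOME}/Autometa/autometa/databases"
expected=0
for hmm in bacteria.single_copy.hmm archaea.single_copy.hmm; do
    for ext in h3f h3i h3m h3p; do
        echo "${DB_DIR}/markers/${hmm}.${ext}"
        expected=$((expected + 1))
    done
done
echo "expecting ${expected} index files"
```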
NOTE: After `make install` you will have access to all of the autometa commands. For more information on these commands, see the step-by-step tutorial in the documentation.
For nextflow to run the Autometa pipeline through a job scheduler (e.g. SLURM), you will need to update the respective 'profile' section in nextflow's config file. Each 'profile' may be configured with any available scheduler, as noted in the nextflow executors docs. By default, nextflow uses your local computer as the 'executor'. The next section briefly walks through configuring the executor to run with the SLURM job scheduler.
NOTE: Run sinfo to see which slurm partitions are available on your cluster.
You will then need to edit the slurm profile in $HOME/Autometa/nextflow.config accordingly:
// Find this section of code in nextflow.config
}
slurm {
process.executor = "slurm"
// queue is the slurm partition to use
process.queue = "queue" // <<-- change this to the name of your partition
// See https://www.nextflow.io/docs/latest/executor.html#slurm for more details.
}
Additional parameters available for the slurm executor are listed in the nextflow executor docs for slurm.
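For example, an extended slurm profile might look like the sketch below; the queue name and clusterOptions value are placeholders to adapt to your cluster (see the Nextflow executor docs for the full list of options):

```groovy
// Sketch of an extended slurm profile; values are placeholders
slurm {
    process.executor = "slurm"
    process.queue = "queue"               // your partition name
    process.clusterOptions = "--time=2-0" // extra sbatch flags passed through
    executor.queueSize = 50               // max jobs queued at once
}
```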
You can use/alter the default template parameters config file here: $HOME/Autometa/nextflow/parameters.config
NOTE: Data inputs must be wrapped in 'single quotes' or "double quotes"
data="$HOME/autometa_results"
mkdir -p "${data}/raw"
cp path/to/your/final.contigs.fa "${data}/raw/."
params.metagenome = "$HOME/autometa_results/raw/final.contigs.fa" // <<-- Path to your metagenome
params.interim = "$HOME/autometa_results/interim" // <<-- Path to where you want interim results stored - This will make a directory to store intermediate results
params.processed = "$HOME/autometa_results/processed" //<<-- Path to where you want final results stored - This will make a directory to store final results
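Taken together, those three paths map onto a simple layout; a sketch that creates it up front (the pipeline will also create interim/ and processed/ itself):

```shell
# Create the result directory layout referenced by the params above
data="${HOME}/autometa_results"
mkdir -p "${data}/raw" "${data}/interim" "${data}/processed"
find "${data}" -mindepth 1 -maxdepth 1 -type d | sort
```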
Database directory path must contain the following:

- Diamond formatted nr file => nr.dmnd

  Perform the following:

  # Download nr.gz
  wget -O $HOME/Autometa/autometa/databases/ncbi/nr.gz ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
  # Set the number of threads you have available:
  num_threads=4
  # Format with diamond
  diamond makedb --in $HOME/Autometa/autometa/databases/ncbi/nr.gz --db $HOME/Autometa/autometa/databases/ncbi/nr -p $num_threads

- Extracted files from tarball taxdump.tar.gz

  wget -O $HOME/Autometa/autometa/databases/ncbi/taxdump.tar.gz ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
  cd $HOME/Autometa/autometa/databases/ncbi/
  tar -xvzf taxdump.tar.gz
  cd -

- prot.accession2taxid.gz

  wget -O $HOME/Autometa/autometa/databases/ncbi/prot.accession2taxid.gz ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
params.ncbi_database = "$HOME/Autometa/autometa/databases/ncbi" // <<-- Update this path to folder with all NCBI databases (You will NOT need to update this if you followed the downloads from above)
You may also find the links to the above database files in the Autometa databases documentation
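The three downloads above can also be collected into one loop; a sketch, left as a dry run so nothing is fetched until you set run=true (note that nr.gz is very large):

```shell
# Dry-run loop over the three NCBI database downloads listed above
NCBI="${HOME}/Autometa/autometa/databases/ncbi"
run=false
planned=0
for url in \
    ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz \
    ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz \
    ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
do
    dest="${NCBI}/$(basename "$url")"
    planned=$((planned + 1))
    if [ "$run" = true ]; then
        wget -O "$dest" "$url"
    else
        echo "would fetch: $url -> $dest"
    fi
done
echo "planned ${planned} downloads"
```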
params.cpus = 2 // <<-- Number of CPUs each job uses
// Metagenome Length filtering
params.length_cutoff = 3000 // <<-- Smallest contig you want binned (3000 is default)
// Kmer counting/normalization/embedding
params.kmer_size = 5
params.kmer_norm_method = "am_clr" // choices: "am_clr" (default), "clr", "ilr"
params.kmer_pca_dimensions = 50
params.kmer_embed_method = "bhsne" // choices: "sksne", "bhsne" (default), "umap"
params.kmer_embed_dimensions = 2
// Binning parameters
params.kingdom = "bacteria" // choices: "bacteria", "archaea"
params.classification_kmer_pca_dimensions = 50
params.clustering_method = "dbscan" // choices: "dbscan", "hdbscan"
params.binning_starting_rank = "superkingdom" // choices: "superkingdom", "phylum", "class", "order", "family", "genus", "species"
params.classification_method = "decision_tree" // choices: "decision_tree", "random_forest"
params.completeness = 20.0 // Will keep clusters over 20% complete
params.purity = 95.0 // Will keep clusters over 95% pure
params.cov_stddev_limit = 25.0 // Will keep clusters with coverage std. dev. under 25%
params.gc_stddev_limit = 5.0 // Will keep clusters with GC% std. dev. under 5%
NOTE: The first command below assumes you are running from within the Autometa directory, where nextflow automatically finds the nextflow.config file, so the executor configuration is available by default. If you would like to run the workflow from outside the Autometa directory, you will need to supply an additional configuration argument (-c) pointing nextflow at your executor configuration, as in the second command.
# Note: comments cannot be interleaved between backslash-continued lines,
# so the options are explained here instead:
#   main.nf  -- main logic of the autometa workflow
#   -profile -- only needed if you have configured SLURM or another executor profile
#   -c       -- parameters configuration
#   -w       -- working directory where nextflow intermediate/tmp dirs/files will be written
nextflow run $HOME/Autometa/main.nf \
    -profile slurm \
    -c $HOME/Autometa/nextflow/parameters.config \
    -w $HOME/autometa_results/work
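If you want the head nextflow process itself to run under slurm rather than on a login node, one approach is to wrap the command above in a batch script; a sketch (the partition name and time limit are placeholders for your cluster):

```shell
# Write a minimal sbatch wrapper around the nextflow command above;
# adjust the partition and time limit for your cluster
cat > run_autometa.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --partition=queue
#SBATCH --time=2-0
nextflow run $HOME/Autometa/main.nf \
    -profile slurm \
    -c $HOME/Autometa/nextflow/parameters.config \
    -w $HOME/autometa_results/work
EOF
echo "submit with: sbatch run_autometa.sh"
```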
# When running from outside the Autometa directory, also pass the executor
# configuration (-c can be given more than once; configs are merged).
# Available profiles, mentioned above, are slurm, chtc and standard (default).
nextflow run $HOME/Autometa/main.nf \
    -c $HOME/Autometa/nextflow.config \
    -c </path/to/your/parameters.config> \
    -w </path/to/nextflow/work/directory> \
    -profile <profile to use>