@DrYak
Created November 3, 2023 09:23
NOTES running MeSS with LIMBO

LIMBO

configuration, config/config.yaml:

num_samples: 60
max_genotypes: 2
genome_path: genomes

(see workflow/schema/config.schema.json for details)
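The working directory layout implied by these settings can be prepared in one step. The following is a minimal sketch; `setup_limbo_workdir` is a hypothetical helper name, not part of LIMBO, and it only writes the three keys listed above.

```shell
# setup_limbo_workdir (hypothetical helper, not part of LIMBO): create the
# config/ and genomes/ directories and write the three settings from these
# notes to config/config.yaml inside the given working directory.
setup_limbo_workdir() {
    local workdir="${1:-.}"
    mkdir -p "$workdir/config" "$workdir/genomes"
    cat > "$workdir/config/config.yaml" <<'EOF'
num_samples: 60
max_genotypes: 2
genome_path: genomes
EOF
}
```

Calling `setup_limbo_workdir working` from next to the git clone would create `working/config/config.yaml` and an empty `working/genomes/` ready to receive the split FASTA files.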

genome path genomes/:

  • splitting fasta files
    mkdir -p genomes; while IFS= read -r l; do if [[ $l =~ ^\>([^[:space:]]+) ]]; then F="genomes/${BASH_REMATCH[1]}.fasta"; rm -f "${F}"; echo "$F"; fi; printf '%s\n' "$l" >> "${F}"; done < bacteria.fasta
  • results:
    > ls -l genomes/
    total 6156
    -rw-r--r-- 64 dryak users 2344128 Nov  2 18:04 NZ_CP031133.1.fasta
    -rw-r--r-- 81 dryak users 3913384 Nov  2 18:04 NZ_CP102358.1.fasta
    -rw-r--r-- 58 dryak users     698 Nov  2 18:04 NZ_JABAHH010000080.1.fasta
    -rw-r--r-- 55 dryak users   33207 Nov  2 18:04 NZ_KI391983.1.fasta
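The splitting one-liner can be hard to read and is slow on large files because it reopens the output for every line. Here is a sketch of the same idea as a reusable function built on awk; `split_fasta` is a hypothetical name, and as in the one-liner the file name is taken from the header text up to the first whitespace.

```shell
# split_fasta (hypothetical helper): write each record of a multi-FASTA file
# to <outdir>/<id>.fasta, where <id> is the header text between ">" and the
# first whitespace. Prints each output path as it is created.
split_fasta() {
    local input="$1" outdir="${2:-genomes}"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>[^[:space:]]/ {
            if (out) close(out)      # keep only one output file open at a time
            out = dir "/" substr($1, 2) ".fasta"
            print out                # report the file being written
        }
        out { print > out }          # first write truncates, later ones append
    ' "$input"
}
```

Usage for the case above would be `split_fasta bacteria.fasta genomes`. Lines that appear before the first header are dropped rather than written to an unnamed file.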

run Snakemake (assuming we are in a working/ subdirectory next to the git clone):

snakemake --snakefile ../workflow/Snakefile --configfile config/config.yaml --use-conda --cores 1 all_mess

(dependencies are automatically handled by snakemake)

MeSS

per-sample configuration file, mess_config.yml:

input_table_path: mess_sample1.tsv
sd_read_num: 0
sd_rep: 0
replicates: 1
community_name: sample1
seq_tech: illumina
read_status: paired
illumina_sequencing_system: HS20
illumina_read_len: 100
illumina_mean_frag_len: 200
illumina_sd_frag_len: 20
set_seed: 20
NCBI_key: your_ncbi_key
NCBI_email: your_ncbi_email
complete_assemblies: True
reference_assemblies: False
representative_assemblies: False
exclude_from_metagenomes: True
Genbank_assemblies: True
Refseq_assemblies: True
Rank_to_filter_by: False
seed: 1
bam: False
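The configuration refers to a per-sample table through `input_table_path`, so a quick preflight check that the table exists can save a failed run. The sketch below assumes the flat `key: value` layout shown above; `yaml_value` and `check_mess_input` are hypothetical helpers, not part of MeSS, and this is not a general YAML parser.

```shell
# yaml_value (hypothetical helper): print the value of a top-level
# "key: value" line from a flat YAML file such as mess_config.yml.
yaml_value() {
    awk -F': *' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# check_mess_input (hypothetical helper): succeed only if the file named by
# input_table_path exists. Relative paths are resolved against the current
# directory, which is an assumption about where the workflow is launched.
check_mess_input() {
    local table
    table="$(yaml_value "$1" input_table_path)"
    [ -n "$table" ] && [ -f "$table" ]
}
```

For example, `check_mess_input mess_config.yml || echo "input table not found"` could be run from the directory holding the configuration before launching the workflow.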

install the dependencies and run Snakemake (assuming that there is a git clone in MeSS; alternatively, one could use the mess run wrapper):

# install as per README
mamba create -n mess mess
mamba activate mess
# missing dependencies, as per MeSS' messenv.yml
mamba install art
mamba install seqkit
mamba install biopython
snakemake --snakefile ../../MeSS/mess/scripts/Snakefile --configfile mess_config.yml --use-conda --resources ncbi_requests=3 nb_simulation=2 parallel_cat=2 --cores 4 all_sim

Results will be in `simreads/sample1-1_R1.fq.gz` etc.
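A quick sanity check on the simulated reads is to count the records in each output file. This is a minimal sketch, assuming standard four-line FASTQ records; `count_reads` is a hypothetical helper name.

```shell
# count_reads (hypothetical helper): print the number of records in a
# gzip-compressed FASTQ file, assuming exactly four lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}
```

For instance, `for f in simreads/*.fq.gz; do echo "$f $(count_reads "$f")"; done` lists each file with its read count; for paired-end output the R1 and R2 counts should match. Since seqkit is installed in the environment above, `seqkit stats simreads/*.fq.gz` gives a more detailed summary.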
