@dgcamblor
Created May 27, 2024 13:28
This function automates FASTQ downsampling with seqtk and can be easily parallelized with GNU parallel.
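#!/bin/bash

# Downsample a gzipped FASTQ file with seqtk; the output is written to the current directory.
#   $1: input FASTQ in .gz compression (e.g. ID_1.fastq.gz)
#   $2: downsample factor passed to seqtk sample (e.g. 0.01)
downsample_fastq() {
    local input_fastq=$1
    local downsample_factor=$2
    # Declare and assign separately so a basename failure is not masked by "local"
    local base_name
    base_name=$(basename "$input_fastq" .fastq.gz)
    local prefix="${base_name%%_*}"  # Text before the first underscore (the ID)
    local suffix="${base_name#*_}"   # Text after the first underscore
    local output_fastq="${prefix}s_${suffix}.fastq.gz"  # Add an "s" to the ID

    # The fixed seed (-s12) makes the sampling reproducible and keeps paired files in sync
    seqtk sample -s12 "$input_fastq" "$downsample_factor" | pigz > "$output_fastq"
    echo "Output file: $output_fastq"
}
export -f downsample_fastq
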
# Parallelize the function using GNU parallel
# parallel downsample_fastq ::: ID_1.fastq.gz ID_2.fastq.gz ::: 0.01
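
The output name comes from two Bash parameter expansions on the base name. The sketch below isolates that step so it can be checked without seqtk or pigz installed. `derive_output_name` is a hypothetical helper, not part of the gist, and the second file path is an invented example.

```shell
#!/bin/bash
# Hypothetical helper (not part of the gist): reproduces only the
# output-naming step of downsample_fastq, without running seqtk.
derive_output_name() {
    local input_fastq=$1
    local base_name
    base_name=$(basename "$input_fastq" .fastq.gz)
    local prefix="${base_name%%_*}"  # Text before the first underscore
    local suffix="${base_name#*_}"   # Text after the first underscore
    echo "${prefix}s_${suffix}.fastq.gz"
}

derive_output_name ID_1.fastq.gz                        # prints IDs_1.fastq.gz
derive_output_name /data/run1/SAMPLE_L001_R1.fastq.gz   # prints SAMPLEs_L001_R1.fastq.gz
```

Since `basename` strips the directory, the downsampled file always lands in the current working directory. The naming scheme also assumes the base name contains at least one underscore: without one, both expansions return the whole base name and the output name repeats it.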