@jfear
Last active March 17, 2023 00:12
Example using loops and regular expressions
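Both loops in the script rely on bash's `[[ string =~ regex ]]` test, which stores the capture groups in the `BASH_REMATCH` array. Here is a minimal sketch of just that step, assuming bash 3.2 or later. The helper names `parse_lane_file` and `r2_for_r1` are hypothetical. The first pattern is a tightened variant of the script's first pattern: the dots are escaped, the `S` number may have more than one digit, and the match is anchored with `^...$`.

```shell
#!/bin/bash
# Hypothetical helpers illustrating [[ =~ ]] and BASH_REMATCH on the
# file names used in the script's comments.
pattern='^([[:digit:]]+)_(Time_[[:digit:]]+)_(S[[:digit:]]+)_(L[[:digit:]]{3})_(R[12])_001\.fastq\.gz$'

# Print the merged (lane-free) name for a lane-level FASTQ name.
parse_lane_file() {
    local file_name="$1"
    if [[ "$file_name" =~ $pattern ]]; then
        # Groups: 1=sample, 2=time point, 3=sample number, 4=lane, 5=read
        echo "${BASH_REMATCH[1]}_${BASH_REMATCH[2]}_${BASH_REMATCH[5]}.fastq.gz"
    else
        return 1   # not a lane-level FASTQ name
    fi
}

# Derive the R2 name from an R1 name, as the second loop does.
r2_for_r1() {
    local pattern2='^(.*)_R1\.fastq\.gz$'
    [[ "$1" =~ $pattern2 ]] || return 1
    echo "${BASH_REMATCH[1]}_R2.fastq.gz"
}

parse_lane_file 1860_Time_0_S1_L003_R1_001.fastq.gz   # prints 1860_Time_0_R1.fastq.gz
r2_for_r1 1860_Time_0_R1.fastq.gz                      # prints 1860_Time_0_R2.fastq.gz
```

Escaping the dots matters. In a regular expression an unescaped `.` matches any character, so an unescaped, unanchored `_001.fastq.gz` would also accept a name such as `..._001xfastqxgz`.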
#!/bin/bash
# I would personally merge each sample across lanes. Gzip files can simply be
# concatenated, and as long as R1 and R2 are merged in the same lane order the
# read pairs stay in sync.
#
# going from
# - 1860_Time_0_S1_L001_R1_001.fastq.gz
# - 1860_Time_0_S1_L002_R1_001.fastq.gz
# - 1860_Time_0_S1_L003_R1_001.fastq.gz
# - 1860_Time_0_S1_L004_R1_001.fastq.gz
# to
# - 1860_Time_0_R1.fastq.gz
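pattern='^([[:digit:]]+)_(Time_[[:digit:]]+)_(S[[:digit:]]+)_(L[[:digit:]]{3})_(R[12])_001\.fastq\.gz$'
for file_name in *_001.fastq.gz
do
    # Skip anything that does not look like a lane-level FASTQ file.
    [[ "$file_name" =~ $pattern ]] || continue
    sample_name="${BASH_REMATCH[1]}"
    time="${BASH_REMATCH[2]}"
    lane="${BASH_REMATCH[4]}"   # not used in the output name; kept for reference
    read="${BASH_REMATCH[5]}"
    # The glob expands in sorted order, so R1 and R2 are appended in the same
    # lane order. Note: >> appends, so delete earlier merged files before rerunning.
    cat "$file_name" >> "${sample_name}_${time}_${read}.fastq.gz"
done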
# Next, do the same thing but iterate over the merged files only. Since fastp
# needs R1 and R2 together, iterate over the merged R1 files and derive each
# R2 name from the shared prefix.
#
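pattern2='^(.*)_R1\.fastq\.gz$'
for file_name in *_R1.fastq.gz
do
    # Skip fastp outputs ("new..." files) left over from a previous run.
    [[ "$file_name" == new* ]] && continue
    [[ "$file_name" =~ $pattern2 ]] || continue
    prefix="${BASH_REMATCH[1]}"
    read1="${prefix}_R1.fastq.gz"
    read2="${prefix}_R2.fastq.gz"
    # fastp needs both mates; skip samples whose R2 file is missing.
    if [[ ! -f "$read2" ]]; then
        echo "Missing $read2, skipping $prefix" >&2
        continue
    fi
    # fastp options: -l 36 drops reads shorter than 36 bp; -c corrects bases in
    # overlapping pairs; -f/-F and -t/-T trim 4 bases from the front/tail of
    # R1/R2; -m merges overlapping pairs into --merged_out; --unpaired1/2 keep
    # reads whose mate failed filtering; -h writes an HTML report.
    /usr/local/usrapps/florfenicolamr/fastp \
        -i "$read1" -I "$read2" \
        -o "new${prefix}_R1.fastq.gz" -O "new${prefix}_R2.fastq.gz" \
        -l 36 \
        -c \
        -f 4 -F 4 \
        -t 4 -T 4 \
        -m --merged_out "${prefix}_merged.fastq.gz" \
        --unpaired1 "${prefix}_U1.fastq.gz" --unpaired2 "${prefix}_U2.fastq.gz" \
        -h "${prefix}.html"
done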
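The opening comment relies on gzip files being safe to concatenate. Here is a small sketch for convincing yourself, assuming `gzip` and `mktemp` are on the PATH. The file names and the two one-record FASTQ snippets are made up for illustration. It merges two gzip files with plain `cat`, as the script does, and then checks that the result is a valid gzip file whose decompressed content is both inputs in order.

```shell
#!/bin/bash
# Sketch: byte-level concatenation of gzip files gives a valid multi-member
# gzip file whose decompressed content is the inputs' content, in order.
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Two hypothetical one-record lane files for the same sample and read.
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$workdir/L001_R1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$workdir/L002_R1.fastq.gz"

# Merge the same way the script does: plain cat, no recompression.
cat "$workdir/L001_R1.fastq.gz" "$workdir/L002_R1.fastq.gz" \
    > "$workdir/merged_R1.fastq.gz"

# gzip -t checks every member; gzip -dc decompresses all members in order.
gzip -t "$workdir/merged_R1.fastq.gz"
merged_reads="$(gzip -dc "$workdir/merged_R1.fastq.gz" | grep -c '^@read')"
echo "records in merged file: $merged_reads"   # prints 2
```

You can run the same integrity check, `gzip -t`, on the real merged files before passing them to fastp.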