@acvill
Last active March 24, 2022 14:53
#!/bin/bash
# FUNCTION
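## this script recursively searches a directory for fasta files whose names contain a pattern
## matching files are concatenated, and their sequence records are sorted by descending sequence length
## sequences that span several lines are written as a single line in the output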
# INPUT
## first positional parameter is the directory to search
## second is the pattern to match in fasta filenames
## third is the output filename
# USAGE EXAMPLE
## ./find_combine_sort.sh . phage.fna sort.fna
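# PIPELINE
## find: concatenate every regular file under ${1} whose name contains ${2}
##       (-exec cat {} + copes with spaces in paths, unlike a bare xargs)
## awk 1: linearize each record to "header<TAB>sequence" on one line
## awk 2: prepend the sequence length as a sort key
## sort/cut/tr: sort by length (descending), drop the key, restore header and sequence lines
find "${1}" -type f -name "*${2}*" -exec cat {} + | \
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' | \
awk -F '\t' '{printf("%d\t%s\n",length($2),$0);}' | \
sort -k1,1nr | \
cut -f 2- | \
tr "\t" "\n" \
> "${3}"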

acvill commented Mar 24, 2022

Based on this Biostars post.
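
A minimal sketch of the pipeline on toy input, to show what the output looks like. The function name `sort_fasta_by_length`, the file names, and the sequences are all made up for illustration and are not part of the original script. The function writes to stdout instead of taking an output filename.

```shell
#!/bin/bash
# Hypothetical wrapper around the same pipeline as the script above.
# $1 = directory to search, $2 = pattern to match in filenames.
sort_fasta_by_length() {
  find "$1" -type f -name "*$2*" -exec cat {} + |
    awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' |
    awk -F '\t' '{printf("%d\t%s\n",length($2),$0);}' |
    sort -k1,1nr |
    cut -f 2- |
    tr "\t" "\n"
}

# Build a throwaway directory tree with two made-up FASTA files.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf '>short\nACGT\n' > "$demo_dir/a_phage.fna"
printf '>long\nACGTAC\nGTACGT\n' > "$demo_dir/sub/b_phage.fna"

# Prints the 12-base record first, with its two sequence lines joined:
# >long
# ACGTACGTACGT
# >short
# ACGT
sort_fasta_by_length "$demo_dir" phage.fna

rm -rf "$demo_dir"
```

Because headers and sequences are joined with tabs during sorting, a header that itself contains a tab would be split incorrectly. This sketch assumes tab-free headers.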
