cat hg18.fa | awk '{
if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fa")}
print $0 > filename
}'
@Joytee

Joytee commented Jul 19, 2021

thanks, it worked

@svedwards

Amazing. Worked the first time. Not bad for an 11-year-old script. Thanks!

@wudustan

wudustan commented Nov 4, 2021

I was running into the 'Too many open files' error.

Here is a fix:

cat hg18.fasta | awk '{
        if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fasta")}
        print $0 >> filename
        close(filename)
}'

Fix originally provided here: https://unix.stackexchange.com/questions/498001/splitting-file-by-1st-column-too-many-open-files
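
Closing the file after every line avoids the error, but it reopens the output once per sequence line, which can be slow on a large genome. A possible variant, sketched here under assumptions, closes the previous output only when a new header starts. The file `demo.fasta` and the variable name `out` are hypothetical and exist only so the sketch can be run as-is; substitute your own input file.

```shell
# Hypothetical two-record sample input, created only for illustration.
printf '>chrA\nACGT\nTTGA\n>chrB\nGGCC\n' > demo.fasta

awk '
/^>/ {
    if (out != "") close(out)        # release the previous record before opening the next
    out = substr($0, 2) ".fasta"     # header text without the leading ">"
}
out != "" { print > out }            # lines before the first header are skipped
' demo.fasta
```

Only one output file is open at any time, so the open-file limit is never reached. Note that this assumes every header is unique: because `>` truncates when a file is first opened, a repeated header would overwrite the earlier record, whereas the `>>` version above would append to it.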

@DzmitryGB

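When FASTA headers contain additional information after whitespace, take the file name from $1 (the first field) rather than $0 (the whole line). For example, with a header like >chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38: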

cat hg18.fasta | awk '{
        if (substr($0, 1, 1)==">") {filename=(substr($1,2) ".fasta")}
        print $0 >> filename
        close(filename)
}'

@diegogarciamartinezdeartola

I ran into a "File name too long" error because of the text after the ">" in the header. Is there an easy way to shorten these names? TYIA
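
One way this might be handled, as a sketch rather than a definitive answer: keep only the first whitespace-delimited field (as in the `$1` variant above), replace characters that are awkward in file names, and cut the result to a fixed length. Many filesystems limit a single file name to 255 bytes, so a cap well below that should be safe. The names `maxlen`, `n`, `name`, `out`, and the sample file `demo_long.fasta` are all hypothetical, and the numeric prefix is an added assumption to keep truncated names from colliding.

```shell
# Hypothetical sample with one long, messy header, created only for illustration.
printf '>seq1|a/very/long/identifier_that_keeps_going some description\nACGT\n>seq2 other\nGGCC\n' > demo_long.fasta

awk -v maxlen=20 '
/^>/ {
    if (out != "") close(out)
    n++                                   # running record number keeps names unique
    name = substr($1, 2)                  # first field without the leading ">"
    gsub(/[^A-Za-z0-9._-]/, "_", name)    # replace "/", "|" and similar characters
    out = sprintf("%04d_%s.fasta", n, substr(name, 1, maxlen))
}
out != "" { print > out }                 # the full original header is kept inside the file
' demo_long.fasta
```

With the sample above this writes `0001_seq1_a_very_long_ide.fasta` and `0002_seq2.fasta`. Adjust `maxlen` to taste; the original header line is still stored as the first line of each file, so no information is lost by shortening the name.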
