Created October 12, 2010 09:13
cat hg18.fa | awk '{
if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fa")}
print $0 > filename
}'
Amazing. Worked the first time. Not bad for an 11-year-old script. Thanks!
I was running into the 'Too many open files' error.
Here is a fix:
cat hg18.fasta | awk '{
if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fasta")}
print $0 >> filename
close(filename)
}'
Fix originally provided here: https://unix.stackexchange.com/questions/498001/splitting-file-by-1st-column-too-many-open-files
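A possible variation on the fix above, as a rough sketch rather than a tested replacement: closing the file after every line means it is reopened for every line, which can be slow on a large genome. Closing only when a new header starts keeps a single output file open at a time. The split_demo directory, the outdir variable and the sample records are made up for illustration. This assumes each header occurs only once in the input, because reopening a file with > truncates it.

```shell
# Build a tiny three-record FASTA file to try the idea on (made-up data).
mkdir -p split_demo
printf '>chrA\nACGT\nTTGA\n>chrB\nGGCC\n>chrC\nAATT\n' > split_demo/input.fasta

# Close the previous output only when a new header starts, so at most one
# output file is open at any time.
awk -v outdir=split_demo '
/^>/ {
    if (filename != "") close(filename)            # done with the previous record
    filename = outdir "/" substr($0, 2) ".fasta"   # header text minus the leading ">"
}
filename != "" { print > filename }                # skip any lines before the first header
' split_demo/input.fasta
```

Unlike the >> version, rerunning this overwrites the per-sequence files instead of appending to them a second time.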
If the FASTA headers contain additional information after whitespace, e.g.
>chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38
take the filename from $1 instead of $0, so that only the text up to the first whitespace is used:
cat hg18.fasta | awk '{
if (substr($0, 1, 1)==">") {filename=(substr($1,2) ".fasta")}
print $0 >> filename
close(filename)
}'
I got a "File name too long" error caused by the text after the ">" in the header entry. Is there any easy way to shorten these names? TYIA
thanks, it worked
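Regarding the "File name too long" question above, one way it might be handled (a sketch under assumptions, not something posted in this thread): take only the first whitespace-delimited field of the header, replace characters that are awkward in filenames, and cap the length with substr. The maxlen and outdir variables, the split_demo_short directory and the sample headers are all hypothetical. Note that truncated names can collide, and with > a later record would then overwrite an earlier one, so pick maxlen long enough to keep the names unique.

```shell
# Made-up input: one long header with a description, one header containing "|".
mkdir -p split_demo_short
printf '>sample_sequence_with_long_name extra description\nACGT\n>chr2|alt desc\nGGCC\n' \
    > split_demo_short/input.fasta

awk -v outdir=split_demo_short -v maxlen=12 '
/^>/ {
    if (filename != "") close(filename)     # done with the previous record
    name = substr($1, 2)                    # first field only, without the leading ">"
    gsub(/[^A-Za-z0-9._-]/, "_", name)      # replace characters that are awkward in filenames
    name = substr(name, 1, maxlen)          # cap the length (hypothetical limit)
    filename = outdir "/" name ".fasta"
}
filename != "" { print > filename }         # the full header line is still written to the file
' split_demo_short/input.fasta

ls split_demo_short
```

Only the filename is shortened; the header line inside each output file is left untouched.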