@darencard
Last active February 15, 2017 14:52
Shell one-liner that parses the FASTA headers of the NCBI RefSeq transcript file (rna.fna.gz) for the Burmese python genome. It will likely work on the same file type from other NCBI genomes as well.

output fields:

  1. transcript ID
  2. full transcript ID w/ version (.1, .2, etc.)
  3. full gene identifier (watch out for spaces and weird symbols)
  4. gene symbol
  5. transcript variant (watch out for spaces), with NA meaning none
  6. type of transcript (mRNA, ncRNA, etc.)

paste \
<(zcat GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna.fna.gz | \
  grep ">" | sed 's/>//g' | cut -d " " -f 1 | awk -v OFS="\t" -F '.' '{ print $1, $1FS$2 }') \
<(zcat GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna.fna.gz | grep ">" | sed 's/>//g' | \
  cut -d " " -f 5- | awk -v OFS="\t" -F ', ' '{ if ($(NF-1) ~ /^transcript/) print $(NF-2); else print $(NF-1) }' | \
  rev | cut -d " " -f 2- | rev) \
<(zcat GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna.fna.gz | grep ">" | sed 's/>//g' | \
  cut -d " " -f 5- | awk -v OFS="\t" -F ', ' '{ if ($(NF-1) ~ /^transcript/) print $(NF-2); else print $(NF-1) }' | \
  rev | cut -d " " -f 1 | rev | sed -e 's/(//g' -e 's/)//g') \
<(zcat GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna.fna.gz | grep ">" | sed 's/>//g' | \
  cut -d " " -f 5- | awk -v OFS="\t" -F ', ' '{ if ($(NF-1) ~ /^transcript/) print $(NF-1); else print "NA" }') \
<(zcat GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna.fna.gz | grep ">" | sed 's/>//g' | \
  cut -d " " -f 5- | awk -v OFS="\t" -F ', ' '{ print $NF }') \
> GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna_metadata_parsed.tsv
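
The pipeline above decompresses and scans the file five times, once per group of columns. As a rough alternative, the same six fields can be produced in a single pass with awk. This is only a sketch: parse_headers is a hypothetical helper name, the header in the demo is made up for illustration, and the logic assumes the same layout the one-liner assumes (four space-separated words before the gene description, then comma-separated gene, optional transcript variant, and transcript type). Edge cases such as very short headers may behave differently from the original.

```shell
#!/bin/sh
# Hypothetical helper: reads FASTA text on stdin and prints the six
# tab-separated fields described in the list above, one row per header.
parse_headers() {
  awk -v OFS="\t" '
    /^>/ {
      line = substr($0, 2)                 # drop the leading ">"
      split(line, w, " ")                  # w[1] is the versioned transcript ID
      split(w[1], a, ".")                  # a[1] is the ID without the version

      # Skip the first four space-separated words, like: cut -d " " -f 5-
      desc = line
      for (i = 1; i <= 4; i++) sub(/^[^ ]* /, "", desc)

      # Split the description on ", " into gene, optional variant, and type.
      n = split(desc, f, ", ")
      if (n >= 3 && f[n-1] ~ /^transcript/) {
        gene = f[n-2]; variant = f[n-1]
      } else {
        gene = f[n-1]; variant = "NA"
      }
      type = f[n]

      # The gene symbol is the last word of the gene field, minus parentheses.
      m = split(gene, g, " ")
      symbol = g[m]
      gsub(/[()]/, "", symbol)

      # The full gene identifier is everything before that last word.
      name = gene
      sub(/ [^ ]*$/, "", name)

      print a[1], w[1], name, symbol, variant, type
    }'
}

# Demo on a made-up header (not a real NCBI record).
printf '%s\n' \
  '>XM_000000001.1 PREDICTED: Python bivittatus example protein 1 (EXP1), transcript variant X2, mRNA' \
  'ACGT' | parse_headers
```

To run it on the real file, pipe the decompressed FASTA into the function, for example: zcat GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna.fna.gz | parse_headers > GCF_000186305.1_Python_molurus_bivittatus-5.0.2_rna_metadata_parsed.tsv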