@ShaiberAlon
Last active June 15, 2017 17:16
Create a fake gene call from nucleotides in the range between two pre-existing gene calls
# Short bash script that creates an artificial gene call spanning a range of existing gene calls.
# The artificial gene call starts at the first nucleotide of the first gene (denoted n_i) and ends at the
# last nucleotide of the last gene (denoted n_f). The assumption is that n_i < n_f.
# Another assumption: both genes are on the same contig.
# The output is a file in the anvi'o external gene calls format.
# Example:
# $ bash create_my_gene_calls_file.shx MY_GENE_CALL_FILE.txt CONTIGS.db 45 105
# This gives a gene call starting at the first nucleotide of gene 45 and ending at the last nucleotide of gene 105.
# INPUTS:
output=$1 # name of the output file
db=$2 # path to the contigs database
start_gene=$3 # gene_callers_id of the first gene
end_gene=$4 # gene_callers_id of the last gene
# Start of the first gene, stop of the last gene, and the contig of the first gene
n_i=$(sqlite3 "$db" "select start from genes_in_contigs where gene_callers_id == $start_gene")
n_f=$(sqlite3 "$db" "select stop from genes_in_contigs where gene_callers_id == $end_gene")
contig_id=$(sqlite3 "$db" "select contig from genes_in_contigs where gene_callers_id == $start_gene")
# Write the header line, then the single artificial gene call
printf 'gene_callers_id\tcontig\tstart\tstop\tdirection\tpartial\tsource\tversion\n' > "$output"
printf '0\t%s\t%s\t%s\tf\t0\tTHE_BEST_GENE_CALL_SOFTWARE\tv_inf\n' "$contig_id" "$n_i" "$n_f" >> "$output"
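
The script states two assumptions (n_i < n_f, and both genes on the same contig) but never checks them. It also writes a row even when a query returns nothing. The sketch below shows one way these checks could be added. The helper names `check_gene_range` and `write_gene_call` are hypothetical and not part of the original script, and the contig of the end gene would have to come from an extra query that the original does not run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the original script): they check the
# script's stated assumptions before writing the external gene calls file.

# check_gene_range CONTIG_START CONTIG_END N_I N_F
# Succeeds only if both positions are non-negative integers, both genes
# are on the same contig, and n_i < n_f.
check_gene_range() {
    local contig_start=$1 contig_end=$2 n_i=$3 n_f=$4
    local value
    for value in "$n_i" "$n_f"; do
        case $value in
            ''|*[!0-9]*)
                # An empty value usually means the gene was not found
                echo "error: position '$value' is not a number" >&2
                return 1 ;;
        esac
    done
    if [ "$contig_start" != "$contig_end" ]; then
        echo "error: genes are on different contigs ($contig_start vs $contig_end)" >&2
        return 1
    fi
    if [ "$n_i" -ge "$n_f" ]; then
        echo "error: start ($n_i) is not smaller than stop ($n_f)" >&2
        return 1
    fi
    return 0
}

# write_gene_call OUTPUT CONTIG N_I N_F
# Writes the header and one forward-strand gene call, as the script above does.
write_gene_call() {
    local output=$1 contig=$2 n_i=$3 n_f=$4
    printf 'gene_callers_id\tcontig\tstart\tstop\tdirection\tpartial\tsource\tversion\n' > "$output"
    printf '0\t%s\t%s\t%s\tf\t0\tTHE_BEST_GENE_CALL_SOFTWARE\tv_inf\n' "$contig" "$n_i" "$n_f" >> "$output"
}
```

In the script, a call could look like `check_gene_range "$contig_id" "$end_contig_id" "$n_i" "$n_f" && write_gene_call "$output" "$contig_id" "$n_i" "$n_f"`, where `end_contig_id` is an assumed variable holding the contig of the end gene.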