@epaule
epaule / filter_fasta.rb
Last active July 8, 2022 16:09
filter a FASTA file by sequence length (less than a given size)
#!/usr/bin/env ruby
# usage: ruby filter_fasta.rb size fasta.file
require 'bio'
s = ARGV.shift.to_i
Bio::FlatFile.auto(ARGF) do |ff|
  ff.each do |entry|
    if entry.seq.length < s
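The preview above stops before showing what happens to a matching entry. As a rough sketch of the same length test without BioRuby, assuming that entries shorter than the threshold are the ones kept (the description "less than" suggests this, but the source does not confirm it); `parse_fasta` and `shorter_than` are hypothetical helper names:

```ruby
require 'stringio'

# Read FASTA text into [header, sequence] pairs, joining wrapped sequence lines.
def parse_fasta(io)
  records = []
  io.each_line do |line|
    line = line.chomp
    if line.start_with?('>')
      records << [line[1..-1], String.new]
    elsif !records.empty?
      records.last[1] << line.strip
    end
  end
  records
end

# Assumption: keep only the records whose sequence is shorter than limit.
def shorter_than(io, limit)
  parse_fasta(io).select { |_header, seq| seq.length < limit }
end

demo = StringIO.new(">a\nACGT\nAC\n>b\nACG\n")
shorter_than(demo, 5).each { |header, seq| puts ">#{header}", seq }
```

With the demo input, record `a` has 6 bases and record `b` has 3, so only `b` passes a limit of 5.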
# SUPER_1_1 is 1085289420bp
# SUPER_1_2 joins and both become SUPER_1
samtools view -h split_1.mapped.bam |perl -pne 's/SUPER_1_1/SUPER_1/g' |perl -ne 'chomp;@F=split(/\t/,$_);next if $F[1] eq "SN:SUPER_1";if(/SN:SUPER_1_2/){$F[1]="SN:SUPER_1";$F[2]="LN:2170579067"};if($F[2] eq "SUPER_1_2"){$F[2]="SUPER_1";$F[3]+=1085289420;$F[7]+=1085289420 if $F[6] eq "="};if($F[6] eq "SUPER_1_2"){$F[6]="SUPER_1";$F[7]+=1085289420};print join "\t", @F;print "\n"' | /software/grit/conda/envs/snake_env/bin/PretextMap --sortby nosort --mapq 0 -o fixed.pretext --highRes
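The alignment-record part of the one-liner above can be read as a coordinate shift: reads on SUPER_1_2 are renamed to SUPER_1 and moved right by the length of SUPER_1_1. A minimal Ruby sketch of that step only, with `shift_record` as a hypothetical helper name; the `@SQ` header rewriting and the SUPER_1_1 rename are not covered here:

```ruby
# Shift one SAM alignment line from scaffold `from` onto `to`.
# The default offset is the SUPER_1_1 length quoted in the comment above.
def shift_record(line, from: 'SUPER_1_2', to: 'SUPER_1', offset: 1_085_289_420)
  f = line.chomp.split("\t")
  if f[2] == from                                     # RNAME
    f[2] = to
    f[3] = (f[3].to_i + offset).to_s                  # POS
    f[7] = (f[7].to_i + offset).to_s if f[6] == '='   # PNEXT, mate on the same scaffold
  end
  if f[6] == from                                     # RNEXT names the shifted scaffold
    f[6] = to
    f[7] = (f[7].to_i + offset).to_s                  # PNEXT
  end
  f.join("\t")
end

read = "r1\t99\tSUPER_1_2\t100\t60\t50M\t=\t300\t250\tACGT\tFFFF"
puts shift_record(read)
```

For the demo read, POS becomes 1085289520 and PNEXT becomes 1085289720; lines on other scaffolds pass through unchanged.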
GitHub Repositories
=======================
contamination files: https://github.com/epaule/btk_sequences_to_remove
blast scripts: https://github.com/Aquatic-Symbiosis-Genomics-Project/BLAST-scripts
decon blast
===========
bash decon_blastBTK.sh <FASTA-file> <CSV file with ticks> <output directory>
Useful one-liners:
epaule / rc_release_bump.rb
Last active January 24, 2022 13:46
create the new files for a rc release
#!/usr/bin/env ruby
require "fileutils"
class Assembly
  def initialize(id,dir)
    @id=id
    @dir=dir
  end
epaule / fix_braker_gtf.pl
Last active October 8, 2021 15:55
fix the BRAKER GTF so it can be used with HTSeq
#!/usr/bin/env perl
while (<>){
    chomp;
    @F=split(/\s\s+/);
    if ($F[2] eq 'gene'){
        my $t = "gene_id \"$F[-1]\";";
        $F[-1]=$t;
    }elsif($F[2] eq 'transcript'){
        my $t="transcript_id \"$F[-1]\";";
epaule / filter_tpf.pl
Created July 22, 2021 08:12
remove scaffolds from a TPF based on a decon file
#!/usr/bin/env perl
# filter_tpf.pl decon_file TPF
# * will leave gaps/etc in the file
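my %ids;
# collect the scaffold names flagged with REMOVE in the decon file
open(my $in, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
while (<$in>){
    $ids{$1}=1 if /^REMOVE\s+(\w+)/;
}
close $in;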
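The preview ends after the decon file is read, so the filtering step is not shown. A rough Ruby sketch of both steps, under the assumption that the scaffold name sits in a fixed tab-separated column of the TPF; `removal_ids`, `filter_tpf` and the `name_column` parameter are hypothetical and not taken from the original script:

```ruby
require 'set'

# Collect scaffold names flagged with REMOVE, using the same pattern as above.
def removal_ids(lines)
  ids = Set.new
  lines.each do |line|
    m = line.match(/^REMOVE\s+(\w+)/)
    ids << m[1] if m
  end
  ids
end

# Assumption: name_column holds the scaffold name. Rows whose value in that
# column is not a flagged name (gap rows, for instance) are left in place.
def filter_tpf(lines, ids, name_column: 2)
  lines.reject { |line| ids.include?(line.chomp.split("\t")[name_column]) }
end

decon = ["REMOVE scaffold_2 mito", "KEEP scaffold_1"]
tpf = [
  "?\tscaffold_1:1-100\tscaffold_1\tPLUS",
  "GAP\tTYPE-2\t200",
  "?\tscaffold_2:1-50\tscaffold_2\tPLUS"
]
puts filter_tpf(tpf, removal_ids(decon))
```

In the demo only the `scaffold_2` row is dropped; the gap row stays, in line with the note that gaps are left in the file.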
epaule / change_mrna_gff3.pl
Last active February 20, 2019 11:29
fiddles with the GFF3 mRNA spans based on CDSes
#!/usr/bin/env perl
my $inf = shift;
open IN, $inf or die "cannot open $inf: $!";
my %cds;
# slurpy block
while (<IN>){
epaule / genbankCDS2fasta.rb
Last active April 16, 2018 10:18
extract CDSes from a GenBank file and print them as FASTA
#!/usr/bin/env ruby
require 'bio'
ff = Bio::FlatFile.new(Bio::GenBank, ARGF)
ff.each_entry{|gb|
  gb.each_cds{|cds|
    position = cds.position
    puts gb.naseq.splicing(position).to_fasta(cds.to_hash['protein_id'][0],60)
  }
}
epaule / fix_geneace_orthologs.pl
Created November 21, 2017 14:23
search for missing tags in orthologs and assign them based on the reverse edge
#!/usr/bin/env perl
use Ace;
my $db = Ace->connect(-path => shift) || die("Connection failure: ", Ace->error);
my $genes = $db->fetch_many(-query => 'find Gene Species="Pristionchus pacificus"; Ortholog');
while (my $gene = $genes->next){
    foreach my $o ($gene->Ortholog){
epaule / dump_species_functional_descriptions.pl
Last active April 19, 2017 14:34
script to dump gene descriptions
#!/usr/bin/perl
#
# dumps gene descriptions
#
# Options:
# -format <record || tab> (defaults to record)
# -species <name> WormBase species name
# -store <storable file> pass a stored config
# -debug <user> send log mails to user
# -test use the test database