Skip to content

Instantly share code, notes, and snippets.

#!/nfs/users/nfs_m/mh6/.linuxbrew/homebrew/bin/ruby
# usage: ruby get_rnaseq_number_for_species.rb "Xestospongia muta"
require "net/http"
require "uri"
require "cgi"
species = ARGV[0]
species = CGI.escapeURIComponent(species)
@epaule
epaule / gist:6a6b78f07a9ddc64e0a58fbd8c12094e
Last active November 23, 2023 13:19
find_omark_contamination.pl
#!/bin/env perl
open IN,shift;
while(<IN>){
push @h,"$1" if />(\S+)/
}
close IN;
open IN,shift;
while(<IN>){
@F=split;
next unless $F[2] eq 'CDS';
@epaule
epaule / get_lineage_from_species.cr
Last active November 23, 2023 13:21
get taxonomic lineage from ENA for a species name
#!/usr/bin/env crystal
require "http/client"
require "json"
species = ARGV[0]
species = species.gsub(/\s/, "%20")
r = HTTP::Client.get("https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/#{species}", headers: HTTP::Headers{"Accept" => "application/json"})
raise "cannot get the taxonomy" unless r.success?
#!/usr/bin/env ruby
# frozen_string_literal: true
# add > signs to the nearest gaps in a TPF based on AGP breaks
# usage: ./mark_scaffolds.rb -a AGP -t TPF -o TPF2
require 'optparse'
# get the gaps from the AGP as Hash of Arrays [seq_id] => [1,2,3]
def parse_agp(file)
@epaule
epaule / cut.txt
Last active September 7, 2023 09:18
chromosome renaming
Step 9 - Rename and re-run maps
It is likely that between the pairs of chromosomes there will be discrepancies in size. This will result in the files being in different orders when they get size sorted. This needs to be corrected before submission. The can be done manually or using the following method:
Run:
bsub -G grit-grp -n 16 -e e_hap2hap_mapping -o o_hap2hap_mapping -M 16000 -R 'select[mem>16000] rusage[mem=16000] span[hosts=1]' sh /software/grit/projects/vgp_curation_scripts/hap2_hap1_ID_mapping.sh <h1_fasta> <h2_fasta>
Then:
/software/grit/projects/vgp_curation_scripts/update_mapping.rb -c <renamed_hap2_chroms.csv> -f <hap2.fa> -t <hap2_hap1.tsv> > hap2.renamed.fa
#!/bin/bash
# simple files
for f in *.bed *.contamination *recommendation *report.txt *taxonomy.rpt */*/*combined_summary.csv *.fa *.gb
do
perl -i_original -pne 's/scaffold_/SCAFFOLD_/' $f
done
# compressed fastas
for f in *.gz
@epaule
epaule / BUSCO_synteny_v1.1.sh
Last active July 17, 2023 14:59
curation scripts
#!/bin/bash
# Script by Tom Mathers.
# Script requires 32 cores and ~50Gb ram (depending on genome size).
# Seqkit and BUSCO need to be in path.
# Scaffold IDs need to be in final tol format.
# Query and ref short IDs need to be two letter codes eg. "Ac".
# Query and ref fasta files need to be in working dir.
@epaule
epaule / elixir snippet
Created December 9, 2022 11:48
comments on CSV parsing using Elixir & Nimnble & Flow from https://www.poeticoding.com/processing-large-csv-files-with-elixir-streams/
NimbleCSV.define(MyParser, separator: ";", escape: "\"")
def load_stream file do
file
|> File.stream!()
|> Flow.from_enumerable()
|> Flow.map(&MyParser.parse_string/1)
|> Enum.count()
end
@epaule
epaule / cobiont_stats.cr
Last active November 23, 2022 13:06
quick thing to parse two cobiont files
#!/bin/env crystal
# cobiont_stats.cr contamination_file1 contamination_file_2
# returns a Array of String
def read_contamination_file(file : String)
ids = Array(String).new
File.each_line(file){|line|
ids << $1 if /REMOVE\s+(\S+)/.match(line)
}
return ids.uniq
@epaule
epaule / filter_merged.cr
Last active March 17, 2023 16:55
rough parser for a ASG contamination file
#!/bin/env crystal
require "option_parser"
phylum="Arthropoda|insect"
dir="20230226_qqAmaFero1.20230225.haplotigs.fa_asg_cobiont_check_run/collected_tables/"
OptionParser.parse do |parser|
parser.banner = "Usage: filter_merged --phylum xyz --infile <in.merged>"
parser.on("-p PHYLUM","--phylum=PHYLUM","Specifies the phylum(s) of the host separated by | [default=#{phylum}]"){|p|phylum=p}
parser.on("-d directory","--directory=DIR","merged ASG directory[default=#{dir}]"){|d|dir=d}