#!/usr/bin/env perl
#
# 2019, Peter Menzel, Labor Berlin
# This script converts the Kraken output format into
# a 3-column output format with columns:
# 1: read name
# 2: taxon id
# 3: score, defined as the fraction of the read's k-mers that are assigned to that taxon id or to one of its ancestors
#
# use as input for ktImportTaxonomy
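The score described in the header above can be sketched in Python as follows. This is only an illustration, not the script's actual code: it assumes the standard five-column, tab-separated Kraken 2 output in which the last column lists `taxid:count` pairs, and it assumes a hypothetical `parent` dictionary (child taxon id to parent taxon id) that has already been loaded from the taxonomy. Counting ambiguous (`A:`) k-mers in the denominator is also an assumption.

```python
def kraken_score(taxid, kmer_field, parent):
    """Fraction of k-mers assigned to `taxid` or one of its ancestors."""
    # Collect the taxon and all of its ancestors by walking up to the root.
    lineage = set()
    node = taxid
    while node not in lineage:
        lineage.add(node)
        node = parent.get(node, node)  # the root (or an unknown id) maps to itself

    total = hits = 0
    for token in kmer_field.split():
        if token == "|:|":  # separator between mates in paired-end output
            continue
        key, count = token.split(":")
        count = int(count)
        total += count  # assumption: every listed k-mer counts, including "A" and 0
        if key.isdigit() and int(key) in lineage:
            hits += count
    return hits / total if total else 0.0


def convert_line(line, parent):
    """Turn one Kraken output line into (read name, taxon id, score)."""
    _status, read_name, taxid, _length, kmer_field = line.rstrip("\n").split("\t")
    taxid = int(taxid)
    return read_name, taxid, kraken_score(taxid, kmer_field, parent)


# Hypothetical, abbreviated taxonomy used only for this example.
toy_parent = {562: 561, 561: 543, 543: 1, 1: 1}
example = "C\tread1\t562\t101\t562:13 561:4 A:31 0:1 562:3"
name, taxid, score = convert_line(example, toy_parent)
print(f"{name}\t{taxid}\t{score:.4f}")
```

In the example line, 20 of the 52 listed k-mers fall on taxon 562 or its ancestor 561, so the score is 20/52.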
pmenzel / make-igv-genome-json.pl
Created February 8, 2023 15:33
make-igv-genome-json.pl
#!/usr/bin/env perl
# make-igv-genome-json.pl
# Peter Menzel
#
# This script creates an IGV genome json file (https://github.com/igvteam/igv/wiki/JSON-Genome-Format)
# The arguments are the paths to fasta, fai and gff files.
#
# Example usage:
# datasets download genome accession GCF_002101575.1
# unzip ncbi_dataset.zip
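As a rough illustration of what such a genome file may contain, here is a minimal Python sketch that assembles the JSON from the three paths. The keys `id`, `name`, `fastaURL`, `indexURL` and `tracks` follow the IGV JSON genome format linked above; the function name, the default genome id, the track name and the `"gff"` format string are assumptions made for this example and need not match what the Perl script writes.

```python
import json
import os


def make_igv_genome(fasta, fai, gff, genome_id=None):
    """Build a dictionary describing an IGV genome with one annotation track."""
    # Assumption: fall back to the FASTA file name (without extension) as the id.
    if genome_id is None:
        genome_id = os.path.splitext(os.path.basename(fasta))[0]
    return {
        "id": genome_id,
        "name": genome_id,
        "fastaURL": fasta,
        "indexURL": fai,
        "tracks": [
            # Assumption: a single annotation track built from the GFF file.
            {"name": "Genes", "format": "gff", "url": gff},
        ],
    }


genome = make_igv_genome("data/genome.fna", "data/genome.fna.fai", "data/genomic.gff")
print(json.dumps(genome, indent=2))
```

The printed JSON can be saved to a file and loaded in IGV as a genome. The paths shown here are placeholders.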
pmenzel / gist:c0d4dbd9f71e0ec5e4b60a0b11d39778
Last active December 18, 2022 11:13
Advent of Code 2022 in mostly Perl
===========================
Days 10 ...
===========================
see https://github.com/pmenzel/advent-of-code/tree/master/2022
===========================
Day 9
===========================
Part 1:
pmenzel / rotate_assembly.pl
Created June 18, 2022 14:44
Perl script for rotating a fasta sequence so that it starts at a specified gene found by a BLAST search
#!/usr/bin/env perl
#
# rotates (and, if necessary, reverse complements) an assembly of a circular genome
# so that it starts with the sequence having the best blast hit to a gene database,
# e.g. dnaA
#
# depends on blastn being installed
#
# Example usage:
# rotate_assembly.pl assembly.fasta dnaA.fa > rotated_assembly.fa
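The rotation step itself can be sketched in a few lines of Python. This is a simplified illustration rather than the script's implementation: it assumes the best BLAST hit has already been found and is given as a 0-based, half-open interval `[hit_start, hit_end)` on the forward strand of the assembly, together with its strand. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_hit(seq, hit_start, hit_end, strand):
    """Rotate a circular sequence so that it starts at the first base of the hit.

    hit_start and hit_end are 0-based, half-open forward-strand coordinates.
    For a hit on the minus strand, the sequence is reverse complemented first,
    so that the gene reads 5' to 3' at the start of the output.
    """
    if strand == "+":
        return seq[hit_start:] + seq[:hit_start]
    rc = reverse_complement(seq)
    # Forward position i maps to len(seq) - 1 - i on the reverse complement,
    # so the gene's first base (forward position hit_end - 1) lands here:
    offset = len(seq) - hit_end
    return rc[offset:] + rc[:offset]


# Toy examples with a 3-base "gene" ATG.
print(rotate_to_hit("GGATGCCC", 2, 5, "+"))  # gene on the forward strand
print(rotate_to_hit("GGCATTTT", 2, 5, "-"))  # gene on the reverse strand
```

Both calls return a sequence that begins with `ATG`. BLAST reports 1-based coordinates, so they would need to be converted before being passed to this function.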
pmenzel / softclip_cigar.pl
Last active April 9, 2021 10:03
Modify SAM CIGAR string to soft-clip the last n bases
my $cigar = "30M40D50M";
my $n_soft = 71;
my $out_cigar = "";
# if there are already at least $n_soft soft-clipped bases at the end of the CIGAR,
# keep the CIGAR unchanged
if($cigar =~ m/(\d+)S$/ and $1 >= $n_soft) {
  $out_cigar = $cigar;
}
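One possible way to handle the general case is sketched below in Python. It is not the gist's Perl code, only an illustration under these assumptions: operations that consume read bases (`M`, `I`, `S`, `=`, `X`) are converted to soft clips from the right, operations that consume no read bases (`D`, `N`, `P`) and end up next to the clip are dropped, and a trailing hard clip is preserved. The function name is hypothetical.

```python
import re

QUERY_OPS = set("MIS=X")  # CIGAR operations that consume read bases


def softclip_end(cigar, n_soft):
    """Return `cigar` with (at least) the last n_soft read bases soft-clipped."""
    if n_soft <= 0:
        return cigar
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

    # Set a trailing hard clip aside; it stays at the very end.
    tail = []
    while ops and ops[-1][1] == "H":
        tail.insert(0, ops.pop())

    remaining = n_soft  # read bases that still need to be clipped
    clipped = 0         # read bases converted so far
    while ops and remaining > 0:
        length, op = ops.pop()
        if op not in QUERY_OPS:
            continue  # e.g. a deletion inside the clipped region disappears
        if length > remaining:
            ops.append((length - remaining, op))  # keep the unclipped part
            clipped += remaining
            remaining = 0
        else:
            clipped += length
            remaining -= length

    # Drop operations without read bases that now border the clip.
    while ops and ops[-1][1] not in QUERY_OPS:
        ops.pop()
    # Merge with a soft clip that is left directly before the new one.
    if ops and ops[-1][1] == "S":
        clipped += ops.pop()[0]

    ops.append((clipped, "S"))
    return "".join(f"{n}{op}" for n, op in ops + tail)


print(softclip_end("30M40D50M", 71))
```

For the example values used above, the last 50 matched bases and 21 of the first 30 are clipped and the deletion between them is removed, giving `9M71S`. A CIGAR that already ends in a sufficiently long soft clip, such as `50M80S`, comes back unchanged.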
pmenzel / lca.cpp
Created July 16, 2018 18:23
calculate Least Common Ancestor from NCBI taxon ids
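The idea of a least common ancestor on a taxonomy can be illustrated with a short Python sketch. It does not reproduce lca.cpp or the Kaiju code it builds on: the `parent` dictionary stands in for the child-to-parent relation from the NCBI taxonomy, the function names are hypothetical, and the toy tree below is heavily abbreviated (intermediate ranks are left out).

```python
def lineage(taxid, parent):
    """Return the path from `taxid` up to the root (the root is its own parent)."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def lca(taxids, parent):
    """Return the deepest taxon that is an ancestor of (or equal to) all taxids."""
    paths = [lineage(t, parent) for t in taxids]
    common = set(paths[0]).intersection(*paths[1:])
    # paths[0] runs from the taxon towards the root, so the first shared
    # node encountered is the deepest common ancestor.
    for node in paths[0]:
        if node in common:
            return node
    return None  # not reached if all ids share a single root


# Abbreviated toy taxonomy:
# 562 E. coli -> 561 Escherichia -> 543 Enterobacteriaceae -> 1 root
# 28901 Salmonella enterica -> 590 Salmonella -> 543 Enterobacteriaceae
toy_parent = {562: 561, 561: 543, 28901: 590, 590: 543, 543: 1, 1: 1}

print(lca([562, 28901], toy_parent))  # shared family
print(lca([562, 561], toy_parent))    # one id is an ancestor of the other
```

With the toy tree, the first call returns 543 and the second returns 561. Walking full lineages is simple but slow for many queries, which is presumably why a compiled implementation is used for large inputs.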
/* lca.cpp, 2018, Peter Menzel
1. Download https://github.com/bioinformatics-centre/kaiju/ and compile
2. Copy lca.cpp to kaiju/src
3. Compile lca.cpp with:
g++ -O3 -std=c++11 -I./include/ncbi-blast+ -o lca lca.cpp Config.o util.o bwt/bwt.o bwt/compactfmi.o bwt/sequence.o bwt/suffixArray.o include/ncbi-blast+/algo/blast/core/blast_seg.o include/ncbi-blast+/algo/blast/core/blast_util.o include/ncbi-blast+/algo/blast/core/blast_filter.o include/ncbi-blast+/algo/blast/core/ncbi_std.o include/ncbi-blast+/algo/blast/core/blast_program.o include/ncbi-blast+/algo/blast/core/blast_encoding.o include/ncbi-blast+/algo/blast/core/blast_query_info.o include/ncbi-blast+/algo/blast/core/blast_stat.o include/ncbi-blast+/algo/blast/core/blast_options.o include/ncbi-blast+/algo/blast/core/blast_message.o include/ncbi-blast+/algo/blast/core/ncbi_math.o include/ncbi-blast+/algo/blast/core/pattern.o include/ncbi-blast+/algo/blast/core/blast_psi_priv.o include/ncbi-blast+/algo/blast/core/blast_dynarray.o include/ncbi-blast+/algo