@mkweskin
mkweskin / mamba_install.sh
Last active February 16, 2024 18:45
create a conda env with one bioconda package and all its dependencies
#!/bin/sh
# creates a new env for a bioconda package and its dependencies. Then, it displays the list
# of newly installed binaries for the focal program.
# The directory where the env is created is: ~/share/apps/bioinformatics/[program]/conda/[version]/
if [ -z "$1" ]; then
echo "error: give the name of the bioconda program to be installed when calling this program"
echo " example: $0 augustus"
exit 1
@mkweskin
mkweskin / randomize_columns.md
Created October 24, 2022 17:57
Randomize columns in a CSV

Method to randomize column order in a CSV using Bash

I received a question about how to randomize column order in a text file. I came up with a method that uses common Unix command-line tools (sort, sed, tr, join) and the Bash shell. It has been tested with the BSD command-line tools on macOS 12 (running zsh) and with the GNU command-line tools on CentOS (running bash).

NOTE: this does not handle quoted commas in the CSV. The only commas should be the delimiters.
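
For comparison, the same idea can be sketched in Python. This is not the method used in this gist (which relies on sort, sed, tr, and join); it is a minimal illustration that assumes plain comma delimiters with no quoted commas and the same number of fields in every row. The function name `shuffle_columns` and its `seed` parameter are hypothetical, chosen only for this example.

```python
import random


def shuffle_columns(lines, seed=None):
    """Return the rows of a simple CSV with their columns in one random order.

    Assumes plain comma delimiters (no quoted commas) and that every row
    has the same number of fields as the first row.
    """
    rows = [line.rstrip("\n").split(",") for line in lines]
    if not rows:
        return []
    # Build one permutation of the column indices and reuse it for every row,
    # so values stay aligned with their original column.
    order = list(range(len(rows[0])))
    random.Random(seed).shuffle(order)
    return [",".join(row[i] for i in order) for row in rows]


if __name__ == "__main__":
    sample = ["a,b,c,d", "1,2,3,4", "5,6,7,8"]
    for out in shuffle_columns(sample, seed=0):
        print(out)
```

Because a single permutation is applied to all rows, each value stays paired with its original header; only the left-to-right order of the columns changes.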

Randomize all columns

NUM_COLS=10
NUM_RANDOMIZED_OUTPUT=5
@mkweskin
mkweskin / viridal_def_file
Created December 17, 2021 15:22
VirIdAl singularity def file
Bootstrap: docker
From: continuumio/miniconda3
%post
/opt/conda/bin/conda config --add channels defaults
/opt/conda/bin/conda config --add channels bioconda
/opt/conda/bin/conda config --add channels conda-forge
/opt/conda/bin/conda config --add channels omnia
/opt/conda/bin/conda config --add channels plotly
@mkweskin
mkweskin / batch-find-replace.py
Created April 16, 2020 18:32
A general utility to do a batch find/replace. Takes a translation file with the find/replace pairs and a file to be translated.
#!/usr/bin/env python3
"""
Author: Matthew Kweskin, github: @mkweskin
A general utility to read in a delimited translation file with two columns
and rename any text file with these values.
"""
import argparse
@mkweskin
mkweskin / fasta-names-to-md5.py
Last active April 16, 2020 18:30
Python3 script to convert the sequence names in a FASTA file to the md5 hash of the sequence. Requires: Python3, biopython
#!/usr/bin/env python3
"""
Author: Matthew Kweskin, github: @mkweskin
This script converts the sequence names in a FASTA file
to the md5 hash of the sequence.
Notes:
- Gaps will automatically be removed by biopython
@mkweskin
mkweskin / putty.md
Last active February 22, 2020 21:37
My PuTTY settings

My preferred settings for PuTTY (0.71)

  • Terminal

    • Bell
      • Action to happen when a bell occurs : None (bell disabled)
  • Window

    • Lines of scrollback: 50000
    • Appearance
  • Font : Courier Std, 12 pt

@mkweskin
mkweskin / gist:e531e65791c1d8036dd720ef3baf8af6
Created January 22, 2020 20:12
Github markdown to Jira/Confluence markup using pandoc
pandoc -f gfm -w jira -o outfile.jira infile.md
# To import converted file into Confluence:
# - Create new page
# - Click on the body of the page, click on the " + \/" dropdown in toolbar ("Insert more content") and select "Markup"
# - Paste the contents into the pop-up window (select "Confluence wiki" as the format)
# - Note: The "Markdown" option in the import pop-up doesn't seem to work for Github flavored markdown (gfm).
@mkweskin
mkweskin / featuretofasta.py
Last active July 18, 2019 11:56
Takes a tab-delimited BIOM feature table that lists sequences for each sample and a fasta file with all sequences for all samples. Outputs a separate fasta file for each sample containing only the sequences found in that sample. Optionally, you can specify a minimum count for each sequence for each sample and a minimum proportion (calculated wit…
#!/usr/bin/env python3
import pandas as pd
from Bio import SeqIO
import argparse
import os
import logging
import sys
import datetime
@mkweskin
mkweskin / merge-phyluce-matches.py
Created April 5, 2019 12:40
This will merge two databases that are output from the `phyluce_assembly_match_contigs_to_probes` step in phyluce.
#!/usr/bin/env python3
import sqlite3
import sys
import shutil
import os
#Existing DB
exist="probe.matches.sqlite"
#To be added DB
@mkweskin
mkweskin / find.md
Last active November 8, 2018 19:02
Linux `find` basics

Linux find

find is used to find files based on their name or other attributes of the file.

  • It does not search inside files like the grep command
  • To find find examples for a specific purpose, I do a web search for: linux find ...

Basic find command: search for a filename

find /path/to/dir -iname '*text*'

Dissecting this:

  • /path/to/dir: where you want to search, I often use . for my current directory