Wibowo Arindrarto bow

@bow
bow / abs2rel.py
Last active August 29, 2015 14:04
Script for changing all absolute links in a given directory to symlinks, UNIX only
#!/usr/bin/env python
"""
Script for changing all absolute links in a given directory to symlinks, UNIX only.
Requirements:
* Python >= 2.7.x or Python 3.x
Author:

Keybase proof

I hereby claim:

  • I am bow on github.
  • I am bow (https://keybase.io/bow) on keybase.
  • I have a public key whose fingerprint is 07EF EC69 6E46 0E86 A036 8C94 D4EF 801C 7A10 C00C

To claim this, I am signing this object:

@bow
bow / count_aa_triplet.py
Created June 29, 2011 17:17
Script for counting AA triplet occurrences in a FASTA file.
#!/usr/bin/env python
import random
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
# function to generate random protein sequence
@bow
bow / blastxml_benchmark.py
Created May 20, 2012 20:36
Quick script to compare SearchIO and NCBIXML blast xml parser performance
#!/usr/bin/env python
# quick script to compare SearchIO and NCBIXML blast xml parser performance
searchio="""
from Bio import SearchIO
for result in SearchIO.parse('%s', 'blast-xml'):
    query_id = result.id
@bow
bow / m_cdna2genome.exon
Created July 12, 2012 11:58
Exonerate outputs
Command line: [exonerate -m cdna2genome ../scer_cad1.fa /media/Waterloo/Downloads/genomes/scer_s288c/scer_s288c.fa --bestn 3]
Hostname: [blackbriar]
C4 Alignment:
------------
Query: gi|296143771|ref|NM_001180731.1| Saccharomyces cerevisiae S288c Cad1p (CAD1) mRNA, complete cds
Target: gi|330443520|ref|NC_001136.10| Saccharomyces cerevisiae S288c chromosome IV, complete sequence:[revcomp]
Model: cdna2genome
Raw score: 6146
Query range: 0 -> 1230
@bow
bow / bioinf-format-gotchas.md
Created April 5, 2017 20:05
Bioinformatics file format-specific quirks
  • Coordinates are one-based, fully closed (i.e. positions start at 1 and an interval's end position is included).
  • (for files released by GENCODE & Ensembl) CDS features include the start_codon but not the stop_codon. The stop_codon is included in the UTR instead.
  • Coordinates are zero-based, half-open (i.e. positions start at 0 and an interval's end position is not included).
  • (for the refFlat.txt.gz file available via UCSC) CDS does include both the start and stop codons.
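The two coordinate conventions listed above differ only in how the start position is counted, which a short sketch can make concrete. The function names below are hypothetical and not part of the original gist, and the sketch assumes non-empty intervals on a single sequence.

```python
def closed_to_half_open(start, end):
    """Convert a one-based, fully closed interval to zero-based, half-open."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts: the inclusive one-based end has the same
    # numeric value as the exclusive zero-based end.
    return start - 1, end


def half_open_to_closed(start, end):
    """Convert a zero-based, half-open interval to one-based, fully closed."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


def length_closed(start, end):
    """Number of positions covered by a one-based, fully closed interval."""
    return end - start + 1


def length_half_open(start, end):
    """Number of positions covered by a zero-based, half-open interval."""
    return end - start


if __name__ == "__main__":
    one_based = (100, 200)
    zero_based = closed_to_half_open(*one_based)
    # Both representations describe the same stretch of sequence,
    # so the lengths must agree.
    assert length_closed(*one_based) == length_half_open(*zero_based)
    assert half_open_to_closed(*zero_based) == one_based
    print(one_based, "->", zero_based)
```

Because the end value stays the same, off-by-one errors in conversions usually come from the start coordinate or from the length calculation, which is why the sketch checks that both length formulas agree.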
@bow
bow / handy.sql
Last active April 19, 2018 12:29
Handy PostgreSQL queries
-- View index sizes and some of their stats, largest first.
SELECT idx.relname AS table_name,
idx.indexrelname AS index_name,
pg_size_pretty(pg_relation_size(cls.oid)) AS size,
cls.reltuples AS num_tuples,
idx.idx_scan AS num_scanned,
idx.idx_tup_read AS num_read,
idx.idx_tup_fetch AS num_fetched
FROM pg_stat_user_indexes idx,
pg_class cls,
@bow
bow / cli_argparse.py
Last active April 25, 2018 12:29
Python CLI templates
#!/usr/bin/env python
"""
One-line description.
More elaborate description.
"""
import argparse
@bow
bow / Vagrantfile
Last active November 23, 2021 09:49
Stock Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API version
VAGRANTFILE_API = 2
# Set global VM name.
VM_NAME = ENV["VM_NAME"] || "stock"
# Environment-variable controlled config values with some defaults
@bow
bow / get_ucsc_rrna.sh
Last active November 23, 2021 09:50
Retrieve rRNA regions in the UCSC rmsk track as BED file
#!/usr/bin/env sh
# Script for retrieving rRNA regions denoted in UCSC as a BED file.
# Requirements: mysql and an internet connection.
GENOME_BUILD=${GENOME_BUILD:-hg38}
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A --column-names=FALSE << QUERY
USE ${GENOME_BUILD};
SELECT genoName, genoStart, genoEnd, repName, swScore, strand