Dave Tang davetang

#!/bin/bash
if type -P gemini > /dev/null
then
  for SEVERITY in HIGH MED LOW; do
    gemini query -q "select impact_so, count(impact_so) from variants where impact_severity == \"$SEVERITY\" group by impact_so order by count(impact_so)" --header *.db > $SEVERITY.tsv
    plot_gemini.R $SEVERITY.tsv
    rm -f $SEVERITY.tsv
  done
#!/usr/bin/env Rscript
#
# Usage: plot_gemini.R <file.tsv>
#
my_required <- c('ggplot2', 'reshape2', 'ggthemes')
for (my_package in my_required){
  # stop early if a required package is not installed
  if (!(my_package %in% rownames(installed.packages()))){
    stop(paste("Please install", my_package, "first"))
  }
davetang / get_sequence.R
Last active March 23, 2017 15:23
From a data frame with chromosomal coordinates, obtain the sequences and calculate their dinucleotide frequencies
#I want to fetch sequences from
#my_random_loci and my_refseq_tss
head(my_random_loci,2)
    chr    start      end strand
1 chr18 59415403 59415407      +
2 chr22  8535632  8535636      -
#install if necessary
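#BiocManager replaced biocLite() from Bioconductor 3.8 onwards
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")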
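As a rough illustration of the dinucleotide frequency step named in the description, the sketch below counts overlapping dinucleotides in a single sequence. It is written in plain Python rather than R and is not the gist's own method; the function name `dinucleotide_frequencies` and the example sequence are hypothetical, and skipping windows with ambiguous bases is an assumption.

```python
from collections import Counter
from itertools import product


def dinucleotide_frequencies(seq):
    """Return the relative frequency of each of the 16 dinucleotides in seq."""
    seq = seq.upper()
    # Slide a 2-base window along the sequence (overlapping dinucleotides).
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: windows containing ambiguous bases such as N are skipped.
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {
        "".join(d): (counts["".join(d)] / total if total else 0.0)
        for d in product("ACGT", repeat=2)
    }


# Hypothetical 5 bp sequence, the same length as the loci shown above.
freqs = dinucleotide_frequencies("ACGCG")
```

The result always has 16 keys, so rows built from different loci line up as columns of a table even when some dinucleotides are absent.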
#!/usr/bin/env perl
# Script to output names and synonyms of HPO terms
use strict;
use warnings;
my $usage = "Usage: $0 <HPO term> [HPO terms]\n";
if (scalar(@ARGV) == 0){
davetang / text_to_hpo_term.pl
Last active March 18, 2016 04:49
Align free text to Human Phenotype Ontology terms
#!/usr/bin/env perl
# Each line of the input file is a query string that is matched to Human Phenotype Ontology (HPO) terms (the subjects)
# If no direct match between the query and a subject is found, a global alignment is performed
# Alignments are only performed between queries and subjects whose lengths do not differ
# by more than 5 characters (including spaces)
# For example, 'short' and 'microphones' differ in length by 6 and will not be compared
# The terms 'short' and 'computer' will be aligned because they differ in length by 3
# Change $threshold if you want to change the length difference threshold
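The length filter and global alignment described in these comments could be sketched as follows. This is a minimal Python illustration, not the Perl script's actual code: the function names, the scoring values (match 1, mismatch -1, gap -1), the case-insensitive comparison, and the choice of `<=` for the threshold test are all assumptions.

```python
def comparable(query, subject, threshold=5):
    """True if the two strings differ in length by at most threshold characters."""
    # Assumption: a difference exactly equal to the threshold is still allowed.
    return abs(len(query) - len(subject)) <= threshold


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, keeping only two rows of the matrix."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_match(query, subjects, threshold=5):
    """Return the subject with the highest alignment score, or None if none is comparable."""
    q = query.lower()
    candidates = [s for s in subjects if comparable(q, s.lower(), threshold)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: global_alignment_score(q, s.lower()))


# Hypothetical query and subject terms for illustration.
hit = best_match("short stature", ["Tall stature", "Short stature", "Microcephaly"])
```

Filtering on length first avoids running the quadratic alignment on pairs that could never score well.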
davetang / split_chr.pl
Last active December 28, 2015 20:19
Script that takes a BED file stream as input and writes each line to the output file for its chromosome. Do not use this script in parallel.
#!/usr/bin/env perl
use strict;
use warnings;
#hash for filehandles
my %fh = ();
#read from stream
while (<>){
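The idea of keeping one file handle per chromosome while reading a stream can be sketched in Python as below. This is an illustration under assumptions, not a port of the Perl script: the function name `split_bed_by_chromosome`, the `<chrom>.bed` output naming, and the skipping of comment, `track` and `browser` lines are all hypothetical choices.

```python
import io
import os
import tempfile


def split_bed_by_chromosome(stream, out_dir):
    """Write each BED line to <out_dir>/<chrom>.bed and return the chromosomes seen."""
    handles = {}  # chromosome name -> open file handle
    try:
        for line in stream:
            # Assumption: blank lines and header lines are not written anywhere.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom = line.split(None, 1)[0]
            if chrom not in handles:
                # Hypothetical naming scheme: one file per chromosome.
                handles[chrom] = open(os.path.join(out_dir, chrom + ".bed"), "w")
            handles[chrom].write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)


# Small in-memory demonstration with made-up intervals.
demo = io.StringIO("chr1\t10\t20\nchr2\t5\t15\nchr1\t30\t40\n")
with tempfile.TemporaryDirectory() as tmp:
    chroms = split_bed_by_chromosome(demo, tmp)
    with open(os.path.join(tmp, "chr1.bed")) as fh:
        chr1_lines = fh.read().splitlines()
```

Because every run opens the same per-chromosome files for writing, two concurrent runs in one directory would overwrite each other, which is one plausible reading of the warning against parallel use.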
davetang / random_bed.pl
Created November 19, 2013 04:03
Randomise a BED file.
#!/usr/bin/env perl
use strict;
use warnings;
my $usage = "Usage: $0 <infile.bed>\n";
my $infile = shift or die $usage;
my %bed = ();
davetang / copy_directory.pl
Last active December 28, 2015 09:29
Perl script that takes two directory paths, one old and one new, compares them, and copies each directory in the old path to the new path if it does not already exist there.
#!/usr/bin/env perl
use strict;
use warnings;
my $usage = "Usage: $0 <old_dir> <new_dir>\n";
my $old = shift or die $usage;
my $new = shift or die $usage;
my %current = ();
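The compare-and-copy behaviour in the description might look like the following in Python. This is a sketch under assumptions rather than a translation of the Perl script: the function name `copy_missing_directories` is hypothetical, and only top-level directories are compared.

```python
import os
import shutil
import tempfile


def copy_missing_directories(old_dir, new_dir):
    """Copy each top-level directory in old_dir that is absent from new_dir."""
    copied = []
    for name in sorted(os.listdir(old_dir)):
        src = os.path.join(old_dir, name)
        dst = os.path.join(new_dir, name)
        # Assumption: plain files are ignored and existing targets are never overwritten.
        if os.path.isdir(src) and not os.path.exists(dst):
            shutil.copytree(src, dst)
            copied.append(name)
    return copied


# Demonstration in temporary directories: "a" is missing from new, "b" already exists.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    os.mkdir(os.path.join(old, "a"))
    os.mkdir(os.path.join(old, "b"))
    os.mkdir(os.path.join(new, "b"))
    copied = copy_missing_directories(old, new)
    a_exists = os.path.isdir(os.path.join(new, "a"))
```

Returning the list of copied names makes it easy to log what changed or to do a dry comparison before trusting the copy.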
davetang / transfac_to_tess.pl
Last active December 24, 2015 12:09
Convert a TRANSFAC matrix into a format readable by TESS (Transcription Element Search System).
#!/usr/bin/env perl
use strict;
use warnings;
my $usage = "Usage: $0 <matrix.dat>\n";
my $infile = shift or die $usage;
my $accession = '';
my $start = 0;
davetang / random_forest.R
Last active December 23, 2015 01:49
From two sets of dinucleotide counts, use random forests to create a predictor
#install if necessary
install.packages("randomForest")
#load library
library(randomForest)
#I have two sets of dinucleotide counts stored in
#my_random_loci_seq_di and my_refseq_tss_seq_di
head(my_refseq_tss_seq_di,2)