Dave Tang (davetang)
davetang / transfac_to_tess.pl
Last active December 24, 2015 12:09
Convert the TRANSFAC matrix into a matrix readable by TESS (Transcription Element Search System).
#!/usr/bin/env perl
use strict;
use warnings;
my $usage = "Usage: $0 <matrix.dat>\n";
my $infile = shift or die $usage;
my $accession = '';
my $start = 0;
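The Perl script is truncated above. The following is a rough sketch of the parsing step only, written in R like the other gists. It assumes the usual TRANSFAC layout, where an `AC` line carries the accession and numbered rows (`01`, `02`, ...) carry the A, C, G, T counts. The TESS output format is not reproduced, and `parse_transfac_counts()` and the example lines are hypothetical.

```r
# Hypothetical lines in TRANSFAC matrix.dat style (made up, not real data)
transfac_lines <- c(
  "AC  M00001",
  "XX",
  "P0      A      C      G      T",
  "01      1      2      2      0      S",
  "02      2      1      2      0      R",
  "XX",
  "//"
)

# Parse one matrix: the accession plus a positions x bases count matrix.
# Assumes count rows start with a two-digit position number.
parse_transfac_counts <- function(lines) {
  accession <- sub("^AC\\s+", "", grep("^AC\\s", lines, value = TRUE)[1])
  rows <- grep("^[0-9]{2}\\s", lines, value = TRUE)
  fields <- strsplit(trimws(rows), "\\s+")
  # columns 2-5 of each row are the A, C, G, T counts
  counts <- t(vapply(fields, function(f) as.numeric(f[2:5]), numeric(4)))
  colnames(counts) <- c("A", "C", "G", "T")
  rownames(counts) <- vapply(fields, `[`, character(1), 1)
  list(accession = accession, counts = counts)
}

m <- parse_transfac_counts(transfac_lines)
m$accession
m$counts
```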
davetang / random_forest.R
Last active December 23, 2015 01:49
From two sets of dinucleotide counts, use random forests to create a predictor
#install if necessary
if (!requireNamespace("randomForest", quietly = TRUE)) install.packages("randomForest")
#load library
library(randomForest)
#I have two sets of dinucleotide counts stored in
#my_random_loci_seq_di and my_refseq_tss_seq_di
head(my_refseq_tss_seq_di,2)
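Before training, the two sets need to be stacked into one table with a class label. Below is a minimal sketch, assuming both objects hold one row per region and one column per dinucleotide; the Poisson counts are made-up stand-ins, not the real data.

```r
# Made-up stand-ins for my_random_loci_seq_di and my_refseq_tss_seq_di
set.seed(1)
bases <- c("A", "C", "G", "T")
dinucs <- paste0(rep(bases, each = 4), rep(bases, times = 4))
random_di <- matrix(rpois(5 * 16, 10), nrow = 5, dimnames = list(NULL, dinucs))
tss_di    <- matrix(rpois(5 * 16, 10), nrow = 5, dimnames = list(NULL, dinucs))

# Stack both sets and add a factor response; a factor response makes
# randomForest() build a classification forest
train <- data.frame(rbind(random_di, tss_di))
train$class <- factor(rep(c("random", "tss"), c(nrow(random_di), nrow(tss_di))))

# With the randomForest package loaded, a classifier could then be fit with:
# rf <- randomForest(class ~ ., data = train)
table(train$class)
```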
davetang / intersect_coordinate.R
Last active May 1, 2018 18:43
Given two lists of coordinates, find the ones that overlap/intersect
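#install if necessary (biocLite.R is deprecated; Bioconductor packages are now installed with BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GenomicRanges")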
#load library
library(GenomicRanges)
#create a GRanges object from the data frame my_refseq_loci
head(my_refseq_loci,2)
# refseq_mrna chromosome_name transcript_start transcript_end strand
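GenomicRanges provides `findOverlaps()` for this. To show the rule it applies, here is a plain base-R sketch, assuming 1-based closed intervals as in GRanges: two intervals on the same chromosome overlap when each starts no later than the other ends. The tables `a` and `b` and the helper `overlap_pairs()` are hypothetical.

```r
# Two hypothetical coordinate tables (1-based, closed intervals)
a <- data.frame(chr = c("chr1", "chr1", "chr2"),
                start = c(100, 500, 100), end = c(200, 600, 150))
b <- data.frame(chr = c("chr1", "chr2", "chr2"),
                start = c(150, 150, 300), end = c(160, 170, 400))

# Compare every row of a with every row of b; each outer() result is a
# length(a) x length(b) logical matrix
overlap_pairs <- function(a, b) {
  same_chr   <- outer(a$chr, b$chr, "==")
  starts_ok  <- outer(a$start, b$end, "<=")   # a starts before b ends
  ends_ok    <- outer(a$end, b$start, ">=")   # a ends after b starts
  hits <- which(same_chr & starts_ok & ends_ok, arr.ind = TRUE)
  data.frame(query = hits[, "row"], subject = hits[, "col"])
}

overlap_pairs(a, b)
```

The all-pairs matrices are fine for small tables; for genome-scale lists the interval-tree approach behind `findOverlaps()` scales much better.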
davetang / get_sequence.R
Last active March 23, 2017 15:23
From a data frame with chromosomal coordinates, obtain the sequence, and calculate the dinucleotide frequencies
#I want to fetch sequences from
#my_random_loci and my_refseq_tss
head(my_random_loci,2)
#     chr    start      end strand
# 1 chr18 59415403 59415407      +
# 2 chr22  8535632  8535636      -
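#install if necessary (biocLite.R is deprecated; Bioconductor packages are now installed with BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")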
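Once the sequences are fetched, the dinucleotide counts are overlapping two-letter windows along each sequence. The following is a base-R sketch of that counting step; `dinucleotide_counts()` is a hypothetical helper, and Biostrings' `dinucleotideFrequency()` does the same job on DNAString objects. Frequencies are then the counts divided by their sum.

```r
# Count overlapping dinucleotides in one DNA string; all 16 are reported,
# and windows containing anything other than A/C/G/T (e.g. N) are skipped
dinucleotide_counts <- function(seq) {
  seq <- toupper(seq)
  n <- nchar(seq)
  bases <- c("A", "C", "G", "T")
  all_di <- paste0(rep(bases, each = 4), rep(bases, times = 4))
  if (n < 2) return(setNames(integer(16), all_di))
  di <- substring(seq, 1:(n - 1), 2:n)          # overlapping 2-mers
  counts <- table(factor(di, levels = all_di))   # non-ACGT 2-mers become NA and are dropped
  setNames(as.integer(counts), all_di)
}

# a 5-bp region like those above yields 4 dinucleotides
dinucleotide_counts("acgta")
```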
davetang / biomart_refseq.R
Last active May 12, 2022 06:55
Using biomaRt to fetch all human RefSeq mRNAs and their corresponding chromosome coordinates
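#install if necessary (biocLite.R is deprecated; Bioconductor packages are now installed with BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")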
#load library
library("biomaRt")
#use ensembl mart
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
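The biomaRt result usually needs some tidying before it can be matched against UCSC-style hg19 data. Ensembl reports strand as 1/-1, uses chromosome names without a "chr" prefix, and returns an empty string for transcripts without a RefSeq ID. Note also that the default Ensembl mart now serves GRCh38 coordinates rather than hg19. This is a sketch on made-up rows shaped like the columns shown in the intersect gist; `tidy_biomart()` is hypothetical.

```r
# Made-up rows shaped like getBM() output (IDs are placeholders, not real)
bm <- data.frame(
  refseq_mrna      = c("NM_TOY1", "", "NM_TOY2", "NM_TOY3"),
  chromosome_name  = c("1", "1", "HG1_PATCH", "X"),
  transcript_start = c(1000, 2000, 3000, 4000),
  transcript_end   = c(1500, 2500, 3500, 4500),
  strand           = c(1, -1, 1, -1),
  stringsAsFactors = FALSE
)

tidy_biomart <- function(bm) {
  # keep rows with a RefSeq mRNA ID on the standard chromosomes only
  keep <- bm$refseq_mrna != "" & bm$chromosome_name %in% c(1:22, "X", "Y")
  bm <- bm[keep, ]
  # UCSC-style names, as used by BSgenome.Hsapiens.UCSC.hg19
  bm$chromosome_name <- paste0("chr", bm$chromosome_name)
  # 1 / -1 strand codes to "+" / "-"
  bm$strand <- ifelse(bm$strand == 1, "+", "-")
  rownames(bm) <- NULL
  bm
}

tidy_biomart(bm)
```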
davetang / random_region.R
Last active April 17, 2018 20:56
Sampling random regions of the hg19 genome and obtaining the corresponding sequence
#how many regions to sample?
number <- 50000
#how many bp to add to start (end = start + span, so each region covers span + 1 bp)
span <- 4
#some necessary packages
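#install if necessary (biocLite.R is deprecated; Bioconductor packages are now installed with BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("BSgenome")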
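One way to sample regions uniformly across the genome is to pick a chromosome with probability proportional to its length, then pick a start that keeps `start + span` inside it. The following is a sketch under that assumption. The chromosome lengths are made up; for hg19 they could come from `seqlengths()` on the BSgenome object. `sample_regions()` is hypothetical. Real regions may still fall in N-only gaps.

```r
# Made-up chromosome lengths standing in for seqlengths() of hg19
chrom_len <- c(chr1 = 1000, chr2 = 600, chr3 = 400)

sample_regions <- function(number, span, chrom_len) {
  # longer chromosomes are picked proportionally more often
  chr <- sample(names(chrom_len), number, replace = TRUE, prob = chrom_len)
  # start is an integer in [1, length - span], so end never passes the chromosome end
  max_start <- chrom_len[chr] - span
  start <- floor(runif(number, min = 1, max = max_start + 1))
  data.frame(chr = chr, start = start, end = start + span,
             strand = sample(c("+", "-"), number, replace = TRUE))
}

set.seed(42)
head(sample_regions(number = 10, span = 4, chrom_len = chrom_len))
```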
davetang / cmd_line_param.R
Last active December 22, 2015 17:28
An example R script that takes in 2 command line parameters, checks whether these 2 arguments are digits and then sums them.
#!/usr/bin/env Rscript
#usage
usage <- 'Usage: cmd_line_param.R <integer_1> <integer_2>'
#store command line arguments
args <- commandArgs(trailingOnly = TRUE)
#conditional checks
if (length(args) != 2){
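The rest of the script is cut off. Here is a sketch of the remaining checks, wrapped in a hypothetical function `sum_two_args()` so it can be tried without a shell. In the script, `args` would come from `commandArgs(trailingOnly = TRUE)`. It assumes "digits" means non-negative whole numbers.

```r
# Check there are exactly two arguments, that both are made of digits only,
# then return their sum; stop() with the usage message otherwise
sum_two_args <- function(args) {
  usage <- "Usage: cmd_line_param.R <integer_1> <integer_2>"
  if (length(args) != 2) stop(usage, call. = FALSE)
  if (!all(grepl("^[0-9]+$", args))) {
    stop("Both arguments must be digits.\n", usage, call. = FALSE)
  }
  # as.numeric rather than as.integer avoids integer overflow on long inputs
  sum(as.numeric(args))
}

sum_two_args(c("2", "40"))
```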
davetang / comb_with_replacement.R
Created September 9, 2013 02:43
Calculate the number of combinations with replacement
comb_with_replacement <- function(n, r){
return( factorial(n + r - 1) / (factorial(r) * factorial(n - 1)) )
}
#have 3 elements, choosing 3
comb_with_replacement(3,3)
#[1] 10
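The formula is the binomial coefficient C(n + r - 1, r), so `choose()` gives the same count without building large factorials; R's `factorial()` returns `Inf` from 171 onward. Brute force on small inputs works as a check: list every ordered r-tuple, sort each one so order no longer matters, and count the distinct results. Both helper names are hypothetical.

```r
# Same count as comb_with_replacement(), via the binomial coefficient
comb_with_replacement_choose <- function(n, r) choose(n + r - 1, r)

# Brute force: each multiset is identified by its sorted tuple
count_multisets <- function(n, r) {
  tuples <- expand.grid(rep(list(seq_len(n)), r))
  keys <- apply(tuples, 1, function(x) paste(sort(x), collapse = "-"))
  length(unique(keys))
}

comb_with_replacement_choose(3, 3)
count_multisets(3, 3)
```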
davetang / perm_without_replacement.R
Created September 8, 2013 15:20
Function for calculating the number of permutations without replacement
perm_without_replacement <- function(n, r){
return(factorial(n)/factorial(n - r))
}
#16 choices, choose 16
perm_without_replacement(16,16)
#[1] 2.092279e+13
#16 choices, choose 3
perm_without_replacement(16,3)
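Since n!/(n - r)! is just the product n × (n - 1) × ... × (n - r + 1), multiplying those r terms directly avoids overflow. With `factorial()`, both terms become `Inf` once n exceeds 170 and the quotient is `NaN`. Here is a sketch with a hypothetical name; r = 0 is handled separately because `(n + 1):n` would count down rather than produce an empty range.

```r
# n * (n - 1) * ... * (n - r + 1); there is exactly one way to choose nothing
perm_without_replacement_prod <- function(n, r) {
  if (r == 0) return(1)
  prod((n - r + 1):n)
}

perm_without_replacement_prod(16, 3)    # 16 * 15 * 14
perm_without_replacement_prod(200, 2)   # factorial(200) / factorial(198) would give NaN
```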
davetang / permutation_without_replacement.R
Created September 8, 2013 14:28
Calculating the number of permutations without repetition/replacement
#install if necessary
install.packages('gtools')
#load library
library(gtools)
#urn with 3 balls
x <- c('red', 'blue', 'black')
#pick 2 balls from the urn without replacement
#get all permutations
permutations(n=3,r=2,v=x)
# [,1] [,2]
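`gtools::permutations()` does not allow repeats by default. The same listing can be built in base R, which also confirms the count 3!/(3 - 2)! = 6: enumerate all ordered index tuples and keep those with no repeated index. A sketch; `perms_no_replacement()` is hypothetical, and its row order may differ from gtools'.

```r
# All ordered selections of r distinct elements of v
perms_no_replacement <- function(v, r) {
  idx <- expand.grid(rep(list(seq_along(v)), r))           # every index tuple
  distinct <- apply(idx, 1, function(i) !anyDuplicated(i))  # drop tuples that reuse an item
  matrix(v[as.matrix(idx[distinct, , drop = FALSE])], ncol = r)
}

x <- c("red", "blue", "black")
perms_no_replacement(x, 2)
```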