Skip to content

Instantly share code, notes, and snippets.

View jwaageSnippets's full-sized avatar

jwaageSnippets

View GitHub Profile
@jwaageSnippets
jwaageSnippets / gist:4224188
Created December 6, 2012 12:40
Convert bed with sequences to fasta
# substitute cut -f 20 with column containing sequence. Sed bit removes last line
cat file_with_sequences.bed | cut -f 20 | awk '{print; if (FNR % 1 == 0 ) printf ">" NR "\n";}' | sed -n '$!p' > sequences.fasta
@jwaageSnippets
jwaageSnippets / gist:4258267
Created December 11, 2012 12:36
Get Refseq Genes Using rtracklayer
library("rtracklayer")
session <- browserSession("UCSC")
genome(session)<-"mm9"
query <- ucscTableQuery(session, "refGene")
tableName(query) <- "refGene"
getTable(query) -> refseq
@jwaageSnippets
jwaageSnippets / gist:4266791
Created December 12, 2012 10:39
Change legend in ggplot2
plot + scale_fill_discrete(name="Experimental\nCondition",
breaks=c("ctrl", "trt1", "trt2"),
labels=c("Control", "Treatment 1", "Treatment 2"))
@jwaageSnippets
jwaageSnippets / gist:4317970
Created December 17, 2012 12:33
Convert between human and mouse gene IDs
read.delim("/Volumes/BP_NextGen/jwg/Projects/Porse/TET1/homologene.data", stringsAsFactors=F) -> homologene
split(homologene, f=homologene[,1]) -> hSplit
#Tax ID: Homo 9606
#Mus: 10090
getHumanAndMouseHomologues <- function(x)
{
if (c(9606)%in%x[,2]&&c(10090)%in%x[,2])
{
GTF2TranscriptDB <- function(gtf.file, out.file = NULL, verbose = TRUE)
{
require(rtracklayer)
require(GenomicRanges)
require(GenomicFeatures)
min.info <- c("gene_id", "transcript_id", "exon_number")
if (verbose) message("Importing ", gtf.file)
gtf <- import.gff(gtf.file, asRangedData = FALSE)
@jwaageSnippets
jwaageSnippets / gist:4994974
Created February 20, 2013 11:47
Sort tab delimited file by column 18 in linux
sort -b -k18 inputfile.txt > outputfile.txt
@jwaageSnippets
jwaageSnippets / gist:5133941
Created March 11, 2013 12:34
Get refseq genes from UCSC, and extract longest transcripts for each gene symbol
library("rtracklayer")
session <- browserSession("UCSC")
genome(session)<-"mm9"
query <- ucscTableQuery(session, "refGene")
tableName(query) <- "refGene"
getTable(query) -> refseq
refseq[,c(2,3,4,5,6,7,8,13)] -> refseq
refseq$"width" <- refseq$"txEnd"-refseq$"txStart"
as.character(refseq[,1]) -> refseq[,1]