Skip to content

Instantly share code, notes, and snippets.

View GabrielHoffman's full-sized avatar

Gabriel Hoffman GabrielHoffman

View GitHub Profile
@GabrielHoffman
GabrielHoffman / gene_CV2.R
Last active July 8, 2020 16:24
Identifying gene with signal using coefficient of variation
# some setup code from tximport
library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")
samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
samples$condition <- factor(rep(c("A","B"),each=3))
rownames(samples) <- samples$run
samples[,c("pop","center","run","condition")]
#' @title Sparse Principal Component Analysis (spca).
#
#' @description Implementation of SPCA, using variable projection as an optimization strategy.
#
#' @details
#' Adapted from sparsepca package to use initial values to speed up convergence.
#'
#' Sparse principal component analysis is a modern variant of PCA. Specifically, SPCA attempts to find sparse
#' weight vectors (loadings), i.e., a weight vector with only a few 'active' (nonzero) values. This approach
library(glmnet)
# You must have glmnet v4 from CRAN
# See documentation: https://glmnet.stanford.edu
# multinomial elastic-net
# you can change alpha
fit = glmnet( X_PRS, metadata$Catogory_factor, family="multinomial", alpha = .8, type.multinomial = "grouped")
# see regularization path
The goal of meta-analysis is to test if an effect is significant by combining multiple studies. One way of doing this is by using the effect size and standard errors from each study. In a fixed effect meta-analysis (i.e. FE) the null hypothesis is that effects sizes from all studies are the same and equal zero. A random effect meta-analysis assumes that the true effect sizes can vary across studies. The null hypothesis is that the mean of the observed effect sizes is zero and the variance of the effect sizes is zero (i.e. all effect sizes are constant and zero). in REML (i.e. random effects) the variance of effect sizes is computed and a test is significant if there is sufficient variability.
The choice between FE and REML is the question: should effects of opposite signs be significant? In practice, if two studies have effect sizes -10 and 10 (i.e. opposite sign) FE will say that the mean effect size is zero and the test is not significant. But REML will say that the effect sizes vary substantially
library(qvalue)
p = runif(1000, 0, 1)
# compute pi0, the fraction of tests where the null is supported
# so pi1 = 1-p0 is the fraction of tests where the null is rejected
# This method doesn't depend on p-value cutoffs and works even of nothing
# is genome-wide significant
pi1 = 1 - qvalue(p)$pi0
# simulate design matrix wti Dx with 3 categories and Sex with 2
# A: Control
# B: Schiz
# C: Other
info = data.frame(Dx = sample(LETTERS[1:3], 100, replace=TRUE),
Sex = sample(c('M', 'F'), 100, replace=TRUE))
table(info$Dx)
#>
#> A B C
library(edgeR)
library(variancePartition)
dge = DGEList( counts = NEW.COUNTS )
dge = calcNormFactors(dge)
# standard design
form = ~ Dx + scale(RIN) + ageOfDeath + Reported_Gender + scale(IntronicRate) + scale(IntragenicRate) + scale(IntergenicRate)
design = model.matrix(form, COVARIATES)
# Gabriel Hoffman
# September 16, 2020
#
# Find expression of genes in iPS-neurons and NPCs
library(edgeR)
library(limma)
library(ggplot2)
library(reshape2)
# Gabriel Hoffman
# October 19, 2020
#
# Examples illustrating using contrasts with dream
library(variancePartition)
register(SnowParam(4))
data(varPartData)
# Since Tissue is a factor, extract the levels
@GabrielHoffman
GabrielHoffman / plotScatterDensity.R
Created January 8, 2021 18:37
plotScatterDensity() using viridis colors
# Gabriel Hoffman
# January 8, 2021
library(ggplot2)
library(viridis)
# Compute two dimensional density
get_density <- function(x, y, n = 250) {
dens <- MASS::kde2d(x = x, y = y, n = n)
ix <- findInterval(x, dens$x)