Gabriel Hoffman GabrielHoffman

## gene_CV2.R
# some setup code from tximport
library("tximport")
library("readr")
library("tximportData")
dir <- system.file("extdata", package="tximportData")
samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
samples$condition <- factor(rep(c("A","B"),each=3))
rownames(samples) <- samples$run
samples[,c("pop","center","run","condition")]

## spca_path.R

#' @title  Sparse Principal Component Analysis (spca).
#
#' @description Implementation of SPCA, using variable projection as an optimization strategy.
#
#' @details
#' Adapted from sparsepca package to use initial values to speed up convergence.
#'
#' Sparse principal component analysis is a modern variant of PCA. Specifically, SPCA attempts to find sparse
#' weight vectors (loadings), i.e., a weight vector with only a few 'active' (nonzero) values. This approach

## Multinomial_glmnet.R
library(glmnet)

# You must have glmnet v4 from CRAN
# See documentation: https://glmnet.stanford.edu

# multinomial elastic-net
# you can change alpha
fit = glmnet( X_PRS, metadata$Catogory_factor, family="multinomial", alpha = .8, type.multinomial = "grouped")

# see regularization path

## Random effect meta-analysis.txt
The goal of meta-analysis is to test if an effect is significant by combining multiple studies.  One way of doing this is by using the effect size and standard errors from each study.  In a fixed effect meta-analysis (i.e. FE) the null hypothesis is that effects sizes from all studies are the same and equal zero.  A random effect meta-analysis assumes that the true effect sizes can vary across studies.  The null hypothesis is that the mean of the observed effect sizes is zero and the variance of the effect sizes is zero (i.e. all effect sizes are constant and zero).  in REML (i.e. random effects) the variance of effect sizes is computed and a test is significant if there is sufficient variability.

The choice between FE and REML is the question: should effects of opposite signs be significant?   In practice, if two studies have effect sizes -10 and 10 (i.e. opposite sign) FE will say that the mean effect size is zero and the test is not significant.  But REML will say that the effect sizes vary substantially

## pi1.R
library(qvalue)

p = runif(1000, 0, 1)

# compute pi0, the fraction of tests where the null is supported
# so pi1 = 1-p0 is the fraction of tests where the null is rejected
# This method doesn't depend on p-value cutoffs and works even of nothing
# is genome-wide significant
pi1 = 1 - qvalue(p)$pi0

## Design_matrix_and_contrasts.R
# simulate design matrix wti Dx with 3 categories and Sex with 2
# A: Control
# B: Schiz
# C: Other
info = data.frame(Dx = sample(LETTERS[1:3], 100, replace=TRUE),
                  Sex = sample(c('M', 'F'), 100, replace=TRUE))

table(info$Dx)
#>
#>  A  B  C

## residuals.R
library(edgeR)
library(variancePartition)

dge = DGEList( counts = NEW.COUNTS )
dge = calcNormFactors(dge)

# standard design
form = ~ Dx + scale(RIN) + ageOfDeath + Reported_Gender + scale(IntronicRate) + scale(IntragenicRate) + scale(IntergenicRate)
design = model.matrix(form, COVARIATES)

## NPC_vs_ipsNeuron_expression.R

# Gabriel Hoffman
# September 16, 2020
#
# Find expression of genes in iPS-neurons and NPCs

library(edgeR)
library(limma)
library(ggplot2)
library(reshape2)

## Dream_and_contrasts_demo.R
# Gabriel Hoffman
# October 19, 2020
#
# Examples illustrating using contrasts with dream

library(variancePartition)
register(SnowParam(4))
data(varPartData)

# Since Tissue is a factor, extract the levels

## plotScatterDensity.R
# Gabriel Hoffman
# January 8, 2021

library(ggplot2)
library(viridis)

# Compute two dimensional density
get_density <- function(x, y, n = 250) {
  dens <- MASS::kde2d(x = x, y = y, n = n)
  ix <- findInterval(x, dens$x)
	# some setup code from tximport
	library("tximport")
	library("readr")
	library("tximportData")
	dir <- system.file("extdata", package="tximportData")
	samples <- read.table(file.path(dir,"samples.txt"), header=TRUE)
	samples$condition <- factor(rep(c("A","B"),each=3))
	rownames(samples) <- samples$run
	samples[,c("pop","center","run","condition")]

	#' @title Sparse Principal Component Analysis (spca).
	#
	#' @description Implementation of SPCA, using variable projection as an optimization strategy.
	#
	#' @details
	#' Adapted from sparsepca package to use initial values to speed up convergence.
	#'
	#' Sparse principal component analysis is a modern variant of PCA. Specifically, SPCA attempts to find sparse
	#' weight vectors (loadings), i.e., a weight vector with only a few 'active' (nonzero) values. This approach
	library(glmnet)

	# You must have glmnet v4 from CRAN
	# See documentation: https://glmnet.stanford.edu

	# multinomial elastic-net
	# you can change alpha
	fit = glmnet( X_PRS, metadata$Catogory_factor, family="multinomial", alpha = .8, type.multinomial = "grouped")

	# see regularization path
	The goal of meta-analysis is to test if an effect is significant by combining multiple studies. One way of doing this is by using the effect size and standard errors from each study. In a fixed effect meta-analysis (i.e. FE) the null hypothesis is that effects sizes from all studies are the same and equal zero. A random effect meta-analysis assumes that the true effect sizes can vary across studies. The null hypothesis is that the mean of the observed effect sizes is zero and the variance of the effect sizes is zero (i.e. all effect sizes are constant and zero). in REML (i.e. random effects) the variance of effect sizes is computed and a test is significant if there is sufficient variability.

	The choice between FE and REML is the question: should effects of opposite signs be significant? In practice, if two studies have effect sizes -10 and 10 (i.e. opposite sign) FE will say that the mean effect size is zero and the test is not significant. But REML will say that the effect sizes vary substantially
	library(qvalue)

	p = runif(1000, 0, 1)

	# compute pi0, the fraction of tests where the null is supported
	# so pi1 = 1-p0 is the fraction of tests where the null is rejected
	# This method doesn't depend on p-value cutoffs and works even of nothing
	# is genome-wide significant
	pi1 = 1 - qvalue(p)$pi0
	# simulate design matrix wti Dx with 3 categories and Sex with 2
	# A: Control
	# B: Schiz
	# C: Other
	info = data.frame(Dx = sample(LETTERS[1:3], 100, replace=TRUE),
	Sex = sample(c('M', 'F'), 100, replace=TRUE))

	table(info$Dx)
	#>
	#> A B C
	library(edgeR)
	library(variancePartition)

	dge = DGEList( counts = NEW.COUNTS )
	dge = calcNormFactors(dge)

	# standard design
	form = ~ Dx + scale(RIN) + ageOfDeath + Reported_Gender + scale(IntronicRate) + scale(IntragenicRate) + scale(IntergenicRate)
	design = model.matrix(form, COVARIATES)

	# Gabriel Hoffman
	# September 16, 2020
	#
	# Find expression of genes in iPS-neurons and NPCs

	library(edgeR)
	library(limma)
	library(ggplot2)
	library(reshape2)
	# Gabriel Hoffman
	# October 19, 2020
	#
	# Examples illustrating using contrasts with dream

	library(variancePartition)
	register(SnowParam(4))
	data(varPartData)

	# Since Tissue is a factor, extract the levels
	# Gabriel Hoffman
	# January 8, 2021

	library(ggplot2)
	library(viridis)

	# Compute two dimensional density
	get_density <- function(x, y, n = 250) {
	dens <- MASS::kde2d(x = x, y = y, n = n)
	ix <- findInterval(x, dens$x)