explodecomputer/reverse-mr.rmd

## reverse-mr.rmd
---
title: "Reverse MR Sims"
output: html_notebook
---


```{r}
library(simulateGP)
library(TwoSampleMR)
```

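Model, where `gl` is the genetic liability for y (`l` in the code below), `u` is a confounder of x and y, and x has no causal effect on y (`bxy <- 0`):

```
g -> gl -> x <- u
        -> y <- u
```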


```{r}
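n <- 10000   # sample size
nsnp <- 100  # number of SNPs
af <- 0.3    # allele frequency

g <- make_geno(n, nsnp, af)   # genotype matrix
u <- rnorm(n)                 # confounder of x and y
bgl <- rnorm(nsnp)            # SNP effects on the genetic liability for y

l <- scale(g %*% bgl)         # standardized genetic liability for y

bux <- 0.1   # confounder effect on x
buy <- 0.1   # confounder effect on y
bxy <- 0     # no causal effect of x on y

blx <- 0.02  # effect of the liability on x
bly <- 0.5   # effect of the liability on y

x <- make_phen(c(blx, bux), cbind(l, u))
y <- make_phen(c(bly, buy, bxy), cbind(l, u, x))

summary(lm(x ~ l))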
```

Observational association between x and y, e.g. x measured before disease onset and y being case-control status some time later:

```{r}
cor(x,y)
```

Compare against the association that reverse MR targets, between the genetic liability `l` and x:

```{r}
cor(l, x)
```
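Under the simulated model the two correlations above can be worked out by hand. This assumes `make_phen` rescales the residual variance so that x and y each have unit variance, so that `blx`, `bux`, `bly` and `buy` act as standardized effects of the unit-variance `l` and `u` (written $b_{lx}$, $b_{ux}$, $b_{ly}$, $b_{uy}$ below). With `bxy <- 0`:

$$
\operatorname{cor}(x, y) \approx b_{lx} b_{ly} + b_{ux} b_{uy} = 0.02 \times 0.5 + 0.1 \times 0.1 = 0.02
$$

$$
\operatorname{cor}(l, x) \approx b_{lx} = 0.02
$$

So with these parameter values half of the observational correlation comes through the liability for y and half through the confounder u.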


The sample size for reverse MR depends on the dataset that has x and g measured.
The sample size for observational prediction depends on the dataset that has x and y measured.


```{r}
dat <- get_effs(x, y, g)
str(mr(dat, method_list = "mr_ivw"))
```


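These two are approximately equivalent when the SNPs are independent:

- MR using fixed-effects IVW, which combines the per-SNP ratio estimates `bgt1 / bgt2` (SNP effect on t1 over SNP effect on t2)
- PRS, regressing `t1 ~ score_t2`, where `score_t2` is a polygenic score for t2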
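A minimal base-R sketch of that equivalence, assuming independent SNPs and a continuous exposure `t2` and outcome `t1`; all names and effect sizes here are hypothetical and chosen only for illustration. The fixed-effects IVW estimate combines the ratios `bgt1 / bgt2` with weights `bgt2^2 / se(bgt1)^2`, which simplifies to the expression in the code; the PRS estimate regresses `t1` on a score built from the same SNP-exposure effects. The two should agree closely, differing only through the small cross-SNP covariance terms that vanish when SNPs are uncorrelated.

```{r}
# Hypothetical simulation: independent SNPs, exposure t2, outcome t1
set.seed(1)
n_demo <- 10000
nsnp_demo <- 20
g_demo <- matrix(rbinom(n_demo * nsnp_demo, 2, 0.3), n_demo, nsnp_demo)
t2 <- drop(g_demo %*% rnorm(nsnp_demo, sd = 0.1)) + rnorm(n_demo)
t1 <- 0.3 * t2 + rnorm(n_demo)   # hypothetical causal effect of t2 on t1

# Per-SNP marginal regressions: estimate and standard error of each SNP effect
snp_effect <- function(trait, j) {
  coef(summary(lm(trait ~ g_demo[, j])))[2, 1:2]
}
bgt1 <- t(sapply(seq_len(nsnp_demo), function(j) snp_effect(t1, j)))
bgt2 <- t(sapply(seq_len(nsnp_demo), function(j) snp_effect(t2, j)))

# Fixed-effects IVW: sum(bgt2 * bgt1 / se1^2) / sum(bgt2^2 / se1^2)
w <- 1 / bgt1[, 2]^2
ivw <- sum(w * bgt2[, 1] * bgt1[, 1]) / sum(w * bgt2[, 1]^2)

# PRS: weight each SNP by its effect on t2, then regress t1 on the score
score_t2 <- drop(g_demo %*% bgt2[, 1])
prs <- unname(coef(lm(t1 ~ score_t2))[2])

c(ivw = ivw, prs = prs)
```

The PRS form needs individual-level data for t1, whereas IVW only needs summary statistics; the sketch is meant to show why the two give near-identical answers, not to replace `mr()`.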


# Strategy

Sample 1: do a GWAS to identify causal variants for y; what is discovered depends on the heritability and the sample size.

- we know how many hits there are for glioma, their effects, and the variance they explain
- we know the heritability of glioma
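A rough sketch of how discovery in sample 1 depends on heritability and sample size. It treats y as a quantitative trait on the liability scale (for case-control glioma data the effective sample size would need adjusting) and assumes a hypothetical architecture of `m` causal variants with equal effects. A variant explaining `r2` of the variance gives a 1-df chi-squared test with non-centrality roughly `n * r2 / (1 - r2)`, so the expected number of hits and the heritability they capture follow directly. The helper `gwas_power` and the values of `m` and `n` are illustrative assumptions, not taken from any package or study.

```{r}
# Power to detect one variant at genome-wide significance (1-df chi-squared test)
# r2: variance explained by the variant; n: GWAS sample size
gwas_power <- function(r2, n, alpha = 5e-8) {
  threshold <- qchisq(alpha, df = 1, lower.tail = FALSE)
  ncp <- n * r2 / (1 - r2)
  pchisq(threshold, df = 1, ncp = ncp, lower.tail = FALSE)
}

# Hypothetical architecture: h2 of y spread equally over m causal variants
h2_y <- 0.25
m <- 1000
r2_each <- rep(h2_y / m, m)
pow <- gwas_power(r2_each, n = 50000)   # hypothetical GWAS sample size

expected_hits <- sum(pow)             # expected number of discovered variants
h2_discovered <- sum(pow * r2_each)   # expected h2 captured by the hits
c(hits = expected_hits, h2_discovered = h2_discovered)
```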

Sample 2: reverse MR of the discovered genetic variants for y against x, with sample sizes for x based on available data.

- the MR association, and the power to detect it, depend on
  - the fraction of the liability of y that has been discovered, and its correlation with x
  - the sample size

Sample 3: individual-level data where x and y are both measured, giving the observational association.

- the correlation between x and y, and the power to detect it, will depend on
  - the association of the total liability of y with x
  - the association of the total liability of y with y
  - the confounding effect
  - the sample size


Get these from the literature, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4667278/:

- h2 of y <- 0.25; this generates the liability used for observational prediction
- h2 of y explained by known hits <- 0.06; this generates the instruments for glioma, which are used for reverse MR (same source)

Get these from the literature:

- sample size for genotype + proteins (e.g. Sun et al.) <- 3301
- sample size for glioma + protein measures <- 500

- confounder effects on x and y <- varied (we might get a sense of this by comparing Phil's MR of telomere -> glioma with the observational study by Bondy)
- effect of glioma liability on x <- varied
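The numbers above can be combined into a rough power comparison of the two designs. This is a sketch under strong assumptions: standardized effects as in the simulation model, y treated as a continuous liability rather than case-control status, the discovered hits explaining `h2_y_explained` of the variance in y while the full genetic liability explains `h2_y`, and a Fisher-z approximation for testing a correlation. The values of `b_lx`, `b_ux` and `b_uy` are placeholders for the parameters that are varied, and `cor_power` is a hypothetical helper, not a function from simulateGP or TwoSampleMR.

```{r}
# Approximate power of a two-sided test that a correlation r is non-zero (Fisher z)
cor_power <- function(r, n, alpha = 0.05) {
  z <- atanh(r) * sqrt(n - 3)
  crit <- qnorm(1 - alpha / 2)
  pnorm(z - crit) + pnorm(-z - crit)
}

# Values taken from the notes above
h2_y <- 0.25            # heritability of y
h2_y_explained <- 0.06  # variance in y explained by known hits
n_rev_mr <- 3301        # genotype + proteins
n_obs <- 500            # glioma + protein measures

# Placeholder values for the parameters that are varied
b_lx <- 0.1             # effect of the (standardized) genetic liability of y on x
b_ux <- 0.1             # confounder effect on x
b_uy <- 0.1             # confounder effect on y

# Reverse MR: x against a score of the discovered hits only. The score
# correlates with the full genetic liability at sqrt(h2_y_explained / h2_y).
r_rev_mr <- b_lx * sqrt(h2_y_explained / h2_y)

# Observational: liability path (cor(l, y) = sqrt(h2_y)) plus confounding
r_obs <- b_lx * sqrt(h2_y) + b_ux * b_uy

c(reverse_mr = cor_power(r_rev_mr, n_rev_mr),
  observational = cor_power(r_obs, n_obs))
```

Note that `sqrt(h2_y)` equals `bly <- 0.5` in the simulation above, so the observational correlation here uses the same liability-to-y path as the simulated data.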


Future work:

- how does the gl -> x association change over time?

The overall question:

- how plausible is it to use reverse MR to identify non-causal predictors?


$$
g_1 \rightarrow x \rightarrow y \leftarrow G_{2 \ldots m}
$$

$$
r^2 = \frac{\operatorname{cov}(G, x)^2}{\operatorname{var}(x)\,\operatorname{var}(G)}, \qquad G = \sum_i g_i b_i
$$
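A quick numerical check of the r2 formula above, assuming `G` is a weighted score of independent SNPs and `x` is a continuous trait partly driven by it (the simulated values are hypothetical). The formula is the squared correlation between `G` and `x`, which for a simple linear regression with an intercept equals the R-squared from regressing `x` on `G`.

```{r}
set.seed(2)
n_chk <- 5000
m_chk <- 10
g_chk <- matrix(rbinom(n_chk * m_chk, 2, 0.3), n_chk, m_chk)
b_chk <- rnorm(m_chk)

# G = sum(g_i * b_i) for each individual
G <- drop(g_chk %*% b_chk)
x_chk <- 0.2 * as.numeric(scale(G)) + rnorm(n_chk)

r2_formula <- cov(G, x_chk)^2 / (var(x_chk) * var(G))
r2_lm <- summary(lm(x_chk ~ G))$r.squared
c(formula = r2_formula, lm = r2_lm)
```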