Skip to content

Instantly share code, notes, and snippets.

@explodecomputer
explodecomputer / bmi-alcohol-ancestry.r
Created December 5, 2025 07:58
BMI-alcohol-ancestry interaction
library(dplyr)
library(data.table)
library(seqminer)
library(stringr)
dat <- fread("/mnt/project/analysis/bmi-alcohol-ancestry/phen_extract.csv")
str(dat)
anc <- fread("/mnt/project/data/ancestry/ukb_InferredAncestry.txt")
str(anc)
@explodecomputer
explodecomputer / reset_rhds.sh
Last active December 5, 2025 07:34
Reset RHDS project
#!/bin/bash
set -e
MY_GITHUB_URL=$1
if [ -z "${MY_GITHUB_URL}" ]; then
echo "Please MY_GITHUB_URL as the argument to this script"
exit 1;
fi
@explodecomputer
explodecomputer / README.md
Last active August 15, 2025 13:49
Quickly obtain rsids from dbsnp for all variants in VCF file

RSID lookup with tabix

Strategy:

  1. Extract the chr/pos/rsid from dbsnp into a .bed file. Here it makes a separate .bed file per chromosome for parallelisation of lookups
  2. Create a tabix index for the .bed files
  3. Extract the chr/pos from the target VCF file
  4. Use tabix to query the target chr/pos against the tabix indexed .bed files. Parellelised across chromosomes using GNU parallel.
  5. Update the VCF file with the extracted RSIDs
@explodecomputer
explodecomputer / bgen_to_plink.sh
Created October 20, 2024 10:41
UKB convert bgen to best guess plink format
@explodecomputer
explodecomputer / pqtl_inst_strength.r
Created October 15, 2023 21:50
Instrument strength of pQTLs
library(readxl)
library(dplyr)
download.file("https://www.biorxiv.org/content/biorxiv/early/2022/06/18/2022.06.17.496443/DC2/embed/media-2.xlsx?download=true", here("data", "media-2.xlsx"))
a <- read_xlsx(here("data", "media-2.xlsx"), sheet="ST6", skip=2) %>%
dplyr::select(
exposure="Assay Target",
id.exposure="Assay Target",
SNP="rsID",
beta.exposure="BETA (discovery, wrt. A1)",
@explodecomputer
explodecomputer / harmonise.r
Last active January 19, 2024 16:06
example harmonisation script for multiple GWAS summary datasets
library(dplyr)
standardise <- function(d, ea_col="ea", oa_col="oa", beta_col="beta", eaf_col="eaf", chr_col="chr", pos_col="pos", vid_col="vid")
{
toflip <- d[[ea_col]] > d[[oa_col]]
d[[eaf_col]][toflip] <- 1 - d[[eaf_col]][toflip]
d[[beta_col]][toflip] <- d[[beta_col]][toflip] * -1
temp <- d[[oa_col]][toflip]
d[[oa_col]][toflip] <- d[[ea_col]][toflip]
d[[ea_col]][toflip] <- temp
@explodecomputer
explodecomputer / exercise.r
Created March 24, 2023 16:47
Data for exercise
library(ieugwasr)
library(dplyr)
ukb <- tophits("ukb-d-I9_IHD")
bbj <- tophits("bbj-a-159", pop="EAS", r2=0.0011)
table(duplicated(c(ukb$rsid, bbj$rsid)))
cl <- bind_rows(ukb, bbj) %>% select(rsid, p) %>% ld_clump()
a <- associations(cl$rsid, c("bbj-a-159", "ukb-d-I9_IHD"))
@explodecomputer
explodecomputer / sib_sim.r
Created December 9, 2022 11:31
Simulate siblings
library(dplyr)
#' Genarte genotypes for nuclear families
#'
#' @param af Vector of allele frequencies for SNPs (length = number of SNPs to simulate)
#' @param nfam Number of families to generate
#'
#' @return List of genotype matrices for each family member type (mum, dad, sib1, sib2)
make_families <- function(af, nfam)
{
@explodecomputer
explodecomputer / group_agenda.r
Last active January 26, 2022 12:05
group_agenda
library(lubridate)
library(dplyr)
library(knitr)
#' Create agenda
#'
#' Alternating weeks of Talks and Discussions.
#' Presenter for a talk leads discussion in the following week.
#'
#' @param start Start date in format "yyyy/mm/dd"
@explodecomputer
explodecomputer / examples.r
Created January 12, 2022 21:29
Reading in data to TwoSampleMR options
# MR-Base (i.e. using the cloud)
library(TwoSampleMR)
metab_ids <- c("met-e-1", "met-e-2")
# Extract data - from API
exp_dat <- extract_instruments("alz_id")
out_dat <- extract_outcome_data(exp_dat$SNP, metab_ids)
# Harmonise and run