@slowkow
Created March 12, 2024 16:19
Find the top 10 correlated genes, based on data from ARCHS4
# find_correlated_genes.R
# 2024-03-12
# Kamil Slowikowski
library(arrow) # install.packages("arrow")
# Download the gene-gene correlation matrix from ARCHS4 <https://maayanlab.cloud/archs4/download.html>
# wget https://s3.amazonaws.com/mssm-data/human_correlation_archs4.f
# The file is large:
# > utils:::format.object_size(file.size("human_correlation_archs4.f"), "auto")
# [1] "5.2 Gb"
# Memory-map the data into our R session:
d <- read_feather("human_correlation_archs4.f")
# The data is read lazily, as needed, so the object itself stays small:
# > pryr::object_size(d)
# 6.40 MB
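# Here is a simple function to find the top n (default 10) genes most correlated with my_gene:
correlated_genes <- function(my_gene = "CXCL8", n = 10) {
  stopifnot(my_gene %in% colnames(d))
  i <- which(colnames(d) == my_gene)
  # Use [[ so we get a plain numeric vector, even if d is a tibble
  x <- d[[i]]
  # Sort from highest to lowest correlation
  o <- order(-x)
  values <- x[o]
  # This assumes the rows are in the same gene order as the columns
  names(values) <- colnames(d)[o]
  head(values, n = n)
}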
# Usage examples:
# > correlated_genes("IL6")
#       IL6      CSF3       TNF   ZC3H12A      IL1B     PTGS2     ICAM1      IL27     ACOD1      IL1A
# 1.0000000 0.5144461 0.4947951 0.4922328 0.4828641 0.4776033 0.4670185 0.4648258 0.4582155 0.4370421
#
# > correlated_genes("GRN")
#       GRN      CTSZ      CD68    MAN2B1     FCGRT   SLC15A3      CTSD     IFI30      PSAP     MGAT1
# 1.0000000 0.7919509 0.7654468 0.7480757 0.7458239 0.7423101 0.7395988 0.7245633 0.7099952 0.6936354
#
# > correlated_genes("PLAC8")
#     PLAC8      PLD4      IRF8     VAMP8      RAC2       SCT     MS4A3      GMFG      LSP1     PSME1
# 1.0000000 0.4843658 0.4672987 0.4208228 0.3617905 0.3185449 0.3137622 0.3133707 0.3121387 0.3034305
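#
# A minimal sketch for trying the ranking logic without downloading the large file.
# Assumptions (not part of the original script): the toy matrix `toy`, its made-up
# correlation values, and the helper name `top_correlated()` are all hypothetical.
# Like the ARCHS4 matrix, `toy` is symmetric, with rows in the same order as columns.
toy <- data.frame(
  A = c( 1.0, 0.8, -0.2,  0.5),
  B = c( 0.8, 1.0,  0.1,  0.3),
  C = c(-0.2, 0.1,  1.0, -0.6),
  D = c( 0.5, 0.3, -0.6,  1.0)
)
# Same steps as correlated_genes(), but the data is passed in as an argument
top_correlated <- function(data, my_gene, n = 10) {
  stopifnot(my_gene %in% colnames(data))
  x <- data[[my_gene]]                 # one column as a plain numeric vector
  o <- order(x, decreasing = TRUE)     # row indices, highest correlation first
  values <- x[o]
  names(values) <- colnames(data)[o]   # row i is assumed to be gene colnames(data)[i]
  head(values, n = n)
}
# > top_correlated(toy, "A", n = 3)
#   A   B   D
# 1.0 0.8 0.5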