LouisFaure

LouisFaure / check_partition.sh
Created May 30, 2025 16:47
Check all node resources
#!/bin/bash
# --- Configuration ---
PARTITION_NAME="componc_gpu" # Define the partition name here
# --- Function to convert memory string (e.g., "972G", "1031308M") to MB ---
convert_mem_to_mb() {
    local mem_str="$1"
    local value
    local unit
    if [[ "$mem_str" =~ ([0-9]+)([MG]) ]]; then
        value="${BASH_REMATCH[1]}"
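The preview is cut off inside `convert_mem_to_mb`. As a rough illustration of the conversion its comment describes, here is a Python sketch; it is not the gist's Bash code. It reuses the same `([0-9]+)([MG])` pattern and assumes 1G = 1024M. The function name `mem_to_mb` and the `None` return for unmatched input are assumptions made for this example.

```python
import re

# Same pattern as the Bash snippet: digits followed by an M or G unit.
_MEM_RE = re.compile(r"([0-9]+)([MG])")


def mem_to_mb(mem_str):
    """Convert a memory string such as "972G" or "1031308M" to megabytes.

    Assumption: 1G = 1024M. Returns None when the string does not match.
    """
    match = _MEM_RE.search(mem_str)  # unanchored search, like Bash's =~
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2)
    return value * 1024 if unit == "G" else value


print(mem_to_mb("972G"))
print(mem_to_mb("1031308M"))
```

Returning `None` instead of raising keeps the caller in charge of how to treat nodes whose memory field is missing or malformed.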
LouisFaure / check_node_resources.sh
Created May 23, 2025 16:16
Check available resources in SLURM node
#!/bin/bash
# This script parses the output of 'scontrol show node <node_name>'
# to determine the available CPU, memory, and GPUs for a new task.
# Check if a node name is provided as an argument
if [ -z "$1" ]; then
    echo "Usage: $0 <node_name>"
    echo "Example: $0 iscg009"
    exit 1
fi

Negative Binomial Modeling for scRNA-seq

The negative binomial (NB) distribution is the standard statistical model for integer gene expression counts in single-cell RNA-seq (scRNA-seq). It is favored over the Poisson distribution because real scRNA-seq data are overdispersed: their variance often greatly exceeds their mean, owing to both biological and technical factors (cell heterogeneity, bursty transcription, varied sequencing depth, dropouts, etc.).

The Poisson assumption that the variance equals the mean fails in this context. In contrast, the NB augments the Poisson distribution with a dispersion parameter ($\theta$), so its variance is $\sigma^2 = \mu + \mu^2 / \theta$ (for mean $\mu$). A smaller $\theta$ means stronger overdispersion, and the Poisson is recovered as $\theta \to \infty$. This flexibility means that two genes with equal mean expression but different variances (as is commonly seen) can both be modeled simply by adjusting $\theta$.
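To make the variance formula concrete, the short sketch below evaluates $\sigma^2 = \mu + \mu^2 / \theta$ for one mean and several dispersion values. The helper name `nb_variance` and the chosen numbers are illustrative assumptions, not part of the original text.

```python
def nb_variance(mu, theta):
    """NB variance under the mean/dispersion form: mu + mu^2 / theta."""
    return mu + mu ** 2 / theta


mu = 10.0  # same mean expression for every hypothetical gene
for theta in (1.0, 5.0, 1e6):
    var = nb_variance(mu, theta)
    # A Poisson model would force the variance to equal mu.
    print(f"theta={theta:g}: NB variance={var:.4f}, Poisson variance={mu:.4f}")
```

With the mean fixed, only $\theta$ changes the variance, and a very large $\theta$ brings the NB variance close to the Poisson value.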

From a generative perspective, the NB arises as a Poisson–Gamma mixture: each cell/gene’s unknown expression rate $\lambda$ is drawn from a Gamma distribution
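The Poisson–Gamma construction can be checked numerically. The text is cut off before the Gamma parameters are given, so the sketch below assumes the usual choice that matches the variance formula above: shape $\theta$ and scale $\mu / \theta$, giving a Gamma mean of $\mu$ and a Gamma variance of $\mu^2 / \theta$. The function name `sample_nb_via_mixture` and the values of $\mu$ and $\theta$ are illustrative.

```python
import numpy as np


def sample_nb_via_mixture(mu, theta, size, rng):
    """Draw counts as Poisson(lambda) with lambda ~ Gamma.

    Assumed parameterization: shape = theta, scale = mu / theta.
    """
    lam = rng.gamma(shape=theta, scale=mu / theta, size=size)
    return rng.poisson(lam)


rng = np.random.default_rng(0)
mu, theta = 5.0, 2.0
counts = sample_nb_via_mixture(mu, theta, 200_000, rng)

# Empirical moments should land near mu and mu + mu^2 / theta.
print("empirical mean:", counts.mean(), "expected:", mu)
print("empirical variance:", counts.var(), "expected:", mu + mu ** 2 / theta)
```

By the law of total variance, the Poisson step contributes $\mu$ and the Gamma step contributes $\mu^2 / \theta$, which is where the two terms of the NB variance come from.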

LouisFaure / get_jax_gpu_mem.py
Created August 3, 2024 22:31
Get gpu memory used by current JAX script
import subprocess as subp
import re
import jax  # needed for jax.default_backend() below

if jax.default_backend()=='gpu':
    # Step 2: Call nvidia-smi and get the output
    result = subp.run([
        'nvidia-smi',
        '--query-compute-apps=pid,used_memory',
        '--format=csv,noheader,nounits'
    ], stdout=subp.PIPE, text=True)
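The preview ends before the `nvidia-smi` output is used. One way to turn that output into a per-process figure is sketched below; this is an assumption about the next step, not the gist's own code. With `--format=csv,noheader,nounits`, each line is expected to look like `pid, used_memory`, with memory in MiB. The function name `used_memory_for_pid` and the sample text are hypothetical.

```python
def used_memory_for_pid(csv_text, pid):
    """Sum the used memory (MiB) reported for `pid` in nvidia-smi CSV text.

    Returns None if the pid does not appear. Lines that cannot be parsed,
    for example a non-numeric memory field, are skipped.
    """
    total = None
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2:
            continue
        try:
            line_pid, used = int(fields[0]), int(fields[1])
        except ValueError:
            continue
        if line_pid == pid:
            # A process may appear once per GPU it uses, so accumulate.
            total = used if total is None else total + used
    return total


# Hypothetical output for illustration; real values come from result.stdout.
sample = "1234, 512\n5678, 2048\n1234, 256\n"
print(used_memory_for_pid(sample, 1234))
```

In a running script, the same function could be called with `result.stdout` and `os.getpid()` to look up the current process.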
LouisFaure / get_raw.py
Last active November 6, 2024 15:21
rescale log-norm scRNAseq counts back to raw
import scipy.sparse as sp
import numpy as np
def get_raw(adata):
    X = adata.X
    if not sp.isspmatrix_csr(X):
        raise Exception("matrix needs to be in csr format")
    if is_integer_vector(adata.X[0,:].data):
        raise Exception("matrix is already in integer format")
    row_mins = np.minimum.reduceat(X.data, X.indptr[:-1])
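The preview stops at the row-minimum step. The sketch below shows what `np.minimum.reduceat(X.data, X.indptr[:-1])` computes on a CSR matrix and how per-row minima could be used to undo a log1p library-size normalization. The rescaling is an assumption about where the function is heading, not the gist's code. It relies on two further assumptions: every row has at least one stored value, and the smallest nonzero count in each cell is exactly 1. The toy matrix and the 1e4 scaling target are illustrative.

```python
import numpy as np
import scipy.sparse as sp

# Toy raw counts (cells x genes); every row contains a count of exactly 1.
counts = sp.csr_matrix(np.array([[1, 0, 3, 2],
                                 [0, 4, 1, 0],
                                 [2, 1, 0, 5]], dtype=np.float64))

# Row index of each stored value, derived from the CSR row pointers.
row_of = np.repeat(np.arange(counts.shape[0]), np.diff(counts.indptr))

# Forward step: scale each cell to 1e4 total counts, then apply log1p.
lib_size = np.asarray(counts.sum(axis=1)).ravel()
X = counts.copy()
X.data = np.log1p(X.data * 1e4 / lib_size[row_of])

# Per-row minimum of the stored (nonzero) values, as in the snippet.
# reduceat is only valid here because no row is empty.
row_mins = np.minimum.reduceat(X.data, X.indptr[:-1])

# Undo log1p; dividing by the value that maps to count 1 removes the scaling.
raw = X.copy()
raw.data = np.rint(np.expm1(X.data) / np.expm1(row_mins[row_of]))

print(raw.toarray())
```

If a cell has no count of 1, the recovered values are off by a constant factor for that cell, so the result is worth checking against any known raw totals.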
LouisFaure / set_R_HOME.py
Created April 19, 2024 19:57
set conda environment specific R_HOME before import rpy2
import os, sys
os.environ['R_HOME'] = sys.exec_prefix+"/lib/R/"
import rpy2
LouisFaure / muon_to_scFates.py
Last active January 22, 2023 11:35
Mudata scFates test
import mudatasets
import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import muon as mu
mdata = mudatasets.load("brain3k_multiome", full=True)
mdata.var_names_make_unique()
LouisFaure / gpu_wrappers.py
Last active January 19, 2024 17:22
some GPU accelerated function for scanpy
import scanpy as sc
import cupy as cp
from scipy.sparse import csr_matrix, find, issparse
from scipy.sparse.linalg import eigs
import numpy as np
import pandas as pd
import cudf
import glob
from cupy.sparse import cupyx as cpx
from cupyx.scipy.sparse import coo_matrix as coo_matrix_gpu
LouisFaure / dropEst2adata.py
Last active March 1, 2024 14:43
Convert dropEst velocyto output to anndata
#!/usr/bin/env python
import sys
import warnings
warnings.filterwarnings("ignore")
import logging
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from anndata2ri.scipy2ri._r2py import rmat_to_spmat
import anndata as ann
import pandas as pd