Bill Flynn wflynny

wflynny / cellranger_count_scenarios.sh
Created January 23, 2019 19:55
Cellranger count snippets (version 2)
# Some universal variables
NCELLS=6000
OUTPUT_NAME="nice-name"
FASTQ_DIR="/path/to/fastqs"
REFERENCE_GENOME="/path/to/reference_dir"
# Use the PBS-allocated core count when available, otherwise fall back to 20
NCORES="${PBS_NUM_PPN:-20}"
# When reads look like:
# sample-name_S?_L00?_R1_001.fastq.gz
# sample-name_S?_L00?_R2_001.fastq.gz
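The naming pattern above can be checked programmatically before launching a run. The following is a minimal sketch, not part of the original gist: the function name `parse_fastq_name` and the group names are hypothetical, and it assumes the `?` placeholders stand for digits (a numeric sample index after `_S` and a three-digit lane after `_L`).

```python
import re

# Assumed layout: <sample>_S<number>_L<3-digit lane>_<R1|R2|I1|I2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split a FASTQ file name into its parts, or return None if it does not match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None
```

Calling `parse_fastq_name` on each file in `FASTQ_DIR` and collecting the distinct `sample` values is one way to confirm which sample names are present before choosing a scenario.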
wflynny / gpfs_expiration_checker.sh
Created January 24, 2019 20:05
Check file lifetime stats on a GPFS
# I usually put this in my ~/.bash_aliases
# A portion of our GPFS storage removes files 21 days after creation.
# `stat` does not show creation time, so we have to resort to parsing the
# output of `mmlsattr`
ftime() {
    # Usage:
    #   ftime path/to/file
    #
    # Outputs:
wflynny / gist:d6c95deadf0c4d1cce4f01a729314dbb
Created January 24, 2019 21:20
Illumina sequencer identifiers in fastq read headers
# I find myself referring to this thread a lot:
# https://www.biostars.org/p/198143/
# However, I am updating the codes with what I see at JAX
@Mxxxx - MiSeq
@Dxxxx - HiSeq 2500
@Kxxxx - HiSeq 4000
@NSxxx - NextSeq 500/550
@Axxxxx - NovaSeq
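The prefix table above can be turned into a small lookup. This is a rough sketch rather than anything from the original notes: the function name `guess_sequencer` is hypothetical, and it assumes the instrument ID is the first colon-delimited field of the header line, directly after the `@`. The prefix is only a heuristic, so treat the result as a guess.

```python
# Prefix -> instrument family, copied from the list above.
# The two-letter prefix comes first so it is tried before single letters.
INSTRUMENT_PREFIXES = [
    ("NS", "NextSeq 500/550"),
    ("M", "MiSeq"),
    ("D", "HiSeq 2500"),
    ("K", "HiSeq 4000"),
    ("A", "NovaSeq"),
]


def guess_sequencer(header):
    """Return the likely instrument family for a FASTQ header line, or None."""
    if not header.startswith("@"):
        return None
    # Assumption: the instrument ID is the text between "@" and the first ":".
    instrument_id = header[1:].split(":", 1)[0]
    for prefix, name in INSTRUMENT_PREFIXES:
        if instrument_id.startswith(prefix):
            return name
    return None
```

For example, a made-up header such as `@A00123:45:HXXXXXXXX:1:1101:1000:1000 1:N:0:ACGT` would be reported as NovaSeq, while a header with an unlisted prefix returns `None`.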
wflynny / scanpy_cluster_proportions.py
Last active October 13, 2023 17:42
Stacked barplot of scRNA-seq cluster proportions per sample
# scanpy.api is deprecated; import the top-level package instead
import scanpy as sc
import matplotlib.pyplot as plt
import seaborn as sns
def get_cluster_proportions(adata,
                            cluster_key="cluster_final",
                            sample_key="replicate",
                            drop_values=None):
    """
    Input
wflynny / build_10x_reference.sh
Last active February 5, 2019 20:01
Building 10X reference genomes from Ensembl
# Visit the Ensembl ftp site.
# ftp://ftp.ensembl.org/pub/release-95/
#
# You want to find data under the following two URLs:
# 1. ftp://ftp.ensembl.org/pub/release-95/fasta/[YOUR_SPECIES_HERE]/dna/
# 2. ftp://ftp.ensembl.org/pub/release-95/gtf/[YOUR_SPECIES_HERE]/
#
# The first file of interest is under the fasta URL:
# [YOUR_SPECIES_HERE].[ASSEMBLY].dna.primary_assembly.fa.gz
# or, if that doesn't exist,
wflynny / jupyter-server
Created June 6, 2019 15:28
Running jupyter on a cluster
#!/usr/bin/env bash
#### PBS preamble
#PBS -N jupyter-server
#PBS -o /path/to/software/logs/jupyter-server.${PBS_JOBID%%.*}.out
#PBS -j oe
#PBS -m n
#PBS -l mem=128GB
wflynny / jupyter-launch.bash
Last active June 6, 2019 15:31
Bash alias/functions to launch jupyter-server
_grab_ip() {
    jobid=$1
    port=$2
    # Pull the first execution host name out of the full job record
    hostname=$(qstat -f "${jobid}" | grep -oP "exec_host = (\K[a-z0-9]+)")
    echo "http://${hostname}:${port}"
}
_submit_job() {
    queue=$1
    port=$2
wflynny / hto_demux.py
Last active December 18, 2019 19:54
HTO demuxing in python
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
import scanpy as sc
def load_hto_matrix(mtx_dir):
    raw_htos = sc.read_mtx(mtx_dir + "/matrix.mtx.gz").T
    raw_htos.var = pd.read_csv(mtx_dir + "/features.tsv.gz", header=None, index_col=0)
    raw_htos.obs = pd.read_csv(mtx_dir + "/barcodes.tsv.gz", header=None, index_col=0)
    raw_htos = raw_htos[:, ~raw_htos.var_names.isin(["unmapped"])]
wflynny / susage
Last active July 29, 2020 02:00
Small utility to run top or nvidia-smi on a compute node from the login node
#!/usr/bin/env bash
TEMP=$(getopt -o hsg --long help,snapshot,gpu -n 'susage' -- "$@")
if [ $? != 0 ] ; then echo "Terminating..." >&2 ; exit 1 ; fi
# Note the quotes around `$TEMP': they are essential!
eval set -- "$TEMP"
SNAPSHOT=false
import re
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--infile", required=True)
parser.add_argument("-o", "--outfile", required=True)
args = parser.parse_args()
gene_matcher = re.compile('\tgene\t.*gene_id (".*?");.*Name (".*?");')
parent_matcher = re.compile('gene_id (".*?");.*Parent (".*?");')