Skip to content

Instantly share code, notes, and snippets.

Avatar

slava ilnytskyy slavailn

  • Lethbridge
View GitHub Profile
@slavailn
slavailn / cluster_sidebar.R
Created Jul 10, 2020
How to attach a side bar with cluster labels to a heatmap (pheatmap). Just a little snippet.
View cluster_sidebar.R
library(pheatmap)
# We need to use cutree inside pheatmap()
# functio, capture the returned object into a variable,
# and than extract the tree, cut it again to the same
# number of clusters and reorder.
# Resulting vector is converted to factor and used in the annotation data frame
# to create cluster labels
x <- some_matrix # matrix to cluster and show as a heatmap
@slavailn
slavailn / gage_wikipath.R
Created Jun 28, 2020
Run GAGE analysis against wikipathways (modify as neccessary)
View gage_wikipath.R
library(gage)
library(org.Hs.eg.db)
library(pathview)
library(rWikiPathways)
library(DESeq2)
## Get annotation from csv file (obtained via biomaRt)
annot <- read.table("../annotation/annotation.csv", sep = ",", header = T,
quote = "\"", fill = T, stringsAsFactors = F)
@slavailn
slavailn / download_with_geoquery.R
Created Jun 25, 2020
Dowload data from GEO with GEOquery (short note)
View download_with_geoquery.R
library(GEOquery)
# Get GSE
gse <- getGEO("GSE147507",GSEMatrix=FALSE)
head(Meta(gse))
# names of all the GSM objects contained in the GSE
names(GSMList(gse))
# and get the first GSM object on the list
@slavailn
slavailn / get_longest_transcript.R
Created Mar 2, 2020
Get longest transcript for a set of select genes and write them out as fasta
View get_longest_transcript.R
library("GenomicFeatures")
library("DESeq2")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
library("BSgenome.Hsapiens.UCSC.hg19")
library("biomaRt")
setwd("<working_dir>")
# Assuming we are working with DESeq2 projects
# get all genes expressed in the project
@slavailn
slavailn / SPIAcalc.R
Created Aug 12, 2019
Function to calculate SPIA scores for DESeq2 results table
View SPIAcalc.R
resFile <- "DESeq2_result_name" # has to have entrez ids as annotations
compName <- "comparison" # Name of the comparison
calcSPIA <- function(resFile, compName)
{
res <- read.table(resFile, header = T, sep = "\t", fill=T, quote = "\"", stringsAsFactors = F)
# Select differentially expressed genes (adjusted p-value < 0.05)
# This is a named vector, where names are entrez ids, and values are log2 fold changes
ids <- res[res$padj < 0.05,]$entrezgene_id
@slavailn
slavailn / collect_HISAT2_mapping_stats.sh
Created Jun 8, 2019
Collect HISAT2 mapping statistics from multiple summary files located in the current directory and organized them into a table
View collect_HISAT2_mapping_stats.sh
#! /bin/bash
# 22498713 reads; of these:
# 22498713 (100.00%) were unpaired; of these:
# 1754404 (7.80%) aligned 0 times
# 17667104 (78.52%) aligned exactly 1 time
# 3077205 (13.68%) aligned >1 times
# 92.20% overall alignment rate
echo -e "sampleID\ttotal_reads\tunmapped\taligned_one_time\taligned_multiple\talignment_rate"
@slavailn
slavailn / edgeR_GLM_smallRNA.R
Created Feb 20, 2019
Edger GLM approach with 2 treatments over 3 time-points on small RNA tags
View edgeR_GLM_smallRNA.R
library(edgeR)
setwd(<dir>)
files <- list.files("/tag_counts/", full.names = T) # tab delimited file with sequenca tags and raw counts
files
d <- readDGE(files, columns = c(1,2))
counts <- d$counts
sampleFile <- <sampleFile>
sampleInfo <- read.table(sampleFile, sep = "\t", header = T, stringsAsFactors = F)
@slavailn
slavailn / get_bwa_mapping_stats.sh
Created Jan 17, 2019
Get BWA mapping stats (mapped, unmapped reads, number of reads with MAPQ20)
View get_bwa_mapping_stats.sh
#! /bin/bash
echo -e "Samplename\tTotal_reads\tMapped_reads\tUnmapped_reads\tMAPQ20_reads\tPercent_mapped\tPercent_MAPQ20_reads"
for file in ./*bam
do
filename=`basename $file`
samplename=${filename%.bam}
total_reads=`samtools view -c $file`
mapped_reads=`samtools view -c -F 4 $file`
@slavailn
slavailn / Process_ChIP-seq_peaks.R
Created Jan 14, 2019
Process ChIP-seq peaks, based on multiple Bioconductor packages, but centered on ChipSeeker and ClusterProfiler
View Process_ChIP-seq_peaks.R
library(GenomicRanges)
library(ChIPpeakAnno)
library(genomation)
library(biomaRt)
library(ChIPseeker)
library(org.Mm.eg.db)
library(ReactomePA)
library(clusterProfiler)
setwd("<project_dir>")
@slavailn
slavailn / ChipSeeker_vignette.R
Created Jan 10, 2019
Going through ChIPseeker vignette, will modify as necessary for the actual analysis
View ChipSeeker_vignette.R
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(clusterProfiler)
# Gene data from UCSC hg19
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
# Peak file in bed format (these are MACS files)
files <- getSampleFiles()
files
You can’t perform that action at this time.