Anand Mayakonda PoisonAlien

@PoisonAlien
PoisonAlien / bwview.sh
Created July 27, 2023 09:13
subset a bigWig file
#!/usr/bin/env bash
#Script to subset a bigWig file to user-specified loci
#MIT License (Anand Mayakonda; anandmt3@gmail.com)
function usage (){
echo "Subset a bigWig file for genomic loci.
Requires UCSC kentutils bigWigToBedGraph and bedGraphToBigWig to be installed
Binaries available from: https://hgdownload.soe.ucsc.edu/admin/exe/
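The two UCSC tools named in the usage text can be chained to cut one region out of a bigWig. The following is only a minimal sketch of that idea, not the gist's actual implementation: the function name `subset_bigwig`, the `DRY_RUN` switch, the `chr:start-end` region format, and the temporary bedGraph naming are all assumptions made for illustration. It relies on the `-chrom`, `-start`, and `-end` options of `bigWigToBedGraph` and on `bedGraphToBigWig` taking a chromosome sizes file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original gist): subset a bigWig to one region.
# Assumes bigWigToBedGraph and bedGraphToBigWig are on PATH.
# Set DRY_RUN=1 to print the commands instead of running them.
subset_bigwig() {
  local in_bw="$1"
  local region="$2"        # assumed format: chr:start-end
  local chrom_sizes="$3"   # two-column file: chromosome name, length
  local out_bw="$4"

  local chrom="${region%%:*}"   # text before the first ':'
  local range="${region#*:}"    # text after the first ':'
  local start="${range%%-*}"    # text before the '-'
  local end="${range#*-}"       # text after the '-'
  local tmp_bg="${out_bw%.bw}.bedGraph"

  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"

  # Step 1: extract the region as bedGraph. Step 2: convert it back to bigWig.
  $run bigWigToBedGraph -chrom="$chrom" -start="$start" -end="$end" "$in_bw" "$tmp_bg" &&
  $run bedGraphToBigWig "$tmp_bg" "$chrom_sizes" "$out_bw"
}
```

For example, `DRY_RUN=1 subset_bigwig in.bw chr1:1000-2000 hg38.chrom.sizes out.bw` prints the two commands without needing the binaries; the file names here are placeholders.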
pipeline_dir="./"
echo "Downloading VEP cache.." 1>&2
mkdir -p ${pipeline_dir}/resources/vep_cache/
cd ${pipeline_dir}/resources/vep_cache/
curl -O https://ftp.ensembl.org/pub/release-107/variation/indexed_vep_cache/homo_sapiens_vep_107_GRCh38.tar.gz
tar -xzf homo_sapiens_vep_107_GRCh38.tar.gz -C ./
wget https://ftp.ensembl.org/pub/release-107/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip -c Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz | bgzip > Homo_sapiens.GRCh38.dna.primary_assembly.fa.bgzip
#!/usr/bin/env nextflow
VERSION="2.0.0"
System.err.println "------------------------------------"
System.err.println " Annotate VCF files"
System.err.println " ${VERSION}"
System.err.println "------------------------------------"
params.vcf = "my.vcf.gz"
PoisonAlien / createproject.sh
Last active November 28, 2021 11:41
A minimal project template directory structure that I use for my bioinformatics projects
#!/usr/bin/env bash
#A minimal project template structure that I use for my bioinformatics projects
#MIT License (Anand Mayakonda; anandmt3@gmail.com)
function usage() {
echo "createproject.sh - Create a project template directory structure
Usage: createproject.sh [option] <project_name>
#Wrapper around goseq
#'@param assayedGenes total gene IDs that were measured
#'@param deGenes DE gene IDs
#'@param source_id Can be `ensGene` or `geneSymbol`
#'@param hyperGeo Default TRUE. Set to FALSE for rna-seq data
goseq_wrapper = function(assayedGenes, deGenes, source_id = "ensGene", hyperGeo = TRUE){
gene_vector = as.integer(assayedGenes %in% deGenes)
names(gene_vector)= assayedGenes
pwf = suppressWarnings(suppressMessages(goseq::nullp(DEgenes = gene_vector, genome = "hg19", id = source_id, plot.fit = FALSE)))
####################################################################################
#
# Best-practice 450k/EPIC QC and preprocessing workflow for the PPCG project
#
# creator: Pavlo Lutsik
#
# 30.01.2021
####################################################################################
library(RnBeads)
PoisonAlien / ntcounts.c
Last active October 15, 2021 12:56
Tool to extract nucleotide counts at user-specified loci
//A minimal program to extract nucleotide counts at selected genomic loci from a BAM file
//gcc -g -O3 -pthread ntcounts.c -lhts -Ihts -o ntcounts
//MIT License
//Copyright (c) 2021 Anand Mayakonda <anandmt3@gmail.com>
#include <unistd.h>
#include <stdio.h>
# Get the COSMIC variant file from here: https://cancer.sanger.ac.uk/cosmic/download (e.g., CosmicCompleteTargetedScreensMutantExport.tsv.gz)
# You will have to register and sign in
# Read in only these selected columns: `Gene name GENOMIC_MUTATION_ID Mutation AA Mutation Description Mutation genome position SNP FATHMM prediction HGVSG`
cosm = data.table::fread(cmd = "zcat CosmicCompleteTargetedScreensMutantExport.tsv.gz | cut -f 1,17,21,22,26,28,30,40 | sed 1d | sort -k1,2", header = FALSE)
csom = cosm[!V2 %in% ""]
csom = csom[!V4 %in% "Substitution - coding silent"] #Remove silent variants
csom = csom[!V4 %in% ""] #Remove vars with no sub. type variants
csom[, id := paste0(V2, ":", V3)]
PoisonAlien / compile_bwtool
Last active November 29, 2020 07:55
Compile bwtool
git clone 'https://github.com/CRG-Barcelona/bwtool'
git clone 'https://github.com/CRG-Barcelona/libbeato'
git clone https://github.com/madler/zlib
cd libbeato/
git checkout 0c30432af9c7e1e09ba065ad3b2bc042baa54dc2
./configure
make
cd ..
PoisonAlien / geneCloud.r
Created May 21, 2020 09:39
Plots wordcloud from MAF object
#' Plots wordcloud.
#'
#' @description Plots word cloud of mutated genes or altered cytobands with size proportional to the event frequency.
#' @param input an \code{\link{MAF}} or \code{\link{GISTIC}} object generated by \code{\link{read.maf}} or \code{\link{readGistic}}
#' @param minMut Minimum number of samples in which a gene is required to be mutated.
#' @param col vector of colors to choose from.
#' @param top Plot only the top n mutated genes.
#' @param genesToIgnore Ignore these genes.
#' @param ... Other options passed to \code{\link{wordcloud}}
#' @return nothing.