Gireesh Bogu gireeshkbogu

@gireeshkbogu
gireeshkbogu / identify_tissue-specific_genes_using_median_expression.r
Last active June 25, 2017 17:12
How to identify tissue-specific genes
# Aim: Identify genes with at least 2-fold higher mRNA levels (FPKM) in a particular tissue
# compared to all other tissues.
# Author: Gireesh Bogu
# Date: Jun 25th, 2017
# Location: CRG, Barcelona
# Problem: How to identify tissue-specific genes, especially when you have
# a large number of tissues (>50) and an even larger number of samples per tissue (>100, for example).
# GTEx (2) has 53 tissue sites and each tissue site has 10 to 400 samples
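The aim stated above can be sketched in a few lines. This is a minimal illustration in Python rather than the gist's R, and it rests on assumptions not confirmed by the preview: expression is held as a nested dictionary, each sample maps to one tissue, and "2 fold higher than all other tissues" is read as "the top tissue's median is at least twice the next-highest tissue median". The function and variable names are hypothetical.

```python
from statistics import median


def tissue_specific_genes(expression, sample_tissue, fold=2.0):
    """Return {gene: tissue} for genes whose median FPKM in one tissue is
    at least `fold` times the highest median among all other tissues.

    expression:    {gene: {sample_id: fpkm}}   (assumed layout)
    sample_tissue: {sample_id: tissue_name}    (assumed layout)
    """
    result = {}
    for gene, sample_values in expression.items():
        # Group this gene's FPKM values by tissue.
        by_tissue = {}
        for sample, value in sample_values.items():
            by_tissue.setdefault(sample_tissue[sample], []).append(value)

        # Collapse the many samples of each tissue into one median value.
        medians = {tissue: median(values) for tissue, values in by_tissue.items()}
        if len(medians) < 2:
            continue

        # Compare the highest tissue median against the runner-up.
        ranked = sorted(medians.items(), key=lambda item: item[1], reverse=True)
        (top_tissue, top), (_, runner_up) = ranked[0], ranked[1]
        if top > 0 and top >= fold * runner_up:
            result[gene] = top_tissue
    return result
```

Taking the median per tissue first keeps the comparison to one number per tissue, which is what makes the approach workable when each tissue has hundreds of samples.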
gireeshkbogu / multi_join.py
Last active June 1, 2017 13:20
Joins multiple files (found by their location rather than by listing each file name) on matching repeat_ids and renames the columns after the file names.
# Author: Gireesh Bogu
# Location: CRG, Barcelona
# Date: June 1, 2017
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
# What it does: joins multiple files (found by their location rather than by listing each file name) on
# matching repeat_ids and renames the columns after the file names.
# file1 = SRRX10101
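The join described above might look roughly like the following. It is a sketch using only the standard library, not the gist's own code, and it assumes each input is a headerless, tab-delimited file with two columns (repeat_id, value) and a `.txt` extension; the function name, the `*.txt` pattern, and the `NA` fill value are hypothetical.

```python
import csv
import glob
import os


def join_by_repeat_id(directory, pattern="*.txt", missing="NA"):
    """Outer-join every matching file in `directory` on its first column.

    Returns (header, rows); each value column is named after its source file.
    """
    table = {}    # repeat_id -> {column_name: value}
    columns = []  # one column per input file, in sorted file order
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # Use the file name without its extension as the column name.
        name = os.path.splitext(os.path.basename(path))[0]
        columns.append(name)
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                table.setdefault(row[0], {})[name] = row[1]

    header = ["repeat_id"] + columns
    rows = [
        [repeat_id] + [table[repeat_id].get(column, missing) for column in columns]
        for repeat_id in sorted(table)
    ]
    return header, rows
```

Pointing the function at a directory means new sample files are picked up without editing the script, which is the convenience the description emphasizes.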
gireeshkbogu / convert_3columns_into_a_matrix.py
Last active May 31, 2017 09:04
How to convert a bigdata file (200 million rows) with three columns into a matrix file?
# Author: Gireesh Bogu, Date: 27th May 2017, Place: CRG, Barcelona
# worked on a file with 200 million rows and 3 columns (50 GB file) (Python 2.7)
# make sure the file has 3 columns with a proper header
# make sure there are no duplicates in the file
# make sure the file is tab delimited
###################################################################################
# USE THIS IF IT IS NOT A BIG FILE
import pandas as pd
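For the small-file case, the reshaping from three columns to a matrix can be sketched without pandas. This is an illustration under the constraints listed above (tab-delimited, one header line, three columns, no duplicates) and is not the gist's own code; the function name, the `id` corner label, and the `0` fill value are hypothetical. It keeps every cell in memory, so it is not intended for the 200-million-row case.

```python
import csv
import io


def long_to_matrix(handle, fill="0"):
    """Pivot tab-delimited (row_key, column_key, value) lines into a matrix.

    Returns (header, rows); cells absent from the input are set to `fill`.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)   # skip the header line
    cells = {}     # row_key -> {column_key: value}
    col_keys = {}  # dict used as an insertion-ordered set of column keys
    for row_key, col_key, value in reader:
        cells.setdefault(row_key, {})[col_key] = value
        col_keys[col_key] = None

    columns = list(col_keys)
    rows = [
        [row_key] + [values.get(column, fill) for column in columns]
        for row_key, values in cells.items()
    ]
    return ["id"] + columns, rows


# Tiny in-memory input standing in for a real file.
example = io.StringIO("row\tcol\tvalue\ng1\ts1\t5\ng1\ts2\t7\ng2\ts1\t1\n")
header, rows = long_to_matrix(example)
```

Here `header` lists the column keys in the order they first appear, and each entry of `rows` starts with its row key.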
# How to plot a BIG data set (600 million rows/values with 8555 keys)
# Use the following example
library(dplyr)
library(ggplot2)
data(diamonds)
# plot density of different keys
ggplot(diamonds, aes(x=depth)) + geom_line(aes(color= cut), stat="density", size=0.4, alpha=0.4)
gireeshkbogu / annoying_flatmate_alerts.py
Last active March 15, 2023 02:34
Annoying Flatmate Alerts
# Author: Gireesh Bogu
# Location: Barcelona
# Time: Dec 21, 2016
# Aim-1 [Accomplished]: Sending Flat Rental And Utilities Bills To My Forgetful Flatmate Every First Day Of The Month :/
# Aim-2 [Pending]: Calculating bills from the bank account automatically (this is tricky for two reasons: (1) bank information is probably hard to access and (2) bill dates on the bank statement often do not match the actual monthly bills)
# Aim-3 [Pending]: Aim-1 uses Gmail, but it would be nice to extend this to Facebook, as she checks it more often than Gmail ;)
# Add the below usage code to crontab (open it using this command: crontab -e)
# Usage: 0 9 1 * * /fullpath/annoyingFlatmateAlerts.py (sends an email at 9 A.M. on the first day of every month; a "*" in the minute field would instead fire every minute of that hour)
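Aim-1 could be sketched as below. This is not the gist's own script: it assumes Python 3, Gmail's SSL SMTP endpoint (`smtp.gmail.com`, port 465), and an app password for login. The function names, the subject line, and the message layout are hypothetical, and nothing is sent unless `send_via_gmail` is called explicitly.

```python
import smtplib
from email.message import EmailMessage


def build_bill_email(sender, recipient, rent, utilities):
    """Compose the monthly reminder; the amounts are passed in by hand."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Monthly flat bills"  # hypothetical subject line
    total = rent + utilities
    message.set_content(
        f"Rent: {rent:.2f}\nUtilities: {utilities:.2f}\nTotal: {total:.2f}\n"
    )
    return message


def send_via_gmail(message, user, app_password):
    """Send the message through Gmail over SSL (credentials are placeholders)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, app_password)
        server.send_message(message)
```

Keeping message construction separate from sending makes it possible to check the text of the reminder before cron ever runs the script.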
gireeshkbogu / convert_GTF_to_BED12.sh
Last active April 17, 2024 03:21
How to convert GTF format into BED12 or BIGBED format?
# See below for UPDATES that include shorter ways of converting
# How to convert GTF format into BED12 format (Human-hg19)?
# How to convert GTF or BED format into BIGBED format?
# Why BIGBED? If a GTF or BED file is too large to upload to UCSC, you can use track hubs. However, track hubs accept neither format, so you need the bigBed format.
# First, download the UCSC utilities
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/gtfToGenePred
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/genePredToBed
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed
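Once the three utilities are downloaded, the conversion chain can be sketched as a small Python wrapper. This is an illustration, not the gist's own commands: it assumes the binaries have been made executable and are on `PATH`, that a chromosome sizes file for the assembly is available (bedToBigBed requires one), and that the BED file is sorted by chromosome and start before the final step. The function names and output file names are hypothetical.

```python
import subprocess


def conversion_commands(gtf, chrom_sizes, prefix):
    """Return the command lines for GTF -> genePred -> BED12 -> bigBed."""
    gene_pred = f"{prefix}.genePred"
    bed = f"{prefix}.bed"
    sorted_bed = f"{prefix}.sorted.bed"
    big_bed = f"{prefix}.bb"
    return [
        ["gtfToGenePred", gtf, gene_pred],
        ["genePredToBed", gene_pred, bed],
        # bedToBigBed expects input sorted by chromosome, then start position.
        ["sort", "-k1,1", "-k2,2n", "-o", sorted_bed, bed],
        ["bedToBigBed", sorted_bed, chrom_sizes, big_bed],
    ]


def run_conversion(gtf, chrom_sizes, prefix):
    """Run each step in order, stopping at the first failure."""
    for command in conversion_commands(gtf, chrom_sizes, prefix):
        subprocess.run(command, check=True)
```

A call such as `run_conversion("genes.gtf", "hg19.chrom.sizes", "genes")` would leave `genes.bed` (BED12) and `genes.bb` (bigBed) next to the input, with both file names here being placeholders.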
wget https://github.com/downloads/taoliu/MACS/MACS-1.4.2-1.tar.gz
tar -zxvf MACS-1.4.2-1.tar.gz
python setup.py install --prefix /users/rg/gbogu/software/MACS-1.4.2
export PYTHONPATH=/users/rg/gbogu/software/MACS-1.4.2/lib/python2.7/site-packages/:$PYTHONPATH
export PATH="/users/rg/gbogu/software/MACS-1.4.2/bin/:$PATH"
library(pheatmap)
a <- read.table("expression", header = TRUE)
head(a)
GTEX.ZP4G.0526.SM.4YCED GTEX.ZYFD.2126.SM.5E43D
a1 3.469363 2.903798
a2 2.551825 -1.003092
a3 2.841332 2.903798
a6 4.936489 5.225726
a7 6.300763 6.574336
MYdata <- data.frame(Age = rep(c(0,1,3,6,9,12), each=20),
Richness = rnorm(120, 10000, 2500))
ggplot(data = MYdata, aes(x = Age, y = Richness)) +
geom_boxplot(aes(fill=factor(Age))) +
geom_point(aes(color = factor(Age))) +
scale_x_continuous(breaks = c(0, 1, 3, 6, 9, 12))
library(ggplot2)
ggplot(mtcars, aes(x=wt, y=disp, colour=cyl, size=gear)) +
geom_point(shape=19, alpha=0.8)+ scale_colour_gradientn(colours=rainbow(10)) +
stat_smooth( method="lm", size=0.5, colour="black")