Gu Mi gu-mi

  • Sanofi
  • Cambridge, MA
@rasmusab
rasmusab / the-probability-my-son-will-be-stung-by-a-bumblebee.R
Created August 14, 2017 12:17
R and Stan script calculating the probability that my son will be stung by a bumblebee.
library(tidyverse)
library(purrr)
library(rstan)
### Defining the data ###
#########################
bumblebees <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0)
toddler_steps <- c(26, 16, 37, 101, 12, 122, 90, 55, 56, 39, 55, 15, 45, 8)
@hadley
hadley / ds-training.md
Created March 13, 2015 18:49
My advice on what you need to do to become a data scientist...

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

@araastat
araastat / ggkm.R
Last active July 6, 2018 17:39
Plotting a Kaplan-Meier curve using ggplot. ggkmTable.R adds a table below the plot showing numbers at risk at different times.
#' Create a Kaplan-Meier plot using ggplot2
#'
#' @param sfit a \code{\link[survival]{survfit}} object
#' @param returns logical: if \code{TRUE}, return a ggplot object
#' @param xlabs x-axis label
#' @param ylabs y-axis label
#' @param ystratalabs The strata labels. \code{Default = levels(summary(sfit)$strata)}
#' @param ystrataname The legend name. Default = "Strata"
#' @param timeby numeric: control the granularity along the time-axis
#' @param main plot title

The function modified

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL, 
                      labs=list(), labpos=list(c(0.5,0.03), c(0.03,0.5))) {
  require(grid)
  
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  
@cjbayesian
cjbayesian / AUC.R
Last active January 7, 2017 04:50
Calculate and plot AUC
###################################################
##
## Functions for calculating AUC and plotting ROC
## Corey Chivers, 2013
## corey.chivers@mail.mcgill.ca
##
###################################################
## Discrete integration for AUC calc
@akloster
akloster / screw_all.py
Last active December 24, 2015 14:49
Script to download all "complete" bacterial genomes from NCBI and prepare GC skew plots from them.
# -*- coding: utf-8 -*-
# <nbformat>3.0</nbformat>
# <codecell>
from numpy import *
from pandas import *
from scipy.signal import argrelextrema
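
The preview above stops at the imports, so the gist's own skew computation is not shown here. As a rough illustration of what a GC skew plot is built from, the sketch below computes the per-window skew (G - C) / (G + C) and its running sum. The function names `gc_skew` and `cumulative_skew` and the non-overlapping window scheme are assumptions made for this example, not taken from the gist.

```python
def gc_skew(sequence, window=1000):
    """Return the GC skew, (G - C) / (G + C), for each non-overlapping window.

    Hypothetical helper for illustration; the gist may window differently.
    """
    sequence = sequence.upper()
    skews = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C at all get a skew of 0 to avoid dividing by zero.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew(skews):
    """Return the running sum of the per-window skews."""
    total = 0.0
    running = []
    for value in skews:
        total += value
        running.append(total)
    return running
```

In bacterial genomes the extrema of the cumulative curve are commonly read as candidate replication origin and terminus positions, which is presumably why the gist imports `argrelextrema`.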
@schaunwheeler
schaunwheeler / xlsxToR.r
Last active December 11, 2020 16:41
Import an xlsx file into R by parsing the file's XML structure.
# The MIT License (MIT)
#
# Copyright (c) 2012 Schaun Jacob Wheeler
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
@davidliwei
davidliwei / getinsertsize.py
Last active October 25, 2022 06:15
Estimating NGS paired-end read insert size (or fragment length) from SAM/BAM files
#!/usr/bin/env python
'''
Automatically estimate insert size of the paired-end reads for a given SAM/BAM file.
Usage: getinsertsize.py <SAM file> or samtools view <BAM file> | getinsertsize.py -
Author: Wei Li
Copyright (c) <2015> <Wei Li>
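
The preview is cut off before the estimation logic, so the following is only a minimal sketch of one common approach, not the script's actual method. It assumes the insert size can be read from the SAM TLEN field (column 9) of properly paired reads (FLAG bit 0x2), and it counts each pair once by keeping only positive TLEN values. The names `insert_sizes` and `summarize` are hypothetical.

```python
import statistics


def insert_sizes(sam_lines):
    """Collect positive TLEN values from properly paired SAM records."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 marks a properly paired read; the mate carries the negative TLEN,
        # so keeping tlen > 0 counts each pair exactly once.
        if flag & 0x2 and tlen > 0:
            sizes.append(tlen)
    return sizes


def summarize(sizes):
    """Return (mean, median) of the collected insert sizes."""
    return statistics.mean(sizes), statistics.median(sizes)
```

A real estimator would usually also discard extreme outliers, such as pairs spanning introns or chimeric fragments, before computing the summary; that step is left out here.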
@malcook
malcook / bamTabulateGaps.R
Created June 9, 2011 15:20
bamTabulateGaps : For a bam file containing gapped alignments tabulate the (unstranded) coverage of all gaps therein.
library(IRanges) # for: coverage, psetdiff, etc
library(GenomicRanges) # for: readGappedAlignments,
library(Rsamtools) # for: reading bam files scanBam, countBam etc
library(rtracklayer) # for: track file IO (bed, wig, etc)
library(sqldf) # for: querying dataframes using SQL
bamTabulateGaps <- function(bamPath,
bedPath=paste(gsub('.bam$','',bamPath),'.junctions.bed',sep=''),
trackName=gsub('\\..*','',basename(bamPath)),
trackLine=sprintf("track name=%s graphType=junctions",trackName),