Skip to content

Instantly share code, notes, and snippets.

View davetang's full-sized avatar
🦀
🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀

Dave Tang davetang

🦀
🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀🦀
View GitHub Profile
@mick001
mick001 / logistic_regression.R
Last active October 1, 2023 10:22
Logistic regression tutorial code. Full article available at http://datascienceplus.com/perform-logistic-regression-in-r/
# Load the raw training data and replace missing values with NA
training.data.raw <- read.csv('train.csv',header=T,na.strings=c(""))
# Output the number of missing values for each column
sapply(training.data.raw,function(x) sum(is.na(x)))
# Quick check for how many different values for each feature
sapply(training.data.raw, function(x) length(unique(x)))
# A visual way to check for missing data
@richardcornish
richardcornish / git.md
Last active August 11, 2023 08:44
Enough Git for your résumé in 100ish lines
@arq5x
arq5x / cadd_compress.sh
Last active March 30, 2017 09:09
Various attempts at compressing the raw CADD datasets for use with GEMINI (and by others).
# Download the raw CADD TSV and Tabix index (no annotations, just scores)
wget http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz
wget http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz.tbi
# it is big. 79Gb
ls -ltrh whole_genome_SNVs.tsv.gz
-rw-r--r-- 1 arq5x users 79G Sep 26 01:44 whole_genome_SNVs.tsv.gz
# for testing, let's play with the chr22 intervals
tabix whole_genome_SNVs.tsv.gz 22 | bgzip > whole_genome_SNVs.tsv.22.gz