@allenday
allenday / scalding_cos_cor.scala
Last active August 29, 2015 13:58
Scalding cosine and correlation coefficient
// cosine distance and pearson correlation
// see http://ow.ly/vtm44
package com.allenday
import com.twitter.scalding._
import com.twitter.scalding.mathematics._
import com.twitter.scalding.mathematics.Matrix._
class SimJob(args : Args) extends Job(args) {
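The Scalding job body is cut off above, so the author's implementation is not visible here. As a rough illustration of the two measures the gist names, the following plain-Python sketch computes cosine similarity and the Pearson correlation coefficient for two equal-length vectors. The function names are hypothetical and nothing here uses the Scalding Matrix API; it only shows the formulas, using the fact that Pearson correlation is the cosine similarity of the mean-centered vectors.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson r, computed as the cosine similarity of the centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)
```

Both functions assume non-zero vectors (and, for Pearson, non-constant ones); a zero norm would raise a ZeroDivisionError.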
allenday / nomadlist.io_cost.vs.score
Last active May 8, 2018 20:17
Plot nomadlist.io score as a function of cost
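# Assumption: fromJSON comes from the rjson package; the original does not load a JSON library.
library(rjson)
nomad=fromJSON(paste(scan("http://nomadlist.io/api/v1", what="c",sep=""),collapse=""))
# list the field names available for each city
nomad.has=function(){names(nomad[3]$cities[[1]])}
# extract field y from every city as a flat vector
nomad.get=function(y){unlist(lapply(nomad[3]$cities,function(x){x[[y]]}))}
# keep only the USD entries of the cost field
nomadCost = as.numeric(nomad.get("nomadCost")[names(nomad.get("nomadCost"))=="USD"])
nomadScore = as.numeric(nomad.get("nomadScore"))
plot(nomadCost, nomadScore)
# fit and overlay a linear regression of score on cost
cost.vs.score = lm(nomadScore ~ nomadCost)
abline(cost.vs.score,col="red")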
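For readers unfamiliar with R, the `lm(nomadScore ~ nomadCost)` call above fits a straight line by ordinary least squares. A minimal Python sketch of that fit for a single predictor might look like the following; `fit_line` is a hypothetical helper, and it returns only the intercept and slope rather than the full model object that `lm` produces.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

The returned pair corresponds to the two coefficients that `abline` uses to draw the red regression line.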
{
"metadata": {
"name": "",
"signature": "sha256:69ee4419084fa9384e9f67f3928f7bad7602644de1278e93ed44ca70930b09cb"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
Verifying that +allenday is my Bitcoin username. You can send me #bitcoin here: https://onename.io/allenday
allenday / Dockerfile
Last active January 15, 2017 06:51
Dockerfile for scikit-learn, tensorflow, jupyter
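# FROM must be the first instruction; with several FROM lines, the last one determines the final image.
#FROM tensorflow/tensorflow
FROM gcr.io/tensorflow/tensorflow:latest
MAINTAINER Allen Day <allenday@google.com>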
###
### install more system packages, e.g.
###
#RUN apt-get update
#RUN apt-get install -y \
# gcc
wget -O - -q 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&rettype=runinfo&db=sra&term=PRJNA347566' | head -3
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR4451123,2016-10-26,2016-10-26,1335786,201703669,1335786,150,67,,https://sra-download.ncbi.nlm.nih.gov/srapub/SRR4451123,SRX2269170,c6b9959a2057-TSCA,AMPLICON,PCR,GENOMIC,PAIRED,0,0,ILLUMINA,NextSeq 500,SRP092005,PRJNA347566,,347566,SRS1760286,SAMN05937632,simple,3483,Cannabis sativa,c6b9959a2057-TSCA_XXX,,,,,,,no,,,,,,SRA486992,,public,FEBDF4A4FCD9FA47F7BD6F3D50B2D02B,EE2D
# fastq-dump --split-files writes ${i}_1.fastq and ${i}_2.fastq for paired runs,
# so both mate files are passed to FastqToSam. BUCKET_NAME must be set in the environment.
for i in $(wget -O - -q 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&rettype=runinfo&db=sra&term=PRJNA347566' | head | grep SRR | perl -ne 'chomp;@F=split/,/;$F[9]=~s#.+/##;print $F[9],"\n"') ; do
echo "$i" ;
~/sratoolkit.2.8.1-2-ubuntu64/bin/fastq-dump --split-files -F "$i" ;
java -jar ~/bin/picard.jar FastqToSam F1="${i}_1.fastq" F2="${i}_2.fastq" O="$i.bam" SAMPLE_NAME="$i" ;
gsutil cp "$i.bam" "gs://$BUCKET_NAME/open-cannabis/samples/";
rm -v "$i"*;
done
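The perl one-liner in the loop above takes the tenth comma-separated field of each runinfo row (`download_path`) and strips everything up to the last slash, leaving the run accession. A rough Python equivalent, assuming the CSV has already been downloaded into a string, could look like this. `run_accessions` is a hypothetical helper, and the sample text is a two-column excerpt of the runinfo output shown earlier, not the full record.

```python
import csv
import io


def run_accessions(runinfo_csv):
    """Return the basename of download_path for every SRR row."""
    reader = csv.DictReader(io.StringIO(runinfo_csv))
    accessions = []
    for row in reader:
        path = row.get("download_path") or ""
        name = path.rsplit("/", 1)[-1]  # drop everything up to the last slash
        if name.startswith("SRR"):      # approximates the `grep SRR` filter
            accessions.append(name)
    return accessions


# Excerpt (two columns only) of the runinfo row shown above.
sample = (
    "Run,download_path\n"
    "SRR4451123,https://sra-download.ncbi.nlm.nih.gov/srapub/SRR4451123\n"
)
print(run_accessions(sample))  # prints ['SRR4451123']
```

Looking the column up by name rather than by position keeps the sketch working if the column order of the runinfo output changes.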
wget ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/MN/PR/MNPR01/MNPR01.1.fsa_nt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/MN/PR/MNPR01/MNPR01.2.fsa_nt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/AG/QN/AGQN01/AGQN01.1.fsa_nt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/AG/QN/AGQN01/AGQN01.2.fsa_nt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/LK/UA/LKUA01/LKUA01.1.fsa_nt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/LK/UA/LKUA01/LKUA01.2.fsa_nt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/sra/wgs_aux/LK/UB/LKUB01/LKUB01.1.fsa_nt.gz
gunzip *nt.gz
git clone https://github.com/cbib/MIX.git
./MIX/bin/preprocessing.py -o contigs.fa AGQN01.1.fsa_nt AGQN01.2.fsa_nt LKUA01.1.fsa_nt LKUA01.2.fsa_nt LKUB01.1.fsa_nt MNPR01.1.fsa_nt MNPR01.2.fsa_nt
nucmer --maxmatch --prefix alignments contigs.fa contigs.fa
show-coords -rcl alignments.delta > alignments.coords
./MIX/bin/Mix.py -a alignments.coords -c contigs.fa -o output_dir/ -C 300 -A 200
wget -O - -q 'https://s3-us-west-1.amazonaws.com/strainseek' | xml_pp | grep fastq | perl -ne 'm#>(.+?)<#;print qq(https://s3-us-west-1.amazonaws.com/strainseek/$1\n)' | xargs wget -c