Created
June 25, 2013 22:31
demonstrate differences between strict/relaxed binning
library("MALDIquant")

#' create toy example
p <- c(p1=createMassPeaks(mass=c(1, 1.5, 10), intensity=c(1, 2, 1),
                          metaData=list(name="p1")),
       p2=createMassPeaks(mass=c(0.9, 10.1), intensity=c(1, 1),
                          metaData=list(name="p2")),
       p3=createMassPeaks(mass=c(1.1, 9.9), intensity=c(1, 1),
                          metaData=list(name="p3")))

#' show details
p

#' show intensity matrix (pre binning)
intensityMatrix(p)

#' "strict" binning
#' 1. all mass values are collected in a sorted vector:
mass <- c(0.9, 1, 1.1, 1.5, 9.9, 10, 10.1)
#' 2. this vector is divided at the largest gap (obviously between 1.5 and 9.9)
#' 3. now the left (0.9 - 1.5) and the right part (9.9 - 10.1) are treated
#' separately
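#' a minimal base-R sketch of steps 2 and 3 (not MALDIquant's internal code;
#' the names sortedMass, splitAt, leftPart and rightPart are made up for this
#' illustration)
sortedMass <- c(0.9, 1, 1.1, 1.5, 9.9, 10, 10.1)
splitAt <- which.max(diff(sortedMass)) # index of the last value left of the gap
leftPart <- sortedMass[seq_len(splitAt)]   # 0.9 1.0 1.1 1.5
rightPart <- sortedMass[-seq_len(splitAt)] # 9.9 10.0 10.1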
#' 4a right part (c(9.9, 10, 10.1)):
#' 4a.1 the potential bin must not contain two or more mass values from the same
#' sample: c(9.9, 10, 10.1) correspond to c("p3", "p1", "p2")
#' 4a.2 can we create a bin whose range, center mass +/- (center mass *
#' tolerance), contains all mass values? (e.g. tolerance == 0.5; left border: 5;
#' right border: 15)
centerMass <- mean(c(9.9, 10, 10.1))
all((abs(c(9.9, 10, 10.1)-centerMass)/centerMass) < 0.5) # TRUE
#' 4a.3 now the bin is created and all mass values are set to the center mass
#' 4b left part (c(0.9, 1, 1.1, 1.5)):
#' 4b.1 the potential bin contains two mass values from "p1" (1 and 1.5) =>
#' abort, restart at 2. (largest gap is now between 1.1 and 1.5);
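#' a base-R sketch of the check in 4b.1 and the restart at 2. (illustration
#' only; leftMass, leftSamples, leftSplit and leftParts are hypothetical names)
leftMass <- c(0.9, 1, 1.1, 1.5)
leftSamples <- c("p2", "p1", "p3", "p1")
anyDuplicated(leftSamples) > 0         # TRUE: "p1" occurs twice => no bin
leftSplit <- which.max(diff(leftMass)) # largest gap follows the 3rd value
leftParts <- split(leftMass, seq_along(leftMass) > leftSplit)
leftParts                              # c(0.9, 1, 1.1) and 1.5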
#' 4c right part (1.5):
#' 4c.1 only one value from one sample; create a single bin with only one value
#' 4c.2 center mass == value
#' 4c.3 new mass == old mass
#' 4d left part (c(0.9, 1, 1.1)):
#' 4d.1 similar to 4a.1
#' 4d.2 similar to 4a.2
#' 4d.3 similar to 4a.3; new center mass == 1
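#' the tolerance check of 4a.2 written as a small helper and applied to the three
#' final "strict" bins (a sketch; withinTolerance and strictBins are made-up
#' names, not part of MALDIquant)
withinTolerance <- function(m, tolerance) {
  center <- mean(m)
  all(abs(m - center) / center < tolerance)
}
strictBins <- list(c(0.9, 1, 1.1), 1.5, c(9.9, 10, 10.1))
sapply(strictBins, withinTolerance, tolerance=0.5) # every bin passes
sapply(strictBins, mean)                           # center masses: 1, 1.5, 10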
#'
#' try it:
intensityMatrix(binPeaks(p, method="strict", tolerance=0.5))

#' now the same for the "relaxed" binning method
#' here the rule "a bin must not contain more than one peak from each sample" is
#' deactivated.
#' 4x.1 is replaced by a rule that, when one sample contributes multiple mass
#' values, keeps only the one with the highest intensity
#' e.g. for 4b
#' 4b left part (c(0.9, 1, 1.1, 1.5)):
#' 4b.1: c(1, 1.5) from "p1"; choose 1.5 (intensity == 2); exclude 1 (leave it
#' untouched)
#' 4b.2 center mass == 1.16667 (mean(c(0.9, 1.1, 1.5)))
#'
#' everything else is similar to the "strict" method
#'
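#' a base-R sketch of the "relaxed" rule for 4b (illustration only, not the
#' package's implementation; all names are hypothetical and ties in intensity
#' are not handled)
relaxedMass <- c(0.9, 1, 1.1, 1.5)
relaxedIntensity <- c(1, 1, 1, 2)
relaxedSamples <- c("p2", "p1", "p3", "p1")
#' keep, for each sample, only the peak with the highest intensity
keep <- relaxedIntensity == ave(relaxedIntensity, relaxedSamples, FUN=max)
relaxedMass[keep]       # 0.9 1.1 1.5
mean(relaxedMass[keep]) # 1.166667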
#' try it:
intensityMatrix(binPeaks(p, method="relaxed", tolerance=0.5))

#' sometimes you don't want to keep peaks that are not binned => filter them
#' use the "strict" method and remove bins with only 1 peak (the 1.5 one)
intensityMatrix(filterPeaks(binPeaks(p, method="strict", tolerance=0.5),
                            minFrequency=2/3))
#' different result for "relaxed" (as expected)
intensityMatrix(filterPeaks(binPeaks(p, method="relaxed", tolerance=0.5),
                            minFrequency=2/3))
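#' what minFrequency=2/3 means, sketched in base R on a hand-written copy of the
#' "strict" bins from the walk-through above (NA == sample has no peak in that
#' bin); binned and binFrequency are made-up names, and this assumes a bin is
#' kept when its frequency is at least minFrequency (not filterPeaks' own code)
binned <- matrix(c(1,  2, 1,
                   1, NA, 1,
                   1, NA, 1), nrow=3, byrow=TRUE,
                 dimnames=list(c("p1", "p2", "p3"), c("1", "1.5", "10")))
binFrequency <- colMeans(!is.na(binned)) # share of samples with a peak per bin
binned[, binFrequency >= 2/3]            # drops the 1.5 bin (frequency 1/3)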

#' if you are interested in the implementation:
#'
#' find the largest gap (step 2 and 3):
#' https://github.com/sgibb/MALDIquant/blob/master/R/binPeaks-functions.R#L105-L194
#' the "strict" method (step 4) is defined in:
#' https://github.com/sgibb/MALDIquant/blob/master/R/grouper-functions.R#L21-L50
#' the "relaxed" method (step 4) is defined in:
#' https://github.com/sgibb/MALDIquant/blob/master/R/grouper-functions.R#L52-L92