@sgibb
Created June 25, 2013 22:31
demonstrate differences between strict/relaxed binning
library("MALDIquant")
#' create toy example
p <- c(p1=createMassPeaks(mass=c(1, 1.5, 10), intensity=c(1, 2, 1),
                          metaData=list(name="p1")),
       p2=createMassPeaks(mass=c(0.9, 10.1), intensity=c(1, 1),
                          metaData=list(name="p2")),
       p3=createMassPeaks(mass=c(1.1, 9.9), intensity=c(1, 1),
                          metaData=list(name="p3")))
#' show details
p
#' show intensity matrix (pre binning)
intensityMatrix(p)
#' "strict" binning
#' 1. all mass values are collected in a sorted vector:
mass <- c(0.9, 1, 1.1, 1.5, 9.9, 10, 10.1)
#' 2. this is divided at the largest gap (obviously between 1.5 and 9.9)
#' 3. now the left (0.9 - 1.5) and the right part (9.9 - 10.1) are treated
#' separately
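#' a minimal base R sketch of steps 1 to 3 (illustration only, not the
#' MALDIquant implementation; the variable names are made up):
sortedMass <- sort(c(1, 1.5, 10, 0.9, 10.1, 1.1, 9.9))
#' index of the last mass value left of the largest gap
splitAt <- which.max(diff(sortedMass))
leftPart <- sortedMass[seq_len(splitAt)]
rightPart <- sortedMass[-seq_len(splitAt)]
leftPart # 0.9 1.0 1.1 1.5
rightPart # 9.9 10.0 10.1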
#' 4a right part (c(9.9, 10, 10.1)):
#' 4a.1 the potential bin must not contain two or more mass values from the same
#' sample: c(9.9, 10, 10.1) correspond to c("p3", "p1", "p2")
#' 4a.2 can we create a bin such that center mass +/- (center mass * tolerance)
#' contains all mass values? (e.g. tolerance == 0.5; left border: 5; right
#' border: 15)
centerMass <- mean(c(9.9, 10, 10.1))
all((abs(c(9.9, 10, 10.1)-centerMass)/centerMass) < 0.5) # TRUE
#' 4a.3 now the bin is created and all mass values are set to the center mass
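#' a minimal sketch of 4a.1 and 4a.3 in base R (illustration only; the sample
#' labels are written by hand and the variable names are made up):
rightMass <- c(9.9, 10, 10.1)
rightSamples <- c("p3", "p1", "p2")
anyDuplicated(rightSamples) == 0 # TRUE: every sample contributes one peak
#' every mass value in the bin is replaced by the center mass
rep(mean(rightMass), length(rightMass))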
#' 4b left part (c(0.9, 1, 1.1, 1.5)):
#' 4b.1 the potential bin contains two mass values from "p1" (1 and 1.5) =>
#' abort, restart at step 2 (largest gap is now between 1.1 and 1.5)
#' 4c right part (1.5):
#' 4c.1 only one value from one sample; create a single bin with only one value
#' 4c.2 center mass == value
#' 4c.3 new mass == old mass
#' 4d left part (c(0.9, 1, 1.1)):
#' 4d.1 similar to 4a.1
#' 4d.2 similar to 4a.2
#' 4d.3 similar to 4a.3; new center mass == 1
#'
#' try it:
intensityMatrix(binPeaks(p, method="strict", tolerance=0.5))
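#' the whole "strict" walkthrough as a small recursive base R function; this is
#' only a sketch of the steps described above, NOT the MALDIquant code (the
#' function name strictBinSketch and its arguments are hypothetical)
strictBinSketch <- function(mass, samples, tolerance) {
  ## mass must be sorted; samples[i] names the sample that mass[i] belongs to
  if (length(mass) < 2L) {
    return(mass)
  }
  ## steps 2 and 3: divide at the largest gap
  splitAt <- which.max(diff(mass))
  isLeft <- seq_along(mass) <= splitAt
  unlist(lapply(list(isLeft, !isLeft), function(part) {
    m <- mass[part]
    s <- samples[part]
    centerMass <- mean(m)
    ## step 4: no duplicated samples and all mass values within tolerance?
    if (anyDuplicated(s) == 0L &&
        all(abs(m - centerMass) / centerMass < tolerance)) {
      rep(centerMass, length(m))
    } else {
      strictBinSketch(m, s, tolerance)
    }
  }))
}
#' expected: 1, 1, 1, 1.5, 10, 10, 10 (up to floating point rounding)
strictBinSketch(mass=c(0.9, 1, 1.1, 1.5, 9.9, 10, 10.1),
                samples=c("p2", "p1", "p3", "p1", "p3", "p1", "p2"),
                tolerance=0.5)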
#' now the same for the "relaxed" binning method
#' here the rule "a bin must not contain more than one peak from each sample" is
#' deactivated.
#' 4x.1 is replaced by a rule: if one sample contributes multiple mass values,
#' only the one with the highest intensity is chosen
#' e.g. for 4b
#' 4b left part (c(0.9, 1, 1.1, 1.5)):
#' 4b.1: c(1, 1.5) from "p1"; choose 1.5 (intensity == 2); exclude 1 (leave it
#' untouched)
#' 4b.2 center mass == 1.16667 (mean(c(0.9, 1.1, 1.5)))
#'
#' everything else is similar to the "strict" method
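#'
#' a minimal base R sketch of the "relaxed" 4b.1 and 4b.2 (illustration only,
#' not the MALDIquant code; the variable names are made up, and ties in
#' intensity are not handled here):
relaxedMass <- c(0.9, 1, 1.1, 1.5)
relaxedIntensity <- c(1, 1, 1, 2)
relaxedSamples <- c("p2", "p1", "p3", "p1")
#' per sample, keep only the peak with the highest intensity
keep <- relaxedIntensity == ave(relaxedIntensity, relaxedSamples, FUN=max)
relaxedMass[keep] # 0.9 1.1 1.5
mean(relaxedMass[keep]) # 1.166667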
#'
#' try it:
intensityMatrix(binPeaks(p, method="relaxed", tolerance=0.5))
#' sometimes you don't want to keep peaks that are not binned => filter them
#' use the "strict" method and remove bins with only 1 peak (the 1.5 one)
intensityMatrix(filterPeaks(binPeaks(p, method="strict", tolerance=0.5),
                            minFrequency=2/3))
#' different result for "relaxed" (as expected)
intensityMatrix(filterPeaks(binPeaks(p, method="relaxed", tolerance=0.5),
                            minFrequency=2/3))
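#' what minFrequency=2/3 means, sketched in base R on a hand-written matrix that
#' follows the "strict" walkthrough above (illustration only, not the
#' filterPeaks implementation; NA marks a missing peak and the ">=" comparison
#' is an assumption of this sketch):
binned <- matrix(c(1, 1, 1,  2, NA, NA,  1, 1, 1), nrow=3,
                 dimnames=list(c("p1", "p2", "p3"), c("1", "1.5", "10")))
#' fraction of samples that have a peak in each bin
frequency <- colMeans(!is.na(binned))
#' keep bins that occur in at least 2/3 of the samples (drops the 1.5 bin)
binned[, frequency >= 2/3, drop=FALSE]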
#' if you are interested in the implementation:
#'
#' find the largest gap (step 2 and 3):
#' https://github.com/sgibb/MALDIquant/blob/master/R/binPeaks-functions.R#L105-L194
#' the "strict" method (step 4) is defined in:
#' https://github.com/sgibb/MALDIquant/blob/master/R/grouper-functions.R#L21-L50
#' the "relaxed" method (step 4) is defined in:
#' https://github.com/sgibb/MALDIquant/blob/master/R/grouper-functions.R#L52-L92