do_findChromPeaks_massifquant {xcms} | R Documentation |
Core API function for massifquant peak detection
Description
Massifquant is a Kalman filter (KF)-based chromatographic peak
detection method for XC-MS data in centroid mode. The identified peaks
can be further refined with the centWave method (see
do_findChromPeaks_centWave for details on centWave) by specifying
withWave = TRUE.
Usage
do_findChromPeaks_massifquant(mz, int, scantime, valsPerSpect, ppm = 10,
peakwidth = c(20, 50), snthresh = 10, prefilter = c(3, 100),
mzCenterFun = "wMean", integrate = 1, mzdiff = -0.001,
fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
criticalValue = 1.125, consecMissedLimit = 2, unions = 1,
checkBack = 0, withWave = FALSE)
Arguments
mz |
Numeric vector with the individual m/z values from all scans/
spectra of one file/sample.
|
int |
Numeric vector with the individual intensity values from all
scans/spectra of one file/sample.
|
scantime |
Numeric vector of length equal to the number of
spectra/scans of the data representing the retention time of each scan.
|
valsPerSpect |
Numeric vector with the number of values for each
spectrum.
|
ppm |
numeric(1) defining the maximal tolerated m/z deviation in
consecutive scans in parts per million (ppm) for the initial ROI
definition.
|
peakwidth |
numeric(2) with the expected approximate
peak width in chromatographic space. Given as a range (min, max)
in seconds.
|
snthresh |
numeric(1) defining the signal to noise ratio cutoff.
|
prefilter |
numeric(2) : c(k, I) specifying the prefilter
step for the first analysis step (ROI detection). Mass traces are only
retained if they contain at least k peaks with intensity
>= I .
|
mzCenterFun |
Name of the function to calculate the m/z center of the
chromatographic peak. Allowed are: "wMean" : intensity weighted
mean of the peak's m/z values, "mean" : mean of the peak's m/z
values, "apex" : use the m/z value at the peak apex,
"wMeanApex3" : intensity weighted mean of the m/z value at the
peak apex and the m/z values left and right of it and "meanApex3" :
mean of the m/z value of the peak apex and the m/z values left and right
of it.
|
integrate |
Integration method. For integrate = 1 peak limits
are found through descent on the Mexican hat filtered data, for
integrate = 2 the descent is done on the real data. The latter
method is more accurate but prone to noise, while the former is more
robust but less exact.
|
mzdiff |
numeric(1) representing the minimum difference in m/z
dimension required for peaks with overlapping retention times; can be
negative to allow overlap. During peak post-processing, peaks
defined to be overlapping are reduced to the one peak with the largest
signal.
|
fitgauss |
logical(1) whether or not a Gaussian should be fitted
to each peak. This affects mostly the retention time position of the
peak.
|
noise |
numeric(1) allowing to set a minimum intensity required
for centroids to be considered in the first analysis step (centroids with
intensity < noise are omitted from ROI detection).
|
verboseColumns |
logical(1) whether additional peak metadata
columns should be returned.
|
criticalValue |
numeric(1). Suggested values: 0.1 to 3.0.
This setting helps determine the Kalman Filter
prediction margin of error. A real centroid belonging to a bona fide
peak must fall within the KF prediction margin of error. Much like
in the construction of a confidence interval, criticalValue loosely
translates to a multiplier of the standard error of the prediction
reported by the Kalman Filter. If the peaks in the XC-MS sample have
a small mass deviance in ppm error, a smaller critical value might be
better and vice versa.
|
consecMissedLimit |
integer(1). Suggested values: 1, 2 or 3.
While a peak is in the process of being detected by a Kalman Filter, the
Kalman Filter may not find a predicted centroid in every scan. After 1
or more consecutive failed predictions, this setting informs Massifquant
when to stop a Kalman Filter from following a candidate peak.
|
unions |
integer(1): set to 1 to apply a t-test union on
segmentation; set to 0 if no t-test is to be applied on
chromatographically continuous peaks sharing the same m/z range.
Explanation: With very few data points, sometimes a Kalman Filter stops
tracking a peak prematurely. Another Kalman Filter is instantiated
and begins following the rest of the signal. Because tracking is done
backwards to forwards, this algorithmic defect leaves a real peak
divided into two or more segments. With this option turned on, the
program identifies segmented peaks and merges them
into one with a two-sample t-test. The potential danger of this option
is that some truly distinct peaks may be merged.
|
checkBack |
integer(1): set to 1 if turned on; set to
0 if turned off. The convergence of a Kalman Filter to a peak's
precise m/z mapping is very fast, but sometimes it incorporates erroneous
centroids as part of a peak (especially early on). The checkBack
option is an attempt to remove the occasional outlier that lies beyond
the converged bounds of the Kalman Filter. The option does not directly
affect identification of a peak because it is a postprocessing measure;
it has not been shown to be extremely useful thus far, so it is
turned off by default.
|
withWave |
logical(1) if TRUE , the peaks identified first
with Massifquant are subsequently filtered with the second step of the
centWave algorithm, which includes wavelet estimation.
|
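The roles of criticalValue and consecMissedLimit described above can be illustrated with a small base R sketch. This is not the xcms implementation: the predicted m/z, its standard error, the observed values and the exact stopping rule (more than consecMissedLimit misses in a row) are all assumptions chosen for illustration.

```r
## Illustration only, not the xcms implementation.
predictedMz   <- 200.0000   # hypothetical Kalman Filter prediction
predictionSE  <- 0.0008     # hypothetical standard error of that prediction
criticalValue <- 1.125      # default of the function

## Margin of error as a multiple of the standard error,
## analogous to building a confidence interval.
margin <- criticalValue * predictionSE

## Two hypothetical centroids: one inside the margin, one outside.
observedMz <- c(200.0005, 200.0012)
accepted <- abs(observedMz - predictedMz) <= margin
accepted

## consecMissedLimit: give up on a candidate peak after too many
## consecutive scans without an accepted centroid (assumed rule).
consecMissedLimit <- 2
found <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
missed <- 0
stoppedAt <- NA_integer_
for (i in seq_along(found)) {
  missed <- if (found[i]) 0 else missed + 1
  if (missed > consecMissedLimit) {
    stoppedAt <- i
    break
  }
}
stoppedAt
```

A larger criticalValue widens the margin and accepts more centroids, while a larger consecMissedLimit lets a filter bridge longer gaps before it is stopped.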
Details
This algorithm's performance has been tested rigorously
on high-resolution LC/Orbitrap and TOF-MS data in centroid mode.
Simultaneous Kalman filters identify peaks and calculate their
area under the curve. The default parameters are set to operate on
a complex LC-MS Orbitrap sample. Users will find it useful to do some
simple exploratory data analysis to find out where to set a minimum
intensity and to identify how many scans an average peak spans. The
consecMissedLimit parameter has yielded good performance on
Orbitrap data when set to 2, and on TOF data it was found to work best
at 1. This may change as the algorithm has yet to be
tested on many samples. The criticalValue parameter is perhaps the
most difficult to dial in appropriately, and visual inspection of peak
identification is the best suggested tool for quick optimization.
The ppm and checkBack parameters have shown less influence
than the other parameters and exist to give users flexibility and
better accuracy.
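The exploratory analysis suggested above could look roughly like the following base R sketch. The intensity trace and the cutoff of 50 are made-up values for illustration; with real data the trace would be extracted from the raw file for an m/z range of interest.

```r
## Hypothetical intensities of one m/z trace across 12 consecutive scans.
trace <- c(5, 8, 6, 120, 340, 610, 420, 150, 7, 4, 9, 6)

## Inspect the intensity distribution to pick a candidate minimum intensity.
quantile(trace, probs = c(0.25, 0.5, 0.75))

## Assumed cutoff for this toy example only.
noiseCutoff <- 50

## Count how many consecutive scans the signal stays above the cutoff.
runs <- rle(trace > noiseCutoff)
peakScans <- runs$lengths[runs$values]
peakScans
```

Repeating this for a handful of traces gives a rough idea of a sensible noise value and of the number of scans a typical peak spans.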
Value
A matrix, each row representing an identified chromatographic peak,
with columns:
- mz
Intensity weighted mean of m/z values of the peaks across
scans.
- mzmin
Minimum m/z of the peak.
- mzmax
Maximum m/z of the peak.
- rtmin
Minimum retention time of the peak.
- rtmax
Maximum retention time of the peak.
- rt
Retention time of the peak's midpoint.
- into
Integrated (original) intensity of the peak.
- maxo
Maximum intensity of the peak.
If withWave is set to TRUE, the result is the same as
returned by the do_findChromPeaks_centWave method.
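As a minimal sketch of how the returned matrix can be used, the following builds a small stand-in with the column names listed above. The values are invented and the threshold of 500 is an arbitrary assumption.

```r
## Invented stand-in for the result matrix; column names follow the
## Value section, the numbers are made up.
res <- matrix(
  c(200.01, 200.00, 200.02, 30, 45, 37.5, 12000, 900,
    350.10, 350.09, 350.11, 60, 70, 65.0,   800, 120),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("mz", "mzmin", "mzmax", "rtmin",
                          "rtmax", "rt", "into", "maxo"))
)

## Peak width in retention time (seconds).
rtWidth <- res[, "rtmax"] - res[, "rtmin"]
rtWidth

## Keep only peaks whose maximum intensity reaches an assumed threshold;
## drop = FALSE keeps the result a matrix even for a single row.
strong <- res[res[, "maxo"] >= 500, , drop = FALSE]
strong
```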
Author(s)
Christopher Conley
References
Conley CJ, Smith R, Torgrip RJ, Taylor RM, Tautenhahn R and Prince JT
"Massifquant: open-source Kalman filter-based XC-MS isotope trace feature
detection" Bioinformatics 2014, 30(18):2636-43.
See Also
massifquant for the standard user interface method.
Other core peak detection functions:
do_findChromPeaks_centWaveWithPredIsoROIs,
do_findChromPeaks_centWave,
do_findChromPeaks_matchedFilter,
do_findPeaks_MSW
Examples
library(faahKO)
library(xcms)
cdfpath <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
## Read the first file
xraw <- xcmsRaw(cdffiles[1])
## Extract the required data
mzVals <- xraw@env$mz
intVals <- xraw@env$intensity
## Define the values per spectrum:
valsPerSpect <- diff(c(xraw@scanindex, length(mzVals)))
## Perform the peak detection using massifquant
res <- do_findChromPeaks_massifquant(mz = mzVals, int = intVals,
scantime = xraw@scantime, valsPerSpect = valsPerSpect)
head(res)
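The derivation of valsPerSpect in the example above can be checked on toy data without loading any package. In this sketch the vectors are hypothetical stand-ins for xraw@env$mz and xraw@scanindex, and scanindex is assumed to hold the 0-based offset at which each spectrum starts.

```r
## Toy stand-ins: three spectra with 2, 3 and 1 centroids.
mzVals <- c(100.01, 200.02, 100.02, 150.00, 200.03, 100.01)
scanindex <- c(0L, 2L, 5L)   # assumed 0-based start offset of each spectrum

## Number of values per spectrum: differences between consecutive offsets,
## with the total length closing the last spectrum.
valsPerSpect <- diff(c(scanindex, length(mzVals)))
valsPerSpect

## The counts must add up to the length of the flat vectors.
stopifnot(sum(valsPerSpect) == length(mzVals))

## Recover the per-spectrum m/z values from the flat vector.
scanOfCentroid <- rep(seq_along(valsPerSpect), times = valsPerSpect)
mzBySpectrum <- split(mzVals, scanOfCentroid)
mzBySpectrum
```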
[Package xcms version 3.6.0 Index]