vsn {vsn} | R Documentation |
Robust estimation of variance-stabilizing and calibrating transformations for microarray data. This is the main function of this package; see also the vignette vsn.pdf.
vsn(intensities, lts.quantile = 0.5, verbose = TRUE, niter = 10, cvg.check = NULL, pstart = NULL, describe.preprocessing = TRUE)
intensities |
An object that contains intensity values from
a microarray experiment. See
getIntensityMatrix for details.
The intensities are assumed to be the raw
scanner data, summarized over the spots by an image analysis program,
and possibly "background" subtracted.
The intensities must not be logarithmically or otherwise transformed,
and not thresholded or "floored". NAs are not accepted.
See details. |
lts.quantile |
Numeric. The quantile that is used for the resistant least trimmed sum of squares regression. Allowed values are between 0.5 and 1, corresponding to least median sum of squares regression, and to ordinary least sum of squares regression, respectively. |
niter |
Integer. The number of iterations to be used in the least trimmed sum of squares regression. |
verbose |
Logical. If TRUE, some messages are printed. |
pstart |
Numeric vector. Starting values for the model parameters
in the iterative parameter estimation algorithm. If NULL, the function
tries to determine reasonable starting values from the distribution of
intensities. |
describe.preprocessing |
Logical. If TRUE, calibration and
transformation parameters, plus some other information are stored in
the preprocessing slot of the returned object. See details. |
cvg.check |
List. If non-NULL, this allows finer control of the iterative least trimmed sum of squares regression. See details. |
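The arguments lts.quantile and niter refer to least trimmed sum of squares regression. As a rough illustration of the general idea only, the following base R sketch applies iterative trimming to a toy linear model. It is not the code used inside vsn; the function lts_fit, its arguments, and the simulated data are hypothetical.

```r
# Generic least-trimmed-squares sketch on a toy linear model (not vsn's code).
# lts_fit and the simulated data are hypothetical, for illustration only.
lts_fit <- function(x, y, lts.quantile = 0.5, niter = 10) {
  keep <- rep(TRUE, length(y))              # start with all points
  for (iter in seq_len(niter)) {
    fit  <- lm(y ~ x, subset = keep)        # fit on the currently kept points
    res2 <- (y - predict(fit, data.frame(x = x)))^2
    # keep the fraction lts.quantile of points with the smallest squared residuals
    keep <- res2 <= quantile(res2, probs = lts.quantile)
  }
  list(coef = coef(fit), keep = keep)
}

set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- 1 + 2 * x + rnorm(200, sd = 0.1)
outliers <- seq(5, 200, by = 5)             # every 5th point, 20% of the data
y[outliers] <- y[outliers] + 25             # gross outliers

robust <- lts_fit(x, y, lts.quantile = 0.75)
plain  <- coef(lm(y ~ x))                   # ordinary least squares for comparison
```

In this toy setting the trimmed fit ignores the shifted points, while the ordinary least squares intercept is pulled toward them.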
The function calibrates for sample-to-sample variations through
shifting and scaling, and transforms the intensities to a scale where
the variance is approximately independent of the mean intensity.
The variance stabilizing transformation is equivalent to the
natural logarithm in the high-intensity range, and to a
linear transformation in the low-intensity range. In an intermediate
range, the arsinh function interpolates smoothly between the
two. The calibration consists of estimating an offset offs[i] and a
scale factor fac[i] for each column i of the matrix intensities.
Thus, the calibration is:
intensities[k,i] <- intensities[k,i] * fac[i] + offs[i]
The parameters offs[i] and fac[i] are estimated through
a robust variant of maximum likelihood. The model assumes that for
the majority of genes the expression levels are not much different
across the samples, i.e., that only a minority of genes (less than
a fraction of 1 - lts.quantile) is differentially expressed.
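The behaviour of the arsinh function at the two ends of the intensity range can be checked with a small base R sketch. The offsets, scale factors and intensities below are made-up values, not estimates produced by vsn, and the agreement with the logarithm holds up to the additive constant log(2).

```r
# Toy illustration of calibration followed by the arsinh transformation.
# offs, fac and the data are made-up values, not estimates from vsn().
intensities <- cbind(a = c(0.01, 10, 1000, 50000),
                     b = c(0.02, 25, 2100, 99000))
offs <- c(0, 0)      # hypothetical per-column offsets
fac  <- c(1, 0.5)    # hypothetical per-column scale factors

# calibration: intensities[k, i] * fac[i] + offs[i], column by column
calibrated <- sweep(sweep(intensities, 2, fac, "*"), 2, offs, "+")
h <- asinh(calibrated)

# high intensities: asinh(x) is close to log(2 * x) = log(x) + log(2)
high_gap <- abs(asinh(50000) - log(2 * 50000))
# low intensities: asinh(x) is close to x itself
low_gap <- abs(asinh(0.01) - 0.01)
```

Both gaps are tiny, which is the sense in which the transformation is logarithmic for large values and linear for small ones.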
Format: The format of the matrix of intensities is as follows:
for the two-color printed array technology, each row
corresponds to one spot, and the columns to the different arrays
and wave-lengths (usually red and green, but could be any number).
For example, if there are 10 arrays, the matrix would have 20 columns,
columns 1...10 containing the green intensities, and 11...20 the
red ones. In fact, the ordering of the columns does not matter to
vsn, but it is your responsibility to keep track of it for
subsequent analyses.
For one-color arrays, each row corresponds to a probe, and each
column to an array.
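One way to keep track of the column ordering for the two-color layout described above is to store the channel and array of each column alongside the matrix. This is only a sketch with simulated data; the names channel, array_id and the column-name scheme are hypothetical.

```r
# Hypothetical two-color layout: 10 arrays, green in columns 1..10,
# red in columns 11..20. All values are simulated.
set.seed(1)
n_spots  <- 100
n_arrays <- 10
intensities <- matrix(rexp(n_spots * 2 * n_arrays, rate = 1 / 1000),
                      nrow = n_spots)

channel  <- rep(c("green", "red"), each = n_arrays)
array_id <- rep(seq_len(n_arrays), times = 2)
colnames(intensities) <- paste(channel, array_id, sep = ".")

# look up the two columns (green and red) that belong to array 3
cols_array3 <- which(array_id == 3)
```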
Performance: This function is slow. That is due to the nested
iteration loops of the numerical optimization of the likelihood function
and the heuristic that identifies the non-outlying data points in the
least trimmed squares regression. For large arrays with many tens of
thousands of probes, you may want to consider random subsetting: that is,
use only a subset of, e.g., 10,000-20,000 rows of the data matrix
intensities to fit the parameters, then apply the transformation
to all the data using vsnh. An example of this can be seen in the
function normalize.AffyBatch.vsn, whose code you can inspect by
typing normalize.AffyBatch.vsn on the R command line.
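The subsetting step itself might look like the following sketch. The data are simulated and n_sub is a hypothetical choice; the commented lines indicate how the fit and the transformation of the full matrix could follow, using the same calls as the Examples section, and are not run here.

```r
# Sketch of random row subsetting before parameter estimation.
# The data are simulated; n_sub is a hypothetical choice.
set.seed(1)
intensities <- matrix(rexp(60000 * 4, rate = 1 / 1000), ncol = 4)
n_sub <- 15000
sel <- sample(nrow(intensities), n_sub)     # row indices, without replacement
intensities_sub <- intensities[sel, , drop = FALSE]

# With the package loaded, fitting on the subset and transforming all rows
# would then follow the pattern of the Examples section (not run here):
#   fit    <- vsn(intensities_sub)
#   params <- preproc(description(fit))$vsnParams
#   h_all  <- vsnh(intensities, params)
```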
Calibration and transformation parameters: The parameters
are stored in the preprocessing slot of the description slot of the
exprSet object that is returned, in the form of a list with three
elements. Here d is the number of columns of intensities (one offset
and one scale factor per column) and n is the number of rows.
vsnParams: a numeric vector of length 2*d that contains the parameters.
vsnParamsIter: a (2*d) x niter numeric matrix that contains the
parameter trajectory during the iterative fit process (see vsnPlotPar).
vsnTrimSelection: a logical vector of length n that, for each row of
the intensities matrix, reports whether it was below (TRUE) or above
(FALSE) the trimming threshold.
If intensities has class exprSet, and its description slot has class
MIAME, then this list is appended to any existing entries in the
preprocessing slot. Otherwise, the description object and its
preprocessing slot are created.
By default, if cvg.check is NULL, the function will run the fixed
number niter of iterations in the least trimmed sum of squares
regression. More fine-grained control can be obtained by passing a
list with elements eps and n. If the maximum change between
transformed data values is smaller than eps for n consecutive
iterations, then the iteration terminates.
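The stopping rule can be sketched generically in base R as follows. This is not the code used inside vsn: the function iterate_with_check and the toy update step are hypothetical, and the sketch assumes that niter still acts as an upper bound on the number of iterations when cvg.check is supplied.

```r
# Generic sketch of the stopping rule described above (not vsn's code).
# `step` is a hypothetical function mapping the current transformed values
# to the next iterate; niter is assumed to remain an upper bound.
iterate_with_check <- function(step, h0, niter = 10, cvg.check = NULL) {
  h <- h0
  small_run <- 0                  # consecutive iterations with a small change
  for (iter in seq_len(niter)) {
    h_new  <- step(h)
    change <- max(abs(h_new - h))
    h <- h_new
    if (!is.null(cvg.check)) {
      small_run <- if (change < cvg.check$eps) small_run + 1 else 0
      if (small_run >= cvg.check$n) break
    }
  }
  list(h = h, iterations = iter)
}

# toy update: each step halves the distance to 1
halve <- function(h) 1 + (h - 1) / 2
fixed   <- iterate_with_check(halve, h0 = c(0, 3), niter = 20)
checked <- iterate_with_check(halve, h0 = c(0, 3), niter = 20,
                              cvg.check = list(eps = 0.01, n = 2))
```

With cvg.check = NULL the loop runs all niter iterations; with the list supplied it stops as soon as the change has stayed below eps for n iterations in a row.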
An object of class exprSet. Differences
between the columns of the transformed intensities may be interpreted
as "regularized" or "shrunken" log-ratios. For the calibration and
transformation parameters, see the Details section.
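To see why such differences behave like "shrunken" log-ratios, compare arsinh differences with ordinary log-ratios on a few made-up values. This sketch assumes the inputs are already calibrated; the helper names h_diff and log_ratio are hypothetical.

```r
# Compare arsinh differences with ordinary log-ratios on made-up values.
# Inputs are assumed to be already calibrated.
h_diff    <- function(a, b) asinh(a) - asinh(b)
log_ratio <- function(a, b) log(a) - log(b)

# a two-fold change at high intensity and at low intensity
high <- c(h = h_diff(20000, 10000), log = log_ratio(20000, 10000))
low  <- c(h = h_diff(2, 1),         log = log_ratio(2, 1))
```

At high intensities the two quantities agree closely, while at low intensities the arsinh difference is smaller in magnitude than the log-ratio.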
Wolfgang Huber http://www.dkfz.de/mga/whuber
Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.
Parameter estimation for the calibration and variance stabilization of microarray data, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin Vingron; Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3.
exprSet-class, MIAME-class, normalize.AffyBatch.vsn
data(kidney)
if (interactive()) {
  x11(width=9, height=4.5)
  par(mfrow=c(1,2))
}
plot(log.na(exprs(kidney)), pch=".", main="log-log")
vsnkid = vsn(kidney)   ## transform and calibrate
plot(exprs(vsnkid), pch=".", main="h-h")

if (interactive()) {
  x11(width=9, height=4)
  par(mfrow=c(1,3))
}
meanSdPlot(vsnkid)
vsnPlotPar(vsnkid, "factors")
vsnPlotPar(vsnkid, "offsets")

## this should always hold true
params = preproc(description(vsnkid))$vsnParams
stopifnot(all(vsnh(exprs(kidney), params) == exprs(vsnkid)))