vsn {vsn}    R Documentation

Variance stabilization and calibration for microarray data.

Description

Robust estimation of variance-stabilizing and calibrating transformations for microarray data. This is the main function of this package; see also the vignette vsn.pdf.

Usage

vsn(intensities,
    lts.quantile = 0.5,
    verbose      = TRUE,
    niter        = 10,
    cvg.check    = NULL,
    pstart       = NULL,
    describe.preprocessing = TRUE)

Arguments

intensities An object that contains intensity values from a microarray experiment. See getIntensityMatrix for details. The intensities are assumed to be the raw scanner data, summarized over the spots by an image analysis program, and possibly background-subtracted. The intensities must not be log-transformed or otherwise transformed, and must not be thresholded or "floored". NAs are not accepted. See Details.
lts.quantile Numeric. The quantile that is used for the resistant least trimmed sum of squares regression. Allowed values are between 0.5 and 1; the two extremes correspond to least median of squares regression and to ordinary least squares regression, respectively.
niter Integer. The number of iterations to be used in the least trimmed sum of squares regression.
verbose Logical. If TRUE, some messages are printed.
pstart Numeric vector. Starting values for the model parameters in the iterative parameter estimation algorithm. If NULL, the function tries to determine reasonable starting values from the distribution of intensities.
describe.preprocessing Logical. If TRUE, the calibration and transformation parameters, together with some other information, are stored in the preprocessing slot of the returned object. See Details.
cvg.check List. If non-NULL, this allows finer control of the iterative least trimmed sum of squares regression. See details.

Details

The function calibrates for sample-to-sample variations through shifting and scaling, and transforms the intensities to a scale where the variance is approximately independent of the mean intensity. The variance-stabilizing transformation behaves like the natural logarithm (up to an additive constant) in the high-intensity range, and like a linear transformation in the low-intensity range. In the intermediate range, the arsinh function interpolates smoothly between the two. The calibration consists of estimating an offset offs[i] and a scale factor fac[i] for each column i of the matrix intensities. The calibration is thus:

intensities[k,i] <- intensities[k,i] * fac[i] + offs[i]
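As an illustration of how the calibration formula combines with the arsinh, here is a minimal base-R sketch. The values of offs and fac are made up for the example rather than estimated, and the helper calibrate_and_transform is hypothetical; in the package, the fitted transformation is applied by vsnh.

```r
## Hypothetical helper: scale and shift each column (the calibration above),
## then apply the arsinh transformation.
calibrate_and_transform <- function(x, offs, fac) {
  stopifnot(is.matrix(x), length(offs) == ncol(x), length(fac) == ncol(x))
  y <- sweep(x, 2, fac, `*`)    # multiply column i by fac[i]
  y <- sweep(y, 2, offs, `+`)   # then add offs[i]
  asinh(y)
}

x    <- matrix(c(1, 10, 100, 1e4, 2, 20, 200, 2e4), ncol = 2)
offs <- c(0, -1)    # made-up offsets
fac  <- c(1, 0.5)   # made-up scale factors
hx   <- calibrate_and_transform(x, offs, fac)

## For large arguments asinh(y) is close to log(2 * y) = log(y) + log(2);
## for arguments near zero it is close to y itself.
c(asinh(1e6), log(2e6))
c(asinh(1e-3), 1e-3)
```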

The parameters offs[i] and fac[i] are estimated through a robust variant of maximum likelihood. The model assumes that for the majority of genes the expression levels are not much different across the samples, i.e., that only a minority of genes (less than a fraction of 1 - lts.quantile) are differentially expressed.
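The trimming idea behind the least trimmed sum of squares regression can be illustrated with a simplified sketch. This is not the vsn estimator, which maximizes a likelihood for the arsinh model; it fits an ordinary straight line with lm and uses iterative "concentration steps" (refit on the fraction lts.quantile of points with the smallest squared residuals). The function lts_sketch and the simulated data are hypothetical.

```r
## Simplified least trimmed squares via concentration steps (illustration only).
lts_sketch <- function(x, y, lts.quantile = 0.5, niter = 10) {
  stopifnot(lts.quantile >= 0.5, lts.quantile <= 1, niter >= 1)
  n    <- length(y)
  h    <- ceiling(lts.quantile * n)   # number of points retained in each fit
  keep <- rep(TRUE, n)                # the first fit uses all points
  for (iter in seq_len(niter)) {
    fit  <- lm(y ~ x, subset = keep)  # ordinary least squares on retained points
    cf   <- coef(fit)
    res2 <- (y - (cf[1] + cf[2] * x))^2
    ## retain the h points with the smallest squared residuals for the next fit
    keep <- rank(res2, ties.method = "first") <= h
  }
  list(coef = cf, keep = keep)
}

set.seed(1)
x <- seq(1, 100, length.out = 200)
y <- 2 + 0.5 * x + rnorm(200, sd = 0.1)
y[1:40] <- y[1:40] + 30               # 20% of the points are gross outliers
est <- lts_sketch(x, y, lts.quantile = 0.5)
est$coef                              # close to the true values 2 and 0.5
```

With lts.quantile = 0.5, up to half of the points can be outliers without dominating the fit; with lts.quantile = 1 no points are trimmed and the sketch reduces to ordinary least squares.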

Format: The matrix of intensities is organized as follows. For two-color printed arrays, each row corresponds to one spot, and the columns correspond to the different arrays and wavelengths (usually red and green, but there could be any number). For example, with 10 arrays the matrix would have 20 columns, with columns 1 to 10 containing the green intensities and columns 11 to 20 the red ones. The ordering of the columns does not matter to vsn, but it is your responsibility to keep track of it for subsequent analyses. For one-color arrays, each row corresponds to a probe, and each column to an array.
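The two-color layout from the example above (10 arrays, hence 20 columns) can be assembled with cbind. The random numbers, the column names, and the index vectors green.col and red.col are hypothetical; they only show one way of keeping track of the column order yourself.

```r
## Hypothetical two-color data: 10 arrays, 5 spots each (random numbers).
n.arrays <- 10
n.spots  <- 5
set.seed(2)
green <- matrix(rexp(n.spots * n.arrays, rate = 1e-3), nrow = n.spots)
red   <- matrix(rexp(n.spots * n.arrays, rate = 1e-3), nrow = n.spots)
colnames(green) <- paste0("array", 1:n.arrays, ".green")
colnames(red)   <- paste0("array", 1:n.arrays, ".red")

## One row per spot; columns 1 to 10 hold green, 11 to 20 hold red.
intensities <- cbind(green, red)

## vsn ignores the column order, so record the mapping for later analyses.
green.col <- 1:n.arrays
red.col   <- green.col + n.arrays
```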

Performance: This function is slow, because of the nested iteration loops of the numerical optimization of the likelihood function and of the heuristic that identifies the non-outlying data points in the least trimmed squares regression. For large arrays with many tens of thousands of probes, you may want to consider random subsetting: fit the parameters on a random subset of, e.g., 10,000 to 20,000 rows of the data matrix intensities, then apply the transformation to all the data using vsnh. An example of this can be found in the function normalize.AffyBatch.vsn, whose code you can inspect by typing normalize.AffyBatch.vsn at the R command line.
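The subset-then-apply pattern can be sketched generically in base R. Here fit_on_subset is a hypothetical helper, and toy_fit and toy_apply are toy stand-ins for vsn (fitting the parameters) and vsnh (applying them); they are not the package's functions, and the exact calls used by normalize.AffyBatch.vsn are not reproduced here.

```r
## Hypothetical helper: estimate parameters on a random row subset,
## then transform every row with those parameters.
fit_on_subset <- function(x, n.sub, fit_fun, apply_fun) {
  sel    <- if (nrow(x) > n.sub) sample(nrow(x), n.sub) else seq_len(nrow(x))
  params <- fit_fun(x[sel, , drop = FALSE])  # fit on the subset only
  apply_fun(x, params)                       # apply to all rows
}

## Toy stand-ins: scale each column to unit median, then apply asinh.
toy_fit   <- function(m) 1 / apply(m, 2, median)
toy_apply <- function(m, fac) asinh(sweep(m, 2, fac, `*`))

set.seed(3)
x  <- matrix(rexp(50000 * 4), ncol = 4)
hx <- fit_on_subset(x, n.sub = 20000, fit_fun = toy_fit, apply_fun = toy_apply)
```

Only the fitting step sees the subset, so its cost depends on n.sub, while applying the fitted transformation to all rows is cheap.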

Calibration and transformation parameters: The parameters are stored in the preprocessing slot of the description slot of the exprSet object that is returned, in the form of a list with three elements

If intensities has class exprSet, and its description slot has class MIAME, then this list is appended to any existing entries in the preprocessing slot. Otherwise, the description object and its preprocessing slot are created.

Convergence: By default (cvg.check = NULL), the function runs a fixed number, niter, of iterations of the least trimmed sum of squares regression. Finer control can be obtained by passing a list with elements eps and n: if the maximum change between transformed data values is smaller than eps for n consecutive iterations, the iteration terminates.
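The stopping rule controlled by cvg.check can be sketched as follows. The helper iterate_until_converged and the update function step are hypothetical, and the sketch assumes that niter remains an upper bound on the number of iterations when cvg.check is given.

```r
## Hypothetical sketch of the cvg.check rule: stop once the maximum absolute
## change of the values stays below eps for n consecutive iterations.
## Assumption: niter is still the maximum number of iterations.
iterate_until_converged <- function(h0, step, niter = 10, cvg.check = NULL) {
  stopifnot(niter >= 1)
  h    <- h0
  calm <- 0                        # consecutive iterations with change < eps
  for (iter in seq_len(niter)) {
    h.new  <- step(h)
    change <- max(abs(h.new - h))
    h      <- h.new
    if (!is.null(cvg.check)) {
      calm <- if (change < cvg.check$eps) calm + 1 else 0
      if (calm >= cvg.check$n) break
    }
  }
  list(h = h, iterations = iter)
}

## Usage with a toy update that halves the distance to 1:
## the changes are 0.5, 0.25, 0.125, 0.0625, 0.03125, ...
step <- function(h) (h + 1) / 2
res  <- iterate_until_converged(0, step, niter = 100,
                                cvg.check = list(eps = 0.1, n = 2))
res$iterations                                  # 5
iterate_until_converged(0, step, niter = 10)$iterations   # 10 (fixed count)
```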

Value

An object of class exprSet. Differences between the columns of the transformed intensities may be interpreted as "regularized" or "shrunken" log-ratios. For the calibration and transformation parameters, see the Details section.
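Why differences on the transformed scale act as shrunken log-ratios can be seen with a few made-up, already calibrated values that share the same fivefold ratio: for large values the difference of the arsinh values is close to log(5), while for values near zero it shrinks towards 0.

```r
## Made-up calibrated intensities with the same fivefold ratio in every row.
a <- c(1e4, 10, 0.5)   # sample 1
b <- c(2e3, 2, 0.1)    # sample 2
log.ratio    <- log(a / b)            # log(5) in every row
h.difference <- asinh(a) - asinh(b)   # about 1.609, 1.554, 0.381
cbind(log.ratio, h.difference)
```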

Author(s)

Wolfgang Huber http://www.dkfz.de/mga/whuber

References

Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.

Parameter estimation for the calibration and variance stabilization of microarray data, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron; Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3.

See Also

exprSet-class, MIAME-class, normalize.AffyBatch.vsn

Examples

data(kidney)

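## left panel: raw intensities on a log-log scale
if (interactive()) {
  dev.new(width=9, height=4.5)   ## new graphics device with two panels
  par(mfrow=c(1,2))
}
plot(log.na(exprs(kidney)), pch=".", main="log-log")

vsnkid = vsn(kidney)   ## transform and calibrate
## right panel: the same data on the transformed scale
plot(exprs(vsnkid), pch=".", main="h-h")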

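if (interactive()) {
  dev.new(width=9, height=4)     ## new graphics device with three panels
  par(mfrow=c(1,3))
}

meanSdPlot(vsnkid)               ## check that the standard deviation no longer depends on the mean
vsnPlotPar(vsnkid, "factors")    ## calibration parameters: scale factors
vsnPlotPar(vsnkid, "offsets")    ## calibration parameters: offsets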

## re-applying the stored parameters to the raw data with vsnh
## reproduces the output of vsn; this should always hold true
params = preproc(description(vsnkid))$vsnParams
stopifnot(all(vsnh(exprs(kidney), params) == exprs(vsnkid)))
