Parallel analysis {scran}    R Documentation

Parallel analysis for PCA

Description

Perform a parallel analysis to choose the number of principal components.

Usage

## S4 method for signature 'ANY'
parallelPCA(x, subset.row=NULL, value=c("pca", "n", "lowrank"), 
    min.rank=5, max.rank=100, niters=50, threshold=0.1, approximate=NULL, 
    irlba.args=list(), BSPARAM=ExactParam(), BPPARAM=SerialParam())

## S4 method for signature 'SingleCellExperiment'
parallelPCA(x, ..., subset.row=NULL, 
    value=c("pca", "n", "lowrank"), assay.type="logcounts", 
    get.spikes=FALSE, sce.out=TRUE)

Arguments

x

A numeric matrix of log-expression values for parallelPCA,ANY-method, or a SingleCellExperiment object containing such values for parallelPCA,SingleCellExperiment-method.

subset.row

See ?"scran-gene-selection".

value

A string specifying the type of value to return; the PCs, the number of retained components, or a low-rank approximation.

min.rank, max.rank

Integer scalars specifying the minimum and maximum number of PCs to retain.

niters

Integer scalar specifying the number of iterations to use for the parallel analysis.

threshold

Numeric scalar representing the “p-value” threshold above which PCs are to be ignored.

approximate

A logical scalar indicating whether approximate SVD should be performed via irlba.

irlba.args

A named list of additional arguments to pass to irlba when approximate=TRUE.

BSPARAM

A BiocSingularParam object specifying the algorithm to use for PCA.

BPPARAM

A BiocParallelParam object specifying how the iterations should be parallelized.

...

Further arguments to pass to parallelPCA,ANY-method.

assay.type

A string specifying which assay values to use.

get.spikes

See ?"scran-gene-selection".

sce.out

A logical scalar specifying whether a modified SingleCellExperiment object should be returned.

Details

This function performs Horn's parallel analysis to decide how many PCs to retain in a principal components analysis. Parallel analysis involves permuting the expression vector for each gene and repeating the PCA to obtain the fractions of variance explained under a random null model. The number of PCs to retain is determined by the intersection of the “fraction explained” lines on a scree plot. This is justified as discarding PCs that explain less variance than would be expected under a random model.

In practice, all PCs are discarded starting from the first PC whose fraction explained is similar to that under the null. A PC is considered similar if the permuted fraction exceeds the observed fraction in more than a proportion threshold of the iterations. (For want of a better word, we have described this as a “p-value” threshold, though it is not interpretable as a measure of significance.) This is a more conservative criterion than discarding PCs with fractions below the average null fraction, which tends to overstate the rank in noisy datasets. Note that the number of PCs will be coerced to lie between min.rank and max.rank.
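
The procedure described in this section can be sketched in base R. This is only an illustration under stated assumptions and not the scran implementation: sketchParallelPCA is a hypothetical helper, it uses stats::prcomp rather than BSPARAM, it ignores subset.row and BPPARAM, and it caps max.rank at one less than the smaller dimension of x.

```r
# Hypothetical helper for illustration only; it is not part of scran.
sketchParallelPCA <- function(x, min.rank=5, max.rank=100, niters=50, threshold=0.1) {
    # x holds genes in rows and cells in columns, as for parallelPCA().
    # Assumption: cap max.rank at the number of non-trivial PCs.
    max.rank <- min(max.rank, dim(x) - 1L)

    # Fraction of total variance explained by each of the first max.rank PCs.
    fracVar <- function(y) {
        v <- prcomp(t(y))$sdev^2 # prcomp() expects cells in rows.
        (v / sum(v))[seq_len(max.rank)]
    }
    observed <- fracVar(x)

    # Null model: permute each gene's expression vector independently.
    permuted <- replicate(niters, fracVar(t(apply(x, 1, sample))))
    permuted <- matrix(permuted, nrow=max.rank) # rows are PCs, columns are iterations.

    # Proportion of iterations in which the permuted fraction exceeds the observed one.
    prop <- rowMeans(permuted > observed)

    # Discard everything from the first PC that looks similar to the null.
    first.null <- which(prop > threshold)
    n <- if (length(first.null)) first.null[1] - 1L else max.rank

    # Coerce the result to lie between min.rank and max.rank.
    n <- max(min.rank, min(n, max.rank))

    list(n=n, percentVar=observed, permuted.percentVar=permuted)
}

# Usage on mock data: Gaussian noise with one planted factor.
set.seed(100)
x <- matrix(rnorm(200 * 30), nrow=200)
x[1:50, 1:15] <- x[1:50, 1:15] + 3
res <- sketchParallelPCA(x, min.rank=0, niters=20)
res$n
```

The returned list mirrors the quantities that parallelPCA reports (the chosen number of PCs, the observed fractions and the permuted fractions), but here they are plain list elements rather than attributes.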

This function can be sped up by specifying approximate=TRUE, which will use approximate strategies for performing the PCA. Another option is to set BPPARAM to perform the iterations in parallel.

Value

For parallelPCA,ANY-method, a numeric matrix is returned containing the selected PCs (columns) for all cells (rows) if value="pca". If value="n", it will return an integer scalar specifying the number of retained components. If value="lowrank", it will return a low-rank approximation of x with the same dimensions.

For parallelPCA,SingleCellExperiment-method, the return value is the same as for parallelPCA,ANY-method if sce.out=FALSE or value="n". Otherwise, a SingleCellExperiment object is returned that is a modified version of x. If value="pca", the modified object will contain the PCs as the "PCA" entry in the reducedDims slot. If value="lowrank", the modified object will contain a low-rank approximation in the assays slot, named "lowrank".

In all cases, the fractions of variance explained by the first max.rank PCs will be stored as the "percentVar" attribute in the return value. Fractions of variance explained by these PCs after each permutation iteration are also recorded as a matrix in "permuted.percentVar".

Author(s)

Aaron Lun

References

Buja A and Eyuboglu N (1992). Remarks on Parallel Analysis. Multivariate Behav. Res., 27:509-40.

See Also

denoisePCA

Examples

# Mocking up some data.
ngenes <- 1000
means <- 2^runif(ngenes, 6, 10)
dispersions <- 10/means + 0.2
nsamples <- 50
counts <- matrix(rnbinom(ngenes*nsamples, mu=means, 
            size=1/dispersions), ncol=nsamples)

# Choosing the number of PCs
lcounts <- log2(counts + 1)
parallelPCA(lcounts, min.rank=0, value="n")

[Package scran version 1.12.0 Index]