Q2 {pcaMethods} | R Documentation |
Internal cross-validation can be used to estimate the level of structure in a data set and to optimise the choice of the number of principal components.
Q2(object, originalData, nPcs=object@nPcs, fold=5, nruncv=10)
object | A pcaRes object (the result of a previous PCA analysis). |
originalData | The matrix used to obtain the pcaRes object. |
nPcs | The number of principal components to estimate Q2 for. |
fold | The number of groups to divide the data into. |
nruncv | The number of times to repeat the whole cross-validation. |
This method calculates Q^2 for a PCA model. Q^2 is the predictive version of R^2 and can be interpreted as the fraction of variance in a left-out chunk of data that can be estimated by the PCA model. A poor (low) Q^2 indicates that the PCA model only describes noise and is unrelated to the true data structure. The definition of Q^2 is:
Q^2 = 1 - [ sum_i^k sum_j^n (x - hat{x})^2 ] / [ sum_i^k sum_j^n x^2 ]
for the matrix x, which has n rows and k columns. For a given number of PCs, x is estimated as hat{x} = TP' (T are the scores and P are the loadings). Although this defines leave-one-out cross-validation, that is not what is performed if fold is less than the number of rows and/or columns.
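The formula can be illustrated with a minimal base R sketch. The helper q2_stat is a hypothetical name and not part of pcaMethods, and the reconstruction here comes from a plain SVD fitted on the complete data, so the printed value is an in-sample fit rather than a cross-validated Q^2. It only shows how the ratio is formed from x and hat{x} = TP'.

```r
# Hypothetical helper (not a pcaMethods function): evaluates
# 1 - sum((x - xhat)^2) / sum(x^2) over all cells of the matrix.
q2_stat <- function(x, xhat) {
  x <- as.matrix(x)
  xhat <- as.matrix(xhat)
  stopifnot(all(dim(x) == dim(xhat)))
  1 - sum((x - xhat)^2) / sum(x^2)
}

# Centred data matrix with n = 150 rows and k = 4 columns.
x <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

# Rank-nPcs reconstruction hat{x} = TP' from a singular value decomposition.
nPcs <- 2
s <- svd(x)
scores <- s$u[, 1:nPcs, drop = FALSE] %*% diag(s$d[1:nPcs], nPcs)  # T
loadings <- s$v[, 1:nPcs, drop = FALSE]                            # P
xhat <- scores %*% t(loadings)

# In-sample value; a cross-validated Q^2 would use only left-out cells.
q2_stat(x, xhat)
```

A perfect reconstruction gives 1 and an all-zero reconstruction gives 0, which matches the reading of Q^2 as a fraction of recovered variance.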
A random set of values in the matrix is set to NA, and the scores and loadings are estimated without them.
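The masking step might look roughly like the following base R sketch. How Q2 actually partitions the cells is not stated here, so the equal-sized random groups below are an assumption, and the names groups and x_masked are illustrative only.

```r
# Assumption: every cell of the matrix is assigned at random to one of
# `fold` equally sized groups; the real partitioning in Q2 may differ.
set.seed(1)
x <- as.matrix(iris[, 1:4])
fold <- 5
groups <- matrix(sample(rep_len(seq_len(fold), length(x))), nrow = nrow(x))

# Hold out the cells of the first group by setting them to NA.
x_masked <- x
x_masked[groups == 1] <- NA

# A PCA method that tolerates missing values would now be fitted to
# x_masked, and the held-out cells compared with their estimates.
sum(is.na(x_masked))
```

Repeating this for each group, and then repeating the whole procedure nruncv times, would give one Q^2 estimate per run.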
A matrix with Q^2 estimates.
Wolfram Stacklies, Henning Redestig
Wold, H. (1966) Estimation of principal components and related models by iterative least squares. In Multivariate Analysis (Ed., P.R. Krishnaiah), Academic Press, NY, 391-420.
library(pcaMethods)
data(iris)
pcIr <- pca(iris[,1:4], nPcs=2, scale="UV", method="ppca")
# Q2 estimates are only available for the first two PCs
q2 <- Q2(pcIr, iris[,1:4], nruncv=2)
# Typically Q2 increases only very slowly after the optimal number of PCs
boxplot(q2~row(q2), xlab="Number of PCs", ylab=expression(Q^2))