xval-methods {MLInterfaces}    R Documentation

support for cross-validatory machine learning with ExpressionSets

Description

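Support for cross-validatory machine learning with ExpressionSets. xval and xvalML run the cross-validation; balKfold(K) builds a K-fold partition function that can be supplied as indFun.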

Usage

xval( data, classLab, proc, xvalMethod, group, indFun, niter,
fsFun=NULL, fsNum=NULL, decreasing=TRUE, cluster=NULL, ... )
balKfold(K)
xvalML( formula, data, proc, xvalMethod="LOO", group, indFun, niter,
fsFun=NULL, fsNum=10, decreasing=TRUE, cluster=NULL, ... )

Arguments

data instance of class ExpressionSet
formula a model formula, typically with a dot on the RHS, and response variable chosen from pData columns.
classLab character string identifying phenoData variable to label classifications
proc an MLInterfaces method that returns an instance of "classifOutput"
xvalMethod character string identifying cross-validation procedure to use: default is "LOO" (leave one out), alternatives are "LOG" (leave group out) and "FUN" (user-supplied partition extraction function, see Details below)
group a vector (length equal to number of samples) enumerating groups for LOG xval method
indFun a function that returns the set of indices to be used as the training set in a given iteration (the remaining samples are held out for testing); this function must have parameters data, clab, iternum; see Details
niter number of iterations for user-specified partition function to be run
fsFun function computing ranks of features for feature selection
fsNum number of features to be kept for learning in each iteration
decreasing logical, should be TRUE if fsFun provides high scores for high-performing features (e.g., the absolute value of a test statistic) and FALSE if it provides low scores for high-performing features (e.g., the p-value of a test).
cluster NULL or an S4-class object with a defined xvalLoop method. Use this to execute xval on several nodes in a computer cluster. See documentation for xvalLoop for more information
... arguments passed to the MLInterfaces generic proc
K number of partitions to be used if balKfold is used as indFun

Value

For fixed feature sets (fsFun not specified), a vector or matrix with length equal to the number of cross-validation assignments. Each element contains the label resulting from the cross-validation.
For dynamic feature sets (fsFun specified), a list with element out containing labels from cross-validations, and element fs.memory recording features used in each cross-validation.
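
As a rough illustration of how the returned labels are typically summarised, the sketch below tabulates predicted against true classes and computes a misclassification rate. The vectors truth and pred are toy stand-ins invented for this example; in practice pred would be the value of xval (or its out element when fsFun is supplied) and truth the corresponding phenoData column.

```r
# Toy stand-ins, invented for illustration only:
# 'truth' plays the role of smallG$ALL.AML, 'pred' the role of an xval result.
truth <- factor(c("ALL", "ALL", "AML", "AML", "ALL"))
pred  <- factor(c("ALL", "AML", "AML", "AML", "ALL"), levels = levels(truth))

# Confusion table: rows are cross-validated predictions, columns are true labels
confusion <- table(predicted = pred, actual = truth)

# Proportion of samples whose cross-validated label differs from the truth
errRate <- mean(as.character(pred) != as.character(truth))

print(confusion)
print(errRate)
```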

Details

If xvalMethod is "FUN", then indFun must be a function with parameters data, clab, and iternum. This function returns indices that identify the training set for a given cross-validation iteration passed as the value of iternum. An example function (balKfold) is printed when the examples on this page are run.
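
A minimal sketch of a user-supplied partition function follows. The generator simpleKfold is hypothetical and not part of MLInterfaces; it assumes that ncol(data) gives the number of samples, which holds for a features-by-samples matrix and is assumed here to hold for an ExpressionSet as well. Samples are assigned to folds in a simple round-robin fashion, with no attempt to balance classes.

```r
# Hypothetical helper (not part of MLInterfaces): returns a function with the
# parameters data, clab, iternum expected of indFun.
simpleKfold <- function(K) {
  function(data, clab, iternum) {
    n <- ncol(data)                          # assumption: samples are columns
    fold <- rep(seq_len(K), length.out = n)  # round-robin fold id per sample
    which(fold != iternum)                   # training indices: all folds but iternum
  }
}

# Stand-in data: a 3-feature by 10-sample matrix instead of an ExpressionSet
m <- matrix(0, nrow = 3, ncol = 10)
f <- simpleKfold(5)
train1 <- f(m, "ALL.AML", 1)   # iteration 1 holds out samples 1 and 6
print(train1)
```

Such a function would be passed in the same way as balKfold(5) in the Examples, with niter set to the number of folds.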

If fsFun is not NULL, then it must be a function with two arguments: the first can be transformed to a feature matrix (rows are objects, columns are features) and the second is a vector of class labels. The function returns a vector of scores, one for each feature. The scores will be interpreted according to the value of decreasing, to select fsNum features. Thanks to Stephen Henderson of University College London for this functionality.
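
The interplay of the scores, decreasing, and fsNum can be sketched as below. This is one plausible reading of the selection step, not the package's actual code; the helper pickTop is hypothetical, and scores is assumed to hold one value per candidate feature.

```r
# Hypothetical helper (not part of MLInterfaces): indices of the fsNum
# best-scoring features, where "best" means highest when decreasing = TRUE
# and lowest when decreasing = FALSE.
pickTop <- function(scores, fsNum, decreasing = TRUE) {
  order(scores, decreasing = decreasing)[seq_len(fsNum)]
}

scores <- c(0.2, 3.1, 1.5, 0.9)   # toy scores, invented for illustration

# Scores like |t|: large is good, so keep the two largest
topHigh <- pickTop(scores, fsNum = 2, decreasing = TRUE)

# Scores like p-values: small is good, so keep the two smallest
topLow <- pickTop(scores, fsNum = 2, decreasing = FALSE)

print(topHigh)
print(topLow)
```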

Note that if fsFun is non-null, then the RHS of formula will be ignored, and it is assumed that the RHS is ".". We will attempt to ameliorate this in a future revision. If you wish to subset the features in data before applying cross-validated feature selection, do this manually, not by specifying a nontrivial formula.

Examples

library(golubEsets)
data(Golub_Merge)
smallG <- Golub_Merge[200:250,]
lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", group=as.integer(0))
table(lk1,smallG$ALL.AML)
lk2 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOG", group=as.integer(
 rep(1:8,each=9)))
table(lk2,smallG$ALL.AML)
# print balKfold to see an example generator of an indFun
balKfold
lk3 <- xval(smallG, "ALL.AML", knnB, xvalMethod="FUN", 0:0, indFun=balKfold(5), niter=5)
table(lk3, smallG$ALL.AML)
#
# illustrate the xval FUN method in comparison to LOO
#
LOO2 <- xval(smallG, "ALL.AML", knnB, "FUN", 0:0, function(x,y,i) {
  (1:ncol(exprs(x)))[-i] }, niter=72 )
table(lk1, LOO2)
#
# use Stephen Henderson's feature selection extensions
#
t.fun<-function(data, fac)
{
        require(genefilter)
        # deal with the integer storage of golubTrain@exprs!
        xd <- matrix(as.double(exprs(data)), nrow=nrow(exprs(data)))
        return(abs(rowttests(xd,pData(data)[[fac]], tstatOnly=FALSE)$statistic))
}
lk3f <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", 0:0, fsFun=t.fun)
table(lk3f$out, smallG$ALL.AML)
# use MLearn xval
XXml = xvalML(ALL.AML~., smallG, "knn", "LOO")
# show that it agrees with the knnB-based xval result (lk1)
table(XXml, lk1)
# use MLearn xval with feature selection
XXmlfs = xvalML(ALL.AML~., smallG, "knn", "LOO", fsFun=t.fun)
# show that it agrees with the previous approach
table(XXmlfs$out, lk3f$out)

[Package MLInterfaces version 1.10.3 Index]