xval-methods {MLInterfaces} | R Documentation |
support for cross-validatory machine learning with ExpressionSets
xval( data, classLab, proc, xvalMethod, group, indFun, niter, fsFun=NULL, fsNum=NULL, decreasing=TRUE, cluster=NULL, ... )
balKfold(K)
xvalML( formula, data, proc, xvalMethod="LOO", group, indFun, niter, fsFun=NULL, fsNum=10, decreasing=TRUE, cluster=NULL, ... )
data |
instance of class ExpressionSet |
formula |
a model formula, typically with a dot on the RHS,
and response variable chosen from pData columns. |
classLab |
character string identifying phenoData variable to label classifications |
proc |
an MLInterfaces method that returns an instance of
"classifOutput" |
xvalMethod |
character string identifying cross-validation procedure to use: default is "LOO" (leave one out), alternatives are "LOG" (leave group out) and "FUN" (user-supplied partition extraction function, see Details below) |
group |
a vector (length equal to number of samples) enumerating groups for LOG xval method |
indFun |
a function that returns the set of indices to be used as the training set
in a given iteration (the remaining samples are held out for testing);
this function must have parameters data, clab, iternum; see Details |
niter |
number of iterations for user-specified partition function to be run |
fsFun |
function computing ranks of features for feature selection |
fsNum |
number of features to be kept for learning in each iteration |
decreasing |
logical; should be TRUE if fsFun gives high scores to high-performing features
(e.g., the absolute value of a test statistic) and FALSE if it gives low scores
to high-performing features (e.g., the p-value of a test). |
cluster |
NULL or an S4-class object with a defined
xvalLoop method. Use this to execute xval on
several nodes in a computer cluster. See documentation for
xvalLoop for more information |
... |
arguments passed to the MLInterfaces generic proc |
K |
number of partitions to be used if balKfold is used as indFun |
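As a rough illustration of how a group vector can define leave-group-out partitions, the base R sketch below builds the training and test index sets implied by the grouping used in the LOG example on this page. The rule that iteration k trains on every sample outside the k-th group is an assumption about the semantics, and the names ug, trainSets and testSets are hypothetical, not part of MLInterfaces.

```r
# Group labels as in the LOG example on this page: 72 samples in 8 groups of 9.
group <- as.integer(rep(1:8, each = 9))

# Assumed semantics: iteration k holds out every sample in the k-th group
# and trains on all remaining samples.
ug <- unique(group)
trainSets <- lapply(ug, function(g) which(group != g))
testSets  <- lapply(ug, function(g) which(group == g))

# Sizes of the training sets, one per held-out group.
lengths(trainSets)
```

Every sample appears in exactly one test set, so each sample receives exactly one cross-validated label.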
For fixed feature sets (fsFun not specified), the return value is
a vector or matrix with length equal to the number of cross-validation
assignments. Each element contains the label resulting from the
cross-validation.
For dynamic feature sets (fsFun specified), the return value is a list
with element out containing the labels from the cross-validations, and
element fs.memory recording the features used in each cross-validation.
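The returned labels are typically compared with the true class labels, as the table() calls in the examples do. The base R sketch below shows that comparison on made-up vectors; truth and pred are hypothetical stand-ins for the phenoData labels and for the cross-validated labels (the vector itself, or the out element when fsFun is used).

```r
# Hypothetical true labels and cross-validated predictions for six samples.
truth <- factor(c("ALL", "ALL", "ALL", "AML", "AML", "ALL"))
pred  <- factor(c("ALL", "ALL", "AML", "AML", "AML", "ALL"),
                levels = levels(truth))

# Confusion table: rows are predicted labels, columns are true labels.
confusion <- table(predicted = pred, actual = truth)

# Cross-validated misclassification rate.
errRate <- mean(pred != truth)
```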
If xvalMethod is "FUN", then indFun must be a function with parameters
data, clab, and iternum. This function returns indices that identify the
training set for a given cross-validation iteration passed as the value of
iternum. An example function is printed out when the example of this page
is executed.
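The partition-function contract can be sketched in base R as follows. This is only an illustration under stated assumptions: a plain data frame pd stands in for the ExpressionSet (with a real ExpressionSet the sample count would come from ncol(exprs(data)), as in the example on this page), and looFun and kfoldFun are hypothetical names; kfoldFun is a simplified, unbalanced stand-in and not the balKfold implementation.

```r
# Stand-in for the sample annotation of an ExpressionSet (hypothetical data).
pd <- data.frame(ALL.AML = factor(rep(c("ALL", "AML"), times = c(6, 4))))

# Leave-one-out written as a partition function: iteration i trains on
# every sample except sample i.
looFun <- function(data, clab, iternum) {
  seq_len(nrow(data))[-iternum]
}

# Simple K-fold factory: samples are assigned to folds 1..K in rotation,
# and iteration k trains on every sample outside fold k.
kfoldFun <- function(K) function(data, clab, iternum) {
  fold <- rep_len(seq_len(K), nrow(data))
  which(fold != iternum)
}

train1 <- looFun(pd, "ALL.AML", 1)        # training indices, iteration 1
train2 <- kfoldFun(5)(pd, "ALL.AML", 2)   # training indices, fold 2 held out
```

With a leave-one-out style function, niter should equal the number of samples; with a K-fold style function, niter should equal K.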
If fsFun is not NULL, then it must be a function with two
arguments: the first can be transformed to a feature matrix (rows are objects,
columns are features) and the second is a vector of class labels.
The function returns a vector of scores, one for each feature. The
scores will be interpreted according to the value of decreasing,
to select fsNum features. Thanks to Stephen Henderson of University
College London for this functionality.
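The interplay between the score vector and decreasing can be sketched in base R. Everything here is illustrative: the data are made up, meanGap and invGap are hypothetical scoring functions that take a plain matrix and a factor, and pickFeatures encodes an assumed selection rule (order the scores, keep the first fsNum), which may differ from what xval does internally.

```r
# Hypothetical data: 8 samples (rows) by 4 features (columns), two classes.
x <- cbind(g1 = c(1, 2, 1, 2, 1, 2, 1, 2),
           g2 = c(1, 2, 1, 2, 4, 5, 4, 5),
           g3 = c(1, 2, 1, 2, 9, 10, 9, 10),
           g4 = c(3, 3, 4, 4, 3, 4, 3, 4))
labels <- factor(rep(c("A", "B"), each = 4))

# Scorer where a larger score means a more discriminating feature:
# absolute difference between the two class means, one score per feature.
meanGap <- function(x, labels) {
  lev <- levels(labels)
  abs(colMeans(x[labels == lev[1], , drop = FALSE]) -
      colMeans(x[labels == lev[2], , drop = FALSE]))
}

# The same information on a "smaller is better" scale, like a p-value.
invGap <- function(x, labels) 1 / (1 + meanGap(x, labels))

# Assumed selection rule: order the scores and keep the first fsNum features.
pickFeatures <- function(scores, fsNum, decreasing) {
  names(scores)[order(scores, decreasing = decreasing)[seq_len(fsNum)]]
}

topHigh <- pickFeatures(meanGap(x, labels), fsNum = 2, decreasing = TRUE)
topLow  <- pickFeatures(invGap(x, labels),  fsNum = 2, decreasing = FALSE)
```

Both calls select the same two features, which is the point of matching decreasing to the direction of the score.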
Note that if fsFun is non-null, then the RHS of formula will be
ignored, and it is assumed that the RHS is ".". We will attempt
to ameliorate this in a future revision. If you wish to subset
the features in data before applying cross-validated
feature selection, do this manually, not by specifying a nontrivial
formula.
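To make the idea of manual subsetting followed by cross-validated feature selection concrete, here is a base R sketch. It is not the MLInterfaces implementation: the data, the scoreFun scorer, and the nearest-centroid classifier are all hypothetical, chosen only so the example runs without Bioconductor packages. Features are subset by hand first, then re-selected on each leave-one-out training set, and the selections are recorded in a list analogous to fs.memory.

```r
# Hypothetical data: 8 samples (rows) by 5 features (columns), two classes.
xAll <- cbind(g1 = c(1, 2, 1, 2, 1, 2, 1, 2),
              g2 = c(1, 2, 1, 2, 4, 5, 4, 5),
              g3 = c(1, 2, 1, 2, 9, 10, 9, 10),
              g4 = c(3, 3, 4, 4, 3, 4, 3, 4),
              g5 = c(0, 9, 0, 9, 9, 0, 9, 0))
labels <- factor(rep(c("A", "B"), each = 4))

# Manual subsetting step: drop g5 before any cross-validation is run.
x <- xAll[, c("g1", "g2", "g3", "g4")]

# Hypothetical scorer: absolute difference in class means, per feature.
scoreFun <- function(x, labels) {
  abs(colMeans(x[labels == "A", , drop = FALSE]) -
      colMeans(x[labels == "B", , drop = FALSE]))
}

fsNum <- 2
out <- character(nrow(x))
fs.memory <- vector("list", nrow(x))
for (i in seq_len(nrow(x))) {
  train <- seq_len(nrow(x))[-i]                     # leave sample i out
  sc <- scoreFun(x[train, , drop = FALSE], labels[train])
  keep <- order(sc, decreasing = TRUE)[seq_len(fsNum)]
  fs.memory[[i]] <- colnames(x)[keep]               # features used in fold i
  # Nearest class centroid, computed on the selected features only.
  cenA <- colMeans(x[train[labels[train] == "A"], keep, drop = FALSE])
  cenB <- colMeans(x[train[labels[train] == "B"], keep, drop = FALSE])
  dA <- sum((x[i, keep] - cenA)^2)
  dB <- sum((x[i, keep] - cenB)^2)
  out[i] <- if (dA <= dB) "A" else "B"
}
result <- list(out = factor(out, levels = levels(labels)),
               fs.memory = fs.memory)
```

Because selection happens inside the loop, the held-out sample never influences which features are chosen for its own prediction.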
library(MLInterfaces)
library(golubEsets)
data(Golub_Merge)
smallG <- Golub_Merge[200:250,]
# leave-one-out cross-validation with k-nearest neighbors
lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", group=as.integer(0))
table(lk1,smallG$ALL.AML)
# leave-group-out: 72 samples in 8 groups of 9
lk2 <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOG",
   group=as.integer(rep(1:8,each=9)))
table(lk2,smallG$ALL.AML)
# print the balanced K-fold partition function generator
balKfold
lk3 <- xval(smallG, "ALL.AML", knnB, xvalMethod="FUN", 0:0,
   indFun=balKfold(5), niter=5)
table(lk3, smallG$ALL.AML)
#
# illustrate the xval FUN method in comparison to LOO
#
LOO2 <- xval(smallG, "ALL.AML", knnB, "FUN", 0:0,
   function(x,y,i) { (1:ncol(exprs(x)))[-i] }, niter=72 )
table(lk1, LOO2)
#
# use Stephen Henderson's feature selection extensions
#
t.fun<-function(data, fac) {
   require(genefilter)
   # deal with the integer storage of golubTrain@exprs!
   xd <- matrix(as.double(exprs(data)), nrow=nrow(exprs(data)))
   return(abs(rowttests(xd,pData(data)[[fac]], tstatOnly=FALSE)$statistic))
}
lk3f <- xval(smallG, "ALL.AML", knnB, xvalMethod="LOO", 0:0, fsFun=t.fun)
table(lk3f$out, smallG$ALL.AML)
# use MLearn xval
XXml = xvalML(ALL.AML~., smallG, "knn", "LOO")
# show that it agrees with the fB approach
table(XXml, lk1)
# use MLearn xval with feature selection
XXmlfs = xvalML(ALL.AML~., smallG, "knn", "LOO", fsFun=t.fun)
# show that it agrees with the previous approach
table(XXmlfs$out, lk3f$out)