KolmogorovSmirnovSelection {ClassifyR} | R Documentation |
Ranks features by largest Kolmogorov-Smirnov distance and chooses the features which have the best resubstitution performance.
## S4 method for signature 'matrix'
KolmogorovSmirnovSelection(measurements, classes, ...)

## S4 method for signature 'DataFrame'
KolmogorovSmirnovSelection(measurements, classes, datasetName,
                           trainParams, predictParams, resubstituteParams, ...,
                           selectionName = "Kolmogorov-Smirnov Test", verbose = 3)

## S4 method for signature 'MultiAssayExperiment'
KolmogorovSmirnovSelection(measurements, targets = names(measurements), ...)
measurements |
Either a |
classes |
Either a vector of class labels of class |
targets |
If |
... |
Variables not used by the |
datasetName |
A name for the data set used. Stored in the result. |
trainParams |
A container of class |
predictParams |
A container of class |
resubstituteParams |
An object of class |
selectionName |
A name to identify this selection method by. Stored in the result. |
verbose |
Default: 3. A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3. |
Features are sorted from largest Kolmogorov-Smirnov distance to smallest. Top-ranked subsets of features of varying size are each used in a classifier to determine which number of features has the best resubstitution performance.
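The ranking step described above can be sketched in base R. This is only an illustration under stated assumptions, not ClassifyR's implementation: `rankByKS` is a hypothetical helper, it assumes exactly two classes and a matrix with features as rows and samples as columns, and it uses the statistic returned by `stats::ks.test` as the distance. The resubstitution step that picks the number of features is left out.

```r
# Illustrative sketch only; rankByKS is a hypothetical helper, not part of ClassifyR.
# Ranks features (rows) by the two-sample Kolmogorov-Smirnov statistic, largest first.
rankByKS <- function(measurements, classes) {
  classes <- factor(classes)
  # Assumption: two classes, one class label per sample (column).
  stopifnot(nlevels(classes) == 2, ncol(measurements) == length(classes))
  firstClass <- classes == levels(classes)[1]
  distances <- apply(measurements, 1, function(featureValues) {
    unname(ks.test(featureValues[firstClass], featureValues[!firstClass])$statistic)
  })
  # Row indices of features, ordered from largest distance to smallest.
  order(distances, decreasing = TRUE)
}

set.seed(1)
toyMatrix <- rbind(
  shifted = c(rnorm(20, 10), rnorm(20, 20)), # Distributions differ between classes.
  noise   = rnorm(40, 10)                    # Same distribution in both classes.
)
toyClasses <- factor(rep(c("A", "B"), each = 20))
ranking <- rankByKS(toyMatrix, toyClasses)
rownames(toyMatrix)[ranking]
```

In this toy data the two classes of the `shifted` feature do not overlap, so it is ranked ahead of `noise`. A selection procedure like the one described would then evaluate the top-ranked features in a classifier for each candidate number of features.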
Data tables which consist entirely of non-numeric data cannot be analysed. If measurements is an object of class MultiAssayExperiment, the factor of sample classes must be stored in the DataFrame accessible by the colData function with column name "class".
An object of class SelectResult, or a list of such objects if the classifier which was used for determining the specified performance metric made more than one variety of prediction.
Dario Strbenac
# First 20 features have bimodal distribution for Poor class.
# Other 80 features have normal distribution for both classes.
set.seed(1984)
genesMatrix <- sapply(1:25, function(sample) {
                              randomMeans <- sample(c(8, 12), 20, replace = TRUE)
                              c(rnorm(20, randomMeans, 1), rnorm(80, 10, 1))
                            }
                     )
genesMatrix <- cbind(genesMatrix, sapply(1:25, function(sample) rnorm(100, 10, 1)))
rownames(genesMatrix) <- paste("Gene", 1:nrow(genesMatrix))
classes <- factor(rep(c("Poor", "Good"), each = 25))

resubstituteParams <- ResubstituteParams(nFeatures = seq(5, 25, 5),
                                         performanceType = "balanced error",
                                         better = "lower")
selected <- KolmogorovSmirnovSelection(genesMatrix, classes, "Example",
                                       trainParams = TrainParams(naiveBayesKernel),
                                       predictParams = PredictParams(NULL),
                                       resubstituteParams = resubstituteParams)

head(selected@chosenFeatures)
plotFeatureClasses(genesMatrix, classes, "Gene 13", dotBinWidth = 0.25,
                   xAxisLabel = bquote(log[2]*'(expression)'))