internalFunctionCheck {clusterExperiment} | R Documentation |
ClusterFunction is a class for holding functions that can be used for clustering in the clustering algorithms in this package. The constructor ClusterFunction creates an object of the class ClusterFunction.
internalFunctionCheck(clusterFUN, inputType, algorithmType, outputType)

ClusterFunction(clusterFUN, ...)

## S4 method for signature 'function'
ClusterFunction(clusterFUN, inputType, outputType, algorithmType,
  inputClassifyType = NA_character_, requiredArgs = NA_character_,
  classifyFUN = NULL, checkFunctions = TRUE)
clusterFUN | function passed to slot
inputType | character for slot
algorithmType | character for slot
outputType | character for slot
... | arguments passed to different methods of
inputClassifyType | character for slot
requiredArgs | character for slot
classifyFUN | function for slot
checkFunctions | logical for whether to check the input functions with
internalFunctionCheck is the function that is called by the validity check of the ClusterFunction constructor (if checkFunctions=TRUE). It is available as an S3 function so that users can test and debug their functions, which is difficult to do with an S4 validity function.
Required arguments for clusterFUN:

"x or diss": either x and/or diss, depending on inputType. If x, then x is assumed to be nfeatures x nsamples (like assay(CEObj) would give).

"checkArgs": logical argument. If checkArgs=TRUE, the clusterFUN should check if the arguments passed in ... are valid and return an error if not; otherwise, no error will be given, but the check should be done and only valid arguments in ... passed along. This is necessary for the function to work with clusterMany, which passes all arguments to all functions without checking.

"cluster.only": logical argument. If cluster.only=TRUE, then clusterFUN should return only the vector of cluster assignments (or list if outputType="list"). If cluster.only=FALSE, then the clusterFUN should return a named list where one of the elements, entitled clustering, contains the vector described above (no list!); anything else needed by the classifyFUN to classify new data should be contained in the output list as well. cluster.only is set internally depending on whether classifyFUN will be used by subsampling or only for clustering the final product.

"...": Any additional arguments specific to the algorithm used by clusterFUN should be passed via ... and NOT passed via arguments to clusterFUN.

"Other required arguments": clusterFUN must also accept arguments required for its algorithmType (see Details below).
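The requirements above can be illustrated with a minimal sketch of a "K"-type clustering function. The name kmeansFUN and the choice of stats::kmeans are assumptions made for illustration only; they are not part of clusterExperiment. The function transposes x (nfeatures x nsamples), filters the arguments in ... according to checkArgs, and honors cluster.only.

```r
# Hypothetical example: a clusterFUN wrapping stats::kmeans (base R).
kmeansFUN <- function(x, diss, k, checkArgs, cluster.only, ...) {
  passedArgs <- list(...)
  # Arguments stats::kmeans accepts besides the data and the number of centers
  validNames <- setdiff(names(formals(stats::kmeans)), c("x", "centers"))
  invalid <- setdiff(names(passedArgs), validNames)
  # checkArgs=TRUE: complain about invalid arguments
  if (checkArgs && length(invalid) > 0) {
    stop("invalid arguments for kmeans: ", paste(invalid, collapse = ", "))
  }
  # In either case, only valid arguments are passed along
  passedArgs <- passedArgs[names(passedArgs) %in% validNames]
  # x is nfeatures x nsamples; kmeans clusters rows, so transpose
  out <- do.call(stats::kmeans, c(list(x = t(x), centers = k), passedArgs))
  if (cluster.only) {
    return(out$cluster)
  }
  # Named list with a 'clustering' element plus extras a classifyFUN might use
  list(clustering = out$cluster, centers = out$centers)
}

# Toy data: 10 features x 20 samples
set.seed(1)
toy <- matrix(rnorm(200), nrow = 10)
cl <- kmeansFUN(x = toy, k = 3, checkArgs = TRUE, cluster.only = TRUE, nstart = 5)
```

With cluster.only=FALSE the same call returns a list whose clustering element holds the assignment vector; the centers element is one example of extra output that a classifyFUN could use.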
algorithmType: Type "01" is for clustering functions that expect as input a dissimilarity matrix that takes on 0-1 values (e.g. from subclustering), with 1 indicating more dissimilarity between samples. "01" algorithm types must also have inputType equal to "diss". It is also generally expected that "01" algorithms use the 0-1 nature of the input to set criteria as to where to find clusters. "01" functions must take as an argument alpha, between 0 and 1, to determine the clusters, where larger values of alpha require less similarity between samples in the same cluster. "K" is for clustering functions that require an argument k (the number of clusters) but accept an arbitrary inputType. "K" algorithms are assumed to need a predetermined k and are also assumed to assign every sample to a cluster. If they do not, post-processing steps in mainClustering such as findBestK and removeSil may not operate correctly, since they rely on silhouette distances.
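As a rough illustration of the "01" contract, the sketch below takes a 0-1 dissimilarity matrix and an alpha argument. The name hier01FUN and the use of complete-linkage hierarchical clustering cut at height alpha are assumptions chosen for this example, not the algorithm used by the package. With complete linkage, every pair of samples in a cluster has dissimilarity at most alpha, so larger alpha requires less similarity within a cluster. Samples left alone in a cluster are marked -1 (unassigned).

```r
# Hypothetical "01"-type clusterFUN: inputType would be "diss".
hier01FUN <- function(diss, x, alpha, checkArgs, cluster.only, ...) {
  # Complete linkage: merge height = largest pairwise dissimilarity in a cluster
  tree <- stats::hclust(stats::as.dist(diss), method = "complete")
  cl <- stats::cutree(tree, h = alpha)
  # Treat singleton clusters as unassigned samples
  sizes <- table(cl)
  singletons <- as.integer(names(sizes)[sizes == 1])
  cl[cl %in% singletons] <- -1L
  cl <- unname(cl)
  if (cluster.only) {
    return(cl)
  }
  list(clustering = cl)
}

# Toy 0-1 dissimilarity: samples 1-3 are close, 4-5 are close, 6 is far from all
toyDiss <- matrix(1, 6, 6)
toyDiss[1:3, 1:3] <- 0.1
toyDiss[4:5, 4:5] <- 0.2
diag(toyDiss) <- 0
cl01 <- hier01FUN(diss = toyDiss, alpha = 0.3, checkArgs = TRUE, cluster.only = TRUE)
```

In this toy case samples 1-3 and 4-5 form two clusters and sample 6 is left unassigned.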
A ClusterFunction object.
clusterFUN
a function defining the clustering function. See details for required arguments.
inputType
a character defining what type of input clusterFUN takes. Must be one of "diss", "X", or "either".
algorithmType
a character defining what type of clustering algorithm clusterFUN is. Must be either "01" or "K". clusterFUN must take the corresponding required arguments (see details below).
classifyFUN
a function that takes as input new data and the output of clusterFUN (when cluster.only=FALSE) and returns cluster assignments of the new data. Note that the function should assume that the input 'x' is not the same samples that were input to the ClusterFunction (but can assume that it is the same number of features/columns). Used in subsampling clustering. If given value NULL, then subsampling can only be "InSample"; see subsampleClustering.
inputClassifyType
the input type for the classification function (if not NULL); like inputType, must be one of "diss", "X", or "either".
outputType
the type of output given by clusterFUN. Must be either "vector" or "list". If "vector", then the output should be a vector of length equal to the number of observations, with integer-valued elements identifying them to different clusters; the vector assignments should be in the same order as the original input of the data. Samples that are not assigned to any cluster should be given a '-1' value. If "list", then it must be a list of length equal to the number of clusters, and the elements of the list contain the indices of the samples in that cluster. Any indices not in any of the list elements are assumed to be -1. The main advantage of "list" is that it can preserve the order of the clusters if the clusterFUN desires to do so, in which case the orderBy argument of mainClustering can preserve this ordering (default is to order by size).
requiredArgs
Any additional required arguments for clusterFUN (beyond those required of all clusterFUN, described in details).
checkFunctions
logical. If TRUE, the validity check of the ClusterFunction object will check the clusterFUN with simple toy data using the function internalFunctionCheck.
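The relationship between the two outputType formats described for the outputType slot can be sketched as follows. The helper listToVector is hypothetical and written only for illustration; it is not a function of the package. It converts a "list" style result (one element of sample indices per cluster) into the "vector" style, giving -1 to any sample that appears in no list element.

```r
# Hypothetical helper: convert "list" output to "vector" output.
# clusterList: list of integer index vectors, one per cluster
# n: total number of samples
listToVector <- function(clusterList, n) {
  out <- rep(-1L, n)  # samples in no cluster keep the value -1
  for (i in seq_along(clusterList)) {
    out[clusterList[[i]]] <- i  # cluster id follows the order of the list
  }
  out
}

asVector <- listToVector(list(c(2, 5), c(1, 3)), n = 6)
# asVector is 2 1 2 -1 1 -1
```

The cluster ids follow the order of the list elements, which is the ordering information that the "list" format is able to carry.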
#Use internalFunctionCheck to check possible function
goodFUN<-function(x,diss,k,checkArgs,cluster.only,...){
  cluster::pam(x=t(x),k=k,cluster.only=cluster.only)
}
#passes internal check
internalFunctionCheck(goodFUN,inputType="X",algorithmType="K",outputType="vector")
#Note it doesn't pass if inputType="either" because no catches for x=NULL
internalFunctionCheck(goodFUN, inputType="either",algorithmType="K",outputType="vector")
myCF<-ClusterFunction(clusterFUN=goodFUN, inputType="X",algorithmType="K",
  outputType="vector")
#badFUN does not transpose x and ignores cluster.only
badFUN<-function(x,diss,k,checkArgs,cluster.only,...){
  cluster::pam(x=x,k=k)
}
internalFunctionCheck(badFUN,inputType="X",algorithmType="K",outputType="vector")