clusterSingle {clusterExperiment} | R Documentation |
Given input data, this function will find clusters, based on a single specification of parameters.
## S4 method for signature 'missing,matrixOrNULL'
clusterSingle(x, diss, ...)

## S4 method for signature 'matrixOrHDF5OrNULL,missing'
clusterSingle(x, diss, ...)

## S4 method for signature 'SummarizedExperiment,missing'
clusterSingle(x, diss, ...)

## S4 method for signature 'ClusterExperiment,missing'
clusterSingle(x, replaceCoClustering = FALSE, ...)

## S4 method for signature 'SingleCellExperiment,missing'
clusterSingle(x, reduceMethod = "none",
  nDims = defaultNDims(x, reduceMethod), whichAssay = 1, ...)

## S4 method for signature 'matrixOrHDF5OrNULL,matrixOrNULL'
clusterSingle(x, diss, subsample = TRUE, sequential = FALSE,
  mainClusterArgs = NULL, subsampleArgs = NULL, seqArgs = NULL,
  isCount = FALSE, transFun = NULL,
  reduceMethod = c("none", listBuiltInReducedDims(), listBuiltInFilterStats()),
  nDims = defaultNDims(x, reduceMethod), clusterLabel = "clusterSingle",
  checkDiss = TRUE)
x |
the data on which to run the clustering (features in rows), or a
SummarizedExperiment, SingleCellExperiment, or ClusterExperiment object.
|
diss |
|
... |
arguments to be passed on to the method for signature
'matrixOrHDF5OrNULL,matrixOrNULL'.
|
replaceCoClustering |
logical. Applicable if x is a ClusterExperiment object. |
reduceMethod |
character. A character identifying what type of dimensionality reduction to perform before clustering. Options are 1) "none", 2) one of listBuiltInReducedDims() or listBuiltInFilterStats(), or 3) stored filtering or reducedDim values in the object. |
nDims |
integer. An integer identifying how many dimensions to reduce to
in the reduction specified by reduceMethod. |
whichAssay |
numeric or character specifying which assay to use. See
|
subsample |
logical as to whether to subsample via subsampleClustering.
|
sequential |
logical whether to use the sequential strategy (see details
of seqCluster). |
mainClusterArgs |
list of arguments to be passed for the mainClustering
step, see help pages of mainClustering. |
subsampleArgs |
list of arguments to be passed to the subsampling step
(if subsample=TRUE), see help pages of subsampleClustering. |
seqArgs |
list of arguments to be passed to seqCluster. |
isCount |
if |
transFun |
a transformation function to be applied to the data. If the
transformation applied to the data creates an error or NA values, then the
function will throw an error. If object is of class
|
clusterLabel |
a string used to describe the clustering. By default it
is equal to "clusterSingle", to indicate that this clustering is the result
of a call to clusterSingle. |
checkDiss |
logical. Whether to check whether the input |
clusterSingle is an 'expert-oriented' function, intended to be used when a
user wants to run a single clustering and/or have a great deal of control
over the clustering parameters. Most users will find clusterMany more
relevant. However, clusterMany makes certain assumptions about the intention
of certain combinations of parameters that might not match the user's
intent; similarly, clusterMany does not directly take a dissimilarity matrix
but only a matrix of values x (though a user can define a distance function
to be applied to x in clusterMany).
Unlike clusterMany, most of the relevant arguments for the actual clustering
algorithms in clusterSingle are passed to the relevant steps via the
arguments mainClusterArgs, subsampleArgs, and seqArgs. These arguments
should be named lists with parameters that match the corresponding
functions: mainClustering, subsampleClustering, and seqCluster. These
functions are not meant to be called by the user, but rather accessed via
calls to clusterSingle. But the user can look at the help files of those
functions for more information regarding the parameters that they take.
Only certain combinations of parameters are possible for certain choices of
sequential and subsample. These restrictions are documented below.
clusterFunction for mainClusterArgs: The choice of subsample=TRUE also
controls which algorithm type of clustering function can be used in the
mainClustering step. When subsample=TRUE, the resulting co-clustering matrix
from subsampling is converted to a dissimilarity (specifically, 1 minus the
co-clustering values) and is passed to diss of mainClustering. For this
reason, the ClusterFunction object given to mainClustering via the argument
mainClusterArgs must take input in the form of a dissimilarity. When
subsample=FALSE and sequential=TRUE, the clusterFunction passed in the
clusterArgs element of mainClusterArgs must define a ClusterFunction object
with algorithmType 'K'. When subsample=FALSE and sequential=FALSE, there are
no restrictions on the ClusterFunction, and that clustering is applied
directly to the input data.
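The conversion from a co-clustering matrix to a dissimilarity can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the 50 resamples, the 70% subsample size, the use of kmeans on each subsample, and the final average-linkage hclust step are all choices made for this example.

```r
set.seed(42)
# Toy data (assumption): 20 features (rows) x 30 samples (columns), two groups
x <- cbind(matrix(rnorm(20 * 15, mean = 0), nrow = 20),
           matrix(rnorm(20 * 15, mean = 3), nrow = 20))
n <- ncol(x)

together <- matrix(0, n, n)  # times a pair of samples shared a cluster
sampled  <- matrix(0, n, n)  # times a pair of samples was drawn together

for (b in seq_len(50)) {
  idx <- sample(n, size = round(0.7 * n))            # subsample the samples
  cl  <- kmeans(t(x[, idx]), centers = 2, nstart = 5)$cluster
  sampled[idx, idx]  <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Proportion of joint draws in which each pair co-clustered
coClustering <- together / pmax(sampled, 1)
diag(coClustering) <- 1

# The dissimilarity handed to the final clustering step: 1 - co-clustering
dissimilarity <- 1 - coClustering

# Any method that accepts a dissimilarity can be used for the final step
finalClusters <- cutree(hclust(as.dist(dissimilarity), method = "average"), k = 2)
```

Because the final step only ever sees `dissimilarity`, a clustering function that requires the original feature matrix cannot be used there, which is the restriction described above.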
clusterFunction for subsampleArgs: If the ClusterFunction object given to
the clusterArgs of subsampleArgs is missing, the algorithm will use the
default for subsampleClustering (currently "pam"). If sequential=TRUE, this
ClusterFunction object must be of type 'K'.
Setting k for subsampling: If subsample=TRUE and sequential=TRUE, the
current K of the sequential iteration determines the 'k' argument passed to
subsampleClustering, so setting 'k=' in the list given to subsampleArgs
will not do anything and will produce a warning to that effect (see
documentation of seqCluster).
Setting k for mainClustering step: If sequential=TRUE, the user should not
set k in the clusterArgs argument of mainClusterArgs because it must be set
by the sequential code, which iteratively resets the parameters.
Specifically, if subsample=FALSE, the sequential method iterates over
choices of k to cluster the input data. If subsample=TRUE, the clustering in
the mainClustering step (assuming the clustering function is of type 'K')
will use the k used in the subsampling step, to make sure that the k used in
the mainClustering step is reasonable.
Setting findBestK in mainClusterArgs: If sequential=TRUE and
subsample=FALSE, the user should not set 'findBestK=TRUE' in
mainClusterArgs, because in this case the sequential method changes k; an
error message will be given if this combination of options is set. However,
if sequential=TRUE and subsample=TRUE, then passing either 'findBestK=TRUE'
or 'findBestK=FALSE' via mainClusterArgs will function as expected (assuming
the clusterFunction argument passed to mainClusterArgs is of type 'K'). In
particular, the sequential step will set the number of clusters k for the
clustering of each subsample. If findBestK=FALSE, that same k will be used
for the mainClustering step that clusters the resulting co-occurrence matrix
after subsampling. If findBestK=TRUE, then mainClustering will search for
the best k. Note that the default 'kRange' over which mainClustering
searches when findBestK=TRUE depends on the input value of k (which is set
by the sequential method if sequential=TRUE; see above). The user can change
kRange so that it does not depend on k and is fixed across all of the
sequential steps by setting kRange explicitly in the mainClusterArgs list.
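The restrictions on k and findBestK described above can be restated as a small set of checks. The helper below is hypothetical and is not part of clusterExperiment; its name, its return value (a character vector of notes), and the assumption that k may appear either directly in subsampleArgs or inside its clusterArgs element are choices made for this sketch.

```r
# Hypothetical helper (not part of clusterExperiment): restates the documented
# restrictions on k and findBestK and returns one note per violated rule.
checkClusterSingleArgs <- function(subsample, sequential,
                                   mainClusterArgs = list(),
                                   subsampleArgs = list()) {
  notes <- character(0)
  # [[ ]] gives exact name matching, so "kRange" is never mistaken for "k"
  mainK <- !is.null(mainClusterArgs[["clusterArgs"]][["k"]])
  subK  <- !is.null(subsampleArgs[["k"]]) ||
    !is.null(subsampleArgs[["clusterArgs"]][["k"]])
  findBestK <- isTRUE(mainClusterArgs[["findBestK"]])

  if (sequential && mainK) {
    notes <- c(notes, "k in mainClusterArgs is set by the sequential step; do not set it")
  }
  if (sequential && subsample && subK) {
    notes <- c(notes, "k in subsampleArgs has no effect (documented as a warning)")
  }
  if (sequential && !subsample && findBestK) {
    notes <- c(notes, "findBestK=TRUE with sequential=TRUE and subsample=FALSE (documented as an error)")
  }
  notes
}

# Example: the combination documented as an error
res <- checkClusterSingleArgs(subsample = FALSE, sequential = TRUE,
                              mainClusterArgs = list(clusterFunction = "pam",
                                                     findBestK = TRUE))
print(res)
```

An empty result means none of the three documented rules was triggered; it does not guarantee that clusterSingle will accept the arguments.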
To provide a distance matrix via the argument distFunction, the function
must be defined to take the distance of the rows of a matrix (internally,
the function will call distFunction(t(x))). This is to be compatible with
the input for the dist function. as.matrix will be performed on the output
of distFunction, so if the object returned has an as.matrix method that will
convert the output into a symmetric matrix of distances, this is fine (for
example, the class dist for objects returned by dist has such a method). If
distFunction=NA, then a default distance will be calculated based on the
type of clustering algorithm of clusterFunction. For type "K", the default
is to take dist as the distance function. For type "01", the default is to
take (1-cor(x))/2.
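The row-wise convention for distFunction can be illustrated in base R. This is a sketch under assumptions: the toy matrix and the function names distK and dist01 are invented for the example, and the two functions only mirror the defaults described above rather than reproduce the package's internal code.

```r
set.seed(1)
# Toy data (assumption): 10 features (rows) x 5 samples (columns)
x <- matrix(rnorm(10 * 5), nrow = 10, ncol = 5)

# A distance function must compute distances between the ROWS of its input,
# because it is called on t(x), where samples are rows.
distK  <- function(m) dist(m)                  # mirrors the type "K" default
dist01 <- function(m) (1 - cor(t(m))) / 2      # mirrors the type "01" default

# as.matrix() turns either result into a symmetric samples x samples matrix
dK  <- as.matrix(distK(t(x)))
d01 <- as.matrix(dist01(t(x)))

dim(dK)   # one row and one column per sample
dim(d01)
```

Note that `dist01(t(x))` equals `(1 - cor(x)) / 2` on the original features-by-samples matrix, since `cor` correlates columns.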
A ClusterExperiment object if run=TRUE.
If input was diss, then the result is a list with values:
clustering: The vector of clustering results
clusterInfo: A list with information about the parameters run in the clustering
diss: The dissimilarity matrix used in the clustering
clusterMany to compare multiple choices of parameters, and mainClustering,
subsampleClustering, and seqCluster for the underlying functions called by
clusterSingle.
data(simData)

## Not run:
# The following code takes some time.
# Use clusterSingle to do sequential clustering
# (same as example in seqCluster only using clusterSingle ...)
#   clusterFunction="hierarchical01",clusterArgs=list(alpha=0.1)))
## End(Not run)

# Use clusterSingle to do just clustering k=3 with no subsampling
clustNothing <- clusterSingle(simData, subsample=FALSE, sequential=FALSE,
    mainClusterArgs=list(clusterFunction="pam", clusterArgs=list(k=3)))

# Compare to standard pam
cluster::pam(t(simData), k=3, cluster.only=TRUE)