MbkmeansParam-class {bluster} | R Documentation |
Run the mini-batch k-means mbkmeans function with the specified number of centers within clusterRows.
This sacrifices some accuracy for speed compared to the standard k-means algorithm.
Note that this requires installation of the mbkmeans package.
MbkmeansParam(
    centers,
    batch_size = NULL,
    max_iters = 100,
    num_init = 1,
    init_fraction = NULL,
    initializer = "kmeans++",
    calc_wcss = FALSE,
    early_stop_iter = 10,
    tol = 1e-04,
    BPPARAM = SerialParam()
)

## S4 method for signature 'ANY,MbkmeansParam'
clusterRows(x, BLUSPARAM, full = FALSE)
centers |
An integer scalar specifying the number of centers. Alternatively, a function that takes the number of observations and returns the number of centers. |
batch_size, max_iters, num_init, init_fraction, initializer, calc_wcss, early_stop_iter, tol, BPPARAM |
Further arguments to pass to mbkmeans. |
x |
A numeric matrix-like object where rows represent observations and columns represent variables. |
BLUSPARAM |
A MbkmeansParam object. |
full |
Logical scalar indicating whether the full mini-batch k-means statistics should be returned. |
This class usually requires the user to specify the number of clusters beforehand. However, we can also allow the number of clusters to vary as a function of the number of observations. The latter is occasionally useful, e.g., to allow the clustering to automatically become more granular for large datasets.
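As a minimal sketch of the function-valued form of centers, the example below lets the number of centers grow with the square root of the number of observations. The square-root rule and the name n_centers are illustrative assumptions, not defaults of bluster. The clustering call is guarded so that it only runs when both bluster and mbkmeans are installed.

```r
# Hypothetical rule (an assumption for illustration): the number of centers
# grows with the square root of the number of observations.
n_centers <- function(n) as.integer(ceiling(sqrt(n)))

# For the 150 rows of iris, this rule requests 13 centers.
n_centers(nrow(iris))

# Only attempt the clustering if the required packages are available.
if (requireNamespace("bluster", quietly = TRUE) &&
    requireNamespace("mbkmeans", quietly = TRUE)) {
    library(bluster)
    set.seed(100)  # mini-batch sampling is random
    clusters <- clusterRows(iris[, 1:4], MbkmeansParam(centers = n_centers))
    print(table(clusters))
}
```

Because the rule is evaluated inside clusterRows, the same MbkmeansParam object can be reused across datasets of different sizes.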
To modify an existing MbkmeansParam object x, users can simply call x[[i]] or x[[i]] <- value, where i is any argument used in the constructor.
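The getter and setter described above might be used as in the following sketch. The variable name param and the chosen values are illustrative only, and the block is guarded so that it does nothing when the packages are not installed.

```r
# Only run if the required packages are available.
if (requireNamespace("bluster", quietly = TRUE) &&
    requireNamespace("mbkmeans", quietly = TRUE)) {
    library(bluster)

    # 'param' is a hypothetical variable name used for illustration.
    param <- MbkmeansParam(centers = 3)

    # Read a constructor argument back out of the object.
    print(param[["centers"]])

    # Replace constructor arguments in place.
    param[["max_iters"]] <- 50
    param[["batch_size"]] <- 10

    # The modified object can be passed to clusterRows as usual.
    set.seed(100)
    print(table(clusterRows(iris[, 1:4], param)))
}
```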
For batch_size and init_fraction, a value of NULL means that the default arguments in the mbkmeans function signature are used. These defaults are data-dependent and so cannot be specified during construction of the MbkmeansParam object, but instead are defined within the clusterRows method.
The MbkmeansParam constructor will return a MbkmeansParam object with the specified parameters.
The clusterRows method will return a factor of length equal to nrow(x) containing the cluster assignments. If full=TRUE, a list is returned with clusters (the factor, as above) and objects (a list containing mbkmeans, the direct output of mbkmeans).
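A short sketch of inspecting the full=TRUE return value follows. It relies only on the structure described above (a clusters factor and an objects list holding the mbkmeans output). The name res is illustrative, and the contents of the mbkmeans output itself are not assumed here.

```r
# Only run if the required packages are available.
if (requireNamespace("bluster", quietly = TRUE) &&
    requireNamespace("mbkmeans", quietly = TRUE)) {
    library(bluster)
    set.seed(100)  # mini-batch sampling is random

    # 'res' is a hypothetical variable name used for illustration.
    res <- clusterRows(iris[, 1:4], MbkmeansParam(centers = 3), full = TRUE)

    # The cluster assignments: a factor with one entry per row of the input.
    stopifnot(is.factor(res$clusters), length(res$clusters) == nrow(iris))

    # The raw mbkmeans output is stored under objects; inspect its top level
    # rather than assuming particular field names.
    str(res$objects$mbkmeans, max.level = 1)
}
```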
Stephanie Hicks
mbkmeans from the mbkmeans package, which actually does all the heavy lifting.
KmeansParam, for dispatch to the standard k-means algorithm.
clusterRows(iris[,1:4], MbkmeansParam(centers=3))
clusterRows(iris[,1:4], MbkmeansParam(centers=3, batch_size=10))
clusterRows(iris[,1:4], MbkmeansParam(centers=3, init_fraction=0.5))