Mclust {mclust} | R Documentation |
Clustering via EM initialized by hierarchical clustering for parameterized Gaussian mixture models. The number of clusters and the clustering model are chosen to maximize the BIC.
Mclust(data, minG, maxG)
data |
A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables. |
minG |
An integer vector specifying the minimum number of mixture components
(clusters) to be considered.
The default is 1 component.
|
maxG |
An integer vector specifying the maximum number of mixture components
(clusters) to be considered.
The default is 9 components.
|
A list representing the best model (according to BIC) for the given range of numbers of clusters. The following components are included:
BIC |
A matrix giving the BIC value for each model (rows) and number of clusters (columns). |
bic |
A scalar giving the optimal BIC value. |
modelName |
The MCLUST name for the best model according to BIC. |
classification |
The classification corresponding to the optimal BIC value. |
uncertainty |
The uncertainty in the classification corresponding to the optimal BIC value. |
mu |
For multidimensional models, a matrix whose columns are the means of each group in the best model. For one-dimensional models, a vector whose entries are the means for each group in the best model. |
sigma |
For multidimensional models, a three-dimensional array in which
sigma[,,k] gives the covariance matrix for the kth group in
the best model. For one-dimensional models, either a scalar giving
a common variance for the groups or a vector whose entries are the
variances for each group in the best model.
|
pro |
The mixing probabilities for each component in the best model. |
z |
A matrix whose [i,k]th entry is the probability that observation i belongs to the kth component in the model. The optimal classification is derived from this by choosing, for each observation, the class with the maximum probability. |
loglik |
The log likelihood for the data under the best model. |
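As a rough illustration of how the returned components relate to one another, the sketch below uses small made-up matrices rather than real Mclust output. The names BIC and z mirror the components described above; all numeric values are hypothetical, and the uncertainty formula (one minus the largest membership probability) is an assumption that this page does not state explicitly.

```r
# Hypothetical BIC table laid out as described above:
# rows are models, columns are numbers of clusters. Values are made up.
BIC <- matrix(c(-830, -690, -700,
                -830, -640, -655),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("EII", "VVV"), c("1", "2", "3")))

# The best model / cluster-count pair is the entry with the largest BIC.
best <- which(BIC == max(BIC, na.rm = TRUE), arr.ind = TRUE)[1, ]
bestModel <- rownames(BIC)[best["row"]]
bestG <- as.integer(colnames(BIC)[best["col"]])

# Hypothetical membership probabilities z: 4 observations, 2 components.
z <- matrix(c(0.90, 0.10,
              0.20, 0.80,
              0.55, 0.45,
              0.05, 0.95),
            ncol = 2, byrow = TRUE)

# Classification: the component with the maximum probability in each row.
classification <- apply(z, 1, which.max)

# Uncertainty, assumed here to be one minus the largest membership probability.
uncertainty <- 1 - apply(z, 1, max)
```

With real output, the same expressions would be applied to the BIC and z components of the returned list instead of the hand-built matrices.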
The following models are compared in Mclust:
"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation
Mclust is intended to combine EMclust and its summary method in a simplified one-step model-based clustering function. EMclust and its summary provide more flexibility, including the choice of models.
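To make the model names listed above more concrete, the following sketch builds covariance arrays in the sigma[,,k] layout for three of the multidimensional models. It uses base R only; the dimensions, the variable names (sigmaEII, sigmaVII, sigmaVVI), and all numeric values are illustrative assumptions, not output from Mclust.

```r
d <- 2  # number of variables (illustrative)
G <- 3  # number of components (illustrative)

# "EII": every component shares one spherical covariance, lambda * I.
lambda <- 1.5
sigmaEII <- array(lambda * diag(d), dim = c(d, d, G))

# "VII": spherical, but each component has its own volume, lambda_k * I.
lambdaK <- c(0.5, 1.0, 2.0)
sigmaVII <- array(0, dim = c(d, d, G))
for (k in seq_len(G)) sigmaVII[, , k] <- lambdaK[k] * diag(d)

# "VVI": each component has its own diagonal (axis-aligned) covariance.
diagK <- rbind(c(1.0, 0.2), c(0.5, 0.5), c(2.0, 3.0))
sigmaVVI <- array(0, dim = c(d, d, G))
for (k in seq_len(G)) sigmaVVI[, , k] <- diag(diagK[k, ])
```

The ellipsoidal models ("EEE", "VVV") additionally allow nonzero off-diagonal entries, shared across components in "EEE" and free to differ in "VVV".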
C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.
C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.
data(iris)
irisMatrix <- as.matrix(iris[,1:4])
irisClass <- iris[,5]
irisMclust <- Mclust(irisMatrix)
## Not run: plot(irisMclust,irisMatrix)