Mclust {mclust}    R Documentation

Model-Based Clustering

Description

Clustering via EM, initialized by hierarchical clustering, for parameterized Gaussian mixture models. The number of clusters and the clustering model are chosen to maximize the BIC.

Usage

Mclust(data, minG = 1, maxG = 9)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
minG An integer vector specifying the minimum number of mixture components (clusters) to be considered. The default is 1 component.
maxG An integer vector specifying the maximum number of mixture components (clusters) to be considered. The default is 9 components.

Value

A list representing the best model (according to BIC) for the given range of numbers of clusters. The following components are included:

BIC A matrix giving the BIC value for each model (rows) and number of clusters (columns).
bic A scalar giving the optimal BIC value.
modelName The MCLUST name for the best model according to BIC.
classification The classification corresponding to the optimal BIC value.
uncertainty The uncertainty in the classification corresponding to the optimal BIC value.
mu For multidimensional models, a matrix whose columns are the means of each group in the best model. For one-dimensional models, a vector whose entries are the means for each group in the best model.
sigma For multidimensional models, a three-dimensional array in which sigma[,,k] gives the covariance for the kth group in the best model. For one-dimensional models, either a scalar giving a common variance for the groups or a vector whose entries are the variances for each group in the best model.
pro The mixing probabilities for each component in the best model.
z A matrix whose [i,k]th entry is the probability that observation i belongs to the kth component in the model. The optimal classification is derived from this by choosing, for each observation, the class with the maximum probability.
loglik The log likelihood for the data under the best model.
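
The relationship between z, classification, and uncertainty can be illustrated in base R without fitting a model. This is a minimal sketch, not the package's internal code: the matrix z below is hypothetical, and the definition of uncertainty as one minus the largest membership probability is an assumption that this page does not state explicitly.

```r
# Hypothetical 4 x 3 membership matrix (observations in rows,
# components in columns); each row sums to 1.
z <- matrix(c(0.90, 0.05, 0.05,
              0.10, 0.70, 0.20,
              0.30, 0.30, 0.40,
              0.02, 0.01, 0.97),
            nrow = 4, byrow = TRUE)

# Classification: for each observation, the component with the
# largest membership probability.
classification <- apply(z, 1, which.max)

# Uncertainty (assumed definition): one minus that largest probability.
uncertainty <- 1 - apply(z, 1, max)

print(classification)
print(round(uncertainty, 2))
```

Observations whose largest probability is close to 1 receive an uncertainty near 0, while observations split almost evenly between components (such as the third row here) receive a high uncertainty.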

Details

The following models are compared in Mclust:

"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)

"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"VVV": ellipsoidal, varying volume, shape, and orientation

Mclust is intended to combine EMclust and its summary method in a simplified one-step model-based clustering function. EMclust and its summary method provide more flexibility, including the choice of models.
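
Because the models are compared by BIC and the largest value wins, the selection step amounts to locating the maximum entry of the BIC matrix. The sketch below uses base R only. The BIC values are hypothetical, the layout (models in rows, numbers of clusters in columns) follows the description of the BIC component under Value, and the NA entry is an assumed stand-in for a fit that could not be computed.

```r
# Hypothetical BIC matrix: models in rows, numbers of clusters in columns.
BIC <- matrix(c(-1800, -1200, -1150, -1170,
                -1750, -1100,  -980, -1010,
                -1700, -1050,    NA, -1020),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("EII", "EEE", "VVV"), as.character(1:4)))

# Locate the largest BIC value, ignoring entries that are NA.
best <- which(BIC == max(BIC, na.rm = TRUE), arr.ind = TRUE)

# Read off the model name and number of clusters at that position.
bestModel <- rownames(BIC)[best[1, "row"]]
bestG <- as.integer(colnames(BIC)[best[1, "col"]])

print(bestModel)
print(bestG)
```

Reading the result from the dimnames rather than from raw indices keeps the lookup correct even when the range of cluster numbers does not start at 1.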

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

plot.Mclust, EMclust

Examples

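data(iris)
irisMatrix <- as.matrix(iris[,1:4])  # the four numeric measurements
irisClass <- iris[,5]                # known species labels, not used in fitting
irisMclust <- Mclust(irisMatrix)     # fit with the default minG and maxG

## Compare the known species with the fitted classification
table(irisClass, irisMclust$classification)

## Not run: plot(irisMclust,irisMatrix)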

[Package mclust version 2.1-11 Index]