EMclust {mclust}    R Documentation

BIC for Model-Based Clustering

Description

BIC for EM initialized by hierarchical clustering for parameterized Gaussian mixture models.

Usage

EMclust(data, G, emModelNames, hcPairs, subset, eps, tol, itmax, equalPro,
        warnSingular, ...)

Arguments

data A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
G An integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. The default is 1:9.
emModelNames A vector of character strings indicating the models to be fitted in the EM phase of clustering. Possible models:

"E": spherical, equal variance (one-dimensional)
"V": spherical, variable variance (one-dimensional)
"EII": spherical, equal volume
"VII": spherical, unequal volume
"EEI": diagonal, equal volume, equal shape
"VEI": diagonal, varying volume, equal shape
"EVI": diagonal, equal volume, varying shape
"VVI": diagonal, varying volume, varying shape
"EEE": ellipsoidal, equal volume, shape, and orientation
"EEV": ellipsoidal, equal volume and shape, varying orientation
"VEV": ellipsoidal, varying volume, equal shape, varying orientation
"VVV": ellipsoidal, varying volume, shape, and orientation

The default is .Mclust$emModelNames.
hcPairs A matrix of merge pairs for hierarchical clustering, such as that produced by function hc. By default, a hierarchical clustering tree is computed by applying hc with modelName = .Mclust$hcModelName[1] for univariate data and modelName = .Mclust$hcModelName[2] for multivariate data, using either the full data or the subset indicated by the subset argument. The hierarchical clustering results are used as starting values for EM.
subset A logical or numeric vector specifying the indices of a subset of the data to be used in the initial hierarchical clustering phase.
eps A scalar tolerance for deciding when to terminate computations due to computational singularity in covariances. Smaller values of eps allow computations to proceed nearer to singularity. The default is .Mclust$eps.
tol A scalar tolerance for relative convergence of the loglikelihood. The default is .Mclust$tol.
itmax An integer limit on the number of EM iterations. The default is .Mclust$itmax.
equalPro Logical variable indicating whether or not the mixing proportions are equal in the model. The default is .Mclust$equalPro.
warnSingular A logical value indicating whether or not a warning should be issued whenever a singularity is encountered. The default is warnSingular=FALSE.
... Provided so that lists containing elements other than the arguments above can be passed in indirect or list calls with do.call.
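
The arguments above can also be collected in a list and passed indirectly. The following is a minimal sketch, assuming the version of mclust documented here is installed; the call is guarded because other releases may not provide EMclust with this interface. The list element note is hypothetical and stands for any element that is not an EMclust argument, and the choices of G and emModelNames are illustrative rather than defaults.

```r
data(iris)
irisMatrix <- as.matrix(iris[, 1:4])

# Settings gathered in a list; 'note' is a hypothetical extra element that
# is not an EMclust argument and is expected to be absorbed by '...'
settings <- list(G = 1:5,
                 emModelNames = c("EII", "VVI", "VVV"),
                 note = "iris run")

# Guard: only call EMclust if the installed mclust provides this interface
haveEMclust <- suppressWarnings(require(mclust, quietly = TRUE)) &&
  exists("EMclust") && "emModelNames" %in% names(formals(EMclust))

if (haveEMclust) {
  # Indirect call, equivalent to EMclust(data = irisMatrix, G = 1:5, ...)
  irisBic <- do.call("EMclust", c(list(data = irisMatrix), settings))
  print(irisBic)
}
```

Restricting G and emModelNames in this way reduces the number of EM fits, which can help when the full default grid is slow.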

Value

Bayesian Information Criterion for the specified mixture models and numbers of clusters. Auxiliary information is returned as attributes.
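
A short sketch of how the returned object might be inspected. It assumes the mclust version documented here (the calls are guarded for other releases) and assumes that summary.EMclust takes the data as its second argument; the attribute names are printed rather than stated because they are not listed on this page.

```r
data(iris)
irisMatrix <- as.matrix(iris[, 1:4])

# Guard: only call EMclust if the installed mclust provides this interface
haveEMclust <- suppressWarnings(require(mclust, quietly = TRUE)) &&
  exists("EMclust") && "emModelNames" %in% names(formals(EMclust))

if (haveEMclust) {
  irisBic <- EMclust(irisMatrix)

  # Names of the auxiliary information stored as attributes
  print(names(attributes(irisBic)))

  # Best models according to BIC
  # (assumes the summary method signature summary(object, data))
  print(summary(irisBic, irisMatrix))
}
```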

References

C. Fraley and A. E. Raftery (2002a). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association 97:611-631. See http://www.stat.washington.edu/mclust.

C. Fraley and A. E. Raftery (2002b). MCLUST: Software for model-based clustering, density estimation and discriminant analysis. Technical Report, Department of Statistics, University of Washington. See http://www.stat.washington.edu/mclust.

See Also

summary.EMclust, EMclustN, hc, me, mclustOptions

Examples

data(iris)
irisMatrix <- as.matrix(iris[,1:4])

irisBic <- EMclust(irisMatrix)
irisBic
plot(irisBic)

irisBic <- EMclust(irisMatrix, subset = sample(1:nrow(irisMatrix), 100))
irisBic
plot(irisBic)
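
A further sketch covering the subset and hcPairs arguments. The subset may be given as numeric indices or as a logical vector, and a hierarchical clustering tree can be computed once with hc and reused. This assumes the mclust version documented here (the calls are guarded for other releases) and that hc accepts modelName and data arguments; the seed and the model name "VVV" are illustrative choices.

```r
data(iris)
irisMatrix <- as.matrix(iris[, 1:4])
n <- nrow(irisMatrix)

set.seed(1)                        # illustrative seed for reproducibility
idx  <- sample(1:n, 100)           # subset as numeric indices
keep <- seq_len(n) %in% idx        # the same subset as a logical vector

# Guard: only call EMclust if the installed mclust provides this interface
haveEMclust <- suppressWarnings(require(mclust, quietly = TRUE)) &&
  exists("EMclust") && "emModelNames" %in% names(formals(EMclust))

if (haveEMclust) {
  # Initial hierarchical clustering restricted to the selected rows
  bicSubset <- EMclust(irisMatrix, subset = keep)
  print(bicSubset)

  # Compute the merge pairs once and pass them in through hcPairs
  hcTree  <- hc(modelName = "VVV", data = irisMatrix)
  bicTree <- EMclust(irisMatrix, hcPairs = hcTree)
  print(bicTree)
}
```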

[Package mclust version 2.1-11 Index]