macluster {maanova} | R Documentation |
This function bootstraps K-means or hierarchical clusters and builds a consensus tree (consensus group for K-means) from the bootstrap result.
macluster(anovaobj, term, idx.gene, what = c("gene", "sample"), method = c("hc", "kmean"), dist.method = "correlation", hc.method = "ward", kmean.ngroups, n.perm = 100)
anovaobj |
The result object from fitting the ANOVA model. |
term |
The factor (in the model formula) used in clustering. The expression levels for this term will be used in clustering. This term has to correspond to the gene list, i.e., idx.gene in this function. The gene list should be the significant hits from testing this term. |
idx.gene |
A vector indicating the list of differentially expressed genes. The expression levels of these genes will be used to construct the clusters. |
what |
What to cluster, either "gene" or "sample". |
method |
The clustering method. Currently hierarchical clustering ("hc") and K-means ("kmean") are available. |
dist.method |
Distance measure to be used in hierarchical clustering. Besides the methods listed in dist, there is an additional method "correlation" (default). The "correlation" distance equals $1 - r^2$, where r is the sample correlation between observations. |
hc.method |
The agglomeration method to be used in hierarchical clustering. See hclust for details. |
kmean.ngroups |
The number of groups for K-means clustering. |
n.perm |
Number of bootstrap iterations. If it is 1, this function clusters the observed data only. If it is greater than 1, a bootstrap is performed. |
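The "correlation" distance described under dist.method can be illustrated in base R. The following is only a minimal sketch on simulated data: the matrix `expr` and its dimensions are hypothetical, and the code is not taken from the maanova source. It uses "ward.D", which is the name recent versions of R (3.1.0 and later) give to the agglomeration method formerly called "ward".

```r
# Sketch of the "correlation" distance: d = 1 - r^2.
# `expr` is simulated, hypothetical data (6 genes x 8 samples), not maanova output.
set.seed(1)
expr <- matrix(rnorm(6 * 8), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))

# cor() correlates columns, so transpose to get gene-by-gene correlations.
r <- cor(t(expr))

# Convert the squared-correlation dissimilarity into a "dist" object.
d <- as.dist(1 - r^2)

# Hierarchical clustering on that distance; "ward.D" replaces the old name "ward".
tree <- hclust(d, method = "ward.D")

# Cut the tree into two groups of genes.
groups <- cutree(tree, k = 2)
```

Because the distance depends on r squared, two genes with a strong negative correlation are treated as being as close as two genes with an equally strong positive correlation.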
Normally, after the F test, the user can select a list of differentially expressed genes. The next step is to investigate the relationship among these genes. Using the expression levels of these genes, the user can cluster the genes or the samples using either the hierarchical or the K-means clustering algorithm. In order to evaluate the stability of the relationship, this function bootstraps the data, refits the model and reclusters the genes/samples. Then for a certain number of bootstrap iterations, say 1000, we have 1000 cluster results. We can use consensus to build the consensus tree from these 1000 trees.
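The idea of judging cluster stability by bootstrapping can be sketched in base R. This is an illustration under stated assumptions, not the procedure macluster itself runs: it resamples the sample columns of a simulated matrix directly instead of refitting an ANOVA model, and the names `expr`, `n.boot` and `co` are hypothetical.

```r
# Hypothetical sketch: bootstrap stability of K-means gene clusters.
# NOTE: columns are resampled directly; no ANOVA model is refitted here.
set.seed(42)
n.genes <- 20
n.samples <- 12

# Simulate two groups of genes with opposite expression patterns plus noise.
pattern <- rep(c(-1, 1), each = n.samples / 2)
signal <- outer(rep(c(1, -1), each = n.genes / 2), pattern)
expr <- 2 * signal + matrix(rnorm(n.genes * n.samples), n.genes, n.samples)

n.boot <- 50   # number of bootstrap iterations (assumed value)
k <- 2         # number of K-means groups (assumed value)

# co[i, j] counts how often genes i and j land in the same cluster.
co <- matrix(0, n.genes, n.genes)
for (b in seq_len(n.boot)) {
  cols <- sample(n.samples, replace = TRUE)          # resample samples
  cl <- kmeans(expr[, cols], centers = k, nstart = 5)$cluster
  co <- co + outer(cl, cl, "==")                     # tally co-membership
}
co <- co / n.boot                                    # convert counts to frequencies

# Gene pairs that are clustered together in at least half of the replicates.
stable <- co >= 0.5
```

The matrix `co` holds, for every pair of genes, the fraction of bootstrap replicates in which the two genes were assigned to the same group. Pairs with a high fraction can be read as stable relationships.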
Note that if you have a large number (say, more than 100) of genes/samples to cluster, hierarchical clustering could be very unstable. A slight change in the data can result in a big change in the tree structure. In that case, K-means will give better results.
An object of class macluster.
Hao Wu hao@jax.org
# load in data
data(paigen)
# make data object with rep 2
paigen <- createData(paigen.raw, 2)
# make interactive model
model.int.fix <- makeModel(data=paigen, formula=~Dye+Array+Strain+Diet+Strain:Diet)
# fit ANOVA model
anova.int <- fitmaanova(paigen, model.int.fix)
# test interaction effect
## Not run:
test.int.fix <- matest(paigen, model.int.fix, term="Strain:Diet", n.perm=100)
# pick significant genes - pick the genes selected by Fs test
idx <- volcano(test.int.fix)$idx.Fs
# do k-means cluster on genes
gene.cluster <- macluster(anova.int, "Strain:Diet", idx, "gene", "kmean", kmean.ngroups=5)
# get the consensus group
consensus(gene.cluster, 0.5)
# HC cluster on samples
sample.cluster <- macluster(anova.int, "Strain:Diet", idx, "sample", "hc")
# get the consensus group
consensus(sample.cluster, 0.5)
## End(Not run)