disscosangle {hopach}    R Documentation

Functions to compute pair-wise distances

Description

Given a matrix X, these functions compute the nrow(X) by nrow(X) matrix of pair-wise distances between all variables (rows) of X, across all observations (columns) of X. Each function uses a different distance metric, i.e. a different definition of what it means for two variables to be similar.

Usage

disscosangle(X, na.rm = TRUE)

disseuclid(X, na.rm = TRUE)

disscor(X, na.rm = TRUE)

dissabscosangle(X, na.rm = TRUE)

dissabseuclid(X, na.rm = TRUE)

dissabscor(X, na.rm = TRUE)

vdisscosangle(X, y, na.rm = TRUE)

vdisseuclid(X, y, na.rm = TRUE)

vdisscor(X, y, na.rm = TRUE)

vdissabscosangle(X, y, na.rm = TRUE)

vdissabseuclid(X, y, na.rm = TRUE)

vdissabscor(X, y, na.rm = TRUE)

Arguments

X A numeric data matrix. Each column corresponds to an observation, and each row corresponds to a variable. In the gene expression context, observations are arrays and variables are genes. All values must be numeric. Missing values are ignored.
na.rm Indicator of whether to remove missing values (i.e. only compute distance over non-missing observations).
y A numeric data vector of length ncol(X).
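
As an illustration of what "only compute distance over non-missing observations" can mean, the following base R sketch restricts a Euclidean distance to the observations where both variables are observed. The vectors x and y and the names keep, d_naive and d_complete are made up for this example; whether and how the package rescales for the number of missing values is not stated here, so this is not a description of its internals.

```r
# Two variables measured on five observations, each with one missing value.
x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 10)

# Without removing missing values, the distance itself is missing.
d_naive <- sqrt(sum((x - y)^2))

# Keep only the observations where both variables are observed.
keep <- !is.na(x) & !is.na(y)
d_complete <- sqrt(sum((x[keep] - y[keep])^2))

d_naive     # NA
d_complete  # sqrt(30), from differences 1, 2 and 5
```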

Details

Different choices of distance metric are discussed in the references. Briefly, Euclidean distance (disseuclid) defines two variables to be close if they are similar in magnitude across observations. Correlation distance (disscor), in contrast, defines similarity to mean having the same pattern, but not necessarily the same magnitude. Cosine-angle distance (disscosangle), also known as uncentered correlation distance, is a correlation distance that also accounts for magnitude. The distance metrics with 'abs' in their names are absolute versions of each metric; the absolute value is applied to the data before computing the distance.
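
The contrast between the three metrics can be sketched in base R using their textbook definitions. The helpers euclid_dist, cor_dist and cosangle_dist are hypothetical names written for this illustration; they are not the hopach implementations, and the package's exact scaling may differ.

```r
# Textbook definitions, for illustration only (not the hopach code).
euclid_dist <- function(X) as.matrix(dist(X))   # Euclidean distance between rows
cor_dist <- function(X) 1 - cor(t(X))           # 1 - Pearson correlation between rows
cosangle_dist <- function(X) {
  norms <- sqrt(rowSums(X^2))                   # length of each row vector
  1 - (X %*% t(X)) / (norms %o% norms)          # 1 - cosine of the angle between rows
}

# Row b is row a multiplied by 10; row c is row a shifted by 100.
X <- rbind(a = c(1, 2, 3, 4),
           b = c(10, 20, 30, 40),
           c = c(101, 102, 103, 104))

d_euclid <- euclid_dist(X)
d_cor <- cor_dist(X)
d_cos <- cosangle_dist(X)
```

Under these definitions all three rows are at correlation distance 0 from one another, because they share the same pattern. The cosine-angle distance between a and b is also 0, but it is positive between a and c, because the data are not centered before the angle is computed. The Euclidean distance is large for both pairs.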

Value

A numeric nrow(X) by nrow(X) matrix of pair-wise distances between all variables (rows) in X. For the vector versions (e.g. vdisscosangle), a numeric vector of nrow(X) pair-wise distances between each variable (row) in X and the vector y.
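
The shape of the vector-version result can be sketched in base R. The helper row_to_vector_cosangle is a hypothetical name that uses the textbook cosine-angle definition; it is meant only to show that one distance is returned per row of X, not to reproduce the package's computation.

```r
# Hypothetical helper: cosine-angle distance from each row of X to y.
row_to_vector_cosangle <- function(X, y) {
  stopifnot(length(y) == ncol(X))               # y must have one value per observation
  cosines <- (X %*% y) / (sqrt(rowSums(X^2)) * sqrt(sum(y^2)))
  as.vector(1 - cosines)                        # plain vector of length nrow(X)
}

X <- rbind(c(1, 0, 0),    # same direction as y
           c(0, 1, 0),    # orthogonal to y
           c(2, 0, 0),    # same direction, larger magnitude
           c(-1, 0, 0))   # opposite direction
y <- c(1, 0, 0)

d <- row_to_vector_cosangle(X, y)
d  # 0 1 0 2
```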

Author(s)

Katherine S. Pollard <kpollard@soe.ucsc.edu> and Mark J. van der Laan <laan@stat.berkeley.edu>

References

van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.

http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf

http://www.bepress.com/ucbbiostat/paper107/

http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/jsmpaper.pdf

See Also

distancematrix

Examples

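# Simulate 5 variables (rows) measured on 10 observations (columns)
data <- matrix(rnorm(50), nrow = 5)
# Pair-wise cosine-angle distances between the 5 rows
disscosangle(data)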

[Package hopach version 1.4.0 Index]