findMutualNN {batchelor}    R Documentation

Find mutual nearest neighbors

Description

Find mutual nearest neighbors (MNN) across two data sets.

Usage

findMutualNN(data1, data2, k1, k2 = k1, BNPARAM = KmknnParam(),
  BPPARAM = SerialParam())

Arguments

data1

A numeric matrix containing samples (e.g., cells) in the rows and variables/dimensions in the columns.

data2

A numeric matrix like data1 for another data set with the same variables/dimensions.

k1

Integer scalar specifying the number of neighbors to search for in data1.

k2

Integer scalar specifying the number of neighbors to search for in data2.

BNPARAM

A BiocNeighborParam object specifying the neighbor search algorithm to use.

BPPARAM

A BiocParallelParam object specifying how parallelization should be performed.
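As a hedged illustration of the BNPARAM and BPPARAM arguments, the sketch below runs the same search with two exact neighbor search algorithms from BiocNeighbors (KmknnParam and VptreeParam) under serial execution. Because both searches are exact, they should recover the same set of MNN pairs on continuous random data; an approximate algorithm such as AnnoyParam may trade some accuracy for speed. The simulated matrices and the helper pair.key are assumptions made for this example only.

```r
library(batchelor)
library(BiocNeighbors)  # provides KmknnParam() and VptreeParam()
library(BiocParallel)   # provides SerialParam()

set.seed(0)
B1 <- matrix(rnorm(10000), ncol = 50)  # simulated batch 1 (200 cells, 50 dimensions)
B2 <- matrix(rnorm(10000), ncol = 50)  # simulated batch 2

out.kmknn <- findMutualNN(B1, B2, k1 = 20,
    BNPARAM = KmknnParam(), BPPARAM = SerialParam())
out.vptree <- findMutualNN(B1, B2, k1 = 20,
    BNPARAM = VptreeParam(), BPPARAM = SerialParam())

# Hypothetical helper: compare results as sets of pairs, without
# assuming anything about the order in which pairs are reported.
pair.key <- function(x) sort(paste(x$first, x$second))
same.pairs <- identical(pair.key(out.kmknn), pair.key(out.vptree))
same.pairs
```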

Details

The concept of an MNN pair can be explained by considering cells in each of two data sets. For each cell in data set 1 (data1), the set of k2 nearest cells in data set 2 (data2) is identified, based on the Euclidean distance in expression space. For each cell in data set 2, the set of k1 nearest cells in data set 1 is similarly identified. Two cells in different batches are considered to be MNNs if each cell is in the other's set.
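The definition above can be sketched as a brute-force search in base R. This is a minimal illustration under the assumption that all pairwise distances fit in memory; the function name bruteMNN is hypothetical, it is not how findMutualNN is implemented, and the order of the returned pairs may differ from that of findMutualNN.

```r
# Hypothetical helper: brute-force MNN search (illustration only).
bruteMNN <- function(data1, data2, k1, k2 = k1) {
    # Squared Euclidean distances between every row of data1 and every row
    # of data2, using ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    d2 <- outer(rowSums(data1^2), rowSums(data2^2), "+") -
        2 * tcrossprod(data1, data2)

    # rank12[i, j]: rank of data2 cell j among the neighbors of data1 cell i.
    rank12 <- d2
    for (i in seq_len(nrow(d2))) {
        rank12[i, ] <- rank(d2[i, ], ties.method = "first")
    }
    # rank21[i, j]: rank of data1 cell i among the neighbors of data2 cell j.
    rank21 <- d2
    for (j in seq_len(ncol(d2))) {
        rank21[, j] <- rank(d2[, j], ties.method = "first")
    }

    # A pair is mutual if j is among the k2 nearest data2 cells of i AND
    # i is among the k1 nearest data1 cells of j.
    is.mnn <- rank12 <= k2 & rank21 <= k1
    hits <- which(is.mnn, arr.ind = TRUE)
    list(first = unname(hits[, 1]), second = unname(hits[, 2]))
}

# Small 1-D example: data1 = {0, 1}, data2 = {0.9, 5}, k = 1.
# Only (data1 cell 2, data2 cell 1) is mutual: cell 1 of data1 picks 0.9,
# but 0.9's nearest cell in data1 is 1, not 0.
X1 <- matrix(c(0, 1), ncol = 1)
X2 <- matrix(c(0.9, 5), ncol = 1)
res <- bruteMNN(X1, X2, k1 = 1)
```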

The value of k (i.e., k1 and k2) can be interpreted as the minimum size of a subpopulation in each batch. Larger values allow more MNN pairs to be obtained, which improves the stability of batch correction in fastMNN and mnnCorrect. Larger values also increase robustness against non-orthogonality, which would otherwise result in MNN pairs being detected on the “surface” of the distribution. However, k should not be too large, as this would result in MNN pairs being inappropriately identified between biologically distinct populations.
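To illustrate how the number of MNN pairs grows with k, the sketch below counts pairs for several values of k1 on simulated data. With an exact search, any pair that is mutual for a given k is also mutual for every larger k, so the counts should be non-decreasing. The simulated matrices and the chosen k values are assumptions for this example; they say nothing about appropriate k values for real data.

```r
library(batchelor)

set.seed(42)
B1 <- matrix(rnorm(10000), ncol = 50)  # simulated batch 1
B2 <- matrix(rnorm(10000), ncol = 50)  # simulated batch 2

k.values <- c(5, 10, 20, 40)
# Number of MNN pairs found for each k (k2 defaults to k1).
n.pairs <- sapply(k.values, function(k) length(findMutualNN(B1, B2, k1 = k)$first))
names(n.pairs) <- paste0("k=", k.values)
n.pairs
```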

Value

A list containing the integer vectors first and second. Corresponding entries in first and second specify an MNN pair, giving the row indices of the paired cells in data1 and data2, respectively.
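Because first and second are row indices, they can be used directly to extract the paired cells. The sketch below computes one difference vector per MNN pair and averages them as a crude estimate of a batch shift. The constant offset added to B2 is a hypothetical batch effect for this example, and the simple column mean is only an illustration; it is not the correction procedure used by fastMNN or mnnCorrect.

```r
library(batchelor)

set.seed(1)
B1 <- matrix(rnorm(10000), ncol = 50)
# Hypothetical batch effect: B2 is offset by 2 in every dimension.
B2 <- matrix(rnorm(10000), ncol = 50) + 2

out <- findMutualNN(B1, B2, k1 = 20)

# One row per MNN pair: the data2 cell minus its paired data1 cell.
pair.diff <- B2[out$second, , drop = FALSE] - B1[out$first, , drop = FALSE]

# Crude average shift across all pairs (illustration only).
shift <- colMeans(pair.diff)

# A cell can take part in several pairs, so this is at most length(out$first).
n.cells1 <- length(unique(out$first))
```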

Author(s)

Aaron Lun

See Also

queryKNN in the BiocNeighbors package for the underlying neighbor search code.

Examples

library(batchelor)
set.seed(1000)
B1 <- matrix(rnorm(10000), ncol=50) # Batch 1
B2 <- matrix(rnorm(10000), ncol=50) # Batch 2
out <- findMutualNN(B1, B2, k1=20)
head(out$first)
head(out$second)


[Package batchelor version 1.0.0 Index]