build*NNGraph {scran}R Documentation

Build a nearest-neighbor graph

Description

Build a shared or k-nearest-neighbors graph for cells based on their expression profiles.

Usage

## S4 method for signature 'ANY'
buildSNNGraph(x, k=10, d=50, type=c("rank", "number"), transposed=FALSE,
    pc.approx=NULL, irlba.args=list(), subset.row=NULL, BNPARAM=KmknnParam(), 
    BSPARAM=ExactParam(), BPPARAM=SerialParam())

## S4 method for signature 'SingleCellExperiment'
buildSNNGraph(x, ..., subset.row=NULL, assay.type="logcounts", 
    get.spikes=FALSE, use.dimred=NULL)

## S4 method for signature 'ANY'
buildKNNGraph(x, k=10, d=50, directed=FALSE, transposed=FALSE, 
    pc.approx=NULL, irlba.args=list(), subset.row=NULL, BNPARAM=KmknnParam(), 
    BSPARAM=ExactParam(), BPPARAM=SerialParam())

## S4 method for signature 'SingleCellExperiment'
buildKNNGraph(x, ..., subset.row=NULL, assay.type="logcounts", 
    get.spikes=FALSE, use.dimred=NULL)

Arguments

x

A SingleCellExperiment object, or a matrix containing expression values for each gene (row) in each cell (column). If it is matrix, it can also be transposed.

k

An integer scalar specifying the number of nearest neighbors to consider during graph construction.

d

An integer scalar specifying the number of dimensions to use for the k-NN search.

type

A string specifying the type of weighting scheme to use for shared neighbors.

directed

A logical scalar indicating whether the output of buildKNNGraph should be a directed graph.

transposed

A logical scalar indicating whether x is transposed (i.e., rows are cells).

pc.approx, irlba.args

Deprecated, use BSPARAM instead.

subset.row

See ?"scran-gene-selection".

BNPARAM

A BiocNeighborParam object specifying the nearest neighbor algorithm.

BSPARAM

A BiocSingularParam object specifying the algorithm to use for PCA, if d is not NA.

BPPARAM

A BiocParallelParam object to use for parallel processing.

...

Additional arguments to pass to buildSNNGraph,ANY-method.

assay.type

A string specifying which assay values to use.

get.spikes

See ?"scran-gene-selection".

use.dimred

A string specifying whether existing values in reducedDims(x) should be used.

Details

The buildSNNGraph method builds a shared nearest-neighbour graph using cells as nodes. For each cell, its k nearest neighbours are identified based on Euclidean distances in their expression profiles. An edge is drawn between all pairs of cells that share at least one neighbour, weighted by the characteristics of the shared nearest neighbors:

More shared neighbors, or shared neighbors that are close to both cells, will generally yield larger weights.

The aim is to use the SNN graph to perform clustering of cells via community detection algorithms in the igraph package. This is faster and more memory efficient than hierarchical clustering for large numbers of cells. In particular, it avoids the need to construct a distance matrix for all pairs of cells. Only the identities of nearest neighbours are required, which can be obtained quickly with methods in the BiocNeighbors package.

The choice of k can be roughly interpreted as the minimum cluster size. Smaller values of k will generally yield smaller, more resolved clusters upon running community detection algorithms. By comparison, increasing k will increase the connectivity of the graph and make it more difficult to resolve different communities.

Note that the setting of k here is slightly different from that used in SNN-Cliq. The original implementation considers each cell to be its first nearest neighbor that contributes to k. In buildSNNGraph, the k nearest neighbours refers to the number of other cells.

The buildKNNGraph method builds a simpler k-nearest neighbour graph. Cells are again nodes, and edges are drawn between each cell and its k-nearest neighbours. No weighting of the edges is performed. In theory, these graphs are directed as nearest neighour relationships may not be reciprocal. However, by default, directed=FALSE such that an undirected graph is returned.

Value

An igraph-type graph, where nodes are cells and edges represent connections between nearest neighbors. For buildSNNGraph, these edges are weighted by the number of shared nearest neighbors. For buildKNNGraph, edges are not weighted but may be directed if directed=TRUE.

Choice of input data

In practice, PCA is performed on x to obtain the first d principal components. This is necessary in order to perform the k-NN search (done using the findKNN function) in reasonable time. By default, the first 50 components are chosen, which should retain most of the substructure in the data set. If d is NA or greater than or equal to the number of cells, no dimensionality reduction is performed.

The PCA is performed using methods the runSVD function from the BiocSingular package. To improve speed, this can be done using approximate algorithms by modifying BSPARAM, e.g., to IrlbaParam(). Approximate algorithms will converge towards the correct result but often involve some random initialization and thus are technically dependent on the session seed. For full reproducibility, users are advised to call set.seed beforehand when using this option.

Expression values in x should typically be on the log-scale, e.g., log-transformed counts. Ranks can also be used for greater robustness, e.g., from quickCluster with get.ranks=TRUE. (Dimensionality reduction is still okay when ranks are provided - running PCA on ranks is equivalent to running MDS on the distance matrix derived from Spearman's rho.)

If the input matrix x is already transposed for the ANY method, transposed=TRUE avoids an unnecessary internal transposition. A typical use case is when x contains some reduced dimension coordinates with cells in the rows. In such cases, setting transposed=TRUE and d=NA will use the input coordinates directly for graph-building.

If use.dimred is not NULL, existing PCs are used from the specified entry of reducedDims(x), and any setting of d, subset.row and get.spikes are ignored.

Author(s)

Aaron Lun

References

Xu C and Su Z (2015). Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics 31:1974-80

See Also

See make_graph for details on the graph output object.

See cluster_walktrap, cluster_louvain and related functions in igraph for clustering based on the produced graph.

Also see findKNN for specifics of the nearest-neighbor search.

Examples

exprs <- matrix(rnorm(100000), ncol=100)
g <- buildSNNGraph(exprs)

clusters <- igraph::cluster_fast_greedy(g)$membership
table(clusters)

[Package scran version 1.12.0 Index]