Brick_local_score_differentiator {HiCBricks} | R Documentation |
Local_score_differentiator
calls topologically associated domains on Hi-C
matrices. At the most fundamental level, the local score differentiator is a
change point detector: it detects change points in the directionality index
using thresholds defined on the local directionality index distribution.
The directionality index (DI) is calculated as defined by Dixon et al., 2012
Nature. Next, the difference in DI between neighbouring bins is calculated to
obtain the change in DI at each bin. When the DI goes from a highly negative
value to a highly positive one, as is expected at domain boundaries, the
resulting DI difference distribution is largely flat and punctuated by very
large peaks marking the regions where such a change takes place. We use two
difference vectors: the difference between a bin and its adjacent downstream
bin, and the difference between a bin and its adjacent upstream bin. Using
these vectors and the original directionality index, we define domain borders
as outliers.
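The computation described above can be sketched on a toy matrix. This is a minimal illustration, not the package's internal code: `compute_di`, the toy matrix, and the names of the two difference vectors are hypothetical, and the DI formula follows the Dixon et al. (2012) definition with windows truncated at the matrix edges.

```r
# Toy symmetric contact matrix with two dense blocks (bins 1-5 and 6-10).
n <- 10
mat <- matrix(1, n, n)
mat[1:5, 1:5] <- 10
mat[6:10, 6:10] <- 10

# Directionality index per bin:
#   A = contacts with the `window` upstream bins
#   B = contacts with the `window` downstream bins
#   E = (A + B) / 2
#   DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E)
compute_di <- function(mat, window) {
  n <- nrow(mat)
  vapply(seq_len(n), function(i) {
    up <- seq(i - window, i - 1)
    down <- seq(i + 1, i + window)
    up <- up[up >= 1]        # truncate the window at the matrix edges
    down <- down[down <= n]
    A <- sum(mat[i, up])
    B <- sum(mat[i, down])
    E <- (A + B) / 2
    if (E == 0 || A == B) return(0)
    sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E)
  }, numeric(1))
}

di <- compute_di(mat, window = 3)

# Two difference vectors (naming is an assumption for this sketch):
di_minus_upstream <- c(NA, diff(di))     # di[i] - di[i - 1]
di_minus_downstream <- c(-diff(di), NA)  # di[i] - di[i + 1]

print(round(di, 2))
print(which.max(di_minus_upstream))
```

In this toy case the DI switches from negative at bin 5 to positive at bin 6, so the upstream difference peaks at bin 6, where the second block begins.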
Brick_local_score_differentiator(Brick, chrs = NULL, min.sum = -1, di.window = 200L, lookup.window = 200L, tukeys.constant = 1.5, strict = TRUE, fill.gaps = TRUE, ignore.sparse = TRUE, sparsity.threshold = 0.8, remove.empty = NULL, chunk.size = 500, force.retrieve = TRUE)
Brick |
Required. A string specifying the path to the Brick store created with CreateBrick. |
chrs |
Optional. Default NULL. If present, TADs will be called only for the chromosomes listed in chrs. |
min.sum |
Optional. Default -1. Process only bins in the matrix with row sums greater than min.sum. |
di.window |
Optional. Default 200. The window size, in bins, used to compute the directionality index. |
lookup.window |
Optional. Default 200. The size of the local window used to call borders. At smaller di.window values, we recommend setting this to 2*di.window. |
tukeys.constant |
Optional. Default 1.5. tukeys.constant*IQR (inter-quartile range) defines the lower and upper fence values. |
strict |
Optional. Default TRUE. If TRUE, strict applies an additional filter on the directionality index, requiring it to be greater than 0 on the right tail (domain starts) and less than 0 on the left tail (domain ends). |
fill.gaps |
Optional. Default TRUE. If TRUE, this affects the TAD stitching process. All border starts are stitched to the next downstream border end. Therefore, at times border ends remain unassociated with a border start. When fill.gaps is TRUE, these border ends are stitched to the bin immediately downstream of their nearest upstream border end. TADs inferred in this way are annotated with two metadata columns in the GRanges object: gap.fill holds a value of 1 and level holds a value of 1. TADs which were not filled in hold a gap.fill value of 0 and a level value of 2. |
ignore.sparse |
Optional. Default TRUE. If TRUE, a matrix which was defined as sparse during the matrix loading process will be treated as a dense matrix, and the sparsity.threshold filter will not be applied. Please note that if a matrix is defined as sparse and fill.gaps is TRUE, fill.gaps will be turned off. |
sparsity.threshold |
Optional. Default 0.8. The sparsity threshold relates to the sparsity index, which is computed as the number of non-zero bins at a certain distance from the diagonal. If a matrix is sparse and ignore.sparse is FALSE, bins with a sparsity index value below this threshold will be discarded from the DI computation. |
remove.empty |
Not implemented. After implementation, this will ensure that the presence of centromeric regions is accounted for. |
chunk.size |
Optional. Default 500. The size of the matrix chunk to process. This value should be larger than 2x di.window. |
force.retrieve |
Optional. Default TRUE. If TRUE, this forces the retrieval of a matrix chunk even when the retrieval includes interaction points which were not loaded into the Brick store (larger chunks). Please note that this does not mean that the DI can be computed at distances larger than max distance. Rather, it is meant to speed up computation. |
To define an outlier, fences are first defined using tukeys.constant x the inter-quartile range (IQR) of the directionality index. The upper fence, used for detecting domain starts, is the 75th percentile + (IQR x tukeys.constant), while the lower fence, used for detecting domain ends, is the 25th percentile - (IQR x tukeys.constant). For domain starts, the DI difference must be greater than or equal to the upper fence and greater than the DI, and the DI must be a finite real value. If strict is TRUE, the DI is also required to be greater than 0. Similarly, for domain ends, the DI difference must be lower than or equal to the lower fence and lower than the DI, and the DI must be a finite real value. If strict is TRUE, the DI is also required to be lower than 0.
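The outlier rules above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: `tukey_fences` and `call_borders` are hypothetical helpers, the fences are computed once over each whole difference vector rather than within a local lookup.window, and the choice of the upstream difference for starts and the downstream difference for ends is an assumption.

```r
# Tukey fences: quartiles extended by tukeys.constant * IQR.
tukey_fences <- function(x, tukeys.constant = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - tukeys.constant * iqr,
    upper = q[2] + tukeys.constant * iqr)
}

# Call domain starts and ends from the DI and its two difference vectors.
call_borders <- function(di, strict = TRUE, tukeys.constant = 1.5) {
  up_diff <- c(NA, diff(di))     # di[i] - di[i - 1] (assumed used for starts)
  down_diff <- c(-diff(di), NA)  # di[i] - di[i + 1] (assumed used for ends)
  up_fences <- tukey_fences(up_diff, tukeys.constant)
  down_fences <- tukey_fences(down_diff, tukeys.constant)

  starts <- is.finite(di) & !is.na(up_diff) &
    up_diff >= up_fences[["upper"]] & up_diff > di
  ends <- is.finite(di) & !is.na(down_diff) &
    down_diff <= down_fences[["lower"]] & down_diff < di
  if (strict) {
    starts <- starts & di > 0
    ends <- ends & di < 0
  }
  list(starts = which(starts), ends = which(ends))
}

# A mostly flat toy DI track with one sharp negative-to-positive switch.
di <- c(0.5, -0.4, 0.3, -0.2, 0.1, -20, 25, 0.2, -0.3, 0.4, -0.1, 0.2)
borders <- call_borders(di)
print(borders)
```

With this toy track, the only bin passing the start filters is bin 7 and the only bin passing the end filters is bin 6, which is the position of the switch.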
After defining outliers, each domain start is associated with its nearest downstream domain end. If fill.gaps is TRUE and there are domain ends which remain unassociated with a domain start, these domain ends are associated with the bin adjacent to their nearest upstream domain end. These associations are marked by the metadata columns gap.fill = 1 and level = 1.
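The stitching step can be sketched as follows. This is a hedged illustration of the rule as described here, not the package's code: `stitch_domains` is a hypothetical helper working on bin indices, it returns a plain data frame rather than a GRanges object, and skipping an unassociated end that has no upstream end is an assumption.

```r
stitch_domains <- function(starts, ends, fill.gaps = TRUE) {
  starts <- sort(as.integer(starts))
  ends <- sort(as.integer(ends))

  # Pair each start with its nearest downstream end (NA if none exists).
  paired <- vapply(starts, function(s) {
    downstream <- ends[ends > s]
    if (length(downstream) == 0L) NA_integer_ else downstream[1L]
  }, integer(1))
  keep <- !is.na(paired)
  domains <- data.frame(start = starts[keep], end = paired[keep],
                        gap.fill = rep(0L, sum(keep)),
                        level = rep(2L, sum(keep)))

  if (fill.gaps) {
    # Ends that no start was paired with.
    orphans <- setdiff(ends, paired)
    for (e in orphans) {
      upstream <- ends[ends < e]
      if (length(upstream) == 0L) next  # assumption: nothing to anchor to
      # The filled domain begins at the bin after the nearest upstream end.
      domains <- rbind(domains,
                       data.frame(start = max(upstream) + 1L, end = e,
                                  gap.fill = 1L, level = 1L))
    }
  }
  domains <- domains[order(domains$start), ]
  rownames(domains) <- NULL
  domains
}

# Starts at bins 1 and 11; ends at bins 5, 10 and 20.
# The end at bin 10 has no start of its own, so it is filled from bin 6.
tads <- stitch_domains(starts = c(1, 11), ends = c(5, 10, 20))
print(tads)
```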
This function is designed to produce accurate TAD calls quickly.
A GRanges object containing domain definitions. The starts and ends of the ranges coincide with the starts and ends of their contained bins from the bintable.
Brick.file <- system.file("extdata", "test.hdf", package = "HiCBricks")
TAD_ranges <- Brick_local_score_differentiator(Brick = Brick.file,
    chrs = "chr19", di.window = 10, lookup.window = 30, strict = TRUE,
    fill.gaps = TRUE, chunk.size = 500)