CreateBrick {HiCBricks}    R Documentation

Create the entire HDF5 structure and load the bintable

Description

CreateBrick creates the complete HDF5 on-disk data structure and loads the binning table.

Usage

CreateBrick(ChromNames, BinTable, bin.delim = "\t", col.index = c(1,
    2, 3), impose.discontinuity = TRUE, ChunkSize = NULL,
    Output.Filename, exec = "cat", remove.existing = FALSE)

Arguments

ChromNames

Required. A character vector containing the chromosomes to be considered for the dataset. This vector is used to verify the presence of all chromosomes in the provided bintable.

BinTable

Required. A string containing the path to the file to load as the binning table for the Hi-C experiment. The number of entries per chromosome defines the dimension of the associated Hi-C data matrices. For example, if chr1 contains 250 entries in the binning table, the cis Hi-C data matrix for chr1 will be expected to contain 250 rows and 250 cols. Similarly, if the same binning table contains 150 entries for chr2, the trans Hi-C matrix for chr1,chr2 will have 250 rows and 150 cols.

There are no constraints on the bintable format. As long as the table is in a delimited format, the corresponding table columns can be outlined with the associated parameters. The columns of importance are chr, start and end.

It is recommended to always use binning tables where the end and start of consecutive ranges are not the same. If they are the same, this may lead to unexpected behaviour when using the GenomicRanges "any" overlap function.
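As a minimal sketch of the dimension rule described above, the following base R snippet counts bins per chromosome in a toy binning table and derives the expected matrix dimensions. The table contents and the helper expected_dim are hypothetical and are not part of HiCBricks.

```r
# Hypothetical toy binning table: 5 bins on chr1, 3 bins on chr2.
bins <- data.frame(
  chr   = c(rep("chr1", 5), rep("chr2", 3)),
  start = c(seq(1, 161, by = 40), seq(1, 81, by = 40)),
  end   = c(seq(40, 200, by = 40), seq(40, 120, by = 40)),
  stringsAsFactors = FALSE
)

# Number of bins (table entries) per chromosome.
bins_per_chr <- table(bins$chr)

# Hypothetical helper: expected (rows, cols) of the matrix for a chromosome pair.
expected_dim <- function(chr_a, chr_b) {
  c(rows = bins_per_chr[[chr_a]], cols = bins_per_chr[[chr_b]])
}

cis_dim   <- expected_dim("chr1", "chr1")  # 5 rows, 5 cols
trans_dim <- expected_dim("chr1", "chr2")  # 5 rows, 3 cols
```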

bin.delim

Optional. Defaults to tabs. A character vector of length 1 specifying the delimiter used in the file containing the binning table.

col.index

Optional. Default c(1,2,3). A numeric vector of length 3 containing the indexes of the required columns in the binning table. The first index corresponds to the chr column, the second to the start column and the third to the end column.
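To illustrate what col.index expresses, here is a small base R sketch that picks the chr, start and end columns out of a wider table. The table wide and its column layout are made up for illustration, and the subsetting shown is only an assumption about the effect of the parameter, not the package's internal code.

```r
# Hypothetical wide table in which chr, start and end are columns 3, 5 and 4.
wide <- data.frame(
  id     = 1:3,
  strand = "+",
  chr    = "chr1",
  end    = c(40, 80, 120),
  start  = c(1, 41, 81),
  stringsAsFactors = FALSE
)

# Positions of chr, start and end, in that order.
col.index <- c(3, 5, 4)

# Keep only the three required columns, in the expected order.
bintable <- wide[, col.index]
names(bintable) <- c("chr", "start", "end")
```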

impose.discontinuity

Optional. Default TRUE. If TRUE, a check is run to make sure that the end and start coordinates of consecutive entries are not the same within each chromosome.

ChunkSize

Optional. A numeric vector of length 1. If provided, the HDF dataset will use this value as the chunk size, for all matrices. By default, the ChunkSize is set to matrix dimensions/100.

Output.Filename

Required. A string specifying the location and name of the HDF file to create. If no path is provided, the file is created in the BiocFileCache. Otherwise, it is created in the specified directory and tracked via BiocFileCache.

exec

Optional. Default cat. A string specifying the program or expression to use for reading the file. For bz2 files, use bzcat and for gzipped files use zcat.
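The idea of reading a file through an external program can be sketched in base R with pipe(). This is only an illustration under the assumption of a Unix-like system where cat is available; the helper read_with_exec is hypothetical, and how CreateBrick uses exec internally is not shown here.

```r
# Write a tiny tab-delimited binning table to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("chr1\t1\t40", "chr1\t41\t80"), tmp)

# Hypothetical helper: stream the file through an external program and parse it.
# For compressed input, exec could be "zcat" or "bzcat" instead of "cat".
read_with_exec <- function(path, exec = "cat", delim = "\t") {
  con <- pipe(paste(exec, shQuote(path)))
  # read.table opens the connection and closes it again when done.
  read.table(con, sep = delim, header = FALSE, stringsAsFactors = FALSE)
}

bins <- read_with_exec(tmp)
```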

remove.existing

Optional. Default FALSE. If TRUE, will remove the HDF file with the same name and create a new one. By default, it will not replace existing files.

Details

This function creates the complete HDF data structure, loads the binning table associated with the Hi-C experiment and creates (for now) a 2D matrix layout for all chromosome pairs. Please note that the binning table must be discontinuous (first range end != second range start), because range overlaps of the "any" type will routinely report adjacent ranges that share an end and start coordinate as overlapping. This criterion is therefore enforced by default.
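The discontinuity criterion can be sketched in base R as follows. The function is_discontinuous is a hypothetical helper written for illustration; it is not the check implemented in HiCBricks, only one way to express the rule that no range may end where the next one starts.

```r
# Hypothetical helper: TRUE when, within every chromosome, no range's end
# equals the start of the next range (ranges ordered by start).
is_discontinuous <- function(chr, start, end) {
  ok <- TRUE
  for (chrom in unique(chr)) {
    idx <- which(chr == chrom)
    idx <- idx[order(start[idx])]
    n <- length(idx)
    if (n > 1 && any(end[idx[-n]] == start[idx[-1]])) {
      ok <- FALSE
    }
  }
  ok
}

# Ends and starts differ by 1: passes the check.
res_ok  <- is_discontinuous(c("chr1", "chr1"), c(1, 41), c(40, 80))   # TRUE

# First range ends at 40 and the second starts at 40: fails the check.
res_bad <- is_discontinuous(c("chr1", "chr1"), c(0, 40), c(40, 80))   # FALSE
```

A table that fails this kind of check would need impose.discontinuity = FALSE to be loaded, as described for that argument.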

The HDF file contains three major groups, each of which nests further groups that ultimately lead to the corresponding datasets.

Value

This function will generate the target Brick file. Upon completion, the function will provide the path to the created/tracked HDF file.

Examples

Bintable.path <- system.file("extdata",
"Bintable_40kb.txt", package = "HiCBricks")
Chromosomes <- "chr19"
Path_to_cached_file <- CreateBrick(ChromNames = Chromosomes,
  BinTable = Bintable.path, bin.delim = " ",
  Output.Filename = file.path(tempdir(),"test.hdf"), exec = "cat",
  remove.existing = TRUE)

## Not run: 
Bintable.path <- system.file("extdata",
"Bintable_40kb.txt", package = "HiCBricks")
Chromosomes <- c("chr19", "chr20", "chr22", "chr21")
Path_to_cached_file <- CreateBrick(ChromNames = Chromosomes,
BinTable = Bintable.path, impose.discontinuity=TRUE,
col.index = c(1,2,3), Output.Filename = file.path(tempdir(),"test.hdf"),
exec = "cat", remove.existing = TRUE)

# This will cause an error, as the file located at Bintable.path
# contains coordinates for chromosome 19 only. For this code to work, either
# all other chromosomes need to be removed from the Chromosomes variable or
# coordinate information for the other chromosomes needs to be provided.

# The reverse is also true. If the Bintable contains data for other
# chromosomes that were not listed in ChromNames, this will cause an
# error.

# Keep in mind that if the end coordinates and start coordinates of adjacent
# ranges are not separated by at least a value of 1, then
# impose.discontinuity = TRUE will likely cause an error to occur.
# This may seem strict, but GenomicRanges by default will consider an
# overlap of 1 bp as an overlap. Therefore, to be certain that ranges which
# should not be retrieved are not targeted during retrieval operations, a
# check is run to make sure that adjacent ends and starts are not
# overlapping.
# To load continuous ranges, use impose.discontinuity = FALSE.

# Also note that col.index determines which columns to use for chr, start
# and end. The original binning table may have 10 or 20 columns, but only
# three are required, and col.index must list them in the order chr, start, end.

## End(Not run)


[Package HiCBricks version 1.2.0 Index]