TileDBArray 1.2.0
TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.08558684 -0.23824940 1.07160296 . -0.415731505 -0.653949442
## [2,] -1.25072083 0.89806277 -1.06438170 . -0.710962795 2.065190360
## [3,] -0.55629803 -0.14605264 1.78460374 . -0.006565066 -0.751190481
## [4,] 0.91348272 -1.30433767 -1.74300142 . -1.303705163 0.872648387
## [5,] 1.54827739 0.26238670 0.40473323 . -0.925468460 0.807441073
## ... . . . . . .
## [96,] 1.20491839 -0.97168340 0.64287760 . 1.7532490 0.7418432
## [97,] 1.15405991 -2.83621125 -0.66582599 . 0.8785627 0.5315016
## [98,] -0.06443278 0.04636555 -0.37620043 . 0.6608582 0.3736762
## [99,] 0.05150205 -0.33325002 1.79960554 . -0.2390331 1.9413084
## [100,] -0.49529480 0.47791139 0.56975905 . -0.7330675 -0.2775456
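Because the data live on disk, an array written to a known location can be reopened later without rewriting it. The sketch below assumes that the `TileDBArray()` constructor accepts the path of an existing TileDB array; that constructor is not shown elsewhere in this vignette.

```r
library(TileDBArray)

# Write a small dense matrix to an explicit location so it can be found again.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path)

# Reopen the same on-disk array from its path
# (assumes TileDBArray() accepts a path to an existing array).
reopened <- TileDBArray(path)

# Realizing the reopened array in memory should give back the original values.
stopifnot(all.equal(as.matrix(reopened), X, check.attributes=FALSE))
```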
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 0.08558684 -0.23824940 1.07160296 . -0.415731505 -0.653949442
## [2,] -1.25072083 0.89806277 -1.06438170 . -0.710962795 2.065190360
## [3,] -0.55629803 -0.14605264 1.78460374 . -0.006565066 -0.751190481
## [4,] 0.91348272 -1.30433767 -1.74300142 . -1.303705163 0.872648387
## [5,] 1.54827739 0.26238670 0.40473323 . -0.925468460 0.807441073
## ... . . . . . .
## [96,] 1.20491839 -0.97168340 0.64287760 . 1.7532490 0.7418432
## [97,] 1.15405991 -2.83621125 -0.66582599 . 0.8785627 0.5315016
## [98,] -0.06443278 0.04636555 -0.37620043 . 0.6608582 0.3736762
## [99,] 0.05150205 -0.33325002 1.79960554 . -0.2390331 1.9413084
## [100,] -0.49529480 0.47791139 0.56975905 . -0.7330675 -0.2775456
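A quick way to confirm that coercion preserves the data is to realize the result with `as.matrix()` and compare it to the original. This is a minimal check, not part of the package itself.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Coerce an in-memory matrix; the backend location is chosen automatically.
Z <- as(X, "TileDBArray")

# The result is a file-backed object, but realizing it returns the same values.
print(class(Z))
stopifnot(all.equal(as.matrix(Z), X))
```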
This also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0.00 0.22
## [997,] 0 0 0 . 0.00 0.00
## [998,] 0 0 0 . 0.00 0.00
## [999,] 0 0 0 . 0.00 0.00
## [1000,] 0 0 0 . 0.00 0.00
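To check that the sparse layout was kept, the sketch below assumes DelayedArray's `is_sparse()` generic is available for TileDB-backed objects, and spot-checks a block of values against the original `dgCMatrix`.

```r
library(TileDBArray)
library(Matrix)

Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
out_sparse <- writeTileDBArray(Y)

# is_sparse() (from DelayedArray, assumed here) reports whether only non-zero entries are stored.
print(is_sparse(out_sparse))

# Compare a 50 x 50 block against the same block of the in-memory sparse matrix.
block <- as.matrix(out_sparse[1:50, 1:50])
stopifnot(all.equal(block, as.matrix(Y[1:50, 1:50]), check.attributes=FALSE))
```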
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE TRUE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
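The example above only shows a logical matrix. An integer matrix can be written the same way; the sketch below assumes that the stored type round-trips as `"integer"`, which `type()` from the DelayedArray framework can report without realizing the data.

```r
library(TileDBArray)

# 1:20 is an integer vector, so this is an integer matrix.
I <- matrix(1:20, ncol=4)
out_int <- writeTileDBArray(I)

# type() reports the element type of the on-disk array.
print(type(out_int))

# The realized values should match the original integers.
stopifnot(all.equal(as.matrix(out_int), I, check.attributes=FALSE))
```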
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.08558684 -0.23824940 1.07160296 . -0.415731505 -0.653949442
## GENE_2 -1.25072083 0.89806277 -1.06438170 . -0.710962795 2.065190360
## GENE_3 -0.55629803 -0.14605264 1.78460374 . -0.006565066 -0.751190481
## GENE_4 0.91348272 -1.30433767 -1.74300142 . -1.303705163 0.872648387
## GENE_5 1.54827739 0.26238670 0.40473323 . -0.925468460 0.807441073
## ... . . . . . .
## GENE_96 1.20491839 -0.97168340 0.64287760 . 1.7532490 0.7418432
## GENE_97 1.15405991 -2.83621125 -0.66582599 . 0.8785627 0.5315016
## GENE_98 -0.06443278 0.04636555 -0.37620043 . 0.6608582 0.3736762
## GENE_99 0.05150205 -0.33325002 1.79960554 . -0.2390331 1.9413084
## GENE_100 -0.49529480 0.47791139 0.56975905 . -0.7330675 -0.2775456
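The dimension names are stored alongside the data, so they can be used for character-based subsetting just as with an ordinary matrix. A minimal check:

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
named <- writeTileDBArray(X)

# Row and column names should survive the round trip.
stopifnot(identical(rownames(named), rownames(X)))
stopifnot(identical(colnames(named), colnames(X)))

# Names can then be used to pick out entries.
named[c("GENE_1", "GENE_2"), "SAMP_3"]
```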
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## 0.08558684 -1.25072083 -0.55629803 0.91348272 1.54827739 -1.00930170
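Since a TileDBMatrix inherits from the DelayedArray classes, generic DelayedArray tools apply to it directly. The sketch below uses `seed()` from DelayedArray to look at the underlying TileDB-specific object; the exact class name it prints is not asserted here.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Class inheritance: TileDB-backed matrices are DelayedMatrix objects.
print(is(out, "DelayedMatrix"))

# seed() exposes the object that knows how to read from the TileDB backend.
print(class(seed(out)))
```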
We can also perform manipulations like subsetting and arithmetic. Note that these operations do not affect the data in the TileDB backend; rather, they are delayed until the values are explicitly required, hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 0.08558684 -0.23824940 1.07160296 0.02055645 -2.17386330
## GENE_2 -1.25072083 0.89806277 -1.06438170 -1.64767241 -0.07318863
## GENE_3 -0.55629803 -0.14605264 1.78460374 -0.72996027 -0.41837908
## GENE_4 0.91348272 -1.30433767 -1.74300142 -0.13629030 -0.67932707
## GENE_5 1.54827739 0.26238670 0.40473323 0.04442898 -0.86158474
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 0.1711737 -0.4764988 2.1432059 . -0.83146301 -1.30789888
## GENE_2 -2.5014417 1.7961255 -2.1287634 . -1.42192559 4.13038072
## GENE_3 -1.1125961 -0.2921053 3.5692075 . -0.01313013 -1.50238096
## GENE_4 1.8269654 -2.6086753 -3.4860028 . -2.60741033 1.74529677
## GENE_5 3.0965548 0.5247734 0.8094665 . -1.85093692 1.61488215
## ... . . . . . .
## GENE_96 2.4098368 -1.9433668 1.2857552 . 3.5064981 1.4836863
## GENE_97 2.3081198 -5.6724225 -1.3316520 . 1.7571255 1.0630032
## GENE_98 -0.1288656 0.0927311 -0.7524009 . 1.3217165 0.7473524
## GENE_99 0.1030041 -0.6665000 3.5992111 . -0.4780661 3.8826168
## GENE_100 -0.9905896 0.9558228 1.1395181 . -1.4661349 -0.5550912
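The delayed operations can be inspected before any values are computed. This sketch uses DelayedArray's `showtree()` to print the recorded subset and scaling steps, then realizes the result and confirms that the original TileDB-backed object is unchanged.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Record a subset followed by a scaling; nothing is computed yet.
doubled <- out[1:5, 1:5] * 2

# Print the stack of delayed operations sitting on top of the TileDB seed.
showtree(doubled)

# Values are only computed when realized, e.g. via as.matrix().
stopifnot(all.equal(as.matrix(doubled), X[1:5, 1:5] * 2))

# The backend data are untouched: the original object still holds unscaled values.
stopifnot(all.equal(as.matrix(out), X))
```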
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## -2.7797136 -19.3658937 6.3568918 -11.3904402 -18.4925590 -7.4982862
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## -10.5728696 -0.5768203 0.7484583 -15.9789032
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 -1.5693970
## GENE_2 -2.3627220
## GENE_3 0.2901704
## GENE_4 -3.0090484
## GENE_5 -1.2493515
## ... .
## GENE_96 -0.4486141
## GENE_97 -0.4739330
## GENE_98 -0.9677075
## GENE_99 1.3373617
## GENE_100 -0.4490143
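These operations are computed block by block from the backend, but the results should agree with the same calculations on the in-memory matrix. A minimal consistency check:

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")
v <- runif(ncol(out))

# Column sums from the TileDB-backed matrix versus base R on the original.
stopifnot(all.equal(colSums(out), colSums(X)))

# Matrix product: realize the delayed result and compare with base R.
# check.attributes=FALSE ignores any column name given to the result.
stopifnot(all.equal(as.matrix(out %*% v), X %*% v, check.attributes=FALSE))
```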
We can adjust some parameters for creating the backend with the appropriate arguments to writeTileDBArray(). For example, the code below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.03847059 -1.06872978 0.97080123 . -0.97115430 0.77182350
## [2,] -0.32146203 -0.16199033 -0.38235846 . 0.78729902 -0.08254158
## [3,] 0.14292865 -0.18012447 0.11291112 . -1.56051270 -0.83091658
## [4,] 0.11517908 -1.17587333 -0.16267531 . 0.79293255 -1.94135840
## [5,] 0.30567423 0.26450265 2.11886882 . -0.26579077 0.41886510
## ... . . . . . .
## [96,] -1.1069001 2.1786036 -0.1467080 . 0.03801506 1.02465438
## [97,] -1.3452935 -1.6491208 -0.3424179 . -0.07238739 -0.29859645
## [98,] 0.3185266 -0.4539266 -0.1830422 . 1.32908193 -0.59929428
## [99,] 0.7550147 -0.3928649 1.1072289 . -0.48274006 -0.16760091
## [100,] -0.3746986 0.3052312 0.7549379 . -0.46261464 0.39592268
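An array written with a custom attribute name can later be reopened by naming that attribute. This sketch assumes that `TileDBArray()` accepts a path and forwards an `attr=` argument to its seed constructor; neither is shown elsewhere in this vignette.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reopen by path, naming the attribute that holds the values
# (assumes TileDBArray() passes attr= through to the seed).
again <- TileDBArray(path, attr="WHEE")
stopifnot(all.equal(as.matrix(again), X, check.attributes=FALSE))
```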
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.03847059 -1.06872978 0.97080123 . -0.97115430 0.77182350
## [2,] -0.32146203 -0.16199033 -0.38235846 . 0.78729902 -0.08254158
## [3,] 0.14292865 -0.18012447 0.11291112 . -1.56051270 -0.83091658
## [4,] 0.11517908 -1.17587333 -0.16267531 . 0.79293255 -1.94135840
## [5,] 0.30567423 0.26450265 2.11886882 . -0.26579077 0.41886510
## ... . . . . . .
## [96,] -1.1069001 2.1786036 -0.1467080 . 0.03801506 1.02465438
## [97,] -1.3452935 -1.6491208 -0.3424179 . -0.07238739 -0.29859645
## [98,] 0.3185266 -0.4539266 -0.1830422 . 1.32908193 -0.59929428
## [99,] 0.7550147 -0.3928649 1.1072289 . -0.48274006 -0.16760091
## [100,] -0.3746986 0.3052312 0.7549379 . -0.46261464 0.39592268
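The global path can also be queried and unset again. The sketch below assumes a `getTileDBPath()` getter that returns the value given to `setTileDBPath()`, and that passing `NULL` to `setTileDBPath()` restores the default behaviour of writing to a fresh temporary location.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)

# Point coercion at a specific directory via the global default.
path3 <- tempfile()
setTileDBPath(path3)
stopifnot(identical(getTileDBPath(), path3))  # assumed getter
coerced <- as(X, "TileDBArray")

# Reopen from that directory to confirm the data were written there.
stopifnot(all.equal(as.matrix(TileDBArray(path3)), X))

# Unset the global so later coercions do not try to reuse path3.
setTileDBPath(NULL)
```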
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] TileDBArray_1.2.0 DelayedArray_0.18.0 IRanges_2.26.0
## [4] S4Vectors_0.30.0 MatrixGenerics_1.4.0 matrixStats_0.58.0
## [7] BiocGenerics_0.38.0 Matrix_1.3-3 BiocStyle_2.20.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 bslib_0.2.5.1 compiler_4.1.0
## [4] BiocManager_1.30.15 jquerylib_0.1.4 tools_4.1.0
## [7] digest_0.6.27 bit_4.0.4 jsonlite_1.7.2
## [10] evaluate_0.14 lattice_0.20-44 nanotime_0.3.2
## [13] rlang_0.4.11 RcppCCTZ_0.2.9 yaml_2.2.1
## [16] xfun_0.23 stringr_1.4.0 knitr_1.33
## [19] sass_0.4.0 bit64_4.0.5 grid_4.1.0
## [22] R6_2.5.0 rmarkdown_2.8 bookdown_0.22
## [25] tiledb_0.9.1 magrittr_2.0.1 htmltools_0.5.1.1
## [28] stringi_1.6.2 zoo_1.8-9