h5mread {HDF5Array} | R Documentation |
An alternative to rhdf5::h5read
h5mread is the result of experimenting with alternative rhdf5::h5read implementations.
It should still be considered experimental!
h5mread(filepath, name, starts, counts=NULL, noreduce=FALSE,
        as.integer=FALSE, method=0L)

get_h5mread_returned_type(filepath, name, as.integer=FALSE)
filepath |
The path (as a single character string) to the HDF5 file where the dataset to read from is located. |
name |
The name of the dataset in the HDF5 file. |
starts, counts |
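2 lists specifying the array selection. The 2 lists must have one list element per dimension in the dataset. As the examples below illustrate, each list element in starts is either NULL (full selection along that dimension), integer(0) (empty selection along that dimension), or a vector of positive indices along that dimension. Each list element in counts is either NULL or a vector parallel to the corresponding list element in starts, giving the number of consecutive positions to select from each start. If counts is NULL, each index in starts selects a single position along the corresponding dimension. |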
noreduce |
TODO |
as.integer |
TODO |
method |
TODO |
COMING SOON...
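The way starts and counts combine along a single dimension can be sketched in base R. This is only an illustration of the semantics inferred from the examples on this page (for instance, starts c(2, 7) with counts c(4, 2) selecting rows 2:5 and 7:8). The helper expand_selection is hypothetical and not part of HDF5Array, and it does not reflect how h5mread is implemented.

```r
## Hypothetical helper (NOT part of HDF5Array): expand the 'starts' and
## 'counts' given for one dimension into the explicit indices they select.
expand_selection <- function(starts, counts = NULL, dim_len) {
    if (is.null(starts))
        return(seq_len(dim_len))       # NULL start: full selection
    if (is.null(counts))
        return(as.integer(starts))     # NULL count: one position per start
    stopifnot(length(starts) == length(counts))
    ## Each start s with count n selects positions s, s+1, ..., s+n-1.
    as.integer(unlist(Map(function(s, n) s + seq_len(n) - 1L,
                          starts, counts)))
}

m1 <- matrix(1:60, ncol = 6)
i <- expand_selection(c(2, 7), c(4, 2), nrow(m1))  # rows 2:5 and 7:8
j <- expand_selection(NULL, NULL, ncol(m1))        # all columns
stopifnot(identical(m1[i, j], m1[c(2:5, 7:8), ]))
```

Under these assumptions, m1[i, j] is the in-memory equivalent of a call such as h5mread(path(M1), "M1", starts=list(c(2, 7), NULL), counts=list(c(4, 2), NULL)).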
For h5mread: an array.
For get_h5mread_returned_type: the type of the array that h5mread will return. Equivalent to:
typeof(h5mread(filepath, name, rep(list(integer(0)), ndim)))
where ndim is the number of dimensions (a.k.a. the rank in HDF5 jargon) of the dataset. get_h5mread_returned_type is provided for convenience.
h5read in the rhdf5 package.
type in the DelayedArray package.
extract_array in the DelayedArray package.
The TENxBrainData dataset (in the TENxBrainData package).
library(HDF5Array)
library(rhdf5)  # provides h5ls() and h5read()

## ---------------------------------------------------------------------
## BASIC USAGE
## ---------------------------------------------------------------------
m0 <- matrix((runif(600) - 0.5) * 10, ncol=12)
M0 <- writeHDF5Array(m0, name="M0")

## NULL selects everything along the corresponding dimension:
m <- h5mread(path(M0), "M0", starts=list(NULL, c(3, 12:8)))
stopifnot(identical(m0[ , c(3, 12:8)], m))

## integer(0) selects nothing along the corresponding dimension:
m <- h5mread(path(M0), "M0", starts=list(integer(0), c(3, 12:8)))
stopifnot(identical(m0[NULL , c(3, 12:8)], m))

## as.integer=TRUE returns the data as integers:
m <- h5mread(path(M0), "M0", starts=list(1:5, NULL), as.integer=TRUE)
storage.mode(m0) <- "integer"
stopifnot(identical(m0[1:5, ], m))

m1 <- matrix(1:60, ncol=6)
M1 <- writeHDF5Array(m1, filepath=path(M0), name="M1")
h5ls(path(M1))

## With counts, each start selects a run of consecutive positions:
m <- h5mread(path(M1), "M1", starts=list(c(2, 7), NULL),
                             counts=list(c(4, 2), NULL))
stopifnot(identical(m1[c(2:5, 7:8), ], m))

## ---------------------------------------------------------------------
## PERFORMANCE
## ---------------------------------------------------------------------
library(ExperimentHub)
hub <- ExperimentHub()

## With the "sparse" TENxBrainData dataset
## ---------------------------------------
fname0 <- hub[["EH1039"]]
h5ls(fname0)  # all datasets are 1D datasets

index <- list(77 * sample(34088679, 5000, replace=TRUE))
## h5mread() about 3x faster than h5read():
system.time(a <- h5mread(fname0, "mm10/data", index))
system.time(b <- h5read(fname0, "mm10/data", index=index))
stopifnot(identical(a, b))

index <- list(sample(1306127, 7500, replace=TRUE))
## h5mread() about 20x faster than h5read():
system.time(a <- h5mread(fname0, "mm10/barcodes", index))
system.time(b <- h5read(fname0, "mm10/barcodes", index=index))
stopifnot(identical(a, b))

## With the "dense" TENxBrainData dataset
## ---------------------------------------
fname1 <- hub[["EH1040"]]
h5ls(fname1)  # "counts" is a 2D dataset

index <- list(sample(  27998, 250, replace=TRUE),
              sample(1306127, 250, replace=TRUE))
## h5mread() about 2x faster than h5read():
system.time(a <- h5mread(fname1, "counts", index))
system.time(b <- h5read(fname1, "counts", index=index))
stopifnot(identical(a, b))

## The bigger the selection, the greater the speedup between
## h5read() and h5mread():
## Not run: 
index <- list(sample(  27998, 1000, replace=TRUE),
              sample(1306127, 1000, replace=TRUE))
## h5mread() about 8x faster than h5read() (22s vs 3min):
system.time(a <- h5mread(fname1, "counts", index))
system.time(b <- h5read(fname1, "counts", index=index))
stopifnot(identical(a, b))
## End(Not run)