write10xCounts {DropletUtils} | R Documentation |
Create a directory containing the count matrix and cell/gene annotation from a sparse matrix of UMI counts, in the format produced by the CellRanger software suite.
write10xCounts(path, x, barcodes=colnames(x), gene.id=rownames(x), gene.symbol=gene.id, gene.type="Gene Expression", overwrite=FALSE, type=c("auto", "sparse", "HDF5"), genome="unknown", version=c("2", "3"))
x |
A sparse numeric matrix of UMI counts. |
path |
A string containing the path to the output directory. |
barcodes |
A character vector of cell barcodes, one per column of |
gene.id |
A character vector of gene identifiers, one per row of |
gene.symbol |
A character vector of gene symbols, one per row of |
gene.type |
A character vector of gene types, expanded to one per row of |
overwrite |
A logical scalar specifying whether |
type |
String specifying the type of 10X format to save |
genome |
String specifying the genome for storage when |
version |
String specifying the version of the CellRanger format to produce. |
This function will try to automatically detect the desired format based on whether path
ends with ".h5"
.
If so, it assumes that path
specifies a HDF5 file path and sets type="HDF5"
.
Otherwise it will set type="sparse"
under the assumption that path
specifies a path to a directory.
Note that there were major changes in the output format for CellRanger version 3.0, to account for non-gene features such as antibody or CRISPR tags.
Users can switch to this new format using version="3"
.
See the documentation for “latest” for this new format, otherwise see “2.2” or earlier.
For type="sparse"
, a directory is produced at path
.
If version="2"
, this will contain the files "matrix.mtx"
, "barcodes.tsv"
and "genes.tsv"
.
If version="3"
, it will instead contain "matrix.mtx.gz"
, "barcodes.tsv.gz"
and "features.tsv.gz"
.
For type="HDF5"
, a HDF5 file is produced at path
containing data in column-sparse format.
If version="2"
, data are stored in the HDF5 group named genome
.
If version="3"
, data are stored in the group "matrix"
.
A TRUE
value is invisibly returned.
Aaron Lun
10X Genomics (2017). Gene-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/output/matrices
10X Genomics (2018). Feature-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices
10X Genomics (2018). HDF5 Gene-Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/advanced/h5_matrices
10X Genomics (2018). HDF5 Feature Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices
# Mocking up some count data. library(Matrix) my.counts <- matrix(rpois(1000, lambda=5), ncol=10, nrow=100) my.counts <- as(my.counts, "dgCMatrix") cell.ids <- paste0("BARCODE-", seq_len(ncol(my.counts))) ngenes <- nrow(my.counts) gene.ids <- paste0("ENSG0000", seq_len(ngenes)) gene.symb <- paste0("GENE", seq_len(ngenes)) # Writing this to file: tmpdir <- tempfile() write10xCounts(tmpdir, my.counts, gene.id=gene.ids, gene.symbol=gene.symb, barcodes=cell.ids) list.files(tmpdir)