filter_transcripts {BANDITS} | R Documentation |
filter_transcripts
Filters transcripts, before the data are loaded, according to their estimated transcript-level counts.
The function returns a vector with the transcripts that satisfy all of the filtering criteria (min_transcript_proportion, min_transcript_counts and min_gene_counts), evaluated across all samples.
filter_transcripts(gene_to_transcript, transcript_counts, min_transcript_proportion = 0.01, min_transcript_counts = 1, min_gene_counts = 10)
gene_to_transcript | a matrix or data.frame listing the gene-to-transcript correspondences: the first column contains the gene IDs, the second column the transcript IDs. |
transcript_counts | a matrix or data.frame with one column per sample and one row per transcript, containing the estimated abundance of each transcript in each sample. |
min_transcript_proportion | the minimum relative abundance (i.e., proportion) of a transcript within its gene. |
min_transcript_counts | the minimum overall abundance of a transcript (summing counts across all samples). |
min_gene_counts | the minimum overall abundance of a gene (summing counts across all samples). |
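The three thresholds can be illustrated with a minimal base-R sketch. This is not the BANDITS source code: the function name filter_transcripts_sketch and the toy data are hypothetical, and the sketch assumes that (i) counts are summed across samples before any threshold is applied, (ii) a transcript's proportion is its summed count divided by the summed count of its gene, (iii) all thresholds are inclusive (>=), and (iv) the row names of transcript_counts are the transcript IDs.

```r
# Hypothetical illustration of the filtering criteria; NOT the BANDITS code.
# Assumptions: counts summed over samples, proportion = transcript total /
# gene total, inclusive (>=) thresholds, rownames(transcript_counts) = IDs.
filter_transcripts_sketch <- function(gene_to_transcript, transcript_counts,
                                      min_transcript_proportion = 0.01,
                                      min_transcript_counts = 1,
                                      min_gene_counts = 10) {
  gene_ids <- as.character(gene_to_transcript[, 1])
  tr_ids   <- as.character(gene_to_transcript[, 2])

  # Overall abundance of each transcript: sum over samples (columns).
  tr_tot <- rowSums(as.matrix(transcript_counts))

  # Keep only transcripts present both in the mapping and in the counts.
  in_both  <- tr_ids %in% names(tr_tot)
  gene_ids <- gene_ids[in_both]
  tr_ids   <- tr_ids[in_both]
  tr_tot   <- tr_tot[tr_ids]

  # Overall abundance of each transcript's gene, aligned with tr_tot.
  gene_tot <- ave(tr_tot, gene_ids, FUN = sum)

  # Relative abundance within the gene (0/0 gives NaN, treated as "drop").
  tr_prop <- tr_tot / gene_tot

  keep <- tr_tot >= min_transcript_counts &
          gene_tot >= min_gene_counts &
          tr_prop >= min_transcript_proportion
  keep[is.na(keep)] <- FALSE
  tr_ids[keep]
}

# Toy data (hypothetical): gene gA has t1-t3, gene gB has t4-t5.
g2t <- data.frame(gene = c("gA", "gA", "gA", "gB", "gB"),
                  transcript = c("t1", "t2", "t3", "t4", "t5"))
counts <- matrix(c(50, 40,
                   10,  5,
                    0,  0,
                    3,  2,
                    1,  0),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("t1", "t2", "t3", "t4", "t5"), c("s1", "s2")))

filter_transcripts_sketch(g2t, counts)
# [1] "t1" "t2"
```

With the default thresholds, t3 fails min_transcript_counts (total 0), while t4 and t5 are dropped because their gene gB only totals 6 counts, below min_gene_counts = 10. Raising min_transcript_proportion to 0.2 would also drop t2, whose share of gA is 15/105, about 0.14.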
Transcript pre-filtering is highly recommended: it both improves the performance of the method and reduces its computational cost.
A vector containing the transcripts that satisfy the filtering criteria.
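As a sketch of how the returned vector might be used outside BANDITS, assuming it holds transcript IDs that match the row names of transcript_counts, it can subset a count matrix directly. The toy matrix and the value of transcripts_to_keep below are hypothetical stand-ins, not output of the package.

```r
# Hypothetical toy counts: rows are transcripts (row names = transcript IDs),
# columns are samples.
counts <- matrix(c(50, 40,
                   10,  5,
                    0,  0),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("t1", "t2", "t3"), c("s1", "s2")))

# Stand-in for the vector returned by filter_transcripts() (hypothetical value).
transcripts_to_keep <- c("t1", "t2")

# Keep only the selected rows; drop = FALSE keeps a matrix even if one row remains.
filtered_counts <- counts[rownames(counts) %in% transcripts_to_keep, , drop = FALSE]
filtered_counts
```

Using %in% rather than counts[transcripts_to_keep, ] avoids a subscript-out-of-bounds error if the vector contains IDs absent from the matrix.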
Simone Tiberi simone.tiberi@uzh.ch
filter_genes, create_data, BANDITS_data
# load the BANDITS package:
library(BANDITS)
# specify the directory of the internal data:
data_dir = system.file("extdata", package = "BANDITS")
# load gene_to_transcript matching:
data("gene_tr_id", package = "BANDITS")
# load the transcript-level estimated counts via tximport:
library(tximport)
quant_files = file.path(data_dir, "STAR-salmon", paste0("sample", seq_len(4)), "quant.sf")
txi = tximport(files = quant_files, type = "salmon", txOut = TRUE)
counts = txi$counts
# transcript pre-filtering:
transcripts_to_keep = filter_transcripts(gene_to_transcript = gene_tr_id,
                                         transcript_counts = counts,
                                         min_transcript_proportion = 0.01,
                                         min_transcript_counts = 10,
                                         min_gene_counts = 20)
head(transcripts_to_keep)