extract_UTR3Anno {InPAS} | R Documentation |
Extract 3' UTR information from a GenomicFeatures::TxDb object. The 3' UTR is defined as the last 3' UTR fragment of each transcript, and it is trimmed if it overlaps any other exon.
extract_UTR3Anno( TxDb = NULL, edb = NULL, removeScaffolds = FALSE, MAX_EXONS_GAP = 10000 )
TxDb |
An object of GenomicFeatures::TxDb |
edb |
An object of ensembldb::EnsDb |
removeScaffolds |
A logical(1) vector, whether the scaffolds should be removed from the genome. If you use a TxDb containing alternative scaffolds, it is better to remove the scaffolds. |
MAX_EXONS_GAP |
An integer(1) vector, the maximal gap size between the last known CP site and the downstream exons
A good practice is to align reads to a reference genome from Ensembl/GENCODE that includes only the primary assembly, and to build the TxDb from GTF/GFF files downloaded from the same source as the reference genome, such as BioMart/Ensembl/GENCODE. For instructions, see the GenomicFeatures vignette. The UCSC reference genomes and their annotations can be very cumbersome to work with.
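As a minimal sketch of the recommendation above, a TxDb can be built from a GTF file with makeTxDbFromGFF(). The file path and the helper name build_txdb are hypothetical and only illustrate the idea. The sketch assumes that makeTxDbFromGFF() is available from either txdbmaker (recent Bioconductor releases) or GenomicFeatures (older releases), depending on what is installed.

```r
# Hypothetical path: a GTF downloaded from the same source as the reference genome
gtf_file <- "path/to/annotation.gtf"

# Hypothetical helper: pick whichever package provides makeTxDbFromGFF()
build_txdb <- function(gtf_file) {
  maker <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
    txdbmaker::makeTxDbFromGFF        # recent Bioconductor releases
  } else {
    GenomicFeatures::makeTxDbFromGFF  # older Bioconductor releases
  }
  maker(gtf_file, format = "gtf")
}

# Only build the TxDb when the file actually exists
if (file.exists(gtf_file)) {
  txdb <- build_txdb(gtf_file)
}
```

The resulting object could then be supplied as the TxDb argument, together with an EnsDb for the same genome build.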
An object of GenomicRanges::GRangesList, containing GRanges for extracted 3' UTRs, and the corresponding last CDSs and next.exon.gap for each chromosome/scaffold.
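Because the return value is a GRangesList with one element per chromosome/scaffold, generic list accessors are enough for a first look at it. The sketch below uses a toy GRangesList as a stand-in for the real output. The helper summarize_by_seqname and the toy coordinates are made up for illustration and are not part of InPAS.

```r
library(GenomicRanges)

# Toy stand-in for the value returned by extract_UTR3Anno() (made-up ranges)
toy <- GRangesList(
  chr1 = GRanges("chr1", IRanges(c(100, 500), width = c(50, 80)), strand = "+"),
  chr2 = GRanges("chr2", IRanges(200, width = 30), strand = "-")
)

# Hypothetical helper: number of ranges and total width per list element
summarize_by_seqname <- function(grl) {
  data.frame(
    seqname     = names(grl),
    n_ranges    = unname(lengths(grl)),
    total_width = unname(sum(width(grl))),
    stringsAsFactors = FALSE
  )
}

res <- summarize_by_seqname(toy)
print(res)
```

The same helper should work on the real result, since it relies only on names(), lengths() and width(), which are defined for any GRangesList.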
Jianhong Ou, Haibo Liu
library("InPAS")
library("EnsDb.Hsapiens.v86")
library("GenomicFeatures")

# Load the sample TxDb shipped with GenomicFeatures
samplefile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                          package = "GenomicFeatures")
TxDb <- loadDb(samplefile)
edb <- EnsDb.Hsapiens.v86

# Extract the 3' UTR annotation, dropping scaffolds
utr3 <- extract_UTR3Anno(TxDb, edb,
                         removeScaffolds = TRUE,
                         MAX_EXONS_GAP = 10000)