computeFeaturesCage {ORFik} | R Documentation |
If you have a txdb with correctly reassigned transcripts, use: [computeFeatures()]
computeFeaturesCage(grl, RFP, RNA = NULL, Gtf = NULL, tx = NULL, fiveUTRs = NULL, cds = NULL, threeUTRs = NULL, faFile = NULL, riboStart = 26, riboStop = 34, orfFeatures = TRUE, includeNonVarying = TRUE, grl.is.sorted = FALSE)
grl |
a GRangesList |
RFP |
RiboSeq reads as a GAlignments, GRanges or GRangesList object |
RNA |
RnaSeq reads as a GAlignments, GRanges or GRangesList object |
Gtf |
a TxDb object of a gtf file, or a path to a gtf, gff, .sqlite etc. |
tx |
a GRangesList of transcripts, normally called from: exonsBy(Gtf, by = "tx", use.names = TRUE). Only add this if you are not including the Gtf file. You do not need to reassign these to the CAGE peaks; the function does it for you. |
fiveUTRs |
fiveUTRs as a GRangesList. If you used CAGE data to extend the 5' UTRs, remember to input the CAGE-assigned version and not the original! |
cds |
a GRangesList of coding sequences |
threeUTRs |
a GRangesList of transcript 3' UTRs, normally called from: threeUTRsByTranscript(Gtf, use.names = TRUE) |
faFile |
a FaFile or BSgenome from the fasta file, see ?FaFile |
riboStart |
usually 26, the start of the floss interval, see ?floss |
riboStop |
usually 34, the end of the floss interval |
orfFeatures |
a logical, is the grl a list of orfs? |
includeNonVarying |
a logical, if TRUE, include all features not dependent on RiboSeq data and RNASeq data, that is: Kozak, fractionLengths, distORFCDS, isInFrame, isOverlapping and rankInTx |
grl.is.sorted |
logical (default FALSE). If you know that the argument grl is sorted, set this to TRUE for a speed-up. |
A specialized version for when you don't have a correct txdb, for example when leaders have been reassigned with CAGE but the txdb has not been updated. It is 2x faster for tested data. The point of this function is to let you input transcripts etc. directly into the function, instead of loading them from a txdb. Each feature has a link to an article describing it, try ?floss
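The direct-input route described above can be sketched roughly as follows. This is only an illustration, not part of ORFik: `featuresFromParts` is a hypothetical helper name, `fiveUTRs` is assumed to be a CAGE-reassigned GRangesList you already have, and the remaining annotation parts are assumed to come from an existing TxDb via the GenomicFeatures extractors.

```r
# Hypothetical helper (not an ORFik function): collect the annotation parts
# once and pass them to computeFeaturesCage() instead of the Gtf argument.
# Assumes ORFik and GenomicFeatures are installed when the helper is called.
featuresFromParts <- function(orfs, RFP, RNA, txdb, fiveUTRs, faFile) {
  # Transcripts, CDS and 3' UTRs are taken unchanged from the TxDb
  tx        <- GenomicFeatures::exonsBy(txdb, by = "tx", use.names = TRUE)
  cds       <- GenomicFeatures::cdsBy(txdb, by = "tx", use.names = TRUE)
  threeUTRs <- GenomicFeatures::threeUTRsByTranscript(txdb, use.names = TRUE)

  # fiveUTRs is supplied by the caller (the CAGE-reassigned version),
  # so Gtf is left at its default of NULL
  ORFik::computeFeaturesCage(grl = orfs, RFP = RFP, RNA = RNA,
                             tx = tx, fiveUTRs = fiveUTRs, cds = cds,
                             threeUTRs = threeUTRs, faFile = faFile,
                             orfFeatures = TRUE)
}
```

Keeping the 5' UTRs as an explicit argument makes it harder to pass the original, non-reassigned leaders by accident.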
a data.table with scores. Each column is one score type, and the column names are the names of the scores, e.g. [floss()] or [fpkm()]
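Since each column holds one score type, a single score can be pulled out of the result by name. The following is a small sketch under assumptions: `scoreColumn` is a hypothetical helper, and `toy` is a made-up stand-in for the returned table (a data.table is also a data.frame, so the same access works on the real result). The column names and values in `toy` are illustrative only.

```r
# Hypothetical helper: return one score column by name,
# failing early if the requested score is not in the table.
scoreColumn <- function(features, score) {
  stopifnot(score %in% names(features))
  features[[score]]
}

# Made-up stand-in for the output of computeFeaturesCage()
toy <- data.frame(floss = c(0.1, 0.2), fpkm = c(1, 2))

names(toy)                 # which score types are present
scoreColumn(toy, "floss")  # values of one score type, one per element of grl
```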
Other features: computeFeatures, disengagementScore, distToCds, distToTSS, entropy, floss, fpkm_calc, fpkm, fractionLength, initiationScore, insideOutsideORF, isInFrame, isOverlapping, kozakSequenceScore, orfScore, rankOrder, ribosomeReleaseScore, ribosomeStallingScore, startRegionCoverage, startRegion, subsetCoverage, translationalEff
# a small example without cage-seq data:
# we will find ORFs in the 5' utrs
# and then calculate features on them
## Not run:
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {
  library(ORFik)
  library(GenomicFeatures)
  # Get the gtf txdb file
  txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                          package = "GenomicFeatures")
  txdb <- loadDb(txdbFile)

  # Extract sequences of fiveUTRs.
  fiveUTRs <- fiveUTRsByTranscript(txdb, use.names = TRUE)[1:10]
  faFile <- BSgenome.Hsapiens.UCSC.hg19::Hsapiens
  # need to suppress warning because of bug in GenomicFeatures, will
  # be fixed soon.
  tx_seqs <- suppressWarnings(extractTranscriptSeqs(faFile, fiveUTRs))

  # Find all ORFs on those transcripts and get their genomic coordinates
  fiveUTR_ORFs <- findMapORFs(fiveUTRs, tx_seqs)
  unlistedORFs <- unlistGrl(fiveUTR_ORFs)
  # group GRanges by ORFs instead of Transcripts
  fiveUTR_ORFs <- groupGRangesBy(unlistedORFs, unlistedORFs$names)

  # make some toy ribo seq and rna seq data
  starts <- unlistGrl(ORFik:::firstExonPerGroup(fiveUTR_ORFs))
  RFP <- promoters(starts, upstream = 0, downstream = 1)
  score(RFP) <- rep(29, length(RFP)) # the original read widths

  # set RNA seq to duplicate transcripts
  RNA <- unlistGrl(exonsBy(txdb, by = "tx", use.names = TRUE))

  computeFeaturesCage(grl = fiveUTR_ORFs, orfFeatures = TRUE, RFP = RFP,
                      RNA = RNA, Gtf = txdb, faFile = faFile)
}
# See vignettes for more examples
## End(Not run)