topMotif {ORFik}    R Documentation
Per leader, detect whether the leader has a TOP motif at the TSS (the 5' end of the leader). A TOP motif is defined as a C followed by 4 pyrimidines.
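The classification rule can be illustrated with a small base-R sketch. `classify_top` is a hypothetical helper, not part of ORFik, and the sketch assumes pyrimidines are C and T (U is also accepted in case RNA strings are passed). It labels each sequence TOP, C or OTHER as described above.

```r
# Hypothetical helper, not ORFik's implementation.
# "TOP"   : starts with C followed by 4 pyrimidines (C/T, U accepted for RNA)
# "C"     : starts with C but is not TOP
# "OTHER" : neither of the above
classify_top <- function(seqs) {
  seqs <- toupper(as.character(seqs))
  is_top <- grepl("^C[CTU]{4}", seqs)
  starts_c <- startsWith(seqs, "C")
  ifelse(is_top, "TOP", ifelse(starts_c, "C", "OTHER"))
}

classify_top(c("CTTCCAG", "CAGTT", "GCTTC"))
```

Here "CTTCCAG" is TOP, "CAGTT" is C (the base after the first C is a purine) and "GCTTC" is OTHER.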
topMotif(seqs, start = 1, stop = max(nchar(seqs)), return.sequence = TRUE)
seqs
the sequences (character vector or DNAStringSet) of the start region of 5' UTRs (leaders). seqs must have a minimum width of stop - start + 1 to be included.
start
position in seqs to start at (first is 1), default 1.
stop
position in seqs to stop at (first is 1), default max(nchar(seqs)), that is, the length of the longest sequence.
return.sequence
logical, default TRUE. If TRUE, return a data.table with one column per motif position in addition to the TOP class. If FALSE, return a character vector of classes.
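The effect of start and stop can be sketched as a filter followed by a window cut. `extract_window` is hypothetical and not ORFik's code. It assumes the window is positions start..stop of each sequence, so it keeps only sequences of at least stop bases; with the default start = 1 this is the same as the documented minimum width of stop - start + 1, and every kept window then has that full width.

```r
# Hypothetical sketch, not ORFik's implementation.
# Keep sequences long enough to contain positions start..stop,
# then return that window (names are preserved).
extract_window <- function(seqs, start = 1, stop = max(nchar(seqs))) {
  nms <- names(seqs)
  seqs <- as.character(seqs)
  names(seqs) <- nms
  stopifnot(start >= 1, stop >= start)
  keep <- nchar(seqs) >= stop   # equals stop - start + 1 when start = 1
  substr(seqs[keep], start, stop)
}

extract_window(c(tx1 = "CTTCCAG", tx2 = "CTT", tx3 = "GCTTCA"), 1, 5)
```

In the call above, tx2 is dropped because it is shorter than the 5-base window.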
If return.sequence = FALSE, a character vector with one of TOP, C or OTHER per sequence: C means the leader starts with C but is not TOP, and OTHER means the leader is not TOP and does not start with C. If return.sequence = TRUE (the default), a data.table is returned in which the base at each motif position is included as an additional column (seq1, seq2, etc.), together with an id column called X.gene_id holding the names of seqs.
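The shape of the return.sequence = TRUE result can be approximated in base R. `top_table` is hypothetical: it builds a data.frame rather than the data.table ORFik returns, takes an already computed class vector, and the name of the class column (`TOP`) is an assumption. It shows the layout of one row per sequence, an X.gene_id column and one column per motif position.

```r
# Hypothetical sketch of the output layout, not ORFik's implementation.
# seqs : named character vector of equal-width motif regions
# class: character vector of "TOP"/"C"/"OTHER", one per sequence
top_table <- function(seqs, class) {
  nms <- names(seqs)
  seqs <- as.character(seqs)
  if (is.null(nms)) nms <- as.character(seq_along(seqs))
  width <- min(nchar(seqs))
  # One row per sequence, one column per position (seq1, seq2, ...)
  bases <- do.call(rbind, lapply(unname(strsplit(seqs, "")), `[`, seq_len(width)))
  colnames(bases) <- paste0("seq", seq_len(width))
  data.frame(X.gene_id = nms, bases, TOP = class, stringsAsFactors = FALSE)
}

top_table(c(tx1 = "CTTCC", tx2 = "GATTC"), c("TOP", "OTHER"))
```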
## Not run:
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {
  txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                          package = "GenomicFeatures")
  # Load the leaders (5' UTRs)
  leaders <- loadRegion(txdbFile, "leaders")
  # Should update the TSS by CAGE if not already done
  cageData <- system.file("extdata", "cage-seq-heart.bed.bgz",
                          package = "ORFik")
  leadersCage <- reassignTSSbyCage(leaders, cageData)
  # Get the region to check: 0 bases upstream and 4 downstream of the TSS
  seqs <- startRegionString(leadersCage, NULL,
                            BSgenome.Hsapiens.UCSC.hg19::Hsapiens, 0, 4)
  topMotif(seqs)
}
## End(Not run)