runMatrixSPMA {transite} | R Documentation |
SPMA helps to illuminate the relationship between RBP binding evidence and the transcript sorting criterion, e.g., fold change between treatment and control samples.
runMatrixSPMA(background.set, motifs = NULL, n.bins = 40, max.model.degree = 1, max.cs.permutations = 1e+07, min.cs.permutations = 5000, max.hits = 5, threshold.method = "p.value", threshold.value = 0.25^6, max.fg.permutations = 1e+06, min.fg.permutations = 1000, e = 5, p.adjust.method = "BH", n.cores = 1, cache = paste0(tempdir(), "/sc/"))
background.set |
named character vector of ranked sequences
(only containing upper case characters A, C, G, T), where the
names are RefSeq identifiers
and sequence
type qualifiers ( |
motifs |
a list of motifs that is used to score the specified sequences.
If |
n.bins |
specifies the number of bins in which the sequences will be divided, valid values are between 7 and 100 |
max.model.degree |
maximum degree of polynomial |
max.cs.permutations |
maximum number of permutations performed in Monte Carlo test for consistency score |
min.cs.permutations |
minimum number of permutations performed in Monte Carlo test for consistency score |
max.hits |
maximum number of putative binding sites per mRNA that are counted |
threshold.method |
either |
threshold.value |
semantics of the |
max.fg.permutations |
maximum number of foreground permutations performed in Monte Carlo test for enrichment score |
min.fg.permutations |
minimum number of foreground permutations performed in Monte Carlo test for enrichment score |
e |
integer-valued stop criterion for enrichment score Monte Carlo test: aborting
permutation process after
observing |
p.adjust.method |
adjustment of p-values from Monte Carlo tests to
avoid alpha error
accumulation, see |
n.cores |
the number of cores that are used |
cache |
either logical or path to a directory where scores are cached.
The scores of each
motif are stored in a
separate file that contains a hash table with RefSeq identifiers and
sequence type
qualifiers as keys and the number of putative binding sites as values.
If |
In order to investigate how motif targets are distributed across a spectrum of transcripts (e.g., all transcripts of a platform, ordered by fold change), Spectrum Motif Analysis visualizes the gradient of RBP binding evidence across all transcripts.
The matrix-based approach skips the k-merization step of the k-mer-based approach and instead scores the transcript sequence as a whole with a position specific scoring matrix.
For each sequence in foreground and background sets and each sequence motif, the scoring algorithm evaluates the score for each sequence position. Positions with a relative score greater than a certain threshold are considered hits, i.e., putative binding sites.
By scoring all sequences in foreground and background sets, a hit count for each motif and each set is obtained, which is used to calculate enrichment values and associated p-values in the same way in which motif-compatible hexamer enrichment values are calculated in the k-mer-based approach. P-values are adjusted with one of the available adjustment methods.
An advantage of the matrix-based approach is the possibility of detecting clusters of binding sites. This can be done by counting regions with many hits using positional hit information or by simply applying a hit count threshold per sequence, e.g., only sequences with more than some number of hits are considered. Homotypic clusters of RBP binding sites may play a similar role as clusters of transcription factors.
A list with the following components:
foreground.scores | the result of scoreTranscripts
for the foreground
sets (the bins) |
background.scores | the result of scoreTranscripts
for the background
set |
enrichment.dfs | a list of data frames, returned by
calculateMotifEnrichment |
spectrum.info.df | a data frame with the SPMA results |
spectrum.plots | a list of spectrum plots, as generated by
scoreSpectrum |
classifier.scores | a list of classifier scores, as returned by
spectrumClassifier
|
Other SPMA functions: runKmerSPMA
,
scoreSpectrum
,
spectrumClassifier
,
subdivideData
Other matrix functions: calculateMotifEnrichment
,
runMatrixTSMA
,
scoreTranscriptsSingleMotif
,
scoreTranscripts
# example data set background.df <- transite:::ge$background # sort sequences by signal-to-noise ratio background.df <- dplyr::arrange(background.df, value) # character vector of named and ranked (by signal-to-noise ratio) sequences background.set <- gsub("T", "U", background.df$seq) names(background.set) <- paste0(background.df$refseq, "|", background.df$seq.type) results <- runMatrixSPMA(background.set, motifs = getMotifById("M178_0.6"), n.bins = 20, max.fg.permutations = 10000) ## Not run: results <- runMatrixSPMA(background.set) ## End(Not run)