runKmerSPMA {transite}    R Documentation

k-mer-based Spectrum Motif Analysis

Description

Spectrum Motif Analysis (SPMA) helps to illuminate the relationship between RNA-binding protein (RBP) binding evidence and the transcript sorting criterion, e.g., fold change between treatment and control samples.

Usage

runKmerSPMA(background.set, motifs = NULL, k = 6, n.bins = 40,
  max.model.degree = 1, max.cs.permutations = 1e+07,
  min.cs.permutations = 5000, fg.permutations = 5000,
  p.adjust.method = "BH", p.combining.method = "fisher", n.cores = 1)

Arguments

background.set

character vector of ranked sequences, either DNA (only containing upper case characters A, C, G, T) or RNA (A, C, G, U). The sequences in background.set must be ranked (i.e., sorted). Commonly used sorting criteria are measures of differential expression, such as fold change or signal-to-noise ratio (e.g., between treatment and control samples in gene expression profiling experiments).

motifs

a list of motifs that is used to score the specified sequences. If is.null(motifs) then all Transite motifs are used.

k

length of k-mer, either 6 for hexamers or 7 for heptamers

n.bins

specifies the number of bins into which the sequences will be divided; valid values are between 7 and 100

max.model.degree

maximum degree of polynomial

max.cs.permutations

maximum number of permutations performed in Monte Carlo test for consistency score

min.cs.permutations

minimum number of permutations performed in Monte Carlo test for consistency score

fg.permutations

number of foreground permutations

p.adjust.method

see p.adjust

p.combining.method

one of the following: Fisher (1932) ("fisher"), Stouffer (1949), Liptak (1958) ("SL"), Mudholkar and George (1979) ("MG"), and Tippett (1931) ("tippett") (see pCombine)

n.cores

number of computing cores to use
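
As a rough illustration of what p.combining.method = "fisher" and p.adjust.method = "BH" refer to, the following base R sketch combines a few made-up p-values with Fisher's method and adjusts them with p.adjust. The helper fisherCombine and the p-values are hypothetical and only illustrate the idea; this is not the package's pCombine.

```r
# hypothetical helper, not part of transite: Fisher's method for combining p-values
fisherCombine <- function(p) {
  statistic <- -2 * sum(log(p))
  # under the null hypothesis, the statistic follows a chi-squared
  # distribution with 2 * length(p) degrees of freedom
  stats::pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

# made-up p-values, for illustration only
p.values <- c(0.01, 0.04, 0.20)
combined.p <- fisherCombine(p.values)

# Benjamini-Hochberg adjustment, as selected by p.adjust.method = "BH"
adjusted.p <- stats::p.adjust(p.values, method = "BH")
```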

Details

In order to investigate how motif targets are distributed across a spectrum of transcripts (e.g., all transcripts of a platform, ordered by fold change), Spectrum Motif Analysis visualizes the gradient of RBP binding evidence across all transcripts.
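
The idea of dividing a ranked spectrum into bins can be sketched in base R as follows. The sequences are made up, and the equal-width split via cut is an assumption chosen for illustration; the package's own subdivideData function may bin differently.

```r
# made-up ranked sequences, for illustration only
ranked.sequences <- c("AUGC", "GGCA", "UUAG", "CCGU",
                      "AAUC", "GCGC", "UAUA", "CGAU")
# small value for readability; runKmerSPMA itself expects 7 to 100 bins
n.bins <- 4

# assign consecutive positions in the ranking to bins of (nearly) equal size
bin.index <- cut(seq_along(ranked.sequences), breaks = n.bins, labels = FALSE)
bins <- split(ranked.sequences, bin.index)
```

Each element of bins then holds a contiguous slice of the ranking, so a score computed per bin traces a gradient across the spectrum.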

The k-mer-based approach differs from the matrix-based approach in how the sequences are scored. Here, sequences are broken into k-mers, i.e., oligonucleotide sequences of k bases. Only the statistically significantly enriched or depleted k-mers are then used to calculate a score for each RNA-binding protein, which quantifies its target overrepresentation.
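
To make the notion of breaking a sequence into k-mers concrete, here is a minimal base R sketch. The helper splitIntoKmers is hypothetical, and the use of overlapping windows is an assumption; it does not reproduce the package's enrichment statistics.

```r
# hypothetical helper, not part of transite: all overlapping k-mers of a sequence
splitIntoKmers <- function(sequence, k) {
  n <- nchar(sequence)
  if (n < k) {
    return(character(0))
  }
  starts <- seq_len(n - k + 1)
  substring(sequence, starts, starts + k - 1)
}

# a made-up RNA sequence of 9 bases yields 4 overlapping hexamers
kmers <- splitIntoKmers("AUGGCUAAU", k = 6)

# k-mer counts of this kind are the raw material for enrichment tests
kmer.counts <- table(kmers)
```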

Value

A list with the following components:

foreground.scores    the result of runKmerTSMA for the binned data
spectrum.info.df     a data frame with the SPMA results
spectrum.plots       a list of spectrum plots, as generated by scoreSpectrum
classifier.scores    a list of classifier scores, as returned by spectrumClassifier

See Also

Other SPMA functions: runMatrixSPMA, scoreSpectrum, spectrumClassifier, subdivideData

Other k-mer functions: calculateKmerEnrichment, checkKmers, computeKmerEnrichment, drawVolcanoPlot, empiricalEnrichmentMeanCDF, generateKmers, generatePermutedEnrichments, homopolymerCorrection, permTestGeometricMean, runKmerTSMA

Examples

# example data set
background.df <- transite:::ge$background
# sort sequences by signal-to-noise ratio
background.df <- dplyr::arrange(background.df, value)
# character vector of named and ranked (by signal-to-noise ratio) sequences
background.set <- gsub("T", "U", background.df$seq)
names(background.set) <- paste0(background.df$refseq, "|",
  background.df$seq.type)

results <- runKmerSPMA(background.set,
                       motifs = getMotifById("M178_0.6"),
                       n.bins = 20,
                       fg.permutations = 10)

## Not run: 
results <- runKmerSPMA(background.set)
## End(Not run)


[Package transite version 1.2.1 Index]