runKmerTSMA {transite} | R Documentation |
Calculates the enrichment of putative binding sites in foreground sets versus a background set, using k-mers to identify putative binding sites.
runKmerTSMA(foreground.sets, background.set, motifs = NULL, k = 6, fg.permutations = 5000, kmer.significance.threshold = 0.01, produce.plot = TRUE, p.adjust.method = "BH", p.combining.method = "fisher", n.cores = 1)
foreground.sets |
list of foreground sets; a foreground set is a character vector of DNA or RNA sequences (not both) and a strict subset of the background.set |
background.set |
character vector of DNA or RNA sequences that constitute the background set |
motifs |
a list of motifs that is used to score the specified sequences. If motifs is NULL (the default), all Transite motifs are used. |
k |
length of k-mer, either 6 (hexamers, the default) or 7 (heptamers) |
fg.permutations |
number of foreground permutations |
kmer.significance.threshold |
p-value threshold for k-mer significance, e.g., 0.05 or 0.01 (default 0.01) |
produce.plot |
if TRUE (the default), volcano plots and permutation test plots are created |
p.adjust.method |
method used to adjust p-values for multiple testing; see p.adjust (default "BH", Benjamini-Hochberg) |
p.combining.method |
method used to combine p-values, e.g., Fisher (1932) ("fisher", the default) |
n.cores |
number of computing cores to use |
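The p.adjust.method and p.combining.method arguments name standard statistical procedures. As a generic base-R illustration (not transite's internal code), the sketch below applies Benjamini-Hochberg adjustment with stats::p.adjust and Fisher's (1932) method to a vector of made-up k-mer p-values; the helper fisher_combine is hypothetical.

```r
# Made-up p-values, standing in for per-k-mer enrichment tests (hypothetical data)
p_values <- c(0.001, 0.02, 0.04, 0.3, 0.8)

# Benjamini-Hochberg adjustment, the procedure selected by p.adjust.method = "BH"
p_adj <- p.adjust(p_values, method = "BH")

# Fisher's (1932) method, the procedure named by p.combining.method = "fisher":
# under the null, X = -2 * sum(log(p)) follows a chi-squared distribution
# with 2n degrees of freedom (hypothetical helper, not a transite function)
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}
p_combined <- fisher_combine(p_values)

print(p_adj)
print(p_combined)
```

Fisher's method assumes independent p-values; p-values of overlapping k-mers are correlated, which is one reason alternative combining methods can be preferable.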
Transcript set motif analysis can be used to identify RNA-binding proteins (RBPs) whose targets are significantly overrepresented or underrepresented in certain sets of transcripts.
The aim of Transcript Set Motif Analysis (TSMA) is to identify the overrepresentation and underrepresentation of potential RBP targets (binding sites) in a set (or sets) of sequences, i.e., the foreground set, relative to the entire population of sequences. The latter is called the background set, which can be composed of all sequences of the genes of a microarray platform, all sequences of an organism, or any other meaningful superset of the foreground sets.
The k-mer-based approach breaks the sequences of foreground and background sets into k-mers and calculates the enrichment on a k-mer level. In this case, motifs are not represented as position weight matrices, but as lists of k-mers.
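A minimal base-R sketch of k-mer-level enrichment, assuming overlapping k-mers, an enrichment ratio of foreground to background k-mer frequencies, and Fisher's exact test on the resulting 2x2 count table. The helpers count_kmers and kmer_enrichment are hypothetical; transite's own computeKmerEnrichment may use a different test and apply additional corrections.

```r
# Count overlapping k-mers across a character vector of sequences (hypothetical helper)
count_kmers <- function(sequences, k) {
  kmers <- unlist(lapply(sequences, function(s) {
    n <- nchar(s)
    if (n < k) return(character(0))
    substring(s, 1:(n - k + 1), k:n)  # all overlapping substrings of length k
  }))
  table(kmers)
}

# Per-k-mer enrichment of foreground vs. background (hypothetical helper)
kmer_enrichment <- function(foreground, background, k = 6) {
  fg <- count_kmers(foreground, k)
  bg <- count_kmers(background, k)
  kmers <- names(bg)  # foreground is a subset of background
  fg_counts <- setNames(numeric(length(kmers)), kmers)
  common <- intersect(names(fg), kmers)
  fg_counts[common] <- as.numeric(fg[common])
  bg_counts <- as.numeric(bg)
  fg_total <- sum(fg)
  bg_total <- sum(bg)
  # ratio of k-mer frequency in the foreground to that in the background
  enrichment <- (fg_counts / fg_total) / (bg_counts / bg_total)
  # Fisher's exact test on [k-mer, other k-mers] x [foreground, background]
  p_value <- mapply(function(a, b) {
    counts <- matrix(c(a, fg_total - a, b, bg_total - b), nrow = 2)
    fisher.test(counts)$p.value
  }, fg_counts, bg_counts)
  data.frame(kmer = kmers, enrichment = unname(enrichment),
             p.value = unname(p_value), stringsAsFactors = FALSE)
}

# usage with toy sequences and k = 3
foreground <- c("UUAUUUA", "UUUUUUU")
background <- c(foreground, "CCACACAC", "GUCAGUCC")
res <- kmer_enrichment(foreground, background, k = 3)
print(res[order(-res$enrichment), ])
```

Counting k-mers rather than whole sequences lets each binding-site candidate be tested independently of any motif model; motifs are then evaluated as collections of these k-mers.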
Statistically significantly enriched or depleted k-mers are then used to calculate a score for each RNA-binding protein, which quantifies its target overrepresentation.
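One plausible way to turn significant k-mers into a per-RBP score, loosely suggested by the names permTestGeometricMean and generatePermutedEnrichments: take the geometric mean of the enrichment values of a motif's significant k-mers and compare it with the scores of random k-mer sets of the same size. The helpers and toy values below are hypothetical, and transite's actual procedure (which permutes foreground sets, see fg.permutations) may differ.

```r
# Toy per-k-mer results (hypothetical values), e.g., from a k-mer enrichment test
kmer_df <- data.frame(
  kmer = c("AUUUA", "UUUAU", "UAUUU", "GGGGG", "CCCCC", "ACGUA", "GCAUG", "CAGUC"),
  enrichment = c(4, 3, 2, 0.5, 1, 1.1, 0.8, 0.9),
  p.value = c(0.001, 0.004, 0.2, 0.003, 0.9, 0.5, 0.6, 0.7),
  stringsAsFactors = FALSE
)
motif <- c("AUUUA", "UUUAU", "UAUUU")  # hypothetical motif given as a list of k-mers

geometric_mean <- function(x) exp(mean(log(x)))

# Score: geometric mean enrichment of the motif's significant k-mers (hypothetical helper)
motif_score <- function(kmer_df, motif_kmers, threshold = 0.01) {
  hits <- kmer_df[kmer_df$kmer %in% motif_kmers & kmer_df$p.value < threshold, ]
  if (nrow(hits) == 0) return(NA_real_)
  geometric_mean(hits$enrichment)
}

# Empirical p-value for overrepresentation against random k-mer sets of equal size
perm_test_score <- function(kmer_df, motif_kmers, threshold = 0.01, n_perm = 1000) {
  observed <- motif_score(kmer_df, motif_kmers, threshold)
  if (is.na(observed)) return(list(score = NA_real_, p.value = NA_real_))
  null <- replicate(n_perm, {
    random_kmers <- sample(kmer_df$kmer, length(motif_kmers))
    motif_score(kmer_df, random_kmers, threshold)
  })
  null <- null[!is.na(null)]  # random sets without significant k-mers get no score
  p <- (sum(null >= observed) + 1) / (length(null) + 1)
  list(score = observed, p.value = p)
}

set.seed(1)
result <- perm_test_score(kmer_df, motif, threshold = 0.01, n_perm = 1000)
print(result)
```

For depleted k-mers, the same idea applies with the lower tail of the null distribution.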
A list of lists with the following components:
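enrichment.df | k-mer enrichment results for each foreground set |
motif.df | motif (RBP) enrichment results for each foreground set |
motif.kmers.dfs | per-motif k-mer enrichment tables for each foreground set |
volcano.plots | volcano plots of k-mer enrichment for each foreground set (see drawVolcanoPlot) |
perm.test.plots | permutation test plots for each foreground set |
enriched.kmers.combined.p.values | combined p-values of enriched k-mers |
depleted.kmers.combined.p.values | combined p-values of depleted k-mers |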
Other TSMA functions: drawVolcanoPlot, runMatrixTSMA
Other k-mer functions: calculateKmerEnrichment, checkKmers, computeKmerEnrichment, drawVolcanoPlot, empiricalEnrichmentMeanCDF, generateKmers, generatePermutedEnrichments, homopolymerCorrection, permTestGeometricMean, runKmerSPMA
# define simple sequence sets for foreground and background
foreground.set1 <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU",
  "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU", "AUCAAAUUA",
  "AGAU", "GACACUUAAAGAUCCU", "UAGCAUUAACUUAAUG", "AUGGA",
  "GAAGAGUGCUCA", "AUAGAC", "AGUUC", "CCAGUAA"
)
foreground.set2 <- c("UUAUUUA", "AUCCUUUACA", "UUUUUUU", "UUUCAUCAUU")
foreground.sets <- list(foreground.set1, foreground.set2)
background.set <- unique(c(foreground.set1, foreground.set2, c(
  "CCACACAC", "CUCAUUGGAG", "ACUUUGGGACA", "CAGGUCAGCA",
  "CCACACCGG", "GUCAUCAGU", "GUCAGUCC", "CAGGUCAGGGGCA"
)))

# run k-mer based TSMA with all Transite motifs (recommended):
# results <- runKmerTSMA(foreground.sets, background.set)

# run TSMA with one motif:
motif.db <- getMotifById("M178_0.6")
results <- runKmerTSMA(foreground.sets, background.set, motifs = motif.db)

## Not run:
# define example sequence sets for foreground and background
foreground.set1 <- gsub("T", "U", transite:::ge$foreground1$seq)
foreground.set2 <- gsub("T", "U", transite:::ge$foreground2$seq)
foreground.sets <- list(foreground.set1, foreground.set2)
background.set <- gsub("T", "U", transite:::ge$background$seq)

# run TSMA with all Transite motifs
results <- runKmerTSMA(foreground.sets, background.set)

# run TSMA with a subset of Transite motifs
results <- runKmerTSMA(foreground.sets, background.set,
                       motifs = getMotifByRBP("ELAVL1"))

# run TSMA with a user-defined k-mer motif
toy.motif <- createKmerMotif(
  "toy.motif", "example RBP", c("AACCGG", "AAAACG", "AACACG"),
  "example type", "example species", "user"
)
results <- runKmerTSMA(foreground.sets, background.set,
                       motifs = list(toy.motif))
## End(Not run)