sbeaMethods {EnrichmentBrowser} | R Documentation |
This is the main function for the enrichment analysis of gene sets. It implements and wraps existing implementations of several frequently used methods and allows a flexible inspection of resulting gene set rankings.
sbeaMethods()
sbea(method = EnrichmentBrowser::sbeaMethods(), se, gs, alpha = 0.05, perm = 1000, padj.method = "none", out.file = NULL, browse = FALSE, ...)
method |
Set-based enrichment analysis method. Currently, the following set-based enrichment analysis methods are supported: ‘ora’, ‘safe’, ‘gsea’, ‘padog’, ‘roast’, ‘camera’, ‘gsa’, ‘gsva’, ‘globaltest’, ‘samgs’, ‘ebm’, and ‘mgsa’. For basic ora also set 'perm=0'. Default is ‘ora’. This can also be the name of a user-defined function implementing set-based enrichment. See Details. |
se |
Expression dataset. An object of class SummarizedExperiment.
Additional optional annotations:
|
gs |
Gene sets. Either a list of gene sets (character vectors of gene IDs) or a text file in GMT format storing all gene sets under investigation. |
alpha |
Statistical significance level. Defaults to 0.05. |
perm |
Number of permutations of the sample group assignments. Defaults to 1000. For basic ora set 'perm=0'. Using method="gsea" and 'perm=0' invokes the permutation approximation from the npGSEA package. |
padj.method |
Method for adjusting nominal gene set p-values for multiple testing. For available methods, see the man page of the stats function p.adjust. Defaults to "none", i.e. no adjustment. |
out.file |
Optional output file the gene set ranking will be written to. |
browse |
Logical. Should results be displayed in the browser for interactive exploration? Defaults to FALSE. |
... |
Additional arguments passed to individual sbea methods. This includes currently for ORA and MGSA:
|
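The 'gs' argument accepts a text file in GMT format. In that format each line describes one gene set: the set name, a free-text description, and then the gene IDs, all separated by tabs. The following is a minimal base R sketch of how such a file maps onto the list-of-character-vectors representation; readGmt is a hypothetical helper written for illustration and is not a function of the package.

```r
# Hypothetical helper (not part of EnrichmentBrowser): parse a GMT file
# into a named list of character vectors of gene IDs.
readGmt <- function(file)
{
    lines <- readLines(file)
    lines <- lines[nzchar(lines)]                 # drop empty lines
    fields <- strsplit(lines, "\t", fixed = TRUE)
    # fields 1 and 2 are set name and description, the rest are gene IDs
    gs <- lapply(fields, function(f) f[-(1:2)])
    names(gs) <- vapply(fields, `[`, character(1), 1)
    gs
}

# toy GMT file with two gene sets (made-up IDs)
gmt.file <- tempfile(fileext = ".gmt")
writeLines(c("setA\tfirst toy set\tg1\tg2\tg3",
             "setB\tsecond toy set\tg2\tg4"), gmt.file)

gs.toy <- readGmt(gmt.file)
```

The resulting list has one element per line of the file, named by the first field, which is the shape described for 'gs' above.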
'ora': overrepresentation analysis, simple and frequently used test based on the hypergeometric distribution (see Goeman and Buhlmann, 2007, for a critical review).
'safe': significance analysis of function and expression, generalization of ORA, includes other test statistics, e.g. Wilcoxon's rank sum, and allows estimating the significance of gene sets by sample permutation; implemented in the safe package (Barry et al., 2005).
'gsea': gene set enrichment analysis, frequently used and widely accepted, uses a Kolmogorov-Smirnov statistic to test whether the ranks of the p-values of genes in a gene set resemble a uniform distribution (Subramanian et al., 2005).
'padog': pathway analysis with down-weighting of overlapping genes, incorporates gene weights to favor genes appearing in few pathways versus genes that appear in many pathways; implemented in the PADOG package.
'roast': rotation gene set test, uses rotation instead of permutation for assessment of gene set significance; implemented in the limma and edgeR packages for microarray and RNA-seq data, respectively.
'camera': correlation adjusted mean rank gene set test, accounts for inter-gene correlations as implemented in the limma and edgeR packages for microarray and RNA-seq data, respectively.
'gsa': gene set analysis, differs from GSEA by using the maxmean statistic, i.e. the mean of the positive or negative part of gene scores in the gene set; implemented in the GSA package.
'gsva': gene set variation analysis, transforms the data from a gene by sample matrix to a gene set by sample matrix, thereby allowing the evaluation of gene set enrichment for each sample; implemented in the GSVA package.
'globaltest': global testing of groups of genes, general test of groups of genes for association with a response variable; implemented in the globaltest package.
'samgs': significance analysis of microarrays on gene sets, extends the SAM method for single genes to gene set analysis (Dinu et al., 2007).
'ebm': empirical Brown's method, combines p-values of genes in a gene set using Brown's method to combine p-values from dependent tests; implemented in the EmpiricalBrownsMethod package.
'mgsa': model-based gene set analysis, Bayesian modeling approach taking set overlap into account by working on all sets simultaneously, thereby reducing the number of redundant sets; implemented in the mgsa package.
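To make the 'ora' entry in the list above more concrete, here is a minimal base R sketch of an overrepresentation test based on the hypergeometric distribution, followed by a multiple testing correction with p.adjust. It is an illustration under stated assumptions, not the package's implementation: oraP is a hypothetical helper, the gene universe and the set of significant genes are made up, and "BH" is just one of the methods p.adjust offers.

```r
# Hypothetical helper: ORA p-value for one gene set.
# set, sig.genes and universe are character vectors of gene IDs.
oraP <- function(set, sig.genes, universe)
{
    set <- intersect(set, universe)
    sig.genes <- intersect(sig.genes, universe)
    k <- length(intersect(set, sig.genes))   # significant genes in the set
    # P(X >= k) when drawing length(set) genes from the universe
    phyper(k - 1,
           m = length(sig.genes),
           n = length(universe) - length(sig.genes),
           k = length(set),
           lower.tail = FALSE)
}

# toy data (assumed for illustration): 100 genes, 10 of them significant
universe <- paste0("g", 1:100)
sig.toy <- paste0("g", 1:10)
gs.ora <- list(
    enriched = paste0("g", 1:8),
    random = paste0("g", c(5, 20, 30, 40, 50, 60, 70, 80))
)

# nominal p-value per gene set, named like the gene set list
ps.ora <- vapply(gs.ora, oraP, numeric(1),
                 sig.genes = sig.toy, universe = universe)

# adjust for multiple testing, analogous to choosing a 'padj.method'
ps.ora.adj <- p.adjust(ps.ora, method = "BH")
```

The set made up entirely of significant genes receives a very small p-value, while the set sharing a single gene with the significant list does not.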
It is also possible to use additional set-based enrichment methods. This requires implementing a function that takes 'se', 'gs', 'alpha', and 'perm' as arguments and returns a numeric vector 'ps' storing the resulting p-value for each gene set in 'gs'. This vector must be named accordingly (i.e. names(ps) == names(gs)). See examples.
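As a sketch of what the body of such a user-defined method might compute, the base R code below runs a Wilcoxon rank-sum test per gene set on gene-level scores and returns a p-value vector named like the gene set list. The helper name wilcoxSetP and the toy scores are assumptions made for illustration. Inside a function passed as 'method', the scores would have to be derived from 'se', which is not shown here.

```r
# Hypothetical helper: compare the scores of genes inside a set with the
# scores of all remaining genes using a Wilcoxon rank-sum test.
# scores: named numeric vector of gene-level statistics (assumption)
# gs: named list of character vectors of gene IDs
wilcoxSetP <- function(scores, gs)
{
    ps <- vapply(gs, function(s)
    {
        in.set <- names(scores) %in% s
        wilcox.test(scores[in.set], scores[!in.set])$p.value
    }, numeric(1))
    names(ps) <- names(gs)   # required naming: names(ps) == names(gs)
    ps
}

# toy data: the scores of the first 15 genes are shifted upwards
set.seed(1)
scores <- setNames(rnorm(100), paste0("g", 1:100))
scores[1:15] <- scores[1:15] + 2
gs.w <- list(shifted = paste0("g", 1:15),
             background = paste0("g", 51:70))

ps.w <- wilcoxSetP(scores, gs.w)
```

Whatever statistic is used, the essential part of the contract is the return value: one p-value per gene set, in a vector whose names match names(gs).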
sbeaMethods: a character vector of currently supported methods;
sbea: if 'out.file' is NULL, an enrichment analysis result object that can be explored in detail by calling eaBrowse and from which a flat gene set ranking can be extracted by calling gsRanking. If 'out.file' is given, the ranking is written to the specified file.
Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>
Goeman and Buhlmann (2007) Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics, 23:980-7.
Barry et al. (2005) Significance Analysis of Function and Expression. Bioinformatics, 21:1943-9.
Subramanian et al. (2005) Gene Set Enrichment Analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA, 102:15545-50.
Dinu et al. (2007) Improving gene set analysis of microarray data by SAM-GS. BMC Bioinformatics, 8:242.
Input: readSE, probe2gene; getGenesets to retrieve gene sets from databases such as GO and KEGG.
Output: gsRanking to retrieve the ranked list of gene sets; eaBrowse for exploration of resulting gene sets.
Other: nbea to perform network-based enrichment analysis; combResults to combine results from different methods.
# currently supported methods
sbeaMethods()

# (1) expression data:
# simulated expression values of 100 genes
# in two sample groups of 6 samples each
se <- makeExampleData(what="SE")
se <- deAna(se)

# (2) gene sets:
# draw 10 gene sets with 15-25 genes
gs <- makeExampleData(what="gs", gnames=names(se))

# (3) make 2 artificially enriched sets:
sig.genes <- names(se)[rowData(se)$ADJ.PVAL < 0.1]
gs[[1]] <- sample(sig.genes, length(gs[[1]]))
gs[[2]] <- sample(sig.genes, length(gs[[2]]))

# (4) performing the enrichment analysis
ea.res <- sbea(method="ora", se=se, gs=gs, perm=0)

# (5) result visualization and exploration
gsRanking(ea.res)

# using your own tailored function as enrichment method
dummySBEA <- function(se, gs, alpha, perm)
{
    sig.ps <- sample(seq(0, 0.05, length=1000), 5)
    nsig.ps <- sample(seq(0.1, 1, length=1000), length(gs)-5)
    ps <- sample(c(sig.ps, nsig.ps), length(gs))
    names(ps) <- names(gs)
    return(ps)
}

ea.res2 <- sbea(method=dummySBEA, se=se, gs=gs)
gsRanking(ea.res2)