ebam {siggenes}R Documentation

Empirical Bayes Analysis of Microarrays

Description

Performs an Empirical Bayes Analysis of Microarrays (EBAM). It is possible to perform one and two class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides a EBAM procedure for categorical data such as SNP data and the possibility to employ an user-written score function.

Usage

  ebam(x, cl, method = z.ebam, delta = 0.9, which.a0 = NULL, p0 = NA, 
       p0.estimation = c("splines", "interval", "adhoc"), lambda = NULL, 
       ncs.value = "max", use.weights = FALSE, gene.names = dimnames(x)[[1]], 
       ...)

Arguments

x either a matrix, a data frame or an ExpressionSet object, or the output of find.a0, i.e. an object of class FindA0. If x is not a FindA0 object, then each row of data (or exprs(data), respectively) must correspond to a variable (e.g. a gene), and each column to a sample.
cl a specification of the class labels of the samples. Ignored if x is a FindA0 object. Typically, cl is specified by a vector of length ncol(data). In the two class paired case, cl can also be a matrix with ncol(data) rows and 2 columns. If data is an ExpressionSet object, cl can also be a character string naming the column of pData(data) that contains the class labels of the samples.
In the one-class case, cl should be a vector of 1's.
In the two class unpaired case, cl should be a vector containing 0's (specifying the samples of, e.g., the control group) and 1's (specifying, e.g., the case group).
In the two class paired case, cl can be either a numeric vector or a numeric matrix. If it is a vector, then cl has to consist of the integers between -1 and -n/2 (e.g., before treatment group) and between 1 and n/2 (e.g., after treatment group), where n is the length of cl and k is paired with -k, k=1,...,n/2. If cl is a matrix, one column should contain -1's and 1's specifying, e.g., the before and the after treatment samples, respectively, and the other column should contain integer between 1 and n/2 specifying the n/2 pairs of observations.
In the multiclass case and if method = cat.stat, cl should be a vector containing integers between 1 and g, where g is the number of groups.
For examples of how cl can be specified, see the manual of siggenes
method a character string or name specifying the method or function that should be used in the computation of the expression score z. If method = z.ebam, a modified t- or F-statistic, respectively, will be computed as proposed by Efron et al. (2001). If method = wilc.ebam, a (standardized) Wilcoxon sum / signed rank statistic will be used as expression score. For an analysis of categorical data such as SNP data, method can be set to cat.ebam. In this case, Pearson's Chi-squared statistic is computed for each row.
It is also possible to employ an user-written function for computing an user-specified expression score. For details, see the vignette of siggenes
delta a numeric vector consisting of probabilities for which the number of differentially expressed genes and the FDR should be computed, where a gene is called differentially expressed if its posterior probability is larger than Delta
which.a0 an integer between 1 and the length of quan.a0 of find.a0. If NULL, the suggested choice of find.a0 is used. Ignored if x is a matrix, data frame or ExpressionSet object
p0 a numeric value specifying the prior probability p0 that a gene is not differentially expressed. If NA, p0 will be estimated automatically
p0.estimation either "splines" (default), "interval", or "adhoc". If "splines", the spline based method of Storey and Tibshirani (2003) is used to estimate p0. If \"adhoc" ("interval"), the adhoc (interval based) method proposed by Efron et al. (2001) is used to estimate p0
lambda a numeric vector or value specifying the lambda values used in the estimation of p0. If NULL, lambda is set to seq(0, 0.95, 0.05) if p0.estimation = "splines", and to 0.5 if p0.estimation = "interval". Ignored if p0.estimation = "adhoc". For details, see pi0.est
ncs.value a character string. Only used if p0.estimation = "splines" and lambda is a vector. Either "max" or "paper". For details, see pi0.est
use.weights should weights be used in the spline based estimation of p0? If TRUE, 1 - lambda is used as weights. For details, see pi0.est
gene.names a vector of length nrow(x) specifying the names of the variables. By default, the row names of the matrix / data frame comprised by x are used
... further arguments of the specific EBAM methods. If method = z.ebam, see z.ebam. If method = wilc.ebam, see wilc.ebam. If method = cat.ebam, see cat.ebam

Value

an object of class EBAM

Author(s)

Holger Schwender, holger.schw@gmx.de

References

Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151–1160.

Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany. http://www.sfb475.uni-dortmund.de/berichte/tr44-03.pdf.

Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide Studies. Proceedings of the National Academy of Sciences, 100, 9440–9445.

See Also

EBAM-class, find.a0, z.ebam, wilc.ebam, cat.ebam

Examples

## Not run: 
  # Load the data of Golub et al. (1999) contained in the package multtest.
  data(golub)
  
  # golub.cl contains the class labels.
  golub.cl
  
  # Perform an EBAM analysis for the two class unpaired case assuming
  # unequal variances. Specify the fudge factor a0 by the suggested
  # choice of find.a0
  find.out <- find.a0(golub, golub.cl, rand = 123)
  ebam.out <- ebam(find.out)
  ebam.out
    
  # Since a0 = 0 leads to the largest number of genes (i.e. the suggested
  # choice of a0), the following leads to the same results as the above
  # analysis (but only if the random number generator, i.e. rand, is set
  # to the same number).
  ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123)
  ebam.out2

  # If fast is set to TRUE in ebam, a crude estimate of the number of
  # falsely called genes is used (see the help file for z.ebam). This
  # estimate is always employed in find.a0. 
  # The exact number is used in ebam when performing
  ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123)
  ebam.out3  

  # Since this is the recommended way, we use ebam.out3 at the end of
  # the Examples section for further analyses.


  # Perform an EBAM analysis for the two class unpaired case assuming
  # equal group variances. Set a0 = 0, and use B = 50 permutations
  # of the class labels.
  ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50,
     rand = 123)
  ebam.out4
    
  # Perform an EBAM analysis for the two class unpaired cased assuming
  # unequal group variances. Use the median (i.e. the 50% quantile)
  # of the standard deviations of the genes as fudge factor a0. And
  # obtain the number of genes and the FDR if a gene is called 
  # differentially when its posterior probability is larger than
  # 0.95.
  ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95,
     rand = 123)
  ebam.out5
    
  # For the third analysis, obtain the number of differentially
  # expressed genes and the FDR if a gene is called differentially
  # expressed if its posterior probability is larger than 0.8, 0.85,
  # 0.9, 0.95.
  print(ebam.out3, c(0.8, 0.85, 0.9, 0.95))
    
  # Generate a plot of the posterior probabilities for delta = 0.9.
  plot(ebam.out3, 0.9)
    
  # Obtain the list of genes called differentially expressed if their
  # posterior probability is larger than 0.99, and gene-specific 
  # statistics for these variables such as their z-value and their
  # local FDR.
  summary(ebam.out3, 0.99)
## End(Not run)

[Package siggenes version 1.10.1 Index]