ebam {siggenes} R Documentation

Empirical Bayes Analysis of Microarrays

Description

Performs an Empirical Bayes Analysis of Microarrays (EBAM). It is possible to perform one- and two-class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides an EBAM procedure for categorical data such as SNP data and the possibility to employ a user-written score function.

Usage

  ebam(x, cl, method = z.ebam, delta = 0.9, which.a0 = NULL, 
       control = ebamControl(), gene.names = dimnames(x)[[1]],
       ...)

Arguments

x

either a matrix, a data frame or an ExpressionSet object, or the output of find.a0, i.e. an object of class FindA0. Can also be a list (if method = chisq.ebam or method = trend.ebam). For the latter case, see chisq.ebam. If x is not a FindA0 object, then each row of x (or exprs(x), respectively) must correspond to a variable (e.g., a gene or a SNP), and each column to a sample.
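Because rows must correspond to variables and columns to samples, data stored the other way round has to be transposed first. A minimal base R sketch, using a made-up samples-by-genes matrix (all object names here are hypothetical):

```r
# Hypothetical input with one row per sample and one column per gene.
samples.by.genes <- matrix(rnorm(4 * 3), nrow = 4,
  dimnames = list(paste0("sample", 1:4), paste0("gene", 1:3)))

# Transpose so that each row is a gene and each column is a sample.
x <- t(samples.by.genes)
dim(x)        # number of genes, number of samples
rownames(x)   # the row names that gene.names defaults to
```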

cl

a specification of the class labels of the samples. Ignored if x is a FindA0 object. Need not be specified if x is a list.

Typically, cl is specified by a vector of length ncol(x). In the two class paired case, cl can also be a matrix with ncol(x) rows and 2 columns. If x is an ExpressionSet object, cl can also be a character string naming the column of pData(x) that contains the class labels of the samples.

In the one-class case, cl should be a vector of 1's.

In the two class unpaired case, cl should be a vector containing 0's (specifying the samples of, e.g., the control group) and 1's (specifying, e.g., the case group).

In the two class paired case, cl can be either a numeric vector or a numeric matrix. If it is a vector, then cl has to consist of the integers between -1 and -n/2 (e.g., before treatment group) and between 1 and n/2 (e.g., after treatment group), where n is the length of cl and k is paired with -k, k=1,…,n/2. If cl is a matrix, one column should contain -1's and 1's specifying, e.g., the before and the after treatment samples, respectively, and the other column should contain the integers between 1 and n/2 specifying the n/2 pairs of observations.

In the multiclass case and if method = chisq.ebam or method = trend.ebam, cl should be a vector containing integers between 1 and g, where g is the number of groups. In the two latter cases, cl need not be specified if x is a list. For details, see chisq.ebam.

For examples of how cl can be specified, see the manual of siggenes.
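The layouts of cl described above can be sketched in base R. The number of samples, the number of pairs and the group sizes below are illustrative assumptions, not values taken from the package:

```r
# Hypothetical data set with n = 10 samples (columns of x).
n <- 10

# One-class case: a vector of 1's.
cl.one <- rep(1, n)

# Two class unpaired case: 0 = e.g. control group, 1 = e.g. case group.
cl.unpaired <- rep(c(0, 1), each = n / 2)

# Two class paired case, vector form: sample -k is paired with sample k.
cl.paired <- c(-(1:(n / 2)), 1:(n / 2))

# Two class paired case, matrix form: one column with -1's and 1's
# (before / after treatment), the other with the pair index 1, ..., n/2.
cl.paired.mat <- cbind(rep(c(-1, 1), each = n / 2), rep(1:(n / 2), 2))

# Multiclass case with g = 3 groups (group sizes chosen arbitrarily).
cl.multi <- rep(1:3, times = c(3, 3, 4))
```

Each of these objects has one entry (or one row) per sample, so its length or number of rows has to match ncol(x).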

method

a character string or name specifying the method or function that should be used in the computation of the expression score z.

If method = z.ebam, a modified t- or F-statistic, respectively, will be computed as proposed by Efron et al. (2001).

If method = wilc.ebam, a (standardized) Wilcoxon rank sum / signed rank statistic will be used as expression score.

For an analysis of categorical data such as SNP data, method can be set to chisq.ebam. In this case, Pearson's Chi-squared statistic is computed for each row.

If the variables are ordinal and a trend test should be applied (e.g., in the two-class case, the Cochran-Armitage trend test), method = trend.ebam can be employed.

It is also possible to employ a user-written function for computing a user-specified expression score. For details, see the vignette of siggenes.

delta

a numeric vector consisting of probabilities for which the number of differentially expressed genes and the FDR should be computed, where a gene is called differentially expressed if its posterior probability is larger than delta.
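As a rough illustration of this thresholding rule, the following base R sketch counts how many variables would be called at several thresholds. The vector post.prob is made-up data standing in for posterior probabilities; it is not produced by ebam, and the FDR computation is not shown.

```r
# Hypothetical posterior probabilities for six genes (illustrative only).
post.prob <- c(0.99, 0.97, 0.91, 0.86, 0.40, 0.05)

# Thresholds as they could be passed to the delta argument.
delta <- c(0.8, 0.85, 0.9, 0.95)

# A gene is called if its posterior probability is larger than the threshold.
n.called <- sapply(delta, function(d) sum(post.prob > d))
names(n.called) <- delta
n.called
```

Raising the threshold can only keep or reduce the number of called genes, which is why several values are often inspected side by side.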

which.a0

an integer between 1 and the length of quan.a0 of find.a0. If NULL, the suggested choice of find.a0 is used. Ignored if x is a matrix, data frame or ExpressionSet object.

control

further arguments for controlling the EBAM analysis. For these arguments, see ebamControl.

gene.names

a vector of length nrow(x) specifying the names of the variables. By default, the row names of the matrix or data frame contained in x are used.

...

further arguments of the specific EBAM methods. If method = z.ebam, see z.ebam. If method = wilc.ebam, see wilc.ebam. If method = chisq.ebam, see chisq.ebam.

Value

An object of class EBAM.

Author(s)

Holger Schwender, holger.schw@gmx.de

References

Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment. JASA, 96, 1151-1160.

Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes. R News, 6(5), 45-50.

Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide Studies. Proceedings of the National Academy of Sciences, 100, 9440-9445.

See Also

EBAM-class, find.a0, z.ebam, wilc.ebam, chisq.ebam

Examples

## Not run: 
  # Load the data of Golub et al. (1999) contained in the package multtest.
  data(golub)
  
  # golub.cl contains the class labels.
  golub.cl
  
  # Perform an EBAM analysis for the two class unpaired case assuming
  # unequal variances. Specify the fudge factor a0 by the suggested
  # choice of find.a0
  find.out <- find.a0(golub, golub.cl, rand = 123)
  ebam.out <- ebam(find.out)
  ebam.out
    
  # Since a0 = 0 leads to the largest number of genes (i.e. the suggested
  # choice of a0), the following leads to the same results as the above
  # analysis (but only if the random number generator, i.e. rand, is set
  # to the same number).
  ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123)
  ebam.out2

  # If fast is set to TRUE in ebam, a crude estimate of the number of
  # falsely called genes is used (see the help file for z.ebam). This
  # estimate is always employed in find.a0. 
  # The exact number is used in ebam when fast is not set to TRUE, as in
  ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123)
  ebam.out3  

  # Since this is the recommended way, we use ebam.out3 at the end of
  # the Examples section for further analyses.



  # Perform an EBAM analysis for the two class unpaired case assuming
  # equal group variances. Set a0 = 0, and use B = 50 permutations
  # of the class labels.
  ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50,
     rand = 123)
  ebam.out4
    
  # Perform an EBAM analysis for the two class unpaired case assuming
  # unequal group variances. Use the median (i.e. the 50% quantile)
  # of the standard deviations of the genes as fudge factor a0, and
  # obtain the number of genes and the FDR if a gene is called
  # differentially expressed when its posterior probability is larger
  # than 0.95.
  ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95,
     rand = 123)
  ebam.out5
    
  # For the third analysis, obtain the number of differentially
  # expressed genes and the FDR if a gene is called differentially
  # expressed if its posterior probability is larger than 0.8, 0.85,
  # 0.9, 0.95.
  print(ebam.out3, c(0.8, 0.85, 0.9, 0.95))
    
  # Generate a plot of the posterior probabilities for delta = 0.9.
  plot(ebam.out3, 0.9)
    
  # Obtain the list of genes called differentially expressed if their
  # posterior probability is larger than 0.99, and gene-specific 
  # statistics for these variables such as their z-value and their
  # local FDR.
  summary(ebam.out3, 0.99)

## End(Not run)

[Package siggenes version 1.58.0 Index]