cat.ebam {siggenes} | R Documentation |
Generates the required statistics for an Empirical Bayes Analysis of Microarrays (EBAM) of categorical data such as SNP data.
Should not be called directly, but via ebam(..., method = cat.ebam).
cat.ebam(data, cl, approx = FALSE, B = 100, check.levels = TRUE, check.for.NN = FALSE, lev = NULL, B.more = 0.1, B.max = 50000, n.subset = 10, fast = FALSE, n.interval = 139, df.ratio = 3, df.glm = NULL, rand = NA)
data |
a matrix or data frame. Each row must correspond to a variable/SNP, and each column to a sample |
cl |
a numeric vector of length ncol(data) indicating to which class
a sample belongs. Must consist of the
integers between 1 and c, where c is the number of different groups |
approx |
should the null distribution be approximated by a chi-square distribution? |
B |
the number of permutations used in the estimation of the null distribution, and hence, in the computation of the expected z-values. |
check.levels |
if TRUE, it is checked whether all variables/SNPs have
the same number of levels/categories |
check.for.NN |
if TRUE, it is checked whether any of the genotypes
is equal to "NN". Can be very time-consuming when the data set is high-dimensional |
lev |
numeric or character vector specifying the codings of the levels of the variables/SNPs. Must only be specified if the variables are not coded by the integers between 1 and the number of levels |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1 + B.more) * B, full permutation will be done.
Otherwise, B permutations are used |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to B.max, B randomly selected permutations will be used
in the computation of the null distribution. Otherwise, B random draws
of the group labels are used |
n.subset |
a numeric value indicating how many permutations are considered simultaneously when computing the expected z-values |
fast |
if FALSE, the exact number of permuted test scores that are
more extreme than a particular observed test score is computed for each of
the variables/SNPs. If TRUE, a crude estimate of this number is used |
n.interval |
the number of intervals used in the logistic regression with
repeated observations for estimating the ratio f0/f
(if approx = FALSE), or in the Poisson regression used to estimate
the density of the observed z-values (if approx = TRUE) |
df.ratio |
integer specifying the degrees of freedom of the natural cubic
spline used in the logistic regression with repeated observations. Ignored
if approx = TRUE |
df.glm |
integer specifying the degrees of freedom of the natural cubic
spline used in the Poisson regression to estimate the density of the observed
z-values. If NULL, df.glm is set to min{df.chisq, 5},
where df.chisq is the number of degrees of freedom of the chi-square distribution.
Ignored if approx = FALSE |
rand |
numeric value. If specified, i.e. not NA, the random number generator
will be set to a reproducible state |
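The rules given above for B, B.more and B.max can be read as a three-way decision. The following is only a sketch of that reading in base R, not the code used inside siggenes: the helper names n_label_perms and perm_strategy are hypothetical, and it is assumed that "the number of all possible permutations" means the number of distinct assignments of the group labels to the samples (a multinomial coefficient).

```r
# Hypothetical helper, not part of siggenes: number of distinct ways to
# assign the observed group labels to the samples (multinomial coefficient).
n_label_perms <- function(cl) {
  counts <- as.vector(table(cl))
  # Work on the log scale to avoid overflow for larger sample sizes
  exp(lfactorial(length(cl)) - sum(lfactorial(counts)))
}

# Hypothetical helper restating the documented rules for B.more and B.max
perm_strategy <- function(cl, B = 100, B.more = 0.1, B.max = 50000) {
  n_perm <- n_label_perms(cl)
  if (n_perm <= (1 + B.more) * B) {
    "full permutation"
  } else if (n_perm <= B.max) {
    "B randomly selected permutations"
  } else {
    "B random draws of the group labels"
  }
}

perm_strategy(rep(1:2, each = 3))    # 20 possible label assignments
perm_strategy(rep(1:2, each = 8))    # 12870 possible label assignments
perm_strategy(rep(1:3, each = 10))   # far more than B.max
```

With the default settings, the first call falls under full permutation, the second under B randomly selected permutations, and the third under B random draws of the group labels.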
For each variable, Pearson's chi-square statistic is computed to test whether the distribution of the variable differs between the groups. Since only one null distribution is estimated for all variables, as proposed in the original EBAM application of Efron et al. (2001), all variables must have the same number of levels/categories.
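As an illustration of the test statistic, Pearson's chi-square statistic can be computed row by row in base R. This is a minimal sketch on toy data, not the implementation used by cat.ebam; the helper pearson_chisq and the objects snp, z and n.lev are hypothetical names chosen for the example. It also shows the degrees of freedom of the chi-square distribution for a levels-by-groups table and the resulting default for df.glm.

```r
# Hypothetical helper, not part of siggenes: Pearson's chi-square statistic
# for one variable (vector of level codes x) across the groups in cl.
pearson_chisq <- function(x, cl, n.lev) {
  # Observed counts: levels in rows, groups in columns
  obs <- table(factor(x, levels = seq_len(n.lev)), cl)
  # Expected counts under independence of level and group
  expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  sum((obs - expected)^2 / expected)
}

set.seed(1)
cl <- rep(1:2, each = 10)
n.lev <- 3

# Toy data: 5 variables (rows), 20 samples (columns). Each row is a shuffled
# copy of the same vector, so every row contains all three levels.
snp <- t(replicate(5, sample(rep(seq_len(n.lev), length.out = 20))))

# One statistic per variable
z <- apply(snp, 1, pearson_chisq, cl = cl, n.lev = n.lev)

# Degrees of freedom of the chi-square null: (levels - 1) * (groups - 1)
df.chisq <- (n.lev - 1) * (length(unique(cl)) - 1)

# Default for df.glm as stated in the argument description
df.glm <- min(df.chisq, 5)
```

Because a single null distribution is shared by all rows, the same df.chisq must apply to every variable, which is why all variables need the same number of levels.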
a list containing statistics required by ebam
This procedure will only work correctly if all SNPs/variables have the same number of levels/categories.
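Before calling ebam, it can help to check the input against the requirements stated above. The following base R sketch uses made-up genotype data; the objects geno, group, coded and keep are hypothetical, and dropping SNPs that contain "NN" or lack a level is only one possible way to handle them, chosen here for illustration.

```r
# Toy genotype data (hypothetical): rows are SNPs, columns are samples
geno <- rbind(
  snp1 = c("AA", "AB", "BB", "AB", "AA", "BB"),
  snp2 = c("AB", "AB", "AA", "BB", "BB", "AA"),
  snp3 = c("AA", "AA", "AB", "AB", "NN", "AB")
)
group <- c("case", "case", "case", "control", "control", "control")

# Class labels coded as the integers 1, ..., c
cl <- as.integer(factor(group))

# Flag SNPs containing the genotype code "NN"
has_NN <- rowSums(geno == "NN") > 0

# Recode genotypes to the integers 1, ..., number of levels
lev <- c("AA", "AB", "BB")
coded <- matrix(match(geno, lev), nrow = nrow(geno), dimnames = dimnames(geno))

# Number of distinct (non-missing) levels observed per SNP
n.lev <- apply(coded, 1, function(x) length(unique(x[!is.na(x)])))

# Keep only SNPs without "NN" that show the full set of levels
keep <- !has_NN & n.lev == length(lev)
coded[keep, , drop = FALSE]
```

Here snp3 is excluded because it contains "NN" and shows only two of the three genotypes. Instead of recoding by hand, the original codings can also be supplied through the lev argument described above.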
Holger Schwender, holger.schw@gmx.de
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
Schwender, H. (2007). Empirical Bayes Analysis of Single Nucleotide Polymorphisms. Technical Report, Department of Statistics, University of Dortmund. To appear soon.
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.