d.stat {siggenes} | R Documentation |
Computes the required statistics for a Significance Analysis of Microarrays (SAM) using either a (modified) t- or F-statistic.
Should not be called directly, but via the function sam.
d.stat(data, cl, var.equal = FALSE, B = 100, med = FALSE, s0 = NA, s.alpha = seq(0, 1, 0.05), include.zero = TRUE, n.subset = 10, mat.samp = NULL, B.more = 0.1, B.max = 30000, gene.names = NULL, R.fold = 1, R.unlog = TRUE, na.replace = TRUE, na.method = "mean", rand = NA)
data |
a matrix, data frame or exprSet object. Each row of data
(or exprs(data), respectively) must correspond to a gene, and
each column to a sample |
cl |
a numeric vector of length ncol(data) containing the class
labels of the samples. In the two class paired case, cl can also
be a matrix with ncol(data) rows and 2 columns. If data is
an exprSet object, cl can also be a character string. For details
on how cl should be specified, see ?sam |
var.equal |
if FALSE (default), Welch's t-statistic will be computed.
If TRUE, the pooled variance will be used in the computation of
the t-statistic |
B |
numeric value indicating how many permutations should be used in the estimation of the null distribution |
med |
if FALSE (default), the mean number of falsely called genes
will be computed. Otherwise, the median number is calculated |
s0 |
a numeric value specifying the fudge factor. If NA (default),
s0 will be computed automatically |
s.alpha |
a numeric vector or value specifying the quantiles of the
standard deviations of the genes used in the computation of s0. If
s.alpha is a vector, the fudge factor is computed as proposed by
Tusher et al. (2001). Otherwise, the quantile of the standard deviations
specified by s.alpha is used as fudge factor |
include.zero |
if TRUE, s0=0 will also be a possible choice
for the fudge factor. Hence, the usual t-statistic or F-statistic, respectively,
is also a possible choice for the expression score d. If FALSE,
s0=0 will not be a possible choice for the fudge factor. The latter
follows the definition of the fudge factor by Tusher et al. (2001), in which only strictly
positive values are considered |
n.subset |
a numeric value indicating how many permutations are considered
simultaneously when computing the p-value and the number of falsely called
genes. If med=TRUE, n.subset will be set to 1 |
mat.samp |
a matrix having ncol(data) columns, except for the two class
paired case, in which mat.samp has ncol(data)/2 columns.
Each row specifies one permutation of the group labels used in the computation
of the expected expression scores d.bar. If not specified
(mat.samp=NULL), a matrix having B rows and ncol(data) columns is
generated automatically and used in the computation of d.bar. In
the two class unpaired case and the multiclass case, each row of mat.samp
must contain the same group labels as cl. In the one class and the two
class paired case, each row must contain -1's and 1's. In the one class case,
the expression values are multiplied by these -1's and 1's. In the two class paired
case, each column corresponds to one observation pair whose difference is multiplied
by either -1 or 1. For more details and examples, see the manual of siggenes |
B.more |
a numeric value. If the number of all possible permutations is smaller
than or equal to (1+B.more)*B, full permutation will be done.
Otherwise, B permutations are used. This prevents B permutations from
being used instead of all permutations when the number of all possible permutations
is only slightly larger than B |
gene.names |
a character vector of length nrow(data) containing the
names of the genes |
B.max |
a numeric value. If the number of all possible permutations is smaller
than or equal to B.max, B randomly selected permutations will be used
in the computation of the null distribution. Otherwise, B random draws
of the group labels are used. With the latter approach, some permutations
may be used more than once |
R.fold |
a numeric value. If the fold change of a gene is smaller than or
equal to R.fold, or larger than or equal to 1/R.fold, respectively,
then this gene will be excluded from the SAM analysis. The expression score
d of excluded genes is set to NA. By default, R.fold
is set to 1 such that all genes are included in the SAM analysis. Setting
R.fold to 0 or a negative value will avoid the computation of the fold
change. The fold change is only computed in the two-class cases |
R.unlog |
if TRUE, the anti-log of data will be used in the computation of the
fold change. Otherwise, data is used. This transformation should be done
when data is log2-transformed (in a SAM analysis it is highly recommended
to use log2-transformed expression data) |
na.replace |
if TRUE, missing values will be replaced by the genewise/rowwise
statistic specified by na.method. If a gene has fewer than 2 non-missing
values, this gene will be excluded from further analysis. If na.replace=FALSE,
all genes with one or more missing values will be excluded from further analysis.
The expression score d of excluded genes is set to NA |
na.method |
a character string naming the statistic with which missing values
will be replaced if na.replace=TRUE. Must be either "mean" (default)
or "median" |
rand |
numeric value. If specified, i.e. not NA, the random number generator
will be set into a reproducible state |
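To illustrate how var.equal and s0 enter the expression score d, the following base R sketch computes a modified t-statistic of the form d = r / (s + s0) for a single gene in the two class unpaired case. The helper name d_score and the 0/1 coding of cl are assumptions made for this illustration only; this is not the internal code of siggenes.

```r
# Hypothetical helper (not part of siggenes): modified t-statistic for one gene.
# x: numeric vector of expression values; cl: class labels coded 0/1 (assumed).
d_score <- function(x, cl, var.equal = FALSE, s0 = 0) {
  x1 <- x[cl == 1]
  x0 <- x[cl == 0]
  n1 <- length(x1)
  n0 <- length(x0)
  r <- mean(x1) - mean(x0)          # numerator: difference of the group means
  if (var.equal) {
    # pooled variance, as in the classical two-sample t-test
    v.pool <- ((n1 - 1) * var(x1) + (n0 - 1) * var(x0)) / (n1 + n0 - 2)
    s <- sqrt(v.pool * (1 / n1 + 1 / n0))
  } else {
    # Welch: separate variance estimate for each group
    s <- sqrt(var(x1) / n1 + var(x0) / n0)
  }
  r / (s + s0)                      # fudge factor s0 is added to the denominator
}

x  <- c(5.1, 4.8, 5.6, 5.0, 6.9, 7.2, 6.5, 7.0)
cl <- rep(c(0, 1), each = 4)
d.welch  <- d_score(x, cl)                    # s0 = 0: ordinary Welch t-statistic
d.pooled <- d_score(x, cl, var.equal = TRUE)  # s0 = 0: pooled-variance t-statistic
d.fudged <- d_score(x, cl, s0 = 0.5)          # positive s0 shrinks the score
```

With s0 = 0 the score reduces to the usual t-statistic, which is the case that include.zero = TRUE allows as a candidate.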
an object of class SAM
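The interplay of the arguments B, B.more and B.max can be summarized as a small decision rule. The base R sketch below restates the rule from the argument descriptions; the helper name perm_scheme and its return strings are hypothetical, and counting the permutations in the two class unpaired case with choose(n, n1) is an assumption of this example, not a statement about how siggenes counts them.

```r
# Hypothetical helper (not part of siggenes): which permutation scheme applies?
# n.perm: number of all possible permutations of the group labels.
perm_scheme <- function(n.perm, B = 100, B.more = 0.1, B.max = 30000) {
  if (n.perm <= (1 + B.more) * B) {
    "full permutation"                      # all possible permutations are used
  } else if (n.perm <= B.max) {
    "B randomly selected permutations"      # B of the possible permutations
  } else {
    "B random draws of the group labels"    # repeated permutations are possible
  }
}

# Assumed count for two balanced classes: choose(n, n1) label assignments
small  <- perm_scheme(choose(8, 4))     # 70 possible permutations
medium <- perm_scheme(choose(14, 7))    # 3432 possible permutations
large  <- perm_scheme(choose(20, 10))   # 184756 possible permutations
```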
Holger Schwender, holger.schw@gmx.de
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
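The handling of missing values described for na.replace and na.method can be sketched in base R as follows. The helper name replace_na_rows and its return structure are hypothetical and serve only to illustrate the documented rule; siggenes itself may implement this differently.

```r
# Hypothetical helper (not part of siggenes): rowwise handling of missing values.
replace_na_rows <- function(data, na.replace = TRUE,
                            na.method = c("mean", "median")) {
  na.method <- match.arg(na.method)
  stat <- if (na.method == "mean") mean else median
  n.obs <- rowSums(!is.na(data))            # non-missing values per gene
  # genes kept: at least 2 observed values if replacing, else complete rows only
  keep <- if (na.replace) n.obs >= 2 else n.obs == ncol(data)
  if (na.replace) {
    for (i in which(keep & n.obs < ncol(data))) {
      miss <- is.na(data[i, ])
      data[i, miss] <- stat(data[i, !miss]) # fill with the rowwise statistic
    }
  }
  list(data = data, excluded = !keep)
}

mat <- rbind(g1 = c(1, 2, NA, 3),
             g2 = c(NA, NA, NA, 4),
             g3 = c(5, 6, 7, 8))
res.replace <- replace_na_rows(mat)                      # g1 is filled, g2 excluded
res.strict  <- replace_na_rows(mat, na.replace = FALSE)  # g1 and g2 excluded
```

In this sketch excluded genes are only flagged; in the SAM analysis their expression score d would be set to NA.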