cat.stat {siggenes}    R Documentation

SAM Analysis for Categorical Data

Description

Generates the statistics required for a Significance Analysis of Microarrays (SAM) of categorical data, such as SNP data.

Should not be called directly, but via sam(..., method = cat.stat).

Usage

  cat.stat(data, cl, B = 100, approx = FALSE, check.levels = TRUE, 
    check.for.NN = FALSE, lev = NULL, B.more = 0.1, B.max = 50000, 
    n.subset = 10, rand = NA)

Arguments

data a matrix or data frame. Each row must correspond to a variable/SNP, and each column to a sample.
cl a numeric vector of length ncol(data) indicating to which class a sample belongs. Must consist of the integers between 1 and c, where c is the number of different groups.
B the number of permutations used in the estimation of the null distribution, and hence, in the computation of the expected d-values.
approx should the null distribution be approximated by the chi^2-distribution?
check.levels if TRUE, it will be checked if all variables/SNPs have the same number of levels/categories.
check.for.NN if TRUE, it will be checked if any of the genotypes is equal to "NN". Can be very time-consuming when the data set is high-dimensional.
lev numeric or character vector specifying the codings of the levels of the variables/SNPs. Must only be specified if the variables are not coded by the integers between 1 and the number of levels.
B.more a numeric value. If the number of all possible permutations is smaller than or equal to (1+B.more)*B, full permutation will be done. Otherwise, B permutations are used.
B.max a numeric value. If full permutation is not done and the number of all possible permutations is smaller than or equal to B.max, B permutations randomly selected from all possible permutations will be used in the computation of the null distribution. Otherwise, B random draws of the group labels are used.
n.subset a numeric value indicating how many permutations are considered simultaneously when computing the expected d-values.
rand numeric value. If specified, i.e. not NA, the random number generator will be set into a reproducible state.
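
The interplay of B, B.more and B.max described above can be sketched in base R. This is only an illustration, not the code used by siggenes: the helper name choose_perm_scheme and its return labels are hypothetical, and it assumes that "the number of all possible permutations" means the number of distinct assignments of the class labels (a multinomial coefficient).

```r
# Hypothetical helper, not part of siggenes: reports which permutation
# scheme the descriptions of B, B.more and B.max would select.
choose_perm_scheme <- function(cl, B = 100, B.more = 0.1, B.max = 50000) {
  n_per_class <- as.vector(table(cl))
  # Assumption: the number of possible permutations is the multinomial
  # coefficient n! / (n_1! * ... * n_c!), computed on the log scale.
  n_perm <- round(exp(lgamma(length(cl) + 1) - sum(lgamma(n_per_class + 1))))
  if (n_perm <= (1 + B.more) * B) {
    "full"     # enumerate every possible permutation
  } else if (n_perm <= B.max) {
    "sampled"  # B permutations selected from all possible ones
  } else {
    "random"   # B random draws of the group labels
  }
}

choose_perm_scheme(c(1, 1, 2, 2))          # 6 possible permutations
choose_perm_scheme(rep(1:2, each = 5))     # 252 possible permutations
choose_perm_scheme(rep(1:2, each = 10))    # 184756 possible permutations
```

With the default settings, the three calls fall into the "full", "sampled" and "random" branches, respectively.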

Details

For each SNP, Pearson's chi-square statistic is computed to test whether the distribution of the SNP differs between the groups. Since only one null distribution is estimated for all SNPs, as proposed in the original SAM procedure of Tusher et al. (2001), all SNPs must have the same number of levels/categories.
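
As a rough illustration of the statistic, the following base R sketch computes Pearson's chi-square value for each row of a small matrix. The helper pearson_chisq and the toy data are made up for this example; siggenes computes the statistics internally and may do so differently.

```r
# Hypothetical helper: Pearson's chi-square statistic for a single SNP.
# x holds the genotypes of one SNP, cl the class labels of the samples.
pearson_chisq <- function(x, cl, lev = sort(unique(x))) {
  # Contingency table: classes in rows, genotype levels in columns
  observed <- table(factor(cl), factor(x, levels = lev))
  # Expected counts under independence of class and genotype
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  sum((observed - expected)^2 / expected)
}

# Toy data (made up): 3 SNPs (rows) x 8 samples (columns), coded 1 to 3
snps <- rbind(
  c(1, 1, 1, 2, 2, 3, 3, 3),
  c(1, 2, 3, 1, 1, 2, 3, 2),
  c(1, 1, 2, 3, 2, 2, 3, 3)
)
cl <- c(1, 1, 1, 1, 2, 2, 2, 2)

# One statistic per SNP
stats <- apply(snps, 1, pearson_chisq, cl = cl, lev = 1:3)
stats
```

The values agree with those returned by chisq.test(table(cl, snps[i, ])) for each row i. The sketch assumes that every level occurs at least once in every row, since otherwise some expected counts would be zero.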

Value

a list containing statistics required by sam

Warning

This procedure will only work correctly if all SNPs/variables have the same number of levels/categories.
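
Before calling sam, the condition can be verified by hand. The sketch below is a hypothetical base R check, not the one performed by check.levels; it assumes that missing genotypes are coded as NA and should not count as a level.

```r
# Hypothetical helpers, not part of siggenes.
# Number of distinct non-missing levels of each variable/SNP (row).
n_levels <- function(data) {
  apply(as.matrix(data), 1, function(x) length(unique(x[!is.na(x)])))
}

# TRUE if all rows show the same number of levels
same_levels <- function(data) length(unique(n_levels(data))) == 1

ok  <- rbind(c(1, 2, 3, 1), c(3, 2, 1, 1))  # both rows have three levels
bad <- rbind(c(1, 2, 3, 1), c(1, 2, 1, 1))  # second row has only two levels

same_levels(ok)
same_levels(bad)
```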

Author(s)

Holger Schwender, holger.schw@gmx.de

References

Schwender, H. (2004). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. To appear in: Proceedings of the 28th Annual Conference of the GfKl.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.

See Also

SAM-class, sam


[Package siggenes version 1.10.1 Index]