sam {siggenes} | R Documentation |
Performs a Significance Analysis of Microarrays (SAM). One-class and two-class analyses can be performed using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis can be performed using a modified F-statistic. Moreover, this function provides a SAM procedure for categorical data such as SNP data.
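To make the idea of a modified t-statistic concrete, the R sketch below computes the score of Tusher et al. (2001) for the two-class unpaired case: d = (mean of group 1 - mean of group 0) / (s + s0), where s is the pooled standard error of the difference and s0 is a small positive constant (the fudge factor). It is an illustration under stated assumptions, not the siggenes implementation. The function name modified.t is hypothetical, equal variances are assumed, and s0 is passed in by hand, whereas sam estimates s0 itself and, as in the Examples section, can assume unequal variances.

```r
# Hypothetical sketch of the modified t-statistic of Tusher et al. (2001),
# two-class unpaired case. Assumptions: cl is coded 0/1 as for sam, equal
# variances (pooled s), and s0 is supplied by the caller, not estimated.
modified.t <- function(data, cl, s0 = 0) {
  data <- as.matrix(data)
  x1 <- data[, cl == 1, drop = FALSE]
  x0 <- data[, cl == 0, drop = FALSE]
  n1 <- ncol(x1)
  n0 <- ncol(x0)
  # Pooled within-group sum of squares for each gene (row)
  ss <- rowSums((x1 - rowMeans(x1))^2) + rowSums((x0 - rowMeans(x0))^2)
  a <- (1 / n1 + 1 / n0) / (n1 + n0 - 2)
  s <- sqrt(a * ss)
  # A positive s0 keeps genes with a tiny s from dominating the ranking
  d <- (rowMeans(x1) - rowMeans(x0)) / (s + s0)
  list(d = d, s = s)
}
```

With s0 = 0 the score reduces to the ordinary pooled two-sample t-statistic; a positive s0 shrinks the scores of genes whose standard deviation is very small, which is the purpose of the fudge factor.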
sam(data, cl, method = "d.stat", delta = NULL, n.delta = 10, p0 = NA, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL, gene.names = dimnames(data)[[1]], q.version = 1, ...)
data |
a matrix, a data frame, an exprSet or an ExpressionSet object. Each row of data
(or of exprs(data), respectively) must correspond to a gene, and
each column to a sample |
cl |
a vector of length ncol(data) containing the class
labels of the samples. In the two-class paired case, cl can also
be a matrix with ncol(data) rows and 2 columns. If data is
an exprSet or ExpressionSet object, cl can also be a character string naming the column
of pData(data) that contains the class labels of the samples.
In the one-class case, cl should be a vector of 1's.
In the two-class unpaired case, cl should be a vector containing 0's
(specifying the samples of, e.g., the control group) and 1's (specifying,
e.g., the case group).
In the two-class paired case, cl can be either a numeric vector or a numeric matrix.
If it is a vector, then cl has to consist of the integers between -1 and
-n/2 (e.g., the before-treatment group) and between 1 and n/2 (e.g., the
after-treatment group), where n is the length of cl and k
is paired with -k, k=1,...,n/2. If cl is a matrix, one
column should contain -1's and 1's specifying, e.g., the before- and the after-treatment
samples, respectively, and the other column should contain integers
between 1 and n/2 specifying the n/2 pairs of observations.
In the multiclass case, and if method="cat.stat", cl should be a vector containing integers
between 1 and g, where g is the number of groups.
For examples of how cl can be specified, see the manual of siggenes |
method |
a character string specifying the method used to compute
the expression scores d. If method="d.stat",
a modified t-statistic (one- and two-class cases) or a modified F-statistic
(multiclass case) is computed as proposed by Tusher et al. (2001). If method="wilc.stat",
a Wilcoxon rank sum statistic (two-class unpaired case) or a Wilcoxon signed rank
statistic (one-class and two-class paired cases) is used as the expression score.
For an analysis of categorical data such as SNP data,
method can be set to "cat.stat". In this case, Pearson's
chi-squared statistic is computed for each row. It is also possible to use
a user-written function to compute the expression scores.
For details, see Details |
delta |
a numeric vector specifying the set of values for the threshold
Delta that should be used. If NULL, n.delta
Delta values are computed automatically |
n.delta |
a numeric value specifying the number of Delta values
that will be computed over the range of all possible values for Delta
if delta is not specified |
p0 |
a numeric value specifying the prior probability pi0
that a gene is not differentially expressed. If NA, p0 is
computed by the function pi0.est |
lambda |
a numeric vector or value specifying the lambda
values used in the estimation of the prior probability. For details, see
?pi0.est |
ncs.value |
a character string, only used if lambda is a
vector. Either "max" or "paper". For details, see ?pi0.est |
ncs.weights |
a numerical vector of the same length as lambda
containing the weights used in the estimation of pi0. By default
no weights are used. For details, see ?pi0.est |
gene.names |
a character vector of length nrow(data) containing the
names of the genes. By default the row names of data are used |
q.version |
a numeric value indicating which version of the q-value should
be computed. If q.version=2, the original version of the q-value, i.e.,
min{pFDR}, is computed. If q.version=1, min{FDR} is used
in the calculation of the q-value. Otherwise, the q-value is not computed.
For details, see ?qvalue.cal |
... |
further arguments of the specific SAM methods. If method="d.stat",
see ?sam.dstat; if method="wilc.stat", see ?sam.wilc; and if
method="cat.stat", see ?sam.snp for these arguments |
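The following R snippet builds class labels for each of the designs described under cl, using a hypothetical study with n = 10 samples (the group sizes are illustrative assumptions). It also checks that the matrix form of the paired labels carries the same information as the vector form: multiplying the -1/1 column by the pair index gives back -k and k.

```r
# Class labels for a hypothetical study with n = 10 samples
n <- 10
cl.one  <- rep(1, n)                      # one-class case
cl.two  <- rep(c(0, 1), each = n / 2)     # two-class unpaired: 0 = control, 1 = case
cl.pair <- c(-(1:(n / 2)), 1:(n / 2))     # two-class paired: sample -k pairs with sample k
# Matrix form of the paired labels: column 1 = -1 (before) / 1 (after),
# column 2 = index of the pair each sample belongs to
cl.mat <- cbind(rep(c(-1, 1), each = n / 2), rep(1:(n / 2), 2))
cl.multi <- rep(1:3, length.out = n)      # multiclass (or cat.stat) with g = 3 groups
# The vector form is recovered as sign times pair index
stopifnot(all(cl.mat[, 1] * cl.mat[, 2] == cl.pair))
```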
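When p0 is NA, sam estimates pi0 from the p-values with pi0.est. For a single lambda value, a common estimator counts the p-values above lambda: since null p-values are uniform, about m * pi0 * (1 - lambda) of the m p-values should exceed lambda. The sketch below implements only this single-lambda case. The name pi0.single is hypothetical, and we assume, without having checked the siggenes source, that pi0.est behaves this way for a scalar lambda. For a vector of lambda values, pi0.est additionally combines the estimates over lambda (controlled by ncs.value and ncs.weights; see ?pi0.est), which is not reproduced here.

```r
# Hypothetical single-lambda estimate of pi0 (assumed to match pi0.est
# for a scalar lambda; not taken from the siggenes source).
pi0.single <- function(p, lambda = 0.5) {
  # Null p-values are uniform, so a fraction of about pi0 * (1 - lambda)
  # of all p-values should exceed lambda; solve for pi0 and cap it at 1.
  min(1, mean(p > lambda) / (1 - lambda))
}
p <- c(0.01, 0.02, 0.03, 0.7, 0.9)
pi0.single(p, 0.5)   # 2 of 5 p-values exceed 0.5: 0.4 / 0.5 = 0.8
```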
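Each value of delta is a threshold on the vertical distance between the sorted observed scores and their expected values under the null hypothesis, as displayed in the SAM plot. The R sketch below shows a simplified version of the calling rule described by Tusher et al. (2001): on each side of the origin, find the first gene whose distance from the diagonal is at least delta, then call that gene and every more extreme gene significant. The function sam.call and the toy scores are hypothetical, and locating the origin by the sign change of d.bar is our assumption; siggenes may handle these details differently.

```r
# Hypothetical, simplified SAM calling rule for one threshold delta.
# d.sort: sorted observed scores; d.bar: expected sorted scores under the
# null hypothesis (same length and order). The origin is assumed to be
# where d.bar changes sign.
sam.call <- function(d.sort, d.bar, delta) {
  gap <- d.sort - d.bar
  called <- rep(FALSE, length(d.sort))
  # Upper side: first gene at or above the origin that exceeds the diagonal
  # by at least delta, plus every gene with a larger score
  up <- which(d.bar >= 0 & gap >= delta)
  if (length(up) > 0) called[min(up):length(d.sort)] <- TRUE
  # Lower side: mirror image for the negative scores
  low <- which(d.bar < 0 & gap <= -delta)
  if (length(low) > 0) called[1:max(low)] <- TRUE
  called
}
d.sort <- c(-5, -1, -0.5, 0.2, 0.4, 3)
d.bar  <- c(-2, -1, -0.4, 0.3, 0.5, 1.5)
sam.call(d.sort, d.bar, delta = 1)     # TRUE FALSE FALSE FALSE FALSE TRUE
sam.call(d.sort, d.bar, delta = 0.05)  # TRUE TRUE TRUE FALSE FALSE TRUE
```

In the second call, the second gene is called even though it lies on the diagonal, because it is more extreme than the first gene that crosses the threshold. This is why the number of called genes can only shrink as delta grows.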
sam provides SAM procedures for several types of analysis (one-class and two-class analyses
with either a modified t-statistic or a Wilcoxon rank statistic, a multiclass analysis
with a modified F-statistic, and an analysis of categorical data). It is, however, also
possible to write your own function for another type of analysis. The required arguments
of this function must be data and cl; it may also have further arguments.
The output of this function must be a list containing the following components:
d: a numeric vector containing the expression scores of the genes.
d.bar: a numeric vector of the same length as na.exclude(d) specifying
the expected expression scores under the null hypothesis.
p.value: a numeric vector of the same length as d containing
the raw, unadjusted p-values of the genes.
vec.false: a numeric vector of the same length as d consisting of
the one-sided numbers of falsely called genes, i.e., if d > 0, the numbers
of genes expected to be larger than d under the null hypothesis, and if
d < 0, the numbers of genes expected to be smaller than d under the
null hypothesis.
s: a numeric vector of the same length as d containing the standard deviations
of the genes. If no standard deviation can be calculated, set s=numeric(0).
s0: a numeric value specifying the fudge factor. If no fudge factor is computed, set s0=numeric(0).
mat.samp: a matrix with B rows and ncol(data) columns, where B is the number
of permutations, containing the permutations used in the computation of the permuted
d-values. If such a matrix is not computed, set mat.samp=matrix(numeric(0)).
msg: a character string that is printed when the function print or summary,
respectively, is called. If no such message should be printed, set msg="".
fold: a numeric vector of the same length as d consisting of the fold
changes of the genes. If no fold change has been computed, set fold=numeric(0).
If this function is, e.g., called foo, it can be used by setting method="foo"
in sam. More detailed information and an example will be contained in the siggenes
manual.
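A minimal sketch of such a user-written function is shown below, assuming a two-class unpaired design with cl coded 0/1 and no missing values. The name my.tstat, its B and rand arguments, and the choice of a Welch t-statistic without a fudge factor are all hypothetical; the null distribution is estimated by permuting the class labels B times. The function returns every component listed above, using numeric(0) for s0 and fold as permitted.

```r
# Hypothetical user-written scoring function for sam(..., method = "my.tstat").
# Assumptions: two-class unpaired design, cl coded 0/1, no missing values,
# Welch t-statistics without fudge factor, B random label permutations.
my.tstat <- function(data, cl, B = 100, rand = NA) {
  if (!is.na(rand)) set.seed(rand)
  data <- as.matrix(data)
  # Welch t-statistic and its standard error for every row (gene)
  welch <- function(y) {
    x1 <- data[, y == 1, drop = FALSE]
    x0 <- data[, y == 0, drop = FALSE]
    se <- sqrt(apply(x1, 1, var) / ncol(x1) + apply(x0, 1, var) / ncol(x0))
    list(d = (rowMeans(x1) - rowMeans(x0)) / se, s = se)
  }
  obs <- welch(cl)
  d <- obs$d
  # One permutation of the labels per row: B rows, ncol(data) columns
  mat.samp <- t(replicate(B, sample(cl)))
  # Sorted permuted scores, one column per permutation
  d.perm <- apply(mat.samp, 1, function(y) sort(welch(y)$d))
  # Expected sorted scores under the null hypothesis
  d.bar <- rowMeans(d.perm)
  perm <- as.vector(d.perm)
  # Expected number of null scores beyond each d, on the same side as d
  vec.false <- vapply(d, function(di)
    if (di > 0) sum(perm > di) / B else sum(perm < di) / B, numeric(1))
  # Two-sided permutation p-values, pooled over genes and permutations
  p.value <- vapply(d, function(di) mean(abs(perm) >= abs(di)), numeric(1))
  list(d = d, d.bar = d.bar, p.value = p.value, vec.false = vec.false,
       s = obs$s, s0 = numeric(0), mat.samp = mat.samp,
       msg = "SAM analysis with a hypothetical Welch t-statistic.\n\n",
       fold = numeric(0))
}
```

Once defined in the workspace, it would be used as sam(data, cl, method = "my.tstat"); we assume, but have not verified, that extra arguments such as B are passed through the ... argument of sam to the user-written function.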
an object of class SAM
SAM was developed by Tusher et al. (2001).
!!! There is a patent pending for the SAM technology at Stanford University. !!!
Holger Schwender, holger.schw@gmx.de
Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany. http://www.sfb475.uni-dortmund.de/berichte/tr44-03.pdf.
Schwender, H. (2004). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. To appear in: Proceedings of the 28th Annual Conference of the GfKl.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
SAM-class, sam.dstat, sam.wilc, sam.snp, sam.plot2, delta.plot
## Not run:
library(siggenes)
# Load the package multtest and the data of Golub et al. (1999)
# contained in multtest.
library(multtest)
data(golub)
# golub.cl contains the class labels.
golub.cl
# Perform a SAM analysis for the two-class unpaired case assuming
# unequal variances.
sam.out <- sam(golub, golub.cl, B = 100, rand = 123)
sam.out
# Obtain the Delta plots for the default set of Deltas
plot(sam.out)
# Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2
plot(sam.out, seq(0.2, 2, 0.2))
# Obtain the SAM plot for Delta = 2
plot(sam.out, 2)
# Get information about the genes called significant using
# Delta = 3 (since neither the gene names nor the chip type
# has been specified, ll is set to FALSE to avoid a warning)
sam.sum3 <- summary(sam.out, 3, ll = FALSE)
# Obtain the rows of golub containing the genes called
# differentially expressed
sam.sum3@row.sig.genes
# and their names
golub.gnames[sam.sum3@row.sig.genes, 3]
# The matrix containing the d-values, q-values etc. of the
# differentially expressed genes can be obtained by
sam.sum3@mat.sig
# Perform a SAM analysis using Wilcoxon rank sums
sam(golub, golub.cl, method = "wilc.stat", rand = 123)
# Now consider only the first ten columns of the Golub et al. (1999)
# data set. For now, let's assume the first five columns were
# before-treatment measurements and the next five columns were
# after-treatment measurements, where columns 1 and 6, columns 2
# and 7, ..., form a pair. In this case, the class labels
# would be
new.cl <- c(-(1:5), 1:5)
new.cl
# and the corresponding SAM analysis for the two-class paired
# case would be
sam(golub[, 1:10], new.cl, B = 100, rand = 123)
# Another way of specifying the class labels for the above paired
# analysis is
mat.cl <- matrix(c(rep(c(-1, 1), each = 5), rep(1:5, 2)), 10)
mat.cl
# and the above SAM analysis can also be done by
sam(golub[, 1:10], mat.cl, B = 100, rand = 123)
## End(Not run)