locfdr {locfdr} | R Documentation |
Compute local false discovery rates, following the definitions and descriptions in Efron, B. (2004), JASA, Vol. 99, pp. 96–104; Efron, B. (2005), "Local false discovery rates"; and Efron, B. (2005), "Correlation and large-scale simultaneous significance testing" (http://www-stat.stanford.edu/~brad/papers/).
locfdr(zz, bre = 120, df = 7, pct = 1/1000, pct0 = 1/4, nulltype = 1, type = 0, plot = 1, sig0, main = " ")
zz |
A vector of summary statistics, one for each case under simultaneous consideration. In a microarray experiment there would be one component of zz for each gene, perhaps a t-statistic comparing gene expression levels under two different conditions. The calculations assume a large number of cases, say at least length(zz) > 100. |
bre |
Number of breaks in the discretization of the z-score axis, set to 120 by default. This can also be a vector of breakpoints fully describing the discretization. |
df |
Number of degrees of freedom for fitting the estimated density f(z); df=7 by default. Larger values of df may be required if f(z) has sharp bends or other irregularities. A warning is issued if the fitted curve does not adequately match the histogram counts. It is a good idea to use the plot option to view the histogram and fitted curve. |
pct |
Excluded tail proportions of zz's when fitting f(z); pct=1/1000 by default; pct=0 includes full range of zz's; pct can also be a 2-vector, describing the fitting range. |
pct0 |
Proportion of the zz distribution used in fitting the null density f0(z): the fit uses the central range [pct0, 1-pct0]; default pct0=1/4, as in the usage above; pct0 can also be a 2-vector, e.g. pct0=c(.25,.60). |
nulltype |
Type of null hypothesis assumed in estimating f0(z); 0 is the theoretical null N(0,1) [which assumes that the original zz scores have been scaled to have a N(0,1) distribution under the null hypothesis]; 1 is the empirical null [which assumes a N(a,b) null hypothesis, with a=zmax and b=sig^2 estimated from the central part of the f(z) fit]; 2 is a "split normal" version of 1 [in which f0(z) is allowed to have different scales on the two sides of the maximum]. The default is nulltype=1. Note that the output includes most of the results for both nulltype=0 and nulltype=1 or 2 no matter which choice is made here, the exception being the individual results "fdr" below. |
type |
Type of fitting used for f(z); 0 is a natural spline, 1 is a polynomial, in either case with degrees of freedom df [so total degrees of freedom df+1 including the intercept.] The default is type=0. |
plot |
Number of plots desired; plot=1 gives single plot showing histogram of zz and fitted densities f(z) and f0(z); colored histogram bars indicate non-null "thinned counts", see Section 5, 2nd reference above; square dots on the x-axis indicate threshold z-values for fdr <= .2. plot=2 also gives plot of fdr, and the right and left tail area Fdr curves; plot=3 gives instead the f1 cdf of the estimated fdr curve, as in figure 6 of the second reference above. |
sig0 |
|
main |
The main legend for the histogram. |
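The roles of bre, df, type=0 and nulltype=0 can be illustrated with a minimal sketch in base R. It is not the package's implementation: the simulated zz, the choice p0 = 1, and all variable names are assumptions made for illustration. The sketch bins the z-scores, fits the counts by Poisson regression on a natural spline basis, and compares the fit with the theoretical N(0,1) null on the same scale.

```r
library(splines)  # part of the standard R distribution; provides ns()

set.seed(1)
# Simulated z-scores (illustrative assumption): 95% null N(0,1), 5% non-null N(3,1)
zz <- c(rnorm(1900), rnorm(100, mean = 3))

bre <- 120  # number of breaks, mirroring the default above
df  <- 7    # spline degrees of freedom, mirroring the default above

# Discretize the z axis and count the cases falling in each bin
breaks <- seq(min(zz), max(zz), length.out = bre)
counts <- hist(zz, breaks = breaks, plot = FALSE)$counts
mid    <- (breaks[-1] + breaks[-length(breaks)]) / 2

# Poisson regression of the counts on a natural spline basis (analogue of type = 0)
fit <- glm(counts ~ ns(mid, df = df), family = poisson)
f   <- fitted(fit)  # sums to approximately length(zz)

# Theoretical null N(0,1) on the same count scale (analogue of nulltype = 0),
# taking the null proportion p0 = 1 as a conservative simplification
width <- diff(breaks)[1]
f0    <- length(zz) * width * dnorm(mid)

# Local false discovery rate at each bin midpoint, capped at 1
fdr <- pmin(f0 / f, 1)
```

Plotting counts against mid with f and f0 overlaid gives a rough analogue of the plot=1 display; increasing df in this sketch shows how the fit follows sharper features of the histogram.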
The standard error estimate lfdrse assumes independence of the zz values, and should usually be considered as a lower bound on the true standard errors.
The density estimates f, f0, f0theo are scaled to add up to approximately the number of zz's. The non-null density f1 is scaled to add up to approximately (1-p0) times the number of zz's.
The empirical null estimate of standard deviation can be thrown off by irregularities in the central z-value counts. It is a good idea to inspect the z-value histogram (the first plot), and to try the "sig0" option if anomalies are suspected.
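The idea of estimating the null center and standard deviation from the central part of the f(z) fit can be sketched as follows. This is a simplified illustration, not necessarily the estimator the package uses: it relies on the fact that the log of a normal density is quadratic in z, and fits a quadratic to log f(z) over the bins inside the central quantile range [pct0, 1-pct0]. The simulated data and the names center_hat and sd_hat are hypothetical.

```r
library(splines)  # part of the standard R distribution; provides ns()

set.seed(2)
# Simulated z-scores (illustrative assumption): the null is N(0.1, 1.3^2), not N(0,1)
zz <- c(rnorm(4750, mean = 0.1, sd = 1.3), rnorm(250, mean = 4))

# Smooth fit to the histogram counts, standing in for the f(z) estimate
breaks <- seq(min(zz), max(zz), length.out = 120)
counts <- hist(zz, breaks = breaks, plot = FALSE)$counts
mid    <- (breaks[-1] + breaks[-length(breaks)]) / 2
f      <- fitted(glm(counts ~ ns(mid, df = 7), family = poisson))

# Keep only the bins inside the central quantile range [pct0, 1 - pct0] of zz
pct0    <- 1/4
lim     <- quantile(zz, c(pct0, 1 - pct0))
central <- mid >= lim[1] & mid <= lim[2]
zc      <- mid[central]
logf    <- log(f[central])

# For a normal density, log f0(z) = b0 + b1 * z + b2 * z^2 with b2 < 0
b <- unname(coef(lm(logf ~ zc + I(zc^2))))

sd_hat     <- 1 / sqrt(-2 * b[3])  # hypothetical name: estimated null standard deviation
center_hat <- -b[2] / (2 * b[3])   # hypothetical name: estimated null center
```

Because the estimate depends only on the central bins, irregular counts in that region feed directly into b[3] and hence into sd_hat, which is why inspecting the histogram first is worthwhile.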
A list with five components.
fdr |
|
fp0 |
|
Efdr |
|
cdf1 |
|
mat |
A matrix summarizing the estimates of f(z), f0(z), fdr(z), etc. at the midpoints "z." of the break discretization. These are convenient for comparisons and plotting; mat includes fdr from nulltype 1 or 2 as specified, estimates of the usual tail-area False Discovery Rates, Fdrleft and Fdrright, and also fdrtheo and f0theo. Notice that fp0 and mat contain the information for nulltype 0 and either nulltype 1 or 2, no matter which nulltype has been selected. The choice of nulltype does affect the 10th column of mat, "lfdrse", an estimate of standard error for the curve log(fdr). The 11th column of mat is an estimate "f1" of density for the non-null z-scores. Column "counts" gives the histogram counts for zz.
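The relation between the local fdr and the tail-area rates Fdrleft and Fdrright can be sketched on a grid of z values. The example below is a hedged illustration, not the package's code: it uses a known two-component mixture (an assumption) instead of fitted estimates, and computes each tail-area rate as the f-weighted average of fdr over the corresponding tail.

```r
# Grid of z values, standing in for the midpoints of the break discretization
z  <- seq(-4, 6, by = 0.1)

# Assumed mixture for illustration: 95% null N(0,1), 5% non-null N(3,1)
p0 <- 0.95
f0 <- dnorm(z)
f  <- p0 * f0 + (1 - p0) * dnorm(z, mean = 3)

# Local false discovery rate at each grid point
fdr <- p0 * f0 / f

# Tail-area rates: average of fdr, weighted by f, over the left or right tail
Fdrleft  <- cumsum(fdr * f) / cumsum(f)
Fdrright <- rev(cumsum(rev(fdr * f)) / cumsum(rev(f)))
```

In this example Fdrright is never larger than fdr at the same z, since it averages fdr over the more extreme values to the right, where fdr is smaller.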
Bradley Efron
Efron, B. (2004) Large-scale simultaneous hypothesis testing: the choice of a null hypothesis, JASA, Vol. 99, pp 96-104
Efron, B. (2005). Local False Discovery Rates, http://www-stat.stanford.edu/~brad/papers/
Efron, B. (2005). Correlation and large-scale simultaneous significance testing, http://www-stat.stanford.edu/~brad/papers/
## HIV data example
data(hivdata)
w <- locfdr(hivdata)
print(w)
## Second Simulation Example