OrderedList {OrderedList} | R Documentation |
Function OrderedList aims at the comparison of comparisons: given two
expression studies, each with one ranked (ordered) list of genes, we
might observe considerable overlap among the top-scoring genes.
OrderedList quantifies this overlap by computing a weighted similarity
score, in which the top-ranking genes contribute more to the score than
the genes further down the list. The final list of overlapping genes
consists of those probes that together contribute a given percentage of
the overall similarity score.
OrderedList(eset, B = 1000, test = "z", beta = 1, percent = 0.95, verbose = TRUE, alpha=NULL, min.weight=1e-5)
eset |
Expression set containing the two studies of interest. Use prepareData to generate eset. |
B |
Number of internal sub-samples needed to optimize alpha. |
test |
String, one of 'fc' (log ratio = log fold change), 't' (t-test with equal variances) or 'z' (t-test with regularized variances). The z-statistic is implemented as described in Efron et al. (2001). |
beta |
Either 1 or 0.5. In a comparison where the class labels of the studies match, we set beta=1. For example, in each study the first class relates to bad prognosis while the second class relates to good prognosis. If no such matching is possible, we set beta=0.5. For example, we compare a study with good/bad prognosis classes to a study in which the classes are two types of cancer tissue. |
percent |
The final list of overlapping genes consists of those probes that contribute a certain percentage to the overall similarity score. Default is percent=0.95. To get the full list of genes, set percent=1. |
verbose |
Logical value for message printing. |
alpha |
A vector of weighting parameters. If set to NULL (the default),
parameters are computed such that the top 100 to the top 2500 ranks receive
weights above min.weight. |
min.weight |
The minimal weight to be taken into account while computing scores. |
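As a rough illustration of how alpha and min.weight interact, the sketch below assumes that a gene at rank n receives the weight exp(-alpha * n) used in the similarity score, and solves for the alpha at which that weight falls to min.weight at a chosen rank. The cutoff ranks and the helper rank_weight are hypothetical; the actual grid of candidate values used inside OrderedList is not stated here.

```r
# Sketch only: assumes rank weights of the form exp(-alpha * n).
min.weight <- 1e-5

# Hypothetical cutoff ranks between the top 100 and the top 2500.
cutoffs <- c(100, 500, 2500)

# alpha at which the weight drops to min.weight exactly at each cutoff rank:
# exp(-alpha * n) = min.weight  =>  alpha = -log(min.weight) / n
alpha <- -log(min.weight) / cutoffs

# Weight assigned to rank n for a given alpha (hypothetical helper).
rank_weight <- function(n, alpha) exp(-alpha * n)

# Deeper cutoffs correspond to smaller alpha, i.e. a slower decay.
print(signif(alpha, 3))
```

Under this assumption a larger alpha concentrates the score on fewer top ranks, while a smaller alpha lets genes further down the list contribute.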
In short, the similarity measure is computed as follows: Based on two-sample test statistics like the t-test, genes within each study are ranked from most up-regulated down to most down-regulated. Thus we have one ordered list per study. Now for each rank going both from top (up-regulated end) and from bottom (down-regulated end) we count the number of overlapping genes. The total overlap A_n for rank n is defined as:
A_n = O_n (G_1,G_2) + O_n(f(G_1),f(G_2))
where G_1 and G_2 are the two ordered lists, f(G_1) and f(G_2) are the two flipped lists with the down-regulated genes on top, and O_n is the size of the overlap of the top n entries of its two arguments. A preliminary version of the weighted overlap over all ranks n is then given as:
T_α(G_1,G_2) = sum_n exp{-α n} A_n.
The final similarity score includes the case that we cannot match the classes in each study exactly and thus do not know whether up-regulation in one list corresponds to up- or down-regulation in the other list. Here parameter β comes into play:
S_α(G_1,G_2) = max{ β T_α(G_1,G_2), (1-β) T_α (G_1,f(G_2)) }.
Parameter β is set by the user but parameter α has to be tuned in a simulation using sub-samples and permutations of the original class labels.
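The three formulas above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function names are hypothetical, flipping f(G) is modeled with rev(), the gene IDs are toy values, and alpha is passed in directly rather than tuned by resampling.

```r
# O_n for n = 1..N: size of the overlap of the top-n entries of two lists.
overlap_sizes <- function(g1, g2) {
  vapply(seq_along(g1),
         function(n) length(intersect(g1[1:n], g2[1:n])),
         integer(1))
}

# T_alpha: weighted overlap counted from the top and from the bottom.
weighted_overlap <- function(g1, g2, alpha) {
  a_n <- overlap_sizes(g1, g2) + overlap_sizes(rev(g1), rev(g2))
  n <- seq_along(a_n)
  sum(exp(-alpha * n) * a_n)
}

# S_alpha: maximum over the direct and the flipped comparison.
similarity_score <- function(g1, g2, alpha, beta = 1) {
  direct  <- beta * weighted_overlap(g1, g2, alpha)
  flipped <- (1 - beta) * weighted_overlap(g1, rev(g2), alpha)
  list(score = max(direct, flipped),
       direction = if (direct >= flipped) 1 else -1)
}

# Toy ranked lists of gene IDs (made up for illustration).
g1 <- c("a", "b", "c", "d", "e")
g2 <- rev(g1)

same    <- similarity_score(g1, g1, alpha = 0.5, beta = 1)
unknown <- similarity_score(g1, g2, alpha = 0.5, beta = 0.5)
print(same$direction)
print(unknown$direction)
```

With beta = 1 the flipped term is multiplied by zero, so only the direct comparison can determine the score. With beta = 0.5 both orientations are weighted equally and the larger one wins.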
Returns an object of class OrderedList, which consists of a list with entries:
n |
Total number of genes. |
label |
The concatenated study labels as provided by eset. |
p |
The p-value specifying the significance of the similarity. |
intersect |
Vector with sorted probe IDs of the overlapping genes, which contribute percent to the overall similarity score. |
alpha |
The optimal regularization parameter alpha. |
direction |
Numerical value. Returns '1' if the similarity score is higher for the originally ordered lists and '-1' if the score is higher for the comparison of one original to one flipped list. This is of special interest if beta=0.5. |
scores |
Matrix of observed test scores with genes in rows and studies in columns. |
sim.scores |
List with four elements holding the output of the resampling with the optimal alpha. SIM.observed: The observed similarity score. SIM.alternative: Vector of observed similarity scores simulated using sub-sampling within the distinct classes of each study. SIM.random: Vector of random similarity scores simulated by randomly permuting the class labels of each study. subSample: TRUE to indicate that sub-sampling was used. |
pauc |
Vector with pAUC-scores for each candidate of the regularization parameter α. The maximal pAUC-score defines the optimal α. See also plot.OrderedList. |
call |
List with some of the input parameters. |
Xinan Yang, Claudio Lottaz, Stefanie Scheid
Yang X, Bentink S, Scheid S, and Spang R (2006): Similarities of ordered gene lists, to appear in Journal of Bioinformatics and Computational Biology.
Efron B, Tibshirani R, Storey JD, and Tusher V (2001): Empirical Bayes analysis of a microarray experiment, Journal of the American Statistical Association 96, 1151–1160.
prepareData, OL.data, OL.result, plot.OrderedList, print.OrderedList, compareLists
### Let's compare the two example studies.
### The first entries of 'out' both relate to bad prognosis.
### Hence the class labels match between the two studies
### and we can use 'OrderedList' with default 'beta=1'.
data(OL.data)
a <- prepareData(
  list(data=OL.data$breast,name="breast",var="Risk",out=c("high","low"),paired=FALSE),
  list(data=OL.data$prostate,name="prostate",var="outcome",out=c("Rec","NRec"),paired=FALSE),
  mapping=OL.data$map
)
## Not run: 
OL.result <- OrderedList(a)
## End(Not run)

### The same comparison was done beforehand.
data(OL.result)
OL.result
plot(OL.result)