GOHyperG {GOstats} | R Documentation |
Given a set of unique LocusLink identifiers, a microarray chip, and the GO category of interest, this function computes the Hypergeometric p-values for overrepresentation of the interesting genes (as indicated by the unique LocusLink identifiers) at the nodes in the induced GO graph.
GOHyperG(x, lib="hgu95av2", what="MF")
x |
A vector of unique LocusLink identifiers. |
lib |
The name of the annotation library for the chip that was used. |
what |
One of "MF", "BP", or "CC" indicating which of the GO categories the computations should be made for. |
Typical usage will be to have a microarray experiment from which a set
of interesting genes/probes has been obtained. To determine whether
these genes are overrepresented at particular GO terms, a simple
hypergeometric calculation has often been made. Two substantial issues
arise. First, and most importantly, it is not clear how to do any form
of p-value correction in this case. The tests are not independent and
the underlying structure of the GO graph presents certain problems that
still need to be addressed. The second substantial issue is that the
mappings are based on LocusLink identifiers, and hence all computations
should also be based on unique LocusLink identifiers. In GOHyperG every
attempt has been made to correct appropriately for non-uniqueness of
mappings.
The user provides a vector of unique LocusLink identifiers, and these are used, together with the name of the chip, to create the necessary counts. It is important that the correct chip be identified, as the chip determines the overall counts; all inference will be incorrect if the wrong chip is named.
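As a rough illustration of why uniqueness matters, the sketch below uses a small made-up probe-to-LocusLink mapping shaped like the list returned by as.list(hgu95av2LOCUSID) in the example. The probe names and identifiers are hypothetical, and this is not the code used inside GOHyperG; it only shows that counting probes and counting unique LLIDs can give different totals.

```r
# Hypothetical probe -> LocusLink mapping (names and IDs are invented).
# Two probes map to the same LLID and one probe has no LLID at all.
probe2ll <- list(
  "p1_at" = 1001L,
  "p2_at" = 1001L,
  "p3_at" = 2002L,
  "p4_at" = NA_integer_,
  "p5_at" = 3003L
)

# Flatten the list and drop probes without a LocusLink identifier.
all_ll <- unlist(probe2ll)
all_ll <- all_ll[!is.na(all_ll)]

# Counts should be based on unique LLIDs, not on probes.
unique_ll <- unique(all_ll)

length(all_ll)     # 4 annotated probes
length(unique_ll)  # 3 unique LocusLink identifiers
```

Basing the counts on probes rather than unique identifiers would count the duplicated gene twice, both on the chip and among the interesting genes.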
The test performed is a Hypergeometric test, using phyper, where at
each GO node we determine how many LLIDs from the chip were annotated
there and how many of the supplied LLIDs were annotated there, and
compute a p-value. This is equivalent to using Fisher's exact test.
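The calculation at a single node can be sketched in base R as follows. The counts are made up for illustration, and the tail convention (probability of seeing at least the observed number of interesting LLIDs at the node) is an assumption about how the overrepresentation p-value is formed, not a copy of the GOHyperG source. The variable names mirror the components of the returned list.

```r
# Made-up counts for one GO node (assumptions for illustration only).
numLL    <- 1000  # unique LLIDs on the chip mapped to the GO category
numInt   <- 100   # supplied (interesting) LLIDs mapped to the category
goCount  <- 50    # chip LLIDs annotated at this node
intCount <- 12    # interesting LLIDs annotated at this node

# Upper-tail hypergeometric probability P(X >= intCount):
# draw numInt LLIDs from an urn holding goCount "at node" LLIDs
# and numLL - goCount "not at node" LLIDs.
p_hyper <- phyper(intCount - 1, goCount, numLL - goCount, numInt,
                  lower.tail = FALSE)

# The same test expressed as a one-sided Fisher's exact test on the
# 2 x 2 table (rows: at node / not at node; columns: interesting / other).
tab <- matrix(c(intCount,           numInt - intCount,
                goCount - intCount, numLL - goCount - (numInt - intCount)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_hyper, p_fisher)
```

Under these assumptions the two p-values agree up to numerical tolerance, which is the sense in which the Hypergeometric test is equivalent to Fisher's exact test.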
The returned value is a list with components:
pvalues |
The ordered p-values. |
goCounts |
The vector of counts of LLIDs from the chip at each node. |
intCounts |
The vector of counts of the supplied LLIDs annotated at each node. |
numLL |
The number of unique LLIDs on the chip that are mapped to some term in the specified GO category. |
numInt |
The number of unique LLIDs from those supplied that are mapped to some term in the specified GO category. |
chip |
A string identifying the chip used. |
intLLs |
The input vector x. |
go2Affy |
A list with one element for each GO node, containing the Affymetrix identifiers associated with that node, for the whole chip (not just the interesting genes). |
R. Gentleman
library(hgu95av2)
library(GO)
w1 <- as.list(hgu95av2LOCUSID)
w2 <- unique(unlist(w1))
set.seed(123)
# pick a hundred interesting genes
myLL <- sample(w2, 100)
xx <- GOHyperG(myLL)
xx$numLL
xx$numInt
sum(xx$pvalues < 0.01)