GOHyperG {GOstats}    R Documentation

Hypergeometric Tests for GO

Description

Given a set of unique LocusLink identifiers, a microarray chip, and the GO category of interest, this function computes Hypergeometric p-values for overrepresentation of the interesting genes (as indicated by the unique LocusLink identifiers) at each node in the induced GO graph.

Usage

GOHyperG(x, lib="hgu95av2", what="MF")

Arguments

x A vector of unique LocusLink identifiers.
lib The name of the annotation library for the chip that was used.
what One of "MF", "BP", or "CC" indicating which of the GO categories the computations should be made for.

Details

Typical usage is to start from a microarray experiment that has yielded a set of interesting genes/probes. To determine whether these genes are overrepresented at particular GO terms, a simple hypergeometric calculation has often been made. Two substantial issues arise. First, and most importantly, it is not clear how to do any form of p-value correction in this case: the tests are not independent, and the underlying structure of the GO graph presents certain problems that still need to be addressed. Second, the mappings are based on LocusLink identifiers, and hence all computations should also be based on unique LocusLink identifiers. In GOHyperG every attempt has been made to correct appropriately for non-uniqueness of mappings.

The user provides a vector of unique LocusLink identifiers, and these are used, together with the name of the chip, to create the necessary counts. It is important that the correct chip be identified, because the chip determines the overall counts; if the wrong chip is named, all inference will be incorrect.
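
The counting step described above can be illustrated with a small base R sketch. It is not the code used inside GOHyperG; the probe names, LLIDs, GO labels, and the objects probe2ll and go2probe are hypothetical toy data, chosen only to show why counts should be made on unique LLIDs rather than on probes.

```r
# Hypothetical toy data: five probes, two of which map to the same LLID.
probe2ll <- c(p1 = "1001", p2 = "1001", p3 = "1002", p4 = "1003", p5 = "1004")

# Hypothetical GO node -> probe annotation for the whole chip.
go2probe <- list("GO:A" = c("p1", "p2", "p3"), "GO:B" = c("p4", "p5"))

# The user's interesting genes, given as unique LLIDs.
myLL <- c("1001", "1003")

# Collapse probes to unique LLIDs at each node, so p1 and p2 count once.
go2ll <- lapply(go2probe, function(p) unique(unname(probe2ll[p])))

# Per-node counts: LLIDs from the chip, and supplied LLIDs, at each node.
goCounts  <- sapply(go2ll, length)
intCounts <- sapply(go2ll, function(ll) sum(ll %in% myLL))

# Overall counts: unique LLIDs annotated anywhere in the category.
numLL  <- length(unique(unlist(go2ll)))
numInt <- length(intersect(myLL, unlist(go2ll)))
```

Counting probes instead would give 3 at node GO:A, whereas only 2 distinct LLIDs are annotated there.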

The test performed is a Hypergeometric test, using phyper. At each GO node we determine how many LLIDs (LocusLink identifiers) from the chip were annotated there and how many of the supplied LLIDs were annotated there, and from these counts compute a p-value. This is equivalent to using a one-sided Fisher's exact test.
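
As an illustration of the calculation at a single node, the sketch below computes an upper-tail hypergeometric p-value with phyper and compares it with a one-sided fisher.test on the corresponding 2 x 2 table. The counts are made-up values, the variable names simply mirror the components listed under Value, and the exact call made inside GOHyperG is an assumption here.

```r
# Hypothetical counts for one GO node (illustrative values only).
numLL    <- 8000  # unique LLIDs on the chip annotated in the GO category
numInt   <- 100   # supplied LLIDs annotated in the GO category
goCount  <- 200   # chip LLIDs annotated at this node
intCount <- 8     # supplied LLIDs annotated at this node

# P(X >= intCount) when numInt LLIDs are drawn without replacement from
# numLL, of which goCount are annotated at the node.
p_hyper <- phyper(intCount - 1, goCount, numLL - goCount, numInt,
                  lower.tail = FALSE)

# The same test as a one-sided Fisher's exact test.
# Rows: supplied / not supplied; columns: at node / not at node.
tab <- matrix(c(intCount, goCount - intCount,
                numInt - intCount, numLL - goCount - (numInt - intCount)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_hyper, p_fisher)
```

Note the intCount - 1: with lower.tail = FALSE, phyper returns P(X > q), so subtracting one includes the observed count in the tail.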

Value

The returned value is a list with components:

pvalues The ordered p-values.
goCounts The vector of counts of LLIDs from the chip at each node.
intCounts The vector of counts of the supplied LLIDs annotated at each node.
numLL The number of unique LLIDs on the chip that are mapped to some term in the specified GO category.
numInt The number of unique LLIDs from those supplied that are mapped to some term in the specified GO category.
chip A string identifying the chip used.
intLLs The input vector x.
go2Affy A list with one element for each GO node, containing the Affymetrix identifiers associated with that node, for the whole chip (not just the interesting genes).

Author(s)

R. Gentleman

See Also

phyper

Examples


library(hgu95av2)
library(GO)
# all unique LocusLink identifiers represented on the chip
w1 <- as.list(hgu95av2LOCUSID)
w2 <- unique(unlist(w1))
set.seed(123)
# pick a hundred interesting genes
myLL <- sample(w2, 100)
xx <- GOHyperG(myLL)
xx$numLL
xx$numInt
sum(xx$pvalues < 0.01)


[Package GOstats version 1.1.3 Index]