GOHyperG {Category} R Documentation

Hypergeometric Tests for GO

Description

Given a set of unique Entrez Gene identifiers, a microarray annotation data package name, and the GO category of interest, this function computes Hypergeometric p-values for overrepresentation of each GO term in the specified category among the GO annotations for the interesting genes (as indicated by the Entrez Gene ids).

Usage

GOHyperG(x, lib="hgu95av2", what="MF", universe=NULL)

Arguments

x A character vector of unique Entrez Gene identifiers.
lib The name of the annotation data package for the chip that was used, or "YEAST"; see Details for more information.
what One of "MF", "BP", or "CC", indicating which of the GO categories to use for the computation. In GOKEGGHyperG, what can also be "KEGG".
universe A character vector of unique Entrez Gene identifiers or NULL. This is the population (the urn) of the Hypergeometric test. When NULL (default), the population is all Entrez Gene ids in the annotation package that have a GO term annotation in the specified GO category (see details).

Details

The Entrez Gene ids given in x define the selected set of genes. The universe of Entrez Gene ids is determined by the chip annotation data package (lib) or specified by the universe argument, which must be a subset of the Entrez Gene ids represented on the chip. Both the selected genes and the universe are reduced by removing Entrez Gene ids that do not have any annotations in the specified GO category.
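The reduction step can be illustrated on toy data. This is only a sketch of the idea, not the code GOHyperG runs: the gene2go list, its ids, and the term labels are hypothetical placeholders rather than values from any annotation package.

```r
# Hypothetical toy mapping from Entrez Gene ids to terms in one GO category
gene2go <- list("10" = c("termA", "termB"),
                "20" = "termB",
                "30" = character(0),   # no annotation in this category
                "40" = "termC")

selected <- c("10", "30", "40")        # plays the role of x
universe <- names(gene2go)             # plays the role of the chip

# Keep only ids that have at least one annotation in the category
annotated <- names(gene2go)[lengths(gene2go) > 0]
universe  <- intersect(universe, annotated)
selected  <- intersect(selected, universe)

numLL  <- length(universe)   # size of the reduced universe
numInt <- length(selected)   # size of the reduced selected set
```

In this toy case the id "30" is dropped from both sets because it has no annotation in the category, so the counts that enter the test are taken over annotated genes only.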

For each GO term in the specified category that has at least one annotation in the selected gene set (x), we determine how many of its Entrez Gene annotations are in the universe set and how many are in the selected set. With these counts we perform a Hypergeometric test using phyper. This is equivalent to using Fisher's exact test.
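For a single GO term, the computation can be sketched with base R as follows. The counts are made up for illustration, and the sketch assumes that the quantity of interest is the upper-tail probability (seeing at least the observed number of selected genes at the term), which is the natural choice for overrepresentation; the variable names merely echo the components of the return value.

```r
# Toy counts (hypothetical, not taken from any real annotation package)
numLL    <- 1000  # genes in the universe annotated in the category
numInt   <- 50    # selected genes annotated in the category
goCount  <- 40    # universe genes annotated at the GO term
intCount <- 8     # selected genes annotated at the GO term

# P(X >= intCount) when drawing numInt genes from the urn without replacement.
# phyper(q, m, n, k): m "white" balls, n "black" balls, k draws.
pHyper <- phyper(intCount - 1, goCount, numLL - goCount, numInt,
                 lower.tail = FALSE)

# The same test as a one-sided Fisher's exact test on the 2 x 2 table
# (rows: selected / not selected; columns: at the term / not at the term)
tab <- matrix(c(intCount,          goCount - intCount,
                numInt - intCount, numLL - goCount - (numInt - intCount)),
              nrow = 2)
pFisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(pHyper, pFisher)
```

The two p-values agree because the one-sided Fisher's exact test conditions on the table margins and therefore evaluates the same Hypergeometric tail.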

It is important that the correct chip annotation data package be identified as it determines the GO term to Entrez Gene id mapping as well as the universe of Entrez Gene ids in the case that the universe argument is omitted.

For S. cerevisiae, if the lib argument is set to "YEAST", then comparisons and statistics are computed using common names and are with respect to all genes annotated in the S. cerevisiae genome, not with respect to any microarray chip. This will not be the right thing to do if you are working with a yeast microarray.

Value

The returned value is a list with components:

pvalues The ordered p-values.
goCounts The vector of counts of Entrez Gene ids from the universe at each node.
intCounts The vector of counts of the supplied Entrez Gene ids annotated at each GO term.
numLL The number of unique Entrez Gene ids in the universe that are mapped to some term in the specified GO category.
numInt The number of unique Entrez Gene ids in the selected gene set, x, that are mapped to some term in the specified GO category.
chip A string identifying the chip annotation data package used.
intLLs The input vector x.
go2Affy A list with one element for each GO term tested, containing the Affymetrix identifiers associated with that node, for the whole chip (not just the interesting genes). This is the same as extracting the tested GO ids from the annotation package's GO2ALLPROBES environment.

Note

Typically, one has a set of interesting genes/probes obtained from a microarray experiment and is interested in determining whether there is an overrepresentation of these genes at particular GO terms. GOHyperG carries out simple Hypergeometric tests to assess the overrepresentation of GO terms.

Two substantial issues arise. First, it is not clear how to do any form of p-value correction. The tests are not independent and the underlying structure of the GO graph presents certain problems that need to be addressed. The second substantial issue is that not all probes on a microarray map to a unique Entrez Gene identifier. In GOHyperG, every attempt to appropriately correct for non-uniqueness of mappings has been made.

In the return value, the LLs refer to LocusLink, which has been superseded by Entrez Gene. For backwards compatibility, the names of the returned elements have not been changed.

Author(s)

R. Gentleman

See Also

geneGoHyperGeoTest geneKeggHyperGeoTest geneCategoryHyperGeoTest phyper

Examples


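library(Category)
library(hgu95av2)
library(GO)
# collect the unique Entrez Gene ids represented on the chip
w1 <- as.list(hgu95av2LOCUSID)
w2 <- unique(unlist(w1))
set.seed(123)
# pick a hundred interesting genes
myLL <- sample(w2, 100)
xx <- GOHyperG(myLL)
# sizes of the universe and of the selected set after filtering
xx$numLL
xx$numInt
# number of GO terms with a p-value below 0.01
sum(xx$pvalues < 0.01)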


[Package Category version 1.4.1 Index]