GOHyperG {Category} | R Documentation |
Given a set of unique Entrez Gene identifiers, a microarray annotation data package name, and the GO category of interest, this function computes Hypergeometric p-values for overrepresentation of each GO term in the specified category among the GO annotations for the interesting genes (as indicated by the Entrez Gene ids).
GOHyperG(x, lib="hgu95av2", what="MF", universe=NULL)
x |
A character vector of unique Entrez Gene identifiers. |
lib |
The name of the annotation data package for the chip that
was used, or "YEAST"; see details for more information. |
what |
One of "MF", "BP", or "CC", indicating which of the GO
categories to use for the computation. In GOKEGGHyperG,
what can also be "KEGG". |
universe |
A character vector of unique Entrez Gene identifiers
or NULL. This is the population (the urn) of the
Hypergeometric test. When NULL (default), the population is
all Entrez Gene ids in the annotation package that have a GO term
annotation in the specified GO category (see details). |
The Entrez Gene ids given in x define the selected set of genes. The
universe of Entrez Gene ids is determined by the chip annotation data
package (lib) or specified by the universe argument, which must be a
subset of the Entrez Gene ids represented on the chip. Both the
selected genes and the universe are reduced by removing Entrez Gene
ids that do not have any annotations in the specified GO category.
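The reduction described above can be sketched in base R. This is a minimal illustration on made-up data, not the function's actual code: chip_ids, annotated and selected are hypothetical vectors standing in for the ids on the chip, the ids with an annotation in the chosen GO category, and the argument x.

```r
# Hypothetical toy data (not taken from any annotation package)
chip_ids  <- c("10", "20", "30", "40", "50")  # Entrez Gene ids on the chip
annotated <- c("10", "20", "40")              # ids annotated in the GO category
selected  <- c("10", "30", "40")              # the selected genes, i.e. x
universe  <- NULL                             # the default

# With universe = NULL, start from every id represented on the chip
if (is.null(universe)) universe <- chip_ids

# Drop ids without an annotation in the category, then restrict the
# selected set to the reduced universe
universe <- intersect(universe, annotated)
selected <- intersect(selected, universe)

universe
selected
```

In this toy case id "30" is removed from the selected set because it has no annotation in the category, so it contributes to neither count.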
For each GO term in the specified category that has at least one
annotation in the selected gene set (x), we determine how many of its
Entrez Gene annotations are in the universe set and how many are in
the selected set. With these counts we perform a Hypergeometric test
using phyper. This is equivalent to using Fisher's exact test.
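The equivalence between the phyper computation and Fisher's exact test can be checked on a single GO term with base R. The counts below are invented for illustration, and the sketch assumes an upper-tail (overrepresentation) test; the variable names N, m, n and k are not part of the function's interface.

```r
# Hypothetical counts for one GO term
N <- 1000  # universe genes annotated in the category
m <- 40    # universe genes annotated at this GO term
n <- 100   # selected genes annotated in the category
k <- 10    # selected genes annotated at this GO term

# P(X >= k): phyper returns P(X <= q), so the upper tail starts at q = k - 1
p_hyper <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# The same question as a one-sided Fisher's exact test on the 2 x 2 table
# (rows: at term / not at term; columns: selected / not selected)
tab <- matrix(c(k, n - k, m - k, N - m - (n - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_hyper, p_fisher)
```

Using k - 1 rather than k matters: with lower.tail = FALSE, phyper(k, ...) would give P(X > k) and exclude the observed count from the tail.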
It is important that the correct chip annotation data package be
identified, as it determines the GO term to Entrez Gene id mapping as
well as the universe of Entrez Gene ids when the universe argument is
omitted.
For S. cerevisiae, if the lib argument is set to "YEAST", then
comparisons and statistics are computed using common names and are
with respect to all genes annotated in the S. cerevisiae genome, not
with respect to any microarray chip. This will not be the right thing
to do if you are working with a yeast microarray.
The returned value is a list with components:
pvalues |
The ordered p-values. |
goCounts |
The vector of counts of Entrez Gene ids from the universe at each node. |
intCounts |
The vector of counts of the supplied Entrez Gene ids annotated at each GO term. |
numLL |
The number of unique Entrez Gene ids in the universe that are mapped to some term in the specified GO category. |
numInt |
The number of unique Entrez Gene ids in the selected
gene set, x, that are mapped to some term in the specified GO
category. |
chip |
A string identifying the chip annotation data package used. |
intLLs |
The input vector x. |
go2Affy |
A list with one element for each GO term tested, containing the Affymetrix identifiers associated with that node, for the whole chip (not just the interesting genes). This is the same as extracting the tested GO ids from the annotation package's GO2ALLPROBES environment. |
Typically, one has a set of interesting genes/probes obtained from a
microarray experiment and is interested in determining whether there
is an overrepresentation of these genes at particular GO terms.
GOHyperG carries out simple Hypergeometric tests to assess the
overrepresentation of GO terms.
Two substantial issues arise. First, it is not clear how to do any
form of p-value correction. The tests are not independent and the
underlying structure of the GO graph presents certain problems that
need to be addressed. The second substantial issue is that not all
probes on a microarray map to a unique Entrez Gene identifier. In
GOHyperG every attempt to appropriately correct for non-uniqueness of
mappings has been made.
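To illustrate the second issue, here is one simple way to collapse a probe list to unique Entrez Gene ids before testing. It is a sketch on made-up data and is not necessarily how GOHyperG handles non-unique mappings internally; probe2gene and interesting_probes are hypothetical names.

```r
# Hypothetical probe-to-gene mapping: two probes share gene "1001",
# and one probe has no Entrez Gene id
probe2gene <- c(p1_at = "1001", p2_at = "1001", p3_at = "2002", p4_at = NA)

interesting_probes <- c("p1_at", "p2_at", "p4_at")

# Look up the ids, drop unmapped probes, and count each gene only once
ids <- unname(probe2gene[interesting_probes])
gene_ids <- unique(ids[!is.na(ids)])

gene_ids
```

Without this step, gene "1001" would be counted twice in the selected set simply because two probes on the chip interrogate it.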
In the return value, "LL" refers to LocusLink, which has been superseded by Entrez Gene. For backwards compatibility, the names of the returned elements have not been changed.
R. Gentleman
geneGoHyperGeoTest
geneKeggHyperGeoTest
geneCategoryHyperGeoTest
phyper
library(hgu95av2)
library(GO)
w1 <- as.list(hgu95av2LOCUSID)
w2 <- unique(unlist(w1))
set.seed(123)
# pick a hundred interesting genes
myLL <- sample(w2, 100)
xx <- GOHyperG(myLL)
xx$numLL
xx$numInt
sum(xx$pvalues < 0.01)