Using Bioconductor for Annotation

Bioconductor has extensive facilities for mapping between microarray probe, gene, pathway, gene ontology, homology and other annotations.

Bioconductor has built-in representations of GO, KEGG, vendor, and other annotations, and can easily access NCBI, Biomart, UCSC, and other sources.

Package Types

Bioconductor contains many different types of annotation packages. You can browse the currently available types [here](http://www.bioconductor.org/packages/release/BiocViews.html#___PackageType) on the Bioconductor web site. There are packages that contain annotation data about a particular microarray platform (ChipDb), packages that contain gene-centered data about an organism (OrgDb), and packages that contain genome-centered data about an organism's transcriptome (TranscriptDb). This document covers typical uses for the more popular kinds of annotation package, and also describes a newer meta package that wraps access to several different kinds of packages (OrganismDb).

Sample OrgDb Workflow

The organism-wide, gene-centered packages (OrgDb packages) contain gene-centered data for an organism. These packages are the primary place for storing data that can be directly associated with genes. Let's take a closer look at the organism package for human:

library(org.Hs.eg.db)

Once loaded, each OrgDb object can be accessed using the following four methods:

To list the kinds of things that can be retrieved, use the columns method.

columns(org.Hs.eg.db)
##  [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"     
##  [5] "ACCNUM"       "ALIAS"        "CHR"          "CHRLOC"      
##  [9] "CHRLOCEND"    "ENZYME"       "MAP"          "PATH"        
## [13] "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
## [17] "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"    
## [21] "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"    
## [25] "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"        
## [29] "UCSCKG"

To list the kinds of things that can be used as keys, use the keytypes method.

keytypes(org.Hs.eg.db)
##  [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"     
##  [5] "ACCNUM"       "ALIAS"        "CHR"          "CHRLOC"      
##  [9] "CHRLOCEND"    "ENZYME"       "MAP"          "PATH"        
## [13] "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
## [17] "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"    
## [21] "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"    
## [25] "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"        
## [29] "UCSCKG"

To extract viable keys of a particular kind, use the keys method.

head(keys(org.Hs.eg.db, keytype="ENTREZID"))
## [1] "1"         "10"        "100"       "1000"      "10000"     "100008586"

Since the keys method tells us which specific values can be used as keys, we will use it here to extract a few ids for demonstrating the fourth method, select.

ids = head(keys(org.Hs.eg.db, keytype="ENTREZID"))

Once you have some ids that you want to look up data for, the select method allows you to map these ids as long as you use the columns argument to indicate what you need to know and the keytype argument to specify what kind of keys they are.

select(org.Hs.eg.db, keys=ids, columns="SYMBOL", keytype="ENTREZID")
##    ENTREZID  SYMBOL
## 1         1    A1BG
## 2        10    NAT2
## 3       100     ADA
## 4      1000    CDH2
## 5     10000    AKT3
## 6 100008586 GAGE12F

And since the columns argument can take a vector of valid columns, you can look up multiple things at once.

select(org.Hs.eg.db, keys=ids, columns=c("GENENAME", "SYMBOL"), keytype="ENTREZID")
##    ENTREZID                                              GENENAME  SYMBOL
## 1         1                                alpha-1-B glycoprotein    A1BG
## 2        10 N-acetyltransferase 2 (arylamine N-acetyltransferase)    NAT2
## 3       100                                   adenosine deaminase     ADA
## 4      1000             cadherin 2, type 1, N-cadherin (neuronal)    CDH2
## 5     10000         v-akt murine thymoma viral oncogene homolog 3    AKT3
## 6 100008586                                         G antigen 12F GAGE12F

But where would we normally get the “ids” that we would pass in to the keys argument? Usually these kinds of ids come from other datasets.
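As a minimal sketch of that situation, the base R snippet below pulls a vector of ids out of a results table. The `results` data frame and its values are hypothetical stand-ins for whatever another analysis produced; the only point is that the ids end up as a de-duplicated character vector, matching the character keys shown in the output above.

```r
# Hypothetical results table from some other analysis (values are made up).
# The id column is stored as integer here, as often happens after reading a file.
results <- data.frame(
  ENTREZID = c(1L, 10L, 100L, 10L),
  logFC    = c(2.1, -1.4, 0.9, -1.4)
)

# Convert to character and drop repeats before using the values as keys.
# Integer input is used deliberately: as.character() on large doubles can
# produce scientific notation (for example "1e+05"), which is not a usable id.
ids <- unique(as.character(results$ENTREZID))
ids
```

The resulting `ids` vector could then be passed to the keys argument of select, together with the matching keytype.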

We can also find and extract the GO ids associated with the first id (there are quite a few of these).

id = ids[1]
res <- select(org.Hs.eg.db, keys=id, columns="GO", keytype="ENTREZID")
## Warning: 'select' resulted in 1:many mapping between keys and return rows
head(res)
##   ENTREZID         GO EVIDENCE ONTOLOGY
## 1        1 GO:0003674       ND       MF
## 2        1 GO:0005576      IDA       CC
## 3        1 GO:0008150       ND       BP
## 4        1 GO:0070062      IDA       CC
## 5        1 GO:0072562      IDA       CC

You may have noticed that the above request results in many rows for just one input id. Sometimes when you use select you will ask for columns that force select to return multiple values for each key that you passed in. This is simply a consequence of the structure of the underlying data, and is usually called a one-to-many relationship (the warning above reports it as a 1:many mapping). When this happens, select() returns multiple rows for each input key, because its return value is a data.frame object. Requesting several one-to-many columns at once multiplies the number of returned rows and is not recommended, as you can very quickly generate a data.frame object that is both very large and progressively less useful. The best practice is to use select carefully, and to request no more than one one-to-many column at a time.
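The row multiplication can be illustrated without any annotation package. The sketch below uses two small, made-up tables for a single gene (the GO and PFAM values are placeholders, not real annotations) and combines them with base R's merge, which pairs every row of one table with every matching row of the other, much like a request for two multi-valued columns at once.

```r
# Toy tables for one gene; the annotation values are placeholders.
go <- data.frame(
  ENTREZID = "1",
  GO       = c("GO:A", "GO:B", "GO:C"),
  stringsAsFactors = FALSE
)
pfam <- data.frame(
  ENTREZID = "1",
  PFAM     = c("PF_X", "PF_Y"),
  stringsAsFactors = FALSE
)

# Combining both multi-valued columns yields every GO x PFAM pairing.
combined <- merge(go, pfam, by = "ENTREZID")
nrow(combined)            # 3 GO rows * 2 PFAM rows = 6 rows

# Keeping the two lookups separate holds the same information in fewer rows.
nrow(go) + nrow(pfam)     # 5 rows in total
```

With realistic data, where a gene can have dozens of GO annotations, the combined table grows much faster than the two separate ones.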

We can also use the GO.db package to find the Terms associated with those GOIDs. How will this work? Well the GO.db package will load a GO.db object that can be used in a manner similar to what we just saw with our OrgDb object org.Hs.eg.db. So we can use the same four methods that we just learned about (columns, keytypes, keys and select), to extract whatever data we need from this object.

library("GO.db")
## 
head(res$GO)  ## shows what we are using as keys
## [1] "GO:0003674" "GO:0005576" "GO:0008150" "GO:0070062" "GO:0072562"
head(select(GO.db, keys=res$GO, columns="TERM", keytype="GOID"))
##         GOID                            TERM
## 1 GO:0003674              molecular_function
## 2 GO:0005576            extracellular region
## 3 GO:0008150              biological_process
## 4 GO:0070062 extracellular vesicular exosome
## 5 GO:0072562             blood microparticle
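If you want the terms alongside the original gene-to-GO rows, one option is to join the two results on the GO id. The sketch below is an assumption about how you might do that with base R's merge; to stay self-contained it rebuilds small data frames by hand from a few of the rows printed above rather than calling select again, and the names `res_small`, `terms` and `annotated` are purely illustrative.

```r
# A few rows copied from the select() output on org.Hs.eg.db shown above.
res_small <- data.frame(
  ENTREZID = "1",
  GO       = c("GO:0003674", "GO:0005576", "GO:0008150"),
  ONTOLOGY = c("MF", "CC", "BP"),
  stringsAsFactors = FALSE
)

# The matching rows copied from the select() output on GO.db shown above.
terms <- data.frame(
  GOID = c("GO:0003674", "GO:0005576", "GO:0008150"),
  TERM = c("molecular_function", "extracellular region", "biological_process"),
  stringsAsFactors = FALSE
)

# Join on the GO id; the key column is named GO in one table and GOID in the other.
annotated <- merge(res_small, terms, by.x = "GO", by.y = "GOID")
annotated
```

Because each GO id has exactly one term, this join does not add rows; it only adds the TERM column.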

Exercises for OrgDb objects.

Exercise 1: Look at the help page for the different columns and keytypes values with: help("SYMBOL"). Now use this information and what we just described to look up the Entrez Gene ID and chromosome for the gene symbol "MSX2".

Exercise 3: In the previous exercise we had to use gene symbols as keys. In the past this has sometimes been inadvisable, because some gene symbols were used as the official symbol for more than one gene. To learn whether this is still happening, take advantage of the fact that Entrez Gene IDs are uniquely assigned, and extract all of the gene symbols and their associated Entrez Gene IDs from the org.Hs.eg.db package. Then check the symbols for redundancy.


Sample ChipDb Workflow

The following illustrates a typical R / Bioconductor session for a ChipDb package. It continues the differential expression workflow, taking a 'top table' of differentially expressed probesets and discovering the genes probed, and the Gene Ontology terms with which those genes are annotated.

First let's consider some typical probe IDs. If you have done a microarray analysis before, you have probably already run into IDs like these. They are typically manufacturer-assigned and normally only relevant to a small number of chips. Below I demonstrate on 5 probe IDs from the Affymetrix HG-U95Av2 platform, which is the platform annotated by the hgu95av2.db package loaded below.

## Affymetrix HG-U95Av2 array IDs of interest; these might be
## obtained from
##
##   tbl <- topTable(efit, coef=2)
##   ids <- tbl[["ID"]]
##
## as part of a more extensive workflow.
ids <- c("39730_at", "1635_at", "1674_at", "40504_at", "40202_at")

Load libraries as sources of annotation

library("hgu95av2.db")

And from here you can use the new ChipDb object in the same way that you learned to use an OrgDb object before. The only real change is that the ChipDb object will also have data about how platform specific probes match to specific genes. So for example:

columns(hgu95av2.db)
##  [1] "PROBEID"      "ENTREZID"     "PFAM"         "IPI"         
##  [5] "PROSITE"      "ACCNUM"       "ALIAS"        "CHR"         
##  [9] "CHRLOC"       "CHRLOCEND"    "ENZYME"       "MAP"         
## [13] "PATH"         "PMID"         "REFSEQ"       "SYMBOL"      
## [17] "UNIGENE"      "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
## [21] "GENENAME"     "UNIPROT"      "GO"           "EVIDENCE"    
## [25] "ONTOLOGY"     "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL" 
## [29] "OMIM"         "UCSCKG"
keytypes(hgu95av2.db)
##  [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"     
##  [5] "ACCNUM"       "ALIAS"        "CHR"          "CHRLOC"      
##  [9] "CHRLOCEND"    "ENZYME"       "MAP"          "PATH"        
## [13] "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
## [17] "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"    
## [21] "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"    
## [25] "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"  "PROBEID"     
## [29] "OMIM"         "UCSCKG"
cols <- c("PFAM", "GO", "SYMBOL")
select(hgu95av2.db, keys=ids, columns=cols, keytype="PROBEID")
## Warning: 'select' resulted in 1:many mapping between keys and return rows
##      PROBEID    PFAM         GO EVIDENCE ONTOLOGY SYMBOL
## 1   39730_at PF00017 GO:0000287      IDA       MF   ABL1
## 2   39730_at PF00017 GO:0003677      NAS       MF   ABL1
## 3   39730_at PF00017 GO:0003785      TAS       MF   ABL1
## 4   39730_at PF00017 GO:0004515      TAS       MF   ABL1
## 5   39730_at PF00017 GO:0004713      IDA       MF   ABL1
## 6   39730_at PF00017 GO:0004715      IDA       MF   ABL1
## 7   39730_at PF00017 GO:0005515      IPI       MF   ABL1
## 8   39730_at PF00017 GO:0005524      IDA       MF   ABL1
## 9   39730_at PF00017 GO:0005634      IDA       CC   ABL1
## 10  39730_at PF00017 GO:0005634      NAS       CC   ABL1
## 11  39730_at PF00017 GO:0005634      TAS       CC   ABL1
## 12  39730_at PF00017 GO:0005730      IDA       CC   ABL1
## 13  39730_at PF00017 GO:0005737      TAS       CC   ABL1
## 14  39730_at PF00017 GO:0005739      IEA       CC   ABL1
## 15  39730_at PF00017 GO:0005829      TAS       CC   ABL1
## 16  39730_at PF00017 GO:0006298      TAS       BP   ABL1
## 17  39730_at PF00017 GO:0006355      TAS       BP   ABL1
## 18  39730_at PF00017 GO:0006464      NAS       BP   ABL1
## 19  39730_at PF00017 GO:0006914      IEA       BP   ABL1
## 20  39730_at PF00017 GO:0006974      IDA       BP   ABL1
## 21  39730_at PF00017 GO:0006975      IDA       BP   ABL1
## 22  39730_at PF00017 GO:0007155      IEA       BP   ABL1
## 23  39730_at PF00017 GO:0007411      TAS       BP   ABL1
## 24  39730_at PF00017 GO:0007596      TAS       BP   ABL1
## 25  39730_at PF00017 GO:0008022      IPI       MF   ABL1
## 26  39730_at PF00017 GO:0008630      TAS       BP   ABL1
## 27  39730_at PF00017 GO:0010506      TAS       BP   ABL1
## 28  39730_at PF00017 GO:0015629      TAS       CC   ABL1
## 29  39730_at PF00017 GO:0017124      IPI       MF   ABL1
## 30  39730_at PF00017 GO:0018108      IDA       BP   ABL1
## 31  39730_at PF00017 GO:0019905      IPI       MF   ABL1
## 32  39730_at PF00017 GO:0030036      ISS       BP   ABL1
## 33  39730_at PF00017 GO:0030100      TAS       BP   ABL1
## 34  39730_at PF00017 GO:0030145      IDA       MF   ABL1
## 35  39730_at PF00017 GO:0030155      TAS       BP   ABL1
## 36  39730_at PF00017 GO:0031252      IEA       CC   ABL1
## 37  39730_at PF00017 GO:0031965      IEA       CC   ABL1
## 38  39730_at PF00017 GO:0038096      TAS       BP   ABL1
## 39  39730_at PF00017 GO:0042692      TAS       BP   ABL1
## 40  39730_at PF00017 GO:0042770      IDA       BP   ABL1
## 41  39730_at PF00017 GO:0043065      IDA       BP   ABL1
## 42  39730_at PF00017 GO:0045087      TAS       BP   ABL1
## 43  39730_at PF00017 GO:0048008      IEA       BP   ABL1
## 44  39730_at PF00017 GO:0048471      IDA       CC   ABL1
## 45  39730_at PF00017 GO:0050731      IDA       BP   ABL1
## 46  39730_at PF00017 GO:0051019      IPI       MF   ABL1
## 47  39730_at PF00017 GO:0051149      TAS       BP   ABL1
## 48  39730_at PF00017 GO:0051353      IDA       BP   ABL1
## 49  39730_at PF00017 GO:0051726      IEA       BP   ABL1
## 50  39730_at PF00017 GO:0070064      IDA       MF   ABL1
## 51  39730_at PF00017 GO:0070064      IPI       MF   ABL1
## 52  39730_at PF00017 GO:0071901      IDA       BP   ABL1
## 53  39730_at PF00017 GO:2000145      TAS       BP   ABL1
## 54  39730_at PF00017 GO:2000249      TAS       BP   ABL1
## 55  39730_at PF00017 GO:2001020      IDA       BP   ABL1
## 56  39730_at PF08919 GO:0000287      IDA       MF   ABL1
## 57  39730_at PF08919 GO:0003677      NAS       MF   ABL1
## 58  39730_at PF08919 GO:0003785      TAS       MF   ABL1
## 59  39730_at PF08919 GO:0004515      TAS       MF   ABL1
## 60  39730_at PF08919 GO:0004713      IDA       MF   ABL1
## 61  39730_at PF08919 GO:0004715      IDA       MF   ABL1
## 62  39730_at PF08919 GO:0005515      IPI       MF   ABL1
## 63  39730_at PF08919 GO:0005524      IDA       MF   ABL1
## 64  39730_at PF08919 GO:0005634      IDA       CC   ABL1
## 65  39730_at PF08919 GO:0005634      NAS       CC   ABL1
## 66  39730_at PF08919 GO:0005634      TAS       CC   ABL1
## 67  39730_at PF08919 GO:0005730      IDA       CC   ABL1
## 68  39730_at PF08919 GO:0005737      TAS       CC   ABL1
## 69  39730_at PF08919 GO:0005739      IEA       CC   ABL1
## 70  39730_at PF08919 GO:0005829      TAS       CC   ABL1
## 71  39730_at PF08919 GO:0006298      TAS       BP   ABL1
## 72  39730_at PF08919 GO:0006355      TAS       BP   ABL1
## 73  39730_at PF08919 GO:0006464      NAS       BP   ABL1
## 74  39730_at PF08919 GO:0006914      IEA       BP   ABL1
## 75  39730_at PF08919 GO:0006974      IDA       BP   ABL1
## 76  39730_at PF08919 GO:0006975      IDA       BP   ABL1
## 77  39730_at PF08919 GO:0007155      IEA       BP   ABL1
## 78  39730_at PF08919 GO:0007411      TAS       BP   ABL1
## 79  39730_at PF08919 GO:0007596      TAS       BP   ABL1
## 80  39730_at PF08919 GO:0008022      IPI       MF   ABL1
## 81  39730_at PF08919 GO:0008630      TAS       BP   ABL1
## 82  39730_at PF08919 GO:0010506      TAS       BP   ABL1
## 83  39730_at PF08919 GO:0015629      TAS       CC   ABL1
## 84  39730_at PF08919 GO:0017124      IPI       MF   ABL1
## 85  39730_at PF08919 GO:0018108      IDA       BP   ABL1
## 86  39730_at PF08919 GO:0019905      IPI       MF   ABL1
## 87  39730_at PF08919 GO:0030036      ISS       BP   ABL1
## 88  39730_at PF08919 GO:0030100      TAS       BP   ABL1
## 89  39730_at PF08919 GO:0030145      IDA       MF   ABL1
## 90  39730_at PF08919 GO:0030155      TAS       BP   ABL1
## 91  39730_at PF08919 GO:0031252      IEA       CC   ABL1
## 92  39730_at PF08919 GO:0031965      IEA       CC   ABL1
## 93  39730_at PF08919 GO:0038096      TAS       BP   ABL1
## 94  39730_at PF08919 GO:0042692      TAS       BP   ABL1
## 95  39730_at PF08919 GO:0042770      IDA       BP   ABL1
## 96  39730_at PF08919 GO:0043065      IDA       BP   ABL1
## 97  39730_at PF08919 GO:0045087      TAS       BP   ABL1
## 98  39730_at PF08919 GO:0048008      IEA       BP   ABL1
## 99  39730_at PF08919 GO:0048471      IDA       CC   ABL1
## 100 39730_at PF08919 GO:0050731      IDA       BP   ABL1
## 101 39730_at PF08919 GO:0051019      IPI       MF   ABL1
## 102 39730_at PF08919 GO:0051149      TAS       BP   ABL1
## 103 39730_at PF08919 GO:0051353      IDA       BP   ABL1
## 104 39730_at PF08919 GO:0051726      IEA       BP   ABL1
## 105 39730_at PF08919 GO:0070064      IDA       MF   ABL1
## 106 39730_at PF08919 GO:0070064      IPI       MF   ABL1
## 107 39730_at PF08919 GO:0071901      IDA       BP   ABL1
## 108 39730_at PF08919 GO:2000145      TAS       BP   ABL1
## 109 39730_at PF08919 GO:2000249      TAS       BP   ABL1
## 110 39730_at PF08919 GO:2001020      IDA       BP   ABL1
## 111 39730_at PF00018 GO:0000287      IDA       MF   ABL1
## 112 39730_at PF00018 GO:0003677      NAS       MF   ABL1
## 113 39730_at PF00018 GO:0003785      TAS       MF   ABL1
## 114 39730_at PF00018 GO:0004515      TAS       MF   ABL1
## 115 39730_at PF00018 GO:0004713      IDA       MF   ABL1
## 116 39730_at PF00018 GO:0004715      IDA       MF   ABL1
## 117 39730_at PF00018 GO:0005515      IPI       MF   ABL1
## 118 39730_at PF00018 GO:0005524      IDA       MF   ABL1
## 119 39730_at PF00018 GO:0005634      IDA       CC   ABL1
## 120 39730_at PF00018 GO:0005634      NAS       CC   ABL1
## 121 39730_at PF00018 GO:0005634      TAS       CC   ABL1
## 122 39730_at PF00018 GO:0005730      IDA       CC   ABL1
## 123 39730_at PF00018 GO:0005737      TAS       CC   ABL1
## 124 39730_at PF00018 GO:0005739      IEA       CC   ABL1
## 125 39730_at PF00018 GO:0005829      TAS       CC   ABL1
## 126 39730_at PF00018 GO:0006298      TAS       BP   ABL1
## 127 39730_at PF00018 GO:0006355      TAS       BP   ABL1
## 128 39730_at PF00018 GO:0006464      NAS       BP   ABL1
## 129 39730_at PF00018 GO:0006914      IEA       BP   ABL1
## 130 39730_at PF00018 GO:0006974      IDA       BP   ABL1
## 131 39730_at PF00018 GO:0006975      IDA       BP   ABL1
## 132 39730_at PF00018 GO:0007155      IEA       BP   ABL1
## 133 39730_at PF00018 GO:0007411      TAS       BP   ABL1
## 134 39730_at PF00018 GO:0007596      TAS       BP   ABL1
## 135 39730_at PF00018 GO:0008022      IPI       MF   ABL1
## 136 39730_at PF00018 GO:0008630      TAS       BP   ABL1
## 137 39730_at PF00018 GO:0010506      TAS       BP   ABL1
## 138 39730_at PF00018 GO:0015629      TAS       CC   ABL1
## 139 39730_at PF00018 GO:0017124      IPI       MF   ABL1
## 140 39730_at PF00018 GO:0018108      IDA       BP   ABL1
## 141 39730_at PF00018 GO:0019905      IPI       MF   ABL1
## 142 39730_at PF00018 GO:0030036      ISS       BP   ABL1
## 143 39730_at PF00018 GO:0030100      TAS       BP   ABL1
## 144 39730_at PF00018 GO:0030145      IDA       MF   ABL1
## 145 39730_at PF00018 GO:0030155      TAS       BP   ABL1
## 146 39730_at PF00018 GO:0031252      IEA       CC   ABL1
## 147 39730_at PF00018 GO:0031965      IEA       CC   ABL1
## 148 39730_at PF00018 GO:0038096      TAS       BP   ABL1
## 149 39730_at PF00018 GO:0042692      TAS       BP   ABL1
## 150 39730_at PF00018 GO:0042770      IDA       BP   ABL1
## 151 39730_at PF00018 GO:0043065      IDA       BP   ABL1
## 152 39730_at PF00018 GO:0045087      TAS       BP   ABL1
## 153 39730_at PF00018 GO:0048008      IEA       BP   ABL1
## 154 39730_at PF00018 GO:0048471      IDA       CC   ABL1
## 155 39730_at PF00018 GO:0050731      IDA       BP   ABL1
## 156 39730_at PF00018 GO:0051019      IPI       MF   ABL1
## 157 39730_at PF00018 GO:0051149      TAS       BP   ABL1
## 158 39730_at PF00018 GO:0051353      IDA       BP   ABL1
## 159 39730_at PF00018 GO:0051726      IEA       BP   ABL1
## 160 39730_at PF00018 GO:0070064      IDA       MF   ABL1
## 161 39730_at PF00018 GO:0070064      IPI       MF   ABL1
## 162 39730_at PF00018 GO:0071901      IDA       BP   ABL1
## 163 39730_at PF00018 GO:2000145      TAS       BP   ABL1
## 164 39730_at PF00018 GO:2000249      TAS       BP   ABL1
## 165 39730_at PF00018 GO:2001020      IDA       BP   ABL1
## 166 39730_at PF07714 GO:0000287      IDA       MF   ABL1
## 167 39730_at PF07714 GO:0003677      NAS       MF   ABL1
## 168 39730_at PF07714 GO:0003785      TAS       MF   ABL1
## 169 39730_at PF07714 GO:0004515      TAS       MF   ABL1
## 170 39730_at PF07714 GO:0004713      IDA       MF   ABL1
## 171 39730_at PF07714 GO:0004715      IDA       MF   ABL1
## 172 39730_at PF07714 GO:0005515      IPI       MF   ABL1
## 173 39730_at PF07714 GO:0005524      IDA       MF   ABL1
## 174 39730_at PF07714 GO:0005634      IDA       CC   ABL1
## 175 39730_at PF07714 GO:0005634      NAS       CC   ABL1
## 176 39730_at PF07714 GO:0005634      TAS       CC   ABL1
## 177 39730_at PF07714 GO:0005730      IDA       CC   ABL1
## 178 39730_at PF07714 GO:0005737      TAS       CC   ABL1
## 179 39730_at PF07714 GO:0005739      IEA       CC   ABL1
## 180 39730_at PF07714 GO:0005829      TAS       CC   ABL1
## 181 39730_at PF07714 GO:0006298      TAS       BP   ABL1
## 182 39730_at PF07714 GO:0006355      TAS       BP   ABL1
## 183 39730_at PF07714 GO:0006464      NAS       BP   ABL1
## 184 39730_at PF07714 GO:0006914      IEA       BP   ABL1
## 185 39730_at PF07714 GO:0006974      IDA       BP   ABL1
## 186 39730_at PF07714 GO:0006975      IDA       BP   ABL1
## 187 39730_at PF07714 GO:0007155      IEA       BP   ABL1
## 188 39730_at PF07714 GO:0007411      TAS       BP   ABL1
## 189 39730_at PF07714 GO:0007596      TAS       BP   ABL1
## 190 39730_at PF07714 GO:0008022      IPI       MF   ABL1
## 191 39730_at PF07714 GO:0008630      TAS       BP   ABL1
## 192 39730_at PF07714 GO:0010506      TAS       BP   ABL1
## 193 39730_at PF07714 GO:0015629      TAS       CC   ABL1
## 194 39730_at PF07714 GO:0017124      IPI       MF   ABL1
## 195 39730_at PF07714 GO:0018108      IDA       BP   ABL1
## 196 39730_at PF07714 GO:0019905      IPI       MF   ABL1
## 197 39730_at PF07714 GO:0030036      ISS       BP   ABL1
## 198 39730_at PF07714 GO:0030100      TAS       BP   ABL1
## 199 39730_at PF07714 GO:0030145      IDA       MF   ABL1
## 200 39730_at PF07714 GO:0030155      TAS       BP   ABL1
## 201 39730_at PF07714 GO:0031252      IEA       CC   ABL1
## 202 39730_at PF07714 GO:0031965      IEA       CC   ABL1
## 203 39730_at PF07714 GO:0038096      TAS       BP   ABL1
## 204 39730_at PF07714 GO:0042692      TAS       BP   ABL1
## 205 39730_at PF07714 GO:0042770      IDA       BP   ABL1
## 206 39730_at PF07714 GO:0043065      IDA       BP   ABL1
## 207 39730_at PF07714 GO:0045087      TAS       BP   ABL1
## 208 39730_at PF07714 GO:0048008      IEA       BP   ABL1
## 209 39730_at PF07714 GO:0048471      IDA       CC   ABL1
## 210 39730_at PF07714 GO:0050731      IDA       BP   ABL1
## 211 39730_at PF07714 GO:0051019      IPI       MF   ABL1
## 212 39730_at PF07714 GO:0051149      TAS       BP   ABL1
## 213 39730_at PF07714 GO:0051353      IDA       BP   ABL1
## 214 39730_at PF07714 GO:0051726      IEA       BP   ABL1
## 215 39730_at PF07714 GO:0070064      IDA       MF   ABL1
## 216 39730_at PF07714 GO:0070064      IPI       MF   ABL1
## 217 39730_at PF07714 GO:0071901      IDA       BP   ABL1
## 218 39730_at PF07714 GO:2000145      TAS       BP   ABL1
## 219 39730_at PF07714 GO:2000249      TAS       BP   ABL1
## 220 39730_at PF07714 GO:2001020      IDA       BP   ABL1
## 221  1635_at PF00017 GO:0000287      IDA       MF   ABL1
## 222  1635_at PF00017 GO:0003677      NAS       MF   ABL1
## 223  1635_at PF00017 GO:0003785      TAS       MF   ABL1
## 224  1635_at PF00017 GO:0004515      TAS       MF   ABL1
## 225  1635_at PF00017 GO:0004713      IDA       MF   ABL1
## 226  1635_at PF00017 GO:0004715      IDA       MF   ABL1
## 227  1635_at PF00017 GO:0005515      IPI       MF   ABL1
## 228  1635_at PF00017 GO:0005524      IDA       MF   ABL1
## 229  1635_at PF00017 GO:0005634      IDA       CC   ABL1
## 230  1635_at PF00017 GO:0005634      NAS       CC   ABL1
## 231  1635_at PF00017 GO:0005634      TAS       CC   ABL1
## 232  1635_at PF00017 GO:0005730      IDA       CC   ABL1
## 233  1635_at PF00017 GO:0005737      TAS       CC   ABL1
## 234  1635_at PF00017 GO:0005739      IEA       CC   ABL1
## 235  1635_at PF00017 GO:0005829      TAS       CC   ABL1
## 236  1635_at PF00017 GO:0006298      TAS       BP   ABL1
## 237  1635_at PF00017 GO:0006355      TAS       BP   ABL1
## 238  1635_at PF00017 GO:0006464      NAS       BP   ABL1
## 239  1635_at PF00017 GO:0006914      IEA       BP   ABL1
## 240  1635_at PF00017 GO:0006974      IDA       BP   ABL1
## 241  1635_at PF00017 GO:0006975      IDA       BP   ABL1
## 242  1635_at PF00017 GO:0007155      IEA       BP   ABL1
## 243  1635_at PF00017 GO:0007411      TAS       BP   ABL1
## 244  1635_at PF00017 GO:0007596      TAS       BP   ABL1
## 245  1635_at PF00017 GO:0008022      IPI       MF   ABL1
## 246  1635_at PF00017 GO:0008630      TAS       BP   ABL1
## 247  1635_at PF00017 GO:0010506      TAS       BP   ABL1
## 248  1635_at PF00017 GO:0015629      TAS       CC   ABL1
## 249  1635_at PF00017 GO:0017124      IPI       MF   ABL1
## 250  1635_at PF00017 GO:0018108      IDA       BP   ABL1
## 251  1635_at PF00017 GO:0019905      IPI       MF   ABL1
## 252  1635_at PF00017 GO:0030036      ISS       BP   ABL1
## 253  1635_at PF00017 GO:0030100      TAS       BP   ABL1
## 254  1635_at PF00017 GO:0030145      IDA       MF   ABL1
## 255  1635_at PF00017 GO:0030155      TAS       BP   ABL1
## 256  1635_at PF00017 GO:0031252      IEA       CC   ABL1
## 257  1635_at PF00017 GO:0031965      IEA       CC   ABL1
## 258  1635_at PF00017 GO:0038096      TAS       BP   ABL1
## 259  1635_at PF00017 GO:0042692      TAS       BP   ABL1
## 260  1635_at PF00017 GO:0042770      IDA       BP   ABL1
## 261  1635_at PF00017 GO:0043065      IDA       BP   ABL1
## 262  1635_at PF00017 GO:0045087      TAS       BP   ABL1
## 263  1635_at PF00017 GO:0048008      IEA       BP   ABL1
## 264  1635_at PF00017 GO:0048471      IDA       CC   ABL1
## 265  1635_at PF00017 GO:0050731      IDA       BP   ABL1
## 266  1635_at PF00017 GO:0051019      IPI       MF   ABL1
## 267  1635_at PF00017 GO:0051149      TAS       BP   ABL1
## 268  1635_at PF00017 GO:0051353      IDA       BP   ABL1
## 269  1635_at PF00017 GO:0051726      IEA       BP   ABL1
## 270  1635_at PF00017 GO:0070064      IDA       MF   ABL1
## 271  1635_at PF00017 GO:0070064      IPI       MF   ABL1
## 272  1635_at PF00017 GO:0071901      IDA       BP   ABL1
## 273  1635_at PF00017 GO:2000145      TAS       BP   ABL1
## 274  1635_at PF00017 GO:2000249      TAS       BP   ABL1
## 275  1635_at PF00017 GO:2001020      IDA       BP   ABL1
## 276  1635_at PF08919 GO:0000287      IDA       MF   ABL1
## 277  1635_at PF08919 GO:0003677      NAS       MF   ABL1
## 278  1635_at PF08919 GO:0003785      TAS       MF   ABL1
## 279  1635_at PF08919 GO:0004515      TAS       MF   ABL1
## 280  1635_at PF08919 GO:0004713      IDA       MF   ABL1
## 281  1635_at PF08919 GO:0004715      IDA       MF   ABL1
## 282  1635_at PF08919 GO:0005515      IPI       MF   ABL1
## 283  1635_at PF08919 GO:0005524      IDA       MF   ABL1
## 284  1635_at PF08919 GO:0005634      IDA       CC   ABL1
## 285  1635_at PF08919 GO:0005634      NAS       CC   ABL1
## 286  1635_at PF08919 GO:0005634      TAS       CC   ABL1
## 287  1635_at PF08919 GO:0005730      IDA       CC   ABL1
## 288  1635_at PF08919 GO:0005737      TAS       CC   ABL1
## 289  1635_at PF08919 GO:0005739      IEA       CC   ABL1
## 290  1635_at PF08919 GO:0005829      TAS       CC   ABL1
## 291  1635_at PF08919 GO:0006298      TAS       BP   ABL1
## 292  1635_at PF08919 GO:0006355      TAS       BP   ABL1
## 293  1635_at PF08919 GO:0006464      NAS       BP   ABL1
## 294  1635_at PF08919 GO:0006914      IEA       BP   ABL1
## 295  1635_at PF08919 GO:0006974      IDA       BP   ABL1
## 296  1635_at PF08919 GO:0006975      IDA       BP   ABL1
## 297  1635_at PF08919 GO:0007155      IEA       BP   ABL1
## 298  1635_at PF08919 GO:0007411      TAS       BP   ABL1
## 299  1635_at PF08919 GO:0007596      TAS       BP   ABL1
## 300  1635_at PF08919 GO:0008022      IPI       MF   ABL1
## 301  1635_at PF08919 GO:0008630      TAS       BP   ABL1
## 302  1635_at PF08919 GO:0010506      TAS       BP   ABL1
## 303  1635_at PF08919 GO:0015629      TAS       CC   ABL1
## 304  1635_at PF08919 GO:0017124      IPI       MF   ABL1
## 305  1635_at PF08919 GO:0018108      IDA       BP   ABL1
## 306  1635_at PF08919 GO:0019905      IPI       MF   ABL1
## 307  1635_at PF08919 GO:0030036      ISS       BP   ABL1
## 308  1635_at PF08919 GO:0030100      TAS       BP   ABL1
## 309  1635_at PF08919 GO:0030145      IDA       MF   ABL1
## 310  1635_at PF08919 GO:0030155      TAS       BP   ABL1
## 311  1635_at PF08919 GO:0031252      IEA       CC   ABL1
## 312  1635_at PF08919 GO:0031965      IEA       CC   ABL1
## 313  1635_at PF08919 GO:0038096      TAS       BP   ABL1
## 314  1635_at PF08919 GO:0042692      TAS       BP   ABL1
## 315  1635_at PF08919 GO:0042770      IDA       BP   ABL1
## 316  1635_at PF08919 GO:0043065      IDA       BP   ABL1
## 317  1635_at PF08919 GO:0045087      TAS       BP   ABL1
## 318  1635_at PF08919 GO:0048008      IEA       BP   ABL1
## 319  1635_at PF08919 GO:0048471      IDA       CC   ABL1
## 320  1635_at PF08919 GO:0050731      IDA       BP   ABL1
## 321  1635_at PF08919 GO:0051019      IPI       MF   ABL1
## 322  1635_at PF08919 GO:0051149      TAS       BP   ABL1
## 323  1635_at PF08919 GO:0051353      IDA       BP   ABL1
## 324  1635_at PF08919 GO:0051726      IEA       BP   ABL1
## 325  1635_at PF08919 GO:0070064      IDA       MF   ABL1
## 326  1635_at PF08919 GO:0070064      IPI       MF   ABL1
## 327  1635_at PF08919 GO:0071901      IDA       BP   ABL1
## 328  1635_at PF08919 GO:2000145      TAS       BP   ABL1
## 329  1635_at PF08919 GO:2000249      TAS       BP   ABL1
## 330  1635_at PF08919 GO:2001020      IDA       BP   ABL1
## 331  1635_at PF00018 GO:0000287      IDA       MF   ABL1
## 332  1635_at PF00018 GO:0003677      NAS       MF   ABL1
## 333  1635_at PF00018 GO:0003785      TAS       MF   ABL1
## 334  1635_at PF00018 GO:0004515      TAS       MF   ABL1
## 335  1635_at PF00018 GO:0004713      IDA       MF   ABL1
## 336  1635_at PF00018 GO:0004715      IDA       MF   ABL1
## 337  1635_at PF00018 GO:0005515      IPI       MF   ABL1
## 338  1635_at PF00018 GO:0005524      IDA       MF   ABL1
## 339  1635_at PF00018 GO:0005634      IDA       CC   ABL1
## 340  1635_at PF00018 GO:0005634      NAS       CC   ABL1
## 341  1635_at PF00018 GO:0005634      TAS       CC   ABL1
## 342  1635_at PF00018 GO:0005730      IDA       CC   ABL1
## 343  1635_at PF00018 GO:0005737      TAS       CC   ABL1
## 344  1635_at PF00018 GO:0005739      IEA       CC   ABL1
## 345  1635_at PF00018 GO:0005829      TAS       CC   ABL1
## 346  1635_at PF00018 GO:0006298      TAS       BP   ABL1
## 347  1635_at PF00018 GO:0006355      TAS       BP   ABL1
## 348  1635_at PF00018 GO:0006464      NAS       BP   ABL1
## 349  1635_at PF00018 GO:0006914      IEA       BP   ABL1
## 350  1635_at PF00018 GO:0006974      IDA       BP   ABL1
## 351  1635_at PF00018 GO:0006975      IDA       BP   ABL1
## 352  1635_at PF00018 GO:0007155      IEA       BP   ABL1
## 353  1635_at PF00018 GO:0007411      TAS       BP   ABL1
## 354  1635_at PF00018 GO:0007596      TAS       BP   ABL1
## 355  1635_at PF00018 GO:0008022      IPI       MF   ABL1
## 356  1635_at PF00018 GO:0008630      TAS       BP   ABL1
## 357  1635_at PF00018 GO:0010506      TAS       BP   ABL1
## 358  1635_at PF00018 GO:0015629      TAS       CC   ABL1
## 359  1635_at PF00018 GO:0017124      IPI       MF   ABL1
## 360  1635_at PF00018 GO:0018108      IDA       BP   ABL1
## 361  1635_at PF00018 GO:0019905      IPI       MF   ABL1
## 362  1635_at PF00018 GO:0030036      ISS       BP   ABL1
## 363  1635_at PF00018 GO:0030100      TAS       BP   ABL1
## 364  1635_at PF00018 GO:0030145      IDA       MF   ABL1
## 365  1635_at PF00018 GO:0030155      TAS       BP   ABL1
## 366  1635_at PF00018 GO:0031252      IEA       CC   ABL1
## 367  1635_at PF00018 GO:0031965      IEA       CC   ABL1
## 368  1635_at PF00018 GO:0038096      TAS       BP   ABL1
## 369  1635_at PF00018 GO:0042692      TAS       BP   ABL1
## 370  1635_at PF00018 GO:0042770      IDA       BP   ABL1
## 371  1635_at PF00018 GO:0043065      IDA       BP   ABL1
## 372  1635_at PF00018 GO:0045087      TAS       BP   ABL1
## 373  1635_at PF00018 GO:0048008      IEA       BP   ABL1
## 374  1635_at PF00018 GO:0048471      IDA       CC   ABL1
## 375  1635_at PF00018 GO:0050731      IDA       BP   ABL1
## 376  1635_at PF00018 GO:0051019      IPI       MF   ABL1
## 377  1635_at PF00018 GO:0051149      TAS       BP   ABL1
## 378  1635_at PF00018 GO:0051353      IDA       BP   ABL1
## 379  1635_at PF00018 GO:0051726      IEA       BP   ABL1
## 380  1635_at PF00018 GO:0070064      IDA       MF   ABL1
## 381  1635_at PF00018 GO:0070064      IPI       MF   ABL1
## 382  1635_at PF00018 GO:0071901      IDA       BP   ABL1
## 383  1635_at PF00018 GO:2000145      TAS       BP   ABL1
## 384  1635_at PF00018 GO:2000249      TAS       BP   ABL1
## 385  1635_at PF00018 GO:2001020      IDA       BP   ABL1
## 386  1635_at PF07714 GO:0000287      IDA       MF   ABL1
## 387  1635_at PF07714 GO:0003677      NAS       MF   ABL1
## 388  1635_at PF07714 GO:0003785      TAS       MF   ABL1
## 389  1635_at PF07714 GO:0004515      TAS       MF   ABL1
## 390  1635_at PF07714 GO:0004713      IDA       MF   ABL1
## 391  1635_at PF07714 GO:0004715      IDA       MF   ABL1
## 392  1635_at PF07714 GO:0005515      IPI       MF   ABL1
## 393  1635_at PF07714 GO:0005524      IDA       MF   ABL1
## 394  1635_at PF07714 GO:0005634      IDA       CC   ABL1
## 395  1635_at PF07714 GO:0005634      NAS       CC   ABL1
## 396  1635_at PF07714 GO:0005634      TAS       CC   ABL1
## 397  1635_at PF07714 GO:0005730      IDA       CC   ABL1
## 398  1635_at PF07714 GO:0005737      TAS       CC   ABL1
## 399  1635_at PF07714 GO:0005739      IEA       CC   ABL1
## 400  1635_at PF07714 GO:0005829      TAS       CC   ABL1
## 401  1635_at PF07714 GO:0006298      TAS       BP   ABL1
## 402  1635_at PF07714 GO:0006355      TAS       BP   ABL1
## 403  1635_at PF07714 GO:0006464      NAS       BP   ABL1
## 404  1635_at PF07714 GO:0006914      IEA       BP   ABL1
## 405  1635_at PF07714 GO:0006974      IDA       BP   ABL1
## 406  1635_at PF07714 GO:0006975      IDA       BP   ABL1
## 407  1635_at PF07714 GO:0007155      IEA       BP   ABL1
## 408  1635_at PF07714 GO:0007411      TAS       BP   ABL1
## 409  1635_at PF07714 GO:0007596      TAS       BP   ABL1
## 410  1635_at PF07714 GO:0008022      IPI       MF   ABL1
## 411  1635_at PF07714 GO:0008630      TAS       BP   ABL1
## 412  1635_at PF07714 GO:0010506      TAS       BP   ABL1
## 413  1635_at PF07714 GO:0015629      TAS       CC   ABL1
## 414  1635_at PF07714 GO:0017124      IPI       MF   ABL1
## 415  1635_at PF07714 GO:0018108      IDA       BP   ABL1
## 416  1635_at PF07714 GO:0019905      IPI       MF   ABL1
## 417  1635_at PF07714 GO:0030036      ISS       BP   ABL1
## 418  1635_at PF07714 GO:0030100      TAS       BP   ABL1
## 419  1635_at PF07714 GO:0030145      IDA       MF   ABL1
## 420  1635_at PF07714 GO:0030155      TAS       BP   ABL1
## 421  1635_at PF07714 GO:0031252      IEA       CC   ABL1
## 422  1635_at PF07714 GO:0031965      IEA       CC   ABL1
## 423  1635_at PF07714 GO:0038096      TAS       BP   ABL1
## 424  1635_at PF07714 GO:0042692      TAS       BP   ABL1
## 425  1635_at PF07714 GO:0042770      IDA       BP   ABL1
## 426  1635_at PF07714 GO:0043065      IDA       BP   ABL1
## 427  1635_at PF07714 GO:0045087      TAS       BP   ABL1
## 428  1635_at PF07714 GO:0048008      IEA       BP   ABL1
## 429  1635_at PF07714 GO:0048471      IDA       CC   ABL1
## 430  1635_at PF07714 GO:0050731      IDA       BP   ABL1
## 431  1635_at PF07714 GO:0051019      IPI       MF   ABL1
## 432  1635_at PF07714 GO:0051149      TAS       BP   ABL1
## 433  1635_at PF07714 GO:0051353      IDA       BP   ABL1
## 434  1635_at PF07714 GO:0051726      IEA       BP   ABL1
## 435  1635_at PF07714 GO:0070064      IDA       MF   ABL1
## 436  1635_at PF07714 GO:0070064      IPI       MF   ABL1
## 437  1635_at PF07714 GO:0071901      IDA       BP   ABL1
## 438  1635_at PF07714 GO:2000145      TAS       BP   ABL1
## 439  1635_at PF07714 GO:2000249      TAS       BP   ABL1
## 440  1635_at PF07714 GO:2001020      IDA       BP   ABL1
## 441  1674_at PF07714 GO:0004713      EXP       MF   YES1
## 442  1674_at PF07714 GO:0004713      TAS       MF   YES1
## 443  1674_at PF07714 GO:0004715      TAS       MF   YES1
## 444  1674_at PF07714 GO:0005154      IEA       MF   YES1
## 445  1674_at PF07714 GO:0005515      IPI       MF   YES1
## 446  1674_at PF07714 GO:0005524      IEA       MF   YES1
## 447  1674_at PF07714 GO:0005737      IDA       CC   YES1
## 448  1674_at PF07714 GO:0005794      IDA       CC   YES1
## 449  1674_at PF07714 GO:0005815      IEA       CC   YES1
## 450  1674_at PF07714 GO:0005829      TAS       CC   YES1
## 451  1674_at PF07714 GO:0005886      IDA       CC   YES1
## 452  1674_at PF07714 GO:0006464      TAS       BP   YES1
## 453  1674_at PF07714 GO:0007596      TAS       BP   YES1
## 454  1674_at PF07714 GO:0015758      IEA       BP   YES1
## 455  1674_at PF07714 GO:0018108      EXP       BP   YES1
## 456  1674_at PF07714 GO:0018108      TAS       BP   YES1
## 457  1674_at PF07714 GO:0019899      IPI       MF   YES1
## 458  1674_at PF07714 GO:0031295      TAS       BP   YES1
## 459  1674_at PF07714 GO:0038096      TAS       BP   YES1
## 460  1674_at PF07714 GO:0043114      TAS       BP   YES1
## 461  1674_at PF07714 GO:0044325      IPI       MF   YES1
## 462  1674_at PF07714 GO:0045087      TAS       BP   YES1
## 463  1674_at PF07714 GO:0046777      IEA       BP   YES1
## 464  1674_at PF07714 GO:0050900      TAS       BP   YES1
## 465  1674_at PF07714 GO:0070062      IDA       CC   YES1
## 466  1674_at PF00018 GO:0004713      EXP       MF   YES1
## 467  1674_at PF00018 GO:0004713      TAS       MF   YES1
## 468  1674_at PF00018 GO:0004715      TAS       MF   YES1
## 469  1674_at PF00018 GO:0005154      IEA       MF   YES1
## 470  1674_at PF00018 GO:0005515      IPI       MF   YES1
## 471  1674_at PF00018 GO:0005524      IEA       MF   YES1
## 472  1674_at PF00018 GO:0005737      IDA       CC   YES1
## 473  1674_at PF00018 GO:0005794      IDA       CC   YES1
## 474  1674_at PF00018 GO:0005815      IEA       CC   YES1
## 475  1674_at PF00018 GO:0005829      TAS       CC   YES1
## 476  1674_at PF00018 GO:0005886      IDA       CC   YES1
## 477  1674_at PF00018 GO:0006464      TAS       BP   YES1
## 478  1674_at PF00018 GO:0007596      TAS       BP   YES1
## 479  1674_at PF00018 GO:0015758      IEA       BP   YES1
## 480  1674_at PF00018 GO:0018108      EXP       BP   YES1
## 481  1674_at PF00018 GO:0018108      TAS       BP   YES1
## 482  1674_at PF00018 GO:0019899      IPI       MF   YES1
## 483  1674_at PF00018 GO:0031295      TAS       BP   YES1
## 484  1674_at PF00018 GO:0038096      TAS       BP   YES1
## 485  1674_at PF00018 GO:0043114      TAS       BP   YES1
## 486  1674_at PF00018 GO:0044325      IPI       MF   YES1
## 487  1674_at PF00018 GO:0045087      TAS       BP   YES1
## 488  1674_at PF00018 GO:0046777      IEA       BP   YES1
## 489  1674_at PF00018 GO:0050900      TAS       BP   YES1
## 490  1674_at PF00018 GO:0070062      IDA       CC   YES1
## 491  1674_at PF00017 GO:0004713      EXP       MF   YES1
## 492  1674_at PF00017 GO:0004713      TAS       MF   YES1
## 493  1674_at PF00017 GO:0004715      TAS       MF   YES1
## 494  1674_at PF00017 GO:0005154      IEA       MF   YES1
## 495  1674_at PF00017 GO:0005515      IPI       MF   YES1
## 496  1674_at PF00017 GO:0005524      IEA       MF   YES1
## 497  1674_at PF00017 GO:0005737      IDA       CC   YES1
## 498  1674_at PF00017 GO:0005794      IDA       CC   YES1
## 499  1674_at PF00017 GO:0005815      IEA       CC   YES1
## 500  1674_at PF00017 GO:0005829      TAS       CC   YES1
## 501  1674_at PF00017 GO:0005886      IDA       CC   YES1
## 502  1674_at PF00017 GO:0006464      TAS       BP   YES1
## 503  1674_at PF00017 GO:0007596      TAS       BP   YES1
## 504  1674_at PF00017 GO:0015758      IEA       BP   YES1
## 505  1674_at PF00017 GO:0018108      EXP       BP   YES1
## 506  1674_at PF00017 GO:0018108      TAS       BP   YES1
## 507  1674_at PF00017 GO:0019899      IPI       MF   YES1
## 508  1674_at PF00017 GO:0031295      TAS       BP   YES1
## 509  1674_at PF00017 GO:0038096      TAS       BP   YES1
## 510  1674_at PF00017 GO:0043114      TAS       BP   YES1
## 511  1674_at PF00017 GO:0044325      IPI       MF   YES1
## 512  1674_at PF00017 GO:0045087      TAS       BP   YES1
## 513  1674_at PF00017 GO:0046777      IEA       BP   YES1
## 514  1674_at PF00017 GO:0050900      TAS       BP   YES1
## 515  1674_at PF00017 GO:0070062      IDA       CC   YES1
## 516 40504_at PF01731 GO:0004064      IEA       MF   PON2
## 517 40504_at PF01731 GO:0005576      IEA       CC   PON2
## 518 40504_at PF01731 GO:0005634      IEA       CC   PON2
## 519 40504_at PF01731 GO:0005739      IEA       CC   PON2
## 520 40504_at PF01731 GO:0005764      IEA       CC   PON2
## 521 40504_at PF01731 GO:0005886      IDA       CC   PON2
## 522 40504_at PF01731 GO:0006979      IEA       BP   PON2
## 523 40504_at PF01731 GO:0019439      IDA       BP   PON2
## 524 40504_at PF01731 GO:0042802      IDA       MF   PON2
## 525 40504_at PF01731 GO:0046872      IEA       MF   PON2
## 526 40202_at    <NA> GO:0003677      IEA       MF   KLF9
## 527 40202_at    <NA> GO:0003700      TAS       MF   KLF9
## 528 40202_at    <NA> GO:0005634      IDA       CC   KLF9
## 529 40202_at    <NA> GO:0005737      IDA       CC   KLF9
## 530 40202_at    <NA> GO:0005886      IDA       CC   KLF9
## 531 40202_at    <NA> GO:0006351      IEA       BP   KLF9
## 532 40202_at    <NA> GO:0006357      TAS       BP   KLF9
## 533 40202_at    <NA> GO:0007566      IEA       BP   KLF9
## 534 40202_at    <NA> GO:0046872      IEA       MF   KLF9
## 535 40202_at    <NA> GO:0050847      IEA       BP   KLF9
## 536 40202_at    <NA> GO:0097067      IDA       BP   KLF9

Exercises for ChipDb objects.

Exercise 2: Examine the gene symbols for both the hgu95av2.db and the org.Hs.eg.db packages. Which one has more gene symbols? Which one has more gene symbols that can be mapped to an entrez gene ID? Which object seems to contain more information?

[ Back to top ]

Sample TranscriptDb Workflow

The genome centered TranscriptDb packages support the same interface as the ChipDb and OrgDb packages.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene ## done for convenience
keys <- head(keys(txdb, keytype="GENEID"), n=2)
columns <- c("TXNAME", "TXSTART","TXSTRAND")
select(txdb, keys, columns, keytype="GENEID")
## Warning: 'select' resulted in 1:many mapping between keys and return rows
##   GENEID     TXNAME TXSTRAND  TXSTART
## 1      1 uc002qsd.4        - 58858172
## 2      1 uc002qsf.2        - 58859832
## 3     10 uc003wyw.1        + 18248755

In addition to supporting the standard set of methods (select, keytypes, keys and columns), TranscriptDb objects also support methods that retrieve the annotations as ranges. These accessors fall into two basic categories. The most basic return annotations as GRanges objects. Some examples of these are: transcripts(), exons() and cds().

This for example will return all the transcripts as ranges:

transcripts(txdb)
## GRanges with 82960 ranges and 2 metadata columns:
##           seqnames               ranges strand   |     tx_id     tx_name
##              <Rle>            <IRanges>  <Rle>   | <integer> <character>
##       [1]     chr1     [ 11874,  14409]      +   |         1  uc001aaa.3
##       [2]     chr1     [ 11874,  14409]      +   |         2  uc010nxq.1
##       [3]     chr1     [ 11874,  14409]      +   |         3  uc010nxr.1
##       [4]     chr1     [ 69091,  70008]      +   |         4  uc001aal.1
##       [5]     chr1     [321084, 321115]      +   |         5  uc001aaq.2
##       ...      ...                  ...    ... ...       ...         ...
##   [82956]     chrY [27605645, 27605678]      -   |     78803  uc004fwx.1
##   [82957]     chrY [27606394, 27606421]      -   |     78804  uc022cpc.1
##   [82958]     chrY [27607404, 27607432]      -   |     78805  uc004fwz.3
##   [82959]     chrY [27635919, 27635954]      -   |     78806  uc022cpd.1
##   [82960]     chrY [59358329, 59360854]      -   |     78807  uc011ncc.1
##   ---
##   seqlengths:
##                    chr1                 chr2 ...       chrUn_gl000249
##               249250621            243199373 ...                38502

And this will return all the exons as ranges:

exons(txdb)
## GRanges with 289969 ranges and 1 metadata column:
##            seqnames               ranges strand   |   exon_id
##               <Rle>            <IRanges>  <Rle>   | <integer>
##        [1]     chr1       [11874, 12227]      +   |         1
##        [2]     chr1       [12595, 12721]      +   |         2
##        [3]     chr1       [12613, 12721]      +   |         3
##        [4]     chr1       [12646, 12697]      +   |         4
##        [5]     chr1       [13221, 14409]      +   |         5
##        ...      ...                  ...    ... ...       ...
##   [289965]     chrY [27607404, 27607432]      -   |    277746
##   [289966]     chrY [27635919, 27635954]      -   |    277747
##   [289967]     chrY [59358329, 59359508]      -   |    277748
##   [289968]     chrY [59360007, 59360115]      -   |    277749
##   [289969]     chrY [59360501, 59360854]      -   |    277750
##   ---
##   seqlengths:
##                    chr1                 chr2 ...       chrUn_gl000249
##               249250621            243199373 ...                38502

These operations also support the extraction of extra metadata. Any extra data is stored in the metadata columns of the returned GRanges object. So for example you could spice up your call to transcripts by using the columns argument like this:

transcripts(txdb, columns = c("tx_id","tx_name","gene_id"))
## GRanges with 82960 ranges and 3 metadata columns:
##           seqnames               ranges strand   |     tx_id     tx_name
##              <Rle>            <IRanges>  <Rle>   | <integer> <character>
##       [1]     chr1     [ 11874,  14409]      +   |         1  uc001aaa.3
##       [2]     chr1     [ 11874,  14409]      +   |         2  uc010nxq.1
##       [3]     chr1     [ 11874,  14409]      +   |         3  uc010nxr.1
##       [4]     chr1     [ 69091,  70008]      +   |         4  uc001aal.1
##       [5]     chr1     [321084, 321115]      +   |         5  uc001aaq.2
##       ...      ...                  ...    ... ...       ...         ...
##   [82956]     chrY [27605645, 27605678]      -   |     78803  uc004fwx.1
##   [82957]     chrY [27606394, 27606421]      -   |     78804  uc022cpc.1
##   [82958]     chrY [27607404, 27607432]      -   |     78805  uc004fwz.3
##   [82959]     chrY [27635919, 27635954]      -   |     78806  uc022cpd.1
##   [82960]     chrY [59358329, 59360854]      -   |     78807  uc011ncc.1
##                   gene_id
##           <CharacterList>
##       [1]       100287102
##       [2]       100287102
##       [3]       100287102
##       [4]           79501
##       [5]                
##       ...             ...
##   [82956]                
##   [82957]                
##   [82958]                
##   [82959]                
##   [82960]                
##   ---
##   seqlengths:
##                    chr1                 chr2 ...       chrUn_gl000249
##               249250621            243199373 ...                38502
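
Once the extra columns are attached to the ranges, they can be read back out again. The following is a minimal sketch, assuming the TxDb.Hsapiens.UCSC.hg19.knownGene package is installed; the variable names tx, meta and tx_chr1 are only illustrative.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## Request gene IDs alongside the default transcript columns.
tx <- transcripts(txdb, columns = c("tx_id", "tx_name", "gene_id"))

## mcols() returns the metadata columns as a table, one row per range.
meta <- mcols(tx)
colnames(meta)

## Individual metadata columns can be pulled out with `$`.
head(tx$tx_name)

## Ranges and metadata stay together when subsetting, e.g. by chromosome.
tx_chr1 <- tx[seqnames(tx) == "chr1"]
length(tx_chr1)
```

Keeping the metadata attached to the ranges means that any later subsetting or overlap operation carries the gene IDs along with it.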

The second kind of range accessor supported by TranscriptDb objects returns GRangesList objects. Some examples of these are: transcriptsBy(), exonsBy() and cdsBy(). These accessors return a GRangesList object that contains the desired ranges split up by some important feature type, which is specified using the “by” argument. A typical case is to extract all the transcript ranges known for all the genes. You can do that like this:

transcriptsBy(txdb, by="gene")
## GRangesList of length 23459:
## $1 
## GRanges with 2 ranges and 2 metadata columns:
##       seqnames               ranges strand |     tx_id     tx_name
##          <Rle>            <IRanges>  <Rle> | <integer> <character>
##   [1]    chr19 [58858172, 58864865]      - |     70455  uc002qsd.4
##   [2]    chr19 [58859832, 58874214]      - |     70456  uc002qsf.2
## 
## $10 
## GRanges with 1 range and 2 metadata columns:
##       seqnames               ranges strand | tx_id    tx_name
##   [1]     chr8 [18248755, 18258723]      + | 31944 uc003wyw.1
## 
## $100 
## GRanges with 1 range and 2 metadata columns:
##       seqnames               ranges strand | tx_id    tx_name
##   [1]    chr20 [43248163, 43280376]      - | 72132 uc002xmj.3
## 
## ...
## <23456 more elements>
## ---
## seqlengths:
##                  chr1                 chr2 ...       chrUn_gl000249
##             249250621            243199373 ...                38502
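
A GRangesList behaves much like an ordinary named list, so a single group can be pulled out by name. The sketch below assumes the same txdb object as above and that the list names are the gene IDs shown in the output (for example "1"); the variable names are only illustrative.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## Group transcripts by gene; the list names are the gene IDs.
tx_by_gene <- transcriptsBy(txdb, by = "gene")
head(names(tx_by_gene))

## Extract the transcripts of one gene as a plain GRanges object.
gene1_tx <- tx_by_gene[["1"]]
length(gene1_tx)

## The same idea works for other groupings, e.g. exons grouped by transcript.
ex_by_tx <- exonsBy(txdb, by = "tx")
ex_by_tx[[1]]
```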

[ Back to top ]

Exercises for TranscriptDb objects.

Exercise 4: Use the accessors for the TxDb.Hsapiens.UCSC.hg19.knownGene package to retrieve the gene id, transcript name and transcript chromosome for all the transcripts. Do this using both the select() method and also using the transcripts() method. What is the difference in the output?

Exercise 5: Load the TxDb.Athaliana.BioMart.plantsmart16 package. This package is not from UCSC; it is based on plantsmart. Now use select or one of the range based accessors to look at the gene ids from this TranscriptDb object. How do they compare to what you saw in the TxDb.Hsapiens.UCSC.hg19.knownGene package?

[ Back to top ]

Sample OrganismDb Workflow

What if you wanted to combine all the good stuff from the GO.db package with what you find in the appropriate TranscriptDb and OrgDb packages for an organism? Then you would want to use an OrganismDb package. An example of an OrganismDb package is the Homo.sapiens package. Like the OrgDb, ChipDb and TranscriptDb packages, it supports the use of select, keytypes, keys and columns.

library(Homo.sapiens)
keys <- head(keys(Homo.sapiens, keytype="ENTREZID"), n=2)
columns <- c("SYMBOL","TXNAME")
select(Homo.sapiens, keys, columns, keytype="ENTREZID")
## Warning: 'select' resulted in 1:many mapping between keys and return rows
## Warning: 'select' resulted in 1:many mapping between keys and return rows
##   ENTREZID SYMBOL     TXNAME
## 1        1   A1BG uc002qsd.4
## 2        1   A1BG uc002qsf.2
## 3       10   NAT2 uc003wyw.1

When an OrganismDb package knows about a relevant TranscriptDb package, it can also support the ranged accessors introduced with the TranscriptDb objects.

transcripts(Homo.sapiens, columns=c("TXNAME","SYMBOL"))
## GRanges with 82960 ranges and 2 metadata columns:
##           seqnames               ranges strand   |          TXNAME
##              <Rle>            <IRanges>  <Rle>   | <CharacterList>
##       [1]     chr1     [ 11874,  14409]      +   |      uc001aaa.3
##       [2]     chr1     [ 11874,  14409]      +   |      uc010nxq.1
##       [3]     chr1     [ 11874,  14409]      +   |      uc010nxr.1
##       [4]     chr1     [ 69091,  70008]      +   |      uc001aal.1
##       [5]     chr1     [321084, 321115]      +   |      uc001aaq.2
##       ...      ...                  ...    ... ...             ...
##   [82956]     chrY [27605645, 27605678]      -   |      uc004fwx.1
##   [82957]     chrY [27606394, 27606421]      -   |      uc022cpc.1
##   [82958]     chrY [27607404, 27607432]      -   |      uc004fwz.3
##   [82959]     chrY [27635919, 27635954]      -   |      uc022cpd.1
##   [82960]     chrY [59358329, 59360854]      -   |      uc011ncc.1
##                    SYMBOL
##           <CharacterList>
##       [1]         DDX11L1
##       [2]         DDX11L1
##       [3]         DDX11L1
##       [4]           OR4F5
##       [5]              NA
##       ...             ...
##   [82956]              NA
##   [82957]              NA
##   [82958]              NA
##   [82959]              NA
##   [82960]              NA
##   ---
##   seqlengths:
##                    chr1                 chr2 ...       chrUn_gl000249
##               249250621            243199373 ...                38502

Making an OrganismDb package

You might be surprised to learn that an OrganismDb package does not itself contain very much information. Instead, it “knows where to find it” by referencing other packages that themselves implement a select interface. So to create an OrganismDb package, you really only need to specify where the information needs to come from. Configuring an OrganismDb object is therefore pretty simple: you create a special list object that names the packages to be included and describes which IDs in one package correspond to which IDs in another. So in the following example, the “GOID” values from the GO.db package act as foreign keys for the “GO” values in the org.Hs.eg.db package and so on.

library(OrganismDbi) ## provides makeOrganismPackage()

## Each join names two packages and the columns whose IDs match
gd <- list(join1 = c(GO.db="GOID", org.Hs.eg.db="GO"),
           join2 = c(org.Hs.eg.db="ENTREZID",
                     TxDb.Hsapiens.UCSC.hg19.knownGene="GENEID"))

makeOrganismPackage(pkgname = "Homo.sapiens",
                    graphData = gd,
                    organism = "Homo sapiens",
                    version = "1.0.0",
                    maintainer = "Package Maintainer <maintainer@somewhere.org>",
                    author = "Some Body",
                    destDir = ".",
                    license = "Artistic-2.0")

In this way, you can create a custom OrganismDb package for any organism of interest, provided that you also have access to the supporting packages. The OrganismDbi package has a vignette that covers this topic in more detail.

Exercises for OrganismDb objects.

Exercise 6: Use the Homo.sapiens object to look up the gene symbol, transcript start and chromosome using select(). Then do the same thing using transcripts. You might expect that this call to transcripts will look the same as it did for the TranscriptDb object, but (temporarily) it will not.

Exercise 7: Look at the results from calling the columns method on the Homo.sapiens object and compare that to what happens when you call columns on the org.Hs.eg.db object and on the TxDb.Hsapiens.UCSC.hg19.knownGene object. What is the difference between TXSTART and CHRLOC? Which one do you think you should use for transcripts or other genomic information?

[ Back to top ]

Making full use of keys

Let's look more closely at the keys method. We have already talked about how you can use it to do this:

library(Homo.sapiens)
keys <- head(keys(Homo.sapiens, keytype="ENTREZID"), n=2)

And then you can use it with select to look up other kinds of information. But what if you only know partial information about the keys you are looking up? In Bioconductor 2.13 and higher there are extra arguments for the keys method that you can make use of to find keys that match certain criteria. The most useful is probably the pattern argument. The pattern argument allows you to find out which keys match a certain pattern. So for example, you can look up entrez gene IDs that start with a “2” like this:

head(keys(Homo.sapiens, keytype="ENTREZID", pattern="^2"), n=6)
## [1] "2"      "20"     "2000"   "200008" "200010" "200014"

Or you could look up gene symbols that start with “MS”:

head(keys(Homo.sapiens, keytype="SYMBOL", pattern="^MS"), n=6)
## [1] "MS4A1" "MS4A3" "MS4A2" "MSTN"  "MSH6"  "MS"

If your string matching is too specific, you could also try to use fuzzy matching by setting the fuzzy argument to TRUE:

head(keys(Homo.sapiens, keytype="SYMBOL", pattern="^MS", fuzzy=TRUE), n=6)
## [1] "MS4A1" "MS4A3" "MS4A2" "MSTN"  "MSH6"  "LIMS1"
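
To see why the fuzzy search above can pick up a symbol like “LIMS1”, it can help to compare exact and approximate matching on a small vector. The sketch below uses only base R's grep() and agrep() on a made-up symbols vector. It assumes that keys() applies matching of this general kind to the key values, which is consistent with the output above but is not confirmed here.

```r
## A small, made-up vector of gene symbols (illustration only).
symbols <- c("MS4A1", "MSTN", "MSH6", "LIMS1", "TP53")

## Exact regular-expression matching: "^" anchors the match at the start.
exact <- grep("^MS", symbols, value = TRUE)
exact

## Approximate matching: agrep() treats the pattern as a literal string by
## default and tolerates a small edit distance, so a symbol that merely
## contains "MS" can also match.
fuzzy <- agrep("^MS", symbols, value = TRUE)
fuzzy
```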

And if you want to match on one key and actually return another, then you can use the column argument to indicate which key you want to search for the pattern on, while using the keytype argument to indicate which kind of key you want returned. So you could (for example) get back Ensembl IDs where the symbol starts with “MS”:

keys <- head(keys(Homo.sapiens, keytype="ENSEMBL", pattern="^MS", column="SYMBOL"), n=6)
keys
## [1] "ENSG00000156738" "ENSG00000149516" "ENSG00000138379" "ENSG00000116062"
## [5] "ENSG00000095002" "ENSG00000113318"
select(Homo.sapiens, keys, "SYMBOL", keytype="ENSEMBL")
##           ENSEMBL SYMBOL
## 1 ENSG00000156738  MS4A1
## 2 ENSG00000149516  MS4A3
## 3 ENSG00000138379   MSTN
## 4 ENSG00000116062   MSH6
## 5 ENSG00000095002   MSH2
## 6 ENSG00000113318   MSH3

Exercises for the keys method.

Exercise 8: Use the Homo.sapiens object with the keys method to look up the entrez gene IDs for all gene symbols that contain the letter “X”.

[ Back to top ]

Sample AnnotationHub Workflow

So far we have been discussing annotations that are fairly well established and that represent consensus findings from the scientific community. These kinds of annotations are usually curated at large governmental institutions like NCBI or Ensembl, and for the most part everyone agrees about what they mean and how to use them.

But sometimes the annotations that you need are not as well established. Sometimes (for example) we just need to compare our results to the data from a recent large study such as the ENCODE project. The AnnotationHub package is designed to be useful for getting access to data like this. AnnotationHub allows you to get access to data from a range of different data repositories, with the caveat that the data objects in AnnotationHub have all been pre-processed into appropriate R objects for you.

To make use of AnnotationHub, you need to load the package and then create an AnnotationHub object. Notice that unlike the other packages, with AnnotationHub you have to create an AnnotationHub object when you first start up your R session.

library(AnnotationHub)

ah <- AnnotationHub()

Once you have done this, you can “find” any of the available resources just by tab completing along a path like this.

res <- ah$goldenpath.hg19.encodeDCC.wgEncodeUwTfbs.wgEncodeUwTfbsMcf7CtcfStdPkRep1.narrowPeak_0.0.1.RData

res
## GRanges with 82163 ranges and 6 metadata columns:
##           seqnames                 ranges strand   |        name     score
##              <Rle>              <IRanges>  <Rle>   | <character> <integer>
##       [1]     chr1       [237640, 237790]      *   |           .         0
##       [2]     chr1       [544660, 544810]      *   |           .         0
##       [3]     chr1       [567480, 567630]      *   |           .         0
##       [4]     chr1       [569820, 569970]      *   |           .         0
##       [5]     chr1       [714200, 714350]      *   |           .         0
##       ...      ...                    ...    ... ...         ...       ...
##   [82159]     chrX [154764540, 154764690]      *   |           .         0
##   [82160]     chrX [154807400, 154807550]      *   |           .         0
##   [82161]     chrX [154881060, 154881210]      *   |           .         0
##   [82162]     chrX [154892100, 154892250]      *   |           .         0
##   [82163]     chrX [154916040, 154916190]      *   |           .         0
##           signalValue    pValue    qValue      peak
##             <numeric> <numeric> <numeric> <integer>
##       [1]          30    26.892        -1        -1
##       [2]           6     8.164        -1        -1
##       [3]         100    56.718        -1        -1
##       [4]          85    49.654        -1        -1
##       [5]          17    13.184        -1        -1
##       ...         ...       ...       ...       ...
##   [82159]          26     25.29        -1        -1
##   [82160]          22     27.65        -1        -1
##   [82161]          17     16.42        -1        -1
##   [82162]          72    101.61        -1        -1
##   [82163]          32     32.52        -1        -1
##   ---
##   seqlengths:
##         chr1     chr10     chr11 ...      chr8      chr9      chrX
##    249250621 135534747 135006516 ... 146364022 141213431 155270560

In the above example, AnnotationHub will retrieve, download and cache locally the file that you tab-completed to, and then store the results in “res”.

Now you can see how many ways there are to currently complete that path, by checking the length of the AnnotationHub object:

length(ah)
## [1] 10780

The AnnotationHub is still a pretty new resource, and we already have a LOT of things in there! How can we narrow this down? Right now we can use filters. By default, there are no filters applied, so calling filters() on our AnnotationHub just returns an empty list.

filters(ah)
## list()

What things can be used as filters? We can use the columns() method to find out.

columns(ah)
##  [1] "BiocVersion"        "DataProvider"       "Title"             
##  [4] "SourceFile"         "Species"            "SourceUrl"         
##  [7] "SourceVersion"      "TaxonomyId"         "Genome"            
## [10] "Description"        "Tags"               "RDataClass"        
## [13] "RDataPath"          "Coordinate_1_based" "Maintainer"        
## [16] "RDataVersion"       "RDataDateAdded"     "Recipe"

What values can be used with these filters? Here, the keys method will give us an answer.

head(keys(ah, keytype="Species"))
## [1] "9606"                   "Acromyrmex echinatior" 
## [3] "Acyrthosiphon pisum"    "Aedes aegypti"         
## [5] "Agaricus bisporus"      "Ailuropoda melanoleuca"

So now we know what we need to apply a filter to our AnnotationHub. The following filter will limit our AnnotationHub to just those entries that correspond to cattle (Bos taurus).

filters(ah) <- list(Species="Bos taurus")

length(ah)
## [1] 145

We can also view and filter our AnnotationHub object interactively by simply calling the display function on it:

d <- display(ah)

We can then filter the AnnotationHub object for “Homo sapiens” by either using the Global search field on the top right corner of the page or the in-column search field for “Species”.

By default 1000 entries are displayed per page. We can change this using the filter at the top of the page, or navigate through different pages using the page scrolling feature at the bottom of the page.

We can also select the rows of interest to us and send them back to the R session using the 'Send Rows' button; this sets a filter internally, which filters the AnnotationHub object.

[ Back to top ]

Exercises for AnnotationHub.

Exercise 9: Set the AnnotationHub filter to NULL to clear it out, and then set it up so that it is extracting data that originated with the UCSC data provider and that is also limited to Homo sapiens and the hg19 genome.

Exercise 10: Now that you have narrowed things down to the hg19 annotations from the UCSC genome browser, let's get one of these annotations. Tab complete your way to the oreganno track and save it into a local variable.

[ Back to top ]

Using biomaRt

Another valuable resource is the biomaRt package. The biomaRt package exposes a huge family of online annotation resources called marts. Here is a brief rundown of how to use it. For the first step, load the package and decide which “mart” you want to use, then use the useMart() method to create a mart object.

library("biomaRt")
head(listMarts())
##               biomart                             version
## 1             ensembl        ENSEMBL GENES 75 (SANGER UK)
## 2                 snp    ENSEMBL VARIATION 75 (SANGER UK)
## 3 functional_genomics   ENSEMBL REGULATION 75 (SANGER UK)
## 4                vega                VEGA 53  (SANGER UK)
## 5       fungi_mart_22           ENSEMBL FUNGI 22 (EBI UK)
## 6 fungi_variations_22 ENSEMBL FUNGI VARIATION 22 (EBI UK)
ensembl <- useMart("ensembl")
ensembl
## Object of class 'Mart':
##  Using the ensembl BioMart database
##  Using the  dataset

Next you need to decide on a dataset. This can also be specified in the mart object that is created when you call the useMart() method.

head(listDatasets(ensembl))
##                          dataset
## 1         oanatinus_gene_ensembl
## 2        cporcellus_gene_ensembl
## 3        gaculeatus_gene_ensembl
## 4         lafricana_gene_ensembl
## 5 itridecemlineatus_gene_ensembl
## 6        choffmanni_gene_ensembl
##                                  description version
## 1     Ornithorhynchus anatinus genes (OANA5)   OANA5
## 2            Cavia porcellus genes (cavPor3) cavPor3
## 3     Gasterosteus aculeatus genes (BROADS1) BROADS1
## 4         Loxodonta africana genes (loxAfr3) loxAfr3
## 5 Ictidomys tridecemlineatus genes (spetri2) spetri2
## 6        Choloepus hoffmanni genes (choHof1) choHof1
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
ensembl
## Object of class 'Mart':
##  Using the ensembl BioMart database
##  Using the hsapiens_gene_ensembl dataset

Next we need to think about filters and values. In the biomaRt package, a filter names the kind of identifier used to restrict the query, and the values are the specific identifiers to match. For example, you might choose the filter “affy_hg_u133_plus_2” together with the values c("202763_at","209310_s_at","207500_at"). Together these two things request the records that match those probeset IDs on the platform named by the filter. There is an accessor for the kinds of filters that are available from a given mart/dataset:

head(listFilters(ensembl))
##              name     description
## 1 chromosome_name Chromosome name
## 2           start Gene Start (bp)
## 3             end   Gene End (bp)
## 4      band_start      Band Start
## 5        band_end        Band End
## 6    marker_start    Marker Start

You also need to know about attributes. Attributes are the things that you want returned, so if you want to know the gene symbol, you would list that as an attribute. There are accessors to list the kinds of attributes you can look up too:

head(listAttributes(ensembl))
##                    name           description
## 1       ensembl_gene_id       Ensembl Gene ID
## 2 ensembl_transcript_id Ensembl Transcript ID
## 3    ensembl_peptide_id    Ensembl Protein ID
## 4       ensembl_exon_id       Ensembl Exon ID
## 5           description           Description
## 6       chromosome_name       Chromosome Name

Once you are done exploring and know what you want to extract, you can call the getBM method to get your data like this:

affyids <- c("202763_at","209310_s_at","207500_at")
getBM(attributes = c('affy_hg_u133_plus_2', 'entrezgene'),
      filters = 'affy_hg_u133_plus_2',
      values = affyids, mart = ensembl)
##   affy_hg_u133_plus_2 entrezgene
## 1         209310_s_at        837
## 2           207500_at        838
## 3           202763_at        836
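
To make the roles of attributes, filters and values concrete, here is a toy sketch that answers the same request from a small local data frame instead of the Ensembl server. The helper get_bm_local and the probe_table object are hypothetical names introduced for illustration only; they are not part of biomaRt, and the three rows are simply copied from the output above.

```r
# Toy annotation table; rows copied from the getBM() output above.
probe_table <- data.frame(
  affy_hg_u133_plus_2 = c("209310_s_at", "207500_at", "202763_at"),
  entrezgene = c(837L, 838L, 836L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT a biomaRt function): keep the rows whose `filters`
# column matches one of `values`, then return only the `attributes` columns.
get_bm_local <- function(attributes, filters, values, table) {
  keep <- table[[filters]] %in% values
  table[keep, attributes, drop = FALSE]
}

affyids <- c("202763_at", "209310_s_at", "207500_at")
res <- get_bm_local(attributes = c("affy_hg_u133_plus_2", "entrezgene"),
                    filters = "affy_hg_u133_plus_2",
                    values = affyids,
                    table = probe_table)

# Rows come back in table order, not in the order of `values`;
# match() puts them back into the order that was requested.
res_ordered <- res[match(affyids, res$affy_hg_u133_plus_2), ]
res_ordered
```

Note that the real getBM() output above is also not in the order of affyids, so reordering with match() is a reasonable habit when the row order matters.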

Now what would you do if you didn't know what the possible values are for a given filter? You could request all the possible values by leaving out the filter and listing it only as an attribute, like this:

head(getBM(attributes='affy_hg_u133_plus_2', mart = ensembl))
##   affy_hg_u133_plus_2
## 1           225862_at
## 2          1560940_at
## 3        1560941_a_at
## 4           240219_at
## 5           237885_at
## 6           222277_at

If you find the standard biomaRt functions difficult to work with, you can also use the standard select methods (columns, keytypes, keys and select) on a mart object.
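
As a rough sketch of what that could look like, the probe lookup from earlier in this section can be written with select(). The wrapper probes_to_entrez is a hypothetical name used only here, and the attribute names are the ones shown in this document; newer Ensembl releases may name them differently, so check listAttributes() and listFilters() first. The call is wrapped in a function so that nothing contacts the server until you call it.

```r
# Sketch only: for a Mart object, `columns` play the role of biomaRt
# attributes and `keytype` plays the role of the filter.
# probes_to_entrez is a hypothetical helper, not part of biomaRt.
probes_to_entrez <- function(mart, probe_ids) {
  biomaRt::select(mart,
                  keys = probe_ids,
                  columns = c("affy_hg_u133_plus_2", "entrezgene"),
                  keytype = "affy_hg_u133_plus_2")
}

# Usage (needs network access and a mart object such as `ensembl`):
# probes_to_entrez(ensembl, c("202763_at", "209310_s_at", "207500_at"))
```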

Exercises for biomaRt.

Exercise 11: Pull down GO terms for entrez gene id “1” from human by using the ensembl “hsapiens_gene_ensembl” dataset.

Exercise 12: Now compare the GO terms you just pulled down to the same GO terms from the org.Hs.eg.db package (which you can now retrieve using select()). What differences do you notice? Why do you suspect that is?

BSgenome packages

There are many BSgenome packages in the repository too. These packages contain sequence data for sequenced organisms. You can load one of these packages just like this:

library(BSgenome.Hsapiens.UCSC.hg19)
ls(2)
##  [1] "NP2009code"     "attributePages" "columns"        "exportFASTA"   
##  [5] "filterOptions"  "filterType"     "getBM"          "getBMlist"     
##  [9] "getGene"        "getLDS"         "getSequence"    "getXML"        
## [13] "keys"           "keytypes"       "listAttributes" "listDatasets"  
## [17] "listFilters"    "listMarts"      "select"         "show"          
## [21] "useDataset"     "useMart"
Hsapiens
## Human genome
## | 
## | organism: Homo sapiens (Human)
## | provider: UCSC
## | provider version: hg19
## | release date: Feb. 2009
## | release name: Genome Reference Consortium GRCh37
## | 
## | single sequences (see '?seqnames'):
## |   chr1                   chr2                   chr3                 
## |   chr4                   chr5                   chr6                 
## |   chr7                   chr8                   chr9                 
## |   chr10                  chr11                  chr12                
## |   chr13                  chr14                  chr15                
## |   chr16                  chr17                  chr18                
## |   chr19                  chr20                  chr21                
## |   chr22                  chrX                   chrY                 
## |   chrM                   chr1_gl000191_random   chr1_gl000192_random 
## |   chr4_ctg9_hap1         chr4_gl000193_random   chr4_gl000194_random 
## |   chr6_apd_hap1          chr6_cox_hap2          chr6_dbb_hap3        
## |   chr6_mann_hap4         chr6_mcf_hap5          chr6_qbl_hap6        
## |   chr6_ssto_hap7         chr7_gl000195_random   chr8_gl000196_random 
## |   chr8_gl000197_random   chr9_gl000198_random   chr9_gl000199_random 
## |   chr9_gl000200_random   chr9_gl000201_random   chr11_gl000202_random
## |   chr17_ctg5_hap1        chr17_gl000203_random  chr17_gl000204_random
## |   chr17_gl000205_random  chr17_gl000206_random  chr18_gl000207_random
## |   chr19_gl000208_random  chr19_gl000209_random  chr21_gl000210_random
## |   chrUn_gl000211         chrUn_gl000212         chrUn_gl000213       
## |   chrUn_gl000214         chrUn_gl000215         chrUn_gl000216       
## |   chrUn_gl000217         chrUn_gl000218         chrUn_gl000219       
## |   chrUn_gl000220         chrUn_gl000221         chrUn_gl000222       
## |   chrUn_gl000223         chrUn_gl000224         chrUn_gl000225       
## |   chrUn_gl000226         chrUn_gl000227         chrUn_gl000228       
## |   chrUn_gl000229         chrUn_gl000230         chrUn_gl000231       
## |   chrUn_gl000232         chrUn_gl000233         chrUn_gl000234       
## |   chrUn_gl000235         chrUn_gl000236         chrUn_gl000237       
## |   chrUn_gl000238         chrUn_gl000239         chrUn_gl000240       
## |   chrUn_gl000241         chrUn_gl000242         chrUn_gl000243       
## |   chrUn_gl000244         chrUn_gl000245         chrUn_gl000246       
## |   chrUn_gl000247         chrUn_gl000248         chrUn_gl000249       
## | 
## | multiple sequences (see '?mseqnames'):
## |   upstream1000  upstream2000  upstream5000  
## | 
## | (use the '$' or '[[' operator to access a given sequence)

The getSeq method is useful for extracting data from these packages. This method takes several arguments, but the important ones are the first two. The first argument specifies the BSgenome object to use, and the second argument (names) specifies what data you want back out. For example, if you give it a character vector of seqnames for the object, you will get the sequences of those chromosomes back as a DNAStringSet object.

seqNms <- seqnames(Hsapiens)
head(seqNms)
## [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6"
getSeq(Hsapiens, seqNms[1:2])
##   A DNAStringSet instance of length 2
##         width seq
## [1] 249250621 NNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNN
## [2] 243199373 NNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Whereas if you give a GRanges object as the second argument, you instead get a DNAStringSet that corresponds to those ranges.

rngs <- GRanges(seqnames = c('chr1', 'chr4'), strand=c('+','-'),
                ranges = IRanges(start=c(100000,300000), 
                                 end=c(100023,300037)))
rngs
## GRanges with 2 ranges and 0 metadata columns:
##       seqnames           ranges strand
##          <Rle>        <IRanges>  <Rle>
##   [1]     chr1 [100000, 100023]      +
##   [2]     chr4 [300000, 300037]      -
##   ---
##   seqlengths:
##    chr1 chr4
##      NA   NA
res <- getSeq(Hsapiens, rngs)
res
##   A DNAStringSet instance of length 2
##     width seq
## [1]    24 CACTAAGCACACAGAGAATAATGT
## [2]    38 GCTGGTCCCTTACTTCCAGTAGAAAAGACGTGTTCAGG

This can be a very powerful way to quickly get sequences of interest. The BSgenome package also has useful functions for tasks such as finding a pattern in a string set.
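
As a small illustration of pattern finding, the sketch below searches the two sequences printed for res above. It assumes the Biostrings package (listed in the session information at the end of this document) and uses its vmatchPattern and vcountPattern functions on a DNAStringSet; the motif "AGA" is an arbitrary choice made for this example. The sequences are rebuilt by hand so that the full hg19 package is not needed, and the block does nothing if Biostrings is not installed.

```r
# Start with NULL so the names exist even when Biostrings is not installed.
hits <- NULL
counts <- NULL

if (requireNamespace("Biostrings", quietly = TRUE)) {
  # The two sequences shown for `res` above, typed in by hand.
  res <- Biostrings::DNAStringSet(c(
    "CACTAAGCACACAGAGAATAATGT",
    "GCTGGTCCCTTACTTCCAGTAGAAAAGACGTGTTCAGG"
  ))

  # "AGA" is an arbitrary motif chosen for illustration.
  hits <- Biostrings::vmatchPattern("AGA", res)    # match positions per sequence
  counts <- Biostrings::vcountPattern("AGA", res)  # number of matches per sequence
  print(counts)
}
```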

Installation and Use

Follow installation instructions to start using these packages. To install the annotations associated with the Affymetrix Human Genome U95 V 2.0, and with Gene Ontology, use

source("http://bioconductor.org/biocLite.R")
biocLite(c("hgu95av2.db", "GO.db"))

Package installation is required only once per R installation. View a full list of available software and annotation packages.
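
The biocLite() commands above match the Bioconductor release this document was built with. On current releases, installation goes through the BiocManager package instead. The following is a minimal sketch under that assumption: it first checks which of the two packages are missing, and only attempts an installation in an interactive session, so that running it in a script has no side effects.

```r
pkgs <- c("hgu95av2.db", "GO.db")

# Installation is needed only once per R installation, so check first.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Assumption: a recent R / Bioconductor where BiocManager replaces biocLite().
if (length(missing_pkgs) > 0 && interactive()) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(missing_pkgs)
}
```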

To use the AnnotationDbi and GO.db packages, evaluate the commands

library(AnnotationDbi)
library(GO.db)

These commands are required once in each R session.

Exploring Package Content

Packages have extensive help pages, and include vignettes highlighting common use cases. The help pages and vignettes are available from within R. After loading a package, use syntax like

help(package="GO.db")
?select

to obtain an overview of help on the GO.db package and on the select method. The AnnotationDbi package is used by most .db packages. Its vignettes provide a more comprehensive introduction to package functionality; view them with

browseVignettes(package="AnnotationDbi")

Use

help.start()

to open a web page containing comprehensive help resources.

Annotation Resources

The following guides the user through key annotation packages. Users interested in how to create custom chip packages should see the vignettes in the AnnotationForge package. There is additional information in the AnnotationDbi, OrganismDbi and GenomicFeatures packages for how to use some of the extra tools provided. You can also refer to the complete list of annotation packages.

Key Packages

Types of Annotation Packages

Answers for exercises:

Exercise 1:

keys <- "MSX2"
columns <- c("ENTREZID", "CHR")
select(org.Hs.eg.db, keys, columns, keytype="SYMBOL")
##   SYMBOL ENTREZID CHR
## 1   MSX2     4488   5

Exercise 2:

Initially you might expect that hgu95av2.db will have less information in it. After all, it's an old Affymetrix platform that was developed before we even had a very complete human genome. So you might try something like this:

chipSymbols <- keys(hgu95av2.db, keytype="SYMBOL")
orgSymbols <- keys(org.Hs.eg.db, keytype="SYMBOL")
length(orgSymbols)
## [1] 47912
length(chipSymbols)
## [1] 47912

And you might feel confused and so you might try this:

dim(select(org.Hs.eg.db,orgSymbols, "ENTREZID", "SYMBOL"))
## Warning: 'select' resulted in 1:many mapping between keys and return rows
## [1] 47938     2
dim(select(hgu95av2.db,chipSymbols, "ENTREZID", "SYMBOL")) 
## Warning: 'select' resulted in 1:many mapping between keys and return rows
## [1] 47938     2

And you might also have noticed this:

length(columns(org.Hs.eg.db)) < length(columns(hgu95av2.db))
## [1] TRUE

Well, the answer you have in front of you is actually correct. There really is more information available in the hgu95av2.db object than in the org.Hs.eg.db object. Even though the hgu95av2.db object can technically only have probes for some genes in the genome, behind the scenes it still retrieves data about gene names etc. from the org.Hs.eg.db package. So it effectively has access to all the data from the org package PLUS the probes for that platform and what those map to. That means there will be information about many gene symbols that don't actually match up to any probeset IDs, and that is what we see if we use gene symbols to look up the probe IDs.

head(select(hgu95av2.db,chipSymbols, "PROBEID", "SYMBOL"))
##   SYMBOL  PROBEID
## 1   A1BG     <NA>
## 2    A2M     <NA>
## 3  A2MP1     <NA>
## 4   NAT1 38187_at
## 5   NAT2 38912_at
## 6   NATP     <NA>

Exercise 3:

egr <- select(org.Hs.eg.db, orgSymbols, "ENTREZID", "SYMBOL")
## Warning: 'select' resulted in 1:many mapping between keys and return rows
length(egr$ENTREZID)
## [1] 47938
length(unique(egr$ENTREZID))
## [1] 47938
## VS:
length(egr$SYMBOL)
## [1] 47938
length(unique(egr$SYMBOL))
## [1] 47912
## So let's trap these symbols that are redundant and look more closely...
redund <- egr$SYMBOL
badSymbols <- redund[duplicated(redund)]
select(org.Hs.eg.db, badSymbols, "ENTREZID", "SYMBOL")
## Warning: 'select' resulted in 1:many mapping between keys and return rows
##           SYMBOL  ENTREZID
## 1         CSNK1E      1454
## 2         CSNK1E 102800317
## 3            HBD      3045
## 4            HBD 100187828
## 5           RNR1      4549
## 6           RNR1      6052
## 7           RNR2      4550
## 8           RNR2      6053
## 9            TEC      7006
## 10           TEC 100124696
## 11         MEMO1      7795
## 12         MEMO1     51072
## 13          DUX4     22947
## 14          DUX4 100653046
## 15       KIR3DL3    115653
## 16       KIR3DL3 100133046
## 17          MMD2    221938
## 18          MMD2 100505381
## 19 RP5-1043L13.1    284757
## 20 RP5-1043L13.1    729296
## 21 RP11-344E13.3    339260
## 22 RP11-344E13.3    440416
## 23       MIR642A    693227
## 24       MIR642A 102466336
## 25     LINC00623    728855
## 26     LINC00623 101929362
## 27     LINC00684 100129407
## 28     LINC00684 100132304
## 29  RP6-206I17.2 100130000
## 30  RP6-206I17.2 101929438
## 31 RP4-669L17.10 100132062
## 32 RP4-669L17.10 100132287
## 33    AC159540.1 100506076
## 34    AC159540.1 100506123
## 35     LSAMP-AS1 100506708
## 36     LSAMP-AS1 101926903
## 37 RP11-696N14.1 100507053
## 38 RP11-696N14.1 101929271
## 39 RP11-513O17.2 100507651
## 40 RP11-513O17.2 101929436
## 41 RP11-119F19.2 101060691
## 42 RP11-119F19.2 101929525
## 43 RP11-561O23.5 101926991
## 44 RP11-561O23.5 101929800
## 45   AC004893.11 101927550
## 46   AC004893.11 102723992
## 47  RP11-475O6.1 101927560
## 48  RP11-475O6.1 101927587
## 49  RP1-102G20.5 101928696
## 50  RP1-102G20.5 102724578
## 51 RP11-321G12.1 102723344
## 52 RP11-321G12.1 102723355
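
The counts above differ by 47938 - 47912 = 26, which matches the 26 symbols (52 rows) in this listing: each redundant symbol maps to two Entrez gene IDs. The sketch below repeats the duplicated() step on a tiny data frame, so it runs without any annotation package. The object egr_small is a hypothetical stand-in for egr, with rows copied from the outputs in this document.

```r
# A tiny stand-in for `egr`; rows copied from the outputs shown above.
egr_small <- data.frame(
  SYMBOL   = c("CSNK1E", "CSNK1E", "HBD", "HBD", "MSX2"),
  ENTREZID = c("1454", "102800317", "3045", "100187828", "4488"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each value,
# so unique() is only needed if a symbol can occur three or more times.
dup_symbols <- unique(egr_small$SYMBOL[duplicated(egr_small$SYMBOL)])

# Group the Entrez gene IDs by symbol, keeping only the redundant symbols.
ids_by_symbol <- split(egr_small$ENTREZID, egr_small$SYMBOL)[dup_symbols]
ids_by_symbol
```

Grouping with split() gives one entry per symbol, which can be easier to scan than the long two-column listing when there are many redundant symbols.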

Exercise 4:

To retrieve this information using select, do it like this:

res1 <- select(TxDb.Hsapiens.UCSC.hg19.knownGene, 
               keys(TxDb.Hsapiens.UCSC.hg19.knownGene, keytype="TXID"),
               columns=c("GENEID","TXNAME","TXCHROM"), keytype="TXID")

head(res1)
##   TXID    GENEID     TXNAME TXCHROM
## 1    1 100287102 uc001aaa.3    chr1
## 2    2 100287102 uc010nxq.1    chr1
## 3    3 100287102 uc010nxr.1    chr1
## 4    4     79501 uc001aal.1    chr1
## 5    5      <NA> uc001aaq.2    chr1
## 6    6      <NA> uc001aar.2    chr1

And to do it using transcripts you do it like this:

res2 <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene, 
                    columns = c("gene_id","tx_name")) 
head(res2)
## GRanges with 6 ranges and 2 metadata columns:
##       seqnames           ranges strand |         gene_id     tx_name
##          <Rle>        <IRanges>  <Rle> | <CharacterList> <character>
##   [1]     chr1 [ 11874,  14409]      + |       100287102  uc001aaa.3
##   [2]     chr1 [ 11874,  14409]      + |       100287102  uc010nxq.1
##   [3]     chr1 [ 11874,  14409]      + |       100287102  uc010nxr.1
##   [4]     chr1 [ 69091,  70008]      + |           79501  uc001aal.1
##   [5]     chr1 [321084, 321115]      + |                  uc001aaq.2
##   [6]     chr1 [321146, 321207]      + |                  uc001aar.2
##   ---
##   seqlengths:
##                    chr1                 chr2 ...       chrUn_gl000249
##               249250621            243199373 ...                38502

Notice that in the 2nd case we don't have to ask for the chromosome, as transcripts() returns a GRanges object, so the chromosome will automatically be returned as part of the object.

Exercise 5:

library(TxDb.Athaliana.BioMart.plantsmart16)
res <- transcripts(TxDb.Athaliana.BioMart.plantsmart16, columns = c("gene_id")) 

You will notice that the gene IDs for this package are TAIR locus IDs and are NOT Entrez gene IDs like the ones you saw in the TxDb.Hsapiens.UCSC.hg19.knownGene package. It's important to always pay attention to the kind of gene ID that is being used by the TranscriptDb you are looking at.
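
One quick way to check which kind of gene ID you are looking at is to test the IDs against simple patterns. The helper below, guess_gene_id_type, is a hypothetical function written for this example and is only a heuristic: it assumes that TAIR locus IDs look like "AT1G01010" and that Entrez gene IDs are plain integers, and an all-digit ID is not guaranteed to come from Entrez.

```r
# Hypothetical helper: a rough guess at the kind of gene ID in a character vector.
guess_gene_id_type <- function(ids) {
  ifelse(grepl("^AT[1-5CM]G[0-9]{5}$", ids), "TAIR locus",
         ifelse(grepl("^[0-9]+$", ids), "Entrez-like integer", "other"))
}

# One TAIR-style ID, one gene ID and one transcript name from the outputs above.
id_types <- guess_gene_id_type(c("AT1G01010", "79501", "uc001aal.1"))
id_types
```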

Exercise 6:

library(Homo.sapiens)
keys <- keys(Homo.sapiens, keytype="TXID")
res1 <- select(Homo.sapiens, 
               keys= keys,
               columns=c("SYMBOL","TXSTART","TXCHROM"), keytype="TXID")

head(res1)

And to do it using transcripts you do it like this:

library(Homo.sapiens)
res2 <- transcripts(Homo.sapiens, columns="SYMBOL") 
head(res2)
## GRanges with 6 ranges and 1 metadata column:
##       seqnames           ranges strand |          SYMBOL
##          <Rle>        <IRanges>  <Rle> | <CharacterList>
##   [1]     chr1 [ 11874,  14409]      + |         DDX11L1
##   [2]     chr1 [ 11874,  14409]      + |         DDX11L1
##   [3]     chr1 [ 11874,  14409]      + |         DDX11L1
##   [4]     chr1 [ 69091,  70008]      + |           OR4F5
##   [5]     chr1 [321084, 321115]      + |              NA
##   [6]     chr1 [321146, 321207]      + |              NA
##   ---
##   seqlengths:
##                    chr1                 chr2 ...       chrUn_gl000249
##               249250621            243199373 ...                38502

Exercise 7:

columns(Homo.sapiens)
##  [1] "GOID"         "TERM"         "ONTOLOGY"     "DEFINITION"  
##  [5] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"     
##  [9] "ACCNUM"       "ALIAS"        "CHR"          "CHRLOC"      
## [13] "CHRLOCEND"    "ENZYME"       "MAP"          "PATH"        
## [17] "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
## [21] "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"    
## [25] "UNIPROT"      "GO"           "EVIDENCE"     "GOALL"       
## [29] "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"         "UCSCKG"      
## [33] "CDSID"        "CDSNAME"      "CDSCHROM"     "CDSSTRAND"   
## [37] "CDSSTART"     "CDSEND"       "EXONID"       "EXONNAME"    
## [41] "EXONCHROM"    "EXONSTRAND"   "EXONSTART"    "EXONEND"     
## [45] "GENEID"       "TXID"         "EXONRANK"     "TXNAME"      
## [49] "TXCHROM"      "TXSTRAND"     "TXSTART"      "TXEND"
columns(org.Hs.eg.db)
##  [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"     
##  [5] "ACCNUM"       "ALIAS"        "CHR"          "CHRLOC"      
##  [9] "CHRLOCEND"    "ENZYME"       "MAP"          "PATH"        
## [13] "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
## [17] "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"    
## [21] "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"    
## [25] "GOALL"        "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"        
## [29] "UCSCKG"
columns(TxDb.Hsapiens.UCSC.hg19.knownGene)
##  [1] "CDSID"      "CDSNAME"    "CDSCHROM"   "CDSSTRAND"  "CDSSTART"  
##  [6] "CDSEND"     "EXONID"     "EXONNAME"   "EXONCHROM"  "EXONSTRAND"
## [11] "EXONSTART"  "EXONEND"    "GENEID"     "TXID"       "EXONRANK"  
## [16] "TXNAME"     "TXCHROM"    "TXSTRAND"   "TXSTART"    "TXEND"
## You might also want to look at this:
transcripts(Homo.sapiens, columns=c("SYMBOL","CHRLOC"))
## Warning: 'select' resulted in 1:many mapping between keys and return rows
## Warning: 'select' resulted in 1:many mapping between keys and return rows
## GRanges with 82960 ranges and 3 metadata columns:
##           seqnames               ranges strand   |        CHRLOC
##              <Rle>            <IRanges>  <Rle>   | <IntegerList>
##       [1]     chr1     [ 11874,  14409]      +   |         11874
##       [2]     chr1     [ 11874,  14409]      +   |         11874
##       [3]     chr1     [ 11874,  14409]      +   |         11874
##       [4]     chr1     [ 69091,  70008]      +   |         69091
##       [5]     chr1     [321084, 321115]      +   |            NA
##       ...      ...                  ...    ... ...           ...
##   [82956]     chrY [27605645, 27605678]      -   |            NA
##   [82957]     chrY [27606394, 27606421]      -   |            NA
##   [82958]     chrY [27607404, 27607432]      -   |            NA
##   [82959]     chrY [27635919, 27635954]      -   |            NA
##   [82960]     chrY [59358329, 59360854]      -   |            NA
##                 CHRLOCCHR          SYMBOL
##           <CharacterList> <CharacterList>
##       [1]               1         DDX11L1
##       [2]               1         DDX11L1
##       [3]               1         DDX11L1
##       [4]               1           OR4F5
##       [5]              NA              NA
##       ...             ...             ...
##   [82956]              NA              NA
##   [82957]              NA              NA
##   [82958]              NA              NA
##   [82959]              NA              NA
##   [82960]              NA              NA
##   ---
##   seqlengths:
##                    chr1                 chr2 ...       chrUn_gl000249
##               249250621            243199373 ...                38502

The key difference is that TXSTART refers to the start of a transcript and originates in the TranscriptDb object from the TxDb.Hsapiens.UCSC.hg19.knownGene package, while CHRLOC refers to the same thing but originates in the OrgDb object from the org.Hs.eg.db package. The point of origin is significant because the TranscriptDb object represents a transcriptome from UCSC, whereas the OrgDb holds primarily gene centric data that originates at NCBI. The upshot is that CHRLOC will not have as many regions represented as TXSTART, since there has to be an official gene for there to even be a record. The CHRLOC data in org.Hs.eg.db is also locked to hg19, whereas you can swap in a different TranscriptDb object to match the genome you are using (hg18, etc.). For these reasons, we strongly recommend using TXSTART instead of CHRLOC. However, CHRLOC still remains in the org packages for historical reasons.

Exercise 8:

To find the keys that match, make use of the pattern and column arguments.

library(Homo.sapiens)
xk = head(keys(Homo.sapiens, keytype="ENTREZID", pattern="X", column="SYMBOL"))
xk
## [1] "100033409" "100033411" "100036519" "100038246" "100048904" "100048923"

select verifies the results

select(Homo.sapiens, xk, "SYMBOL", "ENTREZID")
##    ENTREZID   SYMBOL
## 1 100033409   OTX2P1
## 2 100033411     DUXB
## 3 100036519  FOXD4L2
## 4 100038246   TLX1NB
## 5 100048904 DDX39BP1
## 6 100048923 DDX39BP2

Exercise 9:

The 1st thing you need to do is look at the keytypes:

keytypes(ah)
##  [1] "BiocVersion"        "DataProvider"       "Title"             
##  [4] "SourceFile"         "Species"            "SourceUrl"         
##  [7] "SourceVersion"      "TaxonomyId"         "Genome"            
## [10] "Description"        "Tags"               "RDataClass"        
## [13] "RDataPath"          "Coordinate_1_based" "Maintainer"        
## [16] "RDataVersion"       "RDataDateAdded"     "Recipe"

Then you want to look at possible values for DataProvider and for Genome.

keys(ah, keytype="DataProvider")
## [1] "EncodeDCC"                                             
## [2] "ftp.ensembl.org"                                       
## [3] "ftp://ftp.ncbi.nih.gov/snp"                            
## [4] "HAEMCODE"                                              
## [5] "hgdownload.cse.ucsc.edu"                               
## [6] "http://inparanoid.sbc.su.se/download/current/Orthologs"
## [7] "RefNet"
head(keys(ah, keytype="Genome"))
## [1] "ailMel1"   "anoCar1"   "anoCar2"   "AnoCar2.0" "anoGam1"   "apiMel1"
filters(ah) <- NULL
filters(ah) <- list(Species="Homo sapiens", 
                    DataProvider="hgdownload.cse.ucsc.edu",
                    Genome="hg19")
length(ah)
## [1] 118

Exercise 10:

The code below pulls down the oreganno annotations, which are described on the UCSC site as follows: “This track displays literature-curated regulatory regions, transcription factor binding sites, and regulatory polymorphisms from ORegAnno (Open Regulatory Annotation). For more detailed information on a particular regulatory element, follow the link to ORegAnno from the details page.”

res <- ah$goldenpath.hg19.database.oreganno_0.0.1.RData

Exercise 11:

library("biomaRt")
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
ids <- c("1")
getBM(attributes = c('go_id', 'entrezgene'),
      filters = 'entrezgene',
      values = ids, mart = ensembl)
##        go_id entrezgene
## 1 GO:0008150          1
## 2 GO:0005576          1
## 3 GO:0005515          1
## 4 GO:0003674          1

Exercise 12:

library(org.Hs.eg.db)
ids=c("1")
select(org.Hs.eg.db, keys=ids, columns="GO", keytype="ENTREZID")
## Warning: 'select' resulted in 1:many mapping between keys and return rows
##   ENTREZID         GO EVIDENCE ONTOLOGY
## 1        1 GO:0003674       ND       MF
## 2        1 GO:0005576      IDA       CC
## 3        1 GO:0008150       ND       BP
## 4        1 GO:0070062      IDA       CC
## 5        1 GO:0072562      IDA       CC

When this exercise was written, biomaRt returned a different number of GO terms than org.Hs.eg.db. This may not always be true in the future, as both of these resources are updated. It is expected, however, that the web service (which is updated continuously) will fall in and out of sync with the org.Hs.eg.db package (which is updated twice a year). This is an important difference, as each approach has its own advantages and disadvantages. The advantage of continuous updates is that you always have the very latest annotations, which change frequently for something like GO terms. The advantage of using a package is that the results are frozen to a release of Bioconductor, which can help you get the same answers a few years from now that you get today (reproducibility).
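
To see exactly where the two results differ, you can compare the GO IDs as sets. The sketch below types in the IDs from the two outputs shown for Exercises 11 and 12, so it reflects the state of both resources when this document was built, not necessarily what you will get today. The variable names are chosen for this example only.

```r
# GO IDs copied from the biomaRt output in Exercise 11.
bm_go <- c("GO:0008150", "GO:0005576", "GO:0005515", "GO:0003674")

# GO IDs copied from the org.Hs.eg.db output in Exercise 12.
org_go <- c("GO:0003674", "GO:0005576", "GO:0008150",
            "GO:0070062", "GO:0072562")

shared       <- intersect(bm_go, org_go)  # terms reported by both resources
only_biomart <- setdiff(bm_go, org_go)    # terms reported only by biomaRt
only_orgdb   <- setdiff(org_go, bm_go)    # terms reported only by org.Hs.eg.db

list(shared = shared, only_biomart = only_biomart, only_orgdb = only_orgdb)
```

With real query results, the same three calls can be applied to the go_id column returned by getBM() and the GO column returned by select().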

SessionInfo

sessionInfo()
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] C
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] TxDb.Athaliana.BioMart.plantsmart16_2.9.0
##  [2] biomaRt_2.20.0                           
##  [3] Homo.sapiens_1.1.2                       
##  [4] OrganismDbi_1.6.0                        
##  [5] hgu95av2.db_2.14.0                       
##  [6] GO.db_2.14.0                             
##  [7] ensemblVEP_1.4.0                         
##  [8] BSgenome.Hsapiens.UCSC.hg19_1.3.99       
##  [9] BSgenome_1.32.0                          
## [10] org.Mm.eg.db_2.14.0                      
## [11] org.Hs.eg.db_2.14.0                      
## [12] RSQLite_0.11.4                           
## [13] DBI_0.2-7                                
## [14] TxDb.Mmusculus.UCSC.mm10.ensGene_2.14.0  
## [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0 
## [16] GenomicFeatures_1.16.2                   
## [17] AnnotationDbi_1.26.0                     
## [18] Biobase_2.24.0                           
## [19] AnnotationHub_1.4.0                      
## [20] VariantAnnotation_1.10.5                 
## [21] Rsamtools_1.16.1                         
## [22] Biostrings_2.32.1                        
## [23] XVector_0.4.0                            
## [24] GenomicRanges_1.16.3                     
## [25] GenomeInfoDb_1.0.2                       
## [26] IRanges_1.22.9                           
## [27] BiocGenerics_0.10.0                      
## 
## loaded via a namespace (and not attached):
##  [1] BBmisc_1.7               BatchJobs_1.2           
##  [3] BiocInstaller_1.14.2     BiocParallel_0.6.1      
##  [5] Category_2.30.0          GSEABase_1.26.0         
##  [7] GenomicAlignments_1.0.2  MASS_7.3-33             
##  [9] Matrix_1.1-3             RBGL_1.40.0             
## [11] RColorBrewer_1.0-5       RCurl_1.95-4.1          
## [13] RJSONIO_1.2-0.2          Rcpp_0.11.2             
## [15] XML_3.98-1.1             annotate_1.42.0         
## [17] bitops_1.0-6             brew_1.0-6              
## [19] caTools_1.17             checkmate_1.1           
## [21] codetools_0.2-8          colorspace_1.2-4        
## [23] digest_0.6.4             evaluate_0.5.5          
## [25] fail_1.2                 foreach_1.4.2           
## [27] formatR_0.10             genefilter_1.46.1       
## [29] ggplot2_1.0.0            graph_1.42.0            
## [31] grid_3.1.0               gridSVG_1.4-0           
## [33] gtable_0.1.2             htmltools_0.2.4         
## [35] httpuv_1.3.0             httr_0.3                
## [37] interactiveDisplay_1.2.0 iterators_1.0.7         
## [39] knitr_1.6                lattice_0.20-29         
## [41] markdown_0.7             munsell_0.4.2           
## [43] plyr_1.8.1               proto_0.3-10            
## [45] reshape2_1.4             rjson_0.2.14            
## [47] rtracklayer_1.24.2       scales_0.2.4            
## [49] sendmailR_1.1-2          shiny_0.10.0            
## [51] splines_3.1.0            stats4_3.1.0            
## [53] stringr_0.6.2            survival_2.37-7         
## [55] tools_3.1.0              xtable_1.7-3            
## [57] zlibbioc_1.10.0