Author: Valerie Obenchain

Annotating Genomic Ranges

Background

Bioconductor can import diverse sequence-related file types, including fasta, fastq, BAM, VCF, gff, bed, and wig files, among others. Packages support common and advanced sequence manipulation operations such as trimming, transformation, and alignment. Domain-specific analyses include quality assessment, ChIP-seq, differential expression, RNA-seq, and other approaches. Bioconductor includes an interface to the Sequence Read Archive (via the SRAdb package).

This workflow walks through the annotation of a generic set of ranges with Bioconductor packages. The ranges can be any user-defined region of interest or can be from a public file.

Data Preparation

As a first step, data are put into a GRanges object so we can take advantage of overlap operations and store identifiers as metadata columns.

The first set of ranges are variants from a dbSNP Variant Call Format (VCF) file. This file can be downloaded from the ftp site at NCBI ftp://ftp.ncbi.nlm.nih.gov/snp/ and imported with readVcf() from the VariantAnnotation package. Alternatively, the file is available as a pre-parsed VCF object in the AnnotationHub.

library(VariantAnnotation)
library(AnnotationHub)

The Hub returns a VcfFile object with a reference to the file on disk.

hub <- AnnotationHub()
## retrieving 1 resources
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |                                                                 |   1%
  |                                                                       
  |=                                                                |   1%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |==                                                               |   2%
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |==                                                               |   4%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   5%
  |                                                                       
  |====                                                             |   6%
  |                                                                       
  |====                                                             |   7%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |======                                                           |   8%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |======                                                           |  10%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |=======                                                          |  12%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |========                                                         |  13%
  |                                                                       
  |=========                                                        |  13%
  |                                                                       
  |=========                                                        |  14%
  |                                                                       
  |=========                                                        |  15%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |===========                                                      |  16%
  |                                                                       
  |===========                                                      |  17%
  |                                                                       
  |===========                                                      |  18%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |=============                                                    |  19%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |=============                                                    |  21%
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |===============                                                  |  22%
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |===============                                                  |  24%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |=================                                                |  25%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |=================                                                |  27%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |====================                                             |  30%
  |                                                                       
  |====================                                             |  31%
  |                                                                       
  |====================                                             |  32%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=====================                                            |  33%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |======================                                           |  34%
  |                                                                       
  |======================                                           |  35%
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |========================                                         |  36%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |========================                                         |  38%
  |                                                                       
  |=========================                                        |  38%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |==========================                                       |  39%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |==========================                                       |  41%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |===========================                                      |  42%
  |                                                                       
  |============================                                     |  42%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |============================                                     |  44%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |=============================                                    |  45%
  |                                                                       
  |==============================                                   |  45%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |==============================                                   |  47%
  |                                                                       
  |===============================                                  |  47%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  48%
  |                                                                       
  |================================                                 |  49%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================                                |  50%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |=================================                                |  52%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |==================================                               |  53%
  |                                                                       
  |===================================                              |  53%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |===================================                              |  55%
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  58%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |=======================================                          |  61%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  67%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  68%
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |==============================================                   |  71%
  |                                                                       
  |==============================================                   |  72%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |================================================                 |  75%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  76%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |==================================================               |  78%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |====================================================             |  81%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |======================================================           |  84%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |===========================================================      |  92%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |=============================================================    |  94%
  |                                                                       
  |=============================================================    |  95%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |===============================================================  |  98%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================|  99%
  |                                                                       
  |=================================================================| 100%
fl <- hub[['AH47004']]
fl
## class: VcfFile 
## path: /var/lib/jenkins/.AnnotationHub/52446
## index: /var/lib/jenkins/.AnnotationHub/52447
## isOpen: FALSE 
## yieldSize: NA

Read the data into a VCF object:

vcf <- readVcf(fl, "hg19")
dim(vcf)
## [1] 114699      0

Overlap operations require that seqlevels and the genome of the objects match. Here we modify the VCF seqlevels to match the TxDb.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb_hg19 <- TxDb.Hsapiens.UCSC.hg19.knownGene
head(seqlevels(txdb_hg19))
## [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6"
seqlevels(vcf)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "X"  "Y"  "MT"
seqlevels(vcf) <- paste0("chr", seqlevels(vcf))

Sanity check to confirm we have matching seqlevels.

intersect(seqlevels(txdb_hg19), seqlevels(vcf))
##  [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8" 
##  [9] "chr9"  "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16"
## [17] "chr17" "chr18" "chr19" "chr20" "chr21" "chr22" "chrX"  "chrY"

The genomes already match so no change is needed.

unique(genome(txdb_hg19))
## [1] "hg19"
unique(genome(vcf))
## [1] "hg19"

We are only interested in the standard chromosomes so we drop the rest.

txdb_hg19 <- keepStandardChromosomes(txdb_hg19)
vcf <- keepStandardChromosomes(vcf)

The GRanges in a VCF object is extracted with 'rowRanges()'.

gr_hg19 <- rowRanges(vcf)

The second set of ranges is a user-defined region of chromosome 4 in mouse. The idea here is that any region, known or unknown, can be annotated with the following steps.

Load the TxDb package and keep only the standard chromosomes.

library(TxDb.Mmusculus.UCSC.mm10.ensGene)
txdb_mm10 <- keepStandardChromosomes(TxDb.Mmusculus.UCSC.mm10.ensGene)

We are creating the GRanges from scratch and can specify the seqlevels (chromosome names) to match the TxDb.

head(seqlevels(txdb_mm10))
## [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6"
gr_mm10 <- GRanges("chr4", IRanges(c(4000000, 107889000), width=1000))

Now assign the genome.

unique(genome(txdb_mm10))
## [1] "mm10"
genome(gr_mm10) <- "mm10"

Location in and Around Genes

locateVariants() in the VariantAnnotation package annotates ranges with transcript, exon, cds and gene ID's from a TxDb. Various extractions are performed on the TxDb (exonsBy(), transcripts(), cdsBy(), etc.) and the result is overlapped with the ranges. An appropriate GRangesList can also be supplied as the annotation. Different variants such as 'coding', 'fiveUTR', 'threeUTR', 'spliceSite', 'intron', 'promoter', and 'intergenic' can be searched for by passing the appropriate constructor as the 'region' argument. See ?locateVariants for details.

loc_hg19 <- locateVariants(gr_hg19, txdb_hg19, AllVariants())
table(loc_hg19$LOCATION)
## 
## spliceSite     intron    fiveUTR   threeUTR     coding intergenic 
##        311     209819       1276       4933      11383      48467 
##   promoter 
##      12440
loc_mm10 <- locateVariants(gr_mm10, txdb_mm10, AllVariants()) 
table(loc_mm10$LOCATION)
## 
## spliceSite     intron    fiveUTR   threeUTR     coding intergenic 
##          6          1          0          0          0          0 
##   promoter 
##         12

Annotate by ID

The ID's returned from locateVariants() can be used in select() to map to ID's in other annotation packages.

library(org.Hs.eg.db)
cols <- c("UNIPROT", "PFAM")
keys <- na.omit(unique(loc_hg19$GENEID))
head(select(org.Hs.eg.db, keys, cols, keytype="ENTREZID"))
##   ENTREZID UNIPROT    PFAM
## 1     9636  P05161 PF00240
## 2   375790  O00468 PF00008
## 3   375790  O00468 PF00050
## 4   375790  O00468 PF00053
## 5   375790  O00468 PF00054
## 6   375790  O00468 PF01390

The 'keytype' argument specifies that the mouse TxDb contains Ensembl instead of Entrez gene id's.

library(org.Mm.eg.db)
keys <- unique(loc_mm10$GENEID)
head(select(org.Mm.eg.db, keys, cols, keytype="ENSEMBL"))
##              ENSEMBL UNIPROT    PFAM
## 1 ENSMUSG00000028236  Q7TQA3 PF00106
## 2 ENSMUSG00000028608  Q8BHG2 PF05907

Annotate by Position

Files stored in the AnnotationHub have been pre-processed into ranged-based R objects such as a GRanges, GAlignments and VCF. The positions in our GRanges can be overlapped with the ranges in the AnnotationHub files. This allows for easy subsetting of multiple files, resulting in only the ranges of interest.

Create a 'hub' from AnnotationHub and filter the files based on organism and genome build.

hub <- AnnotationHub()
## retrieving 1 resources
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |                                                                 |   1%
  |                                                                       
  |=                                                                |   1%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |==                                                               |   2%
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |==                                                               |   4%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   5%
  |                                                                       
  |====                                                             |   6%
  |                                                                       
  |====                                                             |   7%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |======                                                           |   8%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |======                                                           |  10%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |=======                                                          |  12%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |========                                                         |  13%
  |                                                                       
  |=========                                                        |  13%
  |                                                                       
  |=========                                                        |  14%
  |                                                                       
  |=========                                                        |  15%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |===========                                                      |  16%
  |                                                                       
  |===========                                                      |  17%
  |                                                                       
  |===========                                                      |  18%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |=============                                                    |  19%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |=============                                                    |  21%
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |===============                                                  |  22%
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |===============                                                  |  24%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |=================                                                |  25%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |=================                                                |  27%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |====================                                             |  30%
  |                                                                       
  |====================                                             |  31%
  |                                                                       
  |====================                                             |  32%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=====================                                            |  33%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |======================                                           |  34%
  |                                                                       
  |======================                                           |  35%
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |========================                                         |  36%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |========================                                         |  38%
  |                                                                       
  |=========================                                        |  38%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |==========================                                       |  39%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |==========================                                       |  41%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |===========================                                      |  42%
  |                                                                       
  |============================                                     |  42%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |============================                                     |  44%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |=============================                                    |  45%
  |                                                                       
  |==============================                                   |  45%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |==============================                                   |  47%
  |                                                                       
  |===============================                                  |  47%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  48%
  |                                                                       
  |================================                                 |  49%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================                                |  50%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |=================================                                |  52%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |==================================                               |  53%
  |                                                                       
  |===================================                              |  53%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |===================================                              |  55%
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  58%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |=======================================                          |  61%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  67%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  68%
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |==============================================                   |  71%
  |                                                                       
  |==============================================                   |  72%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |================================================                 |  75%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  76%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |==================================================               |  78%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |====================================================             |  81%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |======================================================           |  84%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |===========================================================      |  92%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |=============================================================    |  94%
  |                                                                       
  |=============================================================    |  95%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |===============================================================  |  98%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================|  99%
  |                                                                       
  |=================================================================| 100%
hub_hg19 <- subset(hub, 
                  (hub$species == "Homo sapiens") & (hub$genome == "hg19"))
length(hub_hg19)
## [1] 5539

Iterate over the first 3 files and extract ranges that overlap 'gr_hg19'.

ov_hg19 <- lapply(1:3, function(i) subsetByOverlaps(hub_hg19[[i]], gr_hg19))

Inspect the results.

names(ov_hg19) <- names(hub_hg19)[1:3]
lapply(ov_hg19, head, n=3)
## $AH3166
## GRanges object with 3 ranges and 5 metadata columns:
##       seqnames                 ranges strand |
##          <Rle>              <IRanges>  <Rle> |
##   [1]    chr14 [ 23388231,  23388425]      - |
##   [2]    chr11 [118436595, 118436793]      - |
##   [3]     chr7 [141438132, 141438281]      + |
##                                            name     score     level
##                                     <character> <integer> <numeric>
##   [1]   chr14:23388231:23388425:-:1.08:0.993703         0  370.1525
##   [2] chr11:118436595:118436793:-:2.21:0.999420         0  255.1222
##   [3]  chr7:141438132:141438281:+:0.37:0.999969         0  197.9053
##          signif    score2
##       <numeric> <integer>
##   [1]  5.15e-09         0
##   [2]  8.41e-09         0
##   [3]  9.31e-09         0
##   -------
##   seqinfo: 24 sequences from hg19 genome
## 
## $AH3912
## GRanges object with 3 ranges and 5 metadata columns:
##       seqnames           ranges strand |        name     score signalValue
##          <Rle>        <IRanges>  <Rle> | <character> <integer>   <numeric>
##   [1]     chr1 [948050, 950479]      * |           .         0   425.73300
##   [2]     chr1 [954433, 956440]      * |           .         0   107.74700
##   [3]     chr1 [970635, 971318]      * |           .         0     2.16979
##          pValue    qValue
##       <numeric> <numeric>
##   [1] 324.00000        -1
##   [2] 324.00000        -1
##   [3]   1.63691        -1
##   -------
##   seqinfo: 23 sequences from hg19 genome
## 
## $AH3913
## GRanges object with 3 ranges and 6 metadata columns:
##       seqnames             ranges strand |        name     score
##          <Rle>          <IRanges>  <Rle> | <character> <integer>
##   [1]     chr1 [1048860, 1049010]      * |           .         0
##   [2]     chr1 [3838920, 3839070]      * |           .         0
##   [3]     chr1 [6051680, 6051830]      * |           .         0
##       signalValue    pValue    qValue      peak
##         <numeric> <numeric> <numeric> <integer>
##   [1]          60   21.9043        -1        -1
##   [2]         451  324.0000        -1        -1
##   [3]         107  321.6620        -1        -1
##   -------
##   seqinfo: 23 sequences from hg19 genome

Annotating the mouse ranges in the same fashion is left as an exercise.

Annotating Variants

Amino acid coding changes

For the set of dbSNP variants that fall in coding regions, amino acid changes can be computed. The output contains one line for each variant-transcript match which can result in multiple lines for each variant.

library(BSgenome.Hsapiens.UCSC.hg19)
head(predictCoding(vcf, txdb_hg19, Hsapiens), 3)
## GRanges object with 3 ranges and 17 metadata columns:
##               seqnames             ranges strand | paramRangeID
##                  <Rle>          <IRanges>  <Rle> |     <factor>
##   rs397514721     chr1 [1233203, 1233203]      - |         <NA>
##   rs397514721     chr1 [1233203, 1233203]      - |         <NA>
##   rs397514721     chr1 [1233203, 1233203]      - |         <NA>
##                          REF                ALT      QUAL      FILTER
##               <DNAStringSet> <DNAStringSetList> <numeric> <character>
##   rs397514721              T                  A      <NA>           .
##   rs397514721              T                  A      <NA>           .
##   rs397514721              T                  A      <NA>           .
##                    varAllele       CDSLOC    PROTEINLOC   QUERYID
##               <DNAStringSet>    <IRanges> <IntegerList> <integer>
##   rs397514721              T [ 317,  317]           106        49
##   rs397514721              T [1001, 1001]           334        49
##   rs397514721              T [1127, 1127]           376        49
##                      TXID         CDSID      GENEID   CONSEQUENCE
##               <character> <IntegerList> <character>      <factor>
##   rs397514721        4150         12185      116983 nonsynonymous
##   rs397514721        4151         12185      116983 nonsynonymous
##   rs397514721        4152         12185      116983 nonsynonymous
##                     REFCODON       VARCODON         REFAA         VARAA
##               <DNAStringSet> <DNAStringSet> <AAStringSet> <AAStringSet>
##   rs397514721            GAG            GTG             E             V
##   rs397514721            GAG            GTG             E             V
##   rs397514721            GAG            GTG             E             V
##   -------
##   seqinfo: 24 sequences from hg19 genome; no seqlengths
sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: Ubuntu precise (12.04.4 LTS)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0      
##  [2] BSgenome_1.36.0                        
##  [3] rtracklayer_1.28.4                     
##  [4] org.Mm.eg.db_3.1.2                     
##  [5] org.Hs.eg.db_3.1.2                     
##  [6] RSQLite_1.0.0                          
##  [7] DBI_0.3.1                              
##  [8] TxDb.Mmusculus.UCSC.mm10.ensGene_3.1.2 
##  [9] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2
## [10] GenomicFeatures_1.20.1                 
## [11] AnnotationDbi_1.30.1                   
## [12] Biobase_2.28.0                         
## [13] AnnotationHub_2.0.2                    
## [14] VariantAnnotation_1.14.1               
## [15] Rsamtools_1.20.4                       
## [16] Biostrings_2.36.1                      
## [17] XVector_0.8.0                          
## [18] GenomicRanges_1.20.5                   
## [19] GenomeInfoDb_1.4.0                     
## [20] IRanges_2.2.4                          
## [21] S4Vectors_0.6.0                        
## [22] BiocGenerics_0.14.0                    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.11.6                  BiocInstaller_1.18.3        
##  [3] formatR_1.2                  futile.logger_1.4.1         
##  [5] bitops_1.0-6                 futile.options_1.0.0        
##  [7] tools_3.2.0                  zlibbioc_1.14.0             
##  [9] biomaRt_2.24.0               digest_0.6.8                
## [11] evaluate_0.7                 shiny_0.12.0                
## [13] stringr_1.0.0                httr_0.6.1                  
## [15] knitr_1.10.5                 R6_2.0.1                    
## [17] XML_3.98-1.2                 BiocParallel_1.2.2          
## [19] lambda.r_1.1.7               magrittr_1.5                
## [21] htmltools_0.2.6              GenomicAlignments_1.4.1     
## [23] mime_0.3                     interactiveDisplayBase_1.6.0
## [25] xtable_1.7-4                 httpuv_1.3.2                
## [27] stringi_0.4-1                RCurl_1.95-4.6

Exercises

Exercise 1: VCF header and reading data subsets.

VCF files can be large and it's often the case that only a subset of variables or genomic positions are of interest. The scanVcfHeader() function in the VariantAnnotation package retrieves header information from a VCF file. Based on the information returned from scanVcfHeader() a ScanVcfParam() object can be created to read in a subset of data from a VCF file.

Exercise 2: Annotate the mouse ranges in 'gr_mm10' with AnnotationHub files.

Exercise 3: Annotate a gene range from Saccharomyces Scerevisiae.

[ Back to top ]