getPeaks {BasicSTARRseq} | R Documentation |
Performs basic peak calling on STARR-seq data based on a method introduced in "Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq" Arnold et al. [1]
getPeaks(object, minQuantile = 0.9, peakWidth = 500, maxPval = 0.001, deduplicate = TRUE, model = 1)
object |
A |
minQuantile |
Which quantile of coverage height should be considered as peaks. |
peakWidth |
The width (in base pairs) that the peaks should have. |
maxPval |
The maximal p-value of peaks that is desired. |
deduplicate |
Wether the sequences should be deduplicated before calling peaks or not. |
model |
Which binomial model should be applied to calculate the p-values. |
The peak calling works the following way:
All genomic positions having a STARR-seq coverage over the quantile minQuantile
are considered to be the center of a peak with width peakWidth
. If then two ore more peaks overlap, the lower one is discarded. If then the binomial p-Value of the peak is higher than maxPval
the peak is discarded as well.
The binomial model
1 for calculating the p-Value is: number of trials = total number of STARR-seq sequences, number of successes = STARR-seq coverage, estimated sucess probability in each trial = input coverage/total number of input sequences.
The binomial model
2 for caculating the p-Value is: number of trials = STARR-seq coverage plus input coverage, number of successes = STARR-seq coverage, estimated success probability in each trial = total number of STARR-seq sequences/(total number of STARR-seq sequences plus total number of input sequences). This model is used in [1].
The enrichment of STARR-seq over input coverage is then calculated as follows: (STARR-seq coverage of peak/total number of STARR-seq sequences)/(input coverage of peak/total number of input sequences), the numinator and denuminator corrected conservatively to the bounds of the 0.95 binomial confidence inverval corresponding to model
1.
The method getPeaks
return a GRanges
object. The contained ranges are the found peaks with desired width peakWidth
. The metadata columns of the ranges contain four elements:
|
The maximal and central STARR-seq coverage of the peak. |
|
The maximum of the central and the median input coverage of the peak. |
|
The binomial p-Value of the coverage height of the peak normalised to toal number of sequences in STARR-seq and input. |
|
The enrichment of STARR-seq over input coverage height normalised to total number of sequences in STARR-seq and input corrected conservatively to the bounds of a confidence interval. |
Annika Buerger
[1] Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq. Arnold et al. Science. 2013 Mar 1;339(6123):1074-7. doi: 10.1126/science.1232542. Epub 2013 Jan 17.
# create a small sample STARRseqData object starrseqFileName <- system.file("extdata", "smallSTARR.bam", package="BasicSTARRseq") inputFileName <- system.file("extdata", "smallInput.bam", package="BasicSTARRseq") data <- STARRseqData(sample=starrseqFileName, control=inputFileName, pairedEnd=TRUE) # call peaks with default parameters peaks = getPeaks(data) # call peaks with no deduplication and no restriction concerning p-value peaks = getPeaks(data, maxPval = 1, deduplicate = FALSE) # call peaks with other binomial model and width 700 peaks = getPeaks(data, peakWidth = 700, model = 2) # call peaks assuming less regions as potential peaks peaks = getPeaks(data, minQuantile = 0.99)