processingChIPseq {ChIPanalyser} R Documentation

Pre-processing ChIP-seq data from UCSC format file

Description

processingChIPseq will process and extract ChIP scores at a set of loci of interest.

Usage

processingChIPseq(profile, loci = NULL, reduce = NULL,
    occupancyProfileParameters = NULL,
    peaks = NULL, Access = NULL,
    noiseFilter = c("zero", "mean", "median", "sigmoid"), cores = 1)

Arguments

profile

profile is a path to a UCSC format file, a GRanges object or a data.frame containing ChIP scores. An input data frame should contain four columns: chromosome, start, end and score. The same applies to the GRanges format.
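
As an illustration of the data frame layout described above, a minimal profile with the four columns could be built as follows. The column names, coordinates and scores are made up for this sketch and are not taken from the package data.

```r
# Hypothetical ChIP profile: one row per genomic interval.
# Column names and values are illustrative assumptions.
profileDF <- data.frame(
    chr   = c("chr2R", "chr2R", "chr2R"),
    start = c(1000L, 1100L, 1200L),
    end   = c(1099L, 1199L, 1299L),
    score = c(0.5, 3.2, 1.1),
    stringsAsFactors = FALSE
)

# Basic sanity checks on the layout
stopifnot(ncol(profileDF) == 4, all(profileDF$start <= profileDF$end))
```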

loci

loci is a GRanges object describing the loci at which ChIP scores should be extracted. If NULL, a set of loci will be extracted from profile based on chromosomes. However, we STRONGLY recommend supplying a GRanges of loci of interest. Default=NULL

reduce

reduce is the number of top regions to select based on the mean ChIP score. If peaks are provided, regions overlapping known peaks will be selected based on the highest ChIP score. If NULL, all regions will be considered. Default=NULL
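
The selection by mean ChIP score can be pictured with a small base R sketch. It assumes that reduce acts as a count of regions to keep; the helper selectTopLoci and the scores are hypothetical and do not reproduce the package internals. The case where peaks are supplied is not covered.

```r
# Hypothetical per-locus ChIP scores (values are made up)
scores <- list(
    locusA = c(0.2, 0.4, 0.1),
    locusB = c(2.0, 3.5, 1.5),
    locusC = c(1.0, 0.8, 1.2),
    locusD = c(0.0, 0.1, 0.0)
)

# Keep the `top` loci with the highest mean score (illustrative helper)
selectTopLoci <- function(scores, top) {
    meanScore <- vapply(scores, mean, numeric(1))
    ranked <- names(sort(meanScore, decreasing = TRUE))
    scores[head(ranked, top)]
}

topLoci <- selectTopLoci(scores, top = 2)
names(topLoci)
```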

occupancyProfileParameters

occupancyProfileParameters is an occupancyProfileParameters object containing the ChIP parameters to be used for ChIP score extraction. If NULL, an occupancyProfileParameters object will be built internally with default ChIP extraction parameters (see chipSmooth, chipSd and chipMean). Default=NULL

peaks

peaks is a path to a UCSC format file or a GRanges object containing the locations of ChIP peaks. Default=NULL

Access

Access is a GRanges object containing accessible DNA. If provided, regions will be selected only if they contain accessible DNA. Default=NULL

noiseFilter

noiseFilter is the noise filtering method to apply to ChIP-seq data. Four methods are available: "zero", "mean", "median" and "sigmoid". "zero" removes all ChIP-seq scores below zero, "mean" removes scores below the mean score, "median" removes scores below the median score, and "sigmoid" assigns a weight to each score based on a logistic curve. The midpoint is set at the 95% quantile of ChIP-seq scores: scores below the midpoint receive a weight between 0 and 1, and scores above it receive a weight between 1 and 2.
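
The four methods can be sketched in base R as follows. This is an illustration only, not the package implementation: the helper filterNoise is hypothetical, filtered-out scores are assumed to be set to 0, and the steepness k of the logistic curve is an assumed free parameter.

```r
# Illustrative noise filters on a numeric vector of ChIP scores.
# Assumptions: filtered-out scores are set to 0, and the sigmoid
# steepness `k` is a free parameter chosen for this sketch.
filterNoise <- function(score,
                        method = c("zero", "mean", "median", "sigmoid"),
                        k = 1) {
    method <- match.arg(method)
    if (method == "sigmoid") {
        # Logistic weight in (0, 2) that equals 1 at the 95% quantile
        mid <- unname(quantile(score, 0.95))
        return(2 / (1 + exp(-k * (score - mid))))
    }
    cutoff <- switch(method,
        zero   = 0,
        mean   = mean(score),
        median = median(score))
    score[score < cutoff] <- 0
    score
}

score <- c(-1, 0, 2, 4, 10)
filterNoise(score, "zero")
filterNoise(score, "mean")
filterNoise(score, "median")
filterNoise(score, "sigmoid")
```

With the logistic form used here, a score exactly at the midpoint gets a weight of 1, which matches the description of weights below and above the midpoint.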

cores

cores is the number of cores used to extract ChIP scores. Default = 1

Details

When using computeOptimal, real ChIP data must be supplied as a point of comparison. The correlation and MSE scores are computed based on how well the model fits the biological data. processingChIPseq extracts this data from the ChIP profile at the loci of interest. When the reduce option is used, the function selects only the top regions based on peak height or mean ChIP score. processingChIPseq also extracts maxSignal and backgroundSignal from the ChIP data and passes them to an occupancyProfileParameters object.
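
To make the comparison concrete, the sketch below computes a correlation and a mean squared error between a predicted and an observed profile using base R only. The two vectors are made-up values, and the exact scoring performed by the package may differ.

```r
# Hypothetical predicted and observed profiles at one locus
predicted <- c(0.1, 0.5, 2.0, 1.0, 0.2)
observed  <- c(0.0, 0.7, 1.8, 1.1, 0.4)

# Pearson correlation: higher means better agreement in shape
corScore <- cor(predicted, observed)

# Mean squared error: lower means better agreement in magnitude
mseScore <- mean((predicted - observed)^2)
```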

Value

processingChIPseq returns a list of two elements. If reduce is used, the first element contains a list with the extracted ChIP data and the new set of top-scoring loci. If reduce is not used, the first element contains only the ChIP scores at the loci of interest. In both cases, the second element contains an occupancyProfileParameters object with the maxSignal and backgroundSignal slots updated.

Author(s)

Patrick C.N. Martin <pm16057@essex.ac.uk>

References

Zabet NR, Adryan B (2015) Estimating binding properties of transcription factors from genome-wide binding profiles. Nucleic Acids Res., 43, 84–94.

Examples



# Load the package and its example data
library(ChIPanalyser)
data(ChIPanalyserData)

# Extract ChIP scores at the loci of interest
ChIP <- processingChIPseq(profile = eveLocusChip, loci = eveLocus)


[Package ChIPanalyser version 1.6.0 Index]