pcKeepCompDetect {nucleR}    R Documentation
pcKeepComp parameter for the filterFFT function

This function tries to obtain the minimum number of components needed in an FFT filter to achieve, or get as close as possible to, a given correlation value. Usually you don't need to call this function directly: filterFFT uses it by default.
pcKeepCompDetect( data, pc.min = 0.01, pc.max = 0.1, max.iter = 20, verbose = FALSE, cor.target = 0.98, cor.tol = 0.001, smpl.num = 25, smpl.min.size = 2^10, smpl.max.size = 2^14 )
| Argument | Description |
|---|---|
| data | Numeric vector to be filtered. |
| pc.min, pc.max | Range of allowed values for pcKeepComp. |
| max.iter | Maximum number of iterations. |
| verbose | Print extra (debugging) information. |
| cor.target | Target correlation between the filtered and the original profiles. A value around 0.99 is recommended for Next Generation Sequencing data and around 0.7 for Tiling Arrays. |
| cor.tol | Tolerance allowed between the obtained correlation and the target one. |
| smpl.num | If data is a large vector, samples of it are used instead of the whole dataset; this sets the number of samples to take. |
| smpl.min.size, smpl.max.size | Minimum and maximum size of the samples. These are used to select and sub-select ranges with meaningful values (i.e., different from 0 and NA). Powers of 2 are recommended, though not mandatory. |
| ... | Additional parameters to be passed on. |
This function predicts a suitable pcKeepComp value for the filterFFT function, that is, the recommended proportion of components to keep in filterFFT so that the filtered profile has a correlation of (or close to) cor.target with the original one.
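To make "proportion of components to keep" concrete, the sketch below uses a hypothetical helper, fft_keep(), written with base R's fft(). It is an assumption for illustration and not nucleR's filterFFT: it keeps the mean (DC) term plus roughly the lowest pc fraction of frequencies, zeroes the rest, and reports the correlation with the original profile for a few values of pc on synthetic data.

```r
# Hypothetical helper, NOT nucleR's filterFFT: keep the mean term and
# roughly the lowest-frequency fraction `pc` of the Fourier components
# of `x` (length >= 2), and zero out the rest.
fft_keep <- function(x, pc) {
  n <- length(x)
  X <- fft(x)
  # number of positive frequencies kept besides the DC term (at most n/2)
  k <- min(floor(n / 2), max(1L, floor(pc * n / 2)))
  keep <- logical(n)
  keep[1:(k + 1)] <- TRUE        # DC term plus the k lowest frequencies
  keep[(n - k + 1):n] <- TRUE    # their negative-frequency mirrors
  X[!keep] <- 0
  # R's fft() is unnormalised, so divide the inverse transform by n
  Re(fft(X, inverse = TRUE)) / n
}

set.seed(42)
pos <- 1:4096
# synthetic profile: smooth periodic signal plus Gaussian noise
x <- 10 + 5 * sin(2 * pi * pos / 512) + rnorm(4096)

# correlation between the filtered and the original profile
r <- sapply(c(0.01, 0.05, 0.1, 1), function(pc) cor(x, fft_keep(x, pc)))
print(round(r, 3))
```

Because the kept components are orthogonal and the mean term is always kept, the correlation of this particular filter can only grow as pc increases (with pc = 1 the profile is returned unchanged), which is what makes a one-dimensional search over pc reasonable.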
The search starts from the two given values pc.min and pc.max and uses linear interpolation to quickly reach a value whose correlation between the filtered and the original profiles is within cor.tol of cor.target.
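The interpolation search can be sketched as follows. detect_pc() and fft_keep() are hypothetical names, and the specific scheme is a choice made for this sketch, not necessarily what pcKeepCompDetect does internally: regula falsi (linear interpolation between two values of pc whose correlations bracket the target) with the Illinois modification, which halves the stale endpoint's value so the search does not stall on one side. The filter is a simple low-pass built on base R's fft().

```r
# Hypothetical low-pass filter (NOT nucleR's filterFFT): keep the DC term
# and roughly the lowest fraction `pc` of frequencies of `x`.
fft_keep <- function(x, pc) {
  n <- length(x)
  X <- fft(x)
  k <- min(floor(n / 2), max(1L, floor(pc * n / 2)))
  keep <- logical(n)
  keep[1:(k + 1)] <- TRUE
  keep[(n - k + 1):n] <- TRUE
  X[!keep] <- 0
  Re(fft(X, inverse = TRUE)) / n
}

# Hypothetical search (NOT nucleR's implementation): regula falsi with
# the Illinois modification, restricted to [pc.min, pc.max].
detect_pc <- function(x, pc.min = 0.01, pc.max = 0.1, cor.target = 0.98,
                      cor.tol = 1e-3, max.iter = 20) {
  # signed distance between the achieved and the target correlation
  f <- function(pc) cor(x, fft_keep(x, pc)) - cor.target
  lo <- pc.min; f.lo <- f(lo)
  hi <- pc.max; f.hi <- f(hi)
  if (f.lo >= 0) return(lo)   # pc.min already reaches the target
  if (f.hi <= 0) return(hi)   # even pc.max falls short: closest bound
  if (abs(f.lo) < abs(f.hi)) {
    best <- lo; f.best <- f.lo
  } else {
    best <- hi; f.best <- f.hi
  }
  side <- 0
  for (i in seq_len(max.iter)) {
    # linear interpolation between the bracketing points
    pc <- lo - f.lo * (hi - lo) / (f.hi - f.lo)
    f.pc <- f(pc)
    if (abs(f.pc) < abs(f.best)) { best <- pc; f.best <- f.pc }
    if (abs(f.pc) <= cor.tol) break
    if (f.pc < 0) {
      lo <- pc; f.lo <- f.pc
      if (side == -1) f.hi <- f.hi / 2   # Illinois: shrink the stale end
      side <- -1
    } else {
      hi <- pc; f.hi <- f.pc
      if (side == 1) f.lo <- f.lo / 2
      side <- 1
    }
  }
  best
}

n <- 4096
pos <- 0:(n - 1)
# deterministic test signal: cosines at frequencies 1..600, power ~ 1/j
x <- rowSums(sapply(1:600, function(j) cos(2 * pi * j * pos / n) / sqrt(j)))

pc <- detect_pc(x, cor.target = 0.9)
print(c(pc = pc, cor = cor(x, fft_keep(x, pc))))
```

Returning a bound when the target is not bracketed is one way to "get as close as possible", under the assumption that pc must stay inside [pc.min, pc.max].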
To allow quick detection without an exhaustive search, this function works on a subset of the data, obtained by randomly sampling regions with meaningful coverage values (i.e., different from 0 and NA) longer than smpl.min.size.
If a region of smpl.max.size bases cannot be obtained (for example, because of flanking 0s), a region of at least smpl.min.size bases is used to check the correlation. The mean correlation across all sampled regions is used to assess the performance of a given pcKeepComp value.
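A sketch of the sampling step, with hypothetical names (sample_regions(), mean_cor() and a base-R fft_keep() filter) that are not taken from nucleR: rle() locates runs of meaningful values, runs shorter than smpl.min.size are discarded, and each sample is a random window of up to smpl.max.size bases inside a randomly chosen run. The mean correlation over the samples is the score a search over pc would try to bring close to cor.target.

```r
# Hypothetical low-pass filter (NOT nucleR's filterFFT).
fft_keep <- function(x, pc) {
  n <- length(x)
  X <- fft(x)
  k <- min(floor(n / 2), max(1L, floor(pc * n / 2)))
  keep <- logical(n)
  keep[1:(k + 1)] <- TRUE
  keep[(n - k + 1):n] <- TRUE
  X[!keep] <- 0
  Re(fft(X, inverse = TRUE)) / n
}

# Hypothetical sampler (NOT nucleR's code).
sample_regions <- function(x, num = 25, min.size = 2^10, max.size = 2^14) {
  ok <- !is.na(x) & x != 0                 # meaningful positions
  runs <- rle(ok)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  usable <- runs$values & runs$lengths >= min.size
  starts <- starts[usable]
  ends <- ends[usable]
  if (length(starts) == 0) return(list())
  picks <- sample.int(length(starts), num, replace = TRUE)
  lapply(picks, function(r) {
    region.len <- ends[r] - starts[r] + 1
    len <- min(max.size, region.len)       # never below min.size here
    offset <- sample.int(region.len - len + 1, 1) - 1
    x[(starts[r] + offset):(starts[r] + offset + len - 1)]
  })
}

# mean correlation between filtered and original samples
mean_cor <- function(samples, pc) {
  mean(sapply(samples, function(s) cor(s, fft_keep(s, pc))))
}

set.seed(7)
# synthetic coverage: meaningful runs of 5000, 3000 and 800 bases,
# separated by zeros and NAs
x <- c(rep(0, 500), rpois(5000, 20) + 1, rep(NA, 300),
       rpois(3000, 20) + 1, rep(0, 200), rpois(800, 20) + 1)
smp <- sample_regions(x, num = 10, min.size = 2^10, max.size = 2^12)
m <- mean_cor(smp, pc = 0.05)
print(m)
```

Here the 800-base run is never sampled because it is shorter than min.size, and the 3000-base run yields shorter windows because it cannot supply max.size bases.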
If the number of meaningful bases in data is less than smpl.min.size * (smpl.num/2), the whole data vector is used instead of sampling.
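With the default arguments the threshold is 2^10 * (25/2) = 12800 meaningful bases. The hypothetical helper below, not part of nucleR, simply encodes that rule.

```r
# Hypothetical helper (NOT nucleR's code): TRUE when there are enough
# meaningful bases to sample, FALSE when the whole vector should be used.
use_sampling <- function(x, smpl.num = 25, smpl.min.size = 2^10) {
  meaningful <- sum(!is.na(x) & x != 0)   # non-zero, non-NA positions
  meaningful >= smpl.min.size * (smpl.num / 2)
}

2^10 * (25 / 2)                                # 12800 with the defaults
use_sampling(rep(1, 12799))                    # FALSE: use the whole vector
use_sampling(c(rep(0, 5000), rep(1, 20000)))   # TRUE: sample regions
```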
Returns the fitted pcKeepComp value.
Oscar Flores oflores@mmb.pcb.ub.es, David Rosell david.rosell@irbbarcelona.org
```r
library(nucleR)
library(ggplot2)

# Load dataset
data(nucleosome_htseq)
data <- as.vector(coverage.rpm(nucleosome_htseq)[[1]])

# Get recommended pcKeepComp value
pckeepcomp <- pcKeepCompDetect(data, cor.target = 0.99)
print(pckeepcomp)

# Call filterFFT with the detected value
f1 <- filterFFT(data, pcKeepComp = pckeepcomp)

# Alternatively, let filterFFT detect the value in a single call
f2 <- filterFFT(data, pcKeepComp = "auto", cor.target = 0.99)

# Plot the first 2000 positions
i <- 1:2000
plot_data <- rbind(
  data.frame(x = i, y = data[i], coverage = "original"),
  data.frame(x = i, y = f1[i], coverage = "two calls"),
  data.frame(x = i, y = f2[i], coverage = "one call")
)
ggplot(plot_data, aes(x = x, y = y, color = coverage)) +
  geom_line() +
  labs(x = "position", y = "coverage")
```