binomRegMethModel {SOMNiBUS} | R Documentation |
This function fits a (dispersion-adjusted) binomial regression model to regional methylation data, and reports the estimated smooth covariate effects and regional p-values for the test of DMRs (differentially methylation regions). Over or under dispersion across loci is accounted for in the model by the combination of a multiplicative dispersion parameter (or scale parameter) and a sample-specific random effect.
This method can deal with outcomes, i.e. the number of
methylated reads in a region, that are contaminated by known
false methylation calling rate (p0
) and false non-methylation
calling rate (1-p1
).
The covariate effects are assumed to smoothly vary across
genomic regions. In order to estimate them, the algorithm first
represents the functional paramters by a linear combination
of a set of restricted cubic splines (with dimention
n.k
), and a smoothness penalization term
which depends on the smoothing parameters lambdas
is also
added to control smoothness.
The estimation is performed by an iterated EM algorithm. Each
M step constitutes
an outer Newton's iteration to estimate smoothing parameters
lambdas
and an
inner P-IRLS iteration to estimate spline coefficients
alpha
for the covariate
effects. Currently, the computation in the M step depends the
implementation of
gam()
in package mgcv
.
binomRegMethModel( data, n.k, p0 = 0.003, p1 = 0.9, Quasi = TRUE, epsilon = 10^(-6), epsilon.lambda = 10^(-3), maxStep = 200, detail = FALSE, binom.link = "logit", method = "REML", covs = NULL, RanEff = TRUE, reml.scale = FALSE, scale = -2 )
data |
a data frame with rows as individual CpGs appeared
in all the samples. The
first 4 columns should contain the information of |
n.k |
a vector of basis dimensions for the intercept and
individual covariates.
|
p0 |
the probability of observing a methylated read when
the underlying true
status is unmethylated. |
p1 |
the probability of observing a methylated read when
the underlying true
status is methylated. |
Quasi |
whether a Quasi-likelihood estimation approach will be used; in other words, whether a multiplicative dispersion is added in the model or not. |
epsilon |
numeric; stopping criterion for the closeness of estimates of spline coefficients from two consecutive iterations. |
epsilon.lambda |
numeric; stopping criterion for the
closeness of estimates of smoothing parameter |
maxStep |
the algorithm will step if the iteration steps
exceed |
detail |
indicate whether print the number of iterations |
binom.link |
the link function used in the binomial regression model; the default is the logit link |
method |
the method used to estimate the smoothing
parameters. The default is the
'REML' method which is generally better than prediction based
criterion |
covs |
a vector of covariate names. The covariates with
names in |
RanEff |
whether sample-level random effects are added or not |
reml.scale |
whether a REML-based scale (dispersion) estimator is used. The default is Fletcher-based estimator |
scale |
nagative values mean scale paramter should be estimated; if a positive value is provided, a fixed scale will be used. |
This function return a list
including objects:
est
: estimates of the spline basis coefficients
alpha
lambda
: estimates of the smoothing parameters
for each functional paramters
est.pi
: predicted methylation levels for each
row in the input data
ite.points
: estimates of est
, lambda
at each EM iteration
cov1
: estimated variance-covariance matrix of the
basis coefficients alphas
reg.out
: regional testing output obtained using
Fletcher-based dispersion
estimate; an additional 'ID' row would appear if RanEff is TRUE
reg.out.reml.scale
:regional testing output obtained
sing REML-based
dispersion estimate;
reg.out.gam
:regional testing output obtained using
(Fletcher-based)
dispersion estimate from mgcv package;
phi_fletcher
: Fletcher-based estimate of the
(multiplicative) dispersion
parameter
phi_reml
: REML-based estimate of the (multiplicative)
dispersion parameter
phi_gam
: Estimated dispersion parameter reported by
mgcv
SE.out
: a matrix of the estimated pointwise Standard
Errors (SE); number
of rows are the number of unique CpG sites in the input data and
the number of columns
equal to the total number of covariates fitted in the model
(the first one is the intercept)
SE.out.REML.scale
: a matrix of the estimated
pointwise Standard Errors (SE);
the SE calculated from the REML-based dispersion estimates
uni.pos
: the genomic postions for each row of
CpG sites in the matrix SE.out
Beta.out
: a matrix of the estimated covariate
effects beta(t), here t
denots the genomic positions.
ncovs
: number of functional paramters in the model
(including the
intercept)
sigma00
: estimated variance for the random effect
if RanEff is TRUE;
NA if RanEff is FALSE
Kaiqiong Zhao
#------------------------------------------------------------# data(RAdat) head(RAdat) RAdat.f <- na.omit(RAdat[RAdat$Total_Counts != 0, ]) out <- binomRegMethModel( data=RAdat.f, n.k=rep(5, 3), p0=0.003307034, p1=0.9, epsilon=10^(-6), epsilon.lambda=10^(-3), maxStep=200, detail=FALSE )