BASiCS_MCMC {BASiCS}R Documentation

BASiCS MCMC sampler

Description

MCMC sampler to perform Bayesian inference for single-cell mRNA sequencing datasets using the model described in Vallejos et al (2015).

Usage

BASiCS_MCMC(Data, N, Thin, Burn, Regression, WithSpikes = TRUE, ...)

Arguments

Data

A SingleCellExperiment object. If WithSpikes = TRUE, this MUST be formatted to include the spike-ins information (see vignette).

N

Total number of iterations for the MCMC sampler. Use N>=max(4,Thin), N being a multiple of Thin.

Thin

Thining period for the MCMC sampler. Use Thin>=2.

Burn

Burn-in period for the MCMC sampler. Use Burn>=1, Burn<N, Burn being a multiple of Thin.

Regression

If Regression = TRUE, BASiCS exploits a joint prior formulation for mean and over-dispersion parameters to estimate a measure of residual over-dispersion is not confounded by mean expression. Recommended setting is Regression = TRUE.

WithSpikes

If WithSpikes = TRUE, BASiCS will use reads from added spike-ins to estimate technical variability. If WithSpikess = FALSE, BASiCS depends on replicated experiments (batches) to estimate technical variability. In this case, please supply the BatchInfo vector in colData(Data). Default: WithSpikes = TRUE.

...

Optional parameters.

PriorDelta

Specifies the prior used for delta. Possible values are 'gamma' (Gamma(a.theta,b.theta) prior) and 'log-normal' (log-Normal(0,s2.delta) prior) .

. Default value: PriorDelta = 'log-normal'.

PriorParam

List of 7 elements, containing the hyper-parameter values required for the adopted prior (see Vallejos et al, 2015, 2016). All elements must be positive real numbers.

s2.mu

Scale hyper-parameter for the log-Normal(0,s2.mu) prior that is shared by all gene-specific expression rate parameters μ_i. Default: s2.mu = 0.5.

s2.delta

Only used when ‘PriorDelta == ’log-normal''. Scale hyper-parameter for the log-Normal(0,s2.delta) prior that is shared by all gene-specific over-dispersion parameters δ_i. Default: s2.delta = 0.5.

a.delta

Only used when ‘PriorDelta == ’gamma''. Shape hyper-parameter for the Gamma(a.delta,b.delta) prior that is shared by all gene-specific biological over-dispersion parameters δ_i. Default: a.delta = 1.

b.delta

Only used when ‘PriorDelta == ’gamma''. Rate hyper-parameter for the Gamma(a.delta,b.delta) prior that is shared by all gene-specific biological over-dispersion hyper-parameters δ_i. Default: b.delta = 1.

p.phi

Dirichlet hyper-parameter for the joint of all (scaled by n) cell-specific mRNA content normalising constants φ_j / n. Default: p.phi = rep(1, n).

a.s

Shape hyper-parameter for the Gamma(a.s,b.s) prior that is shared by all cell-specific capture efficiency normalising constants s_j. Default: a.s = 1.

b.s

Rate hyper-parameter for the Gamma(a.s, b.s) prior that is shared by all cell-specific capture efficiency normalising constants s_j. Default: b.s = 1.

a.theta

Shape hyper-parameter for the Gamma(a.theta,b.theta) prior for technical noise parameter θ. Default: a.theta = 1.

b.theta

Rate hyper-parameter for the Gamma(a.theta,b.theta) prior for technical noise parameter θ. Default: b.theta = 1.

eta

Only used when Regression = TRUE. eta specifies the degress of freedom for the residual term. Default: eta = 5.

.

k

Only used when Regression = TRUE. k specifies the number of regression Gaussian Radial Basis Functions (GRBF) used within the correlated prior adopted for gene-specific over-dispersion and mean expression paramters. Default: k = 12.

Var

Only used when Regression = TRUE. Var specifies the GRBF scaling parameter. Default: Var = 1.2.

AR

Optimal acceptance rate for adaptive Metropolis Hastings updates. It must be a positive number between 0 and 1. Default (and recommended): AR = 0.44

.

StopAdapt

Iteration at which adaptive proposals are not longer adapted. Use StopAdapt>=1. Default: StopAdapt = Burn.

StoreChains

If StoreChains = TRUE, the generated BASiCS_Chain object is stored as a '.Rds' file (RunName argument used to index the file name). Default: StoreChains = FALSE.

StoreAdapt

If StoreAdapt = TRUE, trajectory of adaptive proposal variances (in log-scale) for all parameters is stored as a list in a '.Rds' file (RunName argument used to index file name). Default: StoreAdapt = FALSE.

StoreDir

Directory where output files are stored. Only required if StoreChains = TRUE and/or StoreAdapt = TRUE). Default: StoreDir = getwd().

RunName

String used to index '.Rds' files storing chains and/or adaptive proposal variances.

PrintProgress

If PrintProgress = FALSE, console-based progress report is suppressed.

Start

Starting values for the MCMC sampler. We do not advise to use this argument. Default options have been tuned to facilitate convergence. If changed, it must be a list containing the following elements: mu0, delta0, phi0, s0, nu0, theta0, ls.mu0, ls.delta0, ls.phi0, ls.nu0 and ls.theta0

Value

An object of class BASiCS_Chain.

Author(s)

Catalina A. Vallejos cnvallej@uc.cl

Nils Eling eling@ebi.ac.uk

References

Vallejos, Marioni and Richardson (2015). PLoS Computational Biology.

Vallejos, Richardson and Marioni (2016). Genome Biology.

Eling et al (2018). Cell Systems

Examples


# Built-in simulated dataset
set.seed(1) 
Data <- makeExampleBASiCS_Data()
# To analyse real data, please refer to the instructions in:
# https://github.com/catavallejos/BASiCS/wiki/2.-Input-preparation

# Only a short run of the MCMC algorithm for illustration purposes
# Longer runs migth be required to reach convergence
Chain <- BASiCS_MCMC(Data, N = 50, Thin = 2, Burn = 10, Regression = FALSE,
                     PrintProgress = FALSE, WithSpikes = TRUE)

# To run the regression version of BASiCS, use:
Chain <- BASiCS_MCMC(Data, N = 50, Thin = 2, Burn = 10, Regression = TRUE,
                     PrintProgress = FALSE, WithSpikes = TRUE)

# To run the non-spike version BASiCS requires the data to contain at least
# 2 batches:
set.seed(2)
Data <- makeExampleBASiCS_Data(WithBatch = TRUE)
Chain <- BASiCS_MCMC(Data, N = 50, Thin = 2, Burn = 10, Regression = TRUE,
                     PrintProgress = FALSE, WithSpikes = FALSE)

# For illustration purposes we load a built-in 'BASiCS_Chain' object
# (obtained using the 'BASiCS_MCMC' function)
data(ChainSC)

# `displayChainBASiCS` can be used to extract information from this output.
# For example:
head(displayChainBASiCS(ChainSC, Param = 'mu'))

# Traceplot (examples only)
plot(ChainSC, Param = 'mu', Gene = 1)
plot(ChainSC, Param = 'phi', Cell = 1)
plot(ChainSC, Param = 'theta', Batch = 1)

# Calculating posterior medians and 95% HPD intervals
ChainSummary <- Summary(ChainSC)

# `displaySummaryBASiCS` can be used to extract information from this output
# For example:
head(displaySummaryBASiCS(ChainSummary, Param = 'mu'))

# Graphical display of posterior medians and 95% HPD intervals
# For example:
plot(ChainSummary, Param = 'mu', main = 'All genes')
plot(ChainSummary, Param = 'mu', Genes = 1:10, main = 'First 10 genes')
plot(ChainSummary, Param = 'phi', main = 'All cells')
plot(ChainSummary, Param = 'phi', Cells = 1:5, main = 'First 5 cells')
plot(ChainSummary, Param = 'theta')

# To constrast posterior medians of cell-specific parameters
# For example:
par(mfrow = c(1,2))
plot(ChainSummary, Param = 'phi', Param2 = 's', SmoothPlot = FALSE)
# Recommended for large numbers of cells
plot(ChainSummary, Param = 'phi', Param2 = 's', SmoothPlot = TRUE)

# To constrast posterior medians of gene-specific parameters
par(mfrow = c(1,2))
plot(ChainSummary, Param = 'mu', Param2 = 'delta', log = 'x',
     SmoothPlot = FALSE)
# Recommended
plot(ChainSummary, Param = 'mu', Param2 = 'delta', log = 'x',
     SmoothPlot = TRUE)

# To obtain denoised rates / counts, see:
# help(BASiCS_DenoisedRates)
# and
# help(BASiCS_DenoisedCounts)

# For examples of differential analyses between 2 populations of cells see:
# help(BASiCS_TestDE)


[Package BASiCS version 1.6.0 Index]