melissa {Melissa} | R Documentation |
melissa
clusters and imputes single cells based on their
methylome landscape on specific genomic regions, e.g. promoters, using the
Variational Bayes (VB) EM-like algorithm.
melissa(X, K = 3, basis = NULL, delta_0 = NULL, w = NULL, alpha_0 = 0.5,
  beta_0 = NULL, vb_max_iter = 300, epsilon_conv = 1e-05, is_kmeans = TRUE,
  vb_init_nstart = 10, vb_init_max_iter = 20, is_parallel = FALSE,
  no_cores = 3, is_verbose = TRUE)
X |
The input data, which has to be a list of
length N, where N is the total number of cells. Each element in the list
contains another list of length M, where M is the total number of genomic
regions, e.g. promoters. Each element in the inner list is an |
K |
Integer denoting the total number of clusters K. |
basis |
A 'basis' object, e.g. see the create_basis function from the BPRMeth package. If NULL, an RBF object with 3 basis functions will be created. |
delta_0 |
Parameter vector of the Dirichlet prior on the mixing proportions pi. |
w |
Optional, an M x D x K array of the initial parameters, where the first dimension is the genomic regions M, the second is the number of covariates D (i.e. basis functions), and the third is the clusters K. If NULL, default values will be assigned. |
alpha_0 |
Hyperparameter: shape parameter for Gamma distribution. A Gamma distribution is used as prior for the precision parameter tau. |
beta_0 |
Hyperparameter: rate parameter for Gamma distribution. A Gamma distribution is used as prior for the precision parameter tau. |
vb_max_iter |
Integer denoting the maximum number of VB iterations. |
epsilon_conv |
Numeric denoting the convergence threshold for VB. |
is_kmeans |
Logical, use Kmeans for initialization of model parameters. |
vb_init_nstart |
Number of VB random starts for finding better initialization. |
vb_init_max_iter |
Maximum number of mini-VB iterations. |
is_parallel |
Logical, indicating if code should be run in parallel. |
no_cores |
Number of cores to be used, default is max_no_cores - 1. |
is_verbose |
Logical, print results during VB iterations. |
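As a rough illustration of the shapes described for w and delta_0, the base R sketch below builds an M x D x K array and a length-K Dirichlet parameter vector. The sizes (M = 100 regions, D = 4 covariates, K = 3 clusters) and the initial values are arbitrary placeholders chosen for this sketch, not the package defaults, and D is assumed to agree with the basis object that is passed.

```r
# Hypothetical sizes, chosen only for illustration
M <- 100  # genomic regions
D <- 4    # covariates (basis function coefficients) per region
K <- 3    # clusters

set.seed(1)
# M x D x K array of initial parameters; small random values as a placeholder
w_init <- array(stats::rnorm(M * D * K, sd = 0.1), dim = c(M, D, K))

# Length-K parameter vector for the Dirichlet prior on the mixing proportions
delta_init <- rep(1 / K, K)

dim(w_init)         # 100 4 3
length(delta_init)  # 3
```

Objects shaped like these could then be supplied through the w and delta_0 arguments; leaving both as NULL lets the function pick its own values.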
An object of class melissa with the following elements:
W: An (M+1) x K matrix with the optimized parameter values for each cluster, where M is the number of basis functions. Each column of the matrix corresponds to a different cluster k.
W_Sigma: A list with the covariance matrices of the posterior parameter W for each cluster k.
r_nk: An N x K responsibility matrix, where entry (n, k) is the responsibility of cluster k for explaining observation n.
delta: Optimized Dirichlet parameter for the mixing proportions.
alpha: Optimized shape parameter of the Gamma distribution.
beta: Optimized rate parameter of the Gamma distribution.
basis: The basis object.
lb: The lower bound vector.
labels: Cluster assignment labels.
pi_k: Expected value of the mixing proportions.
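To make the relationship between some of these elements concrete, here is a minimal base R sketch on made-up values. It assumes that hard labels are taken as the cluster with the highest responsibility in each row of r_nk, which is a common convention but is not confirmed here as the package's exact rule. The formula for pi_k is the standard mean of a Dirichlet distribution.

```r
# Toy responsibility matrix for N = 4 cells and K = 2 clusters (made-up values)
r_nk <- matrix(c(0.9, 0.1,
                 0.2, 0.8,
                 0.6, 0.4,
                 0.3, 0.7), ncol = 2, byrow = TRUE)

# Each row is a probability distribution over the clusters
stopifnot(all(abs(rowSums(r_nk) - 1) < 1e-12))

# Assumed rule: hard label = cluster with the highest responsibility per cell
labels <- apply(r_nk, 1, which.max)

# Toy Dirichlet parameters; the mean of Dirichlet(delta) is delta / sum(delta)
delta <- c(2.5, 2.5)
pi_k <- delta / sum(delta)

print(labels)  # 1 2 1 2
print(pi_k)    # 0.5 0.5
```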
The modelling and mathematical details for clustering profiles using mean-field variational inference are explained here: http://rpubs.com/cakapourani/ . More specifically:
For the Binomial/Bernoulli observation model, see: http://rpubs.com/cakapourani/vb-mixture-bpr
For the Gaussian observation model, see: http://rpubs.com/cakapourani/vb-mixture-lr
C.A.Kapourani C.A.Kapourani@ed.ac.uk
create_melissa_data_obj, partition_dataset, plot_melissa_profiles, filter_regions
# Example of running Melissa on synthetic data
# Create RBF basis object with 4 RBFs
basis_obj <- BPRMeth::create_rbf_object(M = 4)
set.seed(15)
# Run Melissa
melissa_obj <- melissa(X = melissa_synth_dt$met, K = 2, basis = basis_obj,
  vb_max_iter = 10, vb_init_nstart = 1, vb_init_max_iter = 5,
  is_parallel = FALSE, is_verbose = FALSE)
# Extract mixing proportions
print(melissa_obj$pi_k)
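After a run, the lb element can be used to check how the fit progressed. The base R sketch below uses a made-up lower bound trace in place of melissa_obj$lb. It assumes that convergence means the last increase in the bound fell below epsilon_conv, which is a typical VB stopping rule but is not confirmed here as the exact test the function applies.

```r
# Made-up lower bound trace, standing in for the lb element of a fitted object
lb <- c(-1500, -1320, -1301, -1300.2, -1300.199995)
epsilon_conv <- 1e-05

# Change in the lower bound between consecutive iterations
lb_diff <- diff(lb)

# In VB the bound is expected not to decrease from one iteration to the next
is_monotone <- all(lb_diff >= 0)

# Assumed rule: converged when the last increase is below the threshold
is_converged <- lb_diff[length(lb_diff)] < epsilon_conv

print(is_monotone)
print(is_converged)
```

With vb_max_iter set as low as in the example above, a real run may well stop before a check like this reports convergence.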