nmf.LassoCV {SparseSignatures} | R Documentation |
Perform the discovery by cross-validation of K (unknown) somatic mutational signatures given a set of observations x. The estimation can slow down because of memory usage when a high number of cross-validation repetitions is requested and when the grid search is performed over many configurations. In this case, we advise splitting the computation into multiple smaller sets.
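One way to split the computation is to enumerate the grid of configurations and process it in smaller batches. The following is a minimal base R sketch of that idea, not the package's own procedure. `run_batch` is a hypothetical placeholder standing in for a call to nmf.LassoCV restricted to one batch, and splitting by K is an arbitrary choice.

```r
# Enumerate every (K, lambda) configuration of the default grid.
grid <- expand.grid(K = 3:10, lambda = c(0.1, 0.2, 0.3))

# Split the configurations so that each batch covers a single value of K.
batches <- split(grid, grid$K)

# Hypothetical placeholder: in practice this would call nmf.LassoCV with
# K = unique(batch$K) and lambda_values = unique(batch$lambda), and the
# result of each batch could be saved to disk before moving on.
run_batch <- function(batch) {
  data.frame(K = batch$K, lambda = batch$lambda, done = TRUE)
}

# Process the batches one at a time and collect the outcomes.
results <- do.call(rbind, lapply(batches, run_batch))
```

Running the batches in separate R sessions, rather than in one loop, would keep the memory footprint of each run small.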
nmf.LassoCV(x, K = 3:10, starting_beta = NULL, background_signature = NULL, nmf_runs = 10, lambda_values = c(0.1, 0.2, 0.3), cross_validation_entries = 0.05, cross_validation_iterations = 5, cross_validation_repetitions = 10, iterations = 20, max_iterations_lasso = 10000, num_processes = Inf, seed = NULL, verbose = TRUE)
x |
count matrix. |
K |
a range of numeric values (each greater than 1) indicating the number of signatures to be discovered. |
starting_beta |
a list of starting beta values for each configuration of K. If it is NULL, starting betas are estimated by NMF. |
background_signature |
background signature to be used. If not provided, a warning is thrown. |
nmf_runs |
number of iterations of NMF to be performed for a robust estimation of starting beta. If starting_beta is not NULL, this parameter is ignored. |
lambda_values |
range of LASSO penalty values to be used, between 0 and 1. Each value should be greater than 0; a value of 1 would shrink all the signatures to 0 within one step. The higher the value, the sparser the resulting signatures, but too large values result in a poor fit of the counts. |
cross_validation_entries |
Percentage of cells in the count matrix to be replaced by 0s, given as a fraction (the default 0.05 corresponds to 5%). |
cross_validation_iterations |
For each configuration, the signatures are first discovered from a matrix with a percentage of values replaced by 0s. This may result in a poor estimate. This parameter is the number of restarts to be performed to improve this estimate. |
cross_validation_repetitions |
Number of times cross-validation should be repeated. Higher values result in a better estimate, but are computationally expensive. |
iterations |
Number of iterations to be performed. Each iteration corresponds to a first step where the counts are fitted and a second step where sparsity is enhanced. |
max_iterations_lasso |
Maximum number of iterations to be performed during the sparsification. |
num_processes |
Number of processes to be used during parallel execution. If executing in single process mode, this is ignored. |
seed |
Seed for reproducibility. |
verbose |
boolean; should all messages be printed? |
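To make the role of cross_validation_entries concrete, the sketch below replaces a fraction of the cells of a toy count matrix with 0s. It only illustrates the masking described in the argument table and is not the package's internal code. The matrix, the seed and the names `n_holdout`, `holdout` and `x_cv` are assumptions made for this example.

```r
set.seed(1)

# Toy count matrix: 6 samples x 10 mutation categories (made-up data).
# Adding 1 guarantees that no cell is 0 before masking.
x <- matrix(rpois(60, lambda = 20) + 1, nrow = 6, ncol = 10)

cross_validation_entries <- 0.05

# Number of cells to hold out: 5% of 60 cells, and at least one.
n_holdout <- max(1, round(length(x) * cross_validation_entries))

# Pick the held-out cells uniformly at random (linear indices).
holdout <- sample(length(x), n_holdout)

# Copy of the matrix with the held-out cells replaced by 0s.
x_cv <- x
x_cv[holdout] <- 0
```

The original values `x[holdout]` stay available, so predictions made from `x_cv` can later be compared against them.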
A list with 3 elements: grid_search, starting_beta and mean_squared_error. Here, grid_search provides all the results of the executions within the grid search; starting_beta is the set of initial values of beta used for each configuration; and mean_squared_error is the mean squared error between the observed counts and the predicted ones for each configuration.
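As an illustration of how a mean squared error per configuration can be used, the sketch below computes the error for a pair of toy matrices and then picks the configuration with the lowest error from a small table. The layout of `errors` (rows for K, columns for lambda values) is an assumption made for this example; this page does not specify the exact structure of mean_squared_error, and all numbers are made up.

```r
# Mean squared error between observed and predicted counts.
mse <- function(observed, predicted) mean((observed - predicted)^2)

observed  <- matrix(c(10, 4, 7, 3), nrow = 2)
predicted <- matrix(c(9, 5, 7, 1), nrow = 2)
err <- mse(observed, predicted)  # (1 + 1 + 0 + 4) / 4

# Hypothetical error table, filled column by column:
# rows are values of K, columns are lambda values.
errors <- matrix(c(2.0, 1.2, 1.6, 2.1, 1.0, 1.8),
                 nrow = 3,
                 dimnames = list(c("3", "4", "5"), c("0.1", "0.2")))

# Locate the configuration with the smallest error.
best <- which(errors == min(errors), arr.ind = TRUE)
best_K <- rownames(errors)[best[1, "row"]]
best_lambda <- colnames(errors)[best[1, "col"]]
```

When several configurations have similar errors, preferring the smaller K or the larger lambda is a common way to favour the simpler, sparser solution.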