discretize_gene_supervised {FCBF} | R Documentation |
Applies several discretization methods to a variable (gene) and selects the one that performs best against a target class, as measured by equivocation. Note that set.seed() should be called beforehand to make the results reproducible; otherwise, the inner kmeans function may return different results on each run.
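To make the selection criterion concrete, the following is a minimal sketch of equivocation, read here as the conditional entropy H(target | discretized gene). The helpers entropy_bits and equivocation are hypothetical names written for this illustration; they are not part of the package, and its internal scoring may differ.

```r
# Hypothetical helper: Shannon entropy (in bits) of a discrete vector.
entropy_bits <- function(x) {
  p <- as.numeric(table(x)) / length(x)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Hypothetical helper: equivocation H(target | feature),
# computed as H(feature, target) - H(feature).
equivocation <- function(feature, target) {
  joint <- paste(feature, target, sep = "|")
  entropy_bits(joint) - entropy_bits(feature)
}

target  <- c("a", "a", "b", "b")
perfect <- c("low", "low", "high", "high")  # separates the classes exactly
useless <- c("low", "high", "low", "high")  # carries no class information

equivocation(perfect, target)  # 0 bits: no uncertainty about target remains
equivocation(useless, target)  # 1 bit: the full uncertainty remains
```

Under this reading, a lower value means the discretization leaves less uncertainty about the target, so the candidate with the lowest equivocation would be preferred.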
discretize_gene_supervised(gene, target, output = "discretized_vector", discs = c(".split_vector_in_two_by_median", ".split_vector_in_two_by_mean", ".split_vector_by_kmeans", ".split_vector_in_three_by_mean_sd", ".split_vector_in_two_by_vw"), vw_params = c(0.25, 0.5, 0.75), kmeans_centers = c(2, 3, 4), sd_alpha = c(0.75, 1, 1.25))
gene |
A previously normalized gene expression vector |
target |
A series of labels matching each of the values in the gene vector |
output |
If equal to 'discretized_vector', the output is the discretized vector. If it is 'su', a data frame is returned. Defaults to 'discretized_vector' |
discs |
Names of the discretization functions to try. Defaults to c(".split_vector_in_two_by_median", ".split_vector_in_two_by_mean", ".split_vector_by_kmeans", ".split_vector_in_three_by_mean_sd", ".split_vector_in_two_by_vw") |
vw_params |
Cutoff parameters for the varying width function. Defaults to 0.25, 0.5 and 0.75 |
kmeans_centers |
Numeric vector with the number of centers to use for kmeans. Defaults to 2, 3 and 4 |
sd_alpha |
Parameter for adjusting the 'medium' level of the mean ± sd discretization. Defaults to c(0.75, 1, 1.25) |
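The arguments above can be illustrated with a rough sketch of three of the discretization rules. All three functions below are hypothetical stand-ins written in base R; the package's internal helpers may be implemented differently. In particular, treating the vw cutoff as a fraction of the value range is an assumption, as are the 'low', 'medium' and 'high' labels.

```r
# Hypothetical stand-in: two levels, split at the median.
split_by_median <- function(x) {
  factor(ifelse(x > median(x), "high", "low"), levels = c("low", "high"))
}

# Hypothetical stand-in: three levels; 'medium' spans mean +- alpha * sd.
split_by_mean_sd <- function(x, alpha = 1) {
  lower <- mean(x) - alpha * sd(x)
  upper <- mean(x) + alpha * sd(x)
  cut(x, breaks = c(-Inf, lower, upper, Inf),
      labels = c("low", "medium", "high"))
}

# Hypothetical stand-in: two levels, split at a fraction of the range
# (assumed meaning of the varying width cutoff).
split_by_range_fraction <- function(x, cutoff = 0.5) {
  threshold <- min(x) + (max(x) - min(x)) * cutoff
  factor(ifelse(x < threshold, "low", "high"), levels = c("low", "high"))
}

x <- c(0, 0.1, 0.2, 1.5, 2.0, 2.2, 5.0)

table(split_by_median(x))
table(split_by_mean_sd(x, alpha = 0.75))
table(split_by_mean_sd(x, alpha = 1.25))  # a larger alpha widens 'medium'
table(split_by_range_fraction(x, cutoff = 0.25))
```

Passing several values in vw_params, kmeans_centers or sd_alpha presumably yields one candidate discretization per value, from which the best one is then selected.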
Note that a seed for random values has to be set for reproducibility. Otherwise, the "kmeans" result might vary from run to run.
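The effect of the seed can be checked directly with stats::kmeans, which picks its starting centers at random. The sketch below uses simulated data, not the package's dataset, and the seed values are arbitrary.

```r
# Simulated expression-like values drawn from two groups (illustrative only).
set.seed(42)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 4))

# Resetting the same seed before each call yields the same clustering.
set.seed(3)
fit_a <- kmeans(x, centers = 3)
set.seed(3)
fit_b <- kmeans(x, centers = 3)

identical(fit_a$cluster, fit_b$cluster)  # TRUE
```

Without the second set.seed() call, the starting centers are drawn from a different point in the random stream, so the cluster labels (and possibly the partition itself) may differ between runs.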
A data frame with the discretized features, in the same order as in the input
data(scDengue)
exprs <- as.data.frame(SummarizedExperiment::assay(scDengue, 'logcounts'))
gene <- exprs['ENSG00000166825',]
infection <- SummarizedExperiment::colData(scDengue)
target <- infection$infection
set.seed(3)
discrete_expression <- as.data.frame(discretize_gene_supervised(gene, target))
table(discrete_expression)