diffcyt 1.0.0
The diffcyt package implements statistical methods for differential discovery analyses in high-dimensional cytometry data, based on high-resolution clustering and moderated tests adapted from transcriptomics.
High-dimensional cytometry includes multi-color flow cytometry, mass cytometry or CyTOF, and oligonucleotide-tagged cytometry. These technologies use antibodies to measure expression levels of dozens (around 10 to 100) of marker proteins in thousands of cells. In many experiments, the aim is to detect differential abundance (DA) of cell populations, or differential states (DS) within cell populations, between groups of samples in different conditions.
This vignette provides a complete example workflow for running the diffcyt pipeline, using either the wrapper function diffcyt() or the individual functions for each step.
The input to the diffcyt pipeline can either be raw data, or a pre-prepared daFrame object from the CATALYST package (Chevrier, Crowell, Zanotelli et al., 2018). Providing a daFrame is particularly useful when CATALYST has already been used for exploratory analyses and visualizations; the diffcyt methods can then be used for differential testing.
The diffcyt methodology consists of two main components: high-resolution clustering and moderated tests.
We use high-resolution clustering to define a large number of small clusters representing cell populations. By default, we use the FlowSOM clustering algorithm (Van Gassen et al., 2015) to generate the high-resolution clusters, since we previously showed that this clustering algorithm gives excellent clustering performance and fast runtimes for high-dimensional cytometry data (Weber and Robinson, 2016). However, in principle, other algorithms that can generate high-resolution clusters could also be used.
For the differential analyses, we use methods from the edgeR package (Robinson et al., 2010; McCarthy et al., 2012), limma package (Ritchie et al., 2015), and voom method (Law et al., 2014). These methods are widely used in the transcriptomics field, and have been adapted here for analyzing high-dimensional cytometry data. In addition, we provide alternative methods based on generalized linear mixed models (GLMMs), linear mixed models (LMMs), and linear models (LMs), developed by Nowicka et al. (2017) (available in the CyTOF workflow).
The diffcyt methods can be used to test for differential abundance (DA) of cell populations, and differential states (DS) within cell populations.
To do this, the methodology requires the set of protein markers to be grouped into ‘cell type’ and ‘cell state’ markers. Cell type markers are used to define clusters, which are tested for DA; cell state marker signals are used to test for DS within clusters.
The conceptual split into cell type and cell state markers also facilitates biological interpretability, since it allows the results to be linked back to known cell types or populations of interest.
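As a small illustration of this split, the sketch below uses base R only. The six-marker panel and the marker names are hypothetical and chosen for illustration; they are not part of the example data used in this vignette.

```r
# Hypothetical six-marker panel (names chosen for illustration only)
marker_class <- factor(
  c("type", "type", "type", "state", "state", "none"),
  levels = c("type", "state", "none")
)
names(marker_class) <- c("CD3", "CD4", "CD8", "pSTAT1", "pSTAT3", "DNA1")

# Cell type markers: used to define clusters, which are tested for DA
markers_type <- names(marker_class)[marker_class == "type"]

# Cell state markers: tested for DS within each cluster
markers_state <- names(marker_class)[marker_class == "state"]

markers_type
markers_state
```

Markers in the class "none" are used neither for clustering nor for DS testing in this sketch.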
The diffcyt model setup enables the user to specify flexible experimental designs, including batch effects, paired designs, and continuous covariates. Linear contrasts are used to specify the comparison of interest.
A complete description of the statistical methodology, together with comparisons with existing approaches, is provided in the accompanying paper.
First, we create some random raw data containing a true differential signal.
# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}
# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(),
  sample2 = d_random(),
  sample3 = d_random(),
  sample4 = d_random()
)
# Add DA signal
ix_DA <- 801:900
ix_cols_type <- 1:10
d_input[[3]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
d_input[[4]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
# Add DS signal
ix_DS <- 901:1000
ix_cols_DS <- 19:20
d_input[[1]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[2]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[3]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)
d_input[[4]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)
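The replacement values above only fit if each call to d_random() returns a matrix whose dimensions match the indexed block of cells and markers. The following quick check makes this explicit; the generator function is repeated so that the snippet runs on its own, and the object names block_DA and block_DS are introduced here for illustration only.

```r
# Same generator as above, repeated so this snippet is self-contained
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

set.seed(123)

# Default call: 20000 values in 20 columns, i.e. 1000 cells per sample
dim(d_random())

# DA block: 100 cells (rows 801:900) x 10 cell type markers = 1000 values
block_DA <- d_random(n = 1000, mean = 2, ncol = 10)
dim(block_DA)

# DS block for group 2: 100 cells x (10 type + 2 state markers) = 1200 values
block_DS <- d_random(n = 1200, mean = 3, ncol = 12)
dim(block_DS)
```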
The ‘meta-data’ describing the data set is summarized in two data frames: experiment_info and marker_info.
The experiment_info data frame contains information about each sample, including sample IDs, group IDs, batch IDs or patient IDs (if relevant), and continuous covariates (if relevant). The marker_info data frame contains information about the protein markers, including channel names, marker names, and a vector to identify the class of each marker (cell type or cell state).
# Experiment information
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  group_id = factor(c("group1", "group1", "group2", "group2")),
  stringsAsFactors = FALSE
)
experiment_info
##   sample_id group_id
## 1   sample1   group1
## 2   sample2   group1
## 3   sample3   group2
## 4   sample4   group2
# Marker information
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)),
  marker_name = paste0("marker", sprintf("%02d", 1:20)),
  marker_class = factor(c(rep("type", 10), rep("state", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)
marker_info
##    channel_name marker_name marker_class
## 1    channel001    marker01         type
## 2    channel002    marker02         type
## 3    channel003    marker03         type
## 4    channel004    marker04         type
## 5    channel005    marker05         type
## 6    channel006    marker06         type
## 7    channel007    marker07         type
## 8    channel008    marker08         type
## 9    channel009    marker09         type
## 10   channel010    marker10         type
## 11   channel011    marker11        state
## 12   channel012    marker12        state
## 13   channel013    marker13        state
## 14   channel014    marker14        state
## 15   channel015    marker15        state
## 16   channel016    marker16        state
## 17   channel017    marker17        state
## 18   channel018    marker18        state
## 19   channel019    marker19        state
## 20   channel020    marker20        state
For differential testing, the diffcyt functions require the experimental design to be specified using a design matrix or model formula (depending on the testing function used; see the help files for the differential testing methods for details). Flexible experimental designs are possible, including blocking (e.g. batch effects or paired designs) and continuous covariates. See ?createDesignMatrix or ?createFormula for more details.
In addition, a contrast matrix is required to specify the comparison of interest (i.e. the combination of model parameters assumed to equal zero under the null hypothesis). See ?createContrast for more details.
library(diffcyt)
# Create design matrix
design <- createDesignMatrix(experiment_info, cols_design = 2)
# Alternatively: create model formula (required for some methods)
formula <- createFormula(experiment_info, cols_fixed = 2, cols_random = 1)
# Create contrast matrix
contrast <- createContrast(c(0, 1))
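To make the roles of the design matrix and contrast more concrete, here is a rough base R sketch built with model.matrix(). It illustrates the kind of objects involved, assuming a design with an intercept column plus one column per non-reference factor level; it does not call the diffcyt functions. The patient_id column is hypothetical and is not part of the example data.

```r
# Toy experiment table; 'patient_id' is a hypothetical blocking variable
info <- data.frame(
  sample_id  = factor(paste0("sample", 1:4)),
  group_id   = factor(c("group1", "group1", "group2", "group2")),
  patient_id = factor(c("p1", "p2", "p1", "p2"))
)

# Two-group design: intercept plus an indicator column for group2
design_simple <- model.matrix(~ group_id, data = info)
design_simple

# Paired design: adds one column for the patient (blocking) effect
design_paired <- model.matrix(~ group_id + patient_id, data = info)
colnames(design_paired)

# A contrast has one entry per design column. c(0, 1) selects the group2
# coefficient in the simple design; the paired design needs a third entry.
contrast_simple <- matrix(c(0, 1), ncol = 1)
contrast_paired <- matrix(c(0, 1, 0), ncol = 1)
```

In both cases the contrast tests whether the group2 coefficient equals zero; the paired version does so after accounting for the patient effect.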
The diffcyt package includes a wrapper function diffcyt(), which accepts input data in various formats and runs all steps in the diffcyt pipeline in the correct sequence.
The first option for running the diffcyt pipeline is to provide the raw data and meta-data directly to the wrapper function, along with arguments to specify the type of analysis and parameter choices, as well as the design matrix (or model formula) and contrast matrix. The input data can be provided as a flowSet or a list of flowFrames, DataFrames, data.frames, or matrices. See ?diffcyt for more details.
Here, we run the wrapper function twice: once to calculate tests for differential abundance (DA) of clusters, and once to calculate tests for differential states (DS) within clusters. The results consist of p-values and adjusted p-values for each cluster (DA tests) or cluster-marker combination (DS tests), which can be used to rank the clusters or cluster-marker combinations by the strength of their differential evidence. The function topClusters can be used to display the results for the top (most highly significant) detected clusters or cluster-marker combinations. See ?diffcyt and ?topClusters for more details.
# Test for differential abundance (DA) of clusters
out_DA <- diffcyt(d_input, experiment_info, marker_info,
                  design = design, contrast = contrast,
                  analysis_type = "DA", method_DA = "diffcyt-DA-edgeR",
                  seed_clustering = 123, verbose = FALSE)
## FlowSOM clustering completed in 0.3 seconds
# Test for differential states (DS) within clusters
out_DS <- diffcyt(d_input, experiment_info, marker_info,
                  design = design, contrast = contrast,
                  analysis_type = "DS", method_DS = "diffcyt-DS-limma",
                  seed_clustering = 123, plot = FALSE, verbose = FALSE)
## FlowSOM clustering completed in 0.3 seconds
## Warning: Partial NA coefficients for 20 probe(s)
# Display results for top DA clusters
topClusters(out_DA$res)
## DataFrame with 20 rows and 6 columns
## cluster_id logFC logCPM LR
## <factor> <numeric> <numeric> <numeric>
## 1 73 7.4400422928474 13.6392974131533 59.8898526824548
## 2 61 4.98244294541275 13.5518341547954 46.3506505766949
## 3 94 4.60696016550421 13.2522737880189 31.101306273126
## 4 84 6.34522910251425 12.7742243007621 27.8670525971469
## 5 83 1.97283386080915 13.3931340282898 13.6399825449091
## ... ... ... ... ...
## 16 12 -0.837259035501847 13.2522681309609 2.44784790032958
## 17 44 0.755752842073619 13.8495127575059 2.44474057182344
## 18 1 -0.569218348208356 14.0543277589098 2.32330790246751
## 19 2 0.735834037078332 13.2887958904107 2.08050434849183
## 20 18 -0.590748283472218 13.8976037429243 1.82320011935921
## p_val p_adj
## <numeric> <numeric>
## 1 1.00317332088145e-14 9.73078121255011e-13
## 2 9.88746687933515e-12 4.79542143647755e-10
## 3 2.44906523918522e-08 7.91864427336553e-07
## 4 1.29943743446979e-07 3.15113577858924e-06
## 5 0.000221419493704951 0.00429553817787604
## ... ... ...
## 16 0.117686121083218 0.672834105402875
## 17 0.117919379297411 0.672834105402875
## 18 0.127448742625069 0.686807113035095
## 19 0.149190874627574 0.715103519271971
## 20 0.176932829510591 0.715103519271971
# Number of significant detected DA clusters at 10% FDR
threshold <- 0.1
res_DA_all <- topClusters(out_DA$res, all = TRUE)
table(res_DA_all$p_adj <= threshold)
##
## FALSE TRUE
## 91 6
# Display results for top DS cluster-marker combinations
topClusters(out_DS$res)
## DataFrame with 20 rows and 9 columns
## cluster_id marker ID logFC AveExpr
## <factor> <factor> <character> <numeric> <numeric>
## 1 91 marker20 91 3.4589772386213 1.28490772418615
## 2 92 marker20 92 3.34887454866637 1.49546158879607
## 3 81 marker20 81 3.45221053932968 1.49606750606053
## 4 91 marker19 91 2.89978699820003 1.41874939024591
## 5 81 marker19 81 3.03116083668396 1.60509327435627
## ... ... ... ... ... ...
## 16 39 marker13 39 -1.0410635788519 0.0898028993373329
## 17 16 marker14 16 -1.54634182493088 -0.125984680303974
## 18 29 marker15 29 -1.06479974123836 0.0808527627744741
## 19 90 marker12 90 1.20986771463457 -0.277764362416852
## 20 99 marker19 99 0.925458329395848 0.154571350663079
## t p_val p_adj
## <numeric> <numeric> <numeric>
## 1 11.6076499450934 3.81766070212139e-30 3.62677766701532e-27
## 2 10.5291610795415 3.0937756614009e-25 1.46954343916543e-22
## 3 10.4860799477362 4.76365527498134e-25 1.50849083707742e-22
## 4 9.45596344007059 9.08219834991599e-21 2.15702210810505e-18
## 5 8.98498746672018 6.08260433452531e-19 1.15569482355981e-16
## ... ... ... ...
## 16 -3.02678366747064 0.00250491622917267 0.145797718132341
## 17 -3.00753281229326 0.00266834370197406 0.145797718132341
## 18 -2.99692796824966 0.00276248308040225 0.145797718132341
## 19 2.90668778543109 0.00369509465873561 0.184754732936781
## 20 2.69830035265918 0.00703113183005278 0.333978761927507
## B
## <numeric>
## 1 59.7882881230059
## 2 47.9727607277065
## 3 47.5168434326136
## 4 37.3648524293117
## 5 33.0734539716619
## ... ...
## 16 -2.11058483516643
## 17 -1.88612318828707
## 18 -2.17271437596916
## 19 -2.34908740313822
## 20 -3.03415920830407
# Number of significant detected DS cluster-marker combinations at 10% FDR
threshold <- 0.1
res_DS_all <- topClusters(out_DS$res, all = TRUE)
table(res_DS_all$p_adj <= threshold)
##
## FALSE TRUE
## 936 14
The second option for running the diffcyt pipeline is to provide a CATALYST daFrame as the input to the diffcyt() wrapper function. This is particularly useful when CATALYST has already been used to perform exploratory data analyses and clustering, and to generate visualizations. The diffcyt methods can then be used to perform differential analyses using the existing object.
As in option 1, the diffcyt() wrapper function also requires arguments to specify the type of analysis and parameter choices, as well as the design matrix (or model formula) and contrast matrix. See ?diffcyt for more details.
Once a CATALYST daFrame object has been created, and the CATALYST functions have been used to add cluster labels, the daFrame can be provided as the input data object to the diffcyt() wrapper function (i.e. replacing d_input in the code shown for option 1 above). The results can then be accessed as in option 1. See ?diffcyt for more details.
To provide more insight into the steps in the diffcyt pipeline, we also run the pipeline using the individual functions for each step. In some cases, this may provide additional flexibility, for example if a user wishes to customize or modify certain parts of the pipeline.
The first step is to convert the input data into the format required by subsequent functions in the diffcyt pipeline. Here, the data object d_se contains cells in rows, and markers in columns. See ?prepareData for more details.
# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)
Next, we transform the data using an arcsinh transform with cofactor = 5. This is a standard transform used for mass cytometry (CyTOF) data, which brings the data closer to a normal distribution, improving clustering performance and visualizations. See ?transformData for more details.
# Transform data
d_se <- transformData(d_se)
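For reference, an arcsinh transform with a cofactor amounts to asinh(x / cofactor). The base R sketch below shows this on a few hypothetical raw intensities; it illustrates the formula only and is not the package's implementation.

```r
cofactor <- 5

# Hypothetical raw intensities spanning several orders of magnitude
x_raw <- c(0, 1, 5, 50, 500, 5000)

# arcsinh transform: roughly linear near zero, roughly logarithmic for large values
x_trans <- asinh(x_raw / cofactor)
round(x_trans, 2)

# The transform is invertible: sinh(y) * cofactor recovers the raw values.
# This mirrors how the simulated data in this vignette were generated
# (sinh of normally distributed values, multiplied by the cofactor).
x_back <- sinh(x_trans) * cofactor
all.equal(x_back, x_raw)
```

Because the simulated raw data were created as sinh(normal values) * 5, applying the transform with cofactor = 5 returns them to an approximately normal scale.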
By default, we use the FlowSOM clustering algorithm (Van Gassen et al., 2015) to generate the high-resolution clustering. In principle, other clustering algorithms that can generate large numbers of clusters could also be used. See ?generateClusters for more details.
# Generate clusters
d_se <- generateClusters(d_se, seed_clustering = 123)
## FlowSOM clustering completed in 0.3 seconds
Next, we calculate cluster cell counts and cluster medians (median marker expression for each cluster and sample). These objects are required to calculate the differential tests. See ?calcCounts and ?calcMedians for more details.
# Calculate counts
d_counts <- calcCounts(d_se)
# Calculate medians
d_medians <- calcMedians(d_se)
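Conceptually, these two summaries are a cluster-by-sample table of cell counts and, for each marker, a cluster-by-sample table of median expression. A minimal base R sketch on toy data follows. The object names and the random cluster labels are made up for illustration, and the package functions return their own container objects rather than the plain tables shown here.

```r
set.seed(1)

# Toy data: 200 cells, each with a sample label, a cluster label, and one marker
toy <- data.frame(
  sample_id  = factor(rep(paste0("sample", 1:4), each = 50)),
  cluster_id = factor(sample(1:3, 200, replace = TRUE), levels = 1:3),
  marker11   = rnorm(200)
)

# Cluster cell counts: rows = clusters, columns = samples
counts <- table(toy$cluster_id, toy$sample_id)
counts

# Cluster medians for one marker: same layout, median expression
# of the cells in each cluster-sample combination
medians <- tapply(toy$marker11, list(toy$cluster_id, toy$sample_id), median)
round(medians, 2)
```

The counts are the input for DA tests, while the medians of cell state markers are the input for DS tests.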
The design matrix (or model formula) specifies the experimental design. Flexible experimental designs are possible, including blocking (e.g. batch effects or paired designs) and continuous covariates. Note that some of the differential testing methods require a model formula instead of a design matrix (see the help files for the differential testing methods for details). See ?createDesignMatrix or ?createFormula for more details.
The contrast matrix specifies the comparison of interest, i.e. the combination of model parameters assumed to equal zero under the null hypothesis. See ?createContrast for more details.
# Create design matrix
design <- createDesignMatrix(experiment_info, cols_design = 2)
# Alternatively: create model formula (required for some methods)
formula <- createFormula(experiment_info, cols_fixed = 2, cols_random = 1)
# Create contrast matrix
contrast <- createContrast(c(0, 1))
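A contrast can be read as a set of weights on the model coefficients: the tested quantity is the weighted sum, which equals zero under the null hypothesis. The base R sketch below uses a hypothetical three-group design (not the two-group example in this vignette) and an ordinary linear model to show how the weights change with the comparison of interest.

```r
# Hypothetical three-group experiment, two samples per group
group_id <- factor(rep(c("group1", "group2", "group3"), each = 2))
design3 <- model.matrix(~ group_id)
colnames(design3)

# Toy response with known group means 1, 3, and 6
y <- c(1, 1, 3, 3, 6, 6)
coefs <- coef(lm(y ~ group_id))

# group2 vs. group1 (the reference level): weight 1 on the group2 coefficient.
# The weighted sum is the difference in group means, 3 - 1.
contrast_2v1 <- c(0, 1, 0)
sum(contrast_2v1 * coefs)

# group3 vs. group2: difference between the two non-reference coefficients.
# The weighted sum is the difference in group means, 6 - 3.
contrast_3v2 <- c(0, -1, 1)
sum(contrast_3v2 * coefs)
```

In the two-group example used here, c(0, 1) plays the role of the first contrast: it puts all weight on the group2 coefficient.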
Calculate tests for differential abundance (DA) of clusters, using one of the DA testing methods (diffcyt-DA-edgeR, diffcyt-DA-voom, or diffcyt-DA-GLMM). Here, we use the default method, diffcyt-DA-edgeR. The results consist of p-values and adjusted p-values for each cluster, which can be used to rank the clusters by their evidence for differential abundance. The p-values and adjusted p-values are stored in the rowData of the SummarizedExperiment output object. For more details, see ?testDA_edgeR, ?testDA_voom, or ?testDA_GLMM.
The function topClusters can be used to display the results for the top (most highly significant) detected DA clusters. This extracts the results from the rowData of the SummarizedExperiment output object. The displayed results include cluster labels, p-values, and adjusted p-values. The adjusted p-values are stored in the column labeled p_adj. See ?topClusters for more details.
We also generate a table summarizing the number of detected DA clusters at a given adjusted p-value threshold.
# Test for differential abundance (DA) of clusters
res_DA <- testDA_edgeR(d_counts, design, contrast)
# Display results for top DA clusters
topClusters(res_DA)
## DataFrame with 20 rows and 6 columns
## cluster_id logFC logCPM LR
## <factor> <numeric> <numeric> <numeric>
## 1 73 7.4400422928474 13.6392974131533 59.8898526824548
## 2 61 4.98244294541275 13.5518341547954 46.3506505766949
## 3 94 4.60696016550421 13.2522737880189 31.101306273126
## 4 84 6.34522910251425 12.7742243007621 27.8670525971469
## 5 83 1.97283386080915 13.3931340282898 13.6399825449091
## ... ... ... ... ...
## 16 12 -0.837259035501847 13.2522681309609 2.44784790032958
## 17 44 0.755752842073619 13.8495127575059 2.44474057182344
## 18 1 -0.569218348208356 14.0543277589098 2.32330790246751
## 19 2 0.735834037078332 13.2887958904107 2.08050434849183
## 20 18 -0.590748283472218 13.8976037429243 1.82320011935921
## p_val p_adj
## <numeric> <numeric>
## 1 1.00317332088145e-14 9.73078121255011e-13
## 2 9.88746687933515e-12 4.79542143647755e-10
## 3 2.44906523918522e-08 7.91864427336553e-07
## 4 1.29943743446979e-07 3.15113577858924e-06
## 5 0.000221419493704951 0.00429553817787604
## ... ... ...
## 16 0.117686121083218 0.672834105402875
## 17 0.117919379297411 0.672834105402875
## 18 0.127448742625069 0.686807113035095
## 19 0.149190874627574 0.715103519271971
## 20 0.176932829510591 0.715103519271971
# Number of significant detected DA clusters at 10% FDR
threshold <- 0.1
res_DA_all <- topClusters(res_DA, all = TRUE)
table(res_DA_all$p_adj <= threshold)
##
## FALSE TRUE
## 91 6
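The adjusted p-values printed above are numerically consistent with a Benjamini-Hochberg adjustment over 97 tested clusters (91 + 6 in the summary table). The following check uses only the printed values; it is a plausibility check on the output shown here, not a statement about the internal implementation.

```r
# Five smallest raw p-values, copied from the DA results above
p_top <- c(1.00317332088145e-14, 9.88746687933515e-12, 2.44906523918522e-08,
           1.29943743446979e-07, 0.000221419493704951)

# Corresponding adjusted p-values, copied from the same table
p_adj_printed <- c(9.73078121255011e-13, 4.79542143647755e-10,
                   7.91864427336553e-07, 3.15113577858924e-06,
                   0.00429553817787604)

# Number of tests, from the summary table (91 FALSE + 6 TRUE)
n_tests <- 91 + 6

# Benjamini-Hochberg: p * n / rank (before enforcing monotonicity)
p_adj_manual <- p_top * n_tests / seq_along(p_top)

# Compare the manual calculation with the printed values
all.equal(p_adj_manual, p_adj_printed)
```

With the full vector of raw p-values available, p.adjust(p, method = "BH") performs the same adjustment, including the monotonicity step omitted in this sketch.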
Calculate tests for differential states (DS) within clusters, using one of the DS testing methods (diffcyt-DS-limma or diffcyt-DS-LMM). Here, we use the default method, diffcyt-DS-limma. We test all cell state markers for differential expression (the set of markers to test can be adjusted with the optional argument markers_to_test).
The results consist of p-values and adjusted p-values for each cluster-marker combination (cell state markers only), which can be used to rank the cluster-marker combinations by their evidence for differential states. The results are stored in the rowData of the SummarizedExperiment output object. For more details, see ?testDS_limma or ?testDS_LMM.
As above, we use the function topClusters to display the results for the top (most highly significant) detected DS cluster-marker combinations. (Note that there is now one test result for each cluster-marker combination, instead of one per cluster as above.) The displayed results include cluster labels, marker names, p-values, and adjusted p-values. The adjusted p-values are stored in the column labeled p_adj. See ?topClusters for more details.
We also generate a table summarizing the number of detected DS cluster-marker combinations at a given adjusted p-value threshold.
# Test for differential states (DS) within clusters
res_DS <- testDS_limma(d_counts, d_medians, design, contrast, plot = FALSE)
## Warning: Partial NA coefficients for 20 probe(s)
# Display results for top DS cluster-marker combinations
topClusters(res_DS)
## DataFrame with 20 rows and 9 columns
## cluster_id marker ID logFC AveExpr
## <factor> <factor> <character> <numeric> <numeric>
## 1 91 marker20 91 3.4589772386213 1.28490772418615
## 2 92 marker20 92 3.34887454866637 1.49546158879607
## 3 81 marker20 81 3.45221053932968 1.49606750606053
## 4 91 marker19 91 2.89978699820003 1.41874939024591
## 5 81 marker19 81 3.03116083668396 1.60509327435627
## ... ... ... ... ... ...
## 16 39 marker13 39 -1.0410635788519 0.0898028993373329
## 17 16 marker14 16 -1.54634182493088 -0.125984680303974
## 18 29 marker15 29 -1.06479974123836 0.0808527627744741
## 19 90 marker12 90 1.20986771463457 -0.277764362416852
## 20 99 marker19 99 0.925458329395848 0.154571350663079
## t p_val p_adj
## <numeric> <numeric> <numeric>
## 1 11.6076499450934 3.81766070212139e-30 3.62677766701532e-27
## 2 10.5291610795415 3.0937756614009e-25 1.46954343916543e-22
## 3 10.4860799477362 4.76365527498134e-25 1.50849083707742e-22
## 4 9.45596344007059 9.08219834991599e-21 2.15702210810505e-18
## 5 8.98498746672018 6.08260433452531e-19 1.15569482355981e-16
## ... ... ... ...
## 16 -3.02678366747064 0.00250491622917267 0.145797718132341
## 17 -3.00753281229326 0.00266834370197406 0.145797718132341
## 18 -2.99692796824966 0.00276248308040225 0.145797718132341
## 19 2.90668778543109 0.00369509465873561 0.184754732936781
## 20 2.69830035265918 0.00703113183005278 0.333978761927507
## B
## <numeric>
## 1 59.7882881230059
## 2 47.9727607277065
## 3 47.5168434326136
## 4 37.3648524293117
## 5 33.0734539716619
## ... ...
## 16 -2.11058483516643
## 17 -1.88612318828707
## 18 -2.17271437596916
## 19 -2.34908740313822
## 20 -3.03415920830407
# Number of significant detected DS cluster-marker combinations at 10% FDR
threshold <- 0.1
res_DS_all <- topClusters(res_DS, all = TRUE)
table(res_DS_all$p_adj <= threshold)
##
## FALSE TRUE
## 936 14
Visualizations to explore the data and illustrate the results can be generated using plotting functions available in the CATALYST package. (The CATALYST plotting functions accept inputs in both daFrame and SummarizedExperiment formats.)
This includes barplots of the number of cells per sample, multi-dimensional scaling (MDS) plots to represent overall similarity between samples (which are useful for identifying batch effects), and heatmaps to illustrate the phenotypes (marker expression profiles) and signals of interest (cluster abundances by sample, or expression of cell state markers by sample) for the detected clusters or cluster-marker combinations.
These plotting functions were originally developed by Malgorzata Nowicka for the CyTOF workflow available from Bioconductor (Nowicka et al., 2017), and have been adapted by Helena Crowell for the CATALYST package.
Heatmaps are generated using the ComplexHeatmap Bioconductor package (Gu et al., 2016).
As an additional exploratory plot, we show the (arcsinh-transformed) marker expression profiles for each group as a ridgeline plot.
suppressPackageStartupMessages(library(SummarizedExperiment))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(reshape2))
suppressPackageStartupMessages(library(ggridges))
# data frame for 'ggplot'
d_plot <- cbind(as.data.frame(assay(d_se)), as.data.frame(rowData(d_se)))
d_plot <- melt(d_plot, id.vars = c("sample_id", "group_id", "cluster_id"),
               variable.name = "marker", value.name = "expression")
# ridgeline plot of marker expression profiles
ggplot(d_plot, aes(x = expression, y = marker, fill = group_id)) +
  geom_density_ridges(alpha = 0.5) +
  scale_fill_manual(values = c("orangered", "royalblue")) +
  theme_bw()
## Picking joint bandwidth of 0.221
We also generate heatmaps to illustrate the results for the top (most highly significant) detected clusters (DA tests) and cluster-marker combinations (DS tests).
Each row represents a cluster (DA tests) or cluster-marker combination (DS tests). Columns represent protein markers or samples, depending on the panel. The left panel displays median (arcsinh-transformed) expression values across all samples for cell type markers. The right panel displays the signals of interest: cluster abundances by sample (DA tests) or median expression of cell state markers by sample (DS tests). The right annotation bar indicates clusters or cluster-marker combinations detected as significantly differential at an adjusted p-value threshold of 10%.
# Plot heatmap for DA tests
plotHeatmap(out_DA, analysis_type = "DA")
# Plot heatmap for DS tests
plotHeatmap(out_DS, analysis_type = "DS")
Chevrier, S., Crowell, H. L., Zanotelli, V. R. T., Engler, S., Robinson, M. D., and Bodenmiller, B. (2018). Compensation of Signal Spillover in Suspension and Imaging Mass Cytometry. Cell Systems, 6:1–9.
Gu, Z., Eils, R., and Schlesner, M. (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics, 32(18):2847–2849.
Law, C. W., Chen, Y., Shi, W., and Smyth, G. K. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15:R29.
McCarthy, D. J., Chen, Y., and Smyth, G. K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40(10):4288–4297.
Nowicka, M., Krieg, C., Weber, L. M., Hartmann, F. J., Guglietta, S., Becher, B., Levesque, M. P., and Robinson, M. D. (2017). CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research, version 2.
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47.
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139–140.
Van Gassen, S., Callebaut, B., Van Helden, M. J., Lambrecht, B. N., Demeester, P., Dhaene, T., and Saeys, Y. (2015). FlowSOM: Using Self-Organizing Maps for Visualization and Interpretation of Cytometry Data. Cytometry Part A, 87A:636–645.
Weber, L. M. and Robinson, M. D. (2016). Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data. Cytometry Part A, 89A:1084–1096.