1 Introduction

The diffcyt package implements statistical methods for differential discovery analyses in high-dimensional cytometry data, based on high-resolution clustering and moderated tests adapted from transcriptomics.

High-dimensional cytometry includes multi-color flow cytometry, mass cytometry (CyTOF), and oligonucleotide-tagged cytometry. These technologies use antibodies to measure expression levels of dozens (around 10 to 100) of marker proteins in thousands of cells. In many experiments, the aim is to detect differential abundance (DA) of cell populations, or differential states (DS) within cell populations, between groups of samples in different conditions.

This vignette provides a complete example workflow for running the diffcyt pipeline, using either the wrapper function diffcyt(), or the individual functions for each step.

The input to the diffcyt pipeline can either be raw data, or a pre-prepared daFrame object from the CATALYST package (Chevrier, Crowell, Zanotelli et al., 2018). Providing a daFrame is particularly useful when CATALYST has already been used for exploratory analyses and visualizations; the diffcyt methods can then be used for differential testing.

2 Overview of methodology

The diffcyt methodology consists of two main components: high-resolution clustering and moderated tests.

We use high-resolution clustering to define a large number of small clusters representing cell populations. By default, we use the FlowSOM clustering algorithm (Van Gassen et al., 2015) to generate the high-resolution clusters, since we previously showed that this clustering algorithm gives excellent clustering performance and fast runtimes for high-dimensional cytometry data (Weber and Robinson, 2016). However, in principle, other algorithms that can generate high-resolution clusters could also be used.

For the differential analyses, we use methods from the edgeR package (Robinson et al., 2010; McCarthy et al., 2012), limma package (Ritchie et al., 2015), and voom method (Law et al., 2014). These methods are widely used in the transcriptomics field, and have been adapted here for analyzing high-dimensional cytometry data. In addition, we provide alternative methods based on generalized linear mixed models (GLMMs), linear mixed models (LMMs), and linear models (LMs), developed by Nowicka et al. (2017) (available in the CyTOF workflow).

2.1 Differential abundance (DA) and differential states (DS)

The diffcyt methods can be used to test for differential abundance (DA) of cell populations, and differential states (DS) within cell populations.

To do this, the methodology requires the set of protein markers to be grouped into ‘cell type’ and ‘cell state’ markers. Cell type markers are used to define clusters, which are tested for DA; cell state marker signals are used to test for DS within clusters.

The conceptual split into cell type and cell state markers also facilitates biological interpretability, since it allows the results to be linked back to known cell types or populations of interest.

2.2 Flexible experimental designs and contrasts

The diffcyt model setup enables the user to specify flexible experimental designs, including batch effects, paired designs, and continuous covariates. Linear contrasts are used to specify the comparison of interest.
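
As a minimal base R sketch of what a paired design and its contrast look like, the example below builds a design matrix with model.matrix() for a hypothetical experiment in which two patients are each measured in both groups. The object names (experiment_info_paired, design_paired, contrast_paired) and the patient_id column are assumptions made for illustration only; in the diffcyt pipeline itself, the helper functions createDesignMatrix() and createContrast() are used instead.

```r
# Hypothetical paired experiment: two patients, each measured in both groups
experiment_info_paired <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  group_id = factor(c("group1", "group1", "group2", "group2")),
  patient_id = factor(c("patient1", "patient2", "patient1", "patient2"))
)

# Design matrix: group is the effect of interest, patient is a blocking factor
design_paired <- model.matrix(~ group_id + patient_id, data = experiment_info_paired)
colnames(design_paired)

# Contrast selecting the group2 vs. group1 coefficient (second column);
# the intercept and the patient effect are not part of the comparison
contrast_paired <- c(0, 1, 0)
```

The contrast vector has one entry per column of the design matrix, so adding a blocking factor or covariate to the design also lengthens the contrast.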

2.3 More details

A complete description of the statistical methodology, together with comparisons with existing approaches, is provided in the accompanying paper.

3 diffcyt pipeline

3.1 Create random data

First, we create some random raw data containing a true differential signal.

# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(), 
  sample2 = d_random(), 
  sample3 = d_random(), 
  sample4 = d_random()
)

# Add DA signal
ix_DA <- 801:900
ix_cols_type <- 1:10
d_input[[3]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
d_input[[4]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)

# Add DS signal
ix_DS <- 901:1000
ix_cols_DS <- 19:20
d_input[[1]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[2]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[3]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)
d_input[[4]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)

3.2 Create meta-data

The ‘meta-data’ describing the data set is summarized in two data frames: experiment_info and marker_info.

The experiment_info data frame contains information about each sample, including sample IDs, group IDs, batch IDs or patient IDs (if relevant), and continuous covariates (if relevant). The marker_info data frame contains information about the protein markers, including channel names, marker names, and a factor identifying the class of each marker (cell type, cell state, or none).

# Experiment information
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)), 
  group_id = factor(c("group1", "group1", "group2", "group2")), 
  stringsAsFactors = FALSE
)

experiment_info
##   sample_id group_id
## 1   sample1   group1
## 2   sample2   group1
## 3   sample3   group2
## 4   sample4   group2

# Marker information
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)), 
  marker_name = paste0("marker", sprintf("%02d", 1:20)), 
  marker_class = factor(c(rep("type", 10), rep("state", 10)), 
                        levels = c("type", "state", "none")), 
  stringsAsFactors = FALSE
)

marker_info
##    channel_name marker_name marker_class
## 1    channel001    marker01         type
## 2    channel002    marker02         type
## 3    channel003    marker03         type
## 4    channel004    marker04         type
## 5    channel005    marker05         type
## 6    channel006    marker06         type
## 7    channel007    marker07         type
## 8    channel008    marker08         type
## 9    channel009    marker09         type
## 10   channel010    marker10         type
## 11   channel011    marker11        state
## 12   channel012    marker12        state
## 13   channel013    marker13        state
## 14   channel014    marker14        state
## 15   channel015    marker15        state
## 16   channel016    marker16        state
## 17   channel017    marker17        state
## 18   channel018    marker18        state
## 19   channel019    marker19        state
## 20   channel020    marker20        state
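
To illustrate how the marker_class column separates the two sets of markers, the following base R sketch subsets the marker names by class. The object marker_info_toy is a stand-alone copy created for this illustration; the diffcyt functions perform this selection internally, so this step is not required in the pipeline.

```r
# Stand-alone copy of the marker information (illustration only)
marker_info_toy <- data.frame(
  marker_name = paste0("marker", sprintf("%02d", 1:20)),
  marker_class = factor(c(rep("type", 10), rep("state", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

# Cell type markers: used to define clusters (tested for DA)
markers_type <- marker_info_toy$marker_name[marker_info_toy$marker_class == "type"]

# Cell state markers: tested for DS within clusters
markers_state <- marker_info_toy$marker_name[marker_info_toy$marker_class == "state"]

# Number of markers per class (the unused level 'none' has a count of zero)
table(marker_info_toy$marker_class)
```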

3.3 Design matrix (or model formula) and contrast matrix

For differential testing, the diffcyt functions require the experimental design to be specified using a design matrix or model formula (depending on the testing function used; see the help files for the differential testing methods for details). Flexible experimental designs are possible, including blocking (e.g. batch effects or paired designs) and continuous covariates. See ?createDesignMatrix or ?createFormula for more details.

In addition, a contrast matrix is required to specify the comparison of interest (i.e. the combination of model parameters assumed to equal zero under the null hypothesis). See ?createContrast for more details.

library(diffcyt)

# Create design matrix
design <- createDesignMatrix(experiment_info, cols_design = 2)

# Alternatively: create model formula (required for some methods)
formula <- createFormula(experiment_info, cols_fixed = 2, cols_random = 1)

# Create contrast matrix
contrast <- createContrast(c(0, 1))
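
To make the meaning of the contrast c(0, 1) concrete, the base R sketch below fits an ordinary linear model to a small made-up response vector and shows that the contrast picks out the group2 coefficient, which here equals the difference in group means. The values in y and the names design_toy, contrast_toy, and fit_toy are assumptions for illustration; the diffcyt testing functions fit different (moderated) models, but the contrast is interpreted in the same way.

```r
# Toy response for four samples (two per group); values are made up
group_id <- factor(c("group1", "group1", "group2", "group2"))
y <- c(1.0, 1.2, 3.1, 2.9)

# Design matrix with an intercept and a group2 indicator column
design_toy <- model.matrix(~ group_id)

# Least-squares fit of the linear model
fit_toy <- lm.fit(design_toy, y)

# The contrast c(0, 1) selects the group2 coefficient;
# the null hypothesis is that this combination equals zero
contrast_toy <- c(0, 1)
estimate <- sum(contrast_toy * fit_toy$coefficients)

# For this design, the estimate is the difference in group means
diff_means <- mean(y[group_id == "group2"]) - mean(y[group_id == "group1"])
all.equal(estimate, diff_means)
```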

3.4 Differential testing

3.4.1 Option 1: Wrapper function using raw data

The diffcyt package includes a wrapper function diffcyt(), which accepts input data in various formats and runs all steps in the diffcyt pipeline in the correct sequence.

The first option for running the diffcyt pipeline is to provide the raw data and meta-data directly to the wrapper function, along with arguments to specify the type of analysis and parameter choices, as well as the design matrix (or model formula) and contrast matrix. The input data can be provided as a flowSet or a list of flowFrames, DataFrames, data.frames, or matrices. See ?diffcyt for more details.

Here, we run the wrapper function twice: to calculate tests for differential abundance (DA) of clusters, and tests for differential states (DS) within clusters. The results consist of p-values and adjusted p-values for each cluster (DA tests) or cluster-marker combination (DS tests), which can be used to rank the clusters or cluster-marker combinations by the strength of their differential evidence. The function topClusters can be used to display the results for the top (most highly significant) detected clusters or cluster-marker combinations. See ?diffcyt and ?topClusters for more details.

# Test for differential abundance (DA) of clusters
out_DA <- diffcyt(d_input, experiment_info, marker_info, 
                  design = design, contrast = contrast, 
                  analysis_type = "DA", method_DA = "diffcyt-DA-edgeR", 
                  seed_clustering = 123, verbose = FALSE)
## FlowSOM clustering completed in 0.3 seconds

# Test for differential states (DS) within clusters
out_DS <- diffcyt(d_input, experiment_info, marker_info, 
                  design = design, contrast = contrast, 
                  analysis_type = "DS", method_DS = "diffcyt-DS-limma", 
                  seed_clustering = 123, plot = FALSE, verbose = FALSE)
## FlowSOM clustering completed in 0.3 seconds
## Warning: Partial NA coefficients for 20 probe(s)

# Display results for top DA clusters
topClusters(out_DA$res)
## DataFrame with 20 rows and 6 columns
##     cluster_id              logFC           logCPM               LR
##       <factor>          <numeric>        <numeric>        <numeric>
## 1           73    7.4400422928474 13.6392974131533 59.8898526824548
## 2           61   4.98244294541275 13.5518341547954 46.3506505766949
## 3           94   4.60696016550421 13.2522737880189  31.101306273126
## 4           84   6.34522910251425 12.7742243007621 27.8670525971469
## 5           83   1.97283386080915 13.3931340282898 13.6399825449091
## ...        ...                ...              ...              ...
## 16          12 -0.837259035501847 13.2522681309609 2.44784790032958
## 17          44  0.755752842073619 13.8495127575059 2.44474057182344
## 18           1 -0.569218348208356 14.0543277589098 2.32330790246751
## 19           2  0.735834037078332 13.2887958904107 2.08050434849183
## 20          18 -0.590748283472218 13.8976037429243 1.82320011935921
##                    p_val                p_adj
##                <numeric>            <numeric>
## 1   1.00317332088145e-14 9.73078121255011e-13
## 2   9.88746687933515e-12 4.79542143647755e-10
## 3   2.44906523918522e-08 7.91864427336553e-07
## 4   1.29943743446979e-07 3.15113577858924e-06
## 5   0.000221419493704951  0.00429553817787604
## ...                  ...                  ...
## 16     0.117686121083218    0.672834105402875
## 17     0.117919379297411    0.672834105402875
## 18     0.127448742625069    0.686807113035095
## 19     0.149190874627574    0.715103519271971
## 20     0.176932829510591    0.715103519271971

# Number of significant detected DA clusters at 10% FDR
threshold <- 0.1
res_DA_all <- topClusters(out_DA$res, all = TRUE)
table(res_DA_all$p_adj <= threshold)
## 
## FALSE  TRUE 
##    91     6

# Display results for top DS cluster-marker combinations
topClusters(out_DS$res)
## DataFrame with 20 rows and 9 columns
##     cluster_id   marker          ID             logFC            AveExpr
##       <factor> <factor> <character>         <numeric>          <numeric>
## 1           91 marker20          91   3.4589772386213   1.28490772418615
## 2           92 marker20          92  3.34887454866637   1.49546158879607
## 3           81 marker20          81  3.45221053932968   1.49606750606053
## 4           91 marker19          91  2.89978699820003   1.41874939024591
## 5           81 marker19          81  3.03116083668396   1.60509327435627
## ...        ...      ...         ...               ...                ...
## 16          39 marker13          39  -1.0410635788519 0.0898028993373329
## 17          16 marker14          16 -1.54634182493088 -0.125984680303974
## 18          29 marker15          29 -1.06479974123836 0.0808527627744741
## 19          90 marker12          90  1.20986771463457 -0.277764362416852
## 20          99 marker19          99 0.925458329395848  0.154571350663079
##                     t                p_val                p_adj
##             <numeric>            <numeric>            <numeric>
## 1    11.6076499450934 3.81766070212139e-30 3.62677766701532e-27
## 2    10.5291610795415  3.0937756614009e-25 1.46954343916543e-22
## 3    10.4860799477362 4.76365527498134e-25 1.50849083707742e-22
## 4    9.45596344007059 9.08219834991599e-21 2.15702210810505e-18
## 5    8.98498746672018 6.08260433452531e-19 1.15569482355981e-16
## ...               ...                  ...                  ...
## 16  -3.02678366747064  0.00250491622917267    0.145797718132341
## 17  -3.00753281229326  0.00266834370197406    0.145797718132341
## 18  -2.99692796824966  0.00276248308040225    0.145797718132341
## 19   2.90668778543109  0.00369509465873561    0.184754732936781
## 20   2.69830035265918  0.00703113183005278    0.333978761927507
##                     B
##             <numeric>
## 1    59.7882881230059
## 2    47.9727607277065
## 3    47.5168434326136
## 4    37.3648524293117
## 5    33.0734539716619
## ...               ...
## 16  -2.11058483516643
## 17  -1.88612318828707
## 18  -2.17271437596916
## 19  -2.34908740313822
## 20  -3.03415920830407

# Number of significant detected DS cluster-marker combinations at 10% FDR
threshold <- 0.1
res_DS_all <- topClusters(out_DS$res, all = TRUE)
table(res_DS_all$p_adj <= threshold)
## 
## FALSE  TRUE 
##   936    14
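
As a worked check of how the adjusted p-values relate to the raw p-values, the sketch below takes the five smallest DA p-values displayed above and applies the Benjamini-Hochberg formula (p-value times number of tests, divided by rank). That the adjustment is Benjamini-Hochberg is an assumption here, although it is consistent with the displayed values; the number of tests (97) is the total number of clusters in the DA summary table (91 + 6).

```r
# Five smallest DA p-values and adjusted p-values, copied from the output above
p_val <- c(1.00317332088145e-14, 9.88746687933515e-12, 2.44906523918522e-08,
           1.29943743446979e-07, 0.000221419493704951)
p_adj_shown <- c(9.73078121255011e-13, 4.79542143647755e-10, 7.91864427336553e-07,
                 3.15113577858924e-06, 0.00429553817787604)

# Total number of tests (clusters): 91 + 6 in the DA summary table
n_tests <- 97

# Benjamini-Hochberg by hand: p * n / rank (p-values are already sorted)
p_adj_manual <- p_val * n_tests / seq_along(p_val)

# Same calculation using p.adjust(), supplying the total number of tests
p_adj_bh <- p.adjust(p_val, method = "BH", n = n_tests)

all.equal(p_adj_manual, p_adj_shown, tolerance = 1e-8)
all.equal(p_adj_bh, p_adj_shown, tolerance = 1e-8)
```

Supplying only the top five p-values works here because the adjusted values increase with rank; in general, the full vector of p-values is needed.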

3.4.2 Option 2: Wrapper function using CATALYST ‘daFrame’

The second option for running the diffcyt pipeline is to provide a CATALYST daFrame as the input to the diffcyt() wrapper function. This is particularly useful when CATALYST has already been used to perform exploratory data analyses and clustering, and to generate visualizations. The diffcyt methods can then be used to perform differential analyses using the existing object.

As in option 1, the diffcyt() wrapper function also requires arguments to specify the type of analysis and parameter choices, as well as the design matrix (or model formula) and contrast matrix. See ?diffcyt for more details.

Once a CATALYST daFrame object has been created, and the CATALYST functions have been used to add cluster labels, the daFrame can be provided as the input data object to the diffcyt() wrapper function (i.e. replacing d_input in the code shown for option 1 above). The results can then be accessed as in option 1. See ?diffcyt for more details.

3.4.3 Option 3: Individual functions

To provide more insight into the steps in the diffcyt pipeline, we also run the pipeline using the individual functions for each step. In some cases, this may provide some additional flexibility, for example if a user wishes to customize or modify certain parts of the pipeline.

3.4.3.1 Prepare data into required format

The first step is to prepare the input data into the format required by subsequent functions in the diffcyt pipeline. Here, the data object d_se contains cells in rows, and markers in columns. See ?prepareData for more details.

# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)

3.4.3.2 Transform data

Next, we transform the data using an arcsinh transform with cofactor = 5. This is a standard transform used for mass cytometry (CyTOF) data, which brings the data closer to a normal distribution, improving clustering performance and visualizations. See ?transformData for more details.

# Transform data
d_se <- transformData(d_se)
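
The following base R sketch shows the arcsinh transform on a toy vector, assuming the usual definition asinh(x / cofactor). It also shows why the random data in this vignette were generated as sinh(x) * cofactor: the transform inverts that step and recovers the underlying normally distributed values. The names x_normal, x_raw, and x_back are introduced for illustration only.

```r
set.seed(42)
cofactor <- 5

# Normally distributed values on the transformed scale
x_normal <- rnorm(1000)

# 'Raw' values, generated in the same way as in the function d_random() above
x_raw <- sinh(x_normal) * cofactor

# Arcsinh transform with cofactor 5 (assumed definition: asinh(x / cofactor))
x_back <- asinh(x_raw / cofactor)

# The transform recovers the original values up to numerical precision
all.equal(x_back, x_normal)
```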

3.4.3.3 Generate clusters

By default, we use the FlowSOM clustering algorithm (Van Gassen et al., 2015) to generate the high-resolution clustering. In principle, other clustering algorithms that can generate large numbers of clusters could also be used. See ?generateClusters for more details.

# Generate clusters
d_se <- generateClusters(d_se, seed_clustering = 123)
## FlowSOM clustering completed in 0.3 seconds

3.4.3.4 Calculate features

Next, calculate cluster cell counts and cluster medians (median marker expression for each cluster and sample). These objects are required to calculate the differential tests. See ?calcCounts and ?calcMedians for more details.

# Calculate counts
d_counts <- calcCounts(d_se)

# Calculate medians
d_medians <- calcMedians(d_se)
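
Conceptually, these features are a cluster-by-sample table of cell counts and, for each marker, a cluster-by-sample table of median expression values. The base R sketch below computes both for a small toy data frame; it is an illustration of the idea under made-up data (cells_toy and its columns are assumptions), not the implementation used by calcCounts() and calcMedians().

```r
set.seed(1)
n_cells <- 200

# Toy cell-level data: sample label, cluster label, and one state marker
cells_toy <- data.frame(
  sample_id = factor(sample(paste0("sample", 1:4), n_cells, replace = TRUE)),
  cluster_id = factor(sample(1:3, n_cells, replace = TRUE)),
  marker11 = rnorm(n_cells)
)

# Cluster cell counts: clusters in rows, samples in columns
counts_toy <- table(cells_toy$cluster_id, cells_toy$sample_id)
counts_toy

# Median marker expression for each cluster and sample
medians_toy <- tapply(cells_toy$marker11,
                      list(cells_toy$cluster_id, cells_toy$sample_id),
                      median)
medians_toy
```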

3.4.3.5 Create design matrix (or model formula) and contrast matrix

The design matrix (or model formula) specifies the experimental design. Flexible experimental designs are possible, including blocking (e.g. batch effects or paired designs) and continuous covariates. Note that some of the differential testing methods require a model formula instead of a design matrix (see the help files for the differential testing methods for details). See ?createDesignMatrix or ?createFormula for more details.

The contrast matrix specifies the comparison of interest, i.e. the combination of model parameters assumed to equal zero under the null hypothesis. See ?createContrast for more details.

# Create design matrix
design <- createDesignMatrix(experiment_info, cols_design = 2)

# Alternatively: create model formula (required for some methods)
formula <- createFormula(experiment_info, cols_fixed = 2, cols_random = 1)

# Create contrast matrix
contrast <- createContrast(c(0, 1))

3.4.3.6 Test for differential abundance (DA) of cell populations

Calculate tests for differential abundance (DA) of clusters, using one of the DA testing methods (diffcyt-DA-edgeR, diffcyt-DA-voom, or diffcyt-DA-GLMM). Here, we use the default method, diffcyt-DA-edgeR. The results consist of p-values and adjusted p-values for each cluster, which can be used to rank the clusters by their evidence for differential abundance. The p-values and adjusted p-values are stored in the rowData of the SummarizedExperiment output object. For more details, see ?testDA_edgeR, ?testDA_voom, or ?testDA_GLMM.

The function topClusters can be used to display the results for the top (most highly significant) detected DA clusters. This extracts the results from the rowData of the SummarizedExperiment output object. The displayed results include cluster labels, p-values, and adjusted p-values. The adjusted p-values are stored in the column labeled p_adj. See ?topClusters for more details.

We also generate a table summarizing the number of detected DA clusters at a given adjusted p-value threshold.

# Test for differential abundance (DA) of clusters
res_DA <- testDA_edgeR(d_counts, design, contrast)

# Display results for top DA clusters
topClusters(res_DA)
## DataFrame with 20 rows and 6 columns
##     cluster_id              logFC           logCPM               LR
##       <factor>          <numeric>        <numeric>        <numeric>
## 1           73    7.4400422928474 13.6392974131533 59.8898526824548
## 2           61   4.98244294541275 13.5518341547954 46.3506505766949
## 3           94   4.60696016550421 13.2522737880189  31.101306273126
## 4           84   6.34522910251425 12.7742243007621 27.8670525971469
## 5           83   1.97283386080915 13.3931340282898 13.6399825449091
## ...        ...                ...              ...              ...
## 16          12 -0.837259035501847 13.2522681309609 2.44784790032958
## 17          44  0.755752842073619 13.8495127575059 2.44474057182344
## 18           1 -0.569218348208356 14.0543277589098 2.32330790246751
## 19           2  0.735834037078332 13.2887958904107 2.08050434849183
## 20          18 -0.590748283472218 13.8976037429243 1.82320011935921
##                    p_val                p_adj
##                <numeric>            <numeric>
## 1   1.00317332088145e-14 9.73078121255011e-13
## 2   9.88746687933515e-12 4.79542143647755e-10
## 3   2.44906523918522e-08 7.91864427336553e-07
## 4   1.29943743446979e-07 3.15113577858924e-06
## 5   0.000221419493704951  0.00429553817787604
## ...                  ...                  ...
## 16     0.117686121083218    0.672834105402875
## 17     0.117919379297411    0.672834105402875
## 18     0.127448742625069    0.686807113035095
## 19     0.149190874627574    0.715103519271971
## 20     0.176932829510591    0.715103519271971

# Number of significant detected DA clusters at 10% FDR
threshold <- 0.1
res_DA_all <- topClusters(res_DA, all = TRUE)
table(res_DA_all$p_adj <= threshold)
## 
## FALSE  TRUE 
##    91     6
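
For readers unfamiliar with edgeR, the sketch below runs a generic edgeR likelihood ratio test on a made-up cluster-by-sample count table. It is only a rough illustration of the kind of analysis behind columns such as logFC, logCPM, and LR; the exact steps and settings used inside testDA_edgeR() may differ (see ?testDA_edgeR), and the toy objects (counts_toy, design_toy, res_toy) are assumptions.

```r
library(edgeR)

set.seed(1)

# Toy count table: 20 clusters (rows) by 4 samples (columns); not the vignette data
counts_toy <- matrix(
  rnbinom(80, mu = 200, size = 10), nrow = 20,
  dimnames = list(paste0("cluster", 1:20), paste0("sample", 1:4))
)

# Two-group design, as in the vignette
group_id <- factor(c("group1", "group1", "group2", "group2"))
design_toy <- model.matrix(~ group_id)

# Standard edgeR workflow: dispersions, GLM fit, likelihood ratio test
y <- DGEList(counts_toy)
y <- estimateDisp(y, design_toy)
fit <- glmFit(y, design_toy)
lrt <- glmLRT(fit, contrast = c(0, 1))

# Table of results for all clusters, with Benjamini-Hochberg adjusted p-values
res_toy <- topTags(lrt, n = Inf, adjust.method = "BH")$table
head(res_toy)
```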

3.4.3.7 Test for differential states (DS) within cell populations

Calculate tests for differential states (DS) within clusters, using one of the DS testing methods (diffcyt-DS-limma or diffcyt-DS-LMM). Here, we use the default method, diffcyt-DS-limma. We test all cell state markers for differential expression (the set of markers to test can be adjusted with the optional argument markers_to_test).

The results consist of p-values and adjusted p-values for each cluster-marker combination (cell state markers only), which can be used to rank the cluster-marker combinations by their evidence for differential states. The results are stored in the rowData of the SummarizedExperiment output object. For more details, see ?testDS_limma or ?testDS_LMM.

As above, we use the function topClusters to display the results for the top (most highly significant) detected DS cluster-marker combinations. (Note that there is now one test result for each cluster-marker combination, instead of one per cluster as above). The displayed results include cluster labels, marker names, p-values, and adjusted p-values. The adjusted p-values are stored in the column labeled p_adj. See ?topClusters for more details.

We also generate a table summarizing the number of detected DS cluster-marker combinations at a given adjusted p-value threshold.

# Test for differential states (DS) within clusters
res_DS <- testDS_limma(d_counts, d_medians, design, contrast, plot = FALSE)
## Warning: Partial NA coefficients for 20 probe(s)

# Display results for top DS cluster-marker combinations
topClusters(res_DS)
## DataFrame with 20 rows and 9 columns
##     cluster_id   marker          ID             logFC            AveExpr
##       <factor> <factor> <character>         <numeric>          <numeric>
## 1           91 marker20          91   3.4589772386213   1.28490772418615
## 2           92 marker20          92  3.34887454866637   1.49546158879607
## 3           81 marker20          81  3.45221053932968   1.49606750606053
## 4           91 marker19          91  2.89978699820003   1.41874939024591
## 5           81 marker19          81  3.03116083668396   1.60509327435627
## ...        ...      ...         ...               ...                ...
## 16          39 marker13          39  -1.0410635788519 0.0898028993373329
## 17          16 marker14          16 -1.54634182493088 -0.125984680303974
## 18          29 marker15          29 -1.06479974123836 0.0808527627744741
## 19          90 marker12          90  1.20986771463457 -0.277764362416852
## 20          99 marker19          99 0.925458329395848  0.154571350663079
##                     t                p_val                p_adj
##             <numeric>            <numeric>            <numeric>
## 1    11.6076499450934 3.81766070212139e-30 3.62677766701532e-27
## 2    10.5291610795415  3.0937756614009e-25 1.46954343916543e-22
## 3    10.4860799477362 4.76365527498134e-25 1.50849083707742e-22
## 4    9.45596344007059 9.08219834991599e-21 2.15702210810505e-18
## 5    8.98498746672018 6.08260433452531e-19 1.15569482355981e-16
## ...               ...                  ...                  ...
## 16  -3.02678366747064  0.00250491622917267    0.145797718132341
## 17  -3.00753281229326  0.00266834370197406    0.145797718132341
## 18  -2.99692796824966  0.00276248308040225    0.145797718132341
## 19   2.90668778543109  0.00369509465873561    0.184754732936781
## 20   2.69830035265918  0.00703113183005278    0.333978761927507
##                     B
##             <numeric>
## 1    59.7882881230059
## 2    47.9727607277065
## 3    47.5168434326136
## 4    37.3648524293117
## 5    33.0734539716619
## ...               ...
## 16  -2.11058483516643
## 17  -1.88612318828707
## 18  -2.17271437596916
## 19  -2.34908740313822
## 20  -3.03415920830407

# Number of significant detected DS cluster-marker combinations at 10% FDR
threshold <- 0.1
res_DS_all <- topClusters(res_DS, all = TRUE)
table(res_DS_all$p_adj <= threshold)
## 
## FALSE  TRUE 
##   936    14
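
Similarly, the sketch below runs a generic limma analysis on a made-up matrix of median expression values, with one row per cluster-marker combination and one column per sample. It is a rough illustration of where columns such as logFC, AveExpr, t, and B come from; testDS_limma() may use additional options that are not shown here (see ?testDS_limma), and the toy objects (expr_toy, design_toy, res_toy) are assumptions.

```r
library(limma)

set.seed(1)

# Toy matrix of medians: 50 cluster-marker combinations by 4 samples
expr_toy <- matrix(
  rnorm(200), nrow = 50,
  dimnames = list(paste0("combination", 1:50), paste0("sample", 1:4))
)

# Add a differential signal to the first row in group2 (samples 3 and 4)
expr_toy[1, 3:4] <- expr_toy[1, 3:4] + 3

# Two-group design, as in the vignette
group_id <- factor(c("group1", "group1", "group2", "group2"))
design_toy <- model.matrix(~ group_id)

# Linear model fit, contrast of interest, and empirical Bayes moderation
fit <- lmFit(expr_toy, design_toy)
fit <- contrasts.fit(fit, contrasts = c(0, 1))
fit <- eBayes(fit)

# Table of moderated test results for all rows
res_toy <- topTable(fit, coef = 1, number = Inf, adjust.method = "BH")
head(res_toy)
```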

3.5 Visualizations

Visualizations to explore the data and illustrate the results can be generated using plotting functions available in the CATALYST package. (The CATALYST plotting functions accept inputs in both daFrame and SummarizedExperiment formats.)

This includes barplots of the number of cells per sample, multi-dimensional scaling (MDS) plots to represent overall similarity between samples (which are useful for identifying batch effects), and heatmaps to illustrate the phenotypes (marker expression profiles) and signals of interest (cluster abundances by sample, or expression of cell state markers by sample) for the detected clusters or cluster-marker combinations.

These plotting functions were originally developed by Malgorzata Nowicka for the CyTOF workflow available from Bioconductor (Nowicka et al., 2017), and have been adapted by Helena Crowell for the CATALYST package.

Heatmaps are generated using the ComplexHeatmap Bioconductor package (Gu et al., 2016).

3.5.1 Marker expression profiles

As an additional exploratory plot, we display the (arcsinh-transformed) marker expression profiles, colored by group.

suppressPackageStartupMessages(library(SummarizedExperiment))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(reshape2))
suppressPackageStartupMessages(library(ggridges))

# data frame for 'ggplot'
d_plot <- cbind(as.data.frame(assay(d_se)), as.data.frame(rowData(d_se)))
d_plot <- melt(d_plot, id.vars = c("sample_id", "group_id", "cluster_id"), 
               variable.name = "marker", value.name = "expression")

# ridgeline plot of marker expression profiles
ggplot(d_plot, aes(x = expression, y = marker, fill = group_id)) + 
  geom_density_ridges(alpha = 0.5) + 
  scale_fill_manual(values = c("orangered", "royalblue")) + 
  theme_bw()
## Picking joint bandwidth of 0.221

3.5.2 Heatmaps

We also generate heatmaps to illustrate the results for the top (most highly significant) detected clusters (DA tests) and cluster-marker combinations (DS tests).

Each row represents a cluster (DA tests) or cluster-marker combination (DS tests). Columns represent protein markers or samples, depending on the panel. The left panel displays median (arcsinh-transformed) expression values across all samples for cell type markers. The right panel displays the signals of interest: cluster abundances by sample (DA tests) or median expression of cell state markers by sample (DS tests). The right annotation bar indicates clusters or cluster-marker combinations detected as significantly differential at an adjusted p-value threshold of 10%.

# Plot heatmap for DA tests
plotHeatmap(out_DA, analysis_type = "DA")

# Plot heatmap for DS tests
plotHeatmap(out_DS, analysis_type = "DS")

4 References

Chevrier, S., Crowell, H. L., Zanotelli, V. R. T., Engler, S., Robinson, M. D., and Bodenmiller, B. (2018). Compensation of Signal Spillover in Suspension and Imaging Mass Cytometry. Cell Systems, 6:1–9.

Gu, Z., Eils, R., and Schlesner, M. (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics, 32(18):2847–2849.

Law, C. W., Chen, Y., Shi, W., and Smyth, G. K. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15:R29.

McCarthy, D. J., Chen, Y., and Smyth, G. K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40(10):4288–4297.

Nowicka, M., Krieg, C., Weber, L. M., Hartmann, F. J., Guglietta, S., Becher, B., Levesque, M. P., and Robinson, M. D. (2017). CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets. F1000Research, version 2.

Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47.

Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139–140.

Van Gassen, S., Callebaut, B., Van Helden, M. J., Lambrecht, B. N., Demeester, P., Dhaene, T., and Saeys, Y. (2015). FlowSOM: Using Self-Organizing Maps for Visualization and Interpretation of Cytometry Data. Cytometry Part A, 87A:636–645.

Weber, L. M. and Robinson, M. D. (2016). Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data. Cytometry Part A, 89A:1084–1096.