SDA {SDAMS} R Documentation

Semi-parametric differential abundance/expression analysis

Description

This function fits a two-part semi-parametric model to metabolomics, proteomics and single-cell RNA sequencing data. A kernel-smoothed method is applied to estimate the regression coefficients, and a likelihood ratio test is constructed for differential abundance/expression analysis.

Usage

SDA(sumExp, VOI = NULL, ...)

Arguments

sumExp

An object of 'SummarizedExperiment' class.

VOI

Variable of interest. The default, NULL, applies when there is only one covariate; otherwise VOI must be one of the column names in colData.

...

Additional arguments passed to qvalue.

Details

Differential abundance/expression analysis compares metabolomic or proteomic profiles, or gene expression, between experimental groups. It uses a two-part model: a logistic regression model to characterize the proportion of zeros and a semi-parametric model to characterize the non-zero values. Let Y_i be the random variable of interest and let X_i be a vector of covariates. The two-part model has the following form:

log(pi_i/(1-pi_i)) = gamma_0 + gamma*X_i

log(Y_i) = beta*X_i + epsilon_i, for Y_i > 0

where pi_i = Pr(Y_i = 0). The parameters gamma quantify the covariate effects on the fraction of zero values, and gamma_0 is the intercept. The parameters beta quantify the covariate effects on the non-zero values, and the epsilon_i are independent error terms with a common but completely unspecified density function f.
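As an illustration of the two-part model above, the following sketch simulates one feature for a single binary covariate using base R only. All parameter values, the sample size and the choice of a t distribution for the error terms are assumptions made for this example; the model itself leaves the error density f unspecified, and this is not the code used inside SDA.

```r
# Simulate one feature under the two-part model (illustrative values only).
set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.5)        # single binary covariate X_i (assumed design)
gamma0 <- -1                  # intercept of the logistic part (assumed)
gamma <- 0.8                  # covariate effect on the zero fraction (assumed)
beta <- 0.5                   # covariate effect on log non-zero values (assumed)

# pi_i = Pr(Y_i = 0), from log(pi_i / (1 - pi_i)) = gamma0 + gamma * x_i
pi_i <- plogis(gamma0 + gamma * x)
is_zero <- rbinom(n, 1, pi_i) == 1

# epsilon_i: f is unspecified in the model; a t distribution with
# 5 degrees of freedom is used here purely as an example
eps <- rt(n, df = 5)

# Non-zero values follow log(Y_i) = beta * x_i + epsilon_i
y <- ifelse(is_zero, 0, exp(beta * x + eps))

# Observed fraction of zeros in each covariate group
zero_frac <- tapply(y == 0, x, mean)
zero_frac
```

With gamma > 0, the group with x = 1 has the larger theoretical zero probability, since pi_i increases with gamma0 + gamma * x_i.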

For differential abundance analysis of mass spectrometry data, Y_i represents the abundance of a given feature for subject i, and pi_i is the probability of the point mass at zero. X_i=(X_i1, X_i2,..., X_iQ)^T is a Q-vector of covariates that specifies the treatment conditions applied to subject i. The corresponding Q-vectors of model parameters gamma=(gamma_1, gamma_2,...,gamma_Q)^T and beta=(beta_1, beta_2,..., beta_Q)^T quantify the covariate effects for that feature. Hypothesis testing on the effect of the qth covariate on a given feature is performed by assessing gamma_q and beta_q. The null hypothesis H_0: gamma_q=0 and beta_q=0 is tested against the alternative hypothesis H_1: at least one of the two parameters is non-zero. The hypotheses gamma_q=0 and beta_q=0 are also tested individually.

For differential expression analysis of single-cell RNA sequencing data, Y_i represents the expression (TPM value) of a given gene in the ith cell, and pi_i is the drop-out probability. X_i=(Z_i, W_i)^T is a vector of covariates, with Z_i being a binary indicator of the cell population under comparison and W_i being a vector of other covariates, e.g. cell size; gamma=(gamma_Z, gamma_W) and beta=(beta_Z, beta_W) are the model parameters. Hypothesis testing on the effect of different cell subpopulations on a given gene is performed by assessing gamma_Z and beta_Z. For each gene, the likelihood ratio test is performed on the null hypothesis H_0: gamma_Z=0 and beta_Z=0 against the alternative hypothesis H_1: at least one of the two parameters is non-zero. The hypotheses gamma_Z=0 and beta_Z=0 are also tested individually.

The p-value is calculated based on an asymptotic chi-squared distribution. To adjust for multiple comparisons across features, the false discovery rate (FDR) q-value is calculated using the qvalue function in R/Bioconductor.
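The step from a likelihood ratio statistic to a p-value can be sketched in base R as follows. The statistics are made-up numbers, and the 2 degrees of freedom for the joint test is an assumption based on the two constrained parameters; the documentation does not state the degrees of freedom. The Benjamini-Hochberg adjustment from p.adjust is shown only as a base-R stand-in for multiple-testing adjustment; SDA itself reports q-values from the qvalue function, which is a different procedure.

```r
# Hypothetical likelihood ratio statistics for five features (made-up numbers)
lrt_2part <- c(0.4, 3.1, 7.8, 12.5, 1.9)

# Upper-tail chi-squared probability; df = 2 is assumed for the joint test
# of gamma_q = 0 and beta_q = 0 (two constrained parameters)
pv_2part <- pchisq(lrt_2part, df = 2, lower.tail = FALSE)

# Stand-in for the q-value step: Benjamini-Hochberg adjusted p-values.
# This is NOT the qvalue method used by SDA.
fdr_bh <- p.adjust(pv_2part, method = "BH")

cbind(lrt = lrt_2part, p = pv_2part, fdr_bh = fdr_bh)
```

Under the same assumption, a one-part test of gamma_q = 0 or beta_q = 0 alone would use df = 1.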

Value

A list containing the following components:

gamma

a matrix of point estimators for gamma_g in the logistic model (binary part)

beta

a matrix of point estimators for beta_g in the semi-parametric model (non-zero part)

pv_gamma

a matrix of one-part p-values for gamma_g

pv_beta

a matrix of one-part p-values for beta_g

qv_gamma

a matrix of one-part q-values for gamma_g

qv_beta

a matrix of one-part q-values for beta_g

pv_2part

a matrix of two-part p-values for overall test

qv_2part

a matrix of two-part q-values for overall test

feat.names

a vector of feature/gene names

Author(s)

Yuntong Li <yuntong.li@uky.edu>, Chi Wang <chi.wang@uky.edu>, Li Chen <lichenuky@uky.edu>

Examples

##--------- load package and data ------------
library(SDAMS)
data(exampleSumExp)

##--------- run the analysis ------------
results = SDA(exampleSumExp)

##------ two-part q-values -------
results$qv_2part
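
A common follow-up is to list the features whose two-part q-value falls below a threshold. The sketch below uses a small mock list that mimics the qv_2part and feat.names components described under Value, so that it runs without the package; the mock values, the single-column layout, the feature names and the 0.05 cutoff are all assumptions for illustration.

```r
# Mock object with the same component names as an SDA result (values invented)
mock_results <- list(
  qv_2part   = matrix(c(0.01, 0.20, 0.04), ncol = 1),
  feat.names = c("feature_1", "feature_2", "feature_3")
)

# Assumed FDR cutoff
cutoff <- 0.05

# Features whose two-part q-value for the first covariate is below the cutoff
sig <- mock_results$feat.names[mock_results$qv_2part[, 1] < cutoff]
sig
```

With a real result, the same indexing would be applied to the object returned by SDA in place of mock_results.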

[Package SDAMS version 1.12.0 Index]