This vignette summarizes the functionalities and options of MSstatsTMTPTM and provides a workflow example.
MSstatsTMTPTM includes the following two functions for data visualization and statistical testing:
dataProcessPlotsTMTPTM
groupComparisonTMTPTM
To install this package, start R (version “4.0”) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MSstatsTMTPTM")
#> Bioconductor version 3.12 (BiocManager 1.30.10), R 4.0.3 (2020-10-10)
#> Installing package(s) 'MSstatsTMTPTM'
#> Warning: package 'MSstatsTMTPTM' is not available for this version of R
#>
#> A version of this package for your version of R might be available elsewhere,
#> see the ideas at
#> https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
#> Old packages: 'SQUAREM', 'batchtools'
To illustrate the quantitative data and quality control of MS runs, dataProcessPlotsTMTPTM takes the quantitative data from the MSstatsTMT converter functions as input and generates two types of figures as pdf files: 1. Profile plot (specify “ProfilePlot” in option type), to identify the potential sources of variation for each protein; 2. Quality control plot (specify “QCPlot” in option type), to evaluate the systematic bias between MS runs.
data.ptm
name of the data with PTM sites in the protein name, which can be the output of the MSstatsTMT converter functions.
data.protein
name of the peptide-level protein data, which can be the output of the MSstatsTMT converter functions.
data.ptm.summarization
name of the protein-level summarized data with PTM sites in the protein name, which can be the output of the MSstatsTMT proteinSummarization function.
data.protein.summarization
name of the protein-level summarized data, which can be the output of the MSstatsTMT proteinSummarization function.
type
choice of visualization. “ProfilePlot” represents a profile plot of log intensities across MS runs. “QCPlot” represents box plots of log intensities across channels and MS runs.
ylimUp
upper limit for the y-axis in the log scale. FALSE (default) for Profile Plot and QC Plot uses the rounded-off maximum of log2(intensities) after normalization + 3 as the upper limit.
ylimDown
lower limit for the y-axis in the log scale. FALSE (default) for Profile Plot and QC Plot uses 0.
x.axis.size
size of x-axis labeling for “Run” and “Channel” in Profile Plot and QC Plot.
y.axis.size
size of y-axis labels. Default is 10.
text.size
size of the labels representing each condition at the top of Profile plot and QC plot. Default is 4.
text.angle
angle of the labels representing each condition at the top of Profile plot and QC plot. Default is 0.
legend.size
size of the legend above Profile plot. Default is 7.
dot.size.profile
size of dots in Profile plot. Default is 2.
ncol.guide
number of columns for legends at the top of the plot. Default is 5.
width
width of the saved pdf file. Default is 10.
height
height of the saved pdf file. Default is 10.
which.Protein
list of proteins to plot. The list can contain protein names or the order numbers of proteins. Default is “all”, which generates plots for every protein. For QC plot, “allonly” generates a single QC plot with all proteins.
originalPlot
TRUE (default) draws the original profile plots, without normalization.
summaryPlot
TRUE (default) draws profile plots with protein summarization for each channel and MS run.
address
the name of the folder that will store the results. The default folder is the current working directory. Any other assigned folder must already exist under the current working directory. An output pdf file is automatically created with the default name “ProfilePlot.pdf” or “QCplot.pdf”. The address argument specifies where to store the file and can also modify the beginning of the file name. If address=FALSE, the plot is not saved as a pdf file but is shown in the plot window.
The raw PTM and protein datasets are both required by the plotting function. They can be the output of the MSstatsTMT converter functions: PDtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, and OpenMStoMSstatsTMTFormat. Both the PTM and protein datasets must include the following columns: ProteinName, PeptideSequence, Charge, PSM, Mixture, TechRepMixture, Run, Channel, Condition, BioReplicate, and Intensity.
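Before plotting, it can help to confirm that an input table carries every required column. The following is a minimal sketch in base R, not part of MSstatsTMTPTM: the helper `missing_columns` and the one-row table `toy.ptm` are hypothetical names introduced here for illustration, with the row values copied from the first line of the raw PTM example in this vignette.

```r
# Columns that both the PTM and the protein dataset must contain
# (taken from the list in this vignette).
required.cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                   "Mixture", "TechRepMixture", "Run", "Channel",
                   "Condition", "BioReplicate", "Intensity")

# Hypothetical helper: returns the required columns absent from `data`.
missing_columns <- function(data, required = required.cols) {
    setdiff(required, colnames(data))
}

# Hypothetical one-row table used only to exercise the helper.
toy.ptm <- data.frame(ProteinName = "Protein_12_S703",
                      PeptideSequence = "Peptide_491",
                      Charge = 3,
                      PSM = "Peptide_491_3",
                      Mixture = 1,
                      TechRepMixture = 1,
                      Run = "1_1",
                      Channel = "128N",
                      Condition = "Condition_2",
                      BioReplicate = "Condition_2_1",
                      Intensity = 48030.0)

# A complete table reports nothing missing.
missing_columns(toy.ptm)

# Dropping a column makes the helper report it.
missing_columns(toy.ptm[, colnames(toy.ptm) != "Intensity"])
```

The same check could be applied to both the PTM and the protein table before they are passed to the plotting function.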
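# load the packages (both are listed as attached in the session info below)
library(MSstatsTMT)
library(MSstatsTMTPTM)
# read in raw data files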
# raw.ptm <- read.csv(file="raw.ptm.csv", header=TRUE)
# raw.protein <- read.csv(file="raw.protein.csv", header=TRUE)
head(raw.ptm)
#> ProteinName PeptideSequence Charge PSM Mixture TechRepMixture
#> 1 Protein_12_S703 Peptide_491 3 Peptide_491_3 1 1
#> 2 Protein_12_S703 Peptide_491 3 Peptide_491_3 1 1
#> 3 Protein_12_S703 Peptide_491 3 Peptide_491_3 1 1
#> 4 Protein_12_S703 Peptide_491 3 Peptide_491_3 1 1
#> 5 Protein_12_S703 Peptide_491 3 Peptide_491_3 1 1
#> 6 Protein_12_S703 Peptide_491 3 Peptide_491_3 1 1
#> Run Channel Condition BioReplicate Intensity
#> 1 1_1 128N Condition_2 Condition_2_1 48030.0
#> 2 1_1 129C Condition_4 Condition_4_2 100224.4
#> 3 1_1 131C Condition_3 Condition_3_2 66804.6
#> 4 1_1 130N Condition_1 Condition_1_2 46779.8
#> 5 1_1 128C Condition_6 Condition_6_1 77497.9
#> 6 1_1 126C Condition_4 Condition_4_1 81559.7
head(raw.protein)
#> ProteinName PeptideSequence Charge PSM Mixture TechRepMixture Run
#> 1 Protein_12 Peptide_9121 3 Peptide_9121_3 1 1 1_1
#> 2 Protein_12 Peptide_27963 5 Peptide_27963_5 1 1 1_1
#> 3 Protein_12 Peptide_28482 4 Peptide_28482_4 1 1 1_1
#> 4 Protein_12 Peptide_10940 2 Peptide_10940_2 2 1 2_1
#> 5 Protein_12 Peptide_4900 2 Peptide_4900_2 2 1 2_1
#> 6 Protein_12 Peptide_4900 3 Peptide_4900_3 2 1 2_1
#> Channel Condition BioReplicate Intensity
#> 1 126C Condition_4 Condition_4_1 10996116.9
#> 2 127C Condition_5 Condition_5_1 56965.1
#> 3 131N Condition_2 Condition_2_2 286121.7
#> 4 131N Condition_2 Condition_2_4 534806.0
#> 5 126C Condition_4 Condition_4_3 1134908.7
#> 6 126C Condition_4 Condition_4_3 1605773.2
# Run MSstatsTMT proteinSummarization function
quant.msstats.ptm <- proteinSummarization(raw.ptm,
                                          method = "msstats",
                                          global_norm = TRUE,
                                          reference_norm = FALSE,
                                          MBimpute = TRUE)
quant.msstats.protein <- proteinSummarization(raw.protein,
                                              method = "msstats",
                                              global_norm = TRUE,
                                              reference_norm = FALSE,
                                              MBimpute = TRUE)
head(quant.msstats.ptm)
#> Run Protein Abundance Channel BioReplicate Condition
#> 1 1_1 Protein_1076_Y67 13.65475 130N Condition_1_2 Condition_1
#> 2 1_1 Protein_1076_Y67 13.57146 127N Condition_1_1 Condition_1
#> 3 1_1 Protein_1076_Y67 13.56900 128N Condition_2_1 Condition_2
#> 4 1_1 Protein_1076_Y67 13.70567 131N Condition_2_2 Condition_2
#> 5 1_1 Protein_1076_Y67 13.24717 131C Condition_3_2 Condition_3
#> 6 1_1 Protein_1076_Y67 13.11874 129N Condition_3_1 Condition_3
#> TechRepMixture Mixture
#> 1 1 1
#> 2 1 1
#> 3 1 1
#> 4 1 1
#> 5 1 1
#> 6 1 1
head(quant.msstats.protein)
#> Run Protein Abundance Channel BioReplicate Condition TechRepMixture
#> 1 1_1 Protein_1076 18.75131 127N Condition_1_1 Condition_1 1
#> 2 1_1 Protein_1076 18.80198 130N Condition_1_2 Condition_1 1
#> 3 1_1 Protein_1076 18.92222 131N Condition_2_2 Condition_2 1
#> 4 1_1 Protein_1076 19.02252 128N Condition_2_1 Condition_2 1
#> 5 1_1 Protein_1076 18.28685 131C Condition_3_2 Condition_3 1
#> 6 1_1 Protein_1076 18.40555 129N Condition_3_1 Condition_3 1
#> Mixture
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
# Profile Plot
dataProcessPlotsTMTPTM(data.ptm = raw.ptm,
                       data.protein = raw.protein,
                       data.ptm.summarization = quant.msstats.ptm,
                       data.protein.summarization = quant.msstats.protein,
                       which.Protein = 1,
                       type = 'ProfilePlot',
                       address = FALSE)
#> Drew the Profile plot for Protein_1076_Y67 ( 1 of 1 )
#> Drew the Profile plot with summarization for Protein_1076_Y67 ( 1 of 1 )
groupComparisonTMTPTM tests for significant changes in PTM abundance, adjusted for global protein abundance, across conditions in a TMT experiment, based on a family of linear mixed-effects models. The experimental design of a case-control study (patients are not repeatedly measured) is determined automatically, and the statistical model is chosen accordingly.
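To make the idea of adjusting a PTM change for the change in its parent protein concrete, here is a minimal sketch in base R. It is an assumption about one common way to perform such an adjustment (subtract the protein log2 fold change, combine the standard errors, and use a Welch-Satterthwaite approximation for the degrees of freedom); this vignette does not confirm that groupComparisonTMTPTM computes its Adjusted.Model exactly this way. The function name `adjust_for_protein` and the protein-level numbers are hypothetical.

```r
# Hypothetical helper: combine a PTM-level and a protein-level estimate
# for the same comparison into a protein-adjusted PTM estimate.
adjust_for_protein <- function(ptm.log2FC, ptm.SE, ptm.DF,
                               prot.log2FC, prot.SE, prot.DF) {
    # Remove the part of the PTM change explained by the protein change.
    log2FC <- ptm.log2FC - prot.log2FC
    # Treat the two estimates as independent and add their variances.
    SE <- sqrt(ptm.SE^2 + prot.SE^2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    DF <- (ptm.SE^2 + prot.SE^2)^2 /
        (ptm.SE^4 / ptm.DF + prot.SE^4 / prot.DF)
    # Two-sided p-value from the t distribution.
    t.stat <- log2FC / SE
    pvalue <- 2 * pt(abs(t.stat), df = DF, lower.tail = FALSE)
    data.frame(log2FC = log2FC, SE = SE, DF = DF, pvalue = pvalue)
}

# PTM-level numbers follow the Condition_1-Condition_3 row of the PTM.Model
# output in this vignette; the protein-level numbers are made up.
adjusted <- adjust_for_protein(ptm.log2FC = 0.42262536,
                               ptm.SE = 0.05712141,
                               ptm.DF = 15.00251,
                               prot.log2FC = 0.10,   # hypothetical
                               prot.SE = 0.04,       # hypothetical
                               prot.DF = 15)         # hypothetical
print(adjusted)
```

Under this reading, a PTM site only appears changed when it moves by more than its parent protein does, and the adjusted standard error is never smaller than the PTM-level one.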
data.ptm: Name of the output of the proteinSummarization function with PTM data. It should have columns named Protein, TechRepMixture, Mixture, Run, Channel, Condition, BioReplicate, Abundance.
data.protein: Name of the output of the proteinSummarization function with protein data. It should have columns named Protein, TechRepMixture, Mixture, Run, Channel, Condition, BioReplicate, Abundance.
contrast.matrix: Comparisons between conditions of interest. The default is pairwise, which compares all possible pairs of conditions. A matrix of specific contrasts can be supplied instead.
moderated: If moderated = TRUE, a moderated t statistic is calculated; otherwise, the ordinary t statistic is used.
adj.method: adjustment method for multiple comparisons. “BH” is the default.
# test for all the possible pairs of conditions
model.results.pairwise <- groupComparisonTMTPTM(data.ptm = quant.msstats.ptm,
                                                data.protein = quant.msstats.protein)
names(model.results.pairwise)
#> [1] "PTM.Model" "Protein.Model" "Adjusted.Model"
head(model.results.pairwise[[1]])
#> Protein Label log2FC SE DF
#> 1 Protein_1076_Y67 Condition_1-Condition_2 0.04713074 0.05264970 15.00243
#> 2 Protein_1076_Y67 Condition_1-Condition_3 0.42262536 0.05712141 15.00251
#> 3 Protein_1076_Y67 Condition_1-Condition_4 0.11835636 0.05264970 15.00243
#> 4 Protein_1076_Y67 Condition_1-Condition_5 0.28875531 0.05264970 15.00243
#> 5 Protein_1076_Y67 Condition_1-Condition_6 0.14293731 0.05712141 15.00251
#> 6 Protein_1076_Y67 Condition_2-Condition_3 0.37549462 0.05712141 15.00251
#> pvalue adj.pvalue issue
#> 1 3.848308e-01 6.180078e-01 NA
#> 2 2.221079e-06 9.994856e-05 NA
#> 3 4.003914e-02 4.936333e-02 NA
#> 4 6.282242e-05 7.067522e-04 NA
#> 5 2.439054e-02 4.390296e-02 NA
#> 6 8.821963e-06 3.969883e-04 NA
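A typical next step is to keep only the comparisons that pass a significance threshold. The sketch below uses base R on a hand-typed copy of the six PTM.Model rows printed above, so it runs without the package; the object names `ptm.model`, `fdr.cutoff`, `significant`, and `bh.six`, and the 0.05 cutoff itself, are assumptions made for illustration.

```r
# Six rows of the PTM.Model output shown above, retyped by hand.
ptm.model <- data.frame(
    Protein = "Protein_1076_Y67",
    Label = c("Condition_1-Condition_2", "Condition_1-Condition_3",
              "Condition_1-Condition_4", "Condition_1-Condition_5",
              "Condition_1-Condition_6", "Condition_2-Condition_3"),
    log2FC = c(0.04713074, 0.42262536, 0.11835636,
               0.28875531, 0.14293731, 0.37549462),
    pvalue = c(3.848308e-01, 2.221079e-06, 4.003914e-02,
               6.282242e-05, 2.439054e-02, 8.821963e-06),
    adj.pvalue = c(6.180078e-01, 9.994856e-05, 4.936333e-02,
                   7.067522e-04, 4.390296e-02, 3.969883e-04)
)

# Hypothetical cutoff on the adjusted p-value.
fdr.cutoff <- 0.05

# Keep comparisons below the cutoff, most significant first.
significant <- ptm.model[ptm.model$adj.pvalue < fdr.cutoff, ]
significant <- significant[order(significant$adj.pvalue), ]
print(significant[, c("Label", "log2FC", "adj.pvalue")])

# The adj.method option corresponds to a multiple-testing correction such as
# Benjamini-Hochberg. Applied to these six p-values alone, p.adjust gives
# values that differ from the adj.pvalue column, which was computed by the
# package over its own set of tests.
bh.six <- p.adjust(ptm.model$pvalue, method = "BH")
print(bh.six)
```

The same subsetting could be applied to any of the three tables in the returned list, assuming they share the adj.pvalue column shown for PTM.Model.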
# Load specific contrast matrix
#example.contrast.matrix <- read.csv(file="example.contrast.matrix.csv", header=TRUE)
example.contrast.matrix
#> Condition_1 Condition_2 Condition_3 Condition_4 Condition_5 Condition_6
#> 1-4 1.0000000 0.0000000 0.0000000 -1.0000000 0.0000000 0.0000000
#> 2-5 0.0000000 1.0000000 0.0000000 0.0000000 -1.0000000 0.0000000
#> 3-6 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 -1.0000000
#> 1-3 1.0000000 0.0000000 -1.0000000 0.0000000 0.0000000 0.0000000
#> 2-3 0.0000000 1.0000000 -1.0000000 0.0000000 0.0000000 0.0000000
#> 4-6 0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 -1.0000000
#> 5-6 0.0000000 0.0000000 0.0000000 0.0000000 1.0000000 -1.0000000
#> Partial 0.2500000 0.2500000 -0.5000000 0.2500000 0.2500000 -0.5000000
#> Third 0.3333333 0.3333333 0.3333333 -0.3333333 -0.3333333 -0.3333333
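A contrast matrix like the one printed above can also be assembled directly in R rather than read from a file. This is a minimal sketch in base R that rebuilds four of the rows shown; the object names `conditions` and `contrast` are hypothetical, and whether the function expects a matrix or a data frame (as produced by read.csv) is not stated in this vignette, so the sketch stops at building and checking the object.

```r
# Condition names, in the column order used by the printed matrix.
conditions <- paste0("Condition_", 1:6)

# One row per comparison; each row holds the weight given to every condition.
contrast <- rbind(
    "1-4"     = c(1, 0, 0, -1, 0, 0),
    "2-5"     = c(0, 1, 0, 0, -1, 0),
    "Partial" = c(0.25, 0.25, -0.5, 0.25, 0.25, -0.5),
    "Third"   = c(1, 1, 1, -1, -1, -1) / 3
)
colnames(contrast) <- conditions

# Sanity check: the weights of a valid contrast sum to zero in every row.
stopifnot(all(abs(rowSums(contrast)) < 1e-12))

print(contrast)
```

The row names become the comparison labels, which matches the Label column in the results printed for the custom contrasts.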
# test for specified condition comparisons only
model.results.contrast <- groupComparisonTMTPTM(data.ptm = quant.msstats.ptm,
                                                data.protein = quant.msstats.protein,
                                                contrast.matrix = example.contrast.matrix)
names(model.results.contrast)
#> [1] "PTM.Model" "Protein.Model" "Adjusted.Model"
head(model.results.contrast[[1]])
#> Protein Label log2FC SE DF pvalue
#> 1 Protein_1076_Y67 1-4 0.11835636 0.05264970 15.00243 4.003914e-02
#> 2 Protein_1076_Y67 2-5 0.24162457 0.05264970 15.00243 3.542924e-04
#> 3 Protein_1076_Y67 3-6 -0.27968805 0.06173696 15.00271 3.983220e-04
#> 4 Protein_1076_Y67 1-3 0.42262536 0.05712141 15.00251 2.221079e-06
#> 5 Protein_1076_Y67 2-3 0.37549462 0.05712141 15.00251 8.821963e-06
#> 6 Protein_1076_Y67 4-6 0.02458095 0.05712141 15.00251 6.730747e-01
#> adj.pvalue issue
#> 1 4.936333e-02 NA
#> 2 2.022438e-03 NA
#> 3 7.169796e-03 NA
#> 4 9.994856e-05 NA
#> 5 3.969883e-04 NA
#> 6 7.530135e-01 NA
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MSstatsTMT_1.7.6 MSstatsTMTPTM_0.99.3
#>
#> loaded via a namespace (and not attached):
#> [1] ggrepel_0.8.2 Rcpp_1.0.5 lattice_0.20-41
#> [4] tidyr_1.1.2 snow_0.4-3 gtools_3.8.2
#> [7] digest_0.6.26 foreach_1.5.1 R6_2.4.1
#> [10] plyr_1.8.6 backports_1.1.10 evaluate_0.14
#> [13] ggplot2_3.3.2 pillar_1.4.6 gplots_3.1.0
#> [16] rlang_0.4.8 minqa_1.2.4 data.table_1.13.2
#> [19] nloptr_1.2.2.2 Matrix_1.2-18 preprocessCore_1.51.0
#> [22] rmarkdown_2.5 labeling_0.4.2 splines_4.0.3
#> [25] lme4_1.1-23 statmod_1.4.35 stringr_1.4.0
#> [28] MSstats_3.21.3 munsell_0.5.0 broom_0.7.2
#> [31] compiler_4.0.3 numDeriv_2016.8-1.1 xfun_0.18
#> [34] pkgconfig_2.0.3 lmerTest_3.1-2 marray_1.67.0
#> [37] htmltools_0.5.0 doSNOW_1.0.19 tidyselect_1.1.0
#> [40] tibble_3.0.4 gridExtra_2.3 codetools_0.2-16
#> [43] matrixStats_0.57.0 crayon_1.3.4 dplyr_1.0.2
#> [46] MASS_7.3-53 bitops_1.0-6 grid_4.0.3
#> [49] nlme_3.1-149 gtable_0.3.0 lifecycle_0.2.0
#> [52] magrittr_1.5 scales_1.1.1 KernSmooth_2.23-17
#> [55] stringi_1.5.3 farver_2.0.3 reshape2_1.4.4
#> [58] limma_3.45.19 ellipsis_0.3.1 generics_0.0.2
#> [61] vctrs_0.3.4 boot_1.3-25 iterators_1.0.13
#> [64] tools_4.0.3 glue_1.4.2 purrr_0.3.4
#> [67] parallel_4.0.3 survival_3.2-7 yaml_2.2.1
#> [70] colorspace_1.4-1 BiocManager_1.30.10 caTools_1.18.0
#> [73] minpack.lm_1.2-1 knitr_1.30