MSstatsTMTPTM: A package for post-translational modification (PTM) significance analysis in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling

Devon Kohler (kohler.d@northeastern.edu)

2020-11-16

library(MSstatsTMTPTM)
library(MSstatsTMT)

This vignette summarizes the functionalities and options of MSstatsTMTPTM and provides a workflow example.

MSstatsTMTPTM includes the following two functions for data visualization and statistical testing:

  1. Data visualization of PTM and global protein levels: dataProcessPlotsTMTPTM
  2. Group comparison on PTM/protein quantification data: groupComparisonTMTPTM

Installation

To install this package, start R (version “4.0”) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MSstatsTMTPTM")

1. dataProcessPlotsTMTPTM()

To illustrate the quantitative data and the quality control of MS runs, dataProcessPlotsTMTPTM takes the quantitative data from the MSstatsTMT converter functions as input and generates two types of figures, saved as PDF files:

  1. Profile plot (specify “ProfilePlot” in option type), to identify the potential sources of variation for each protein.
  2. Quality control plot (specify “QCPlot” in option type), to evaluate the systematic bias between MS runs.

Arguments

Example

The plotting function requires the raw datasets for both the PTM and the protein levels. Each can be the output of one of the MSstatsTMT converter functions: PDtoMSstatsTMTFormat, SpectroMinetoMSstatsTMTFormat, or OpenMStoMSstatsTMTFormat. Both the PTM and protein datasets must include the following columns: ProteinName, PeptideSequence, Charge, PSM, Mixture, TechRepMixture, Run, Channel, Condition, BioReplicate, and Intensity.
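Before plotting, it can help to confirm that both inputs carry the required columns. The following is a minimal sketch in base R; `missing_columns` is a hypothetical helper written for this illustration, not a function of MSstatsTMTPTM, and `toy.ptm` is a one-row stand-in built from the first row of the `raw.ptm` preview shown in this vignette.

```r
# Column names listed in this vignette as required for both inputs.
required_cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                   "Mixture", "TechRepMixture", "Run", "Channel",
                   "Condition", "BioReplicate", "Intensity")

# Hypothetical helper (not part of MSstatsTMTPTM): returns the names of
# required columns that are absent from a data frame.
missing_columns <- function(df, required = required_cols) {
  setdiff(required, colnames(df))
}

# Toy one-row input mirroring the first row of head(raw.ptm).
toy.ptm <- data.frame(ProteinName = "Protein_12_S703",
                      PeptideSequence = "Peptide_491",
                      Charge = 3,
                      PSM = "Peptide_491_3",
                      Mixture = 1,
                      TechRepMixture = 1,
                      Run = "1_1",
                      Channel = "128N",
                      Condition = "Condition_2",
                      BioReplicate = "Condition_2_1",
                      Intensity = 48030.0)

missing_columns(toy.ptm)                                # character(0)
missing_columns(toy.ptm[, colnames(toy.ptm) != "PSM"])  # "PSM"
```

The same check can be applied to the real `raw.ptm` and `raw.protein` data frames once they are loaded.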

# read in raw data files
# raw.ptm <- read.csv(file="raw.ptm.csv", header=TRUE)
# raw.protein <- read.csv(file="raw.protein.csv", header=TRUE)
head(raw.ptm)
#>       ProteinName PeptideSequence Charge           PSM Mixture TechRepMixture
#> 1 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 2 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 3 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 4 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 5 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 6 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#>   Run Channel   Condition  BioReplicate Intensity
#> 1 1_1    128N Condition_2 Condition_2_1   48030.0
#> 2 1_1    129C Condition_4 Condition_4_2  100224.4
#> 3 1_1    131C Condition_3 Condition_3_2   66804.6
#> 4 1_1    130N Condition_1 Condition_1_2   46779.8
#> 5 1_1    128C Condition_6 Condition_6_1   77497.9
#> 6 1_1    126C Condition_4 Condition_4_1   81559.7
head(raw.protein)
#>   ProteinName PeptideSequence Charge             PSM Mixture TechRepMixture Run
#> 1  Protein_12    Peptide_9121      3  Peptide_9121_3       1              1 1_1
#> 2  Protein_12   Peptide_27963      5 Peptide_27963_5       1              1 1_1
#> 3  Protein_12   Peptide_28482      4 Peptide_28482_4       1              1 1_1
#> 4  Protein_12   Peptide_10940      2 Peptide_10940_2       2              1 2_1
#> 5  Protein_12    Peptide_4900      2  Peptide_4900_2       2              1 2_1
#> 6  Protein_12    Peptide_4900      3  Peptide_4900_3       2              1 2_1
#>   Channel   Condition  BioReplicate  Intensity
#> 1    126C Condition_4 Condition_4_1 10996116.9
#> 2    127C Condition_5 Condition_5_1    56965.1
#> 3    131N Condition_2 Condition_2_2   286121.7
#> 4    131N Condition_2 Condition_2_4   534806.0
#> 5    126C Condition_4 Condition_4_3  1134908.7
#> 6    126C Condition_4 Condition_4_3  1605773.2
# Run MSstatsTMT proteinSummarization function
quant.msstats.ptm <- proteinSummarization(raw.ptm,
                                          method = "msstats",
                                          global_norm = TRUE,
                                          reference_norm = FALSE,
                                          MBimpute = TRUE)

quant.msstats.protein <- proteinSummarization(raw.protein,
                                          method = "msstats",
                                          global_norm = TRUE,
                                          reference_norm = FALSE,
                                          MBimpute = TRUE)
head(quant.msstats.ptm)
#>   Run          Protein Abundance Channel  BioReplicate   Condition
#> 1 1_1 Protein_1076_Y67  13.65475    130N Condition_1_2 Condition_1
#> 2 1_1 Protein_1076_Y67  13.57146    127N Condition_1_1 Condition_1
#> 3 1_1 Protein_1076_Y67  13.56900    128N Condition_2_1 Condition_2
#> 4 1_1 Protein_1076_Y67  13.70567    131N Condition_2_2 Condition_2
#> 5 1_1 Protein_1076_Y67  13.24717    131C Condition_3_2 Condition_3
#> 6 1_1 Protein_1076_Y67  13.11874    129N Condition_3_1 Condition_3
#>   TechRepMixture Mixture
#> 1              1       1
#> 2              1       1
#> 3              1       1
#> 4              1       1
#> 5              1       1
#> 6              1       1
head(quant.msstats.protein)
#>   Run      Protein Abundance Channel  BioReplicate   Condition TechRepMixture
#> 1 1_1 Protein_1076  18.75131    127N Condition_1_1 Condition_1              1
#> 2 1_1 Protein_1076  18.80198    130N Condition_1_2 Condition_1              1
#> 3 1_1 Protein_1076  18.92222    131N Condition_2_2 Condition_2              1
#> 4 1_1 Protein_1076  19.02252    128N Condition_2_1 Condition_2              1
#> 5 1_1 Protein_1076  18.28685    131C Condition_3_2 Condition_3              1
#> 6 1_1 Protein_1076  18.40555    129N Condition_3_1 Condition_3              1
#>   Mixture
#> 1       1
#> 2       1
#> 3       1
#> 4       1
#> 5       1
#> 6       1

# Profile Plot
dataProcessPlotsTMTPTM(data.ptm=raw.ptm,
                    data.protein=raw.protein,
                    data.ptm.summarization=quant.msstats.ptm,
                    data.protein.summarization=quant.msstats.protein,
                    which.Protein = 1,
                    type='ProfilePlot',
                    address = FALSE
                    )
#> Drew the Profile plot for  Protein_1076_Y67 ( 1  of  1 )

#> Drew the Profile plot with summarization for  Protein_1076_Y67 ( 1  of  1 )


# Quality Control Plot
# dataProcessPlotsTMTPTM(data.ptm=raw.ptm,
#                     data.protein=raw.protein,
#                     data.ptm.summarization=quant.msstats.ptm,
#                     data.protein.summarization=quant.msstats.protein,
#                     type='QCPlot')

2. groupComparisonTMTPTM()

groupComparisonTMTPTM tests for significant changes in PTM abundance across conditions, adjusted for global protein abundance, using a family of linear mixed-effects models for TMT experiments. For a case-control study (patients are not repeatedly measured), the experimental design is determined automatically and the appropriate statistical model is fitted.
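To give an intuition for what "adjusted for global protein abundance" can mean, the sketch below subtracts a protein-level fold change from a PTM-level fold change and combines the two standard errors. This is an assumption made for illustration: this vignette does not state the exact formula the package uses, the two estimates are treated as independent, and the protein-level numbers are hypothetical. Only the PTM-level values are taken from the Condition_1-Condition_3 row of the pairwise output shown in this vignette.

```r
# PTM-level estimate (Condition_1-Condition_3 row of the pairwise output).
ptm.log2FC <- 0.42262536
ptm.SE     <- 0.05712141

# Hypothetical protein-level estimate for the same comparison
# (made-up values, for illustration only).
protein.log2FC <- 0.30
protein.SE     <- 0.04

# Assumed adjustment: remove the protein-level change from the PTM-level
# change, and add the variances of the two (assumed independent) estimates.
adjusted.log2FC <- ptm.log2FC - protein.log2FC
adjusted.SE     <- sqrt(ptm.SE^2 + protein.SE^2)

adjusted.log2FC
adjusted.SE
```

Under this assumption the adjusted standard error is always at least as large as the PTM-level one, because uncertainty in the protein-level estimate is carried into the adjusted result.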

Arguments

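  1. contrast.matrix: the comparisons to test. If no contrast matrix is supplied, all pairwise comparisons between conditions are tested. Otherwise, users can specify the comparisons of interest as rows of a matrix with one column per condition: assign 1 or -1 to the conditions being compared and 0 to all others. The condition levels (columns) are sorted alphabetically.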
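A contrast matrix of this shape can be built by hand in base R. The sketch below assumes the six condition names used in this vignette; `make_contrast` and `my.contrast.matrix` are hypothetical names introduced for illustration and are not part of MSstatsTMTPTM.

```r
# Condition levels used in this vignette, sorted alphabetically.
conditions <- sort(paste0("Condition_", 1:6))

# Hypothetical helper (not part of MSstatsTMTPTM): one contrast row with
# +1 for `pos`, -1 for `neg`, and 0 for every other condition.
make_contrast <- function(pos, neg, levels) {
  row <- setNames(numeric(length(levels)), levels)
  row[pos] <- 1
  row[neg] <- -1
  row
}

# One named row per comparison; the columns follow the condition order.
my.contrast.matrix <- rbind(
  "1-4" = make_contrast("Condition_1", "Condition_4", conditions),
  "2-5" = make_contrast("Condition_2", "Condition_5", conditions)
)

my.contrast.matrix

# Each comparison should sum to zero.
rowSums(my.contrast.matrix)
```

Row names become the comparison labels, which is why short, readable names such as "1-4" are convenient.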

Example

# test for all the possible pairs of conditions
model.results.pairwise <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm,
                                       data.protein=quant.msstats.protein)
names(model.results.pairwise)
#> [1] "PTM.Model"      "Protein.Model"  "Adjusted.Model"
head(model.results.pairwise[[1]])
#>            Protein                   Label     log2FC         SE       DF
#> 1 Protein_1076_Y67 Condition_1-Condition_2 0.04713074 0.05264970 15.00243
#> 2 Protein_1076_Y67 Condition_1-Condition_3 0.42262536 0.05712141 15.00251
#> 3 Protein_1076_Y67 Condition_1-Condition_4 0.11835636 0.05264970 15.00243
#> 4 Protein_1076_Y67 Condition_1-Condition_5 0.28875531 0.05264970 15.00243
#> 5 Protein_1076_Y67 Condition_1-Condition_6 0.14293731 0.05712141 15.00251
#> 6 Protein_1076_Y67 Condition_2-Condition_3 0.37549462 0.05712141 15.00251
#>         pvalue   adj.pvalue issue
#> 1 3.848308e-01 6.180078e-01    NA
#> 2 2.221079e-06 9.994856e-05    NA
#> 3 4.003914e-02 4.936333e-02    NA
#> 4 6.282242e-05 7.067522e-04    NA
#> 5 2.439054e-02 4.390296e-02    NA
#> 6 8.821963e-06 3.969883e-04    NA

# Load specific contrast matrix
#example.contrast.matrix <- read.csv(file="example.contrast.matrix.csv", header=TRUE)
example.contrast.matrix
#>         Condition_1 Condition_2 Condition_3 Condition_4 Condition_5 Condition_6
#> 1-4       1.0000000   0.0000000   0.0000000  -1.0000000   0.0000000   0.0000000
#> 2-5       0.0000000   1.0000000   0.0000000   0.0000000  -1.0000000   0.0000000
#> 3-6       0.0000000   0.0000000   1.0000000   0.0000000   0.0000000  -1.0000000
#> 1-3       1.0000000   0.0000000  -1.0000000   0.0000000   0.0000000   0.0000000
#> 2-3       0.0000000   1.0000000  -1.0000000   0.0000000   0.0000000   0.0000000
#> 4-6       0.0000000   0.0000000   0.0000000   1.0000000   0.0000000  -1.0000000
#> 5-6       0.0000000   0.0000000   0.0000000   0.0000000   1.0000000  -1.0000000
#> Partial   0.2500000   0.2500000  -0.5000000   0.2500000   0.2500000  -0.5000000
#> Third     0.3333333   0.3333333   0.3333333  -0.3333333  -0.3333333  -0.3333333

# test for specified condition comparisons only
model.results.contrast <- groupComparisonTMTPTM(data.ptm=quant.msstats.ptm,
                                       data.protein=quant.msstats.protein,
                                       contrast.matrix = example.contrast.matrix)

names(model.results.contrast)
#> [1] "PTM.Model"      "Protein.Model"  "Adjusted.Model"
head(model.results.contrast[[1]])
#>            Protein Label      log2FC         SE       DF       pvalue
#> 1 Protein_1076_Y67   1-4  0.11835636 0.05264970 15.00243 4.003914e-02
#> 2 Protein_1076_Y67   2-5  0.24162457 0.05264970 15.00243 3.542924e-04
#> 3 Protein_1076_Y67   3-6 -0.27968805 0.06173696 15.00271 3.983220e-04
#> 4 Protein_1076_Y67   1-3  0.42262536 0.05712141 15.00251 2.221079e-06
#> 5 Protein_1076_Y67   2-3  0.37549462 0.05712141 15.00251 8.821963e-06
#> 6 Protein_1076_Y67   4-6  0.02458095 0.05712141 15.00251 6.730747e-01
#>     adj.pvalue issue
#> 1 4.936333e-02    NA
#> 2 2.022438e-03    NA
#> 3 7.169796e-03    NA
#> 4 9.994856e-05    NA
#> 5 3.969883e-04    NA
#> 6 7.530135e-01    NA
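The result is a named list, so each model can be accessed by name rather than by position and then filtered like any data frame. The sketch below uses a toy stand-in for the PTM.Model element, built from the first three rows printed above; the 0.01 cutoff is an arbitrary value chosen for illustration.

```r
# Toy stand-in for the list returned by groupComparisonTMTPTM, holding only
# a few columns of the first three rows of the PTM.Model output above.
toy.results <- list(
  PTM.Model = data.frame(
    Protein = "Protein_1076_Y67",
    Label = c("1-4", "2-5", "3-6"),
    log2FC = c(0.11835636, 0.24162457, -0.27968805),
    adj.pvalue = c(4.936333e-02, 2.022438e-03, 7.169796e-03)
  )
)

# Access a model by name rather than by position.
ptm.model <- toy.results[["PTM.Model"]]

# Keep comparisons below an illustrative adjusted p-value cutoff of 0.01.
significant <- ptm.model[ptm.model$adj.pvalue < 0.01, ]
as.character(significant$Label)  # "2-5" "3-6"
```

With real output, the same pattern applies to `model.results.contrast[["Protein.Model"]]` and `model.results.contrast[["Adjusted.Model"]]`.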

Session information

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MSstatsTMT_1.8.0    MSstatsTMTPTM_1.0.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] ggrepel_0.8.2         Rcpp_1.0.5            lattice_0.20-41      
#>  [4] tidyr_1.1.2           snow_0.4-3            gtools_3.8.2         
#>  [7] digest_0.6.27         foreach_1.5.1         R6_2.5.0             
#> [10] plyr_1.8.6            backports_1.2.0       evaluate_0.14        
#> [13] ggplot2_3.3.2         pillar_1.4.6          gplots_3.1.0         
#> [16] rlang_0.4.8           minqa_1.2.4           data.table_1.13.2    
#> [19] nloptr_1.2.2.2        Matrix_1.2-18         preprocessCore_1.52.0
#> [22] rmarkdown_2.5         labeling_0.4.2        splines_4.0.3        
#> [25] lme4_1.1-25           statmod_1.4.35        stringr_1.4.0        
#> [28] MSstats_3.22.0        munsell_0.5.0         broom_0.7.2          
#> [31] compiler_4.0.3        numDeriv_2016.8-1.1   xfun_0.19            
#> [34] pkgconfig_2.0.3       lmerTest_3.1-3        marray_1.68.0        
#> [37] htmltools_0.5.0       doSNOW_1.0.19         tidyselect_1.1.0     
#> [40] tibble_3.0.4          gridExtra_2.3         codetools_0.2-18     
#> [43] matrixStats_0.57.0    crayon_1.3.4          dplyr_1.0.2          
#> [46] MASS_7.3-53           bitops_1.0-6          grid_4.0.3           
#> [49] nlme_3.1-150          gtable_0.3.0          lifecycle_0.2.0      
#> [52] magrittr_1.5          scales_1.1.1          KernSmooth_2.23-18   
#> [55] stringi_1.5.3         farver_2.0.3          reshape2_1.4.4       
#> [58] limma_3.46.0          ellipsis_0.3.1        generics_0.1.0       
#> [61] vctrs_0.3.4           boot_1.3-25           iterators_1.0.13     
#> [64] tools_4.0.3           glue_1.4.2            purrr_0.3.4          
#> [67] parallel_4.0.3        survival_3.2-7        yaml_2.2.1           
#> [70] colorspace_2.0-0      caTools_1.18.0        minpack.lm_1.2-1     
#> [73] knitr_1.30