Uses functionality from MSstatsTMT to clean, summarize, and normalize PTM-level and protein-level data. Imputes missing values and summarizes peptide-level quantification to the protein and PTM level. Applies global median normalization to peptide-level data and normalizes between runs.

dataSummarizationPTM_TMT(
  data,
  method = "msstats",
  global_norm = TRUE,
  global_norm.PTM = TRUE,
  reference_norm = TRUE,
  reference_norm.PTM = TRUE,
  remove_norm_channel = TRUE,
  remove_empty_channel = TRUE,
  MBimpute = TRUE,
  MBimpute.PTM = TRUE,
  maxQuantileforCensored = NULL
)

Arguments

data

The output of an MSstatsPTM converter function, or peptide-level quantified data from other tools. It should be a list containing one or two data tables, named PTM and PROTEIN, for the modified and unmodified datasets respectively. The list must contain at least the PTM dataset. Each table should have the columns ProteinName, PeptideSequence, Charge, PSM, Mixture, TechRepMixture, Run, Channel, Condition, BioReplicate, and Intensity.
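As a rough illustration of the layout described above, the following base R sketch checks that an input list has a PTM table and reports any required columns that are missing. The helper `check_ptm_tmt_input` and the toy data are hypothetical and are not part of MSstatsPTM.

```r
# Column names taken from the description of `data` above
required_cols <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                   "Mixture", "TechRepMixture", "Run", "Channel",
                   "Condition", "BioReplicate", "Intensity")

# Hypothetical helper (not an MSstatsPTM function)
check_ptm_tmt_input <- function(data) {
  # The input must be a list that contains at least a PTM element
  if (!is.list(data) || !"PTM" %in% names(data)) {
    stop("`data` must be a list containing at least a `PTM` table")
  }
  # Only the PTM and PROTEIN tables are inspected
  tables <- data[intersect(names(data), c("PTM", "PROTEIN"))]
  # For each table, return the required columns that are absent
  lapply(tables, function(tbl) setdiff(required_cols, colnames(tbl)))
}

# Toy input: a complete PTM table and a PROTEIN table without Intensity
toy_row <- as.data.frame(
  as.list(setNames(rep("x", length(required_cols)), required_cols))
)
toy <- list(
  PTM = toy_row,
  PROTEIN = toy_row[, setdiff(required_cols, "Intensity")]
)
missing_cols <- check_ptm_tmt_input(toy)
```

Here `missing_cols$PTM` is empty, while `missing_cols$PROTEIN` names the column that still needs to be added before summarization.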

method

Method used to summarize peptide-level data to the protein level. One of "msstats" (default), "MedianPolish", "Median", or "LogSum".
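To give a feel for what the simpler method names usually mean, here is a minimal base R sketch on toy log2 intensities for a single protein in a single run. It is an illustration under assumptions (Median as a per-channel median, LogSum as log2 of summed raw intensities, MedianPolish via `stats::medpolish`), not the package's internal code, and it does not cover the "msstats" method.

```r
# Toy log2 intensities for one protein in one run:
# rows are peptide features (PSMs), columns are TMT channels.
log2_int <- matrix(
  c(10, 11, 12,
    11, 12, 13,
    12, 13, 14),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("psm1", "psm2", "psm3"), c("126C", "127N", "127C"))
)

# "Median": per-channel median of the log2 intensities
median_summary <- apply(log2_int, 2, median, na.rm = TRUE)

# "LogSum": log2 of the summed raw-scale intensities per channel
logsum_summary <- log2(colSums(2^log2_int, na.rm = TRUE))

# "MedianPolish": Tukey's median polish; the channel-level summary is
# the overall effect plus the column (channel) effect
mp <- stats::medpolish(log2_int, na.rm = TRUE, trace.iter = FALSE)
medpolish_summary <- mp$overall + mp$col
```

On this perfectly additive toy matrix, the median and median-polish summaries coincide; on real data with outlying or missing features they generally differ.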

global_norm

Global median normalization of unmodified peptide-level data (equalizing the medians across all channels and MS runs). Default is TRUE. It is performed before protein-level summarization.

global_norm.PTM

Same as above for modified peptide level data. Default is TRUE.
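The idea of equalizing medians across channels and runs can be sketched in base R as follows. This is a simplified illustration on toy data, assuming the shift is applied on the log2 scale and that the common target is the median of the per-group medians; the package's own implementation may differ in detail.

```r
# Toy peptide-level data: two channels in one run with a systematic offset
peptides <- data.frame(
  Run       = "1_1",
  Channel   = rep(c("126C", "127N"), each = 3),
  Intensity = c(2^10, 2^11, 2^12, 2^12, 2^13, 2^14)
)
peptides$log2Intensity <- log2(peptides$Intensity)

# One group per Run x Channel combination
group <- interaction(peptides$Run, peptides$Channel, drop = TRUE)

# Median log2 intensity of each group, aligned to the rows
group_median <- ave(peptides$log2Intensity, group,
                    FUN = function(x) median(x, na.rm = TRUE))

# Common target (assumption): the median of the group medians
target <- median(tapply(peptides$log2Intensity, group, median, na.rm = TRUE))

# Shift every group so that its median equals the target
peptides$normLog2 <- peptides$log2Intensity - group_median + target

# After normalization, all Run x Channel medians agree
norm_medians <- tapply(peptides$normLog2, group, median)
```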

reference_norm

Reference-channel-based normalization between MS runs on unmodified protein-level data. TRUE (default) requires at least one reference channel in each MS run, annotated by 'Norm' in the Condition column. It is performed after protein-level summarization. FALSE skips this normalization step. If the data has only one run, reference_norm should be FALSE.

reference_norm.PTM

Same as above for modified peptide level data. Default is TRUE.
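A reference-channel normalization of this kind can be sketched in base R as below. The toy table, the use of the mean over 'Norm' channels within a run, and the median across runs as the common target are assumptions made for illustration; they are not confirmed details of the package.

```r
# Toy protein-level data: one protein, two runs, one 'Norm' channel per run
protein_level <- data.frame(
  Protein   = "Protein_A",
  Run       = rep(c("run1", "run2"), each = 3),
  Channel   = rep(c("126C", "127N", "131C"), times = 2),
  Condition = rep(c("Norm", "Condition_1", "Condition_2"), times = 2),
  Abundance = c(10, 11, 12, 12, 13, 14)
)

# Reference abundance per protein and run, taken from the 'Norm' channels
norm_rows <- protein_level[protein_level$Condition == "Norm", ]
ref <- aggregate(Abundance ~ Protein + Run, data = norm_rows, FUN = mean)
names(ref)[names(ref) == "Abundance"] <- "RefAbundance"

# Per-protein target (assumption): median reference abundance across runs
target <- aggregate(RefAbundance ~ Protein, data = ref, FUN = median)
names(target)[names(target) == "RefAbundance"] <- "Target"

# Shift each run so that its reference channel sits at the target
normalized <- merge(protein_level, ref, by = c("Protein", "Run"))
normalized <- merge(normalized, target, by = "Protein")
normalized$NormAbundance <-
  normalized$Abundance - normalized$RefAbundance + normalized$Target
```

After the shift, the 'Norm' channels of both runs carry the same abundance, so the remaining channels are comparable between runs. A dataset without any 'Norm' annotation in Condition cannot be normalized this way.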

remove_norm_channel

TRUE (default) removes 'Norm' channels from protein-level data.

remove_empty_channel

TRUE (default) removes 'Empty' channels from protein-level data.

MBimpute

Only used when method = "msstats". TRUE (default) imputes missing values with an accelerated failure time model. FALSE uses the minimum value to impute the missing values for each peptide precursor ion.

MBimpute.PTM

Same as above for modified peptide level data. Default is TRUE.

maxQuantileforCensored

Missing values are assumed to be censored. maxQuantileforCensored is the maximum quantile used to decide whether a missing value is censored, for instance 0.999. Default is NULL.

Value

A list of two data.tables.

Examples

head(raw.input.tmt$PTM)
#>       ProteinName PeptideSequence Charge           PSM Mixture TechRepMixture
#> 1 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 2 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 3 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 4 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 5 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#> 6 Protein_12_S703     Peptide_491      3 Peptide_491_3       1              1
#>   Run Channel   Condition  BioReplicate Intensity
#> 1 1_1    128N Condition_2 Condition_2_1   48030.0
#> 2 1_1    129C Condition_4 Condition_4_2  100224.4
#> 3 1_1    131C Condition_3 Condition_3_2   66804.6
#> 4 1_1    130N Condition_1 Condition_1_2   46779.8
#> 5 1_1    128C Condition_6 Condition_6_1   77497.9
#> 6 1_1    126C Condition_4 Condition_4_1   81559.7
head(raw.input.tmt$PROTEIN)
#>   ProteinName PeptideSequence Charge             PSM Mixture TechRepMixture Run
#> 1  Protein_12    Peptide_9121      3  Peptide_9121_3       1              1 1_1
#> 2  Protein_12   Peptide_27963      5 Peptide_27963_5       1              1 1_1
#> 3  Protein_12   Peptide_28482      4 Peptide_28482_4       1              1 1_1
#> 4  Protein_12   Peptide_10940      2 Peptide_10940_2       2              1 2_1
#> 5  Protein_12    Peptide_4900      2  Peptide_4900_2       2              1 2_1
#> 6  Protein_12    Peptide_4900      3  Peptide_4900_3       2              1 2_1
#>   Channel   Condition  BioReplicate  Intensity
#> 1    126C Condition_4 Condition_4_1 10996116.9
#> 2    127C Condition_5 Condition_5_1    56965.1
#> 3    131N Condition_2 Condition_2_2   286121.7
#> 4    131N Condition_2 Condition_2_4   534806.0
#> 5    126C Condition_4 Condition_4_3  1134908.7
#> 6    126C Condition_4 Condition_4_3  1605773.2
quant.tmt.msstatsptm <- dataSummarizationPTM_TMT(raw.input.tmt, method = "msstats")
#> Joining, by = c("Run", "Channel")
#> Summarizing for Run : 1_1 ( 1 of 2 )
#> ** Use all features that the dataset origianally has.
#>
#>   Summary of Features :
#>                          count
#> # of Protein                86
#> # of Peptides/Protein      1-4
#> # of Transitions/Peptide   1-1
#>
#> ** 62 Proteins have only single transition : Consider excluding this protein from the dataset. (Protein_1076_Y67, Protein_1145_T915, Protein_12_S703, Protein_1235_S416, Protein_1326_Y182, Protein_1380_Y106, Protein_15_S140, Protein_15_Y137, Protein_150_S729, Protein_152_S455 ...)
#>
#>   Summary of Samples :
#>                            Condition_1 Condition_2 Condition_3 Condition_4
#> # of MS runs                         2           2           2           2
#> # of Biological Replicates           2           2           2           2
#> # of Technical Replicates            1           1           1           1
#>                            Condition_5 Condition_6
#> # of MS runs                         2           1
#> # of Biological Replicates           2           1
#> # of Technical Replicates            1           1
#>
#>   Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 0
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#>   |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 2_1 ( 2 of 2 )
#> ** Use all features that the dataset origianally has.
#>
#>   Summary of Features :
#>                          count
#> # of Protein                27
#> # of Peptides/Protein      1-3
#> # of Transitions/Peptide   1-1
#>
#> ** 18 Proteins have only single transition : Consider excluding this protein from the dataset. (Protein_1076_Y67, Protein_1220_Y321, Protein_125_Y343, Protein_1547_Y608, Protein_1587_Y168, Protein_1864_Y207, Protein_1929_Y323, Protein_2072_Y89, Protein_2264_Y64, Protein_2284_Y519 ...)
#>
#>   Summary of Samples :
#>                            Condition_1 Condition_2 Condition_3 Condition_4
#> # of MS runs                         2           2           1           2
#> # of Biological Replicates           2           2           1           2
#> # of Technical Replicates            1           1           1           1
#>                            Condition_5 Condition_6
#> # of MS runs                         2           2
#> # of Biological Replicates           2           2
#> # of Technical Replicates            1           1
#>
#>   Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 0
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#>   |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> ** 'Norm' information in Condition is required for normalization.
#> Please check it. At this moment, normalization is not performed.
#> Joining, by = c("Run", "Channel")
#> Summarizing for Run : 1_1 ( 1 of 2 )
#> ** Use all features that the dataset origianally has.
#>
#>   Summary of Features :
#>                          count
#> # of Protein                85
#> # of Peptides/Protein    1-120
#> # of Transitions/Peptide   1-1
#>
#> ** 2 Proteins have only single transition : Consider excluding this protein from the dataset. (Protein_2108, Protein_2207 ...)
#>
#>   Summary of Samples :
#>                            Condition_1 Condition_2 Condition_3 Condition_4
#> # of MS runs                         2           2           2           2
#> # of Biological Replicates           2           2           2           2
#> # of Technical Replicates            1           1           1           1
#>                            Condition_5 Condition_6
#> # of MS runs                         2           1
#> # of Biological Replicates           2           1
#> # of Technical Replicates            1           1
#>
#>   Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 0
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#>   |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> Summarizing for Run : 2_1 ( 2 of 2 )
#> CAUTION : the input dataset has incomplete rows.
#> If missing peaks occur they should be included in the dataset as separate rows,
#> and the missing intensity values should be indicated with 'NA'.
#> The incomplete rows are listed below.
#> *** Subject : Condition_1_3, Condition : Condition_1 has incomplete rows for some features (Peptide_7183_2_NA_NA, Peptide_7463_3_NA_NA)
#> *** Subject : Condition_6_3, Condition : Condition_6 has incomplete rows for some features (Peptide_24361_3_NA_NA)
#> *** Subject : Condition_3_3, Condition : Condition_3 has incomplete rows for some features (Peptide_10658_3_NA_NA, Peptide_17129_2_NA_NA, Peptide_3162_3_NA_NA)
#>
#> DONE : Incomplete rows for missing peaks are added with intensity values=NA.
#> ** Use all features that the dataset origianally has.
#>
#>   Summary of Features :
#>                          count
#> # of Protein                83
#> # of Peptides/Protein    1-160
#> # of Transitions/Peptide   1-1
#>
#> ** 2 Proteins have only single transition : Consider excluding this protein from the dataset. (Protein_2311, Protein_2751 ...)
#>
#>   Summary of Samples :
#>                            Condition_1 Condition_2 Condition_3 Condition_4
#> # of MS runs                         2           2           1           2
#> # of Biological Replicates           2           2           1           2
#> # of Technical Replicates            1           1           1           1
#>                            Condition_5 Condition_6
#> # of MS runs                         2           2
#> # of Biological Replicates           2           2
#> # of Technical Replicates            1           1
#>
#>   Summary of Missingness :
#> # transitions are completely missing in at least one of the conditions : 3
#> -> Peptide_17129_2_NA_NA, Peptide_3162_3_NA_NA, Peptide_10658_3_NA_NA ...
#>
#> # run with 75% missing observations: 0
#>
#> == Start the summarization per subplot...
#>   |======================================================================| 100%
#>
#> == the summarization per subplot is done.
#> ** Protein-level summarization done by MSstats.
#> ** 'Norm' information in Condition is required for normalization.
#> Please check it. At this moment, normalization is not performed.
head(quant.tmt.msstatsptm$PTM)
#>   Run          Protein Abundance Channel  BioReplicate   Condition
#> 1 1_1 Protein_1076_Y67  13.65475    130N Condition_1_2 Condition_1
#> 2 1_1 Protein_1076_Y67  13.57146    127N Condition_1_1 Condition_1
#> 3 1_1 Protein_1076_Y67  13.56900    128N Condition_2_1 Condition_2
#> 4 1_1 Protein_1076_Y67  13.70567    131N Condition_2_2 Condition_2
#> 5 1_1 Protein_1076_Y67  13.24717    131C Condition_3_2 Condition_3
#> 6 1_1 Protein_1076_Y67  13.11874    129N Condition_3_1 Condition_3
#>   TechRepMixture Mixture
#> 1              1       1
#> 2              1       1
#> 3              1       1
#> 4              1       1
#> 5              1       1
#> 6              1       1
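
The summarized table can be inspected with ordinary data-frame tools. As a small sketch, the base R code below rebuilds the six rows printed above as a stand-alone data frame (so it runs without the package) and averages the abundance per modification site and condition. The object names `ptm_summary` and `condition_means` are illustrative only.

```r
# Stand-alone copy of the six rows shown in the PTM summary above
ptm_summary <- data.frame(
  Run          = "1_1",
  Protein      = "Protein_1076_Y67",
  Abundance    = c(13.65475, 13.57146, 13.56900, 13.70567, 13.24717, 13.11874),
  Channel      = c("130N", "127N", "128N", "131N", "131C", "129N"),
  BioReplicate = c("Condition_1_2", "Condition_1_1", "Condition_2_1",
                   "Condition_2_2", "Condition_3_2", "Condition_3_1"),
  Condition    = rep(c("Condition_1", "Condition_2", "Condition_3"), each = 2)
)

# Mean summarized abundance per modification site and condition
condition_means <- aggregate(Abundance ~ Protein + Condition,
                             data = ptm_summary, FUN = mean)
```

With the real output, the same call could be applied to `quant.tmt.msstatsptm$PTM` in place of the toy data frame.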