R/422-extractProtPSSMFeature.R
extractProtPSSMFeature.Rd
Profile-based protein representation derived by PSSM (Position-Specific Scoring Matrix)
extractProtPSSMFeature(pssmmat)
pssmmat: The PSSM computed by extractProtPSSM.
A numeric vector with 20 x N named elements, where N is the size of the
window (number of rows of the PSSM).
This function calculates the profile-based protein representation
derived from the PSSM. The feature vector is based on the PSSM computed by
extractProtPSSM. For a given sequence, the PSSM feature represents the
log-likelihood of the substitution of the 20 types of amino acids at each
position in the sequence.
Each PSSM feature value in the vector represents the degree of conservation
of a given amino acid type at a given position. Each value is normalized to
the interval (0, 1) by the logistic transformation 1/(1+e^(-x)), where x is
the raw PSSM score.
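The normalization step can be illustrated with a minimal sketch. This is not the package's source code. The toy matrix, the helper name sigmoid, the positions-in-rows layout, and the element naming scheme are all assumptions made for illustration. It only shows how raw scores are mapped into (0, 1) and flattened into a vector of 20 x N named elements.

```r
# Logistic transformation: maps any real-valued score into (0, 1)
sigmoid <- function(x) 1 / (1 + exp(-x))

# The 20 standard amino acid one-letter codes
aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Hypothetical 3-position PSSM (made-up scores, not real PSI-BLAST output);
# rows are positions, columns are amino acid types (an assumed layout)
toy_pssm <- matrix(rep(c(-3, 0, 3), times = 20), nrow = 3,
                   dimnames = list(paste0("pos", 1:3), aa))

# Normalize every score, then flatten the matrix column by column
feature <- as.vector(sigmoid(toy_pssm))

# Illustrative names of the form "<position>.<amino acid>"
names(feature) <- as.vector(outer(rownames(toy_pssm), colnames(toy_pssm),
                                  paste, sep = "."))

length(feature)   # 60, i.e. 20 x N with N = 3
range(feature)    # all values lie strictly between 0 and 1
```

A raw score of 0 maps to exactly 0.5, so values above 0.5 correspond to positive (favoured) substitution scores and values below 0.5 to negative ones.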
Ye, Xugang, Guoli Wang, and Stephen F. Altschul. "An assessment of substitution scores for protein profile-profile comparison." Bioinformatics 27.24 (2011): 3356-3363.
Rangwala, Huzefa, and George Karypis. "Profile-based direct kernels for remote homology detection and fold recognition." Bioinformatics 21.23 (2005): 4239-4247.
# NOT RUN {
x = readFASTA(system.file('protseq/P00750.fasta', package = 'Rcpi'))[[1]]
# }
# NOT RUN {
dbpath = tempfile('tempdb', fileext = '.fasta')
invisible(file.copy(from = system.file('protseq/Plasminogen.fasta', package = 'Rcpi'), to = dbpath))
pssmmat = extractProtPSSM(seq = x, database.path = dbpath)
pssmfeature = extractProtPSSMFeature(pssmmat)
head(pssmfeature)
# }