conserv.Rd
Quantifies residue conservation in a given protein sequence alignment by calculating the degree of amino acid variability in each column of the alignment.
conserv(x, method = c("similarity","identity","entropy22","entropy10"), sub.matrix = c("bio3d", "blosum62", "pam30", "other"), matrix.file = NULL, normalize.matrix = TRUE)
Argument | Description |
---|---|
x | an alignment list object with |
method | the conservation assessment method. |
sub.matrix | a matrix to score conservation. |
matrix.file | a file name of an arbitrary user matrix. |
normalize.matrix | logical, if TRUE the matrix is normalized prior to assessing conservation. |
To assess the level of sequence conservation at each position in an alignment, the “similarity”, “identity”, and “entropy” per position can be calculated.
The “similarity” is defined as the average of the similarity scores of all pairwise residue comparisons for that position in the alignment, where the similarity score between any two residues is the score value between those residues in the chosen substitution matrix “sub.matrix”.
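As a rough illustration of the “similarity” definition above, the sketch below averages substitution-matrix scores over all distinct residue pairs in a single column. It uses base R only. The helper name pairwise_similarity and the toy matrix values are assumptions made for illustration; they are not taken from bio3d or from any published matrix, and gap handling and matrix normalization are left out.

```r
# Toy substitution matrix over three residues.
# Values are illustrative only (not BLOSUM62, PAM30, or the bio3d matrix).
aa <- c("A", "L", "V")
toy_matrix <- matrix(c(1.0, 0.2, 0.3,
                       0.2, 1.0, 0.8,
                       0.3, 0.8, 1.0),
                     nrow = 3, dimnames = list(aa, aa))

# Hypothetical helper: mean matrix score over all distinct residue pairs
# in one alignment column (needs at least two residues).
pairwise_similarity <- function(column, sub_matrix) {
  pairs <- combn(column, 2)  # 2 x choose(n, 2) matrix of residue pairs
  mean(sub_matrix[cbind(pairs[1, ], pairs[2, ])])
}

column <- c("L", "L", "V", "L")
similarity_score <- pairwise_similarity(column, toy_matrix)
similarity_score  # 0.9: three L-L pairs score 1, three L-V pairs score 0.8
```

Because L and V score 0.8 against each other in the toy matrix, the column stays close to 1 even though it is not perfectly identical.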
The “identity”, i.e. the preference for a specific amino acid to be found at a certain position, is assessed by averaging the identity scores resulting from all possible pairwise comparisons at that position in the alignment, where all identical residue comparisons are given a score of 1 and all other comparisons are given a value of 0.
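The “identity” score can be sketched in the same spirit. The example below applies a 1/0 match score to every distinct residue pair in each column of a toy alignment. The helper name pairwise_identity and the toy alignment are hypothetical, self comparisons are excluded, and gaps are not treated specially, so the values need not match what conserv returns on real data.

```r
# Hypothetical helper: fraction of distinct residue pairs in a column
# that are identical (identical pair = 1, any other pair = 0).
pairwise_identity <- function(column) {
  pairs <- combn(column, 2)
  mean(pairs[1, ] == pairs[2, ])
}

# Toy alignment: rows are sequences, columns are alignment positions.
ali <- rbind(c("A", "L", "K"),
             c("A", "L", "R"),
             c("A", "V", "E"),
             c("A", "L", "K"))

identity_scores <- apply(ali, 2, pairwise_identity)
# Column 1 scores 1 (all pairs match), column 2 scores 0.5 (3 of 6 pairs
# match), column 3 scores 1/6 (only the K-K pair matches).
identity_scores
```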
“Entropy” is based on Shannon's information entropy. See the entropy function for further details.
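For orientation, Shannon entropy for one column can be sketched as follows. This assumes a base-2 logarithm and rescales by the maximum entropy of a 22-letter alphabet so that a fully conserved column scores 1. Both choices are assumptions suggested by the method name “entropy22”; the entropy function documentation remains the authority on what bio3d actually computes. The helper names are hypothetical.

```r
# Hypothetical helper: Shannon entropy (in bits) of the residues
# observed in one alignment column.
column_entropy <- function(column) {
  p <- table(column) / length(column)  # observed residue frequencies
  -sum(p * log2(p))
}

# Hypothetical helper: rescale so that 1 = fully conserved.
# alphabet_size = 22 is an assumption, not confirmed by this page.
entropy_conservation <- function(column, alphabet_size = 22) {
  1 - column_entropy(column) / log2(alphabet_size)
}

conserved_column <- c("G", "G", "G", "G")
mixed_column     <- c("A", "L", "A", "L")

column_entropy(conserved_column)        # 0 bits: only one residue type
column_entropy(mixed_column)            # 1 bit: two equally frequent types
entropy_conservation(conserved_column)  # 1
```

Note that the mixed column scores the same whether the two residues are chemically similar or not, since entropy only looks at frequencies.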
Note that the returned scores are normalized so that conserved columns score 1 and diverse columns score 0.
Returns a numeric vector of scores, one per alignment column.
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
Grant, B.J. et al. (2007) J. Mol. Biol. 368, 1231–1248.
Barry Grant
Each of these conservation scores has particular strengths and weaknesses. For example, entropy elegantly captures amino acid diversity but fails to account for stereochemical similarities. By employing a combination of scores and taking the union of their respective conservation signals we expect to achieve a more comprehensive analysis of sequence conservation (Grant, 2007).
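One possible reading of "taking the union of their respective conservation signals" is to threshold each score vector and keep any position flagged by at least one method. The sketch below does this in base R. The score values are made up for illustration (they are not conserv output), and the 0.9 cutoff is a hypothetical choice, not a value recommended by the package.

```r
# Made-up per-position scores for a five-column alignment
# (illustrative only, not output from conserv).
similarity_scores <- c(0.98, 0.62, 1.00, 0.85, 0.41)
entropy_scores    <- c(0.95, 0.70, 0.88, 0.93, 0.30)

cutoff <- 0.9  # hypothetical threshold for calling a position conserved

conserved_by_similarity <- which(similarity_scores >= cutoff)  # 1 3
conserved_by_entropy    <- which(entropy_scores >= cutoff)     # 1 4

# Union: positions called conserved by at least one score.
conserved_positions <- sort(union(conserved_by_similarity,
                                  conserved_by_entropy))
conserved_positions  # 1 3 4
```

With real data the two vectors would come from separate conserv calls on the same alignment using different method values.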
## Read an example alignment
aln <- read.fasta(system.file("examples/hivp_xray.fa",package="bio3d"))

## Score conservation
conserv(x=aln$ali, method="similarity", sub.matrix="bio3d")
#>   [1] 9.765816e-01 9.952941e-01 8.513097e-01 9.733885e-01 9.952941e-01
#>   [6] 9.952941e-01 6.199423e-01 9.850477e-01 1.000000e+00 9.379756e-01
#>  [11] 1.000000e+00 1.000000e+00 9.766926e-01 7.709889e-01 1.000000e+00
#>  [16] 9.790233e-01 1.000000e+00 1.000000e+00 9.766926e-01 9.258624e-01
#>  [21] 9.905993e-01 9.790233e-01 1.000000e+00 9.730255e-01 8.756582e-01
#>  [26] 1.000000e+00 1.000000e+00 9.915494e-01 1.000000e+00 9.776781e-01
#>  [31] 9.545105e-01 9.590455e-01 6.639556e-01 9.443130e-01 9.271010e-01
#>  [36] 9.153596e-01 5.746038e-01 1.000000e+00 9.743618e-01 9.761776e-01
#>  [41] 7.836815e-01 9.766926e-01 9.577003e-01 1.000000e+00 9.518668e-01
#>  [46] 9.404950e-01 9.750677e-01 9.831099e-01 9.534406e-01 9.534406e-01
#>  [51] 1.109878e-05 1.109878e-05 1.000000e+00 1.000000e+00 1.000000e+00
#>  [56] 9.564218e-01 9.790233e-01 9.766926e-01 9.860155e-01 9.813541e-01
#>  [61] 1.000000e+00 9.766926e-01 9.674029e-01 9.696737e-01 4.100422e-01
#>  [66] 8.215583e-01 1.000000e+00 9.860155e-01 5.633552e-01 9.813097e-01
#>  [71] 9.790233e-01 9.971765e-01 8.923885e-01 9.696737e-01 9.831787e-01
#>  [76] 1.000000e+00 9.860155e-01 9.836848e-01 9.766926e-01 1.000000e+00
#>  [81] 9.720311e-01 1.000000e+00 1.000000e+00 6.371454e-01 1.000000e+00
#>  [86] 8.794939e-01 9.674029e-01 1.000000e+00 1.000000e+00 9.776781e-01
#>  [91] 9.850477e-01 9.340866e-01 1.000000e+00 9.743618e-01 9.813541e-01
#>  [96] 1.000000e+00 5.092952e-01 9.790233e-01 1.000000e+00 1.000000e+00
#> [101] 9.850477e-01

##conserv(x=aln$ali,method="entropy22", sub.matrix="other")