conserv(x, method = c("similarity","identity","entropy22","entropy10"), sub.matrix = c("bio3d", "blosum62", "pam30", "other"), matrix.file = NULL, normalize.matrix = TRUE)
x: an alignment with id and ali components, similar to that generated by read.fasta.
Quantifies residue conservation in a given protein sequence alignment by calculating the degree of amino acid variability in each column of the alignment.
To assess the level of sequence conservation at each position in an alignment, the similarity, identity, and entropy per position can be calculated.
The similarity is defined as the average of the similarity scores of all pairwise residue comparisons for that position in the alignment, where the similarity score between any two residues is the score value between those residues in the chosen substitution matrix sub.matrix.
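The pairwise averaging described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the bio3d implementation: the helper name pairwise_similarity and the three-residue matrix sub_mat are hypothetical, the matrix values are invented for the example, and each pair of distinct sequences is assumed to be compared once.

```r
# Toy symmetric substitution matrix (hypothetical values, for illustration only)
residues <- c("I", "V", "D")
sub_mat <- matrix(c(1.0, 0.8, 0.0,
                    0.8, 1.0, 0.0,
                    0.0, 0.0, 1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(residues, residues))

# Average substitution score over all residue pairs in one alignment column
# (illustrative helper, not part of bio3d)
pairwise_similarity <- function(column, mat) {
  pairs <- combn(column, 2)                  # 2 x choose(n, 2) matrix of pairs
  mean(mat[cbind(pairs[1, ], pairs[2, ])])   # look up each pair by residue name
}

sim_conserved <- pairwise_similarity(c("D", "D", "D"), sub_mat)
sim_similar   <- pairwise_similarity(c("I", "V", "I"), sub_mat)
sim_mixed     <- pairwise_similarity(c("I", "D", "D"), sub_mat)
print(c(sim_conserved, sim_similar, sim_mixed))
```

With this toy matrix, a column of chemically similar residues (I, V, I) scores higher than a column mixing dissimilar residues (I, D, D), even though neither column is fully conserved.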
The identity, i.e. the preference for a specific amino acid to be found at a certain position, is assessed by averaging the identity scores resulting from all possible pairwise comparisons at that position in the alignment, where all identical residue comparisons are given a score of 1 and all other comparisons are given a score of 0.
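As a rough illustration of the identity score, the following base R sketch averages 1/0 identity scores over all residue pairs in each column of a toy alignment. The helper pairwise_identity and the matrix ali are hypothetical names introduced here, and the sketch assumes each pair of distinct sequences is compared once; the handling of gaps in bio3d is not modelled.

```r
# Fraction of identical residue pairs in one alignment column
# (illustrative helper, not part of bio3d)
pairwise_identity <- function(column) {
  pairs <- combn(column, 2)          # 2 x choose(n, 2) matrix of pairs
  mean(pairs[1, ] == pairs[2, ])     # identical pairs score 1, all others 0
}

# Toy alignment: rows are sequences, columns are alignment positions
ali <- rbind(c("P", "Q", "I"),
             c("P", "Q", "V"),
             c("P", "K", "L"),
             c("P", "Q", "I"))

identity_scores <- apply(ali, 2, pairwise_identity)
print(identity_scores)
```

The first column is fully conserved and scores 1; the more variable columns score progressively lower.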
Entropy is based on Shannon's information entropy. See the entropy function for further details.
Note that the returned scores are normalized so that conserved columns score 1 and diverse columns score 0.
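One way to obtain an entropy-based score on this 0 to 1 scale is to divide the Shannon entropy of a column by its maximum possible value and subtract the result from 1. The sketch below assumes this normalization, with the alphabet size (22 or 10) as a parameter; the helper normalized_entropy is hypothetical and the exact procedure used by the entropy function may differ.

```r
# Shannon entropy of one alignment column, rescaled so that a fully
# conserved column scores 1 and a maximally diverse column scores 0
# (illustrative helper, not part of bio3d; normalization is an assumption)
normalized_entropy <- function(column, alphabet_size = 22) {
  freqs <- as.numeric(table(column)) / length(column)  # observed frequencies
  h <- -sum(freqs * log2(freqs))                       # entropy in bits
  1 - h / log2(alphabet_size)                          # rescale to [0, 1]
}

ent_conserved <- normalized_entropy(c("P", "P", "P", "P"))
ent_two_state <- normalized_entropy(c("A", "A", "G", "G"))
ent_diverse   <- normalized_entropy(as.character(1:22))  # 22 distinct symbols
print(c(ent_conserved, ent_two_state, ent_diverse))
```

Because only observed symbols enter the frequency table, no zero frequencies reach the logarithm.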
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
Grant, B.J. et al. (2007) J. Mol. Biol. 368, 1231–1248.
Each of these conservation scores has particular strengths and weaknesses. For example, entropy elegantly captures amino acid diversity but fails to account for stereochemical similarities. By employing a combination of scores and taking the union of their respective conservation signals, we expect to achieve a more comprehensive analysis of sequence conservation (Grant et al., 2007).
## Read an example alignment
aln <- read.fasta(system.file("examples/hivp_xray.fa",package="bio3d"))

## Score conservation
conserv(x=aln$ali, method="similarity", sub.matrix="bio3d")
  [1] 9.765816e-01 9.952941e-01 8.513097e-01 9.733885e-01 9.952941e-01
  [6] 9.952941e-01 6.199423e-01 9.850477e-01 1.000000e+00 9.379756e-01
 [11] 1.000000e+00 1.000000e+00 9.766926e-01 7.709889e-01 1.000000e+00
 [16] 9.790233e-01 1.000000e+00 1.000000e+00 9.766926e-01 9.258624e-01
 [21] 9.905993e-01 9.790233e-01 1.000000e+00 9.730255e-01 8.756582e-01
 [26] 1.000000e+00 1.000000e+00 9.915494e-01 1.000000e+00 9.776781e-01
 [31] 9.545105e-01 9.590455e-01 6.639556e-01 9.443130e-01 9.271010e-01
 [36] 9.153596e-01 5.746038e-01 1.000000e+00 9.743618e-01 9.761776e-01
 [41] 7.836815e-01 9.766926e-01 9.577003e-01 1.000000e+00 9.518668e-01
 [46] 9.404950e-01 9.750677e-01 9.831099e-01 9.534406e-01 9.534406e-01
 [51] 1.109878e-05 1.109878e-05 1.000000e+00 1.000000e+00 1.000000e+00
 [56] 9.564218e-01 9.790233e-01 9.766926e-01 9.860155e-01 9.813541e-01
 [61] 1.000000e+00 9.766926e-01 9.674029e-01 9.696737e-01 4.100422e-01
 [66] 8.215583e-01 1.000000e+00 9.860155e-01 5.633552e-01 9.813097e-01
 [71] 9.790233e-01 9.971765e-01 8.923885e-01 9.696737e-01 9.831787e-01
 [76] 1.000000e+00 9.860155e-01 9.836848e-01 9.766926e-01 1.000000e+00
 [81] 9.720311e-01 1.000000e+00 1.000000e+00 6.371454e-01 1.000000e+00
 [86] 8.794939e-01 9.674029e-01 1.000000e+00 1.000000e+00 9.776781e-01
 [91] 9.850477e-01 9.340866e-01 1.000000e+00 9.743618e-01 9.813541e-01
 [96] 1.000000e+00 5.092952e-01 9.790233e-01 1.000000e+00 1.000000e+00
[101] 9.850477e-01

## conserv(x=aln$ali,method="entropy22", sub.matrix="other")