filter.identity(aln = NULL, ide = NULL, cutoff = 0.6, verbose = TRUE, ...)
aln: a sequence alignment list, as obtained from seqaln or read.fasta, or an alignment character matrix. Not used if ide is given.
ide: an optional identity matrix, as obtained from seqidentity.

Identify and filter subsets of sequences at a given sequence identity cutoff.
This function performs hierarchical cluster analysis of a given sequence identity matrix ide, or of the identity matrix calculated from a given alignment aln, to identify sequences that fall below the identity cutoff value cutoff.

Returns a list whose components include:
ind: indices of the selected sequences.
tree: an object of class "hclust", which describes the tree produced by the clustering process.

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.
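The clustering step described above can be illustrated with base R alone. The following is only a sketch, not the package's implementation: the toy identity matrix is made up, and both the complete-linkage default of hclust and the choice of the first cluster member as the representative are assumptions made for illustration.

```r
# Toy symmetric identity matrix for five hypothetical sequences (values are made up).
ide <- matrix(c(1.00, 0.90, 0.85, 0.30, 0.25,
                0.90, 1.00, 0.80, 0.35, 0.20,
                0.85, 0.80, 1.00, 0.30, 0.30,
                0.30, 0.35, 0.30, 1.00, 0.70,
                0.25, 0.20, 0.30, 0.70, 1.00),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("seq", 1:5), paste0("seq", 1:5)))
cutoff <- 0.6

# Turn identity into a distance (1 - identity) and cluster hierarchically.
# hclust() uses complete linkage by default.
tree <- hclust(as.dist(1 - ide))

# Cut the tree at height 1 - cutoff. With complete linkage, every pair of
# sequences inside one cluster is then at least `cutoff` identical.
grps <- cutree(tree, h = 1 - cutoff)

# Keep one representative per cluster. Taking the first member is an
# arbitrary choice for this sketch.
ind <- which(!duplicated(grps))

print(grps)
print(ind)
```

Subsetting the identity matrix with `ide[ind, ind]` then gives the pairwise identities among the retained representatives, which is analogous to the second histogram in the example below.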
attach(kinesin)

ide.mat <- seqidentity(pdbs)

# Histogram of pairwise identity values
op <- par(no.readonly=TRUE)
par(mfrow=c(2,1))
hist(ide.mat[upper.tri(ide.mat)], breaks=30, xlim=c(0,1),
     main="Sequence Identity", xlab="Identity")

k <- filter.identity(ide=ide.mat, cutoff=0.6)
# Console output: filter.identity(): N clusters @ cutoff = 10

ide.cut <- seqidentity(pdbs$ali[k$ind,])
hist(ide.cut[upper.tri(ide.cut)], breaks=10, xlim=c(0,1),
     main="Sequence Identity", xlab="Identity")

#plot(k$tree, axes = FALSE, ylab="Sequence Identity")
#print(k$ind) # selected

par(op)
detach(kinesin)