blast.pdb(seq, database = "pdb", time.out = NULL, chain.single=TRUE)get.blast(urlget, time.out = NULL, chain.single=TRUE)"plot"(x, cutoff = NULL, cut.seed=NULL, cluster=TRUE, mar=c(2, 5, 1, 1), cex=1.5, ...)
get.seq
or pdb object from
function read.pdb
can be provided. blast.pdb
. Run NCBI blastp, on a given sequence, against the PDB, NR and swissprot sequence databases. Produce plots that facilitate hit selection from the match statistics of a BLAST result.
The blast.pdb
function employs direct HTTP-encoded requests to the NCBI web
server to run BLASTP, the protein search algorithm of the BLAST
software package.
BLAST, currently the most popular pairwise sequence comparison algorithm for database searching, performs gapped local alignments via a heuristic strategy: it identifies short nearly exact matches or hits, bidirectionally extends non-overlapping hits resulting in ungapped extended hits or high-scoring segment pairs(HSPs), and finally extends the highest scoring HSP in both directions via a gapped alignment (Altschul et al., 1997)
For each pairwise alignment BLAST reports the raw score, bitscore and an E-value that assess the statistical significance of the raw score. Note that unlike the raw score E-values are normalized with respect to both the substitution matrix and the query and database lengths.
Here we also return a corrected normalized score (mlog.evalue) that in our experience is easier to handle and store than conventional E-values. In practice, this score is equivalent to minus the natural log of the E-value. Note that, unlike the raw score, this score is independent of the substitution matrix and and the query and database lengths, and thus is comparable between BLASTP searches.
Examining plots of BLAST alignment lengths, scores, E-values and normalized
scores (-log(E-Value) from the blast.pdb
function can aid in the
identification sensible hit similarity thresholds. This is facilitated by
the plot.blast
function.
If a cutoff value is not supplied then a basic hierarchical clustering of normalized scores is performed with initial group partitioning implemented at a hopefully sensible point in the vicinity of h=cut.seed. Inspection of the resultant plot can then be use to refine the value of cut.seed or indeed cutoff. As the cutoff value can vary depending on the desired application and indeed the properties of the system under study it is envisaged that plot.blast will be called multiple times to aid selection of a suitable cutoff value. See the examples below for further details.
blast.pdb
returns a list with three components,
hit.tbl
, raw
, and url
.
The function plot.blast
produces a plot on the
active graphics device and returns a list object with four components,
hits
, pdb.id
, acc
, and inds
. See below:
Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.
BLAST is the work of Altschul et al.: Altschul, S.F. et al. (1990) J. Mol. Biol. 215, 403--410.
Full details of the BLAST algorithm, along with download and installation instructions can be obtained from: http://www.ncbi.nlm.nih.gov/BLAST/.
Online access is required to query NCBI blast services.
pdb <- read.pdb("4q21")Note: Accessing on-line PDB fileblast <- blast.pdb( pdbseq(pdb) )Searching ... please wait (updates every 5 seconds) RID = YVA3R9PP015 . Reporting 319 hitshead(blast$hit.tbl)queryid subjectids identity positives alignmentlength 1 Query_6213 gi|231226|pdb|6Q21|A 100 100 168 2 Query_6213 gi|231227|pdb|6Q21|B 100 100 168 3 Query_6213 gi|231228|pdb|6Q21|C 100 100 168 4 Query_6213 gi|231229|pdb|6Q21|D 100 100 168 5 Query_6213 gi|15988032|pdb|1IOZ|A 100 100 168 6 Query_6213 gi|157829765|pdb|1AA9|A 100 100 168 mismatches gapopens q.start q.end s.start s.end evalue bitscore 1 0 0 1 168 1 168 1.11e-124 348 2 0 0 1 168 1 168 1.11e-124 348 3 0 0 1 168 1 168 1.11e-124 348 4 0 0 1 168 1 168 1.11e-124 348 5 0 0 1 168 1 168 1.11e-124 348 6 0 0 1 168 1 168 1.11e-124 348 mlog.evalue pdb.id acc 1 285.4162 6Q21_A 231226 2 285.4162 6Q21_B 231227 3 285.4162 6Q21_C 231228 4 285.4162 6Q21_D 231229 5 285.4162 1IOZ_A 15988032 6 285.4162 1AA9_A 157829765top.hits <- plot(blast)* Possible cutoff values: 241 74 Yielding Nhits: 233 319 * Chosen cutoff value of: 241 Yielding Nhits: 233head(top.hits$hits)pdb.id acc group 1 "6Q21_A" "231226" "1" 2 "6Q21_B" "231227" "1" 3 "6Q21_C" "231228" "1" 4 "6Q21_D" "231229" "1" 5 "1IOZ_A" "15988032" "1" 6 "1AA9_A" "157829765" "1"## Use 'get.blast()' to retrieve results at a later time. #x <- get.blast(blast$url) #head(x$hit.tbl) # Examine and download 'best' hits top.hits <- plot.blast(blast, cutoff=188)* Possible cutoff values: 241 74 Yielding Nhits: 233 319 * Chosen cutoff value of: 188 Yielding Nhits: 233head(top.hits$hits)pdb.id acc group 1 "6Q21_A" "231226" "1" 2 "6Q21_B" "231227" "1" 3 "6Q21_C" "231228" "1" 4 "6Q21_D" "231229" "1" 5 "1IOZ_A" "15988032" "1" 6 "1AA9_A" "157829765" "1"#get.pdb(top.hits)