Perform a HMMER search against the PDB, NR, swissprot or other sequence and structure databases.

hmmer(seq, type="phmmer", db = NULL, verbose = TRUE, timeout = 90)

Arguments

seq

a multi-element character vector containing the query sequence. Alternatively a ‘fasta’ object as obtained from functions get.seq or read.fasta can be provided.

type

character string specifying the ‘HMMER’ job type. Current options are ‘phmmer’, ‘hmmscan’, ‘hmmsearch’, and ‘jackhmmer’.

db

character string specifying the database to search. Current options are ‘pdb’, ‘nr’, ‘swissprot’, ‘pfam’, etc. See ‘details’ for a complete list.

verbose

logical, if TRUE details of the download process is printed.

timeout

integer specifying the number of seconds to wait for the blast reply before a time out occurs.

Details

This function employs direct HTTP-encoded requests to the HMMER web server. HMMER can be used to search sequence databases for homologous protein sequences. The HMMER server implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

There are currently four types of HMMER search to perform:

- ‘phmmer’: protein sequence vs protein sequence database.
(input argument seq must be a sequence).

Allowed options for type includes: ‘env_nr’, ‘nr’, ‘refseq’, ‘pdb’, ‘rp15’, ‘rp35’, ‘rp55’, ‘rp75’, ‘swissprot’, ‘unimes’, ‘uniprotkb’, ‘uniprotrefprot’, ‘pfamseq’.

- ‘hmmscan’: protein sequence vs profile-HMM database.
(input argument seq must be a sequence).

Allowed options for type includes: ‘pfam’, ‘gene3d’, ‘superfamily’, ‘tigrfam’.

- ‘hmmsearch’: protein alignment/profile-HMM vs protein sequence database.
(input argument seq must be an alignment).

Allowed options for type includes: ‘pdb’, ‘swissprot’.

- ‘jackhmmer’: iterative search vs protein sequence database.
(input argument seq must be an alignment). ‘jackhmmer’ functionality incomplete!!

Allowed options for type includes: ‘env_nr’, ‘nr’, ‘refseq’, ‘pdb’, ‘rp15’, ‘rp35’, ‘rp55’, ‘rp75’, ‘swissprot’, ‘unimes’, ‘uniprotkb’, ‘uniprotrefprot’, ‘pfamseq’.

More information can be found at the HMMER website:
http://hmmer.org

Value

A list object with components ‘hit.tbl’ and ‘url’. ‘hit.tbl’ is a data frame with multiple components depending on the selected job ‘type’. Frequently reported fields include:

name

a character vector containing the name of the target.

acc

a character vector containing the accession identifier of the target.

acc2

a character vector containing secondary accession of the target.

pdb.id

same as ‘acc’.

id

a character vector containing Identifier of the target

desc

a character vector containing entry description.

score

a numeric vector containing bit score of the sequence (all domains, without correction).

bitscore

same as ‘score’.

pvalue

a numeric vector containing the P-value of the score.

evalue

a numeric vector containing the E-value of the score.

mlog.evalue

a numeric vector containing minus the natural log of the E-value.

nregions

a numeric vector containing Number of regions evaluated.

nenvelopes

a numeric vector containing the number of envelopes handed over for domain definition, null2, alignment, and scoring.

ndom

a numeric vector containing the total number of domains identified in this sequence.

nreported

a numeric vector containing the number of domains satisfying reporting thresholding.

nincluded

a numeric vector containing the number of domains satisfying inclusion thresholding.

taxid

a character vector containing The NCBI taxonomy identifier of the target (if applicable).

species

a character vector containing the species name.

kg

a character vector containing the kingdom of life that the target belongs to - based on placing in the NCBI taxonomy tree.

More details can be found at the HMMER website: http://www.ebi.ac.uk/Tools/hmmer/help/api

Note

Note that the chained ‘pdbs’ HMMER field (used for redundant PDBs) is included directly into the result list (applies only when db='pdb'). In this case, the ‘name’ component of the target contains the parent (non redundant) entry, and the ‘acc’ component the chained PDB identifiers. The search results will therefore provide duplicated PDB identifiers for component $name, while $acc should be unique.

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.

Finn, R.D. et al. (2011) Nucl. Acids Res. 39, 29--37. Eddy, S.R. (2011) PLoS Comput Biol 7(10): e1002195.

See also the ‘HMMER’ website:
http://hmmer.org

Author

Lars Skjaerven

Note

Online access is required to query HMMER services.

See also

Examples

if (FALSE) { # HMMER server connection required - testing excluded ##- PHMMER seq <- get.seq("2abl_A", outfile=tempfile()) res <- hmmer(seq, db="pdb") ##- HMMSCAN fam <- hmmer(seq, type="hmmscan", db="pfam") pfam.aln <- pfam(fam$hit.tbl$acc[1]) ##- HMMSEARCH hmm <- hmmer(pfam.aln, type="hmmsearch", db="pdb") unique(hmm$hit.tbl$species) hmm$hit.tbl$acc }