Structure-based Sequence Alignment with MUSTANG

Usage

mustang(files, exefile="mustang", outfile="aln.mustang.fa", cleanpdb=FALSE, cleandir="mustangpdbs", verbose=TRUE)

Arguments

files
a character vector of PDB file names.
exefile
file path to the ‘MUSTANG’ program on your system (i.e. how ‘MUSTANG’ is invoked).
outfile
name of ‘FASTA’ output file to which alignment should be written.
cleanpdb
logical, if TRUE iterate over the PDB files and map non-standard residues to standard residues (e.g. SEP->SER) to produce ‘clean’ PDB files.
cleandir
character string specifying the directory in which the ‘clean’ PDB files should be written.
verbose
logical, if TRUE ‘MUSTANG’ warning and error messages are printed.

Description

Create a multiple sequence alignment from a set of PDB files.

Details

Structure-based sequence alignment with ‘MUSTANG’ attempts to arrange and align the sequences of proteins based on their 3D structure.

This function calls the ‘MUSTANG’ program to perform a multiple structure alignment. ‘MUSTANG’ MUST BE INSTALLED on your system and be in the search path for executables.
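Because the function depends on an external executable, it can help to confirm that the program is reachable before calling it. The following is a minimal sketch using base R's Sys.which(); the helper name find_exe is hypothetical and not part of bio3d.

```r
# Hypothetical helper (not part of bio3d): return the full path of an
# executable found in the search path, or stop with an informative error.
find_exe <- function(exefile = "mustang") {
  path <- unname(Sys.which(exefile))
  if (!nzchar(path)) {
    stop("Executable '", exefile, "' not found in the search path; ",
         "install it or pass its full path via 'exefile'.")
  }
  path
}

# Check without stopping the session if the program is missing
exe <- tryCatch(find_exe("mustang"), error = function(e) NA_character_)
if (is.na(exe)) {
  message("MUSTANG not found; mustang() would fail on this system.")
} else {
  message("MUSTANG found at: ", exe)
}
```

If the program lives outside the search path, its full path can be passed through the exefile argument instead.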

Note that non-standard residues are mapped to “Z” in MUSTANG. As a workaround, with ‘cleanpdb=TRUE’ the bio3d ‘mustang’ function will attempt to map any non-standard residues to standard residues (e.g. SEP->SER, etc.) and write ‘clean’ PDB files to ‘cleandir’ before alignment. With the default ‘cleanpdb=FALSE’ the input files are used unchanged.
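The residue mapping idea can be illustrated with a small base R sketch. The lookup table and the helper map_residues below are assumptions made for illustration; they are not the table or code that bio3d uses internally.

```r
# Illustrative lookup table (an assumption, not bio3d's internal table):
# modified residue name -> parent standard residue name
nonstd_map <- c(SEP = "SER",  # phosphoserine
                TPO = "THR",  # phosphothreonine
                PTR = "TYR",  # phosphotyrosine
                MSE = "MET")  # selenomethionine

# Hypothetical helper: replace known non-standard names, keep the rest
map_residues <- function(resid, map = nonstd_map) {
  hit <- resid %in% names(map)
  resid[hit] <- unname(map[resid[hit]])
  resid
}

resid <- c("ALA", "SEP", "GLY", "MSE", "TPO")
mapped <- map_residues(resid)
print(mapped)
# "ALA" "SER" "GLY" "MET" "THR"
```

Residue names that are not in the table pass through unchanged, so standard residues are unaffected.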

Value

A list with two components:
ali
an alignment character matrix with a row per sequence and a column per equivalent amino acid.

ids
sequence names as identifiers.
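To show how the two components described above might be used, here is a sketch that works on a hand-built mock of the returned list. The sequences and identifiers are made up for illustration and are not MUSTANG output.

```r
# Mock of the returned structure, built by hand for illustration only;
# the sequences and ids below are invented, not MUSTANG output.
aln <- list(
  ali = rbind(c("A", "S", "-", "Y", "K"),
              c("A", "T", "G", "Y", "K"),
              c("-", "T", "G", "Y", "R")),
  ids = c("pdb1_A", "pdb2_A", "pdb3_A")
)
rownames(aln$ali) <- aln$ids

# Number of sequences (rows) and alignment columns
print(dim(aln$ali))

# Count gaps per column and keep only columns without any gap
gaps_per_col <- colSums(aln$ali == "-")
full_cols <- which(gaps_per_col == 0)
print(aln$ali[, full_cols, drop = FALSE])
```

Here the gap-free columns are the positions where every structure has an equivalent residue.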

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.

‘MUSTANG’ is the work of Konagurthu et al.: Konagurthu, A.S. et al. (2006) Proteins 64(3):559-74.

More details of the ‘MUSTANG’ algorithm, along with download and installation instructions, can be obtained from: http://www.csse.monash.edu.au/~karun/Site/mustang.html.

Note

A system call is made to the ‘MUSTANG’ program, which must be installed on your system and in the search path for executables.

Examples

library(bio3d)

## Fetch PDB files and split to chain A only PDB files
ids <- c("1a70_A", "1czp_A", "1frd_A")
files <- get.pdb(ids, split = TRUE, path = tempdir())
#> |======================================================================| 100%

##-- Or, read a folder/directory of existing PDB files
#pdb.path <- "my_dir_of_pdbs"
#files <- list.files(path=pdb.path,
#                    pattern="\\.pdb$",
#                    full.names=TRUE)

##-- Align these PDB sequences
aln <- mustang(files)
#> Running command mustang -f /tmp/RtmpTDihxb/filecfdb57737ba6 -o /tmp/RtmpTDihxb/filecfdb453ee916 -F fasta

##-- Read Aligned PDBs storing coordinate data
pdbs <- read.fasta.pdb(aln)
#> pdb/seq: 1 name: /tmp/RtmpTDihxb/split_chain/1a70_A.pdb
#> pdb/seq: 2 name: /tmp/RtmpTDihxb/split_chain/1czp_A.pdb
#> PDB has ALT records, taking A only, rm.alt=TRUE
#> pdb/seq: 3 name: /tmp/RtmpTDihxb/split_chain/1frd_A.pdb

See also

read.fasta, read.fasta.pdb, pdbaln, plot.fasta, seqaln

Author

Lars Skjaerven