Align a PDB structure to an existing alignment

Usage

pdb2aln(aln, pdb, id="seq.pdb", aln.id=NULL, file="pdb2aln.fa", ...)

Arguments

aln
an alignment list object with id and ali components, similar to that generated by read.fasta, read.fasta.pdb, and seqaln.
pdb
the PDB object to be added to aln.
id
name for the PDB sequence in the generated new alignment.
aln.id
id of the sequence in aln that is identical or most similar to the sequence from pdb (see 'Details').
file
output file name for writing the generated new alignment.
...
additional arguments passed to seqaln.

Description

Extract the sequence from a PDB object and align it to an existing multiple sequence alignment that you wish to keep intact.

Details

The basic effect of this function is to add a PDB sequence to an existing alignment. In this case, the function is simply a wrapper around seq2aln.

The more advanced (and more useful) effect is a complete mapping from the column indices of the original alignment (aln$ali) to the atomic indices of equivalent C-alpha atoms in pdb. This mapping is stored in the output list (see the 'Value' section below). The feature is best illustrated by the function pdb2aln.ind, which calls pdb2aln and directly returns atom selections for a given set of alignment positions (see pdb2aln.ind for details).
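As an illustration of how such a mapping can be used, the following base-R sketch looks up the C-alpha atom indices for a set of original alignment columns. The `ref` matrix here is a small hand-made stand-in with the 2 x N layout described under 'Value', and `aln_cols_to_atoms` is a hypothetical helper, not part of bio3d; pdb2aln.ind is the function intended for this task in practice.

```r
# Toy stand-in for the 'ref' component (values invented for illustration):
# row 1 holds column indices of the original alignment, row 2 holds the
# C-alpha atom indices in the PDB; NA marks gaps.
ref <- rbind(aln  = c(1L, 2L, NA, 3L, 4L, 5L),
             atom = c(NA, 8L, 16L, 24L, NA, 40L))

# Hypothetical helper: return the C-alpha atom index equivalent to each
# requested original-alignment column. The result is NA when the PDB has a
# gap at that column or when the column is not found in row 1.
aln_cols_to_atoms <- function(ref, cols) {
  hit <- match(cols, ref[1, ])   # position of each column in the new alignment
  ref[2, hit]                    # corresponding C-alpha atom indices
}

aln_cols_to_atoms(ref, c(2, 3, 4))
```

In the toy data, column 4 of the original alignment yields NA because the PDB sequence has a gap there.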

When aln.id is provided, the function performs a pairwise alignment between the sequence from pdb and the sequence in aln whose id matches aln.id. This is the recommended usage when the protein's sequence is identical or very similar to one of the sequences in aln.
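The idea of mapping through a pairwise alignment can be sketched in base R. Assuming the pairwise alignment between the ungapped aln.id sequence and the PDB sequence is already available as two equal-length gapped strings, each original alignment column holding a residue of the aln.id sequence can be traced to the PDB residue aligned with it. The toy strings and the helper `residue_index` are hypothetical, and this is not necessarily how bio3d implements the step.

```r
# Residue number at each position of a gapped sequence (NA at gaps).
residue_index <- function(chars) {
  idx <- cumsum(chars != "-")
  idx[chars == "-"] <- NA
  idx
}

# Row of the existing alignment for the aln.id sequence (toy data).
aln_row <- strsplit("-AC-DE", "")[[1]]

# Pairwise alignment of the ungapped aln.id sequence ("ACDE") against the
# PDB sequence ("ACWDE"); toy data.
pair_ref <- strsplit("AC-DE", "")[[1]]
pair_pdb <- strsplit("ACWDE", "")[[1]]

# Map each aln.id residue number to the PDB residue number aligned with it
# (stays NA if that residue faces a gap in the PDB sequence).
ref_res <- residue_index(pair_ref)
pdb_res <- residue_index(pair_pdb)
ok <- !is.na(ref_res)
res_map <- rep(NA_integer_, max(ref_res, na.rm = TRUE))
res_map[ref_res[ok]] <- pdb_res[ok]

# For each original alignment column: aln.id residue number -> PDB residue.
col_to_pdb <- res_map[residue_index(aln_row)]
col_to_pdb
```

Indexing a vector of the PDB's C-alpha atom indices with `col_to_pdb` would then give atom-level indices analogous to the second row of the ref component.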

Value

Returns a list object of class 'fasta' containing the components below. As the example output shows, the object also carries a call component.
id
sequence names as identifiers.

ali
an alignment character matrix with a row per sequence and a column per equivalent amino acid/nucleotide.

ref
an integer 2 x N matrix, where N is the number of columns in the new alignment ali. The first row contains the column indices of the original alignment aln$ali; the second row contains the atomic indices of the equivalent C-alpha atoms in pdb. Gaps in the new alignment are indicated by NA.
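To make the layout of ref concrete, the following base-R sketch builds a matrix with the same 2 x N structure from a toy new alignment. It assumes we already know which columns of the new alignment came from the original alignment (`from_orig`) and the atom indices of the PDB's C-alpha atoms in residue order (`ca_atoms`); both inputs are invented here, and the sketch is not bio3d's implementation.

```r
# Toy new alignment: the PDB row, with "-" for gaps (6 columns).
pdb_row <- c("M", "K", "-", "V", "L", "T")

# TRUE for new-alignment columns that correspond to original columns;
# column 1 is assumed to be an insertion introduced by the PDB sequence.
from_orig <- c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)

# Atom indices of the C-alpha atoms of the 5 PDB residues, in order (toy data).
ca_atoms <- c(2L, 10L, 19L, 27L, 35L)

# Row 1: running count of original columns, NA for inserted columns.
orig_col <- cumsum(from_orig)
orig_col[!from_orig] <- NA

# Row 2: residue number along the PDB row, converted to a C-alpha atom index.
res_num <- cumsum(pdb_row != "-")
res_num[pdb_row == "-"] <- NA
atom_idx <- ca_atoms[res_num]

ref <- rbind(orig_col, atom_idx)
ref
```

Column 1 gets NA in the first row because it is an insertion, and column 3 gets NA in the second row because the PDB sequence has a gap there.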

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.

Examples

library(bio3d)

##--- Read aligned PDB coordinates (CA only)
aln <- read.fasta(system.file("examples/kif1a.fa", package="bio3d"))
pdbs <- read.fasta.pdb(aln)
pdb/seq: 1 name: http://www.rcsb.org/pdb/files/1bg2.pdb
pdb/seq: 2 name: http://www.rcsb.org/pdb/files/1i6i.pdb
   PDB has ALT records, taking A only, rm.alt=TRUE
pdb/seq: 3 name: http://www.rcsb.org/pdb/files/1i5s.pdb
   PDB has ALT records, taking A only, rm.alt=TRUE
pdb/seq: 4 name: http://www.rcsb.org/pdb/files/2ncd.pdb
##--- Read PDB coordinates for a new structure (all atoms)
id <- get.pdb("2kin", URLonly=TRUE)
pdb <- read.pdb(id)
PDB has ALT records, taking A only, rm.alt=TRUE
# add pdb to the alignment
naln <- pdb2aln(aln=pdbs, pdb=pdb, id=id)
naln
                             1        .         .         .         .         50
[Truncated_Name:1]1bg2.pdb   ------NIKVMCRFRPLNESEVNRGDKYIAKFQGEDTV----VIASK---
[Truncated_Name:2]1i6i.pdb   ------SVKVAVRVRPFNSREMSRDSKCIIQMSGSTTT----IVNPKQPK
[Truncated_Name:3]1i5s.pdb   ------SVKVAVRVRPFNSREMSRDSKCIIQMSGSTTT----IVNPKQPK
[Truncated_Name:4]2ncd.pdb   ------NIRVFCRIRPPLESEENRMC-CTWTYHDESTVELQSIDAQAKSK
[Truncated_Name:5]2kin.pdb   ADPAECSIKVMCRFRPLNEAEILRGDKFIPKFKGEETVVIGQ--------
                             1        .         .         .         .         50

                             51       .         .         .         .         100
[Truncated_Name:1]1bg2.pdb   ----PYAFDRVFQSS--------TSQEQVYNDCAKKIVKDVLEGYNGTIF
[Truncated_Name:2]1i6i.pdb   ETPKSFSFDYSYWSHTSPEDINYASQKQVYRDIGEEMLQHAFEGYNVCIF
[Truncated_Name:3]1i5s.pdb   ETPKSFSFDYSYWSHTSPEDINYASQKQVYRDIGEEMLQHAFEGYNVCIF
[Truncated_Name:4]2ncd.pdb   MGQQIFSFDQVFHPL--------SSQSDIF-EMVSPLIQSALDGYNICIF
[Truncated_Name:5]2kin.pdb   --GKPYVFDRVLPPN--------TTQEQVYNACAKQIVKDVLEGYNGTIF
                             51       .         .         .         .         100

                             101      .         .         .         .         150
[Truncated_Name:1]1bg2.pdb   AYGQTSSGKTHTMEGKLHDPEGMGIIPRIVQDIFNYIYSMDENL-EFHIK
[Truncated_Name:2]1i6i.pdb   AYGQTGAGKSYTMMGKQEK-DQQGIIPQLCEDLFSRINDTTNDNMSYSVE
[Truncated_Name:3]1i5s.pdb   AYGQTGAGKSYTMMGKQEK-DQQGIIPQLCEDLFSRINDTTNDNMSYSVE
[Truncated_Name:4]2ncd.pdb   AYGQTGSGKTYTMDGV---PESVGVIPRTVDLLFDSIRGYRNLGWEYEIK
[Truncated_Name:5]2kin.pdb   AYGQTSSGKTHTMEGKLHDPQLMGIIPRIAHDIFDHIYSMDEN-LEFHIK
                             101      .         .         .         .         150

                             151      .         .         .         .         200
[Truncated_Name:1]1bg2.pdb   VSYFEIYLDKIRDLL-DVSKT-NLSVHEDKNRVPYVKGCTERFVCSPDEV
[Truncated_Name:2]1i6i.pdb   VSYMEIYCERVRDLL-NPKNKGNLRVREHPLLGPYVEDLSKLAVTSYNDI
[Truncated_Name:3]1i5s.pdb   VSYMEIYCERVRDLL-NPKNKGNLRVREHPLLGPYVEDLSKLAVTSYNDI
[Truncated_Name:4]2ncd.pdb   ATFLEIYNEVLYDLLSNEQKDMEIRMAKNNKNDIYVSNITEETVLDPNHL
[Truncated_Name:5]2kin.pdb   VSYFEIYLDKIRDLL--DVSKTNLAVHEDKNRVPYVKGCTERFVSSPEEV
                             151      .         .         .         .         200

                             201      .         .         .         .         250
[Truncated_Name:1]1bg2.pdb   MDTIDEGKSNRHVAVTNMNEHSSRSHSIFLINVKQENTQT----EQKLSG
[Truncated_Name:2]1i6i.pdb   QDLMDSGNKARTVAATNMNETSSRSHAVFNIIFTQKRHDAETNITTEKVS
[Truncated_Name:3]1i5s.pdb   QDLMDSGNKARTVAATNMNETSSRSHAVFNIIFTQKRHDAETNITTEKVS
[Truncated_Name:4]2ncd.pdb   RHLMHTAKMNRATASTAGNERSSRSHAVTKLELIGRHAEK----QEISVG
[Truncated_Name:5]2kin.pdb   MDVIDEGKANRHVAVTNMNEHSSRSHSIFLINIKQENVET----EKKLSG
                             201      .         .         .         .         250

                             251      .         .         .         .         300
[Truncated_Name:1]1bg2.pdb   KLYLVDLAGSEKVSKTGAEGAVLDEAKNINKSLSALGNVISALAEG-STY
[Truncated_Name:2]1i6i.pdb   KISLVDLAGSE---------------ANINKSLTTLGKVISALAEM-D-F
[Truncated_Name:3]1i5s.pdb   KISLVDLAGSER-----AKGTRLKEGANINKSLTTLGKVISALAEM-D--
[Truncated_Name:4]2ncd.pdb   SINLVDLAGSES--------------PNINRSLSELTNVILALLQK-QDH
[Truncated_Name:5]2kin.pdb   KLYLVDLAGSEKVA------------KNINKSLSALGNVISALAEGTKTH
                             251      .         .         .         .         300

                             301      .         .         .         .         350
[Truncated_Name:1]1bg2.pdb   VPYRDSKMTRILQDSLGGNCRTTIVICCSPSSYNESETKSTLLFGQRAKT
[Truncated_Name:2]1i6i.pdb   IPYRDSVLTWLLRENLGGNSRTAMVAALSPADINYDETLSTLRYADRAKQ
[Truncated_Name:3]1i5s.pdb   IPYRDSVLTWLLRENLGGNSRTAMVAALSPADINYDETLSTLRYADRAK-
[Truncated_Name:4]2ncd.pdb   IPYRNSKLTHLLMPSLGGNSKTLMFINVSPFQDCFQESVKSLRFAASVNS
[Truncated_Name:5]2kin.pdb   VPYRDSKMTRILQDSLDGNCRTTIVICCSPSVFNEAETKSTLMFGQRAKT
                             301      .         .         .         .         350

                             351      .         .    375
[Truncated_Name:1]1bg2.pdb   I------------------------
[Truncated_Name:2]1i6i.pdb   I------------------------
[Truncated_Name:3]1i5s.pdb   -------------------------
[Truncated_Name:4]2ncd.pdb   C------------------------
[Truncated_Name:5]2kin.pdb   IKNTVSVNLELTAEEWKKKYEKEKE
                             351      .         .    375

Call:
  pdb2aln(aln = pdbs, pdb = pdb, id = id)

Class:
  fasta

Alignment dimensions:
  5 sequence rows; 375 position columns (286 non-gap, 89 gap)

+ attr: id, ali, ref, call

See also

seqaln, seq2aln, seqaln.pair, pdb2aln.ind

Author

Xin-Qiu Yao & Barry Grant