read.fasta.pdb. bio3d 2.3-0

Usage

read.fasta.pdb(aln, prefix = "", pdbext = "", fix.ali = FALSE, pdblist=NULL, ncore = 1, nseg.scale = 1, ...)

Arguments

aln: an alignment data structure obtained with read.fasta.
prefix: prefix to aln$id to locate PDB files.
pdbext: the file name extension of the PDB files.
fix.ali: logical, if TRUE check consistency between $ali and $resno, and correct $ali if they don't match.
pdblist: an optional list of pdb objects with sequences corresponding to the alignments in aln. Primarily used through function pdbaln when the PDB objects already exist (avoids reading PDBs from file).
ncore: number of CPU cores used to do the calculation. ncore>1 requires the package ‘parallel’ to be installed.
nseg.scale: split input data into the specified number of segments prior to running the multi-core calculation. See fit.xyz.
...: other parameters for read.pdb.
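
As a rough illustration of how prefix and pdbext relate to aln$id, the sketch below builds candidate file paths in base R. It assumes the three pieces are simply concatenated, which is an assumption rather than something stated on this page; the identifiers and the "pdbs/" directory are made up for the example.

```r
# Hypothetical alignment identifiers (stand-ins for aln$id)
ids <- c("1bg2", "1i6i", "1i5s", "2ncd")

# Assumed behaviour: prefix, identifier and extension are pasted together
prefix <- "pdbs/"
pdbext <- ".pdb"
files  <- paste0(prefix, ids, pdbext)
print(files)

# List the expected files that are not present on disk
missing <- files[!file.exists(files)]
print(missing)
```

Checking for missing files before calling read.fasta.pdb can make it easier to spot an identifier that does not line up with a file name.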

Description

Read aligned PDB structures and store their C-alpha atom data, including xyz coordinates, residue numbers, residue types and B-factors.

Details

The input aln, produced with read.fasta, must have identifiers (i.e. sequence names) that match the PDB file names. For example, the sequence corresponding to the structure “1bg2.pdb” should have the identifier ‘1bg2’. See examples below.

Sequence mismatches will generate errors. Thus, care should be taken to ensure that the sequences in the alignment match the sequences in their associated PDB files.
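
One way to look for mismatches before reading the structures is to strip the gaps from an alignment row and compare it with the sequence taken from the PDB file. The following base R sketch uses made-up sequences and is not the check that read.fasta.pdb itself performs.

```r
# Hypothetical aligned row (gaps as "-") and the sequence from its PDB file
ali_row <- c("-", "L", "L", "F", "-", "G", "Q")
pdb_seq <- c("L", "L", "F", "A", "Q")

# Drop gap characters so both vectors cover the same residues
ungapped <- ali_row[ali_row != "-"]

# Compare lengths first, then residues position by position
same_length <- length(ungapped) == length(pdb_seq)
mismatch_at <- if (same_length) which(ungapped != pdb_seq) else NA_integer_

print(same_length)
print(mismatch_at)
```

Here the two sequences have the same length but differ at the fourth ungapped position ("G" versus "A").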

Value

xyz: numeric matrix of aligned C-alpha coordinates.
resno: character matrix of aligned residue numbers.
b: numeric matrix of aligned B-factor values.
chain: character matrix of aligned chain identifiers.
id: character vector of PDB sequence/structure names.
ali: character matrix of aligned sequences.
resid: character matrix of aligned 3-letter residue names.
sse: character matrix of aligned helix and strand secondary structure elements as defined in each PDB file.
call: the matched call.
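
The components above share one row per structure and one column per alignment position, with xyz holding three columns per position. The sketch below builds a small mock list with made-up values to show that relationship; representing gap positions as NA in resno is an assumption made for the illustration.

```r
# Mock object reusing some of the component names listed above
# (values are invented; a real object comes from read.fasta.pdb)
n_seq <- 2
n_pos <- 4
mock <- list(
  ali   = matrix(c("L", "L", "-", "R", "F", "Y", "G", "A"), nrow = n_seq),
  resno = matrix(c("316", "345", NA, "346", "318", "347", "319", "348"),
                 nrow = n_seq),
  xyz   = matrix(0, nrow = n_seq, ncol = 3 * n_pos)
)

# Three coordinate columns (x, y, z) per alignment position
print(ncol(mock$xyz) == 3 * ncol(mock$ali))

# Assumed convention: gap positions in ali carry NA in resno
gap <- mock$ali == "-"
print(which(gap, arr.ind = TRUE))
```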

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.

Note

The sequence character ‘X’ is useful for masking unusual or unknown residues, as it can match any other residue type.
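
The masking idea can be sketched as a residue-wise comparison in which ‘X’ on either side counts as a match. The helper matches_with_x below is hypothetical and is not part of bio3d; the exact rule applied inside read.fasta.pdb may differ.

```r
# Residue-wise comparison where "X" on either side matches anything
# (matches_with_x is a hypothetical helper, not a bio3d function)
matches_with_x <- function(a, b) {
  stopifnot(length(a) == length(b))
  a == b | a == "X" | b == "X"
}

# Made-up sequences: position 2 is masked, position 4 truly differs
ali_seq <- c("L", "X", "F", "G", "Q")
pdb_seq <- c("L", "M", "F", "A", "Q")

ok <- matches_with_x(ali_seq, pdb_seq)
print(ok)
```

Only the fourth position is reported as a mismatch; the masked second position is accepted.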

Examples



# Read sequence alignment
file <- system.file("examples/kif1a.fa", package = "bio3d")
aln  <- read.fasta(file)

# Read aligned PDBs
pdbs <- read.fasta.pdb(aln)

pdb/seq: 1   name: http://www.rcsb.org/pdb/files/1bg2.pdb 
pdb/seq: 2   name: http://www.rcsb.org/pdb/files/1i6i.pdb 
   PDB has ALT records, taking A only, rm.alt=TRUE
pdb/seq: 3   name: http://www.rcsb.org/pdb/files/1i5s.pdb 
   PDB has ALT records, taking A only, rm.alt=TRUE
pdb/seq: 4   name: http://www.rcsb.org/pdb/files/2ncd.pdb 


# Structure/sequence names/ids
basename( pdbs$id )

[1] "1bg2.pdb" "1i6i.pdb" "1i5s.pdb" "2ncd.pdb"


# Alignment positions 335 to 339
pdbs$ali[,335:339]

                                       [,1] [,2] [,3] [,4] [,5]
http://www.rcsb.org/pdb/files/1bg2.pdb "L"  "L"  "F"  "G"  "Q" 
http://www.rcsb.org/pdb/files/1i6i.pdb "L"  "R"  "Y"  "A"  "D" 
http://www.rcsb.org/pdb/files/1i5s.pdb "L"  "R"  "Y"  "A"  "D" 
http://www.rcsb.org/pdb/files/2ncd.pdb "L"  "R"  "F"  "A"  "A" 

pdbs$resid[,335:339]

                                       [,1]  [,2]  [,3]  [,4]  [,5] 
http://www.rcsb.org/pdb/files/1bg2.pdb "LEU" "LEU" "PHE" "GLY" "GLN"
http://www.rcsb.org/pdb/files/1i6i.pdb "LEU" "ARG" "TYR" "ALA" "ASP"
http://www.rcsb.org/pdb/files/1i5s.pdb "LEU" "ARG" "TYR" "ALA" "ASP"
http://www.rcsb.org/pdb/files/2ncd.pdb "LEU" "ARG" "PHE" "ALA" "ALA"

pdbs$resno[,335:339]

                                       [,1] [,2] [,3] [,4] [,5]
http://www.rcsb.org/pdb/files/1bg2.pdb  316  317  318  319  320
http://www.rcsb.org/pdb/files/1i6i.pdb  345  346  347  348  349
http://www.rcsb.org/pdb/files/1i5s.pdb  345  346  347  348  349
http://www.rcsb.org/pdb/files/2ncd.pdb  661  662  663  664  665

pdbs$b[,335:339]

                                        [,1]  [,2]  [,3]  [,4]  [,5]
http://www.rcsb.org/pdb/files/1bg2.pdb 14.99 15.46  9.61 12.16 26.17
http://www.rcsb.org/pdb/files/1i6i.pdb 19.27 22.65 23.66 20.76 26.02
http://www.rcsb.org/pdb/files/1i5s.pdb 16.08 19.46 16.91 20.48 26.71
http://www.rcsb.org/pdb/files/2ncd.pdb 14.01 22.77 31.48 44.88 54.12


# Alignment C-alpha coordinates for these positions
pdbs$xyz[, atom2xyz(335:339)]

                                         [,1]    [,2]   [,3]   [,4]    [,5]
http://www.rcsb.org/pdb/files/1bg2.pdb 18.880  -4.358 46.825 19.801  -7.246
http://www.rcsb.org/pdb/files/1i6i.pdb  4.206  34.360  4.539  3.439  38.005
http://www.rcsb.org/pdb/files/1i5s.pdb 16.417 -33.136  4.337 17.452 -36.750
http://www.rcsb.org/pdb/files/2ncd.pdb  0.185  37.134 46.308 -0.375  34.180
                                         [,6]   [,7]    [,8]   [,9]  [,10]
http://www.rcsb.org/pdb/files/1bg2.pdb 44.546 21.449  -4.913 42.020 23.509
http://www.rcsb.org/pdb/files/1i6i.pdb  5.403  1.775  37.124  8.670  4.660
http://www.rcsb.org/pdb/files/1i5s.pdb  5.131 19.143 -35.570  8.292 15.967
http://www.rcsb.org/pdb/files/2ncd.pdb 48.642 -2.386  36.408 50.962 -4.710
                                         [,11]  [,12]  [,13]   [,14]  [,15]
http://www.rcsb.org/pdb/files/1bg2.pdb  -3.354 44.785 24.545  -6.830 45.885
http://www.rcsb.org/pdb/files/1i6i.pdb  34.826  9.548  7.057  37.666  8.763
http://www.rcsb.org/pdb/files/1i5s.pdb -33.755  9.354 13.829 -36.723  8.388
http://www.rcsb.org/pdb/files/2ncd.pdb  36.941 48.048 -4.498  33.173 47.625


# See 'fit.xyz()' function for actual coordinate superposition
#  e.g. fit to first structure
# xyz <- fit.xyz(pdbs$xyz[1,], pdbs)
# xyz[, atom2xyz(335:339)]
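
The 15 columns printed above for five alignment positions follow from the three-columns-per-position layout of xyz. The base R sketch below reproduces that index arithmetic with a hypothetical helper, position_to_xyz_cols, written only for illustration; it is not the bio3d atom2xyz implementation.

```r
# Map alignment positions to their x, y, z column indices
# (position_to_xyz_cols is a hypothetical stand-in for atom2xyz)
position_to_xyz_cols <- function(pos) {
  pos <- as.integer(pos)
  # Position i occupies columns 3i-2 (x), 3i-1 (y) and 3i (z)
  as.vector(rbind(3L * pos - 2L, 3L * pos - 1L, 3L * pos))
}

cols <- position_to_xyz_cols(335:339)
print(range(cols))
print(length(cols))
```

Positions 335 to 339 therefore correspond to columns 1003 through 1017 of the coordinate matrix.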

Author

Barry Grant

Read Aligned Structure Data