Read Aligned Structure Data

Read aligned PDB structures and store their equivalent atom data, including xyz coordinates, residue numbers, residue types and B-factors.

read.all(aln, prefix = "", pdbext = "", sel = NULL, rm.wat=TRUE, rm.ligand=FALSE,
         compact = TRUE, ncore = NULL, ...)

Arguments

aln	an alignment data structure obtained with `read.fasta`.
prefix	prefix to aln$id to locate PDB files.
pdbext	the file name extension of the PDB files.
sel	a selection string detailing the atom type data to store (see function store.atom).
rm.wat	logical, if TRUE water atoms are removed.
rm.ligand	logical, if TRUE ligand atoms are removed.
compact	logical, if TRUE the number of atoms stored for each aligned residue varies according to the amino acid type. If FALSE, the same maximum possible number of atoms is stored for every aligned residue.
ncore	number of CPU cores used to do the calculation. By default (`ncore=NULL`), all detected CPU cores are used.
...	other parameters for `read.pdb`.

Details

The input aln, produced with read.fasta, must have identifiers (i.e. sequence names) that match the PDB file names. For example, the sequence corresponding to the structure file “mypdbdir/1bg2.pdb” should have the identifier ‘mypdbdir/1bg2.pdb’, or ‘1bg2’ if the inputs ‘prefix’ and ‘pdbext’ equal ‘mypdbdir/’ and ‘pdb’. See the examples below.
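The identifier-to-file mapping can be checked before reading any structures. The following is a minimal base R sketch, not the function's own code: it assumes the file path is formed by plain concatenation of the prefix, the identifier and the extension (with the dot written explicitly in the extension), and the identifiers and directory shown are hypothetical.

```r
## Hypothetical identifiers, as they might appear in aln$id
ids    <- c("1bg2", "1i6i")
prefix <- "mypdbdir/"
pdbext <- ".pdb"   # assumption: extension supplied with its leading dot

## Assumed mapping: plain concatenation of prefix, identifier and extension
files <- paste0(prefix, ids, pdbext)

## Report any identifier that has no matching file on disk
not.found <- ids[!file.exists(files)]
if (length(not.found) > 0) {
  message("No PDB file found for: ", paste(not.found, collapse = ", "))
}
```

Running a check like this first gives a clearer message than a failed file read part-way through a long alignment.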

Sequence mismatches will generate errors. Thus, care should be taken to ensure that the sequences in the alignment match the sequences in their associated PDB files.

Value

Returns a list of class "pdbs" with the following components:

xyz

numeric matrix of aligned C-alpha coordinates.

resno

character matrix of aligned residue numbers.

b

numeric matrix of aligned B-factor values.

chain

character matrix of aligned chain identifiers.

id

character vector of PDB sequence/structure names.

ali

character matrix of aligned sequences.

resid

character matrix of aligned 3-letter residue names.

all

numeric matrix of aligned equivalent atom coordinates.

all.elety

character matrix of aligned atom element types.

all.resid

character matrix of aligned three-letter residue codes.

all.resno

numeric matrix of aligned residue numbers.

all.grpby

numeric vector indicating the group of atoms belonging to the same aligned residue.

all.hetatm

a list of ‘pdb’ objects for non-protein atoms.
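To illustrate how the all-atom components relate to one another, the sketch below uses toy stand-ins rather than real output. It assumes that `all` holds three coordinate columns (x, y, z) per atom and that `all.grpby` holds one value per atom, with equal values marking atoms of the same aligned residue; both the layout and the values are assumptions made for illustration.

```r
## Toy stand-ins for pdbs$all and pdbs$all.grpby (hypothetical values):
## 2 structures, 5 atoms, 3 coordinate columns (x, y, z) per atom
all.xyz   <- matrix(seq_len(2 * 15), nrow = 2)
all.grpby <- c(1, 1, 1, 2, 2)   # atoms 1-3 in residue 1, atoms 4-5 in residue 2

## Atom indices belonging to the second aligned residue
atom.inds <- which(all.grpby == 2)

## Expand each atom index to its three xyz column indices
xyz.inds <- c(rbind(3 * atom.inds - 2, 3 * atom.inds - 1, 3 * atom.inds))

## Coordinates of that residue in every structure
res2.xyz <- all.xyz[, xyz.inds, drop = FALSE]
```

The same atom indices could be used to subset the per-atom matrices such as all.elety or all.resid, under the same layout assumption.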

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.

Author

Barry Grant

Note

This function is still in development and is NOT part of the official bio3d package.

The sequence character ‘X’ is useful for masking unusual or unknown residues, as it can match any other residue type.
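The masking behaviour of ‘X’ can be illustrated with a small base R sketch. The helper `seq.mismatch` below is hypothetical and is not part of bio3d; it assumes gaps are written as "-" and simply compares an ungapped alignment row with a sequence taken from a structure, ignoring any position where either sequence holds ‘X’.

```r
## Hypothetical helper: positions where two sequences truly disagree,
## treating "X" in either sequence as a wildcard
seq.mismatch <- function(ali.row, pdb.seq) {
  ali.seq <- ali.row[ali.row != "-"]          # drop alignment gaps
  if (length(ali.seq) != length(pdb.seq)) {
    stop("sequences differ in length")
  }
  differs <- ali.seq != pdb.seq & ali.seq != "X" & pdb.seq != "X"
  which(differs)
}

ali.row <- c("M", "-", "K", "X", "L", "-", "A")   # toy aligned sequence
pdb.seq <- c("M", "K", "T", "L", "G")             # toy sequence from a PDB file

## Position 3 is masked by "X"; position 5 (A versus G) is reported
bad <- seq.mismatch(ali.row, pdb.seq)
```

A check of this kind, run on each alignment row before reading the structures, points to the positions that would need correcting or masking.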

Examples

## Not run: this function is still being optimized for speed
if (FALSE) {
## Read sequence alignment
file <- system.file("examples/kif1a.fa",package="bio3d")
aln  <- read.fasta(file)

## Read aligned PDBs storing all data for 'sel'
sel <- c("N", "CA", "C", "O", "CB", "*G", "*D",  "*E", "*Z")
pdbs <- read.all(aln, sel=sel)

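## Locate the coordinate columns that belong to C-alpha atoms
atm <- colnames(pdbs$all)
ca.ind  <- which(atm == "CA")
## Find the invariant core and keep the C-alpha columns of its positions
core <- core.find(pdbs)
core.ind <- c( matrix(ca.ind, nrow=3)[,core$c0.5A.atom] )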

## Fit structures
nxyz <- fit.xyz(pdbs$all[1,], pdbs$all,
               fixed.inds  = core.ind,
               mobile.inds = core.ind)

## Identify the gap-free coordinate columns
ngap.col <- gap.inspect(nxyz)

#npc.xray <- pca.xyz(nxyz[ ,ngap.col$f.inds])

#a <- mktrj.pca(npc.xray, pc=1, file="pc1-all.pdb",
#               elety=pdbs$all.elety[1,unique( ceiling(ngap.col$f.inds/3) )],
#               resid=pdbs$all.resid[1,unique( ceiling(ngap.col$f.inds/3) )],
#               resno=pdbs$all.resno[1,unique( ceiling(ngap.col$f.inds/3) )] )

}