Split a PDB File Into Separate Files, One For Each Chain.

Usage

pdbsplit(pdb.files, ids = NULL, path = "split_chain", overwrite=TRUE, verbose = FALSE, mk4=FALSE, ncore = 1, ...)

Arguments

pdb.files
a character vector of PDB file names.
ids
a character vector of PDB and chain identifiers (of the form: ‘pdbId_chainId’, e.g. ‘1bg2_A’). Used for filtering chain IDs for output (in the above example only chain A would be produced).
path
output path for chain-split files.
overwrite
logical, if FALSE, PDB structures whose split files already exist are neither read nor rewritten.
verbose
logical, if TRUE details of the PDB header and chain selections are printed.
mk4
logical, if TRUE output filenames will use only the first four characters of the input filename (see basename.pdb for details).
ncore
number of CPU cores used for the calculation. ncore>1 requires package ‘parallel’ be installed.
...
additional arguments passed to read.pdb. Useful, for example, for parsing multi-model PDB files (multi=TRUE) or for including ALT records in the output files.
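
As a rough illustration of the ‘mk4’ argument described above, the base R sketch below derives an output name stem from an input file name. The helper short_name() and the example path are hypothetical and are not part of bio3d; the sketch only mimics the described behaviour (strip the directory and the .pdb extension, optionally keep the first four characters) and is not the implementation of basename.pdb.

```r
# Hypothetical helper (NOT bio3d's basename.pdb): illustrates the naming
# rule described for 'mk4' using base R only.
short_name <- function(files, mk4 = FALSE) {
  # Drop the directory part and a trailing ".pdb" extension
  nm <- sub("\\.pdb$", "", basename(files))
  # With mk4 = TRUE keep only the first four characters
  if (mk4) nm <- substr(nm, 1, 4)
  nm
}

full_stem  <- short_name("downloads/1bg2_refined.pdb")
short_stem <- short_name("downloads/1bg2_refined.pdb", mk4 = TRUE)
print(c(full_stem, short_stem))
```

Under this assumption, a file whose name carries a suffix after the four-character PDB code would yield chain files named after the code alone when mk4=TRUE.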

Description

Split a Protein Data Bank (PDB) coordinate file into new separate files with one file for each chain.

Details

This function produces single-chain PDB files from multi-chain input files. By default, every chain is written out and all resulting file names are returned. To write only selected chains, supply the optional ‘ids’ argument to filter the output (e.g. to extract only chain C from a PDB file while ignoring chains A and B). See the Examples section for further details.
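
The ‘pdbId_chainId’ convention used by ‘ids’ can be sketched in base R as follows. The helper parse_ids() is hypothetical and only illustrates how such identifiers might be read; it is not how pdbsplit is implemented. An identifier without a chain part (as with "1F9J" in the Examples) is treated here as "all chains" and marked with NA.

```r
# Hypothetical helper (not part of bio3d): split 'pdbId_chainId' strings
# into a PDB identifier and an optional chain identifier.
parse_ids <- function(ids) {
  parts <- strsplit(ids, "_", fixed = TRUE)
  data.frame(
    pdb = vapply(parts, function(p) p[1], character(1)),
    # NA means no chain was given, i.e. keep every chain of that entry
    chain = vapply(parts,
                   function(p) if (length(p) > 1) p[2] else NA_character_,
                   character(1)),
    stringsAsFactors = FALSE
  )
}

sel <- parse_ids(c("1YX5_A", "3NOB_B", "1F9J"))
print(sel)
```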

Note that multi-model atom records are split into individual PDB files only if multi=TRUE is supplied (it is passed on to read.pdb via ...); otherwise they are omitted. See Examples.
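
For orientation, the file names produced for a multi-model entry in the Examples section follow the pattern ‘pdbId_chainId.NN.pdb’, with a two-digit model number. The base R sketch below merely reconstructs that pattern for a two-chain entry with 18 models; the variable names are illustrative, and the pattern is inferred from the example output rather than from the bio3d source.

```r
# Illustrative reconstruction of the multi-model naming pattern seen in
# the Examples output; not taken from the bio3d source code.
pdb_id   <- "1YX5"
chains   <- c("A", "B")
n_models <- 18

# One name per chain and model, chains first, models zero-padded to 2 digits
files <- unlist(lapply(chains, function(ch) {
  sprintf("%s_%s.%02d.pdb", pdb_id, ch, seq_len(n_models))
}))

print(head(files, 3))
print(length(files))
```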

Value

Returns a character vector of chain-split file names.

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696.

For a description of PDB format (version 3.3) see: http://www.wwpdb.org/documentation/format33/v3.3.html.

Examples

## Save separate PDB files for each chain of a local or on-line file
pdbsplit( get.pdb("2KIN", URLonly=TRUE) )
|======================================================================| 100%
[1] "split_chain/2KIN_A.pdb" "split_chain/2KIN_B.pdb"
## Split several PDBs by chain ID and multi-model records
raw.files <- get.pdb( c("1YX5", "3NOB") , URLonly=TRUE)
chain.files <- pdbsplit(raw.files, path=tempdir(), multi=TRUE)
|======================================================================| 100%
basename(chain.files)
 [1] "1YX5_A.01.pdb" "1YX5_A.02.pdb" "1YX5_A.03.pdb" "1YX5_A.04.pdb"
 [5] "1YX5_A.05.pdb" "1YX5_A.06.pdb" "1YX5_A.07.pdb" "1YX5_A.08.pdb"
 [9] "1YX5_A.09.pdb" "1YX5_A.10.pdb" "1YX5_A.11.pdb" "1YX5_A.12.pdb"
[13] "1YX5_A.13.pdb" "1YX5_A.14.pdb" "1YX5_A.15.pdb" "1YX5_A.16.pdb"
[17] "1YX5_A.17.pdb" "1YX5_A.18.pdb" "1YX5_B.01.pdb" "1YX5_B.02.pdb"
[21] "1YX5_B.03.pdb" "1YX5_B.04.pdb" "1YX5_B.05.pdb" "1YX5_B.06.pdb"
[25] "1YX5_B.07.pdb" "1YX5_B.08.pdb" "1YX5_B.09.pdb" "1YX5_B.10.pdb"
[29] "1YX5_B.11.pdb" "1YX5_B.12.pdb" "1YX5_B.13.pdb" "1YX5_B.14.pdb"
[33] "1YX5_B.15.pdb" "1YX5_B.16.pdb" "1YX5_B.17.pdb" "1YX5_B.18.pdb"
[37] "3NOB_A.pdb"    "3NOB_B.pdb"    "3NOB_C.pdb"    "3NOB_D.pdb"
[41] "3NOB_E.pdb"    "3NOB_F.pdb"    "3NOB_G.pdb"    "3NOB_H.pdb"
## Output only desired pdbID_chainID combinations
## for the last entry (1F9J), fetch all chains
ids <- c("1YX5_A", "3NOB_B", "1F9J")
raw.files <- get.pdb( ids , URLonly=TRUE)
chain.files <- pdbsplit(raw.files, ids, path=tempdir())
|======================================================================| 100%
basename(chain.files)
[1] "1YX5_A.pdb" "3NOB_B.pdb" "1F9J_A.pdb" "1F9J_B.pdb"

See also

read.pdb, atom.select, write.pdb, get.pdb.

Author

Barry Grant