Read MOL2 File

Usage

read.mol2(file, maxlines = -1L)
## S3 method for class 'mol2'
print(x, ...)

Arguments

file
a single element character vector containing the name of the MOL2 file to be read.
maxlines
the maximum number of lines to read; useful for limiting the read of very large files. The default, -1L, reads all lines.
x
an object as obtained from read.mol2.
...
additional arguments to ‘print’.

Description

Read a Tripos MOL2 file.

Details

Provides basic functionality to parse a MOL2 file. The current version reads and stores ‘@<TRIPOS>MOLECULE’, ‘@<TRIPOS>ATOM’, ‘@<TRIPOS>BOND’ and ‘@<TRIPOS>SUBSTRUCTURE’ records.
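To illustrate how these record sections are laid out in the file, here is a minimal base-R sketch that groups the lines of a MOL2 text by the record tag preceding them. It is not how read.mol2 is implemented; the helper name split_mol2_records and the toy two-atom input are made up for illustration, and the sketch assumes each section starts with a line beginning with ‘@<TRIPOS>’.

```r
# Hypothetical helper, not part of bio3d: group the lines of a MOL2 file
# by the '@<TRIPOS>' record tag that precedes them.
split_mol2_records <- function(lines) {
  is_tag <- grepl("^@<TRIPOS>", lines)
  # Running count of tags seen so far; 0 means "before the first tag".
  section <- cumsum(is_tag)
  keep <- section > 0 & !is_tag
  tags <- trimws(sub("^@<TRIPOS>", "", lines[is_tag]))
  # Split the non-tag lines by section, then label each group with its tag.
  groups <- split(lines[keep], factor(section[keep], levels = seq_along(tags)))
  names(groups) <- tags
  groups
}

# A tiny made-up MOL2 fragment (two atoms, one bond), for illustration only.
mol2_text <- c(
  "@<TRIPOS>MOLECULE",
  "toy",
  "2 1 1",
  "SMALL",
  "NO_CHARGES",
  "@<TRIPOS>ATOM",
  "1 C1 0.0000 0.0000 0.0000 C.3 1 TOY 0.0000",
  "2 O1 1.4000 0.0000 0.0000 O.3 1 TOY 0.0000",
  "@<TRIPOS>BOND",
  "1 1 2 1"
)

records <- split_mol2_records(mol2_text)
names(records)

# Parse the ATOM section into a data frame, reusing the column names
# documented for the 'atom' component (no status bit in this toy input).
atom <- read.table(text = records$ATOM, stringsAsFactors = FALSE,
                   col.names = c("eleno", "elena", "x", "y", "z",
                                 "elety", "resno", "resid", "charge"))
atom[, c("elena", "x", "y", "z")]
```

In a multi-molecule file the same tags repeat, so a fuller version would first split the input at each ‘@<TRIPOS>MOLECULE’ line before grouping the sections.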

In the case of a multi-molecule MOL2 file, each molecule is stored as an individual ‘mol2’ object in a list. However, if the multi-molecule MOL2 file contains identical molecules in different conformations (typically from a docking run), the output is a single object with an atom and an xyz component (xyz in matrix representation; row-wise coordinates).
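Because the return value can be either a single ‘mol2’ object or a list of them, downstream code may want to normalise the two shapes. The sketch below shows one way to do this using mock objects rather than real read.mol2 output. The helpers as_mol2_list, mock_mol2 and summarise_mol2 are hypothetical, and the sketch assumes that a multi-molecule result is a plain list without the ‘mol2’ class and that xyz holds three columns per atom and one row per conformation.

```r
# Hypothetical helper, not part of bio3d: wrap a single 'mol2' object in a
# list so that downstream code can always iterate over molecules.
as_mol2_list <- function(x) {
  if (inherits(x, "mol2")) list(x) else x
}

# Mock objects standing in for read.mol2() results (made-up values).
# Assumption: xyz has 3 columns per atom and one row per conformation.
mock_mol2 <- function(name, n_atoms, n_frames) {
  structure(
    list(name = name,
         xyz = matrix(0, nrow = n_frames, ncol = 3 * n_atoms)),
    class = "mol2"
  )
}

single <- mock_mol2("ligand", n_atoms = 20, n_frames = 1)
docked <- mock_mol2("ligand", n_atoms = 20, n_frames = 5)
multi  <- list(mock_mol2("mol_a", 12, 1), mock_mol2("mol_b", 30, 1))

# Summarise any of the three shapes with the same code path.
summarise_mol2 <- function(x) {
  mols <- as_mol2_list(x)
  data.frame(
    name   = vapply(mols, function(m) m$name, character(1)),
    atoms  = vapply(mols, function(m) ncol(m$xyz) / 3, numeric(1)),
    frames = vapply(mols, function(m) nrow(m$xyz), integer(1))
  )
}

summarise_mol2(single)
summarise_mol2(docked)
summarise_mol2(multi)
```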

See examples for further details.

Value

Returns a list of class ‘mol2’ (or, for a multi-molecule file, a list of such objects), each containing the following components:
atom
a data frame containing all ATOM record data, with one row per atom and one column per field. See the Note section for the column naming convention (useful for accessing columns).

bond
a data frame containing all atomic bond information.

substructure
a data frame containing all substructure information.

xyz
a numeric matrix of ATOM coordinate data.

info
a numeric vector of MOL2 info data.

name
a single element character vector containing the molecule name.

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.

Note

For the atom component, the column names can be used as a convenient means of data access, namely: atom serial number “eleno”, atom name “elena”, orthogonal coordinates “x”, “y” and “z”, atom type “elety”, residue number “resno”, residue name “resid”, atom charge “charge”, and status bit “statbit”.

For the bond component, the column names are: bond identifier “id”, number of the atom at one end of the bond “origin”, number of the atom at the other end of the bond “target”, SYBYL bond type “type”, and status bit “statbit”.

For the substructure component, the column names are: substructure identifier “id”, substructure name “name”, ID number of the substructure's root atom “root_atom”, substructure type “subst_type”, type of dictionary associated with the substructure “dict_type”, chain to which the substructure belongs “chain”, subtype of the chain “sub_type”, number of inter bonds “inter_bonds”, and status bit “status”.

See examples for further details.

Examples

cat("\n")

## Read a single entry MOL2 file
## (returns a single object)
mol <- read.mol2( system.file("examples/aspirin.mol2", package="bio3d") )

## Short summary of the molecule
print(mol)

 ... Name: ZINC00000053
 ... 20 atoms in molecule
 ... 20 bonds in molecule
 ... 1 frame(s) stored

+ attr: atom, bond, substructure, xyz, info, name

## ATOM records
mol$atom

   eleno elena       x       y       z elety resno resid  charge statbit
1      1    C1 -1.4238  1.4221  1.2577   C.3     1   <0> -0.1370    <NA>
2      2    C2 -1.3441 -0.0813  1.1904   C.2     1   <0>  0.4790    <NA>
3      3    O1 -1.4532 -0.7378  2.1988   O.2     1   <0> -0.5137    <NA>
4      4    O2 -1.1519 -0.6914  0.0093   O.3     1   <0> -0.2785    <NA>
5      5    C3 -0.9822 -2.0375  0.0080  C.ar     1   <0>  0.1140    <NA>
6      6    C4  0.2935 -2.5790  0.0577  C.ar     1   <0> -0.1438    <NA>
7      7    C5  0.4647 -3.9491  0.0557  C.ar     1   <0> -0.1168    <NA>
8      8    C6 -0.6330 -4.7933  0.0047  C.ar     1   <0> -0.1375    <NA>
9      9    C7 -1.9078 -4.2739 -0.0447  C.ar     1   <0> -0.0755    <NA>
10    10    C8 -2.0966 -2.8882 -0.0379  C.ar     1   <0> -0.1363    <NA>
11    11    C9 -3.4564 -2.3257 -0.0844   C.2     1   <0>  0.5228    <NA>
12    12    O3 -4.4256 -3.0711 -0.1243 O.co2     1   <0> -0.6853    <NA>
13    13    O4 -3.6170 -1.1130 -0.0830 O.co2     1   <0> -0.6764    <NA>
14    14    H1 -1.5815  1.7317  2.2908     H     1   <0>  0.0951    <NA>
15    15    H2 -0.4932  1.8522  0.8875     H     1   <0>  0.0914    <NA>
16    16    H3 -2.2544  1.7698  0.6434     H     1   <0>  0.1039    <NA>
17    17    H4  1.1543 -1.9280  0.0976     H     1   <0>  0.1207    <NA>
18    18    H5  1.4605 -4.3654  0.0938     H     1   <0>  0.1210    <NA>
19    19    H6 -0.4882 -5.8635  0.0032     H     1   <0>  0.1200    <NA>
20    20    H7 -2.7604 -4.9357 -0.0844     H     1   <0>  0.1328    <NA>

## BOND records
mol$bond

   id origin target type statbit
1   1      1      2    1    <NA>
2   2      1     14    1    <NA>
3   3      1     15    1    <NA>
4   4      1     16    1    <NA>
5   5      2      3    2    <NA>
6   6      2      4    1    <NA>
7   7      4      5    1    <NA>
8   8      5     10   ar    <NA>
9   9      5      6   ar    <NA>
10 10      6      7   ar    <NA>
11 11      6     17    1    <NA>
12 12      7      8   ar    <NA>
13 13      7     18    1    <NA>
14 14      8      9   ar    <NA>
15 15      8     19    1    <NA>
16 16      9     10   ar    <NA>
17 17      9     20    1    <NA>
18 18     10     11    1    <NA>
19 19     11     12    2    <NA>
20 20     11     13    1    <NA>

## Print some coordinate data
head(mol$atom[, c("x","y","z")])

        x       y      z
1 -1.4238  1.4221 1.2577
2 -1.3441 -0.0813 1.1904
3 -1.4532 -0.7378 2.1988
4 -1.1519 -0.6914 0.0093
5 -0.9822 -2.0375 0.0080
6  0.2935 -2.5790 0.0577

## Or coordinates as a numeric vector
#head(mol$xyz)

## Print atom charges
head(mol$atom[, "charge"])

[1] -0.1370  0.4790 -0.5137 -0.2785  0.1140 -0.1438

## Convert to PDB
pdb <- as.pdb(mol)

 Summary of PDB generation:
 .. number of atoms in PDB determined by 'xyz'
 .. 0 atom(s) from 'string' selection
 .. 0 atom(s) in final combined selection
 .. number of atoms in PDB: 20
 .. number of calphas in PDB: 0
 .. number of residues in PDB: 1

## Read a multi-molecule MOL2 file
## (returns a list of objects)
#multi.mol <- read.mol2("zinc.mol2")

## Number of molecules described in file
#length(multi.mol)

## Access ATOM records for the first molecule
#multi.mol[[1]]$atom

## Or coordinates for the second molecule
#multi.mol[[2]]$xyz

## Process output from docking (e.g. DOCK)
## (typically one molecule with many conformations)
## (returns one object, but xyz in matrix format)
#dock <- read.mol2("dock.mol2")

## Reference PDB file (e.g. X-ray structure)
#pdb <- read.pdb("dock_ref.pdb")

## Calculate RMSD of docking modes
#sele <- atom.select(dock, "noh")
#rmsd(pdb$xyz, dock$xyz, b.inds=sele$xyz)

See also

write.mol2, atom.select.mol2, trim.mol2, as.pdb.mol2, read.pdb

Author

Lars Skjaerven