
A dynamic data structure for flexible molecular maintenance and informatics

Abstract
We present the "Dynamic Packing Grid" (DPG) data structure, along with details of our implementation and performance results, for maintaining and manipulating flexible molecular models and assemblies. DPG can efficiently maintain the molecular surface (e.g., the van der Waals surface and the solvent contact surface) under insertion/deletion/movement (i.e., updates) of atoms or groups of atoms. DPG also permits the fast estimation of important molecular properties (e.g., surface area, volume, polarization energy, etc.) that are needed for computing binding affinities in drug design or in molecular dynamics calculations. DPG can additionally be utilized in efficiently maintaining multiple "rigid" domains of dynamic flexible molecules. In DPG, each update takes only $O(\log w)$ time w.h.p. on a RAM with $w$-bit words, i.e., $O(1)$ time in practice, and hence is extremely fast. DPG's queries include the reporting of all atoms within $O(r_{\max})$ distance from any given atom center or point in 3-space in $O(\log\log w)$ ($= O(1)$) time w.h.p., where $r_{\max}$ is the radius of the largest atom in the molecule. It can also answer whether a given atom is exposed or buried under the surface within the same time bound, and can return the entire molecular surface in $O(m)$ worst-case time, where $m$ is the number of atoms on the surface. The data structure uses space linear in the number of atoms in the molecule.


Chandrajit Bajaj
Institute for Computational
Engineering and Science
University of Texas
Austin, TX 78712
bajaj@cs.utexas.edu
Rezaul Alam Chowdhury
Institute for Computational
Engineering and Science
University of Texas
Austin, TX 78712
shaikat@cs.utexas.edu
Muhibur Rasheed
Institute for Computational
Engineering and Science
University of Texas
Austin, TX 78712
muhibur@cs.utexas.edu
This research was supported in part by NSF grant CNS-0540033 and NIH contracts R01-EB00487, R01-GM074258, R01-GM07308.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIAM/ACM Joint Conference on Geometric and Physical Modeling 2009, San Francisco, California USA
Categories and Subject Descriptors
I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling - boundary representations; curve, surface, solid, and object representations; geometric algorithms, languages, and systems; physically based modeling; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - computations on discrete structures; geometrical problems and computations; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD)
General Terms
Algorithms, Design, Performance
Keywords
shape modeling, de novo drug design, computer aided design, interactive software, protein folding, molecular docking
1. INTRODUCTION
Many human functional processes are mediated through the interactions amongst proteins, a major molecular constituent of our anatomical makeup. A computational understanding of these interactions provides important clues for developing therapeutic interventions related to diseases such as cancer and metabolic disorders. Computational methods, such as automated docking through shape and energetic complementarity scoring, aim to gain insight into and predict such molecular interactions.
The most common model for proteins is a collection of atoms represented by spherical balls, with radii equal to their van der Waals radii [35, 16]. The surface of the union of these spheres is known as the van der Waals surface. Lee and Richards introduced the concept of accessibility to the solvent [31]. Proteins are not isolated, but are commonly present in solutions, especially water. Also, the van der Waals surface contains too many internal atoms and patches which are not accessible by the solvent or any other protein that may bind to it. Hence, Lee and Richards gave a new definition for the protein surface, or protein-solvent interface, as the surface accessible to the watery solvent. They modeled water molecules as spheres with radius 1.4 Å, and defined the Solvent Accessible Surface (SAS) as the locus of the center of one such ‘probe’ as it rolls along the protein surface. Richards then gave a more commonly used definition of the molecular surface as a set of contact and reentrant patches [42]. Though Connolly considered this an alternative definition of the SAS in [13], it is now commonly known as the Solvent Contact Surface (SCS), the Solvent Excluded Surface (SES), or simply the molecular surface/interface of the protein.
Protein interactions, or protein-protein docking, involve an induced complementary fit between flexible protein interfaces, and the interface conformational changes are often critical during the lock and key matching [43]. The flexible docking solution space, consisting of all relative positions, orientations and conformations of the proteins, is searched, and the putative dockings are evaluated using combinations of interface complementarity scoring and atomic pair-wise charged Coulombic interactions [27]. Since proteins function in their predominantly watery (solvent) environment, the computation of protein solvation energy (also known as protein-solvent interaction energy) also plays an important role in determining inter-molecular binding affinities “in-vivo” for drug screening, as well as in molecular dynamics simulations [52], and in the study of hydrophobicity and protein folding. When computing the solvation energy for molecules, it is crucial to correctly model and sample the protein-solvent interface.

Figure 1: Visualization, using TexMol (http://cvcweb.ices.utexas.edu/software/#TexMol), of the Rice Dwarf Virus (RDV) nucleo-capsid, which contains 3.5 million atoms (left), and of a microtubule, which contains 1.2 million atoms (right). In this figure, atoms are color-coded using the standard Corey, Pauling, Koltun (CPK) color scheme.
Since Richards introduced the SES definition, a number of techniques have been devised for static construction of the molecular surface (e.g., [12, 13, 53, 17, 50, 3, 45, 44, 55, 23, 7, 6]). However, not much work has been done on dynamic maintenance of molecular surfaces. In [8] Bajaj et al. considered limited dynamic maintenance of molecular surfaces based on Non-Uniform Rational B-Spline (NURBS) descriptions for the patches. Eyal and Halperin [19, 20] presented an algorithm based on dynamic graph connectivity that updates the molecular surface after a conformational change in $O(\log^2 n)$ amortized time per atom affected by the change.
In this paper we present the Dynamic Packing Grid (DPG), a space- and time-efficient data structure that maintains a collection of balls (atoms) in 3-space, allowing a range of spherical range queries and updates for rapid scoring of flexible protein-protein interactions. The efficiency of the data structure results from the assumption that the centers of two different balls in the collection cannot come arbitrarily close to each other, which is a natural property of molecules. A consequence of this assumption is that any ball in the collection can intersect at most a constant number of other balls. On a RAM with $w$-bit words, the data structure can report all balls intersecting a given ball or within $O(r_{\max})$ distance from a given point in $O(\log\log w)$ time w.h.p., where $r_{\max}$ is the radius of the largest ball in the collection. It can also answer whether a given ball is exposed (i.e., lies on the union boundary) or buried within the same time bound. At any time the entire union boundary can be extracted from the data structure in $O(m)$ time in the worst case, where $m$ is the number of atoms on the boundary. Updates (i.e., insertion/deletion/movement of a ball) are supported in $O(\log w)$ time (w.h.p.). The data structure uses linear space. A packing grid can maintain both the van der Waals surface and the solvent contact surface (SCS) of a molecule within the performance bounds mentioned above. Packing grids can be used to maintain the surface of a flexible molecule decomposed into rigid domains so that applying a bending/shearing/twisting motion between two domains takes $O(1 + m \log w)$ time (w.h.p.), where $m$ is the number of atoms in the connectors between the two domains. We also describe a Hierarchical Packing Grid (HPG) data structure that maintains a molecule at multiple resolutions (atomic and coarser) under updates, and can compute any mixed resolution surface efficiently. Packing grids can also aid in fast energetics calculation by rapidly locating the atoms close to each sampled quadrature point on the SCS.
DPG has potential applications in interactive software tools developed for de novo drug design (e.g., [30, 46, 18, 29]), protein folding (e.g., [28, 14]) and molecular docking (e.g., [33, 2]) that use human intuition and biological knowledge in order to steer the prediction process. These applications often need to handle extremely large molecules and macromolecules (e.g., as shown in Figure 1, the Rice Dwarf Virus with 3.5 million atoms and a microtubule with 1.2 million atoms), and need to perform a sequence of dynamic updates on them in real time. The Molecule Evaluator [30, 18] is a de novo molecular design software based on adaptive interactive evolution. In a series of interactive steps it applies a set of problem-specific mutation (e.g., add/remove atom, add/remove group) and recombination operators on a set of evolving molecules, and keeps track of several chemical and biological properties of each molecule (e.g., molecular mass, hydrophobicity, etc.). The ProteinShop software [28, 14] allows the interactive creation of protein structures (e.g., through shape manipulation) given an amino acid sequence and a sequence of predicted secondary structure types for each amino acid. DockingShop [33] is a successor of ProteinShop, which provides an interactive docking environment with flexibility of side chains and backbone movement. Users can adjust the receptor protein structure by rotating the backbone dihedral angles, changing the dihedral angles of selected residues, substituting the side chain of selected residues using a rotamer library, or changing a residue for another while keeping the backbone fixed. Figure 2 shows an example where the flexible movement/rearrangement of the “kl” β-hairpin on the envelope (E) Glycoprotein of dengue virus opens up a hydrophobic pocket for ligand binding, and the inhibitor n-octyl-β-D-glucoside docks into that pocket. VRDD [2] supports molecular visualization and interactive docking in a VR environment, and allows side-chain flexibility.

Figure 2: Figures (a) and (b) show the structure of a soluble fragment of the envelope (E) Glycoprotein from DV (dengue virus) type 2. Figure (a) shows the crystals grown in the presence (pre-fusion) of the detergent n-octyl-β-D-glucoside (β-OG, colored in green), and Figure (b) shows the same in its absence (post-fusion). The key difference between these two structures is a local rearrangement of the “kl” β-hairpin (residues 268-280) and the concomitant opening up of a hydrophobic pocket for ligand binding. In Figure (a) this pocket is occupied by a molecule of β-OG [36].
The molecular dynamics simulation tool IMD [49] allows interactive manipulation of bio-molecular systems. It combines interactive molecular visualization (using VMD [26]) with molecular dynamics simulation (using NAMD [38, 41]) in the background that supports manipulation of molecules by applying force to single atoms. Traditional all-atom molecular dynamics (MD) simulation reveals the protein folding process in detail, but it is restricted to small time scales on the order of nanoseconds [47] and small length scales on the order of nanometers [32, 34]. To fully investigate the folding process of a protein into its functional structure, a larger time scale, from microseconds to milliseconds, and a larger length scale, on the order of micrometers, are needed [4]. Protein coarse grained (CG) models, which represent clusters of atoms with similar physical properties by CG beads and simplify the interactions, significantly reduce the size of the system and therefore become a promising approach to reproducing large-scale protein motions.
The DPG data structure also has potential applications in tracking the dynamic structure of a particle system as particles move, appear and disappear [5, 22, 25]. Particle systems are used for modeling a number of physical world scenarios ranging from cosmological systems and plasma physics to molecular systems, where particles are defined as smooth functions with compact support. The applications are wide and varied and include chemistry, material science, and bio-engineering. The dynamic re-meshing problem for time dependent particle systems arises in gas hydrodynamics simulations essential in the computational investigation of the formation of large scale structures in the universe, such as galaxies and galaxy clusters [25]. For the meshing of particle systems, it suffices to consider particles as idealized balls, or radially symmetric domains of support of their kernels.
The rest of the paper is organized as follows. We describe and analyze the packing grid data structure in Section 2. We give some preliminaries in Section 2.1, describe the layout of the data structure in Section 2.2, and describe and analyze the supported queries and updates in Section 2.3. In Section 3 we describe how to use packing grids for maintaining the surface of a molecule decomposed into rigid domains, and in Section 4 we describe hierarchical packing grids for maintaining mixed resolution surfaces. In Section 5 we describe some applications of packing grids. Our experimental results are included in Section 6.
2. THE DYNAMIC PACKING GRID DATA STRUCTURE
We describe the packing grid data structure for maintaining a set $M$ of balls in 3-space efficiently under the following set of queries and updates. By $B = (c, r)$ we denote a ball with center $c$ and radius $r$.
Queries.
1. Intersect( $c$, $r$ ): Return all balls in $M$ that intersect the given ball $B = (c, r)$. The given ball may or may not belong to the set $M$.
2. Range( $p$, $\delta$ ): Return all balls in $M$ with centers within distance $\delta$ of point $p$. We assume that $\delta$ is at most a constant multiple of the radius of the largest ball in $M$.
3. Exposed( $c$, $r$ ): Returns true if the ball $B = (c, r)$ contributes to the outer boundary of the union of the balls in $M$. The given ball must belong to $M$.
4. Surface( ): Returns the outer boundary of the union of the balls in $M$. If there are multiple disjoint outer boundary surfaces defined by $M$, the routine returns any one of them.
Updates.
1. Add( $c$, $r$ ): Add a new ball $B = (c, r)$ to the set $M$.
2. Remove( $c$, $r$ ): Remove the ball $B = (c, r)$ from $M$.
3. Move( $c_1$, $c_2$, $r$ ): Move the ball with center $c_1$ and radius $r$ to a new center $c_2$.
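To pin down the intended semantics of the two distance-based queries, here is a minimal brute-force reference in Python. It is only a linear-scan baseline for checking results, not the packing grid itself. The names `intersect` and `range_query`, the tuple representation of a ball, and the choices that touching balls count as intersecting and that the query ball is not reported as intersecting itself are assumptions made for this sketch.

```python
import math

# A ball is represented as (center, radius), with center a 3-tuple of floats.

def intersect(M, c, r):
    """Brute-force Intersect(c, r): all balls of M meeting the ball (c, r).

    Assumptions of this sketch: touching balls count as intersecting, and
    the query ball itself (if present in M) is not reported.
    """
    return [(ci, ri) for (ci, ri) in M
            if (ci, ri) != (c, r) and math.dist(ci, c) <= ri + r]

def range_query(M, p, delta):
    """Brute-force Range(p, delta): all balls of M with center within delta of p."""
    return [(ci, ri) for (ci, ri) in M if math.dist(ci, p) <= delta]
```

Both functions take time linear in the size of `M`, which is the cost the packing grid is designed to avoid.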
We assume that at all times during the lifetime of the data structure the following holds.
Assumption 2.1. If $r_{\max}$ is the radius of the largest ball in $M$, and $d_{\min}$ is the minimum Euclidean distance between the centers of any two balls in $M$, then $r_{\max} = O(d_{\min})$.
In general, a ball in a collection of $n$ balls in 3-space can intersect $\Theta(n)$ other balls in the worst case, and it has been shown in [11] that the boundary defined by the union of these balls has a worst-case combinatorial complexity of $\Theta(n^2)$. However, if $M$ is a “union of balls” representation of the atoms in a molecule, then Assumption 2.1 holds naturally [24, 51], and as proved in [24], in that case, both complexities improve by a factor of $n$. The following theorem states the consequences of the assumption.

| Operations | Time complexity, assuming $t_q = O(\log\log w)$, $t_u = O(\log w)$ | Time complexity, assuming $t_q = O(\log\log n)$, $t_u = O(\log n / \log\log n)$ |
|---|---|---|
| Range( $p$, $\delta$ ), Intersect( $c$, $r$ ), Exposed( $c$, $r$ ) (with $\delta = O(r_{\max})$) | $O(\log\log w)$ (w.h.p.) | $O(\log\log n)$ (w.h.p.) |
| Surface( ) | $O(\#\text{balls on surface})$ (worst-case) | $O(\#\text{balls on surface})$ (worst-case) |
| Add( $c$, $r$ ), Remove( $c$, $r$ ), Move( $c_1$, $c_2$, $r$ ) | $O(\log w)$ (w.h.p.) | $O(\log n / \log\log n)$ (w.h.p.) |

Assumptions: (i) RAM with $w$-bit words, (ii) collection of $n$ balls, and (iii) $r_{\max} = O(\text{minimum distance between two balls})$.
Table 1: Time complexities of the operations supported by the packing grid data structure.
Theorem 2.1. (Theorem 2.1 in [24], slightly modified) Let $M = \{B_1, \ldots, B_n\}$ be a collection of $n$ balls in 3-space with radii $r_1, \ldots, r_n$ and centers at $c_1, \ldots, c_n$. Let $r_{\max} = \max_i \{r_i\}$ and let $d_{\min} = \min_{i \neq j} \{d(c_i, c_j)\}$, where $d(c_i, c_j)$ is the Euclidean distance between $c_i$ and $c_j$. Also let $\delta M = \{\delta B_1, \ldots, \delta B_n\}$ be the collection of spheres such that $\delta B_i$ is the boundary surface of $B_i$. If $r_{\max} = O(d_{\min})$ (i.e., Assumption 2.1 holds), then:
(i) Each $B_i \in M$ intersects at most $216 \cdot (r_{\max}/d_{\min})^3 = O(1)$ other balls in $M$.
(ii) The maximum combinatorial complexity of the boundary of the union of the balls in $M$ is $O\left((r_{\max}/d_{\min})^3 \cdot n\right) = O(n)$.
Proof. Similar to the proof of Theorem 2.1 in [24].
Therefore, as Theorem 2.1 suggests, for intersection queries and boundary construction, one should be able to handle $M$ more efficiently if Assumption 2.1 holds. The efficiency of our data structure, too, partly depends on this assumption.
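As an illustration of where a constant such as 216 in part (i) can come from, the following is a standard volume-packing sketch. It is not necessarily the argument used in [24]. It assumes that $B_i$ intersects at least one other ball, so that $d_{\min} \le 2 r_{\max}$; the symbol $k$ (the number of centers counted, including $c_i$) is introduced here for illustration only.

```latex
% Every ball B_j that intersects B_i has its center within 2 r_max of c_i:
\[
  d(c_i, c_j) \;\le\; r_i + r_j \;\le\; 2\, r_{\max} .
\]
% Balls of radius d_min / 2 around distinct centers have disjoint interiors,
% and those around the k centers in question all lie inside the ball of
% radius 2 r_max + d_min / 2 around c_i. Comparing volumes gives
\[
  k \left( \frac{d_{\min}}{2} \right)^{3}
  \;\le\; \left( 2\, r_{\max} + \frac{d_{\min}}{2} \right)^{3}
  \quad\Longrightarrow\quad
  k \;\le\; \left( \frac{4\, r_{\max}}{d_{\min}} + 1 \right)^{3} .
\]
% Since d_min <= 2 r_max, we have 1 <= 2 r_max / d_min, and therefore
\[
  k \;\le\; \left( \frac{6\, r_{\max}}{d_{\min}} \right)^{3}
  \;=\; 216 \left( \frac{r_{\max}}{d_{\min}} \right)^{3} .
\]
```

Because $k$ also counts $c_i$ itself, the number of other balls intersecting $B_i$ is at most $k - 1$, which is within the bound stated in part (i).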
2.1 Preliminaries
Before we describe our data structure we present several
definitions in order to simplify the exposition.
Definition 2.1 (r-grid and grid-cell). An $r$-grid is an axis-parallel infinite grid structure in 3-space consisting of cells of size $r \times r \times r$ ($r \in \mathbb{R}$) with the root (i.e., the corner with the smallest $x$, $y$, $z$ coordinates) of one of the cells coinciding with the origin of the (Cartesian) coordinate axes. The grid cell that has its root at Cartesian coordinates $(ar, br, cr)$ (where $a, b, c \in \mathbb{Z}$) is referred to as the $(a, b, c, r)$-cell, or simply as the $(a, b, c)$-cell when $r$ is clear from the context.
Definition 2.2 (grid-line). The $(b, c, r)$-line (where $b, c \in \mathbb{Z}$) in an $r$-grid consists of all $(x, y, z, r)$-cells with $y$ and $z$ fixed to $b$ and $c$, respectively. When $r$ is clear from the context the $(b, c, r)$-line will simply be called the $(b, c)$-line.
Observe that each cell on the $(b, c, r)$-line can be identified with a unique integer, e.g., the cell at index $a \in \mathbb{Z}$ on the given line corresponds to the $(a, b, c, r)$-cell in the $r$-grid.
Definition 2.3 (grid-plane). The $(c, r)$-plane (where $c \in \mathbb{Z}$) in an $r$-grid consists of all $(x, y, z, r)$-cells with $z$ fixed to $c$. The $(c, r)$-plane will be referred to as the $c$-plane when $r$ is clear from the context.
The $(c, r)$-plane can be decomposed into an infinite number of lines, each identifiable with a unique integer. For example, index $b \in \mathbb{Z}$ uniquely identifies the $(b, c, r)$-line on the given plane. Also, each grid-plane in the $r$-grid can be identified with a unique integer, e.g., the $(c, r)$-plane is identified by $c$.
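The three definitions reduce to simple integer arithmetic on coordinates. The sketch below shows one way to compute the identifiers; the function names are hypothetical, and it assumes half-open cells, i.e., a point lying exactly on a shared face is assigned to the cell with the larger index along that axis.

```python
import math

def cell_of(point, r):
    """Return (a, b, c) such that point lies in the (a, b, c, r)-cell.

    Assumption of this sketch: cells are half-open, so the cell with root
    (a*r, b*r, c*r) covers [a*r, (a+1)*r) along each axis.
    """
    x, y, z = point
    return (math.floor(x / r), math.floor(y / r), math.floor(z / r))

def line_of(point, r):
    """Return (b, c) identifying the (b, c, r)-line through the point's cell."""
    _, b, c = cell_of(point, r)
    return (b, c)

def plane_of(point, r):
    """Return c identifying the (c, r)-plane through the point's cell."""
    return cell_of(point, r)[2]
```

Using `math.floor` rather than integer truncation matters for negative coordinates: a point with $x = -0.1$ in a grid with $r = 2$ belongs to the cell with $a = -1$, not $a = 0$.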
The proof of the following lemma is straight-forward.
Lemma 2.1. Let $M = \{B_1, \ldots, B_n\}$ be a collection of $n$ balls in 3-space with radii $r_1, \ldots, r_n$ and centers at $c_1, \ldots, c_n$. Let $r_{\max} = \max_i \{r_i\}$ and let $d_{\min} = \min_{i \neq j} \{d(c_i, c_j)\}$, where $d(c_i, c_j)$ is the Euclidean distance between $c_i$ and $c_j$. Suppose $M$ is stored in the $2r_{\max}$-grid $G$. Then
(i) If $r_{\max} = O(d_{\min})$ (i.e., Assumption 2.1 holds) then each grid-cell in $G$ contains the centers of at most $64 \cdot (r_{\max}/d_{\min})^3 = O(1)$ balls in $M$.
(ii) Each ball in $M$ intersects at most 8 grid-cells in $G$.
(iii) For a given ball $B \in M$ with center in grid-cell $C$, the center of each ball intersecting $B$ lies either in $C$ or in one of the 26 grid-cells adjacent to $C$.
(iv) The number of non-empty (i.e., containing the center of at least one ball in $M$) grid-cells in $G$ is at most $n$, and the same bound holds for grid-lines and grid-planes.
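Parts (ii) and (iii) of the lemma can be checked numerically. The following Python sketch enumerates the cells of a grid that a ball meets and the 27-cell neighborhood of a cell. The function names and the half-open cell convention for locating a center are assumptions of this sketch, not notation from the paper.

```python
import itertools
import math

def home_cell(center, side):
    """Index (a, b, c) of the grid cell containing the center (half-open cells)."""
    return tuple(math.floor(x / side) for x in center)

def cells_hit(center, radius, side):
    """Indices of all cells of the side-grid that the closed ball meets."""
    lo = [math.floor((x - radius) / side) for x in center]
    hi = [math.floor((x + radius) / side) for x in center]
    hits = []
    for idx in itertools.product(*(range(l, h + 1) for l, h in zip(lo, hi))):
        # The point of the cell nearest to the center: clamp each coordinate.
        nearest = [min(max(x, i * side), (i + 1) * side)
                   for x, i in zip(center, idx)]
        if math.dist(center, nearest) <= radius:
            hits.append(idx)
    return hits

def neighborhood(cell):
    """The cell itself plus its 26 adjacent cells."""
    return {tuple(i + d for i, d in zip(cell, delta))
            for delta in itertools.product((-1, 0, 1), repeat=3)}
```

With `side = 2 * r_max` and `radius <= r_max`, the extent of a ball along each axis is at most one cell side, so it can straddle at most two cells per axis, which gives the bound of 8 in part (ii). Likewise, two intersecting balls have centers at most `2 * r_max` apart, so their cell indices differ by at most one per axis, as in part (iii).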
At the heart of our data structure is a fully dynamic one dimensional integer range reporting data structure for word RAM described in [37]. The data structure in [37] maintains a set $S$ of integers under updates (i.e., insertions and deletions), and answers queries of the form: report any or all points in $S$ in a given interval. The following theorem summarizes the performance bounds of the data structure which are of interest to us.
Theorem 2.2. (proved in [37]) On a RAM with $w$-bit words the fully dynamic one dimensional integer range reporting problem can be solved in linear space, and with high probability bounds of $O(t_u)$ and $O(t_q + k)$ on update time and query time, respectively, where $k$ is the number of items reported, and
(i) $t_u = O(\log w)$ and $t_q = O(\log\log w)$ using the data structure in [37]; and
(ii) $t_u = O(\log n / \log\log n)$ and $t_q = O(\log\log n)$ using the data structure in [37] for small $w$ and a fusion tree [21] for large $w$.
The data structure can be augmented to store satellite information of size $O(1)$ with each integer without degrading its asymptotic performance bounds. Therefore, it supports the following three functions:
1. Insert( $i$, $s$ ): Insert an integer $i$ with satellite information $s$.
2. Delete( $i$ ): Delete integer $i$ from the data structure.
3. Query( $l$, $h$ ): Return the set of all $\langle i, s \rangle$ tuples with $i \in [l, h]$ stored in the data structure.
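The three functions above describe an interface that is easy to mock up. The sketch below is a deliberately simple stand-in built on a sorted Python list with binary search. It is not the word-RAM structure of [37] and does not achieve the bounds of Theorem 2.2: queries cost $O(\log n + k)$ and updates cost $O(n)$ in the worst case because of list shifting. The class name is hypothetical.

```python
import bisect

class SortedRangeReporter:
    """Stand-in for a 1D integer range reporting structure with satellite data.

    Not the structure of [37]: a sorted key list plus a dict for satellites.
    """

    def __init__(self):
        self._keys = []   # sorted list of stored integers
        self._sat = {}    # integer -> satellite information

    def insert(self, i, s):
        """Insert integer i with satellite s (overwrites s if i is present)."""
        if i not in self._sat:
            bisect.insort(self._keys, i)
        self._sat[i] = s

    def delete(self, i):
        """Delete integer i if present."""
        if i in self._sat:
            del self._sat[i]
            self._keys.pop(bisect.bisect_left(self._keys, i))

    def query(self, l, h):
        """Return all (i, s) tuples with l <= i <= h, in increasing order of i."""
        lo = bisect.bisect_left(self._keys, l)
        hi = bisect.bisect_right(self._keys, h)
        return [(k, self._sat[k]) for k in self._keys[lo:hi]]
```

In a packing grid, the integer would be a cell, line, or plane index and the satellite would be a reference to the corresponding component.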
2.2 Description (Layout) of the Packing Grid
Data Structure
We are now in a position to present our data structure. Let DPG be the data structure. We represent the entire 3-space as a $2r_{\max}$-grid (see Definition 2.1), and maintain the non-empty grid-planes (see Definition 2.3), grid-lines (see Definition 2.2) and grid-cells (see Definition 2.1) in DPG. A grid component (i.e., cell, line or plane) is non-empty if it contains the center of at least one ball in $M$. The data structure can be described hierarchically. It has a tree structure with 5 levels: 4 internal levels (levels 3, 2, 1 and 0) and an external level of leaves (see Figure 3). The description of each level follows.
The Leaf Level “Ball” Data Structure (DPG$_{-1}$). The data structure stores the center $c = (c_x, c_y, c_z)$ and the radius $r$ of the given ball $B$. It also includes a Boolean flag exposed which is set to true if $B$ contributes to the outer boundary of the union of the balls in $M$, and false otherwise. If another ball $B'$ intersects $B$, it does so on a circle which divides the boundary $\delta B$ of $B$ into two parts: one part is buried inside $B'$ and hence cannot contribute to the union boundary, and the other part is exposed w.r.t. $B'$ and hence might appear on the union boundary. The circular intersections of all balls intersecting $B$ define a 2D arrangement $A$ on $\delta B$ which according to Theorem 2.1 has $O(1)$ combinatorial complexity. A face of $A$ is exposed, i.e., contributes to the union boundary, provided it is not buried inside any other ball. Observe that if at least one other ball intersects $B$, and $A$ has an exposed face $f$, then each edge of $f$ separates $f$ from another exposed face $f'$ which belongs to the arrangement $A'$ of a ball intersecting $B$. We store all exposed faces (if any) of $A$ in a set $F$ of size $O(1)$, and with each face $f$ we store pointers to the data structures of $O(1)$ other balls that share edges with $f$ and also the identifier of the corresponding face on each ball. Observe that if $B$ does not intersect any other balls then $F$ will contain only a single face and no pointers to any other balls.
The Level 0 “Grid-Cell” Data Structure (DPG$_0$). The “grid-cell” data structure stores the root (see Definition 2.1) $(a, b, c)$ of the grid-cell it corresponds to. A grid-cell can contain the centers of at most $O(1)$ balls in $M$ (see Lemma 2.1). Pointers to data structures of all such balls are stored in a set $S$ of size $O(1)$. Since we create “grid-cell” data structures only for non-empty grid-cells, there will be at most $n$ (and possibly far fewer than $n$) such data structures, where $n$ is the current number of balls in $M$.
Figure 3: Hierarchical structure of DPG.
The Level 1 “Grid-Line” Data Structure (DPG$_1$). We create a “grid-line” data structure for a $(b, c)$-line provided it contains at least one non-empty grid-cell. The data structure stores the values of $b$ and $c$. Each $(a, b, c)$-cell lying on this line is identified with the unique integer $a$, and the identifier of each such non-empty grid-cell is stored in an integer range reporting data structure RR as described in Section 2.1 (see Theorem 2.2). We augment RR to store the pointer to the corresponding “grid-cell” data structure with each identifier it stores. The total number of “grid-line” data structures created is upper bounded by $n$, and is possibly much less than $n$.
The Level 2 “Grid-Plane” Data Structure (DPG$_2$). A “grid-plane” data structure is created for a $c$-plane provided it contains at least one non-empty grid-line. Similar to the “grid-line” data structure, it identifies each non-empty $(b, c)$-line lying on the $c$-plane with the unique integer $b$, and stores the identifiers in a range reporting data structure RR described in Section 2.1. A pointer to the corresponding “grid-line” data structure is also stored with each identifier. The data structure also stores $c$. The total number of “grid-plane” data structures created cannot exceed $n$, and will possibly be much less than $n$.
The Level 3 “Grid” Data Structure (DPG$_3$). This data structure maintains the non-empty grid-planes of the $2r_{\max}$-grid in an integer range reporting data structure RR (see Section 2.1). Each $c$-plane is identified by the unique integer $c$, and each such integer stored in RR is also accompanied by a pointer to the corresponding “grid-plane” data structure. The “grid” data structure also stores a surface-root pointer which points to the “Ball” data structure of an arbitrary exposed ball in $M$.
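The nesting of levels 3 down to 0 can be sketched compactly. In the Python sketch below, plain dictionaries stand in for the range reporting structures RR at each level, so it illustrates the layout and the pruning of empty components but not the time bounds of Theorem 2.2. It also omits the exposed flags, the face sets $F$, and the surface-root pointer. The class and method names are hypothetical.

```python
import math

class PackingGridSketch:
    """Nested grid -> plane -> line -> cell layout, with dicts in place of RR."""

    def __init__(self, r_max):
        self.side = 2.0 * r_max   # cell side of the 2*r_max-grid
        self.planes = {}          # c -> {b -> {a -> [(center, radius), ...]}}

    def _index(self, point):
        return tuple(math.floor(x / self.side) for x in point)

    def add(self, center, radius):
        a, b, c = self._index(center)
        line = self.planes.setdefault(c, {}).setdefault(b, {})
        line.setdefault(a, []).append((center, radius))

    def remove(self, center, radius):
        a, b, c = self._index(center)
        plane = self.planes[c]
        line = plane[b]
        cell = line[a]
        cell.remove((center, radius))
        # Prune components that became empty, bottom-up.
        if not cell:
            del line[a]
        if not line:
            del plane[b]
        if not plane:
            del self.planes[c]

    def move(self, c1, c2, radius):
        self.remove(c1, radius)
        self.add(c2, radius)

    def range_query(self, p, delta):
        """All stored balls whose centers lie within distance delta of p."""
        lo = self._index([x - delta for x in p])
        hi = self._index([x + delta for x in p])
        found = []
        for c in range(lo[2], hi[2] + 1):
            plane = self.planes.get(c, {})
            for b in range(lo[1], hi[1] + 1):
                line = plane.get(b, {})
                for a in range(lo[0], hi[0] + 1):
                    for center, radius in line.get(a, []):
                        if math.dist(center, p) <= delta:
                            found.append((center, radius))
        return found
```

Centers are expected to be tuples so that `remove` can match a stored ball by equality. When `delta` is at most a constant multiple of `r_max`, `range_query` probes only a constant number of cells, which is the property the real structure exploits.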
We have the following lemma on the space usage of the data structure.
Lemma 2.2. Let $M$ be a collection of $n$ balls as defined in Theorem 2.1, and let Assumption 2.1 hold. Then the packing grid data structure storing $M$ uses $O(n)$ space.
Proof. The space usage of the data structure is dominated by the space used by the range reporting data structures, the grid-cells and the “ball” data structures. Since the range reporting data structures use linear space (see Theorem 2.2) and the total number of non-empty grid components (i.e., planes, lines and cells) is $O(n)$ (see Lemma 2.1), the total space used by all such data structures is $O(n)$. The grid cells

Figures
Citations
More filters
Journal ArticleDOI

Protein-Protein Docking with F2Dock 2.0 and GB-Rerank

TL;DR: The new F2 Dock protocol improves the state of the art in initial stage rigid body exhaustive docking search, scoring and ranking by introducing improvements in the shape-complementarity and electrostatics affinity functions, a new knowledge- based interface propensity term with FFT formulation, a set of novel knowledge-based filters and finally a solvation energy (GBSA) based reranking technique.
Journal ArticleDOI

Wetting Effects in Hair Simulation

TL;DR: This paper introduces a simulation model that reproduces interactions between water and hair as a dynamic anisotropic permeable material and utilizes an Eulerian approach for capturing the microscopic porosity of hair and handles the wetting effects using a Cartesian bounding grid.
Journal ArticleDOI

A dynamic data structure for flexible molecular maintenance and informatics

TL;DR: The 'Dynamic Packing Grid' (DPG), a neighborhood data structure for maintaining and manipulating flexible molecules and assemblies, is presented for efficient computation of binding affinities in drug design or in molecular dynamics calculations.
Journal ArticleDOI

GPU Accelerated Finding of Channels and Tunnels for a Protein Molecule

TL;DR: A novel method for computing the cavities and channels/tunnels in a protein molecule in interactive time without significant user effort is proposed and the shortest path from a user selected or automatically chosen cavity to the exterior of the protein molecule is generated.
Posted Content

Stable Mesh Decimation

TL;DR: This paper presents a methodology to elucidate the geometric sensitivity of functionals via two major functional discretization techniques: Galerkin finite element and discrete exterior calculus.
References
More filters
Journal ArticleDOI

VMD: Visual molecular dynamics

TL;DR: VMD is a molecular graphics program designed for the display and analysis of molecular assemblies, in particular biopolymers such as proteins and nucleic acids, which can simultaneously display any number of structures using a wide variety of rendering styles and coloring methods.
Journal ArticleDOI

Scalable molecular dynamics with NAMD

TL;DR: NAMD as discussed by the authors is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems that scales to hundreds of processors on high-end parallel platforms, as well as tens of processors in low-cost commodity clusters, and also runs on individual desktop and laptop computers.
Journal ArticleDOI

The Amber biomolecular simulation programs

TL;DR: The development, current features, and some directions for future development of the Amber package of computer programs, which contains a group of programs embodying a number of powerful tools of modern computational chemistry, focused on molecular dynamics and free energy calculations of proteins, nucleic acids, and carbohydrates.
Book

Computer simulation using particles

TL;DR: In this paper, a simulation program for particle-mesh force calculation is presented, based on a one-dimensional plasma model and a collisionless particle model, which is used to simulate collisionless particle models.
Frequently Asked Questions (19)
Q1. What have the authors contributed in "A dynamic data structure for flexible molecular maintenance and informatics" ?

The authors present the "Dynamic Packing Grid" (DPG) data structure along with details of their implementation and performance results, for maintaining and manipulating flexible molecular models and assemblies. DPG's queries include the reporting of all atoms within O (rmax) distance from any given atom center or point in 3-space in O (log log w) (= O (1)) time w.h.p., where rmax is the radius of the largest atom in the molecule.
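The update and range-query interface described above can be illustrated with a much simpler structure. The following is a minimal sketch, assuming a plain uniform grid stored in a Python dictionary; it is not the authors' implementation and does not reproduce the O (log w) or O (log log w) bounds, which rely on integer data structures on a word RAM. The class name `HashGrid`, its method names, and the `cell_size` parameter are all hypothetical.

```python
from collections import defaultdict
from math import ceil, dist, floor


class HashGrid:
    """Illustrative stand-in for a packing grid (hypothetical, not the DPG itself).

    Atoms are bucketed by the integer coordinates of the grid cell that
    contains their center.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (i, j, k) -> ids of atoms in that cell
        self.atoms = {}                # atom id -> (center, radius)

    def _cell(self, point):
        return tuple(floor(c / self.cell_size) for c in point)

    def insert(self, atom_id, center, radius):
        self.atoms[atom_id] = (tuple(center), radius)
        self.cells[self._cell(center)].add(atom_id)

    def delete(self, atom_id):
        center, _ = self.atoms.pop(atom_id)
        key = self._cell(center)
        self.cells[key].discard(atom_id)
        if not self.cells[key]:
            del self.cells[key]  # keep space proportional to the atom count

    def move(self, atom_id, new_center):
        # A movement is modeled as a deletion followed by an insertion.
        _, radius = self.atoms[atom_id]
        self.delete(atom_id)
        self.insert(atom_id, new_center, radius)

    def range_query(self, point, delta):
        """Return ids of all atoms whose centers lie within delta of point."""
        reach = ceil(delta / self.cell_size)
        ci, cj, ck = self._cell(point)
        found = set()
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                for k in range(ck - reach, ck + reach + 1):
                    # .get avoids creating empty cells in the defaultdict
                    for atom_id in self.cells.get((i, j, k), ()):
                        if dist(self.atoms[atom_id][0], point) <= delta:
                            found.add(atom_id)
        return found


grid = HashGrid(cell_size=4.0)
grid.insert("a", (0.0, 0.0, 0.0), 1.7)
grid.insert("b", (3.0, 0.0, 0.0), 1.5)
nearby = grid.range_query((0.0, 0.0, 0.0), 3.5)
```

When `delta` is a constant multiple of `cell_size`, the query inspects a constant number of cells, which is the intuition behind reporting neighbors within O (rmax) distance quickly.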

Packing grids can be used to maintain the surface of a flexible molecule decomposed into rigid domains so that applying a bending/shearing/twisting motion between two domains takes O (1 + m log w) time (w.h.p.), where m is the number of atoms in the connectors between the two domains. 

Now in order to create a mixed resolution surface of the given molecule M , the authors start at coarse resolution, say at some level j > 0, and copy DPG(i) to an initially empty packing grid DPG with the same parameters. 

In order to detect the intersections among concave patches, the authors maintain the centers of all current concave patches in DPG’, and use the Intersect query to find the concave patch (if any) that intersects a given concave patch. 

The DPG data structure outputs the SAS as a set of spherical (convex and concave) and toroidal patches, and the authors add up the area of each patch in order to calculate ΩSAS. 

For virus capsids, as multiple chains are inserted, not only does the number of atoms increase, but the overall structure also becomes sparser.

The SAS of the molecule can be extracted in $O(\tilde{m} \log w)$ time (w.h.p.) and $O(\tilde{m})$ space using a DPG data structure, where $\tilde{m}$ is the number of atoms in the molecule.

The authors store all exposed faces (if any) of A in a set F of size O (1), and with each face f the authors store pointers to the data structures of O (1) other balls that share edges with f and also the identifier of the corresponding face on each ball. 
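The per-ball bookkeeping described above might be laid out as follows. This is a minimal sketch under stated assumptions: the names `Face`, `BallRecord`, `add_face`, `is_exposed`, and `link_faces` are hypothetical, faces are identified by plain integers, and treating a ball as exposed exactly when its face set is non-empty is an illustrative reading rather than something the source spells out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # identity-based equality; records point at each other
class Face:
    face_id: int
    # Each entry: (neighboring ball sharing an edge, id of the matching face on it)
    neighbors: List[Tuple["BallRecord", int]] = field(default_factory=list)


@dataclass(eq=False)
class BallRecord:
    center: Tuple[float, float, float]
    radius: float
    faces: Dict[int, Face] = field(default_factory=dict)  # the set F of exposed faces

    def add_face(self, face_id: int) -> Face:
        face = Face(face_id)
        self.faces[face_id] = face
        return face

    def is_exposed(self) -> bool:
        # Assumption: a ball with no exposed face is buried.
        return bool(self.faces)


def link_faces(ball_a: BallRecord, face_a: int,
               ball_b: BallRecord, face_b: int) -> None:
    """Record, on both sides, that two faces share an edge."""
    ball_a.faces[face_a].neighbors.append((ball_b, face_b))
    ball_b.faces[face_b].neighbors.append((ball_a, face_a))


a = BallRecord((0.0, 0.0, 0.0), 1.7)
b = BallRecord((2.0, 0.0, 0.0), 1.5)
a.add_face(0)
b.add_face(0)
link_faces(a, 0, b, 0)
```

Storing the neighbor's face identifier alongside the pointer lets a traversal step from one patch to the adjacent patch on another ball without searching that ball's face set.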


The surface of a flexible molecule decomposed into (mostly) rigid domains can be maintained using packing grid data structures so that (i) updating for a bending/shearing/twisting motion applied between two domains takes $O(1 + m \log w)$ time (w.h.p.), where $m$ is the number of atoms in the connectors between the two domains; (ii) updating the conformation of a flexible loop or a sidechain on the surface of a domain takes $O(\tilde{m} \log w)$ time (w.h.p.), where $\tilde{m}$ is the number of atoms affected by this change; and (iii) generating the surface of the entire molecule requires $O(\hat{m} \log w)$ time (w.h.p.), where $\hat{m}$ is the sum of the number of atoms on the surface of each domain.


The authors compute the SES of the molecule in $O(\tilde{m} \log w)$ time (w.h.p.) and $O(\tilde{m})$ space using a DPG data structure D, and then use the method in [9] in order to choose the integration points and weights in O (N) time.

A packing grid can maintain both the van der Waals surface and the solvent contact surface (SCS) of a molecule within the performance bounds mentioned above. 

Protein coarse grained (CG) models which represent clusters of atoms with similar physical properties by CG beads and simplify the interactions significantly reduce the size of the system and therefore become a promising approach to reproduce large-scale protein motions. 

Thus generating the surface of the entire molecule requires $O(\hat{m} \log w)$ time (w.h.p.), where $\hat{m}$ is the sum of the number of atoms on the surface of each domain.

Observe that the introduction of a new ball may affect the surface exposure of only the balls it intersects (i.e., bury some/all of them partly or completely), and no other balls. 

Identifying Intersecting Balls: From S the authors remove the data structure of each ball that does not intersect B, and return the resulting (possibly reduced) set. 
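The filtering step described above can be sketched as follows. This is an illustrative example only: the `Ball` type and the function names are hypothetical, the candidate set S is assumed to have been produced by an earlier range query, and two balls are assumed to intersect when the distance between their centers is strictly less than the sum of their radii (so tangent balls are not counted).

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass(frozen=True)
class Ball:
    center: Tuple[float, float, float]
    radius: float


def intersects(a: Ball, b: Ball) -> bool:
    # Assumption: tangent balls (distance equal to the radius sum) do not intersect.
    return dist(a.center, b.center) < a.radius + b.radius


def intersecting_balls(candidates: List[Ball], b: Ball) -> List[Ball]:
    """Drop every candidate that does not intersect b; keep the rest in order."""
    return [a for a in candidates if a is not b and intersects(a, b)]


new_ball = Ball((0.5, 0.0, 0.0), 1.0)
candidates = [
    Ball((0.0, 0.0, 0.0), 1.0),
    Ball((1.5, 0.0, 0.0), 1.0),
    Ball((5.0, 0.0, 0.0), 1.0),
]
affected = intersecting_balls(candidates, new_ball)
```

Because inserting a ball can change the surface exposure only of the balls it intersects, the returned list is exactly the set whose exposure needs to be re-examined.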

In addition to the molecules used in the experiments of [19, 20], the authors ran their experiments on some viruses and ribosomes the authors are interested in. 
