What have the authors contributed in "A dynamic data structure for flexible molecular maintenance and informatics" ?

The authors present the “Dynamic Packing Grid” (DPG) data structure, along with details of their implementation and performance results, for maintaining and manipulating flexible molecular models and assemblies. DPG's queries include reporting all atoms within $O(r_{\max})$ distance of any given atom center or point in 3-space in O(log log w) (= O(1)) time w.h.p., where $r_{\max}$ is the radius of the largest atom in the molecule.

How do the authors create a mixed resolution surface of a given molecule?

Now in order to create a mixed resolution surface of the given molecule M , the authors start at coarse resolution, say at some level j > 0, and copy DPG(i) to an initially empty packing grid DPG with the same parameters.

How do the authors find the intersections between concave patches?

In order to detect the intersections among concave patches, the authors maintain the centers of all current concave patches in DPG’, and use the Intersect query to find the concave patch (if any) that intersects a given concave patch.

How does the DPG data structure calculate the SAS of the molecule?

The DPG data structure outputs the SAS as a set of spherical (convex and concave) and toroidal patches, and the authors add up the area of each patch in order to calculate ΩSAS.

What is the effect of multiple chains on the structure of the virus?

For virus capsids, as multiple chains are inserted, not only does the number of atoms increase but the overall structure also becomes sparser.

How can the authors extract the SAS of the molecule?

The SAS of the molecule can be extracted in $O(\tilde{m} \log w)$ time (w.h.p.) and $O(\tilde{m})$ space using a DPG data structure, where $\tilde{m}$ is the number of atoms in the molecule.

What is the identifier of the corresponding face on each ball?

The authors store all exposed faces (if any) of A in a set F of size O (1), and with each face f the authors store pointers to the data structures of O (1) other balls that share edges with f and also the identifier of the corresponding face on each ball.

What is the simplest way to maintain the surface of a flexible molecule?

The surface of a flexible molecule decomposed into (mostly) rigid domains can be maintained using packing grid data structures so that (i) updating for a bending/shearing/twisting motion applied between two domains takes O(1 + m log w) time (w.h.p.), where m is the number of atoms in the connectors between the two domains; (ii) updating the conformation of a flexible loop or a side chain on the surface of a domain takes $O(\tilde{m} \log w)$ time (w.h.p.), where $\tilde{m}$ is the number of atoms affected by this change; and (iii) generating the surface of the entire molecule requires $O(\hat{m} \log w)$ time (w.h.p.), where $\hat{m}$ is the sum of the number of atoms on the surface of each domain.

How do the authors compute the SES of the molecule?

The authors compute the SES of the molecule in $O(\tilde{m} \log w)$ time (w.h.p.) and $O(\tilde{m})$ space using a DPG data structure D, and then use the method in [9] to choose the integration points and weights in O(N) time.

What is the way to simulate protein motions?

Protein coarse grained (CG) models which represent clusters of atoms with similar physical properties by CG beads and simplify the interactions significantly reduce the size of the system and therefore become a promising approach to reproduce large-scale protein motions.

What is the simplest way to generate the surface of a molecule?

Thus generating the surface of the entire molecule requires $O(\hat{m} \log w)$ time (w.h.p.), where $\hat{m}$ is the sum of the number of atoms on the surface of each domain.

What is the effect of the introduction of a new ball on the surface exposure of the set?

Observe that the introduction of a new ball may affect the surface exposure of only the balls it intersects (i.e., bury some/all of them partly or completely), and no other balls.

What is the function that is used to identify the two balls that can not intersect B?

Identifying Intersecting Balls: From S the authors remove the data structure of each ball that does not intersect B, and return the resulting (possibly reduced) set.

What are the other molecules used in the experiments?

In addition to the molecules used in the experiments of [19, 20], the authors ran their experiments on some viruses and ribosomes the authors are interested in.

(Open Access) A dynamic data structure for flexible molecular maintenance and informatics (2009) | Chandrajit L. Bajaj

Q: What is the way to maintain the surface of a molecule?

Packing grids can be used to maintain the surface of a flexible molecule decomposed into rigid domains so that applying a bending/shearing/twisting motion between two domains takes O (1 + m log w) time (w.h.p.), where m is the number of atoms in the connectors between the two domains.

A Dynamic Data Structure for Flexible Molecular Maintenance and Informatics ∗

Chandrajit Bajaj

Institute for Computational

Engineering and Science

University of Texas

Austin, TX 78712

bajaj@cs.utexas.edu

Rezaul Alam Chowdhury

Institute for Computational

Engineering and Science

University of Texas

Austin, TX 78712

shaikat@cs.utexas.edu

Muhibur Rasheed

Institute for Computational

Engineering and Science

University of Texas

Austin, TX 78712

muhibur@cs.utexas.edu

ABSTRACT

We present the “Dynamic Packing Grid” (DPG) data structure, along with details of our implementation and performance results, for maintaining and manipulating flexible molecular models and assemblies. DPG can efficiently maintain the molecular surface (e.g., van der Waals surface and the solvent contact surface) under insertion/deletion/movement (i.e., updates) of atoms or groups of atoms. DPG also permits the fast estimation of important molecular properties (e.g., surface area, volume, polarization energy, etc.) that are needed for computing binding affinities in drug design or in molecular dynamics calculations. DPG can additionally be utilized in efficiently maintaining multiple “rigid” domains of dynamic flexible molecules. In DPG, each update takes only O(log w) time w.h.p. on a RAM with w-bit words, i.e., O(1) time in practice, and hence is extremely fast. DPG's queries include the reporting of all atoms within $O(r_{\max})$ distance from any given atom center or point in 3-space in O(log log w) (= O(1)) time w.h.p., where $r_{\max}$ is the radius of the largest atom in the molecule. It can also answer whether a given atom is exposed or buried under the surface within the same time bound, and can return the entire molecular surface in O(m) worst-case time, where m is the number of atoms on the surface. The data structure uses space linear in the number of atoms in the molecule.

Categories and Subject Descriptors

I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling - boundary representations; curve, surface, solid, and object representations; geometric algorithms, languages, and systems; physically based modeling; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - computations on discrete structures; geometrical problems and computations; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD)

∗ This research was supported in part by NSF grant CNS-0540033 and NIH contracts R01-EB00487, R01-GM074258, R01-GM07308.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SIAM/ACM Joint Conference on Geometric and Physical Modeling 2009, San Francisco, California USA

General Terms

Algorithms, Design, Performance

Keywords

shape modeling, de novo drug design, computer aided design, interactive software, protein folding, molecular docking

1. INTRODUCTION

Many human functional processes are mediated through the interactions amongst proteins, a major molecular constituent of our anatomical makeup. A computational understanding of these interactions provides important clues for developing therapeutic interventions related to diseases such as cancer and metabolic disorders. Computational methods, such as automated docking through shape and energetic complementarity scoring, aim to gain insight into and predict such molecular interactions.

The most common model for proteins is a collection of atoms represented by spherical balls, with radii equal to their van der Waals radii [35, 16]. The surface of the union of these spheres is known as the van der Waals surface. Lee and Richards introduced the concept of accessibility to the solvent [31]. Proteins are not isolated, but commonly present in solutions, especially water. Also, the van der Waals surface contains too many internal atoms and patches which are not accessible by the solvent or any other protein that may bind to it. Hence, Lee and Richards gave a new definition for the protein surface or protein-solvent interface as the surface accessible to the watery solvent. They modeled water molecules as spheres with radius 1.4 Å, and considered the locus of the center of one such ‘probe’, as it rolled along the protein surface, as the Solvent Accessible Surface (SAS). Richards then gave a more commonly used definition for molecular surface as a set of contact and reentrant patches [42]. Though Connolly considered this an alternative definition of the SAS surface in [13], now it is commonly known as the Solvent Contact Surface (SCS), or Solvent Excluded Surface (SES) or simply the molecular surface/interface of the protein.

Protein interactions or protein-protein docking involves induced complementary fit between flexible protein interfaces, and additionally the interface conformational changes are often critical during the lock and key matching [43]. The flexible docking solution space, consisting of all relative positions, orientations and conformations of the proteins, is searched, and the putative dockings are evaluated using combinations of interface complementarity scoring and atomic pair-wise charged Coulombic interactions [27]. Since proteins function in their predominantly watery (solvent) environment, the computation of protein solvation energy (also known as protein-solvent interaction energy) also plays an important role in determining inter-molecular binding affinities “in-vivo” for drug screening, as well as in molecular dynamics simulations [52], and in the study of hydrophobicity and protein folding. When computing the solvation energy for molecules, it is crucial to correctly model and sample the protein-solvent interface.

Figure 1: Visualization of the Rice Dwarf Virus (RDV) nucleo-capsid, which contains 3.5 million atoms (left), and a Microtubule, which contains 1.2 million atoms (right), using TexMol (http://cvcweb.ices.utexas.edu/software/#TexMol). In this figure, atoms are color-coded using the standard Corey, Pauling, Koltun (CPK) color scheme.

Since Richards introduced the SES definition, a number of techniques have been devised for static construction of the molecular surface (e.g., [12, 13, 53, 17, 50, 3, 45, 44, 55, 23, 7, 6]). However, not much work has been done on dynamic maintenance of molecular surfaces. In [8] Bajaj et al. considered limited dynamic maintenance of molecular surfaces based on Non-Uniform Rational B-Splines (NURBS) descriptions for the patches. Eyal and Halperin [19, 20] presented an algorithm based on dynamic graph connectivity that updates the molecular surface after a conformational change in $O(\log^2 n)$ amortized time per affected (by this change) atom.

In this paper we present the Dynamic Packing Grid (DPG), a space and time efficient data structure that maintains a collection of balls (atoms) in 3-space allowing a range of spherical range queries and updates for rapid scoring of flexible protein-protein interactions. The efficiency of the data structure results from the assumption that the centers of two different balls in the collection cannot come arbitrarily close to each other, which is a natural property of molecules. A consequence of this assumption is that any ball in the collection can intersect at most a constant number of other balls. On a RAM with w-bit words, the data structure can report all balls intersecting a given ball or within $O(r_{\max})$ distance from a given point in O(log log w) time w.h.p., where $r_{\max}$ is the radius of the largest ball in the collection. It can also answer whether a given ball is exposed (i.e., lies on the union boundary) or buried within the same time bound. At any time the entire union boundary can be extracted from the data structure in O(m) time in the worst case, where m is the number of atoms on the boundary. Updates (i.e., insertion/deletion/movement of a ball) are supported in O(log w) time (w.h.p.). The data structure uses linear space. A packing grid can maintain both the van der Waals surface and the solvent contact surface (SCS) of a molecule within the performance bounds mentioned above. Packing grids can be used to maintain the surface of a flexible molecule decomposed into rigid domains so that applying a bending/shearing/twisting motion between two domains takes O(1 + m log w) time (w.h.p.), where m is the number of atoms in the connectors between the two domains. We also describe a Hierarchical Packing Grid (HPG) data structure that maintains a molecule at multiple resolutions (atomic and coarser) under updates, and can compute any mixed resolution surface efficiently. Packing grids can also aid in fast energetics calculation by rapidly locating the atoms close to each sampled quadrature point on the SCS.

DPG has potential applications in interactive software tools developed for de novo drug design (e.g., [30, 46, 18, 29]), protein folding (e.g., [28, 14]) and molecular docking (e.g., [33, 2]) that use human intuition and biological knowledge in order to steer the prediction process. These applications often need to handle extremely large molecules and macromolecules (e.g., as shown in Figure 1, the Rice Dwarf Virus has 3.5 million atoms and the Microtubule has 1.2 million), and need to perform a sequence of dynamic updates on them in real time. The Molecule Evaluator [30, 18] is a de novo molecular design software based on adaptive interactive evolution. In a series of interactive steps it applies a set of problem-specific mutation (e.g., add/remove atom, add/remove group) and recombination operators on a set of evolving molecules, and keeps track of several chemical and biological properties of each molecule (e.g., molecular mass, hydrophobicity, etc.). The ProteinShop software [28, 14] allows the interactive creation of protein structures (e.g., through shape manipulation) given an amino acid sequence and a sequence of predicted secondary structure types for each amino acid. DockingShop [33] is a successor of ProteinShop, which provides an interactive docking environment with flexibility of side chains and backbone movement. Users can adjust the receptor protein structure by rotating the backbone dihedral angles, changing the dihedral angles of selected residues, substituting the side chain of selected residues using a rotamer library, or changing a residue for another while keeping the backbone fixed. Figure 2 shows an example where the flexible movement/rearrangement of the “kl” β-hairpin on the envelope (E) Glycoprotein of dengue virus opens up a hydrophobic pocket for ligand binding, and the inhibitor n-octyl-β-D-glucoside docks into that pocket. VRDD [2] supports molecular visualization and interactive docking in a VR environment, and allows side-chain flexibility.

Figure 2: Figures (a) and (b) show the structure of a soluble fragment of the envelope (E) Glycoprotein from DV (dengue virus) type 2. Figure (a) shows the crystals grown in the presence (pre-fusion) of the detergent n-octyl-β-D-glucoside (β-OG, colored in green), and Figure (b) shows the same in its absence (post-fusion). The key difference between these two structures is a local rearrangement of the “kl” β-hairpin (residues 268-280) and the concomitant opening up of a hydrophobic pocket for ligand binding. In Figure (a) this pocket is occupied by a molecule of β-OG [36].

The molecular dynamic simulation tool IMD [49] allows interactive manipulation of bio-molecular systems. It combines interactive molecular visualization (using VMD [26]) with molecular dynamic simulation (using NAMD [38, 41]) in the background that supports manipulation of molecules by applying force to single atoms. Traditional all-atom molecular dynamics (MD) simulation reveals in detail the protein folding process, but it is restricted to small time scales on the order of nanoseconds [47] and small length ranges on the order of nanometers [32, 34]. To fully investigate the folding process of a protein into its functional structure, a larger timescale from micro- to milliseconds and a larger length scale of micrometers are needed [4]. Protein coarse grained (CG) models, which represent clusters of atoms with similar physical properties by CG beads and simplify the interactions, significantly reduce the size of the system and therefore become a promising approach to reproduce large-scale protein motions.

The DPG data structure also has potential applications in tracking the dynamic structure of a particle system as particles move, appear and disappear [5, 22, 25]. Particle systems are used for modeling a number of physical world scenarios ranging from cosmological systems and plasma physics to molecular systems, where particles are defined as smooth functions with compact support. The applications are wide and varied and include chemistry, material science, and bioengineering. The dynamic re-meshing problem for time dependent particle systems arises in gas hydrodynamics simulations essential in the computational investigation of the formation of large scale structures, such as galaxies and galaxy clusters, in the universe [25]. For the meshing of particle systems, it suffices to consider particles as idealized balls, or radially symmetric domains of support of their kernels.

The rest of the paper is organized as follows. We describe and analyze the packing grid data structure in Section 2. We give some preliminaries in Section 2.1, describe the layout of the data structure in Section 2.2, and describe and analyze the supported queries and updates in Section 2.3. In Section 3 we describe how to use packing grids for maintaining the surface of a molecule decomposed into rigid domains, and in Section 4 we describe hierarchical packing grids for maintaining mixed resolution surfaces. In Section 5 we describe some applications of packing grids. Our experimental results are included in Section 6.

2. THE DYNAMIC PACKING GRID DATA STRUCTURE

We describe the packing grid data structure for maintaining a set M of balls in 3-space efficiently under the following set of queries and updates. By B = (c, r) we denote a ball with center c and radius r.

Queries.

1. Intersect( c, r ): Returns all balls in M that intersect the given ball B = (c, r). The given ball may or may not belong to the set M.

2. Range( p, δ ): Returns all balls in M with centers within distance δ of point p. We assume that δ is at most a constant multiple of the radius of the largest ball in M.

3. Exposed( c, r ): Returns true if the ball B = (c, r) contributes to the outer boundary of the union of the balls in M. The given ball must belong to M.

4. Surface( ): Returns the outer boundary of the union of the balls in M. If there are multiple disjoint outer boundary surfaces defined by M, the routine returns any one of them.

Updates.

1. Add( c, r ): Add a new ball B = (c, r) to the set M.

2. Remove( c, r ): Remove the ball B = (c, r) from M.

3. Move( $c_1$, $c_2$, r ): Move the ball with center $c_1$ and radius r to a new center $c_2$.

We assume that at all times during the lifetime of the data structure the following holds.

Assumption 2.1. If $r_{\max}$ is the radius of the largest ball in M, and $d_{\min}$ is the minimum Euclidean distance between the centers of any two balls in M, then $r_{\max} = O(d_{\min})$.
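To make Assumption 2.1 concrete, the ratio $r_{\max}/d_{\min}$ can be measured directly on a small set of balls. The following is a minimal sketch and not part of the paper: the function name `packing_ratio`, the `(center, radius)` tuple layout and the brute-force scan over all pairs are illustrative assumptions.

```python
import math
from itertools import combinations


def packing_ratio(balls):
    """Return (r_max, d_min, r_max / d_min) for balls given as (center, radius).

    Brute force over all pairs, so it is only meant for small inputs.
    Assumes no two centers coincide (otherwise d_min is zero).
    """
    if len(balls) < 2:
        raise ValueError("need at least two balls to define d_min")
    r_max = max(radius for _, radius in balls)
    d_min = min(math.dist(c1, c2)
                for (c1, _), (c2, _) in combinations(balls, 2))
    return r_max, d_min, r_max / d_min


# Hypothetical input: three balls with made-up centers and radii.
balls = [((0.0, 0.0, 0.0), 1.0),
         ((1.5, 0.0, 0.0), 2.0),
         ((0.0, 3.0, 0.0), 1.0)]
r_max, d_min, ratio = packing_ratio(balls)
```

A ratio that stays bounded by a constant as the set grows is what the assumption asks for; the quadratic pair scan here is only for illustration and plays no role in the data structure itself.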

In general, a ball in a collection of n balls in 3-space can intersect Θ(n) other balls in the worst case, and it has been shown in [11] that the boundary defined by the union of these balls has a worst-case combinatorial complexity of $\Theta(n^2)$. However, if M is a “union of balls” representation of the atoms in a molecule, then Assumption 2.1 holds naturally [24, 51], and as proved in [24], in that case, both complexities improve by a factor of n. The following theorem states the consequences of the assumption.

Operations | Time complexity, assuming $t_q = O(\log \log w)$, $t_u = O(\log w)$ | Time complexity, assuming $t_q = O(\log \log n)$, $t_u = O(\log n / \log \log n)$
Range( p, δ ), Intersect( c, r ), Exposed( c, r ), with δ = $O(r_{\max})$ | O(log log w) (w.h.p.) | O(log log n) (w.h.p.)
Surface( ) | O(#balls on surface) (worst-case) | O(#balls on surface) (worst-case)
Add( c, r ), Remove( c, r ), Move( $c_1$, $c_2$, r ) | O(log w) (w.h.p.) | O(log n / log log n) (w.h.p.)

Assumptions: (i) RAM with w-bit words, (ii) collection of n balls, and (iii) $r_{\max}$ = O(minimum distance between two balls).

Table 1: Time complexities of the operations supported by the packing grid data structure.

Theorem 2.1. (Theorem 2.1 in [24], slightly modified) Let $M = \{B_1, \ldots, B_n\}$ be a collection of n balls in 3-space with radii $r_1, \ldots, r_n$ and centers at $c_1, \ldots, c_n$. Let $r_{\max} = \max_i \{r_i\}$ and let $d_{\min} = \min_{i \neq j} \{d(c_i, c_j)\}$, where $d(c_i, c_j)$ is the Euclidean distance between $c_i$ and $c_j$. Also let $\delta M = \{\delta B_1, \ldots, \delta B_n\}$ be the collection of spheres such that $\delta B_i$ is the boundary surface of $B_i$. If $r_{\max} = O(d_{\min})$ (i.e., Assumption 2.1 holds), then:

(i) Each $B_i \in M$ intersects at most $216 \cdot (r_{\max}/d_{\min})^3 = O(1)$ other balls in M.

(ii) The maximum combinatorial complexity of the boundary of the union of the balls in M is $O((r_{\max}/d_{\min})^3 \cdot n) = O(n)$.

Proof. Similar to the proof of Theorem 2.1 in [24].

Therefore, as Theorem 2.1 suggests, for intersection queries and boundary construction, one should be able to handle M more efficiently if Assumption 2.1 holds. The efficiency of our data structure, too, partly depends on this assumption.

2.1 Preliminaries

Before we describe our data structure we present several definitions in order to simplify the exposition.

Definition 2.1 (r-grid and grid-cell). An r-grid is an axis-parallel infinite grid structure in 3-space consisting of cells of size r × r × r (r ∈ ℝ) with the root (i.e., the corner with the smallest x, y, z coordinates) of one of the cells coinciding with the origin of the (Cartesian) coordinate axes. The grid cell that has its root at Cartesian coordinates (ar, br, cr) (where a, b, c ∈ ℤ) is referred to as the (a, b, c, r)-cell or simply as the (a, b, c)-cell when r is clear from the context.

Definition 2.2 (grid-line). The (b, c, r)-line (where b, c ∈ ℤ) in an r-grid consists of all (x, y, z, r)-cells with y and z fixed to b and c, respectively. When r is clear from the context the (b, c, r)-line will simply be called the (b, c)-line.

Observe that each cell on the (b, c, r)-line can be identified with a unique integer, e.g., the cell at index a ∈ ℤ on the given line corresponds to the (a, b, c, r)-cell in the r-grid.

Definition 2.3 (grid-plane). The (c, r)-plane (where c ∈ ℤ) in an r-grid consists of all (x, y, z, r)-cells with z fixed to c. The (c, r)-plane will be referred to as the c-plane when r is clear from the context.

The (c, r)-plane can be decomposed into an infinite number of lines, each identifiable with a unique integer. For example, index b ∈ ℤ uniquely identifies the (b, c, r)-line on the given plane. Also each grid-plane in the r-grid can be identified with a unique integer, e.g., the (c, r)-plane is identified by c.
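The three definitions above amount to integer floor division of the coordinates by r. The sketch below illustrates this under two assumptions that the text does not state: cells are treated as half-open boxes [ar, (a+1)r) along each axis, so a point on a shared face belongs to the cell with the larger index, and the helper names `cell_of`, `line_of` and `plane_of` are hypothetical.

```python
import math


def cell_of(point, r):
    """Index (a, b, c) of the r-grid cell whose box contains `point`.

    The cell's root is at (a*r, b*r, c*r); boxes are taken as half-open.
    """
    x, y, z = point
    return (math.floor(x / r), math.floor(y / r), math.floor(z / r))


def line_of(point, r):
    """Index (b, c) of the grid-line through the cell containing `point`."""
    _, b, c = cell_of(point, r)
    return (b, c)


def plane_of(point, r):
    """Index c of the grid-plane through the cell containing `point`."""
    return cell_of(point, r)[2]


# Hypothetical point in a 2.0-grid; note the negative y coordinate.
p = (0.5, -0.1, 3.9)
cell = cell_of(p, 2.0)
```

`math.floor` rather than `int()` is used so that negative coordinates map to negative indices instead of being truncated toward zero.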

The proof of the following lemma is straightforward.

Lemma 2.1. Let $M = \{B_1, \ldots, B_n\}$ be a collection of n balls in 3-space with radii $r_1, \ldots, r_n$ and centers at $c_1, \ldots, c_n$. Let $r_{\max} = \max_i \{r_i\}$ and let $d_{\min} = \min_{i \neq j} \{d(c_i, c_j)\}$, where $d(c_i, c_j)$ is the Euclidean distance between $c_i$ and $c_j$. Suppose M is stored in the $2r_{\max}$-grid G. Then

(i) If $r_{\max} = O(d_{\min})$ (i.e., Assumption 2.1 holds) then each grid-cell in G contains the centers of at most $64 \cdot (r_{\max}/d_{\min})^3 = O(1)$ balls in M.

(ii) Each ball in M intersects at most 8 grid-cells in G.

(iii) For a given ball B ∈ M with center in grid-cell C, the center of each ball intersecting B lies either in C or in one of the 26 grid-cells adjacent to C.

(iv) The number of non-empty (i.e., containing the center of at least one ball in M) grid-cells in G is at most n, and the same bound holds for grid-lines and grid-planes.
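Part (iii) of the lemma is what makes intersection queries local: only the cell holding the query ball's center and its 26 neighbours need to be scanned. The sketch below illustrates that idea with a plain Python dictionary keyed by cell index, which is a stand-in and not the range reporting structure the paper uses. The names `build_grid` and `intersecting` are hypothetical, touching balls are counted as intersecting, and the query ball is assumed to have radius at most $r_{\max}$.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(balls):
    """Bucket balls (center, radius) by the 2*r_max-grid cell of their center."""
    r_max = max(radius for _, radius in balls)
    side = 2.0 * r_max
    grid = defaultdict(list)
    for center, radius in balls:
        key = tuple(math.floor(x / side) for x in center)
        grid[key].append((center, radius))
    return grid, side


def intersecting(grid, side, ball):
    """Report stored balls that intersect `ball`, scanning only 27 cells."""
    center, radius = ball
    a, b, c = (math.floor(x / side) for x in center)
    hits = []
    for da, db, dc in product((-1, 0, 1), repeat=3):
        # .get avoids creating empty buckets in the defaultdict
        for other in grid.get((a + da, b + db, c + dc), ()):
            if other == ball:
                continue  # do not report the query ball itself
            other_center, other_radius = other
            if math.dist(center, other_center) <= radius + other_radius:
                hits.append(other)
    return hits


# Hypothetical data: three nearby balls and one far away.
balls = [((0.0, 0.0, 0.0), 1.0), ((1.5, 0.0, 0.0), 1.0),
         ((-0.5, 0.0, 0.0), 1.0), ((5.0, 5.0, 5.0), 1.0)]
grid, side = build_grid(balls)
near_origin = intersecting(grid, side, balls[0])
```

With part (i) bounding the number of centers per cell by a constant, the scan touches O(1) balls; the hash map gives expected constant-time cell lookup, which differs from the w.h.p. word-RAM bounds stated for the actual structure.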

At the heart of our data structure is a fully dynamic one-dimensional integer range reporting data structure for the word RAM described in [37]. The data structure in [37] maintains a set S of integers under updates (i.e., insertions and deletions), and answers queries of the form: report any or all points in S in a given interval. The following theorem summarizes the performance bounds of the data structure which are of interest to us.

Theorem 2.2. (proved in [37]) On a RAM with w-bit words the fully dynamic one-dimensional integer range reporting problem can be solved in linear space, and with high probability bounds of $O(t_u)$ and $O(t_q + k)$ on update time and query time, respectively, where k is the number of items reported, and

(i) $t_u = O(\log w)$ and $t_q = O(\log \log w)$ using the data structure in [37]; and

(ii) $t_u = O(\log n / \log \log n)$ and $t_q = O(\log \log n)$ using the data structure in [37] for small w and a fusion tree [21] for large w.

The data structure can be augmented to store satellite information of size O(1) with each integer without degrading its asymptotic performance bounds. Therefore, it supports the following three functions:

1. Insert( i, s ): Insert an integer i with satellite information s.

2. Delete( i ): Delete integer i from the data structure.

3. Query( l, h ): Return the set of all ⟨i, s⟩ tuples with i ∈ [l, h] stored in the data structure.
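The three functions can be mimicked with a sorted list and binary search, which is enough to experiment with the interface. This is a deliberately simple stand-in, not the structure of [37]: the class name `RangeReporter` is hypothetical, insertion and deletion cost O(n) in the worst case because list elements are shifted, and none of the bounds of Theorem 2.2 are claimed for it.

```python
import bisect


class RangeReporter:
    """Sorted-list stand-in for a dynamic integer range reporting structure."""

    def __init__(self):
        self._keys = []   # stored integers, kept sorted
        self._sat = {}    # integer -> satellite information

    def insert(self, i, s):
        """Insert integer i with satellite information s (overwrites s if present)."""
        if i not in self._sat:
            bisect.insort(self._keys, i)
        self._sat[i] = s

    def delete(self, i):
        """Delete integer i; does nothing if i is not stored."""
        if i in self._sat:
            del self._sat[i]
            self._keys.pop(bisect.bisect_left(self._keys, i))

    def query(self, l, h):
        """Return all (i, s) pairs with l <= i <= h, in increasing order of i."""
        lo = bisect.bisect_left(self._keys, l)
        hi = bisect.bisect_right(self._keys, h)
        return [(i, self._sat[i]) for i in self._keys[lo:hi]]


# Hypothetical usage: integers stand for cell indices, strings for pointers.
rr = RangeReporter()
rr.insert(5, "cell-5")
rr.insert(-2, "cell--2")
rr.insert(9, "cell-9")
window = rr.query(-2, 5)
```

In the packing grid, the satellite slot would hold a pointer to the next level down (a grid-plane, grid-line or grid-cell), which is why O(1)-size satellite data suffices.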

2.2 Description (Layout) of the Packing Grid Data Structure

We are now in a position to present our data structure. Let DPG be the data structure. We represent the entire 3-space as a $2r_{\max}$-grid (see Definition 2.1), and maintain the non-empty grid-planes (see Definition 2.3), grid-lines (see Definition 2.2) and grid-cells (see Definition 2.1) in DPG. A grid component (i.e., cell, line or plane) is non-empty if it contains the center of at least one ball in M. The data structure can be described hierarchically. It has a tree structure with 5 levels: 4 internal levels (levels 3, 2, 1 and 0) and an external level of leaves (see Figure 3). The description of each level follows.

The Leaf Level “Ball” Data Structure (DPG$_{-1}$). The data structure stores the center $c = (c_x, c_y, c_z)$ and the radius r of the given ball B. It also includes a Boolean flag exposed which is set to true if B contributes to the outer boundary of the union of the balls in M, and false otherwise. If another ball B′ intersects B, it does so on a circle which divides the boundary δB of B into two parts: one part is buried inside B′ and hence cannot contribute to the union boundary, and the other part is exposed w.r.t. B′ and hence might appear on the union boundary. The circular intersections of all balls intersecting B define a 2D arrangement A on δB which according to Theorem 2.1 has O(1) combinatorial complexity. A face of A is exposed, i.e., contributes to the union boundary, provided it is not buried inside any other ball. Observe that if at least one other ball intersects B, and A has an exposed face f, then each edge of f separates f from another exposed face f′ which belongs to the arrangement A′ of a ball intersecting B. We store all exposed faces (if any) of A in a set F of size O(1), and with each face f we store pointers to the data structures of O(1) other balls that share edges with f and also the identifier of the corresponding face on each ball. Observe that if B does not intersect any other balls then F will contain only a single face and no pointers to any other balls.
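The record described above can be sketched as two small classes. This is only an illustration of the fields named in the text: the class and attribute names (`Ball`, `Face`, `face_id`, `neighbours`) are hypothetical, the geometry of each face is left out entirely, and the paper's actual layout may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)  # identity comparison: balls may reference each other
class Face:
    face_id: int
    # One entry per ball sharing an edge with this face:
    # (that ball's record, identifier of the adjoining face on that ball).
    neighbours: List[Tuple["Ball", int]] = field(default_factory=list)


@dataclass(eq=False)
class Ball:
    center: Tuple[float, float, float]
    radius: float
    exposed: bool = True                               # on the union boundary?
    faces: List[Face] = field(default_factory=list)    # the set F of exposed faces


def isolated_ball(center, radius):
    """A ball intersecting no other ball: one exposed face, no neighbour pointers."""
    return Ball(center, radius, exposed=True, faces=[Face(face_id=0)])


# Hypothetical example: a lone atom of radius 1.5.
lone = isolated_ball((0.0, 0.0, 0.0), 1.5)
```

Storing, with each face, both the neighbouring ball and the identifier of the adjoining face is what would allow a traversal to step from one exposed face to the next across a shared edge.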

The Level 0 “Grid-Cell” Data Structure (DPG$_0$). The “grid-cell” data structure stores the root (see Definition 2.1) (a, b, c) of the grid-cell it corresponds to. A grid-cell can contain the centers of at most O(1) balls in M (see Lemma 2.1). Pointers to data structures of all such balls are stored in a set S of size O(1). Since we create “grid-cell” data structures only for non-empty grid-cells, there will be at most n (and possibly ≪ n) such data structures, where n is the current number of balls in M.

Figure 3: Hierarchical structure of DPG.

The Level 1 “Grid-Line” Data Structure (DPG$_1$). We create a “grid-line” data structure for a (b, c)-line provided it contains at least one non-empty grid-cell. The data structure stores the values of b and c. Each (a, b, c)-cell lying on this line is identified with the unique integer a, and the identifier of each such non-empty grid-cell is stored in an integer range search data structure RR as described in Section 2.1 (see Theorem 2.2). We augment RR to store the pointer to the corresponding “grid-cell” data structure with each identifier it stores. The total number of “grid-line” data structures created is upper bounded by n and is possibly much less than n.

The Level 2 “Grid-Plane” Data Structure (DPG$_2$). A “grid-plane” data structure is created for a c-plane provided it contains at least one non-empty grid-line. Similar to the “grid-line” data structure, it identifies each non-empty (b, c)-line lying on the c-plane with the unique integer b, and stores the identifiers in a range reporting data structure RR described in Section 2.1. A pointer to the corresponding “grid-line” data structure is also stored with each identifier. The data structure also stores c. The total number of “grid-plane” data structures created cannot exceed n, and will possibly be much less than n.

The Level 3 “Grid” Data Structure (DPG_3). This data structure maintains the non-empty grid-planes of the 2r_max grid in an integer range reporting data structure RR (see Section 2.1). Each c-plane is identified by the unique integer c, and each such integer stored in RR is also accompanied by a pointer to the corresponding “grid-plane” data structure. The “grid” data structure also stores a surface-root pointer which points to the “Ball” data structure of an arbitrary exposed ball in M.
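The four levels described above can be put together in a compact sketch. Nested Python dictionaries stand in for the RR structure at each level, so this supports exact lookups only and says nothing about range queries or the stated time bounds. The class `PackingGrid` and its methods are hypothetical, the cell root is assumed to be the floor of each center coordinate divided by 2·r_max, and the surface-root pointer is left unset because exposure is not computed here.

```python
import math


class PackingGrid:
    """Sketch of the grid -> plane -> line -> cell hierarchy using nested dicts."""

    def __init__(self, r_max):
        self.side = 2.0 * r_max
        self.planes = {}          # c -> {b -> {a -> [balls centered in cell (a, b, c)]}}
        self.surface_root = None  # would reference an arbitrary exposed ball

    def _root(self, center):
        # Assumed cell indexing: floor of each coordinate over the cell side.
        return tuple(math.floor(x / self.side) for x in center)

    def insert(self, center, radius):
        a, b, c = self._root(center)
        line = self.planes.setdefault(c, {}).setdefault(b, {})
        line.setdefault(a, []).append((center, radius))

    def delete(self, center, radius):
        a, b, c = self._root(center)
        plane = self.planes.get(c)
        line = plane.get(b) if plane else None
        balls = line.get(a) if line else None
        if not balls or (center, radius) not in balls:
            return False
        balls.remove((center, radius))
        # Prune components that became empty so only non-empty ones are kept.
        if not balls:
            del line[a]
        if not line:
            del plane[b]
        if not plane:
            del self.planes[c]
        return True

    def cell(self, a, b, c):
        # Descend plane (c), then line (b), then cell (a).
        return self.planes.get(c, {}).get(b, {}).get(a, [])

    def component_counts(self):
        n_planes = len(self.planes)
        n_lines = sum(len(p) for p in self.planes.values())
        n_cells = sum(len(l) for p in self.planes.values() for l in p.values())
        return n_planes, n_lines, n_cells
```

Since every stored plane contains a stored line and every stored line contains a stored cell, the three counts returned by `component_counts` are each at most the number of balls, which matches the per-level bounds stated above.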

We have the following lemma on the space usage of the data

structure.

Lemma 2.2. Let M be a collection of n balls as defined in Theorem 2.1, and suppose Assumption 2.1 holds. Then the packing grid data structure storing M uses O(n) space.

Proof. The space usage of the data structure is dominated by the space used by the range reporting data structures, the grid-cells and the “ball” data structures. Since the range reporting data structures use linear space (see Theorem 2.2) and the total number of non-empty grid components (i.e., planes, lines and cells) is O(n) (see Lemma 2.1), the total space used by all such data structures is O(n). The grid cells
