Author

Mark A. Moraes

Other affiliations: Columbia University
Bio: Mark A. Moraes is an academic researcher from D. E. Shaw Research. The author has contributed to research in topics: Massively parallel & Electronic mail. The author has an h-index of 11 and has co-authored 14 publications receiving 4821 citations. Previous affiliations of Mark A. Moraes include Columbia University.

Papers
Proceedings ArticleDOI
11 Nov 2006
TL;DR: This work presents several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes, including a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time.
Abstract: Although molecular dynamics (MD) simulations of biomolecular systems often run for days to months, many events of great scientific interest and pharmaceutical relevance occur on long time scales that remain beyond reach. We present several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes. These include a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time. We have also developed numerical techniques that maintain high accuracy while using single precision computation in order to exploit processor-level vector instructions. These methods are embodied in a newly developed MD code called Desmond that achieves unprecedented simulation throughput and parallel scalability on commodity clusters. Our results suggest that Desmond's parallel performance substantially surpasses that of any previously described code. For example, on a standard benchmark, Desmond's performance on a conventional Opteron cluster with 2K processors slightly exceeded the reported performance of IBM's Blue Gene/L machine with 32K processors running its Blue Matter MD code.

2,035 citations

Journal ArticleDOI
01 Jul 2008
TL;DR: A massively parallel machine called Anton is described, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems and has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation.
Abstract: The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macromolecules could in principle provide answers to some of the most important currently outstanding questions in the fields of biology, chemistry, and medicine. A wide range of biologically interesting phenomena, however, occur over timescales on the order of a millisecond, several orders of magnitude beyond the duration of the longest current MD simulations. We describe a massively parallel machine called Anton, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems. The machine, which is scheduled for completion by the end of 2008, is based on 512 identical MD-specific ASICs that interact in a tightly coupled manner using a specialized high-speed communication network. Anton has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation. The remainder of the simulation algorithm is executed by a programmable portion of each chip that achieves a substantial degree of parallelism while preserving the flexibility necessary to accommodate anticipated advances in physical models and simulation methods.

778 citations

Patent
19 Apr 1996
TL;DR: This patent describes a system for providing scheduled messages to a remote user in a batch oriented system: the user creates and/or reads electronic mail locally while a message is displayed on a portion of the local monitor, the message preferably changing in accordance with a local display schedule and stored on a local storage device.
Abstract: A system for providing scheduled messages to a remote user in a batch oriented system. In a preferred embodiment of the present invention, a user creates and/or reads electronic mail locally. While the user creates the electronic mail, a message is displayed to the user on a portion of the local monitor, the message preferably changing in accordance with a local display schedule and stored on a local storage device. The message is preferably targeted to the particular user. When the user is ready to transmit the e-mail created and/or receive e-mail addressed to him, the user's local client establishes a connection via a modem with a remote e-mail server system. The remote e-mail server system not only receives the e-mail transmitted by the user and/or transmits e-mail addressed to the user, but also updates the user's local messages in accordance with a distribution schedule. After the e-mail and message updates are transmitted, the user's local client computer is disconnected from the remote e-mail server system.

560 citations

Patent
11 Apr 1997
TL;DR: This patent describes a system for providing scheduled messages to a remote user in a batch oriented system: the user creates and/or reads electronic mail locally while a message is displayed on a portion of the local monitor, the message preferably changing in accordance with a local display schedule and stored on a local storage device.
Abstract: A system for providing scheduled messages to a remote user in a batch oriented system. In a preferred embodiment of the present invention, a user creates and/or reads electronic mail locally. While the user creates the electronic mail, a message is displayed to the user on a portion of the local monitor, the message preferably changing in accordance with a local display schedule and stored on a local storage device. The message is preferably targeted to the particular user. When the user is ready to transmit the e-mail created and/or receive e-mail addressed to him, the user's local client establishes a connection via a modem with a remote e-mail server system. The remote e-mail server system not only receives the e-mail transmitted by the user and/or transmits e-mail addressed to the user, but also updates the user's local messages in accordance with a distribution schedule. After the e-mail and message updates are transmitted, the user's local client computer is disconnected from the remote e-mail server system.

510 citations

Proceedings ArticleDOI
16 Nov 2014
TL;DR: The architecture of Anton 2 is tailored for fine-grained event-driven operation, which improves performance by increasing the overlap of computation with communication, and also allows a wider range of algorithms to run efficiently, enabling many new software-based optimizations.
Abstract: Anton 2 is a second-generation special-purpose supercomputer for molecular dynamics simulations that achieves significant gains in performance, programmability, and capacity compared to its predecessor, Anton 1. The architecture of Anton 2 is tailored for fine-grained event-driven operation, which improves performance by increasing the overlap of computation with communication, and also allows a wider range of algorithms to run efficiently, enabling many new software-based optimizations. A 512-node Anton 2 machine, currently in operation, is up to ten times faster than Anton 1 with the same number of nodes, greatly expanding the reach of all-atom biomolecular simulations. Anton 2 is the first platform to achieve simulation rates of multiple microseconds of physical time per day for systems with millions of atoms. Demonstrating strong scaling, the machine simulates a standard 23,558-atom benchmark system at a rate of 85 µs/day, 180 times faster than any commodity hardware platform or general-purpose supercomputer.

509 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations
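
The third decomposition named in the abstract above, in which each processor owns a fixed spatial region, can be illustrated with a short sketch. This is not the paper's implementation: the function name `assign_atoms_to_regions`, the cubic periodic box, and the regular grid of regions (one per hypothetical processor) are all assumptions made for the example.

```python
import numpy as np


def assign_atoms_to_regions(positions, box_length, grid):
    """Return, for each atom, the flat index of the spatial region that owns it.

    Assumptions (hypothetical, for illustration only): a cubic periodic box of
    side `box_length`, split into a regular `grid` of equally sized regions,
    with one region per processor.
    """
    positions = np.asarray(positions, dtype=float)
    dims = tuple(int(g) for g in grid)
    cell_size = box_length / np.array(dims, dtype=float)

    # Wrap coordinates into [0, box_length) so atoms outside the box
    # are mapped to their periodic image.
    wrapped = np.mod(positions, box_length)

    # Integer region coordinates along each axis.
    cell = np.floor(wrapped / cell_size).astype(int)
    # Guard against floating-point rounding at the upper box edge.
    cell = np.minimum(cell, np.array(dims) - 1)

    # Flatten (ix, iy, iz) into a single owner index in row-major order.
    return np.ravel_multi_index(tuple(cell.T), dims)


if __name__ == "__main__":
    atoms = [[0.5, 0.5, 0.5], [9.5, 9.5, 9.5], [-0.5, 0.5, 0.5]]
    print(assign_atoms_to_regions(atoms, 10.0, (2, 2, 2)))
```

With a box of side 10 split into a 2 x 2 x 2 grid, the three atoms map to regions 0, 7, and 4; the third atom lies outside the box and is wrapped to its periodic image first. Because ownership depends only on position, atoms would need to be reassigned as they move, which is the bookkeeping cost a spatial scheme trades for local communication.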

Journal ArticleDOI
16 Sep 2020-Nature
TL;DR: This paper reviews how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data, and discusses NumPy's evolution into a flexible interoperability layer between increasingly specialized computational libraries.
Abstract: Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves [1] and in the first imaging of a black hole [2]. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.

7,624 citations
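
The abstract above does not enumerate the "fundamental array concepts" it refers to. As a hedged illustration, the sketch below shows three that are commonly associated with NumPy: reductions along an axis, broadcasting, and boolean-mask indexing. The temperature values and variable names are made up for the example.

```python
import numpy as np

# Made-up data: rows are 2 measurement sites, columns are 3 days.
temps = np.array([[20.0, 22.0, 19.0],
                  [25.0, 27.0, 24.0]])

# Reduction: mean over the day axis, keeping a (2, 1) shape.
site_mean = temps.mean(axis=1, keepdims=True)

# Broadcasting: the (2, 1) means are stretched across the (2, 3) data,
# so each site's mean is subtracted from every day without a loop.
anomalies = temps - site_mean

# Boolean-mask indexing: select every reading above a threshold.
warm_days = temps[temps > 21.0]

if __name__ == "__main__":
    print(anomalies)
    print(warm_days)
```

Each step operates on whole arrays rather than individual elements, which is the style of programming the abstract describes as compact and expressive.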

Journal ArticleDOI
TL;DR: A range of new simulation algorithms and features developed during the past 4 years are presented, leading up to the GROMACS 4.5 software package, which provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.
Abstract: Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online.

6,029 citations