Journal ArticleDOI

OpenMP: an industry standard API for shared-memory programming

01 Jan 1998 - Vol. 5, Iss. 1, pp. 46-55
TL;DR: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and, separately, C and C++) to express shared-memory parallelism, leaving the base language unspecified.
Abstract: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and, separately, C and C++) to express shared-memory parallelism. It leaves the base language unspecified, and vendors can implement OpenMP in any Fortran compiler. Naturally, to support pointers and allocatables, Fortran 90 and Fortran 95 require the OpenMP implementation to include additional semantics beyond Fortran 77. OpenMP leverages many of the X3H5 concepts while extending them to support coarse-grain parallelism. The standard also includes a callable runtime library with accompanying environment variables.
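
To make the directive-plus-runtime split concrete, here is a minimal illustrative sketch in C (not taken from the paper): a parallel-for directive expresses the loop-level parallelism, omp_get_max_threads is one of the callable runtime library routines, and the OMP_NUM_THREADS environment variable controls the thread count.

    #include <stdio.h>
    #include <omp.h>   /* callable runtime library routines */

    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        double sum = 0.0;

        /* Compiler directive: split the iterations across threads and
           combine the per-thread partial sums with a reduction. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            sum += a[i];
        }

        /* Runtime library call; the thread count can be set via the
           OMP_NUM_THREADS environment variable. */
        printf("max threads: %d, sum = %.0f\n", omp_get_max_threads(), sum);
        return 0;
    }

Compiled with, e.g., gcc -fopenmp example.c; without an OpenMP-aware compiler the directive is ignored and the loop runs serially, which is the portability property the paper emphasizes.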
Citations
Journal ArticleDOI
TL;DR: PTRAJ and its successor CPPTRAJ are described, two complementary, portable, and freely available computer programs for the analysis and processing of time series of three-dimensional atomic positions and the data therein derived.
Abstract: We describe PTRAJ and its successor CPPTRAJ, two complementary, portable, and freely available computer programs for the analysis and processing of time series of three-dimensional atomic positions (i.e., coordinate trajectories) and the data therein derived. Common tools include the ability to manipulate the data to convert among trajectory formats, process groups of trajectories generated with ensemble methods (e.g., replica exchange molecular dynamics), image with periodic boundary conditions, create average structures, strip subsets of the system, and perform calculations such as RMS fitting, measuring distances, B-factors, radii of gyration, radial distribution functions, and time correlations, among other actions and analyses. Both the PTRAJ and CPPTRAJ programs and source code are freely available under the GNU General Public License version 3 and are currently distributed within the AmberTools 12 suite of support programs that make up part of the Amber package of computer programs (see http://ambe...
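
As a flavor of the per-frame calculations listed above, here is a minimal sketch in C of an unweighted radius-of-gyration computation for a single frame. The coordinates and the lack of mass weighting are illustrative assumptions; this is not CPPTRAJ source code.

    #include <stdio.h>
    #include <math.h>

    #define NATOMS 4

    /* Unweighted radius of gyration for one frame:
       Rg^2 = (1/N) * sum_i |r_i - r_com|^2 */
    int main(void) {
        double xyz[NATOMS][3] = {
            {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0}  /* toy coordinates */
        };
        double com[3] = {0, 0, 0};

        /* center of geometry (mass weighting omitted for brevity) */
        for (int i = 0; i < NATOMS; i++)
            for (int k = 0; k < 3; k++)
                com[k] += xyz[i][k] / NATOMS;

        double sum = 0.0;
        for (int i = 0; i < NATOMS; i++)
            for (int k = 0; k < 3; k++) {
                double d = xyz[i][k] - com[k];
                sum += d * d;
            }

        printf("Rg = %.3f\n", sqrt(sum / NATOMS));
        return 0;
    }

In practice such a quantity would be accumulated over every frame of a coordinate trajectory, which is the time-series processing the abstract describes.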

4,382 citations

Journal ArticleDOI
TL;DR: Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base-pairing probabilities, and thermodynamic properties of nucleic acids.
Abstract: Background: Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties.
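
The energy-based recurrences behind such programs are elaborate; as a hedged illustration of the dynamic-programming idea only, the sketch below implements the much simpler Nussinov base-pair maximization in C. The sequence and pairing rule are toy assumptions, not the thermodynamic model the abstract refers to.

    #include <stdio.h>

    #define L 9   /* toy sequence length */

    /* Watson-Crick and wobble pairing check (illustrative). */
    static int can_pair(char a, char b) {
        return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
               (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
               (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
    }

    int main(void) {
        const char *s = "GGGAAAUCC";   /* toy RNA sequence of length L */
        int m[L][L] = {{0}};           /* m[i][j]: max pairs in s[i..j] */

        /* Fill by increasing subsequence span, as in all such DP folders. */
        for (int span = 1; span < L; span++)
            for (int i = 0; i + span < L; i++) {
                int j = i + span;
                int best = m[i + 1][j];            /* s[i] left unpaired */
                for (int k = i + 1; k <= j; k++)   /* s[i] pairs with s[k] */
                    if (can_pair(s[i], s[k])) {
                        int v = 1 + (k > i + 1 ? m[i + 1][k - 1] : 0)
                                  + (k < j ? m[k + 1][j] : 0);
                        if (v > best) best = v;
                    }
                m[i][j] = best;
            }

        printf("maximum base pairs: %d\n", m[0][L - 1]);
        return 0;
    }

Energy-based folders replace the "+1 per pair" objective with measured stacking and loop energies, but the O(n^3) fill over increasing spans has the same shape.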

3,620 citations

Journal ArticleDOI
TL;DR: The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
Abstract: We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines, and other methods), implemented in Weka, R (with and without the caret package), C, and Matlab, including all the relevant classifiers available today. We use 121 data sets, representing the whole UCI database (excluding the large-scale problems) plus other real problems of our own, in order to reach conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0, and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
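
The "% of maximum accuracy" figures normalize each classifier against the best result obtained on each data set. On one plausible reading of that metric, the computation looks like the following sketch in C; the accuracies, the two-classifier setup, and the averaging scheme are illustrative assumptions, not the paper's data or code.

    #include <stdio.h>

    #define DATASETS 3
    #define CLASSIFIERS 2

    int main(void) {
        /* acc[c][d]: accuracy of classifier c on data set d (toy numbers). */
        double acc[CLASSIFIERS][DATASETS] = {
            {0.90, 0.80, 0.70},   /* e.g. a random-forest variant */
            {0.85, 0.82, 0.65},   /* e.g. an SVM variant */
        };

        for (int c = 0; c < CLASSIFIERS; c++) {
            double total = 0.0;
            for (int d = 0; d < DATASETS; d++) {
                /* best accuracy achieved by any classifier on data set d */
                double best = 0.0;
                for (int k = 0; k < CLASSIFIERS; k++)
                    if (acc[k][d] > best) best = acc[k][d];
                total += acc[c][d] / best;   /* fraction of the maximum */
            }
            printf("classifier %d: %.1f%% of maximum accuracy\n",
                   c, 100.0 * total / DATASETS);
        }
        return 0;
    }

Normalizing per data set before averaging keeps easy data sets (where everyone scores high) from swamping hard ones, which is why a relative measure is used instead of raw mean accuracy.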

2,616 citations

Journal ArticleDOI
TL;DR: ProtTest 3, a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters, includes new features and extended capabilities.
Abstract: Summary: We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. Availability: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket (https://bitbucket.org/). Contact: dposada@uvigo.es Supplementary information: Supplementary data are available at Bioinformatics online.

2,210 citations


Cites methods from "OpenMP: an industry standard API for shared-memory programming"

  • ...(3) A hybrid implementation MPJ - OpenMP (Dagum and Menon, 1998) to obtain maximum scalability in architectures with both shared and distributed memory (e.g. multicore HPC clusters)....

    [...]

  • ...Up to 4 MPJ Express processes per node and at least 2 OpenMP threads for each ML optimization were executed, gaining speed through the distribution of tasks among nodes while taking advantage of multicore processors within nodes.... (A hybrid MPI + OpenMP sketch follows this list.)

    [...]

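
The hybrid scheme quoted above pairs distributed-memory processes across nodes with shared-memory threads inside each node. As an illustration only, here is a minimal hybrid sketch in C using MPI plus OpenMP; ProtTest 3 itself uses the Java-based MPJ Express rather than C MPI, so the structure, not the names, is the point.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    /* Hybrid sketch: MPI processes span nodes (distributed memory),
       OpenMP threads share memory within each process. Illustrative
       only; not ProtTest 3 code. */
    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process fans its share of the work out to threads,
           e.g. one model-likelihood optimization per thread. */
        #pragma omp parallel
        {
            printf("process %d of %d, thread %d of %d\n",
                   rank, size,
                   omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Launched with, e.g., mpirun -np 4 with OMP_NUM_THREADS=2, this mirrors the cited configuration of several processes per node and at least two threads per optimization.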

01 Jan 2011
TL;DR: ProtTest 3, as mentioned in this paper, is an HPC version of ProtTest that can be run in parallel on multi-core desktops and clusters and includes new features and extended capabilities.
Abstract: Summary: We have implemented a High Performance Computing (HPC) version of ProtTest (Abascal et al., 2007) that can be executed in parallel in multi-core desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. Availability: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket.

1,889 citations

References
Book
01 Jan 1995
TL;DR: This book discusses the design and implementation of scalable shared-memory systems and the techniques used to build them.
Abstract: Foreword. Preface.
Part 1 General Concepts
Chapter 1 Multiprocessing and Scalability: 1.1 Multiprocessor Architecture; 1.1.1 Single versus Multiple Instruction Streams; 1.1.2 Message-Passing versus Shared-Memory Architectures; 1.2 Cache Coherence; 1.2.1 Uniprocessor Caches; 1.2.2 Multiprocessor Caches; 1.3 Scalability; 1.3.1 Scalable Interconnection Networks; 1.3.2 Scalable Cache Coherence; 1.3.3 Scalable I/O; 1.3.4 Summary of Hardware Architecture Scalability; 1.3.5 Scalability of Parallel Software; 1.4 Scaling and Processor Grain Size; 1.5 Chapter Conclusions
Chapter 2 Shared-Memory Parallel Programs: 2.1 Basic Concepts; 2.2 Parallel Application Set; 2.2.1 MP3D; 2.2.2 Water; 2.2.3 PTHOR; 2.2.4 LocusRoute; 2.2.5 Cholesky; 2.2.6 Barnes-Hut; 2.3 Simulation Environment; 2.3.1 Basic Program Characteristics; 2.4 Parallel Application Execution Model; 2.5 Parallel Execution under a PRAM Memory Model; 2.6 Parallel Execution with Shared Data Uncached; 2.7 Parallel Execution with Shared Data Cached; 2.8 Summary of Results with Different Memory System Models; 2.9 Communication Behavior of Parallel Applications; 2.10 Communication-to-Computation Ratios; 2.11 Invalidation Patterns; 2.11.1 Classification of Data Objects; 2.11.2 Average Invalidation Characteristics; 2.11.3 Basic Invalidation Patterns for Each Application; 2.11.4 MP3D; 2.11.5 Water; 2.11.6 PTHOR; 2.11.7 LocusRoute; 2.11.8 Cholesky; 2.11.9 Barnes-Hut; 2.11.10 Summary of Individual Invalidation Distributions; 2.11.11 Effect of Problem Size; 2.11.12 Effect of Number of Processors; 2.11.13 Effect of Finite Caches and Replacement Hints; 2.11.14 Effect of Cache Line Size; 2.11.15 Invalidation Patterns Summary; 2.12 Chapter Conclusions
Chapter 3 System Performance Issues: 3.1 Memory Latency; 3.2 Memory Latency Reduction; 3.2.1 Nonuniform Memory Access (NUMA); 3.2.2 Cache-Only Memory Architecture (COMA); 3.2.3 Direct Interconnect Networks; 3.2.4 Hierarchical Access; 3.2.5 Protocol Optimizations; 3.2.6 Latency Reduction Summary; 3.3 Latency Hiding; 3.3.1 Weak Consistency Models; 3.3.2 Prefetch; 3.3.3 Multiple-Context Processors; 3.3.4 Producer-Initiated Communications; 3.3.5 Latency Hiding Summary; 3.4 Memory Bandwidth; 3.4.1 Hot Spots; 3.4.2 Synchronization Support; 3.5 Chapter Conclusions
Chapter 4 System Implementation: 4.1 Scalability of System Costs; 4.1.1 Directory Storage Overhead; 4.1.2 Sparse Directories; 4.1.3 Hierarchical Directories; 4.1.4 Summary of Directory Storage Overhead; 4.2 Implementation Issues and Design Correctness; 4.2.1 Unbounded Number of Requests; 4.2.2 Distributed Memory Operations; 4.2.3 Request Starvation; 4.2.4 Error Detection and Fault Tolerance; 4.2.5 Design Verification; 4.3 Chapter Conclusions
Chapter 5 Scalable Shared-Memory Systems: 5.1 Directory-Based Systems; 5.1.1 DASH; 5.1.2 Alewife; 5.1.3 S3.mp; 5.1.4 IEEE Scalable Coherent Interface; 5.1.5 Convex Exemplar; 5.2 Hierarchical Systems; 5.2.1 Encore GigaMax; 5.2.2 ParaDiGM; 5.2.3 Data Diffusion Machine; 5.2.4 Kendall Square Research KSR-1 and KSR-2; 5.3 Reflective Memory Systems; 5.3.1 Plus; 5.3.2 Merlin and Sesame; 5.4 Non-Cache-Coherent Systems; 5.4.1 NYU Ultracomputer; 5.4.2 IBM RP3 and BBN TC2000; 5.4.3 Cray Research T3D; 5.5 Vector Supercomputer Systems; 5.5.1 Cray Research Y-MP C90; 5.5.2 Tera Computer MTA; 5.6 Virtual Shared-Memory Systems; 5.6.1 Ivy and Munin/Treadmarks; 5.6.2 J-Machine; 5.6.3 MIT/Motorola *T and *T-NG; 5.7 Chapter Conclusions
Part 2 Experience with DASH
Chapter 6 DASH Prototype System: 6.1 System Organization; 6.1.1 Cluster Organization; 6.1.2 Directory Logic; 6.1.3 Interconnection Network; 6.2 Programmer's Model; 6.3 Coherence Protocol; 6.3.1 Nomenclature; 6.3.2 Basic Memory Operations; 6.3.3 Prefetch Operations; 6.3.4 DMA/Uncached Operations; 6.4 Synchronization Protocol; 6.4.1 Granting Locks; 6.4.2 Fetch&Op Variables; 6.4.3 Fence Operations; 6.5 Protocol General Exceptions; 6.6 Chapter Conclusions
Chapter 7 Prototype Hardware Structures: 7.1 Base Cluster Hardware; 7.1.1 SGI Multiprocessor Bus (MPBUS); 7.1.2 SGI CPU Board; 7.1.3 SGI Memory Board; 7.1.4 SGI I/O Board; 7.2 Directory Controller; 7.3 Reply Controller; 7.4 Pseudo-CPU; 7.5 Network and Network Interface; 7.6 Performance Monitor; 7.7 Logic Overhead of Directory-Based Coherence; 7.8 Chapter Conclusions
Chapter 8 Prototype Performance Analysis: 8.1 Base Memory Performance; 8.1.1 Overall Memory System Bandwidth; 8.1.2 Other Memory Bandwidth Limits; 8.1.3 Processor Issue Bandwidth and Latency; 8.1.4 Interprocessor Latency; 8.1.5 Summary of Memory System Bandwidth and Latency; 8.2 Parallel Application Performance; 8.2.1 Application Run-time Environment; 8.2.2 Application Speedups; 8.2.3 Detailed Case Studies; 8.2.4 Application Speedup Summary; 8.3 Protocol Effectiveness; 8.3.1 Base Protocol Features; 8.3.2 Alternative Memory Operations; 8.4 Chapter Conclusions
Part 3 Future Trends
Chapter 9 TeraDASH: 9.1 TeraDASH System Organization; 9.1.1 TeraDASH Cluster Structure; 9.1.2 Intracluster Operations; 9.1.3 TeraDASH Mesh Network; 9.1.4 TeraDASH Directory Structure; 9.2 TeraDASH Coherence Protocol; 9.2.1 Required Changes for the Scalable Directory Structure; 9.2.2 Enhancements for Increased Protocol Robustness; 9.2.3 Enhancements for Increased Performance; 9.3 TeraDASH Performance; 9.3.1 Access Latencies; 9.3.2 Potential Application Speedup; 9.4 Chapter Conclusions
Chapter 10 Conclusions and Future Directions: 10.1 SSMP Design Conclusions; 10.2 Current Trends; 10.3 Future Trends
Appendix: Multiprocessor Systems. References. Index.

146 citations

Book
01 Jan 1991

14 citations