Journal ArticleDOI

OpenMP: an industry standard API for shared-memory programming

01 Jan 1998 - Vol. 5, Iss. 1, pp. 46-55
TL;DR: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and, separately, C and C++) to express shared-memory parallelism, leaving the base language unspecified.
Abstract: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and, separately, C and C++) to express shared-memory parallelism. It leaves the base language unspecified, and vendors can implement OpenMP in any Fortran compiler. Naturally, to support pointers and allocatables, Fortran 90 and Fortran 95 require the OpenMP implementation to include additional semantics beyond Fortran 77. OpenMP leverages many of the X3H5 concepts while extending them to support coarse-grain parallelism. The standard also includes a callable runtime library with accompanying environment variables.
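
To make the directive-plus-runtime split concrete, here is a minimal illustrative sketch in C (not taken from the paper): a parallel-for directive expresses the loop-level parallelism, omp_get_max_threads is one of the callable runtime library routines, and the OMP_NUM_THREADS environment variable controls the thread count.

    #include <stdio.h>
    #include <omp.h>   /* callable runtime library routines */

    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        double sum = 0.0;

        /* Compiler directive: split the iterations across threads and
           combine the per-thread partial sums with a reduction. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = (double)i;
            sum += a[i];
        }

        /* Runtime library call; the thread count can be set via the
           OMP_NUM_THREADS environment variable. */
        printf("max threads: %d, sum = %.0f\n", omp_get_max_threads(), sum);
        return 0;
    }

Compiled with, e.g., gcc -fopenmp example.c; without an OpenMP-aware compiler the directive is ignored and the loop runs serially, which is the portability property the paper emphasizes.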
Citations
Journal ArticleDOI
TL;DR: PTRAJ and its successor CPPTRAJ are described, two complementary, portable, and freely available computer programs for the analysis and processing of time series of three-dimensional atomic positions and the data therein derived.
Abstract: We describe PTRAJ and its successor CPPTRAJ, two complementary, portable, and freely available computer programs for the analysis and processing of time series of three-dimensional atomic positions (i.e., coordinate trajectories) and the data therein derived. Common tools include the ability to manipulate the data to convert among trajectory formats, process groups of trajectories generated with ensemble methods (e.g., replica exchange molecular dynamics), image with periodic boundary conditions, create average structures, strip subsets of the system, and perform calculations such as RMS fitting, measuring distances, B-factors, radii of gyration, radial distribution functions, and time correlations, among other actions and analyses. Both the PTRAJ and CPPTRAJ programs and source code are freely available under the GNU General Public License version 3 and are currently distributed within the AmberTools 12 suite of support programs that make up part of the Amber package of computer programs (see http://ambe...
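
As a flavor of the per-frame calculations listed above, here is a minimal sketch in C of an unweighted radius-of-gyration computation for a single frame. The coordinates and the lack of mass weighting are illustrative assumptions; this is not CPPTRAJ source code.

    #include <stdio.h>
    #include <math.h>

    #define NATOMS 4

    /* Unweighted radius of gyration for one frame:
       Rg^2 = (1/N) * sum_i |r_i - r_com|^2 */
    int main(void) {
        double xyz[NATOMS][3] = {
            {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0}  /* toy coordinates */
        };
        double com[3] = {0, 0, 0};

        /* center of geometry (mass weighting omitted for brevity) */
        for (int i = 0; i < NATOMS; i++)
            for (int k = 0; k < 3; k++)
                com[k] += xyz[i][k] / NATOMS;

        double sum = 0.0;
        for (int i = 0; i < NATOMS; i++)
            for (int k = 0; k < 3; k++) {
                double d = xyz[i][k] - com[k];
                sum += d * d;
            }

        printf("Rg = %.3f\n", sqrt(sum / NATOMS));
        return 0;
    }

In practice such a quantity would be accumulated over every frame of a coordinate trajectory, which is the time-series processing the abstract describes.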

4,382 citations

Journal ArticleDOI
TL;DR: Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base-pairing probabilities, and thermodynamic properties of nucleic acids.
Abstract: Background: Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties.
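
The energy-based recurrences behind such programs are elaborate; as a hedged illustration of the dynamic-programming idea only, the sketch below implements the much simpler Nussinov base-pair maximization in C. The sequence and pairing rule are toy assumptions, not the thermodynamic model the abstract refers to.

    #include <stdio.h>

    #define L 9   /* toy sequence length */

    /* Watson-Crick and wobble pairing check (illustrative). */
    static int can_pair(char a, char b) {
        return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
               (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
               (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
    }

    int main(void) {
        const char *s = "GGGAAAUCC";   /* toy RNA sequence of length L */
        int m[L][L] = {{0}};           /* m[i][j]: max pairs in s[i..j] */

        /* Fill by increasing subsequence span, as in all such DP folders. */
        for (int span = 1; span < L; span++)
            for (int i = 0; i + span < L; i++) {
                int j = i + span;
                int best = m[i + 1][j];            /* s[i] left unpaired */
                for (int k = i + 1; k <= j; k++)   /* s[i] pairs with s[k] */
                    if (can_pair(s[i], s[k])) {
                        int v = 1 + (k > i + 1 ? m[i + 1][k - 1] : 0)
                                  + (k < j ? m[k + 1][j] : 0);
                        if (v > best) best = v;
                    }
                m[i][j] = best;
            }

        printf("maximum base pairs: %d\n", m[0][L - 1]);
        return 0;
    }

Energy-based folders replace the "+1 per pair" objective with measured stacking and loop energies, but the O(n^3) fill over increasing spans has the same shape.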

3,620 citations

Journal ArticleDOI
TL;DR: The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
Abstract: We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines, and other methods), implemented in Weka, R (with and without the caret package), C, and Matlab, including all the relevant classifiers available today. We use 121 data sets, representing the whole UCI database (excluding the large-scale problems) plus other real problems of our own, in order to reach conclusions about classifier behavior that do not depend on the data set collection. The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret) achieves 94.1% of the maximum accuracy, exceeding 90% in 84.3% of the data sets. However, the difference is not statistically significant with the second best, the SVM with Gaussian kernel implemented in C using LibSVM, which achieves 92.3% of the maximum accuracy. A few models are clearly better than the remaining ones: random forest, SVM with Gaussian and polynomial kernels, extreme learning machine with Gaussian kernel, C5.0, and avNNet (a committee of multi-layer perceptrons implemented in R with the caret package). The random forest is clearly the best family of classifiers (3 of the 5 best classifiers are RF), followed by SVM (4 classifiers in the top 10), neural networks, and boosting ensembles (5 and 3 members in the top 20, respectively).
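
The "% of maximum accuracy" figures normalize each classifier against the best result obtained on each data set. On one plausible reading of that metric, the computation looks like the following sketch in C; the accuracies, the two-classifier setup, and the averaging scheme are illustrative assumptions, not the paper's data or code.

    #include <stdio.h>

    #define DATASETS 3
    #define CLASSIFIERS 2

    int main(void) {
        /* acc[c][d]: accuracy of classifier c on data set d (toy numbers). */
        double acc[CLASSIFIERS][DATASETS] = {
            {0.90, 0.80, 0.70},   /* e.g. a random-forest variant */
            {0.85, 0.82, 0.65},   /* e.g. an SVM variant */
        };

        for (int c = 0; c < CLASSIFIERS; c++) {
            double total = 0.0;
            for (int d = 0; d < DATASETS; d++) {
                /* best accuracy achieved by any classifier on data set d */
                double best = 0.0;
                for (int k = 0; k < CLASSIFIERS; k++)
                    if (acc[k][d] > best) best = acc[k][d];
                total += acc[c][d] / best;   /* fraction of the maximum */
            }
            printf("classifier %d: %.1f%% of maximum accuracy\n",
                   c, 100.0 * total / DATASETS);
        }
        return 0;
    }

Normalizing per data set before averaging keeps easy data sets (where everyone scores high) from swamping hard ones, which is why a relative measure is used instead of raw mean accuracy.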

2,616 citations

Journal ArticleDOI
TL;DR: ProtTest 3, a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters, includes new features and extended capabilities.
Abstract: Summary: We have implemented a high-performance computing (HPC) version of ProtTest that can be executed in parallel on multicore desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. Availability: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket (https://bitbucket.org/). Contact: dposada@uvigo.es Supplementary information: Supplementary data are available at Bioinformatics online.

2,210 citations


Cites methods from "OpenMP: an industry standard API for shared-memory programming"

  • ...(3) A hybrid implementation MPJ - OpenMP (Dagum and Menon, 1998) to obtain maximum scalability in architectures with both shared and distributed memory (e.g. multicore HPC clusters)....

    [...]

  • ...Up to 4 MPJ Express processes per node and at least 2 OpenMP threads for each ML optimization were executed, gaining speed through the distribution of tasks among nodes while taking advantage of multicore processors within nodes.... (A hybrid MPI + OpenMP sketch follows this list.)

    [...]

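
The hybrid scheme quoted above pairs distributed-memory processes across nodes with shared-memory threads inside each node. As an illustration only, here is a minimal hybrid sketch in C using MPI plus OpenMP; ProtTest 3 itself uses the Java-based MPJ Express rather than C MPI, so the structure, not the names, is the point.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    /* Hybrid sketch: MPI processes span nodes (distributed memory),
       OpenMP threads share memory within each process. Illustrative
       only; not ProtTest 3 code. */
    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process fans its share of the work out to threads,
           e.g. one model-likelihood optimization per thread. */
        #pragma omp parallel
        {
            printf("process %d of %d, thread %d of %d\n",
                   rank, size,
                   omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Launched with, e.g., mpirun -np 4 with OMP_NUM_THREADS=2, this mirrors the cited configuration of several processes per node and at least two threads per optimization.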

01 Jan 2011
TL;DR: ProtTest 3, as mentioned in this paper, is an HPC version of ProtTest that can be run in parallel on multi-core desktops and clusters and includes new features and extended capabilities.
Abstract: Summary: We have implemented a High Performance Computing (HPC) version of ProtTest (Abascal et al., 2007) that can be executed in parallel in multi-core desktops and clusters. This version, called ProtTest 3, includes new features and extended capabilities. Availability: ProtTest 3 source code and binaries are freely available under GNU license for download from http://darwin.uvigo.es/software/prottest3, linked to a Mercurial repository at Bitbucket.

1,889 citations

References
Book
01 Jan 1995
TL;DR: This book discusses the design and implementation of scalable shared-memory systems and the techniques used to build them.
Abstract: Foreword. Preface.
Part 1 General Concepts
Chapter 1 Multiprocessing and Scalability: 1.1 Multiprocessor Architecture; 1.1.1 Single versus Multiple Instruction Streams; 1.1.2 Message-Passing versus Shared-Memory Architectures; 1.2 Cache Coherence; 1.2.1 Uniprocessor Caches; 1.2.2 Multiprocessor Caches; 1.3 Scalability; 1.3.1 Scalable Interconnection Networks; 1.3.2 Scalable Cache Coherence; 1.3.3 Scalable I/O; 1.3.4 Summary of Hardware Architecture Scalability; 1.3.5 Scalability of Parallel Software; 1.4 Scaling and Processor Grain Size; 1.5 Chapter Conclusions
Chapter 2 Shared-Memory Parallel Programs: 2.1 Basic Concepts; 2.2 Parallel Application Set; 2.2.1 MP3D; 2.2.2 Water; 2.2.3 PTHOR; 2.2.4 LocusRoute; 2.2.5 Cholesky; 2.2.6 Barnes-Hut; 2.3 Simulation Environment; 2.3.1 Basic Program Characteristics; 2.4 Parallel Application Execution Model; 2.5 Parallel Execution under a PRAM Memory Model; 2.6 Parallel Execution with Shared Data Uncached; 2.7 Parallel Execution with Shared Data Cached; 2.8 Summary of Results with Different Memory System Models; 2.9 Communication Behavior of Parallel Applications; 2.10 Communication-to-Computation Ratios; 2.11 Invalidation Patterns; 2.11.1 Classification of Data Objects; 2.11.2 Average Invalidation Characteristics; 2.11.3 Basic Invalidation Patterns for Each Application; 2.11.4 MP3D; 2.11.5 Water; 2.11.6 PTHOR; 2.11.7 LocusRoute; 2.11.8 Cholesky; 2.11.9 Barnes-Hut; 2.11.10 Summary of Individual Invalidation Distributions; 2.11.11 Effect of Problem Size; 2.11.12 Effect of Number of Processors; 2.11.13 Effect of Finite Caches and Replacement Hints; 2.11.14 Effect of Cache Line Size; 2.11.15 Invalidation Patterns Summary; 2.12 Chapter Conclusions
Chapter 3 System Performance Issues: 3.1 Memory Latency; 3.2 Memory Latency Reduction; 3.2.1 Nonuniform Memory Access (NUMA); 3.2.2 Cache-Only Memory Architecture (COMA); 3.2.3 Direct Interconnect Networks; 3.2.4 Hierarchical Access; 3.2.5 Protocol Optimizations; 3.2.6 Latency Reduction Summary; 3.3 Latency Hiding; 3.3.1 Weak Consistency Models; 3.3.2 Prefetch; 3.3.3 Multiple-Context Processors; 3.3.4 Producer-Initiated Communications; 3.3.5 Latency Hiding Summary; 3.4 Memory Bandwidth; 3.4.1 Hot Spots; 3.4.2 Synchronization Support; 3.5 Chapter Conclusions
Chapter 4 System Implementation: 4.1 Scalability of System Costs; 4.1.1 Directory Storage Overhead; 4.1.2 Sparse Directories; 4.1.3 Hierarchical Directories; 4.1.4 Summary of Directory Storage Overhead; 4.2 Implementation Issues and Design Correctness; 4.2.1 Unbounded Number of Requests; 4.2.2 Distributed Memory Operations; 4.2.3 Request Starvation; 4.2.4 Error Detection and Fault Tolerance; 4.2.5 Design Verification; 4.3 Chapter Conclusions
Chapter 5 Scalable Shared-Memory Systems: 5.1 Directory-Based Systems; 5.1.1 DASH; 5.1.2 Alewife; 5.1.3 S3.mp; 5.1.4 IEEE Scalable Coherent Interface; 5.1.5 Convex Exemplar; 5.2 Hierarchical Systems; 5.2.1 Encore GigaMax; 5.2.2 ParaDiGM; 5.2.3 Data Diffusion Machine; 5.2.4 Kendall Square Research KSR-1 and KSR-2; 5.3 Reflective Memory Systems; 5.3.1 Plus; 5.3.2 Merlin and Sesame; 5.4 Non-Cache-Coherent Systems; 5.4.1 NYU Ultracomputer; 5.4.2 IBM RP3 and BBN TC2000; 5.4.3 Cray Research T3D; 5.5 Vector Supercomputer Systems; 5.5.1 Cray Research Y-MP C90; 5.5.2 Tera Computer MTA; 5.6 Virtual Shared-Memory Systems; 5.6.1 Ivy and Munin/Treadmarks; 5.6.2 J-Machine; 5.6.3 MIT/Motorola *T and *T-NG; 5.7 Chapter Conclusions
Part 2 Experience with DASH
Chapter 6 DASH Prototype System: 6.1 System Organization; 6.1.1 Cluster Organization; 6.1.2 Directory Logic; 6.1.3 Interconnection Network; 6.2 Programmer's Model; 6.3 Coherence Protocol; 6.3.1 Nomenclature; 6.3.2 Basic Memory Operations; 6.3.3 Prefetch Operations; 6.3.4 DMA/Uncached Operations; 6.4 Synchronization Protocol; 6.4.1 Granting Locks; 6.4.2 Fetch&Op Variables; 6.4.3 Fence Operations; 6.5 Protocol General Exceptions; 6.6 Chapter Conclusions
Chapter 7 Prototype Hardware Structures: 7.1 Base Cluster Hardware; 7.1.1 SGI Multiprocessor Bus (MPBUS); 7.1.2 SGI CPU Board; 7.1.3 SGI Memory Board; 7.1.4 SGI I/O Board; 7.2 Directory Controller; 7.3 Reply Controller; 7.4 Pseudo-CPU; 7.5 Network and Network Interface; 7.6 Performance Monitor; 7.7 Logic Overhead of Directory-Based Coherence; 7.8 Chapter Conclusions
Chapter 8 Prototype Performance Analysis: 8.1 Base Memory Performance; 8.1.1 Overall Memory System Bandwidth; 8.1.2 Other Memory Bandwidth Limits; 8.1.3 Processor Issue Bandwidth and Latency; 8.1.4 Interprocessor Latency; 8.1.5 Summary of Memory System Bandwidth and Latency; 8.2 Parallel Application Performance; 8.2.1 Application Run-time Environment; 8.2.2 Application Speedups; 8.2.3 Detailed Case Studies; 8.2.4 Application Speedup Summary; 8.3 Protocol Effectiveness; 8.3.1 Base Protocol Features; 8.3.2 Alternative Memory Operations; 8.4 Chapter Conclusions
Part 3 Future Trends
Chapter 9 TeraDASH: 9.1 TeraDASH System Organization; 9.1.1 TeraDASH Cluster Structure; 9.1.2 Intracluster Operations; 9.1.3 TeraDASH Mesh Network; 9.1.4 TeraDASH Directory Structure; 9.2 TeraDASH Coherence Protocol; 9.2.1 Required Changes for the Scalable Directory Structure; 9.2.2 Enhancements for Increased Protocol Robustness; 9.2.3 Enhancements for Increased Performance; 9.3 TeraDASH Performance; 9.3.1 Access Latencies; 9.3.2 Potential Application Speedup; 9.4 Chapter Conclusions
Chapter 10 Conclusions and Future Directions: 10.1 SSMP Design Conclusions; 10.2 Current Trends; 10.3 Future Trends
Appendix: Multiprocessor Systems. References. Index.

146 citations

Book
01 Jan 1991

14 citations