Showing papers on "Performance prediction published in 1996"

PDF

Open Access

Journal Article•DOI•

Analysis of benchmark characteristics and benchmark performance prediction

[...]

Rafael H. Saavedra¹, Alan Jay Smith²•Institutions (2)

University of Southern California¹, University of California, Berkeley²

01 Nov 1996-ACM Transactions on Computer Systems

TL;DR: A machine-imdependent model of program execution is developed to characterize both machine performance and program execution, and a metric for program similarity is developed that makes it possible to classify benchmarks with respect to a large set of characteristics.

...read moreread less

Abstract: Standard benchmarking provides to run-times for given programs on given machines, but fails to provide insight as to why those results were obtained (either in terms of machine or program characteristics) and fails to provide run-times for that program on some other machine, or some other programs on that machine. We have developed a machine-imdependent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify those operations, either on the machine or in the programs, which dominate the benchmark results. This information helps designers in improving the performance of future machines and users in tuning their applications to better utilize the performance of existing machines. Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensive run-time statistics for a large set of benchmarks including the SPEC and Perfect Club suites. We show how these statistics can be used to identify important shortcoming in the programs. In addition, we give execution time estimates for a large sample of programs and machines and compare these against benchmark results. Finally, we develop a metric for program similarity that makes it possible to classify benchmarks with respect to a large set of characteristics.

...read moreread less

230 citations

Book•

Automatic Performance Prediction of Parallel Programs

[...]

Thomas Fahringer

31 Mar 1996

TL;DR: The author introduces a novel and very practical approach for predicting some of the most important performance parameters of parallel programs, including work distribution, number of transfers, amount of data transferred, network contention, transfer time, computation time and number of cache misses.

...read moreread less

Abstract: Automatic Performance Prediction of Parallel Programs presents a unified approach to the problem of automatically estimating the performance of parallel computer programs. The author focuses primarily on distributed memory multiprocessor systems, although large portions of the analysis can be applied to shared memory architectures as well. The author introduces a novel and very practical approach for predicting some of the most important performance parameters of parallel programs, including work distribution, number of transfers, amount of data transferred, network contention, transfer time, computation time and number of cache misses. This approach is based on advanced compiler analysis that carefully examines loop iteration spaces, procedure calls, array subscript expressions, communication patterns, data distributions and optimizing code transformations at the program level; and the most important machine specific parameters including cache characteristics, communication network indices, and benchmark data for computational operations at the machine level. The material has been fully implemented as part of P3T, which is an integrated automatic performance estimator of the Vienna Fortran Compilation System (VFCS), a state-of-the-art parallelizing compiler for Fortran77, Vienna Fortran and a subset of High Performance Fortran (HPF) programs. A large number of experiments using realistic HPF and Vienna Fortran code examples demonstrate highly accurate performance estimates, and the ability of the described performance prediction approach to successfully guide both programmer and compiler in parallelizing and optimizing parallel programs. A graphical user interface is described and displayed that visualizes each program source line together with the corresponding parameter values. P3T uses color-coded performance visualization to immediately identify hot spots in the parallel program. Performance data can be filtered and displayed at various levels of detail. Colors displayed by the graphical user interface are visualized in greyscale. Automatic Performance Prediction of Parallel Programs also includes coverage of fundamental problems of automatic parallelization for distributed memory multicomputers, a description of the basic parallelization strategy and a large variety of optimizing code transformations as included under VFCS.

...read moreread less

75 citations

Journal Article•DOI•

Evaluation of a Navier-Stokes Analysis Method for Hover Performance Prediction

[...]

Brian E. Wake, James D. Baeder

01 Jan 1996-Journal of The American Helicopter Society

61 citations

Journal Article•DOI•

Performance analysis of scheduling policies in re-entrant manufacturing systems

[...]

Yadati Narahari¹, L.M. Khan¹•Institutions (1)

Indian Institute of Science¹

02 Jan 1996-Computers & Operations Research

TL;DR: An approximate technique for analytical performance prediction of re-entrant lines is presented based on MVA (Mean Value Analysis), which makes it overwhelmingly efficient compared to simulation.

...read moreread less

53 citations

Journal Article•DOI•

Early prediction of MPP performance: the SP2, T3D, and Paragon experiences

[...]

Zhiwei Xu¹, Kai Hwang²•Institutions (2)

Chinese Academy of Sciences¹, University of Hong Kong²

01 Oct 1996

TL;DR: The main contribution of this work lies in providing a systematic procedure to estimate the computational work-load, to determine the application attributes, and to reveal the communication overhead in using these MPPs.

...read moreread less

Abstract: The performance of Massively Parallel Processors (MPPs) is attributed to a large number of machine and program factors. Software development for MPP applications is often very costly. The high cost is partially caused by a lack of early prediction of MPP performance. The program development cycle may iterate many times before achieving the desired performance level. In this paper, we present an early prediction scheme we have developed at the University of Southern California for reducing the cost of application software development. Using workload analysis and overhead estimation, our scheme optimizes the design of parallel algorithm before entering the tedious coding, debugging, and testing cycle of the applications. The scheme is generally applied at user/programmer level, not tied to any particular machine platform or any specific software environment. We have tested the effectiveness of this early performance prediction scheme by running the MIT/STAP benchmark programs on a 400-node IBM SP2 system at the Maui High-Performance Computing Center (MHPCC), on a 400-node Intel Paragon system at the San Diego Supercomputing Center (SDSC), and on a 128-node Cray T3D at the Cray Research Eagan Center in Wisconsin. Our prediction shows to be rather accurate compared with the actual performance measured on these machines. We use the SP2 data to illustrate the early prediction scheme. The main contribution of this work lies in providing a systematic procedure to estimate the computational work-load, to determine the application attributes, and to reveal the communication overhead in using these MPPs. These results can be applied to develop any MPP applications other than the STAP benchmarks by which this prediction scheme was developed.

...read moreread less

34 citations

Book Chapter•DOI•

Accurate Performance Prediction for Assively Parallel Systems and Its Applications

[...]

Jens Simon, Jens-Michael Wierum

26 Aug 1996

TL;DR: A performance prediction method is presented, which accurately predicts the expected program execution time on massively parallel systems using a relaxed task graph model, a queuing model, and a memory hierarchy model.

...read moreread less

Abstract: A performance prediction method is presented, which accurately predicts the expected program execution time on massively parallel systems. We consider distributed-memory architectures with SMD nodes and a fast communication network. The method is based on a relaxed task graph model, a queuing model, and a memory hierarchy model. The relaxed task graph is a compact representation of communicating processes of an application mapped onto the target machine. Simultaneous accesses to the resources of a multi-processor node are modeled by a queuing network. The execution time of the application is computed by an evaluation algorithm. An example application implemented on a massively parallel computer demonstrates the high accuracy of our model. Furthermore, two applications of our accurate prediction method are presented.

...read moreread less

32 citations

Book Chapter•DOI•

Theory, Practice, and a Tool for BSP Performance Prediction

[...]

Jonathan M. D. Hill¹, Paul I. Crumpton¹, David A. Burgess²•Institutions (2)

University of Oxford¹, Stanford University²

26 Aug 1996

TL;DR: It is shown how the profiling tool can be used to explore the communication patterns of the CFD code and accurately predict the performance of the application on any parallel machine.

...read moreread less

Abstract: The Bulk Synchronous Parallel (BSP) model provides a theoretical framework to accurately predict the execution time of parallel programs. In this paper we describe a BSP programming library that has been developed and contrast two approaches to analysing performance: (1) a pencil and paper method; (2) a profiling tool that analyses trace information generated during program execution. These approaches are evaluated on an industrial application code that solves fluid dynamics equations around a complex aircraft geometry on IBM SP2 and SGI Power Challenge machines. We show how the profiling tool can be used to explore the communication patterns of the CFD code and accurately predict the performance of the application on any parallel machine.

...read moreread less

31 citations

Proceedings Article•DOI•

Multilateral Well Performance Prediction

[...]

J. R. Salas, P. J. Clifford, D. P. Jenkins

01 Jan 1996

TL;DR: In this article, the performance of multilateral wells is compared with horizontal wells in reservoirs with 2phase gravity drainage, and multilateral well is shown to be an effective solution to the waterflooding of a faulted reservoir.

...read moreread less

Abstract: Potential reservoir applications of multilateral wells me identified. Methods for calculating their productivity and performance are described. These methods are used to show how productivity depends on wellbore geometry and reservoir properties. Also, the performance of multilateral wells is compared with horizontal wells in reservoirs with 2phase gravity drainage, and multilateral wells are shown to be an effective solution to the waterflooding of a faulted reservoir.

...read moreread less

29 citations

Journal Article•DOI•

Thermal-Hydraulic Performance Prediction in Fluid Power Systems

[...]

J A Sidders¹, D G Tilley¹, P J Chappie¹•Institutions (1)

University of Bath¹

01 Nov 1996

TL;DR: In this article, a set of lumped parameter mathematical models are developed which are based on conservation of mass and energy for the system, and the theoretical basis and modelling strategy are discussed for an open circuit containing a hydraulic pump, loading valve, heat exchanger and reservoir.

...read moreread less

Abstract: This paper presents a modelling approach to the study of thermal-hydraulic performance in fluid power systems. A set of lumped parameter mathematical models are developed which are based on conservation of mass and energy for the system. The theoretical basis and modelling strategy are discussed for an open circuit containing a hydraulic pump, loading valve, heat exchanger and reservoir. Simulation results are presented which show a comparison of model/rig performance, and the agreement obtained demonstrates the validity of the modelling approach. It is shown that the thermal response is dominated by the reservoir heat capacity and that close correspondence between the model and rig is only achievable with accurate hydraulic performance models.

...read moreread less

23 citations

The theory, practice, and a tool for BSP performance prediction applied to a CFD application

[...]

Jonathan M. D. Hill, Paul I. Crumpton, David A. Burgess

01 Feb 1996

TL;DR: A BSP programming library that has been developed is described, and it is shown how the tool can be used to explore the communication patterns of the CFD code and accurately predict the performance of the application on any parallel machine.

...read moreread less

Abstract: The Bulk Synchronous Parallel (BSP) model provides a theoretical framework to accurately predict the execution time of parallel programs. In this paper we describe a BSP programming library that has been developed, and contrast two approaches to analysing performance: (1) a pencil and paper method with a theoretical cost model; (2) a profiling tool that analyses trace information generated during program execution. These approaches are evaluated on an industrial application code that solves fluid dynamics equations around a complex aircraft geometry on an IBM SP2 and SGI PowerChallenge. We show how the tool can be used to explore the communication patterns of the CFD code and accurately predict the performance of the application on any parallel machine. This work was performed within Oxford Parallel with financial support from Rolls Royce plc and EPSRC.

...read moreread less

20 citations

Proceedings Article•DOI•

Performance prediction of PVM programs

[...]

Michael R. Steed¹, Mark J. Clement¹•Institutions (1)

Brigham Young University¹

15 Apr 1996

TL;DR: The APACHE (Automated PVM Application Characterization Environment) performance prediction system for PVM programs running on workstation clusters is developed and some experimental results obtained are presented.

...read moreread less

Abstract: As workstation clusters gain popularity as a parallel computing platform, there is an increasing need for performance tools that support these platforms Performance prediction is important for performance analysis of scalable parallel applications Although there are several performance tools available that work with PVM programs, none are capable of performance prediction We have developed the APACHE (Automated PVM Application Characterization Environment) performance prediction system for PVM programs running on workstation clusters We review the system implementation and present some experimental results obtained with the system

...read moreread less

Proceedings Article•DOI•

A testbed for parallel simulation performance prediction

[...]

Alois Ferscha¹, James Johnson¹•Institutions (1)

University of Vienna¹

08 Nov 1996

TL;DR: An incremental code development process that supports early performance predictions of Time Warp protocols and several of its optimizations is presented and it is shown how the scenario management features provided by the N-MAP tool can be efficiently utilized to predict performance sensitivities.

...read moreread less

Abstract: The overwhelming complexity of influencing factors determining the performance of parallel simulation executions demands a performance oriented development of logical process simulators. This paper presents an incremental code development process that supports early performance predictions of Time Warp protocols and several of its optimizations. A set of tools, N-MAP, for performance prediction and visualization has been developed, representing a testbed for a detailed sensitivity analysis of the various Time Warp execution parameters. As an example, the effects of various performance factors like the event structure underlying the simulation task, the average LVT progression per simulation step, the commitment rate, state saving overhead, etc. are demonstrated. We show how the scenario management features provided by the N-MAP tool can be efficiently utilized to predict performance sensitivities. For the particular example, the Time Warp protocol, though highly involved, N-MAP was able to predict the per formance sensitivity that was measured from the full implementation executing on the Meiko CS-2.

...read moreread less

Journal Article•DOI•

Computer simulation model for airplane landing-performance prediction

[...]

Byung J. Kim, Antonio A. Trani¹, Xiaoling Gu², Caoyuan Zhong¹•Institutions (2)

Virginia Tech¹, Boeing Commercial Airplanes²

01 Jan 1996-Transportation Research Record

TL;DR: In this paper, a Monte Carlo simulation algorithm and empirical heuristics derived from field observations were used to estimate landing-roll trajectories that can be programmed quickly in a personal computer.

...read moreread less

Abstract: A simple computer simulation model that predicts airplane landing performance on runways to locate high-speed exits is presented. A Monte Carlo simulation algorithm and empirical heuristics derived from field observations were used to estimate landing-roll trajectories that can be programmed quickly in a personal computer. The modeling process demonstrates statistically the validity of treating landing-roll profiles of various airplane models individually to locate high-speed exits. The model developed can be applied to a variety of airports and airplane types and is offered as an alternative to conventional methods for locating high-speed exits as well as a complement to more rigorous optimization methods.

...read moreread less

Modelling approaches for displacement ventilation in offices

[...]

Jlm Jan Hensen, Mjh Marjan Hamelinck, Mglc Marcel Loomans

01 Jan 1996

TL;DR: Displacement ventilation in offices is used a case study to demonstrate merits and drawbacks of various computer modelling approaches for HVAC design and performance prediction and that there is quite a lot of future work needed.

...read moreread less

Abstract: After briefly indicating HVAC performance evaluation criteria, displacement ventilation in offices is used a case study to demonstrate merits and drawbacks of various computer modelling approaches for HVAC design and performance prediction. The main conclusions are that each approach has its own (dis)advantages, different approaches should/could be used depending on the question to be answered, and that there is quite a lot of future work needed.

...read moreread less

Journal Article•DOI•

Compile-time performance prediction of HPF/Fortran 90D

[...]

Manish Parashar, Salim Hariri

01 Mar 1996-IEEE Parallel & Distributed Technology: Systems & Applications

TL;DR: The compile-time High-Performance Fortran (HPF)/Fortran 90D performance prediction framework allows accurate, cost-effective performance prediction in high-performance computing environments.

...read moreread less

Abstract: The compile-time High-Performance Fortran (HPF)/Fortran 90D performance prediction framework allows accurate, cost-effective performance prediction in high-performance computing environments. The framework implements an interpretative approach to performance prediction and helps select appropriate HPF/Fortran 90D compiler directives, debug application performance, and experiment with runtime and system parameters.

...read moreread less

Journal Article•DOI•

Performance prediction. A case study using a scalable shared-virtual memory machine

[...]

Xian-He Sun¹, Jianping Zhu•Institutions (1)

Louisiana State University¹

01 Dec 1996-IEEE Parallel & Distributed Technology: Systems & Applications

TL;DR: A simple formula shows the relationship between scalability, single-processor computing power, and degradation of parallelism and investigates the prediction and application of scalability.

...read moreread less

Abstract: As computers with tens of thousands of processors successfully deliver high performance power for solving some of the so called "grand challenge" applications, scalability is becoming an important metric in the evaluation of parallel architectures and algorithms. The authors carefully investigate the prediction of scalability and its application. With a simple formula, they show the relation between scalability, single processor computing power, and degradation of parallelism. They conduct a case study on a multi ring KSR-1 shared virtual memory machine. However, the prediction formula and methodology proposed in the study are not bound to any algorithm or architecture. They can be applied to any algorithm-machine combination. Experimental and theoretical results show that the influence of variation of ensemble size is predictable. Therefore, the performance of an algorithm on a sophisticated, hierarchical architecture can be predicted, and the best algorithm-machine combination can be selected for a given application.

...read moreread less

Journal Article•DOI•

Dynamic Simulation of Road Vehicle Performance Under Transient Accelerating Conditions

[...]

C.-W. Hong¹•Institutions (1)

National Tsing Hua University¹

01 Jan 1996

TL;DR: In this paper, a personal computer-based simulation package has been developed to design the powertrain system of passenger cars aiming to operate at optimal performance under transient accelerating conditions, including sudden full-throttle acceleration at a fixed gear and a changing-gear starting acceleration from standstill.

...read moreread less

Abstract: A personal computer-based simulation package has been developed to design the powertrain system of passenger cars aiming to operate at optimal performance. This package is capable of dynamic simulation of road vehicle performance under transient accelerating conditions. Two methods are included : one is the traditional transient-reconstruction method using steady-state engine performance maps ; the other is a dynamic simulation technique newly developed by the author. The latter is described in this paper. It is based on cyclic analysis of the engine thermofluid-combustion phenomena with additional considerations of flow inertia, thermal inertia and mechanical inertia effects. This transient engine model plus a dynamic powertrain model and a transient road-load simulation make it possible to predict the automobile performance under road-driving conditions. Two examples of transient performance prediction, including a sudden full-throttle acceleration at a fixed gear and a changing-gear starting acceleration from standstill, are demonstrated in this paper. These examples show that the relation between the engine speed and the road speed under accelerating conditions is very different to the steady-state relationships normally assumed.

...read moreread less

Proceedings Article•DOI•

Simulating message-driven programs

[...]

Attila Gursoy¹, Laxmikant V. Kale¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

12 Aug 1996

TL;DR: This work presents a methodology for simulating message-driven programs, and the information that is necessary to carry out such simulations is identified and a method for extracting such information from program executions is described.

...read moreread less

Abstract: Simulation studies are quite useful for performance prediction on new architectures and for systematic analysis of performance perturbations caused by variations in the machine parameters, such as communication latencies. Trace-driven simulation is necessary to avoid large computational costs over multiple simulation runs. However, trace-driven simulation of nondeterministic programs has turned out to be almost impossible. Simulation of message-driven programs is particularly challenging in this context because they are inherently nondeterministic. Yet message-driven execution is a very effective technique for enhancing performance, particularly in the presence of large or unpredictable communication latencies. We present a methodology for simulating message-driven programs. The information that is necessary to carry out such simulations is identified, and a method for extracting such information from program executions is described.

...read moreread less

Book Chapter•DOI•

Parallel Program Visualization with MUCH

[...]

Dieter Kranzlmüller¹, Rainer Koppler¹, Siegfried Grabner¹, Ch. Holzner¹, Jens Volkert¹ - Show less +1 more•Institutions (1)

Johannes Kepler University of Linz¹

23 Sep 1996

TL;DR: The use of visualization in parallel program development is manifold and is applied from data and control flow over debugging, performance analysis and performance prediction until data distribution on distributed memory architectures.

...read moreread less

Abstract: The use of visualization in parallel program development is manifold. It is applied from data and control flow over debugging, performance analysis and performance prediction until data distribution on distributed memory architectures. Most of these visualizations do not care about the physical topology of the underlying hardware although this can be of importance in the fields of performance analysis or error debugging.

...read moreread less

Journal Article•DOI•

A neural network for the prediction of performance parameters of transformer cores

[...]

C. Nussbaum, T. Booth, Albana Ilo, Helmut Pfützner

01 Jul 1996-Journal of Magnetism and Magnetic Materials

TL;DR: In this paper, Artificial Neural Networks (ANNs) were used for the prediction of transformer core performance parameters, such as no-load power losses and excitation in transformer cores.

...read moreread less

Journal Article•DOI•

On estimating the useful work distribution of parallel programs under P3T: a static performance estimator

[...]

Thomas Fahringer¹•Institutions (1)

University of Vienna¹

01 May 1996-Concurrency and Computation: Practice and Experience

TL;DR: A novel method of statically estimating the useful work distribution of distributed-memory parallel programs at the program level, which carefully distinguishes between useful and redundant work is described.

...read moreread less

Abstract: In order to improve a parallel program's performance it is critical to evaluate how even the work contained in a program is distributed over all processors dedicated to the computation. Traditional work distribution analysis is commonly performed at the machine level. The disadvantage of this method is that it cannot identify whether the processors are performing useful or redundant (replicated) work. The paper describes a novel method of statically estimating the useful work distribution of distributed-memory parallel programs at the program level, which carefully distinguishes between useful and redundant work. The amount of work contained in a parallel program, which correlates with the number of loop iterations to be executed by each processor, is estimated by accurately modeling loop iteration spaces, array access patterns and data distributions. A cost function defines the useful work distribution of loops, procedures and the entire program. Lower and upper bounds of the described parameter are presented. The computational complexity of the cost function is independent of the program's problem size, statement execution and loop iteration counts. As a consequence, estimating the work distribution based on the described method is considerably faster than simulating or actually compiling and executing the program. Automatically estimating the useful work distribution is fully implemented as part of P3T, which is a static parameter based performance prediction tool under the Vienna Fortran Compilation System (VFCS). The Lawrence Livermore Loops are used as a test case to verify the approach.

...read moreread less

Journal Article•DOI•

Non-Simulation Performance Prediction Methods for Different Implementations of a Multisensor Fusion Algorithm

[...]

Christian W. Frei¹, Lucy Y. Pao²•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, University of Colorado Boulder²

01 Jun 1996-IFAC Proceedings Volumes

TL;DR: The non-simulation techniques are shown to accurately predict the average superior performance of the sequential implementation in terms of RMS position error and track lifetime which has been observed in simulations.

...read moreread less

Proceedings Article•DOI•

Pre-layout performance prediction for automatic macro-cell synthesis

[...]

Fernando Moraes, Ricardo Reis, Lionel Torres, Michel Robert, Daniel Auvergne - Show less +1 more

12 May 1996

TL;DR: This paper proposes here a calibration method based on specific benchmarks to determine post-layout parasitic contribution at pre-layout level and compares predicted and simulated post- layout performances of various circuits for different sizing alternatives.

...read moreread less

Abstract: In this paper we present an approach allowing the area and delay prediction of macro-cells to be used in automatic layout synthesis tools. These parameters are of great importance in IC design. They allow us to guide the floor planning phase of an IC by specifying the number of rows and the aspect ratio of the macro-cell or to explore the power delay trade-off by selecting the size of the transistors (before layout generation) to satisfy the user constraints. We propose here a calibration method based on specific benchmarks to determine post-layout parasitic contribution at pre-layout level. Experimental results are given when we compare predicted and simulated post-layout performances of various circuits for different sizing alternatives.

...read moreread less

Proceedings Article•DOI•

Automatic performance prediction to support cross development of parallel programs

[...]

Matthias Schumann¹•Institutions (1)

Technische Universität München¹

01 Jan 1996

TL;DR: This paper presents a performance prediction methodology that is able to efficiently support cross development of deterministic real-life message-passing programs for recent parallel multi computer systems.

...read moreread less

Abstract: Cross development techniques are very attractive to be applied in high performance scientific computing because the parallel systems are expensive and should rather be utilized to perform production runs than to debug parallel programs. However, if development and execution platforms differ, techniques are required to efficiently predict the performance that will be actually gained on the target system. In this paper, we present a performance prediction methodology that is able to efficiently support cross development of deterministic real-life message-passing programs for recent parallel multi computer systems. The whole prediction process is supported by an environment of automatic tools. We demonstrate the feasibility of our approach by considering four programs from the NAS parallel benchmark suite and a multi-point boundary-value problem solver developed at TUM. The programs are implemented under the NX/NXLib and PVM message-passing environments. Our experimental environment comprises Paragon, iPSC/860, and G C/PowerPlus multi computer systems, and a small cluster of workstations that serves as development platform.

...read moreread less

Gathering and using time measurements in distributed systems

[...]

Margaret Aline Dietz, Carla Schlatter Ellis

01 Jan 1996

TL;DR: benchmark blending weaves time measurements with modeling techniques to create analytic models that quickly and accurately predict execution time, implicitly incorporate much of the underlying system complexity, and provide programmers with the needed insight to build efficient applications.

...read moreread less

Abstract: It is often necessary to measure time in distributed computer systems For example, timestamps are used for measuring the intervals between events (like banking transactions) They are also used for ordering events, as in the generation of transaction logs and debug traces Perhaps their most common use is in performance evaluation, capacity planning, and the detection of unpredicted delays or bottlenecks Measuring and using time in a distributed environment presents some unique challenges This thesis illuminates these problems and presents techniques for the proper measurement and use of time The first challenge is the clocks themselves Chapter 3 describes the tradeoff between keeping clocks synchronized and keeping them accurate The inaccuracies of networked workstation clocks are examined, and two statistical techniques are described for collecting time measurements in spite of these insufficiencies The second challenge is synchronization There is often a need to order events occurring on several processors, and to measure the time intervals between those events Chapter 4 is a workload characterization of message-passing in distributed applications In the course of this work, a method for generating global event traces was developed to get a system-wide view of message traffic The message traffic was found to have characteristics similar to those observed in network packet traffic The third challenge is performance prediction Programmers need tools to help them understand the interactions between components of a system, and to project the execution time of prospective designs and configurations Also, operating systems need fast tools to aid scheduling decisions and adaptation of software We present a technique called benchmark blending that combines the speed of analytic models with the flexibility of benchmarks Benchmark blending weaves time measurements with modeling techniques to create analytic models that: (1) quickly and accurately predict execution time, (2) implicitly incorporate much of the underlying system complexity, and (3) provide programmers with the needed insight to build efficient applications We conclude that time measurements are a necessary part of building and studying distributed systems Their use presents some challenges This thesis presents solutions for three of those challenges: clock instability, global event descriptions, and performance prediction

...read moreread less

Performance modeling and analysis of multithreaded architectures

[...]

Guang R. Gao, Shashank Nemawarkar

01 Jan 1996

TL;DR: The analysis of the EARTH system shows that under a multithreaded program workload, subsystem interactions at processing nodes are the bottlenecks, and the effectiveness of multithreading to tolerate communication latencies is shown.

...read moreread less

Abstract: Multithreaded architectures use the parallelism in programs to tolerate long latencies for communications and synchronizations. On encountering a long latency memory access, the processor in a multithreaded system rapidly switches context to another computation thread, thereby improving the performance. Unlike traditional single threaded execution and multitasking in operating systems, multithreading allows accesses from one or more threads of a user program at a processor to contend for system resources simultaneously. Hence, a performance analysis of multithreading should account for the effect of multiple concurrent accesses on throughput of subsystems. Modeling a real multithreaded system, like McGill's EARTH system, poses several problems. First, in realistic subsystem interactions, more than one subsystem may serve the same access simultaneously, so contentions are difficult to predict. Second, the thread characteristics like the number of remote accesses can differ with processing nodes. Thus, an accurate computation of delays at subsystems is essential. We propose analytical performance models, develop solution techniques, validate model predictions, and analyze the performance of multithreaded architectures. Our analytical models, based on closed queueing networks, account for the feedback effect of the load at subsystems on the processor performance. We demonstrate the robustness of these closed queueing network models over open system models for the performance prediction. With the feedback effect and the iterative nature of our solution technique, we predict the performance of complex subsystem interactions in the EARTH system under a multithreaded workload. Measurements from actual program executions are within 5% to 20% of model predictions. The model inputs are the architectural parameters and program workload characteristics. Model predictions include the processor utilization, message rate to the network, and latency for remote accesses. Given a program workload, we show the effectiveness of multithreading to tolerate communication latencies. We show the significance of the network capacity to tune program workload characteristics to achieve high performance. Our analysis of the EARTH system shows that under a multithreaded program workload, subsystem interactions at processing nodes are the bottlenecks. Reducing access times for subsystems in an EARTH node leads to a performance improvement especially at fine thread granularities. Multithreading provides more robust performance to the changes in data distributions than a single threaded execution. Our results demonstrate the tradeoffs of realistic costs of multithreading on the performance of fine-grain parallel program workload. Overall, our analytical models are useful to system architects and compiler writers to provide insight to the performance related optimizations.

...read moreread less

Proceedings Article•DOI•

Modelling frequency dependency of induction machine equivalent circuit parameters

[...]

S.J. Loddick

01 Jan 1996

TL;DR: The current aim is the development of an improved time domain model that includes frequency dependency of the key parameters, being developed in the C language in a format suitable for use with MATLAB and SIMULINK by compilation to a MEX-file.

...read moreread less

Abstract: This paper briefly discusses the variation with frequency of induction machine equivalent circuit parameters. It is important to include these variations in machine models used with variable speed drive models for performance prediction. Of particular concern is the accurate calculation of induction machine electrical harmonics, and the frequency and amplitude of the airgap torque pulsations they produce. Initial attempts to improve the calculation of electrical current harmonics using existing time domain simulators were unsuccessful. A specifically developed frequency domain program was used successfully, although the frequency domain approach does have limitations. The current aim is the development of an improved time domain model that includes frequency dependency of the key parameters. The model is being developed in the C language in a format suitable for use with MATLAB and SIMULINK by compilation to a MEX-file. This approach allows maximum flexibility in the development of the motor model, combined with the functionality of a commercial simulation package.

...read moreread less

Journal Article•DOI•

Performance Prediction of Parallel Self Consistent Field Computation

[...]

J. Papay, T. J. Atherton, M. J. Zemerly, G. R. Nudd

01 Feb 1996-Parallel Algorithms and Applications

TL;DR: A new model of computation that employs the tools of molecular biology whose in vitro implementation is far more error-resistant than extant proposals is introduced and a number of linear-time algorithms within the model are described, particularly for NP-complete problems.

...read moreread less

Abstract: This paper presents a methodology for performance prediction of parallel algorithms and illustrates its use on a large scale computational chemistry application. The performance prediction uses a component time characterisation technique which splits up the sequential code into computational components and measures the time for each of them. The parallel algorithm is built from these components by adding communication routines. A "Processor Activity Graph" (PAG) providing a graphical representation of the parallel algorithm runtime behaviour is used for predicting the execution time. For a case study a Self Consistent Field (SCF) computation has been selected which forms the basis of many computational chemistry packages [4,5]. The performance model of SCF computation has been built and the predictions have been compared with the results of measurements. The measurements have been provided on a mesh connected distributed memory parallel computer (128 T800 Parsytec SuperCluster). The prediction error is less than 10 percent. Performance optimisation of the application has been achieved by reducing the communication overhead and changing the data representation. This paper introduces a new model of computation that employs the tools of molecular biology whose in vitro implementation is far more error-resistant than extant proposals. We describe an abstraction of the model which lends itself to natural algorithmic description, particularly for problems in the complexity class NP. In addition we describe a number of linear-time algorithms within our model, particularly for NP-complete problems. We describe an in vitro realisation of the model and conclude with a discussion of future work.

...read moreread less

BiRASP - The Bistatic Range-Dependent Active System Performance Prediction Model.

[...]

David M. Fromm, John P. Crockett, L. B. Palmer

30 Sep 1996

TL;DR: The theoretical foundations of the BiRASP model and the corresponding numerical implementation of this theory are presented and a detailed description of the model software, instructions for execution, and sample results are provided.

...read moreread less

Abstract: : In 1992, the Naval Research Laboratory (NRL) published a report describing the Range-dependent Active System Performance (RASP) prediction model; a sequence of computer programs using multipath propagation and scattering processes to predict the long-range, low-frequency boundary reverberation, and target returns that would be received in real ocean environments. That computer model has been extended to admit arbitrary source/receiver configurations within a three-dimensional, range-dependent environment. The enhancements include: (1) range and azimuthal dependence in all environmental parameters, (2) volume scattering and bistatic scattering strength functions, (3) realistic source and receiver characteristics (e.g., three-dimensional beam patterns for linear arrays and array tilt), and (4) calculation of target returns with time/angle spreading. In addition to addressing the prediction and evaluation of active sonar systems in real ocean environments, the set of programs that comprise the Bistatic Range-dependent Active System Performance (BiRASP) prediction model also produce the detailed, high-resolution results required for basic theoretic and experimental acoustic research. This report presents the theoretical foundations of the BiRASP model and the corresponding numerical implementation of this theory. Further, a detailed description of the model software, instructions for execution, and sample results are provided.

...read moreread less

Dissertation•

Approaches to parallel performance prediction

[...]

Fred Howell

01 Jan 1996

TL;DR: This thesis explores an alternative by providing data sheets describing the performance of parallel building blocks, and then seeing how they may be used in practice, based on a graphing and equation plotting tool.

...read moreread less

Abstract: Designing parallel programs is both interesting and difficult. The reason for using a parallel machine is to obtain better performance, but the programmer will have little idea of the performance of a program at design time, and will only find out by actually running it. Design decisions have to be be made by guesswork alone. This thesis explores an alternative by providing data sheets describing the performance of parallel building blocks, and then seeing how they may be used in practice. The simplest way of using the data sheets is based on a graphing and equation plotting tool. More detailed design information is available from a "reverse" profiling technique which adapts standard profiling to generate predictions rather than measurements. The ultimate method for prediction is based on discrete event simulation, which allows modelling of all programs but is the most complex to use. The methods are compared, and their suitability for different design problems is discussed.

...read moreread less