
Showing papers in "Scientific Programming in 1999"


Journal ArticleDOI
TL;DR: This work collected week-long, 1 Hz resolution traces of the Digital Unix 5 second exponential load average to find that relatively simple linear models are sufficient for short-range host load prediction.
Abstract: Understanding how host load changes over time is instrumental in predicting the execution time of tasks or jobs, such as in dynamic load balancing and distributed soft real-time systems. To improve this understanding, we collected week-long, 1 Hz resolution traces of the Digital Unix 5 second exponential load average on over 35 different machines including production and research cluster machines, compute servers, and desktop workstations. Separate sets of traces were collected at two different times of the year. The traces capture all of the dynamic load information available to user-level programs on these machines. We present a detailed statistical analysis of these traces here, including summary statistics, distributions, and time series analysis results. Two significant new results are that load is self-similar and that it displays epochal behavior. All of the traces exhibit a high degree of self-similarity with Hurst parameters ranging from 0.73 to 0.99, strongly biased toward the top of that range. The traces also display epochal behavior in that the local frequency content of the load signal remains quite stable for long periods of time (150-450 s mean) and changes abruptly at epoch boundaries. Despite these complex behaviors, we have found that relatively simple linear models are sufficient for short-range host load prediction.
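The practical conclusion above, that simple linear models suffice for short-range load prediction, can be illustrated with a minimal Java sketch (not the authors' code): an AR(1) predictor fitted by ordinary least squares over a sliding window of recent 1 Hz samples. The trace in main is synthetic stand-in data.

    import java.util.Random;

    /** Minimal sketch (not the authors' code): one-step-ahead host load
     *  prediction with an AR(1) model fitted by ordinary least squares
     *  over a sliding window of recent 1 Hz load samples. */
    public class LoadPredictor {

        /** Fit x[t+1] = a*x[t] + b over the window and predict the next value. */
        static double predictNext(double[] window) {
            int n = window.length - 1;
            double mx = 0, my = 0;
            for (int t = 0; t < n; t++) { mx += window[t]; my += window[t + 1]; }
            mx /= n; my /= n;
            double cov = 0, var = 0;
            for (int t = 0; t < n; t++) {
                cov += (window[t] - mx) * (window[t + 1] - my);
                var += (window[t] - mx) * (window[t] - mx);
            }
            double a = (var == 0) ? 0 : cov / var;   // slope
            double b = my - a * mx;                  // intercept
            return a * window[window.length - 1] + b;
        }

        public static void main(String[] args) {
            // Synthetic stand-in for a 1 Hz load-average trace.
            Random rng = new Random(42);
            double[] trace = new double[60];
            for (int t = 0; t < trace.length; t++)
                trace[t] = 1.5 + 0.5 * Math.sin(t / 10.0) + 0.05 * rng.nextGaussian();
            System.out.printf("predicted next load: %.3f%n", predictNext(trace));
        }
    }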

162 citations


Journal ArticleDOI
TL;DR: The Vienna Fortran Compiler (VFC) is introduced, a new source-to-source parallelization system for HPF+, an optimized version of HPF, which addresses the requirements of irregular applications.
Abstract: High Performance Fortran (HPF) offers an attractive high-level language interface for programming scalable parallel architectures providing the user with directives for the specification of data distribution and delegating to the compiler the task of generating an explicitly parallel program. Available HPF compilers can handle regular codes quite efficiently, but dramatic performance losses may be encountered for applications which are based on highly irregular, dynamically changing data structures and access patterns. In this paper we introduce the Vienna Fortran Compiler (VFC), a new source-to-source parallelization system for HPF+, an optimized version of HPF, which addresses the requirements of irregular applications. In addition to extended data distribution and work distribution mechanisms, HPF+ provides the user with language features for specifying certain information that decisively influence a program’s performance. This comprises data locality assertions, non-local access specifications and the possibility of reusing runtime-generated communication schedules of irregular loops. Performance measurements of kernels from advanced applications demonstrate that with a high-level data parallel language such as HPF+ a performance close to hand-written message-passing programs can be achieved even for highly irregular codes.
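For readers unfamiliar with HPF-style data distribution, the following minimal Java sketch (purely conceptual, not part of VFC or HPF+) shows what a directive such as !HPF$ DISTRIBUTE A(BLOCK) expresses declaratively: each processor owns one contiguous slice of the array, and the compiler applies the owner-computes rule to that slice.

    /** Conceptual sketch of the BLOCK data distribution that HPF-style
     *  directives express declaratively: each of p processors owns one
     *  contiguous slice of an n-element array. */
    public class BlockDistribution {

        /** Returns {lower, upper} (inclusive/exclusive) owned by processor rank. */
        static int[] ownedRange(int n, int p, int rank) {
            int block = (n + p - 1) / p;              // ceiling(n / p)
            int lo = Math.min(rank * block, n);
            int hi = Math.min(lo + block, n);
            return new int[] { lo, hi };
        }

        public static void main(String[] args) {
            int n = 10, p = 4;
            for (int rank = 0; rank < p; rank++) {
                int[] r = ownedRange(n, p, rank);
                System.out.printf("processor %d owns indices [%d, %d)%n", rank, r[0], r[1]);
            }
        }
    }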

70 citations


Journal Article
TL;DR: In this article, 40Ar/39Ar laser-probe dating of mylonitic fabrics from the Pelion Massif in the Pelagonian Zone of mainland Greece has characterized its Mid-Late Alpine deformation history.
Abstract: 40Ar/39Ar laser-probe dating of mylonitic fabrics from the Pelion Massif in the Pelagonian Zone of mainland Greece has characterized its Mid-Late Alpine deformation history. Following high pressure (HP) metamorphism, ductile deformation occurred under greenschist-facies conditions from c. 54 Ma, and continued to affect the Pelion Massif until c. 15 Ma. The prolonged episode of ductile deformation in the Pelion Massif has resulted in the formation of an Oligocene-Early Miocene ductile domal structure. The new geochronological data obtained for the Pelion contribute to a detailed record of the Alpine kinematic history in the Pelagonian Zone and allow a discussion of P-T-t data from Aegean HP rocks to characterize the regional thermotectonic history. Comparison with the P-T-t data from the Cycladic region reinforces the point that Mid-Eocene phengite ages, commonly taken as the age of peak HP metamorphism in the Cyclades, do not always reflect the metamorphic culmination, but rather the retrograde paths of the HP rocks. It is shown that, on a regional scale, termination of HP metamorphism is a diachronous process in the Aegean region, being c. 54 Ma in the north (Pelagonian Zone) and shifting to younger ages, chiefly c. 40 Ma in the Cyclades, and c. 20 Ma on Crete, as the present-day subduction zone is approached. In contrast to the diachronous exhumation of Aegean HP assemblages, the well documented Miocene phase of ductile regional extension appears to be synchronous across the whole Aegean region and affected basement rocks until c. 15 Ma.

35 citations


Journal ArticleDOI
TL;DR: The research issues involved in the JLAPACK project are described, and the LAPACK API will be considerably simplified to take advantage of Java’s object-oriented design.
Abstract: The JLAPACK project provides the LAPACK numerical subroutines translated from their subset Fortran 77 source into class files, executable by the Java Virtual Machine (JVM) and suitable for use by Java programmers. This makes it possible for Java applications or applets, distributed on the World Wide Web (WWW), to use established legacy numerical code that was originally written in Fortran. The translation is accomplished using a special-purpose Fortran-to-Java (source-to-source) compiler. The LAPACK API will be considerably simplified to take advantage of Java’s object-oriented design. This report describes the research issues involved in the JLAPACK project, and its current implementation and status.
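The flavor of source-to-source translated code can be sketched as follows. This is an illustrative example rather than actual JLAPACK output: a Fortran 77 routine typically maps to a static Java method over a flat, column-major double[] array with an explicit leading dimension, mirroring the Fortran calling convention.

    /** Illustrative sketch (not generated JLAPACK code) of the style a
     *  Fortran-to-Java translator produces: static methods over flat,
     *  column-major double[] arrays with explicit leading dimensions. */
    public class TranslatedStyle {

        /** Solve L*x = b for a lower-triangular n-by-n matrix L stored
         *  column-major in a with leading dimension lda; b is overwritten with x. */
        static void ltrsv(int n, double[] a, int lda, double[] b) {
            for (int i = 0; i < n; i++) {
                double s = b[i];
                for (int j = 0; j < i; j++)
                    s -= a[i + j * lda] * b[j];   // a(i,j) in column-major storage
                b[i] = s / a[i + i * lda];
            }
        }

        public static void main(String[] args) {
            // L = [2 0; 1 4] column-major, b = [2, 9]  =>  x = [1, 2]
            double[] a = { 2, 1, 0, 4 };
            double[] b = { 2, 9 };
            ltrsv(2, a, 2, b);
            System.out.printf("x = [%.1f, %.1f]%n", b[0], b[1]);
        }
    }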

30 citations


Journal ArticleDOI
TL;DR: JLAPACK, a subset of the LAPACK library in Java, is implemented, a high-performance Fortran 77 library used to solve common linear algebra problems, and performs comparably with the Fortran version using the native BLAS library.
Abstract: This paper describes the design and implementation of high performance numerical software in Java. Our primary goals are to characterize the performance of object-oriented numerical software written in Java and to investigate whether Java is a suitable language for such endeavors. We have implemented JLAPACK, a subset of the LAPACK library in Java. LAPACK is a high-performance Fortran 77 library used to solve common linear algebra problems. JLAPACK is an object-oriented library, using encapsulation, inheritance, and exception handling. It performs within a factor of four of the optimized Fortran version for certain platforms and test cases. When used with the native BLAS library, JLAPACK performs comparably with the Fortran version using the native BLAS library. We conclude that high-performance numerical software could be written in Java if a handful of concerns about language features and compilation strategies are adequately addressed.

25 citations


Journal ArticleDOI
TL;DR: OwlPack develops two object-oriented versions of LINPACK in Java, a true polymorphic version and a “Lite” version designed for higher performance, to drive research on compiler technology that will reward, rather than penalize, good object-oriented programming practice.
Abstract: Since the introduction of the Java programming language, there has been widespread interest in the use of Java for high-performance scientific computing. One major impediment to such use is the performance penalty paid relative to Fortran. To support our research on overcoming this penalty through compiler technology, we have developed a benchmark suite, called OwlPack, which is based on the popular LINPACK library. Although there are existing implementations of LINPACK in Java, most of these are produced by direct translation from Fortran. As such they do not reflect the style of programming that a good object-oriented programmer would use in Java. Our goal is to investigate how to make object-oriented scientific programming practical. Therefore we developed two object-oriented versions of LINPACK in Java, a true polymorphic version and a “Lite” version designed for higher performance. We used these libraries to perform a detailed performance analysis using several leading Java compilers and virtual machines, comparing the performance of the object-oriented versions of the benchmark with a version produced by direct translation from Fortran. Although Java implementations have made great strides, they still fall short on programs that use the full power of Java’s object-oriented features. Our ultimate goal is to drive research on compiler technology that will reward, rather than penalize, good object-oriented programming practice.
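The design tension the benchmark probes can be sketched in a few lines of Java (illustrative only, not OwlPack source): a fully polymorphic scalar abstraction allocates and dispatches in the inner loop, which is exactly what early JVMs penalized, while the "Lite" primitive-array style does not.

    /** Illustrative sketch (not OwlPack source) of polymorphic versus
     *  primitive-array numerical style. */
    public class DotProductStyles {

        // "Polymorphic" style: scalars behind an interface.
        interface Scalar { Scalar add(Scalar o); Scalar mul(Scalar o); double value(); }

        static final class DoubleScalar implements Scalar {
            final double v;
            DoubleScalar(double v) { this.v = v; }
            public Scalar add(Scalar o) { return new DoubleScalar(v + o.value()); }
            public Scalar mul(Scalar o) { return new DoubleScalar(v * o.value()); }
            public double value() { return v; }
        }

        static double polymorphicDot(Scalar[] x, Scalar[] y) {
            Scalar acc = new DoubleScalar(0);
            for (int i = 0; i < x.length; i++) acc = acc.add(x[i].mul(y[i]));   // allocates per step
            return acc.value();
        }

        // "Lite" style: primitive arrays, no allocation in the inner loop.
        static double liteDot(double[] x, double[] y) {
            double acc = 0;
            for (int i = 0; i < x.length; i++) acc += x[i] * y[i];
            return acc;
        }

        public static void main(String[] args) {
            double[] x = { 1, 2, 3 }, y = { 4, 5, 6 };
            Scalar[] sx = new Scalar[3], sy = new Scalar[3];
            for (int i = 0; i < 3; i++) { sx[i] = new DoubleScalar(x[i]); sy[i] = new DoubleScalar(y[i]); }
            System.out.println(polymorphicDot(sx, sy) + " == " + liteDot(x, y));  // 32.0 == 32.0
        }
    }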

17 citations


Journal ArticleDOI
TL;DR: The Java-to-C Interface (JCI) tool is described, which provides application programmers wishing to use Java with immediate accessibility to existing scientific packages and facilitates rapid development and reuse of existing code.
Abstract: Recent developments in processor capabilities, software tools, programming languages and programming paradigms have brought about new approaches to high performance computing. A steadfast component of this dynamic evolution has been the scientific community’s reliance on established scientific packages. As a consequence, programmers of high-performance applications are reluctant to embrace evolving languages such as Java. This paper describes the Java-to-C Interface (JCI) tool which provides application programmers wishing to use Java with immediate accessibility to existing scientific packages. The JCI tool also facilitates rapid development and reuse of existing code. These benefits are provided at minimal cost to the programmer. While beneficial to the programmer, the additional advantages of mixed-language programming in terms of application performance and portability are addressed in detail within the context of this paper. In addition, we discuss how the JCI tool is complementing other ongoing projects such as IBM’s High-Performance Compiler for Java (HPCJ) and IceT’s metacomputing environment.
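Hand-written, the glue that JCI generates automatically corresponds to a standard JNI binding. The sketch below uses real JNI mechanisms (a native method declaration plus System.loadLibrary), but the library name sciplib and the routine ddot are placeholders for whatever scientific package is being wrapped; running it requires building the corresponding native library.

    /** Hand-written equivalent of the glue a Java-to-C interface generator
     *  produces: a JNI binding to a routine from an existing C library.
     *  "sciplib" and "ddot" are placeholder names. */
    public class NativeBlas {

        // Declared in Java, implemented in C; the C header is produced by javac -h.
        public static native double ddot(int n, double[] x, double[] y);

        static {
            System.loadLibrary("sciplib");   // loads libsciplib.so / sciplib.dll at run time
        }

        public static void main(String[] args) {
            double[] x = { 1, 2, 3 }, y = { 4, 5, 6 };
            System.out.println("dot = " + ddot(3, x, y));
        }
    }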

15 citations


Journal ArticleDOI
TL;DR: The design of the Impulse architecture is described, and it is shown how an Impulse memory system can improve the performance of memory-bound scientific applications, decreasing the running time of the NAS conjugate gradient benchmark by 67%.
Abstract: Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In this paper we describe the design of the Impulse architecture, and show how an Impulse memory system can improve the performance of memory-bound scientific applications. For instance, Impulse decreases the running time of the NAS conjugate gradient benchmark by 67%. We expect that Impulse will also benefit regularly strided, memory-bound applications of commercial importance, such as database and multimedia programs.
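Because the remapping happens in hardware, there is no direct software API to show; the following Java sketch is only a software analogue of the access pattern Impulse optimizes, gathering scattered elements into a dense shadow array so that later traversals are unit-stride and cache-friendly.

    /** Software analogue of the access pattern Impulse remaps in hardware:
     *  an indexed gather a[idx[i]] made to look like a dense, unit-stride walk. */
    public class GatherSketch {

        /** Build the dense "shadow" that the memory controller would synthesize on the fly. */
        static double[] gather(double[] a, int[] idx) {
            double[] shadow = new double[idx.length];
            for (int i = 0; i < idx.length; i++) shadow[i] = a[idx[i]];
            return shadow;
        }

        public static void main(String[] args) {
            double[] a = { 10, 20, 30, 40, 50 };
            int[] idx = { 4, 0, 2 };             // scattered accesses
            double[] shadow = gather(a, idx);    // dense copy
            double sum = 0;
            for (double v : shadow) sum += v;    // unit-stride traversal
            System.out.println("sum = " + sum);  // 90.0
        }
    }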

15 citations




Journal ArticleDOI
TL;DR: CRAUL (Compiler and Run-Time Integration for Adaptation Under Load), a system that dynamically balances computational load in a parallel application that combines compile-time support to identify data access patterns with a run-time system that uses the access information to intelligently distribute the parallel workload in loop-based programs.
Abstract: Clusters of workstations provide a cost-effective, high performance parallel computing environment. These environments, however, are often shared by multiple users, or may consist of heterogeneous machines. As a result, parallel applications executing in these environments must operate despite unequal computational resources. For maximum performance, applications should automatically adapt execution to maximize use of the available resources. Ideally, this adaptation should be transparent to the application programmer. In this paper, we present CRAUL (Compiler and Run-Time Integration for Adaptation Under Load), a system that dynamically balances computational load in a parallel application. Our target run-time is software-based distributed shared memory (SDSM). SDSM is a good target for parallelizing compilers since it reduces compile-time complexity by providing data caching and other support for dynamic load balancing. CRAUL combines compile-time support to identify data access patterns with a run-time system that uses the access information to intelligently distribute the parallel workload in loop-based programs. The distribution is chosen according to the relative power of the processors and so as to minimize SDSM overhead and maximize locality. We have evaluated the resulting load distribution in the presence of different types of load - computational, computational and memory intensive, and network load. CRAUL performs within 5-23% of ideal in the presence of load, and is able to improve on naive compiler-based work distribution that does not take locality into account even in the absence of load.
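The kind of distribution CRAUL computes can be illustrated with a small Java sketch (a simplified stand-in, not the CRAUL run-time): loop iterations are split into contiguous chunks whose sizes are proportional to the measured relative power of each processor, which preserves locality while balancing load.

    import java.util.Arrays;

    /** Simplified sketch of power-proportional loop distribution: contiguous
     *  chunks sized by each processor's measured relative speed. */
    public class PowerProportionalSplit {

        /** Returns chunk boundaries: processor p gets iterations [cuts[p], cuts[p+1]). */
        static int[] split(int iterations, double[] relativePower) {
            double total = Arrays.stream(relativePower).sum();
            int[] cuts = new int[relativePower.length + 1];
            double acc = 0;
            for (int p = 0; p < relativePower.length; p++) {
                acc += relativePower[p];
                cuts[p + 1] = (int) Math.round(iterations * acc / total);
            }
            return cuts;
        }

        public static void main(String[] args) {
            // One loaded machine (half speed) and two idle ones.
            int[] cuts = split(1000, new double[] { 0.5, 1.0, 1.0 });
            System.out.println(Arrays.toString(cuts));   // [0, 200, 600, 1000]
        }
    }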

Journal ArticleDOI
TL;DR: U-Net/SLE (Safe Language Extensions), a user-level network interface architecture which enables per-application customization of communication semantics through downloading of user extension applets, implemented as Java classfiles, to the network interface, is described.
Abstract: We describe U-Net/SLE (Safe Language Extensions), a user-level network interface architecture which enables per-application customization of communication semantics through downloading of user extension applets, implemented as Java classfiles, to the network interface. This architecture permits applications to safely specify code to be executed within the NI on message transmission and reception. By leveraging the existing U-Net model, applications may implement protocol code at the user level, within the NI, or using some combination of the two. Our current implementation, using the Myricom Myrinet interface and a small Java Virtual Machine subset, allows host communication overhead to be reduced and improves the overlap of communication and computation during protocol processing.
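A rough sketch of what a per-application extension might look like is given below; the interface and method names are hypothetical, not the U-Net/SLE API, but they convey the idea of user-supplied Java code invoked by the network interface on receive and transmit.

    /** Illustrative sketch (hypothetical names, not the U-Net/SLE API) of a
     *  per-application extension invoked by the network interface. */
    public class SleSketch {

        interface MessageHandler {
            /** Called on each received packet; returns true to deliver the
             *  packet to the host, false to consume it on the NI. */
            boolean onReceive(byte[] packet);
            /** Called before transmission; may rewrite the packet in place. */
            void onTransmit(byte[] packet);
        }

        /** Example extension: drop zero-length keepalives without involving the host. */
        static final MessageHandler keepaliveFilter = new MessageHandler() {
            public boolean onReceive(byte[] packet) { return packet.length > 0; }
            public void onTransmit(byte[] packet) { /* nothing to rewrite */ }
        };

        public static void main(String[] args) {
            System.out.println(keepaliveFilter.onReceive(new byte[0]));   // false: handled on NI
            System.out.println(keepaliveFilter.onReceive(new byte[64]));  // true: pass to host
        }
    }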

Journal ArticleDOI
TL;DR: The compilation process and the target system description for Menhir, the Matlab language compiler, are presented and preliminary performances are given and compared with MCC, the MathWorks Matlab compiler.
Abstract: In this paper we present Menhir, a compiler for generating sequential or parallel code from the Matlab language. The compiler has been designed in the context of using Matlab as a specification language. One of the major features of Menhir is its retargetability to generate parallel and sequential C or Fortran code. We present the compilation process and the target system description for Menhir. Preliminary performances are given and compared with MCC, the MathWorks Matlab compiler.

Journal ArticleDOI
TL;DR: This work explores nested data-parallel implementations of the sparse matrix-vector product and the Barnes-Hut n-body algorithm by hand-coding thread-based and flattening-based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX-4, two shared-memory machines.
Abstract: Modern dialects of Fortran enjoy wide use and good support on high-performance computers as performance-oriented programming languages. By providing the ability to express nested data parallelism, modern Fortran dialects enable irregular computations to be incorporated into existing applications with minimal rewriting and without sacrificing performance within the regular portions of the application. Since performance of nested data-parallel computation is unpredictable and often poor using current compilers, we investigate threading and flattening, two source-to-source transformation techniques that can improve performance and performance stability. For experimental validation of these techniques, we explore nested data-parallel implementations of the sparse matrix-vector product and the Barnes-Hut n-body algorithm by hand-coding thread-based (using OpenMP directives) and flattening-based versions of these algorithms and evaluating their performance on an SGI Origin 2000 and an NEC SX-4, two shared-memory machines.
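The sparse matrix-vector product mentioned above is the canonical nested data-parallel kernel: the outer loop over rows is parallel while the inner loops have irregular lengths. The following Java sketch (not the paper's Fortran/OpenMP code) uses a parallel stream as a stand-in for the thread-based version.

    import java.util.stream.IntStream;

    /** Minimal sketch of a nested data-parallel kernel: CSR sparse
     *  matrix-vector product with a parallel outer loop over rows. */
    public class SparseMatVec {

        /** y = A*x with A in compressed sparse row form (rowPtr, colIdx, val). */
        static double[] spmv(int[] rowPtr, int[] colIdx, double[] val, double[] x) {
            double[] y = new double[rowPtr.length - 1];
            IntStream.range(0, y.length).parallel().forEach(i -> {
                double s = 0;
                for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
                    s += val[k] * x[colIdx[k]];    // irregular inner (nested) loop
                y[i] = s;
            });
            return y;
        }

        public static void main(String[] args) {
            // 2x3 matrix [[1 0 2], [0 3 0]] times x = [1, 1, 1]
            int[] rowPtr = { 0, 2, 3 };
            int[] colIdx = { 0, 2, 1 };
            double[] val = { 1, 2, 3 };
            double[] y = spmv(rowPtr, colIdx, val, new double[] { 1, 1, 1 });
            System.out.println(y[0] + ", " + y[1]);   // 3.0, 3.0
        }
    }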

Journal ArticleDOI
TL;DR: This work presents a novel framework for integrating task and data parallelism for applications that exhibit constrained dynamism, and has been implemented using Stampede, a cluster programming system developed at the Cambridge Research Laboratory.
Abstract: There is an emerging class of real-time interactive applications that require the dynamic integration of task and data parallelism. An example is the Smart Kiosk, a free-standing computer device that provides information and entertainment to people in public spaces. The kiosk interface is computationally demanding: it employs vision and speech sensing and an animated graphical talking face for output. The computational demands of an interactive kiosk can vary widely with the number of customers and the state of the interaction. Unfortunately this makes it difficult to apply current techniques for integrated task and data parallel computing, which can produce optimal decompositions for static problems. Using experimental results from a color-based people tracking module, we demonstrate the existence of a small number of distinct operating regimes in the kiosk application. We refer to this type of program behavior as constrained dynamism. An application exhibiting constrained dynamism can execute efficiently by dynamically switching among a small number of statically determined fixed data parallel strategies. We present a novel framework for integrating task and data parallelism for applications that exhibit constrained dynamism. Our solution has been implemented using Stampede, a cluster programming system developed at the Cambridge Research Laboratory.
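A minimal sketch of constrained dynamism, with placeholder regimes rather than the Smart Kiosk's actual decompositions: the application switches among a few statically determined strategies as its operating regime changes, instead of recomputing an optimal decomposition continuously.

    /** Minimal sketch of constrained dynamism: switch among a few statically
     *  determined parallelization strategies as the operating regime changes.
     *  Regime thresholds and strategy names are placeholders. */
    public class RegimeSwitch {

        enum Strategy { SINGLE_TRACKER_DATA_PARALLEL, FEW_TRACKERS_MIXED, MANY_TRACKERS_TASK_PARALLEL }

        static Strategy choose(int customers) {
            if (customers <= 1) return Strategy.SINGLE_TRACKER_DATA_PARALLEL;
            if (customers <= 4) return Strategy.FEW_TRACKERS_MIXED;
            return Strategy.MANY_TRACKERS_TASK_PARALLEL;
        }

        public static void main(String[] args) {
            for (int c : new int[] { 0, 3, 9 })
                System.out.println(c + " customers -> " + choose(c));
        }
    }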



Journal Article
TL;DR: A new theoretical equation of the surge impedance is derived, Z = 60·{log(h/r0) - 1} + ZeP(h, r0, β), and the theoretical values are found to coincide comparatively well with the measured ones.
Abstract: The tower surge impedance derived from the electromagnetic field theory doesn't always coincide with the measured values satisfactorily. The theory derived by Lundholm is the most famous one, and is believed to have been established, but it doesn't coincide with the measured values. We investigated his theory precisely, and found his theory was incorrect. He derived the loop voltage method and skillfully used the vector potential and the electric and magnetic fields. In particular, he combined the vector potential with the electric field; we clarified that this is where the errors came in. The vector potential is the quantity from which the magnetic field is derived, therefore the electric field must be derived from the magnetic field coiled around it. In most cases, undoubtedly the electric field can be calculated from the vector potential. In this case, however, the magnetic field is propagating, therefore the vector potential is also propagating, so that the electric field derived from the vector potential is the circulating local field. The electric field, therefore, must be calculated considering the propagation phenomena and simultaneity. We derived a new theoretical equation of the surge impedance, Z = 60·{log(h/r0) - 1} + ZeP(h, r0, β), and found the theoretical values coincide comparatively well with the measured ones.

Journal ArticleDOI
TL;DR: An algorithm is created, called a metaheuristic, which automatically chooses a scheduling heuristic for each input program and produces better schedules in general than the heuristics upon which it is based.
Abstract: Task mapping and scheduling are two very difficult problems that must be addressed when a sequential program is transformed into a parallel program. Since these problems are NP-hard, compiler writers have opted to concentrate their efforts on optimizations that produce immediate gains in performance. As a result, current parallelizing compilers either use very simple methods to deal with task scheduling or they simply ignore it altogether. Unfortunately, the programmer does not have this luxury. The burden of repartitioning or rescheduling, should the compiler produce inefficient parallel code, lies entirely with the programmer. We were able to create an algorithm (called a metaheuristic), which automatically chooses a scheduling heuristic for each input program. The metaheuristic produces better schedules in general than the heuristics upon which it is based. This technique was tested on a suite of real scientific programs written in SISAL and simulated on four different network configurations. Averaged over all of the test cases, the metaheuristic outperformed all eight underlying scheduling algorithms, beating the best one by 2%, 12%, 13%, and 3% on the four separate network configurations. It is able to do this, not always by picking the best heuristic, but rather by avoiding the heuristics when they would produce very poor schedules. For example, while the metaheuristic only picked the best algorithm about 50% of the time for the 100 Gbps Ethernet, its worst decision was only 49% away from optimal. In contrast, the best of the eight scheduling algorithms was optimal 30% of the time, but its worst decision was 844% away from optimal.
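The selection step of such a metaheuristic can be sketched in a few lines (an illustrative simplification, not the paper's implementation): estimate a schedule length for each candidate heuristic on the input program and keep the shortest, which avoids a heuristic's pathological cases even when the choice is not always optimal.

    import java.util.LinkedHashMap;
    import java.util.Map;

    /** Illustrative simplification of a metaheuristic: pick, per input
     *  program, the scheduling heuristic with the shortest estimated
     *  schedule.  Heuristic names and numbers below are placeholders. */
    public class MetaHeuristic {

        static String pick(Map<String, Double> estimatedLength) {
            String best = null;
            for (Map.Entry<String, Double> e : estimatedLength.entrySet())
                if (best == null || e.getValue() < estimatedLength.get(best))
                    best = e.getKey();
            return best;
        }

        public static void main(String[] args) {
            // Estimated schedule lengths (seconds) for one program on one
            // network configuration; purely illustrative values.
            Map<String, Double> est = new LinkedHashMap<>();
            est.put("list-scheduling", 9.4);
            est.put("clustering", 7.8);
            est.put("round-robin", 31.0);
            System.out.println(pick(est));   // clustering
        }
    }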




Journal ArticleDOI
TL;DR: This paper has implemented Flick, a flexible and optimizing IDL compiler, and is using it to produce specialized high-performance code for complex distributed applications, and believes that the special IDL compilation techniques developed for Khazana will be useful in other applications with similar communication requirements.
Abstract: Distributed applications are complex by nature, so it is essential that there be effective software development tools to aid in the construction of these programs. Commonplace “middleware” tools, however, often impose a tradeoff between programmer productivity and application performance. For instance, many CORBA IDL compilers generate code that is too slow for high-performance systems. More importantly, these compilers provide inadequate support for sophisticated patterns of communication. We believe that these problems can be overcome, thus making IDL compilers and similar middleware tools useful for a broader range of systems. To this end we have implemented Flick, a flexible and optimizing IDL compiler, and are using it to produce specialized high-performance code for complex distributed applications. Flick can produce specially “decomposed” stubs that encapsulate different aspects of communication in separate functions, thus providing application programmers with fine-grain control over all messages. The design of our decomposed stubs was inspired by the requirements of a particular distributed application called Khazana, and in this paper we describe our experience to date in refitting Khazana with Flick-generated stubs. We believe that the special IDL compilation techniques developed for Khazana will be useful in other applications with similar communication requirements. [1]This research was supported in part by the Defense Advanced Research Projects Agency, monitored by the Department of the Army under contract number DABT63-94-C-0058, and the Air Force Research Laboratory, Rome Research Site, USAF, under agreement number F30602-96-2-0269. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation hereon.
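A hypothetical example of what a decomposed stub looks like to the application programmer is sketched below; the operation, wire format, and names are invented for illustration and are not Flick-generated code. The point is that marshaling, transport, and unmarshaling are exposed as separate pieces the application can schedule, batch, or overlap itself.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    /** Illustrative sketch (hypothetical names and wire format, not Flick
     *  output) of a "decomposed" stub: marshal, transport, and unmarshal
     *  are separate steps under application control. */
    public class DecomposedStub {

        /** Marshal a request for a hypothetical read(key) operation. */
        static byte[] marshalRead(String key) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(1);                       // operation code for "read"
            out.write(key.getBytes("UTF-8"));
            return out.toByteArray();
        }

        /** Transport is left to the caller: any channel that moves bytes works. */
        interface Channel { byte[] roundTrip(byte[] request); }

        /** Unmarshal the reply; in this toy format the reply is the value bytes. */
        static String unmarshalRead(byte[] reply) {
            return new String(reply);
        }

        public static void main(String[] args) throws IOException {
            Channel loopback = request -> "value-for-demo".getBytes();  // stand-in transport
            byte[] req = marshalRead("page42");
            System.out.println(unmarshalRead(loopback.roundTrip(req)));
        }
    }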

Journal ArticleDOI
TL;DR: A new compile-time analysis technique is presented that can be used to parallelize most of the loops left unparallelized by the Stanford SUIF compiler's automatic parallelization system, and is designed to produce low-cost, directed run-time tests that allow the system to defer binding of parallelization until run-time when safety cannot be proven statically.
Abstract: This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high-quality compile-time analysis with low-cost run-time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler’s automatic parallelization system. We present results of measurements on programs from two benchmark suites - SPECfp95 and NAS sample benchmarks - which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run-time testing, analysis of control flow, or some combination of the two. We present a new compile-time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed to not only improve the results of compile-time parallelization, but also to produce low-cost, directed run-time tests that allow the system to defer binding of parallelization until run-time when safety cannot be proven statically. We call this approach predicated array data-flow analysis. We augment array data-flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data-flow values. Predicated array data-flow analysis allows the compiler to derive “optimistic” data-flow values guarded by predicates; these predicates can be used to derive a run-time test guaranteeing the safety of parallelization. [1]This work has been supported by DARPA Contract DABT63-95-C-0118 and NSF Contract ACI-9721368.
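The run-time test idea can be sketched as follows (an illustrative Java analogue, not SUIF output): when the independence of loop iterations depends on a value unknown at compile time, the compiler emits a cheap predicate and defers the parallel-versus-sequential decision to run time.

    import java.util.stream.IntStream;

    /** Illustrative analogue of a compiler-emitted run-time test guarding
     *  parallel execution of a loop whose independence depends on a value
     *  unknown at compile time. */
    public class PredicatedParallelLoop {

        /** a[i*stride] += i for i in [0, n).  Iterations write distinct elements,
         *  and are therefore independent, exactly when stride != 0; the compiler
         *  cannot know stride statically, so the test is deferred to run time. */
        static void kernel(double[] a, int n, int stride) {
            boolean independent = stride != 0;                   // run-time predicate
            if (independent) {
                IntStream.range(0, n).parallel().forEach(i -> a[i * stride] += i);
            } else {
                for (int i = 0; i < n; i++) a[i * stride] += i;  // sequential fallback
            }
        }

        public static void main(String[] args) {
            double[] a = new double[16];
            kernel(a, 8, 2);                 // predicate holds: loop runs in parallel
            System.out.println(a[14]);       // 7.0
            kernel(a, 8, 0);                 // all iterations hit a[0]: stays sequential
            System.out.println(a[0]);        // 0+1+...+7 = 28.0
        }
    }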

Journal ArticleDOI
TL;DR: The results indicate that the EM program tends to become computation-intensive in the KSR-1 shared-memory system, and memory-demanding in the CM-5 data-parallel system when the systems and the problems are scaled.
Abstract: Shared-memory and data-parallel programming models are two important paradigms for scientific applications. Both models provide high-level program abstractions, and simple and uniform views of network structures. The common features of the two models significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns are significantly different between the two models due to their programming constraints, and due to different and complex structures of interconnection networks and systems which support the two models. We performed this experimental study to present implications and comparisons of execution patterns on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared-memory model on the KSR-1 and the data-parallel model on the CM-5. Our objectives are to examine the execution pattern changes required for an implementation transformation between the two models; to study memory access patterns; to address scalability issues; and to investigate relative costs and advantages/disadvantages of using the two models for scientific computations. Our results indicate that the EM program tends to become computation-intensive in the KSR-1 shared-memory system, and memory-demanding in the CM-5 data-parallel system when the systems and the problems are scaled. The EM program, a highly data-parallel program, performed extremely well, and the linear system solver, a highly control-structured program, suffered significantly in the data-parallel model on the CM-5. Our study provides further evidence that matching execution patterns of algorithms to parallel architectures would achieve better performance. [1]This work is supported in part by the National Science Foundation under grants CCR-9102854 and CCR-9400719, by the U.S. Air Force under research agreement FD-204092-64157, by Air Force Office of Scientific Research under grant AFOSR-95-01-0215, and by a grant from Cray Research. Part of the experiments were conducted on the CM-5 machines in Los Alamos National Laboratory and in the National Center for Supercomputing Applications at the University of Illinois, and on the KSR-1 machines at Cornell University and at the University of Washington.