
Showing papers in "Scientific Programming in 2003"


Journal ArticleDOI
TL;DR: This algorithm solves the definition-use chaining problem by performing backward iterative data flow analysis to compute the set of upward exposed uses at each statement and can be used to identify parallelism in programs even with cyclic pointer-linked data structures.
Abstract: This paper presents a flow-sensitive algorithm to compute interprocedural definition-use chains of dynamic pointer-linked data structures. The goal is to relate the statements that construct links of dynamic pointer-linked data structures (i.e. definitions) to the statements that might traverse the structures through the links (i.e. uses). Specifically, for each statement S that defines links of pointer-linked data structures, the algorithm finds the set of statements that traverse the links which are defined by S. This algorithm solves the definition-use chaining problem by performing backward iterative data flow analysis to compute the set of upward exposed uses at each statement. The results of this algorithm can be used to identify parallelism in programs even with cyclic pointer-linked data structures.

31 citations


Journal ArticleDOI
TL;DR: This paper presents detailed measurements of the SPEC OMP benchmarks, organized with a quantitative model, and discusses the important loops in the SPEC OMPM2001 benchmarks and the reasons for less-than-ideal speedup on the studied platform.
Abstract: The state of modern computer systems has evolved to allow easy access to multiprocessor systems by supporting multiple processors on a single physical package. As the multiprocessor hardware evolves, new ways of programming it are also developed. Some inventions may merely be adopting and standardizing the older paradigms. One such evolving standard for programming shared-memory parallel computers is the OpenMP API. The Standard Performance Evaluation Corporation (SPEC) has created a suite of parallel programs called SPEC OMP to compare and evaluate modern shared-memory multiprocessor systems using the OpenMP standard. We have studied these benchmarks in detail to understand their performance on a modern architecture. In this paper, we present detailed measurements of the benchmarks. We organize, summarize, and display our measurements using a Quantitative Model. We present a detailed discussion and derivation of the model. Also, we discuss the important loops in the SPEC OMPM2001 benchmarks and the reasons for less than ideal speedup on our platform.

30 citations


Journal ArticleDOI
TL;DR: Template methods are found to mostly deliver on the promise of combining ease of use with efficiency, though moderate compromises in both are required.
Abstract: Template methods have opened up a new way of building C++ libraries. These methods allow the libraries to combine the seemingly contradictory qualities of ease of use and uncompromising efficiency. However, libraries that use these methods are notoriously difficult to develop. This article examines the benefits reaped and the difficulties encountered in using these methods to create a friendly, high performance, tensor library. We find that template methods mostly deliver on this promise, though requiring moderate compromises in usability and efficiency.

29 citations


Journal ArticleDOI
Timothy G. Mattson1
TL;DR: Is OpenMP a good programming environment, and what can be done to make it better?
Abstract: The OpenMP standard defines an Application Programming Interface (API) for shared memory computers. Since its introduction in 1997, it has grown to become one of the most commonly used APIs for parallel programming. But success in the market doesn't necessarily imply successful computer science. Is OpenMP a "good" programming environment? What does it even mean to call a programming environment good? And finally, once we understand how good or bad OpenMP is, what can we do to make it even better? In this paper, we address these questions.

21 citations


Journal ArticleDOI
TL;DR: This work presents a method for deferring the compilation of these templates until an exact type is needed, which will produce the minimum amount of compiled code needed for a particular application, while maintaining the generality and performance that templates innately provide.
Abstract: Generic programming using the C++ template facility has been a successful method for creating high-performance, yet general algorithms for scientific computing and visualization. However, adding template code tends to require more template code in surrounding structures and algorithms to maintain generality. Compiling all possible expansions of these templates can lead to massive template bloat. Furthermore, compile-time binding of templates requires that all possible permutations be known at compile time, limiting the runtime extensibility of the generic code. We present a method for deferring the compilation of these templates until an exact type is needed. This dynamic compilation mechanism will produce the minimum amount of compiled code needed for a particular application, while maintaining the generality and performance that templates innately provide. Through a small amount of supporting code within each templated class, the proper templated code can be generated at runtime without modifying the compiler. We describe the implementation of this goal within the SCIRun dataflow system. SCIRun is freely available online for research purposes.

19 citations


Journal ArticleDOI
TL;DR: The CAPO parallelization support tool is extended to support multilevel parallelism based on OpenMP directives, and results are reported for several benchmark codes and one full application parallelized using the system.
Abstract: In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

17 citations


Journal ArticleDOI
TL;DR: The approach provides a formal way of capturing recurrent themes in Grid applications, together with a set of operators for manipulating the patterns, which are expressed in the Unified Modelling Language.
Abstract: A pattern based approach for developing applications in a Grid computing environment is presented, and is based on the ability to manage components and their interactions. The approach provides a formal way of combining recurrent themes in Grid applications, and provides a set of operators that may be used to manipulate the patterns. The operators may be applied to individual patterns or groups, and may be managed as an independent library. The patterns distinguish between service providers and users, and may be used to also analyse the properties of a collection of components, or to vary these properties subject to a set of predefined constraints. Patterns are expressed in the Unified Modelling Language (UML), and operators correspond to manipulation of components within each pattern.

16 citations


Journal ArticleDOI
Y. Ren1, M. van Waveren1
TL;DR: The focus on data rather than work distribution appears misplaced in an SMP context, and the inherent flexible nature of shared memory paradigms such as OpenMP poses other difficulties when it becomes necessary to optimise performance across successive parallel library calls.
Abstract: Dense linear algebra libraries need to cope efficiently with a range of input problem sizes and shapes. Inherently this means that parallel implementations have to exploit parallelism wherever it is present. While OpenMP allows relatively fine grain parallelism to be exploited in a shared memory environment it currently lacks features to make it easy to partition computation over multiple array indices or to overlap sequential and parallel computations. The inherent flexible nature of shared memory paradigms such as OpenMP poses other difficulties when it becomes necessary to optimise performance across successive parallel library calls. Notions borrowed from distributed memory paradigms, such as explicit data distributions help address some of these problems, but the focus on data rather than work distribution appears misplaced in an SMP context.

16 citations


Journal ArticleDOI
TL;DR: This note introduces a software environment called EFCOSS (Environment For Combining Optimization and Simulation Software) that facilitates and speeds up this task by doing much of the required work automatically and includes support for automatic differentiation providing the derivatives required by many optimization algorithms.
Abstract: Numerical simulation is a powerful tool in science and engineering, and it is also used for optimizing the design of products and experiments rather than only for reproducing the behavior of scientific and engineering systems. In order to reduce the number of simulation runs, the traditional "trial and error" approach for finding near-to-optimum design parameters is more and more replaced with efficient numerical optimization algorithms. Done by hand, the coupling of simulation and optimization software is tedious and error-prone. In this note we introduce a software environment called EFCOSS (Environment For Combining Optimization and Simulation Software) that facilitates and speeds up this task by doing much of the required work automatically. Our framework includes support for automatic differentiation providing the derivatives required by many optimization algorithms. We describe the process of integrating the widely used computational fluid dynamics package FLUENT and a MINPACK-1 least squares optimizer into EFCOSS and follow a sample session solving a data assimilation problem.

10 citations


Journal ArticleDOI
TL;DR: This work formalizes three design patterns that have abstracted from many existing libraries and discusses the role of these formalizations as a tool for guiding compiler optimizers.
Abstract: We apply the notion of design patterns to optimizations performed by designers of software libraries, focusing especially on object-oriented numerical libraries. We formalize three design patterns that we have abstracted from many existing libraries and discuss the role of these formalizations as a tool for guiding compiler optimizers. These optimizers operate at a very high level that would otherwise be left unoptimized by traditional optimizers. Finally, we discuss the implementation of a design pattern-based compiler optimizer for C++ abstract data types.

8 citations


Journal ArticleDOI
TL;DR: The binding schema markup language (BSML) is a markup language for describing data interchange between scientific codes. It is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data.
Abstract: We describe a binding schema markup language (BSML) for describing data interchange between scientific codes. Such a facility is an important constituent of scientific problem solving environments (PSEs). BSML is designed to integrate with a PSE or application composition system that views model specification and execution as a problem of managing semistructured data. The data interchange problem is addressed by three techniques for processing semistructured data: validation, binding, and conversion. We present BSML and describe its application to a PSE for wireless communications system design.

Journal ArticleDOI
TL;DR: A symbolic solver generator to deal with a system of partial differential equations (PDEs) in functions of an arbitrary number of variables is presented; it can also handle arbitrary domains (geometries) of the independent variables.
Abstract: A symbolic solver generator to deal with a system of partial differential equations (PDEs) in functions of an arbitrary number of variables is presented; it can also handle arbitrary domains (geometries) of the independent variables. Given a system of PDEs, the solver generates a set of explicit finite-difference methods to any specified order, and a Fourier stability criterion for each method. For a method that is stable, an iteration function is generated symbolically using the PDE and its initial and boundary conditions. This iteration function is dynamically generated for every PDE problem, and its evaluation provides a solution to the PDE problem. A C++/Fortran 90 code for the iteration function is generated using the MathCode system, which results in a performance gain of the order of a thousand over Mathematica, the language that has been used to code the solver generator. Examples of stability criteria are presented that agree with known criteria; examples that demonstrate the generality of the solver and the speed enhancement of the generated C++ and Fortran 90 codes are also presented.

Journal ArticleDOI
TL;DR: An extension of the Mathematica- and MathCode-based symbolic-numeric framework for solving a variety of partial differential equation (PDE) problems to implicit schemes and the method of lines is presented.
Abstract: This paper presents an extension of our Mathematica- and MathCode-based symbolic-numeric framework for solving a variety of partial differential equation (PDE) problems. The main features of our earlier work, which implemented explicit finite-difference schemes, include the ability to handle (1) arbitrary number of dependent variables, (2) arbitrary dimensionality, and (3) arbitrary geometry, as well as (4) developing finite-difference schemes to any desired order of approximation. In the present paper, extensions of this framework to implicit schemes and the method of lines are discussed. While C++ code is generated, using the MathCode system for the implicit method, Modelica code is generated for the method of lines. The latter provides a preliminary PDE support for the Modelica language. Examples illustrating the various aspects of the solver generator are presented.

Journal Article
TL;DR: It’s time to get used to the idea that there is no such thing as a “normal” relationship between a man and a woman.
Abstract: 전방 관측 적외선 영상에서 가려짐이 없는 표적과 부분적으로 가려진 표적을 식별하기 위해 국부적 표적 경계선에 대한 거리함수의 푸리에기술자와 다중의 다층 퍼셉트론을 사용한 특징정보 융합 방법을 제안한다. 표적을 배경으로부터 분리한 후에 표적 경계선의 중심을 기준으로 푸리에 기술자를 구해 전역적 특징으로 사용한다. 국부적인 형상 특징을 찾기 위해 표적 경계선을 분할하여 4개의 국부적 경계선을 만들고, 각 국부적 경계선에서 두 개의 극단점이 이루는 직선과 경계선 픽셀로부터 거리함수를 정의한다. 거리함수에 대한 푸리에 기술자를 국부적 형상특징으로 사용한다. 1개의 광역적 특징 벡터와 4개의 국부적 특징 벡터를 정의하고 다중의 다층 퍼셉트론을 사용하여 특징정보들을 융합함으로써 최종 표적식별 결과를 얻는다. 실험을 통해 기존의 특징벡터들에 의한 표적식별 방법과 비교하여 제안한 방법의 우수성을 입증한다.

Journal ArticleDOI
TL;DR: The purpose of this benchmark is to propose several optimization techniques and to test their existence in current OpenMP compilers, including the removal of redundant synchronization constructs, effective constructs for alternative code and orphaned directives.
Abstract: The purpose of this benchmark is to propose several optimization techniques and to test their existence in current OpenMP compilers. Examples are the removal of redundant synchronization constructs, effective constructs for alternative code and orphaned directives. The effectiveness of the compiler generated code is measured by comparing different OpenMP constructs and compilers. If possible, we also compare with the hand coded "equivalent" solution. Six out of seven proposed optimization techniques are already implemented in different compilers. However, most compilers implement only one or two of them.

Journal ArticleDOI
TL;DR: A compiler is presented that takes as source a C program annotated with complexity formulas and produces as output an instrumented code, giving us, among other information, the values of those parameters.
Abstract: Current performance prediction analytical models try to characterize the performance behavior of actual machines through a small set of parameters. In practice, substantial deviations are observed. These differences are due to factors such as memory hierarchies or network latency. A natural approach is to associate a different proportionality constant with each basic block, and analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, this approach implies that the evaluation of parameters must be done for each algorithm. This is a heavy task, involving experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms. Software support is required. We present a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter, giving us, among other information, the values of those parameters.


Journal ArticleDOI
TL;DR: The main objective is to implicitly set up affinity links between threads and data, by devising loop schedules that achieve balanced work distribution within irregular data spaces and reusing them as much as possible throughout the execution of the program for better memory access locality.
Abstract: In this paper we explore the idea of customizing and reusing loop schedules to improve the scalability of non-regular numerical codes in shared-memory architectures with non-uniform memory access latency. The main objective is to implicitly set up affinity links between threads and data, by devising loop schedules that achieve balanced work distribution within irregular data spaces and reusing them as much as possible throughout the execution of the program for better memory access locality. This transformation provides a great deal of flexibility in optimizing locality, without compromising the simplicity of the shared-memory programming paradigm. In particular, the programmer does not need to explicitly distribute data between processors. The paper presents practical examples from real applications and experiments showing the efficiency of the approach.

Journal Article
TL;DR: A fingerprint feature extraction and matching method based on directional filter banks is proposed that is robust to rotation and fast enough for practical verification.
Abstract: For a fingerprint-based biometric system to be practical and highly reliable, it must be robust to rotation of the input fingerprint and have a short response time for verification or identification. This paper therefore proposes a fingerprint feature extraction and matching method based on directional filter banks that is both robust to rotation and fast. The proposed method extracts features of the fingerprint pattern quickly using a directional filter bank, which efficiently decomposes the image into directional subband images, and performs matching based on the Euclidean distance between feature vectors, so the overall response time is very short. To achieve robustness to rotation, a set of feature vectors covering various rotations is constructed from the subband images decomposed by the directional filter bank and then matched against a single enrolled template feature vector. Experimental results show that the proposed method achieves accuracy comparable to the Gabor filter bank-based method, one of the leading existing approaches, while performing verification much faster and remaining robust to rotation.


Journal Article
TL;DR: An autonomous mobile robot that recognizes and avoids obstacles while travelling to a goal is implemented, combining a DSP-based image processing board with visual feedback control algorithms.
Abstract: This paper implements a robot that can recognize and avoid obstacles while travelling autonomously to a destination. We present two results: a hardware part, the implementation of an image processing board, and a software part, visual feedback control for the autonomous mobile robot. In the first part, we present a robot that receives commands from a control board performing the image processing. We have long studied autonomous mobile robots equipped with a CCD camera. The robot consists of an image board with a DSP chip, stepper motors, and a CCD camera. In this system, the image processing board acquires images, runs the image processing algorithms, and computes the robot's path. Image data from the CCD camera mounted on the robot are captured at every sampling time. After determining whether an obstacle is present in the scene, the robot turns left or right to avoid it and feeds back the distance travelled; an algorithm is implemented that plans a simple path and tracks absolute coordinates so that the robot can reach the initially specified goal. The image processing board that acquires these images and runs the algorithms is built from a DSP (TMS320VC33), an ADV611, an SAA7111, an ADV7176A, a CPLD (EPM7256ATC144), and SRAM memory. In the second part, we present two visual feedback control algorithms for recognizing and avoiding obstacles. The first algorithm segments an image that has been preprocessed by filtering, edge detection, NOR transformation, and thresholding; pixel-density computation via labeling and segmentation is introduced here. The second algorithm applies a wavelet transform to the preprocessed image and scans the histogram distribution vertically (along the y-axis) at 20-pixel intervals; where an obstacle is present, the histogram distribution shows almost no variation, and by analyzing this property an algorithm was devised to locate obstacles and avoid them. This paper proposes an algorithm for reaching a preset destination while avoiding obstacles using a single CCD camera mounted on the robot, and describes the design and fabrication of the image processing board. The board supports a higher frame rate (30 frames/s) and resolution than typical boards and includes a compression algorithm, giving it excellent performance for image transmission as well.

Journal Article
TL;DR: A digital watermarking-based transmission scheme for 3D video is proposed that remains viewable on existing 2D digital TV receivers.
Abstract: This paper proposes a new 3D video transmission scheme, based on digital watermarking, that is compatible with existing 2D TV. In general, 3D video is transmitted efficiently by exploiting the temporal and spatial redundancy between the stereo image pair. However, because of how the 3D video is compressed, an ordinary digital TV cannot display the transmitted 3D video after decoding. To overcome this problem, the proposed method draws on the ability of digital watermarking to hide new information where it is visually imperceptible: the information of the other image of the stereo pair is embedded into each channel of the reference image, so that on decoding the 3D video can be reconstructed while the stream remains viewable on a conventional digital TV.

Journal ArticleDOI
TL;DR: This paper presents a hybrid Java/Fortran implementation of a parallel particle-in-cell (PIC) algorithm for plasma simulations, where the time-consuming components of this application are designed and implemented as Fortran subroutines, while less calculation-intensive components usually involved in building the user interface are written in Java.
Abstract: Java is receiving increasing attention as the most popular platform for distributed computing. However, programmers are still reluctant to embrace Java as a tool for writing scientific and engineering applications due to its still noticeable performance drawbacks compared with other programming languages such as Fortran or C. In this paper, we present a hybrid Java/Fortran implementation of a parallel particle-in-cell (PIC) algorithm for plasma simulations. In our approach, the time-consuming components of this application are designed and implemented as Fortran subroutines, while less calculation-intensive components usually involved in building the user interface are written in Java. The two types of software modules have been glued together using the Java native interface (JNI). Our mixed-language PIC code was tested and its performance compared with pure Java and Fortran versions of the same algorithm on a Sun E6500 SMP system and a Linux cluster of Pentium III machines.

Journal ArticleDOI
TL;DR: This work presents an OpenMP implementation suitable for multiprogrammed environments on Intel-based SMP systems that consists of a runtime system and a resource manager, while the NanosCompiler is used to transform OpenMP-coded applications into code with calls to the runtime system.
Abstract: In this work, we present an OpenMP implementation suitable for multiprogrammed environments on Intel-based SMP systems. This implementation consists of a runtime system and a resource manager, while we use the NanosCompiler to transform OpenMP-coded applications into code with calls to our runtime system. The resource manager acts as the operating system scheduler for the applications built with our runtime system. It executes a custom made scheduling policy to distribute the available physical processors to the active applications. The runtime system cooperates with the resource manager in order to adapt each application's generated parallelism to the number of processors allocated to it, according to the resource manager scheduling policy. We use the OpenMP version of the NAS Parallel Benchmark suite in order to evaluate the performance of our implementation. In our experiments we compare the performance of our implementation with that of a commercial OpenMP implementation. The comparison proves that our approach performs better both on a dedicated and on a heavily multiprogrammed environment.


Journal Article
TL;DR: A user-assisted VOP generation method for the MPEG-4 visual standard is proposed that combines automatic and user-assisted segmentation algorithms.
Abstract: Since the MPEG-4 visual standard enables content-based functionalities, it is necessary to extract video objects from natural video sequences. Segmentation algorithms can largely be classified into automatic segmentation and user-assisted segmentation. In this paper, we propose a user-assisted VOP generation method.

Journal Article
TL;DR: An adaptive principal component method based on Lloyd quantization of the color space is proposed that reduces the error of spectral reflectance estimation from a three-band RGB camera.
Abstract: This paper proposes a method for improving the estimation error when estimating spectral reflectance with a three-band RGB camera. The proposed method reduces the estimation error by constructing adaptive sets of principal components for different regions of color space. To build the adaptive principal component sets, the Lloyd quantizer design algorithm is applied to partition the population of spectral reflectances into N regions. Using the 1485 Munsell color samples as the full population, the Macbeth ColorChecker provides the initial representative values; iterating the Lloyd algorithm classifies the whole reflectance population into regions, and principal component analysis of each region then yields the adaptive principal component sets. Experimental results confirm that the proposed method improves both the color difference and the mean squared error of the spectral reflectance compared with two existing three-band principal component analysis methods and a five-band Wiener estimation method.


Journal Article
TL;DR: A face recognition method based on a modified Otsu binarization, Hu moments, and linear discriminant analysis is proposed that is robust to changes in brightness, contrast, scale, rotation, and position.
Abstract: This paper proposes a face recognition method robust to changes in brightness, contrast, scale, rotation, and position, based on a modified Otsu binarization method, Hu moments, and linear discriminant analysis (LDA). The proposed modified Otsu binarization produces binary images that are invariant to brightness and contrast. A total of 17 Hu moments are then computed from the edge image and multilevel binary images of the face, after which LDA is applied to extract the final feature vector. In particular, because it uses Hu moments, the proposed method is also robust to changes in scale, rotation, and position. In comparative experiments on 100 face images from the Olivetti Research Laboratory (ORL) and AR databases, the proposed method generally showed better recognition performance than existing principal component analysis (PCA)-based recognition and a combined PCA and LDA method.

Journal ArticleDOI
TL;DR: This paper describes the design and development of a software package supporting variable precision arithmetic as a semantic extension to the Fortran 95 language that exploits the data-abstraction capabilities ofFortran 95 and allows the operations to be used elementally with array operands as well as with scalars.
Abstract: This paper describes the design and development of a software package supporting variable precision arithmetic as a semantic extension to the Fortran 95 language. The working precision of the arithmetic supported by this package can be dynamically and arbitrarily varied. The facility exploits the data-abstraction capabilities of Fortran 95 and allows the operations to be used elementally with array operands as well as with scalars. The number system is defined in such a way as to be closed under all of the basic operations of normal arithmetic; no program-terminating numerical exceptions can occur. Precision loss situations like underflow and overflow are handled by defining special value representations that preserve as much of the numeric information as is practical and the operation semantics are defined so that these exceptional values propagate as appropriate to reflect this loss of information. The number system uses an essentially conventional variable precision floating-point representation. When operations can be performed exactly within the currently-set working precision limit, the excess trailing zero digits are not stored, nor do they take part in future operations. This is both economical in storage and improves efficiency.