
Showing papers on "Scalability published in 1989"


Proceedings ArticleDOI
01 Apr 1989
TL;DR: The extent to which multiple hardware contexts per processor can help to mitigate the negative effects of high latency is explored and it is shown that two or four contexts can achieve substantial performance gains over a single context.
Abstract: A fundamental problem that any scalable multiprocessor must address is the ability to tolerate high latency memory operations. This paper explores the extent to which multiple hardware contexts per processor can help to mitigate the negative effects of high latency. In particular, we evaluate the performance of a directory-based cache coherent multiprocessor using memory reference traces obtained from three parallel applications. We explore the case where there are a small fixed number (2-4) of hardware contexts per processor and the context switch overhead is low. In contrast to previously proposed approaches, we also use a very simple context switch criterion, namely a cache miss or a write-hit to shared data. Our results show that the effectiveness of multiple contexts depends on the nature of the applications, the context switch overhead, and the inherent latency of the machine architecture. Given reasonably low overhead hardware context switches, we show that two or four contexts can achieve substantial performance gains over a single context. For one application, the processor utilization increased by about 46% with two contexts and by about 80% with four contexts.
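
For intuition about the mechanism being evaluated, here is a minimal sketch (not the paper's trace-driven simulator) of a processor with a small number of hardware contexts that switches on every cache miss; the miss rate, miss latency, and switch cost are assumed values chosen only to show how utilization grows with two and four contexts.

```python
import random

def utilization(num_contexts, miss_rate=0.05, miss_latency=50,
                switch_cost=4, total_cycles=200_000, seed=1):
    """Estimate processor utilization with multiple hardware contexts.

    Illustrative model with assumed parameters: each context issues one
    instruction per cycle until it misses in the cache; a miss stalls that
    context for `miss_latency` cycles, and the processor switches to the
    next context at a cost of `switch_cost` cycles.
    """
    rng = random.Random(seed)
    ready_at = [0] * num_contexts   # cycle at which each context becomes runnable
    cycle = busy = current = 0
    while cycle < total_cycles:
        if ready_at[current] <= cycle:
            run = 1
            while rng.random() >= miss_rate:   # run until the next cache miss
                run += 1
            busy += run
            cycle += run
            ready_at[current] = cycle + miss_latency  # context stalled on the miss
            cycle += switch_cost                      # hardware context switch
            current = (current + 1) % num_contexts
        else:
            # No runnable work in this context yet; spin a cycle and try the next one.
            current = (current + 1) % num_contexts
            cycle += 1
    return busy / cycle

for k in (1, 2, 4):
    print(f"{k} context(s): utilization ~ {utilization(k):.2f}")
```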

163 citations


Journal ArticleDOI
M.M. Theimer, K.A. Lantz
TL;DR: The authors describe the design and performance of scheduling facilities for finding idle hosts in a workstation-based distributed system and focus on the tradeoffs between centralized and decentralized architectures with respect to scalability, fault tolerance, and simplicity of design.
Abstract: The authors describe the design and performance of scheduling facilities for finding idle hosts in a workstation-based distributed system. They focus on the tradeoffs between centralized and decentralized architectures with respect to scalability, fault tolerance, and simplicity of design, as well as several implementation issues of interest when multicast communication is used. They conclude that the principal tradeoff between the two approaches is that a centralized architecture can be scaled to a significantly greater degree and can more easily monitor global system statistics, whereas a decentralized architecture is simpler to implement.

139 citations


Journal ArticleDOI
TL;DR: A form of coherence in the ray-tracing algorithm is identified that can be exploited to develop optimum schemes for data distribution in a multiprocessor system, which gives rise to high processor efficiency for systems with limited distributed memory.
Abstract: The scalability and cost effectiveness of general-purpose distributed-memory multiprocessor systems make them particularly suitable for ray-tracing applications. However, the limited memory available to each processor in such a system requires schemes to distribute the model database among the processors. The authors identify a form of coherence in the ray-tracing algorithm that can be exploited to develop optimum schemes for data distribution in a multiprocessor system. This in turn gives rise to high processor efficiency for systems with limited distributed memory.

99 citations


Proceedings ArticleDOI
01 Aug 1989
TL;DR: The experimental results show that the ACWN algorithm achieves better performance than randomized allocation in most cases, and its agility in spreading the work helps it outperform the gradient model in both performance and scalability.
Abstract: One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly acute for computations with unpredictable dynamic behavior or irregular structure. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The Adaptive Contracting Within Neighborhood (ACWN) scheme is dynamic, distributed, self-adaptive, and scalable. The basic scheme and its adaptive extensions are described and contrasted with other schemes that have been proposed in this context. The performance of all three schemes on an iPSC/2 hypercube is presented and analyzed. The experimental results show that the ACWN algorithm achieves better performance than randomized allocation in most cases. Its agility in spreading the work helps it outperform the gradient model in performance and scalability.
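
A minimal sketch of the neighborhood-based placement idea described above; the load metric, threshold, hop limit, and ring topology are assumptions made for illustration, not the published ACWN parameters.

```python
from collections import deque

class Node:
    """One processor in a hypothetical neighborhood-based scheduler."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()   # locally scheduled medium-grained tasks
        self.neighbors = []    # e.g. hypercube or ring neighbors

    @property
    def load(self):
        return len(self.queue)

def acwn_place(origin, task, threshold=4, max_hops=3):
    """Place `task`, contracting within the neighborhood of `origin`.

    Assumed rule (illustrative, not the published algorithm verbatim):
    keep the task locally if the local queue is short enough or no
    neighbor is less loaded; otherwise hand it to the least-loaded
    neighbor, up to `max_hops` forwarding steps.
    """
    node = origin
    for _ in range(max_hops):
        best = min(node.neighbors, key=lambda n: n.load, default=None)
        if node.load <= threshold or best is None or best.load >= node.load:
            break
        node = best
    node.queue.append(task)
    return node

# Tiny example: a 4-node ring with one overloaded node.
nodes = [Node(i) for i in range(4)]
for i, n in enumerate(nodes):
    n.neighbors = [nodes[(i - 1) % 4], nodes[(i + 1) % 4]]
nodes[0].queue.extend(range(6))
placed_on = acwn_place(nodes[0], "new-task")
print("task placed on node", placed_on.name)
```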

64 citations


Proceedings ArticleDOI
01 Apr 1989
TL;DR: The VMP-MC design is described, a distributed parallel multi-computer based on the VMP multiprocessor design that is intended to provide a set of building blocks for configuring machines from one to several thousand processors.
Abstract: The problem of building a scalable shared memory multiprocessor can be reduced to that of building a scalable memory hierarchy, assuming interprocessor communication is handled by the memory system. In this paper, we describe the VMP-MC design, a distributed parallel multi-computer based on the VMP multiprocessor design, which is intended to provide a set of building blocks for configuring machines from one to several thousand processors. VMP-MC uses a memory hierarchy based on shared caches, ranging from on-chip caches to board-level caches connected by busses to, at the bottom, a high-speed fiber optic ring. In addition to describing the building block components of this architecture, we identify the key performance issues associated with the design and provide a performance evaluation of these issues using trace-driven simulation and measurements from the VMP. This work was sponsored in part by the Defense Advanced Research Projects Agency under Contract N00014-88-K-0619.

55 citations


01 Jun 1989
TL;DR: How layout theory engendered the notion of area- and volume-universal networks, such as fat-trees, is discussed; these scalable networks offer a flexible alternative to the more common hypercube-based networks for interconnecting the processors of large parallel supercomputers.
Abstract: Since its inception, VLSI theory has expanded in many fruitful and interesting directions. One major branch is layout theory, which studies the efficiency with which graphs can be embedded in the plane according to VLSI design rules. In this survey paper, I review some of the major accomplishments of VLSI layout theory and discuss how layout theory engendered the notion of area- and volume-universal networks, such as fat-trees. These scalable networks offer a flexible alternative to the more common hypercube-based networks for interconnecting the processors of large parallel supercomputers. Keywords: Integrated circuits; Interconnection networks; Parallel computing; Supercomputing; Universality; Thompson's model; Tree of meshes.
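
As background for the survey's starting point, the classical Thompson-style layout bound can be stated as follows (quoted from memory as a standard result, not from the paper itself):

```latex
% Thompson-model layout lower bound (background sketch).
% If a graph $G$ has minimum bisection width $B$, then any VLSI layout of
% $G$ under the usual grid-model design rules requires area
\[
  A(G) \;=\; \Omega\!\left(B^{2}\right).
\]
% Low-bisection networks such as trees therefore lay out compactly, and a
% fat-tree fattens its channels toward the root to buy bandwidth while
% keeping layout area (or volume) under control.
```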

36 citations


Proceedings Article
01 Jan 1989
TL;DR: In this paper, the authors describe an approach for a system that accesses the distributed collection of repositories that naturally maintain resource information, rather than building a global database to register all resources.
Abstract: Large scale computer networks provide access to a bewilderingly large number and variety of resources, including retail products, network services, and people in various capacities. We consider the problem of allowing users to discover the existence of such resources in an administratively decentralized environment. We describe an approach for a system that accesses the distributed collection of repositories that naturally maintain resource information, rather than building a global database to register all resources. A key problem is organizing the resource space in a manner suitable to all participants. Rather than imposing an inflexible hierarchical organization, our approach allows the resource space organization to evolve in accordance with what resources exist and what types of queries users make. Concretely, a set of agents organize and search the resource space by constructing links between the repositories of resource information based on keywords that describe the contents of each repository, and the semantics of the resources being sought. The links form a general graph, with a flexible set of hierarchies embedded within the graph to provide some measure of scalability. The graph structure evolves over time through the use of cache aging protocols. Additional scalability is targeted through the use of probabilistic graph protocols. A prototype implementation and a measurement study are under way. This material is based upon work supported in part by the National Science Foundation under Cooperative Agreement DCR-84200944, and by a grant from AT&T Bell Laboratories.
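
A toy illustration of the link-following search the agents perform; the repository names, keyword sets, and breadth-first policy below are assumptions made for the example, not the authors' protocol.

```python
from collections import deque

# Each repository advertises keywords describing its contents; agents build
# links between repositories so queries follow keyword trails instead of
# consulting a global registry.
repositories = {
    "cs.printers":     {"keywords": {"printer", "postscript"}, "links": ["campus.services"]},
    "campus.services": {"keywords": {"printer", "license", "compute"}, "links": ["cs.printers", "vendors"]},
    "vendors":         {"keywords": {"retail", "license"}, "links": ["campus.services"]},
}

def discover(start, wanted, max_visits=10):
    """Breadth-first search over repository links for a keyword."""
    seen, frontier, hits = {start}, deque([start]), []
    while frontier and len(seen) <= max_visits:
        repo = frontier.popleft()
        if wanted in repositories[repo]["keywords"]:
            hits.append(repo)
        for nxt in repositories[repo]["links"]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return hits

print(discover("cs.printers", "license"))   # ['campus.services', 'vendors']
```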

27 citations


Proceedings ArticleDOI
01 Aug 1989
TL;DR: A technique is proposed that can be used to help determine whether a candidate model is correct, that is, whether it adequately approximates the system's scalability, and experimental results illustrate this technique for both a poorly scalable and a very scalable system.
Abstract: This paper discusses scalability and outlines a specific approach to measuring the scalability of parallel computer systems. The relationship between scalability and speedup is described. It is shown that a parallel system is scalable for a given algorithm if and only if its speedup is unbounded. A technique is proposed that can be used to help determine whether a candidate model is correct, that is, whether it adequately approximates the system's scalability. Experimental results illustrate this technique for both a poorly scalable and a very scalable system.
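
The paper's characterization can be written compactly; the notation below is added here, but the statement is the one given in the abstract.

```latex
% Speedup of a parallel system with N processors on a fixed algorithm:
\[
  S(N) \;=\; \frac{T(1)}{T(N)},
\]
% where $T(N)$ is the execution time on $N$ processors.  The paper's claim
% is then: the system is scalable for that algorithm if and only if the
% speedup is unbounded, i.e.
\[
  \text{scalable} \iff \sup_{N} S(N) = \infty .
\]
```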

27 citations


01 Mar 1989
TL;DR: Psyche is an operating system designed to enable the most effective use possible of large-scale shared-memory multiprocessors; its design permits multiple models of parallelism, both within and among applications, with information sharing as the default rather than the exception.
Abstract: Scalable shared-memory multiprocessors (those with non-uniform memory access times) are among the most flexible architectures for high-performance parallel computing, admitting efficient implementations of a wide range of process models, communication mechanisms, and granularities of parallelism. Such machines present opportunities for general-purpose parallel computing that cannot be exploited by existing operating systems, because the traditional approach to operating system design presents a virtual machine in which the definition of process, communication, and grain size are outside the control of the user. Psyche is an operating system designed to enable the most effective use possible of large-scale shared-memory multiprocessors. The Psyche project is characterized by (1) a design that permits the implementation of multiple models of parallelism, both within and among applications, (2) the ability to trade protection for performance, with information sharing as the default, rather than the exception, (3) explicit, user-level control of process structure and scheduling, and (4) a kernel implementation that uses shared memory itself, and that provides users with the illusion of uniform memory access times.

16 citations


Proceedings ArticleDOI
27 Feb 1989
TL;DR: The authors describe and motivate the design of a scalable and portable benchmark for database systems, the AS3AP benchmark (ANSI SQL Standard Scalable and Portable).
Abstract: The authors describe and motivate the design of a scalable and portable benchmark for database systems, the AS3AP benchmark (ANSI SQL Standard Scalable and Portable). The benchmark is designed to provide meaningful measures of database processing power, to be portable between different architectures, and to be scalable to facilitate comparisons between systems with different capabilities. The authors introduce a performance metric, namely, the equivalent database ratio, to be used in comparing systems.

16 citations


Journal ArticleDOI
TL;DR: Results of a performance evaluation of several parallel disk organizations are presented, along with a characterization of the disk systems.
Abstract: In this paper, several issues related to designing a parallel disk system are discussed. Results of a performance evaluation of several parallel disk organizations are presented, and a characterization of the disk systems is given. Issues such as scalability and networking are discussed, and several problems for future research on improving I/O performance are pointed out.

Journal ArticleDOI
TL;DR: The features offered by current high-performance 32-bit system buses are examined, and the factors that need to be taken into account when designing these buses are considered.
Abstract: The features offered by current high-performance 32-bit system buses are examined. They allow multiprocessing, scalability, block transfers to RAM, cache coherence, and autoconfiguration (the ability to poll boards connected to them, identify the boards, and adjust the software interface accordingly). The factors that need to be taken into account when designing these buses are considered, and their performance and limitations are discussed.

Book ChapterDOI
19 Jun 1989
TL;DR: It is concluded that although parallelism must be limited in some circumstances, in general the benefits of increased parallelism in shared-nothing systems exceed the costs.
Abstract: We describe results from an experiment that investigates the scalability of response time performance in shared-nothing systems, such as the Bubba parallel database machine. In particular, we show how—and how much—potential response time improvements for certain transaction types can be impaired in shared-nothing architectures by the increased cost of transaction startup, communication, and synchronization as the degree of execution parallelism is increased. We further show how these effects change under increased levels of concurrency and heterogeneity in the transaction workload. From the results, we conclude that although parallelism must be limited in some circumstances, in general the benefits of increased parallelism in shared-nothing systems exceed the costs.
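
The tradeoff being measured can be sketched with a generic response-time model; this particular functional form is an illustration, not the Bubba team's model.

```latex
% Response time of one transaction executed with degree of parallelism n:
\[
  R(n) \;\approx\; \frac{W}{n} \;+\; a \;+\; b\,n,
\]
% where $W$ is the transaction's total work, $a$ a fixed startup cost, and
% $b$ the per-participant cost of startup, communication, and
% synchronization.  Minimizing over $n$ gives an optimal degree of
% parallelism
\[
  n^{*} \;=\; \sqrt{W/b},
\]
% beyond which adding parallelism increases response time; this is the
% sense in which parallelism "must be limited in some circumstances."
```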

Journal ArticleDOI
TL;DR: A distributed shared memory (DSM) architecture is presented that is the basis for the design of a scalable high performance multiprocessor system that is able to process very large processing tasks with supercomputer performance.
Abstract: The rapid progress of microprocessors provides economic solutions for small and medium-scale data processing tasks, e.g., workstations. It is a challenging task to combine many powerful microprocessors into a fixed or reconfigurable array that is able to process very large processing tasks with supercomputer performance. Fortunately, many very large applications are regularly structured and can easily be partitioned. One example is physical phenomena, which are often described by mathematical models, e.g. by sets of partial differential equations (PDEs). In most cases, the mathematical models can only be computed approximately: the finer the model, the higher the necessary computational effort. With the appearance of more powerful computers, more complicated and more refined models can be calculated. Such user problems are compute-intensive and have strong inherent computational parallelism. Therefore, the needed high performance can be achieved by using many computers working in parallel. In particular, parallel architectures of the MIMD (multiple-instruction multiple-data) type, known as multiprocessors, are well suited because of their higher flexibility compared with SIMD (single-instruction multiple-data) architectures. In this paper, the authors present a distributed shared memory (DSM) architecture that is the basis for the design of a scalable high-performance multiprocessor system.
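
The regular, easily partitioned structure the authors appeal to can be illustrated with a tiny 1-D Jacobi relaxation split into contiguous blocks; this is a sequential sketch of the data decomposition only, with made-up sizes, and it stands in for neither the DSM hardware nor a real message-passing layer.

```python
import numpy as np

def jacobi_partitioned(u, num_parts, sweeps):
    """1-D Jacobi relaxation computed block-by-block to mimic a regular
    domain decomposition: each 'processor' owns a contiguous block and only
    needs one ghost value from each neighboring block per sweep."""
    n = len(u)
    bounds = np.linspace(0, n, num_parts + 1, dtype=int)
    for _ in range(sweeps):
        new = u.copy()
        for p in range(num_parts):
            lo, hi = bounds[p], bounds[p + 1]
            for i in range(max(lo, 1), min(hi, n - 1)):
                # u[i-1] or u[i+1] may live in a neighboring block: on a real
                # machine, fetching it is the halo/ghost-cell exchange.
                new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u

u0 = np.zeros(32)
u0[0], u0[-1] = 1.0, 1.0                     # fixed boundary conditions
print(jacobi_partitioned(u0, num_parts=4, sweeps=100).round(2))
```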

Proceedings ArticleDOI
01 Apr 1989
TL;DR: A queueing network model of the dynamic dataflow architecture is developed based on the idea of characterizing dataflow graphs by their average parallelism and the effect on the performance of the system due to factors such as scalability, coarse grain vs. fine grain parallelism, degree of decentralized scheduling of dataflow instructions, and locality is studied.
Abstract: This paper presents analytical results of computation-communication issues in dynamic dataflow architectures. The study is based on a generalized architecture which encompasses all the features of the proposed dynamic dataflow architectures. Based on the idea of characterizing dataflow graphs by their average parallelism, a queueing network model of the architecture is developed. Since the queueing network violates properties required for a product-form solution, a few approximations have been used. These approximations yield a multi-chain closed queueing network in which the population of each chain is related to the average parallelism of the dataflow graph executed in the architecture. Based on the model, we are able to study the effect on the performance of the system due to factors such as scalability, coarse-grain vs. fine-grain parallelism, degree of decentralized scheduling of dataflow instructions, and locality.
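
A single-chain simplification of such a closed queueing network can be solved with standard Mean Value Analysis; the sketch below treats the dataflow graph's average parallelism as the customer population, uses invented service demands, and omits the multi-chain and approximation details of the paper.

```python
def mva(service_demands, population):
    """Exact Mean Value Analysis for a closed, single-class, product-form
    queueing network of queueing centers.

    service_demands[k] = mean service demand at center k per token visit.
    population         = number of circulating customers, here the average
                         parallelism of the dataflow graph.
    Returns (throughput, per-center mean queue lengths)."""
    K = len(service_demands)
    queue = [0.0] * K
    throughput = 0.0
    for n in range(1, population + 1):
        # Response time at each center given the queues left by n-1 customers.
        resp = [service_demands[k] * (1.0 + queue[k]) for k in range(K)]
        throughput = n / sum(resp)
        queue = [throughput * resp[k] for k in range(K)]
    return throughput, queue

# Centers: processing elements, matching store, communication network
# (service demands are illustrative numbers only).
X, Q = mva([1.0, 0.4, 0.8], population=16)
print(f"throughput ~ {X:.2f}, mean queue lengths ~ {[round(q, 2) for q in Q]}")
```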

Proceedings ArticleDOI
Ruby B. Lee
03 Jan 1989
TL;DR: The author discusses the Hewlett-Packard Precision architecture, which was designed as a common architecture for HP computer systems with a RISC (reduced-instruction-set computer)-like execution model, with features for code compaction and execution time reduction for frequent instruction sequences.
Abstract: The author discusses the Hewlett-Packard Precision architecture, which was designed as a common architecture for HP computer systems. It has a RISC (reduced-instruction-set computer)-like execution model, with features for code compaction and execution time reduction for frequent instruction sequences. In addition, it has features for making the architecture extendible, for enhancing its longevity, and for supporting different operating environments. The author describes some aspects of the Precision processor architecture, its goals, how it addresses the spectrum of general-purpose information processing needs, and some architectural design tradeoffs.

Proceedings ArticleDOI
11 Jun 1989
TL;DR: A prototype broadband integrated services digital network is described that uses a scalable, distributed control architecture, which enhances reliability, to meet the need for flexible central office software control architectures for broadband services.
Abstract: It is noted that broadband services accelerate the need for central office software control architectures allowing flexible allocation of computing resources, since they increase the volume, complexity, and fluctuation of the workload. A description is given of a prototype broadband integrated services digital network which uses a scalable, distributed control architecture, which enhances reliability, to meet this goal. In contrast with other architectures in which processors are tightly coupled to subscriber lines, the prototype control architecture decomposes call processing into functions that are distributed among several processors with minimized common functions and coupling between subscribers and processors. Scalability in terms of lines and traffic volume is achieved. Two versions of the architecture, one using general-purpose computers and the other a single board computer system, are operational. Extensions of the architecture for unified network control offer the additional benefit of simplifying new service deployment.

01 Dec 1989
TL;DR: This research makes three major contributions: several distinct types of skew are identified; the relative partition model of skew, a simple analytic model that allows worst-case analysis of each type of data skew, is defined; and a systematic plan for investigating skew and scalability is laid out.
Abstract: This research will improve understanding of the interaction between data skew and scalability in parallel join algorithms. Previous work in this area assumes that data are uniformly distributed, but data skew is widespread in existing databases. This research makes three major contributions: 1. Several distinct types of skew are identified. Previous work treats skew as a homogeneous phenomenon, but a simple analytic argument shows that each type of skew has a different effect on response time. 2. The relative partition model of skew is defined. It is a simple analytic model that allows worst-case analysis of each type of data skew. The use of this model is demonstrated in an analysis of the sort-merge join algorithm. 3. A systematic plan for investigating skew and scalability is developed. The interplay between simple analytic models and detailed simulations is vital: analytic models bound the results expected from simulation, while more detailed simulation results validate the analytic models.
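
One way to see why skew caps scalability, as an illustrative worst-case bound in the spirit of (but not identical to) the relative partition model:

```latex
% Suppose a join of total work $W$ is split over $n$ nodes and, because of
% skew, the largest partition receives a fraction $p \ge 1/n$ of the work.
% With response time governed by the slowest node,
\[
  T(n) \;\ge\; p\,W
  \qquad\Longrightarrow\qquad
  \text{speedup} \;=\; \frac{W}{T(n)} \;\le\; \frac{1}{p},
\]
% so no matter how many processors are added, speedup is bounded by the
% reciprocal of the skew fraction.
```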

Dissertation
01 Dec 1989
TL;DR: This thesis discusses how the system software might manage the 'relative pointers' in a clean, transparent way, solutions to the problem of testing pointer equivalence, protocols and algorithms for migrating objects to maximize concurrency and communication locality, garbage collection techniques, and other aspects of the CNRA system design.
Abstract: The Computer Architecture Group is developing a new model of computation called L. This thesis describes a highly scalable architecture for implementing L called CNRA. In the CNRA architecture, processor/memory pairs are placed at the nodes of a low-dimensional Cartesian grid network. Addresses in the system are composed of a routing component, which describes a relative path through the interconnection network (the origin of the path is the node on which the address resides), and a 'memory location' component, which specifies the memory location to be addressed on the node at the destination of the routing path. The CNRA addressing system allows sharing of data structures in a style similar to that of global shared memory machines, but does not have the disadvantages normally associated with shared-memory machines (i.e., limited address space and memory access latency that increases with system size). This thesis discusses how a practical CNRA system might be built. It discusses how the system software might manage the 'relative pointers' in a clean, transparent way, solutions to the problem of testing pointer equivalence, protocols and algorithms for migrating objects to maximize concurrency and communication locality, garbage collection techniques, and other aspects of the CNRA system design. Simulation experiments with a toy program are presented. Keywords: Multiprocessors; Scalability; Topology; Address space; Relative addressing; Task migration; Parallelism.
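
The relative-pointer idea becomes concrete on a 2-D grid, where a routing path can be summarized by a displacement vector; the snippet shows the translation the system software would have to perform when such a pointer is copied between nodes, with hypothetical type and field names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelPtr:
    """A CNRA-style address: a relative route through the grid plus a
    memory location on the node the route reaches (field names are mine)."""
    dx: int       # displacement along the grid's x axis
    dy: int       # displacement along the y axis
    offset: int   # memory location on the destination node

def translate(ptr: RelPtr, holder, new_holder):
    """Re-express `ptr`, stored at node `holder`, so it still names the same
    target object after the pointer itself is copied to `new_holder`.
    Nodes are (x, y) grid coordinates."""
    target = (holder[0] + ptr.dx, holder[1] + ptr.dy)
    return RelPtr(target[0] - new_holder[0], target[1] - new_holder[1], ptr.offset)

def same_object(a: RelPtr, node_a, b: RelPtr, node_b):
    """Pointer equivalence: two relative pointers are equal iff they resolve
    to the same node and memory location."""
    return (node_a[0] + a.dx, node_a[1] + a.dy, a.offset) == \
           (node_b[0] + b.dx, node_b[1] + b.dy, b.offset)

p = RelPtr(dx=2, dy=-1, offset=0x40)            # stored on node (3, 3)
q = translate(p, holder=(3, 3), new_holder=(0, 0))
print(q, same_object(p, (3, 3), q, (0, 0)))     # True: both name node (5, 2)
```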

Proceedings ArticleDOI
10 Oct 1989
TL;DR: An analysis of the scalability of large-scale degradable homogeneous multiprocessors is presented by assessing the limitations imposed by reliability considerations on the number of processors, and it is demonstrated that graceful degradation in large-scale systems is not scalable.
Abstract: The authors present an analysis of the scalability of large-scale degradable homogeneous multiprocessors by assessing the limitations imposed by reliability considerations on the number of processors. They demonstrate that graceful degradation in large-scale systems is not scalable. An increase in the number of processors must be matched by a significant increase in the coverage factor in order to maintain the same performance and reliability levels.
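
The flavor of the argument can be reproduced with a textbook coverage model (a deliberate simplification, not the authors' derivation):

```latex
% Let each of $n$ processors fail at rate $\lambda$, and let $c$ be the
% coverage factor: the probability that a failure is detected and the
% system reconfigures (degrades gracefully) rather than crashes.  If the
% mission of length $t$ sees roughly $m \approx n\lambda t$ failures, the
% probability of surviving all of them is about
\[
  R(t) \;\approx\; c^{\,m} \;=\; c^{\,n\lambda t},
\]
% so holding $R(t)$ constant as $n$ grows forces $c$ toward 1; the
% required coverage increases with system size, which is the sense in
% which graceful degradation is not scalable.
```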

01 Jun 1989
TL;DR: A cost-effective sort engine and a novel algorithm that combines hashing and semijoins are proposed; results indicate that parallel joins scale very well and that, in some cases, synergistic effects lead to better-than-linear speedup.
Abstract: This paper focuses on parallel joins computed on a mesh-connected multicomputer. We propose a cost-effective sort engine and a novel algorithm that combines hashing and semijoins. An analytic model is used to select hardware configurations for detailed evaluation and to suggest refinements to the algorithm. Simulation of our model confirmed the analytic results. Results indicate that parallel joins scale very well. In some cases, synergistic effects lead to better than linear speedup.
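
To fix ideas, here is a tiny sequential sketch of a join that combines hash partitioning with a semijoin-style filter; it is a generic illustration, not the authors' mesh algorithm or sort engine.

```python
def hash_semijoin_join(R, S, num_parts):
    """Join R and S on their first attribute.

    Tuples are first hash-partitioned on the join key (on a mesh each
    partition would live on a different node); within a partition, the keys
    of R act as a semijoin filter so only matching S tuples ever need to be
    'shipped' and joined."""
    parts_R = [[] for _ in range(num_parts)]
    parts_S = [[] for _ in range(num_parts)]
    for t in R:
        parts_R[hash(t[0]) % num_parts].append(t)
    for t in S:
        parts_S[hash(t[0]) % num_parts].append(t)

    result = []
    for pr, ps in zip(parts_R, parts_S):
        keys = {t[0] for t in pr}                    # semijoin filter
        survivors = [t for t in ps if t[0] in keys]  # S semijoin R
        build = {}
        for t in pr:                                 # local hash join
            build.setdefault(t[0], []).append(t)
        for s in survivors:
            for r in build[s[0]]:
                result.append(r + s[1:])
    return result

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (2, "y"), (4, "z")]
print(hash_semijoin_join(R, S, num_parts=4))   # [(2, 'b', 'x'), (2, 'b', 'y')]
```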

Proceedings ArticleDOI
23 May 1989
TL;DR: A parallel computer architecture targeted at signal pattern analysis applications and scalable to configurations capable of TeraFLOP throughput is described, and a performance model is derived and used to analyze the impact of skewness of the embedded trees on the execution time of parallel recognition algorithms.
Abstract: Describes a parallel computer architecture targeted at signal pattern analysis applications, scalable to configurations capable of TeraFLOP (10^12 floating point operations per second) throughput. An important attribute of the architecture is its low interconnection overhead, making it well suited to miniaturization using advanced packaging. Preliminary design and thermal tests project a computing density of 300 GigaFLOPS per cubic foot. The architecture is reconfigurable as a tree machine, one or more rings, or a set of linear systolic arrays. Fault tolerance is achieved by embedding these topologies within a four-connected lattice, growing around any faults. A performance model is derived and used to analyze the impact of skewness of the embedded trees on the execution time of parallel recognition algorithms.