Showing papers on "Scalability published in 1990"


Proceedings ArticleDOI
01 Jun 1990
TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.e., to admit a short clock period and be scalable to many processors.
Abstract: The Tera architecture was designed with several major goals in mind. First, it needed to be suitable for very high speed implementations, i.e., admit a short clock period and be scalable to many processors. This goal will be achieved; a maximum configuration of the first implementation of the architecture will have 256 processors, 512 memory units, 256 I/O cache units, 256 I/O processors, and 4096 interconnection network nodes and a clock period less than 3 nanoseconds. The abstract architecture is scalable essentially without limit (although a particular implementation is not, of course). The only requirement is that the number of instruction streams increase more rapidly than the number of physical processors. Although this means that speedup is sublinear in the number of instruction streams, it can still increase linearly with the number of physical processors. The price/performance ratio of the system is unmatched, and puts Tera's high performance within economic reach. Second, it was important that the architecture be applicable to a wide spectrum of problems. Programs that do not vectorize well, perhaps because of a preponderance of scalar operations or too-frequent conditional branches, will execute efficiently as long as there is sufficient parallelism to keep the processors busy. Virtually any parallelism available in the total computational workload can be turned into speed, from operation-level parallelism within program basic blocks to multiuser time- and space-sharing.
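
As a rough illustration of that last scalability claim (our notation, not the paper's): if keeping p processors busy requires s(p) instruction streams with s(p)/p growing in p, then

    S(p) ≈ p    while    S(p)/s(p) = p/s(p) decreases,

i.e., speedup stays roughly linear in the number of physical processors even though it is sublinear in the number of instruction streams.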

797 citations


Journal ArticleDOI
TL;DR: The current Bubba prototype runs on a commercial 40-node multicomputer and includes a parallelizing compiler, distributed transaction management, object management, and a customized version of Unix.
Abstract: Bubba is a highly parallel computer system for data-intensive applications. The basis of the Bubba design is a scalable shared-nothing architecture which can scale up to thousands of nodes. Data are declustered across the nodes (i.e. horizontally partitioned via hashing or range partitioning) and operations are executed at those nodes containing relevant data. In this way, parallelism can be exploited within individual transactions as well as among multiple concurrent transactions to improve throughput and response times for data-intensive applications. The current Bubba prototype runs on a commercial 40-node multicomputer and includes a parallelizing compiler, distributed transaction management, object management, and a customized version of Unix. The current prototype is described and the major design decisions that went into its construction are discussed. The lessons learned from this prototype and its predecessors are presented.
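
As an illustrative sketch of the declustering idea described above (not Bubba's actual code; the node count matches the 40-node prototype, but the key type and range boundaries are made-up examples):

import bisect

NUM_NODES = 40  # the prototype's 40-node multicomputer

def hash_decluster(key):
    # Hash partitioning: the owning node is chosen by hashing the partitioning key.
    return hash(key) % NUM_NODES

RANGE_BOUNDARIES = [1000, 2000, 3000]  # hypothetical split points for a 4-node range scheme

def range_decluster(key):
    # Range partitioning: the owning node is the interval the key falls into.
    return bisect.bisect_right(RANGE_BOUNDARIES, key)

# Operators are then shipped to the nodes that own the relevant partitions,
# so independent partitions can be scanned in parallel.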

370 citations


Journal ArticleDOI
TL;DR: This paper first examines formal definitions of scalability, but fails to find a useful, rigorous definition of it, and concludes by challenging the technical community to either rigorously define scalability or stop using it to describe systems.
Abstract: Scalability is a frequently-claimed attribute of multiprocessor systems. While the basic notion is intuitive, scalability has no generally-accepted definition. For this reason, current use of the term adds more to marketing potential than technical insight. In this paper, I first examine formal definitions of scalability, but I fail to find a useful, rigorous definition of it. I then question whether scalability is useful and conclude by challenging the technical community to either (1) rigorously define scalability or (2) stop using it to describe systems.
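
For orientation, the informal quantities from which most candidate definitions start are speedup and efficiency (standard textbook notation, not definitions endorsed by this paper):

    S(p) = T(1) / T(p),        E(p) = S(p) / p,

and a system is loosely called scalable when E(p) can be held roughly constant as p grows, usually by letting the problem size grow as well. The paper's point is that turning this intuition into a rigorous, generally applicable definition is harder than it looks.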

189 citations


01 Jan 1990
TL;DR: In this article, the authors propose a new approach to software development which explicitly avoids the use of a single representation scheme or common schema, instead, multiple ViewPoints are utilised to partition the domain information, the development method and the formal representations used to express software specifications.
Abstract: In this paper we propose a new approach to software development which explicitly avoids the use of a single representation scheme or common schema. Instead, multiple ViewPoints are utilised to partition the domain information, the development method and the formal representations used to express software specifications. System specifications and methods are then described as configurations of related ViewPoints. This partitioning of knowledge facilitates distributed development, the use of multiple representation schemes and scalability. Furthermore, the approach is general, covering all phases of the software process from requirements to evolution. This paper motivates and systematically characterises the concept of a "ViewPoint", illustrating the concepts using a simplified example.

152 citations


Book ChapterDOI
01 Mar 1990
TL;DR: The analysis shows that the parallel formulation of DFS can provide near linear speedup on very large parallel architectures, particularly on ring and shared-memory architectures.
Abstract: This paper presents a parallel formulation of depth-first search. To study its effectiveness we have implemented it to solve the 15-puzzle problem on a variety of commercially available multiprocessors. We are able to achieve fairly linear speedup on these multiprocessors for as many as 128 processors (the maximum configurations available to us). At the heart of this parallel formulation is a work-distribution scheme that divides the work dynamically among different processors. The effectiveness of the parallel formulation is strongly influenced by the work-distribution scheme and the target architecture. We introduce the concept of the isoefficiency function to characterize the scalability of different architectures and work-distribution schemes. The isoefficiency analysis of previously known work-distribution schemes motivated the design of substantially improved schemes for ring and shared-memory architectures. The analysis shows that our parallel formulation of DFS can provide near linear speedup on very large parallel architectures.
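
The isoefficiency idea can be stated compactly (standard formulation with our symbols; see the paper for the exact development): with W the problem size (serial work) and T_o(W, p) the total overhead of a parallel execution on p processors,

    E = W / (W + T_o(W, p)),

so holding efficiency at a fixed value E requires W = (E / (1 - E)) * T_o(W, p). The rate at which W must grow with p to satisfy this is the isoefficiency function; a slowly growing isoefficiency function indicates a more scalable combination of algorithm, work-distribution scheme, and architecture.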

76 citations


Proceedings ArticleDOI
01 Feb 1990
TL;DR: This work has developed a dynamic load balancing scheme which is applicable to OR-parallel programs in general and scalable to any number of processors because of its multi-level hierarchical structure.
Abstract: Good load balancing is the key to deriving maximal performance from multiprocessors. Several successful dynamic load balancing techniques on tightly-coupled multiprocessors have been developed. However, load balancing is more difficult on loosely-coupled multiprocessors because inter-processor communication overheads cost more. Dynamic load balancing techniques have been employed in a few programs on loosely-coupled multiprocessors, but they are tightly built into the particular programs and not much attention is paid to scalability. We have developed a dynamic load balancing scheme which is applicable to OR-parallel programs in general. Processors are grouped, and work loads of groups and processors are balanced hierarchically. Moreover, it is scalable to any number of processors because of this multi-level hierarchical structure. The scheme is tested for the all-solution exhaustive search Pentomino program on the mesh-connected loosely-coupled multiprocessor Multi-PSI, and speedups of 28.4 times with 32 processors and 50 times with 64 processors have been attained.
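
A quick back-of-the-envelope check of the reported figures in terms of parallel efficiency (speedup divided by processor count; the numbers below are just the ones quoted above):

for procs, speedup in [(32, 28.4), (64, 50.0)]:
    print(f"{procs} processors: efficiency = {speedup / procs:.2f}")
# 32 processors: efficiency = 0.89
# 64 processors: efficiency = 0.78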

53 citations


Journal ArticleDOI
TL;DR: The test results indicate that, with some reservations, this theory of scaling is applicable to documents, and this finding is further applied to the construction of test collections for Information Retrieval research that could more sensitively measure retrieval system alterations through the use of documents scaled not merely by relevance, but rather, by preference.
Abstract: The relationship between scaling practice and scaling theory remains a controversial problem in Information Retrieval research and experimentation. This article reports a test of a general theory of scaling, i.e., Simple Scalability, applied to the stimulus domain of documents represented as abstracts. The significance of Simple Scalability is that it implies three important properties of scales: transitivity, substitutability, and independence. The test results indicate that, with some reservations, this theory of scaling is applicable to documents. This finding is further applied to the construction of test collections for Information Retrieval research that could more sensitively measure retrieval system alterations through the use of documents scaled not merely by relevance, but rather, by preference. © 1990 John Wiley & Sons, Inc.
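
Simple Scalability is usually stated as follows (our paraphrase of the standard scaling-theory formulation, not the article's exact wording): the probability of preferring document a to document b depends only on their scale values,

    P(a, b) = F(u(a), u(b)),

with F strictly increasing in its first argument and strictly decreasing in its second. The three properties noted above (transitivity, substitutability, and independence) follow from this form.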

41 citations


Proceedings ArticleDOI
08 Oct 1990
TL;DR: The scalability analysis shows that the FFT algorithm cannot make efficient use of large-scale mesh architectures and the addition of such features as cut-through routing and multicasting does not improve the overall scalability on this architecture.
Abstract: The scalability of the parallel fast Fourier transform (FFT) algorithm on mesh- and hypercube-connected multicomputers is analyzed. The hypercube architecture provides linearly increasing performance for the FFT algorithm with an increasing number of processors and a moderately increasing problem size. However, there is a limit on the efficiency, which is determined by the communication bandwidth of the hypercube channels. Efficiencies higher than this limit can be obtained only if the problem size is increased very rapidly. Technology-dependent features, such as the communication bandwidth, determine the upper bound on the overall performance that can be obtained from a P-processor system. The upper bound can be moved up by either improving the communication-related parameters linearly or increasing the problem size exponentially. The scalability analysis shows that the FFT algorithm cannot make efficient use of large-scale mesh architectures. The addition of such features as cut-through routing and multicasting does not improve the overall scalability on this architecture.
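
A textbook-style cost model makes the bandwidth-limited efficiency ceiling concrete (our symbols; the paper's analysis may differ in detail). For an n-point FFT on a p-processor hypercube with per-element compute time t_c and per-word transfer time t_w,

    T_p ≈ (n/p) log n · t_c + (n/p) log p · t_w,
    E ≈ 1 / (1 + (t_w / t_c) · (log p / log n)).

Holding E at a target as p grows therefore forces log n to grow in proportion to log p, i.e., n must grow as a power of p whose exponent is set by t_w / t_c and the target efficiency; faster channels raise the ceiling, while otherwise only a rapidly growing problem size can sustain efficiency.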

32 citations


Book
01 Jan 1990
TL;DR: This thesis addresses several issues in parallel architectures and parallel algorithms for integrated vision systems, and shows that SIMD, MIMD and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible.
Abstract: Computer vision has been regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). This thesis addresses several issues in parallel architectures and parallel algorithms for integrated vision systems. First, a model of computation for IVSs is presented. The model captures computational requirements, defines spatial and temporal data dependencies between tasks, and shows what types of interactions may occur between tasks from different levels of processing. The model is used to develop features and capabilities of a parallel architecture suitable for IVSs. A multiprocessor architecture for IVSs (called NETRA) is presented. NETRA is highly flexible without the use of complex interconnection schemes. NETRA is a recursively defined hierarchical architecture whose leaf nodes consist of clusters of processors connected with a programmable crossbar with a selective broadcast capability. Hence, it is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. Several refinements in the architecture over the original design are also proposed. Performance of several vision algorithms when they are mapped on one cluster is presented. It is shown that SIMD, MIMD and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. An extensive analysis of inter-cluster communication strategies in NETRA is presented. A methodology to evaluate performance of algorithms on NETRA is described. Performance analysis of parallel algorithms when mapped across clusters is presented. The parameters are derived from the characteristics of the parallel algorithms, which are then used to evaluate the alternative communication strategies in NETRA. The effects of communication interference on the performance of algorithms are studied. It is observed that if communication speeds are matched with the computation speeds, almost linear speedups are possible when algorithms are mapped across clusters. Finally, several techniques to perform data decomposition, and static and dynamic load balancing for IVS algorithms are described. These techniques can be used to perform load balancing for intermediate and high level, data dependent vision algorithms. They are shown to perform well when used in an implementation of a motion estimation system on a hypercube multiprocessor. (Abstract shortened with permission of author.)

32 citations


Journal ArticleDOI
TL;DR: An optimized technique to perform effective broadcasting operations on networks belonging to the WK-recursive class is described, one prototype of which has been realized at the Hybrid Computing Research Center.

31 citations


Proceedings ArticleDOI
James E. Smith1, Wei-Chung Hsu1, C. Hsiung1
01 Oct 1990
TL;DR: The authors discuss the challenges facing designers of future general-purpose supercomputers, and describe architectural features and characteristics that may be used to meet these challenges.
Abstract: The authors discuss the challenges facing designers of future general-purpose supercomputers, and describe architectural features and characteristics that may be used to meet these challenges. Balancing vector/scalar performance, vector performance, scalar processing, supporting scalability, large-scale memory systems, and high-performance I/O and networking are discussed. Performance trends and projections for general-purpose supercomputer systems are given.

Proceedings Article
13 Aug 1990
TL;DR: Two versions of an epoch algorithm for maintaining a consistent remote backup copy of a database are presented; the algorithms ensure scalability, which makes them suitable for very large databases.
Abstract: Remote backup copies of databases are often maintained to ensure availability of data even in the presence of extensive failures, for which local replication mechanisms may be inadequate. We present two versions of an epoch algorithm for maintaining a consistent remote backup copy of a database. The algorithms ensure scalability, which makes them suitable for very large databases. The correctness and the performance of the algorithms are discussed, and an additional application for distributed group commit is given.
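
One way an epoch-style backup can be organized is sketched below (a simplification for illustration only; the paper presents two concrete algorithms, and this is not a faithful rendering of either). Primary nodes tag shipped log records with an epoch number, and the backup installs an epoch only once every node has closed it, so the backup always reflects a transaction-consistent cut:

from collections import defaultdict

class BackupSite:
    def __init__(self, primary_nodes):
        self.primary_nodes = set(primary_nodes)
        self.buffered = defaultdict(list)   # epoch -> buffered log records
        self.closed = defaultdict(set)      # epoch -> nodes that have closed it
        self.installed_epoch = -1

    def receive(self, node, epoch, record):
        # Log records stream in from each primary node, tagged with their epoch.
        self.buffered[epoch].append((node, record))

    def end_of_epoch(self, node, epoch):
        # A node signals that it has shipped everything belonging to this epoch.
        self.closed[epoch].add(node)
        # Install epochs in order, each only after *all* nodes have closed it.
        while self.closed[self.installed_epoch + 1] == self.primary_nodes:
            self.installed_epoch += 1
            self.apply(self.buffered.pop(self.installed_epoch, []))

    def apply(self, records):
        pass  # redo the records against the backup copy of the database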

Proceedings ArticleDOI
28 May 1990
TL;DR: An overview is given of the major issues involved in maintaining an up-to-date backup copy of a database, kept at a remote site, and a method is presented for performing this task without impairing the performance at the primary site.
Abstract: An overview is given of the major issues involved in maintaining an up-to-date backup copy of a database, kept at a remote site. A method is presented for performing this task without impairing the performance at the primary site. The method is scalable, and it is particularly suitable for multiprocessor systems. The mechanism is relatively straightforward and can be implemented using well-known concepts and techniques, such as locking and logging.

Proceedings ArticleDOI
01 Apr 1990
TL;DR: An analytical model of the traffic in a machine loosely based on Stanford's DASH multiprocessor is developed and it is shown that both locality in the data reference stream and the amount of data sharing in a program have an important impact on performance.
Abstract: Scalable shared-memory multiprocessors are the subject of much current research, but little is known about the performance behavior of these machines. This paper studies the performance effects of two machine characteristics and two program characteristics that seem to be major factors in determining the performance of a hierarchical shared-memory machine. We develop an analytical model of the traffic in a machine loosely based on Stanford's DASH multiprocessor and use program parameters extracted from multiprocessor traces to study its performance. It is shown that both locality in the data reference stream and the amount of data sharing in a program have an important impact on performance. Although less obvious, the bandwidth within each cluster in the hierarchy also has a significant performance effect. Optimizations that improve the intracluster cache coherence protocol or increase the bandwidth within a cluster can be quite effective.
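
A toy version of such an analytical model just weights the latency of each level of the hierarchy by the fraction of references it satisfies (illustrative only; the fractions and cycle counts below are hypothetical, not parameters from the DASH traces):

def avg_latency(f_cache, f_cluster, f_remote, L_cache=1, L_cluster=30, L_remote=120):
    # Average memory latency for a two-level cluster hierarchy; f_* are the
    # fractions of references satisfied at each level, L_* are latencies in cycles.
    assert abs(f_cache + f_cluster + f_remote - 1.0) < 1e-9
    return f_cache * L_cache + f_cluster * L_cluster + f_remote * L_remote

print(avg_latency(0.95, 0.04, 0.01))   # 3.35
print(avg_latency(0.90, 0.07, 0.03))   # 6.6

More reference locality (a larger f_cache) or a better intracluster protocol (a smaller L_cluster) directly lowers the average latency, which is the qualitative effect the paper's model quantifies.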

Proceedings ArticleDOI
16 Jun 1990
TL;DR: A multiprocessor architecture called NETRA is discussed, which has a tree-type hierarchical architecture featuring leaf nodes that consist of a cluster of small but powerful processors connected via a programmable crossbar with selective broadcast capability.
Abstract: A multiprocessor architecture called NETRA is discussed. It is highly reconfigurable and does not involve the use of complex interconnection schemes. The topology of this multiprocessor is recursively defined and is therefore easily scalable from small to large systems. It has a tree-type hierarchical architecture featuring leaf nodes that consist of a cluster of small but powerful processors connected via a programmable crossbar with selective broadcast capability. The architecture is simulated on a hypercube multiprocessor and the performance of one processor cluster is evaluated for stereo-vision tasks. The particular stereo algorithm selected for implementation requires computation of the two-dimensional fast Fourier transform (2-D FFT), template matching, histogram computation, and least-squares surface fitting. Static partitioning of data is used for the data-independent tasks such as the 2-D FFT, and dynamic scheduling and load balancing are used for the data-dependent tasks of feature matching and disambiguation.

Proceedings ArticleDOI
01 Jan 1990
TL;DR: A methodology for mapping rule-based expert systems onto a message-passing multicomputer based on static load balancing using simulated annealing to achieve a nearly optimal allocation of multiple production rules to processor nodes is presented.
Abstract: A methodology for mapping rule-based expert systems onto a message-passing multicomputer is presented. The method is based on static load balancing using simulated annealing to achieve a nearly optimal allocation of multiple production rules to processor nodes. The goal is to balance the initial load distribution and to avoid serious communication overhead among processor nodes at run time. A formal model is developed and a cost function is defined in the annealing process. Heuristic swap functions and cooling policies which ensure the efficiency and quality of the annealing process are given. A software load-balancing package is implemented on a SUN 3/280 workstation to carry out the benchmark experiments. The overhead associated with this mapping method is O(m ln m), where m is the number of production rules in the system. The Monkey and Bananas expert system with 24 rules is mapped onto an 8-node hypercube. Experimental results verify the effectiveness of the mapping method. The method can be applied in practical parallel production systems to achieve scalable performance.
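
A generic simulated-annealing skeleton for this kind of rule-to-node assignment is sketched below (illustrative only: the move, cooling schedule, and cost function are generic placeholders rather than the heuristic swap functions and cooling policies developed in the paper):

import math, random

def anneal(num_rules, num_nodes, cost, T0=10.0, cooling=0.95, moves_per_T=200, T_min=1e-3):
    # assignment[i] is the processor node that production rule i is mapped to.
    assignment = [random.randrange(num_nodes) for _ in range(num_rules)]
    current = cost(assignment)
    T = T0
    while T > T_min:
        for _ in range(moves_per_T):
            i = random.randrange(num_rules)              # pick a rule to move
            old_node = assignment[i]
            assignment[i] = random.randrange(num_nodes)  # tentative reassignment
            candidate = cost(assignment)
            # Always accept improvements; accept uphill moves with probability
            # exp(-delta/T) so the search can escape local minima.
            if candidate <= current or random.random() < math.exp(-(candidate - current) / T):
                current = candidate
            else:
                assignment[i] = old_node                 # undo the move
        T *= cooling                                     # geometric cooling
    return assignment

A plausible cost function, in the spirit described above, combines the load imbalance across nodes with the volume of inter-node communication induced by the rule placement.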

Journal ArticleDOI
E. Barsotti1, A. W. Booth1, M. Bowden1, C. Swoboda1, N. Lockyer, R. VanBerg 
TL;DR: A data acquisition system architecture is proposed that draws heavily from the communications industry, is capable of data rates of hundreds of gigabytes per second from the detector into an array of online processors, and uses an open systems architecture to guarantee compatibility with future commercially available online processor farms.
Abstract: A data acquisition system architecture which draws heavily from the communications industry is proposed. The architecture is totally parallel (i.e. without any bottlenecks), capable of data rates of hundreds of gigabytes per second from the detector and into an array of online processors (i.e. processor farm), and uses an open systems architecture to guarantee compatibility with future commercially available online processor farms. The main features of the system are standard interface ICs to detector subsystems wherever possible, fiber-optic digital data transmission from the near-detector electronics, a self-routing parallel event builder, and the use of industry-supported, high-level language programmable processors in the proposed Bottom Collider Detector (BCD) system for both triggers and online filters. A brief status report of an ongoing project to build a prototype of the proposed data acquisition system architecture is given. The major component of the system, a self-routing parallel event builder, is described in detail.

Proceedings ArticleDOI
17 Jun 1990
TL;DR: A collective multilayer neurallike architecture is described, characterized by speed of convergence, scalability, and guaranteed convergence to optimal solutions for highly parallel, fine-grained, and distributed architectures.
Abstract: Highly interconnected networks of relatively simple processing elements are shown to be very effective in solving difficult optimization problems. Problems that fall into the broad category of finding a least-cost path between two points, given a distributed and sometimes complex cost map, are studied. A neurallike architecture and associated computational rules are proposed for the solution of this class of optimal path-finding problems in two- and higher-dimensional spaces. The proposed algorithm is local in nature and is very well suited for highly parallel, fine-grained, and distributed architectures. Also described is a collective multilayer neurallike architecture, characterized by speed of convergence, scalability, and guaranteed convergence to optimal solutions.

Journal ArticleDOI
TL;DR: Ariel is a multiprocessor architecture that is being developed to simulate neural networks and other models of distributed computation, based upon a hierarchical network of coarse-grained processing modules; its major components are described.
Abstract: Ariel is a multiprocessor architecture that we are developing to simulate neural networks and other models of distributed computation. The design is based upon a hierarchical network of coarse-grained processing modules. The module hardware uses fast digital signal processors and very large semiconductor memories to provide the throughput and storage capacity required to simulate large networks. Our objective is to provide a system that can be scaled up to simulate neural networks composed of millions of nodes and 10s of billions of interconnections at rates exceeding 100 billion connection operations per second. This paper discusses the technical challenges in neural network simulation and describes Ariel's major components.

Proceedings ArticleDOI
02 Dec 1990
TL;DR: The intent is to provide a first-round objective comparison of alternative switch architectures based on Batcher/banyan technologies which can provide 1 Tb/s of raw bandwidth, and to examine scaling issues associated with large central office switching fabrics.
Abstract: The authors examine the issues concerning the scalability of Batcher/banyan networks to architectures suitable for large broadband central offices. Their intent is to provide a first-round objective comparison of alternative switch architectures based on Batcher/banyan technologies which can provide 1 Tb/s of raw bandwidth. Scaling issues associated with large central office switching fabrics are examined. Device count is used as a measure to compare the complexity of each architecture. This provides a relative measure of the physical size and the power requirements that can be expected from each approach. All configurations assume the use of compact three-dimensional packaging and custom VLSI circuits to minimize size and connectivity constraints.
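
Standard element-count formulas give a feel for how the device counts scale (textbook formulas for the two building blocks; the paper's comparison also covers packaging, power, and I/O, which this ignores):

import math

def batcher_elements(n):
    # A Batcher bitonic sorting network on n = 2^k inputs has
    # log2(n) * (log2(n) + 1) / 2 stages of n/2 compare-exchange elements.
    k = int(math.log2(n))
    return n * k * (k + 1) // 4

def banyan_elements(n):
    # An n x n banyan network has log2(n) stages of n/2 two-by-two switches.
    return n * int(math.log2(n)) // 2

for n in (1024, 32768):
    print(n, batcher_elements(n), banyan_elements(n))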

01 Jan 1990
TL;DR: An implemented prototype interconnecting IBM AT, SUN-3 and MAC-II machines demonstrates performance improvements over conventional scalable multiprocessors, showing that high-performance, low-cost, scalable shared memory interconnections can be built.
Abstract: This dissertation presents an architecture and describes an implementation for a high-performance, scalable shared memory interconnection. The architecture is based on a scalable memory model called PRAM. Conventional shared memory multiprocessors provide high performance but they do not scale well to either a large number of processors or over long distances. The PRAM network is scalable and allows heterogeneous processors to be interconnected, achieving high effective data transfer rates and low latencies. An implemented prototype interconnects IBM AT, SUN-3 and MAC-II machines, demonstrating performance improvements over conventional high-performance scalable multiprocessors. The successful prototype implementation proves that high-performance, low-cost, scalable shared memory interconnections can be built and can combine high performance with scalability.

Proceedings ArticleDOI
05 Sep 1990
TL;DR: CODA, a multiprocessor architecture designed for sensor fusion in real-time applications, is discussed and the hardware characteristics of CODA are scalability, high performance in computation and communication, and priority control for hard real- time processing.
Abstract: CODA, a multiprocessor architecture designed for sensor fusion in real-time applications, is discussed. The hardware characteristics of CODA are scalability, high performance in computation and communication, and priority control for hard real-time processing. CODA consists of multiple identical processors connected by means of a packet exchange interconnection network. The network is a prioritized multistage router network which is priority-inversion free. The unit processor is a register-based architecture and realizes data synchronization on a register by a proposed instruction insertion method. By this method, all incoming data can be efficiently handled on the execution pipeline.

Journal ArticleDOI
Aloke Guha1, M. W. Derstine
TL;DR: Performance analysis of SPARO reveals that while discrete computing structures can be implemented using optical techniques, massively parallel optical architectures for traditional computational models are currently unable to compete with electronic ones due to the lack of large scale addressable optical memory devices and large scale integratable optical computing elements.
Abstract: This paper presents a case study in the design and analysis of a massively parallel optical computer, SPARO, a novel scalable computer intended for symbolic and numeric computing. SPARO was designed for fine-grained parallel processing of combinator graph reduction, a special case of the graph reduction computational model, found most appropriate for parallel optical processing in earlier studies. The architecture consists of a planar array of optical processors that communicate through simple messages (data packets) over an optical interconnection network. A technique called instruction passing is used to realize distributed control of the architecture. Instruction passing can also be used to implement complex structures such as recursion and iteration. Each individual processor in SPARO is a finite state machine that is implemented using symbolic substitution techniques, while gateable interconnects are used to realize data movements between the processors and network. Performance analysis of SPARO reveals that while discrete computing structures can be implemented using optical techniques, massively parallel optical architectures for traditional computational models are currently unable to compete with electronic ones due to the lack of large scale addressable optical memory devices and large scale integratable optical computing elements. However, optical interconnections appear very promising for providing the network throughput necessary for these parallel architectures.

Proceedings ArticleDOI
01 Oct 1990
TL;DR: The problem of mapping a computation-intensive task of irregular structure onto a parallel framework is examined and the authors address the issues of portability and scalability and look at specific features of the application that can be exploited.
Abstract: This paper deals with the problem of mapping a computation-intensive task of irregular structure onto a parallel framework. Our application is the switch-level logic simulation of digital circuits, a technique that is in wide use for the verification of VLSI designs. We focus on medium-grain multiprocessors (shared memory or message passing machines) and only consider model parallel computation, where the model of the design to be simulated is partitioned among processors. We address the issues of portability and scalability and look at specific features of the application that can be exploited. Different ways of mapping the simulation problem onto a parallel framework are presented. A prototype implementation of our algorithms is described. Experimental results demonstrate the potential for speedup and highlight the problem of close coupling between processors due to distributed iteration.

Proceedings ArticleDOI
05 Dec 1990
TL;DR: A model that combines distributed computing, real-time constraints, probabilistic correctness, and large system size is introduced; to the authors' knowledge, no previous work addresses this combination.
Abstract: The authors address the topic of how to meet a real-time constraint as the load on a distributed system increases, without increasing the capacity of the individual processors. The application studied is a real-time resource counter with a probabilistic correctness criterion, and is motivated by the problem of implementing resource management in the telephone network. They introduce a model that combines distributed computing, real-time constraints, probabilistic correctness, and large system size; to the authors' knowledge, no previous work addresses this combination.

Proceedings ArticleDOI
08 May 1990
TL;DR: An algorithm for time advancement in distributed simulation, which is part of CNCSIM, is presented; CNCSIM provides predictions of response time and other performance measures as a function of offered load.
Abstract: The overall aim of the research described is to investigate the use of scalable, decentralized state-feedback algorithms for distributed systems resource management. These algorithms are required to be efficient and to respond to system changes in real time but have to base their decisions on incomplete and inaccurate information of the system state. A description is also given of CNCSIM, a software tool developed to support designers in distributed algorithms modelling. Specifically, CNCSIM, which is oriented towards the rapid assessment of decentralized load-sharing algorithms for distributed systems, is a distributed discrete event-driven simulator implemented in CONIC. It provides predictions of response time and other performance measures as a function of offered load, and allows tracing of the progress through the system of each message/event to facilitate model validation and analysis. An algorithm for time advancement in distributed simulation, which is part of CNCSIM, is presented.

Proceedings ArticleDOI
02 Dec 1990
TL;DR: A hybrid architecture is presented in which SE clusters are interconnected through a communication network to form a SN structure at the inter-cluster level to avoid the data access bottleneck.
Abstract: The most debated architectures for parallel database processing are shared nothing (SN) and shared everything (SE) structures. Although SN is considered to be most scalable, it is very sensitive to the data skew problem. On the other hand, SE allows the collaborating processors to share the work load more efficiently. It, however, suffers from the limitation of the memory and disk I/O bandwidth. The authors present a hybrid architecture in which SE clusters are interconnected through a communication network to form a SN structure at the inter-cluster level. Processing elements are clustered into SE systems to minimize the skew effect. Each cluster, however, is kept within the limitation of the memory and I/O technology to avoid the data access bottleneck. A generalized performance model was developed to perform sensitivity analysis for the hybrid structure, and to compare it against SE and SN organizations.

Proceedings ArticleDOI
02 Dec 1990
TL;DR: A project (nicknamed Imagine) under development at CEFRIEL is presented, and the main scope is the study of a service of image retrieval from a remote database, accomplished by means of direct experimentation on a functionally complete laboratory system prototype.
Abstract: A project (nicknamed Imagine) under development at CEFRIEL is presented. The main scope of Imagine is the study of a service of image retrieval from a remote database, accomplished by means of direct experimentation on a functionally complete laboratory system prototype. Response time seems to be the most influential factor from the point of view of man-machine interaction, and thus constitutes the focus of the investigation. Scalability, that is, the ability to maintain the service on a wide range of workstations, is also considered a key issue. Network digital rates are discussed.

Proceedings ArticleDOI
R. Zippel1
01 Sep 1990
TL;DR: A fine-grained, massively parallel SIMD (single-instruction-stream, multiple-data-stream) architecture, called the data structure accelerator, is presented, and its use in a number of problems in computational geometry is demonstrated.
Abstract: A fine-grained, massively parallel SIMD (single-instruction-stream, multiple-data-stream) architecture, called the data structure accelerator, is presented, and its use in a number of problems in computational geometry is demonstrated. This architecture is extremely dense and highly scalable. Systems of 10^6 processing elements can be feasibly embedded in workstations. It is proposed that this architecture be used in tandem with conventional, single-sequence machines and with small-scale, shared-memory multiprocessors. A language for programming such heterogeneous systems that smoothly incorporates the SIMD instructions of the data structure accelerator with conventional single-sequence code is presented.