
Showing papers in "ACM Sigarch Computer Architecture News in 1984"


Journal ArticleDOI
TL;DR: PASM is a multifunction partitionable SIMD/MIMD system being designed at Purdue for parallel image understanding that will incorporate over 1,000 complex processing elements.
Abstract: PASM is a multifunction partitionable SIMD/MIMD system being designed at Purdue for parallel image understanding. It is to be a large-scale, dynamically reconfigurable multimicroprocessor system, which will incorporate over 1,000 complex processing elements. Parallel algorithm studies and simulations have been used to analyze application tasks in order to guide design decisions. A prototype of PASM is under construction (funded by an equipment grant from IBM), including 30 Motorola MC68010 processors, a multistage interconnection network, five disk drives, and connections to the Purdue Engineering Computer Network (for access to peripherals, terminals, software development tools, etc.). PASM is to serve as a vehicle for studying the use of parallelism for performing the numeric and symbolic processing needed for tasks such as computer vision. The PASM design concepts and prototype are overviewed and brief examples of parallel algorithms are given.

31 citations


Journal ArticleDOI
TL;DR: There have been several new computers and new studies relating to Reduced Instruction Set Computers (RISC) since the last article in Computer Architecture News, and the last stages of the Berkeley RISC project are reviewed.
Abstract: There have been several new computers and new studies relating to Reduced Instruction Set Computers (RISC) since our last article in Computer Architecture News. The first report is on RISC II, a much more aggressive implementation of the Berkeley RISC architecture. The studies of RISCs in new areas include floating point, big benchmarks, Lisp, and ECL. After reviewing the last stages of the Berkeley RISC project, I list the commercial RISCs, and conclude with a short description of our next project.

18 citations


Journal ArticleDOI
TL;DR: This paper aims to more completely evaluate the Reduced Instruction Set Computer, a relatively new concept in computer architecture, by removing extraneous factors and re-evaluating the RISC I.
Abstract: 1. INTRODUCTION. Recently reported research <3> indicates that the RISC I, a reduced instruction set computer, is able to outperform conventional processors. The validity of these results has, however, been questioned <7>, since factors not directly related to the size and speed of the instruction set may have been utilized to the RISC I's advantage. By removing these extraneous factors and re-evaluating the RISC I, this paper hopes to more completely evaluate the reduced instruction set computer. 2. BACKGROUND. The Reduced Instruction Set Computer is a relatively new concept in computer architecture. The most publicized example of the reduced instruction set design philosophy is the RISC I, a 32-bit microprocessor developed at the University of California, Berkeley. The results reported for the RISC I, when compared to conventional microprocessors, indicate that the RISC I offers improved performance when executing compiled C programs. The tests used in this evaluation compared the performance of the RISC I to the MC68000, the Z8000, and several other processors. The performance of these processors was measured via benchmark programs which were written in C and translated into machine language using a compiler. There are, however, two factors other than the reduced instruction set which may have affected the performance of the RISC I. These are the register window (together with the large number of registers) and the type of compiler used for the C programs. The register scheme, 138 registers allocated in overlapping groups of 32, provides a means of context switching which may have significantly increased the performance of the RISC I. The programs written for the RISC I were compiled with the use of a peephole optimizer. The same programs, when written for the conventional processors, were compiled with a portable compiler. This discrepancy between the compilers used for the different machines being compared may have served to disproportionately benefit the RISC I.
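
The window arithmetic behind that register scheme is easy to sketch. The abstract gives only the totals (138 physical registers, visible in overlapping groups of 32), so the layout below is an assumption in the spirit of the Berkeley designs: 10 globals plus 8 windows of 16 registers, with 6 registers shared between adjacent windows. Under that assumption, the caller's outgoing registers are physically the callee's incoming registers, so a procedure call passes arguments by moving the window pointer rather than by copying.

```python
# Assumed layout, consistent with "138 registers in overlapping groups of 32":
# 10 globals + 8 windows, each window owning 10 locals + 6 overlap registers.
N_GLOBAL, N_WINDOWS, N_LOCAL, N_OVERLAP = 10, 8, 10, 6
TOTAL = N_GLOBAL + N_WINDOWS * (N_LOCAL + N_OVERLAP)    # 138 physical registers
VISIBLE = N_GLOBAL + N_OVERLAP + N_LOCAL + N_OVERLAP    # 32 registers visible at once

def physical(reg: int, cwp: int) -> int:
    """Map logical register `reg` (0..31) of window `cwp` to a physical register index."""
    assert 0 <= reg < VISIBLE
    if reg < N_GLOBAL:                      # r0..r9: globals shared by every window
        return reg
    frame = N_LOCAL + N_OVERLAP             # 16 physical registers owned per window
    offset = (cwp % N_WINDOWS) * frame + (reg - N_GLOBAL)
    return N_GLOBAL + offset % (N_WINDOWS * frame)

# The caller's "out" registers (r26..r31 in window 0) are the very same physical
# registers as the callee's "in" registers (r10..r15 in window 1).
caller_out = [physical(r, cwp=0) for r in range(26, 32)]
callee_in  = [physical(r, cwp=1) for r in range(10, 16)]
assert caller_out == callee_in
print(TOTAL, VISIBLE, caller_out)           # 138 32 [26, 27, 28, 29, 30, 31]
```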

12 citations


Journal ArticleDOI
TL;DR: The long range goal of this research project is to develop a hierarchical structure with each node being replaced by a cluster of simple processors which are custom designed to implement computational primitives associated with digital signal and image processing applications.
Abstract: A 24-node multiprocessor computer system is currently under development at N. C. State University. A diameter-oriented ALPHA structure is used for interconnecting the computers, which will operate in MIMD mode. A communications processor is added to each node to satisfy data communications requirements in an efficient manner in a packet switched environment. A host computer will be used for program development and compilation. The software complexity will be reduced by the use of a static mapping of the partitioned computational flow diagram representing the algorithm to be executed. The long range goal of this research project is to develop a hierarchical structure with each node being replaced by a cluster of simple processors which are custom designed to implement computational primitives associated with digital signal and image processing applications.
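
The static mapping mentioned above can be illustrated with a short sketch. Everything concrete here (the toy flow graph, the round-robin placement policy, the route table) is an assumption for illustration; the abstract only states that the partitioned flow diagram is mapped to nodes once, before execution, so no run-time scheduler is needed.

```python
# Illustrative static mapping of a computational flow graph onto 24 nodes.
from collections import defaultdict
from typing import Dict, List

N_NODES = 24

# A toy signal-processing flow graph: task -> list of tasks it feeds.
flow_graph: Dict[str, List[str]] = {
    "window": ["fft"], "fft": ["mag"], "mag": ["threshold"], "threshold": [],
}

def static_map(graph: Dict[str, List[str]], n_nodes: int) -> Dict[str, int]:
    """Assign every task a fixed node before execution (here: simple round-robin)."""
    return {task: i % n_nodes for i, task in enumerate(graph)}

placement = static_map(flow_graph, N_NODES)

# Each node's communications processor then knows, statically, which node every
# outgoing packet is destined for.
routes = defaultdict(list)
for src, dests in flow_graph.items():
    for dst in dests:
        routes[placement[src]].append((src, dst, placement[dst]))

assert all(0 <= node < N_NODES for node in placement.values())
```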

6 citations


Journal ArticleDOI
TL;DR: The benchmark is a computer program established by Forest Baskett and used to compare the execution time of various languages and machines using a depth-first, recursive, backtracking tree search algorithm to find a solution to a particular puzzle.
Abstract: The Benchmark. The benchmark is a computer program established by Forest Baskett and used to compare the execution time of various languages and machines [1]. It is a depth-first, recursive, backtracking tree search algorithm to find a solution to a particular puzzle. It searches the tree in a defined order and stops when it finds the first solution. The power of the language and machine is gauged by how long it takes to find that solution. Because the search algorithm uses operations similar to those in many complicated applications, its ranking of computational power is a reasonable guide for many users. The Puzzle. The benchmark uses a block packing puzzle invented by John Conway [2]. A total of 18 pieces fill a cube 5 units on a side. There are thirteen pieces of size 1 x 2 x 4, three of 1 x 1 x 3, one of 1 x 2 x 2, and one of 2 x 2 x 2. Each piece may be placed in any orientation. There are 572 distinct solutions; multiplying this by 2 for mirror reflections and by 24 for rotations of the completed puzzle yields 27,456 total solutions. Conway's invention of the puzzle includes an insight about where some of the pieces must go. Without this insight the puzzle is difficult to assemble by hand; with the trick it is relatively easy. Also, solutions exist only for two distinct positions of the small cube; and, once the small cube's position is chosen, the 1 x 2 x 2 piece has a unique position in all solutions. The benchmark does not use Conway's insight or the coincidental constraints, though. Its purpose is to provide a standard of computational effort, so it uses a simple exhaustive search. The benchmark does use one insight: it places a 1 x 2 x 4 in the corner during initialization, before the actual search begins. There is no need for this, but it is legitimate. Each of the five pieces of other dimensions can occupy only one corner of the completed puzzle. Since there are eight corners, every solution must have a 1 x 2 x 4 in some corner, indeed in at least three.
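
A minimal sketch of the search structure the benchmark uses is given below: depth-first, recursive, backtracking, stopping at the first solution. To stay short and runnable it packs a tiny 2-D board with rectangles instead of Conway's 5 x 5 x 5 cube; the board size, piece list, and function names are illustrative and are not Baskett's program.

```python
# Depth-first, recursive, backtracking packing search that stops at the first solution.
from typing import List, Tuple

Board = List[List[int]]

def placements(piece: Tuple[int, int], board: Board, r: int, c: int):
    """Yield the orientations of a rectangular piece that fit with their
    top-left cell at the first empty square (r, c)."""
    h, w = piece
    for ph, pw in {(h, w), (w, h)}:                      # the two 2-D orientations
        if r + ph <= len(board) and c + pw <= len(board[0]) and \
           all(board[r + i][c + j] == 0 for i in range(ph) for j in range(pw)):
            yield ph, pw

def solve(board: Board, pieces: List[Tuple[int, int]], label: int = 1) -> bool:
    """Return True (leaving the solution in `board`) once the first packing is found."""
    empty = next(((r, c) for r, row in enumerate(board)
                  for c, v in enumerate(row) if v == 0), None)
    if empty is None:                                    # board full: success iff no pieces left
        return not pieces
    if not pieces:
        return False
    r, c = empty
    for i, piece in enumerate(pieces):                   # try each remaining piece...
        for ph, pw in placements(piece, board, r, c):    # ...in each orientation that fits
            for di in range(ph):
                for dj in range(pw):
                    board[r + di][c + dj] = label
            if solve(board, pieces[:i] + pieces[i + 1:], label + 1):
                return True                              # stop at the first solution
            for di in range(ph):                         # undo the placement (backtrack)
                for dj in range(pw):
                    board[r + di][c + dj] = 0
    return False

board = [[0] * 4 for _ in range(4)]
assert solve(board, [(1, 2)] * 6 + [(2, 2)])             # six dominoes and one square fill 4x4
```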

6 citations


Journal ArticleDOI
TL;DR: A technique is presented whereby all of the data-accessing algorithms are separated into distinct code streams that run as coroutines on a special address processor whose operations are appropriate to the accessing of arrays and data structures.
Abstract: Although present computers often provide excellent operations for data manipulation, data accessing is generally performed by manipulation of addresses with little recognition of the actual data structure of programs. In an attempt to overcome this deficiency, a technique is presented whereby all of the data-accessing algorithms are separated into distinct code streams which run as coroutines in a special address processor. The operations of this address processor are appropriate to the accessing of arrays and data structures. The resultant system is able to handle, in hardware, most of the data structures which are found in present programming languages.
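
A software analogue may make the separation concrete. In the sketch below, a generator plays the role of the address processor's coroutine, producing the index sequence for a structure, while the main code stream only consumes values and does arithmetic; the names and the row-major/column-major example are illustrative, not taken from the paper.

```python
# Separating address generation (coroutines) from data manipulation (main stream).
from typing import Iterator, List

def row_major(rows: int, cols: int) -> Iterator[int]:
    """Address-stream coroutine for a 2-D array stored row-major."""
    for i in range(rows):
        for j in range(cols):
            yield i * cols + j          # the 'address' handed to the data stream

def column_major(rows: int, cols: int) -> Iterator[int]:
    """A different accessing pattern over the same flat storage."""
    for j in range(cols):
        for i in range(rows):
            yield i * cols + j

def accumulate(memory: List[float], addresses: Iterator[int]) -> float:
    """Data-manipulation stream: no address arithmetic, only consumption."""
    return sum(memory[a] for a in addresses)

memory = [float(v) for v in range(6)]   # a 2x3 array flattened into 'memory'
assert accumulate(memory, row_major(2, 3)) == accumulate(memory, column_major(2, 3))
```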

5 citations


Journal ArticleDOI
TL;DR: It is suggested that instructions with implicit register references may be readily added to current computer processors in order to facilitate a number of common operations.
Abstract: It is suggested that instructions with implicit register references may be readily added to current computer processors in order to facilitate a number of common operations.
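
One way to picture such an instruction is a stack push whose stack-pointer register is implied rather than encoded. The toy machine below is purely illustrative (the abstract proposes no specific ISA); it only shows how an implicit register reference turns a common operation into a single short instruction.

```python
# Illustrative toy machine: PUSH/POP name only the data register; the stack
# pointer SP is referenced implicitly by the instruction itself.
class ToyMachine:
    SP = 15                                   # register implicitly used by PUSH/POP

    def __init__(self) -> None:
        self.regs = [0] * 16
        self.regs[self.SP] = 64               # stack grows downward from address 64
        self.mem = [0] * 64

    def push(self, src: int) -> None:
        """PUSH rs: implicit decrement of SP, then store rs at the new top of stack."""
        self.regs[self.SP] -= 1
        self.mem[self.regs[self.SP]] = self.regs[src]

    def pop(self, dst: int) -> None:
        """POP rd: implicit load from the top of stack, then increment SP."""
        self.regs[dst] = self.mem[self.regs[self.SP]]
        self.regs[self.SP] += 1

m = ToyMachine()
m.regs[1] = 42
m.push(1)                                     # no explicit mention of SP anywhere
m.pop(2)
assert m.regs[2] == 42
```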

2 citations


Journal ArticleDOI
TL;DR: "The authors need an order of magnitude improvea~nts, hopefully in base ]0", before new architectures can find their way into commercial computers, as one of the Fame]Jsts remarked.
Abstract: During the past few years, researchers in computer architecture have seen a number of new, so-called "non-von Neumann" innovations. Some of these innovations are finding their way into commercial computers, while others are relegated to the backwaters of research logs. There are several reasons for the reluctance of manufacturers to implement new ideas. One of the important reasons cited by the panel at NCC 81 [1] is the lack of quantitative data substantiating the benefits of the innovations. As one of the panelists remarked, "we need an order of magnitude improvements, hopefully in base 10", before new architectures can find their way into commercial computers.

2 citations


Journal ArticleDOI
TL;DR: Many current expert systems are constructed in a rule-based fashion, using a series of rules or productions which implement the reasoning logic of the system, but this approach is not well suited to implementation on conventional von Neumann architectures.
Abstract: Many current expert systems are constructed in a rule-based fashion, that is, using a series of rules or productions which implement the reasoning logic of the system. While such an approach is well suited to the task of building expert systems, it is not well suited to their implementation via conventional von Neumann architectures. What is required is an architecture which is more appropriate for evaluating and executing rules. Such a machine requires the ability to perform many small, heterogeneous operations in parallel. A dataflow machine provides just such an ability, and there is a natural mapping from rules to a series of dataflow templates which may be exploited.
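
The rule-to-dataflow mapping can be sketched in a few lines of software. In the toy engine below, each rule behaves like a dataflow template that fires once all of its input tokens (antecedent facts) are present and then emits a new token (its consequent); the rules, fact names, and fixed-point loop are assumptions for illustration, not the paper's machine.

```python
# Rules as dataflow templates: a template fires when all its input tokens arrive.
from typing import Dict, FrozenSet, Set

# rule templates: antecedent facts -> consequent fact
rules: Dict[FrozenSet[str], str] = {
    frozenset({"has_fever", "has_cough"}): "suspect_flu",
    frozenset({"suspect_flu", "short_of_breath"}): "order_chest_xray",
}

def run(facts: Set[str]) -> Set[str]:
    """Fire every enabled rule until no new tokens appear (a fixed point)."""
    tokens = set(facts)
    fired: Set[FrozenSet[str]] = set()
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules.items():
            # a template is enabled once all of its input arcs carry tokens
            if antecedents <= tokens and antecedents not in fired:
                tokens.add(consequent)
                fired.add(antecedents)
                changed = True
    return tokens

out = run({"has_fever", "has_cough", "short_of_breath"})
assert "order_chest_xray" in out
```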

2 citations


Journal ArticleDOI
TL;DR: The automated process exchange mechanism on the Model P400 and the 50 Series of PR1ME machines is described and its use in PRIMOS is explained and validated.
Abstract: A high speed mechanism for process exchange is essential in a time sharing system based on the use of many processes. The automated process exchange mechanism on the Model P400 and the 50 Series of PR1ME machines is described. Typical timing for operations using the process exchange mechanism on the P750 is given. Its use in PRIMOS is explained and validated.

1 citation


Journal ArticleDOI
TL;DR: With the advent of self-configurable WSI, many candidates have been proposed for implementation and several have been produced, including main memory, which is an obvious candidate for wafer implementation.
Abstract: With the advent of self-configurable WSI, many candidates have been proposed for implementation and several have been produced. Trilogy set its sights on a supercomputer based on 40 ECL kilowatt wafers. Mosaic has reportedly delivered wafers with CMOS chips flip bonded on the wafer. Wafer Scale Integration, Inc. has announced its intent to produce CMOS wafers using a standard cell building block approach, and Sinclair has announced a research effort based on the concepts of Ivor Catt involving a wafer of serial shift register chips which self-connect to adjacent operable chips to form a wafer serial memory. NTT of Japan mounted a full wafer memory effort [2] using six memory modules of redundantly paired CMOS RAM chips. Most of the WSI projects have been based on memory elements, since by their nature one element is like another and they thus provide inherent redundancy, unlike unique logic elements (another possibility is a specialized architecture such as systolic processing using identical computing elements). A strong argument for memory wafers can be made by observing that the semiconductor industry is capable of producing a 32-bit wide multi-MIPS computer on a single chip while the demand for main memory remains insatiable. Thus an obvious candidate for wafer implementation is main memory. A recent paper [3] described the concept of wafer virtual memory (WAVM), which consists of a wafer of byte-wide off-the-shelf RAMs interconnected by a common bus on the wafer and used by treating bad RAMs as unavailable pages of a virtual memory. The only modification required to a conventional RAM is the addition of an address comparator so that each RAM can be uniquely addressed. In this architecture, the common bus is the focal point of both yield and performance: it must be small enough in total area to maintain a reasonable wafer yield while wide enough to allow an adequate transfer rate. Several techniques are available to enhance both yield and transfer rate. Yield of WSI is a mixture of two components: element yield and wafer yield. Element yield can be controlled at an acceptable level by utilizing smaller elements, e.g., a mature RAM of small area in lieu of a state-of-the-art RAM that pushes the size limits of reasonable yields. Another possibility is to create an element out of one half of a conventional RAM. Wafer yield is a function of the common bus and configuration logic. Since for WAVM the configuration logic …
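
The WAVM idea described above lends itself to a small software model. In the sketch below, RAM elements that fail test are simply left out of the page table, so they become unavailable pages of the virtual memory, and every access is steered to exactly one good RAM, standing in for the on-wafer address comparator. The sizes, the bad-element set, and the function names are illustrative assumptions, not the paper's design.

```python
# Software model of wafer virtual memory (WAVM): bad RAMs become missing pages.
PAGE_BYTES = 256                       # capacity of one byte-wide RAM element
N_RAMS = 16                            # RAM elements on the wafer

wafer = [bytearray(PAGE_BYTES) for _ in range(N_RAMS)]
bad_rams = {3, 7, 12}                  # elements that failed wafer test

# Page table: virtual page -> physical RAM id, skipping the bad elements.
page_table = {vpn: pid for vpn, pid in
              enumerate(p for p in range(N_RAMS) if p not in bad_rams)}

def store(vaddr: int, value: int) -> None:
    vpn, offset = divmod(vaddr, PAGE_BYTES)
    wafer[page_table[vpn]][offset] = value      # the comparator selects exactly one RAM

def load(vaddr: int) -> int:
    vpn, offset = divmod(vaddr, PAGE_BYTES)
    return wafer[page_table[vpn]][offset]

store(3 * PAGE_BYTES + 5, 0xAB)        # virtual page 3 maps around the bad RAMs
assert load(3 * PAGE_BYTES + 5) == 0xAB
assert page_table[3] not in bad_rams
```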

Journal ArticleDOI
TL;DR: The power of the methodology lies in its generality, i.e., it could be used to design an architecture for practically any arbitrary computing environment.
Abstract: To design a computer architecture for a class of computations (algorithms), systematically and in a top-down fashion, a general and uniform methodology should be developed. For a given class, there exists an information structure of the architecture such that efficient performance can be achieved for the given class. The methodology is used to find such an information structure and then to define the control structure of the architecture at the functional level. The control structure itself can be treated as another architecture (with a different computing environment), and therefore, again, its information structure and then control structure (at a lower level) could be found using the same methodology. This recursive application of the methodology to define and design information structures and control structures terminates when the control structure can be trivially 'hard-wired'. The power of the methodology lies in its generality, i.e., it could be used to design an architecture for practically any arbitrary computing environment.