
Showing papers in "IEEE Transactions on Computers in 1987"


Journal ArticleDOI
TL;DR: In this article, a deadlock-free routing algorithm for arbitrary interconnection networks is presented using the concept of virtual channels; the necessary and sufficient condition for deadlock-free routing is the absence of cycles in the channel dependency graph.
Abstract: A deadlock-free routing algorithm can be generated for arbitrary interconnection networks using the concept of virtual channels. A necessary and sufficient condition for deadlock-free routing is the absence of cycles in a channel dependency graph. Given an arbitrary network and a routing function, the cycles of the channel dependency graph can be removed by splitting physical channels into groups of virtual channels. This method is used to develop deadlock-free routing algorithms for k-ary n-cubes, for cube-connected cycles, and for shuffle-exchange networks.
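The paper's condition can be checked mechanically: build the channel dependency graph and test it for cycles. A minimal sketch, with an illustrative graph encoding and helper name that are ours, not the paper's:

```python
def has_cycle(deps):
    """Depth-first search for a cycle in a channel dependency graph.
    deps maps each channel id to the set of channels the routing
    function may occupy next while still holding this one."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}

    def visit(c):
        color[c] = GRAY
        for nxt in deps.get(c, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:      # back edge: a cycle, so deadlock is possible
                return True
            if state == WHITE and visit(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in list(deps))

# A unidirectional 4-ring: each channel waits on the next, so the
# dependency graph is a cycle and routing can deadlock.
ring = {0: {1}, 1: {2}, 2: {3}, 3: {0}}
# Removing one dependency (as virtual-channel splitting does) makes it acyclic.
acyclic = {0: {1}, 1: {2}, 2: {3}, 3: set()}
```

Splitting a physical channel into virtual channels rewrites edges of this graph until `has_cycle` returns false, which by the paper's theorem guarantees deadlock freedom.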

2,110 citations


Journal ArticleDOI
TL;DR: This self-contained paper develops the theory necessary to statically schedule SDF programs on single or multiple processors; a class of static (compile-time) scheduling algorithms is proven valid, and specific algorithms are given for scheduling SDF systems onto single or multiple processors.
Abstract: Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real-time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake of programmer convenience. This is particularly true when the target machine is a programmable DSP chip. However, the runtime overhead inherent in most LGDF implementations is not required for most signal processing systems because such systems are mostly synchronous (in the DSP sense). Synchronous data flow (SDF) differs from traditional data flow in that the amount of data produced and consumed by a data flow node is specified a priori for each input and output. This is equivalent to specifying the relative sample rates in a signal processing system. This means that the scheduling of SDF nodes need not be done at runtime, but can be done at compile time (statically), so the runtime overhead evaporates. The sample rates can all be different, which is not true of most current data-driven digital signal processing programming methodologies. Synchronous data flow is closely related to computation graphs, a special case of Petri nets. This self-contained paper develops the theory necessary to statically schedule SDF programs on single or multiple processors. A class of static (compile time) scheduling algorithms is proven valid, and specific algorithms are given for scheduling SDF systems onto single or multiple processors.
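The balance equations behind static SDF scheduling can be made concrete: each edge (src, dst, produced, consumed) forces r[src]·produced = r[dst]·consumed, and the repetitions vector is the smallest positive integer solution. A small illustrative solver, assuming a connected and consistently rated graph (the function name and edge encoding are ours):

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """Smallest positive integer firing counts r such that for every
    edge (src, dst, produced, consumed): r[src]*produced == r[dst]*consumed.
    Assumes the SDF graph is connected and consistently rated."""
    r = {actors[0]: Fraction(1)}
    changed = True
    while changed:              # propagate rates across edges
        changed = False
        for src, dst, p, c in edges:
            if src in r and dst not in r:
                r[dst] = r[src] * p / c
                changed = True
            elif dst in r and src not in r:
                r[src] = r[dst] * c / p
                changed = True
    for src, dst, p, c in edges:   # consistency check
        if r[src] * p != r[dst] * c:
            raise ValueError("sample-rate inconsistent SDF graph")
    scale = lcm(*(f.denominator for f in r.values()))
    return {a: int(f * scale) for a, f in r.items()}
```

For a chain A→B→C where A produces 2 tokens consumed 3 at a time by B, and B produces 1 token consumed 2 at a time by C, the solver fires A three times, B twice, and C once per schedule period.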

1,380 citations


Journal ArticleDOI
TL;DR: Instant Replay is a general solution for reproducing the execution behavior of parallel programs; during execution it saves the relative order of significant events, not the data associated with those events.
Abstract: The debugging cycle is the most common methodology for finding and correcting errors in sequential programs. Cyclic debugging is effective because sequential programs are usually deterministic. Debugging parallel programs is considerably more difficult because successive executions of the same program often do not produce the same results. In this paper we present a general solution for reproducing the execution behavior of parallel programs, termed Instant Replay. During program execution we save the relative order of significant events as they occur, not the data associated with such events. As a result, our approach requires less time and space to save the information needed for program replay than other methods. Our technique is not dependent on any particular form of interprocess communication. It provides for replay of an entire program, rather than individual processes in isolation. No centralized bottlenecks are introduced and there is no need for synchronized clocks or a globally consistent logical time. We describe a prototype implementation of Instant Replay on the BBN Butterfly™ Parallel Processor, and discuss how it can be incorporated into the debugging cycle for parallel programs.
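The core idea, logging the order of accesses rather than their data, fits in a few lines. A toy single-object sketch (the operation encoding and function names are our simplification, not the BBN implementation): each write bumps a version counter, each access records the version it saw, and a re-execution is a faithful replay iff it reproduces the same version history.

```python
def record_run(ops):
    """ops: sequence of (kind, process_id) with kind 'r' or 'w' on one
    shared object.  Returns the replay log: for every access, the
    object version it observed (each write creates a new version)."""
    version, log = 0, []
    for kind, pid in ops:
        if kind == 'w':
            version += 1
        log.append((pid, kind, version))
    return log

def replay_matches(ops, log):
    """A re-execution reproduces the recorded behavior iff it observes
    the same version history; no event data is ever saved."""
    return record_run(ops) == log
```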

765 citations


Journal ArticleDOI
TL;DR: For certain types of loops, it is shown analytically that guided self-scheduling uses minimal overhead and achieves optimal schedules; experimental results clearly show its advantage over the most widely known dynamic methods.
Abstract: This paper proposes guided self-scheduling, a new approach for scheduling arbitrarily nested parallel program loops on shared memory multiprocessor systems. Utilizing loop parallelism is clearly most crucial in achieving high system and program performance. Because of its simplicity, guided self-scheduling is particularly suited for implementation on real parallel machines. This method achieves simultaneously the two most important objectives: load balancing and very low synchronization overhead. For certain types of loops we show analytically that guided self-scheduling uses minimal overhead and achieves optimal schedules. Two other interesting properties of this method are its insensitivity to the initial processor configuration (in time) and its parameterized nature which allows us to tune it for different systems. Finally we discuss experimental results that clearly show the advantage of guided self-scheduling over the most widely known dynamic methods.
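The guided self-scheduling rule itself is one line: an idle processor grabs ⌈R/p⌉ of the R remaining iterations. A sketch (the function name is ours):

```python
from math import ceil

def gss_chunks(n_iterations, n_procs):
    """Chunk sizes handed out by guided self-scheduling: each processor
    request takes ceil(remaining / p) of the remaining iterations."""
    remaining, chunks = n_iterations, []
    while remaining > 0:
        take = ceil(remaining / n_procs)
        chunks.append(take)
        remaining -= take
    return chunks
```

Early chunks are large, keeping synchronization overhead low; late chunks shrink toward one iteration, balancing the processors' finishing times.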

656 citations


Journal ArticleDOI
TL;DR: Depending on the types and number of tolerated faults, this paper presents upper bounds on the achievable synchronization accuracy for external and internal synchronization in a distributed real-time system.
Abstract: The generation of a fault-tolerant global time base with known accuracy of synchronization is one of the important operating system functions in a distributed real-time system. Depending on the types and number of tolerated faults, this paper presents upper bounds on the achievable synchronization accuracy for external and internal synchronization in a distributed real-time system. The concept of continuous versus instantaneous synchronization is introduced in order to generate a uniform common time base for local, global, and external time measurements. In the last section, the functions of a VLSI clock synchronization unit, which improves the synchronization accuracy and reduces the CPU load, are described. With this unit, the CPU overhead and the network traffic for clock synchronization in state-of-the-art distributed real-time systems can be reduced to less than 1 percent.

625 citations


Journal ArticleDOI
TL;DR: This work uses a binary decomposition of the domain to partition it into rectangles requiring equal computational effort, and studies the communication costs of mapping this partitioning onto different multiprocessors: a mesh-connected array, a tree machine, and a hypercube.
Abstract: We consider the partitioning of a problem on a domain with unequal work estimates in different subdomains in a way that balances the workload across multiple processors. Such a problem arises for example in solving partial differential equations using an adaptive method that places extra grid points in certain subregions of the domain. We use a binary decomposition of the domain to partition it into rectangles requiring equal computational effort. We then study the communication costs of mapping this partitioning onto different multiprocessors: a mesh-connected array, a tree machine, and a hypercube. The communication cost expressions can be used to determine the optimal depth of the above partitioning.
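The binary decomposition can be sketched as recursive bisection of a workload grid, alternating cut axes and placing each cut where the two halves' work is closest to equal. The encoding and names below are ours, not the paper's:

```python
def bisect(weights, depth, r0=0, r1=None, c0=0, c1=None, vertical=True):
    """Recursively split a 2-D grid of work estimates into 2**depth
    rectangles of approximately equal total weight, alternating the
    cut axis at each level.  Returns (r0, r1, c0, c1) index ranges."""
    if r1 is None:
        r1, c1 = len(weights), len(weights[0])
    if depth == 0:
        return [(r0, r1, c0, c1)]

    def work(a0, a1, b0, b1):
        return sum(sum(row[b0:b1]) for row in weights[a0:a1])

    half = work(r0, r1, c0, c1) / 2
    if vertical:                     # cut between columns
        best = min(range(c0 + 1, c1),
                   key=lambda c: abs(work(r0, r1, c0, c) - half))
        return (bisect(weights, depth - 1, r0, r1, c0, best, False) +
                bisect(weights, depth - 1, r0, r1, best, c1, False))
    else:                            # cut between rows
        best = min(range(r0 + 1, r1),
                   key=lambda r: abs(work(r0, r, c0, c1) - half))
        return (bisect(weights, depth - 1, r0, best, c0, c1, True) +
                bisect(weights, depth - 1, best, r1, c0, c1, True))
```

On a uniform grid the cuts fall at the midpoints; with an adaptive method's clustered grid points, the rectangles shrink where the weights are dense.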

623 citations


Journal ArticleDOI
TL;DR: The architecture, implementation, and performance of the Warp machine are described, demonstrating that the Warp architecture is effective in the application domain of robot navigation as well as in other fields such as signal processing, scientific computation, and computer vision research.
Abstract: The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes ten cells, thus having a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the increased computational bandwidth. Warp is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler. The first ten-cell prototype was completed in February 1986; delivery of production machines started in April 1987. Extensive experimentation with both the prototype and production machines has demonstrated that the Warp architecture is effective in the application domain of robot navigation as well as in other fields such as signal processing, scientific computation, and computer vision research. For these applications, Warp is typically several hundred times faster than a VAX 11/780 class computer. This paper describes the architecture, implementation, and performance of the Warp machine. Each major architectural decision is discussed and evaluated with system, software, and application considerations. The programming model and tools developed for the machine are also described. The paper concludes with performance data for a large number of applications.

328 citations


Journal ArticleDOI
TL;DR: An algorithm to convert redundant number representations into conventional representations is presented, which is applicable in arithmetic algorithms such as nonrestoring division, square root, and on-line operations in which redundantly represented results are generated in a digit-by-digit manner.
Abstract: An algorithm to convert redundant number representations into conventional representations is presented. The algorithm is performed concurrently with the digit-by-digit generation of redundant forms by schemes such as SRT division. It has a step delay roughly equivalent to the delay of a carry-save adder and simple implementation. The conversion scheme is applicable in arithmetic algorithms such as nonrestoring division, square root, and on-line operations in which redundantly represented results are generated in a digit-by-digit manner, from most significant to least significant.
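One standard way to perform such a conversion on the fly is to maintain two conventional forms of the prefix, q and q − 1, so that appending a negative digit never triggers a borrow propagation. A radix-2 signed-digit sketch over integers (hypothetical names, not the paper's notation):

```python
def convert_msd_first(digits):
    """Convert a signed-digit number (digits in {-1, 0, 1}, most
    significant digit first, radix 2) to a conventional integer.
    q is the value of the prefix converted so far and qm == q - 1;
    each step appends one bit to q or to qm, so the update is
    carry-free and can run concurrently with digit generation."""
    q, qm = 0, -1
    for d in digits:
        if d >= 0:
            q, qm = 2 * q + d, 2 * q + (d - 1)
        else:   # d == -1: new value is 2q - 1, i.e. 2*qm + 1
            q, qm = 2 * qm + 1, 2 * qm
    return q
```

In hardware the two registers are updated by shifts and bit appends only, which is what gives the step delay comparable to a carry-save adder stage.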

256 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe three mechanisms that improve network coherence: an organizational structure that provides a long-term framework for network coordination to guide each node's local control decisions; a planner at each node that develops sequences of problem-solving activities based on the current situation; and meta-level communication about the current state of local problem solving that enables nodes to dynamically refine the organization.
Abstract: When two or more computing agents work on interacting tasks, their activities should be coordinated so that they cooperate coherently. Coherence is particularly problematic in domains where each agent has only a limited view of the overall task, where communication between agents is limited, and where there is no "controller" to coordinate the agents. Our approach to coherent cooperation in such domains is developed in the context of a distributed problem-solving network where agents cooperate to solve a single problem. The approach stresses the importance of sophisticated local control by which each problem-solving node integrates knowledge of the problem domain with (meta-level) knowledge about network coordination. This allows nodes to make rapid, intelligent local decisions based on changing problem characteristics with only a limited amount of intercommunication to coordinate these decisions. We describe three mechanisms that improve network coherence: 1) an organizational structure that provides a long-term framework for network coordination to guide each node's local control decisions; 2) a planner at each node that develops sequences of problem-solving activities based on the current situation; and 3) meta-level communication about the current state of local problem solving that enables nodes to dynamically refine the organization. We present a variety of problem-solving situations to show the benefits and limitations of these mechanisms, and we provide simulation results showing the mechanisms to be particularly cost effective in more complex problem-solving situations. We also discuss how these mechanisms might be of more general use in other distributed computing applications.

254 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that even if only a small percentage of all requests are to a hot-spot, these requests can cause very serious performance problems, and networks that do the necessary combining of requests are suggested to keep the interconnection network and memory contention from becoming a bottleneck.
Abstract: When a large number of processors try to access a common variable, referred to as hot-spot accesses in [6], not only can the resulting memory contention seriously degrade performance, but it can also cause tree saturation in the interconnection network which blocks both hot and regular requests alike. It is shown in [6] that even if only a small percentage of all requests are to a hot-spot, these requests can cause very serious performance problems, and networks that do the necessary combining of requests are suggested to keep the interconnection network and memory contention from becoming a bottleneck.

252 citations


Journal ArticleDOI
TL;DR: A Gray-code (GC) allocation strategy is proposed and shown to outperform the buddy strategy in detecting the availability of subcubes; the minimal number of GCs required for complete subcube recognition in a Q_n is proved to be less than or equal to C(n, ⌈n/2⌉).
Abstract: The processor allocation problem in an n-dimensional hypercube (or an n-cube) multiprocessor is similar to the conventional memory allocation problem. The main objective in both problems is to maximize the utilization of available resources as well as minimize the inherent system fragmentation. A processor allocation strategy using the buddy system, called the buddy strategy, is discussed first and then a new allocation strategy using a Gray code (GC), called the GC strategy, is proposed. When processor relinquishment is not considered (i.e., static allocation), both of these strategies are proved to be optimal in the sense that each incoming request sequence is always assigned to a minimal subcube. It is also shown that the GC strategy outperforms the buddy strategy in detecting the availability of subcubes. Our results are extended further to implement an allocation strategy using more than one GC and derive the relationship between the GC's used and the corresponding ability of detecting the availability of various subcubes. The minimal number of GC's required for complete subcube recognition in a Q_n is proved to be less than or equal to C(n, ⌈n/2⌉). Several processor allocation strategies in a Q_5 are implemented on the NCUBE/six multiprocessor at the University of Michigan, and their performance is experimentally measured.
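The Gray-code machinery the GC strategy builds on is compact: the binary-reflected code g(i) = i XOR (i >> 1) makes consecutive codewords adjacent in the hypercube, so a run of free codes is a subcube candidate. A toy allocator in that spirit, deliberately limited to 1-subcubes and not the paper's full algorithm:

```python
def gray(i):
    """Binary-reflected Gray code: consecutive codes differ in one bit."""
    return i ^ (i >> 1)

def allocate_pair(free, dim):
    """Toy flavor of the GC strategy: scan the nodes of a dim-cube in
    Gray-code order and return the first two consecutive free nodes.
    Because consecutive Gray codes differ in exactly one bit, such a
    pair is a 1-subcube.  Illustrative only."""
    order = [gray(i) for i in range(2 ** dim)]
    for a, b in zip(order, order[1:]):
        if a in free and b in free:
            return (a, b)
    return None
```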

Journal ArticleDOI
TL;DR: This paper presents a mapping strategy for parallel processing based on an accurate characterization of the communication overhead; an efficient mapping scheme is developed for the objective functions, employing two levels of assignment optimization: initial assignment and pairwise exchange.
Abstract: This paper presents a mapping strategy for parallel processing using an accurate characterization of the communication overhead. A set of objective functions is formulated to evaluate the optimality of mapping a problem graph onto a system graph. One of them is especially suitable for real-time applications of parallel processing. These objective functions are different from the conventional objective functions in that the edges in the problem graph are weighted and the actual distance rather than the nominal distance for the edges in the system graph is employed. This facilitates a more accurate quantification of the communication overhead. An efficient mapping scheme has been developed for the objective functions, where two levels of assignment optimization procedures are employed: initial assignment and pairwise exchange. The mapping scheme has been tested using the hypercube as a system graph.

Journal ArticleDOI
TL;DR: This paper presents an innovative approach, called signatured instruction streams (SIS), to the on-line detection of control flow errors caused by transient and intermittent faults.
Abstract: This paper presents an innovative approach, called signatured instruction streams (SIS), to the on-line detection of control flow errors caused by transient and intermittent faults. At compile time an application program is appropriately partitioned into smaller subprograms, and cyclic codes, or signatures, characterizing the control flow of each subprogram are generated and embedded in the object code. At runtime, special built-in hardware regenerates these signatures using runtime information and compares them to the precomputed signatures. A mismatch indicates the detection of an error. A demonstration system, based on the MC68000 processor, has been designed and built. Fault insertion experiments have been performed using the demonstration system. The demonstration system, using 17 percent hardware overhead, is able to detect 98 percent of faults affecting the control flow and 82 percent of all randomly inserted faults.

Journal ArticleDOI
TL;DR: In this article, the authors examined the cache miss ratio as a function of line size, and found that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat.
Abstract: The line (block) size of a cache memory is one of the parameters that most strongly affects cache performance. In this paper, we study the factors that relate to the selection of a cache line size. Our primary focus is on the cache miss ratio, but we also consider influences such as logic complexity, address tags, line crossers, I/O overruns, etc. The behavior of the cache miss ratio as a function of line size is examined carefully through the use of trace driven simulation, using 27 traces from five different machine architectures. The change in cache miss ratio as the line size varies is found to be relatively stable across workloads, and tables of this function are presented for instruction caches, data caches, and unified caches. An empirical mathematical fit is obtained. This function is used to extend previously published design target miss ratios to cover line sizes from 4 to 128 bytes and cache sizes from 32 bytes to 32K bytes; design target miss ratios are to be used to guide new machine designs. Mean delays per memory reference and memory (bus) traffic rates are computed as a function of line and cache size, and memory access time parameters. We find that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat. Longer line sizes are suitable for mainframes because of the higher bandwidth to main memory.

Journal ArticleDOI
TL;DR: This paper presents a method for optimal module allocation that satisfies certain performance constraints and proposes an objective function that includes the intermodule communication (IMC) and accumulative execution time (AET) of each module.
Abstract: In a distributed processing system with the application software partitioned into a set of program modules, allocation of those modules to the processors is an important problem. This paper presents a method for optimal module allocation that satisfies certain performance constraints. An objective function that includes the intermodule communication (IMC) and accumulative execution time (AET) of each module is proposed. It minimizes the bottleneck-processor utilization—a good principle for task allocation. Next, the effects of precedence relationship (PR) among program modules on response time are studied. Both simulation and analytical results reveal that the program-size ratio between two consecutive modules plays an important role in task response time. Finally, an algorithm based on PR, AET, and IMC and on the proposed objective function is presented. This algorithm generates better module assignments than those that do not consider the PR effects.

Journal ArticleDOI
Hou
TL;DR: Through use of the fast Hartley transform, discrete cosine transforms (DCT) and discrete Fourier transforms (DFT) can be obtained and the recursive nature of the FHT algorithm derived in this paper enables us to generate the next higher order FHT from two identical lower order F HT's.
Abstract: The fast Hartley transform (FHT) is similar to the Cooley-Tukey fast Fourier transform (FFT) but performs much faster because it requires only real arithmetic computations compared to the complex arithmetic computations required by the FFT. Through use of the FHT, discrete cosine transforms (DCT) and discrete Fourier transforms (DFT) can be obtained. The recursive nature of the FHT algorithm derived in this paper enables us to generate the next higher order FHT from two identical lower order FHT's. In practice, this recursive relationship offers flexibility in programming different sizes of transforms, while the orderly structure of its signal flow-graphs indicates an ease of implementation in VLSI.
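The transform itself is real-valued: the Hartley kernel is cas θ = cos θ + sin θ. A direct O(N²) reference implementation for illustration (the FHT computes the same values in O(N log N); this sketch only defines what is computed, and the name is ours):

```python
from math import cos, sin, pi

def dht(x):
    """Direct discrete Hartley transform:
    H[k] = sum over m of x[m] * cas(2*pi*m*k/N), all in real arithmetic."""
    n = len(x)
    return [sum(x[m] * (cos(2 * pi * m * k / n) + sin(2 * pi * m * k / n))
                for m in range(n))
            for k in range(n)]
```

Because the kernel is real, no complex multiplies are needed, which is the source of the FHT's speed advantage over the FFT noted above.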

Journal ArticleDOI
TL;DR: Several loop synchronization techniques to generate synchronization instructions for singly-nested loops are presented and a technique for the elimination of redundant synchronization instructions is presented.
Abstract: Translating program loops into a parallel form is one of the most important transformations performed by concurrentizing compilers. This transformation often requires the insertion of synchronization instructions within the body of the concurrent loop. Several loop synchronization techniques are presented first. Compiler algorithms to generate synchronization instructions for singly-nested loops are then discussed. Finally, a technique for the elimination of redundant synchronization instructions is presented.

Journal ArticleDOI
TL;DR: The node organization algorithm presented in this paper provides a completely distributed, maximally localized execution of collision free channel allocation that allows for parallel channel allocation in stationary and mobile networks with provable spatial reuse properties.
Abstract: This paper proposes a solution to providing a collision free channel allocation in a multihop mobile radio network. An efficient solution to this problem provides spatial reuse of the bandwidth whenever possible. A robust solution maintains the collision free property of the allocation under any combination of topological changes. The node organization algorithm presented in this paper provides a completely distributed, maximally localized execution of collision free channel allocation. It allows for parallel channel allocation in stationary and mobile networks with provable spatial reuse properties. A simpler version of the algorithm also provides a highly localized distributed coloring algorithm for dynamic graphs.

Journal ArticleDOI
TL;DR: This paper presents a class of repair mechanisms using the concept of checkpointing and derives several properties of checkpoint repair mechanisms, and provides algorithms for performing checkpoint repair that incur little overhead in time and modest cost in hardware.
Abstract: Out-of-order execution and branch prediction are two mechanisms that can be used profitably in the design of supercomputers to increase performance. Proper exception handling and branch prediction miss handling in an out-of-order execution machine do require some kind of repair mechanism which can restore the machine to a known previous state. In this paper we present a class of repair mechanisms using the concept of checkpointing. We derive several properties of checkpoint repair mechanisms. In addition, we provide algorithms for performing checkpoint repair that incur little overhead in time and modest cost in hardware. We also note that our algorithms require no additional complexity or time for use with write-back cache memory systems than they do with write-through cache memory systems, contrary to statements made by previous researchers.
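At its core, checkpoint repair reduces to snapshotting architectural state and restoring the most recent snapshot on an exception or branch-prediction miss. A minimal register-file sketch (class and method names are ours, not the paper's mechanisms):

```python
import copy

class CheckpointMachine:
    """Toy checkpoint repair: snapshot the register file at checkpoints;
    on an exception or mispredicted branch, restore the latest snapshot
    and resume from that known previous state."""
    def __init__(self):
        self.regs = {}
        self.checkpoints = []

    def checkpoint(self):
        self.checkpoints.append(copy.deepcopy(self.regs))

    def execute(self, reg, value):
        self.regs[reg] = value           # speculative update

    def repair(self):
        self.regs = self.checkpoints.pop()   # roll back
```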

Journal ArticleDOI
TL;DR: This paper presents the principles of constructing hypernets and analyzes their architectural potentials in terms of message routing complexity, cost-effective support for global as well as localized communication, I/O capabilities, and fault tolerance.
Abstract: A new class of modular networks is proposed for hierarchically constructing massively parallel computer systems for distributed supercomputing and AI applications. These networks are called hypernets. They are constructed incrementally with identical cubelets, treelets, or buslets that are well suited for VLSI implementation. Hypernets integrate positive features of both hypercubes and tree-based topologies, and maintain a constant node degree when the network size increases. This paper presents the principles of constructing hypernets and analyzes their architectural potentials in terms of message routing complexity, cost-effective support for global as well as localized communication, I/O capabilities, and fault tolerance. Several algorithms are mapped onto hypernets to illustrate their ability to support parallel processing in a hierarchically structured or data-dependent environment. The emulation of hypercube connections using less hardware is shown. The potential of hypernets for efficient support of connectionist models of computation is also explored.

Journal ArticleDOI
TL;DR: This paper presents a graph-theoretic algorithm for safety analysis of a class of timing properties in real-time systems that are expressible in a subset of real-time logic (RTL) formulas.
Abstract: This paper presents a graph-theoretic algorithm for safety analysis of a class of timing properties in real-time systems which are expressible in a subset of real time logic (RTL) formulas. Our procedure is in three parts: the first part constructs a graph representing the system specification and the negation of the safety assertion. The second part detects positive cycles in the graph using a node removal operation. The third part determines the consistency of the safety assertion with respect to the system specification based on the positive cycles detected. The implementation and an application of this procedure will also be described.

Journal ArticleDOI
TL;DR: The proposed broadcast protocol thus possesses the advantages of TDM solutions while allowing the channel bandwidth to be shared, concurrently with the broadcast, with other transmission activities as dictated, for instance, by data link protocols.
Abstract: This paper considers the issue of broadcasting protocols in multihop radio networks. The objective of a broadcasting protocol is to deliver the broadcasted message to all network nodes. To efficiently achieve this objective, the broadcasting protocol in this paper utilizes two basic properties of the multihop radio network. One is the broadcast nature of the radio, which allows every single transmission to reach all nodes that are in line of sight and within range of the transmitting node. The other is spatial reuse of the radio channel, which, due to the multihop nature of the network, allows multiple simultaneous transmissions to be received correctly. The proposed protocol incorporates these properties to obtain a collision-free forwarding of the broadcasted message on a tree. Centralized and distributed algorithms for the tree construction are presented. The obtained trees are unique in incorporating radio oriented time ordering as part of their definition. In this way multiple copies of one or more broadcasted messages can be transmitted simultaneously without collision, requiring only a small number of message transmissions. Consequently, the protocol not only guarantees that the broadcasted message reaches all network nodes in bounded time, but also ensures that the broadcasting activity will use only limited channel bandwidth and node memory. The proposed broadcast protocol thus possesses the advantages of TDM solutions while allowing the channel bandwidth to be shared, concurrently with the broadcast, with other transmission activities as dictated, for instance, by data link protocols. Some NP-completeness proofs are also given.

Journal Article
Wagner, Chin, McCluskey
TL;DR: In this paper, pseudorandom patterns generated by a linear feedback shift register (LFSR) were used to test a circuit for high fault coverage using the detectability profile of the circuit.
Abstract: Algorithmic test generation for high fault coverage is an expensive and time-consuming process. As an alternative, circuits can be tested by applying pseudorandom patterns generated by a linear feedback shift register (LFSR). Although no fault simulation is needed, analysis of pseudorandom testing requires the circuit detectability profile.
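An LFSR pattern generator of the kind analyzed here is a few lines. A Fibonacci-style sketch with 1-based tap positions (names are ours): with a primitive feedback polynomial such as x⁴ + x³ + 1, a 4-bit register cycles through all 15 nonzero states before repeating.

```python
def lfsr_patterns(taps, width, seed=1):
    """Fibonacci LFSR: shift left, feeding back the XOR of the tapped
    bits.  Yields successive register states, which serve as the
    pseudorandom test patterns applied to the circuit under test."""
    state = seed
    mask = (1 << width) - 1
    while True:
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        state = ((state << 1) | fb) & mask
```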

Journal ArticleDOI
TL;DR: A simple and efficient algorithm, SYREL, to obtain compact terminal reliability expressions between a terminal pair of computers of complex networks that incorporates conditional probability, set theory, and Boolean algebra in a distinct approach.
Abstract: Symbolic terminal reliability algorithms are important for analysis and synthesis of computer networks. In this paper, we present a simple and efficient algorithm, SYREL, to obtain compact terminal reliability expressions between a terminal pair of computers of complex networks. This algorithm incorporates conditional probability, set theory, and Boolean algebra in a distinct approach in which most of the computations performed are directly executable Boolean operations. The conditional probability is used to avoid applying at each iteration the most time consuming step in reliability algorithms, which is making a set of events mutually exclusive. The algorithm has been implemented on a VAX 11/750 and can analyze fairly large networks with modest memory and time requirements.

Journal ArticleDOI
TL;DR: A completely new generalization of the characterization problem in the system-level diagnosis area is developed and provides necessary and sufficient conditions for any fault-pattern of any size to be uniquely diagnosable, under the symmetric, and asymmetric invalidation models with or without the intermittent faults.
Abstract: System-level diagnosis appears to be a viable alternative to circuit-level testing in complex multiprocessor systems. A completely new generalization of the characterization problem in the system-level diagnosis area is developed in this paper. This generalized characterization theorem provides necessary and sufficient conditions for any fault-pattern of any size to be uniquely diagnosable, under the symmetric and asymmetric invalidation models with or without the intermittent faults. Moreover, it is also shown that the well known t-characterization theorems under these models can be derived as special cases. In addition to the generalization provided by these results, it is hoped that these results will also have a great impact on the diagnosis of faulty units in uniform structures based on the system-level diagnosis concepts and would be particularly useful in the diagnosis of WSI-oriented multiprocessor systems.

Journal ArticleDOI
TL;DR: This paper presents the solution of an optimization problem that appears in the design of double-loop structures for local networks and also in data memory allocation and data alignment in SIMD processors.
Abstract: This paper presents the solution of the following optimization problem that appears in the design of double-loop structures for local networks and also in data memory allocation and data alignment in SIMD processors.

Journal ArticleDOI
TL;DR: This paper addresses the problem of selecting vote assignments in order to maximize the probability that the critical operations can be performed at a given time by some group of nodes, and suggests simple heuristics to assign votes.
Abstract: In a faulty distributed system, voting is commonly used to achieve mutual exclusion among groups of isolated nodes. Each node is assigned a number of votes, and any group with a majority of votes can perform the critical operations. Vote assignments can have a significant impact on system reliability. In this paper we address the problem of selecting vote assignments in order to maximize the probability that the critical operations can be performed at a given time by some group of nodes. We suggest simple heuristics to assign votes, and show that they give good results in most cases. We also study three particular homogeneous topologies (fully connected, Ethernet, and ring networks), and derive analytical expressions for system reliability. These expressions provide useful insights into the reliability provided by voting mechanisms.
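A brute-force version of the reliability objective for a fully connected topology (one of the three studied here) can be written directly, since in that case any set of live nodes can communicate and the system is available exactly when the live nodes hold a strict vote majority. The vote vectors and node reliabilities below are hypothetical:

```python
from itertools import product

def availability(votes, p):
    """Probability that the live nodes hold a strict majority of votes,
    assuming a fully connected network and independent node failures
    (node i is up with probability p[i]). Exponential enumeration,
    for illustration only."""
    total_votes = sum(votes)
    avail = 0.0
    for states in product([True, False], repeat=len(votes)):
        prob = 1.0
        live = 0
        for up, v, pi in zip(states, votes, p):
            prob *= pi if up else (1.0 - pi)
            if up:
                live += v
        if 2 * live > total_votes:
            avail += prob
    return avail
```

With three equally reliable nodes (p = 0.9) and one vote each, availability is the probability that at least two nodes are up, 0.972; giving one node three of five votes makes the system live exactly when that node is up, 0.9, which shows why the choice of assignment matters.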

Journal ArticleDOI
TL;DR: This work uses the track graph, a suitably defined grid-like structure, to obtain efficient solutions for rectilinear shortest paths and minimum spanning tree (MST) problems for a set of points in the plane in the presence of rectilInear obstacles.
Abstract: We study the rectilinear shortest paths and minimum spanning tree (MST) problems for a set of points in the plane in the presence of rectilinear obstacles. We use the track graph, a suitably defined grid-like structure, to obtain efficient solutions for both problems. The track graph consists of rectilinear tracks defined by the obstacles and the points for which shortest paths and a minimum spanning tree are sought. We use a growth process like Dijkstra's on the track graph to find shortest paths from any point in the set to all other points (the one-to-all shortest paths problem). For the one-to-all shortest paths problem for n points we derive an O(n min{log n, log e} + (e + k) log t) time algorithm, where e is the total number of edges of all obstacles, t is the number of extreme edges of all obstacles, and k is the number of intersections among obstacle tracks (all bounds are for the worst case). The MST for the points is also constructed in time O(n log n + (e + k) log t) by a hybrid method of searching for shortest paths while simultaneously constructing an MST. An interesting application of the MST algorithm is the approximation of Steiner trees in graphs.
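The growth process the authors run on the track graph is Dijkstra's algorithm; a minimal one-to-all version on an arbitrary weighted graph (not the track graph itself, whose construction from obstacle tracks is the paper's contribution) looks like this:

```python
import heapq

def one_to_all(adj, source):
    """One-to-all shortest paths by Dijkstra's algorithm.
    adj maps each node to a list of (neighbor, weight) pairs with
    non-negative weights; returns distances from source."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale entry, node already settled closer
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Run on the track graph, each settled vertex corresponds to growing the "wavefront" one step along a rectilinear track around the obstacles.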

Journal ArticleDOI
TL;DR: A heuristic two-step, graph-based mapping scheme with polynomial-time complexity is developed and a heuristic boundary refinement procedure is developed to incrementally alter the initial partition for improved load balancing among the processors.
Abstract: The processor allocation problem is addressed in the context of the parallelization of a finite element modeling program on a processor mesh. A heuristic two-step, graph-based mapping scheme with polynomial-time complexity is developed: 1) initial generation of a graph partition for nearest-neighbor mapping of the finite element graph onto the processor graph, and 2) a heuristic boundary refinement procedure to incrementally alter the initial partition for improved load balancing among the processors. The effectiveness of the approach is gauged both by estimation using a model with empirically determined parameters and by implementation and experimental measurement on a 16-node hypercube parallel computer.
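The flavor of step 2 — incrementally moving boundary vertices to rebalance load — can be sketched for the two-partition case. This greedy loop is a simplified stand-in for the paper's refinement procedure, with the graph, the tie-breaking rule, and the stopping condition all assumed:

```python
def refine_boundary(adj, part, rounds=10):
    """Greedy boundary refinement between two partitions (0 and 1):
    while the loads differ, move one boundary vertex from the heavier
    side to the lighter one, preferring the vertex with the most
    neighbors already across the cut (to limit cut growth).
    adj maps each vertex to its neighbor list; part maps vertex -> side."""
    part = dict(part)
    for _ in range(rounds):
        load = [sum(1 for s in part.values() if s == side) for side in (0, 1)]
        if abs(load[0] - load[1]) <= 1:
            break  # balanced enough
        heavy = 0 if load[0] > load[1] else 1
        best, best_gain = None, 0
        for v, side in part.items():
            if side != heavy:
                continue
            cross = sum(1 for u in adj[v] if part[u] != side)
            if cross > best_gain:
                best, best_gain = v, cross
        if best is None:
            break  # no boundary vertex on the heavy side
        part[best] = 1 - heavy
    return part
```

On a four-vertex path split 3/1, one move of the single boundary vertex yields a balanced 2/2 partition with the same unit cut.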

Journal ArticleDOI
TL;DR: Parallel algorithms for finding the connected components (CC) and a minimum spanning forest of an undirected graph are presented, and the PRAM algorithm is a simplification of the one appearing in [17].
Abstract: Parallel algorithms for finding the connected components (CC) and a minimum spanning forest (MSF) of an undirected graph are presented. The primary model of computation considered is the "shuffle-exchange network," in which each processor has its own local memory, no memory is shared, and communication among processors is done via a fixed-degree network. This model is very convenient for actual realization. Both algorithms have depth of O(log² n) while using n² processors, where n is the number of vertices in the graph. The algorithms are first presented for the PRAM (parallel RAM) model, which is not realizable but much more convenient for the design and presentation of algorithms; the CC and MSF algorithms are no exceptions. The CC PRAM algorithm is a simplification of the one appearing in [17]. A modification of this algorithm yields a simple and efficient MSF algorithm. Both have depth of O(log m) and use m processors, where m is the number of edges in the graph.
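The hooking-and-pointer-jumping structure underlying such parallel CC algorithms can be simulated sequentially round by round. This sketch follows the generic PRAM scheme (each round, roots hook onto smaller roots seen across an edge, then every parent chain is halved), not the specific algorithm of the paper or of [17]:

```python
def connected_components(n, edges):
    """Connected components by simulated parallel hooking and pointer
    jumping on vertices 0..n-1. Returns parent[], where each vertex
    ends up pointing at the minimum-labeled vertex of its component.
    Each while-iteration simulates one parallel round; the number of
    rounds is logarithmic in the longest chain length."""
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # hooking: attach the larger of two distinct roots to the smaller
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if parent[ru] == ru and parent[rv] == rv and ru != rv:
                hi, lo = max(ru, rv), min(ru, rv)
                parent[hi] = lo
                changed = True
        # pointer jumping: halve every parent chain
        for v in range(n):
            if parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
                changed = True
    return parent
```

Each vertex's final parent serves as its component label; adding per-edge weights and choosing the minimum-weight outgoing edge during hooking is the usual route from this scheme to a minimum spanning forest.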