
Showing papers presented at "Asia and South Pacific Design Automation Conference in 1995"


Proceedings ArticleDOI
01 Aug 1995
TL;DR: A datapath synthesis system (DPSS) for the reconfigurable datapath architecture (rDPA) that allows automatic mapping of high level descriptions onto the rDPA without manual interaction is presented.
Abstract: A datapath synthesis system (DPSS) for the reconfigurable datapath architecture (rDPA) is presented. The DPSS allows automatic mapping of high level descriptions onto the rDPA without manual interaction. The required algorithms of this synthesis system are described in detail. Optimization techniques like loop folding or loop unrolling are sketched. The rDPA is scalable to arbitrarily large arrays and reconfigurable to be adaptable to the computational problem. Fine grained parallelism is achieved by using simple reconfigurable processing elements which are called datapath units (DPUs). The rDPA can be used as a reconfigurable ALU for bus oriented systems as well as for rapid prototyping of high speed datapaths.

118 citations
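The loop unrolling optimization sketched in this abstract can be illustrated in software terms: the loop body is replicated so that several iterations' operations are exposed per step, which a reconfigurable datapath can then map onto parallel DPUs. A minimal sketch, not the DPSS algorithm itself (function names are hypothetical):

```python
def accumulate_rolled(xs):
    """Baseline: one addition per iteration."""
    acc = 0
    for x in xs:
        acc += x
    return acc

def accumulate_unrolled_by_4(xs):
    """Body replicated 4x; a datapath could map the four adds to four DPUs."""
    acc = 0
    n = len(xs)
    i = 0
    while i + 4 <= n:
        # Four independent loads per step, exposing fine-grained parallelism.
        acc += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:  # epilogue for leftover iterations
        acc += xs[i]
        i += 1
    return acc
```

Both versions compute the same sum; the unrolled form simply makes more operations per iteration visible to the mapper.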


Proceedings ArticleDOI
01 Aug 1995
TL;DR: The application of this technique for a comprehensive instruction-level power analysis of a commercial 32-bit RISC-based embedded microcontroller and the salient results of the analysis and the basic instruction-level power model are described.
Abstract: A new approach for power analysis of microprocessors has recently been proposed (Tiwari et al 1994). The idea is to look at the power consumption in a microprocessor from the point of view of the actual software executing on the processor. The basic component of this approach is a measurement based, instruction-level power analysis technique. The technique allows for the development of an instruction-level power model for the given processor, which can be used to evaluate software in terms of the power consumption, and for exploring the optimization of software for lower power. This paper describes the application of this technique for a comprehensive instruction-level power analysis of a commercial 32-bit RISC-based embedded microcontroller. The salient results of the analysis and the basic instruction-level power model are described. Interesting observations and insights based on the results are also presented. Such an instruction-level power analysis can provide cues as to what optimizations in the micro-architecture design of the processor would lead to the most effective power savings in actual software applications. Wherever the results indicate such optimizations, they have been discussed. Furthermore, ideas for low power software design, as suggested by the results, are described in this paper as well.

73 citations
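The measurement-based instruction-level power model described above sums a base cost per instruction plus a circuit-state overhead for each adjacent instruction pair. A minimal sketch with made-up cost tables (the actual model uses measured values for a specific processor):

```python
# Illustrative base costs per instruction (e.g. in nJ) -- NOT measured values.
BASE_COST = {"MOV": 1.0, "ADD": 1.2, "MUL": 3.1, "LDR": 2.0}
# Illustrative circuit-state overheads for adjacent instruction pairs.
PAIR_OVERHEAD = {("MOV", "ADD"): 0.2, ("ADD", "MUL"): 0.5,
                 ("MUL", "LDR"): 0.4}

def program_energy(trace):
    """Estimate energy of an instruction trace from the table-based model."""
    energy = sum(BASE_COST[op] for op in trace)
    for prev, cur in zip(trace, trace[1:]):
        # Overhead tables are symmetric in this sketch; fall back to 0.
        energy += PAIR_OVERHEAD.get((prev, cur),
                                    PAIR_OVERHEAD.get((cur, prev), 0.0))
    return energy
```

Evaluating alternative code sequences with such a model is what enables software optimization for lower power.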


Proceedings ArticleDOI
01 Aug 1995
TL;DR: This paper addresses problems that arise while checking the equivalence of two Boolean functions under arbitrary input permutations, showing that, for a given example, the set of problematic variables that signatures cannot uniquely identify tends to be the same regardless of the choice of signatures.
Abstract: This paper addresses problems that arise while checking the equivalence of two Boolean functions under arbitrary input permutations. The permutation problem has several applications in the synthesis and verification of combinational logic: it arises in the technology mapping stage of logic synthesis and in logic verification. A popular method to solve it is to compute a signature for each variable that helps to establish a correspondence between the variables. Several researchers have suggested a wide range of signatures that have been used for this purpose. However, for each choice of signature, there remain variables that cannot be uniquely identified. Our research has shown that, for a given example, this set of problematic variables tends to be the same, regardless of the choice of signatures. The paper investigates this problem.

59 citations
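A typical signature of the kind the abstract refers to is a permutation-invariant per-variable count, such as the number of minterms in a variable's positive cofactor. The sketch below (illustrative, not the paper's signatures) also shows how two variables can share a signature value and so remain "problematic":

```python
from itertools import product

def cofactor_signature(f, n):
    """For each of the n variables, count minterms of its positive cofactor.
    Such counts are invariant under input permutation, so matching values
    suggest (but do not prove) a variable correspondence."""
    sig = []
    for i in range(n):
        count = sum(1 for bits in product((0, 1), repeat=n)
                    if bits[i] == 1 and f(bits))
        sig.append(count)
    return sig

# f(a,b,c) = a AND (b OR c); g is f with the inputs b and c swapped.
f = lambda v: v[0] and (v[1] or v[2])
g = lambda v: v[0] and (v[2] or v[1])
```

Here variables b and c get the same signature value, so this signature alone cannot distinguish them; that residual ambiguity is exactly the phenomenon the paper investigates.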


Proceedings ArticleDOI
01 Aug 1995
TL;DR: A scheduling method is presented which is capable of allocating supplementary resources during scheduling; because it is based on genetic algorithms, it is well suited to synthesis strategies based on lower bound estimation techniques.
Abstract: In this article a scheduling method is presented which is capable of allocating supplementary resources during scheduling. This makes it very suitable for synthesis strategies based on lower bound estimation techniques. The method is based on genetic algorithms. Special coding techniques and analysis methods are used to improve the runtime and the quality of the results. The scheduler can easily be extended to cover other architectural issues and, for example, provides ways to make trade-offs between functional unit allocation and register allocation. Experiments and comparisons show high quality results and fast run times that outperform results produced by other heuristic scheduling methods.

46 citations


Proceedings ArticleDOI
01 Aug 1995
TL;DR: Results on a set of circuits from MCNC benchmark set demonstrate that the proposed power reduction algorithm can reduce about 10% more power, on the average, than a previously proposed gate sizing algorithm.
Abstract: This paper describes methods for reducing power consumption. We propose a gate sizing technique to reduce power for circuits that already satisfy their timing constraint. Replacing gates on noncritical paths with smaller templates reduces the dissipated power of a circuit. We find that not only gates on noncritical paths but also gates on critical paths can be down-sized. We propose a power reduction algorithm based on single gate resizing as well as multiple gate resizing. In addition, to identify gates to be resized, we also propose a path-oriented method for calculating slack time that takes false paths into consideration. During the slack time computation, slack constraints are set for gates in order to prevent long false paths from becoming sensitizable and thus increasing the circuit delay. Results on a set of circuits from the MCNC benchmark set demonstrate that our power reduction algorithm can reduce about 10% more power, on average, than a previously proposed gate sizing algorithm.

38 citations
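The slack-based selection underlying such gate sizing can be sketched as a standard arrival/required-time computation on a gate-level DAG; gates with enough positive slack are candidates for down-sizing. A minimal sketch that ignores false paths (which the paper explicitly handles); gate names and delays are illustrative:

```python
def compute_slack(gates, edges, t_req):
    """gates: {name: delay}; edges: list of (src, dst) wires; t_req: required
    time at the circuit outputs. Returns {name: slack}."""
    preds = {g: [] for g in gates}
    succs = {g: [] for g in gates}
    for s, d in edges:
        preds[d].append(s)
        succs[s].append(d)
    order, seen = [], set()
    def visit(g):  # topological order via DFS over predecessors
        if g in seen:
            return
        seen.add(g)
        for p in preds[g]:
            visit(p)
        order.append(g)
    for g in gates:
        visit(g)
    arrival = {}
    for g in order:               # latest input arrival plus gate delay
        arrival[g] = gates[g] + max((arrival[p] for p in preds[g]), default=0)
    required = {}
    for g in reversed(order):     # earliest requirement among successors
        required[g] = min((required[s] - gates[s] for s in succs[g]),
                          default=t_req)
    return {g: required[g] - arrival[g] for g in gates}
```

A gate whose slack exceeds the delay penalty of a smaller template can be down-sized without violating the timing constraint.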


Proceedings ArticleDOI
29 Aug 1995
TL;DR: The project formally specified, in SRI's PVS language, a Rockwell proprietary pipelined microprocessor and used the PVS theorem prover to show that the microcode correctly implements the instruction-level specification for a representative subset of instructions.
Abstract: Formal verification using interactive proof-checkers has been used successfully to verify a wide variety of moderate-sized hardware designs. The industry is beginning to look at formal verification as an alternative to simulation for obtaining higher assurance than is currently possible. However, many questions remain regarding its use in practice: Can these techniques scale up to industrial systems, where are they likely to be useful, and how should industry go about incorporating them into practice? This paper describes a project recently undertaken by SRI International and Collins Commercial Avionics, a division of Rockwell International, to explore some of these questions. The project formally specified in SRI's PVS language a Rockwell proprietary pipelined microprocessor (the AAMP5, built using almost half a million transistors) at both the instruction-set and register-transfer levels and used the PVS theorem prover to show the microcode correctly implemented the instruction-level specification for a representative subset of instructions. The key results of the project were the development of a practical methodology for microprocessor verification in industrial settings and the discovery of both actual and seeded errors.

36 citations


Proceedings ArticleDOI
29 Aug 1995
TL;DR: An HML type checker and a translator to VHDL are presented; the translator generates a synthesizable subset of VHDL and automatically infers types and interfaces, and a non-restoring integer square-root example illustrates the HML system.
Abstract: HML (Hardware ML) is an innovative hardware description language based on the functional programming language SML. HML is a higher-order language with polymorphic types. It uses advanced type checking and type inference techniques. We have implemented an HML type checker and a translator to VHDL. We generate a synthesizable subset of VHDL and automatically infer types and interfaces. This paper gives an overview of HML and discusses its type checking techniques and the translation from HML to VHDL. We present a non-restoring integer square-root example to illustrate the HML system.

32 citations


Proceedings ArticleDOI
Abhijit Ghosh1, M. Bershteyn1, R. Casley1, C. Chien1, A. Jain1, M. Lipsie1, D. Tarrodaychik1, O. Yamamoto1 
01 Aug 1995
TL;DR: A hardware-software co-simulator that can be used in the design, debugging and verification of embedded systems, and a set of techniques to speed up simulation of processors and peripherals without significant loss in timing accuracy.
Abstract: One of the interesting problems in hardware-software co-design is that of debugging embedded software in conjunction with hardware. Currently, most software designers wait until a working hardware prototype is available before debugging software. Bugs discovered in hardware during the software debugging phase require re-design and re-fabrication, thereby not only delaying the project but also increasing cost. It also puts software debugging on hold until a new hardware prototype is available. In this paper we describe a hardware-software co-simulator that can be used in the design, debugging and verification of embedded systems. This tool contains simulators for different parts of the system and a backplane which is used to integrate the simulators. This enables us to simulate hardware, software and their interaction efficiently. We also address the problem of simulation speed. Currently, the more accurate (in terms of timing) the models used, the longer it takes to simulate a system. Our main contribution is a set of techniques to speed up simulation of processors and peripherals without significant loss in timing accuracy. Finally, we describe applications used to test the co-simulator and our experience in using it.

31 citations


Proceedings ArticleDOI
01 Aug 1995
TL;DR: A spectral transform interpretation of AND-EXOR representations of switching functions and related decision diagrams in the vector space over GF(2) is given, and it is shown that EVBDDs are ATDDs in a different notation.
Abstract: In this paper we give a spectral transform interpretation of AND-EXOR representations of switching functions and related decision diagrams in the vector space over GF(2). The consideration is uniformly extended to the Fourier series-like expressions of functions in the complex vector space and the decision diagrams for integer-valued functions. It is shown that the multi-terminal decision diagrams, MTBDDs, and edge-valued decision diagrams, EVBDDs, for integer-valued functions are derived by using the same sets of basic functions already applied for the decision diagrams attached to some AND-EXOR expressions, but considered over the complex field. The algebraic transform decision diagrams, ATDDs, are considered as the integer counterparts of the functional decision diagrams, FDDs, attached to the algebraic transform in the same way as the FDDs are attached to the Reed-Muller expressions. It is shown that the EVBDDs are the ATDDs in a different notation.

28 citations
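The positive-polarity Reed-Muller (AND-EXOR) coefficients discussed here can be computed from a function's truth vector by a butterfly transform over GF(2), structurally analogous to a fast Fourier transform. A minimal sketch:

```python
def reed_muller_spectrum(truth, n):
    """Positive-polarity Reed-Muller coefficients of an n-variable function,
    computed in place by the standard GF(2) butterfly transform.
    truth: list of 2**n values in {0, 1}."""
    coeffs = list(truth)
    step = 1
    for _ in range(n):
        for i in range(0, len(coeffs), 2 * step):
            for j in range(i, i + step):
                coeffs[j + step] ^= coeffs[j]  # XOR butterfly over GF(2)
        step *= 2
    return coeffs
```

For example, 2-input XOR yields coefficients on the two single-variable terms only, while 2-input AND yields a single coefficient on the product term.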


Proceedings ArticleDOI
29 Aug 1995
TL;DR: It is argued that a heterogeneous logic supporting hardware diagrams and sentential logic provides a natural framework for reasoning and for the formal integration of design and verification environments.
Abstract: Formal methods and verification tools are difficult for designers to use. Research has been concentrated on handling large proofs; meanwhile, insufficient attention has been paid to the reasoning process. We argue that a heterogeneous logic supporting hardware diagrams and sentential logic provides a natural framework for reasoning and for the formal integration of design and verification environments. We present such a logic and demonstrate its flexibility on fragments of a traffic light controller design and verification problem.

28 citations


Proceedings ArticleDOI
29 Aug 1995
TL;DR: The design of a real time image processing circuit based on an optimized Canny-Deriche filter for ramp edge detection is presented; the filter is implemented in a recursive form, and the circuit is able to process a pixel in less than 30 ns.
Abstract: We present the design of a real time image processing circuit based on an optimized Canny-Deriche filter for ramp edge detection. This filter is implemented in a recursive form. A retiming method is used to achieve very high speed filtering. The edge calculation function has been implemented using a 1 µm CMOS process (area 29 mm²). This ASIC is able to process a pixel in less than 30 ns and handles image sizes from 64×64 to 1024×1024 pixels.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: Experimental results on real FIR filter examples show up to 88% reduction in coefficient memory data bus power, and simple architectural extensions to overcome limitations of the existing DSP architectures are presented.
Abstract: We propose techniques for low power realization of FIR filters on programmable DSPs. We first analyse the FIR implementation to arrive at useful measures to reduce power and present techniques that exploit these measures. We then identify limitations of the existing DSP architectures in implementing these techniques and propose simple architectural extensions to overcome them. Finally we present experimental results on real FIR filter examples that show up to 88% reduction in coefficient memory data bus power and up to 49% reduction in coefficient memory address bus power.
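One bus-power measure such techniques can exploit is switching activity on the coefficient memory data bus: fetching coefficient words in an order that minimizes the Hamming distance between consecutive words toggles fewer bus lines. A greedy illustration of this idea (not the authors' algorithm; coefficient values are hypothetical):

```python
def hamming(a, b):
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")

def bus_toggles(seq):
    """Total bit toggles on the data bus for an access sequence."""
    return sum(hamming(x, y) for x, y in zip(seq, seq[1:]))

def greedy_reorder(coeffs):
    """Order coefficient fetches so each next word is Hamming-closest
    to the previous one (nearest-neighbor heuristic)."""
    remaining = list(coeffs)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda c: hamming(order[-1], c))
        remaining.remove(nxt)
        order.append(nxt)
    return order
```

On a small example the reordered sequence can cut bus toggles by half or more; actual savings depend on the coefficient values.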

Proceedings ArticleDOI
01 Aug 1995
TL;DR: Several new sequential transformations are proposed that can be efficiently identified and used for optimizing large designs; efficient sequential ATPG techniques are used to identify more sequential redundancies for either addition or removal.
Abstract: Logic optimization methods using automatic test pattern generation (ATPG) techniques such as redundancy addition and removal have recently been proposed. We generalize this approach for synchronous sequential circuits. We propose several new sequential transformations which can be efficiently identified and used for optimizing large designs. One of the new transformations involves adding redundancies across time frames in a sequential circuit. We also suggest a new transformation which involves adding redundancies to block initialization of other wires. We use efficient sequential ATPG techniques to identify more sequential redundancies for either addition or removal. We have implemented a sequential logic optimization system based upon this approach. We show experimental results to demonstrate that this approach is both CPU time efficient and memory efficient and can optimize large sequential designs significantly.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: Based on the input signal probabilities and transition densities, a set of simple transistor reordering rules for both basic and complex CMOS gates to minimize the transition counts at the internal nodes are proposed.
Abstract: The goal of transistor reordering for a logic gate is to reduce the propagation delay as well as the charging and discharging of internal capacitances to achieve low power consumption. In this paper, based on the input signal probabilities and transition densities, we propose a set of simple transistor reordering rules for both basic and complex CMOS gates to minimize the transition counts at the internal nodes. The most attractive feature of this approach is that not only is the power consumption reduced efficiently, but the other performance measures are not degraded. Experimental results show that this technique typically reduces the power by about 10% on average, and in some cases the improvement reaches 35%.
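The effect of transistor ordering on internal-node switching can be shown with a toy model of a series NMOS stack: an internal node is pulled low only when every transistor between it and ground conducts, so in this model placing the least active input nearest ground keeps the internal nodes quiet. The enumeration below is an illustrative model, not the paper's rule set:

```python
from itertools import permutations

def internal_transitions(order, trace):
    """order: tuple of input indices, output end first, ground end last.
    Internal node k (between transistors k and k+1) is modeled as low iff
    every transistor below it conducts; count its state changes over the
    input trace."""
    n = len(order)
    total = 0
    prev = None
    for vec in trace:
        states = tuple(all(vec[order[j]] for j in range(k + 1, n))
                       for k in range(n - 1))  # True = node pulled low
        if prev is not None:
            total += sum(a != b for a, b in zip(prev, states))
        prev = states
    return total

def best_order(n_inputs, trace):
    """Exhaustively pick the ordering with fewest internal transitions."""
    return min(permutations(range(n_inputs)),
               key=lambda order: internal_transitions(order, trace))
```

With a toggling input 0 and a quiet input 1, the model prefers the quiet input at the ground end, matching the intuition behind probability-driven reordering.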

Proceedings ArticleDOI
01 Aug 1995
TL;DR: An algorithm (EVENPGA) is proposed that generates a monotonic topological routing with no detours and an optimally uniform distribution; once the topological routing is done, physical layout can be obtained using Surf, a rubberband-based routing system.
Abstract: PGA routing has the freedom of routing any pin to any pad. We propose an algorithm (EVENPGA) that generates a monotonic topological routing. The routing has no detours and its distribution is optimally uniform. The wire length is also the shortest possible under the taxicab wiring metric. If the topological routing is routable, the maximum density of critical cuts along a ring is the minimum possible. Once the topological routing is done, physical layout can easily be obtained using Surf, a rubberband-based routing system.

Proceedings ArticleDOI
29 Aug 1995
TL;DR: This work has developed MDG-based techniques for combinational verification, reachability analysis, verification of behavioral equivalence, and verification of a microprocessor against its instruction set architecture.
Abstract: Traditional OBDD-based methods of automated verification suffer from the drawback that they require a binary representation of the circuit. Multiway Decision Graphs (MDGs) combine the advantages of OBDD techniques with those of abstract types. RTL designs can be compactly described by MDGs using abstract data values and uninterpreted function symbols. We have developed MDG-based techniques for combinational verification, reachability analysis, verification of behavioral equivalence, and verification of a microprocessor against its instruction set architecture. We report on the results of several verification experiments using our MDG package.

Proceedings ArticleDOI
Wayne Wolf1
29 Aug 1995
TL;DR: A new co-synthesis algorithm which synthesizes a distributed processing engine of arbitrary topology and the application software it executes from an object-oriented specification to satisfy performance constraints and minimize costs is described.
Abstract: This paper describes a new co-synthesis algorithm which synthesizes a distributed processing engine of arbitrary topology and the application software it executes from an object-oriented specification. Process partitioning is an especially important optimization for such systems because the specification will not in general take into account the process structure required for efficient execution on the distributed engine. Our algorithm takes advantage of the structure of the object-oriented specification to simultaneously partition, allocate, schedule, and map the required function to satisfy performance constraints and minimize costs. Experimental results show that our algorithm provides good results in reasonable CPU times.

Proceedings ArticleDOI
29 Aug 1995
TL;DR: This paper presents a model and examines four pipeline verifications to see how they compare, arguing that the correctness statement should show that the parallel machine represented by the pipeline behaves in the same manner as the sequential machine represented by the instruction set semantics.
Abstract: Recently there has been much research in verifying pipelined microprocessors. Even so, there has been little consensus on what form the correctness statement should take. Put another way, what should we be verifying about pipelined microprocessors? We believe that the correctness statement should show that the parallel machine represented by the pipeline behaves in the same manner as the sequential machine represented by the instruction set semantics. In this paper, we present such a model and examine four pipeline verifications to see how they compare.

Proceedings ArticleDOI
29 Aug 1995
TL;DR: A novel method of specification, simulation and partitioning on the system level using a common and convenient language (C++) with special base classes for analyzing and simulating the whole system during an early design phase.
Abstract: The paper introduces a novel method of specification, simulation and partitioning on the system level using a common and convenient language (C++). Special base classes provide explicit concurrency and additional possibilities for analyzing and simulating the whole system during an early design phase. The hardware/software partitioning algorithm uses the results of the analysis and simulation in order to partition the specification into hardware and software.

Proceedings ArticleDOI
29 Aug 1995
TL;DR: An algorithm combining two approaches to PN verification, PN unfolding and BDD-based traversal, is proposed; the results of unfolding construction are used to obtain a close-to-optimal ordering of BDD variables.
Abstract: We propose an algorithm combining two approaches to PN verification: PN unfolding and BDD-based traversal. We introduce a new application of the PN unfolding method. The results of unfolding construction are used for obtaining a close-to-optimal ordering of BDD variables. The effect of this combination is demonstrated on a set of benchmarks. The overall framework has been used for the verification of circuits in an asynchronous microprocessor.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: An efficient ROBDD based implementation of the common decomposition functions problem (CDF) is presented, which decomposes each output using a minimal number r_k of single output Boolean decomposition functions; results applying the method to FPGA synthesis are promising.
Abstract: One of the crucial problems multi level logic synthesis techniques for multi output Boolean functions f = (f_1, ..., f_m): {0,1}^n → {0,1}^m have to deal with is finding sublogic which can be shared by different outputs, i.e., finding Boolean functions α = (α_1, ..., α_h): {0,1}^p → {0,1}^h which can be used as common sublogic of good realizations of f_1, ..., f_m. We present an efficient ROBDD based implementation of this common decomposition functions problem (CDF). Formally, CDF is defined as follows: given m Boolean functions f_1, ..., f_m: {0,1}^n → {0,1} and two natural numbers p and h, find h Boolean functions α_1, ..., α_h: {0,1}^p → {0,1} such that for all 1 ≤ k ≤ m there is a decomposition of f_k of the form f_k(x_1, ..., x_n) = g^(k)(α_1(x_1, ..., x_p), ..., α_h(x_1, ..., x_p), α_{h+1}^(k)(x_1, ..., x_p), ..., α_{r_k}^(k)(x_1, ..., x_p), x_{p+1}, ..., x_n) using a minimal number r_k of single output Boolean decomposition functions. Experimental results applying the method to FPGA synthesis are promising.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: This paper presents a model for Genetic Algorithms (GA) to learn heuristics starting from a given set of basic operations and demonstrates the efficiency of this approach.
Abstract: In many applications of Computer Aided Design (CAD) of Integrated Circuits (ICs) the problems that have to be solved are NP-hard. Thus, exact algorithms are only applicable to small problem instances, and many authors have presented heuristics to obtain solutions (non-optimal in general) for larger instances of these hard problems. In this paper we present a model for Genetic Algorithms (GA) to learn heuristics starting from a given set of basic operations. The difference to other previous applications of GAs in CAD of ICs is that the GA does not solve the problem directly. Rather, it develops strategies for solving the problem. To demonstrate the efficiency of our approach, experimental results for a specific problem are presented.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: This paper exploits the use of signal flow and logic dependency in standard cell placement by using the maximum fanout-free cone (MFFC) decomposition technique, and develops a containment tree based algorithm for splitting large MFFCs into smaller ones to get clusters with restricted sizes.
Abstract: Most existing placement algorithms consider only connectivity information during the placement process, and ignore other information available from the higher levels of design process. In this paper, we exploit the use of signal flow and logic dependency in standard cell placement by using the maximum fanout-free cone (MFFC) decomposition technique. We developed a containment tree based algorithm for splitting large MFFCs into smaller ones to get clusters with restricted sizes. We also developed a placement algorithm, named MFFC-TW, which first clusters the circuit based on MFFC decomposition and then feeds the clustered circuit to the Timberwolf 6.0 placement package. Very promising experimental results were obtained.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: This paper presents GRMIN, a heuristic simplification algorithm for GRMs of multiple-output functions, which uses eight rules and outperforms existing algorithms.
Abstract: A generalized Reed-Muller expression (GRM) is a type of AND-EXOR expressions. In a GRM, each variable may appear both complemented and uncomplemented. Networks realized using GRMs are easily tested. This paper presents GRMIN, a heuristic simplification algorithm for GRMs of multiple-output functions. GRMIN uses eight rules. As the primary objective, it reduces the number of products, and as the secondary objective, it reduces the number of literals. Experimental results show that, in most cases, GRMs require fewer products than conventional sum-of-products expressions (SOPs). GRMIN outperforms existing algorithms.

Proceedings ArticleDOI
Tetsushi Koide1, M. Ono1, S. Wakabayashi1, Y. Nishimaru1, N. Yoshida 
01 Aug 1995
TL;DR: From the experimental results, the proposed method is much better than RITUAL in terms of the maximal violation ratio, the total wire length, and the cut size, and is more effective in terms of its interconnection delay model and extensibility.
Abstract: In this paper, we present a new performance driven placement method based on a path delay constraint approach for large standard cell layout. The proposed method consists of three phases and uses the Elmore delay model to model interconnection delay precisely in each phase. In the first phase, initial placement is performed by an efficient performance driven mincut partitioning method. Next, an iterative improvement method by nonlinear programming improves the layout. The improvement is formulated as the problem of minimizing the total wire length subject to critical path delays. Finally, row assignment considering the timing constraint is performed. From the experimental results, the proposed method is much better than RITUAL in terms of the maximal violation ratio, the total wire length, and the cut size, and is more effective in terms of its interconnection delay model and extensibility.
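The Elmore delay model used in each phase computes the delay to a sink as a sum, over all nodes of the RC interconnect tree, of the node capacitance times the resistance of the root-to-node path shared with the root-to-sink path. A minimal sketch on a hypothetical RC tree (values are illustrative):

```python
def elmore_delay(parent, R, C, sink):
    """parent: {node: parent, None for the root}; R[n]: resistance of the
    edge into node n; C[n]: capacitance at node n.
    Returns the Elmore delay from the root to the sink."""
    def path_edges(n):            # edges on the root-to-node path
        edges = set()
        while parent[n] is not None:
            edges.add(n)          # identify each edge by its lower node
            n = parent[n]
        return edges
    sink_path = path_edges(sink)
    delay = 0.0
    for k in C:
        shared = sink_path & path_edges(k)  # edges common to both paths
        delay += C[k] * sum(R[e] for e in shared)
    return delay
```

For a simple chain r → a → b with R_a = 2, R_b = 3 and unit capacitances at a and b, the delay to b is 1·2 + 1·(2+3) = 7.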

Proceedings ArticleDOI
29 Aug 1995
TL;DR: A CAD system for the original FPGA "PROTEUS", which has several features suitable for the efficient realization of practical digital transport processing systems and the algorithms that realize them are introduced.
Abstract: We introduce a CAD system for the original FPGA "PROTEUS", which has several features suitable for the efficient realization of practical digital transport processing systems. These features are considered in the design of the CAD system. Our CAD system supports both automatic and manual design environments. The automatic design environment offers complete top down design from high level hardware description to downloading the programming data into the FPGA. In the manual design environment, an interactive chip editor is provided that enables high performance circuits to be constructed skillfully. The paper introduces our design strategy and the algorithms that realize it.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: A new algorithm is shown that generates optimal fixed polarity Reed-Muller expansions based on user specified optimization criteria and operates on an Algebraic Ternary Decision Tree together with lookup tables of flexible sizes.
Abstract: A new algorithm that generates optimal fixed polarity Reed-Muller expansions based on user specified optimization criteria is shown. The algorithm accepts a reduced representation of Boolean functions in the form of an array of cubes and operates on an Algebraic Ternary Decision Tree together with lookup tables of flexible sizes. Allocation of don't care minterms is performed in a non-exhaustive way by a heuristic approach based on the properties of Reed-Muller expansions.

Proceedings ArticleDOI
29 Aug 1995
TL;DR: A small set of properties is given that captures a general notion of refinement of control/data-flow graphs used in an industrial synthesis framework; the properties are independent of the underlying behaviour model.
Abstract: This paper presents a formal approach to address the correctness of transformations in high-level synthesis. The novelty of the work is that a small set of properties has been given that captures a general notion of refinement of control/data-flow graphs used in an industrial synthesis framework, and the properties are independent of the underlying behaviour model. We have mechanized the specification and verification of several optimization and refinement transformations used in industrial hardware design. This work has enabled us to find and rectify errors in the transformations. Further, the work has led to a generalization of transformations typically used in high-level synthesis.

Proceedings ArticleDOI
01 Aug 1995
TL;DR: The environment provides interface transparency, simulation acceleration, smooth transition to cosynthesis, and integrated user interface and internal representation, which shows that the environment can be a useful heterogeneous system specification/verification environment for rapid prototyping.
Abstract: In this paper, we present a hardware-software cosimulation environment for heterogeneous systems. To be an efficient system verification environment for the rapid prototyping of heterogeneous systems, the environment provides interface transparency, simulation acceleration, smooth transition to cosynthesis, and integrated user interface and internal representation. As an experimental example, a heterogeneous system is cosimulated and prototyped successfully, which shows that our environment can be a useful heterogeneous system specification/verification environment for rapid prototyping.

Proceedings ArticleDOI
29 Aug 1995
TL;DR: This paper describes how to generate VHDL from timing diagrams in order to get a hardware implementation or simply to get V HDL code for stimuli to be used in a test bench by giving timing diagrams a formal semantics in terms of T-LOTOS.
Abstract: Timing diagrams with data and timing annotations are introduced as a language for specifying interface circuits. In this paper we describe how to generate VHDL from timing diagrams in order to get a hardware implementation or simply to get VHDL code for stimuli to be used in a test bench. By giving timing diagrams a formal semantics in terms of T-LOTOS, we can apply optimizing correctness-preserving transformation steps. In order to produce good VHDL code on the way to a hardware implementation it is of great importance to introduce structures into the final description that are not automatically derivable from a given specification. The designer is rather asked to assist in introducing a structure by applying a bottom-up interactive synthesis procedure.