
Showing papers by "Heinrich Meyr published in 1996"


Proceedings ArticleDOI
30 Oct 1996
TL;DR: The development of a new language was necessary to bridge the gap between the coarse ISA models used in compilers and instruction set simulators on the one hand, and the detailed models used for hardware design on the other.
Abstract: A machine description language is presented. The language, LISA, and its generic machine model are able to produce bit- and cycle/phase-accurate processor models covering the specific needs of HW/SW codesign and cosimulation environments. The development of a new language was necessary to bridge the gap between the coarse ISA models used in compilers and instruction set simulators on the one hand, and the detailed models used for hardware design on the other. The main part of the paper is devoted to behavioral pipeline modeling. The pipeline controller of the generic machine model is represented as an ASAP (as soon as possible) sequencer parameterized by precedence and resource constraints of the operations of each instruction. The standard pipeline description based on reservation tables and Gantt charts was extended by additional operation descriptors which enable the detection of data and control hazards and permit modeling of pipeline flushes. Using the newly introduced L-charts, we reduced the parameterization of the pipeline controller to a minimum while still covering the typical pipeline controls found in state-of-the-art signal processors. As an example, the application of the LISA model to the TI TMS320C54x signal processor is presented.
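The reservation-table mechanism the abstract builds on can be made concrete with a small sketch. This is an illustrative structural-hazard check over reservation tables, not LISA's actual model; the pipeline stages and tables below are invented for the example.

```python
# Illustrative sketch (not LISA itself): detect a structural hazard by
# overlaying the reservation tables of two instructions issued some
# cycles apart. Stage names and tables are hypothetical.

def conflicts(table_a, table_b, issue_gap):
    """Return True if instruction B, issued `issue_gap` cycles after A,
    needs a resource in the same cycle that A still occupies it."""
    for cycle_a, resources_a in enumerate(table_a):
        cycle_b = cycle_a - issue_gap          # B's local cycle at that time
        if 0 <= cycle_b < len(table_b):
            if resources_a & table_b[cycle_b]:  # shared resource, same cycle
                return True
    return False

# Reservation tables: one set of busy resources per cycle.
mac = [{"FETCH"}, {"DECODE"}, {"MUL"}, {"MUL"}, {"ACC"}]  # 2-cycle multiplier
add = [{"FETCH"}, {"DECODE"}, {"ALU"}]
mac2 = list(mac)

print(conflicts(mac, add, 1))    # no shared stage overlaps -> False
print(conflicts(mac, mac2, 1))   # both need MUL in the same cycle -> True
```

A real pipeline model would add the operation descriptors the paper mentions for data/control hazards and flushes on top of this purely structural check.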

151 citations


Proceedings ArticleDOI
01 Jun 1996
TL;DR: In this paper, the sources of the speedup and the limitations of the technique are analyzed and the realization of the simulation compiler is presented.
Abstract: This paper presents a technique for simulating processors and attached hardware using the principle of compiled simulation. Unlike existing in-house and off-the-shelf hardware/software co-simulators, which use interpretive processor simulation, the proposed technique performs instruction decoding and simulation scheduling at compile time. The technique offers up to three orders of magnitude faster simulation. The high speed allows the user to explore algorithms and hardware/software trade-offs before any hardware implementation. In this paper, the sources of the speedup and the limitations of the technique are analyzed, and the realization of the simulation compiler is presented.
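The core idea — moving instruction decoding out of the simulation loop and doing it once, before the run — can be sketched in a few lines. This toy simulator is not the authors' tool; the ISA and encoding are invented for illustration.

```python
# Toy illustration of the compiled-simulation principle: decode each
# instruction once into a host-level handler, then the run loop only
# dispatches pre-built handlers (no per-cycle decoding).

def decode(word):
    """Decode one 'instruction' into a host-level handler (closure)."""
    op, dst, src = word  # toy encoding: (opcode, dest reg, source/immediate)
    if op == "li":
        return lambda regs: regs.__setitem__(dst, src)
    if op == "add":
        return lambda regs: regs.__setitem__(dst, regs[dst] + regs[src])
    raise ValueError(f"unknown opcode {op!r}")

program = [("li", 0, 5), ("li", 1, 7), ("add", 0, 1)]

# "Compile time": every instruction is decoded exactly once.
compiled = [decode(w) for w in program]

# "Run time": no decoding, just dispatch.
regs = [0] * 4
for handler in compiled:
    handler(regs)

print(regs[0])  # 12
```

An interpretive simulator would re-run `decode` on every executed instruction; hoisting it out of the loop is where the large speedup comes from.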

84 citations


Journal ArticleDOI
TL;DR: It is proved that, because no additional operations are required, DCORDIC compares favorably with the previously known redundant methods in terms of latency and computational complexity.
Abstract: The CORDIC algorithm is a well-known iterative method for the efficient computation of vector rotations, and trigonometric and hyperbolic functions. Basically, CORDIC performs a vector rotation which is not a perfect rotation, since the vector is also scaled by a constant factor. This scaling has to be compensated for following the CORDIC iteration. Since CORDIC implementations using conventional number systems are relatively slow, current research has focused on solutions employing redundant number systems which make a much faster implementation possible. The problem with these methods is that either the scale factor becomes variable, making additional operations necessary to compensate for the scaling, or additional iterations are necessary compared to the original algorithm. In contrast we developed transformations of the usual CORDIC algorithm which result in a constant scale factor redundant implementation without additional operations. The resulting "Differential CORDIC Algorithm" (DCORDIC) makes use of on-line (most significant digit first redundant) computation. We derive parallel architectures for the radix-2 redundant number systems and present some implementation results based on logic synthesis of VHDL descriptions produced by a DCORDIC VHDL generator. We finally prove that, due to the lack of additional operations, DCORDIC compares favorably with the previously known redundant methods in terms of latency and computational complexity.
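To make the scale factor discussed above concrete, here is the textbook conventional CORDIC rotation (not the DCORDIC variant): after the iterations the result is scaled by the constant K, which must be compensated separately.

```python
# Conventional CORDIC in rotation mode. Each iteration rotates by
# +/- atan(2^-i) using only shifts and adds; the side effect is a
# constant scaling by K = prod(sqrt(1 + 2^-2i)).
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians; result is scaled by K."""
    for i in range(iterations):
        sigma = 1 if angle >= 0 else -1
        x, y = x - sigma * y * 2**-i, y + sigma * x * 2**-i
        angle -= sigma * math.atan(2**-i)
    return x, y

# The constant scale factor that must be compensated:
K = math.prod(math.sqrt(1 + 2**(-2 * i)) for i in range(32))

x, y = cordic_rotate(1.0, 0.0, math.pi / 3)
print(x / K, y / K)  # ~ (cos 60deg, sin 60deg) = (0.5, 0.866...)
```

The redundant-arithmetic methods the abstract surveys speed up the add/compare inside this loop; the paper's contribution is keeping K constant while doing so, without extra operations or iterations.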

84 citations


Book ChapterDOI
01 Jan 1996
TL;DR: The advent of 0.5μ processing that allows for the integration of 5 million transistors on a single integrated circuit has brought forth new challenges and opportunities in embedded-system design.
Abstract: The advent of 0.5μ processing that allows for the integration of 5 million transistors on a single integrated circuit has brought forth new challenges and opportunities in embedded-system design. This high level of integration makes it possible and desirable to integrate a processor core, a program ROM, and an ASIC together on a single IC. To justify the design costs of such an IC, these embedded-system designs must be sold in large volumes and, as a result, they are very cost-sensitive. The cost of an IC is most closely linked to its size, which is derived from the final circuit area. It is not unusual for the ROM that stores the program code to be the largest contributor to the area of such ICs. Thus the incremental value of using logic optimization to reduce the size of the ASIC is smaller because the ASIC takes up a relatively smaller percentage of the final circuit area. On the other hand, the potential for cost reduction through diminishing the size of the program ROM is great. There are also often strong real-time performance requirements on the final code; hence, there is a necessity for producing high-performance code as well.

58 citations


Journal ArticleDOI
TL;DR: The aim of this paper is to describe the system design and VLSI implementation of a complex system of fabricated ASICs for high-speed Viterbi decoding using the "minimized method" (MM) parallelized VA.
Abstract: At present, the Viterbi algorithm (VA) is widely used in communication systems for decoding and equalization. The achievable speed of conventional Viterbi decoders (VDs) is limited by the inherent nonlinear add-compare-select (ACS) recursion. The aim of this paper is to describe the system design and VLSI implementation of a complex system of fabricated ASICs for high-speed Viterbi decoding using the "minimized method" (MM) parallelized VA. We particularly emphasize the interaction between system design, architecture, and VLSI implementation, as well as system partitioning issues and the resulting requirements for the system design flow. Our design objectives were 1) to achieve the same decoding performance as a conventional VD using the parallelized algorithm, 2) to achieve a speed of more than 1 Gb/s, and 3) to realize a system for this task using a single cascadable ASIC. With a minimum system configuration of four identical ASICs produced in 1.0 μm CMOS technology, the design objective of a decoding speed of 1.2 Gb/s is achieved. Compared to previous implementations of Viterbi decoders, this increases the speed by an order of magnitude.
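The ACS recursion that forms the speed bottleneck can be sketched for a single trellis step. The 4-state trellis, metrics, and predecessor table below are invented for illustration and are not taken from the paper's design.

```python
# One add-compare-select (ACS) step of the Viterbi algorithm on a toy
# 4-state trellis. The recursion is nonlinear (min over sums), which is
# why it cannot be pipelined naively -- the motivation for parallelized
# variants such as the paper's "minimized method".

def acs_step(path_metrics, branch_metrics, predecessors):
    """For each state: add branch metrics to predecessor path metrics,
    compare the candidates, and select the minimum (the survivor)."""
    new_metrics, decisions = [], []
    for state, preds in enumerate(predecessors):
        candidates = [path_metrics[p] + branch_metrics[p][state] for p in preds]
        best = min(range(len(candidates)), key=candidates.__getitem__)
        new_metrics.append(candidates[best])
        decisions.append(preds[best])  # survivor, kept for traceback
    return new_metrics, decisions

# Toy data: predecessors of state s in a 4-state shift-register trellis,
# and a made-up branch-metric table bm[pred][state].
predecessors = [[0, 2], [0, 2], [1, 3], [1, 3]]
pm = [0, 1, 2, 3]
bm = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]

new_metrics, decisions = acs_step(pm, bm, predecessors)
print(new_metrics, decisions)  # [0, 1, 4, 3] [0, 0, 1, 1]
```

Because each new metric depends on the previous step's metrics through a min(), the recursion is strictly sequential per step; breaking that dependency is what algorithm-level parallelization has to address.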

43 citations


Proceedings ArticleDOI
18 Nov 1996
TL;DR: Two digital receiver algorithms for processing an extended range of variable sample rates are proposed. One is based on filtering the received samples prior to timing synchronization, whereas the second increases the sample rate in the timing recovery loop and matched filter.
Abstract: The evolving digital television broadcasting standard does not standardize the data rate of the transmitted data but instead leaves it completely unspecified. We propose two digital receiver algorithms for processing an extended range of variable sample rates. The algorithms are compared in terms of their complexity and performance. One algorithm is based on filtering the received samples prior to timing synchronization, whereas the second increases the sample rate in the timing recovery loop and matched filter. Both algorithms can be implemented with negligible loss in a DVB receiver.
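The resampling idea behind both approaches — producing output samples at a rate unrelated to the input rate — can be illustrated with the simplest possible fractional-delay interpolator. A real DVB receiver would use a much better interpolation filter; this linear version only shows the mechanism.

```python
# Minimal fractional-delay resampler: step through the input at a
# non-integer stride and linearly interpolate between adjacent samples.
# This is an illustration of the resampling principle, not the paper's
# receiver structure.

def resample_linear(samples, ratio):
    """Resample by `ratio` (output rate / input rate) using linear
    interpolation between adjacent input samples."""
    out, t = [], 0.0
    step = 1.0 / ratio
    while t <= len(samples) - 1:
        i = int(t)
        mu = t - i                                   # fractional interval
        right = samples[min(i + 1, len(samples) - 1)]
        out.append((1 - mu) * samples[i] + mu * right)
        t += step
    return out

print(resample_linear([0.0, 1.0, 2.0, 3.0], 2.0))
# -> [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

In a timing-recovery loop, the fractional interval `mu` would be driven by the timing-error detector rather than by a fixed ratio.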

28 citations


Proceedings ArticleDOI
06 Nov 1996
TL;DR: Three main components of the exploration environment are covered: the benchmarking methodology (DSPstone), fast processor simulation (SuperSim), and the machine description language (LISA). Together they allow exploration of a much larger design space than was possible with standard processor simulators.
Abstract: In this paper the problem of processor/compiler codesign for digital signal processing and embedded systems is discussed. The main principle we follow is a top-down approach characterized by extensive simulation and quantitative performance evaluation of both processor and compiler. Although well established in the design of state-of-the-art general-purpose processors and compilers, this approach is rarely followed by leading producers of signal and embedded processors. As a consequence, the matching between processor and compiler is poor. We focus on three main components of our exploration environment: the benchmarking methodology (DSPstone), fast processor simulation (SuperSim), and the machine description language (LISA). Most of the paper is devoted to the technique of compiled processor simulation. The speedup obtained allows exploration of a much larger design space than was possible with standard processor simulators.

22 citations


Proceedings ArticleDOI
30 Oct 1996
TL;DR: The automated generation of components for high-throughput, data-flow-dominated VLSI systems in digital communications is described by means of a hierarchically organized library; the design environment ComBox enhances reusability and enables rapid implementation of complex systems starting from a system-level description.
Abstract: We describe the automated generation of components for high-throughput, data-flow-dominated VLSI systems in digital communications. By means of a hierarchically organized library, both behavioural models with high simulation efficiency and corresponding hardware generators that produce sophisticated VHDL descriptions are made easily accessible to the system designer. The structured approach allows the evaluation of trade-offs between alternatives at each design step and guarantees a fast and reliable design flow towards hardware. The design environment ComBox enhances reusability and enables rapid implementation of complex systems starting from a system-level description.

7 citations


Book ChapterDOI
01 Jan 1996
TL;DR: It is shown that the first demodulation algorithm is superior both in performance and computational complexity and exhibits the best robustness properties in the case of signal impairments.
Abstract: For MSK, three demodulators are compared. The first demodulation algorithm, partially coherent demodulation, is based on a classical matched filter approach combined with feedforward phase synchronization; the second algorithm, block demodulation, is based on minimizing a distance measure between the trial symbol vector and the observed differential phase vector. The third algorithm uses the same distance measure, but the minimization is carried out with the Viterbi algorithm. We provide a derivation of the second algorithm. It is shown that the first approach is superior both in performance and computational complexity. The first algorithm also exhibits the best robustness properties in the case of signal impairments.

6 citations


Book ChapterDOI
01 Jan 1996
TL;DR: This chapter addresses the process of implementing complex functions by an appropriate combination of application specific hardware and software modules in telecommunication product design.
Abstract: In the most general terms, telecommunication product design can be defined as the process of implementing complex functions by an appropriate combination of application-specific hardware and software modules. In the future, one of the most important assets of a successful company will be the mastering of this product development process. In this chapter we address this process.

Proceedings ArticleDOI
30 Oct 1996
TL;DR: This work analyzes and compares silicon real-estate and throughput of word-parallel arithmetic circuits (add and shift type arithmetic) based on various redundant number representations and compares these results with the automatically optimized two's complement implementations.
Abstract: All the commercially available logic-synthesis tools currently use only (non-redundant) binary and two's complement number representations for the results of arithmetic operators. We analyze and compare the silicon real estate and throughput of word-parallel arithmetic circuits (add-and-shift-type arithmetic) based on various redundant number representations, and compare these results with automatically optimized two's complement implementations. The literature on redundant number representations typically recommends radix-4 arithmetic for a full-custom or traditional semi-custom design style. We show that the radix-4 implementation is often not optimal for a logic-synthesis-based semi-custom design style; instead, a high-radix or a mixed-radix implementation (which we derive) should be considered.
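Carry-save form is the simplest of the redundant representations such comparisons consider: keeping a result as two words removes carry propagation from every addition except the final conversion. A minimal sketch of the idea (in software, standing in for the hardware adder array):

```python
# Carry-save addition: a full-adder array compresses three operands into
# two (sum word, carry word) with no carry propagation. Only the final
# conversion back to non-redundant form needs a carry-propagate add.

def csa(a, b, c):
    """3:2 compressor over whole words: three inputs -> (sum, carry)."""
    s = a ^ b ^ c                              # bitwise sum, no carries
    cy = ((a & b) | (a & c) | (b & c)) << 1    # carries, shifted into place
    return s, cy

# Accumulate several operands without any carry-propagate additions:
s, c = 0, 0
for operand in [13, 7, 22, 5]:
    s, c = csa(s, c, operand)

# A single carry-propagate add only at the very end:
print(s + c)  # 47
```

In hardware, each `csa` level has constant delay regardless of word length, which is exactly the latency advantage redundant representations trade against the extra wires and the final converter.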

Book ChapterDOI
01 Jan 1996
TL;DR: The process of designing an ASIC implementation of a digital receiver is carried out on different levels of abstraction and often involves the error-prone transition between different description styles which imposes obstacles on the joint optimization of algorithm and architecture.
Abstract: The process of designing an ASIC implementation of a digital receiver is carried out on different levels of abstraction. This often involves the error-prone transition between different description styles which imposes obstacles on the joint optimization of algorithm and architecture.

Journal ArticleDOI
TL;DR: The joint design process leading to an ASIC chipset accelerating the execution of rule-based systems is described, and the interaction between the algorithm used for software implementation and the parallel algorithm suited for hardware implementation is examined.
Abstract: The move towards higher levels of abstraction in hardware design is beginning to blur the difference between hardware and software design. Nevertheless, the attractiveness of a software implementation is still defined by the much smaller abstraction gap between specification and implementation. Hardware design, on the other hand, creates the possibility of exploiting parallelism at a very fine level of granularity and thereby achieving tremendous performance gains with a moderate expenditure of hardware. This paper describes the joint design process leading to an ASIC chipset accelerating the execution of rule-based systems. The interaction between the algorithm used for software implementation and the parallel algorithm suited for hardware implementation is examined. An area-efficient implementation of the programmable hardware was enabled by an application-specific compiler backend. The heuristics applied by the optimising "code" generator are discussed quantitatively.