Flit Synchronous Aelite Network on Chip

01 Jan 2008

TL;DR: The Aelite NoC, which offers guaranteed services, addresses the complexities of System-on-Chip design with real-time requirements and implements flit-synchronous communication using mesochronous and asynchronous links.

Abstract: Deep-submicron process technology and application convergence increase the design challenges of a System-on-Chip (SoC). Traditional bus-based on-chip communication is not scalable and fails to deliver the performance required by complex SoCs. The Network on Chip (NoC) has emerged as a solution to these complexities, enabling efficient, high-performance, scalable SoC design. The Aethereal NoC provides latency and throughput bounds through a pipelined time-division multiplexed (TDM) circuit-switching architecture. A global synchronous clock defines the timing for TDM, which becomes a drawback as process geometries shrink and clock frequencies increase. This thesis work focuses on the Aelite NoC architecture. The Aelite NoC, which offers guaranteed services, addresses the complexities of System-on-Chip design with real-time requirements. It implements flit-synchronous communication using mesochronous and asynchronous links.

References

Proceedings Article

Latency Minimization for Synchronous Data Flow Graphs

Amir Hossein Ghamarian¹, Sander Stuijk¹, Twan Basten¹, Marc Geilen¹, Bart Theelen¹

Eindhoven University of Technology¹

29 Aug 2007

TL;DR: An algorithm to determine the minimal achievable latency is proposed, providing an execution scheme for executing an SDFG with this latency, and a heuristic is proposed for optimizing latency under a throughput constraint.

Abstract: Synchronous data flow graphs (SDFGs) are a very useful means for modeling and analyzing streaming applications. Some performance indicators, such as throughput, have been studied before. Although throughput is a very useful performance indicator for concurrent real-time applications, another important metric is latency. Especially for applications such as video conferencing, telephony and games, latency beyond a certain limit cannot be tolerated. This paper proposes an algorithm to determine the minimal achievable latency, providing an execution scheme for executing an SDFG with this latency. In addition, a heuristic is proposed for optimizing latency under a throughput constraint. Experimental results show that latency computations are efficient despite the theoretical complexity of the problem. Substantial latency improvements are obtained, of 24-54% on average for a synthetic benchmark of 900 models, and up to 37% for a benchmark of six real DSP and multimedia models. The heuristic for minimizing latency under a throughput constraint gives optimal latency and throughput results under a constraint of maximal throughput for all DSP and multimedia models, and for over 95% of the synthetic models.

61 citations

Proceedings Article

Globally asynchronous locally synchronous architecture for large high-performance ASICs

T. Meincke, Ahmed Hemani, Shashi Kumar, Peeter Ellervee, Johnny Öberg, Tommy Olsson, Peter Nilsson, Dan Lindqvist, Hannu Tenhunen

30 May 1999

TL;DR: This work describes a methodology for partitioning a design into large synchronous blocks each having its own clock and presents results of applying it to a realistic design done in 0.25 micron, showing that the net power savings compared to fully synchronous designs are on average about 30%.

Abstract: Clock nets are the major source of power consumption in large, high-performance ASICs and a design bottleneck when it comes to tolerable clock skew. A way to obviate the global clock net is to partition the design into large synchronous blocks, each having its own clock. Data is exchanged with other blocks asynchronously using handshake signals. Adopting such a strategy requires a methodology that supports: 1) a partitioning method that divides a design into a number of synchronous blocks such that the gain due to global clock net removal exceeds the communication overhead, and 2) synthesis of handshake protocols to implement the data transfer between synchronous blocks. We describe this methodology and present results of applying it to a realistic design done in 0.25 micron, ranging in operating frequencies from 20 MHz to 1 GHz. The results show that the net power savings compared to fully synchronous designs are on average about 30%.

43 citations

Proceedings Article

Timing closure through a globally synchronous, timing partitioned design methodology

Anders Edman¹, Christer Svensson¹

Linköping University¹

07 Jun 2004

TL;DR: A method to mitigate timing problems due to global wire delays is proposed, which follows closely a fully synchronous design flow and utilizes only true digital library elements.

Abstract: A method to mitigate timing problems due to global wire delays is proposed. The method follows closely a fully synchronous design flow and utilizes only true digital library elements. The design is partitioned into isochronous blocks at system level, where a latency of a few clock cycles is inserted between the isochronous blocks. This latency is then utilized to automatically mitigate unknown global wire delays, unknown global clock skews and other timing uncertainties occurring in backend design. The new method is expected to considerably reduce the timing closure effort in large high-frequency digital designs in deep-submicron technologies.

39 citations

Timing analysis model for network based multiprocessor systems.

A.J.M. (Arno) Moonen¹, M.J.G. (Marco) Bekooij¹, J. van Meerbergen

Philips¹

01 Jan 2004

TL;DR: An SDF model of the network is proposed, in which an arbiter allows the transfer of a possibly varying but bounded number of words per period.

Abstract: In this paper, an embedded multiprocessor system on top of a network on chip is proposed which is amenable to timing analysis. This multiprocessor system is intended for multimedia applications that process data streams. The temporal behavior of applications executed on this multiprocessor system is derived with a Synchronous Data Flow (SDF) graph in which computation, communication, buffer sizes, and arbitration are modeled. This graph can be transformed into an event graph, a special case of a Petri net, from which properties like the minimal throughput can be derived with results of MaxPlus Linear System Theory [1]. Our main contribution in this paper is an SDF model of the network in which an arbiter allows the transfer of a possibly varying but bounded number of words per period.

33 citations

Proceedings Article

Design and DfT of a high-speed area-efficient embedded asynchronous FIFO

Paul Wielage¹, Erik Jan Marinissen¹, Michel Altheimer¹, Clemens Wouters¹

NXP Semiconductors¹

16 Apr 2007

TL;DR: An overview of the most important design features of the new full-custom embedded ripple-through FIFO module is given and its test and design-for-test approach is described.

Abstract: Embedded First-In First-Out (FIFO) memories are increasingly used in many IC designs. We have created a new full-custom embedded ripple-through FIFO module with asynchronous read and write clocks. The implementation is based on a micropipeline architecture and is at least a factor of two smaller than SRAM-based and standard-cell-based counterparts. This paper gives an overview of the most important design features of the new FIFO module and describes its test and design-for-test approach.

32 citations