Flit Synchronous Aelite Network on Chip

TL;DR: The Aelite NoC offers guaranteed services, addresses the complexities of System-on-Chip design with real-time requirements, and implements flit-synchronous communication using mesochronous and asynchronous links.
Abstract: Deep-submicron process technology and application convergence increase the design challenges of Systems-on-Chip (SoC). Traditional bus-based on-chip communication is not scalable and fails to deliver the performance that complex SoCs require. The Network on Chip (NoC) has emerged as a solution to these challenges, enabling efficient, high-performance, scalable SoC design. The Aethereal NoC provides latency and throughput bounds through a pipelined time-division multiplexed (TDM) circuit-switching architecture. A global synchronous clock defines the timing for TDM, which becomes a drawback as process geometries shrink and clock frequencies increase. This thesis work focuses on the Aelite NoC architecture. The Aelite NoC offers guaranteed services and addresses the complexities of System-on-Chip design with real-time requirements. It implements flit-synchronous communication using mesochronous and asynchronous links.
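The way a TDM slot table yields throughput and latency bounds can be illustrated with a minimal Python sketch. It is not the Aethereal or Aelite implementation: the slot-table layout, the function name `tdm_bounds`, and the latency model (worst-case wait for the next owned slot plus one slot per hop) are assumptions made only for illustration.

```python
from fractions import Fraction


def tdm_bounds(slot_table, connection, hops):
    """Return (bandwidth share, worst-case latency in slots) for a connection.

    Assumed model, not taken from the thesis: the table repeats forever,
    each slot carries one flit, and a flit advances one hop per slot.
    """
    size = len(slot_table)
    owned = [i for i, owner in enumerate(slot_table) if owner == connection]
    if not owned:
        raise ValueError("connection owns no slots")

    # Guaranteed share of the link bandwidth: owned slots / table size.
    share = Fraction(len(owned), size)

    # Distance (in slots) from each owned slot to the next owned slot,
    # wrapping around the end of the table. A single owned slot gives `size`.
    gaps = [
        ((owned[(k + 1) % len(owned)] - owned[k] - 1) % size) + 1
        for k in range(len(owned))
    ]

    # Worst case: data just misses its slot, waits for the largest gap,
    # then traverses `hops` pipeline stages.
    worst_latency = max(gaps) + hops
    return share, worst_latency


# Hypothetical 8-slot table shared by connections "A" and "B".
table = ["A", "B", "A", None, "B", None, None, "A"]
share_a, latency_a = tdm_bounds(table, "A", hops=3)
print(share_a, latency_a)
```

Because each slot is reserved ahead of time, both figures depend only on the table contents and the path length, not on the traffic of other connections, which is what makes the bounds guaranteed in this simplified model.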

References
Proceedings Article
29 Aug 2007
TL;DR: An algorithm to determine the minimal achievable latency is proposed, providing an execution scheme for executing an SDFG with this latency, and a heuristic is proposed for optimizing latency under a throughput constraint.
Abstract: Synchronous data flow graphs (SDFGs) are a very useful means for modeling and analyzing streaming applications. Some performance indicators, such as throughput, have been studied before. Although throughput is a very useful performance indicator for concurrent real-time applications, another important metric is latency. Especially for applications such as video conferencing, telephony and games, latency beyond a certain limit cannot be tolerated. This paper proposes an algorithm to determine the minimal achievable latency, providing an execution scheme for executing an SDFG with this latency. In addition, a heuristic is proposed for optimizing latency under a throughput constraint. Experimental results show that latency computations are efficient despite the theoretical complexity of the problem. Substantial latency improvements are obtained, of 24-54% on average for a synthetic benchmark of 900 models, and up to 37% for a benchmark of six real DSP and multimedia models. The heuristic for minimizing latency under a throughput constraint gives optimal latency and throughput results under a constraint of maximal throughput for all DSP and multimedia models, and for over 95% of the synthetic models.

61 citations

Proceedings Article
30 May 1999
TL;DR: This work describes a methodology for partitioning a design into large synchronous blocks each having its own clock and presents results of applying it to a realistic design done in 0.25 micron, showing that the net power savings compared to fully synchronous designs are on average about 30%.
Abstract: Clock nets are the major source of power consumption in large, high-performance ASICs and a design bottleneck when it comes to tolerable clock skew. A way to obviate the global clock net is to partition the design into large synchronous blocks, each having its own clock. Data is exchanged with other blocks asynchronously using handshake signals. Adopting such a strategy requires a methodology that supports: 1) a partitioning method dividing a design into a number of synchronous blocks such that the gain due to global clock net removal exceeds the communication overhead and 2) synthesis of handshake protocols to implement the data transfer between synchronous blocks. We describe this methodology and present results of applying it to a realistic design done in 0.25 micron, ranging in operating frequencies from 20 MHz to 1 GHz. The results show that the net power savings compared to fully synchronous designs are on average about 30%.

43 citations

Proceedings Article
07 Jun 2004
TL;DR: A method to mitigate timing problems due to global wire delays is proposed, which follows closely a fully synchronous design flow and utilizes only true digital library elements.
Abstract: A method to mitigate timing problems due to global wire delays is proposed. The method closely follows a fully synchronous design flow and utilizes only true digital library elements. The design is partitioned into isochronous blocks at system level, and a latency of a few clock cycles is inserted between the isochronous blocks. This latency is then utilized to automatically mitigate unknown global wire delays, unknown global clock skews and other timing uncertainties occurring in backend design. The new method is expected to considerably reduce the timing closure effort in large high-frequency digital designs in deep-submicron technologies.

39 citations

01 Jan 2004
TL;DR: An SDF model of the network in which an arbiter is applied which allows the transfer of a possibly varying but bounded number of words per period is proposed.
Abstract: In this paper an embedded multiprocessor system on top of a network on chip is proposed which is amenable to timing analysis. This multiprocessor system is intended for multimedia applications that process data streams. The temporal behavior of applications executed on this multiprocessor system is derived with a Synchronous Data Flow (SDF) graph in which computation, communication, buffer sizes as well as arbitration are modeled. This graph can be transformed into an event graph, which is a special case of a Petri net, from which properties like the minimal throughput can be derived with results of MaxPlus Linear System Theory [1]. Our main contribution in this paper is an SDF model of the network in which an arbiter is applied which allows the transfer of a possibly varying but bounded number of words per period.

33 citations

Proceedings Article
16 Apr 2007
TL;DR: An overview of the most important design features of the new full-custom embedded ripple-through FIFO module is given and its test and design-for-test approach is described.
Abstract: Embedded First-In First-Out (FIFO) memories are increasingly used in many IC designs. We have created a new full-custom embedded ripple-through FIFO module with asynchronous read and write clocks. The implementation is based on a micropipeline architecture and is at least a factor of two smaller than SRAM-based and standard-cell-based counterparts. This paper gives an overview of the most important design features of the new FIFO module and describes its test and design-for-test approach.

32 citations