Author

Antoni Roca

Bio: Antoni Roca is an academic researcher from Polytechnic University of Catalonia. The author has contributed to research in topics: Network on a chip & Network topology. The author has an h-index of 10 and has co-authored 32 publications receiving 368 citations. Previous affiliations of Antoni Roca include University of Valencia & Open University of Catalonia.

Papers
Proceedings ArticleDOI
03 May 2010
TL;DR: This paper proposes Universal Logic-Based Distributed Routing (uLBDR), an efficient logic-based mechanism that adapts to any irregular topology derived from 2D meshes and serves as an alternative to routing tables.
Abstract: The high-performance computing domain is enriching with the inclusion of Networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism or power-aware techniques may break topology regularity, thus, efficient routing becomes a challenge. In this paper, uLBDR (Universal Logic-Based Distributed Routing) is proposed as an efficient logic-based mechanism that adapts to any irregular topology derived from 2D meshes, being an alternative to the use of routing tables (either at routers or at end-nodes). uLBDR requires a small set of configuration bits, thus being more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented highlighting the trade-off between routing cost and coverage. The alternatives span from the previously proposed LBDR approach (with 30% of coverage) to the uLBDR mechanism achieving full coverage. This comes with a small performance cost, thus exhibiting the trade-off between fault tolerance and performance.

92 citations

Journal ArticleDOI
TL;DR: uLBDR is presented: an efficient logic-based mechanism that adapts to any irregular topology derived from 2-D meshes without using routing tables. It requires only a small set of configuration bits, making it more practical than large routing tables implemented in memories.
Abstract: The high-performance computing domain is enriching with the inclusion of networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face the communication scalability challenge while meeting tight power, area, and latency constraints. Designers must address new challenges that were not present before. Defective components, the enhancement of application-level parallelism, or power-aware techniques may break topology regularity, thus, efficient routing becomes a challenge. This paper presents universal logic-based distributed routing (uLBDR), an efficient logic-based mechanism that adapts to any irregular topology derived from 2-D meshes, instead of using routing tables. uLBDR requires a small set of configuration bits, thus being more practical than large routing tables implemented in memories. Several implementations of uLBDR are presented highlighting the tradeoff between routing cost and coverage. The alternatives span from the previously proposed LBDR approach (with 30% of coverage) to the uLBDR mechanism achieving full coverage. This comes with a small performance cost, thus exhibiting the tradeoff between fault tolerance and performance. Power consumption, area, and delay estimates are also provided highlighting the efficiency of the mechanism. To do this, different router models (one for CMPs and one for MPSoCs) have been designed as a proof of concept.

51 citations

Proceedings ArticleDOI
08 May 2016
TL;DR: The paper concludes that a synchronous circuit with a ring oscillator clock shows similar benefits in performance and energy as those of bundled-data asynchronous circuits.
Abstract: How much margin do we have to add to the delay lines of a bundled-data circuit? This paper is an attempt to give a methodical answer to this question, taking into account all sources of variability and the existing EDA machinery for timing analysis and sign-off. The paper is based on the study of the margins of a ring oscillator that substitutes a PLL as clock generator. A timing model is proposed that shows that a 12% margin for delay lines can be sufficient to cover variability in a 65nm technology. In a typical scenario, performance and energy improvements between 15% and 35% can be obtained by using a ring oscillator instead of a PLL. The paper concludes that a synchronous circuit with a ring oscillator clock shows similar benefits in performance and energy as those of bundled-data asynchronous circuits.

22 citations

Proceedings ArticleDOI
03 May 2010
TL;DR: A novel approach called performance domains is drafted to reduce the negative impact of variability on application execution time; it is suitable when several applications are running simultaneously on the CMP chip.
Abstract: Current integration scales allow designing chip multiprocessors (CMP) where cores are interconnected by means of a network-on-chip (NoC). Unfortunately, the small feature size of current integration scales causes some unpredictability in manufactured devices because of process variation. In NoCs, variability may affect links and routers so that they no longer match the parameters established at design time. In this paper we first analyze the way that manufacturing deviations affect the components of a NoC by applying a comprehensive and detailed variability model to 200 instances of an 8x8 mesh NoC synthesized using 45nm technology. A second contribution of this paper is showing that GALS-based NoCs present communication bottlenecks under process variation. To overcome this performance reduction we draft a novel approach, called performance domains, intended to reduce the negative impact of variability on application execution time. This mechanism is suitable when several applications are simultaneously running in the CMP chip.

19 citations

Journal Article
TL;DR: In this article, a rate allocation algorithm for pixel-domain distributed video (PDDV) coders without FBC is proposed, which estimates at the encoder the number of bits for every frame without significantly increasing the encoding complexity.
Abstract: In some video coding applications, it is desirable to reduce the complexity of the video encoder at the expense of a more complex decoder. Distributed Video (DV) Coding is a new paradigm that aims at achieving this. To allocate a proper number of bits to each frame, most DV coding algorithms use a feedback channel (FBC). However, in some cases, a FBC does not exist. In this paper, we therefore propose a rate allocation (RA) algorithm for pixel-domain distributed video (PDDV) coders without FBC. Our algorithm estimates at the encoder the number of bits for every frame without significantly increasing the encoder complexity. For this calculation we consider each pixel of the frame individually, in contrast to our earlier work where the whole frame is treated jointly. Experimental results show that this pixel-based approach delivers better estimates of the adequate encoding rate than the frame-based approach. Compared to the PDDV coder with FBC, the PDDV coder without FBC has only a small loss in RD performance, especially at low rates.

13 citations


Cited by
Journal ArticleDOI
TL;DR: The article at hand reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years.
Abstract: Networks-on-Chip constitute the interconnection architecture of future, massively parallel multiprocessors that assemble hundreds to thousands of processing cores on a single chip. Their integration is enabled by ongoing miniaturization of chip manufacturing technologies following Moore's Law. It comes with the downside of the circuit elements' increased susceptibility to failure. Research on fault-tolerant Networks-on-Chip tries to mitigate partial failure and its effect on network performance and reliability by exploiting various forms of redundancy at the suitable network layers. The article at hand reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years. It is structured along three communication layers: the data link, the network, and the transport layers. The most important results are summarized and open research problems and challenges are highlighted to guide future research on this topic.

198 citations

Journal ArticleDOI
TL;DR: It is formally proved that independently of the shape and dimensions of the planar topologies and of the number and placement of the TSVs, the proposed routing algorithm using two virtual channels in the plane is deadlock and livelock free.
Abstract: In this paper, we propose a distributed routing algorithm for vertically partially connected regular 2D topologies of different shapes and sizes (e.g., 2D mesh, torus, ring). The topologies that are the target of this algorithm are of practical interest in the 3D integration of heterogeneous dies using Through-Silicon-Vias (TSVs). Indeed, TSV-based 3D integration makes it possible to envision the stacking of dies with different functions and technologies, using a 3D-NoC as the interconnect backbone. Intrinsically, 3D topologies have better performance, but yield and active area (and thus the cost) are a function of the number of TSVs; therefore, the designs tend to use only a subset of available TSVs between two dies. The definition of blockage-free and low-implementation-cost distributed deterministic routing on this kind of topology is thus of theoretical and practical interest. We formally prove that independently of the shape and dimensions of the planar topologies and of the number and placement of the TSVs, the proposed routing algorithm using two virtual channels in the plane is deadlock and livelock free. We also experimentally show that the performance of this algorithm is still acceptable when the number of vertical connections decreases.

121 citations

Journal ArticleDOI
TL;DR: A novel hierarchical network-on-chip (H-NoC) architecture for SNN hardware is presented, which aims to address the scalability issue by creating a modular array of clusters of neurons using a hierarchical structure of low and high-level routers.
Abstract: Spiking neural networks (SNNs) attempt to emulate information processing in the mammalian brain based on massively parallel arrays of neurons that communicate via spike events. SNNs offer the possibility to implement embedded neuromorphic circuits, with high parallelism and low power consumption compared to the traditional von Neumann computer paradigms. Nevertheless, the lack of modularity and poor connectivity shown by traditional neuron interconnect implementations based on shared bus topologies is prohibiting scalable hardware implementations of SNNs. This paper presents a novel hierarchical network-on-chip (H-NoC) architecture for SNN hardware, which aims to address the scalability issue by creating a modular array of clusters of neurons using a hierarchical structure of low and high-level routers. The proposed H-NoC architecture incorporates a spike traffic compression technique to exploit SNN traffic patterns and locality between neurons, thus reducing traffic overhead and improving throughput on the network. In addition, adaptive routing capabilities between clusters balance local and global traffic loads to sustain throughput under bursting activity. Analytical results show the scalability of the proposed H-NoC approach under different scenarios, while simulation and synthesis analysis using 65-nm CMOS technology demonstrate high-throughput, low-cost area, and power consumption per cluster, respectively.

110 citations

01 Jan 2009
TL;DR: In this paper, the authors proposed 3-D networks-on-chip (NoC) topologies that exploit the diversity of 3D structures to further enhance the performance of multiplane integrated systems.
Abstract: Design techniques for three-dimensional (3-D) ICs considerably lag the significant strides achieved in 3-D manufacturing technologies. Advanced design methodologies for two-dimensional circuits are not sufficient to manage the added complexity caused by the third dimension. Consequently, design methodologies that efficiently handle the added complexity and inherent heterogeneity of 3-D circuits are necessary. These 3-D design methodologies should support robust and reliable 3-D circuits while considering different forms of vertical integration, such as system-in-package and 3-D ICs with fine grain vertical interconnections. Global signaling issues, such as clock and power distribution networks, are further exacerbated in vertical integration due to the limited number of package pins, the distance of these pins from other planes within the 3-D system, and the impedance characteristics of the through silicon vias (TSVs). In addition to these dedicated networks, global signaling techniques that incorporate the diverse traits of complex 3-D systems are required. One possible approach, potentially significantly reducing the complexity of interconnect issues in 3-D circuits, is 3-D networks-on-chip (NoC). Design methodologies that exploit the diversity of 3-D structures to further enhance the performance of multiplane integrated systems are necessary. The longest interconnects within a 3-D circuit are those interconnects comprising several TSVs and traversing multiple physical planes. Consequently, minimizing the delay of the interplane nets is of great importance. By considering the nonuniform impedance characteristics of the interplane interconnects while placing the TSVs, the delay of these nets is decreased. In addition, the difference in electrical behavior between the horizontal and vertical interconnects suggests that asymmetric structures can be useful candidates for distributing the clock signal within a 3-D circuit. 
A 3-D test circuit fabricated with a 180 nm silicon-on-insulator (SOI) technology, manufactured by MIT Lincoln Laboratories, exploring several clock distribution topologies is described. Correct operation at 1 GHz has been demonstrated. Several 3-D NoC topologies incorporating dissimilar 3-D interconnect structures are reviewed as a promising solution for communication limited systems-on-chip (SoC). Appropriate performance models are described to evaluate these topologies. Several forms of vertical integration, such as system-in-package and different candidate technologies for 3-D circuits, such as SOI, are considered. The techniques described in this paper address fundamental interconnect structures in the 3-D design process. Several interesting research problems in the design of 3-D circuits are also discussed.

106 citations

Journal ArticleDOI
TL;DR: This paper presents a comprehensive overview of the known topology-agnostic routing algorithms, classifies these algorithms by their most important properties, and evaluates them consistently, providing significant insight into the algorithms and their appropriateness for different on- and off-chip environments.
Abstract: Most standard cluster interconnect technologies are flexible with respect to network topology. This has spawned a substantial amount of research on topology-agnostic routing algorithms, which make no assumption about the network structure, thus providing the flexibility needed to route on irregular networks. Actually, such an irregularity should be often interpreted as minor modifications of some regular interconnection pattern, such as those induced by faults. In fact, topology-agnostic routing algorithms are also becoming increasingly useful for networks on chip (NoCs), where faults may make the preferred 2D mesh topology irregular. Existing topology-agnostic routing algorithms were developed for varying purposes, giving them different and not always comparable properties. Details are scattered among many papers, each with distinct conditions, making comparison difficult. This paper presents a comprehensive overview of the known topology-agnostic routing algorithms. We classify these algorithms by their most important properties, and evaluate them consistently. This provides significant insight into the algorithms and their appropriateness for different on- and off-chip environments.

104 citations