Author

Jose Flich

Other affiliations: University of Valencia, IEEE Computer Society

Bio: Jose Flich is an academic researcher from the Polytechnic University of Valencia whose research topics include static routing and routing tables. The author has an h-index of 31 and has co-authored 173 publications receiving 3087 citations. Previous affiliations of Jose Flich include University of Valencia and IEEE Computer Society.

Papers published on a yearly basis (chart, 1998–2023)

Papers

Proceedings Article•DOI•

Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori

A. Mejia¹, Jose Flich¹, José Duato¹, Sven-Arne Reinemo², Tor Skeie²

Polytechnic University of Valencia¹, Simula Research Laboratory²

25 Apr 2006

TL;DR: A new deterministic routing methodology for tori and meshes, which achieves high performance without the use of virtual channels, and is topology agnostic in nature, meaning it can handle any topology derived from any combination of faults when combined with static reconfiguration.

Abstract: Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Depending on the problem domain, this demand for more power can be satisfied by either massively parallel computers or clusters of computers. Common to both approaches is the dependence on high-performance interconnect networks such as Myrinet, InfiniBand, or 10 Gigabit Ethernet. While high throughput and low latency are key features of interconnection networks, the issue of fault tolerance is now becoming increasingly important. As the number of network components grows, so does the probability of failure; thus it becomes important to also consider the fault-tolerance mechanism of interconnection networks. The main challenge then lies in combining performance and fault tolerance, while still keeping cost and complexity low. This paper proposes a new deterministic routing methodology for tori and meshes, which achieves high performance without the use of virtual channels. Furthermore, it is topology agnostic in nature, meaning it can handle any topology derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as segment-based routing (SR), works by partitioning a topology into subnets, and subnets into segments. This allows us to place bidirectional turn restrictions locally within a segment. As segments are independent, we gain the freedom to place turn restrictions within a segment independently from other segments. This results in a larger degree of freedom when placing turn restrictions compared to other routing strategies. In this paper, a way to compute segment-based routing tables is presented and applied to meshes and tori. Evaluation results show that SR increases performance by a factor of 1.8 over FX and up*/down* routing.

143 citations

Proceedings Article•DOI•

A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks

José Duato, Ian Johnson, Jose Flich¹, F. Naven, Pedro Javier Garcia², T. Nachiondo¹

University of Valencia¹, University of Castilla–La Mancha²

12 Feb 2005

TL;DR: A new congestion management strategy for lossless multistage interconnection networks that scales as network size and/or link bandwidth increase and completely eliminates the performance degradation produced by HOL blocking while using only a small number of additional queues.

Abstract: In this paper, we propose a new congestion management strategy for lossless multistage interconnection networks that scales as network size and/or link bandwidth increase. Instead of eliminating congestion, our strategy avoids performance degradation beyond the saturation point by eliminating the HOL blocking produced by congestion trees. This is achieved in a scalable manner by using separate queues for congested flows. These are dynamically allocated only when congestion arises, and deallocated when congestion subsides. Performance evaluation results show that our strategy responds to congestion immediately and completely eliminates the performance degradation produced by HOL blocking while using only a small number of additional queues.
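The idea of setting congested flows aside so they cannot block others at the head of a shared FIFO can be illustrated with a minimal Python sketch. This is a toy model, not the mechanism from the paper: the class `SwitchPort`, its methods, the `max_extra_queues` limit, and the choice to move already-queued packets into the new queue are all assumptions made for illustration.

```python
from collections import deque


class SwitchPort:
    """Toy port buffer: one shared FIFO plus a few set-aside queues (hypothetical)."""

    def __init__(self, max_extra_queues=2):
        self.shared = deque()   # packets of non-congested flows, strict FIFO
        self.extra = {}         # destination -> queue allocated on congestion
        self.max_extra_queues = max_extra_queues

    def mark_congested(self, dest):
        """Allocate a set-aside queue for dest; returns False if none is free."""
        if dest in self.extra:
            return True
        if len(self.extra) >= self.max_extra_queues:
            return False
        # Simplification: move packets already queued for dest out of the shared FIFO.
        self.extra[dest] = deque(p for p in self.shared if p == dest)
        self.shared = deque(p for p in self.shared if p != dest)
        return True

    def enqueue(self, dest):
        """Store a packet (represented by its destination) in the matching queue."""
        self.extra.get(dest, self.shared).append(dest)

    def dequeue(self, blocked):
        """Return the next packet whose destination is not blocked, else None."""
        # Only the head of the shared FIFO may leave; a blocked head stalls it.
        if self.shared and self.shared[0] not in blocked:
            return self.shared.popleft()
        for dest, queue in self.extra.items():
            if queue and dest not in blocked:
                return queue.popleft()
        return None

    def release_if_idle(self, dest):
        """Deallocate the set-aside queue for dest once it has drained."""
        queue = self.extra.get(dest)
        if queue is not None and not queue:
            del self.extra[dest]
            return True
        return False
```

With packets for a blocked destination and a free destination queued in that order, `dequeue` returns nothing until `mark_congested` isolates the blocked flow, after which the free packet can leave. The extra queue is released once it has drained.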

118 citations

Journal Article•DOI•

A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms

Jose Flich, Tor Skeie¹, A. Mejia², Olav Lysne¹, Pedro López, Antonio Robles, José Duato, Michihiro Koibuchi, Tomas Rokicki, Jose Carlos Sancho

University of Oslo¹, Intel²

01 Mar 2012-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This paper presents a comprehensive overview of the known topology-agnostic routing algorithms, classifies these algorithms by their most important properties, and evaluates them consistently, providing significant insight into the algorithms and their appropriateness for different on- and off-chip environments.

Abstract: Most standard cluster interconnect technologies are flexible with respect to network topology. This has spawned a substantial amount of research on topology-agnostic routing algorithms, which make no assumption about the network structure, thus providing the flexibility needed to route on irregular networks. In practice, such irregularity should often be interpreted as minor modifications of some regular interconnection pattern, such as those induced by faults. In fact, topology-agnostic routing algorithms are also becoming increasingly useful for networks on chip (NoCs), where faults may make the preferred 2D mesh topology irregular. Existing topology-agnostic routing algorithms were developed for varying purposes, giving them different and not always comparable properties. Details are scattered among many papers, each with distinct conditions, making comparison difficult. This paper presents a comprehensive overview of the known topology-agnostic routing algorithms. We classify these algorithms by their most important properties, and evaluate them consistently. This provides significant insight into the algorithms and their appropriateness for different on- and off-chip environments.

104 citations

Proceedings Article•DOI•

Efficient unicast and multicast support for CMPs

Samuel Rodrigo¹, Jose Flich¹, José Duato¹, Mark D. Hummel²

Polytechnic University of Valencia¹, Advanced Micro Devices²

08 Nov 2008

TL;DR: In this paper, the authors propose bLBDR, an efficient multicast and broadcast mechanism built on top of LBDR, which performs multicast operations using a logic-based broadcast within a domain (a region with bounds).

Abstract: Beyond a certain number of cores, multi-core processing chips will require a network-on-chip (NoC) to interconnect the cores and overcome the limitations of a bus. NoCs must be carefully designed to meet constraints like power consumption, area, and ultra-low latencies. Although 2D meshes with DOR (dimension-order routing) meet these constraints, the need for partitioning (e.g. virtual machines, coherency domains) and traffic isolation may prevent the use of DOR. Also, core heterogeneity and manufacturing and run-time faults may lead to partially irregular topologies. Routing in these topologies is complex, and previously proposed solutions required routing tables, which drastically increase power consumption, area, and latency. The exception is LBDR (logic-based distributed routing), a flexible routing method for irregular topologies that removes the need for using routing tables (both at end-nodes and switches), thus achieving large savings in chip area and power consumption. But LBDR lacks support for multicast and broadcast, which are required to efficiently support cache coherence protocols both for single and multiple coherence domains. In this paper we propose bLBDR, an efficient multicast and broadcast mechanism built on top of LBDR. bLBDR performs multicast operations using a logic-based broadcast within a domain (a region with bounds). This allows us to isolate the traffic into different domains, thus enabling the concept of virtualization at the NoC level. Also, bLBDR extends the concept of routing regions in LBDR by providing a mechanism that allows the flexible definition of multiple domains (sets of network resources). bLBDR fulfills all the practical requirements, including not only low latency and power and area efficiency, but also support for virtualization, partitionability, fault tolerance, traffic isolation, and broadcast across the entire network as well as constrained to coherency domains or regions. All this is achieved by a small and power-efficient routing logic (7× area savings and 17× power reduction when compared to a routing table in an 8×8 mesh network).

100 citations

Journal Article•DOI•

A routing methodology for achieving fault tolerance in direct networks

Maria E. Gomez, Nils Agne Nordbotten¹, Jose Flich¹, Pedro López¹, Antonio Robles¹, José Duato, Tor Skeie, Olav Lysne

IEEE Computer Society¹

01 Apr 2006-IEEE Transactions on Computers

TL;DR: This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node.

Abstract: Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to avoid faults, for some source-destination pairs, packets are first sent to an intermediate node and then from this node to the destination node. Fully adaptive routing is used along both subpaths. The methodology assumes a static fault model and the use of a checkpoint/restart mechanism. However, there are scenarios where the faults cannot be avoided solely by using an intermediate node. Thus, we also provide some extensions to the methodology. Specifically, we propose disabling adaptive routing and/or using misrouting on a per-packet basis. We also propose the use of more than one intermediate node for some paths. The proposed fault-tolerant routing methodology is extensively evaluated in terms of fault tolerance, complexity, and performance.
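The core idea of routing around a fault through an intermediate node can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the methodology evaluated in the paper: it uses a 2D mesh, replaces the fully adaptive routing of the abstract with deterministic XY routing on each subpath, and picks the first workable intermediate node by brute-force search. The function names `xy_path` and `route_via_intermediate` are hypothetical.

```python
def xy_path(src, dst):
    """Dimension-order (X then Y) path between two mesh nodes, endpoints included."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def route_via_intermediate(src, dst, faulty, width, height):
    """Return a fault-free path, using one intermediate node only if needed.

    faulty is a set of (x, y) nodes assumed known in advance (static fault model).
    Returns None when no single intermediate node avoids every fault.
    """
    direct = xy_path(src, dst)
    if not faulty.intersection(direct):
        return direct  # fault-free case: routing is unchanged
    for ix in range(width):
        for iy in range(height):
            mid = (ix, iy)
            if mid in faulty or mid in (src, dst):
                continue
            first, second = xy_path(src, mid), xy_path(mid, dst)
            if not faulty.intersection(first) and not faulty.intersection(second):
                return first + second[1:]  # drop the duplicated intermediate node
    return None
```

For example, in a 3×3 mesh with node (1, 0) faulty, a packet from (0, 0) to (2, 0) is sent first to (0, 1) and then on to its destination. When the function returns `None`, a single intermediate node is not enough, which corresponds to the scenarios for which the abstract proposes extensions such as misrouting or more than one intermediate node.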

92 citations

Cited by

Book•

Computer Architecture, Fifth Edition: A Quantitative Approach

John L. Hennessy, David A. Patterson

29 Sep 2011

TL;DR: The Fifth Edition of Computer Architecture focuses on this dramatic shift in the ways in which software and technology in the "cloud" are accessed by cell phones, tablets, laptops, and other mobile computing devices.

Abstract: The computing world today is in the middle of a revolution: mobile clients and cloud computing have emerged as the dominant paradigms driving programming and hardware innovation today. The Fifth Edition of Computer Architecture focuses on this dramatic shift, exploring the ways in which software and technology in the "cloud" are accessed by cell phones, tablets, laptops, and other mobile computing devices. Each chapter includes two real-world examples, one mobile and one datacenter, to illustrate this revolutionary change. Updated to cover the mobile computing revolution. Emphasizes the two most important topics in architecture today: memory hierarchy and parallelism in all its forms. Develops common themes throughout each chapter: power, performance, cost, dependability, protection, programming models, and emerging trends ("What's Next"). Includes three review appendices in the printed text. Additional reference appendices are available online. Includes updated Case Studies and completely new exercises.

984 citations

Journal Article•DOI•

Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives

Radu Marculescu¹, Umit Y. Ogras¹, Li-Shiuan Peh², Natalie Enright Jerger³, Yatin Hoskote⁴

Carnegie Mellon University¹, Princeton University², University of Wisconsin-Madison³, Intel⁴

01 Jan 2009-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.

Abstract: To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have been recently proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.

733 citations

Proceedings Article•DOI•

GARNET: A detailed on-chip network model inside a full-system simulator

Niket Agarwal¹, Tushar Krishna¹, Li-Shiuan Peh¹, Niraj K. Jha¹

Princeton University¹

26 Apr 2009

TL;DR: In this article, a detailed cycle-accurate interconnection network model (GARNET) is proposed to simulate a CMP architecture with virtual channel (VC) flow control.

Abstract: Until very recently, microprocessor designs were computation-centric. On-chip communication was frequently ignored. This was because of fast, single-cycle on-chip communication. The interconnect power was also insignificant compared to the transistor power. With uniprocessor designs providing diminishing returns and the advent of chip multiprocessors (CMPs) in mainstream systems, the on-chip network that connects different processing cores has become a critical part of the design. Transistor miniaturization has led to high global wire delay, and interconnect power comparable to transistor power. CMP design proposals can no longer ignore the interaction between the memory hierarchy and the interconnection network that connects various elements. This necessitates a detailed and accurate interconnection network model within a full-system evaluation framework. Ignoring the interconnect details might lead to inaccurate results when simulating a CMP architecture. It also becomes important to analyze the impact of interconnection network optimization techniques on full system behavior. In this light, we developed a detailed cycle-accurate interconnection network model (GARNET), inside the GEMS full-system simulation framework. GARNET models a classic five-stage pipelined router with virtual channel (VC) flow control. Microarchitectural details, such as flit-level input buffers, routing logic, allocators and the crossbar switch, are modeled. GARNET, along with GEMS, provides a detailed and accurate memory system timing model. To demonstrate the importance and potential impact of GARNET, we evaluate a shared and private L2 CMP with a realistic state-of-the-art interconnection network against the original GEMS simple network. The objective of the evaluation was to figure out which configuration is better for a particular workload. We show that not modeling the interconnect in detail might lead to an incorrect outcome. 
We also evaluate Express Virtual Channels (EVCs), an on-chip network flow control proposal, in a full-system fashion. We show that in improving on-chip network latency-throughput, EVCs do lead to better overall system runtime; however, the impact varies widely across applications.

719 citations

Proceedings Article•DOI•

Regional congestion awareness for load balance in networks-on-chip

Paul V. Gratz¹, Boris Grot¹, Stephen W. Keckler¹

University of Texas at Austin¹

24 Oct 2008

TL;DR: Regional Congestion Awareness (RCA) is proposed, a lightweight technique to improve global network balance that informs the routing policy of congestion in parts of the network beyond adjacent routers.

Abstract: Interconnection networks-on-chip (NOCs) are rapidly replacing other forms of interconnect in chip multiprocessors and system-on-chip designs. Existing interconnection networks use either oblivious or adaptive routing algorithms to determine the route taken by a packet to its destination. Despite somewhat higher implementation complexity, adaptive routing enjoys better fault tolerance characteristics, increases network throughput, and decreases latency compared to oblivious policies when faced with non-uniform or bursty traffic. However, adaptive routing can hurt performance by disturbing any inherent global load balance through greedy local decisions. To improve load balance in adaptive routing, we propose Regional Congestion Awareness (RCA), a lightweight technique to improve global network balance. Instead of relying solely on local congestion information, RCA informs the routing policy of congestion in parts of the network beyond adjacent routers. Our experiments show that RCA matches or exceeds the performance of conventional adaptive routing across all workloads examined, with a 16% average and 71% maximum latency reduction on SPLASH-2 benchmarks running on a 49-core CMP. Compared to a baseline adaptive router, RCA incurs a negligible logic and modest wiring overhead.

409 citations

Proceedings Article•DOI•

Congestion Control for Large-Scale RDMA Deployments

Yibo Zhu¹, Haggai Eran, Daniel Firestone², Chuanxiong Guo², Marina Lipshteyn², Yehonatan Liron, Jitendra Padhye², Shachar Raindel, Mohamad Haj Yahia, Ming Zhang²

University of California, Santa Barbara¹, Microsoft²

17 Aug 2015

TL;DR: DCQCN, an end-to-end congestion control scheme for RoCEv2, is introduced and it is shown that DCQCN dramatically improves throughput and fairness of RoCEv2 RDMA traffic.

Abstract: Modern datacenter applications demand high throughput (40Gbps) and ultra-low latency (< 10 μs per hop) from the network, with low CPU overhead. Standard TCP/IP stacks cannot meet these requirements, but Remote Direct Memory Access (RDMA) can. On IP-routed datacenter networks, RDMA is deployed using the RoCEv2 protocol, which relies on Priority-based Flow Control (PFC) to enable a drop-free network. However, PFC can lead to poor application performance due to problems like head-of-line blocking and unfairness. To alleviate these problems, we introduce DCQCN, an end-to-end congestion control scheme for RoCEv2. To optimize DCQCN performance, we build a fluid model, and provide guidelines for tuning switch buffer thresholds and other protocol parameters. Using a 3-tier Clos network testbed, we show that DCQCN dramatically improves throughput and fairness of RoCEv2 RDMA traffic. DCQCN is implemented in Mellanox NICs, and is being deployed in Microsoft's datacenters.

398 citations
