Showing papers by "Srinivas Devadas" published in 2009


Journal Article
TL;DR: The Stochastic Simulator Compiler is the first tool to allow a readable high-level description with spatially heterogeneous simulation algorithms and complex geometries; this permits large systems to be expressed concisely and direct native-code compilation allows SSC to generate very fast simulations.
Abstract: We present the Stochastic Simulator Compiler (SSC), a tool for exact stochastic simulations of well-mixed and spatially heterogeneous systems. SSC is the first tool to allow a readable high-level description with spatially heterogeneous simulation algorithms and complex geometries; this permits large systems to be expressed concisely. Meanwhile, direct native-code compilation allows SSC to generate very fast simulations. Availability: SSC currently runs on Linux and Mac OS X, and is freely available at http://web.mit.edu/irc/ssc/. Contact: mieszko@csail.mit.edu Supplementary information: Supplementary data are available at Bioinformatics online.

79 citations


Proceedings Article
20 Jun 2009
TL;DR: This work presents a framework for application-aware routing that assures deadlock-freedom under one or more channels by forcing routes to conform to an acyclic channel dependence graph, along with a mixed integer-linear programming (MILP) approach and a heuristic approach for producing deadlock-free routes that minimize maximum channel load.
Abstract: Conventional oblivious routing algorithms are either not application-aware or assume that each flow has its own private channel to ensure deadlock avoidance. We present a framework for application-aware routing that assures deadlock-freedom under one or more channels by forcing routes to conform to an acyclic channel dependence graph. Arbitrary minimal routes can be made deadlock-free through appropriate static channel allocation when two or more channels are available. Given bandwidth estimates for flows, we present a mixed integer-linear programming (MILP) approach and a heuristic approach for producing deadlock-free routes that minimize maximum channel load. The heuristic algorithm is calibrated using the MILP algorithm and evaluated on a number of benchmarks through detailed network simulation. Our framework can be used to produce application-aware routes that target the minimization of latency, number of flows through a link, bandwidth, or any combination thereof.
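The acyclicity condition mentioned in the abstract can be illustrated with a small check. This is a minimal sketch, not the authors' framework: it assumes one channel per directed link, represents a route as a list of node ids, and uses the hypothetical helper names `channel_dependence_graph` and `is_acyclic`. It does not perform the MILP or heuristic route selection.

```python
from collections import defaultdict


def channel_dependence_graph(routes):
    """Build edges between consecutive channels used by each route.

    Assumption: a route is a list of node ids and a channel is a
    directed link (u, v); virtual channels are not modeled here.
    """
    cdg = defaultdict(set)
    for route in routes:
        channels = list(zip(route, route[1:]))
        # A packet holding one channel while requesting the next one
        # creates a dependence edge held -> wanted.
        for held, wanted in zip(channels, channels[1:]):
            cdg[held].add(wanted)
        # Make sure every channel appears as a vertex, even a sink.
        for ch in channels:
            cdg.setdefault(ch, set())
    return dict(cdg)


def is_acyclic(graph):
    """Kahn's algorithm: acyclic iff every vertex can be peeled off."""
    indegree = {v: 0 for v in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    ready = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for s in graph[v]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return removed == len(graph)


# Four flows that chase each other around a ring form a dependence cycle.
ring_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
# Dropping two of them breaks the cycle.
safe_routes = [[0, 1, 2], [1, 2, 3]]
```

A route set passes the check only if `is_acyclic(channel_dependence_graph(routes))` is true; here `ring_routes` fails and `safe_routes` passes.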

79 citations


01 Jan 2009
TL;DR: In this paper, the authors present a framework for application-aware routing that assures deadlock-freedom under one or more channels by forcing routes to conform to an acyclic channel dependence graph.
Abstract: Conventional oblivious routing algorithms are either not application-aware or assume that each flow has its own private channel to ensure deadlock avoidance. We present a framework for application-aware routing that assures deadlock-freedom under one or more channels by forcing routes to conform to an acyclic channel dependence graph. Arbitrary minimal routes can be made deadlock-free through appropriate static channel allocation when two or more channels are available. Given bandwidth estimates for flows, we present a mixed integer-linear programming (MILP) approach and a heuristic approach for producing deadlock-free routes that minimize maximum channel load. The heuristic algorithm is calibrated using the MILP algorithm and evaluated on a number of benchmarks through detailed network simulation. Our framework can be used to produce application-aware routes that target the minimization of latency, number of flows through a link, bandwidth, or any combination thereof.

78 citations


Proceedings Article
10 May 2009
TL;DR: This paper presents methods that statically allocate channels to flows at each link when oblivious routing is used, and that ensure deadlock freedom for arbitrary minimal routes when two or more virtual channels are available.
Abstract: Most virtual channel routers have multiple virtual channels to mitigate the effects of head-of-line blocking. When there are more flows than virtual channels at a link, packets or flows must compete for channels, either in a dynamic way at each link or by static assignment computed before transmission starts. In this paper, we present methods that statically allocate channels to flows at each link when oblivious routing is used, and ensure deadlock freedom for arbitrary minimal routes when two or more virtual channels are available. We then experimentally explore the performance trade-offs of static and dynamic virtual channel allocation for various oblivious routing methods, including DOR, ROMM, Valiant and a novel bandwidth-sensitive oblivious routing scheme (BSORM). Through judicious separation of flows, static allocation schemes often exceed the performance of dynamic allocation schemes.

50 citations


Proceedings Article
12 Sep 2009
TL;DR: This work proposes on-chip bandwidth-adaptive networks to mitigate the performance problems of oblivious routing and the complexity issues of adaptive routing, and describes one implementation of a bandwidth-adaptive network in the form of a two-dimensional mesh with adaptive bidirectional links.
Abstract: Oblivious routing can be implemented on simple router hardware, but network performance suffers when routes become congested. Adaptive routing attempts to avoid hot spots by re-routing flows, but requires more complex hardware to determine and configure new routing paths. We propose on-chip bandwidth-adaptive networks to mitigate the performance problems of oblivious routing and the complexity issues of adaptive routing. In a bandwidth-adaptive network, the bisection bandwidth of the network can adapt to changing network conditions. We describe one implementation of a bandwidth-adaptive network in the form of a two-dimensional mesh with adaptive bidirectional links, where the bandwidth of the link in one direction can be increased at the expense of the other direction. Efficient local intelligence is used to reconfigure each link, and this reconfiguration can be done very rapidly in response to changing traffic demands. We compare the hardware designs of a unidirectional and bidirectional link and evaluate the performance gains provided by a bandwidth-adaptive network in comparison to a conventional network under uniform and bursty traffic when oblivious routing is used.
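The idea of trading bandwidth between the two directions of a link can be sketched as a purely local decision. The abstract does not specify the arbitration policy, so everything below is an assumption for illustration: the function name `allocate_lanes`, the proportional split by queued demand, and the rule that each direction always keeps at least `min_each` lanes.

```python
def allocate_lanes(total_lanes, demand_fwd, demand_bwd, min_each=1):
    """Split a bidirectional link's lanes between its two directions.

    Hypothetical policy: lanes are divided in proportion to the
    pending demand on each side, but neither direction drops below
    `min_each` lanes so that traffic can always make progress.
    """
    if total_lanes < 2 * min_each:
        raise ValueError("not enough lanes to serve both directions")
    total_demand = demand_fwd + demand_bwd
    if total_demand == 0:
        fwd = total_lanes // 2  # idle link: fall back to an even split
    else:
        fwd = round(total_lanes * demand_fwd / total_demand)
    # Clamp so the opposite direction keeps its guaranteed minimum.
    fwd = max(min_each, min(total_lanes - min_each, fwd))
    return fwd, total_lanes - fwd


# A burst in the forward direction borrows lanes from the reverse one.
burst_split = allocate_lanes(4, demand_fwd=10, demand_bwd=0)
even_split = allocate_lanes(4, demand_fwd=1, demand_bwd=1)
```

Because the function only looks at the demand on the two ends of a single link, it can be re-evaluated independently per link whenever traffic changes.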

48 citations


Book Chapter
01 Jan 2009
TL;DR: This chapter covers classic elements of logic synthesis for combinational circuits, including basic data structures for Boolean function representation and reasoning, technology-independent logic minimization, technology-dependent circuit optimization, timing analysis, and timing optimization.
Abstract: Publisher Summary Logic synthesis is the process of automatic production of logic components, in particular digital circuits. It is a subject about how to abstract and represent logic circuits, how to manipulate and transform them, and how to analyze and optimize them. Not only does it play a crucial role in the electronic design automation flow, its techniques also find broader applications in formal verification, software synthesis, and other fields. This chapter covers classic elements of logic synthesis for combinational circuits. After introducing basic data structures for Boolean function representation and reasoning, technology-independent logic minimization, technology-dependent circuit optimization, timing analysis, and timing optimization are discussed. Some advanced subjects and important trends are presented as well for further exploration. Logic synthesis is the process that takes place in the transition from the register-transfer level to the transistor level. It bridges the gap between high-level synthesis and physical design automation. Given a digital design at the register-transfer level, logic synthesis transforms it into a gate-level or transistor-level implementation. The highly engineered process explores different ways of implementing a logic function optimal with respect to some desired design constraints. The physical positions and interconnections of the gate layouts are then further determined at the time of physical design.

26 citations


01 Jan 2009
TL;DR: Path-based, Randomized, Oblivious, Minimal routing (PROM) is a family of oblivious, minimal, path-diverse routing algorithms especially suitable for network-on-chip applications with n x n mesh geometry.
Abstract: Path-based, Randomized, Oblivious, Minimal routing (PROM) is a family of oblivious, minimal, path-diverse routing algorithms especially suitable for Network-on-Chip applications with n x n mesh geometry. Rather than choosing among all possible paths at the source node, PROM algorithms achieve the same effect progressively through efficient, local randomized decisions at each hop. Routing is deadlock-free in all PROM algorithms when the routers have at least two virtual channels. While the approach we present can be viewed as a generalization of both ROMM and O1TURN routing, it combines the low-hardware cost of O1TURN with the routing diversity offered by the most complex n-phase ROMM schemes. As all PROM algorithms employ the same hardware, a wide range of routing behaviors, from O1TURN-equivalent to uniformly path-diverse, can be effected by adjusting just one parameter, even while the network is live and continues to forward packets. Detailed simulation on a set of benchmarks indicates that, on equivalent hardware, the performance of PROM algorithms compares favorably to existing oblivious routing algorithms, including dimension-ordered routing, two-phase ROMM, and O1TURN.

22 citations


18 Aug 2009
TL;DR: Exclusive Dynamic VCA is presented, an oblivious virtual channel allocation scheme which combines the performance advantages of dynamic virtual allocation with in-network, deadlock-free in-order delivery, and reduces head-of-line blocking.
Abstract: In-order packet delivery, a critical abstraction for many higher-level protocols, can severely limit the performance potential in low-latency networks (common, for example, in network-on-chip designs with many cores). While basic variants of dimension-order routing guarantee in-order delivery, improving performance by adding multiple dynamically allocated virtual channels or using other routing schemes compromises this guarantee. Although this can be addressed by reordering out-of-order packets at the destination core, such schemes incur significant overheads, and, in the worst case, raise the specter of deadlock or require expensive retransmission. We present Exclusive Dynamic VCA, an oblivious virtual channel allocation scheme which combines the performance advantages of dynamic virtual allocation with in-network, deadlock-free in-order delivery. At the same time, our scheme reduces head-of-line blocking, often significantly improving throughput compared to equivalent baseline (out-of-order) dimension-order routing when multiple virtual channels are used, and so may be desirable even when in-order delivery is not required. Implementation requires only minor, inexpensive changes to traditional oblivious dimension-order router architectures, more than offset by the removal of packet reorder buffers and logic.

19 citations


Proceedings Article
12 Dec 2009
TL;DR: Detailed simulation on a set of benchmarks indicates that, on equivalent hardware, the performance of PROM algorithms compares favorably to existing oblivious routing algorithms, including dimension-ordered routing, two-phase ROMM, and O1TURN.
Abstract: Path-based, Randomized, Oblivious, Minimal routing (PROM) is a family of oblivious, minimal, path-diverse routing algorithms especially suitable for Network-on-Chip applications with n x n mesh geometry. Rather than choosing among all possible paths at the source node, PROM algorithms achieve the same effect progressively through efficient, local randomized decisions at each hop. Routing is deadlock-free in all PROM algorithms when the routers have at least two virtual channels. While the approach we present can be viewed as a generalization of both ROMM and O1TURN routing, it combines the low-hardware cost of O1TURN with the routing diversity offered by the most complex n-phase ROMM schemes. As all PROM algorithms employ the same hardware, a wide range of routing behaviors, from O1TURN-equivalent to uniformly path-diverse, can be effected by adjusting just one parameter, even while the network is live and continues to forward packets. Detailed simulation on a set of benchmarks indicates that, on equivalent hardware, the performance of PROM algorithms compares favorably to existing oblivious routing algorithms, including dimension-ordered routing, two-phase ROMM, and O1TURN.
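The "local randomized decisions at each hop" can be sketched as follows. The abstract does not give PROM's probability function or its tunable parameter, so the weighting used here is an assumption chosen for illustration: at each hop the packet moves along x with probability proportional to the remaining x distance, which makes every minimal path equally likely. The names `next_hop` and `route` are hypothetical, and virtual-channel allocation (needed for the deadlock-freedom claim) is not modeled.

```python
import random


def next_hop(cur, dst, rng):
    """Pick the next node on a minimal path using only local state."""
    x, y = cur
    tx, ty = dst
    rem_x, rem_y = abs(tx - x), abs(ty - y)
    if rem_x == 0 and rem_y == 0:
        return cur  # already at the destination
    # Assumed weighting: move in x with probability
    # rem_x / (rem_x + rem_y). When one dimension is exhausted the
    # choice becomes deterministic, so the path stays minimal.
    if rng.random() * (rem_x + rem_y) < rem_x:
        return (x + (1 if tx > x else -1), y)
    return (x, y + (1 if ty > y else -1))


def route(src, dst, rng=None):
    """Trace a full path by repeating the per-hop decision."""
    rng = rng or random.Random()
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, rng))
    return path


# Example: one randomly chosen minimal path across a mesh.
example_path = route((0, 0), (3, 2), random.Random(0))
```

Every path produced this way has exactly the Manhattan distance in hops; a different weighting in `next_hop` would skew the distribution toward particular paths without changing the hardware-level decision structure.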

16 citations


01 Nov 2009
TL;DR: The practical use of this work is demonstrated by applying these strategies, and combinations thereof, to implement several different parallelizations of a multicore H.264 encoder for high-definition video.
Abstract: We describe four partitioning strategies, or patterns, used to decompose a serial application into multiple concurrently executing parts. These partitioning strategies augment the commonly used task and data parallel patterns by recognizing that applications are spatiotemporal in nature. Therefore, data and instruction decomposition are further distinguished by whether the partitioning is done in the spatial or in temporal dimension. Thus, we arrive at four decomposition strategies: spatial data partitioning (SDP), temporal data partitioning (TDP), spatial instruction partitioning (SIP), and temporal instruction partitioning (TIP), and catalog the benefits and drawbacks of each. In addition, the practical use of this work is demonstrated by applying these strategies, and combinations thereof, to implement several different parallelizations of a multicore H.264 encoder for high-definition video.

6 citations


16 Jun 2009
TL;DR: This work presents four partitioning strategies, or design patterns, useful for decomposing a serial application into multiple concurrently executing parts, while cataloging the benefits and drawbacks of each and combining these strategies to realize the benefits of multiple patterns in the same program.
Abstract: This work presents four partitioning strategies, or design patterns, useful for decomposing a serial application into multiple concurrently executing parts. These partitioning strategies augment the commonly used task and data parallel design patterns by recognizing that applications are spatiotemporal in nature. Therefore, data and instruction decomposition are further distinguished by whether the partitioning is done in the spatial or in temporal dimension. Thus, this work describes four decomposition strategies: spatial data partitioning (SDP), temporal data partitioning (TDP), spatial instruction partitioning (SIP), and temporal instruction partitioning (TIP), while cataloging the benefits and drawbacks of each. These strategies can be combined to realize the benefits of multiple patterns in the same program. The practical use of the partitioning strategies is demonstrated through a case study which implements several different parallelizations of a multicore H.264 encoder for HD video. This case study illustrates the application of the patterns, their effects on the performance of the encoder, and the combination of multiple strategies in a single program.
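The distinction between the spatial and temporal dimensions can be illustrated for the two data-partitioning patterns. This is a rough sketch under stated assumptions, not the paper's encoder: a video is modeled as a list of frames, each frame as a list of rows, and the function names are hypothetical. SDP is read here as giving each worker a region of every frame, and TDP as giving each worker whole frames; the two instruction-partitioning patterns (SIP, TIP) are not shown.

```python
def spatial_data_partition(frames, workers):
    """SDP sketch: every worker gets the same band of rows of each frame."""
    assignment = {w: [] for w in range(workers)}
    for f, frame in enumerate(frames):
        band = -(-len(frame) // workers)  # ceiling division
        for w in range(workers):
            rows = frame[w * band:(w + 1) * band]
            assignment[w].append((f, rows))
    return assignment


def temporal_data_partition(frames, workers):
    """TDP sketch: every worker gets whole frames, dealt round-robin."""
    assignment = {w: [] for w in range(workers)}
    for f, frame in enumerate(frames):
        assignment[f % workers].append((f, frame))
    return assignment


# Three tiny "frames" of four labeled rows each.
frames = [[f"f{f}r{r}" for r in range(4)] for f in range(3)]
sdp = spatial_data_partition(frames, workers=2)
tdp = temporal_data_partition(frames, workers=2)
```

With two workers, `sdp[0]` holds the top half of all three frames, while `tdp[0]` holds frames 0 and 2 in full.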

Book Chapter
14 May 2009
TL;DR: partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding.
Abstract: Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise sequence alignment tools in the most difficult low sequence homology case and improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families. partiFold-Align is available at http://partiFold.csail.mit.edu.