scispace - formally typeset
Search or ask a question
Author

S. Ogrenci Memik

Bio: S. Ogrenci Memik is an academic researcher from Northwestern University. The author has contributed to research in topics: Dynamic priority scheduling & Scheduling (computing). The author has an hindex of 5, co-authored 6 publications receiving 392 citations. Previous affiliations of S. Ogrenci Memik include University of California, Los Angeles.

Papers
More filters
Journal ArticleDOI
TL;DR: This work presents an algorithm for simultaneous template generation and matching, which can be applied to any type of graph, including directed graphs and hypergraphs, and targets the strategically programmable system.
Abstract: Future computing systems need to balance flexibility, specialization, and performance in order to meet market demands and the computing power required by new applications. Instruction generation is a vital component for determining these trade-offs. In this work, we present theory and an algorithm for instruction generation. The algorithm profiles a dataflow graph and iteratively contracts edges to create the templates. We discuss how to target the algorithm toward the novel problem of instruction generation for hybrid reconfigurable systems. In particular, we target the Strategically Programmable System, which embeds complex computational units such as ALUs, IP blocks, and so on into a configurable fabric. We argue that an essential compilation step for these systems is instruction generation, as it is needed to specify the functionality of the embedded computational units. In addition, instruction generation can be used to create soft reconfigurable macros---tightly sequenced prespecified operations placed in the reconfigurable fabric.

210 citations

Proceedings ArticleDOI
13 Jun 2005
TL;DR: Two temperature-aware resource allocation and binding algorithms that aim to minimize the maximum temperature that can be reached by a resource in a design and have an impact on the prevention of hot spots are developed.
Abstract: Physical phenomena such as temperature have an increasingly important role in performance and reliability of modern process technologies. This trend will only strengthen with future generations. Attempts to minimize the design effort required for reaching closure in reliability and performance constraints are agreeing on the fact that higher levels of design abstractions need to be made aware of lower level physical phenomena. In this paper, we investigated techniques to incorporate temperature-awareness into high-level synthesis. Specifically, we developed two temperature-aware resource allocation and binding algorithms that aim to minimize the maximum temperature that can be reached by a resource in a design. Such a control scheme will have an impact on the prevention of hot spots, which in turn is one of the major hurdles in front of reliability for future integrated circuits. Our algorithms are able to reduce the maximum attained temperature by any module in a design by up to 19.6/spl deg/C compared to a binding that optimizes switching power.

63 citations

Journal ArticleDOI
TL;DR: A routability-driven clustering method for cluster-based FPGAs that packs LUTs into logic clusters while incorporating routability metrics into a cost function and integrates the routability model into a timing-driven packing algorithm.
Abstract: Most of the FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would both yield better quality and faster design process. In this paper, we discuss the metrics that affect routability in packing logic into clusters. We are presenting a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, the routability in timing-driven packing algorithm is analyzed. We integrate our routability model into a timing-driven packing algorithm. Our method yields up to 50% improvement in terms of the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.

51 citations

Proceedings ArticleDOI
04 Nov 2001
TL;DR: An algorithm to perform simultaneous scheduling and binding, targeting embedded reconfigurable systems, and is referred to as a super-scheduler, in order to perform the trade-off between maximally utilizing the high-performance embedded blocks and exploiting parallelism in the schedule.
Abstract: Emerging reconfigurable systems attain high peformance with embedded optimized cores. For mapping designs on such special architectures, synthesis tools, that are aware of the special capabilities of the underlying architecture are necessary. In this paper we are proposing an algorithm to perform simultaneous scheduling and binding, targeting embedded reconfigurable systems. Our algorithm differs from traditional scheduling methods in its capability of efficiently utilizing embedded blocks within the reconfigurable system. Our algorithm can be used to implement several other scheduling techniques, such as ASAP, ALAP, and list scheduling. Hence we refer to it as a super-scheduler. Our algorithm is a path-based scheduling algorithm. At each step, an individual path from the input DFG is scheduled. Our experiments with several DFG's extracted from MediaBench suit indicate promising results. Our scheduler presents capability to perform the trade-off between maximally utilizing the high-performance embedded blocks and exploiting parallelism in the schedule.

50 citations

Journal ArticleDOI
TL;DR: An exact ILP formulation for the task scheduling problem on a 2D dynamically and partially reconfigurable architecture and a reconfiguration-aware heuristic scheduler, which exploits configuration prefetching, module reuse, and antifragmentation techniques are proposed.
Abstract: This work proposes an exact ILP formulation for the task scheduling problem on a 2D dynamically and partially reconfigurable architecture. Our approach takes physical constraints of the target device that is relevant for reconfiguration into account. Specifically, we consider the limited number of reconfigurators, which are used to reconfigure the device. This work also proposes a reconfiguration-aware heuristic scheduler, which exploits configuration prefetching, module reuse, and antifragmentation techniques. We experimented with a system employing two reconfigurators. This work also extends the ILP formulation for a HW/SW Codesign scenario. A heuristic scheduler for this extension has been developed too. These systems can be easily implemented using standard FPGAs. Our approach is able to improve the schedule quality by 8.76% on average (22.22% in the best case). Furthermore, our heuristic scheduler obtains the optimal schedule length in 60% of the considered cases. Our extended analysis demonstrated that HW/SW codesign can indeed lead to significantly better results. Our experiments show that by using our proposed HW/SW codesign method, the schedule length of applications can be reduced by a factor of 2 in the best case.

18 citations


Cited by
More filters
Proceedings ArticleDOI
02 Jun 2003
TL;DR: In this article, a more general algorithm which selects maximal speedup convex subgraphs of the application dataflow graph under fundamental micro-architectural constraints is presented, which improves significantly on the state of the art.
Abstract: Many commercial processors now offer the possibility of extending their instruction set for a specific application - that is, to introduce customized functional units. There is a need to develop algorithms that decide automatically, from high-level application code, which operations are to be carried out in the customized extensions. A few algorithms exist but are severely limited in the type of operation clusters they can choose and hence reduce significantly the effectiveness of specialization. In this paper, we introduce a more general algorithm which selects maximal-speedup convex subgraphs of the application dataflow graph under fundamental microarchitectural constraints, and which improves significantly on the state of the art.

355 citations

Proceedings ArticleDOI
22 Feb 2004
TL;DR: A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor.
Abstract: Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time, code size, etc.) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).

255 citations

Proceedings ArticleDOI
03 Dec 2003
TL;DR: This paper presents the design of a system to automate the instruction set customization process, which contains a compiler subgraphmatching framework that identifies opportunities to exploit and generalize the hardware to support more computationgraphs.
Abstract: Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications. Hardware, in the form of new function units (or co-processors), and the corresponding instructions, are added to a baseline processor to meet the critical computational demands of a target application. The central challenge with this approach is the large degree of human effort required to identify and create the custom hardware units, as well as porting the application to the extended processor. In this paper, we present the design of a system to automate the instruction set customization process. A dataflow graph design space exploration engine efficiently identifies profitable computation subgraphs from which to create custom hardware, without artificially constraining their size or shape. The system also contains a compiler subgraph matching framework that identifies opportunities to exploit and generalize the hardware to support more computation graphs. We demonstrate the effectiveness of this system across a range of application domains and study the applicability of the custom hardware across the domain.

218 citations

Journal ArticleDOI
TL;DR: In this paper, a set of algorithms are proposed to find the best instruction set extensions (ISEs) for a given application, based on a detailed analysis of the application code.
Abstract: In embedded computing, cost, power, and performance constraints call for the design of specialized processors, rather than for the use of the existing off-the-shelf solutions. While the design of these application-specific CPUs could be tackled from scratch, a cheaper and more effective option is that of extending the existing processors and toolchains. Extensibility is indeed a feature now offered in real designs, e.g., by processors such as Tensilica Xtensa [T. R. Halfhill, Microprocess Rep., 2003], ARC ARCtangent [T. R. Halfhill, Microprocess Rep., 2000], STMicroelectronics ST200 [P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Proc. 27th Annu. Int. Symp. Computer Architecture, 2000, p. 203], and MIPS CorExtend [T. R. Halfhill, Microprocess Rep., 2003]. While all these processors provide development environments with simulation capabilities for evaluating efficiently hand-crafted solutions, the tools to identify automatically the best processor configuration for a given application are less common. In particular, solutions to choose specialized instruction-set extensions (ISEs) have been investigated in the past years but are still seldom part of commercial toolchains. This paper provides a formal methodology and a set of algorithms that help address the problem. It proposes exact algorithms to derive optimal ISEs; exact identification of a single ISE is applicable to basic blocks of up to 1500 assembler-like instructions. This paper also introduces approximate methods that can process basic blocks of larger size. Results show that the described algorithms find solutions close to those that a designer would obtain by a detailed study of the application code. Both heuristic and exact algorithms find ISEs able to speed up unextended processors up to 5.0x. State-of-the-art comparisons show that the presented algorithms outperform existing ones by up to 2.6x

212 citations

Journal ArticleDOI
TL;DR: This work presents an algorithm for simultaneous template generation and matching, which can be applied to any type of graph, including directed graphs and hypergraphs, and targets the strategically programmable system.
Abstract: Future computing systems need to balance flexibility, specialization, and performance in order to meet market demands and the computing power required by new applications. Instruction generation is a vital component for determining these trade-offs. In this work, we present theory and an algorithm for instruction generation. The algorithm profiles a dataflow graph and iteratively contracts edges to create the templates. We discuss how to target the algorithm toward the novel problem of instruction generation for hybrid reconfigurable systems. In particular, we target the Strategically Programmable System, which embeds complex computational units such as ALUs, IP blocks, and so on into a configurable fabric. We argue that an essential compilation step for these systems is instruction generation, as it is needed to specify the functionality of the embedded computational units. In addition, instruction generation can be used to create soft reconfigurable macros---tightly sequenced prespecified operations placed in the reconfigurable fabric.

210 citations