Author

S. Ogrenci Memik

Other affiliations: University of California, Los Angeles

Bio: S. Ogrenci Memik is an academic researcher from Northwestern University. The author has contributed to research on dynamic priority scheduling and scheduling (computing), has an h-index of 5, and has co-authored 6 publications receiving 392 citations. Previous affiliations of S. Ogrenci Memik include the University of California, Los Angeles.

Papers

Journal Article

Instruction generation for hybrid reconfigurable systems

Ryan Kastner¹, Adam Kaplan², S. Ogrenci Memik², Elaheh Bozorgzadeh²

University of California, Santa Barbara¹, University of California, Los Angeles²

01 Oct 2002-ACM Transactions on Design Automation of Electronic Systems

TL;DR: This work presents an algorithm for simultaneous template generation and matching, which can be applied to any type of graph, including directed graphs and hypergraphs, and targets the strategically programmable system.

Abstract: Future computing systems need to balance flexibility, specialization, and performance in order to meet market demands and the computing power required by new applications. Instruction generation is a vital component for determining these trade-offs. In this work, we present theory and an algorithm for instruction generation. The algorithm profiles a dataflow graph and iteratively contracts edges to create the templates. We discuss how to target the algorithm toward the novel problem of instruction generation for hybrid reconfigurable systems. In particular, we target the Strategically Programmable System, which embeds complex computational units such as ALUs, IP blocks, and so on into a configurable fabric. We argue that an essential compilation step for these systems is instruction generation, as it is needed to specify the functionality of the embedded computational units. In addition, instruction generation can be used to create soft reconfigurable macros: tightly sequenced prespecified operations placed in the reconfigurable fabric.

210 citations

Proceedings Article

Temperature-aware resource allocation and binding in high-level synthesis

Rajarshi Mukherjee¹, S. Ogrenci Memik¹, Gokhan Memik¹

Northwestern University¹

13 Jun 2005

TL;DR: Two temperature-aware resource allocation and binding algorithms that aim to minimize the maximum temperature that can be reached by a resource in a design and have an impact on the prevention of hot spots are developed.

Abstract: Physical phenomena such as temperature play an increasingly important role in the performance and reliability of modern process technologies. This trend will only strengthen with future generations. Attempts to minimize the design effort required for reaching closure on reliability and performance constraints agree that higher levels of design abstraction need to be made aware of lower-level physical phenomena. In this paper, we investigated techniques to incorporate temperature awareness into high-level synthesis. Specifically, we developed two temperature-aware resource allocation and binding algorithms that aim to minimize the maximum temperature that can be reached by a resource in a design. Such a control scheme will have an impact on the prevention of hot spots, which in turn are one of the major hurdles to reliability for future integrated circuits. Our algorithms are able to reduce the maximum temperature attained by any module in a design by up to 19.6 °C compared to a binding that optimizes switching power.

63 citations

Journal Article

Routability-driven packing: metrics and algorithms for cluster-based FPGAs

Eli Bozorgzadeh¹, S. Ogrenci Memik², X. Yang, Majid Sarrafzadeh³

University of California, Irvine¹, Northwestern University², University of California, Los Angeles³

01 Feb 2004-Journal of Circuits, Systems, and Computers

TL;DR: A routability-driven clustering method for cluster-based FPGAs that packs LUTs into logic clusters while incorporating routability metrics into a cost function and integrates the routability model into a timing-driven packing algorithm.

Abstract: Most of an FPGA's area and delay are due to routing. Considering routability at earlier steps of the CAD flow would yield both better quality and a faster design process. In this paper, we discuss the metrics that affect routability in packing logic into clusters. We present a routability-driven clustering method for cluster-based FPGAs. Our method packs LUTs into logic clusters while incorporating routability metrics into a cost function. Based on our routability model, we analyze routability in a timing-driven packing algorithm. We integrate our routability model into a timing-driven packing algorithm. Our method yields up to 50% improvement in terms of the minimum number of routing tracks compared to VPack (16.5% on average). The average routing area improvement is 27% over VPack and 12% over t-VPack.

51 citations

Proceedings Article

A super-scheduler for embedded reconfigurable systems

S. Ogrenci Memik¹, Elaheh Bozorgzadeh¹, Ryan Kastner¹, Majid Sarrafzadeh¹

University of California, Los Angeles¹

04 Nov 2001

TL;DR: An algorithm, referred to as a super-scheduler, that performs simultaneous scheduling and binding for embedded reconfigurable systems and trades off maximal utilization of the high-performance embedded blocks against exploiting parallelism in the schedule.

Abstract: Emerging reconfigurable systems attain high performance with embedded optimized cores. Mapping designs onto such special architectures requires synthesis tools that are aware of the special capabilities of the underlying architecture. In this paper, we propose an algorithm to perform simultaneous scheduling and binding, targeting embedded reconfigurable systems. Our algorithm differs from traditional scheduling methods in its capability of efficiently utilizing embedded blocks within the reconfigurable system. Our algorithm can be used to implement several other scheduling techniques, such as ASAP, ALAP, and list scheduling. Hence we refer to it as a super-scheduler. Our algorithm is a path-based scheduling algorithm. At each step, an individual path from the input DFG is scheduled. Our experiments with several DFGs extracted from the MediaBench suite indicate promising results. Our scheduler is able to perform the trade-off between maximally utilizing the high-performance embedded blocks and exploiting parallelism in the schedule.

50 citations

Journal Article

An ILP formulation for the task graph scheduling problem tailored to bi-dimensional reconfigurable architectures

F. Redaelli¹, Marco D. Santambrogio², S. Ogrenci Memik³

Polytechnic University of Milan¹, Massachusetts Institute of Technology², Northwestern University³

01 Jan 2009-International Journal of Reconfigurable Computing

TL;DR: An exact ILP formulation for the task scheduling problem on a 2D dynamically and partially reconfigurable architecture is proposed, together with a reconfiguration-aware heuristic scheduler that exploits configuration prefetching, module reuse, and antifragmentation techniques.

Abstract: This work proposes an exact ILP formulation for the task scheduling problem on a 2D dynamically and partially reconfigurable architecture. Our approach takes into account the physical constraints of the target device that are relevant for reconfiguration. Specifically, we consider the limited number of reconfigurators, which are used to reconfigure the device. This work also proposes a reconfiguration-aware heuristic scheduler, which exploits configuration prefetching, module reuse, and antifragmentation techniques. We experimented with a system employing two reconfigurators. This work also extends the ILP formulation to a HW/SW codesign scenario. A heuristic scheduler for this extension has been developed too. These systems can be easily implemented using standard FPGAs. Our approach is able to improve the schedule quality by 8.76% on average (22.22% in the best case). Furthermore, our heuristic scheduler obtains the optimal schedule length in 60% of the considered cases. Our extended analysis demonstrated that HW/SW codesign can indeed lead to significantly better results. Our experiments show that by using our proposed HW/SW codesign method, the schedule length of applications can be reduced by a factor of 2 in the best case.

18 citations

Cited by

Proceedings Article

Automatic application-specific instruction-set extensions under microarchitectural constraints

Kubilay Atasu¹, Laura Pozzi², Paolo Ienne²

Boğaziçi University¹, École Polytechnique Fédérale de Lausanne²

02 Jun 2003

TL;DR: In this article, a more general algorithm which selects maximal speedup convex subgraphs of the application dataflow graph under fundamental micro-architectural constraints is presented, which improves significantly on the state of the art.

Abstract: Many commercial processors now offer the possibility of extending their instruction set for a specific application - that is, to introduce customized functional units. There is a need to develop algorithms that decide automatically, from high-level application code, which operations are to be carried out in the customized extensions. A few algorithms exist but are severely limited in the type of operation clusters they can choose and hence reduce significantly the effectiveness of specialization. In this paper, we introduce a more general algorithm which selects maximal-speedup convex subgraphs of the application dataflow graph under fundamental microarchitectural constraints, and which improves significantly on the state of the art.

355 citations

Proceedings Article

Application-specific instruction generation for configurable processor architectures

Jason Cong¹, Yiping Fan¹, Guoling Han¹, Zhiru Zhang¹

University of California, Los Angeles¹

22 Feb 2004

TL;DR: A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor.

Abstract: Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time, code size, etc.) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).

255 citations

Proceedings Article

Processor acceleration through automated instruction set customization

Nathan Clark¹, Hongtao Zhong¹, Scott Mahlke¹

University of Michigan¹

03 Dec 2003

TL;DR: This paper presents the design of a system to automate the instruction set customization process, which contains a compiler subgraph matching framework that identifies opportunities to exploit and generalize the hardware to support more computation graphs.

Abstract: Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications. Hardware, in the form of new function units (or co-processors), and the corresponding instructions, are added to a baseline processor to meet the critical computational demands of a target application. The central challenge with this approach is the large degree of human effort required to identify and create the custom hardware units, as well as porting the application to the extended processor. In this paper, we present the design of a system to automate the instruction set customization process. A dataflow graph design space exploration engine efficiently identifies profitable computation subgraphs from which to create custom hardware, without artificially constraining their size or shape. The system also contains a compiler subgraph matching framework that identifies opportunities to exploit and generalize the hardware to support more computation graphs. We demonstrate the effectiveness of this system across a range of application domains and study the applicability of the custom hardware across the domain.

218 citations

Journal Article

Exact and approximate algorithms for the extension of embedded processor instruction sets

Laura Pozzi¹, Kubilay Atasu², Paolo Ienne²

École Polytechnique¹, École Polytechnique Fédérale de Lausanne²

01 Jul 2006-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: In this paper, a set of algorithms are proposed to find the best instruction set extensions (ISEs) for a given application, based on a detailed analysis of the application code.

Abstract: In embedded computing, cost, power, and performance constraints call for the design of specialized processors, rather than for the use of the existing off-the-shelf solutions. While the design of these application-specific CPUs could be tackled from scratch, a cheaper and more effective option is that of extending the existing processors and toolchains. Extensibility is indeed a feature now offered in real designs, e.g., by processors such as Tensilica Xtensa [T. R. Halfhill, Microprocess Rep., 2003], ARC ARCtangent [T. R. Halfhill, Microprocess Rep., 2000], STMicroelectronics ST200 [P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Proc. 27th Annu. Int. Symp. Computer Architecture, 2000, p. 203], and MIPS CorExtend [T. R. Halfhill, Microprocess Rep., 2003]. While all these processors provide development environments with simulation capabilities for efficiently evaluating hand-crafted solutions, the tools to automatically identify the best processor configuration for a given application are less common. In particular, solutions to choose specialized instruction-set extensions (ISEs) have been investigated in the past years but are still seldom part of commercial toolchains. This paper provides a formal methodology and a set of algorithms that help address the problem. It proposes exact algorithms to derive optimal ISEs; exact identification of a single ISE is applicable to basic blocks of up to 1500 assembler-like instructions. This paper also introduces approximate methods that can process basic blocks of larger size. Results show that the described algorithms find solutions close to those that a designer would obtain by a detailed study of the application code. Both heuristic and exact algorithms find ISEs able to speed up unextended processors up to 5.0x. State-of-the-art comparisons show that the presented algorithms outperform existing ones by up to 2.6x.

212 citations

Journal Article

Instruction generation for hybrid reconfigurable systems