Author
Bulent Abali
Bio: Bulent Abali is an academic researcher from IBM. The author has contributed to research in topics: Memory management & Cache. The author has an hindex of 30, co-authored 119 publications receiving 3483 citations.
Papers published on a yearly basis
Papers
More filters
••
IBM1
TL;DR: Start-Gap is proposed, a simple, novel, and effective wear-leveling technique that uses only two registers that boosts the achievable lifetime of the baseline 16 GB PCM-based system from 5% to 97% of the theoretical maximum, while incurring a total storage overhead of less than 13 bytes and obviating the latency overhead of accessing large tables.
Abstract: Phase Change Memory (PCM) is an emerging memory technology that can increase main memory capacity in a cost-effective and power-efficient manner. However, PCM cells can endure only a maximum of 107 - 108 writes, making a PCM based system have a lifetime of only a few years under ideal conditions. Furthermore, we show that non-uniformity in writes to different cells reduces the achievable lifetime of PCM system by 20x. Writes to PCM cells can be made uniform with Wear-Leveling. Unfortunately, existing wear-leveling techniques require large storage tables and indirection, resulting in significant area and latency overheads. We propose Start-Gap, a simple, novel, and effective wear-leveling technique that uses only two registers. By combining Start-Gap with simple address-space randomization techniques we show that the achievable lifetime of the baseline 16GB PCM-based system is boosted from 5% (with no wear-leveling) to 97% of the theoretical maximum, while incurring a total storage overhead of less than 13 bytes and obviating the latency overhead of accessing large tables. We also analyze the security vulnerabilities for memory systems that have limited write endurance, showing that under adversarial settings, a PCM-based system can fail in less than one minute. We provide a simple extension to Start-Gap that makes PCM-based systems robust to such malicious attacks.
782 citations
••
28 Jun 2006TL;DR: A case for HPC with virtual machines is presented by introducing a framework which addresses the performance and management overhead associated with VM-based computing and shows that HPC applications can achieve almost the same performance as those running in a native, non-virtualized environment.
Abstract: Virtual machine (VM) technologies are experiencing a resurgence in both industry and research communities. VMs offer many desirable features such as security, ease of management, OS customization, performance isolation, check-pointing, and migration, which can be very beneficial to the performance and the manageability of high performance computing (HPC) applications. However, very few HPC applications are currently running in a virtualized environment due to the performance overhead of virtualization. Further, using VMs for HPC also introduces additional challenges such as management and distribution of OS images.In this paper we present a case for HPC with virtual machines by introducing a framework which addresses the performance and management overhead associated with VM-based computing. Two key ideas in our design are: Virtual Machine Monitor (VMM) bypass I/O and scalable VM image management. VMM-bypass I/O achieves high communication performance for VMs by exploiting the OS-bypass feature of modern high speed interconnects such as Infini-Band. Scalable VM image management significantly reduces the overhead of distributing and managing VMs in large scale clusters. Our current implementation is based on the Xen VM environment and InfiniBand. However, many of our ideas are readily applicable to other VM environments and high speed interconnects.We carry out detailed analysis on the performance and management overhead of our VM-based HPC framework. Our evaluation shows that HPC applications can achieve almost the same performance as those running in a native, non-virtualized environment. Therefore, our approach holds promise to bring the benefits of VMs to HPC applications with very little degradation in performance.
352 citations
•
30 May 2006TL;DR: VMM-bypass allows time-critical I/O operations to be carried out directly in guest VMs without involvement of the VMM and/or a privileged VM by exploiting the intelligence found in modern high speed network interfaces.
Abstract: Currently, I/O device virtualization models in virtual machine (VM) environments require involvement of a virtual machine monitor (VMM) and/or a privileged VM for each I/O operation, which may turn out to be a performance bottleneck for systems with high I/O demands, especially those equipped with modern high speed interconnects such as InfiniBand.
In this paper, we propose a new device virtualization model called VMM-bypass I/O, which extends the idea of OS-bypass originated from user-level communication. Essentially, VMM-bypass allows time-critical I/O operations to be carried out directly in guest VMs without involvement of the VMM and/or a privileged VM. By exploiting the intelligence found in modern high speed network interfaces, VMM-bypass can significantly improve I/O and communication performance for VMs without sacrificing safety or isolation.
To demonstrate the idea of VMM-bypass, we have developed a prototype called Xen-IB, which offers InfiniBand virtualization support in the Xen 3.0 VM environment. Xen-IB runs with current InfiniBand hardware and does not require modifications to existing user-level applications or kernel-level drivers that use InfiniBand. Our performance measurements show that Xen-IB is able to achieve nearly the same raw performance as the original InfiniBand driver running in a non-virtualized environment.
330 citations
•
IBM1
TL;DR: In this paper, an electronic tag is equipped with a sensor which determines dynamic properties of a product when the tag is activated, such as the temperature of the product or the expiration date.
Abstract: A method and apparatus for reporting dynamic properties of a product using radio frequency identification device technology. With this invention, an electronic tag is equipped with a sensor which determines dynamic properties of a product when the tag is activated. The dynamic properties of the product are then either further processed into other dynamic properties. In any event either the former or the latter dynamic properties are then transmitted from the tag. Such dynamic properties could be the temperature of a product or the expiration date of the product derived from periodic measurements of the temperature of the product.
132 citations
••
IBM1
TL;DR: Results show that the hardware compression of main memory has a negligible penalty compared to an uncompressed main memory, and for memory-starved applications it increases performance significantly, and the memory content of an application can usually be compressed by a factor of 2.
Abstract: A novel memory subsystem called Memory Expansion Technology (MXT) has been built for fast hardware compression of main-memory content. This allows a memory expansion to present a "real" memory larger than the physically available memory. This paper provides an overview of the memory-compression architecture, its OS support under Linux and Windows®, and an analysis of the performance impact of memory compression. Results show that the hardware compression of main memory has a negligible penalty compared to an uncompressed main memory, and for memory-starved applications it increases performance significantly. We also show that the memory content of an application can usually be compressed by a factor of 2.
106 citations
Cited by
More filters
•
01 Jan 2004
TL;DR: This book offers a detailed and comprehensive presentation of the basic principles of interconnection network design, clearly illustrating them with numerous examples, chapter exercises, and case studies, allowing a designer to see all the steps of the process from abstract design to concrete implementation.
Abstract: One of the greatest challenges faced by designers of digital systems is optimizing the communication and interconnection between system components. Interconnection networks offer an attractive and economical solution to this communication crisis and are fast becoming pervasive in digital systems. Current trends suggest that this communication bottleneck will be even more problematic when designing future generations of machines. Consequently, the anatomy of an interconnection network router and science of interconnection network design will only grow in importance in the coming years.
This book offers a detailed and comprehensive presentation of the basic principles of interconnection network design, clearly illustrating them with numerous examples, chapter exercises, and case studies. It incorporates hardware-level descriptions of concepts, allowing a designer to see all the steps of the process from abstract design to concrete implementation.
·Case studies throughout the book draw on extensive author experience in designing interconnection networks over a period of more than twenty years, providing real world examples of what works, and what doesn't.
·Tightly couples concepts with implementation costs to facilitate a deeper understanding of the tradeoffs in the design of a practical network.
·A set of examples and exercises in every chapter help the reader to fully understand all the implications of every design decision.
Table of Contents
Chapter 1 Introduction to Interconnection Networks
1.1 Three Questions About Interconnection Networks
1.2 Uses of Interconnection Networks
1.3 Network Basics
1.4 History
1.5 Organization of this Book
Chapter 2 A Simple Interconnection Network
2.1 Network Specifications and Constraints
2.2 Topology
2.3 Routing
2.4 Flow Control
2.5 Router Design
2.6 Performance Analysis
2.7 Exercises
Chapter 3 Topology Basics
3.1 Nomenclature
3.2 Traffic Patterns
3.3 Performance
3.4 Packaging Cost
3.5 Case Study: The SGI Origin 2000
3.6 Bibliographic Notes
3.7 Exercises
Chapter 4 Butterfly Networks
4.1 The Structure of Butterfly Networks
4.2 Isomorphic Butterflies
4.3 Performance and Packaging Cost
4.4 Path Diversity and Extra Stages
4.5 Case Study: The BBN Butterfly
4.6 Bibliographic Notes
4.7 Exercises
Chapter 5 Torus Networks
5.1 The Structure of Torus Networks
5.2 Performance
5.3 Building Mesh and Torus Networks
5.4 Express Cubes
5.5 Case Study: The MIT J-Machine
5.6 Bibliographic Notes
5.7 Exercises
Chapter 6 Non-Blocking Networks
6.1 Non-Blocking vs. Non-Interfering Networks
6.2 Crossbar Networks
6.3 Clos Networks
6.4 Benes Networks
6.5 Sorting Networks
6.6 Case Study: The Velio VC2002 (Zeus) Grooming Switch
6.7 Bibliographic Notes
6.8 Exercises
Chapter 7 Slicing and Dicing
7.1 Concentrators and Distributors
7.2 Slicing and Dicing
7.3 Slicing Multistage Networks
7.4 Case Study: Bit Slicing in the Tiny Tera
7.5 Bibliographic Notes
7.6 Exercises
Chapter 8 Routing Basics
8.1 A Routing Example
8.2 Taxonomy of Routing Algorithms
8.3 The Routing Relation
8.4 Deterministic Routing
8.5 Case Study: Dimension-Order Routing in the Cray T3D
8.6 Bibliographic Notes
8.7 Exercises
Chapter 9 Oblivious Routing
9.1 Valiant's Randomized Routing Algorithm
9.2 Minimal Oblivious Routing
9.3 Load-Balanced Oblivious Routing
9.4 Analysis of Oblivious Routing
9.5 Case Study: Oblivious Routing in the
Avici Terabit Switch Router(TSR)
9.6 Bibliographic Notes
9.7 Exercises
Chapter 10 Adaptive Routing
10.1 Adaptive Routing Basics
10.2 Minimal Adaptive Routing
10.3 Fully Adaptive Routing
10.4 Load-Balanced Adaptive Routing
10.5 Search-Based Routing
10.6 Case Study: Adaptive Routing in the
Thinking Machines CM-5
10.7 Bibliographic Notes
10.8 Exercises
Chapter 11 Routing Mechanics
11.1 Table-Based Routing
11.2 Algorithmic Routing
11.3 Case Study: Oblivious Source Routing in the
IBM Vulcan Network
11.4 Bibliographic Notes
11.5 Exercises
Chapter 12 Flow Control Basics
12.1 Resources and Allocation Units
12.2 Bufferless Flow Control
12.3 Circuit Switching
12.4 Bibliographic Notes
12.5 Exercises
Chapter 13 Buffered Flow Control
13.1 Packet-Buffer Flow Control
13.2 Flit-Buffer Flow Control
13.3 Buffer Management and Backpressure
13.4 Flit-Reservation Flow Control
13.5 Bibliographic Notes
13.6 Exercises
Chapter 14 Deadlock and Livelock
14.1 Deadlock
14.2 Deadlock Avoidance
14.3 Adaptive Routing
14.4 Deadlock Recovery
14.5 Livelock
14.6 Case Study: Deadlock Avoidance in the Cray T3E
14.7 Bibliographic Notes
14.8 Exercises
Chapter 15 Quality of Service
15.1 Service Classes and Service Contracts
15.2 Burstiness and Network Delays
15.3 Implementation of Guaranteed Services
15.4 Implementation of Best-Effort Services
15.5 Separation of Resources
15.6 Case Study: ATM Service Classes
15.7 Case Study: Virtual Networks in the Avici TSR
15.8 Bibliographic Notes
15.9 Exercises
Chapter 16 Router Architecture
16.1 Basic Router Architecture
16.2 Stalls
16.3 Closing the Loop with Credits
16.4 Reallocating a Channel
16.5 Speculation and Lookahead
16.6 Flit and Credit Encoding
16.7 Case Study: The Alpha 21364 Router
16.8 Bibliographic Notes
16.9 Exercises
Chapter 17 Router Datapath Components
17.1 Input Buffer Organization
17.2 Switches
17.3 Output Organization
17.4 Case Study: The Datapath of the IBM Colony
Router
17.5 Bibliographic Notes
17.6 Exercises
Chapter 18 Arbitration
18.1 Arbitration Timing
18.2 Fairness
18.3 Fixed Priority Arbiter
18.4 Variable Priority Iterative Arbiters
18.5 Matrix Arbiter
18.6 Queuing Arbiter
18.7 Exercises
Chapter 19 Allocation
19.1 Representations
19.2 Exact Algorithms
19.3 Separable Allocators
19.4 Wavefront Allocator
19.5 Incremental vs. Batch Allocation
19.6 Multistage Allocation
19.7 Performance of Allocators
19.8 Case Study: The Tiny Tera Allocator
19.9 Bibliographic Notes
19.10 Exercises
Chapter 20 Network Interfaces
20.1 Processor-Network Interface
20.2 Shared-Memory Interface
20.3 Line-Fabric Interface
20.4 Case Study: The MIT M-Machine Network Interface
20.5 Bibliographic Notes
20.6 Exercises
Chapter 21 Error Control 411
21.1 Know Thy Enemy: Failure Modes and Fault Models
21.2 The Error Control Process: Detection, Containment,
and Recovery
21.3 Link Level Error Control
21.4 Router Error Control
21.5 Network-Level Error Control
21.6 End-to-end Error Control
21.7 Bibliographic Notes
21.8 Exercises
Chapter 22 Buses
22.1 Bus Basics
22.2 Bus Arbitration
22.3 High Performance Bus Protocol
22.4 From Buses to Networks
22.5 Case Study: The PCI Bus
22.6 Bibliographic Notes
22.7 Exercises
Chapter 23 Performance Analysis
23.1 Measures of Interconnection Network Performance
23.2 Analysis
23.3 Validation
23.4 Case Study: Efficiency and Loss in the
BBN Monarch Network
23.5 Bibliographic Notes
23.6 Exercises
Chapter 24 Simulation
24.1 Levels of Detail
24.2 Network Workloads
24.3 Simulation Measurements
24.4 Simulator Design
24.5 Bibliographic Notes
24.6 Exercises
Chapter 25 Simulation Examples 495
25.1 Routing
25.2 Flow Control Performance
25.3 Fault Tolerance
Appendix A Nomenclature
Appendix B Glossary
Appendix C Network Simulator
3,233 citations
••
18 May 2009TL;DR: This work presents Eucalyptus -- an open-source software framework for cloud computing that implements what is commonly referred to as Infrastructure as a Service (IaaS); systems that give users the ability to run and control entire virtual machine instances deployed across a variety physical resources.
Abstract: Cloud computing systems fundamentally provide access to large pools of data and computational resources through a variety of interfaces similar in spirit to existing grid and HPC resource management and programming systems. These types of systems offer a new programming target for scalable application developers and have gained popularity over the past few years. However, most cloud computing systems in operation today are proprietary, rely upon infrastructure that is invisible to the research community, or are not explicitly designed to be instrumented and modified by systems researchers. In this work, we present Eucalyptus -- an open-source software framework for cloud computing that implements what is commonly referred to as Infrastructure as a Service (IaaS); systems that give users the ability to run and control entire virtual machine instances deployed across a variety physical resources. We outline the basic principles of the Eucalyptus design, detail important operational aspects of the system, and discuss architectural trade-offs that we have made in order to allow Eucalyptus to be portable, modular and simple to use on infrastructure commonly found within academic settings. Finally, we provide evidence that Eucalyptus enables users familiar with existing Grid and HPC systems to explore new cloud computing functionality while maintaining access to existing, familiar application development software and Grid middle-ware.
1,962 citations
•
20 Sep 2004
1,387 citations
••
18 Jun 2016TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory, and distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving.
Abstract: Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Prior proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has showed its potential to be used for main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and has been widely studied to accelerate neural network (NN) applications. In this work, we propose a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory. In PRIME, a portion of ReRAM crossbar arrays can be configured as accelerators for NN applications or as normal memory for a larger memory space. We provide microarchitecture and circuit designs to enable the morphable functions with an insignificant area overhead. We also design a software/hardware interface for software developers to implement various NNs on PRIME. Benefiting from both the PIM architecture and the efficiency of using ReRAM for NN computation, PRIME distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving. Our experimental results show that, compared with a state-of-the-art neural processing unit design, PRIME improves the performance by ~2360× and the energy consumption by ~895×, across the evaluated machine learning benchmarks.
1,197 citations
•
08 Mar 2002
TL;DR: An improved endoscopic device which is introduced into the intestinal tract of a living organism and which operates autonomously therein, adapted to obtain and store or transmit one or more types of data such as visual image data, laser autofluorescence data, or ultrasonic waveform data is described in this article.
Abstract: An improved endoscopic device which is introduced into the intestinal tract of a living organism and which operates autonomously therein, adapted to obtain and store or transmit one or more types of data such as visual image data, laser autofluorescence data, or ultrasonic waveform data. In another aspect of the invention, an improved endoscopic device useful for implanting the aforementioned endoscopic smart probe is disclosed. In another aspect of the invention, apparatus for delivering agents including nanostructures, radionuclides, medication, and ligands is disclosed. In another aspect of the invention, apparatus for obtaining a biopsy of intestinal tissue is disclosed. In another aspect of the invention, apparatus for detecting the presence of one or more molecular species within the intestine is disclosed. Methods for inspecting and/or treating the interior regions of the intestinal tract using the aforementioned apparatus are also disclosed.
953 citations