Showing papers presented at "Virtual Execution Environments in 2005"

PDF

Open Access

Proceedings Article•DOI•

Diagnosing performance overheads in the xen virtual machine environment

[...]

Aravind Menon¹, Jose Renato Santos², Yoshio Turner², Gopalakrishnan Janakiraman², Willy Zwaenepoel¹ - Show less +1 more•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, Hewlett-Packard²

11 Jun 2005

TL;DR: Xenoprof is presented, a system-wide statistical profiling toolkit implemented for the Xen virtual machine environment that will facilitate a better understanding of performance characteristics of Xen's mechanisms allowing the community to optimize the Xen implementation.

...read moreread less

Abstract: Virtual Machine (VM) environments (e.g., VMware and Xen) are experiencing a resurgence of interest for diverse uses including server consolidation and shared hosting. An application's performance in a virtual machine environment can differ markedly from its performance in a non-virtualized environment because of interactions with the underlying virtual machine monitor and other virtual machines. However, few tools are currently available to help debug performance problems in virtual machine environments.In this paper, we present Xenoprof, a system-wide statistical profiling toolkit implemented for the Xen virtual machine environment. The toolkit enables coordinated profiling of multiple VMs in a system to obtain the distribution of hardware events such as clock cycles and cache and TLB misses. The toolkit will facilitate a better understanding of performance characteristics of Xen's mechanisms allowing the community to optimize the Xen implementation.We use our toolkit to analyze performance overheads incurred by networking applications running in Xen VMs. We focus on networking applications since virtualizing network I/O devices is relatively expensive. Our experimental results quantify Xen's performance overheads for network I/O device virtualization in uni- and multi-processor systems. With certain Xen configurations, networking workloads in the Xen environment can suffer significant performance degradation. Our results identify the main sources of this overhead which should be the focus of Xen optimization efforts. We also show how our profiling toolkit was used to uncover and resolve performance bugs that we encountered in our experiments which caused unexpected application behavior.

...read moreread less

571 citations

Proceedings Article•DOI•

The pauseless GC algorithm

[...]

Cliff N. Click¹, Gil Tene¹, Michael A. Wolf¹•Institutions (1)

Azul Systems¹

11 Jun 2005

TL;DR: The Pauseless GC algorithm is presented, the supporting hardware features that enable it, and data on the overhead, efficiency, and pause times when running a sustained workload is presented.

...read moreread less

Abstract: Modern transactional response-time sensitive applications have run into practical limits on the size of garbage collected heaps. The heap can only grow until GC pauses exceed the response-time limits. Sustainable, scalable concurrent collection has become a feature worth paying for.Azul Systems has built a custom system (CPU, chip, board, and OS) specifically to run garbage collected virtual machines. The custom CPU includes a read barrier instruction. The read barrier enables a highly concurrent (no stop-the-world phases), parallel and compacting GC algorithm. The Pauseless algorithm is designed for uninterrupted application execution and consistent mutator throughput in every GC phase.Beyond the basic requirement of collecting faster than the allocation rate, the Pauseless collector is never in a "rush" to complete any GC phase. No phase places an undue burden on the mutators nor do phases race to complete before the mutators produce more work. Portions of the Pauseless algorithm also feature a "self-healing" behavior which limits mutator overhead and reduces mutator sensitivity to the current GC state.We present the Pauseless GC algorithm, the supporting hardware features that enable it, and data on the overhead, efficiency, and pause times when running a sustained workload.

...read moreread less

147 citations

Proceedings Article•DOI•

HyperSpector: virtual distributed monitoring environments for secure intrusion detection

[...]

Kenichi Kourai¹, Shigeru Chiba¹•Institutions (1)

Tokyo Institute of Technology¹

11 Jun 2005

TL;DR: A virtual distributed monitoring environment called HyperSpector is described that achieves secure intrusion detection in distributed computer systems by using virtualization to isolate each IDS from the servers it monitors.

...read moreread less

Abstract: In this paper, a virtual distributed monitoring environment called HyperSpector is described that achieves secure intrusion detection in distributed computer systems. While multiple intrusion detection systems (IDSes) can protect a distributed system from attackers, they can increase the number of insecure points in the protected system. HyperSpector overcomes this problem without any additional hardware by using virtualization to isolate each IDS from the servers it monitors. The IDSes are located in a virtual machine called an IDS VM and the servers are located in a server VM. The IDS VMs among different hosts are connected using a virtual network. To enable legacy IDSes running in the IDS VM to monitor the server VM, HyperSpector provides three inter-VM monitoring mechanisms: software port mirroring, inter-VM disk mounting, and inter-VM process mapping. Consequently, active attacks, which directly attack the IDSes, are prevented. The impact of passive attacks, which wait until data including malicious code is read by an IDS and the IDS becomes compromised, is confined to within an affected HyperSpector environment.

...read moreread less

107 citations

Proceedings Article•DOI•

PDS: a virtual execution environment for software deployment

[...]

Bowen Alpern¹, Joshua S. Auerbach¹, Vasanth Bala¹, Thomas V. Frauenhofer¹, Todd W. Mummert¹, Michael A. Pigott¹ - Show less +2 more•Institutions (1)

IBM¹

11 Jun 2005

TL;DR: The paper presents the design of P DS, motivates its "porous isolation model" with respect to the challenges of software deployment, and presents measurements of PDS's execution characteristics.

...read moreread less

Abstract: The Progressive Deployment System (PDS) is a virtual execution environment and infrastructure designed specifically for deploying software, or "assets", on demand while enabling management from a central location. PDS intercepts a select subset of system calls on the target machine to provide a partial virtualization at the operating system level. This enables an asset's install-time environment to be reproduced virtually while otherwise not isolating the asset from peer applications on the target machine. Asset components, or "shards", are fetched as they are needed (or they may be pre-fetched), enabling the asset to be progressively deployed by overlapping deployment with execution. Cryptographic digests are used to eliminate redundant shards within and among assets, which enables more efficient deployment. A framework is provided for intercepting interfaces above the operating system (e.g., Java class loading), enabling optimizations requiring semantic awareness not present at the OS level. The paper presents the design of PDS, motivates its "porous isolation model" with respect to the challenges of software deployment, and presents measurements of PDS's execution characteristics.

...read moreread less

90 citations

Proceedings Article•DOI•

Friendly virtual machines: leveraging a feedback-control model for application adaptation

[...]

Yuting Zhang¹, Azer Bestavros¹, Mina Guirguis¹, Ibrahim Matta¹, Richard West¹ - Show less +1 more•Institutions (1)

Boston University¹

11 Jun 2005

TL;DR: In this article, the authors define "Friendly" VM (FVM) as a virtual machine that adjusts its demand for system resources, so that they are both efficiently and fairly allocated to competing FVMs.

...read moreread less

Abstract: With the increased use of "Virtual Machines" (VMs) as vehicles that isolate applications running on the same host, it is necessary to devise techniques that enable multiple VMs to share underlying resources both fairly and efficiently. To that end, one common approach is to deploy complex resource management techniques in the hosting infrastructure. Alternately, in this paper, we advocate the use of self-adaptation in the VMs themselves based on feedback about resource usage and availability. Consequently, we define "Friendly" VM (FVM) to be a virtual machine that adjusts its demand for system resources, so that they are both efficiently and fairly allocated to competing FVMs. Such properties are ensured using one of many provably convergent control rules, such as Additive-Increase/Multiplicative-Decrease (AIMD). By adopting this distributed application-based approach to resource management, it is not necessary to make assumptions about the underlying resources nor about the requirements of FVMs competing for these resources. To demonstrate the elegance and simplicity of our approach, we present a prototype implementation of our FVM framework in User-Mode Linux (UML)---an implementation that consists of less than 500 lines of code changes to UML. We present an analytic, control-theoretic model of FVM adaptation, which establishes convergence and fairness properties. These properties are also backed up with experimental results using our prototype FVM implementation.

...read moreread less

90 citations

Proceedings Article•DOI•

Virtual machine showdown: stack versus registers

[...]

Yunhe Shi¹, David Gregg¹, Andrew Beatty¹, M. Anton Ertl²•Institutions (2)

Trinity College, Dublin¹, Vienna University of Technology²

11 Jun 2005

TL;DR: This work extends existing work on comparing virtual stack and virtual register architectures in two ways, and presents an implementation of a register machine in a fully standard-compliant implementation of the Java VM.

...read moreread less

Abstract: Virtual machines (VMs) are commonly used to distribute programs in an architecture-neutral format, which can easily be interpreted or compiled. A long-running question in the design of VMs is whether stack architecture or register architecture can be implemented more efficiently with an interpreter. We extend existing work on comparing virtual stack and virtual register architectures in two ways. Firstly, our translation from stack to register code is much more sophisticated. The result is that we eliminate an average of more than 47% of executed VM instructions, with the register machine bytecode size only 25% larger than that of the corresponding stack bytecode. Secondly we present an implementation of a register machine in a fully standard-compliant implementation of the Java VM. We find that, on the Pentium 4, the register architecture requires an average of 32.3% less time to execute standard benchmarks if dispatch is performed using a C switch statement. Even if more efficient threaded dispatch is available (which requires labels as first class values), the reduction in running time is still approximately 26.5% for the register architecture.

...read moreread less

83 citations

Proceedings Article•DOI•

Optimized interval splitting in a linear scan register allocator

[...]

Christian Wimmer¹, Hanspeter Mössenböck¹•Institutions (1)

Johannes Kepler University of Linz¹

11 Jun 2005

TL;DR: An optimized implementation of the linear scan register allocation algorithm for Sun Microsystems' Java HotSpot™ client compiler is presented, with the high impact of the Intel SSE2 extensions on the speed of numeric Java applications.

...read moreread less

Abstract: We present an optimized implementation of the linear scan register allocation algorithm for Sun Microsystems' Java HotSpot™ client compiler. Linear scan register allocation is especially suitable for just-in-time compilers because it is faster than the common graph-coloring approach and yields results of nearly the same quality.Our allocator improves the basic linear scan algorithm by adding more advanced optimizations: It makes use of lifetime holes, splits intervals if the register pressure is too high, and models register constraints of the target architecture with fixed intervals. Three additional optimizations move split positions out of loops, remove register-to-register moves and eliminate unnecessary spill stores. Interval splitting is based on use positions, which also capture the kind of use and whether an operand is needed in a register or not. This avoids the reservation of a scratch register.Benchmark results prove the efficiency of the linear scan algorithm: While the compilation speed is equal to the old local register allocator that is part of the Sun JDK 5.0, integer benchmarks execute about 15% faster. Floating-point benchmarks show the high impact of the Intel SSE2 extensions on the speed of numeric Java applications: With the new SSE2 support enabled, SPECjvm98 executes 25% faster compared with the current Sun JDK 5.0.

...read moreread less

72 citations

Proceedings Article•DOI•

The entropia virtual machine for desktop grids

[...]

Brad Calder¹, Andrew A. Chien¹, Ju Wang¹, Don Yang¹•Institutions (1)

University of California, San Diego¹

11 Jun 2005

TL;DR: The Entropia Virtual Machine is described, which aims to provide protection for the distributed application's program and its data, and the solutions it embodies for each of these challenges.

...read moreread less

Abstract: Desktop distributed computing allows companies to exploit the idle cycles on pervasive desktop PC systems to increase the available computing power by orders of magnitude (10x - 1000x). Applications are submitted, distributed, and run on a grid of desktop PCs. Since the applications may be malformed, or malicious, the key challenges for a desktop grid are how to 1) prevent the distributed computing application from unwarranted access or modification of data and files on the desktop PC, 2) control the distributed computing application's resource usage and behavior as it runs on the desktop PC, and 3) provide protection for the distributed application's program and its data. In this paper we describe the Entropia Virtual Machine, and the solutions it embodies for each of these challenges.

...read moreread less

71 citations

Proceedings Article•DOI•

Escape analysis in the context of dynamic compilation and deoptimization

[...]

Thomas Kotzmann¹, Hanspeter Mössenböck¹•Institutions (1)

Johannes Kepler University of Linz¹

11 Jun 2005

TL;DR: A new intraProcedural and interprocedural algorithm for escape analysis in the context of dynamic compilation where the compiler has to cope with dynamic class loading and deoptimization is presented.

...read moreread less

Abstract: In object-oriented programming languages, an object is said to escape the method or thread in which it was created if it can also be accessed by other methods or threads. Knowing which objects do not escape allows a compiler to perform aggressive optimizations.This paper presents a new intraprocedural and interprocedural algorithm for escape analysis in the context of dynamic compilation where the compiler has to cope with dynamic class loading and deoptimization. It was implemented for Sun Microsystems' Java HotSpot™ client compiler and operates on an intermediate representation in SSA form. We introduce equi-escape sets for the efficient propagation of escape information between related objects. The analysis is used for scalar replacement of fields and synchronization removal, as well as for stack allocation of objects and fixed-sized arrays. The results of the interprocedural analysis support the compiler in inlining decisions and allow actual parameters to be allocated on the caller stack.Under certain circumstances, the Java HotSpot™ VM is forced to stop executing a method's machine code and transfer control to the interpreter. This is called deoptimization. Since the interpreter does not know about the scalar replacement and synchronization removal performed by the compiler, the deoptimization framework was extended to reallocate and relock objects on demand.

...read moreread less

57 citations

Proceedings Article•DOI•

An execution layer for aspect-oriented programming languages

[...]

Michael Haupt¹, Mira Mezini¹, Christoph Bockisch¹, Tom Dinkelaker¹, Michael Eichberg¹, Michael Krebs¹ - Show less +2 more•Institutions (1)

Technische Universität Darmstadt¹

11 Jun 2005

TL;DR: Performance measurements show that an AOP-enabled virtual machine like Steamloom does not inflict unnecessary performance penalties on a running application; when it comes to executing AOP -related operations, there even are significant performance gains compared to other approaches.

...read moreread less

Abstract: Language mechanisms deserve language implementation effort. While this maxim has led to sophisticated support for language features specific to object-oriented, functional and logic programming languages, aspect-oriented programming languages are still mostly implemented using postprocessors. The Steamloom virtual machine, based on IBM's Jikes RVM, provides support for aspect-oriented programming at virtual machine level. A bytecode framework called BAT was integrated with the Jikes RVM to replace its bytecode management logic. While preserving the functionality needed by the VM, BAT also allows for querying application code for join point shadows, avoiding redundancy in bytecode representation. Performance measurements show that an AOP-enabled virtual machine like Steamloom does not inflict unnecessary performance penalties on a running application; when it comes to executing AOP-related operations, there even are significant performance gains compared to other approaches.

...read moreread less

50 citations

Proceedings Article•DOI•

Inlining java native calls at runtime

[...]

Levon Sassoon Stepanian¹, Angela Demke Brown¹, Allan H. Kielstra², Gita Koblents², Kevin A. Stoodley² - Show less +1 more•Institutions (2)

University of Toronto¹, IBM²

11 Jun 2005

TL;DR: This work leverages the ability to store statically-generated IL alongside native binaries, to facilitate native inlining at Java callsites at JIT compilation time and shows speedups of up to 93X when inlining and callback transformation are combined.

...read moreread less

Abstract: We introduce a strategy for inlining native functions into Java™ applications using a JIT compiler. We perform further optimizations to transform inlined callbacks into semantically equivalent lightweight operations. We show that this strategy can substantially reduce the overhead of performing JNI calls, while preserving the key safety and portability properties of the JNI. Our work leverages the ability to store statically-generated IL alongside native binaries, to facilitate native inlining at Java callsites at JIT compilation time. Preliminary results with our prototype implementation show speedups of up to 93X when inlining and callback transformation are combined.

...read moreread less

Proceedings Article•DOI•

A programmable microkernel for real-time systems

[...]

Christoph M. Kirsch¹, Marco A. A. Sanvido², Thomas A. Henzinger³•Institutions (3)

University of Salzburg¹, VMware², University of California, Berkeley³

11 Jun 2005

TL;DR: A new software system architecture for the implementation of hard real-time applications is presented whose reactivity and proactivity are fully programmable and whose separation of E from S code permits the independent programming, verification, optimization, composition, dynamic adaptation, and reuse of both reaction and scheduling mechanisms.

...read moreread less

Abstract: We present a new software system architecture for the implementation of hard real-time applications. The core of the system is a microkernel whose reactivity (interrupt handling as in synchronous reactive programs) and proactivity (task scheduling as in traditional RTOSs) are fully programmable. The microkernel, which we implemented on a StrongARM processor, consists of two interacting domain-specific virtual machines, a reactive E (Embedded) machine and a proactive S (Scheduling) machine. The microkernel code (or microcode) that runs on the microkernel is partitioned into E and S code. E code manages the interaction of the system with the physical environment: the execution of E code is triggered by environment interrupts, which signal external events such as the arrival of a message or sensor value, and it releases application tasks to the S machine. S code manages the interaction of the system with the processor: the execution of S code is triggered by hardware interrupts, which signal internal events such as the completion of a task or time slice, and it dispatches application tasks to the CPU, possibly preempting a running task. This partition of the system orthogonalizes the two main concerns of real-time implementations: E code refers to environment time and thus defines the reactivity of the system in a hardware- and scheduler-independent fashion; S code refers to CPU time and defines a system scheduler. If both time lines can be reconciled, then the code is called time safe; violations of time safety are handled again in a programmable way, by run-time exceptions. The separation of E from S code permits the independent programming, verification, optimization, composition, dynamic adaptation, and reuse of both reaction and scheduling mechanisms. Our measurements show that the system overhead is very acceptable even for large sets of task, generally in the 0.2--0.3% range.

...read moreread less

Proceedings Article•DOI•

An efficient and generic reversible debugger using the virtual machine based approach

[...]

Toshihiko Koju¹, Shingo Takada¹, Norihisa Doi²•Institutions (2)

Keio University¹, Chuo University²

11 Jun 2005

TL;DR: This paper proposes a novel reversible debugger that enables reverse execution of programs written in the C language and takes the virtual machine based approach.

...read moreread less

Abstract: The reverse execution of programs is a function where programs are executed backward in time. A reversible debugger is a debugger that provides such a functionality. In this paper, we propose a novel reversible debugger that enables reverse execution of programs written in the C language. Our approach takes the virtual machine based approach. In this approach, the target program is executed on a special virtual machine. Our contribution in this paper is two-fold. First, we propose an approach that can address problems of (1) compatibility and (2) efficiency that exist in previous works. By compatibility, we mean that previous debuggers are not generic, i.e., they support only a special language or special intermediate code. Second, our approach provides two execution modes: the native mode, where the debuggee is directly executed on a real CPU, and the virtual machine mode, where the debuggee is executed on a virtual machine. Currently, our debugger provides four types of trade-off settings (designated by unit and optimization) to consider trade-offs between granularity, accuracy, overhead and memory requirement. The user can choose the appropriate setting flexibly during debugging without finishing and restarting the debuggee.

...read moreread less

Proceedings Article•DOI•

Planning for code buffer management in distributed virtual execution environments

[...]

Shukang Zhou¹, Bruce R. Childers², Mary Lou Soffa¹•Institutions (2)

University of Virginia¹, University of Pittsburgh²

11 Jun 2005

TL;DR: This paper proposes to move code buffer management to the server, where sophisticated schemes can be employed and describes two schemes that use profiling information to direct the client in caching code partitions, with the adaptive scheme having the best performance overall.

...read moreread less

Abstract: Virtual execution environments have become increasingly useful in system implementation, with dynamic translation techniques being an important component for performance-critical systems. Many devices have exceptionally tight performance and memory constraints (e.g., smart cards and sensors in distributed systems), which require effective resource management. One approach to manage code memory is to download code partitions on-demand from a server and to cache the partitions in the resource-constrained device (client). However, due to the high cost of downloading code and re-translation, it is critical to intelligently manage the code buffer to minimize the overhead of code buffer misses. Yet, intelligent buffer management on the tightly constrained client can be too expensive. In this paper, we propose to move code buffer management to the server, where sophisticated schemes can be employed. We describe two schemes that use profiling information to direct the client in caching code partitions. One scheme is designed for workloads with stable run-time behavior, while the other scheme adapts its decisions for workloads with unstable behaviors. We evaluate and compare our schemes and show they perform well, compared to other approaches, with the adaptive scheme having the best performance overall.

...read moreread less

Proceedings Article•DOI•

Using page residency to balance tradeoffs in tracing garbage collection

[...]

Daniel Spoonhower¹, Guy E. Blelloch¹, Robert Harper¹•Institutions (1)

Carnegie Mellon University¹

11 Jun 2005

TL;DR: An extension of mostly copying collection that uses page residency to determine when to relocate objects, and which prefers copying collection when there is ample heap space but falls back on non-copying collection when space becomes limited.

...read moreread less

Abstract: We introduce an extension of mostly copying collection that uses page residency to determine when to relocate objects. Our collector promotes pages with high residency in place, avoiding unnecessary work and wasted space. It predicts the residency of each page, but when its predictions prove to be inaccurate, our collector reclaims unoccupied space by using it to satisfy allocation requests.Using residency allows our collector to dynamically balance the tradeoffs of copying and non-copying collection. Our technique requires less space than a pure copying collector and supports object pinning without otherwise sacrificing the ability to relocate objects.Unlike other hybrids, our collector does not depend on application-specific configuration and can quickly respond to changing application behavior. Our measurements show that our hybrid performs well under a variety of conditions; it prefers copying collection when there is ample heap space but falls back on non-copying collection when space becomes limited.

...read moreread less

Proceedings Article•DOI•

Exploiting frequent field values in java objects for reducing heap memory requirements

[...]

Guangyu Chen¹, Mahmut Kandemir¹, Mary Jane Irwin¹•Institutions (1)

Pennsylvania State University¹

11 Jun 2005

TL;DR: This paper proposes two object compression schemes that eliminate/reduce the space occupied by the frequent field values of heap-allocated objects so as to allow Java applications to execute without out-of-memory exceptions.

...read moreread less

Abstract: The capabilities of applications executing on embedded and mobile devices are strongly influenced by memory size limitations. In fact, memory limitations are one of the main reasons that applications run slowly or even crash in embedded/mobile devices. While improvements in technology enable the integration of more memory into embedded devices, the amount memory that can be included is also limited by cost, power consumption, and form factor considerations. Consequently, addressing memory limitations will continue to be of importance.Focusing on embedded Java environments, this paper shows how object compression can improve memory space utilization. The main idea is to make use of the observation that a small set of values tend to appear in some fields of the heap-allocated objects much more frequently than other values. Our analysis shows the existence of such frequent field values in the SpecJVM98 benchmark suite. We then propose two object compression schemes that eliminate/reduce the space occupied by the frequent field values. Our extensive experimental evaluation using a set of eight Java benchmarks shows that these schemes can reduce the minimum heap size allowing Java applications to execute without out-of-memory exceptions by up to 24% (14% on an average).

...read moreread less

Proceedings Article•DOI•

Supporting per-processor local-allocation buffers using lightweight user-level preemption notification

[...]

Alex Garthwaite¹, David Dice¹, Derek R. White¹•Institutions (1)

Sun Microsystems¹

11 Jun 2005

TL;DR: A novel mechanism for implementing critical sections for processor-specific transactions in dynamically generated code is developed that is efficient, allows preemption-notification at known points in a given critical section, and does not require explicit registration of the critical sections.

...read moreread less

Abstract: One challenge for runtime systems like the Java™ platform that depend on garbage collection is the ability to scale performance with the number of allocating threads. As the number of such threads grows, allocation of memory in the heap becomes a point of contention. To relieve this contention, many collectors allow threads to preallocate blocks of memory from the shared heap. These per-thread local-allocation buffers (LABs) allow threads to allocate most objects without any need for further synchronization. As the number of threads exceeds the number of processors, however, the cost of committing memory to local-allocation buffers becomes a challenge and sophisticated LAB-sizing policies must be employed.To reduce this complexity, we implement support for local-allocation buffers associated with processors instead of threads using multiprocess restartable critical sections (MP-RCSs). MP-RCSs allow threads to manipulate processor-local data safely. To support processor-specific transactions in dynamically generated code, we have developed a novel mechanism for implementing these critical sections that is efficient, allows preemption-notification at known points in a given critical section, and does not require explicit registration of the critical sections. Finally, we analyze the performance of per-processor LABs and show that, for highly threaded applications, this approach performs better than per-thread LABs, and allows for simpler LAB-sizing policies.

...read moreread less

Proceedings Article•DOI•

Module-aware translation for real-life desktop applications

[...]

Jianhui Li¹, Peng Zhang¹, Orna Etzion¹•Institutions (1)

Intel¹

11 Jun 2005

TL;DR: A translation reuse engine that uses a novel verification method and a module-aware memory management mechanism to address the overhead for translating real-life desktop applications and improves the performance of Adobe* Illustrator and Microsoft* Publisher.

...read moreread less

Abstract: A dynamic binary translator is a just-in-time compiler that translates source architecture binaries into target architecture binaries on the fly. It enables the fast running of the source architecture binaries on the target architecture. Traditional dynamic binary translators invalidate their translations when a module is unloaded, so later re-loading of the same module will lead to a full retranslation. Moreover, most of the loading and unloading are performed on a few "hot" modules, which causes the dynamic binary translator to spend a significant amount of time on repeatedly translating these "hot" modules. Furthermore, the retranslation may lead to excessive memory consumption if the code pages containing the translated codes that have been invalidated are not timely recycled. In addition, we observed that the overhead for translating real-life desktop applications is a big challenge to the overall performance of the applications, and our detailed analysis proved that real-life desktop applications dynamically load and unload modules much more frequently as compared to popular benchmarks, such as SPEC CPU2000. To address these issues, we propose a translation reuse engine that uses a novel verification method and a module-aware memory management mechanism. The proposed approach was fully implemented in IA-32 Execution Layer (IA-32 EL) [1], a commercial dynamic binary translator that enables the execution of IA-32 applications on Intel® Itanium® processor family. Collected results show that the module-aware translation improves the performance of Adobe* Illustrator by 14.09% and Microsoft* Publisher by 9.73%. The overhead brought by the translation reuse engine accounts for no more than 0.2% of execution time.

...read moreread less

Proceedings Article•DOI•

Instrumenting annotated programs

[...]

Marina Biberstein¹, Vugranam C. Sreedhar², Bilha Mendelson¹, Daniel Citron¹, Alberto Giammaria² - Show less +1 more•Institutions (2)

University of Haifa¹, IBM²

11 Jun 2005

TL;DR: It is shown how the annotations can expose enough information about themselves to prevent the instrumentation from accidentally corrupting the annotations, and an annotation taxonomy is proposed and demonstrated.

...read moreread less

Abstract: Instrumentation is commonly used to track application behavior: to collect program profiles; to monitor component health and performance; to aid in component testing; and more. Program annotation enables developers and tools to pass extra information to later stages of software development and execution. For example, the .NET runtime relies on annotations for a significant chunk of the services it provides. Both mechanisms are evolving into important parts of software development %, in the context of modern platforms such as Java and .NET.Instrumentation tools are generally not aware of the semantics of information passed via the annotation mechanism. This is especially true for post-compiler, e.g., run-time, instrumentation. The problem is that instrumentation may affect the correctness of annotations, rendering them invalid or misleading, and producing unforeseen side-effects during program execution. This problem has not been addressed so far.In this paper, we show the subtle interaction that takes place between annotations and instrumentation using several real-life examples. Many annotations are intended to provide information for the runtime; the virtual environment is a prominent annotation consumer, and must be aware of this conflict. It may also be required to provide runtime support to other annotation consumers. We propose an annotation taxonomy and show how instrumentation affects various annotations that were used in research and in industrial applications. We show how the annotations can expose enough information about themselves to prevent the instrumentation from accidentally corrupting the annotations. We demonstrate this approach on our annotations benchmark.

...read moreread less

Proceedings Article•DOI•

A unified view of virtualization

[...]

James E. Smith¹•Institutions (1)

University of Wisconsin-Madison¹

11 Jun 2005

TL;DR: In the future, virtualization will become an essential part of all computer systems by providing smart interconnection mechanisms for the three major system components - application software, system software, and hardware.

...read moreread less

Abstract: Virtualization technologies have been developed by a number of computer science and engineering disciplines, sometimes independently, often by different groups and at different times. Not surprisingly, these groups each view virtualization as a sub-discipline, so it is studied in a fragmented way. In the future, however, virtualization will become an essential part of all computer systems by providing smart interconnection mechanisms for the three major system components - application software, system software, and hardware. Consequently, the study of virtualization technologies will become a discipline in its own right and will stand on equal footing with the other major areas of computer systems design.

...read moreread less

Proceedings Article•

Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments

[...]

Michael Hind¹, Jan Vitek²•Institutions (2)

IBM¹, Purdue University²

11 Jun 2005

TL;DR: VEE is intended to be a unique forum that brings together practitioners and researchers working on interpreters, high-level language virtual machines, machine emulators, translators, and machine simulators to address the breadth of issues related to virtual execution environments.

...read moreread less

Abstract: It is our great pleasure to welcome you to the 1st International Conference on Virtual Execution Environments - VEE'05. Up to now, research results on virtual execution engines were scattered among a number of different venues in the language (VM, PLDI, OOPSLA, IVME, ICFP), operating system (SOSP, OSDI), and architecture (ASPLOS, CGO, PACT) communities. The organizers of the USENIX VM Symposium and the ACM SIGPLAN IVME Workshop felt the needs of the community would be better served by a single conference that could address the breadth of issues related to virtual execution environments. VEE is intended to be a unique forum that brings together practitioners and researchers working on interpreters, high-level language virtual machines, machine emulators, translators, and machine simulators. VEE'05 gives researchers and practitioners a unique opportunity to share their perspectives with others interested in the various aspects of virtual execution environments. This year's VEE is co-located with PLDI 2005 in Chicago, Illinois. Future instances are planned jointly with leading conferences in operating systems, programming languages, and architecture.The call for papers attracted 65 submissions from the USA (31), Canada (9), Austria (6), Switzerland (3), Japan (3), Ireland (2), Israel (2), Australia, Belgium, China, Finland, Germany, Hungary, Russia, Sweden, and the United Kingdom,. The submissions showed a healthy mix between academia (34), industry (22) and joint academia-industry projects (9). The program committee met at IBM Research in Hawthorne, NY on Friday, March 25, 2005. The committee accepted 19 excellent papers that cover a wide spectrum of topics related to virtual execution environments. On Saturday, March 26, members of the committee participated in an informal workshop that provided a forum for committee members to present their work and build collaborations. We hope this tradition will continue in future program committee meetings. In addition to the 19 accepted papers, the program includes keynote talks by James E. Smith and Martin Nally.

...read moreread less

Proceedings Article•DOI•

Application servers: virtualizing location, resources, memory, users and threads for business applications and web applications

[...]

Martin P. Nally¹•Institutions (1)

IBM¹

11 Jun 2005

TL;DR: This talk will describe some of the major features of modern application servers and show how concepts of virtualization are fundamental to their design and realization.

...read moreread less

Abstract: Application servers provide an environment for running business and web applications. By virtualizing threads, data and processing resources, memory and users, they provide the simplifying illusion for the programmer that the application is interacting with a single user, is running alone on the server, and is the sole user of resources, while allowing an efficient realization that scales with the number of users, and available hardware. They also provide a virtual environment where security enforcement and demarcation of transaction boundaries are automatic. This talk will describe some of the major features of modern application servers and show how concepts of virtualization are fundamental to their design and realization.

...read moreread less