Patent

Binary translation using peephole translation rules

12 Feb 2008
TL;DR: An efficient binary translator uses peephole translation rules to translate executable code directly from one instruction set to another; in a preferred embodiment, the rules are learned automatically using superoptimization techniques.
Abstract: An efficient binary translator uses peephole translation rules to directly translate executable code from one instruction set to another. In a preferred embodiment, the translation rules are generated using superoptimization techniques that enable the translator to automatically learn translation rules for translating code from the source to target instruction set architecture.
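As a rough illustration of what applying peephole translation rules can look like, the sketch below matches short windows of source instructions against a rule table and emits the corresponding target instructions. The opcode names, the `RULES` table, and the longest-match-first policy are all assumptions made for this example; the abstract does not specify how rules are represented or selected.

```python
# Hypothetical rule table: a tuple of source opcodes -> list of target opcodes.
# The opcode names are illustrative only and are not taken from the patent.
RULES = {
    ("load", "add", "store"): ["add_mem"],  # three source instructions fuse into one
    ("load",): ["mov_from_mem"],
    ("add",): ["add_reg"],
    ("store",): ["mov_to_mem"],
}
MAX_WINDOW = max(len(pattern) for pattern in RULES)


def translate(source):
    """Translate a list of source opcodes, trying the widest window first."""
    target = []
    i = 0
    while i < len(source):
        # Try the longest window that fits, then progressively shorter ones.
        for width in range(min(MAX_WINDOW, len(source) - i), 0, -1):
            window = tuple(source[i:i + width])
            if window in RULES:
                target.extend(RULES[window])
                i += width
                break
        else:
            # No rule of any width matched at this position.
            raise ValueError(f"no translation rule for {source[i]!r}")
    return target
```

Calling `translate(["load", "add", "store", "add"])` consumes the first three instructions with the fused rule and the last one with the single-instruction rule. A real translator would also have to handle operands, register mapping, and condition codes, which this sketch leaves out.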
Citations
Patent
26 Sep 2014
TL;DR: Techniques to control power and processing among a plurality of asymmetric cores by migrating processes or threads among the cores according to the performance and power needs of the system.
Abstract: Techniques to control power and processing among a plurality of asymmetric cores. In one embodiment, one or more asymmetric cores are power managed to migrate processes or threads among a plurality of cores according to the performance and power needs of the system.

148 citations

Patent
12 Dec 2008
TL;DR: A system that provides precise exception semantics for a virtual machine, using a gated store buffer to ensure that any stores that occurred after the previous safepoint are discarded when the virtual machine is reverted to that safepoint.
Abstract: One embodiment of the present invention provides a system that provides precise exception semantics for a virtual machine. During operation, the system receives a program comprised of instructions that are specified in a machine instruction set architecture of the virtual machine, and translates these instructions into native instructions for the processor that the virtual machine is executing upon. While performing this translation, the system inserts one or more safepoints into the translated native instructions. The system then executes these native instructions on the processor. During execution, if the system detects that an exception was signaled by a native instruction, the system reverts the virtual machine to a previous safepoint to ensure that the virtual machine will precisely emulate the exception behavior of the virtual machine's instruction set architecture. The system uses a gated store buffer to ensure that any stores that occurred after the previous safepoint are discarded when reverting the virtual machine to the previous safepoint.

47 citations

Patent
Guilherme Ottoni, Hong Wang, Wei Li
14 Jun 2011
TL;DR: A system and method for mapping registers from a system with more registers to a system with fewer registers, in which each block in a region of frequently accessed code is bounded by a prologue and at least one epilogue.
Abstract: Generally, the present disclosure provides a system and method for mapping registers from a system with more registers to a system with fewer registers. Regions may be formed that include one or more blocks of code with relatively frequent register accesses. The most frequently accessed source registers may be mapped to target registers. Each block in the region may be bounded by a prologue and at least one epilogue. The prologue may be configured to implement register mapping and the epilogue(s) may be configured to manage program flow from a block in the region to another block in the region or to a block not in the region.

38 citations

Patent
Dan Arnon, David Meiri
19 Dec 2011
TL;DR: Techniques for dynamically binding device identifiers to data storage devices, in which a detachable device identifier assigned to an application is attached to different data storage devices at different points in time.
Abstract: Described are techniques for performing dynamic binding of device identifiers to data storage devices. A first device identifier assigned to an application on a host is received. The first device identifier is a unique detachable device identifier dynamically bound to different data storage devices at different points in time in accordance with data storage devices used by the application. The first device identifier is attached to a first data storage device of the data storage system used by the application at a first point in time. The first device identifier is detached from the first data storage device thereby making the first data storage device unavailable for data operations from the application. The first device identifier is attached to a second data storage device used by the application at a second point in time thereby making the second data storage device available for data operations from the application.

22 citations

Patent
Tamir Eliezer, Friedman Ben-Zion
21 Jun 2018
TL;DR: Systems and methods for multi-architecture computing, in which a computing device may include a processor system with at least one first processing core having a first instruction set architecture (ISA) and at least one second processing core having a second ISA different from the first.
Abstract: Disclosed herein are systems and methods for multi-architecture computing. For example, in some embodiments, a computing device may include: a processor system including at least one first processing core having a first instruction set architecture (ISA), and at least one second processing core having a second ISA different from the first ISA; and a memory device coupled to the processor system, wherein the memory device has stored thereon a first binary representation of a program for the first ISA and a second binary representation of the program for the second ISA, and the memory device has stored thereon data for the program having an in-memory representation compatible with both the first ISA and the second ISA.

21 citations

References
Journal ArticleDOI
01 Oct 1987
TL;DR: Given an instruction set, the superoptimizer finds the shortest program to compute a function; its key idea is a probabilistic test that makes exhaustive search practical for programs of useful size, over a search space that is typically restricted to a subset of the processor's instruction set.
Abstract: Given an instruction set, the superoptimizer finds the shortest program to compute a function. Startling programs have been generated, many of them engaging in convoluted bit-fiddling bearing little resemblance to the source programs which defined the functions. The key idea in the superoptimizer is a probabilistic test that makes exhaustive searches practical for programs of useful size. The search space is defined by the processor's instruction set, which may include the whole set, but it is typically restricted to a subset. By constraining the instructions and observing the effect on the output program, one can gain insight into the design of instruction sets. In addition, superoptimized programs may be used by peephole optimizers to improve the quality of generated code, or by assembly language programmers to improve manually written code.

273 citations
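The search described in the superoptimizer abstract above can be sketched in a few lines: enumerate programs in order of increasing length and use a handful of test inputs as a cheap filter before any more expensive check. The 8-bit one-register instruction set, the fixed test vectors standing in for the probabilistic test, and the exhaustive final check are all assumptions chosen to keep the example small; they are not taken from the cited article.

```python
from itertools import product

# Hypothetical 8-bit, one-register instruction set (illustrative only).
MASK = 0xFF
OPS = {
    "neg": lambda x: (-x) & MASK,
    "not": lambda x: (~x) & MASK,
    "inc": lambda x: (x + 1) & MASK,
    "dec": lambda x: (x - 1) & MASK,
    "shl": lambda x: (x << 1) & MASK,
}


def run(program, x):
    """Execute a sequence of opcode names on the single register value x."""
    for name in program:
        x = OPS[name](x)
    return x


def superoptimize(target, max_len=3, tests=(0, 1, 2, 7, 128, 255)):
    """Return a shortest program matching target on all 8-bit inputs, or None."""
    expected = [target(x) & MASK for x in tests]
    for length in range(max_len + 1):  # shortest candidates first
        for program in product(OPS, repeat=length):
            # Cheap filter: most wrong candidates fail on a few test inputs.
            if [run(program, x) for x in tests] != expected:
                continue
            # Full check, feasible here only because the toy domain is 256 values.
            if all(run(program, x) == (target(x) & MASK) for x in range(MASK + 1)):
                return list(program)
    return None
```

For instance, `superoptimize(lambda x: 2 * x + 2)` searches lengths 0 and 1 without success and then finds a two-instruction program. Because candidates are enumerated by length, the first program that passes both checks is a shortest one within this toy instruction set.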

Proceedings ArticleDOI
17 May 2002
TL;DR: A preliminary report on a project that aims to construct a code generator that uses an automatic theorem prover to produce very high-quality (in fact, nearly mathematically optimal) machine code for modern architectures.
Abstract: This paper provides a preliminary report on a new research project that aims to construct a code generator that uses an automatic theorem prover to produce very high-quality (in fact, nearly mathematically optimal) machine code for modern architectures. The code generator is not intended for use in an ordinary compiler, but is intended to be used for inner loops and critical subroutines in those cases where peak performance is required, no available compiler generates adequately efficient code, and where current engineering practice is to use hand-coded machine language. The paper describes the design of the superoptimizer, and presents some encouraging preliminary results.

174 citations

Proceedings ArticleDOI
20 Oct 2006
TL;DR: Experiments show that fully automatic construction of peephole optimizers using brute force superoptimization can exploit performance opportunities not found by existing compilers, with speedups ranging from a factor of 1.7 to a factor of 10 on some compute intensive kernels over a conventional optimizing compiler.
Abstract: Peephole optimizers are typically constructed using human-written pattern matching rules, an approach that requires expertise and time, as well as being less than systematic at exploiting all opportunities for optimization. We explore fully automatic construction of peephole optimizers using brute force superoptimization. While the optimizations discovered by our automatic system may be less general than human-written counterparts, our approach has the potential to automatically learn a database of thousands to millions of optimizations, in contrast to the hundreds found in current peephole optimizers. We show experimentally that our optimizer is able to exploit performance opportunities not found by existing compilers; in particular, we show speedups from 1.7 to a factor of 10 on some compute intensive kernels over a conventional optimizing compiler.

169 citations

Patent
29 Jan 1996
TL;DR: A run-time system collects profile data during execution of the native instructions to determine execution characteristics of the non-native instructions; the non-native instructions and the profile statistics are then fed to a binary translator, operating in a background mode, which uses the profile data to form a translated native image.
Abstract: A computer system for executing a binary image conversion system which converts instructions from an instruction set of a first, non-native computer system to a second, different, native computer system includes a run-time system which, in response to a non-native image of an application program written for a non-native instruction set, provides a native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instruction. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode and which is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process. The non-native image is executed in two different environments, with a first portion executed as an interpreted image and remaining portions as a translated image. The run-time system includes an interpreter which is capable of handling condition codes corresponding to the non-native architecture. A technique is also provided to jacket calls between the two execution environments and to support object based services. Preferred techniques are also provided to determine interprocedural translation units. Further, intermixed translation/optimization techniques are discussed.

158 citations

Patent
29 Jan 1996
TL;DR: A run-time system collects profile data during execution of the native instructions to determine execution characteristics of the non-native instructions, then feeds the non-native instructions and the profile statistics to a binary translator operating in a background mode.
Abstract: A computer system for executing a binary image conversion system which converts instructions from an instruction set of a first, non-native computer system to a second, different, native computer system includes a run-time system which, in response to a non-native image of an application program written for a non-native instruction set, provides a native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instruction. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode and which is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process. The non-native image is executed in two different environments, with a first portion executed as an interpreted image and remaining portions as a translated image. The run-time system includes an interpreter which is capable of handling condition codes corresponding to the non-native architecture. A technique is also provided to jacket calls between the two execution environments and to support object based services. Preferred techniques are also provided to determine interprocedural translation units. Further, intermixed translation/optimization techniques are discussed.

115 citations