
Showing papers by "Michael D. Smith" published in 1995


Proceedings ArticleDOI
01 May 1995
TL;DR: A framework is presented that categorizes branch prediction schemes by the way in which they partition dynamic branches and by the kind of predictor that they use, to show how a static correlated branch prediction scheme increases branch bias and thus improves overall branch prediction accuracy.
Abstract: Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch prediction schemes by the way in which they partition dynamic branches and by the kind of predictor that they use. The framework allows us to compare and contrast branch prediction schemes, and to analyze why they work. We use the framework to show how a static correlated branch prediction scheme increases branch bias and thus improves overall branch prediction accuracy. We also use the framework to identify the fundamental differences between static and dynamic correlated branch prediction schemes. This study shows that there is room to improve the prediction accuracy of existing branch prediction schemes.

145 citations
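
The claim in this abstract, that partitioning dynamic branches more finely can raise branch bias and therefore static prediction accuracy, can be illustrated with a small sketch. The trace, the `make_trace` and `static_accuracy` helpers, and the choice of "outcome of the previously executed branch" as the correlation key are all assumptions made for illustration; they are not taken from the paper, whose actual scheme may partition branches differently.

```python
from collections import Counter, defaultdict

def make_trace(iterations):
    """Build a hypothetical trace of (pc, prev_outcome, taken) records.

    Branch A alternates taken/not-taken; branch B always goes the same
    way as the A that preceded it. Both are unbiased when viewed alone.
    """
    trace = []
    prev = False  # outcome of the most recently executed branch
    for i in range(iterations):
        a_taken = (i % 2 == 0)
        trace.append(("A", prev, a_taken))
        prev = a_taken
        trace.append(("B", prev, a_taken))
        prev = a_taken
    return trace

def static_accuracy(trace, key):
    """Accuracy of a static profile predictor.

    Each partition (chosen by `key`) is assigned its majority outcome
    from the profile, and that fixed prediction is scored on the trace.
    """
    counts = defaultdict(Counter)
    for pc, prev, taken in trace:
        counts[key(pc, prev)][taken] += 1
    correct = sum(max(c.values()) for c in counts.values())
    return correct / len(trace)

trace = make_trace(100)

# One partition per static branch: each branch is taken half the time.
per_branch = static_accuracy(trace, lambda pc, prev: pc)

# One partition per (branch, previous outcome): each partition is fully biased.
correlated = static_accuracy(trace, lambda pc, prev: (pc, prev))

print(per_branch, correlated)
```

In this toy trace the per-branch partitions are evenly split, while the finer partitions are each completely biased, which is the effect the abstract attributes to correlation. Real traces would show a smaller gap.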


Book
01 Mar 1995
TL;DR: This paper investigates the limitations on designing a processor which can sustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications and determines that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle.
Abstract: This paper investigates the limitations on designing a processor which can sustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications. We have used trace-driven simulations to determine that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle. In a straightforward implementation, cost considerations argue strongly against decoding more than two instructions in one cycle. Given this constraint, the efficiency in instruction fetching rather than the complexity of the execution hardware limits the concurrency attainable at the instruction level.

91 citations


Journal Article
TL;DR: Three cases of extraskeletal myxoid chondrosarcoma with typical histologic and ultrastructural features were investigated cytogenetically and showed a reciprocal chromosome translocation characterized as t(9;22)(q22-31)(q11-12), thus confirming the findings in three previously karyotyped cases in the literature.

91 citations


Patent
08 Nov 1995
TL;DR: Programmable Reduced Instruction Set Computers (PRISC) as discussed by the authors use RISC techniques as a basis for operation and provide hardware programmable resources which can be configured optimally for a given user application.
Abstract: A new class of general purpose computers called Programmable Reduced Instruction Set Computers (PRISC) uses RISC techniques as a basis for operation. In addition to the conventional RISC instructions, PRISC computers provide hardware programmable resources which can be configured optimally for a given user application. A given user application is compiled using a PRISC compiler which recognizes and evaluates complex instructions into a Boolean expression which is assigned an identifier and stored in conventional memory. The recognition of instructions which may be programmed in hardware is achieved through a combination of bit width analysis and instruction optimization. During execution of the user application on the PRISC computer, the stored expressions are loaded as needed into a programmable functional unit. Once loaded, the expressions are executed during a single instruction cycle.

66 citations


Patent
08 Nov 1995
TL;DR: Programmable Reduced Instruction Set Computers (PRISC) as discussed by the authors use RISC techniques as a basis for operation and provide hardware programmable resources which can be configured optimally for a given user application.
Abstract: A new class of general purpose computers called Programmable Reduced Instruction Set Computers (PRISC) uses RISC techniques as a basis for operation. In addition to the conventional RISC instructions, PRISC computers provide hardware programmable resources which can be configured optimally for a given user application. A given user application is compiled using a PRISC compiler which recognizes and evaluates complex instructions into a Boolean expression which is assigned an identifier and stored in conventional memory. The recognition of instructions which may be programmed in hardware is achieved through a combination of bit width analysis and instruction optimization. During execution of the user application on the PRISC computer, the stored expressions are loaded as needed into a programmable functional unit. Once loaded, the expressions are executed during a single instruction cycle.

60 citations


Proceedings ArticleDOI
03 Dec 1995
TL;DR: Results show that accessing system functionality is often more expensive in Windows for Workgroups than in the other two systems due to frequent changes in machine mode and the use of system call hooks, and overall system functionality can be accessed most efficiently in NetBSD.
Abstract: This paper presents a comparative study of the performance of three operating systems that run on the personal computer architecture derived from the IBM-PC. The operating systems, Windows for Workgroups, Windows NT, and NetBSD (a freely available variant of the UNIX operating system), cover a broad range of system functionality and user requirements, from a single address space model to full protection with preemptive multi-tasking. Our measurements were enabled by hardware counters in Intel's Pentium processor that permit measurement of a broad range of processor events including instruction counts and on-chip cache miss counts. We used both microbenchmarks, which expose specific differences between the systems, and application workloads, which provide an indication of expected end-to-end performance. Our microbenchmark results show that accessing system functionality is often more expensive in Windows for Workgroups than in the other two systems due to frequent changes in machine mode and the use of system call hooks. When running native applications, Windows NT is more efficient than Windows, but it incurs overhead similar to that of a microkernel since its application interface (the Win32 API) is implemented as a user-level server. Overall, system functionality can be accessed most efficiently in NetBSD; we attribute this to its monolithic structure, and to the absence of the complications created by hardware backwards compatibility requirements in the other systems. Measurements of application performance show that although the impact of these differences is significant in terms of instruction counts and other hardware events (often a factor of 2 to 7 difference between the systems), overall performance is sometimes determined by the functionality provided by specific subsystems, such as the graphics subsystem or the file system buffer cache.

56 citations


Patent
08 Nov 1995
TL;DR: Programmable Reduced Instruction Set Computers (PRISC) as discussed by the authors use RISC techniques as a basis for operation and provide hardware programmable resources which can be configured optimally for a given user application.
Abstract: A new class of general purpose computers called Programmable Reduced Instruction Set Computers (PRISC) uses RISC techniques as a basis for operation. In addition to the conventional RISC instructions, PRISC computers provide hardware programmable resources which can be configured optimally for a given user application. A given user application is compiled using a PRISC compiler which recognizes and evaluates complex instructions into a Boolean expression which is assigned an identifier and stored in conventional memory. The recognition of instructions which may be programmed in hardware is achieved through a combination of bit width analysis and instruction optimization. During execution of the user application on the PRISC computer, the stored expressions are loaded as needed into a programmable functional unit. Once loaded, the expressions are executed during a single instruction cycle.

32 citations


Proceedings ArticleDOI
01 Dec 1995
TL;DR: In this paper, the authors evaluate the performance effect of static correlated branch prediction (SCBP) and profile-driven optimizations on instruction cache misses, branch mispredictions, and branch misfetches for a number of recent processor implementations.
Abstract: Accurate static branch prediction is the key to many techniques for exposing, enhancing, and exploiting Instruction Level Parallelism (ILP). The initial work on static correlated branch prediction (SCBP) demonstrated improvements in branch prediction accuracy, but did not address overall performance. In particular, SCBP expands the size of executable programs, which negatively affects the performance of the instruction memory hierarchy. Using the profile information available under SCBP, we can minimize these negative performance effects through the application of code layout and branch alignment techniques. We evaluate the performance effect of SCBP and these profile-driven optimizations on instruction cache misses, branch mispredictions, and branch misfetches for a number of recent processor implementations. We find that SCBP improves performance over (traditional) per-branch static profile prediction. We also find that SCBP improves the performance benefits gained from branch alignment. As expected, SCBP gives larger benefits on machine organizations with high mispredict/misfetch penalties and low cache miss penalties. Finally, we find that the application of profile-driven code layout and branch alignment techniques (without SCBP) can improve the performance of the dynamic correlated branch prediction techniques.

15 citations


01 Jul 1995
TL;DR: This work describes the design and functionality of an informing load instruction, a primitive that allows the software to observe cache misses and to act upon this information inexpensively within the current software context, and finds that the apparent benefit and hardware cost of this functionality are quite modest.
Abstract: Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism to observe memory behavior directly. To fill this need, we propose a new set of memory operations called informing memory operations, and in particular, we describe the design and functionality of an informing load instruction. This instruction serves as a primitive that allows the software to observe cache misses and to act upon this information inexpensively (i.e., under the miss, when the processor would typically be idle) within the current software context. Informing loads enable new solutions to several important software problems. We demonstrate this through examples that show their usefulness in (i) the collection of fine-grained memory profiles with high precision and low overhead and (ii) the automatic improvement of memory system performance through compiler techniques that take advantage of cache-miss information. Overall, we find that the apparent benefit of an informing load instruction is quite high, while the hardware cost of this functionality is quite modest. In fact, the bulk of the required hardware support is already present in today's high-performance processors.

14 citations


01 Jan 1995
TL;DR: Overall, system functionality can be accessed most efficiently in NetBSD; this is attributed to its monolithic structure, and to the absence of the complications created by backwards compatibility in the other systems.
Abstract: This paper presents a comparative study of the performance of three operating systems that run on the personal computer architecture derived from the IBM-PC. The operating systems, Windows for Workgroups (tm), Windows NT (tm), and NetBSD (a freely available UNIX (tm) variant) cover a broad range of system functionality and user requirements, from a single address space model to full protection with preemptive multi-tasking. Our measurements were enabled by hardware counters in Intel’s Pentium (tm) processor that permit measurement of a broad range of processor events including instruction counts and on-chip cache miss rates. We used both microbenchmarks, which expose specific differences between the systems, and application workloads, which provide an indication of expected end-to-end performance. Our microbenchmark results show that accessing system functionality is more expensive in Windows than in the other two systems due to frequent changes in machine mode and the use of system call hooks. When running native applications, Windows NT is more efficient than Windows, but it does incur overhead from its microkernel structure. Overall, system functionality can be accessed most efficiently in NetBSD; we attribute this to its monolithic structure, and to the absence of the complications created by backwards compatibility in the other systems. Measurements of application performance show that the impact of these differences is significant in terms of overall execution time.

1 citation


Journal ArticleDOI
TL;DR: 3. Chong BH, Ismail F, Cade J, Gallus AS, Gordon S, Chesterman CN (1989) Heparin-induced thrombocytopenia: studies with a new low molecular weight heparinoid, Org 10172.
Abstract: 3. Chong BH, Ismail F, Cade J, Gallus AS, Gordon S, Chesterman CN (1989) Heparin-induced thrombocytopenia: studies with a new low molecular weight heparinoid, Org 10172. Blood 73:1592-1596 4. Keeling DM, Richards EM, Baglin TP (1994) Platelet aggregation in response to four low molecular weight heparins and the heparinoid ORG 10172 in patients with heparin-induced thrombocytopenia. Br J Haematol 86:425-426 5. Greinacher A, Michels I, Kiefel V, Mueller-Eckhardt C (1991) A rapid and sensitive test for diagnosing heparin-associated thrombocytopenia. Thromb Haemost 66:734-736