Top 11 papers published by Michael D. Smith from Goddard Space Flight Center in 1995

Proceedings Article

A comparative analysis of schemes for correlated branch prediction

Cliff Young¹, Nicolas Gloy¹, Michael D. Smith¹

01 May 1995

TL;DR: A framework is presented that categorizes branch prediction schemes by the way in which they partition dynamic branches and by the kind of predictor that they use, to show how a static correlated branch prediction scheme increases branch bias and thus improves overall branch prediction accuracy.

Abstract: Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch prediction schemes by the way in which they partition dynamic branches and by the kind of predictor that they use. The framework allows us to compare and contrast branch prediction schemes, and to analyze why they work. We use the framework to show how a static correlated branch prediction scheme increases branch bias and thus improves overall branch prediction accuracy. We also use the framework to identify the fundamental differences between static and dynamic correlated branch prediction schemes. This study shows that there is room to improve the prediction accuracy of existing branch prediction schemes.

145 citations

Book

Limits on multiple instruction issue

Michael D. Smith, Michael K. Johnson, Mark Horowitz

01 Mar 1995

TL;DR: This paper investigates the limitations on designing a processor which can sustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications and determines that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle.

Abstract: This paper investigates the limitations on designing a processor which can sustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications. We have used trace-driven simulations to determine that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle. In a straightforward implementation, cost considerations argue strongly against decoding more than two instructions in one cycle. Given this constraint, the efficiency in instruction fetching rather than the complexity of the execution hardware limits the concurrency attainable at the instruction level.

91 citations

Journal Article

t(9;22)(q22-31;q11-12) is a consistent marker of extraskeletal myxoid chondrosarcoma: evaluation of three cases.

Raf Sciot¹, P. Dal Cin, Christopher D.M. Fletcher, Ignace Samson, Michael D. Smith, R. De Vos, B. Van Damme, H. Van den Berghe

Katholieke Universiteit Leuven¹

01 Sep 1995-Modern Pathology

TL;DR: Three cases of extraskeletal myxoid chondrosarcoma with typical histologic and ultrastructural features were investigated cytogenetically and showed a reciprocal chromosome translocation characterized as t(9;22)(q22-31;q11-12), thus confirming the findings in three previously karyotyped cases in the literature.

91 citations

Patent

Hardware extraction technique for programmable reduced instruction set computers

Rahul Razdan¹, Michael D. Smith¹

Harvard University¹

08 Nov 1995

TL;DR: Programmable Reduced Instruction Set Computers (PRISC) use RISC techniques as a basis for operation and provide hardware-programmable resources that can be configured optimally for a given user application.

Abstract: A new class of general purpose computers called Programmable Reduced Instruction Set Computers (PRISC) uses RISC techniques as a basis for operation. In addition to the conventional RISC instructions, PRISC computers provide hardware programmable resources which can be configured optimally for a given user application. A given user application is compiled using a PRISC compiler which recognizes and evaluates complex instructions into a Boolean expression which is assigned an identifier and stored in conventional memory. The recognition of instructions which may be programmed in hardware is achieved through a combination of bit width analysis and instruction optimization. During execution of the user application on the PRISC computer, the stored expressions are loaded as needed into a programmable functional unit. Once loaded, the expressions are executed during a single instruction cycle.

66 citations

Patent

Dynamically programmable reduced instruction set computer with programmable processor loading on program number field and program number register contents

Rahul Razdan, Bill Grundmann, Michael D. Smith

08 Nov 1995

TL;DR: Programmable Reduced Instruction Set Computers (PRISC) use RISC techniques as a basis for operation and provide hardware-programmable resources that can be configured optimally for a given user application.

Abstract: A new class of general purpose computers called Programmable Reduced Instruction Set Computers (PRISC) uses RISC techniques as a basis for operation. In addition to the conventional RISC instructions, PRISC computers provide hardware programmable resources which can be configured optimally for a given user application. A given user application is compiled using a PRISC compiler which recognizes and evaluates complex instructions into a Boolean expression which is assigned an identifier and stored in conventional memory. The recognition of instructions which may be programmed in hardware is achieved through a combination of bit width analysis and instruction optimization. During execution of the user application on the PRISC computer, the stored expressions are loaded as needed into a programmable functional unit. Once loaded, the expressions are executed during a single instruction cycle.

60 citations

Proceedings Article

The measured performance of personal computer operating systems

J. B. Chen¹, Yasuhiro Endo¹, Kee Chan¹, David Mazières¹, Antonio Dias¹, Margo Seltzer¹, Michael D. Smith¹

Harvard University¹

03 Dec 1995

TL;DR: Results show that accessing system functionality is often more expensive in Windows for Workgroups than in the other two systems due to frequent changes in machine mode and the use of system call hooks, and overall system functionality can be accessed most efficiently in NetBSD.

Abstract: This paper presents a comparative study of the performance of three operating systems that run on the personal computer architecture derived from the IBM-PC. The operating systems, Windows for Workgroups, Windows NT, and NetBSD (a freely available variant of the UNIX operating system), cover a broad range of system functionality and user requirements, from a single address space model to full protection with preemptive multi-tasking. Our measurements were enabled by hardware counters in Intel's Pentium processor that permit measurement of a broad range of processor events including instruction counts and on-chip cache miss counts. We used both microbenchmarks, which expose specific differences between the systems, and application workloads, which provide an indication of expected end-to-end performance. Our microbenchmark results show that accessing system functionality is often more expensive in Windows for Workgroups than in the other two systems due to frequent changes in machine mode and the use of system call hooks. When running native applications, Windows NT is more efficient than Windows, but it incurs overhead similar to that of a microkernel since its application interface (the Win32 API) is implemented as a user-level server. Overall, system functionality can be accessed most efficiently in NetBSD; we attribute this to its monolithic structure, and to the absence of the complications created by hardware backwards compatibility requirements in the other systems. Measurements of application performance show that although the impact of these differences is significant in terms of instruction counts and other hardware events (often a factor of 2 to 7 difference between the systems), overall performance is sometimes determined by the functionality provided by specific subsystems, such as the graphics subsystem or the file system buffer cache.

56 citations

Patent

Determining hardware complexity of software operations

Rahul Razdan, Michael D. Smith

08 Nov 1995

TL;DR: Programmable Reduced Instruction Set Computers (PRISC) use RISC techniques as a basis for operation and provide hardware-programmable resources that can be configured optimally for a given user application.

Abstract: A new class of general purpose computers called Programmable Reduced Instruction Set Computers (PRISC) uses RISC techniques as a basis for operation. In addition to the conventional RISC instructions, PRISC computers provide hardware programmable resources which can be configured optimally for a given user application. A given user application is compiled using a PRISC compiler which recognizes and evaluates complex instructions into a Boolean expression which is assigned an identifier and stored in conventional memory. The recognition of instructions which may be programmed in hardware is achieved through a combination of bit width analysis and instruction optimization. During execution of the user application on the PRISC computer, the stored expressions are loaded as needed into a programmable functional unit. Once loaded, the expressions are executed during a single instruction cycle.

32 citations

Proceedings Article

Performance issues in correlated branch prediction schemes

Nicolas Gloy¹, Michael D. Smith¹, Cliff Young¹

Harvard University¹

01 Dec 1995

TL;DR: In this paper, the authors evaluate the performance effect of static correlated branch prediction (SCBP) and profile-driven optimizations on instruction cache misses, branch mispredictions, and branch misfetches for a number of recent processor implementations.

Abstract: Accurate static branch prediction is the key to many techniques for exposing, enhancing, and exploiting Instruction Level Parallelism (ILP). The initial work on static correlated branch prediction (SCBP) demonstrated improvements in branch prediction accuracy, but did not address overall performance. In particular, SCBP expands the size of executable programs, which negatively affects the performance of the instruction memory hierarchy. Using the profile information available under SCBP, we can minimize these negative performance effects through the application of code layout and branch alignment techniques. We evaluate the performance effect of SCBP and these profile-driven optimizations on instruction cache misses, branch mispredictions, and branch misfetches for a number of recent processor implementations. We find that SCBP improves performance over (traditional) per-branch static profile prediction. We also find that SCBP improves the performance benefits gained from branch alignment. As expected, SCBP gives larger benefits on machine organizations with high mispredict/misfetch penalties and low cache miss penalties. Finally, we find that the application of profile-driven code layout and branch alignment techniques (without SCBP) can improve the performance of the dynamic correlated branch prediction techniques.

15 citations

Informing Loads: Enabling Software to Observe and React to Memory Behavior

Mark Horowitz, Margaret Martonosi, Todd C. Mowry, Michael D. Smith

01 Jul 1995

TL;DR: This work describes the design and functionality of an informing load instruction, a primitive that allows the software to observe cache misses and to act upon this information inexpensively within the current software context, and finds that the apparent benefit of this functionality is quite high while its hardware cost is quite modest.

Abstract: Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism to observe memory behavior directly. To fill this need, we propose a new set of memory operations called informing memory operations, and in particular, we describe the design and functionality of an informing load instruction. This instruction serves as a primitive that allows the software to observe cache misses and to act upon this information inexpensively (i.e., under the miss, when the processor would typically be idle) within the current software context. Informing loads enable new solutions to several important software problems. We demonstrate this through examples that show their usefulness in (i) the collection of fine-grained memory profiles with high precision and low overhead and (ii) the automatic improvement of memory system performance through compiler techniques that take advantage of cache-miss information. Overall, we find that the apparent benefit of an informing load instruction is quite high, while the hardware cost of this functionality is quite modest. In fact, the bulk of the required hardware support is already present in today's high-performance processors.

14 citations

The Impact of Operating System Structure on Personal Computer Performance

J. Bradley Chen, Yasuhiro Endo, Kee Chan, David Mazières, Antonio Dias, Margo Seltzer, Michael D. Smith

01 Jan 1995

TL;DR: Overall, system functionality can be accessed most efficiently in NetBSD; this is attributed to its monolithic structure, and to the absence of the complications created by backwards compatibility in the other systems.

Abstract: This paper presents a comparative study of the performance of three operating systems that run on the personal computer architecture derived from the IBM-PC. The operating systems, Windows for Workgroups (tm), Windows NT (tm), and NetBSD (a freely available UNIX (tm) variant) cover a broad range of system functionality and user requirements, from a single address space model to full protection with preemptive multi-tasking. Our measurements were enabled by hardware counters in Intel’s Pentium (tm) processor that permit measurement of a broad range of processor events including instruction counts and on-chip cache miss rates. We used both microbenchmarks, which expose specific differences between the systems, and application workloads, which provide an indication of expected end-to-end performance. Our microbenchmark results show that accessing system functionality is more expensive in Windows than in the other two systems due to frequent changes in machine mode and the use of system call hooks. When running native applications, Windows NT is more efficient than Windows, but it does incur overhead from its microkernel structure. Overall, system functionality can be accessed most efficiently in NetBSD; we attribute this to its monolithic structure, and to the absence of the complications created by backwards compatibility in the other systems. Measurements of application performance show that the impact of these differences is significant in terms of overall execution time.

1 citation

Journal Article

Bleeding from the gastrointestinal tract during prolonged ventilation

Michael D. Smith¹, J. Bihari¹, D. F. Zandstra, C. P. Stoutenbeek

Guy's Hospital¹

01 Jun 1995-Intensive Care Medicine
