Showing papers on "PowerPC published in 2002"

PDF

Open Access

Proceedings Article•DOI•

Critical power slope: understanding the runtime effects of frequency scaling

[...]

A. Miyoshi¹, Charles R. Lefurgy, Eric Van Hensbergen, Ramakrishnan Rajamony, Ragunathan Rajkumar¹ - Show less +1 more•Institutions (1)

Carnegie Mellon University¹

22 Jun 2002

TL;DR: The critical power slope concept is introduced to explain and capture the power-performance characteristics of systems with power management features, and it is shown that in some cases, it may be energy efficient not to reduce voltage below a certain point.

...read moreread less

Abstract: Energy efficiency is becoming an increasingly important feature for both mobile and high-performance server systems. Most processors designed today include power management features that provide processor operating points which can be used in power management algorithms. However, existing power management algorithms implicitly assume that lower performance points are more energy efficient than higher performance points. Our empirical observations indicate that for many systems, this assumption is not valid.We introduce a new concept called critical power slope to explain and capture the power-performance characteristics of systems with power management features. We evaluate three systems - a clock throttled Pentium laptop, a frequency scaled PowerPC platform, and a voltage scaled system to demonstrate the benefits of our approach. Our evaluation is based on empirical measurements of the first two systems, and publicly available data for the third. Using critical power slope, we explain why on the Pentium-based system, it is energy efficient to run only at the highest frequency, while on the PowerPC-based system, it is energy efficient to run at the lowest frequency point. We confirm our results by measuring the behavior of a web serving benchmark. Furthermore, we extend the critical power slope concept to understand the benefits of voltage scaling when combined with frequency scaling. We show that in some cases, it may be energy efficient not to reduce voltage below a certain point.

...read moreread less

273 citations

Journal Article•DOI•

A 32-bit PowerPC system-on-a-chip with support for dynamic voltage scaling and dynamic frequency scaling

[...]

Kevin J. Nowka¹, Gary D. Carpenter¹, Eric MacDonald¹, H.C. Ngo¹, Bishop Brock¹, Koji Ishii¹, T.Y. Nguyen¹, Jeffrey L. Burns¹ - Show less +4 more•Institutions (1)

IBM¹

10 Dec 2002-IEEE Journal of Solid-state Circuits

TL;DR: In this paper, a PowerPC system-on-a-chip processor which makes use of dynamic voltage scaling and on-the-fly frequency scaling to adapt to the dynamically changing performance demands and power consumption constraints of high-content, battery powered applications is described.

...read moreread less

Abstract: A PowerPC system-on-a-chip processor which makes use of dynamic voltage scaling and on-the-fly frequency scaling to adapt to the dynamically changing performance demands and power consumption constraints of high-content, battery powered applications is described. The PowerPC core and caches achieve frequencies as high as 380 MHz at a supply of 1.8 V and active power consumption as low as 53 mW at a supply of 1.0 V. The system executes up to 500 MIPS and can achieve standby power as low as 54 /spl mu/W. Logic supply changes as fast as 10 mV//spl mu/s are supported. A low-voltage PLL supplied by an on-chip regulator, which isolates the clock generator from the variable logic supply, allows the SOC to operate continuously while the logic supply voltage is modified. Hardware accelerators for speech recognition, instruction-stream decompression and cryptography are included in the SOC. The SOC occupies 36 mm/sup 2/ in a 0.18 /spl mu/m, 1.8 V nominal supply, bulk CMOS process.

...read moreread less

258 citations

Proceedings Article•DOI•

Novel techniques for achieving high at-speed transition fault test coverage for Motorola's microprocessors based on PowerPC/spl trade/ instruction set architecture

[...]

N. Tendolkar¹, R. Raina¹, R. Woltenberg¹, Xijiang Lin, B. Swanson, G. Aldrich - Show less +2 more•Institutions (1)

Motorola¹

28 Apr 2002

TL;DR: Using the enhanced ATPG tool, this work generated 15,000 transition fault test patterns and achieved 76% test coverage for the MPC7400 microprocessor based on the PowerPC/spl trade/ instruction set architecture that has 10.5 million transistors and runs at 540 MHz.

...read moreread less

Abstract: Scan based at-speed transition fault testing of Motorola's microprocessors based on the PowerPC/spl trade/ instruction set architecture requires broad-side transition fault test patterns that have a specific launch and capture clocking sequence. We describe the concepts we developed and incorporated in the ATPG tool to support efficient generation of such test patterns to achieve high transition fault test coverage and for analysis of undetected transition faults. Using the enhanced ATPG tool, we generated 15,000 transition fault test patterns and achieved 76% test coverage for the MPC7400 microprocessor based on the PowerPC/spl trade/ instruction set architecture that has 10.5 million transistors and runs at 540 MHz.

...read moreread less

82 citations

Proceedings Article•DOI•

Experimental evaluation of a COTS system for space applications

[...]

Henrique Madeira¹, R.R. Some, Francisco Moreira, Duro da Costa, D. Rennels - Show less +1 more•Institutions (1)

University of Coimbra¹

23 Jun 2002

TL;DR: The results provide a comprehensive picture of the impact of faults on LynxOS key features (process scheduling and the most frequent system calls), data integrity, error propagation, application termination, and correctness of application results.

...read moreread less

Abstract: This paper evaluates the impact of transient errors in the operating system of a COTS-based system (CETIA board with two PowerPC 750 processors running LynxOS) and quantifies their effects at both the OS and at the application level. The study has been conducted using a Software-Implemented Fault Injection tool (Xception) and both realistic programs and synthetic workloads (to focus on specific OS features) have been used. The results provide a comprehensive picture of the impact of faults on LynxOS key features (process scheduling and the most frequent system calls), data integrity, error propagation, application termination, and correctness of application results.

...read moreread less

74 citations

Proceedings Article•DOI•

[...]

M. Probst, Andreas Krall, Bernhard Scholz

29 Oct 2002

TL;DR: This work presents a dynamic liveness analysis algorithm that trades precision for fast execution and conducted experiments with the SpecInt95 benchmark suite using the authors' PowerPC to Alpha translator, which resulted in a speed-up of 10 to 30 percent depending on the target machine.

...read moreread less

Abstract: Dynamic binary translators compile machine code from a source architecture to a target architecture at run time. Due to the hard time constraints of just-in-time compilation only highly efficient optimization algorithms can be employed. Common problems are an insufficient number of registers on the target architecture and the different handling of condition codes in source and target architecture. Without optimizations useless stores and computations are generated by the dynamic binary translator and cause significant performance losses. In order to eliminate these useless operations, a very fast liveness analysis is required. We present a dynamic liveness analysis algorithm that trades precision for fast execution and conducted experiments with the SpecInt95 benchmark suite using our PowerPC to Alpha translator. The optimizations reduced the number of stores by about 50 percent. This resulted in a speed-up of 10 to 30 percent depending on the target machine. The dynamic liveness analysis results are very close to the most precise solution.

...read moreread less

35 citations

Proceedings Article•DOI•

Single-chip gigabit mixed-version IP router on Virtex-II Pro

[...]

G. Brebner¹•Institutions (1)

University of Edinburgh¹

22 Sep 2002

TL;DR: Novel single-chip system architecture options, based on the Xilinx Virtex-II Pro part, which includes up to four PowerPC cores and was launched in Spring 2002 are considered, to carry out more frequent, and less control intensive, functions in logic, and other functions in the processor.

...read moreread less

Abstract: This paper concerns novel single-chip system architecture options, based on the Xilinx Virtex-II Pro part, which includes up to four PowerPC cores and was launched in Spring 2002. The research described here was carried out pre-launch (i.e., prior to availability of real parts), so the paper focuses on initial architectural experiments based on simulation. The application is a Mixed-version IP Router, named MIR, servicing gigabit ethernet ports. This would be of use to organizations with several gigabit ethernets, with a mixture of IPv4 and IPv6 hosts and routers attached directly to the networks. A particular benefit of a programmable approach based on Virtex-II Pro is that the router's functions can evolve smoothly, maintaining router performance as the organization migrates from IPv4 to IPv6 internally, and also as the Internet migrates externally. The basic aim is to carry out more frequent, and less control intensive, functions in logic, and other functions in the processor. Two prototypes are described here. Both support four ethernet ports, but the designs are scalable upwards. The second one, the more ambitious of the two, instantiates a configuration appropriate when the bulk of the incoming packets are IPv4. Such packets are processed and switched entirely by logic, with no internal copying of packets between buffers and virtually no delay between packet receipt and onward forwarding. This involves a specially-tailored internal interconnection network between the four ports, and also processing performed in parallel with packet receipt, i.e. multi-threading in logic. IPv6 packets, or some rare IPv4 cases, are passed to a PowerPC core for processing. In essence, the PowerPC acts as a slave to the logic, rather than the more common opposite master-slave relationship.

...read moreread less

31 citations

Proceedings Article•DOI•

A 0.9 V to 1.95 V dynamic voltage-scalable and frequency-scalable 32 b PowerPC processor

[...]

Kevin J. Nowka¹, Gary D. Carpenter¹, E. Mac Donald¹, Hung Ngo¹, Bishop Brock¹, Koji Ishii¹, Tuyet Nguyen¹, Jeffrey L. Burns¹ - Show less +4 more•Institutions (1)

IBM¹

07 Aug 2002

TL;DR: In this article, a 32 b PowerPC/spl trade/ system-on-a-chip supporting dynamic voltage supply and dynamic frequency scaling operates from 366 MHz at 1.8 V and 600 mW down to 150 MHz at 2.0 V and 53 mW in a 0.18 /spl mu/m CMOS process.

...read moreread less

Abstract: A 32 b PowerPC/spl trade/ system-on-a-chip supporting dynamic voltage supply and dynamic frequency scaling operates from 366 MHz at 1.8 V and 600 mW down to 150 MHz at 1.0 V and 53 mW in a 0.18 /spl mu/m CMOS process. Maximum supply change without PLL relock is 10 mV//spl mu/s. Processor state save/restore enables a deep-sleep state.

...read moreread less

29 citations

Proceedings Article•DOI•

A radix-2 FFT algorithm for Modern Single Instruction Multiple Data (SIMD) architectures

[...]

V. Paul Rodriguez¹•Institutions (1)

University of New Mexico¹

13 May 2002

TL;DR: The general radix-2 FFT algorithm (SIMD-FFT) is developed for the modern SIMD architectures and is found to be faster than the other two implementations for complex 1D input data and for complex 2D inputData as well.

...read moreread less

Abstract: Modern Single Instruction Multiple Data (SIMD) microprocessor architectures allow parallel floating point operations over four contiguous elements in memory. The radix-2 FFT algorithm is well suited for modern SIMD architectures after the second stage (decimation-in-time case). In this paper, a general radix-2 FFT algorithm is developed for the modern SIMD architectures. This algorithm (SIMD-FFT) is implemented on the Intel Pentium and Motorola PowerPC architecture for 1D and 2D. The results are compared against Intel's implementation of the split-radix FFT for the SIMD architecture [2] and the FFTW [3]. Overall, the SIMDFFT was found to be faster than the other two implementations for complex 1D input data (ranging from 95.9% up to 372%), and for complex 2D input data (ranging from 68.8% up to 343%) as well.

...read moreread less

28 citations

Proceedings Article•DOI•

Instruction-level reverse execution for debugging

[...]

Tankut Akgul¹, Vincent J. Mooney¹•Institutions (1)

Georgia Institute of Technology¹

18 Nov 2002

TL;DR: A reverse execution methodology at the assembly instruction-level with low memory and time overheads to generate a reverse program able to undo, in almost all cases, normal forward execution of an assembly instruction in the program being debugged.

...read moreread less

Abstract: The ability to execute a program in reverse is advantageous for shortening debug time. This paper presents a reverse execution methodology at the assembly instruction-level with low memory and time overheads. The core idea of this approach is to generate a reverse program able to undo, in almost all cases, normal forward execution of an assembly instruction in the program being debugged. The methodology has been implemented on a PowerPC processor in a custom made debugger. Compared to previous work -- all of which use a variety of state saving techniques -- the experimental results show 2.5X to 400X memory overhead reduction for the tested benchmarks. Furthermore, the results with the same benchmarks show an average of 4.1X to 5.7X time overhead reduction.

...read moreread less

23 citations

Proceedings Article•DOI•

Test methodology for Motorola's high performance e500 core based on PowerPC instruction set architecture

[...]

B. Bailey¹, A. Metayer¹, B. Svrcek¹, N. Tendolkar¹, E. Wolf¹, E. Fiene¹, M. Alexander¹, R. Woltenberg¹, R. Raina¹ - Show less +5 more•Institutions (1)

Motorola¹

07 Oct 2002

TL;DR: This paper presents the DFT techniques used in Motorola's high performance e500 core, which implements the PowerPC "Book E" architecture and is designed to run at 600 MHz to 1 GHz.

...read moreread less

Abstract: This paper presents the DFT techniques used in Motorola's high performance e500 core, which implements the PowerPC "Book E" architecture and is designed to run at 600 MHz to 1 GHz. Highlights of the DFT features are at-speed logic built-in self-test (LBIST) for delay fault detection, very high test coverage for scan based at-speed deterministic delay-fault test patterns, 100% BIST for embedded memory arrays and 99.2 % stuck-at fault test coverage for deterministic scan test patterns. A salient design feature is the isolation ring that facilitates testing of the core when it is integrated in an SoC or host processor.

...read moreread less

22 citations

Journal Article•DOI•

Verification of Out-Of-Order Processor Designs Using Model Checking and a Light-Weight Completion Function

[...]

Sergey Berezin¹, Edmund M. Clarke¹, Armin Biere², Yunshan Zhu³•Institutions (3)

Carnegie Mellon University¹, ETH Zurich², Synopsys³

01 Mar 2002

TL;DR: A new technique for verification of complex hardware devices that allows both generality and a high degree of automation is presented, based on a new way of constructing a “light-weight” completion function together with new encoding of uninterpreted functions called reference file representation.

...read moreread less

Abstract: We present a new technique for verification of complex hardware devices that allows both generality and a high degree of automation. The technique is based on our new way of constructing a “light-weight” completion function together with new encoding of uninterpreted functions called reference file representation. Our technique combines our completion function method and reference file representation with compositional model checking and theorem proving. This extends the state of the art in two directions. First, we obtain a more general verification methodology. Second, it is easier to use, since it has a higher degree of automation. As a benchmark, we take Tomasulo's algorithm for scheduling out-of-order instruction execution used in many modern superscalar processors like the Pentium-II and the PowerPC 604. The algorithm is parameterized by the processor configuration, and our approach allows us to prove its correctness in general, independent of any actual design.

...read moreread less

Proceedings Article•DOI•

Enhancing student learning in an introductory embedded systems laboratory

[...]

Aaron Striegel¹, Diane T. Rover¹•Institutions (1)

Iowa State University¹

06 Nov 2002

TL;DR: The authors describe steps taken at Iowa State University to upgrade a sophomore level laboratory in embedded systems, and examine the similarities and differences between the laboratory platforms and their impact on student learning.

...read moreread less

Abstract: As technology advances, curriculum and laboratories are challenged to keep pace. This is especially true in computer engineering, where the range of technologies is constantly broadening and diversifying, as computer-based systems take on many forms and functions in everyday life. The question is, how should a contemporary curriculum train computer-engineering students for the vast scope of embedded system solutions? In this paper, the authors specifically consider where to begin, and ask, can powerful tools empower students to learn? They describe steps taken at Iowa State University to upgrade a sophomore level laboratory in embedded systems. They migrated from the popular 8-bit Motorola 68HC11 microcontroller as the core for a hardware-software development platform to the emerging 32-bit PowerPC 555 microcontroller. The 68HC11 is used in labs at numerous universities across the country and is supported with textbooks and educational packages. Conversely, the PowerPC is new to the academic environment and comes with a rich set of features and development tools. In this paper, they examine the similarities and differences between the laboratory platforms and their impact on student learning. They also identify strengths and weaknesses of the new laboratory environment, based on their own perspective and student feedback.

...read moreread less

Posted Content•

The BlueGene/L Supercomputer

[...]

Gyan V. Bhanot, Dong Chen, Alan Gara, Pavlos Vranas

17 Dec 2002-arXiv: High Energy Physics - Lattice

TL;DR: The architecture of the BlueGene/L massively parallel supercomputer is described in this article, where each computing node consists of a single compute ASIC plus 256 MB of external memory, and 65,536 of such nodes are connected into a 3-dimensional torus with a geometry of 32x32x64.

...read moreread less

Abstract: The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores, two 2.8 Gflops floating point units, 4 MB of embedded DRAM as cache, a memory controller for external memory, six 1.4 Gbit/s bi-directional ports for a 3-dimensional torus network connection, three 2.8 Gbit/s bi-directional ports for connecting to a global tree network and a Gigabit Ethernet for I/O. 65,536 of such nodes are connected into a 3-d torus with a geometry of 32x32x64. The total peak performance of the system is 360 Teraflops and the total amount of memory is 16 TeraBytes.

...read moreread less

Designing a single board computers for space using the most advanced processor and mitigation technologies

[...]

L. Longden, C. Thibodeau, R. Hiliman, P. Layton, M. Dowd - Show less +1 more

01 Dec 2002

TL;DR: Maxwell Technologies has developed a Super Computer for Space (SCS750) that utilizes the latest commercial Silicon-on-Insulator PowerPC processors and state-of-the-art memory modules to achieve space-qualified performance that is from 10 to 1000 times that of current technology.

...read moreread less

Abstract: As high-end computing becomes more of a necessity in space, there currently exists a large gap between what is available to satellite manufacturers and the state of the commercial processor industry. As a result, Maxwell Technologies has developed a Super Computer for Space that utilizes the latest commercial Silicon-on-Insulator PowerPC processors and state-of-the-art memory modules to achieve space-qualified performance that is from 10 to 1000 times that of current technology. In addition, Maxwell’s Super Computer for Space (SCS750) SBC is capable of executing up to 1800+ millions of instruction per second (MIPS), while guaranteeing upset rates for the entire board of less then 1 every 1000 years. Presented is a brief synopsis of Maxwell’s design approach and radiation mitigation techniques and radiation test results employed on Maxwell’s next generation SBC.

...read moreread less

Proceedings Article•DOI•

Automated equivalence checking of switch level circuits

[...]

Simon Jolly, Atanas Nikolaev Parashkevov¹, Tim McDougall¹•Institutions (1)

Motorola¹

10 Jun 2002

TL;DR: This paper presents Motorola's Switch Level Verification (SLV) tool, which employs detailed switch level analysis to model the behavior of MOS transistors and obtain an equivalent RTL model.

...read moreread less

Abstract: A chip that is required to meet strict operating criteria in terms of speed, power, or area is commonly custom designed at the switch level. Traditional techniques for verifying these designs, based on simulation, are expensive in terms of resources and cannot completely guarantee correct operation. Formal verification methods, on the other hand, provide for a complete proof of correctness, and require less effort to setup. This paper presents Motorola's Switch Level Verification (SLV) tool, which employs detailed switch level analysis to model the behavior of MOS transistors and obtain an equivalent RTL model. This tool has been used for equivalence checking at the switch level for several years within Motorola for the PowerPC, M*Core and DSP custom blocks. We focus on the novel techniques employed in SLV, particularly in the areas of pre-charged and sequential logic analysis, and provide details on the automated and integrated equivalence checking flow in which the tool is used.

...read moreread less

Proceedings Article•DOI•

Simple and effective array prefetching in Java

[...]

Brendon Cahoon¹, Kathryn S. McKinley²•Institutions (2)

University of Massachusetts Amherst¹, University of Texas at Austin²

03 Nov 2002

TL;DR: A new unified compile-time analysis for software prefetching arrays and linked structures in Java is described that identifies loop induction variables used in array accesses and is suitable for including in a just-in-time compiler.

...read moreread less

Abstract: Java is becoming a viable choice for numerical algorithms due to the software engineering benefits of object-oriented programming. Because these programs still use large arrays that do not fit in the cache, they continue to suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching arrays and linked structures in Java. Our previous work uses data-flow analysis to discover linked data structure accesses, and here we present a more general version that also identifies loop induction variables used in array accesses. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out-of-order superscalar processor running a set of array-based Java programs. Across all our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, and we show that prefetching reduces execution time by a geometric mean of 17%. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. Because our analysis is much simpler and quicker, it is suitable for including in a just-in-time compiler. We further show that the additional loop transformations and careful scheduling of prefetches used in previous work are not always necessary for modern architectures and Java programs.

...read moreread less

Computational Core Design of a Wireless Structural Health Monitoring System

[...]

Jerome P. Lynch¹, Arvind Sundararajan, Kincho H. Law, Anne S. Kiremidjian, Thomas W. Kenny, Ed Carryer - Show less +2 more•Institutions (1)

Stanford University¹

01 Jan 2002

TL;DR: The fundamental building block of the proposed sensing network is a wireless sensing unit capable of acquiring measurement data, interrogating the data and transmitting the data in real-time to the network.

...read moreread less

Abstract: Complementing recent advances made in the field of structural health monitoring and damage detection, the concept of a wireless sensing network with distributed computational power is proposed. The fundamental building block of the proposed sensing network is a wireless sensing unit capable of acquiring measurement data, interrogating the data and transmitting the data in real-time to the network. To perform the computationally intensive task of damage detection, an advanced PowerPC computational core is chosen. First, a layer of software comprised of various device driver modules is developed to operate the various hardware subsystems of the wireless sensing unit. Additional software is then designed for embedment that can locally execute a time-series based damage detection algorithm.

...read moreread less

Proceedings Article•DOI•

Static timing analysis based circuit-limited-yield estimation

[...]

A. Gattiker, S. Nassif, R. Dinakar, C. Long

07 Aug 2002

TL;DR: This paper presents a computationally efficient means for estimating parametric timing yield and guiding robust design-for-quality in the presence of manufacturing and operating environment variations by basing the proposed methodology on a post-processing step applied to the report generated as a by-product of static timing analysis.

...read moreread less

Abstract: This paper presents a computationally efficient means for estimating parametric timing yield and guiding robust design-for-quality in the presence of manufacturing and operating environment variations. Computational efficiency is achieved by basing the proposed methodology on a post-processing step applied to the report generated as a by-product of static timing analysis. Efficiency is also ensured by exploiting the fact that for small processing/environment variations, a linear model is adequate for capturing the resulting delay change. Meaningful design guidance is achieved by analyzing the timing-related influence of variations on a path-by-path basis, allowing designers to perform a quality-oriented design pass focused on key paths. A coherent strategy is provided to handle both die-to-die and within-die variations. Examples from a PowerPC microprocessor illustrate the methodology and its capabilities.

...read moreread less

Proceedings Article•DOI•

Bytecode fetch optimization for a Java interpreter

[...]

Kazunori Ogata¹, Hideaki Komatsu¹, Toshio Nakatani¹•Institutions (1)

IBM¹

01 Oct 2002

TL;DR: Three novel techniques of the Java bytecode interpreter are described, write-through top-of-stack caching (WT), position-based handler customization (PHC), and position- based speculative decoding (PSD), which ameliorate problems for the PowerPC processors and are shown to be the most effective among three.

...read moreread less

Abstract: Interpreters play an important role in many languages, and their performance is critical particularly for the popular language Java. The performance of the interpreter is important even for high-performance virtual machines that employ just-in-time compiler technology, because there are advantages in delaying the start of compilation and in reducing the number of the target methods to be compiled. Many techniques have been proposed to improve the performance of various interpreters, but none of them has fully addressed the issues of minimizing redundant memory accesses and the overhead of indirect branches inherent to interpreters running on superscalar processors. These issues are especially serious for Java because each bytecode is typically one or a few bytes long and the execution routine for each bytecode is also short due to the low-level, stack-based semantics of Java bytecode. In this paper, we describe three novel techniques of our Java bytecode interpreter, write-through top-of-stack caching (WT), position-based handler customization (PHC), and position-based speculative decoding (PSD), which ameliorate these problems for the PowerPC processors. We show how each technique contributes to improving the overall performance of the interpreter for major Java benchmark programs on an IBM POWER3 processor. Among three, PHC is the most effective one. We also show that the main source of memory accesses is due to bytecode fetches and that PHC successfully eliminates the majority of them, while it keeps the instruction cache miss ratios small.

...read moreread less

Performance Analysis of Simultaneous Multithreading in a PowerPC-based Processor

[...]

F.N. Eskesen, M. Hack, T. Kimbrel, Mark S. Squillante, Richard J. Eickemeyer, Steven R. Kunkel - Show less +2 more

01 Jan 2002

TL;DR: A performance analysis of SMT in a PowerPC-based wide superscalar processor architecture under a broad range of workloads, which includes combinations of TPC-C, SPECint and SPECfp is presented.

...read moreread less

Abstract: Simultaneous multithreading (SMT) is an approach to address the well-known problems of memory accesses increasingly dominating processor execution time and of limited instruction level parallelism. Previous research has explored the benefits and limitations of SMT based on specific processor architectures under a variety of workloads. In this paper, we present a performance analysis of SMT in a PowerPC-based wide superscalar processor architecture under a broad range of workloads, which includes combinations of TPC-C, SPECint and SPECfp. Although some of our results are consistent with previous work, our results also demonstrate some differences and we use these results to explore and identify the primary causes of such differences. This includes an investigation of thread characteristics that work well together in SMT environments, and thread characteristics that do not work well together.

...read moreread less

Proceedings Article•DOI•

A study of CodePack: optimizing embedded code space

[...]

Avishay Orpaz¹, Shlomo Weiss¹•Institutions (1)

Tel Aviv University¹

06 May 2002

TL;DR: A novel efficient algorithm is presented to optimize the class structure of the CodePack system and its design parameters and investigate how each affects its performance on the compression rate and decoder complexity.

...read moreread less

Abstract: CodePack is a code compression system used by IBM in its PowerPC family of embedded processors. CodePack combines high compression capability along with fast and simple decoding hardware. IBM did not release much information about the design of the system and the influence of various design parameters on its performance. In our work we will present the system and its design parameters and investigate how each affects its performance on the compression rate and decoder complexity. We also present a novel efficient algorithm to optimize the class structure of the system.

...read moreread less

Workload Characterization of Java Server Applications on Two PowerPC Processors

[...]

Pattabi Seshadri¹, Lizy K. John¹, Alex E. Mericas²•Institutions (2)

University of Texas at Austin¹, IBM²

01 Jan 2002

TL;DR: It is found that the Java server benchmarks characterize, SPECjbb2000 and VolanoMark 2.1.2, have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC miss rates.

...read moreread less

Abstract: Java has become fairly popular on commercial servers in recent years. However, the behavior of Java server applications has not been studied extensively. We characterize two Java server benchmarks, SPECjbb2000 and VolanoMark 2.1.2, on two IBM PowerPC architectures, the RS64-III and the POWER3-II, and compare them to more traditional workloads as represented by selected benchmarks from SPECint2000. We find that our Java server benchmarks have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC (Branch Target Address Cache) miss rates. These benchmarks also exhibit high L2 miss rates due mostly to data loads. Instruction cache and L2 misses are seen to be the primary contributors to CPI.

...read moreread less

Proceedings Article•DOI•

Receiver control for the submillimeter array

[...]

Todd R. Hunter¹, Robert W. Wilson¹, Robert Kimberk¹, P. S. Leiker¹, R. D. Christensen² - Show less +1 more•Institutions (2)

Harvard University¹, Smithsonian Institution²

01 Dec 2002-Astronomical Telescopes and Instrumentation

TL;DR: Efficient operation of a submillimeter interferometer requires remote (preferably automated) control of mechanically tuned local oscillators, phase-lock loops, mixers, optics, calibration vanes and cryostats.

...read moreread less

Abstract: Efficient operation of a submillimeter interferometer requires remote (preferably automated) control of mechanically tuned local oscillators, phase-lock loops, mixers, optics, calibration vanes and cryostats. The present control system for these aspects of the Submillimeter Array (SMA) will be described. Distributed processing forms the underlying architecture. In each antenna cabin, a serial network of up to ten independent 80C196 microcontroller boards attaches to the real-time PowerPC computer (running LynxOS). A multi-threaded, gcc-compiled program on the PowerPC accepts top-level requests via remote procedure calls (RPC), subsequently dispatches tuning commands to the relevant microcontrollers, and regularly reports the system status to optical-fiber-based reflective memory for common access by the telescope monitor and error reporting system. All serial communication occurs asynchronously via encoded, variable-length packets. The microcontrollers respond to the requested commands and queries by accessing non-volatile, rewriteable lookup-tables (when appropriate) and executing embedded software that operates additional electronic devices (DACs, ADCs, etc.). Since various receiver hardware components require linear or rotary motion, each microcontroller also implements a position servo via a one-millisecond interrupt service routine which drives a DC-motor/encoder combination that remains standard across each subsystem.© (2002) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

...read moreread less

Proceedings Article•DOI•

Keynote address: Sensor network research: emerging challenges for architecture, systems, and languages

[...]

Deborah Estrin

01 Oct 2002

Proceedings Article•DOI•

Real-time embedded hyperspectral image compression for tactical military platforms

[...]

D. Lorts

16 Oct 2002

TL;DR: This paper presents the current on-going research efforts in which a real-time hyperspectral data compression system developed and demonstrated for a military customer is being ported to an embedded platform fit for deployment onto a tactical platform such as an unmanned aerial vehicle (UAV).

...read moreread less

Abstract: Summary form only given. This paper presents the current on-going research efforts in which a real-time hyperspectral data compression system developed and demonstrated for a military customer is being ported to an embedded platform fit for deployment onto a tactical platform such as an unmanned aerial vehicle (UAV). The original system consists of a PC host containing multiple PCI boards with SHARC processors interfaced to a state-of-the-art hyperspectral image (HSI) sensor. The resulting embedded implementation will leverage a scalable multiprocessing architecture. Processing nodes based on PowerPC processors with AltiVec technology provide the compute power, while the scalable standard RACEway fabric (ANSI/VITA 5-1994) handles the large interprocessor communication bandwidth. The motivation for this effort is derived from the increased interest in fielding hyperspectral sensors in the intelligence, surveillance, and reconnaissance missions of the military. Historically, there has been significant work performed to develop various data link systems. Data transmission requirements have grown quickly to whatever capacity was available in the data link. With hyperspectral data, this problem becomes even more significant. Sensors such as the EO/IR packages generate large two-dimensional (2-D) data sets. There are many standards developed to compress 2-D data sets, including the ubiquitous JPEG family of routines. With hyperspectral data, there is now a third dimension contained in the collection, that being the spectral components associated with each spatial pixel element. No longer do 2-D approaches apply efficiently. The "data cube" produced by an HSI sensor has correlation components in spatial, temporal, and spectral dimension. The principle component transformation algorithm is one such routine that can work within the data cube environment. The results of this port to a deployable, embedded system architecture will be a scalable product that can be integrated into a larger system that may provide actual data exploitation either on the unmanned platform or on the ground element. Performance characteristics between the two implementations are compared. An attempt to "generalize" the parallelism to increase the scalability to any number of available processing elements is a critical objective to increase the utility of this approach. The final product of this work will be the creation of a commercial off-the-shelf (COTS) subsystem that can be leveraged by system developers.

...read moreread less

Proceedings Article•DOI•

Real-time control using open source RTOS

[...]

Philip C. Irwin¹, Richard L. Johnson¹•Institutions (1)

Jet Propulsion Laboratory¹

01 Dec 2002-Astronomical Telescopes and Instrumentation

TL;DR: The goals of this paper are to briefly describe the RTC toolkit, highlight the successes and pitfalls of porting the toolkit from VxWorks to Linux-RTAI, and to discuss future enhancements that will be implemented as a direct result of this port.

...read moreread less

Abstract: Complex telescope systems such as interferometers tend to rely heavily on hard real-time operating systems (RTOS). It has been standard practice at NASA's Jet Propulsion Laboratory (JPL) and many other institutions to use costly commercial RTOSs and hardware. After developing a real-time toolkit for VxWorks on the PowerPC platform (dubbed RTC), the interferometry group at JPL is porting this code to the real-time Application Interface (RTAI), an open source RTOS that is essentially an extension to the Linux kernel. This port has the potential to reduce software and hardware costs for future projects, while increasing the level of performance. The goals of this paper are to briefly describe the RTC toolkit, highlight the successes and pitfalls of porting the toolkit from VxWorks to Linux-RTAI, and to discuss future enhancements that will be implemented as a direct result of this port. The first port of any body of code is always the most difficult since it uncovers the OS-specific calls and forces “red flags” into those portions of the code. For this reason, It has also been a huge benefit that the project chose a generic, platform independent OS extension, ACE, and its CORBA counterpart, TAO. This port of RTC will pave the way for conversions to other environments, the most interesting of which is a non-real-time simulation environment, currently being considered by the Space Interferometry Mission (SIM) and the Terrestrial Planet Finder (TPF) Projects.

...read moreread less

Proceedings Article•DOI•

Is state mapping essential for equivalence checking custom memories in scan-based designs?

[...]

N. Krishnamurthy¹, Jayanta Bhadra¹, Magdy S. Abadir¹, Jacob A. Abraham•Institutions (1)

Motorola¹

28 Apr 2002

TL;DR: This paper defines Crossover Bugs (CB's) that can be present in scan-based custom designs and that are inherently hard-to-detect without state mapping and demonstrates that such bugs can be missed by equivalence checking techniques that do not have state mappings between the two descriptions.

...read moreread less

Abstract: Equivalence checking between Register Transfer Level (RTL) descriptions and transistor level descriptions of custom memories is an important step in the design flow of high performance microprocessors. Equivalence checking can be done with or without the knowledge of state mapping between the two descriptions. We present evidence that because of state mapping, our verification technique exercises system behavior that exposes hard-to-detect bugs that might otherwise go undetected. This paper defines Crossover Bugs (CB's) that can be present in scan-based custom designs and that are inherently hard-to-detect without state mapping. We demonstrate that such bugs can be missed by equivalence checking techniques that do not have state mappings between the two descriptions. By identifying the state correspondences between the RTL and the transistor implementation of custom memories, a more rigorous equivalence check can be performed compared to traditional equivalence checking methods such as product machine constructions. We also compare the time and memory complexities of crossover bug detection capability of the two equivalence checking approaches. We conclude with experimental results of CB detection on some of the custom designed embedded memories of Motorola's MPC 7455 microprocessor (compliant with IBM's PowerPC instruction set architecture).

...read moreread less

Proceedings Article•DOI•

A wavelet-based timing parameter extraction method

[...]

Mani Soma¹, W. Haileselassie¹, J. Yan¹, R. Raina•Institutions (1)

University of Washington¹

07 Oct 2002

TL;DR: Experimental validation of this wavelet-based timing parameter extraction method is demonstrated with clock jitter estimation on a prototype microprocessor, based on the PowerPC architecture, together with a comparison with existing techniques.

...read moreread less

Abstract: A wavelet-based timing parameter extraction method applicable to computer and communication systems is presented. Experimental validation of this theoretical method is demonstrated with clock jitter estimation on a prototype microprocessor, based on the PowerPC architecture, together with a comparison with existing techniques.

...read moreread less

Silicon-on-Insulator PowerPC Microprocessors

[...]

F. Irom, G. M. Swift, F. H. Farmanesh, A. H. Johnston

01 Jan 2002

Review of the mouton interactive introduction to phonetics and phonology and phonetics: an interactive introduction

[...]

Jürgen Handke, Nicholas J Reid, Helen B Fraser, Mouton de Gruyter

01 Jan 2002

TL;DR: The Mouton Interactive Introduction to Phonetics and Phonology: An Interactive Introduction (2000; Contains bonus program Introduction to Voice Onset Time, 1996)

...read moreread less

Abstract: Title The Mouton Interactive Introduction to Phonetics and Phonology (2000) Phonetics: An Interactive Introduction (2000; Contains bonus program Introduction to Voice Onset Time, 1996) Author Jurgen Handke Nicholas Reid, with contributions from Helen Fraser Platform Windows (9x/ME/NT 4.0/2000) and Macintosh (MacOS 8.1 or higher) Windows (95/98/NT4) and Macintosh (MacOS 7.5.1 or later) Minimum hardware requirements PC: Pentium, 32 MB RAM, 30 MB hard disk space, SVGA graphics board, CD-ROM drive and sound card. Mac: PowerPC 120 MHz or higher, 32 MB RAM, 30 MB hard disk space, screen resolution 800x600, color monitor with thousands of colors or higher, CD-ROM drive, sound card. PC: Pentium processor or equivalent, 2x speed CD-ROM, 12 MB RAM (free), 800x600 8-bit color display. Mac: 68040 or faster processor, 2x speed CD-ROM, 12 MB RAM (free), 800x600 8-bit color display.

...read moreread less