Showing papers on "PowerPC" published in 2006

Open Access

Journal Article•DOI•

Formal certification of a compiler back-end or

[...]

11 Jan 2006-Sigplan Notices

TL;DR: This paper reports on the development and formal certification of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proo...

Abstract: This paper reports on the development and formal certification (proof of semantic preservation) of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proo...

100 citations

Proceedings Article•DOI•

Exploiting parallelism and structure to accelerate the simulation of chip multi-processors

David A. Penry¹, Dan Fay², D. Hodgdon², R. Wells¹, G. Schelle², David I. August¹, Daniel A. Connors²•Institutions (2)

Princeton University¹, University of Colorado Boulder²

27 Feb 2006

TL;DR: Automated parallelization achieves a 7.60 speedup for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor, and integrating eight hardware PowerPC cores into a CMP model achieves a speedup of up to 5.82.

Abstract: Simulation is an important means of evaluating new microarchitectures. Current trends toward chip multiprocessors (CMPs) try the ability of designers to develop efficient simulators. CMP simulation speed can be improved by exploiting parallelism in the CMP simulation model. This may be done by either running the simulation on multiple processors or by integrating multiple processors into the simulation to replace simulated processors. Doing so usually requires tedious manual parallelization or re-design to encapsulate processors. This paper presents techniques to perform automated simulator parallelization and hardware integration for CMP structural models. We show that automated parallelization can achieve a 7.60 speedup for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor. We demonstrate the power of hardware integration by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.

75 citations

Proceedings Article•DOI•

Trampoline An Open Source Implementation of the OSEK/VDX RTOS Specification

Jean-Luc Béchennec¹, Mikaël Briday¹, Sébastien Faucou¹, Yvon Trinquet¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Sep 2006

TL;DR: An OSEK/VDX operating system implementation is proposed as open source software, whose value no longer needs to be demonstrated.

Abstract: This paper introduces an OSEK/VDX operating system implementation. OSEK/VDX is an industry standard for real-time operating systems used in the field of automotive embedded software. This implementation is proposed as open source software, whose value no longer needs to be demonstrated. The paper explains the main implementation choices as well as the technique proposed for the generation of a real-time application. This implementation is currently available for three targets: Infineon C167, Darwin/PowerPC and Linux/x86.

61 citations

Proceedings Article•DOI•

PLRU Cache Domino Effects

Christoph Berg¹•Institutions (1)

Saarland University¹

01 Jan 2006

TL;DR: This paper shows that the pseudo-LRU (PLRU) cache replacement policy, used in the PowerPC PPC755 (widely used in embedded systems) and in some x86 models, can cause unbounded effects on the WCET.

Abstract: Domino effects have been shown to hinder a tight prediction of worst case execution times (WCET) on real-time hardware. First investigated by Lundqvist and Stenström, domino effects caused by pipeline stalls were shown to exist in the PowerPC by Schneider. This paper extends the list of causes of domino effects by showing that the pseudo LRU (PLRU) cache replacement policy can cause unbounded effects on the WCET. PLRU is used in the PowerPC PPC755, which is widely used in embedded systems, and some x86 models.
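
For readers unfamiliar with PLRU, the following is a minimal sketch of the tree-based variant, which is an assumption here: the abstract does not say which PLRU variant the paper analyses. The class name `TreePLRU` and its methods are hypothetical and only illustrate how a few tree bits approximate recency for one cache set.

```python
class TreePLRU:
    """Tree-based pseudo-LRU state for one cache set (illustrative sketch)."""

    def __init__(self, ways: int = 4) -> None:
        if ways < 2 or ways & (ways - 1):
            raise ValueError("ways must be a power of two >= 2")
        self.ways = ways
        # One bit per internal tree node, stored in heap order.
        # 0 means "next victim is in the left half", 1 means "right half".
        self.bits = [0] * (ways - 1)

    def touch(self, way: int) -> None:
        """Record an access: flip every bit on the path to point away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1  # accessed left half, so victim lies right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0  # accessed right half, so victim lies left
                node, lo = 2 * node + 2, mid

    def victim(self) -> int:
        """Follow the bits from the root to the way that would be replaced."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


def plru_victim_after(accesses, ways: int = 4) -> int:
    """Replay a sequence of way indices and return the resulting victim."""
    state = TreePLRU(ways)
    for way in accesses:
        state.touch(way)
    return state.victim()
```

After the access sequence 0, 1, 2, 3, 0 on a 4-way set, true LRU would evict way 1, whereas this sketch evicts way 2. The replacement decision depends on tree state rather than exact recency, which is the kind of history dependence a WCET analysis has to account for.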

58 citations

Proceedings Article•DOI•

Fast rule matching for learning classifier systems via vector instructions

Xavier Llorà¹, Kumara Sastry¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

08 Jul 2006

TL;DR: This paper presents how to implement efficient condition encoding and fast rule matching strategies using vector instructions, and elaborates on the Altivec and SSE2 instruction sets, producing speedups of the XCS matching process beyond ninety times.

Abstract: Over the last ten years XCS has become the standard for Michigan-style learning classifier systems (LCS). Since the initial CS-1 work conceived by Holland, classifiers (rules) have widely used a ternary condition alphabet {0,1,#} for binary input problems. Most of the freely available implementations of this ternary alphabet in XCS rely on character-based encodings: easy to implement, not memory efficient, and expensive to compute. Profiling of freely available XCS implementations shows that most of their execution time is spent determining whether a rule matches or not, posing a serious threat to XCS scalability. In the last decade, multimedia and scientific applications have pushed CPU manufacturers to include native support for vector instruction sets. This paper presents how to implement efficient condition encoding and fast rule matching strategies using vector instructions. The paper elaborates on Altivec (PowerPC G4, G5) and SSE2 (Intel P4/Xeon and AMD Opteron) instruction sets producing speedups of the XCS matching process beyond ninety times. Moreover, such vectorized matching code will make it easy to scale beyond tens of thousands of conditions in a reasonable time. The proposed fast matching scheme also fits any LCS other than XCS.
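
As a rough illustration of why a packed encoding beats a character-based one, the sketch below stores a ternary condition as two integers and matches an input with one XOR and one AND. This is a scalar stand-in under stated assumptions: the abstract does not give the paper's actual encoding or its Altivec/SSE2 code, and the function names `encode_condition` and `matches` are hypothetical.

```python
def encode_condition(cond: str) -> tuple[int, int]:
    """Pack a ternary condition such as '1#0' into (care_mask, value).

    care_mask has a 1 wherever the condition is specific (0 or 1);
    value holds the required bit at those positions. '#' sets neither.
    """
    care = 0
    value = 0
    for ch in cond:
        care <<= 1
        value <<= 1
        if ch == "#":
            continue  # don't-care position: both bits stay 0
        if ch not in "01":
            raise ValueError(f"invalid condition symbol: {ch!r}")
        care |= 1
        value |= int(ch)
    return care, value


def matches(encoded: tuple[int, int], inputs: int) -> bool:
    """True when the input agrees with the condition on every specific bit."""
    care, value = encoded
    return ((inputs ^ value) & care) == 0
```

For example, `matches(encode_condition("1#0"), 0b110)` is true and `matches(encode_condition("1#0"), 0b101)` is false. A vector unit can apply the same XOR-and-mask test to many bits or many rules per instruction, which is the general idea the abstract describes.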

34 citations

Proceedings Article•DOI•

A Flexible Framework for Wireless Medium Access Protocols

Chris Hunter¹, Joseph Camp¹, Patrick Murphy¹, Ashutosh Sabharwal¹, Christopher H. Dick²•Institutions (2)

Rice University¹, Xilinx²

01 Oct 2006

TL;DR: The framework, developed for the Rice University Wireless Open-Access Research Platform (WARP), allows a large class of medium access protocols to be interfaced with custom physical layer implementations, thereby providing a flexible and high-performance research tool.

Abstract: In this paper, we present a framework for Medium Access Control (MAC) protocol development and performance evaluation. The framework, developed for the Rice University Wireless Open-Access Research Platform (WARP), allows us to interface a large class of medium access protocols with custom physical layer (PHY) implementations, thereby providing a flexible and high-performance research tool. MAC protocols for our framework are written in C and targeted to embedded PowerPC cores within the Xilinx Virtex II-Pro class of FPGAs. A key innovation is a flexible interface between the PHY and the MAC capable of exposing user-defined parameters to either layer, thus enabling cross-layer research.

32 citations

Proceedings Article•DOI•

Combining analytical and empirical approaches in tuning matrix transposition

Qingda Lu¹, Sriram Krishnamoorthy¹, P. Sadayappan¹•Institutions (1)

Ohio State University¹

16 Sep 2006

TL;DR: An integrated optimization framework is developed that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors.

Abstract: Matrix transposition is an important kernel used in many applications. Even though its optimization has been the subject of many studies, an optimization procedure that targets the characteristics of current processor architectures has not been developed. In this paper, we develop an integrated optimization framework that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors. A judicious combination of analytical and empirical approaches is used to determine the most appropriate optimizations. The absence of problem information until execution time is handled by generating multiple versions of the code; the best version is chosen at runtime, with assistance from minimal-overhead inspectors. The approach highlights aspects of empirical optimization that are important for similar computations with little temporal reuse. Experimental results on PowerPC G5 and Intel Pentium 4 demonstrate the effectiveness of the developed framework.
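
The tiling idea mentioned in the abstract can be sketched in a few lines. This is only a minimal illustration, not the paper's framework: the function name `transpose_tiled` and the default tile size are assumptions, and the sketch ignores misalignment handling, vector instructions, and the runtime version selection the abstract describes.

```python
def transpose_tiled(a, tile=2):
    """Transpose a rectangular list-of-lists matrix one tile at a time.

    Working on small tile x tile blocks keeps both the rows being read
    and the rows being written within a small working set, which is the
    motivation for tiling on a real memory hierarchy.
    """
    if tile < 1:
        raise ValueError("tile must be positive")
    rows = len(a)
    cols = len(a[0]) if a else 0
    out = [[None] * rows for _ in range(cols)]
    for i0 in range(0, rows, tile):          # tile origin, row direction
        for j0 in range(0, cols, tile):      # tile origin, column direction
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    out[j][i] = a[i][j]
    return out
```

In a tuned implementation the tile size would be chosen to fit the cache and TLB of the target machine; here it is just a parameter.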

32 citations

Book•

Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture

Jon Stokes

30 Nov 2006

TL;DR: This book introduces microprocessor fundamentals (program execution, pipelining, superscalar execution, caching) and covers the Intel Pentium line, the PowerPC 600 series, 700 series, 7400, G4e and G5 (PowerPC 970), x86-64, and Intel's Pentium M, Core Duo, and Core 2 Duo.

Abstract: Chapter 1: Basic Computing Concepts; Chapter 2: The Mechanics of Program Execution; Chapter 3: Pipelined Execution; Chapter 4: Superscalar Execution; Chapter 5: The Intel Pentium and Pentium Pro; Chapter 6: PowerPC Processors: 600 Series, 700 Series, and 7400; Chapter 7: Intel's Pentium 4 vs. Motorola's G4e: Approaches and Design Philosophies; Chapter 8: Intel's Pentium 4 vs. Motorola's G4e: The Back End; Chapter 9: 64-Bit Computing and x86-64; Chapter 10: The G5: IBM's PowerPC 970; Chapter 11: Understanding Caching and Performance; Chapter 12: Intel's Pentium M, Core Duo, and Core 2 Duo

31 citations

Journal Article•DOI•

An FPGA-Based General-Purpose Data Acquisition Controller

C.C.W. Robson¹, Abdelkader Bousselham¹, Christian Bohm¹•Institutions (1)

Stockholm University¹

28 Aug 2006-IEEE Transactions on Nuclear Science

TL;DR: A modular development framework is presented whose generic controller can be adapted for different systems by changing the software or firmware parts, using soft or hard processors, with or without an RTOS, based on the demands of the system to be developed.

Abstract: System development in advanced FPGAs allows considerable flexibility, both during development and in production use. A mixed firmware/software solution allows the developer to choose what shall be done in firmware or software, and to make that decision late in the process. However, this flexibility comes at the cost of increased complexity. We have designed a modular development framework to help overcome these issues of increased complexity. This framework comprises a generic controller that can be adapted for different systems by simply changing the software or firmware parts. The controller can use both soft and hard processors, with or without an RTOS, based on the demands of the system to be developed. The resulting system uses the Internet for both control and data acquisition. In our studies we developed the embedded system in a Xilinx Virtex-II Pro FPGA, where we used both PowerPC and MicroBlaze cores, HTTP, Java, and LabVIEW for control and communication, together with the MicroC/OS-II and OSE operating systems.

24 citations

Proceedings Article•DOI•

An adaptive system-on-chip for network applications

Roman Koch¹, Thilo Pionteck¹, Carsten Albrecht¹, Erik Maehle¹•Institutions (1)

University of Lübeck¹

25 Apr 2006

TL;DR: The hardware architecture of DynaCORE, a dynamically reconfigurable system-on-chip for network applications, is presented together with on-chip communication issues, including the integration of PowerPC processor cores into the configurable logic and the mode of operation of the network-on-chip.

Abstract: This paper presents the hardware architecture of DynaCORE, a dynamically reconfigurable system-on-chip for network applications. DynaCORE is an application specific coprocessor for offloading computationally intensive tasks from a network processor. The system-on-chip architecture is based on an adaptable network-on-chip which allows the dynamic replacement of hardware modules as well as the adaptation of the on-chip communication structure. The coprocessor leverages the active partial reconfiguration feature of modern FPGAs in order to adapt to shifting demand patterns. An embedded general-purpose processor core within the coprocessor runs software which manages the configurations of the device. With reference to a prototypical implementation targeting a Xilinx Virtex-II Pro FPGA, this paper focuses on on-chip communication issues. Topics include the integration of PowerPC processor cores into the configurable logic as well as the mode of operation of the network-on-chip.

23 citations

Proceedings Article•DOI•

Automatic testcase synthesis and performance model validation for high performance PowerPC processors

Robert H. Bell¹, R.R. Bhatia¹, Lizy K. John, Jeffrey A. Stuecheli, J. Griswell, P. Tu, L. Capps, A. Blanchard, R. Thai•Institutions (1)

IBM¹

19 Mar 2006

TL;DR: This work synthesizes representative PowerPC versions of the SPEC2000, STREAM, TPC-C and Java benchmarks, compiles and executes them, and obtains an average IPC within 2.4% of the average IPC of the original benchmarks and with many similar average workload characteristics.

Abstract: The latest high-performance IBM PowerPC microprocessor, the POWER5 chip, poses challenges for performance model validation. The current state-of-the-art is to use simple hand-coded bandwidth and latency testcases, but these are not comprehensive for processors as complex as the POWER5 chip. Applications and benchmark suites such as SPEC CPU are difficult to set up or take too long to execute on functional models or even on detailed performance models. We present an automatic testcase synthesis methodology to address these concerns. By basing testcase synthesis on the workload characteristics of an application, source code is created that largely represents the performance of the application, but which executes in a fraction of the runtime. We synthesize representative PowerPC versions of the SPEC2000, STREAM, TPC-C and Java benchmarks, compile and execute them, and obtain an average IPC within 2.4% of the average IPC of the original benchmarks and with many similar average workload characteristics. The synthetic testcases often execute two orders of magnitude faster than the original applications, typically in less than 300K instructions, making performance model validation for today's complex processors feasible.

Proceedings Article•DOI•

Power Efficiency for Variation-Tolerant Multicore Processors

Donald, Martonosi

01 Jan 2006

Proceedings Article•DOI•

Power Distribution Measurements of the Dual Core PowerPC™ 970MP Microprocessor

Hendrik F. Hamann¹, Alan J. Weger¹, James A. Lacey¹, Erwin B. Cohen, C. Atherton•Institutions (1)

IBM¹

18 Sep 2006

TL;DR: Spatially-resolved imaging of microprocessor power (SIMP) is shown to be a critical tool for measuring temperature and power distributions of a microprocessor under full operating conditions.

Abstract: Spatially-resolved imaging of microprocessor power (SIMP) is shown to be a critical tool for measuring temperature and power distributions of a microprocessor under full operating conditions. In this paper, the SIMP technique is applied to the dual-core PowerPC™ 970MP microprocessor.

Proceedings Article•DOI•

Fault Injection-based Reliability Evaluation of SoPCs

M. Sonza Reorda¹, Luca Sterpone¹, Massimo Violante¹, Marta Portela-Garcia², Celia Lopez-Ongil², Luis Entrena²•Institutions (2)

Polytechnic University of Turin¹, Charles III University of Madrid²

21 May 2006

TL;DR: A new fault-injection approach for evaluating the impact of transient faults in SoPCs is presented and a case study consisting of a Web server implemented on a Xilinx Virtex-II FPGA embedding a PowerPC 405 and running the whole TCP/IP stack is reported.

Abstract: Systems-on-Programmable-Chip (SoPCs) include processors, memories and programmable logic, which make it possible to meet multiple application requirements such as high performance, reconfigurability and low cost. Due to these characteristics, they are also becoming very attractive for safety-critical applications. However, the issue of assessing the reliability they can provide and debugging the possible safety-related mechanisms they embed is still open. In this paper, we present a new fault-injection approach for evaluating the impact of transient faults in SoPCs. Fault-injection experiments are reported on a case study consisting of a web server implemented on a Xilinx Virtex-II FPGA embedding a PowerPC 405 and running the whole TCP/IP stack.

Proceedings Article•DOI•

Using Lin-Kernighan algorithm for look-up table compression to improve code density

Talal Bonny¹, Joerg Henkel¹•Institutions (1)

Karlsruhe Institute of Technology¹

30 Apr 2006

TL;DR: This work presents a method and architecture for compressing the so-called Look-up Tables that are necessary for the de-compression process, and introduces a novel and very efficient hardware-supported approach based on Canonical Huffman Coding.

Abstract: The presented work uses code compression to improve the design efficiency of an embedded system. In particular, we present a method and architecture for compressing the so-called Look-up Tables that are necessary for the de-compression process. No other work has yet focused on minimizing the Look-up Tables that, as we show, have a significant impact on the total overhead of a hardware-based decompression scheme. We introduce a novel and very efficient hardware-supported approach based on Canonical Huffman Coding. Using the Lin-Kernighan algorithm we reduce the Look-up Table size by up to 45%. As a result, we achieve all-over compression ratios as low as 45% (already including the overhead of the Look-up Tables). Thereby, our scheme is entirely orthogonal to approaches that take particularities of a certain instruction set architecture into account, meaning that compression could be further improved. Factoring in the orthogonality, our scheme is the basis for not-yet-achieved efficiency in hardware-supported compression schemes. We have conducted evaluations using a representative set (in terms of size and application domain) of applications and have applied it to three major embedded processor architectures, namely ARM, MIPS and PowerPC. The hardware evaluation shows no performance penalty.

Proceedings Article•DOI•

Front-end Module for GNSS Software Receiver

Josef Spacek¹, Pavel Puricer¹•Institutions (1)

Czech Technical University in Prague¹

07 Jun 2006

TL;DR: The topic of the paper is focused on the design and implementation of the radio front-end part for experimental GNSS software receiver developed at the Department of Radio Engineering of the Czech Technical University in Prague.

Abstract: The topic of the paper is focused on the design and implementation of the radio front-end part for experimental GNSS software receiver developed at the Department of Radio Engineering of the Czech Technical University in Prague. The receiver is designed for the processing of signals of present and future global navigation satellite systems, including GPS, GLONASS and Galileo. For the biggest possible versatility, the modular architecture and software defined radio (SDR) concept were chosen. The front-end unit consists of three independent channels with the bandwidth of 24 MHz each that use a single conversion super-heterodyne concept with intermediate frequency 140 MHz. The front-end provides down converted analogue signal to DSP unit represented by FPGA device with two embedded PowerPC cores. The paper also provides comparison of the front-end of experimental receiver with lot manufacture case.

Journal Article•DOI•

Hybrid Fault Detection Technique: A Case Study on Virtex-II Pro's PowerPC 405

Paolo Bernardi¹, Luca Sterpone¹, Massimo Violante¹, M. Portela-Garcia¹•Institutions (1)

Polytechnic University of Turin¹

19 Dec 2006-IEEE Transactions on Nuclear Science

TL;DR: A hybrid approach is proposed, which combines ideas from previous techniques based on software transformations with the introduction of an Infrastructure IP with reduced memory and performance overheads, to harden systems based on the PowerPC 405 core available in Virtex-II Pro FPGAs.

Abstract: Hardening processor-based systems against transient faults requires new techniques able to combine high fault detection capabilities with the usual design requirements, e.g., reduced design-time, low area overhead, reduced (or null) accessibility to processor internal hardware. This paper proposes the adoption of a hybrid approach, which combines ideas from previous techniques based on software transformations with the introduction of an Infrastructure IP with reduced memory and performance overheads, to harden systems based on the PowerPC 405 core available in Virtex-II Pro FPGAs. The proposed approach targets faults affecting the memory elements storing both the code and the data, independently of their location (inside or outside the processor). Extensive experimental results including comparisons with previous approaches are reported, which allow a practical evaluation of the method in terms of fault detection capabilities and area, memory and performance overheads.

Proceedings Article•DOI•

High Throughput FPGA Based Architecture for H.264/AVC Inverse Transforms and Quantization

Luciano Agostini¹, Marcelo Porto¹, Jose Luis Guntzel¹, Roger Porto², Sergio Bampi²•Institutions (2)

Universidade Federal de Pelotas¹, Universidade Federal do Rio Grande do Sul²

01 Aug 2006

TL;DR: The post place-and-route synthesis results indicate that the global architecture is able to process 132 million samples per second, allowing its use in H.264/AVC coders and decoders for HDTV.

Abstract: This paper presents the design, the validation and the prototyping of an H.264/AVC inverse transform and quantization architecture. This architecture was designed to reach high throughputs and to be easily integrated with other H.264/AVC modules. The architecture was completely described in VHDL and the VHDL code was behaviorally and post place-and-route validated through simulations, comparing the data generated by the architecture with the data extracted from the H.264/AVC reference software. Finally, the architecture was prototyped using a Digilent XUP V2P board that contains a Virtex-II Pro VP30 Xilinx FPGA. The architecture mapped to the target FPGA was stimulated in the prototyping board using a PowerPC processor that is hardwired in that FPGA. The prototype was validated and the results show that the designed architecture was working in accordance with the H.264/AVC standard. The post place-and-route synthesis results indicate that the global architecture is able to process 132 million samples per second, allowing its use in H.264/AVC coders and decoders for HDTV.

Proceedings Article•DOI•

VICTORIA: VMX indirect compute technology oriented towards in-line acceleration

Jeff H. Derby¹, Robert K. Montoye², José E. Moreira²•Institutions (2)

Research Triangle Park¹, IBM²

03 May 2006

TL;DR: The VICTORIA PowerPC architecture is described, which is based on the iVMX accelerator technology, which extends the existing VMX architecture with indirect register addressing and opens the door for highly optimized vector algorithms that can sustain very high processing rates.

Abstract: There is increasing interest in the use of accelerators in computer systems. Accelerators are processor-attached hardware units that can perform certain functions faster than the conventional general purpose processor. In this paper, we describe the VICTORIA PowerPC architecture, which is based on the iVMX accelerator technology. The iVMX accelerator extends the existing VMX architecture with indirect register addressing. That approach greatly extends the architected space of registers and opens the door for highly optimized vector algorithms that can sustain very high processing rates. The large space of registers is directly controlled by the executing code and offers a sufficiently large storage to hold sizeable intermediate results. This helps reduce the negative effects of limited memory bandwidth and high memory latency. The iVMX accelerator is an example of in-line accelerator; that is, the instructions that drive the accelerator are part of the same stream that drives the main processor. Compared to off-line accelerators, which execute their own instruction stream, in-line accelerators present a much more convenient programming model.

Proceedings Article•DOI•

TRAIN: A Virtual Transaction Layer Architecture for TLM-based HW/SW Codesign of Synthesizable MPSoC

W. Klingauf, H. Gadke, R. Günzel

06 Mar 2006

TL;DR: The goal is to methodically simplify MPSoC design by systematic HW/SW interface abstraction, thus enabling early SW verification, rapid prototyping and fast exploration of critical design issues.

Abstract: Our concept of a virtual transaction layer (VTL) architecture allows transaction-level communication channels to be mapped directly onto a synthesizable multiprocessor SoC implementation. The VTL is above the physical MPSoC communication architecture, acting as a hardware abstraction layer for both HW and SW components. TLM channels are represented by virtual channels which efficiently route transactions between SW and HW entities through the on-chip communication network with respect to quality-of-service and realtime requirements. The goal is to methodically simplify MPSoC design by systematic HW/SW interface abstraction, thus enabling early SW verification, rapid prototyping and fast exploration of critical design issues. With TRAIN, we present our implementation of such a VTL architecture for Virtex-II Pro and PowerPC and illustrate its efficiency by experimentation.

Proceedings Article•DOI•

FPGA-based multichannel optical concentrator SIMCON 4.0 for TESLA cavities LLRF control system

Karol Perkuszewski¹, Krzysztof T. Pozniak¹, Wojciech Jalmuzna¹, Waldemar Koprek¹, Jaroslaw Szewinski¹, Ryszard S. Romaniuk¹, Stefan Simrock•Institutions (1)

Warsaw University of Technology¹

04 Oct 2006

TL;DR: The paper presents an idea, design and realization of a gigabit, optoelectronic synchronous massive data concentrator for the LLRF control system for FLASH and XFEL superconducting accelerators and lasers.

Abstract: The paper presents the idea, design and realization of a gigabit, optoelectronic synchronous massive data concentrator for the LLRF control system for FLASH and XFEL superconducting accelerators and lasers. The design is based on a central, large, programmable FPGA Virtex-II Pro circuit by Xilinx and on eight commercial optoelectronic transceivers. Peripheral devices such as memory and Ethernet were implemented for the embedded PowerPC block. The SIMCON 4.0 module was realized as a single, standard EURO-6HE board with VXI/VME-bus. Hardware implementation was described for the most important functional blocks. Construction solutions were presented.

Proceedings Article•DOI•

An on-line software-based self-test framework for microprocessor cores

Alfredo Benso¹, Alberto Bosio, P. Prinetto, Alessandro Savino¹•Institutions (1)

Polytechnic University of Turin¹

16 Oct 2006

TL;DR: Results and issues faced during the development of an SBST approach targeting a Motorola PowerPC 603 core are presented, and a general framework, easily reusable on other microprocessor cores, is developed.

Abstract: Software-based self-test (SBST) in embedded microprocessor core testing allows lowering test costs without losing fault detection capabilities. Particularly in critical environments, SBST is executed during the system operating life in order to guarantee its availability and quality of service. If the test routines can be executed online but not concurrently, then both the hardware and software overheads are negligible. This paper presents results and issues faced during the development of an SBST approach targeting a Motorola PowerPC 603 core. The test, constrained by tight timing and coverage requirements, required the development of a general framework, easily reusable on other microprocessor cores.

Proceedings Article•DOI•

An embedded microcontroller for spacecraft applications

Joseph R. Marshall¹, Jeffrey E. Robertson¹•Institutions (1)

BAE Systems¹

24 Jul 2006

TL;DR: The evolution of the EMC within the Power PCI Bridge, development of its support tools and some of its applications are described as both a capable assistant to the RAD750 as well as a standalone processing element.

Abstract: Originally conceived for fault tolerance control of its associated general purpose processor, the embedded microcontroller (EMC) present in the BAE Systems Power PCI Bridge application specific integrated circuit (ASIC) has evolved into a processing workhorse finding applications spanning memory controllers and I/O processors, as well as continuing to support the RAD750® PowerPC® processor. Development tools have also evolved from a simple assembler to a full development environment including compiler and simulator integrated with the PowerPC tools supporting the RAD750. This paper describes the evolution of the EMC within the Power PCI Bridge, development of its support tools and some of its applications as both a capable assistant to the RAD750 as well as a standalone processing element. Power and performance improvements are highlighted. Comparison to other processor cores that might be used in space is also shown. Future enhancements are also discussed.

Proceedings Article•DOI•

Efficient use of communications between an FPGA's embedded processor and its reconfigurable logic

Joshua Noseworthy¹, Miriam Leeser²•Institutions (2)

Mercury Systems¹, Northeastern University²

22 Feb 2006

TL;DR: In this paper, the authors investigate the best interfaces for different data including instructions, stack, heap and user data, and demonstrate that the performance of the SDR application can be increased by as much as 60 percent just by choosing the interfaces that are most appropriate for the different types of data in the implementation.

Abstract: FPGA manufacturers have recently embedded hard core microprocessors in FPGA fabric to improve the processing capabilities of their architectures. We present a study of using the Xilinx Virtex family's embedded PowerPC405 processor. We use a Software Defined Radio (SDR) application as a vehicle for investigating effective communications between the PowerPC405 Processor and the surrounding FPGA fabric. A challenging aspect of developing applications that target the PowerPC is the interfacing of the processor with the surrounding reconfigurable logic. We have implemented a dozen different versions of a Software Defined Radio (SDR) application to exercise the various interfaces that enable communication between the processor and the surrounding FPGA fabric. The implementations differ only in the interfaces used. Our study investigates the use of the On Chip Memory (OCM) interface, the Processor Local Bus (PLB) and the On-chip Processor Bus (OPB). We investigate the best interfaces for different data including instructions, stack, heap and user data. Our results indicate that the performance of the SDR application can be increased by as much as 60 percent just by choosing the interfaces that are most appropriate for the different types of data in the implementation. This demonstrates that the performance of FPGA applications that use the embedded processor is dramatically affected by the mechanisms chosen to enable communication between the processor and its surrounding resources.

Proceedings Article•DOI•

A method to measure impedance of chip/package/board power supply system using pseudo-impulse current

Yaping Zhou¹, S.H. Dhong¹, Brian Flachs¹, Paul M. Harvey¹, Brad W. Michael¹, et al.•Institutions (1)

IBM¹

01 Oct 2006

TL;DR: In this article, a method to measure the impedance Z(f) of a chip/package/board power supply system using pseudo-impulse current is described, which can be easily applied to digital systems with synchronous clocking.

Abstract: A method to measure the impedance Z(f) of a chip/package/board power supply system using pseudo-impulse current is described. This method can be easily applied to digital systems with synchronous clocking. A PowerPC-based microprocessor power supply system is used as an example to show the effectiveness of the method.

Using FAUST for FPGA programming

Robert Trausmuth¹, Christian Dusek¹, Yann Orlarey•Institutions (1)

CERN¹

01 Jan 2006

TL;DR: The possibility of using FAUST (a programming language for function based block oriented programming) to create a fast audio processor in a single chip FPGA environment and a proof-of-concept implementation using a simple two pole IIR filter is shown.

Abstract: In this paper we show the possibility of using FAUST (a programming language for function based block oriented programming) to create a fast audio processor in a single chip FPGA environment. The produced VHDL code is embedded in the on-chip processor system and utilizes the FPGA fabric for parallel processing. For the purpose of implementing and testing the code, a complete System-on-Chip framework has been created. We use a Digilent board with a Xilinx Virtex-II Pro FPGA. The chip has a PowerPC 405 core, and the framework uses the On-chip Peripheral Bus to interface with the core. This paper presents a proof-of-concept implementation using a simple two-pole IIR filter. The produced code works, although more work is needed to support complex arithmetic operations.
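For readers unfamiliar with the filter type named in this abstract, a two-pole IIR filter can be sketched in software as the difference equation y[n] = b0*x[n] - a1*y[n-1] - a2*y[n-2]. The Python below is only an illustration under that assumption: the function name `two_pole_iir` and the coefficient names `b0`, `a1`, `a2` are hypothetical, and nothing here is taken from the paper's FAUST source or its generated VHDL.

```python
def two_pole_iir(samples, b0, a1, a2):
    """Apply y[n] = b0*x[n] - a1*y[n-1] - a2*y[n-2] to a sequence of samples.

    Illustrative only: names and sign convention are assumptions,
    not the paper's implementation.
    """
    y1 = 0.0  # previous output, y[n-1]
    y2 = 0.0  # output before that, y[n-2]
    out = []
    for x in samples:
        y = b0 * x - a1 * y1 - a2 * y2
        out.append(y)
        # Shift the two-sample feedback history.
        y2, y1 = y1, y
    return out


# Impulse response for an arbitrary, stable choice of coefficients.
impulse_response = two_pole_iir([1.0, 0.0, 0.0, 0.0], b0=1.0, a1=-0.5, a2=0.25)
```

Each output sample depends only on the current input and the two previous outputs, so the filter needs just two state registers.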

Proceedings Article•DOI•

Transparent Distributed Programming under Linux

Kamran Karimi¹, Mohsen Sharifi²•Institutions (2)

University of Windsor¹, Iran University of Science and Technology²

14 May 2006

TL;DR: This paper introduces Distributed Inter-Process Communication (DIPC), a heterogeneous distributed programming system that hides inside Linux's kernel, traps access requests to System V IPC mechanisms and delegates them for execution on other computers as needed.

Abstract: Developing parallel and distributed programs is usually considered a hard task. One has to have a good understanding of the problem domain, as well as the target hardware, and map the problem to the available hardware resources. The resulting program is often hard to port to another system. The development and maintenance process may thus be costly and time-consuming. In this paper we propose giving priority to hiding the details of distributed programming behind normal data sharing and synchronisation mechanisms. The programs are thus written as if they are meant to be run on a single, parallel computer. The developers still have to divide the task and make sure that the problem is solved in parallel, but the details of data transfer and synchronisation over a network are hidden. The resulting programs can thus be developed on non-distributed, non-parallel environments, and then be run on a variety of distributed and/or parallel platforms. Though such programs may not be as optimised as programs written specifically for a distributed computer, the savings in programming time and costs may offset the losses. In this paper we introduce Distributed Inter-Process Communication (DIPC), a heterogeneous distributed programming system that hides inside Linux's kernel, traps access requests to System V IPC mechanisms (messages, semaphores and shared memories) and delegates them for execution on other computers as needed. The results are then handed back through the kernel to the calling process, which is unaware of any distributed activity. DIPC currently supports Intel, PowerPC, ALPHA, MIPS, SPARC and Motorola 68k processors.

Proceedings Article•DOI•

Software Implementation Issues of Existing and New Defuzzification Methods

A. Banaiyan¹, Hamid Reza Mahdiani¹, Sied Mehdi Fakhraie¹•Institutions (1)

University of Tehran¹

11 Sep 2006

TL;DR: Three new defuzzification methods are introduced which are suitable for efficient software and also hardware implementations and prove the superiority of these new methods for different software implementation approaches.

Abstract: This paper discusses software implementation issues of different defuzzification procedures in fuzzy systems. Three new defuzzification methods are introduced which are suitable for efficient software and also hardware implementations. A set of seven important existing defuzzification methods is reviewed and compared with these new methods for different software implementation approaches. The C models of all methods are prepared to perform a comprehensive analysis on the output accuracy of different methods. The results prove the superiority of our new proposed methods. In another study, three categories of low-level assembly models are developed for each method to evaluate its software execution time and instruction count when executed on each of three chosen popular processors. Namely, Texas Instruments C6x DSP, Intel's Pentium IV, and IBM's PowerPC PPC405 processors are used as the running engines for this comparison. Some accuracy-speed analysis diagrams are then introduced to guide designers in choosing the defuzzification method which best suits their application requirements.

Proceedings Article•DOI•

An adaptive system-on-chip for network applications

Koch, Pionteck, Albrecht, Maehle

01 Jan 2006

Journal Article•DOI•

64-bit versus 32-bit Virtual Machines for Java: Research Articles

Kris Venstermans¹, Lieven Eeckhout¹, Koen De Bosschere¹•Institutions (1)

Ghent University¹

01 Jan 2006-Software - Practice and Experience

TL;DR: It is observed that 64-bit computing typically results in a significantly larger number of data cache misses at all levels of the memory hierarchy, and that when a sufficiently large heap is available, the IBM JDK 1.4.0 VM is 1.7% slower on average in 64-bit mode than in 32-bit mode.

Abstract: The Java language is popular because of its platform independence, making it useful in a lot of technologies ranging from embedded devices to high-performance systems. The platform-independent property of Java, which is visible at the Java bytecode level, is only made possible thanks to the availability of a Virtual Machine (VM), which needs to be designed specifically for each underlying hardware platform. More specifically, the same Java bytecode should run properly on a 32-bit or a 64-bit VM. In this paper, we compare the behavioral characteristics of 32-bit and 64-bit VMs using a large set of Java benchmarks. This is done using the Jikes Research VM as well as the IBM JDK 1.4.0 production VM on a PowerPC-based IBM machine. By running the PowerPC machine in both 32-bit and 64-bit mode we are able to compare 32-bit and 64-bit VMs. We conclude that the space an object takes in the heap in 64-bit mode is 39.3% larger on average than in 32-bit mode. We identify three reasons for this: (i) the larger pointer size, (ii) the increased header and (iii) the increased alignment. The minimally required heap size is 51.1% larger on average in 64-bit than in 32-bit mode. From our experimental setup using hardware performance monitors, we observe that 64-bit computing typically results in a significantly larger number of data cache misses at all levels of the memory hierarchy. In addition, we observe that when a sufficiently large heap is available, the IBM JDK 1.4.0 VM is 1.7% slower on average in 64-bit mode than in 32-bit mode. Copyright © 2005 John Wiley & Sons, Ltd.
