Showing papers on "PowerPC" published in 2006


Journal Article
TL;DR: This paper reports on the development and formal certification of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proo...
Abstract: This paper reports on the development and formal certification (proof of semantic preservation) of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proo...

100 citations


Proceedings Article
27 Feb 2006
TL;DR: It is shown that automated parallelization can achieve a 7.60 speedup for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor, and the power of hardware integration is demonstrated by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.
Abstract: Simulation is an important means of evaluating new microarchitectures. Current trends toward chip multiprocessors (CMPs) try the ability of designers to develop efficient simulators. CMP simulation speed can be improved by exploiting parallelism in the CMP simulation model. This may be done by either running the simulation on multiple processors or by integrating multiple processors into the simulation to replace simulated processors. Doing so usually requires tedious manual parallelization or re-design to encapsulate processors. This paper presents techniques to perform automated simulator parallelization and hardware integration for CMP structural models. We show that automated parallelization can achieve a 7.60 speedup for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor. We demonstrate the power of hardware integration by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.

75 citations


Proceedings Article
01 Sep 2006
TL;DR: An OSEK/VDX operating system implementation is proposed as open source software, the value of which no longer needs to be demonstrated.
Abstract: This paper introduces an OSEK/VDX operating system implementation. OSEK/VDX is an industry standard for real-time operating systems used in the field of automotive embedded software. This implementation is proposed in the context of open source software, the value of which no longer needs to be demonstrated. The paper explains the main implementation choices as well as the technique proposed for generating a real-time application. The implementation is currently available for three targets: Infineon C167, Darwin/PowerPC and Linux/x86.

61 citations


Proceedings Article
01 Jan 2006
TL;DR: This paper shows that the pseudo LRU (PLRU) cache replacement policy, which is used in the PowerPC PPC755 (widely used in embedded systems) and in some x86 models, can cause unbounded effects on the WCET.
Abstract: Domino effects have been shown to hinder a tight prediction of worst case execution times (WCET) on real-time hardware. First investigated by Lundqvist and Stenström, domino effects caused by pipeline stalls were shown to exist in the PowerPC by Schneider. This paper extends the list of causes of domino effects by showing that the pseudo LRU (PLRU) cache replacement policy can cause unbounded effects on the WCET. PLRU is used in the PowerPC PPC755, which is widely used in embedded systems, and some x86 models.

58 citations
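
As an illustration of the replacement policy named in the entry above, the following is a minimal sketch of tree-based PLRU, one common way to build a pseudo LRU policy. It is not taken from the paper: the class name `TreePLRU`, the 4-way set and the bit convention are assumptions chosen for brevity, and whether the paper analyses exactly this variant is not stated in the abstract.

```python
class TreePLRU:
    """Tree-based pseudo-LRU state for one cache set (illustrative sketch)."""

    def __init__(self, ways=4):
        # Assumption: the number of ways is a power of two.
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        # One bit per internal node of a complete binary tree, stored heap-style.
        # 0 means "the next victim is in the left subtree", 1 means the right.
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: every bit on the path is set to point away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1          # accessed left, victim side becomes right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0          # accessed right, victim side becomes left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to the way that would be evicted next."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(4)
for way in (0, 1, 2, 3, 0):
    plru.touch(way)
# True LRU would now evict way 1; the tree bits point at way 2 instead.
print(plru.victim())
```

The three bits cannot record the full access order, so the victim can differ from the true LRU choice. That loss of history is the kind of behaviour that makes a replacement policy harder to bound in WCET analysis.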


Proceedings Article
08 Jul 2006
TL;DR: This paper presents how to implement efficient condition encoding and fast rule matching strategies using vector instructions, and elaborates on the Altivec and SSE2 instruction sets, producing speedups of the XCS matching process beyond ninety times.
Abstract: Over the last ten years XCS has become the standard for Michigan-style learning classifier systems (LCS). Since the initial CS-1 work conceived by Holland, classifiers (rules) have widely used a ternary condition alphabet {0,1,#} for binary input problems. Most of the freely available implementations of this ternary alphabet in XCS rely on character-based encodings, which are easy to implement but not memory efficient and expensive to compute. Profiling of freely available XCS implementations shows that most of their execution time is spent determining whether a rule matches or not, posing a serious threat to XCS scalability. In the last decade, multimedia and scientific applications have pushed CPU manufacturers to include native support for vector instruction sets. This paper presents how to implement efficient condition encoding and fast rule matching strategies using vector instructions. The paper elaborates on the Altivec (PowerPC G4, G5) and SSE2 (Intel P4/Xeon and AMD Opteron) instruction sets, producing speedups of the XCS matching process beyond ninety times. Moreover, such vectorized matching code will make it easy to scale beyond tens of thousands of conditions in a reasonable time. The proposed fast matching scheme also fits any LCS other than XCS.

34 citations
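
The idea of replacing character-based matching with bitwise operations, described in the entry above, can be sketched in scalar form. This is not the paper's vectorized Altivec/SSE2 code: the two-word `(care, value)` encoding and the function names are assumptions used only to show why a ternary condition can be tested with an XOR and an AND.

```python
def encode_condition(cond):
    """Encode a ternary condition over {0,1,#} as a (care, value) pair of integers.

    Assumed encoding: a bit in `care` is 1 where the condition specifies 0 or 1,
    and 0 where it is the wildcard '#'; `value` holds the specified bits.
    """
    care = 0
    value = 0
    for ch in cond:
        care <<= 1
        value <<= 1
        if ch == "1":
            care |= 1
            value |= 1
        elif ch == "0":
            care |= 1
        elif ch != "#":
            raise ValueError("condition symbols must be 0, 1 or #")
    return care, value


def matches(encoded, x):
    """True if input bits `x` agree with the condition on every non-wildcard position."""
    care, value = encoded
    return (x ^ value) & care == 0


rule = encode_condition("1#0")
print(matches(rule, int("110", 2)))  # agrees on the first and last bit
print(matches(rule, int("111", 2)))  # last bit differs
```

A vector unit applies the same XOR and AND to many bits per instruction, which is where a large speedup over a per-character comparison loop would come from.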


Proceedings Article
01 Oct 2006
TL;DR: The framework, developed for the Rice University Wireless Open-Access Research Platform (WARP), allows interfacing a large class of medium access protocols with custom physical layer implementations, thereby providing a flexible and high-performance research tool.
Abstract: In this paper, we present a framework for Medium Access Control (MAC) protocol development and performance evaluation. The framework, developed for the Rice University Wireless Open-Access Research Platform (WARP), allows us to interface a large class of medium access protocols with custom physical layer (PHY) implementations, thereby providing a flexible and high-performance research tool. MAC protocols for our framework are written in C and targeted to embedded PowerPC cores within the Xilinx Virtex II-Pro class of FPGAs. A key innovation is a flexible interface between the PHY and the MAC capable of exposing user-defined parameters to either layer, thus enabling cross-layer research.

32 citations


Proceedings Article
16 Sep 2006
TL;DR: An integrated optimization framework is developed that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors.
Abstract: Matrix transposition is an important kernel used in many applications. Even though its optimization has been the subject of many studies, an optimization procedure that targets the characteristics of current processor architectures has not been developed. In this paper, we develop an integrated optimization framework that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory subsystem characteristics, and the exploitation of the parallelism provided by the vector instruction sets in current processors. A judicious combination of analytical and empirical approaches is used to determine the most appropriate optimizations. The absence of problem information until execution time is handled by generating multiple versions of the code; the best version is chosen at runtime, with assistance from minimal-overhead inspectors. The approach highlights aspects of empirical optimization that are important for similar computations with little temporal reuse. Experimental results on PowerPC G5 and Intel Pentium 4 demonstrate the effectiveness of the developed framework.

32 citations
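
Of the optimizations listed in the entry above, tiling is the easiest to show in isolation. The sketch below is a plain tiled transpose, not the paper's framework: the function name and the default tile size are assumptions, and it does not cover misalignment handling, vector instructions or runtime version selection.

```python
def transpose_tiled(a, tile=4):
    """Transpose a rectangular list-of-lists matrix one tile at a time.

    Working on tile x tile blocks keeps the rows being read and the rows being
    written within a small region, which is the point of tiling for a memory
    hierarchy. `tile` is an illustrative parameter, not a tuned value.
    """
    if tile < 1:
        raise ValueError("tile must be at least 1")
    rows = len(a)
    cols = len(a[0]) if rows else 0
    out = [[None] * rows for _ in range(cols)]
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            # min() handles edge tiles when a dimension is not a multiple of `tile`.
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    out[j][i] = a[i][j]
    return out


print(transpose_tiled([[1, 2, 3], [4, 5, 6]], tile=2))
```

In Python the tiling has no real cache benefit; the loop structure is what would carry over to a C or assembly kernel.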


Book
30 Nov 2006
TL;DR: This book covers basic computing concepts, pipelined and superscalar execution, and caching, and examines Intel processors from the Pentium to the Core 2 Duo alongside PowerPC processors from the 600 series to the G5 (PowerPC 970).
Abstract: Chapter 1: Basic Computing Concepts; Chapter 2: The Mechanics of Program Execution; Chapter 3: Pipelined Execution; Chapter 4: Superscalar Execution; Chapter 5: The Intel Pentium and Pentium Pro; Chapter 6: PowerPC Processors: 600 Series, 700 Series, and 7400; Chapter 7: Intel's Pentium 4 vs. Motorola's G4e: Approaches and Design Philosophies; Chapter 8: Intel's Pentium 4 vs. Motorola's G4e: The Back End; Chapter 9: 64-Bit Computing and x86-64; Chapter 10: The G5: IBM's PowerPC 970; Chapter 11: Understanding Caching and Performance; Chapter 12: Intel's Pentium M, Core Duo, and Core 2 Duo

31 citations


Journal Article
TL;DR: A modular development framework that can be adapted for different systems by simply changing the software or firmware parts, based on the demands of the system to be developed.
Abstract: System development in advanced FPGAs allows considerable flexibility, both during development and in production use. A mixed firmware/software solution allows the developer to choose what shall be done in firmware or software, and to make that decision late in the process. However, this flexibility comes at the cost of increased complexity. We have designed a modular development framework to help overcome this increased complexity. The framework comprises a generic controller that can be adapted for different systems by simply changing the software or firmware parts. The controller can use both soft and hard processors, with or without an RTOS, based on the demands of the system to be developed. The resulting system uses the Internet for both control and data acquisition. In our studies we developed the embedded system in a Xilinx Virtex-II Pro FPGA, where we used both PowerPC and MicroBlaze cores, HTTP, Java, and LabView for control and communication, together with the MicroC/OS-II and OSE operating systems.

24 citations


Proceedings Article
25 Apr 2006
TL;DR: The hardware architecture of DynaCORE, a dynamically reconfigurable system-on-chip for network applications, and on-chip communication issues are presented, including the integration of PowerPC processor cores into the configurable logic as well as the mode of operation of the network-on-chip.
Abstract: This paper presents the hardware architecture of DynaCORE, a dynamically reconfigurable system-on-chip for network applications. DynaCORE is an application specific coprocessor for offloading computationally intensive tasks from a network processor. The system-on-chip architecture is based on an adaptable network-on-chip which allows the dynamic replacement of hardware modules as well as the adaptation of the on-chip communication structure. The coprocessor leverages the active partial reconfiguration feature of modern FPGAs in order to adapt to shifting demand patterns. An embedded general-purpose processor core within the coprocessor runs software which manages the configurations of the device. With reference to a prototypical implementation targeting a Xilinx Virtex-II Pro FPGA, this paper focuses on on-chip communication issues. Topics include the integration of PowerPC processor cores into the configurable logic as well as the mode of operation of the network-on-chip.

23 citations


Proceedings Article
19 Mar 2006
TL;DR: This work synthesizes representative PowerPC versions of the SPEC2000, STREAM, TPC-C and Java benchmarks, compiles and executes them, and obtains an average IPC within 2.4% of the average IPC of the original benchmarks and with many similar average workload characteristics.
Abstract: The latest high-performance IBM PowerPC microprocessor, the POWER5 chip, poses challenges for performance model validation. The current state-of-the-art is to use simple hand-coded bandwidth and latency testcases, but these are not comprehensive for processors as complex as the POWER5 chip. Applications and benchmark suites such as SPEC CPU are difficult to set up or take too long to execute on functional models or even on detailed performance models. We present an automatic testcase synthesis methodology to address these concerns. By basing testcase synthesis on the workload characteristics of an application, source code is created that largely represents the performance of the application, but which executes in a fraction of the runtime. We synthesize representative PowerPC versions of the SPEC2000, STREAM, TPC-C and Java benchmarks, compile and execute them, and obtain an average IPC within 2.4% of the average IPC of the original benchmarks and with many similar average workload characteristics. The synthetic testcases often execute two orders of magnitude faster than the original applications, typically in less than 300K instructions, making performance model validation for today's complex processors feasible.


Proceedings Article
Hendrik F. Hamann, Alan J. Weger, James A. Lacey, Erwin B. Cohen, C. Atherton
18 Sep 2006
TL;DR: Spatially-resolved imaging of microprocessor power (SIMP) is shown to be a critical tool for measuring temperature and power distributions of a microprocessor under full operating conditions.
Abstract: Spatially-resolved imaging of microprocessor power (SIMP) is shown to be a critical tool for measuring temperature and power distributions of a microprocessor under full operating conditions. In this paper, the SIMP technique is applied to the dual-core PowerPC™ 970MP microprocessor.

Proceedings Article
21 May 2006
TL;DR: A new fault-injection approach for evaluating the impact of transient faults in SoPCs is presented and a case study consisting of a Web server implemented on a Xilinx Virtex-II FPGA embedding a PowerPC 405 and running the whole TCP/IP stack is reported.
Abstract: Systems-on-Programmable-Chip (SoPCs) include processors, memories and programmable logic that make it possible to meet multiple application requirements such as high performance, reconfigurability and low cost. Due to these characteristics, they are also becoming very attractive for safety-critical applications. However, the issue of assessing the reliability they can provide and debugging the possible safety-related mechanisms they embed is still open. In this paper, we present a new fault-injection approach for evaluating the impact of transient faults in SoPCs. Fault-injection experiments are reported on a case study consisting of a web server implemented on a Xilinx Virtex-II FPGA embedding a PowerPC 405 and running the whole TCP/IP stack.

Proceedings Article
30 Apr 2006
TL;DR: This work presents a method and architecture for compressing the so-called Look-up Tables that are necessary for the de-compression process, and introduces a novel and very efficient hardware-supported approach based on Canonical Huffman Coding.
Abstract: The presented work uses code compression to improve the design efficiency of an embedded system. In particular, we present a method and architecture for compressing the so-called Look-up Tables that are necessary for the de-compression process. No other work has yet focused on minimizing the Look-up Tables that, as we show, have a significant impact on the total overhead of a hardware-based decompression scheme. We introduce a novel and very efficient hardware-supported approach based on Canonical Huffman Coding. Using the Lin-Kernighan algorithm we reduce the Look-up Table size by up to 45%. As a result, we achieve overall compression ratios as low as 45% (already including the overhead of the Look-up Tables). Our scheme is entirely orthogonal to approaches that take particularities of a certain instruction set architecture into account, meaning that compression could be further improved. Factoring in the orthogonality, our scheme is the basis for not-yet-achieved efficiency in hardware-supported compression schemes. We have conducted evaluations using a representative set (in terms of size and application domain) of applications and have applied it to three major embedded processor architectures, namely ARM, MIPS and PowerPC. The hardware evaluation shows no performance penalty.
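
For readers unfamiliar with Canonical Huffman Coding, which the entry above builds on, the following sketch shows the standard way canonical codewords are derived from code lengths alone. It illustrates only that textbook step: the function name and the example lengths are assumptions, and the paper's Look-up Table compression and its use of the Lin-Kernighan algorithm are not reproduced here.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codewords given a {symbol: code_length} mapping.

    Symbols are ordered by (length, symbol); each codeword is the previous one
    plus one, shifted left whenever the length grows. Only the lengths need to
    be stored to rebuild the code, which is why decoders can use compact tables.
    """
    ordered = sorted((n, s) for s, n in lengths.items() if n > 0)
    codes = {}
    code = 0
    prev_len = ordered[0][0] if ordered else 0
    for n, s in ordered:
        code <<= n - prev_len                     # lengthen the running codeword
        codes[s] = format(code, "0{}b".format(n))
        code += 1
        prev_len = n
    return codes


# Hypothetical code lengths for four symbols.
print(canonical_codes({"a": 1, "b": 2, "c": 3, "d": 3}))
```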

Proceedings Article
07 Jun 2006
TL;DR: The paper focuses on the design and implementation of the radio front-end for an experimental GNSS software receiver developed at the Department of Radio Engineering of the Czech Technical University in Prague.
Abstract: The paper focuses on the design and implementation of the radio front-end for an experimental GNSS software receiver developed at the Department of Radio Engineering of the Czech Technical University in Prague. The receiver is designed for processing the signals of present and future global navigation satellite systems, including GPS, GLONASS and Galileo. For the greatest possible versatility, a modular architecture and the software defined radio (SDR) concept were chosen. The front-end unit consists of three independent channels with a bandwidth of 24 MHz each that use a single-conversion super-heterodyne concept with an intermediate frequency of 140 MHz. The front-end provides the down-converted analogue signal to a DSP unit represented by an FPGA device with two embedded PowerPC cores. The paper also provides a comparison of the front-end of the experimental receiver with the lot manufacture case.

Journal Article
TL;DR: A hybrid approach is proposed, which combines ideas from previous techniques based on software transformations with the introduction of an Infrastructure IP with reduced memory and performance overheads, to harden systems based on the PowerPC 405 core available in Virtex-II Pro FPGAs.
Abstract: Hardening processor-based systems against transient faults requires new techniques able to combine high fault detection capabilities with the usual design requirements, e.g., reduced design-time, low area overhead, reduced (or null) accessibility to processor internal hardware. This paper proposes the adoption of a hybrid approach, which combines ideas from previous techniques based on software transformations with the introduction of an Infrastructure IP with reduced memory and performance overheads, to harden systems based on the PowerPC 405 core available in Virtex-II Pro FPGAs. The proposed approach targets faults affecting the memory elements storing both the code and the data, independently of their location (inside or outside the processor). Extensive experimental results including comparisons with previous approaches are reported, which allow practically evaluating the characteristics of the method in terms of fault detection capabilities and area, memory and performance overheads.

Proceedings Article
01 Aug 2006
TL;DR: The post place-and-route synthesis results indicate that the global architecture is able to process 132 million samples per second, allowing its use in H.264/AVC coders and decoders for HDTV.
Abstract: This paper presents the design, the validation and the prototyping of an H.264/AVC inverse transform and quantization architecture. This architecture was designed to reach high throughputs and to be easily integrated with other H.264/AVC modules. The architecture was completely described in VHDL and the VHDL code was behaviorally and post place-and-route validated through simulations, comparing the data generated by the architecture with the data extracted from the H.264/AVC reference software. Finally, the architecture was prototyped using a Digilent XUP V2P board that contains a Virtex-II Pro VP30 Xilinx FPGA. The architecture mapped to the target FPGA was stimulated in the prototyping board using a PowerPC processor that is hardwired in that FPGA. The prototype was validated and the results show that the designed architecture was working in accordance with the H.264/AVC standard. The post place-and-route synthesis results indicate that the global architecture is able to process 132 million samples per second, allowing its use in H.264/AVC coders and decoders for HDTV.

Proceedings Article
03 May 2006
TL;DR: The VICTORIA PowerPC architecture is described, which is based on the iVMX accelerator technology, which extends the existing VMX architecture with indirect register addressing and opens the door for highly optimized vector algorithms that can sustain very high processing rates.
Abstract: There is increasing interest in the use of accelerators in computer systems. Accelerators are processor-attached hardware units that can perform certain functions faster than the conventional general purpose processor. In this paper, we describe the VICTORIA PowerPC architecture, which is based on the iVMX accelerator technology. The iVMX accelerator extends the existing VMX architecture with indirect register addressing. That approach greatly extends the architected space of registers and opens the door for highly optimized vector algorithms that can sustain very high processing rates. The large space of registers is directly controlled by the executing code and offers sufficiently large storage to hold sizeable intermediate results. This helps reduce the negative effects of limited memory bandwidth and high memory latency. The iVMX accelerator is an example of an in-line accelerator; that is, the instructions that drive the accelerator are part of the same stream that drives the main processor. Compared to off-line accelerators, which execute their own instruction stream, in-line accelerators present a much more convenient programming model.

Proceedings Article
06 Mar 2006
TL;DR: The goal is to methodically simplify MPSoC design by systematic HW/SW interface abstraction, thus enabling early SW verification, rapid prototyping and fast exploration of critical design issues.
Abstract: Our concept of a virtual transaction layer (VTL) architecture allows transaction-level communication channels to be mapped directly onto a synthesizable multiprocessor SoC implementation. The VTL is above the physical MPSoC communication architecture, acting as a hardware abstraction layer for both HW and SW components. TLM channels are represented by virtual channels which efficiently route transactions between SW and HW entities through the on-chip communication network with respect to quality-of-service and realtime requirements. The goal is to methodically simplify MPSoC design by systematic HW/SW interface abstraction, thus enabling early SW verification, rapid prototyping and fast exploration of critical design issues. With TRAIN, we present our implementation of such a VTL architecture for Virtex-II Pro and PowerPC and illustrate its efficiency by experimentation.

Proceedings Article
04 Oct 2006
TL;DR: The paper presents the concept, design and realization of a gigabit, optoelectronic synchronous massive data concentrator for the LLRF control system for FLASH and XFEL superconducting accelerators and lasers.
Abstract: The paper presents the concept, design and realization of a gigabit, optoelectronic synchronous massive data concentrator for the LLRF control system for FLASH and XFEL superconducting accelerators and lasers. The design is based on a central, large, programmable FPGA VirtexIIPro circuit by Xilinx and on eight commercial optoelectronic transceivers. Peripheral devices such as memory and Ethernet were implemented for the embedded PowerPC block. The SIMCON 4.0 module was realized as a single, standard EURO-6HE board with VXI/VME-bus. The hardware implementation of the most important functional blocks is described, and the construction solutions are presented.

Proceedings Article
16 Oct 2006
TL;DR: Results and issues faced during the development of an SBST approach targeting a Motorola PowerPC 603 core are presented, and a general framework, easily reusable on other microprocessor cores, is developed.
Abstract: Software-based self-test (SBST) in embedded microprocessor cores testing allows lowering test costs without losing fault detection capabilities. Particularly in critical environments, SBST is executed during the system operating life in order to guarantee its availability and quality of service. If the test routines can be executed online but not concurrently, then both the hardware and software overheads are negligible. This paper presents results and issues faced during the development of an SBST approach targeting a Motorola PowerPC 603 core. The test, constrained by tight timing and coverage requirements, required the development of a general framework, easily reusable on other microprocessor cores.

Proceedings Article
24 Jul 2006
TL;DR: The evolution of the EMC within the Power PCI Bridge, development of its support tools and some of its applications are described as both a capable assistant to the RAD750 as well as a standalone processing element.
Abstract: Originally conceived for fault tolerance control of its associated general purpose processor, the embedded microcontroller (EMC) present in the BAE Systems Power PCI Bridge application specific integrated circuit (ASIC) has evolved into a processing workhorse, finding applications spanning memory controllers and I/O processors while continuing to support the RAD750® PowerPC® processor. Development tools have also evolved from a simple assembler to a full development environment including a compiler and simulator integrated with the PowerPC tools supporting the RAD750. This paper describes the evolution of the EMC within the Power PCI Bridge, development of its support tools and some of its applications as both a capable assistant to the RAD750 and a standalone processing element. Power and performance improvements are highlighted. A comparison with other processor cores that might be used in space is also shown. Future enhancements are also discussed.

Proceedings Article
22 Feb 2006
TL;DR: In this paper, the authors investigate the best interfaces for different data including instructions, stack, heap and user data, and demonstrate that the performance of the SDR application can be increased by as much as 60 percent just by choosing the interfaces that are most appropriate for the different types of data in the implementation.
Abstract: FPGA manufacturers have recently embedded hard core microprocessors in FPGA fabric to improve the processing capabilities of their architectures. We present a study of using the Xilinx Virtex family's embedded PowerPC405 processor. We use a Software Defined Radio (SDR) application as a vehicle for investigating effective communications between the PowerPC405 processor and the surrounding FPGA fabric. A challenging aspect of developing applications that target the PowerPC is the interfacing of the processor with the surrounding reconfigurable logic. We have implemented a dozen different versions of the SDR application to exercise the various interfaces that enable communication between the processor and the surrounding FPGA fabric. The implementations differ only in the interfaces used. Our study investigates the use of the On Chip Memory (OCM) interface, the Processor Local Bus (PLB) and the On-chip Processor Bus (OPB). We investigate the best interfaces for different data including instructions, stack, heap and user data. Our results indicate that the performance of the SDR application can be increased by as much as 60 percent just by choosing the interfaces that are most appropriate for the different types of data in the implementation. This demonstrates that the performance of FPGA applications that use the embedded processor is dramatically affected by the mechanisms chosen to enable communication between the processor and its surrounding resources.

Proceedings Article
Yaping Zhou, S.H. Dhong, Brian Flachs, Paul M. Harvey, Brad W. Michael
01 Oct 2006
TL;DR: In this article, a method to measure the impedance Z(f) of a chip/package/board power supply system using pseudo-impulse current is described, which can be easily applied to digital systems with synchronous clocking.
Abstract: A method to measure the impedance Z(f) of a chip/package/board power supply system using pseudo-impulse current is described. This method can be easily applied to digital systems with synchronous clocking. A PowerPC-based microprocessor power supply system is used as an example to show the effectiveness of the method.

01 Jan 2006
TL;DR: The possibility of using FAUST (a programming language for function based block oriented programming) to create a fast audio processor in a single chip FPGA environment and a proof-of-concept implementation using a simple two pole IIR filter is shown.
Abstract: In this paper we show the possibility of using FAUST (a programming language for function based block oriented programming) to create a fast audio processor in a single chip FPGA environment. The produced VHDL code is embedded in the on-chip processor system and utilizes the FPGA fabric for parallel processing. For the purpose of implementing and testing the code a complete System-On-Chip framework has been created. We use a Digilent board with a Xilinx Virtex 2 Pro FPGA. The chip has a PowerPC 405 core and the framework uses the on-chip peripheral bus to interface with the core. This paper presents a proof-of-concept implementation using a simple two pole IIR filter. The produced code is working, although more work has to be done to support complex arithmetic operations.
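
The abstract above does not give the filter itself, so the following is only a generic sketch of a two pole IIR filter in software. The difference equation y[n] = b0·x[n] − a1·y[n−1] − a2·y[n−2], the function name and the coefficient values are all assumptions, not the paper's FAUST or VHDL code.

```python
def two_pole_iir(samples, b0, a1, a2):
    """Apply y[n] = b0*x[n] - a1*y[n-1] - a2*y[n-2] to a sequence of samples.

    The two feedback terms are what give the filter its two poles.
    The filter starts from rest (previous outputs are zero).
    """
    y1 = 0.0  # y[n-1]
    y2 = 0.0  # y[n-2]
    out = []
    for x in samples:
        y = b0 * x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Impulse response for hypothetical coefficients.
print(two_pole_iir([1.0, 0.0, 0.0, 0.0], b0=1.0, a1=-0.5, a2=0.25))
```

Each output sample needs only three multiplications and two subtractions, which is the kind of small fixed dataflow that maps naturally onto FPGA fabric.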

Proceedings Article
14 May 2006
TL;DR: This paper introduces Distributed Inter-Process Communication (DIPC), a heterogeneous distributed programming system that hides inside Linux's kernel, traps access requests to System V IPC mechanisms and delegates them for execution on other computers as needed.
Abstract: Developing parallel and distributed programs is usually considered a hard task. One has to have a good understanding of the problem domain, as well as the target hardware, and map the problem to the available hardware resources. The resulting program is often hard to port to another system. The development and maintenance process may thus be costly and time-consuming. In this paper we propose giving priority to hiding the details of distributed programming behind normal data sharing and synchronisation mechanisms. The programs are thus written as if they are meant to be run on a single, parallel computer. The developers still have to divide the task and make sure that the problem is solved in parallel, but the details of data transfer and synchronisation over a network are hidden. The resulting programs can thus be developed in non-distributed, non-parallel environments, and then be run on a variety of distributed and/or parallel platforms. Though such programs may not be as optimised as programs written specifically for a distributed computer, the savings in programming time and costs may offset the losses. In this paper we introduce Distributed Inter-Process Communication (DIPC), a heterogeneous distributed programming system that hides inside Linux’s kernel, traps access requests to System V IPC mechanisms (messages, semaphores and shared memories) and delegates them for execution on other computers as needed. The results are then handed back through the kernel to the calling process, which is unaware of any distributed activity. DIPC currently supports Intel, PowerPC, ALPHA, MIPS, SPARC and Motorola 68k processors.

Proceedings Article
11 Sep 2006
TL;DR: Three new defuzzification methods are introduced which are suitable for efficient software and also hardware implementations and prove the superiority of these new methods for different software implementation approaches.
Abstract: This paper discusses software implementation issues of different defuzzification procedures in fuzzy systems. Three new defuzzification methods are introduced which are suitable for efficient software and also hardware implementations. A set of seven important existing defuzzification methods are reviewed and compared with these new methods for different software implementation approaches. The C models of all methods are prepared to perform a comprehensive analysis of the output accuracy of the different methods. The results prove the superiority of our new proposed methods. In another study, three categories of low-level assembly models are developed for each method to evaluate its software execution time and instruction count when executed on each of three chosen popular processors. Namely, Texas Instruments C6x© DSP, Intel's Pentium© IV, and IBM's PowerPC PPC405© processors are used as the running engines for this comparison. Some accuracy-speed analysis diagrams are then introduced to guide designers in choosing the defuzzification method which best suits their application requirements.


Journal Article
TL;DR: It is observed that 64-bit computing typically results in a significantly larger number of data cache misses at all levels of the memory hierarchy, and that when a sufficiently large heap is available, the IBM JDK 1.4.0 VM is 1.7% slower on average in 64-bit mode than in 32-bit mode.
Abstract: The Java language is popular because of its platform independence, making it useful in a lot of technologies ranging from embedded devices to high-performance systems. The platform-independent property of Java, which is visible at the Java bytecode level, is only made possible thanks to the availability of a Virtual Machine (VM), which needs to be designed specifically for each underlying hardware platform. More specifically, the same Java bytecode should run properly on a 32-bit or a 64-bit VM. In this paper, we compare the behavioral characteristics of 32-bit and 64-bit VMs using a large set of Java benchmarks. This is done using the Jikes Research VM as well as the IBM JDK 1.4.0 production VM on a PowerPC-based IBM machine. By running the PowerPC machine in both 32-bit and 64-bit mode we are able to compare 32-bit and 64-bit VMs. We conclude that the space an object takes in the heap in 64-bit mode is 39.3% larger on average than in 32-bit mode. We identify three reasons for this: (i) the larger pointer size, (ii) the increased header and (iii) the increased alignment. The minimally required heap size is 51.1% larger on average in 64-bit than in 32-bit mode. From our experimental setup using hardware performance monitors, we observe that 64-bit computing typically results in a significantly larger number of data cache misses at all levels of the memory hierarchy. In addition, we observe that when a sufficiently large heap is available, the IBM JDK 1.4.0 VM is 1.7% slower on average in 64-bit mode than in 32-bit mode. Copyright © 2005 John Wiley & Sons, Ltd.
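
The three reasons the abstract above gives for larger objects (pointer size, header size, alignment) can be illustrated with a small calculation. The header sizes, alignment and example object below are hypothetical values chosen for the arithmetic; they are not the object layouts of the Jikes Research VM or the IBM JDK measured in the paper.

```python
def object_size(ref_fields, prim_bytes, pointer_size, header_size, alignment):
    """Size in bytes of an object: header plus fields, rounded up to the alignment."""
    raw = header_size + ref_fields * pointer_size + prim_bytes
    return -(-raw // alignment) * alignment   # ceiling division, then scale back up


# Hypothetical object with two reference fields and one 4-byte int field.
size32 = object_size(2, 4, pointer_size=4, header_size=8, alignment=4)
size64 = object_size(2, 4, pointer_size=8, header_size=16, alignment=8)
print(size32, size64)
```

With these assumed parameters the same object grows from 20 to 40 bytes; the averages reported in the abstract are far smaller because real heaps also hold many objects dominated by primitive data.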