Showing papers on "PowerPC" published in 2012

Open Access

Journal Article•

Hardware Accelerator of Cartesian Genetic Programming with Multiple Fitness Units

Zdeněk Vašíček, Lukas Sekanina¹•Institutions (1)

26 Jan 2012-Computing and Informatics / Computers and Artificial Intelligence

TL;DR: In the benchmark problem (image filter evolution) the proposed platform provides a significant speedup in comparison with a highly optimized software implementation and is 8 times faster than previous FPGA accelerators of image filter evolution.

Abstract: A new accelerator of Cartesian genetic programming is presented in this paper. The accelerator is completely implemented in a single FPGA. The proposed architecture contains multiple instances of virtual reconfigurable circuit to evaluate several candidate solutions in parallel. An advanced memory organization was developed to achieve the maximum throughput of processing. The search algorithm is implemented using the on-chip PowerPC processor. In the benchmark problem (image filter evolution) the proposed platform provides a significant speedup (170) in comparison with a highly optimized software implementation. Moreover, the accelerator is 8 times faster than previous FPGA accelerators of image filter evolution.

25 citations

Journal Article•DOI•

A fault injection analysis of Linux operating on an FPGA-embedded platform

Joshua S. Monson¹, Michael Wirthlin¹, Brad Hutchings¹•Institutions (1)

Brigham Young University¹

01 Jan 2012

TL;DR: An FPGA-based Linux test-bed was constructed for the purpose of measuring its sensitivity to single-event upsets and it was found that the most sensitive user module in the design was the PowerPC's direct connections to the DDR2 memory controller.

Abstract: An FPGA-based Linux test-bed was constructed for the purpose of measuring its sensitivity to single-event upsets. The test-bed consists of two ML410 Xilinx development boards connected using a 124-pin custom connector board. The Design Under Test (DUT) consists of the "hard core" PowerPC, running the Linux OS, and several peripherals implemented in "soft" (programmable) logic. Faults were injected via the Internal Configuration Access Port (ICAP). The experiments performed here demonstrate that the Linux-based system was sensitive to 199,584 bits, or about 1.4 percent of all tested bits. Each sensitive bit in the bit-stream is mapped to the resource and user module that it configures. A density metric for comparing the reliability of modules within the system is presented. Using this density metric, we found that the most sensitive user module in the design was the PowerPC's direct connections to the DDR2 memory controller.
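
The two figures quoted in this abstract (199,584 sensitive bits, about 1.4 percent of tested bits) imply the approximate size of the tested bit set. The sketch below works through that arithmetic. The abstract does not define its density metric, so `sensitivity_density` is only an assumed form (sensitive bits divided by the bits a module occupies), and the function names are hypothetical.

```python
SENSITIVE_BITS = 199_584    # reported in the abstract
SENSITIVE_FRACTION = 0.014  # "about 1.4 percent"; rounded in the abstract


def implied_tested_bits(sensitive: int, fraction: float) -> int:
    """Estimate the number of tested bits from the sensitive count and fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return round(sensitive / fraction)


def sensitivity_density(sensitive_in_module: int, bits_in_module: int) -> float:
    """Assumed metric (not from the paper): share of a module's bits that are sensitive."""
    if bits_in_module <= 0:
        raise ValueError("bits_in_module must be positive")
    return sensitive_in_module / bits_in_module


if __name__ == "__main__":
    total = implied_tested_bits(SENSITIVE_BITS, SENSITIVE_FRACTION)
    print(f"Implied tested bits: roughly {total:,}")
```

Because the percentage is rounded, the result is only an order-of-magnitude estimate (about 14 million tested bits).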

23 citations

Proceedings Article•DOI•

Blue Gene/Q: design for sustained multi-petaflop computing

Michael K. Gschwind¹•Institutions (1)

IBM¹

25 Jun 2012

TL;DR: The Blue Gene/Q system represents the third generation of optimized high-performance computing Blue Gene solution servers and provides a platform for continued growth in HPC performance and capability and gives application developers a platform to develop and deploy sustained petascale computing applications.

Abstract: The Blue Gene/Q system represents the third generation of optimized high-performance computing Blue Gene solution servers and provides a platform for continued growth in HPC performance and capability. Blue Gene/Q started with a new design of the hardware platform, while retaining and significantly expanding an established, trusted and successful software environment.

To deliver a system that enables users to fully exploit the promise of high-performance computing for both traditional HPC applications and new commercial application areas, the Blue Gene/Q system architecture combines hardware and software innovations to overcome traditional bottlenecks, most famously the memory and power walls which have become emblematic of modern computing systems. At the same time, to deliver a platform for sustainable petascale computing, and beyond to exascale, we had to address a new set of "walls" with the many innovations described below: a scalability wall, a communication wall, and a reliability wall.

The new Blue Gene/Q system increases overall system performance with a new node architecture: Each node offers more thread-level-parallelism with a coherent SMP node consisting of eighteen 64-bit PowerPC cores with 4-way simultaneous multithreading. Each core provides for better exploitation of data-level parallelism with a new 4-way quad-vector processing unit (QPU). The memory subsystem integrates memory speculation support which can be used to implement both Transactional Memory and Speculative Execution programming models.

The compute nodes are connected in a five dimensional torus configuration using 10 point-to-point links, and a total network bandwidth of 44 GB/s per node. The on-chip messaging unit provides an optimized interface between the network routing logic and the memory subsystem, with enough bandwidth to keep all the links busy. It also offloads communication protocol processing by implementing collective broadcast and reduction operations, including integer and floating point sum, min and max.

Built on the Blue Gene hardware design is an efficient software stack that builds on several generations of Blue Gene software interfaces, while extending these capabilities and adding new functions to support new hardware capabilities. The hardware functions were designed with a focus on providing efficient primitives upon which to build the rich software environment.

To ensure reliable operation of a petascale system, reliability has to be a pervasive design consideration. At the architecture level, new QPX store-and-indicate instructions support the detection of programming errors. To ensure reliable operation in the presence of transient faults, we conducted exhaustive single event upset simulations based on fault injection into the simulated design. The operating system was structured to use firmware in a small on-chip boot eDRAM to avoid silent system hangs.

Together, the hardware and software innovations pioneered in Blue Gene/Q give application developers a platform and framework to develop and deploy sustained petascale computing applications. These petascale applications will allow its users to make new scientific discoveries and gain new business insights, which will be the true measure of the success of the new Blue Gene/Q systems.
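
The abstract's network description (a five-dimensional torus with 10 point-to-point links per node) corresponds to two neighbors per dimension with wrap-around. The following is a minimal sketch of that neighbor relation, not IBM's implementation; the torus dimensions `(4, 4, 4, 4, 4)` and all names are hypothetical choices for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]

CORES_PER_NODE = 18   # from the abstract
THREADS_PER_CORE = 4  # 4-way simultaneous multithreading, from the abstract


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the +1 and -1 neighbor along every axis, wrapping at the edges."""
    if len(coord) != len(dims):
        raise ValueError("coord and dims must have the same number of axes")
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            shifted = list(coord)
            shifted[axis] = (shifted[axis] + step) % size
            neighbors.append(tuple(shifted))
    return neighbors


if __name__ == "__main__":
    dims = (4, 4, 4, 4, 4)  # hypothetical machine shape
    links = torus_neighbors((0, 0, 0, 0, 0), dims)
    print(len(links), "links per node")
    print(CORES_PER_NODE * THREADS_PER_CORE, "hardware threads per node")
```

Note that the 10 neighbors are distinct only when every dimension has at least three nodes; with a dimension of size two, the +1 and -1 steps reach the same node.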

12 citations

Proceedings Article•DOI•

Osprey: Operating system for predictable clouds

Jan Sacha¹, Jeff Napper¹, Sape J. Mullender¹, Jim Mckie¹•Institutions (1)

Bell Labs¹

25 Jun 2012

TL;DR: This paper describes an alternative approach to cloud computing where all user applications run on top of a single cloud operating system called Osprey, which allows dependable, predictable, and real-time computing by consistently managing all system resources and exporting relevant information to the applications.

Abstract: Cloud computing is currently based on hardware virtualization wherein a host operating system provides a virtual machine interface nearly identical to that of physical hardware to guest operating systems. Full transparency allows backward compatibility with legacy software but introduces unpredictability at the guest operating system (OS) level. The time perceived by the guest OS is non-linear. As a consequence, it is difficult to run real-time or latency-sensitive applications in the cloud. In this paper we describe an alternative approach to cloud computing where we run all user applications on top of a single cloud operating system called Osprey. Osprey allows dependable, predictable, and real-time computing by consistently managing all system resources and exporting relevant information to the applications. Osprey ensures compatibility with legacy software through OS emulation provided by libraries and by porting runtime environments. Osprey's resource containers fully specify constraints between applications to enforce full application isolation for real-time execution guarantees. Osprey pushes much of the state out of the kernel into user applications for several benefits: full application accounting, mobility support, and efficient networking. Using a kernel-based packet filter, Osprey dispatches incoming packets to the user application as quickly as possible, eliminating the kernel from the critical path. A real-time scheduler then decides on the priority and order in which applications process their incoming packets while maintaining the limits set forth in the resource container. We have implemented a mostly complete Osprey prototype for the x86 architecture and we plan to port it to ARM and PowerPC and to develop a Linux library OS.

11 citations

Proceedings Article•DOI•

Applying Radiation Hardening by Software to Fast Lossless compression prediction on FPGAs

Andrew G. Schmidt¹, John Paul Walters¹, Kenneth M. Zick¹, Matthew French¹, Didier Keymeulen², Nazeeh Aranki², Matthew Klimesh², Aaron Kiely²•Institutions (2)

University of Southern California¹, California Institute of Technology²

03 Mar 2012

TL;DR: This work presents a novel investigation into the capability of using FPGAs integrated with embedded PowerPC processors to adequately perform the predictor function of the Fast Lossless (FL) compression algorithm for multispectral and hyperspectral imagery.

Abstract: As scientists endeavor to learn more about the world's ecosystems, engineers are pushed to develop more sophisticated instruments. With these advancements comes an increase in the amount of data generated. For satellite based instruments the additional data requires sufficient bandwidth be available to transmit the data. Alternatively, compression algorithms can be employed to reduce the bandwidth requirements. This work is motivated by the proposed HyspIRI mission, which includes two imaging spectrometers measuring from visible to short wave infrared (VSWIR) and thermal infrared (TIR) that saturate the projected bandwidth allocations. We present a novel investigation into the capability of using FPGAs integrated with embedded PowerPC processors to adequately perform the predictor function of the Fast Lossless (FL) compression algorithm for multispectral and hyperspectral imagery. Furthermore, our design includes a multi-PowerPC implementation which incorporates recently developed Radiation Hardening by Software (RHBSW) techniques to provide software-based fault tolerance to commercial FPGA devices. Our results show low performance overhead (4–8%) while achieving a speedup of 1.97× when utilizing both PowerPCs. Finally, the evaluation of the proposed system includes resource utilization, performance metrics, and an analysis of the vulnerability to Single Event Upsets (SEU) through the use of a hardware based fault injector.
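
The reported numbers (a 1.97× speedup on two PowerPCs and a 4–8% performance overhead) can be related with simple arithmetic. The sketch below is only an illustration: the abstract does not say how the overhead and the speedup were measured relative to each other, and the function names and the 100-second baseline are hypothetical.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Speedup divided by processor count; 1.0 would be ideal linear scaling."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


def runtime_with_overhead(baseline_s: float, overhead_fraction: float) -> float:
    """Runtime after adding a fractional overhead to a baseline runtime."""
    return baseline_s * (1.0 + overhead_fraction)


if __name__ == "__main__":
    eff = parallel_efficiency(1.97, 2)
    print(f"Parallel efficiency on two PowerPCs: {eff:.1%}")
    for overhead in (0.04, 0.08):  # the 4-8% range from the abstract
        t = runtime_with_overhead(100.0, overhead)  # hypothetical 100 s baseline
        print(f"{overhead:.0%} overhead: {t:.1f} s")
```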

9 citations

Proceedings Article•DOI•

A customizable and ARINC 653 quasi-compliant hypervisor

Adriano Tavares¹, Adriano Carvalho¹, P. Rodrigues¹, Paulo A. Garcia¹, Tiago Gomes¹, Jorge Cabral¹, Paulo Cardoso¹, S. Montenegro¹, Mongkol Ekpanyapong¹•Institutions (1)

Asian Institute of Technology¹

19 Mar 2012

TL;DR: Presents a novel hypervisor for aerospace applications, developed using an object-oriented approach, that embodies time and space partitioning on a PowerPC core embedded in an FPGA and targets dependable computing and simplicity.

Abstract: This paper presents a novel hypervisor, developed for aerospace applications using an object-oriented approach, that embodies time and space partitioning (TSP) on a PowerPC (PPC) core embedded in an FPGA. It targets the NetworkCentric core avionics [1], an architecture of cooperating components managed by a real-time operating system, with the aims of dependable computing and simplicity. To support Integrated Modular Architecture (IMA) [2] partitioned software architectures, the proposed hypervisor adapts Popek and Goldberg's [3] virtualization requirements of fidelity, efficiency and resource control to the aerospace application domain and extends them with additional ones such as timing determinism, reactivity and improved dependability. A distinctive feature of this hypervisor is its I/O device virtualization approach, which guarantees real-time performance and a small trusted computing base. The object-oriented approach will be particularly useful for customizing key components of the hypervisor (at different granularity levels), such as partition scheduling and the communications manager, using generative programming techniques (Aspect Oriented Programming (AOP) and template meta-programming).
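
Time partitioning of the kind this abstract describes is commonly realized as a fixed cyclic schedule: a major frame is divided into windows, each owned by one partition, and the frame repeats. The sketch below illustrates that general idea only. It is not the paper's scheduler, and the partition names and window lengths are hypothetical.

```python
from typing import List, Tuple

Window = Tuple[str, int]  # (partition name, window length in milliseconds)


def active_partition(schedule: List[Window], t_ms: int) -> str:
    """Return the partition that owns the CPU at time t_ms in a repeating major frame."""
    major_frame = sum(length for _, length in schedule)
    if major_frame <= 0:
        raise ValueError("schedule must contain at least one non-empty window")
    offset = t_ms % major_frame  # position inside the current major frame
    for name, length in schedule:
        if offset < length:
            return name
        offset -= length
    raise AssertionError("unreachable: offset is always inside the major frame")


if __name__ == "__main__":
    # Hypothetical 100 ms major frame with three partitions.
    schedule = [("flight_control", 20), ("navigation", 30), ("telemetry", 50)]
    for t in (0, 25, 60, 110):
        print(t, "ms ->", active_partition(schedule, t))
```

Because the window table is fixed ahead of time, each partition's CPU share is independent of what the other partitions do, which is the property time partitioning is meant to provide.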

9 citations

Journal Article•DOI•

A PowerPC-based control system for the Read-Out-Driver module of the ATLAS IBL

G. Balbi, Giovanna Bruni, Marco Bruschi, I. D'Antone, Jens Dopke¹, Davide Falchieri², Tobias Flick, Alessandro Gabrielli², Joern Grosse-Knetter, Timon Heim, J. M. Joseph, N. Krieger, Andreas Kugel, P. Morettini, Manuel Neumann, A. Polini, N. Schroer, M. Rizzi, Riccardo Travaglini, S Zannoli², Antonio Zoccoli²•Institutions (2)

CERN¹, University of Bologna²

01 Feb 2012-Journal of Instrumentation

TL;DR: The status of the PowerPC-based control system will be outlined with major focus on firmware and software development strategies.

Abstract: The ATLAS experiment at LHC planned to upgrade the existing Pixel Detector with the insertion of an innermost silicon layer, called Insertable B-layer (IBL). A new front-end ASIC has been foreseen (named FE-I4) and it will be read out with improved off-detector electronics. In particular, the new Read-Out Driver card (ROD) is a VME-based board designed to process a four-fold data throughput. Moreover, the ROD hosts the electronics devoted to control operations whose main tasks are providing setup busses to access configuration registers on several FPGAs, receiving configuration data from external PCs, managing triggers and running calibration procedures. In parallel with a backward-compatible solution with a Digital Signal Processor (DSP), a new ROD control circuitry with a PowerPC embedded into an FPGA has been implemented. In this paper the status of the PowerPC-based control system will be outlined with major focus on firmware and software development strategies.

8 citations

Journal Article•DOI•

SoC based floating point implementation of differential evolution algorithm using FPGA

Kiran Kumar Anumandla¹, Rangababu Peesapati¹, Samrat L. Sabat¹, Siba K. Udgata²•Institutions (2)

University of Hyderabad¹, University UCINF²

01 Nov 2012-Design Automation for Embedded Systems

TL;DR: The experimental result concludes that the hardware DE IP accelerates the execution speed approximately by 200 times compared to equivalent software implementation of DE algorithm on PowerPC 440 processor.

Abstract: This paper presents floating point design and implementation of System on Chip (SoC) based Differential Evolution (DE) algorithm using Xilinx Virtex-5 Field Programmable Gate Array (FPGA). The hardware implementation is carried out to enhance the execution speed of the embedded applications. Intellectual Property (IP) of DE algorithm is developed and interfaced with the 32-bit PowerPC 440 processor using processor local bus (PLB) of Xilinx Virtex-5 FPGA. In the proposed architecture the algorithmic parameters of DE are scalable. The software and hardware implementation of the DE algorithm is carried out in PowerPC embedded processor and hardware IP respectively. The optimization of numerical benchmark functions and system identification in control systems are implemented to verify the proposed hardware SoC platform. The performance of the IP is measured in terms of acceleration gain of the DE algorithm. The optimization problems are solved by using floating point arithmetic in both embedded processor and hardware. The experimental result concludes that the hardware DE IP accelerates the execution speed approximately by 200 times compared to equivalent software implementation of DE algorithm on PowerPC 440 processor. Further, as a case study an Infinite Impulse Response (IIR) based system identification task on SoC using the developed hardware accelerator is implemented.
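
For readers unfamiliar with Differential Evolution, the following is a minimal software sketch of one common variant (DE/rand/1/bin) minimizing a simple benchmark function. The abstract does not state which variant, parameters, or benchmark functions the authors used, so the variant, the sphere function, and the values of `pop_size`, `f_scale` and `cr` are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sphere(x: Sequence[float]) -> float:
    """Simple numerical benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def differential_evolution(
    fitness: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    pop_size: int = 20,      # assumed; must be at least 4
    f_scale: float = 0.5,    # assumed differential weight
    cr: float = 0.9,         # assumed crossover rate
    generations: int = 100,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """DE/rand/1/bin; returns (best vector, best fitness, best fitness per generation)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    history = [min(scores)]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f_scale * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clamp to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_score = fitness(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
        history.append(min(scores))
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], history


if __name__ == "__main__":
    _, best_score, _ = differential_evolution(sphere, dim=5, bounds=(-5.0, 5.0))
    print("best fitness found:", best_score)
```

The inner loop over individuals evaluates one trial vector at a time; a hardware IP such as the one described in the abstract would target this kind of repeated mutation, crossover and fitness evaluation.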

7 citations

Patent•

Centralized protection, measurement and control device and realizing method thereof

Yueming Cai, Xinhong Qiu, Chen Fan, Chunlei Gao, Xiaodong Guo, Han Wei, Xuejun Ji, Yulei Jiang, Huiyu Li, Heping Peng, Wang Jun, Wenxia Wu, Xiaoming Yang, Dongpo Zhao

03 Oct 2012

TL;DR: A centralized protection, measurement and control device is disclosed, comprising the following functional plug-ins: a PowerPC board, a master control PowerPC board, a CPU board, a high-speed serial bus board, and a liquid crystal display board.

Abstract: The invention discloses a centralized protection, measurement and control device which comprises the following functional plug-ins: a PowerPC board, a master control PowerPC board, a CPU board, a high-speed serial bus board and a liquid crystal display board, wherein the PowerPC board is arranged on the high-speed serial bus board by plug-in connection, comprises a sampling PowerPC board and receives a sampling packet from a merging unit or sent from the protection, measurement and control device through a sampling network on process level; the master control PowerPC board communicates with a background and a master controller through the network; the CPU board is arranged on the high-speed serial bus board by plug-in connection and used for realizing the data calculation and the protection function of all intervals; the high-speed serial bus board is an access board for all the plug-ins and used for data transmission among the plug-ins; and the liquid crystal display board is connected with the master control PowerPC board through a built-in network port. The device can reduce the complex wiring among all the intervals on the secondary side of a substation and reduce the work of design, manufacture and maintenance of secondary equipment of an automatic system of the substation.

7 citations

Patent•

High-speed signal data processing system

Guangtao Zhu

04 Jul 2012

TL;DR: A high-speed signal data processing system comprising computation modules and a switching module. Each computation module is provided with two or four PowerPC computation nodes and is connected with a low-latency SRIO (Serial RapidIO) exchange switch; the switching module provides a data exchange function from 18 SRIOs and 18 gigabit Ethernets to a load and supports SRIO high-speed data transmission at 12.5 Gbps.

Abstract: The utility model provides a high-speed signal data processing system, comprising computation modules and a switching module, each computation module is provided with two or four PowerPC computation nodes and is connected with a low-latency SRIO (Serial RapidIO) exchange switch, the switching module provides a data exchange function from 18 SRIOs and 18 gigabit ethernets to a load and supports SRIO high speed data transmission with the speed of 12.5Gbps, and a data input and output end of each computation module is connected with a data input and output end of the switching module. Due to the connection of the computation module with the PowerPC computation nodes and an SRIO exchange switch and the switching module which provides the data exchange function from 18 SRIOs and 18 gigabit Ethernets to the load and supports SRIO high speed data transmission with the speed of 12.5Gbps disclosed by the utility model, the flexible expandability, higher bandwidth and the high-density computing capacity are provided to an embedded system.

7 citations

Amorphous Slack Methodology for Autonomous Fault-Handling in Reconfigurable Devices

Naveed Imran¹, Jooheung Lee¹, Youngju Kim¹, Mingjie Lin², Ronald F. DeMara²•Institutions (2)

University of Central Florida¹, Hongik University²

01 Jan 2012

TL;DR: The results from an H.263 video encoder and a Canny edge detector implemented on a Xilinx Virtex-4 device demonstrate autonomous recovery from permanent stuck-at faults while maintaining the throughput during fault-handling operations.

Abstract: The Amorphous Slack fault-handling methodology utilizes adaptive runtime redundancy to improve the survivability of FPGA-based designs. Unlike conventional static redundancy based methods to achieve fault resilience, the proposed system operates in a uniplex arrangement under non-contingent conditions. The proposed fault isolation algorithm is invoked upon fault detection, which employs a health metric of the application operating over the reconfigurable platform. This assertion applies if a signal-to-noise metric is known, as well as to applications that do not possess a readily correlated metric to identify anomalous behavior. In particular, readily available processor cores allow dynamic fault identification by executing a software specification of the signal processing algorithm, which is used to periodically validate critical outputs of the high-speed hardware circuit within tolerances. The results from an H.263 video encoder and a Canny edge detector implemented on a Xilinx Virtex-4 device demonstrate autonomous recovery from permanent stuck-at faults while maintaining the throughput during fault-handling operations. The fault-detection and isolation applications are executed on the on-chip PowerPC processor while the Circuit-Under-Test (CUT) is realized in hardware fabric. The proposed architecture allows on-chip processor based functional monitoring of the contained hardware resources subjected to the actual inputs of the circuit.

Proceedings Article•DOI•

Input/output peripheral devices control through serial communication using MicroBlaze processor

B. Muralikrishna¹, K. Gnana Deepika¹•Institutions (1)

K L University¹

15 Mar 2012

TL;DR: Xilinx Platform Studio (XPS) provides an integrated environment for creating software and hardware specification flows for embedded processor systems based on MicroBlaze and PowerPC processors, and the EDK allows the user to incorporate soft-core processors (MicroBlaze or PicoBlaze) to interface the built-in PowerPC processors with the reconfigurable FPGA resources.

Abstract: The microprocessors available for use in Xilinx Field Programmable Gate Arrays (FPGAs) with Xilinx EDK (Embedded Development Kit) software tools can be divided into two broad categories: soft-core microprocessors (MicroBlaze) and the hard-core embedded microprocessor (PowerPC). Xilinx Platform Studio (XPS) provides an integrated environment for creating software and hardware specification flows for embedded processor systems based on MicroBlaze and PowerPC processors. MicroBlaze is a 32-bit RISC soft-core (synthesizable) processor core that enables embedded developers to tune performance to match the requirements of target applications. XPS offers customization of tool flow configuration options and provides a graphical system editor for connection of processors, peripherals, and buses. The XPS tool can create a simple processor system and supports adding a custom OPB peripheral to that processor system by using the Import Peripheral Wizard. The EDK allows the user to incorporate soft-core processors (MicroBlaze or PicoBlaze) to interface the built-in PowerPC processors with the reconfigurable FPGA resources. Such processors can also be used to interface the hardware system to a variety of input/output peripheral devices needed to supply input data to the hardware system through serial communication and/or to display the results of processing.

Proceedings Article•DOI•

Real time communication between multiple FPGA systems in multitasking environment using RTOS

Rourab Paul¹, Sangeet Saha¹, Suman Sau², Amlan Chakrabarti²•Institutions (2)

University of Calcutta¹, Information Technology University²

15 Mar 2012

TL;DR: This research work proposes the design and implementation of a real-time FPGA-based application, which demonstrates the creation of real-time process tasks in FPGA systems for successful real-time communication between multiple FPGAs.

Abstract: The recent development of Field-Programmable Gate Array (FPGA) architectures, with soft core (MicroBlaze) and hard core (PowerPC) processors, embedded memories and IP cores, offers the potential for high computing power. Presently FPGAs are considered a major platform for high performance embedded applications, as they provide the opportunity for reconfiguration as well as good clock speed and design resources. As the complexity of embedded applications increases, the use of an operating system brings many advantages. In present-day application scenarios most embedded systems have real-time requirements that demand the use of real-time operating systems (RTOS), which create a suitable environment for real-time applications to be designed and expanded easily. In an RTOS the design process is simplified by splitting the application code into separate tasks; the scheduler then executes them according to a specific schedule, meeting the real-time deadline. In this research work, we propose the design and implementation of a real-time FPGA-based application, which demonstrates the creation of real-time process tasks in FPGA systems for successful real-time communication between multiple FPGA systems. We have chosen the RSA-based encryption and decryption algorithm for this implementation, as security is one of the most important needs for data communication. At first we demonstrate the real-time execution of multiple process tasks in a single FPGA system for the encryption and decryption of data. Next we describe the most challenging part of our work, where we establish real-time communication between two FPGA systems, running the encryption engine and the decryption engine respectively and communicating with one another via an RS232 communication link. The results show that our design is better in terms of execution speed in comparison with existing research works.
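
As a reminder of what the RSA encryption and decryption engines compute, here is a toy textbook-RSA sketch in which one function plays the sending side and another the receiving side. It is not the authors' implementation: the tiny primes, the exponent, and the function names are illustrative assumptions, and unpadded RSA with small keys like this is insecure and suitable only for demonstration.

```python
from math import gcd
from typing import Tuple


def make_keypair(p: int, q: int, e: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Build (public, private) textbook-RSA keys from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse; needs Python 3.8+
    return (n, e), (n, d)


def encrypt(message: int, public_key: Tuple[int, int]) -> int:
    """Sender side: c = m^e mod n (message must be smaller than n)."""
    n, e = public_key
    if not 0 <= message < n:
        raise ValueError("message out of range for this modulus")
    return pow(message, e, n)


def decrypt(cipher: int, private_key: Tuple[int, int]) -> int:
    """Receiver side: m = c^d mod n."""
    n, d = private_key
    return pow(cipher, d, n)


if __name__ == "__main__":
    public, private = make_keypair(61, 53, 17)  # toy parameters, far too small for real use
    cipher = encrypt(65, public)                # value that would travel over the serial link
    print("cipher:", cipher, "recovered:", decrypt(cipher, private))
```

In the setup the abstract describes, the ciphertext would cross the RS232 link between the two FPGA systems; here the two calls simply run in sequence.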

Patent•

Accuracy verification system for digital signal-based electric power quality measuring apparatus

Yaqiao Luo, Xu Bin, Ji Chang An, Hong Wei, Song Zhuo

13 Sep 2012

TL;DR: An accuracy verification system for digital signal-based electric power quality measuring apparatus is presented; it simulates the measuring environments of digital and intelligent transformer substations so that accuracy can be verified according to existing standards and specifications.

Abstract: The present invention relates to an accuracy verification system for digital signal-based electric power quality measuring apparatus. The system comprises an analog signal standard source and an apparatus to be measured. The analog signal standard source comprises a power standard source and a three-phase relay protection analyzer: the power standard source is used as a standard source for the harmonic voltage, harmonic current and voltage flicker, and the three-phase relay protection analyzer is used as a standard source for the three-phase unbalance factor. The system further comprises a signal conversion apparatus, which comprises a signal processing unit consisting of an isolation amplifier and a Hall sensor, a 16-bit analog/digital conversion unit, a central processing unit (CPU) based on the PowerPC+FPGA+VxWorks architecture and an optical fiber interface unit, with each unit connected in sequence. The system inherits the advantages of the traditional verification system for power quality measuring apparatus while simulating the measuring environments of digital transformer substations and intelligent transformer substations, and is thus able to verify the accuracy of digital signal-based electric power quality measuring apparatus strictly according to existing standards and specifications.

Proceedings Article•DOI•

Rodosvisor — An ARINC 653 quasi-compliant hypervisor: CPU, memory and I/O virtualization

Adriano Tavares¹, A. Didimo¹, T. Lobo¹, Paulo Cardoso¹, Jorge Cabral¹, Sergio Montenegro²•Institutions (2)

University of Minho¹, University of Würzburg²

01 Sep 2012

TL;DR: Presents a novel hypervisor engineered for aerospace applications using an object-oriented approach embodying time and space partitioning on a PowerPC core embedded in an FPGA, for the NetworkCentric core avionics.

Abstract: This paper presents a novel hypervisor engineered for aerospace applications using an object-oriented approach embodying time and space partitioning (TSP) on a PowerPC core embedded in an FPGA, for the NetworkCentric core avionics [1], an architecture of cooperating components managed by a real-time operating system to implement dependable computing and targeting simplicity. To support partitioned IMA [2] software architectures, our hypervisor adapts Popek and Goldberg's [3] fidelity, efficiency and resource control virtualization requirements to the aerospace application domain, extending them with additional ones such as timing determinism, reactivity and improved dependability. A distinctive feature of the hypervisor is its I/O device virtualization approach, which guarantees real-time performance and a small trusted computing base.

Proceedings Article•DOI•

An Empirical Evaluation of the Influence of the Load-Store Unit on WCET Analysis.

Mohamed Abdel Maksoud¹, Jan Reineke¹•Institutions (1)

Saarland University¹

01 Jan 2012

TL;DR: This paper introduces a simplified variant of the existing design of the LSU by reducing its queue sizes and uses AbsInt's aiT WCET analysis toolchain to determine the resulting WCET bounds and analysis times.

Abstract: Due to the complexity of today's micro-architectures, the micro-architectural analysis usually constitutes the most time-consuming step in worst-case execution time (WCET) analysis. In this paper, we investigate the influence of the design of the load-store unit (LSU) in the PowerPC 7448 on WCET analysis. To this end, we introduce a simplified variant of the existing design of the LSU by reducing its queue sizes. Using AbsInt's aiT WCET analysis toolchain we determine the resulting WCET bounds and analysis times. For the modified version of the LSU with reduced queue sizes, analysis time is reduced by more than 50% on a set of benchmarks from the Mälardalen suite, while there is little change in the WCET bound.

Journal Article•DOI•

Run-time generation of partial FPGA configurations for subword operations

Miguel L. Silva¹, João Canas Ferreira¹•Institutions (1)

University of Porto¹

01 Jul 2012-Microprocessors and Microsystems

TL;DR: This paper proposes mapping sequences of subword operations to a set of hardware components and generating the corresponding FPGA partial configurations at run-time, aimed at adaptive embedded systems that employ run-time reconfiguration to achieve high flexibility and performance.

Proceedings Article•DOI•

An embedded architecture for implementation of a video acquisition module of a smart camera system

Jai Gopal Pandey¹, Abhijit Karmakar¹, Chandra Shekhar¹•Institutions (1)

Central Electronics Engineering Research Institute¹

15 Mar 2012

TL;DR: The device utilization summary shows that, with the proposed embedded architecture based video acquisition module, the remaining FPGA resources are sufficient for implementing any reasonably complex real-time video processing application.

Abstract: This paper presents an embedded architecture for a real-time video acquisition module, which is a vital part of a smart camera system. The Xilinx ML-507 platform has been used to develop the proposed embedded architecture. Apart from the necessary peripherals, the platform contains a Virtex-5 FX FPGA device with a PowerPC 440 processor embedded in the FPGA fabric itself. In order to develop the required hardware and software in an integrated fashion, the Xilinx Embedded Development Kit (EDK) design tool has been used. A number of Xilinx-provided IPs are customized to realize the hardware modules in the FPGA fabric. To implement the real-time video capture and display functionality for the smart camera system, a Pan-Tilt-Zoom (PTZ) camera and a VGA monitor have been interfaced with the platform. This interfacing uses the on-board VGA input video codec and DVI transmitter chips. The control registers of these chips are configured by the embedded PowerPC 440 processor using the low-level device driver functions of the Inter-Integrated Circuit (IIC) bus controller. The application software, written in C, runs on top of a standalone software platform and uses the application programmer interface (API) provided by that platform. The device utilization summary shows that, with the proposed embedded-architecture-based video acquisition module, the remaining FPGA resources are sufficient for implementing any reasonably complex real-time video processing application.

Journal Article•

Golden-Finger and Back-Door: Two HW/SW Mechanisms for Accelerating Multicore Computer Systems

Slo-Li Chu¹•Institutions (1)

Chung Yuan Christian University¹

01 Jan 2012-International Journal of Engineering and Technology Innovation

TL;DR: The paper proposes a hardware mechanism for communication between two independent processors that cannot otherwise be operated together, such as the dual PowerPC 405 cores in the Xilinx ML310 system, and a tool called Golden-Finger that dynamically adjusts the scheduling policy of the process scheduler in Linux.

Abstract: The continuing demand for high-performance computing leads computer systems to adopt more processors in order to improve parallelism and throughput. Although multiple processing cores are implemented in a computer system, a complicated hardware communication mechanism between processors decreases the performance of the overall system. In addition, the unsuitable process scheduling mechanism of a conventional operating system cannot fully utilize the computation power of the additional processors. Accordingly, this paper provides two mechanisms, one in hardware and one in software, to overcome these challenges. On the software side, we propose a tool, called Golden-Finger, to dynamically adjust the scheduling policy of the process scheduler in Linux. This software mechanism can improve the performance of a specified process by letting it occupy a processor exclusively. On the hardware side, we design an effective hardware mechanism, called Back-Door, for communication between two independent processors that cannot otherwise be operated together, such as the dual PowerPC 405 cores in the Xilinx ML310 system. The experimental results reveal that the two mechanisms obtain significant performance enhancements.
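
The idea of dedicating a processor to one process can be approximated from user space with CPU affinity. The sketch below is not Golden-Finger, which according to the abstract adjusts the Linux scheduler's policy. It swaps in the standard Linux affinity call exposed by Python as `os.sched_setaffinity`, and the helper names `pin_process` and `split_cpus` are hypothetical.

```python
import os


def split_cpus(allowed, reserved_cpu):
    """Split the allowed CPUs into (reserved, shared) sets.

    The reserved set is meant for the favoured process and the shared set
    for everything else.
    """
    allowed = set(allowed)
    if reserved_cpu not in allowed:
        raise ValueError("reserved CPU is not in the allowed set")
    return {reserved_cpu}, allowed - {reserved_cpu}


def pin_process(pid, cpu):
    """Restrict process `pid` (0 means the caller) to the single CPU `cpu`.

    Returns the resulting affinity set, or None on platforms where the
    Linux-only os.sched_setaffinity call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Pinning alone only keeps the chosen process on one CPU. For that process to occupy the CPU exclusively, the remaining processes would also have to be restricted to the shared set, which is the part a scheduler-level mechanism can enforce.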

Journal Article•DOI•

Monte Carlo dose calculation using a cell processor based PlayStation 3 system

James C. L. Chow¹,², Phil Lam¹, David A. Jaffray¹,³•Institutions (3)

University Health Network¹, Ryerson University², University of Toronto³

09 Feb 2012

TL;DR: The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change is made to EGSnrc. However, because EGSnrc performance was limited by the PowerPC Processing Element in the PS3, a PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.

Abstract: This study investigates the performance of the EGSnrc computer code coupled with Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions in the EGSnrc code, namely HOWNEAR and RANMAR_GET, were carried out based on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured with the profiler gprof, which reports the number of executions and the total time spent in each function. A testing architecture designed for the Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change is made to EGSnrc. However, because EGSnrc performance was limited by the PowerPC Processing Element in the PS3, a PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.

Patent•

Multichannel video system on basis of 1394b optical bus and data collecting and transmitting method thereof

Chunxi Zhang, Yingxue Long, Xiaosu Yi, Ming Wang, Yuhan Zhu

15 Aug 2012

TL;DR: A multichannel video system based on the 1394b optical bus and a data collecting and transmitting method thereof are presented; an embedded microprocessor acquires the video, so reliance on a master controlling computer is avoided and the reliability of the system is improved.

Abstract: The invention provides a multichannel video system based on the 1394b optical bus and a data collecting and transmitting method thereof, which belong to the technical field of the 1394b optical bus. The multichannel video system comprises a video collecting unit and a video transmitting unit. The video collecting unit comprises a DSP (digital signal processing) chip, video decoding chips, a flash memory, a synchronous dynamic random access memory, a PCI (peripheral component interconnect) interface A and a JTAG (joint test action group) interface, and the video transmitting unit comprises a PowerPC (performance optimization with enhanced reduced instruction set computer-performance computing) module and a 1394b interface protocol module. An embedded microprocessor is used to acquire the video, so reliance on a master controlling computer is avoided, the load on the master controlling computer is lowered, and the reliability of the system is improved. A DSP chip specific to video images is used, which provides multiple video ports, so that multichannel video is realized with a single chip, system resources are fully utilized, the circuit design is simplified, and the cost is lowered.

Proceedings Article•DOI•

Recording of Time Varying Data on a Display Device: An Embedded Solution

Fema Merin Jacob

09 Aug 2012

TL;DR: This paper presents the design method for developing a recording application for time varying data displayed on the monitor based on embedded Linux using Qt, a powerful development toolkit and results indicated that this system is working stably and reliably.

Abstract: This paper presents the design method for developing a recording application for time varying data displayed on the monitor based on embedded Linux using Qt, a powerful development toolkit. Recording the visualization of time varying data to video is perceived to be a convenient and easy-to-use solution for intense data analysis anywhere and anytime. The application is based on IBM's PowerPC 7410, while the software platform is Embedded Linux and the development environment consists of Qt and Qt/Embedded. The embedded operating system is MontaVista Linux, with X11 built as an abstraction layer on the kernel to provide graphics capability to the system. The open source FFmpeg library is employed to encode captured window frames to MPEG video. The experimental testing and results indicated that this system works stably and reliably. The implementation of the entire design environment is based on the X Window System and the Linux operating system and can thus be used on an increasing number of low-cost workstations.

Journal Article•DOI•

Run-time generation of partial FPGA configurations

Miguel L. Silva¹, João Canas Ferreira¹•Institutions (1)

University of Porto¹

01 Jan 2012-Journal of Systems Architecture

TL;DR: The method is intended for use in adaptive embedded systems that employ run-time reconfiguration to achieve high flexibility and performance and is embodied in a code library that applications can use to create new bitstreams at run-time.

Journal Article•DOI•

Reconfigurable FPGA-based switching path frequency-domain echo canceller with applications to voice control device

Ka Fai Cedric Yiu¹, Yao Lu¹, CH Ho², Wayne Luk², Jiaquan Huo³, Sven Nordholm³•Institutions (3)

Hong Kong Polytechnic University¹, Imperial College London², Curtin University³

01 Mar 2012-Digital Signal Processing

TL;DR: A novel hardware architecture to support a robust adaptive algorithm in combination with a switching path model to tackle the double-talk situation is proposed, and the results obtained show that the echo canceller is successful in handling the double-talk situation and that the sub-band implementation has improved convergence significantly.

Patent•

Merging unit hardware core board based on PowerPC system

Kecheng Hao, Shijun Bai, Lincui Zeng, Liang Ma, Bai Biao

22 Aug 2012

TL;DR: In this paper, a merging unit hardware core board based on a PowerPC system is presented, which consists of a processor board and a secondary board; a central processing unit (CPU) and a storage module, a mode selection module and a first power module which are connected with the CPU are arranged on the processor board; and the processor and the secondary board are spliced through a pin connector.

Abstract: The utility model discloses a merging unit hardware core board based on a PowerPC system. The merging unit hardware core board comprises a processor board and a secondary board; a central processing unit (CPU) and a storage module, a mode selection module and a first power module which are connected with the CPU are arranged on the processor board; and the processor board and the secondary board are spliced through a pin connector. A field programmable gate array (FPGA), a synchronous module, a digital sampling module, an analog sampling module, a point-to-point output module, a serial output module, a plurality of Ethernet modules, a network port debugging module and a second power module are arranged on the secondary board; the FPGA is connected with the synchronous module, the digital sampling module, the analog sampling module, the point-to-point output module, the serial output module, the CPU and the second power module; and the CPU is connected with a plurality of Ethernet modules, the network port debugging module and the FPGA. Through the core board, the problem about coexistence of analog sampling and digital sampling is solved, the problem about timing of a precision clock is solved, and the problem of synchronization difficulty of eight paths of point-to-point output power consumption is solved.

Journal Article•

VPX Bus Techniques and Its Implementation

Pan Qi

01 Jan 2012-Electro-mechanical Engineering

TL;DR: The 3U and 6U modules and the backboard of the chassis are described and analyzed, and a project for an embedded system based on the VPX bus is also shown in this paper.

Abstract: High-speed serial buses are compatible with the architecture of the VPX bus specification. The VPX bus has extensive application prospects in the high-performance embedded computing area. The 3U and 6U modules and the backboard of the chassis are described and analyzed, and a project for an embedded system based on the VPX bus is also shown in this paper. The module is designed with the PowerPC MPC8641D and the RapidIO switch Tsi577. The backboard is designed with a mesh topology architecture, which supports high-speed point-to-point communication. Powerful data processing and data exchange capability is a trend in military information processing systems for radar.

A processor based implementation of lapped biorthogonal transform for JPEG XR compression on FPGA

M. R. Rehman, G. Raja

01 Jan 2012

TL;DR: A processor-based design that executes the instructions of LBT at higher speed is presented; it can be used in low-cost battery-operated imaging devices and supports easy upgrades.

Abstract: This paper describes a new methodology for implementing the Lapped Biorthogonal Transform (LBT) used in JPEG XR image compression. Due to the sequential nature of LBT, we present a processor-based design that executes the instructions of LBT at higher speed. This design can be used in low-cost battery-operated imaging devices and supports easy upgrades. We have tested the design on a Xilinx Virtex-II Pro FPGA, which includes a built-in PowerPC 405 core, a 32-bit implementation of the RISC PowerPC embedded-environment architecture.

Proceedings Article•DOI•

Hardware/software co-design of Dynamic Binary Translation in X86 emulation

Hongqi He, Hai-feng Chen, Liehui Jiang, Weiyu Dong

25 May 2012

TL;DR: A hardware unit is designed to accelerate the DBT system, including Instruction Decoder, RISC Code Table, Translation Cache and Cache Query Unit, and experiments showed that the co-designed DBT system could work accurately.

Abstract: X86 emulation is an effective method for solving the problem of software compatibility between X86 and RISC processors such as ARM, PowerPC and Alpha. Dynamic Binary Translation (DBT) in X86 emulation translates X86 binary code to RISC binary code dynamically, so that software written for the X86 platform can execute unmodified on a RISC platform. However, software-based DBT is currently one of the performance bottlenecks. This paper therefore discusses a new method for DBT using hardware/software co-design. A hardware unit is designed to accelerate the DBT system, including Instruction Decoder, RISC Code Table, Translation Cache and Cache Query Unit. The Instruction Decoder analyses the meaning of the X86 binary code and then looks up the RISC Code Table to obtain the corresponding RISC binary code. The Translation Cache stores recently translated RISC binary code to reduce repeated instruction translation. The Cache Query Unit is used to determine whether a cache hit has occurred. Finally, we implement the hardware unit in Verilog HDL. Experiments showed that the co-designed DBT system could work accurately.
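
The flow this abstract describes (query the cache, translate through a code table on a miss, store the result) can be sketched in software as follows. This is only an illustrative model under stated assumptions: the paper's unit is hardware written in Verilog, the table entries here are hypothetical one-to-one mappings that ignore x86 flag semantics, and the least-recently-used eviction policy is an assumption, since the abstract says only that recently translated code is kept.

```python
from collections import OrderedDict

# Hypothetical "RISC Code Table": decoded guest instruction -> host sequence.
RISC_CODE_TABLE = {
    "inc eax": ["addi r3, r3, 1"],
    "mov ebx, eax": ["mr r4, r3"],
}


class TranslationCache:
    """Small cache of translated code keyed by guest address (LRU assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def query(self, guest_pc):
        """Model of the cache query step: cached host code, or None on a miss."""
        code = self.entries.get(guest_pc)
        if code is None:
            self.misses += 1
            return None
        self.entries.move_to_end(guest_pc)  # mark as most recently used
        self.hits += 1
        return code

    def store(self, guest_pc, host_code):
        """Insert a translation, evicting the least recently used one if full."""
        self.entries[guest_pc] = host_code
        self.entries.move_to_end(guest_pc)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def translate(guest_pc, guest_program, cache):
    """Return host code for the guest instruction at guest_pc."""
    host_code = cache.query(guest_pc)
    if host_code is None:
        # Miss: "decode" the guest instruction and look it up in the table.
        host_code = RISC_CODE_TABLE[guest_program[guest_pc]]
        cache.store(guest_pc, host_code)
    return host_code
```

A loop body translated once is served from the cache on later iterations, which is the repeated translation work the Translation Cache is meant to avoid.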

Barrier logic: a program logic for concurrency on PowerPC

Richard Bornat

01 Jan 2012

TL;DR: A brief note in which the author attempts to distill ideas on barrier logic, a program logic for concurrency on PowerPC.

Abstract: Attempt to distill my rambling ideas.

Patent•

Switching system for realizing industry standard architecture (ISA) bus on performance optimization with enhanced RISC-performance computing (PowerPC) embedded computer

Qiping Xiao, Song Yang

05 Sep 2012

TL;DR: In this article, a switching system for realizing an industry standard architecture (ISA) bus on a performance optimization with enhanced RISC-performance computing (PowerPC) embedded computer is presented.

Abstract: The utility model discloses a switching system for realizing an industry standard architecture (ISA) bus on a performance optimization with enhanced RISC-performance computing (PowerPC) embedded computer. The switching system comprises a peripheral component interconnect (PCI) Target bridge, an ISA bus converter bridge, a direct memory access (DMA) controller and a PCI Master bridge, wherein one end of the PCI Target bridge is connected with a PCI bus and the other end of the PCI Target bridge is connected with the ISA bus converter bridge; the other end of the ISA bus converter bridge is connected with the ISA bus; one end of the PCI Master bridge is connected with the PCI bus and the other end of the PCI Master bridge is connected with the DMA controller; and the other end of the DMA controller is connected with the ISA bus. By the switching system, the problem that the PCI bus of a PowerPC computer operates external equipment in a standard ISA bus mode is solved, and the external equipment communicates with the PowerPC computer in a DMA transmission mode of the standard ISA bus.
