# Implementation and Tests of FPGA-embedded PowerPC in the control system of the ATLAS IBL ROD card

G. Balbi<sup>a</sup>, M. Bindi<sup>a</sup>, D. Falchieri<sup>b</sup>, M. Furini<sup>a</sup>, A. Gabrielli<sup>b</sup>, A. Kugel<sup>c</sup>, R. Travaglini<sup>a,1</sup>, M. Wensing<sup>d</sup>, on behalf of the ATLAS IBL collaboration

<sup>a</sup> INFN Bologna

<sup>b</sup> University of Bologna & INFN Bologna

<sup>c</sup> University of Heidelberg/ZITI

<sup>d</sup> Fachbereich C Physik Wuppertal

*E-mail*: Riccardo.Travaglini@bo.infn.it

ABSTRACT: The Insertable B-layer project is planned for the upgrade of the ATLAS experiment at the LHC. A silicon layer will be inserted into the existing Pixel Detector together with new electronics. The off-detector readout system is implemented with a Back-Of-Crate module providing I/O functionality and a Readout-Driver card (ROD) for data processing. The ROD hosts the electronics devoted to control operations, implemented both with a backward-compatible solution (using a Digital Signal Processor) and with a PowerPC embedded into an FPGA. This document reports the major firmware and software achievements concerning the PowerPC implementation, tested on ROD prototypes.

KEYWORDS: Data acquisition circuits, Detector control systems, Digital electronic circuits

<sup>1</sup> Speaker.

#### Contents

| 1. Introduction                                                            |
|----------------------------------------------------------------------------|
| 1.1 The ROD card prototype                                                 |
| 2. Design and test of the FPGA-embedded PowerPC system for the ROD control |
| 2.1 Firmware design                                                        |
| 2.2 Performances and tests                                                 |
| 3. Integration Tests                                                       |
| 3.1 Simulation                                                             |
| 4. Summary                                                                 |

## 1. Introduction

The ATLAS experiment [1] at the LHC plans to upgrade the existing Pixel Detector [2] by inserting an innermost silicon layer, called the Insertable B-layer (IBL) [3], which will be placed between a new beam pipe and the current inner layer (B-layer). The IBL installation is planned for the first long LHC shutdown in 2013/14.

Major goals of the IBL project are:

- to strengthen the tracking capability by increasing both redundancy and precision;
- to preserve the performance of the Pixel Detector against the effects of the increased luminosity expected after the LHC upgrades (greater pile-up and radiation doses).

The IBL read-out electronics have been redesigned to match the enhanced detector performance. A new front-end ASIC, called FE-I4 [4], has been designed to cope with the larger occupancy and to manage the increased bandwidth expected for IBL. New off-detector electronics [5] have also been designed to overcome the limitations of the current system; they consist of two 9U-VME cards, the Back-of-Crate (BOC) and the Read-Out Driver (ROD), which implement the optical I/O interface and the data processing respectively (see Figure 1). Each card pair processes data received from 32 FE-I4 data links, for a total I/O bandwidth of 5.12 Gb/s.
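As a quick consistency check, the per-link rate implied by the quoted total can be worked out, assuming that the bandwidth is shared equally among the 32 links (an assumption; only the total is given here):

$$
\frac{5.12\ \text{Gb/s}}{32\ \text{links}} = 0.16\ \text{Gb/s} = 160\ \text{Mb/s per link}.
$$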



Figure 1: IBL read-out electronics layout [6].

Moreover, the new ROD provides enhanced performance in the so-called calibration runs. These consist of injecting test charges into the preamplifier of each pixel; the injections are iterated over several acquisition scans with different settings of the front-end electronics parameters. In the current version of the readout electronics, scan results are acquired over the VME bus, with a transfer rate limit of 7 MB/s. The new ROD design implements dedicated Gigabit Ethernet connections, thus improving the overall acquisition performance of the calibration runs.
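To put the VME limit in perspective, it can be compared with the nominal line rate of a single Gigabit Ethernet port. The estimate below is only an upper bound: it assumes the full 1 Gb/s line rate with no protocol overhead, so the gain achieved in practice will be lower.

$$
\frac{1\ \text{Gb/s}}{8\ \text{bit/B}} = 125\ \text{MB/s}, \qquad \frac{125\ \text{MB/s}}{7\ \text{MB/s}} \approx 18 .
$$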

A new off-detector design was also strongly motivated by the obsolescence of the components used in the current cards.

### 1.1 The ROD card prototype

The ROD card has been implemented on a 14-layer PCB. The first prototypes were received from the manufacturer in September 2011; a second revision batch, following major bug fixing, was delivered in February 2012. In the rest of this document, the description of the card functionality and the test results refer to this second prototype version (Figure 2).

The design is based on modern Xilinx FPGA devices: one Virtex-5 for control purposes and two Spartan-6 dedicated to data processing (gathering of front-end output, event building and calibration data processing). For backward compatibility and reliability, two independent microprocessors are provided: a Digital Signal Processor (DSP) [7] and a PowerPC embedded into the Virtex-5 [8].



Figure 2: Picture of the ROD-card second prototype.

The ROD is a standard 9U VME64X board. The VME connectors distribute the power supply from the crate to the card. Moreover, the VME bus interfaces the ROD with the DAQ controller system, acting as a backup of the main control link (Gigabit Ethernet) and allowing remote download of the FPGA firmware.

Other ROD external connections are the following:

- 1 Gigabit Ethernet port towards the DAQ controller system (main connection);
- 2 Gigabit Ethernet ports to deliver the calibration scans results;
- TTCrq mezzanine to receive ATLAS clock and trigger commands;
- 4 custom busses towards the BOC through the VME J3 and P0 connectors:
  - a) Synchronous 96-bit bus @ 80 MHz receiving the FE-I4 output;
  - b) Synchronous 16-bit bus @ 40 MHz for sending commands to the FE-I4;
  - c) Synchronous 56-bit bus @ 80 MHz for sending formatted events to the ATLAS DAQ system;
  - d) Asynchronous 27-bit bus for configuring the BOC.
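The raw capacity of the three synchronous busses listed above follows directly from their width and clock frequency. The short sketch below works it out under the assumption, not stated in the text, that one bus-wide word is transferred on every clock cycle; the labels and the function name are illustrative only.

```python
# Raw bandwidth of the synchronous ROD-BOC busses listed above.
# Assumption (not stated in the text): one bus-wide word is transferred
# on every clock cycle (single data rate, no idle cycles).

BUSSES = {
    "a) FE-I4 output": (96, 80e6),
    "b) FE-I4 commands": (16, 40e6),
    "c) formatted events": (56, 80e6),
}


def raw_bandwidth_gbps(width_bits, clock_hz):
    """Return the raw bandwidth in Gb/s for a bus of the given width and clock."""
    return width_bits * clock_hz / 1e9


for name, (width, clock) in BUSSES.items():
    print(f"{name}: {raw_bandwidth_gbps(width, clock):.2f} Gb/s")
```

Under this assumption bus (a) offers 7.68 Gb/s of raw capacity, which is above the 5.12 Gb/s total quoted for the 32 FE-I4 links.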

An internal asynchronous bus is implemented in order to configure the devices on the board (ROD bus); PowerPC and DSP act as master controllers.

Static and dynamic memory components are hosted on the ROD; in particular, the Virtex-5 is equipped with a SO-DIMM DDR2<sup>1</sup> module and with a 64 Mbit Atmel Flash device, the latter devoted to the storage of both non-volatile parameters (e.g. Ethernet IP addresses) and software programs to be executed by the PowerPC.

## 2. Design and test of the FPGA-embedded PowerPC system for the ROD control

A description of the implementation of the FPGA-embedded PowerPC for ROD control can be found in a previous document [9], together with details on the firmware development tools and strategies. In the rest of this paper, system performance and test results are described, while details already presented are repeated only when needed.

The main task of the PowerPC is to configure and control the overall read-out electronics. A set of dedicated software APIs<sup>2</sup> is being developed; they will form the master control software executed on the processor. The APIs will be accessed by the high-level DAQ software using a custom communication protocol running over both Ethernet and VME. The major tasks that the PowerPC must accomplish are:

- configuration of the processing blocks in the ROD Spartan-6 devices;
- configuration of the BOC card;
- sending commands (e.g. configuration, triggers) to the FE-I4;
- spying of data received from the FE-I4;
- control of the calibration scans.
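The task list above suggests a command-dispatch structure for the master control software. The following is only an illustrative sketch in Python (the actual software is written in C, and the custom protocol is not described in this paper): the command codes, the handler behavior and the status strings are all hypothetical.

```python
from enum import IntEnum


class Command(IntEnum):
    """Hypothetical command codes, one per task listed above."""
    CONFIG_ROD = 1    # configure the processing blocks in the Spartan-6
    CONFIG_BOC = 2    # configure the BOC card
    SEND_FE_CMD = 3   # send a command to the FE-I4
    SPY_FE_DATA = 4   # spy data received from the FE-I4
    RUN_SCAN = 5      # control a calibration scan


def make_dispatcher():
    """Build a table mapping each command code to a handler, plus a call log."""
    log = []

    def handler_for(cmd):
        def handle(payload):
            # A real handler would access the hardware; here the call is only recorded.
            log.append((cmd.name, payload))
            return "OK"
        return handle

    table = {cmd: handler_for(cmd) for cmd in Command}
    return table, log


def dispatch(table, code, payload=b""):
    """Look up the handler for a received command code and run it."""
    try:
        cmd = Command(code)
    except ValueError:
        return "ERR_UNKNOWN_COMMAND"
    return table[cmd](payload)
```

A table of this kind keeps the transport (Ethernet or VME) separate from the handlers, so the same command set can be served over either link.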

<sup>1</sup> DDR2 is a second generation Double Data Rate (DDR) Synchronous Dynamic RAM. SO-DIMM (Small Outline Dual In-Line Memory Module) is a type of pluggable memory module which hosts several chips; its form factor is an industry standard for commercial notebooks.

<sup>2</sup> Application Programming Interfaces (API) are a core set of software components (typically functions or routines) properly designed in order to improve the interface toward the underlying operating system.

### 2.1 Firmware design

The PowerPC embedded system in the Virtex-5 is composed of several different types of elements. The microcontroller (PowerPC), the MAC<sup>3</sup> and the BRAM<sup>4</sup> memories (used as cache) are hardware silicon blocks embedded into the FPGA. All the other components are implemented with the FPGA logic cells using both available Xilinx IP cores (e.g. memory and PHY<sup>5</sup> controllers) and custom VHDL code. The full system (Figure 3) is composed of a set of peripherals connected to the PowerPC through an internal 128-bit bus (Processor Local Bus).

The firmware is developed within the Xilinx ISE® Design Suite [10], which integrates EDK®, an environment for designing embedded processing systems. The software is written in C with the Xilinx SDK® tool. SDK® supplies a simple operating system platform that provides the lowest layer of software to handle the processor-specific functions.



**Figure 3:** Schematic of the embedded system firmware implementation. Externally connected devices and busses are also shown (in grey). Colors identify the different types of peripheral: silicon blocks (white), Xilinx-IP mixed hard-soft cores (light violet), Xilinx-IP soft cores (blue), custom code (light green).

### 2.2 Performances and tests

We configured the PowerPC to run at a CPU clock frequency of 250 MHz, well below the maximum achievable (400 MHz) but providing a good compromise between performance and power consumption. The Processor Local Bus is implemented with a Xilinx IP core running on a 125 MHz clock. Since every 128-bit data transfer requires several cycles, Direct Memory Access (DMA) is used to access the peripherals requiring high bandwidth.
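The bus figures above can be turned into a rough bandwidth estimate. The peak value assumes one 128-bit transfer per bus clock cycle; the number of cycles actually needed per transfer, $N$, is not given here and is left symbolic:

$$
B_\text{peak} = 128\ \text{bit} \times 125\ \text{MHz} = 16\ \text{Gb/s}, \qquad B_\text{eff} = \frac{B_\text{peak}}{N} .
$$

With, say, $N = 4$ (a purely illustrative value), the usable bandwidth for individual transfers would drop to 4 Gb/s.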

<sup>3</sup> Media Access Control: the sub-layer of the Ethernet standard mainly devoted to network addressing and data framing.

<sup>4</sup> Block RAM: dedicated memory (RAM) resources embedded into Xilinx FPGAs.

<sup>5</sup> Chip implementing the Ethernet Physical layer; its main task is transmitting and receiving the data frames.

DMA transfers have been successfully tested towards the Ethernet MAC controller. High data rates have been achieved through the Gigabit connection: TCP/IP packets have been sent continuously in one direction, reaching a stable rate of about 300 Mb/s. Sending streams in both directions has also been tested, with uniform behavior.
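A unidirectional TCP rate of this kind can be measured with a simple sender and receiver pair. The sketch below is a host-side illustration over the loopback interface, not the test software used on the ROD: in a real measurement one endpoint would be the PowerPC, and the function names, payload size and chunk size are arbitrary choices made here.

```python
import socket
import threading
import time


def _sink(server, result):
    """Accept one connection and count the bytes received until it closes."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_tcp_rate(n_bytes=8 * 1024 * 1024, chunk_size=65536):
    """Send n_bytes one way over loopback TCP; return (bytes received, rate in Mb/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result), daemon=True)
    receiver.start()

    payload = b"\x00" * chunk_size
    sent = 0
    start = time.perf_counter()
    with socket.create_connection(server.getsockname()) as client:
        while sent < n_bytes:
            client.sendall(payload)
            sent += chunk_size
    receiver.join()  # wait until the receiver has seen the end of the stream
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] * 8 / elapsed / 1e6


if __name__ == "__main__":
    received, rate = measure_tcp_rate()
    print(f"received {received} bytes at {rate:.0f} Mb/s")
```

The rate is computed from the bytes counted at the receiver, so data still buffered at the sender do not inflate the result.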

Tests have been done in order to evaluate several off-the-shelf SO-DIMM DDR2 modules. Three constraints limit the maximum data rate which can be achieved on the system:

- a) the FPGA speed grade (the Virtex-5 hosted on the ROD supports a maximum DDR2 clock frequency of 266 MHz);
- b) the main CPU clock (only given ratios of CPU and memory clock are allowed);
- c) the module architecture (number of ranks<sup>6</sup>).

Results are shown in Table 1.

| Producer  | Total Size (MB) | Data Width (bits) | Ranks | Max Clock Speed (MHz) | Max Transfer Rate Per Line (Mb/s) |
|-----------|-----------------|-------------------|-------|-----------------------|-----------------------------------|
| Micron    | 256             | 64         | 1     | 200             | 400               |
| Nanya     | 512             | 64         | 1     | 200             | 400               |
| Kingston  | 2000            | 64         | 2     | 125             | 300               |
| Transcend | 2000            | 64         | 2     | 125             | 300               |

**Table 1:** SO-DIMM DDR2 modules tested with the ROD PowerPC. Single-rank modules work up to a 200 MHz clock speed, while dual-rank modules operate correctly up to a 125 MHz clock frequency.
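The per-line rates of Table 1 can be converted into a peak bandwidth for the whole module. The sketch below does so under the assumption, not stated in the text, that all data lines transfer simultaneously at the quoted maximum per-line rate; the function name is illustrative.

```python
# Peak module bandwidth implied by Table 1.
# Assumption (not stated in the text): all data lines transfer
# simultaneously at the quoted maximum per-line rate.

MODULES = [
    # (producer, data width in bits, max per-line rate in Mb/s)
    ("Micron", 64, 400),
    ("Nanya", 64, 400),
    ("Kingston", 64, 300),
    ("Transcend", 64, 300),
]


def peak_bandwidth_gbytes(width_bits, rate_mbps_per_line):
    """Return the peak module bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return width_bits * rate_mbps_per_line * 1e6 / 8 / 1e9


for producer, width, rate in MODULES:
    print(f"{producer}: {peak_bandwidth_gbytes(width, rate):.1f} GB/s")
```

These are peak values only; the sustained rate seen by the PowerPC also depends on the memory controller and on the bus connecting it to the processor.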

An I<sup>2</sup>C interface running at 100 kHz has been implemented in order to access the EEPROM of the SO-DIMM DDR2 module, where the module parameters are stored. These parameters can thus be read and later used to properly configure the memory controller interface. The Serial Peripheral Interface (SPI) protocol is used to connect with the flash memory; it has been proven to work up to a 30 MHz clock frequency.
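Once the EEPROM content has been read over I<sup>2</sup>C, the module parameters have to be decoded before they can be used. The sketch below illustrates the idea on a synthetic byte image. The byte offsets follow the JEDEC DDR2 Serial Presence Detect layout as recalled here and should be checked against the standard before use; the function name and the example values are made up.

```python
# Decode a few fields of a DDR2 SPD (Serial Presence Detect) image.
# The byte offsets are an assumption based on the JEDEC DDR2 SPD layout
# and must be verified against the standard before being relied upon.

DDR2_MEMORY_TYPE = 0x08  # value expected in byte 2 for DDR2 SDRAM


def decode_spd(spd):
    """Extract basic module parameters from the first bytes of an SPD image."""
    if len(spd) < 7:
        raise ValueError("SPD image too short")
    if spd[2] != DDR2_MEMORY_TYPE:
        raise ValueError("not a DDR2 module")
    return {
        "ranks": (spd[5] & 0x07) + 1,  # bits 2..0 hold (number of ranks - 1)
        "data_width_bits": spd[6],
    }


# Synthetic example image (made-up values, not read from a real module).
example = bytes([0x80, 0x08, 0x08, 14, 10, 0x61, 64])
print(decode_spd(example))
```

The number of ranks decoded in this way is the quantity that distinguishes the two groups of modules in Table 1.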

A custom peripheral has been implemented to allow the VME to act as a master of the Processor Local Bus. It has been successfully tested and will serve as a backup option either to control the peripherals or to perform specific low-level tests. The FE-I4 commands and the ROD busses have been thoroughly checked during the integration tests, as described in the following section.

## 3. Integration Tests

Joint tests with other IBL electronics cards have been performed. The first consisted of connecting the ROD and the BOC together in a VME crate and carefully verifying first the control bus and then the data paths. The results are satisfactory: the BOC configuration has been successfully performed using software running on the PowerPC. Moreover, the transmission integrity of the data exchanged between the two cards has been verified.

Further tests have been done connecting the ROD with an FE-I4 chip hosted on a prototype board. In the final system the FE-I4 will be connected to the BOC via optical fibers. In these tests the FE-I4 was electrically connected to a general-purpose ROD connector. Despite the different type of connection, the test has been useful to debug both the dedicated peripherals and the software of the PowerPC system, as well as to develop the Spartan-6 firmware which processes the incoming data.

<sup>6</sup> A memory rank is a set of memory chips in a module which are connected to the same "select" signal. SO-DIMM modules with the same memory size can be designed with different architectures and, consequently, with single or multiple ranks.

The PowerPC has been shown to be capable of sending commands to the FE-I4 with the correct timing and format, as well as of spying on the output data (see Figure 4); the data were subsequently injected into dedicated Spartan-6 FIFOs and correctly read back by the controller.



**Figure 4:** Picture taken from the Xilinx Chipscope during the FE-I4 integration tests. The PowerPC sends Trigger commands in a loop and the front-end output is properly decoded, sampled and verified. Spying on the FE-I4 output at run time with the Chipscope has proven to be a powerful way to tune both the timing and the bus protocol among the several VHDL blocks implemented on the receiver path.

### 3.1 Simulation

Much effort has been devoted to setting up a simulation of the whole acquisition system. The SystemVerilog code developed for the FE-I4 design has been integrated with the BOC and ROD VHDL firmware in a common simulation environment. The testbench was driven by software running on the PowerPC simulation model. Verilog models of the external components needed by the PowerPC (Ethernet PHY, DDR2, Flash) have also been included (see Figure 5). The Mentor Graphics simulation environment has been chosen [11][12][13].



Figure 5: Layout of the overall off-detector electronics simulation setup.

The simulation of the whole system has proven to be feasible: dedicated software ran on the PowerPC emulation instance and was able to configure the FE-I4, send triggers and read back the outputs (see Figure 6). The possibility of checking the data stream and the behavior of the components at any point of the DAQ chain proved to be a powerful tool for debugging both the firmware and the test software. However, the simulation required excessive computational resources in terms of memory consumption: the implementation of only two FE-I4 instances filled up about 14 GB of RAM on an eight-core PC running Windows.



Figure 6: Waveforms from the overall off-detector DAQ simulation.

## 4. Summary

The present status of the ROD card designed for the ATLAS IBL experiment has been described. In particular, the implementation of a PowerPC embedded into a Xilinx Virtex-5 FPGA has been shown, with the main focus on the functionality required to control the acquisition electronics. Both stand-alone and integration tests proved that the system is stable and behaves according to the requirements. An overall simulation of the acquisition chain has been set up and used for debugging and bug fixing, although it showed limits due to its excessive requirements of computational resources.

## References

- [1] The ATLAS Collaboration, *The ATLAS Experiment at the CERN Large Hadron Collider*, 2008 *JINST* 3 S08003.
- [2] G. Aad et al., *ATLAS Pixel Detector Electronics and Sensors*, 2008 *JINST* 3 P07007.
- [3] The ATLAS Collaboration, *ATLAS Insertable B-Layer Technical Design Report*, CERN-LHCC-2010-013.
- [4] J. Dopke et al., *The IBL Readout System*, 2011 *JINST* 6 C01006, doi:10.1088/1748-0221/6/01/C01006.
- [5] ATLAS IBL Collaboration, *Prototype ATLAS IBL Modules using the FE-I4A Front-End Readout Chip*, arXiv:1209.1906 [physics.ins-det], submitted to *JINST*.
- [6] D. Falchieri et al., *Proposal for a readout driver card for the ATLAS Insertable B-Layer*, ATL-COM-INDET-2010-147, 2010.
- [7] Texas Instruments, *TMS320C6201 Fixed-Point Digital Signal Processor*, SPR051H, March 2004.
- [8] Xilinx, *Embedded Processor Block in Virtex-5 FPGAs*, Reference Guide UG200 (v1.8), 2010.
- [9] Balbi et al., *A PowerPC-based control system for the ReadOut Driver module of the ATLAS IBL*, 2012 *JINST* 7 C02016, doi:10.1088/1748-0221/7/02/C02016.
- [10] http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm
- [11] http://www.mentor.com/products/fpga/simulation/modelsim
- [12] http://www.mentor.com/products/fv/codelink/
- [13] http://www.mentor.com/products/fv/questa/