# FPGA Implementation of Orthogonal 2D Digital Predistortion System for Concurrent Dual-Band Power Amplifiers Based on Time-Division Multiplexing

Christophe Quindroit, Naveen Naraharisetti, *Student Member, IEEE*, Patrick Roblin, *Member, IEEE*, Shahin Gheitanchi, *Member, IEEE*, Volker Mauer, and Mike Fitton

Abstract—A concurrent dual-band digital predistortion (DPD) system is presented to compensate for the nonlinearity of radio-frequency power amplifiers (PAs) driven by a concurrent dual-band signal. A closed-form orthogonal polynomial basis has recently been introduced that improves stability compared with the conventional polynomial basis. An experimental test bed employing a field-programmable gate array (FPGA) linked to two mixed-signal system boards has also been presented. Using this FPGA, this paper focuses on the hardware implementation of the forward path of the new concurrent dual-band orthogonal DPD based on time-division multiplexing. Performance is evaluated with an experimental test setup cascading a 1-W and a 10-W peak PA, driven by a dual-band signal whose center frequencies are spaced 310 MHz apart. The lower sideband (LSB) and upper sideband (USB) are centered at 1890 and 2200 MHz, respectively. Two signal scenarios are presented, in which either a 1-carrier wide-band code-division multiple access (WCDMA) signal or a 10-MHz long-term evolution (LTE) signal is combined with a 5-carrier WCDMA signal. Experimental results show that the proposed time-division-multiplexing implementation achieves performance similar to that of the software implementation with half of the resources. Adjacent channel power ratios (ACPRs) are reduced below -50 dBc, and the normalized mean-square error (NMSE) is close to -40 dB.

Index Terms—Concurrent dual-band, digital predistortion (DPD), orthogonal polynomials, power amplifiers (PAs), time-division multiplexing.

# I. INTRODUCTION

Wireless communication systems are continuously growing, supporting more users and providing more services. Consequently, each generation of mobile telecommunication systems requires higher data rates while using a limited and already saturated radio-frequency (RF) spectrum. To make the best use of the spectrum, spectrally efficient modulation schemes based on code-division multiple access (CDMA) and orthogonal frequency-division multiplexing (OFDM) are now commonly used in such systems. These complex modulations produce nonconstant-envelope signals with a high peak-to-average power ratio (PAPR), which excite the transmitter nonlinearities more strongly, while the linearity requirements on the RF front end become more stringent. The power amplifier (PA) is a key source of transmitter nonlinearity [1] and governs the tradeoff between the linearity and the power efficiency of the RF front end.

Manuscript received July 05, 2013; revised October 11, 2013; accepted October 16, 2013. Date of publication November 21, 2013; date of current version December 02, 2013. This work was supported in part by the National Science Foundation under Grant ECS 1129013 and in part by the Altera Corporation/Wireless Systems Solutions Group. This paper is an expanded version of a paper presented at the IEEE International Microwave Symposium, Seattle, WA, USA, June 2–7, 2013.

- C. Quindroit, N. Naraharisetti, and P. Roblin are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (e-mail: quindroit.1@osu.edu; naraharn@ece.osu.edu; roblin.1@osu.edu).
- S. Gheitanchi and V. Mauer are with Altera Europe, High Wycombe, Buckinghamshire HP12 4XF, U.K. (e-mail: sgheitan@altera.com; vmauer@altera.com).
- M. Fitton is with Altera Corporation, San Jose, CA 95134 USA (e-mail: mfitton@altera.com).
- Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMTT.2013.2288220

Digital predistortion (DPD) is a widespread and cost-effective method to linearize the transmit PA. It allows the linearity requirements of the standards to be met while preserving high power efficiency [2]–[5].

To satisfy the multiband, multistandard requirements of modern radio base stations, recent advances in PA design make it possible to drive a single PA concurrently with a signal composed of widely separated bands [6]–[9], typically more than 100 MHz apart, so that multiband operation can be covered with only one amplification stage.

Nevertheless, when the PA is excited by such a concurrent dual-band signal, its behavior differs from that observed with a single-band signal. Besides the usual in-band distortion in each band, the PA nonlinearities also produce cross-band distortion: nonlinear cross-products of the combined bands fall into the bands of interest [10], [11]. In this context, directly applying single-band DPD techniques [12] to each band is not effective [10], [13]. Indeed, single-band nonlinear models, designed to mimic a PA driven by a single-band signal, are insufficient because they ignore the cross-modulation distortion. Moreover, applying single-band DPD techniques to the full band demands a large bandwidth (five to seven times the full signal bandwidth), which requires costly high-sampling-rate digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) and is inefficient or impractical for large frequency-band separations.

Since 2008, the linearization of concurrent multiband PAs has become a major interest of the DPD research community. In [14], a system-level simulation of a concurrent dual-band predistortion technique performed at intermediate frequency (IF) was conducted, reducing the spectral regrowth by 15–20 dB, but no experimental tests were reported. To address this problem efficiently, the frequency-selective approach was explored by Roblin *et al.* in [11], [15], [16] and implemented in a field-programmable gate array (FPGA). The strategy of these methods is depicted in Fig. 1 and can be summed up as divide and conquer: each band is upconverted by a separate modulator before the bands are combined and amplified. Each predistorter then linearizes only its band of interest while taking into account the nonlinear cross-products of the combined bands, so the bandwidth requirement of each DPD system is considerably reduced. This technique linearizes in-band and interband distortions separately, up to the fifth order. Moreover, the linearization of a concurrent three-band signal is also explored in [16]. The presented measurement setup does not include an observation path, and the predistorter coefficients are tuned manually by observing the spectrum analyzer.

Fig. 1. Block diagram of frequency-selective method.

In [10] and [17], based on the same strategy and the memory polynomial model, Bassam *et al.* reformulated and extended the technique to compensate for memory effects and named it two-dimensional digital predistortion (2D-DPD). Since the two input bands are widely separated, the intermodulation bands are located far from the bands of interest and can easily be removed with filters. Thus, 2D-DPD is concerned only with canceling the in-band and cross-band distortions. In [18], a subsampling feedback loop is adopted to reduce the complexity of the dual-band linearization architecture, using only one observation path for both bands.

One disadvantage of the 2D-DPD model is its complexity, which requires a large number of coefficients. Liu *et al.* proposed to reduce this complexity with a 2D augmented Hammerstein model (2D-AH) [19] and a 2D modified memory polynomial model (2D-MMP) [13]. In [22], Zhang *et al.* presented a pruning method applied to 2D-DPD. These three methods drastically reduce the required number of coefficients while achieving similar distortion cancellation.

In [21], and later in [22] and [23], based on the dual-input truncated Volterra model and the neural network model, respectively, the authors extended the 2D-DPD model to also compensate for modulator imbalance, jointly mitigating PA distortion and modulator imbalance. More recently, in [24], following the same kind of expansion as 2D-DPD, the authors extended the technique to compensate for a concurrent three-band signal. All of these works have been tested for different signal scenarios, using single- and multicarrier wide-band code-division multiple-access (WCDMA), long-term evolution (LTE), and worldwide interoperability for microwave access (WiMAX) signals, different PAs, and different band frequency separations. 2D-DPD and its derivatives achieve very good distortion compensation, typically showing an adjacent channel power ratio (ACPR) below -50 dBc and a normalized mean-square error (NMSE) around -40 dB.

Nonetheless, these works have been evaluated using vector signal generators (VSGs) and vector signal analyzers (VSAs) and are thus limited to laboratory experiments. Indeed, apart from the frequency-selective predistortion of Roblin *et al.*, few works on hardware implementation have been published. In [25], Kwan *et al.* proposed a lookup table (LUT) implementation that was also evaluated using a signal generator. In [26], Ding *et al.* presented a simplified dual-band LUT implementation based on an FPGA. However, the proposed test bench uses a single modulator/demodulator for the up/down frequency conversion and a single ADC/DAC, which limits the frequency-band separation to 100 MHz.

Recently, in [27], to simplify the hardware implementation for strong nonlinearities, we have presented a concurrent dual-band spline-based DPD.

However, one intrinsic drawback of the 2D-DPD model is its numerical instability: the kernel extraction process involves inverting an often ill-conditioned matrix. Raich *et al.* [3] introduced a closed-form orthogonal polynomial basis for single-band DPD that alleviates this numerical instability. Building on this work, in [28] we proposed a new set of orthogonal polynomials for 2D-DPD that improves the stability of the extraction process. A similar approach was proposed concurrently in [29].

Moreover, in [27] and [28], we also presented a new test bed, based on a commercial FPGA and two mixed-signal DPD (MSDPD) evaluation boards, dedicated to the design and implementation of concurrent dual-band digital predistortion. Thanks to the two MSDPDs, the test bed has two independent transmitter (TX) and receiver (RX) paths. Nevertheless, in both papers, despite the use of an FPGA, the test bed was used as a regular VSG/VSA solution to evaluate the concurrent dual-band DPD. The DPD forward path was therefore implemented in software, and practical hardware implementation issues were not discussed.

Thus, in this paper, as an extension of [28], we propose an efficient hardware implementation of the orthogonal polynomial 2D-DPD inside the FPGA based on time-division multiplexing, and we evaluate its compensation performance for different scenarios.

The paper is organized as follows. In Section II, the conventional and orthogonal polynomial 2D-DPD models and the kernel extraction process are recalled. Section III presents the proposed FPGA implementation. Section IV demonstrates the efficiency of the proposed orthogonal 2D-DPD implementation on a 10-W gallium-nitride (GaN) PA, and Section V concludes the paper.

# II. CONCURRENT DUAL-BAND 2D-DPD TECHNIQUE

The system block diagram of a concurrent dual-band digital predistortion architecture is displayed in Fig. 2. The baseband input signals $z_1$ and $z_2$, at carrier frequencies $\omega_1$ and $\omega_2$, respectively, drive two distinct predistorters. The generated signals $x_1$ and $x_2$ are converted to the analog domain and frequency upconverted by their respective DACs and modulators. The resulting RF signals are combined and fed to the PA. In the two observation paths, the PA output is filtered, frequency downconverted, and digitized. The two feedback baseband signals $y_1$ and $y_2$ are time-aligned, and the coefficients of both predistorters are estimated and updated in the forward paths.

Fig. 2. Block diagram of a dual-band adaptive digital predistortion system.

Considering $x_1$, $x_2$ and $y_1$, $y_2$ as the two input and the two output baseband signals of the PA, the generalized complex baseband input-output relationship of the 2D-DPD memory model for concurrent dual-band operation from [10] is recalled in compact form as

$$y_i(n) = \sum_{m=0}^{M_i - 1} \sum_{k=1}^{K_i} \sum_{j=0}^{k-1} c_{m,k,j}^{(i)} \times \gamma_{k,j}\big( x_i(n-m), x_l(n-m) \big) \tag{1}$$

where $i, l \in \{1, 2\}$ with $i \neq l$, and $c_{m,k,j}^{(i)}$, $K_i$, and $M_i$ are the coefficients, the nonlinearity order, and the memory depth of band $i$, respectively. $\gamma_{k,j}(x_i(n-m), x_l(n-m))$ represents the basis function. Using a conventional polynomial basis [10], $\gamma_{k,j}$ is expressed as follows:

$$\gamma_{k,j}(x_i, x_l) = x_i \cdot |x_i|^{k-j-1} \cdot |x_l|^j. \tag{2}$$
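To make the indexing of (1) and (2) concrete, the sketch below evaluates the conventional 2D memory model for one band with NumPy. It is an illustrative sketch only: the helper names, the dictionary layout of the coefficients, and the zero initial conditions for samples before $n = 0$ are assumptions, not part of the original implementation.

```python
import numpy as np

def conventional_basis(xi, xl, k, j):
    """Conventional 2D basis function (2): x_i * |x_i|^(k-j-1) * |x_l|^j."""
    return xi * np.abs(xi) ** (k - j - 1) * np.abs(xl) ** j

def delayed(x, m):
    """Return x(n - m), assuming zero samples before the start of the record."""
    return np.concatenate([np.zeros(m, dtype=complex), x[: len(x) - m]])

def model_2d(xi, xl, coeffs, K, M):
    """Evaluate the 2D memory model (1) for band i.

    coeffs[m] is a dict mapping (k, j) -> c_{m,k,j}^{(i)}; missing entries count as zero.
    """
    y = np.zeros(len(xi), dtype=complex)
    for m in range(M):
        xi_m, xl_m = delayed(xi, m), delayed(xl, m)
        for k in range(1, K + 1):      # total nonlinearity order
            for j in range(k):         # exponent of |x_l| (cross-band part)
                y += coeffs[m].get((k, j), 0.0) * conventional_basis(xi_m, xl_m, k, j)
    return y

def num_coefficients(K, M):
    """Number of c_{m,k,j} terms per band: M * K(K+1)/2."""
    return M * K * (K + 1) // 2
```

With this indexing each band has $M_i K_i (K_i+1)/2$ coefficients; for example, $K_i = 7$ and $M_i = 4$ give 112 coefficients per band.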

The coefficients in (1) can be estimated through a least-squares (LS) approach. Let us define the following vector notations from $N$ samples of the input signal:

$$
\begin{aligned}
\vec{y}_i &= \big[y_i(M_i), \dots, y_i(N)\big]^T \\
\Gamma &= \Big[\vec{\gamma}_{1,0}\big(\vec{x}_i^{(0)}, \vec{x}_l^{(0)}\big), \dots, \vec{\gamma}_{K_i,K_i-1}\big(\vec{x}_i^{(0)}, \vec{x}_l^{(0)}\big), \dots, \vec{\gamma}_{k,j}\big(\vec{x}_i^{(m)}, \vec{x}_l^{(m)}\big), \\
&\qquad \dots, \vec{\gamma}_{1,0}\big(\vec{x}_i^{(M_i-1)}, \vec{x}_l^{(M_i-1)}\big), \dots, \vec{\gamma}_{K_i,K_i-1}\big(\vec{x}_i^{(M_i-1)}, \vec{x}_l^{(M_i-1)}\big)\Big] \\
\vec{c}^{(i)} &= \big[c_{0,1,0}^{(i)}, \dots, c_{0,K_i,K_i-1}^{(i)}, \dots, c_{m,k,j}^{(i)}, \dots, c_{M_i-1,1,0}^{(i)}, \dots, c_{M_i-1,K_i,K_i-1}^{(i)}\big]^T
\end{aligned}
\tag{3}
$$

where $\vec{x}_i^{(m)} = [x_i(M_i - m), \dots, x_i(N - m)]^T$ and $\vec{x}_l^{(m)} = [x_l(M_i - m), \dots, x_l(N - m)]^T$ are the $m$th delayed input vectors.

Using these vector notations, (1) can be written as

$$\vec{y_i} = \Gamma \vec{c}^{(i)}. \tag{4}$$

The set of coefficients  $\vec{c}^{(i)}$  can then be evaluated via the least-squares solution as follows:

$$\vec{c}^{(i)} = (\Gamma^H \Gamma)^{-1} \Gamma^H \vec{y}_i \tag{5}$$
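The extraction in (3)-(5) can be sketched as follows with NumPy. The helper names are hypothetical, and the sketch solves the least-squares problem with `numpy.linalg.lstsq` instead of forming $(\Gamma^H\Gamma)^{-1}$ explicitly, a common numerical choice that is not claimed to be the authors' procedure. It also returns the condition number of $\Gamma^H\Gamma$, the quantity whose growth motivates the orthogonal basis discussed next.

```python
import numpy as np

def delayed(x, m):
    """x(n - m) with zero initial conditions."""
    return np.concatenate([np.zeros(m, dtype=complex), x[: len(x) - m]])

def build_gamma(xi, xl, K, M):
    """Stack the basis columns of (3): memory tap m outer, (k, j) inner."""
    cols = []
    for m in range(M):
        xi_m, xl_m = delayed(xi, m), delayed(xl, m)
        for k in range(1, K + 1):
            for j in range(k):
                cols.append(xi_m * np.abs(xi_m) ** (k - j - 1) * np.abs(xl_m) ** j)
    # keep rows n = M-1 .. N-1 (0-based), i.e., samples M_i .. N of (3)
    return np.column_stack(cols)[M - 1 :]

def extract_ls(xi, xl, yi, K, M):
    """Least-squares solution of (4)-(5) plus the condition number of Gamma^H Gamma."""
    gamma = build_gamma(xi, xl, K, M)
    c, *_ = np.linalg.lstsq(gamma, yi[M - 1 :], rcond=None)
    return c, np.linalg.cond(gamma.conj().T @ gamma)
```

For the indirect learning described below, the same routine is called with the roles of $x_i$ and $y_i$ swapped.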

where $\Gamma^H$ is the conjugate transpose of $\Gamma$. Because the conventional polynomial is used as the basis function $\gamma_{k,j}$, the Hessian matrix $\Gamma^H\Gamma$ is often ill-conditioned, and its inversion can lead to numerical errors, which in turn cause system convergence problems. To improve the extraction stability, and assuming that the signals of the two bands are independent, a closed-form orthogonal polynomial basis was successfully introduced in [28] and [29] to replace the conventional polynomial in the 2D-DPD model. The orthogonal polynomial is expressed as follows:

$$
\begin{aligned}
\gamma_{k,j}(x_i, x_l) &= \lambda_k(x_i) \times \psi_j(x_l) \\
\lambda_k(x_i) &= \sum_{v=1}^{k} (-1)^{v+k} \frac{(k+v)!}{(v-1)!\,(v+1)!\,(k-v)!}\, |x_i|^{v-1} x_i \\
\psi_j(x_l) &= (-1)^j \sum_{v=0}^{j} \frac{(v+j)!}{(v!)^2\,(j-v)!}\, \big(-|x_l|\big)^v.
\end{aligned}
\tag{6}
$$

Here, $\lambda_k$ is the modified Legendre polynomial from [3], and $\psi_j$ is the shifted Legendre polynomial. Although the introduced basis is not strictly orthogonal for an arbitrary signal distribution, it has shown improved stability during model extraction for different signal distributions.
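The closed-form polynomials in (6) are straightforward to evaluate. The NumPy sketch below (function names are illustrative) implements $\lambda_k$ and $\psi_j$ directly from the sums. A quick check on a midpoint grid over $[0, 1]$ confirms that low-order members are orthogonal when the amplitude is uniformly distributed, which is the case where orthogonality holds exactly.

```python
import numpy as np
from math import factorial

def lambda_k(x, k):
    """Modified Legendre polynomial lambda_k(x_i) of (6); complex-valued."""
    x = np.asarray(x, dtype=complex)
    r = np.abs(x)
    out = np.zeros_like(x)
    for v in range(1, k + 1):
        u = (-1) ** (v + k) * factorial(k + v) / (
            factorial(v - 1) * factorial(v + 1) * factorial(k - v))
        out += u * r ** (v - 1) * x
    return out

def psi_j(x, j):
    """Shifted Legendre polynomial psi_j(x_l) of (6); real function of |x_l|."""
    r = np.abs(np.asarray(x))
    out = np.zeros(r.shape)
    for v in range(j + 1):
        out += factorial(v + j) / (factorial(v) ** 2 * factorial(j - v)) * (-r) ** v
    return (-1) ** j * out

def orthogonal_basis(xi, xl, k, j):
    """gamma_{k,j}(x_i, x_l) = lambda_k(x_i) * psi_j(x_l)."""
    return lambda_k(xi, k) * psi_j(xl, j)
```

These functions can stand in for the conventional basis (2) when building $\Gamma$; only the columns change, not the LS machinery of (5).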

Finally, the indirect learning method, which consists of swapping the variables $x_1 \leftrightarrow y_1$ and $x_2 \leftrightarrow y_2$, is used to estimate the DPD coefficients. To further improve robustness with the new basis, the direct-learning method or the damped Newton algorithm can also be employed. With an appropriately chosen relaxation constant, fast convergence of the system can also be achieved [2].
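As an illustration of indirect learning, the sketch below fits a memoryless post-inverse with the conventional basis (2) by swapping inputs and outputs, then reuses its coefficients as the predistorter. The dual-band PA model, its compression coefficients, the unit gain, and the signal statistics are hypothetical and chosen only to keep the example self-contained; a single iteration is shown, without the direct-learning or damped Newton refinements.

```python
import numpy as np

def basis_matrix(a, b, K):
    """Memoryless conventional 2D basis (2): columns a*|a|^(k-j-1)*|b|^j, k = 1..K, j < k."""
    return np.column_stack([a * np.abs(a) ** (k - j - 1) * np.abs(b) ** j
                            for k in range(1, K + 1) for j in range(k)])

def toy_pa(x1, x2):
    """Hypothetical dual-band PA: unit gain with in-band and cross-band compression."""
    y1 = x1 * (1 - 0.10 * np.abs(x1) ** 2 - 0.05 * np.abs(x2) ** 2)
    y2 = x2 * (1 - 0.08 * np.abs(x2) ** 2 - 0.06 * np.abs(x1) ** 2)
    return y1, y2

def nmse_db(y, ref):
    return 10 * np.log10(np.sum(np.abs(y - ref) ** 2) / np.sum(np.abs(ref) ** 2))

def indirect_learning(z1, z2, pa, K=5, gain=1.0):
    """One indirect-learning iteration (x <-> y swapped), memoryless for brevity."""
    y1, y2 = pa(z1, z2)                  # drive the PA without predistortion
    u1, u2 = y1 / gain, y2 / gain        # normalized outputs become the fit inputs
    # post-inverse: predict each PA input from both normalized outputs
    c1, *_ = np.linalg.lstsq(basis_matrix(u1, u2, K), z1, rcond=None)
    c2, *_ = np.linalg.lstsq(basis_matrix(u2, u1, K), z2, rcond=None)
    # copy the post-inverse coefficients into the two predistorters
    return basis_matrix(z1, z2, K) @ c1, basis_matrix(z2, z1, K) @ c2

rng = np.random.default_rng(1)
n = 20000
z1 = 0.3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
z2 = 0.3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
x1, x2 = indirect_learning(z1, z2, toy_pa)
before = nmse_db(toy_pa(z1, z2)[0], z1)   # band 1 without DPD
after = nmse_db(toy_pa(x1, x2)[0], z1)    # band 1 with the learned DPD
print(f"NMSE band 1: {before:.1f} dB -> {after:.1f} dB")
```

Because the cross-band terms $|y_l|^j$ enter the fit, each post-inverse accounts for the compression caused by the other band, which a per-band single-band DPD would miss.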

# III. ORTHOGONAL 2D-DPD HARDWARE IMPLEMENTATION DISCUSSION

Since the stability improvement of the orthogonal polynomial has been shown in [28] and [29], this paper focuses on an efficient hardware implementation of the 2D-DPD forward path. Because of the complexity of the two paths, fitting the design into a given FPGA can be challenging.

### A. Full-Multiplier-Based Implementation

The direct approach is to implement both DPD paths from (1) using three main design blocks: delays, adders, and multipliers. Because of the closed-form expression of the orthogonal basis, the number of multipliers increases drastically as $K_i$ grows. Since the multiplier is one of the most complex and expensive components in an FPGA, it must be used sparingly to reduce the cost and complexity of the system. Given the large number of multiplications required by 2D-DPD, this strategy is inefficient.



Fig. 3. Dual-band LUT contents.



Fig. 4. Block diagram of the 2D-DPD LUT implementation.

### B. Full-LUT-Based Implementation

Written for the DPD, (1) can be simplified and expressed as follows:

$$x_i(n) = \sum_{m=0}^{M_i - 1} z_i(n-m) \times g_{i,m}(n) \tag{7}$$

where $g_{i,m}$ is the complex gain of a given memory tap, which depends on both inputs and is expressed as

$$g_{i,m}(n) = \sum_{k=1}^{K_i} \sum_{j=0}^{k-1} c_{m,k,j}^{(i)} \times \gamma_{k,j}\big( |z_i(n-m)|, |z_l(n-m)| \big). \tag{8}$$

An LUT-based implementation of $g_{i,m}$ for each memory tap substantially reduces the number of multiplications. For a given memory length, (7) shows that the number of complex multiplications drops to $M_i$ for each band, regardless of the nonlinearity order. The ranges of $|z_i|$ and $|z_l|$ are predefined and normalized. Thus, after the model extraction, the $M_i$ complex gain tables can be calculated for predetermined pairs of input values $(|z_i|, |z_l|)$ and stored in memory, as shown in Fig. 3. These matrices, or 2D-LUTs, formed by concatenating multiple LUTs, are then implemented in a system as described in Fig. 4. For each delay tap, the memory is addressed by the input amplitudes of both signals with an offset address. The retrieved gain values are then multiplied by the respective delayed input signal and added to the values of the other memory paths.

TABLE I MEMORY RESOURCE COMPARISON

| LUT Size | Memory Size (MByte) |
|----------|---------------------|
| 64       | 0.13                |
| 128      | 0.53                |
| 256      | 2.10                |
| 512      | 8.39                |
| 1024     | 33.60               |
| 2048     | 134.22              |

Fig. 5. Block diagram of 2D-DPD basic cell.

While the number of multipliers is reduced and independent of the nonlinearity order, the main drawback of a full-LUT implementation is the memory required for the tables. The size of the required memory can be estimated as follows:

$$\text{Memory Size (bit)} = 2 \cdot \text{LUT}_{\text{size}}^2 \cdot M_i \cdot (\text{Bit Length}) \tag{9}$$

where $\text{LUT}_{\text{size}}$ is the size of a single LUT, i.e., the size for one variable, $M_i$ is the number of memory taps, and (Bit Length) is the size of the complex data stored in the LUT. As an example, consider a memory length $M_i = 4$ and assume that each complex gain value is expressed on 32 bits; Table I shows the required memory for different single-LUT sizes. While memory is relatively cheap, updating such a system can take a very long time and can slow down the DPD training and adaptation. In [26], a reduced LUT implementation based on a simplified model is proposed for dual-band DPD, resulting in limited performance.
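The full-LUT forward path of (7) and the memory estimate of (9) can be sketched as below with NumPy. The gain callable `gain_fn`, the uniform amplitude grid over $[0, 1]$, and nearest-neighbour addressing are illustrative assumptions; the sketch only shows that, once the tables are filled, each tap costs one complex multiplication whatever the nonlinearity order.

```python
import numpy as np

def build_gain_luts(gain_fn, M, lut_size):
    """Tabulate g_{i,m} of (8) on a uniform grid of normalized amplitudes in [0, 1].

    gain_fn(m, ai, al) is a hypothetical callable returning the gain of tap m.
    """
    grid = np.arange(lut_size) / (lut_size - 1)
    ai, al = np.meshgrid(grid, grid, indexing="ij")
    return np.stack([gain_fn(m, ai, al) for m in range(M)])   # shape (M, L, L)

def lut_index(a, lut_size):
    """Nearest-neighbour address of a normalized amplitude, clipped to the table."""
    return np.clip(np.rint(a * (lut_size - 1)), 0, lut_size - 1).astype(int)

def full_lut_dpd(zi, zl, luts):
    """Forward path (7): x_i(n) = sum_m z_i(n-m) * LUT_m[|z_i(n-m)|, |z_l(n-m)|]."""
    M, L, _ = luts.shape
    x = np.zeros(len(zi), dtype=complex)
    for m in range(M):
        zi_m = np.concatenate([np.zeros(m, dtype=complex), zi[: len(zi) - m]])
        zl_m = np.concatenate([np.zeros(m, dtype=complex), zl[: len(zl) - m]])
        g = luts[m, lut_index(np.abs(zi_m), L), lut_index(np.abs(zl_m), L)]
        x += zi_m * g            # one complex multiplication per tap
    return x

def memory_size_mbyte(lut_size, M, bits=32):
    """Eq. (9) converted to MByte (1e6 bytes)."""
    return 2 * lut_size ** 2 * M * bits / 8 / 1e6
```

For example, `memory_size_mbyte(512, 4)` evaluates (9) for $\text{LUT}_{\text{size}} = 512$ and $M_i = 4$ with 32-bit entries, giving about 8.39 MByte, as in Table I.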

### C. Hybrid LUT-Multiplier Implementation

The last method proposed for the implementation of the orthogonal 2D-DPD is a hybrid solution combining multipliers and LUTs. The basis functions, which are real numbers, are stored in LUTs, while the rest of the calculation is performed by conventional multipliers. The LUT values never need to be updated; the predistorter adaptation is performed by updating the coefficients only. Thus, only small LUTs are required, and the number of multipliers is reduced compared with the full-multiplier implementation. A schematic of the implementation of a basic cell is presented in Fig. 5. The LUTs are addressed by their respective signal amplitudes, and the retrieved basis function values are multiplied together, then by the complex coefficient, and finally by the respective band input signal. The DPD output signal is the sum of all the cell outputs.

TABLE II HARDWARE RESOURCE COMPARISON

| Implementation | $(K_i, M_i)$ | $\otimes$ | $\oplus$ | LUT  |
|----------------|--------------|-----------|----------|------|
| H1             | (5, 3)       | 6         | 6        | 2048 |
| H1             | (7, 3)       | 6         | 6        | 2048 |
| H1             | (7, 4)       | 8         | 8        | 2048 |
| H2             | (5, 3)       | 130       | 86       | 16   |
| H2             | (7, 3)       | 330       | 168      | 24   |
| H2             | (7, 4)       | 288       | 224      | 24   |

Fig. 6. Time sequence view of the band operations.

Table II compares the hardware resources of the full-LUT (H1) and hybrid-LUT (H2) orthogonal 2D-DPD implementations for two nonlinearity orders and two memory lengths, assuming a single-LUT size of 1024. $\otimes$, $\oplus$, and LUT denote the number of complex multipliers, two-input adders, and single LUTs, respectively. The table shows that the full-LUT implementation drastically reduces the hardware utilization but requires a large number of complex LUTs, depending only on the memory length. Conversely, the hybrid-LUT implementation uses many arithmetic resources but drastically reduces the number of LUTs. In the next subsection, a time-division multiplexing architecture is introduced to reduce the cost of the implementation.
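A hybrid-LUT basic cell can be sketched as follows with NumPy. Writing $\lambda_k(x) = p_k(|x|)\,x$, the real polynomials $p_k$ and $\psi_j$ are tabulated once in fixed LUTs, and each cell multiplies the two looked-up values by its complex coefficient and by the delayed input. The function names, the dictionary of coefficients, the uniform grid over $[0, 1]$, and nearest-neighbour addressing are assumptions for illustration; note that re-extracting coefficients never touches the tables.

```python
import numpy as np
from math import factorial

def lambda_poly(r, k):
    """Real p_k(r) such that lambda_k(x) = p_k(|x|) * x in (6)."""
    return sum((-1) ** (v + k) * factorial(k + v)
               / (factorial(v - 1) * factorial(v + 1) * factorial(k - v)) * r ** (v - 1)
               for v in range(1, k + 1))

def psi_poly(r, j):
    """Shifted Legendre polynomial psi_j(r) of (6)."""
    return (-1) ** j * sum(factorial(v + j) / (factorial(v) ** 2 * factorial(j - v)) * (-r) ** v
                           for v in range(j + 1))

def build_real_luts(K, lut_size):
    """Fixed real-valued LUTs: row k-1 holds p_k, row j holds psi_j."""
    r = np.arange(lut_size) / (lut_size - 1)
    lam = np.stack([lambda_poly(r, k) for k in range(1, K + 1)])
    psi = np.stack([psi_poly(r, j) for j in range(K)])
    return lam, psi

def hybrid_lut_dpd(zi, zl, coeffs, K, M, lam, psi):
    """Sum of basic cells: c_{m,k,j} * LUT_lambda * LUT_psi * z_i(n-m)."""
    L = lam.shape[1]
    x = np.zeros(len(zi), dtype=complex)
    for m in range(M):
        zi_m = np.concatenate([np.zeros(m, dtype=complex), zi[: len(zi) - m]])
        zl_m = np.concatenate([np.zeros(m, dtype=complex), zl[: len(zl) - m]])
        ii = np.clip(np.rint(np.abs(zi_m) * (L - 1)), 0, L - 1).astype(int)
        il = np.clip(np.rint(np.abs(zl_m) * (L - 1)), 0, L - 1).astype(int)
        for k in range(1, K + 1):
            for j in range(k):
                c = coeffs.get((m, k, j), 0.0)
                if c != 0:   # skip empty cells
                    x += c * lam[k - 1, ii] * psi[j, il] * zi_m
    return x
```

Only $2K_i$ short real tables are needed here, which mirrors the small LUT count of H2 in Table II, while the multiplier count grows with the number of cells.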

### D. Time Multiplexing for 2D-DPD Path Sharing

As shown in Fig. 2, the 2D-DPD architecture needs one predistorter path per band. Each path therefore requires its own implementation and operates in parallel, occupying every time slot, which increases the required FPGA resources. However, the parallel operation can be converted to a serial one by using multiplexers: both predistorters can then be computed serially with a single path, saving half of the resources. The price is that each operation must complete in half the time: the input signals must be upsampled by a factor of 2, and the resulting single predistorter path processes data at twice the original input sample rate. Fig. 6 shows the time sequence of the processed bands. Based on time-division multiplexing, we propose the new architecture for the implementation of the 2D-DPD technique presented in Fig. 7. Both input signals are upsampled by a factor of 2 by sample repetition, and two multiplexers, driven by the selection signal (CS), alternately select the pair of inputs to be processed. A demultiplexer then routes the output signal to the appropriate band path, and finally both signals are downsampled to return to the original sampling rate. This simple technique saves half of the resources compared with a regular parallel implementation by doubling the DPD processing rate, which remains feasible for large-bandwidth signals. This architecture may be worth implementing in other DPD systems, independently of the selected algorithm. Except for the number of LUTs, the resources shown in Table II are thus halved, which is very substantial for the hybrid-LUT implementation.

Fig. 7. Block diagram of the time-division multiplexing 2D-DPD architecture.

Fig. 8. Block diagram of the experimental setup for dual-band DPD.
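The equivalence between the parallel and time-multiplexed architectures can be checked numerically with the NumPy sketch below. It assumes, since the text does not detail the delay line of the shared path, that the tap spacing is doubled inside the shared path so that a delay of two samples at the doubled rate equals one input sample. The conventional basis (2) is used for brevity, and all function names are hypothetical.

```python
import numpy as np

def delay(x, d):
    """x(n - d) with zero initial conditions."""
    return np.concatenate([np.zeros(d, dtype=complex), x[: len(x) - d]])

def dpd_path(a, b, coef, K, M, stride):
    """One 2D-DPD path; coef[m][(k, j)] is a scalar or a per-sample array.
    Tap m uses a delay of m * stride samples (assumption: stride 2 in the shared path)."""
    x = np.zeros(len(a), dtype=complex)
    for m in range(M):
        am, bm = delay(a, m * stride), delay(b, m * stride)
        for k in range(1, K + 1):
            for j in range(k):
                x += coef[m].get((k, j), 0.0) * am * np.abs(am) ** (k - j - 1) * np.abs(bm) ** j
    return x

def parallel_dpd(z1, z2, c1, c2, K, M):
    """Reference: two independent paths running at the input rate."""
    return dpd_path(z1, z2, c1, K, M, 1), dpd_path(z2, z1, c2, K, M, 1)

def tdm_dpd(z1, z2, c1, c2, K, M):
    """Single shared path at twice the rate; CS = 0 selects band 1, CS = 1 band 2."""
    n = len(z1)
    a = np.empty(2 * n, dtype=complex); a[0::2], a[1::2] = z1, z2   # multiplexer 1
    b = np.empty(2 * n, dtype=complex); b[0::2], b[1::2] = z2, z1   # multiplexer 2
    cs = np.arange(2 * n) % 2                                        # selection signal
    coef = [{kj: np.where(cs == 0, c1[m].get(kj, 0.0), c2[m].get(kj, 0.0))
             for kj in set(c1[m]) | set(c2[m])} for m in range(M)]
    out = dpd_path(a, b, coef, K, M, stride=2)
    return out[0::2], out[1::2]                                      # demultiplexer
```

Under this assumption, one arithmetic path serves both bands and produces the same outputs as two parallel paths, at the cost of running at twice the input sample rate.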

# IV. MEASUREMENT SETUP AND PERFORMANCE OF THE FPGA IMPLEMENTATION

### A. Measurement Setup

Fig. 8 shows the block diagram of the experimental setup, which was also presented in [27] and [28]. It is based on commercial products: an Altera Stratix IV FPGA development kit [30] connected and clock-synchronized to two similar Analog Devices MSDPD demo boards [31]. Each MSDPD performs the up/downconversion, filtering, digital-to-analog conversion, and analog-to-digital conversion. The DACs have 16-bit resolution and sample at 983.04 MHz. 12-bit ADCs sampling at 245.76 MHz are used in both observation paths. The FPGA clock also runs at 245.76 MHz, so the transmit signal is interpolated by a factor of 4 directly in the MSDPDs. The maximum received complex bandwidth is 122.88 MHz. The DACs and ADCs are synchronized to the FPGA. Finally, both MSDPDs are synchronized using an external 61.44-MHz reference clock. The RF center frequency of both MSDPDs can be set between 1.8 and 2.2 GHz. A picture of the configuration is presented in Fig. 9.

Fig. 9. FPGA and MSDPDs configuration for dual-band DPD.

TABLE III SUMMARY OF THE TWO SCENARIOS

|             | Lower Sideband (carrier at 1890 MHz) | Upper Sideband (carrier at 2200 MHz) |
|-------------|--------------------------------------|--------------------------------------|
| Scenario I  | 1c-WCDMA                             | 5c-WCDMA                             |
| Scenario II | LTE 10 MHz                           | 5c-WCDMA                             |

Fig. 10. Comparison of the signal power spectra at the output of the amplification stage for scenario I: (a) lower sideband (1c-WCDMA), (b) upper sideband (5c-WCDMA), for PA without 2D-DPD, PA with 2D-DPD software implementation, and PA with 2D-DPD hardware implementation.

Fig. 11. Comparison of the signal power spectra at the output of the amplification stage for scenario II: (a) lower sideband (1c-LTE 10 MHz), (b) upper sideband (5c-WCDMA), for PA without 2D-DPD, PA with 2D-DPD software implementation, and PA with 2D-DPD hardware implementation.

The FPGA design communicates with MATLAB via a USB link to download/upload data to/from the FPGA memories. The baseband signals are synthesized in MATLAB, downloaded to the FPGA memory, and processed by the FPGA. Both processed baseband signals are sent to their respective MSDPDs to be upconverted to 1890 and 2200 MHz. The two RF signals are combined to drive the amplification stage. The output signal is sampled through a coupler, filtered, fed to the two RF observation paths, downconverted to an intermediate frequency (IF) of 184.32 MHz, digitized, and stored in the FPGA memory. Both received data sets are digitally downconverted (DDC) and frequency and time aligned [32] in MATLAB, and the 2D-DPD coefficients are extracted.
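The digital downconversion step can be sketched as below with NumPy. Because $184.32/245.76 = 3/4$, the digital local oscillator $e^{-j2\pi(3/4)n}$ reduces to the repeating sequence $1, j, -1, -j$. The FIR length, the 50-MHz cutoff, and the Hamming window are assumptions, and the frequency/time alignment of [32] is not reproduced.

```python
import numpy as np

FS = 245.76e6     # ADC sampling rate of the observation paths
F_IF = 184.32e6   # observation-path IF; F_IF / FS = 3/4

def ddc(adc_samples, fs=FS, f_if=F_IF, num_taps=63, cutoff=50e6):
    """Mix real IF samples to complex baseband, then low-pass filter (windowed sinc)."""
    n = np.arange(len(adc_samples))
    lo = np.exp(-2j * np.pi * f_if / fs * n)   # equals (1j)**n since f_if/fs = 3/4
    mixed = adc_samples * lo
    t = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff / fs * np.sinc(2 * cutoff / fs * t) * np.hamming(num_taps)
    h /= h.sum()                               # unity gain at DC
    # factor 2 restores the envelope amplitude lost when the image is removed
    return 2 * np.convolve(mixed, h, mode="same")
```

With these numbers, the image of a tone at $F_{IF} + f_0$ lands at $122.88\,\text{MHz} - f_0$ after mixing, far outside the filter passband.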

One of the major advantages of such a test bench is its flexibility. The FPGA-based test bed can be employed in two different modes as follows:

1) *Mode 1*: The test bed is used as a usual VSG/VSA measurement solution. The predistorter is implemented in software, and the predistorted signal is generated in MATLAB, downloaded to the FPGA memory, and played for verification. One can thus take advantage of the software environment to test DPD algorithms under ideal conditions.

TABLE IV
SUMMARY OF THE LINEARIZATION PERFORMANCE OF BOTH SCENARIOS IN COMPARISON WITH THE PRIOR STUDIES

| Reference | $f_{\text{center}}$ LSB (MHz) | $f_{\text{center}}$ USB (MHz) | Signal LSB (BW) | Signal USB (BW) | ACPR LSB w/o / w DPD (dBc) | ACPR USB w/o / w DPD (dBc) | ΔACPR LSB / USB (dB) | NMSE LSB w/o / w DPD (dB) | NMSE USB w/o / w DPD (dB) | ΔNMSE LSB / USB (dB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2D-DPD, Bassam *et al.* [10] | 900 | 2000 | 1c-WCDMA (3.84 MHz) | 1c-WCDMA (3.84 MHz) | -45 / -55.8 | -41.1 / -53.1 | 10.8 / 12 | -30.84 / -43.77 | -26.73 / -41.88 | 12.93 / 15.15 |
| 2D-DPD, Bassam *et al.* [10] | 1000 | 2000 | WiMAX (5 MHz) | 1c-WCDMA (3.84 MHz) | -45 / -53 | -39.2 / -51.1 | 8 / 11.9 | -19.05 / -41.61 | -26 / -39.4 | 22.56 / 13.4 |
| 2D-DPD, Bassam *et al.* [10] | 1000 | 2000 | WiMAX (10 MHz) | 1c-WCDMA (3.84 MHz) | -48 / -58 | -38.34 / -54.55 | 10 / 16.21 | -30.04 / -41.05 | -26.19 / -42.51 | 11.01 / 16.32 |
| 2D-Modified DPD, Liu *et al.* [13] | 1900 | 2000 | 2c-WCDMA (-) | 1c-WCDMA (3.84 MHz) | -42.48 / -52.06 | -41.2 / -56.2 | 9.58 / 15 | -21.97 / -40.12 | -20.69 / -39.5 | 18.15 / 18.81 |
| 2D-Modified DPD, Liu *et al.* [13] | 880 | 1960 | 2c-WCDMA (-) | 3c-WCDMA (-) | -36.05 / -50.78 | -31.88 / -52.54 | 14.73 / 20.66 | -17.38 / -38.16 | -19.1 / -34.72 | 20.78 / 15.62 |
| 2D-Orthogonal, Yang *et al.* [29] | 1469 | 1531 | 1c-WCDMA (3.84 MHz) | LTE (5 MHz) | -40 / -49 | -37 / -47 | 9 / 10 | - | - | - |
| This work, scenario I | 1890 | 2200 | 1c-WCDMA (3.84 MHz) | 5c-WCDMA (23.84 MHz) | -36.91 / -56.06 | -32.27 / -51.94 | 19.15 / 19.67 | -25.24 / -41.77 | -19.02 / -38.33 | 16.53 / 19.31 |
| This work, scenario II | 1890 | 2200 | LTE (10 MHz) | 5c-WCDMA (23.84 MHz) | -33.12 / -51.13 | -32.04 / -50.42 | 18.01 / 18.38 | -21.82 / -37.09 | -20.64 / -39.16 | 15.27 / 18.52 |

2) *Mode 2*: The predistorter is implemented in hardware, and the predistorted signal is generated directly in the FPGA and played for verification. The received data are then uploaded to MATLAB for extraction, and the updated predistorter coefficients are written to the FPGA memory over the USB link. In mode 2, the real hardware is tested and can then be compared with the ideal software implementation.

Combining these two modes speeds up the integration of an efficient DPD system in hardware. The time-division-multiplexing solution has been implemented together with the 18-bit fixed-point hybrid-LUT implementation, using an LUT size of 512. The coefficients are coded on 16 bits.
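The resource saving of time-division multiplexing can be illustrated with a behavioral sketch. Below, a single complex multiplier, assumed to run at twice the sample rate, alternately processes LSB and USB operands, and its de-interleaved output is compared with two dedicated multipliers. The even/odd slot assignment, the function names, and the use of floating-point NumPy arrays instead of 18-bit fixed-point hardware are illustrative assumptions, not the authors' FPGA design.

```python
import numpy as np


def parallel_multiply(x_lsb, g_lsb, x_usb, g_usb):
    """Reference datapath: one dedicated complex multiplier per band."""
    return x_lsb * g_lsb, x_usb * g_usb


def tdm_multiply(x_lsb, g_lsb, x_usb, g_usb):
    """Single shared complex multiplier clocked at twice the sample rate.

    Even fast-clock slots carry LSB operands and odd slots carry USB operands
    (an assumed schedule); the products are de-interleaved at the output.
    """
    n = len(x_lsb)
    x_tdm = np.empty(2 * n, dtype=complex)
    g_tdm = np.empty(2 * n, dtype=complex)
    x_tdm[0::2], x_tdm[1::2] = x_lsb, x_usb   # interleave input samples
    g_tdm[0::2], g_tdm[1::2] = g_lsb, g_usb   # interleave gains (e.g., LUT outputs)

    y_tdm = np.empty(2 * n, dtype=complex)
    for slot in range(2 * n):                 # one multiplication per fast-clock cycle
        y_tdm[slot] = x_tdm[slot] * g_tdm[slot]

    return y_tdm[0::2], y_tdm[1::2]           # de-interleave back to LSB / USB


rng = np.random.default_rng(0)
x_l, g_l, x_u, g_u = (rng.standard_normal(8) + 1j * rng.standard_normal(8) for _ in range(4))
y_l, y_u = tdm_multiply(x_l, g_l, x_u, g_u)
r_l, r_u = parallel_multiply(x_l, g_l, x_u, g_u)
print(np.allclose(y_l, r_l) and np.allclose(y_u, r_u))  # True
```

In this model only the multiplier count is halved; the price is a datapath clocked at twice the sample rate, plus the registers needed to interleave and de-interleave the two bands.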

# B. Experimental Results

The amplification stage is composed of a 1-W Prewell linear driver cascaded with a broadband (500–2500 MHz) 10-W peak output power PA based on the NXP Semiconductors GaN HEMT CLF1G0060-10 transistor [33], biased in class AB $(V_{ds} = 50 \text{ V} \text{ and } I_{ds} = 40 \text{ mA})$. At 2 GHz, the output power at 1-dB gain compression is 36 dBm, and the drain efficiency is $\eta_D = 21\%$. The test signals are a single-carrier WCDMA signal with 5.7-dB PAPR, a 5-carrier WCDMA signal (carriers spaced 5 MHz apart) with 9.8-dB PAPR, and a single-band 10-MHz LTE signal with 10.2-dB PAPR. Two test scenarios are proposed. In scenario I, the lower sideband (LSB), centered at 1890 MHz, carries a 1c-WCDMA signal, and the upper sideband (USB), centered at 2200 MHz, carries a 5c-WCDMA signal. Scenario II combines two standards: a 10-MHz LTE signal in the LSB and a 5c-WCDMA signal in the USB. Table III summarizes the two signal scenarios considered in this paper for the lower and upper sidebands.
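The PAPR values quoted for the test signals follow the usual definition $\mathrm{PAPR} = 10\log_{10}\left(\max|x|^2 / \overline{|x|^2}\right)$. The sketch below computes the absolute-peak PAPR of a complex baseband signal; the paper does not state whether its figures are absolute peaks or CCDF-based values, and the five in-phase tones used here are a hypothetical worst-case test signal, not the 5c-WCDMA waveform used in the measurements.

```python
import numpy as np


def papr_db(x: np.ndarray) -> float:
    """Peak-to-average power ratio (absolute peak) of a complex baseband signal, in dB."""
    power = np.abs(x) ** 2
    return float(10.0 * np.log10(power.max() / power.mean()))


# Hypothetical test signal: five equal-amplitude, in-phase carriers spaced 5 MHz apart.
fs = 100e6                      # assumed sample rate (Hz)
n = np.arange(100)              # 100 samples: every carrier completes an integer number of cycles
offsets = np.array([-10e6, -5e6, 0.0, 5e6, 10e6])
x = np.exp(2j * np.pi * np.outer(n, offsets) / fs).sum(axis=1)

print(round(papr_db(x), 2))     # 6.99, i.e. 10*log10(5) for 5 in-phase carriers
```

Real WCDMA carriers carry random-like modulation, so their peaks add far less coherently on average; that is why measured PAPRs are usually reported statistically rather than as this deterministic bound.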

The time-division-multiplexing hybrid-LUT implementation is tested for $K_i = 7$ and $M_i = 4$. The extraction is performed in single precision (32-bit floating point), taking advantage of the better numerical stability of the orthogonal basis. Although a 64-bit floating-point DSP is available, implementing the algorithm in a 32-bit DSP uses fewer resources and is more time efficient, at the cost of increased sensitivity to numerical errors. The software implementation is considered the reference design: the DPD forward path is implemented in MATLAB using 64-bit floating-point precision without time multiplexing. In the hardware implementation, the DPD forward path is implemented in the FPGA using 18-bit fixed-point precision and the time-multiplexing method. During the training of the 2D-DPD model, 8000 samples are used to extract the model coefficients, and the linearization performance is evaluated over 231 000 samples.
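Why a better-conditioned basis makes single-precision extraction viable can be seen from the condition number of the least-squares regression matrix, since numerical errors in the solution grow with it. The sketch compares the conventional memoryless basis $|x|^{k-1}x$ with the 1-D orthogonal basis of Raich et al. [3], assuming an input amplitude uniformly distributed on [0, 1]. This is a simplified 1-D, memoryless stand-in for the 2-D orthogonal basis of this paper, and the uniform amplitude is an assumption; real WCDMA and LTE envelopes are not uniform, so the benefit is smaller in practice.

```python
import numpy as np
from math import factorial

K = 7      # nonlinearity order, as in K_i = 7 above
N = 8000   # number of training samples, as in the text


def conventional_basis(x, K):
    """Memoryless polynomial regressors |x|^(k-1) * x, k = 1..K."""
    return np.column_stack([np.abs(x) ** (k - 1) * x for k in range(1, K + 1)])


def orthogonal_basis(x, K):
    """1-D orthogonal basis of Raich et al. [3] (orthogonal when |x| ~ U[0, 1]).

    psi_k(x) = sum_{l=1..k} U[l, k] |x|^(l-1) x with
    U[l, k] = (-1)^(l+k) (k+l)! / ((l-1)! (l+1)! (k-l)!).
    """
    U = np.zeros((K, K))
    for k in range(1, K + 1):
        for l in range(1, k + 1):
            U[l - 1, k - 1] = (-1) ** (l + k) * factorial(k + l) / (
                factorial(l - 1) * factorial(l + 1) * factorial(k - l))
    return conventional_basis(x, K) @ U


# Assumed test input: amplitude uniform on [0, 1], phase uniform on [0, 2*pi).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, N) * np.exp(2j * np.pi * rng.uniform(0.0, 1.0, N))

print(f"cond, conventional basis: {np.linalg.cond(conventional_basis(x, K)):.2e}")
print(f"cond, orthogonal basis:   {np.linalg.cond(orthogonal_basis(x, K)):.2e}")
print(f"float32 machine epsilon:  {np.finfo(np.float32).eps:.2e}")
```

Under these assumptions the orthogonal regressors give a condition number of a few units, while the conventional one is several orders of magnitude larger, which leaves far less margin against the roughly 1e-7 relative precision of 32-bit floating point.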

Fig. 10 compares the PA output power spectra for scenario I without linearization, with the 2D-DPD implemented in software, and with the 2D-DPD implemented in hardware. Because of the crosstalk between the two bands, cross-modulation effects are clearly visible on the LSB spectrum: the output spectrum of the amplification stage spreads over more than eight times the 1c-WCDMA bandwidth. The linearization stage compensates for both the in-band and the cross-modulation distortions. The hardware implementation performs as well as the software implementation, reducing the spectral regrowth by more than 15 dB in each band. The NMSE between the two implementations is -40 dB for the LSB and -43 dB for the USB, showing good agreement between the software and hardware implementations.
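The NMSE used to compare the two implementations is $\mathrm{NMSE} = 10\log_{10}\left(\sum|y_{\mathrm{ref}} - y|^2 / \sum|y_{\mathrm{ref}}|^2\right)$. The sketch below applies it to a hypothetical float64 "software" signal and an 18-bit fixed-point copy of it. The Q1.17 format (1 sign bit, 17 fractional bits), the round-and-saturate behavior, and the Gaussian test signal are assumptions; the paper does not specify its fixed-point scaling.

```python
import numpy as np


def nmse_db(reference, test):
    """NMSE of `test` against `reference`: 10*log10(sum|ref - test|^2 / sum|ref|^2)."""
    error = np.sum(np.abs(reference - test) ** 2)
    return float(10.0 * np.log10(error / np.sum(np.abs(reference) ** 2)))


def to_fixed_point(x, bits=18, frac_bits=17):
    """Round I and Q to signed fixed point with saturation (assumed Q1.17 format)."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    i = np.clip(np.round(x.real * scale), lo, hi)
    q = np.clip(np.round(x.imag * scale), lo, hi)
    return (i + 1j * q) / scale


# Hypothetical float64 "software" output versus its 18-bit fixed-point copy.
rng = np.random.default_rng(0)
y_sw = 0.1 * (rng.standard_normal(8000) + 1j * rng.standard_normal(8000))
y_hw = to_fixed_point(y_sw)
print(f"NMSE from 18-bit rounding alone: {nmse_db(y_sw, y_hw):.1f} dB")
```

In this idealized sketch, 18-bit rounding alone produces an NMSE far below -40 dB.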

Fig. 11 shows the same comparison for scenario II. The cross-modulation effects are less noticeable in this scenario. Nevertheless, both implementations reduce the spectral regrowth below -50 dBc. The NMSEs between the two implementations are below -41 dB for both bands.
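The ACPR figures are ratios, in dBc, of the power in an adjacent channel to the power in the main channel. A minimal sketch, assuming rectangular integration over a single unwindowed FFT and reporting the worse of the two adjacent channels, is shown below. The 3.84-MHz channel bandwidth and 5-MHz offset are assumed WCDMA-like parameters; the paper's exact measurement settings are not given here, and standardized ACLR measurements use a root-raised-cosine measurement filter instead.

```python
import numpy as np


def band_power(x, fs, f_lo, f_hi):
    """Power of x inside [f_lo, f_hi) from a single unwindowed FFT periodogram."""
    spectrum = np.abs(np.fft.fft(x)) ** 2
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return spectrum[mask].sum()


def acpr_dbc(x, fs, bw, offset):
    """Worst-side ACPR in dBc: max(lower, upper adjacent power) / main-channel power."""
    main = band_power(x, fs, -bw / 2, bw / 2)
    lower = band_power(x, fs, -offset - bw / 2, -offset + bw / 2)
    upper = band_power(x, fs, offset - bw / 2, offset + bw / 2)
    return float(10.0 * np.log10(max(lower, upper) / main))


# Synthetic check: an in-channel tone plus a spur 40 dB lower in the upper adjacent channel.
fs, n = 100e6, np.arange(1000)        # 100-kHz bin spacing; both tones fall on exact bins
x = np.exp(2j * np.pi * 1.0e6 * n / fs) + 0.01 * np.exp(2j * np.pi * 5.5e6 * n / fs)
print(f"{acpr_dbc(x, fs, bw=3.84e6, offset=5e6):.2f} dBc")  # -40.00 dBc
```

Bin-aligned tones avoid spectral leakage, so the synthetic spur reproduces its 40-dB level exactly; measured modulated signals would normally call for windowed, averaged spectra.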

The linearization performance of the hardware implementation, in terms of ACPR and NMSE, is summarized in Table IV for scenarios I and II. Table IV also compares this performance with the results published in [10], [13], and [29].
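The improvement columns of Table IV are simply the difference between the values without and with DPD. The short check below recomputes them for the "This work" rows and confirms that every linearized ACPR is below -50 dBc. The LSB/USB labels in the dictionary are inferred from the column order of the table cells and should be read as an assumption.

```python
# (without DPD, with DPD) pairs in dB, copied from the "This work" rows of Table IV.
# The LSB/USB labels are inferred from the column order (assumption).
THIS_WORK = {
    "I":  {"ACPR_LSB": (-36.91, -56.06), "ACPR_USB": (-32.27, -51.94),
           "NMSE_LSB": (-25.24, -41.77), "NMSE_USB": (-19.02, -38.33)},
    "II": {"ACPR_LSB": (-33.12, -51.13), "ACPR_USB": (-32.04, -50.42),
           "NMSE_LSB": (-21.82, -37.09), "NMSE_USB": (-20.64, -39.16)},
}


def improvement_db(before: float, after: float) -> float:
    """Improvement in dB; positive when the DPD lowers the ACPR or NMSE."""
    return round(before - after, 2)


for scenario, metrics in THIS_WORK.items():
    for name, (before, after) in metrics.items():
        print(f"Scenario {scenario} {name}: {improvement_db(before, after):.2f} dB")
```

The printed values match the improvement columns of the table (for example, 19.15 and 19.67 dB for the ACPRs of scenario I).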

# V. CONCLUSION

In this paper, a 2D-DPD hardware architecture to compensate for the nonlinearity of concurrent dual-band transmitters has been proposed. The implemented model is based on the orthogonal polynomials proposed in previous work. Two DPD hardware implementations are presented.

The first, a full-LUT implementation, saves hardware but requires a large amount of memory. The second, a hybrid-LUT implementation, uses predetermined LUTs but requires a larger number of multipliers.

Next, a new hardware implementation with reduced complexity, employing time-division multiplexing, has been presented. This technique saves half of the original hardware resources. Based on commercial products and a development FPGA, an efficient test bed for the design of concurrent dual-band predistorters has been described. This measurement setup makes it possible to test the DPD algorithm either in a software environment or directly in the FPGA.

The hybrid-LUT hardware implementation has been tested in two different scenarios combining multicarrier WCDMA and single-band LTE signals for the linearization of a 10-W PA. The software and hardware implementations give similar results, with ACPRs below -50 dBc and NMSEs around -40 dB, which validates the FPGA-implemented architecture.

## ACKNOWLEDGMENT

The authors would like to thank Analog Devices Inc., Wilmington, MA, USA, and NXP Semiconductors, Smithfield, RI, USA, for donating the Mixed Signal Digital Pre-distortion System Boards (MSDPDs) and the PAs used in this study, respectively. The authors also wish to thank Altera Corporation/Wireless Systems Solutions Group for their financial and technical support of this project and the donation of the Stratix IV FPGA.

# REFERENCES

- [1] J. H. K. Vuolevi and T. Rahkonen, *Distortion in RF Power Amplifiers*. Boston, MA, USA: Artech House, 2003.
- [2] D. R. Morgan, Z. Ma, J. Kim, M. G. Zierdt, and J. Pastalan, "A generalized memory polynomial model for digital predistortion of RF power amplifiers," *IEEE Trans. Signal Process.*, vol. 54, no. 10, pp. 3852–3860, Oct. 2006.
- [3] R. Raich, H. Qian, and G. T. Zhou, "Orthogonal polynomials for power amplifier modeling and predistorter design," *IEEE Trans. Veh. Technol.*, vol. 53, pp. 1468–1479, Sep. 2004.
- [4] F.-L. Luo, *Digital Front-end in Wireless Communications and Broadcasting*. Cambridge, U.K.: Cambridge Univ. Press, 2011.
- [5] L. Guan and A. Zhu, "Optimized low-complexity implementation of least squares based model extraction for digital predistortion of RF power amplifiers," *IEEE Trans. Microw. Theory Tech.*, vol. 60, no. 3, pp. 594–603, Mar. 2012.
- [6] P. Saad, P. Colantonio, L. Piazzon, F. Giannini, K. Andersson, and C. Fager, "Design of a concurrent dual-band 1.8–2.4-GHz GaN-HEMT Doherty power amplifier," *IEEE Trans. Microw. Theory Tech.*, vol. 60, no. 6, pp. 1840–1849, Jun. 2012.
- [7] R. Liu, D. Schreurs, W. De Raedt, F. Vanaverbeke, and R. Mertens, "Concurrent dual-band power amplifier with different operation modes," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Baltimore, MD, USA, Jun. 2011.

- [8] A. Cidronali, N. Giovannelli, T. Vlasits, R. Hernaman, and G. Manes, "A 240 W dual-band 870 and 2140 MHz envelope tracking GaN PA designed by a probability distribution conscious approach," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Baltimore, MD, USA, Jun. 2011.
- [9] W. Chen, S. A. Bassam, X. Li, Y. Liu, K. Rawat, M. Helaoui, F. M. Ghannouchi, and Z. Feng, "Design and linearization of concurrent dual-band Doherty power amplifier with frequency-dependent power ranges," *IEEE Trans. Microw. Theory Tech.*, vol. 59, no. 10, pp. 2537–2546, 2011.
- [10] S. A. Bassam, M. Helaoui, and F. M. Ghannouchi, "2-D digital predistortion (2-D-DPD) architecture for concurrent dual-band transmitters," *IEEE Trans. Microw. Theory Tech.*, vol. 59, pp. 2547–2553, Oct. 2011.
- [11] P. Roblin, S. K. Myoung, D. Chaillot, Y. G. Kim, A. Fathimulla, J. Strahler, and S. Bibyk, "Frequency-selective predistortion linearization of RF power amplifiers," *IEEE Trans. Microw. Theory Tech.*, vol. 56, no. 1, pp. 65–76, Jan. 2008.
- [12] F. M. Ghannouchi and O. Hammi, "Behavioral modeling and predistortion," *IEEE Microw. Mag.*, vol. 10, no. 7, pp. 52–64, Dec. 2009.
- [13] Y.-J. Liu, W. Chen, J. Zhou, B.-H. Zhou, and F. Ghannouchi, "Digital predistortion for concurrent dual-band transmitters using 2-D modified memory polynomials," *IEEE Trans. Microw. Theory Tech.*, vol. 61, no. 1, pp. 281–290, Jan. 2013.
- [14] A. Cidronali, I. Magrini, R. Fagotti, and G. Manes, "A new approach for concurrent dual-band IF digital predistortion: System design and analysis," in *Workshop on Integr. Nonlinear Microw. Millimetre-Wave Circuits (INMMIC)*, Nov. 24–25, 2008, pp. 127–130.
- [15] J. Kim, P. Roblin, X. Yang, and D. Chaillot, "A new architecture for frequency-selective digital predistortion linearization for RF power amplifiers," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Montreal, QC, Canada, Jun. 2012.
- [16] J. Kim, P. Roblin, D. Chaillot, and Z. Xie, "A generalized architecture for the frequency-selective digital predistortion linearization technique," *IEEE Trans. Microw. Theory Tech.*, vol. 61, no. 1, pp. 596–605, Jan. 2013.
- [17] S. A. Bassam, W. Chen, M. Helaoui, F. M. Ghannouchi, and Z. Feng, "Linearization of concurrent dual-band power amplifier based on 2D-DPD technique," *IEEE Microw. Wireless Compon. Lett.*, vol. 21, no. 12, pp. 685–687, 2011.
- [18] S. A. Bassam, A. Kwan, W. Chen, M. Helaoui, and F. M. Ghannouchi, "Subsampling feedback loop applicable to concurrent dual-band linearization architecture," *IEEE Trans. Microw. Theory Tech.*, vol. 60, no. 6, pp. 1990–1999, 2012.
- [19] Y. J. Liu, W. Chen, B. Zhou, J. Zhou, and F. M. Ghannouchi, "2D augmented Hammerstein model for concurrent dual-band power amplifiers," *Electron. Lett.*, vol. 48, pp. 1214–1216, 2012.
- [20] S. Zhang, W. Chen, F. M. Ghannouchi, and Y. Chen, "An iterative pruning of 2-D digital predistortion model based on normalized polynomial terms," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2013.
- [21] Y.-J. Liu, W. Chen, J. Zhou, B.-H. Zhou, and Y.-N. Liu, "Joint predistortion of IQ impairments and PA nonlinearity in concurrent dual-band transmitters," in *Proc. 42nd Eur. Microw. Conf. (EuMC)*, Oct. 2012, pp. 132–135.
- [22] M. Younes and F. M. Ghannouchi, "On the modeling and linearization of a concurrent dual-band transmitter exhibiting nonlinear distortion and hardware impairments," *IEEE Trans. Circuits Syst. I, Reg. Papers*, no. 99, pp. 1–14, 2013.
- [23] M. Rawat, K. Rawat, M. Younes, and F. M. Ghannouchi, "Joint mitigation of nonlinearity and modulator imperfections in dual-band concurrent transmitter using neural networks," *Electron. Lett.*, vol. 49, no. 4, pp. 253–255, Feb. 2013.
- [24] M. Younes, A. Kwan, M. Rawat, and F. M. Ghannouchi, "Three-dimensional digital predistorter for concurrent tri-band power amplifier linearization," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Seattle, WA, USA, Jun. 2013.
- [25] A. K. Kwan, S. A. Bassam, M. Helaoui, and F. M. Ghannouchi, "Concurrent dual band digital predistortion using look up tables with variable depths," in *Proc. IEEE Topical Conf. Power Amplifiers for Wireless Radio Appl. (PAWR)*, Santa Clara, CA, USA, Jan. 2013.
- [26] L. Ding, Z. Yang, and H. Gandhi, "Concurrent dual-band digital predistortion," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Montreal, QC, Canada, Jun. 2012.

- [27] N. Naraharisetti, C. Quindroit, P. Roblin, S. Gheitanchi, V. Mauer, and M. Fitton, "2D cubic spline implementation for concurrent dual-band system," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2013.
- [28] C. Quindroit, N. Naraharisetti, P. Roblin, S. Gheitanchi, V. Mauer, and M. Fitton, "Concurrent dual-band digital predistortion for power amplifier based on orthogonal polynomials," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Seattle, WA, USA, Jun. 2013.
- [29] G. Yang, F. Liu, L. Li, H. Wang, C. Zhao, and Z. Wang, "2D orthogonal polynomials for concurrent dual-band digital predistortion," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Seattle, WA, USA, Jun. 2013.
- [30] Altera [Online]. Available: www.altera.com/devices/fpga/stratixfpgas/stratix-iv/stxiv-index.jsp
- [31] Analog Devices [Online]. Available: www.analog.com/static/imported-files/eval boards/AD-MSDPD-EVB.pdf
- [32] S. Boumaiza, M. Helaoui, O. Hammi, L. Taijun, and F. M. Ghannouchi, "Systematic and adaptive characterization approach for behavior modeling and correction of dynamic nonlinear transmitters," *IEEE Trans. Instrum. Meas.*, vol. 56, no. 6, pp. 2203–2211, Dec. 2007.
- [33] NXP [Online]. Available: www.nxp.com/products/rf/amplifiers/ power transistors/gan devices/CLF1G0060-10.html#overview



Patrick Roblin (M'85) was born in Paris, France, in September 1958. He received the Maitrise de Physics degree from the Louis Pasteur University, Strasbourg, France, in 1980 and the M.S. and D.Sc. degrees in electrical engineering from Washington University, St. Louis, MO, USA, in 1982 and 1984, respectively.

In 1984, he joined the Department of Electrical Engineering, The Ohio State University (OSU), Columbus, OH, USA, as an Assistant Professor and is currently a Professor. His present research interests include the measurement, modeling, design, and linearization of nonlinear RF devices and circuits such as oscillators, mixers, and power amplifiers. He is the first author of two textbooks, *High-Speed Heterostructure Devices* (Cambridge Univ. Press, 2002) and *Nonlinear RF Circuits and Nonlinear Vector Network Analyzers* (Cambridge Univ. Press, 2011). At OSU, he is the Founder of the NonLinear RF Research Laboratory. He has also developed two educational RF/microwave laboratories and the associated curriculum at OSU for training both undergraduate and graduate students.



Christophe Quindroit was born in Corbeil-Essonnes, France, in October 1982. He received the M.Tech. and M.S. degrees in electronics from the Ecole Polytechnique de l'Université de Nantes, Nantes, France, in 2005 and the Ph.D. degree in electronics from XLIM, University of Limoges, Limoges, France, in 2010.

He was a Project Engineer with ALCATEL-LUCENT, France. He is currently a Research Engineer with The Ohio State University, Columbus, OH, USA. His current research interests include analog system-level modeling, PA linearization techniques, and FPGA implementation.

Dr. Quindroit is the recipient of the 2010 European Microwave Conference Young Engineers Prize.



Naveen Naraharisetti (S'13) was born in Andhra Pradesh, India, in June 1984. He received the B.Tech. degree in electronics and communications engineering from Acharya Nagarjuna University, India, in 2005 and the M.S. degree in electrical and computer engineering from the University of Michigan, Ann Arbor, MI, USA, in 2008.

He is currently working toward the Ph.D. degree at The Ohio State University, Columbus, OH, USA. His research interests are concurrent multiband digital predistortion for power amplifiers with FPGA implementation and nonlinear modeling.



Shahin Gheitanchi (M'04) received the M.Sc. degree in digital communications and the Ph.D. degree from the University of Sussex, U.K., in 2004 and 2009, respectively.

He is currently with the Wireless Systems Solutions Group of Altera, Buckinghamshire, U.K. His research interests include multicarrier multiple-access techniques, adaptive real-time signal processing, crest factor reduction, adaptive digital predistortion, application of biologically inspired artificial intelligence for optimization, and heterogeneous multistandard networks. He has a number of publications in international journals and conferences.

Volker Mauer received the M.Sc. degree in VLSI design from Bournemouth University in 1991.

Since then, he has worked at GEC Plessey Semiconductors, Siemens Semiconductors, and Altera on a number of products, including GPS, radar, and wireless communications. He currently works in the Wireless Systems Solutions Group of Altera, Buckinghamshire, U.K., where his research interest is the efficient silicon implementation of advanced wireless standards.



Mike Fitton received the Ph.D. degree from the University of Bristol in 1997. His research focused on frequency-hopping code-division multiple access and was funded under a U.K. EPSRC Research

He was a Researcher with the British Telecom Virtual Universities Research Initiative during the ETSI evaluation of candidate radio access schemes for UMTS. Since then, he has been involved in various capacities in wireless research and development, including algorithm design and signal processing for both handset and infrastructure. He is currently responsible for wireless strategy at Altera Corporation, Buckinghamshire, U.K. He has numerous publications and more than 30 patents.