# Optical Link Optimization Using Embedded Forward Error Correcting Codes

# Ted H. Szymanski

Abstract—The design of a single-chip optical transceiver to optimize the performance of a short-distance optical datalink is proposed. The transceiver includes an embedded hybrid automatic repeat request (ARQ) controller capable of operation at clock rates of several gigahertz. The hybrid ARQ controller uses a combination of packet retransmission protocols and forward error correction (FEC) to minimize bit errors and achieve a transmitter power coding gain of several dB. Conventional FEC codes such as Reed-Solomon codes cannot be used due to their excessive hardware cost and delays. A practical multilevel coding scheme is explored. The inner codes consist of small linear block codes with reasonable FEC capability, such as small BCH codes, which can be encoded and decoded with reasonable hardware cost and delay. The outer code for a complete packet consists of a long linear block code with excellent error detection ability, such as a cyclic redundancy check code. Low-power pipelined on-chip FEC decoders with estimated throughputs of several hundred gigabits per second per square millimeter are proposed. Mathematical analysis indicates that substantial coding gains are possible, which can be used to increase the data rate or the distance span of the link. The proposed designs can be used in short-distance optical transceivers for 10-Gb Ethernet, Fibre Channel, and very short reach optical datalinks, and are scalable to future two-dimensional optical datalinks with terabits of capacity.

*Index Terms*—Automatic repeat request (ARQ), BCH, code, forward error correcting, optical link, pipelined, transceiver, very large scale integration.

## I. INTRODUCTION

**V**ERTICAL-CAVITY surface-emitting laser (VCSEL) optical datalink standards are being developed for many short distance networking applications, including 10-Gb Ethernet, Fibre Channel, and very short reach (VSR) networks. Many of these emerging standards describe VCSEL optical datalinks with several one-dimensional (1-D) parallel optical channels, each operating at clock rates of several GHz, with link bit error rates (BERs) as low as  $10^{-15}$ .

To date, the optical transceivers for short distance optical datalinks using VCSELs have not embraced embedded (on-chip) forward error correction (FEC), for several reasons. First, the VCSEL technology itself has only been developed over the last decade, and it already presents several formidable design challenges. Second, the hardware technology to encode and decode traditional FECs is expensive and infeasible to embed within a low-cost single-chip transceiver. Third, traditional FECs such as Reed–Solomon (RS) codes require several thousand clock cycles to decode in the time domain, and the decoding delays would render such transceivers ineffective for computing networks and storage area networks, where low delay is essential. Finally, VCSEL technology has the potential to scale to two-dimensional (2-D) optical datalinks, with potentially several thousand VCSELs each operating at a clock rate of several gigahertz, for an aggregate data rate in the tens of terabits per second. It would be very difficult to scale traditional FEC codes such as long RS codes to such an environment, for all of the above reasons.

The author is with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S-4K1, Canada (e-mail: teds@ece.eng.mcmaster.ca).

Digital Object Identifier 10.1109/JSTQE.2003.813314

Neifeld and Kostuk explored the use of short RS codes to optimize the performance of a 2-D bit-parallel free-space optical datalink [1]. The RS codes provided a coding gain that improved the SNR, which allowed the free-space optical channels to be spaced closer together. The decreased spacing between optical channels increased the optical crosstalk, but this increase was mitigated by the coding gain, resulting in a net throughput increase by a factor of 3 to 8. They also proposed a spectral-domain RS encoder and decoder chip-set in [2]. They recognized the complexity of the decoders and identified the choice of FEC codes as an issue for further research.

FEC has been proposed by the International Telecommunications Union (ITU) for use in long distance bit-serial submarine fiber optic systems operating at 10 Gb/s [3]. The ITU recommendation calls for an increase in the link data rate by 7% to accommodate the check bits of the FEC code, an RS(255,239) code, which can correct up to t = 8 random symbol errors. The code has a coding gain of 5.5 dB at a BER of  $10^{-12}$ . To provide further protection against bursts, the ITU recommends a 16-way interleaved RS(255,239) code, which can correct bursts of up to 1024 bits.
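The 7% rate increase and the 1024-bit burst figure both follow from the code parameters. A minimal sketch of the arithmetic, assuming 8-bit (byte) symbols and a burst that starts on a symbol boundary:

```python
# RS(255,239) over GF(2^8): each symbol is one byte.
N, K, SYMBOL_BITS = 255, 239, 8
t = (N - K) // 2                          # correctable symbol errors per codeword
overhead = N / K - 1                      # line-rate increase for the check symbols
DEPTH = 16                                # interleaving depth recommended by the ITU
# A burst spanning DEPTH * t consecutive symbols places at most t symbols in each
# interleaved codeword, so it remains correctable (burst assumed symbol-aligned).
burst_bits = DEPTH * t * SYMBOL_BITS
print(f"t = {t}, overhead = {overhead:.1%}, correctable burst = {burst_bits} bits")
```

An unaligned burst can straddle one extra symbol, so the guaranteed figure for arbitrary alignment is slightly below 1024 bits.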

Manuscript received November 11, 2002; revised February 7, 2003.

Vitesse Semiconductor Corporation (Camarillo, CA) has an advance product notice describing an integrated circuit chip-set (VSC9210) that can be used for long distance bit-serial fiber optic systems [4]. The chip-set can perform RS(255,241) encoding and decoding at data rates of 2.488 Gb/s before encoding and 2.654 Gb/s after encoding, and consumes 1.05 W for the encoder and 2.82 W for the decoder at BER =  $10^{-5}$, for a combined power dissipation of approximately 4 W. The design of another RS FEC encoder/decoder chip-set with an aggregate rate of 10 Gb/s is described in [5]. However, according to [5], at speeds beyond 10 Gb/s, the implementation of FEC "becomes extremely challenging due to excessive complexity and power consumption." To illustrate this point, an FEC system for a 100-Gb/s link using the Vitesse technology would require 40 chip-sets, dissipating a total of 160 W, which does not include the chip-to-chip electronic interconnections needed to move data between ICs; electronic chip I/O requires additional power. An FEC system for a 1-Tb/s link using the Vitesse technology would require 400 chip-sets, dissipating a total of 1600 W for encoding and decoding. An FEC system for a future 10-Tb/s optical link created with 2-D VCSEL technology, using the Vitesse technology, would require 4000 chip-sets, dissipating a total of 16 kW. These examples illustrate the difficulty with traditional codes and decoders for future high-capacity optical links.
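The chip-set counts above scale linearly with the link rate. A short sketch of that scaling, assuming perfect load sharing across chip-sets and ignoring chip-to-chip I/O power; the function name is hypothetical. With the exact per-chip-set figures it differs slightly from the rounded 40/400/4000 chip-sets in the text, which correspond to roughly 2.5 Gb/s and 4 W per chip-set:

```python
import math

# Per-chip-set figures quoted for the VSC9210 advance product notice [4].
CHIPSET_PAYLOAD_GBPS = 2.488          # data rate before encoding
CHIPSET_POWER_W = 1.05 + 2.82         # encoder + decoder, about 3.9 W

def fec_chipset_budget(link_gbps: float) -> tuple[int, float]:
    """Chip-sets needed to carry `link_gbps` of payload, and their total power in W."""
    count = math.ceil(link_gbps / CHIPSET_PAYLOAD_GBPS)
    return count, count * CHIPSET_POWER_W

for rate in (100, 1_000, 10_000):     # 100 Gb/s, 1 Tb/s, 10 Tb/s
    count, watts = fec_chipset_budget(rate)
    print(f"{rate:>6} Gb/s: {count:>5} chip-sets, {watts / 1e3:.2f} kW")
```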

The Canadian government has funded a 10-year national program to develop the architectures and technologies for a multiterabit free-space optical backplane, and a system demonstrator is described in [8]. As part of the architectural innovation, the use of simple embedded FEC built directly onto a 2-D CMOS/VCSEL optical transceiver was proposed in [6]. The industrial sponsors, major Canadian manufacturers in the optical networking industry, were strongly opposed to the concept of embedding FEC directly onto CMOS/VCSEL transceivers, citing concerns about the hardware complexity of on-chip FECs, the loss of bandwidth due to the FEC overhead and the lack of tangible benefits to FEC. In an effort to address these issues, an analysis of the coding gain in short distance optical datalinks using ARQ and simple FECs, embedded directly onto a single-chip CMOS/VCSEL transceiver, was proposed in [7]. It was shown that using the same parameters as in [1], substantial coding gains could be achieved using simple embedded ARQ and FEC based on multidimensional parity checks. Surprisingly, it was shown that data rate increases of a factor of 30 could be achieved, given the same assumptions in [1]. Reference [9] extends the work of [1] and [7] by exploring the use of small FECs such as Golay codes and BCH codes in 2-D optical datalinks.

In this paper, the design of a single-chip optical transceiver to optimize the performance of short-distance optical datalinks is proposed. The transceiver includes an embedded hybrid automatic repeat request (ARQ) controller capable of operation at clock rates of several gigahertz, a challenging design. The hybrid ARQ controller uses a combination of packet retransmission protocols and on-chip FEC to minimize bit errors and achieve a transmitter power coding gain of several dB. The use of retransmissions requires a bidirectional optical link, whereas existing commercial parallel VCSEL datalinks and transceivers are unidirectional. However, in many applications such as computer-to-computer or computer-to-memory communications, bidirectional communication is essential. We postulate that the development of bidirectional transceivers for these applications, while challenging, is feasible.

In this paper, a practical multilevel coding scheme using simple FECs and error-detection codes suitable for fabrication directly onto a single-chip CMOS/VCSEL IC is explored. The inner codes (encoded last and decoded first) consist of small linear block codes with reasonable FEC capability, such as small BCH or Hamming codes, or the Golay code, which can be encoded and decoded with reasonable hardware cost and delay. The outer codes for a complete packet consist of a long linear block code with excellent error detection (ED) ability, such as a cyclic redundancy check (CRC) code or the multidimensional parity checks considered in [6] and [7]. Low-power pipelined on-chip encoders and decoders for the FEC and CRC codes are proposed. The decoders have estimated aggregate throughputs of several hundred gigabits per second per square millimeter, and therefore the very large scale integration area overhead of including on-chip FEC decoders is small. Mathematical analysis indicates that substantial coding gains are achievable. The coding gains can be used to increase the data rate in short-distance optical datalinks, or to increase the distance span. The proposed designs can be used in optical transceivers for multigigabit Ethernet, Fibre Channel, and VSR optical datalinks. More importantly, the proposed schemes are scalable to the multiterabit throughputs necessary to support the 2-D VCSEL datalinks of the future.

This paper is organized as follows. Section II reviews noise and BERs, and describes the proposed multilevel codes using BCH and CRC codes. Sections III–V describe the mathematical models for three optical link encoding schemes (FEC alone, ARQ alone, and a combination of FEC and ARQ). Section VI discusses the hardware encoders and decoders for the proposed codes. Section VII contains some concluding remarks.

## II. NOISE IN OPTICAL SYSTEMS

Two major sources of noise in an optical datalink are *thermal* and *shot* noise [9]. Thermal noise originates from the random motion of electrons in the resistive load of the receiver amplifier circuit, and is always present. Shot noise is due to the discrete nature of electrons, and arises from the random generation and recombination of free electrons and holes in a photodiode. When the optical power is low, thermal noise tends to dominate over shot noise, yielding a thermal-noise-limited system [9].

There are several potential sources of noise in a VCSEL optical datalink.

- 1) Poor coupling between the VCSELs or photodetectors (PDs) and the fiber can reduce the received optical power. The use of rigid mechanical connectors such as the MT-connector, along with small diameter VCSELs (10  $\mu$ m) and large diameter multimode fiber (62.5  $\mu$ m), can increase the coupling efficiency.
- 2) Nonuniform power over large 2-D optical arrays can result in some optical channels with high noise levels.
- 3) Laser modes can contribute to noise at high frequencies.
- 4) Channel dispersion leading to intersymbol interference (ISI) can contribute to noise at high frequencies. However, for short multimode fiber optical links with distances of tens of meters, ISI will be limited.
- 5) Digital electronic switching, power supply noise and electronic crosstalk on dense 1- and 2-D optoelectronic integrated circuits can be a major source of noise.

All of these phenomena will lower the SNR and increase the BER, and all can be partially compensated for by employing the error correcting schemes proposed in this paper. In all systems with limited optical power, noise will be a dominant cause of bit errors, and the proposed schemes will compensate regardless of the source of the noise.

In this paper, to allow for comparison with the prior analyses in this area [1], [2], [7], [9], we assume thermal-noise-limited systems, where the primary cause of bit errors is thermal noise at the PD. In practice, laser modes and ISI will contribute to the noise in long distance multimode fibers with lengths greater than hundreds of meters operating at frequencies in excess of several gigahertz. Electronic switching and crosstalk will also contribute. Ultimately, these sources of noise will limit the performance to be gained by using the techniques proposed in this paper.

Fig. 1. Structure of the proposed single-chip optical transceiver.

The noise-equivalent power (NEP) is a measure of a PD's sensitivity, and reflects the amount of optical power needed to achieve an SNR of unity in a 1-Hz bandwidth. The units of the NEP are W/Hz<sup>1/2</sup>.

## A. 2-D Optical Datalinks

Consider a 2-D parallel optical data link where 1024 bits of parallel data arrive at every clock tick, with a 10-GHz clock rate, for a total bandwidth of 10 Tb/s. The U.S. Defense Advanced Research Projects Agency (DARPA) is sponsoring the development of such technologies, and while such data links do not yet exist in real systems, the technology is feasible and such links may begin to appear in systems in several years. The techniques proposed in this paper will be scalable to the tens of Tb/s capacities associated with future 2-D optical datalinks.

Fig. 1 illustrates a transceiver for an optical datalink, as required for the schemes to be proposed in this paper. Packets arrive from the producer of data into the *Packet-In Queue*. The packet passes through an *ARQ Encoder* unit, which appends the ARQ header information. The packet then passes through the *CRC Encoder* unit, which appends an ED code, typically a 32-bit CRC checksum. The resulting packet is then passed to the inner *FEC Encoder* unit, where the packet is encoded using multiple relatively small BCH FEC codes. The packet is then stored in the *Sliding-Window Queue*, which is used in the *ARQ* protocol. A packet to be transmitted is transferred from the *Sliding-Window Queue* to the *Transmit Buffer*, under the control of the *Datalink Controller*, and transmitted by an on-chip 1- or 2-D VCSEL array.

On the receiver side, a packet arrives from the channel through the *Photodetector Array* and into the *Reassembly Buffer*. The individual bits are reassembled to form a complete packet. The complete packet is then transferred into the inner *FEC Decoder* unit, which performs FEC on the multiple BCH codewords. Codewords that experience at most t bit errors (for some parameter t) are correctly decoded. Codewords that experience more than t bit errors are either 1) decoded incorrectly with undetectable errors or 2) detected as errorful. (The codewords may possibly be correctly decoded, but this event is ignored.) Once the multiple codewords in a packet are decoded, if ARQ is used, the packet is passed to the *CRC Decoder* unit, which performs ED using the CRC checksum in the ARQ trailer. Packets with no detectable bit errors are accepted as error free and forwarded to the *Packet-Out Queue*, where they will eventually be removed by the consumer of data. When ARQ is used, packets with detectable bit errors are deleted, and the *Datalink Controller* is signaled. The Controller sends information back to the sender requesting a retransmission of the erroneous packet, using the ARQ sliding-window protocol. Several flow control protocols exist to manage such retransmissions [11], [12]. We will assume the selective-repeat protocol, where only the packets that are explicitly requested are retransmitted. This scheme does require two-way traffic flow over the optical datalink. Provided that most packets are error free, the selective-repeat protocol consumes very little of the datalink bandwidth.

Fig. 2. Multilevel encoding format.
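A toy model of the selective-repeat receiver behavior described above. It assumes, as a simplification, that the sequence number of a corrupted packet can still be identified (a real controller would more typically infer losses from sequence gaps or timeouts), and it ignores window limits and sequence-number wraparound; the function name is hypothetical:

```python
def selective_repeat_receive(arrivals):
    """Toy selective-repeat receiver.

    `arrivals` is a sequence of (seq, crc_ok) pairs in arrival order.
    Returns (sequence numbers released in order, sequence numbers NACKed).
    """
    next_seq = 0               # lowest sequence number not yet released
    held = set()               # correct packets that arrived out of order
    delivered, nacks = [], []
    for seq, crc_ok in arrivals:
        if not crc_ok:
            nacks.append(seq)  # request only this packet, not the whole window
            continue
        if seq < next_seq or seq in held:
            continue           # duplicate of a packet already accepted
        held.add(seq)
        while next_seq in held:    # release the contiguous in-order prefix
            held.remove(next_seq)
            delivered.append(next_seq)
            next_seq += 1
    return delivered, nacks

print(selective_repeat_receive([(0, True), (1, False), (2, True), (3, True), (1, True)]))
# ([0, 1, 2, 3], [1])
```

Only packet 1 is requested again; packets 2 and 3 are held until the retransmitted copy of packet 1 arrives, which is why the protocol costs little bandwidth when most packets are error free.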

The packet format given the multilevel coding scheme is shown in Fig. 2. The data is first formatted into 512-bit sections. An ARQ header which includes destination information and source sequence numbers is prepended. An ARQ trailer consisting of a CRC checksum is appended. The entire packet is then partitioned into small n-bit sections, each of which is encoded into a k-bit codeword using the FEC code.
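The packet format can be sketched as follows. This is an illustration under assumptions, not the paper's design: the 32-bit header layout is hypothetical, the standard CRC-32 from Python's `zlib.crc32` stands in for the outer CRC, and the Hamming(7,4) code, a single-error-correcting BCH code written (7, 4, 1) in the (k, n, t) notation used here, stands in for the inner BCH code:

```python
import zlib

def to_bits(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def hamming74(d: list[int]) -> list[int]:
    """Systematic Hamming(7,4) codeword: 4 data bits + 3 parity bits, corrects 1 error."""
    d1, d2, d3, d4 = d
    return [d1, d2, d3, d4, d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]

def encode_packet(payload: bytes, dest: int, seq: int) -> list[int]:
    """Header + 512-bit payload + CRC-32 trailer, then inner FEC on n = 4-bit sections."""
    if len(payload) != 64:
        raise ValueError("payload must be 512 bits")
    header = bytes([dest & 0xFF, 0, (seq >> 8) & 0xFF, seq & 0xFF])  # hypothetical ARQ header
    body = header + payload
    trailer = zlib.crc32(body).to_bytes(4, "big")                   # outer ED code
    bits = to_bits(body + trailer)
    n = 4                                                           # data bits per codeword
    bits += [0] * (-len(bits) % n)                                  # pad the last section
    codeword_bits = []
    for i in range(0, len(bits), n):
        codeword_bits += hamming74(bits[i:i + n])
    return codeword_bits

encoded = encode_packet(bytes(64), dest=3, seq=7)
print(len(encoded))   # 1008: (32 + 512 + 32) / 4 = 144 codewords of 7 bits
```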

## III. SIGNAL-TO-NOISE RATIO

Data can be transmitted over a channel according to several different modulation schemes, including direct detection (DD), binary phase shift keying, and differential phase shift keying. In the DD scheme, the absence or presence of optical power denotes the logic values 0 or 1, respectively. The basic BER of an optical datalink using the DD modulation scheme, denoted *BER*, is given by (1), where the Error Function  $erf(\cdot)$  is defined in (1), and where *snr* denotes the signal-to-noise ratio at the receiver [10], [13]

$$BER = \frac{1}{2} - \frac{1}{2}\, \mathrm{erf}\left(0.354\sqrt{snr}\right) \tag{1}$$

where

$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-\lambda^2}\, d\lambda.$$

The reader is referred to [14] for a derivation of (1). For x > 3, the error function can be approximated as  $\mathrm{erf}(x) \approx 1 - e^{-x^2}/(x\pi^{1/2})$ .
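Numerically, (1) is best evaluated through the complementary error function, since the BERs of interest (around $10^{-18}$) are far below double-precision resolution near 1. A minimal sketch using Python's standard `math.erfc`, which also checks the large-argument approximation above; the function names are illustrative:

```python
import math

def raw_ber(snr: float) -> float:
    """Raw BER for direct detection, eq. (1), written with erfc(x) = 1 - erf(x).

    Evaluating 1/2 - 1/2*erf(x) literally loses all precision once erfc(x)
    falls below about 1e-16 (x near 6), because erf(x) then rounds to 1.0.
    """
    return 0.5 * math.erfc(0.354 * math.sqrt(snr))

def erfc_approx(x: float) -> float:
    """Leading-order asymptotic form erfc(x) ~ exp(-x^2) / (x * sqrt(pi)), for x > 3."""
    return math.exp(-x * x) / (x * math.sqrt(math.pi))

x = 0.354 * math.sqrt(300)
print(raw_ber(300), 0.5 * (1 - math.erf(x)), erfc_approx(x) / math.erfc(x))
```

At SNR = 300 the literal form returns exactly zero, while the erfc form gives a BER of a few times $10^{-18}$; the asymptotic approximation is within about 2% at this argument.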

Fig. 3 illustrates the BER of the uncoded or "raw" channel (i.e., the BER before any error control) versus the SNR, as given by (1). Observe that the BER drops rapidly as the SNR increases. Define the PER for an uncoded channel to be the probability that a packet with m bits is received with one or more bit errors, i.e., $PER = 1 - (1 - BER)^m$. The PER for 512-bit packets over the raw channel is also shown in Fig. 3.

Fig. 3. BER and PER versus SNR for uncoded optical link.
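For the tiny BERs considered here, $1 - (1 - BER)^m$ must be computed with care: in double precision $1 - 10^{-18}$ rounds to exactly 1, so the literal formula returns zero. A minimal sketch using the standard `math.log1p` and `math.expm1`; the function name is illustrative:

```python
import math

def packet_error_rate(ber: float, m: int = 512) -> float:
    """PER = 1 - (1 - ber)^m: probability of at least one bit error in m bits.

    log1p/expm1 keep full precision when m * ber is tiny; the literal
    formula returns 0.0 once 1 - ber rounds to 1.0 in double precision.
    """
    return -math.expm1(m * math.log1p(-ber))

print(packet_error_rate(2e-18), 1 - (1 - 2e-18) ** 512)
```

For small BERs the result is close to $m \cdot BER$, which is why a 512-bit packet needs a raw BER near $2 \times 10^{-18}$ to reach a PER of $10^{-15}$.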

The SNR for a free-space parallel optical datalink with optical crosstalk is given in [1]

$$SNR \equiv \frac{P_0 \cdot \eta}{P_0(1-\eta) + \text{NEP} \cdot \sqrt{f}} \tag{2}$$

where $P_0$ denotes the power of each optical source, and where  $\eta$  represents the efficiency of the optical power transfer. The first term in the denominator represents the noise due to optical crosstalk. The second term represents the thermal noise over the bandwidth of the receiver. In this paper, we assume that optical transmission is through parallel optical fiber, where the optical crosstalk is negligible. Under this assumption, the following formula for the SNR is used [10], [13]

$$SNR \equiv \frac{P_0 \cdot \eta}{NEP \cdot \sqrt{f}} \tag{3}$$

as an approximation, since it ignores some potential sources of noise, as discussed in Section II. However, regardless of the source of noise, the resulting bit errors will be corrected by the proposed error control schemes in the same manner as the random bit errors introduced by thermal noise.

Figs. 4(a) and (b) illustrate the eye diagrams of a 3-Gb/s commercial VCSEL and of a 10-Gb/s VCSEL over 80 m of multimode fiber, respectively, indicating that the ISI is low at 10 Gb/s. The figure shows a reasonably good eye diagram and suggests that improvements in the data rate of the link, or in its distance span, can be achieved through the use of error control coding. Counterintuitively, the performance can be improved by increasing the data rate or distance span, allowing the eye diagram to degrade further, because the resulting bit errors are handled by the error correcting ability of the proposed datalink controller.



Fig. 4. Eye diagrams of a commercial VCSEL array. (a) 3 Gb/s and (b) 10 Gb/s over 80 m of multimode fiber.



Fig. 5. SNR versus frequency for uncoded optical link.

## A. Uncoded Optical Link

For the VCSEL-based datalinks in this paper, assume a 3-dB power loss in the optical interconnect, for an efficiency of  $\eta = 0.5$ . For the noise-equivalent power, assume NEP =  $0.3 \text{ nW/Hz}^{1/2}$, as used in [1] and [7], which is representative of figures reported in the literature. According to [15], this NEP is compatible with simple receiver designs that can achieve SNR = 10 with 50  $\mu$W of optical power over a bandwidth of 250 MHz, in the absence of ISI.
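The compatibility claim can be checked against (3). A quick sketch, assuming the 50 $\mu$W is the power reaching the photodetector (i.e., $\eta = 1$ in (3)):

```python
import math

NEP = 0.3e-9          # W/Hz^0.5, as assumed in the text
P_RX = 50e-6          # assumption: 50 uW at the photodetector (eta = 1)
BANDWIDTH = 250e6     # Hz

snr = P_RX / (NEP * math.sqrt(BANDWIDTH))   # eq. (3) with eta = 1
print(f"SNR = {snr:.1f}")
```

The result is slightly above 10, consistent with the figure cited from [15].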

Fig. 5 illustrates the SNR versus the frequency f for an uncoded optical datalink according to (3) for a VCSEL power level of 1.0 mW, assuming NEP = 0.3 nW/Hz<sup>1/2</sup> and  $\eta = 0.5$ . Observe that the SNR drops as the frequency increases.

Consider the design of a VCSEL-based optical datalink with a maximum acceptable packet error rate (PER) of  $10^{-15}$ . Assume a VCSEL with 1 mW of power, an optical imaging system with 3 dB loss, and NEP = 0.3 nW/Hz<sup>1/2</sup>. For the raw optical datalink, the SNR must be  $\geq 300$  to ensure PER  $\leq 10^{-15}$ , as illustrated by the bold dot in Fig. 3. From Fig. 5, at 1 mW the frequency f must be  $\leq 31$  MHz to ensure SNR  $\geq 300$  and PER  $\leq 10^{-15}$ , as illustrated by the bold dot. This frequency will be used as a reference point for the following designs. These uncoded links are representative of existing parallel VCSEL transceivers, which do not perform ARQ or FEC at the datalink layer, and where all bit errors are left for processing at a higher protocol layer.
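The design point above can be reproduced numerically from (1) and (3): find the smallest SNR that meets the PER target for a 512-bit packet, then invert (3) for the clock frequency. A minimal sketch under the stated assumptions (1 mW, $\eta = 0.5$, NEP = 0.3 nW/Hz<sup>1/2</sup>, independent bit errors):

```python
import math

P0, ETA, NEP = 1e-3, 0.5, 0.3e-9   # 1-mW VCSEL, 3-dB loss, NEP in W/Hz^0.5
M, TARGET_PER = 512, 1e-15

def per_at_snr(snr: float) -> float:
    ber = 0.5 * math.erfc(0.354 * math.sqrt(snr))     # eq. (1)
    return -math.expm1(M * math.log1p(-ber))           # 1 - (1 - BER)^M

# PER decreases monotonically with SNR, so bisection finds the smallest
# SNR that meets the target.
lo, hi = 1.0, 1e4
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if per_at_snr(mid) > TARGET_PER:
        lo = mid
    else:
        hi = mid
snr_min = hi

# Invert eq. (3), SNR = P0 * eta / (NEP * sqrt(f)), for the highest usable clock rate.
f_max = (P0 * ETA / (NEP * snr_min)) ** 2
print(f"SNR >= {snr_min:.0f}, f <= {f_max / 1e6:.1f} MHz")
```

The bisection lands close to the SNR of 300 and the 31-MHz limit read from Figs. 3 and 5.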

## B. Embedded FEC Alone With No ARQ

Consider an optical datalink with FEC only; no ARQ is performed at the transceiver. Let the BER on an optical channel before any error control be denoted by  $p_{be}$ . The probability that a packet is accepted with an undetectable error, equivalent to the PER, is denoted by  $p_{pe}$ .

Let  $p_c$  be the probability a packet is received without error;  $p_u$  be the probability a packet is received with an undetectable error pattern; and  $p_e$  be the probability a packet is received with a detectable error pattern. Let E be a random variable denoting the number of bit errors encountered in an m-bit packet. Assuming throughout the paper that bit errors are independent and random events occurring with probability  $p_{be}$ , the probability a packet experiences e random bit errors is given by the binomial distribution

$$\Pr(E = e) = B(m, e, p_{be}) \equiv \binom{m}{e} (1 - p_{be})^{m-e} p_{be}^{e}. \tag{4}$$

The probability that a packet experiences no bit errors is given by  $Pr(E = 0) = B(m, 0, p_{be})$ .

The BCH codes are a class of cyclic codes, which can be decoded in an iterative manner [11]. The BCH codes consist of two classes, the binary and nonbinary BCH codes. Of the nonbinary BCH codes, the RS code is the most well known. The BCH codes can use an iterative algorithm to find the locations of the bit errors, which can then be corrected. Fast implementations of the BCH codes, including the RS codes, can be very expensive and, for large n and t, the hardware cost of finding the locations of the errors is substantial [11]. Therefore, the choice of which BCH codes may be feasible for very high-speed optical datalinks must be carefully considered. This issue will be addressed in Section VI.

A (k, n, t) BCH code accepts n bits of data and appends (k - n) parity check bits to create a k-bit codeword. The code is capable of correcting up to t bit errors in the codeword. When the number of bit errors exceeds t, correct decoding of the codeword cannot be guaranteed. If more than t bit errors occur, three events may happen: 1) the codeword may be incorrectly decoded yielding a zero syndrome, in which case an undetected error has occurred; 2) the codeword may be incorrectly decoded yielding a nonzero syndrome, in which case a detectable error has occurred; or 3) the codeword may be correctly decoded with a zero or nonzero syndrome.

In many systems, to avoid the event of accepting incorrectly decoded data (event 1), the BCH (k, n, t) codes are often used to correct fewer bit errors than their maximum error correction capability; some of the FEC overhead bits are used for ED. Typically, a (k, n, t) codeword is used to correct all error patterns with  $\leq t'$  bit errors, where t' < t. The code is used to detect error patterns with e bit errors, where  $t' < e \leq t$ . The codes are chosen so that the event of t bit errors is relatively rare, reducing the likelihood of decoding errors.

In this section, we reserve no FEC overhead bits for error detection and correct up to t bit errors per (k, n, t) codeword, at the expense of admitting more undetected bit errors after decoding. Throughout the paper, assume that events 1) and 2) described above are equiprobable and that event 3) is negligible. A codeword is accepted by the receiver without bit errors with probability

$$P_{cw} = \sum_{e=0}^{t} B(k, e, P_{be}).$$

Assume a packet has 512 data bits. This packet is encoded into  $h = \lceil 512/n \rceil$  codewords after the BCH inner code is applied. A packet with h codewords will be correctly decoded and accepted with no bit errors with probability  $P_{cp} = B(h, h, P_{cw})$ , which reflects the event that all h codewords can be correctly decoded.

An entire packet may have bit errors after decoding with probability  $P_w = \sum_{e=1}^{h} B(h, e, 1 - P_{cw}) = 1 - B(h, 0, 1 - P_{cw}) = 1 - P_{cw}^h$ , which reflects the event that one or more of the *h* BCH codewords experiences more than *t* bit errors. Given that the system under consideration supports FEC only, with no ARQ, these packets are accepted by the transceiver. Given the assumptions, the PER is, therefore

$$PER = \frac{\sum_{e=1}^{h} B(h, e, 1 - P_{cw})}{B(h, h, P_{cw}) + \sum_{e=1}^{h} B(h, e, 1 - P_{cw})} \tag{5}$$

which will form a constraint (upper limit) on the channel clock rate, to ensure that the predetermined PER threshold after using FEC can be met.
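Equation (5) can be evaluated directly. Note that its denominator equals $P_{cw}^h + (1 - P_{cw}^h) = 1$, so (5) reduces to $1 - P_{cw}^h$; the sketch keeps the ratio form to mirror the text. It assumes independent bit errors and the link parameters of Section III-A; the function names are illustrative:

```python
import math

def B(m: int, e: int, p: float) -> float:
    """Eq. (4): probability of exactly e errors among m independent trials."""
    return math.comb(m, e) * (1 - p) ** (m - e) * p ** e

def per_fec_only(p_be: float, k: int, n: int, t: int, data_bits: int = 512) -> float:
    """Eq. (5) for a (k, n, t) code: k-bit codewords carrying n data bits, t correctable."""
    h = math.ceil(data_bits / n)
    # 1 - P_cw, summed directly over the uncorrectable patterns so that it does
    # not cancel to zero when P_cw is within 1e-16 of one.
    q = sum(B(k, e, p_be) for e in range(t + 1, k + 1))
    num = sum(B(h, e, q) for e in range(1, h + 1))   # one or more codewords fail
    den = B(h, h, 1 - q) + num                        # plus: all h codewords decode
    return num / den

# Usage: the (15,5,3) code at a 560-MHz clock with the Section III-A link.
snr = 1e-3 * 0.5 / (0.3e-9 * math.sqrt(560e6))       # eq. (3)
p_be = 0.5 * math.erfc(0.354 * math.sqrt(snr))       # eq. (1)
print(per_fec_only(p_be, 15, 5, 3))
```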

## C. Results—FEC Alone

The previous analysis is now applied to several design examples. The design objective is to achieve a PER after FEC of  $10^{-15}$ , assuming a 1-mW VCSEL with 3 dB of power loss and NEP = 0.3 nW/Hz<sup>1/2</sup>.

Fig. 6(a) illustrates PER versus frequency after applying FEC using several 7- and 15-bit BCH codes. Fig. 6(b) illustrates PER versus frequency after applying FEC using several 31-bit BCH codes and the Golay (23,12,3) code. Observe that at any given clock frequency, the PER after FEC drops dramatically compared with the PER on the raw datalink. The cost of this process is the loss of bandwidth due to the FEC overhead, which is not reflected in Fig. 6(a) or (b). The effective data rate is determined by scaling the clock rate by the code rate (n/k) < 1.

To achieve a PER of  $10^{-15}$  when using the (15,5,3) FEC, the clock frequency can be determined from Fig. 6(a) to be  $\leq$ 560 MHz. By adding the proposed FEC scheme, the clock rate of the VCSEL can be increased from 31 MHz for the uncoded channel to 560 MHz, while keeping the PER below  $10^{-15}$ . When the code rate (5/15) is considered, the effective data rate increases from 31 Mb/s to 187 Mb/s, an increase by a factor of six.
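Written out, the effective-rate arithmetic for the (15,5,3) code is:

$$D_{\text{eff}} = f_{\text{clock}} \cdot \frac{n}{k} = 560\ \text{MHz} \times \frac{5}{15} \approx 187\ \text{Mb/s}, \qquad \frac{187\ \text{Mb/s}}{31\ \text{Mb/s}} \approx 6.0.$$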

## IV. ARQ ALONE

An analysis of the effective bandwidth, assuming a binary symmetric channel (BSC) with ARQ alone, is now derived. Packets are encoded with an ED code at the transmitter. A similar scheme was considered in [7]. At the receiver, a packet retransmission is requested when the received packet has a detectable error pattern. Let R be a random variable denoting the number of retransmissions required for a packet to be accepted by the receiver. Assume packet retransmissions are random and independent events, and let a packet retransmission event occur with probability $P_r$. The expected number of retransmissions required for a packet to be accepted at the receiver, including the first transmission attempt, is given by the following, where  $P_r = 1 - (p_c + p_u)$ :

$$E[R] = \sum_{k=1}^{\infty} k(P_r)^{k-1} (1 - P_r) = \frac{1}{1 - P_r}. \tag{6}$$

Fig. 6. PER versus frequency for FEC optical link. (a) 7- and 15-bit BCH codes. (b) 31-bit BCH code and 23-bit Golay code. Frequency on x-axis is in multiples of 10 MHz.
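Equation (6) is the mean of a geometric distribution. A quick Monte Carlo check, assuming independent attempts with a fixed retransmission probability (the value 0.3 is arbitrary, chosen only to make the check fast; the function name is illustrative):

```python
import random

def attempts_until_accepted(p_r: float, rng: random.Random) -> int:
    """Number of transmissions until one is accepted; each attempt independently
    needs a retransmission with probability p_r."""
    attempts = 1
    while rng.random() < p_r:
        attempts += 1
    return attempts

rng = random.Random(2003)      # fixed seed so the run is reproducible
p_r, trials = 0.3, 100_000
mean = sum(attempts_until_accepted(p_r, rng) for _ in range(trials)) / trials
print(f"simulated E[R] = {mean:.3f}, eq. (6) gives {1 / (1 - p_r):.3f}")
```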

Let CRC denote the number of CRC check bits. The packet size increases from 512 to 512 + CRC bits to accommodate the CRC. A packet is accepted without errors in one transmission attempt with probability  $P_c = B(512 + \text{CRC}, 0, P_{be})$ . Given that a packet experiences one or more errors, assume the CRC admits undetected errors with a probability of  $2^{-\text{CRC}}$ . Hence, with ARQ only, the expected number of retransmissions per packet (including the first attempt) is given by

$$E[R] = \frac{1}{B(512 + \text{CRC}, 0, P_{be}) + 2^{-\text{CRC}}\left(1 - B(512 + \text{CRC}, 0, P_{be})\right)}. \tag{7}$$

According to [11], the efficiency of a selective repeat ARQ protocol in a BSC, assuming a (k, n, t) linear code is used for ED, is given by

$$\eta_{SR} = (1 - P_r)\frac{n}{k} \tag{8}$$

which assumes that the reverse channel used to transmit the NACKs is error free, which is a reasonable approximation. In practice, the reverse ACKs/NACKs are usually piggybacked onto packets traveling in the reverse direction, consuming a negligible amount of bandwidth. Equation (8) also assumes that the transmitter and receiver have sufficient buffering capacity so that buffer overflow and a halt of transmissions never occur.

According to (8), the channel efficiency decreases as the expected number of packet retransmissions increases, and as the code rate (n/k) decreases. However, (8) is misleading since it does not reflect the bandwidth increase that can be obtained by increasing the clock rate, in view of the reduction in the PER after the addition of the ARQ protocol. Equation (8) is therefore adapted to explicitly include the clock rate of the data link, which is a variable, to yield an absolute data rate in bits per second rather than an efficiency

$$D = f_{\rm clock} (1 - P_r) \frac{n}{k}. \tag{9}$$

The clock rate in (9) is the rate at which encoded bits are transmitted, to be distinguished from the effective data rate D. In our BSC with a CRC ED scheme, $1 - P_r$ is the denominator of (7) and the code rate is $n/k = 512/(512 + \text{CRC})$. Therefore, the effective data rate after the ARQ protocol is given by

$$D = f_{\text{clock}} \cdot \left[ B(512 + \text{CRC}, 0, P_{be}) + 2^{-\text{CRC}} \left( 1 - B(512 + \text{CRC}, 0, P_{be}) \right) \right] \cdot \frac{n}{k}. \tag{10}$$

## A. Results-ARQ Alone

Fig. 7(a) illustrates the effective data rate versus the clock frequency f according to (9). Results are shown for three different CRC checks: 32-, 48-, and 64-bit CRC checks. Observe that at low clock rates the BER is low, so there is a substantial advantage in increasing the clock rate, and the net effect is a data rate increase. Beyond a certain frequency, the BER increases, causing a large number of packet retransmissions, which results in a reduction in throughput. The maximum effective data rate can be read off the y axis of Fig. 7(a) and is 1.1 Gb/s for all three CRC codes. The PER after ARQ at this clock rate can be determined from Fig. 7(b). The overheads associated with the CRC codes are relatively small given the 512-bit packet size, and the choice of CRC code basically affects only the PER. Analysis indicates that a 32-bit CRC reaches the PER limit at a clock rate of 200 MHz, at which point the data rate is effectively 200 Mb/s. Using a 32-bit CRC, the data rate increase is approximately 6.5, comparable to the use of FEC alone. However, the use of a 48- or 64-bit CRC allows much lower PERs to be achieved at higher frequencies. A 64-bit CRC is assumed. At a data rate of 1.1 Gb/s, the PER is approximately  $10^{-20}$, well below the design threshold of  $10^{-15}$ . The maximum data rate, while ensuring PER  $\leq 10^{-15}$  after ARQ, is determined from Fig. 7(a) to be 1.1 Gb/s.

Fig. 7. (a) Effective data rate versus frequency for ARQ optical link. (b) PER versus frequency for ARQ optical link. Frequency on x-axis is in multiples of 100 MHz.

By adding the ARQ scheme, the data rate of the VCSEL can be increased from 31 Mb/s in the uncoded channel to 1.1 Gb/s, while keeping the PER below the acceptable threshold, resulting in an increase by a factor of 35. This result is surprising; the data rate increase due to retransmissions alone is considerably larger than the increase due to FEC alone, given this choice of operating parameters (large CRC codes).
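The trade-off behind Fig. 7(a) can be explored by sweeping the effective data rate over the clock frequency. A sketch under the Section III-A link assumptions, taking the code rate as 512/(512 + CRC); the function names are illustrative, and the peak it finds need not coincide exactly with the value read from the figure, since it depends on how the code-rate factor and the curve reading are handled:

```python
import math

P0, ETA, NEP = 1e-3, 0.5, 0.3e-9   # link assumptions of Section III-A
DATA_BITS = 512

def ber_at_clock(f_hz: float) -> float:
    snr = P0 * ETA / (NEP * math.sqrt(f_hz))          # eq. (3)
    return 0.5 * math.erfc(0.354 * math.sqrt(snr))    # eq. (1)

def arq_data_rate(f_hz: float, crc: int) -> float:
    """Effective data rate in b/s: f * (1 - P_r) * n/k, with n/k = 512/(512 + CRC)."""
    m = DATA_BITS + crc
    p_clean = (1 - ber_at_clock(f_hz)) ** m           # B(512 + CRC, 0, p_be)
    p_accept = p_clean + 2.0 ** -crc * (1 - p_clean)  # 1 - P_r, denominator of (7)
    return f_hz * p_accept * DATA_BITS / m

freqs = [i * 10e6 for i in range(1, 501)]             # 10 MHz to 5 GHz in 10-MHz steps
for crc in (32, 48, 64):
    best = max(freqs, key=lambda f: arq_data_rate(f, crc))
    print(f"CRC-{crc}: peak {arq_data_rate(best, crc) / 1e9:.2f} Gb/s at {best / 1e9:.2f} GHz")
```

The peak is set almost entirely by the packet-level success probability $(1 - p_{be})^{512+\text{CRC}}$; the $2^{-\text{CRC}}$ term changes the PER but barely moves the throughput.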

## V. HYBRID FEC + ARQ SCHEME

The two previous systems are now combined: an FEC inner code is used together with a CRC outer code for ED, so that the transceiver employs both FEC and ARQ.

In many ARQ systems, to avoid accepting incorrectly decoded data, BCH (k, n, t) codes are not used to their maximum error correction capability. Typically, a (k, n, t) code is used to correct all error patterns with $\leq t'$ bit errors, where $t' < t$, and to detect error patterns with e bit errors, where $t' < e \leq t$. The codes are chosen so that error patterns with more than t bit errors are relatively rare, reducing the likelihood of decoding errors.

In this paper, the BCH codes are decoded and errors are corrected to the fullest extent; no FEC capability is reserved for error detection, at the expense of potentially admitting more undetected bit errors after decoding. Assume that events 1) and 2) discussed previously in the FEC section are equiprobable. Let $P_{cw}$ be the probability that a codeword is decoded without error, i.e., that it is received with at most t bit errors, so that $P_{cw} = \sum_{e=0}^{t} B(k, e, P_{be})$. A codeword with more than t bit errors is incorrectly decoded and accepted as error free (an undetectable error) with probability $P_{uw} = (1/2) \sum_{e=t+1}^{k} B(k, e, P_{be})$, and it is detected as erroneous with probability $P_{ew} = (1/2) \sum_{e=t+1}^{k} B(k, e, P_{be})$.
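A minimal sketch of the codeword-level probabilities defined above, assuming the equiprobable split between events 1) and 2). The function names are hypothetical. In this paper's (k, n, t) notation, k is the codeword length, so the (15,5,3) code has k = 15 and t = 3.

```python
from math import comb


def binom_pmf(n: int, e: int, p: float) -> float:
    """B(n, e, p): probability of exactly e bit errors among n bits."""
    return comb(n, e) * p**e * (1.0 - p) ** (n - e)


def codeword_probs(k: int, t: int, p_be: float) -> tuple[float, float, float]:
    """Return (P_cw, P_uw, P_ew) for a length-k BCH codeword corrected up to t errors."""
    p_cw = sum(binom_pmf(k, e, p_be) for e in range(t + 1))             # <= t errors: corrected
    p_beyond = sum(binom_pmf(k, e, p_be) for e in range(t + 1, k + 1))  # > t errors
    # Equiprobable assumption: half are miscorrected and accepted, half are detected.
    return p_cw, 0.5 * p_beyond, 0.5 * p_beyond


if __name__ == "__main__":
    p_cw, p_uw, p_ew = codeword_probs(k=15, t=3, p_be=1e-3)
    print(p_cw, p_uw, p_ew)
```

The three probabilities partition all outcomes, which is why $P_{ew}$ can also be written as $\frac{1}{2}(1 - P_{cw})$.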

A 512-bit packet uses a CRC check as the outer code, increasing its length to 512 + CRC bits. After the BCH inner code is applied, this packet is encoded into $h = \lceil (512 + \text{CRC})/n \rceil$ codewords. A packet of h codewords is correctly decoded and accepted (with no bit errors) with probability $P_c = B(h, h, P_{cw})$.

An entire packet is detected as erroneous after FEC decoding with probability $P_e = \sum_{e=1}^{h} B(h, e, P_{ew}) = 1 - B(h, 0, P_{ew})$, which reflects the event that one or more of the h BCH codewords experienced a detectable error. The packet is incorrectly decoded and accepted as error free with probability $P_u = B(h, 0, \frac{1}{2}(1 - P_{cw})) - B(h, h, P_{cw})$, i.e., no codeword is flagged as erroneous, but not all codewords are decoded correctly. After FEC decoding, the packets pass through the CRC decoder. Assume the CRC decoder admits erroneous packets with probability $2^{-\text{CRC}}$ and detects them with probability $1 - 2^{-\text{CRC}}$. The PER is, therefore, given by (11).
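The packet-level quantities and the PER expression (11) can be sketched as follows, taking $P_{cw}$ as an input (for example, from the codeword-level expressions above). The function name is hypothetical, and the 512-bit payload is the default taken from the text.

```python
from math import ceil, comb


def binom_pmf(n: int, e: int, p: float) -> float:
    """B(n, e, p): probability of exactly e events among n trials."""
    return comb(n, e) * p**e * (1.0 - p) ** (n - e)


def hybrid_packet_probs(p_cw: float, n_info: int, crc: int, payload: int = 512):
    """Return (h, P_c, P_e, P_u, PER) for the FEC inner code + CRC outer code scheme."""
    h = ceil((payload + crc) / n_info)      # BCH codewords per packet
    p_ew = 0.5 * (1.0 - p_cw)               # codeword flagged as erroneous
    p_c = binom_pmf(h, h, p_cw)             # every codeword decoded correctly
    p_none_flagged = binom_pmf(h, 0, p_ew)  # no codeword flagged
    p_e = 1.0 - p_none_flagged              # packet rejected by the FEC decoder
    p_u = p_none_flagged - p_c              # accepted by FEC but contains miscorrections
    escape = 2.0**-crc * p_u                # ...and also passes the CRC check
    per = escape / (p_c + escape)           # eq. (11)
    return h, p_c, p_e, p_u, per


if __name__ == "__main__":
    print(hybrid_packet_probs(p_cw=0.9999, n_info=5, crc=64))
```

With the (15,5,3) code and a 64-bit CRC, each packet spans $\lceil 576/5 \rceil = 116$ codewords.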

## A. Results—FEC + ARQ

To achieve a VCSEL datalink with PER $\leq 10^{-15}$ given 1 mW of signal power, the clock frequency at which the effective data rate peaks can be determined from Fig. 8(a) to be greater than 5 GHz for the 7- and 15-bit FEC codes. From Fig. 8(a), the maximum data rate varies between 2.5 and 4.5 Gb/s, depending upon the inner code. From Fig. 8(b), the maximum clock rate that still achieves a PER of $10^{-15}$ is greater than 10 GHz for most codes. Selecting the (15,5,3) code, the maximum data rate while ensuring PER $\leq 10^{-15}$ is 4.5 Gb/s.

The data rate has increased from 31 Mb/s for the uncoded channel to 4.5 Gb/s for the FEC+ARQ channel, an increase by a factor of about 145. We reiterate that this result is based upon the same model assumptions as the results in [1], [7], and [9], where systems are thermal noise limited. In practice, the actual gains will be limited by other noise sources. For very short optical links, spanning tens of meters, the effects of modal noise and ISI should be small enough to enable substantial data rate increases. Nevertheless, experimental validation will be required.



Fig. 8. (a) Effective data rate versus frequency for FEC+ARQ optical link, for 7- and 15-bit BCH codes with 64-bit CRC. (b) PER versus frequency for FEC+ARQ optical link. Frequency on x-axis is in multiples of 100 MHz.

Additional results for 31-bit BCH codes and the Golay code with a 64-bit CRC are shown in Fig. 9.

## B. Coding/Power Gain

The coding gain of a forward error-correcting code can be defined as the transmitter power increase required to achieve a specific BER on an uncoded channel, compared with a coded channel [11]. The SNR required to yield a PER of $10^{-15}$ on an uncoded channel can be determined from (3) and Fig. 3 to be 300. At a data rate of 4.5 Gb/s (from Section V-A), the noise power is 20 $\mu$W. To achieve an SNR of 300, the signal power must therefore equal $300 \times 20\,\mu$W = 6 mW. Given the assumed 3-dB power loss on the optical fiber, the initial VCSEL signal power on an uncoded channel must be 12 mW. Compared with the 1 mW used with the FEC+ARQ scheme, this corresponds to a coding gain of a factor of 12, i.e., an effective transmitter power increase of 10.8 dB.
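The coding-gain arithmetic can be reproduced in a few lines. This sketch converts the 3-dB fiber loss with $10^{0.3} \approx 1.995$ rather than exactly 2, so the uncoded launch power comes out at about 11.97 mW; the variable names are illustrative.

```python
from math import log10

SNR_REQUIRED = 300        # uncoded SNR for PER <= 1e-15, from (3) and Fig. 3
NOISE_POWER_W = 20e-6     # receiver noise power at 4.5 Gb/s
FIBER_LOSS_DB = 3.0       # assumed fiber loss
CODED_LAUNCH_W = 1e-3     # VCSEL power used with the FEC+ARQ scheme

received_w = SNR_REQUIRED * NOISE_POWER_W                   # 6 mW needed at the receiver
uncoded_launch_w = received_w * 10 ** (FIBER_LOSS_DB / 10)  # power the VCSEL must launch
gain = uncoded_launch_w / CODED_LAUNCH_W
gain_db = 10 * log10(gain)

print(f"uncoded launch: {uncoded_launch_w * 1e3:.2f} mW, gain: {gain:.1f}x = {gain_db:.1f} dB")
```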



Fig. 9. (a) Effective data rate versus frequency for FEC+ARQ optical link, for 31-bit BCH codes and Golay code with 64-bit CRC. (b) PER versus frequency for FEC+ARQ optical link. Frequency on x-axis is in multiples of 100 MHz.

## VI. HARDWARE COMPLEXITY

In this section, the hardware complexity of the decoder is estimated. A pipelined (15,5,3) BCH decoder is proposed in Fig. 10. The combinational logic between pipeline stages consists of four six-input majority logic gates and 28 four-input XOR gates (two levels deep).

Assume a 0.18-$\mu$m CMOS standard-cell technology, where $\lambda = 0.09\,\mu$m, and where each gate drives a standard load (one other standard-size gate). To estimate the circuit complexity, representative data for several 0.18-$\mu$m standard cells are given in Table I. For smaller technologies such as 0.13-$\mu$m CMOS, define a scaling factor $s < 1$ as the ratio of the smaller technology's $\lambda$ to 0.09 $\mu$m; the areas and delays in the smaller technology can then be estimated using the linear scaling law [16].
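A minimal sketch of the scaling step, assuming first-order constant-field scaling in which cell area scales as $s^2$ and gate delay as $s$. The 0.13-$\mu$m example assumes $\lambda = 0.065\,\mu$m, and the dictionary and function names are hypothetical; the cell data are those of Table I.

```python
# Table I, 0.18-um CMOS (lambda = 0.09 um): name -> (area in um^2, delay in ps)
CELLS_018 = {
    "inverter": (10.0, 30.0),
    "binary_gate": (12.0, 40.0),
    "full_adder": (65.0, 100.0),
    "quad_exor": (55.0, 180.0),
    "dff": (50.0, 100.0),
}


def scale_cells(lambda_new_um: float,
                lambda_ref_um: float = 0.09) -> dict[str, tuple[float, float]]:
    """Estimate cell area/delay in a smaller process under first-order scaling.

    Assumes area scales with s**2 and delay with s, where s = lambda_new / lambda_ref.
    """
    s = lambda_new_um / lambda_ref_um
    return {name: (area * s**2, delay * s) for name, (area, delay) in CELLS_018.items()}


if __name__ == "__main__":
    for name, (area, delay) in scale_cells(0.065).items():  # hypothetical 0.13-um process
        print(f"{name:12s} {area:6.1f} um^2 {delay:6.1f} ps")
```

Wiring delay, which this section later treats as equal to cell delay, does not necessarily scale the same way, so scaled estimates are best treated as optimistic.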

A full-adder (FA) cell has a cost of 65 $\mu$m<sup>2</sup> and a delay of 100 ps. A six-input majority logic gate can be made with 2 FAs and 4 binary logic gates, for a cost of 178 $\mu$m<sup>2</sup> and a delay of approximately 180 ps. The combinational logic for one pipeline stage occupies approximately $4 \times 178 + 28 \times 55 = 2252\,\mu$m<sup>2</sup> and has a critical path delay of 540 ps (one majority gate followed by two levels of XOR gates, $180 + 2 \times 180$ ps). The area for 15 DFFs is 750 $\mu$m<sup>2</sup>. The area for the entire FEC decoder is approximately 15 stages $\times (2252 + 750) = 45\,030\,\mu$m<sup>2</sup>, or about 4.5% of 1 mm<sup>2</sup> of a silicon integrated circuit. For comparison purposes, the area of a standard metal bond pad used to connect a wire to a silicon IC is $150 \times 100 = 15\,000\,\mu$m<sup>2</sup>, so the proposed decoder occupies about 3× the area of a typical electrical bond pad.

$$\text{PER} = \frac{2^{-\text{CRC}} \Big( B(h, 0, \frac{1}{2}(1 - P_{cw})) - B(h, h, P_{cw}) \Big)}{B(h, h, P_{cw}) + 2^{-\text{CRC}} \Big( B(h, 0, \frac{1}{2}(1 - P_{cw})) - B(h, h, P_{cw}) \Big)}.$$
(11)

Fig. 10. (a) Bit-serial decoder for BCH codes. (b) Proposed pipelined decoder for BCH codes.

TABLE I DATA FOR SEVERAL LOGIC GATES IN 0.18-$\mu$m CMOS TECHNOLOGY

| Gate (0.18-µm CMOS) | Area (µm<sup>2</sup>) | Delay (ps) |
|---------------------|-----------------------|------------|
| Inverter            | 10                    | 30         |
| Binary gate         | 12                    | 40         |
| Full Adder          | 65                    | 100        |
| Quad EXOR           | 55                    | 180        |
| D Flip Flop         | 50                    | 100        |
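The area estimate can be reproduced from the Table I cell data. This sketch (function name hypothetical) also takes the number of pipeline stages each logic section is split into, which anticipates the deeper pipelines discussed later in this section; with `splits=1` it gives the 15-stage figure of 45 030 $\mu$m<sup>2</sup>.

```python
FA_UM2, GATE_UM2, XOR4_UM2, DFF_UM2 = 65, 12, 55, 50   # cell areas from Table I


def decoder_area_um2(sections: int = 15, splits: int = 1, width: int = 15) -> int:
    """Estimated area of the pipelined (15,5,3) BCH decoder in um^2.

    Each of the `sections` logic sections holds four six-input majority gates
    (2 FAs + 4 binary gates each) and 28 four-input XOR gates. Each section is
    divided into `splits` pipeline stages, each latched by `width` DFFs.
    """
    majority = 2 * FA_UM2 + 4 * GATE_UM2      # 178 um^2
    logic = 4 * majority + 28 * XOR4_UM2      # 2252 um^2 per section
    latches = splits * width * DFF_UM2        # 750 um^2 per rank of DFFs
    return sections * (logic + latches)


if __name__ == "__main__":
    for splits in (1, 2, 3):
        area = decoder_area_um2(splits=splits)
        print(f"{15 * splits}-stage pipeline: {area} um^2 ({area / 1e6:.2%} of 1 mm^2)")
```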

The critical path delay due to combinational logic cells only is approximately 540 ps. The DFF overhead adds another 100 ps. In a 0.18-$\mu$m CMOS process, the wiring delay can match or exceed the cell delay; throughout this section, assume that the wiring delay equals the cell delay in these highly localized decoder circuits. Therefore, an additional 640 ps is allocated for wire delays, for a clock period of 1280 ps, and the maximum clock rate can be estimated to be approximately 0.78 GHz. Since the decoder accepts 15 coded bits and produces 5 data bits per clock, this 15-stage decoder pipeline has input and output data rates of approximately 11.7 and 3.9 Gb/s, respectively.

TABLE II DELAYS FOR UNCODED AND ENCODED CHANNELS

|         | Encoding time (ns) | Transmission time (ns) | Time of flight (ns) | Decoding time (ns) | Total time (ns) |
|---------|--------------------|------------------------|---------------------|--------------------|-----------------|
| Uncoded | 0                  | 1677                   | 50                  | 0                  | 1727            |
| FEC     | 17                 | 276                    | 50                  | 17                 | 360             |
| FEC+ARQ | 25                 | 14                     | 50                  | 25                 | 123 (average)   |

An extra pipeline stage can be added to partition each combinational logic section into two stages. The added hardware overhead is approximately 15 stages $\times$ 15 DFFs $\times$ 50 $\mu$m<sup>2</sup> = $11\,250\,\mu$m<sup>2</sup>, i.e., the area required by the decoder increases by about 25% to $56\,280\,\mu$m<sup>2</sup>. However, the critical path is reduced to approximately 740 ps (270 ps of logic plus 100 ps of DFF overhead, doubled for wiring), and the maximum clock rate is now approximately 1.35 GHz. This 30-stage decoder pipeline has input and output data rates of approximately 20 and 6.75 Gb/s, respectively. Adding another 15 pipeline stages yields a 45-stage pipeline with an area of $67\,530\,\mu$m<sup>2</sup> and a clock rate of approximately 1.8 GHz. This 45-stage decoder pipeline occupies approximately 7% of 1 mm<sup>2</sup> of silicon and has input and output data rates of 27 and 9 Gb/s, respectively. Using several decoders in parallel (about 14.8 decoder areas per square millimeter), each square millimeter of silicon can decode approximately 400 Gb/s of coded data, yielding about 133 Gb/s of decoded data.
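The clock-rate and throughput estimates for the 15-, 30-, and 45-stage pipelines follow from the same assumptions: 540 ps of logic per section split evenly across stages, 100 ps of DFF overhead per stage, and a wiring allowance equal to the cell delay. In this sketch (names hypothetical), the areas are the estimates given in the text; under these assumptions the 45-stage decoder works out to roughly 400 Gb/s of coded input per square millimeter.

```python
LOGIC_PS = 540.0    # one majority gate (180 ps) + two XOR levels (2 x 180 ps)
DFF_PS = 100.0      # DFF overhead per pipeline stage
AREA_UM2 = {1: 45_030, 2: 56_280, 3: 67_530}   # decoder area for 15/30/45 stages


def pipeline_stats(splits: int, bits_in: int = 15, bits_out: int = 5) -> dict[str, float]:
    """Clock rate and throughput when each logic section is cut into `splits` stages."""
    cell_ps = LOGIC_PS / splits + DFF_PS
    period_ps = 2.0 * cell_ps               # wiring delay assumed equal to cell delay
    f_ghz = 1e3 / period_ps
    decoders_per_mm2 = 1e6 / AREA_UM2[splits]
    return {
        "f_ghz": f_ghz,
        "in_gbps": bits_in * f_ghz,          # coded bits accepted per second
        "out_gbps": bits_out * f_ghz,        # decoded data bits produced per second
        "in_gbps_per_mm2": decoders_per_mm2 * bits_in * f_ghz,
        "out_gbps_per_mm2": decoders_per_mm2 * bits_out * f_ghz,
    }


if __name__ == "__main__":
    for splits in (1, 2, 3):
        print(15 * splits, {k: round(v, 2) for k, v in pipeline_stats(splits).items()})
```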

Synchronization is a challenging problem in high-speed design. In the proposed transceiver, to eliminate a clock-and-data recovery circuit for each optical channel, a high-speed clock can be transmitted on a separate optical channel and shared among several physically adjacent data channels. If the received clock signal is of high quality, it can be processed through a delay-locked loop (DLL) and then used to sample the data channels. If the received clock signal is of low quality, the transmitter power for the clock channel can be increased relative to the data channels, after which the clock can be processed with a DLL. Alternatively, the received clock can be processed through a phase-locked loop (PLL) to yield a sampling clock signal with reduced jitter. A detailed analysis of synchronization in high-speed parallel optical channels is, however, an area for further research.

The delay of the optical datalink can be estimated; the results are shown in Table II. Consider the transmission of 512-bit packets over the uncoded channel with PER $\leq 10^{-15}$. The clock rate is constrained to be 31 MHz and, assuming a 10-bit-wide channel, the transmission time is 52 clock periods, or 1677 ns. Assuming a 10-m fiber, the time of flight (TOF) is approximately 50 ns. The packet is reassembled at the receiver in approximately 1727 ns. Using the (15,5,3) FEC scheme, the data rate is increased to 187 Mb/s at a clock rate of 560 MHz. Assume the FEC encoder requires 30 stages at 1.8 GHz (similar to the decoder). The transmission time is 154 clock periods, or 276 ns. The TOF is unchanged. The pipelined 30-stage decoder requires 30 clock periods (at 1.8 GHz), or 17 ns. The packet is reassembled at the receiver in 360 ns. Using the combined FEC+ARQ scheme, the peak data rate is 4.5 Gb/s using the (15,5,3) code. However, to lower the retransmission rate, assume a data rate of 4.0 Gb/s at a clock rate of approximately 12 GHz. The transmission time is 160 clock periods, or 14 ns. The TOF remains unchanged, and the decoding time using the 30-stage FEC decoder is 17 ns. The CRC decoder requires approximately 16 stages at 2 GHz, for a delay of 8 ns. Assume the encoders incur the same delay as the decoders. The packet is reassembled at the receiver in 114 ns when no retransmissions are required. The FEC+ARQ system has an expected number of transmissions of 1.02 at 4.5 Gb/s (and approximately one at 4 Gb/s). Each retransmission adds a delay of approximately $4 \times 114$ ns. Conservatively using 1.02 transmissions, the weighted average total time is, therefore, approximately $114 + 0.02 \times 4 \times 114 \approx 123$ ns. The FEC-only scheme is about 4.8 times faster than the uncoded system, and the FEC+ARQ scheme is about 2.9 times faster than the FEC-only scheme.
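The Table II entries can be recomputed from the clock counts and rates quoted above. This is a sketch under the text's assumptions (50-ns TOF, encoders as slow as decoders, each retransmission adding about 4× the single-pass delay, and 1.02 expected transmissions); the helper name `ns` is hypothetical. Because it does not round intermediate values, the results land within a few nanoseconds of Table II.

```python
def ns(periods: float, f_ghz: float) -> float:
    """Duration in ns of `periods` clock cycles at `f_ghz` GHz."""
    return periods / f_ghz


TOF_NS = 50.0   # ~10 m of fiber

# Uncoded: 52 periods at 31 MHz, no coding delay.
uncoded = ns(52, 0.031) + TOF_NS

# FEC only: 30-stage encoder and decoder at 1.8 GHz, 154 periods at 560 MHz.
fec_codec = ns(30, 1.8)
fec = fec_codec + ns(154, 0.56) + TOF_NS + fec_codec

# FEC + ARQ: FEC pipeline plus 16-stage CRC pipeline at 2 GHz, 160 periods at 12 GHz.
arq_codec = ns(30, 1.8) + ns(16, 2.0)
arq_single = arq_codec + ns(160, 12.0) + TOF_NS + arq_codec
expected_tx = 1.02
arq_average = arq_single + (expected_tx - 1.0) * 4 * arq_single

rows = [("uncoded", uncoded), ("FEC", fec),
        ("FEC+ARQ single", arq_single), ("FEC+ARQ average", arq_average)]
for label, t in rows:
    print(f"{label:16s} {t:8.1f} ns")
```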

It should be noted that the pipelined BCH decoder in Fig. 10 has not been constructed; its performance should therefore be viewed as an estimate until the design is validated. Furthermore, a bidirectional 2-D optical transceiver as shown in Fig. 1, including ED logic, FEC logic, and timing synchronization operating on terabits of optical data, presents significant engineering challenges. An optical system using the proposed transceiver design would be considerably more complex than the system built by a multidisciplinary research team funded by the Canadian Networks of Centers of Excellence program, with the participation of four universities, four companies, and a budget in the range of \$10 million Canadian over a period of 10 years [8]. In this author's experience, securing funding and embarking on the design of such a transceiver in a multidisciplinary environment with numerous technical challenges requires a compelling argument, which this paper attempts to present.

## VII. CONCLUSION

Mathematical analyses indicate that the performance of short-distance multimode fiber optical datalinks can be improved substantially by a hybrid error control system using both FEC and ARQ retransmissions. The design of a bidirectional optical transceiver exploiting a two-level coding scheme, with small FEC block codes for the inner code combined with a CRC error check for the outer code, was proposed. Assuming a thermal noise limited system as in [1], [7], and [9], the transceiver allows significant increases in the data rate or distance span of the link. The use of FEC alone or ARQ alone provides moderate coding gains, while the combination of FEC and ARQ provides a substantial coding gain, approximately equal to the product of the individual gains. Of the BCH codes, the (15,7,2), (15,5,3), and (31,16,3) codes provide very good performance at moderate hardware cost. The proposed pipelined decoder for the BCH codes achieves estimated throughputs of several hundred gigabits per second per square millimeter of silicon area (using 0.18-$\mu$m CMOS technology), exceptionally high compared with traditional RS decoders. The transceiver should be scalable to the multi-terabit-per-second range. The proposed transceiver is applicable to short-distance networks including ethernet, fiberchannel, and VSR networks. The transceivers are currently being designed at McMaster University.

## References

- [1] M. A. Neifeld and R. K. Kostuk, "Error correction for free-space optical interconnects: Space-time resource optimization," *Appl. Opt.*, vol. 37, no. 2, pp. 296–307, 1998.
- [2] M. A. Neifeld and S. K. Sridharan, "Parallel error correction using spectral Reed–Solomon code," *J. Opt. Commun.*, vol. 17, pp. 525–531, 1997.
- [3] International Telecommunication Union (ITU), *Forward Error Correction for Submarine Systems*, ITU Standard, Series G, G.975, Nov. 2000 (prepublished recommendation).
- [4] Vitesse Semiconductor Corp., *Advance Product Information VSC921-2.488 Gbps SONET/SDH FEC Encoder and Decoder Chip Set*, California 93012, 2000.
- [5] K. Azadet, E. F. Haratsch, H. Kim, F. Saibi, J. Sanders, M. Shaffer, L. Song, and M. Yu, "Equalization and FEC techniques for optical transceivers," *IEEE J. Solid-State Circuits*, vol. 37, pp. 317–327, Mar. 2002.
- [6] T. H. Szymanski and V. Tyan, "Error and flow control in an intelligent optical backplanes," *IEEE J. Select. Topics Quantum Electron.*, vol. 5, pp. 339–352, Mar.–Apr. 1999.
- [7] T. H. Szymanski, "Bandwidth optimization of optical datalinks using error control codes," *Appl. Opt. Inform. Process.*, vol. 39, no. 11, pp. 1761–1775, 2000.
- [8] A. G. Kirk, D. V. Plant, T. H. Szymanski, Z. G. Vranesic, F. A. P. Tooley, D. Rolston, M. Ayliffe, F. Lacroix, B. Robertson, E. Bernier, D. Brosseau, F. Michael, and E. Chuah, "Design and implementation of a modulator-based multistage free-space optical backplane for multiprocessor applications," *Appl. Opt. Inform. Process.*, vol. 42, no. 14, pp. 2465–2481, 2003, to be published.
- [9] J. Faucher, M. B. Venditti, and D. V. Plant, "Application of parallel forward-error correction in 2-D optical data links," *J. Lightwave Technol.*, vol. 21, pp. 466–475, Feb. 2003.
- [10] J. C. Palais, *Fiber Optic Communications*. Englewood Cliffs, NJ: Prentice-Hall, 1984.
- [11] S. Lin and D. J. Costello, *Error Control Coding*. Englewood Cliffs, NJ: Prentice-Hall, 1983.
- [12] D. Bertsekas and R. Gallager, *Data Networks*, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1992.
- [13] A. Yariv, *Optical Electronics*, 4th ed. New York: Holt, Rinehart, and Winston, 1991.
- [14] G. P. Agrawal, *Fiber-Optic Communication Systems*, 2nd ed. New York: Wiley, 1997.
- [15] T. K. Woodward, A. U. Krishnamoorthy, K. W. Goosen, J. A. Walker, J. E. Cunningham, W. Y. Jan, M. F. Chirovski, S. P. Hui, B. Tseung, D. Kossives, D. Dahringer, D. Bacon, and R. E. Leibenguth, "Clock-senseamplifier based smart-pixel optical receivers," *IEEE Photon. Technol. Lett.*, vol. 8, pp. 1067–1069, Aug. 1996.
- [16] N. Weste and K. Eshraghian, *Principles of CMOS VLSI Design: A Systems Perspective*. Reading, MA: Addison-Wesley, 1994.



**Ted Szymanski** is the L. R. Wilson/Bell Canada Enterprises Chair in Data Communications at McMaster University, Hamilton, ON, Canada. He is well known for his work on "intelligent optical systems," which merge the computational power of CMOS logic with the bandwidth of integrated laser diode and photodetector arrays within a single monolithic package. His research team designed several integrated field-programmable optoelectronic devices, which were fabricated through the Lucent/ARPA/COOP program on optoelectronic technologies and which appeared on the front cover of the journal *Applied Optics* twice, in January 1998 and April 2000. These devices demonstrated an unprecedented level of programmability in integrated optoelectronic systems. He was the principal architect of a ten-year research program on Photonic Systems, funded by the Networks of Centers of Excellence program of the government of Canada. This research program included a strong blend of industrial and academic collaborators, including Nortel Networks, Lucent Technologies, Lockheed-Martin/Sanders, McGill University, McMaster University, the University of Toronto, and Heriot-Watt University in the U.K. This internationally visible program resulted in innovations at several levels, including circuits, packaging, and systems, and culminated in the development of an "intelligent optical backplane" with 512 optical channels, as described in Katz et al. (Appl. Opt. Inform. Process. 2003). He is the Associate Chairman of Undergraduate Studies within the Department of Electrical and Computer Engineering at McMaster University. He is a coholder of a U.S. patent on "intelligent optical interconnects" exploiting integrated optoelectronic technologies, along with Prof. S. Hinten. He has presented numerous invited talks at international conferences and research institutes. He has written several invited book chapters on the topic of intelligent optical systems, and several of his papers have been reprinted in IEEE textbooks.

Prof. Szymanski has served on the technical program committees of numerous international conferences on optical systems.