## The Design and Characterisation of an Optical VLSI Processor for Real Time Centroid Detection

BOON HEAN PUI,<sup>1\*</sup> BARRIE HAYES-GILL,<sup>1</sup> MATT CLARK,<sup>2</sup> MIKE SOMEKH,<sup>2</sup> CHUNG SEE,<sup>2</sup> JEAN-FRANÇOIS PIÉRI,<sup>1</sup> STEVE P. MORGAN,<sup>2</sup> ALAN NG<sup>2</sup>

<sup>1</sup>VLSI Design Group, School of Electrical and Electronic Engineering, University of Nottingham, University Park, Nottingham NG7 2RD, UK

<sup>2</sup>Optical Engineering Group, School of Electrical and Electronic Engineering, University of Nottingham, University Park, Nottingham NG7 2RD, UK

Received May 10, 2001; Accepted September 14, 2001

**Abstract.** The integration of photo-detectors onto a standard CMOS integrated circuit is presented. This device provides the optical front end for a real time centroid detection system to be used as part of a larger system for implementing a Shack-Hartmann wavefront sensor. A hardware emulation system containing a Field Programmable Gate Array is used to prototype suitable algorithms prior to IC fabrication. Data is presented on the performance of the photodetectors and on the ability to extract a centroid coordinate in real time.

Key Words: centroid detection, wavefront sensor, photodiodes, active pixel, FPGA, CMOS IC, integrated detector

## 1. Introduction

The fabrication of photodetectors integrated with complex analogue and digital circuitry on a single integrated circuit presents numerous applications and opportunities to the optical design engineer. Such integration provides potential for improvement in speed, cost and performance over traditional systems. These traditional systems require data to be transmitted from the optical sensor (e.g., a CCD) to a host computer by means of an analogue video line, an analogue-to-digital converter (ADC) and a frame memory, and hence are invariably slow and costly. A CMOS sensor, on the other hand, allows random access to pixel regions of interest.
This facilitates windowed and scanning readouts that can increase the frame rate at the expense of reduced resolution. The strength of CMOS as compared to CCD technology is not in sensor cost itself, but in the higher level of integration offered by CMOS, which allows a reduction in the number of external chips used in the system and therefore a reduction in overall system cost. This additional on-chip processing thus offers potential for enhanced performance in the chosen application. A CCD imaging system is more suited to high-resolution or colour imaging where cost is less important but image quality is paramount. With their large frame sizes and pixel counts, CCDs are not able to achieve frame rates in the high kilohertz or megahertz range and hence are not able to operate in real time in systems requiring such high frame rates. As a result many applications would benefit from a simplification of the CCD system.

Three factors have influenced the arrival of these devices. Firstly, in the early 1990s a demand existed from the high street market for a range of image sensors where quality was not of importance but size and price were paramount. Secondly, space agencies (e.g., NASA and ESA) required an extremely low power image sensor with a low component count but *without* compromising the quality. This resulted in significant advances in CMOS image sensors and the development of the CMOS active pixel sensor (APS). Thirdly, over the past ten years CMOS technology in general has advanced, faithfully following Moore's law.

The integrated CMOS optical processor is thus an extremely powerful device that has yet to be fully utilised by the optical design engineer. Although a plethora of applications exist, the specific application that this paper addresses is the centroid detection of incoming light to be used as a wavefront sensor.

<sup>\*</sup>Address correspondence to: Tel.: +44-1159515554; Fax: +44-1159515616. E-mail: eexbhp@nottingham.ac.uk

## 2. Centroid Detection

Aberration caused by the turbulence in the atmosphere can degrade the imaging property of the light being detected by distorting the incoming wavefront. A Shack-Hartmann wavefront sensor [1] uses an array of small lenslets to sample the optical wavefront. Local wavefront tilts are measured by detecting the deviation of the focussed spots from reference positions. These tilts are referred to as the *centroid* position. Traditional systems use a single CCD to sample the entire wavefront, resulting in a data bottleneck. In our system, each lenslet will focus only a local portion of the wavefront onto a tilt sensor consisting of a small number of optical detectors, analogue and digital electronics and local centroid processing. The array of tilt sensors will be linked to a matrix processor to reconstruct the estimate of the complete wavefront. Once calculated, the reduced bandwidth wavefront data can then be transferred off-chip. This allows a form of parallel processing to be achieved at each tilt sensor, resulting in a data rate independent of the number of tilt sensors employed. A key component in such a final system is the centroid calculation or the angle of tilt. This paper describes the design and characterisation of a real time centroid processor to be used in applications such as wavefront sensing.

## 2.1. Current Centroiding Systems

Three types of optical front-end devices (sometimes referred to as position sensitive devices) are commonly found in the literature for the detection of a centroid. These are: lateral-effect photodiodes (LEP), sometimes called position sensitive detectors (PSD) [2]; quad cells or quadrant detectors [3]; and finally multi-pixel arrays. Conventional LEPs have high linearity but require a very uniform resistive layer with large sheet resistance, which is not generally available with a standard CMOS process.
Linearity may also be affected by contact strips with finite conductivity, and by imperfect isolation at the edges of the resistive sheets. Ideally, a LEP should have a double-sided (top and bottom) construction but again this is not possible with a standard CMOS process.

Quad cells, as the name suggests, consist of four photodetectors in a 2 × 2 arrangement. A typical device would consist of an array of these quad cells. These devices have simple readout schemes but are not very linear and do not have very good sub-pixel accuracy.

With a sufficient number of pixels, multi-pixel array systems have very high linearity and good sub-pixel accuracy. Current multi-pixel array systems [4-7] use an analogue current dividing method to find the centroid, which requires the use of a linear resistive array. A problem faced by these multi-pixel systems is the mismatch and poor tolerance of the polysilicon resistors in the divider line. The advantage of performing the centroid computation in analogue is in terms of speed and high functional density. However, this traditional advantage is rapidly disappearing as digital CMOS technologies are continuously downscaled, resulting in higher and higher speeds and greater functionality per unit area. Digital implementation also benefits from high noise immunity and does not suffer inaccuracies due to the poor tolerance of on-chip analogue components.

A generic optical processor, which contains an on-chip photodetector array and an internal microprocessor, has been designed by Forcheimer [8]. This excellent device, now available commercially from Integrated Vision Products, allows flexible implementation of prototype optical algorithms, but its generic nature comes at the expense of processing time. It is this multi-pixel approach, with on-chip CMOS digital processing specifically designed for centroid calculation, that is exploited in the work presented here.

## 3. Design Philosophy

It is well known that the design and fabrication of a mixed analogue and digital ASIC carries the possibility of the design falling outside of the specification, so that more than one fabrication iteration is required. Although excellent analogue simulators exist, photodiodes are not available in the library and integrating these with digital logic requires a mixed analogue and digital simulator. Due to the sensitive nature of analogue circuitry to IC process variations it is inevitable that fine-tuning of the design is required. Consequently, more than one mask iteration is required before a successfully operating circuit can be realised. As a result, the successful design of an optimised optical VLSI processing algorithm at the first mask iteration is fraught with problems and can carry a heavy cost penalty.

The design philosophy adopted for our work is therefore to use a two-stage process in which a hardware emulation system precedes ASIC fabrication. The hardware emulation system consists of a re-configurable digital device (called a Field Programmable Gate Array or FPGA) and a commercial photodiode array. Once the emulation hardware confirms the satisfactory performance of a design in its intended application, it can then be converted into a mask programmed CMOS integrated circuit. This philosophy, although conservative in its approach, attempts to reduce the number of iterations needed to achieve a successfully operating optical VLSI processor. Due to the re-programmable nature of the FPGA, the hardware emulation environment can also be used for many other optical processing algorithms prior to ASIC fabrication.

The work described in this paper explains how the optical front-end array and centroid logic have been implemented in CMOS technology prior to the complete integration of both array and centroid processing onto a single piece of silicon.
This work is one part of a larger project to implement a fully integrated CMOS Shack-Hartmann wavefront sensor. This section describes the centroid calculation, the FPGA hardware emulation system and a CMOS integrated circuit fabricated for implementing the front-end photodetector array.

## 3.1. Centroid Calculation

The centroid of an array of photo-detectors is expressed in terms of its x and y coordinates, C(x) and C(y). The values of C(x) and C(y), and hence the centroid of the array, are found from the "1st moment" equations:

$$C(x) = \frac{\sum_n r_{xn} I_n}{\sum_n I_n}, \qquad C(y) = \frac{\sum_n r_{yn} I_n}{\sum_n I_n} \tag{1}$$

where the sums run over all photo-detectors $n$ and

- $r_{xn}$ is the displacement of each photo-detector from the origin of the array in the x-direction,
- $r_{yn}$ is the displacement of each photo-detector from the origin of the array in the y-direction,
- $I_n$ is the light (current) level of each photodetector.

For example, consider the $4 \times 4$ shaded photo-detector array shown in Fig. 1 with arbitrary light intensity as shown by the decimal numbers 3, 4, 5, 6 and 7. With these given light levels a centroid position of C(x) = 2.53 and C(y) = 2.68 would occur.

Fig. 1. Schematic of photodiode array illustrating light levels for centroid calculation of C(x) = 2.53 and C(y) = 2.68.

If the light levels are represented digitally then these centroid moments can be implemented using the block diagram shown in Fig. 2 for the x-coordinate and another duplicate block (not shown) for the y-coordinate.

Fig. 2. Block diagram of centroid processor in the x-direction.

Photocurrent data is clocked in sequentially from each photo-detector and multiplied by a counter (Mod $N^{1/2}$, where N is the number of pixel elements in the array) that holds the position of the detector relative to the reference point in the x-direction. The Mod $N^{1/2}$ counter represents this relative position (in the x-direction) from the reference point $(r_{xn})$.
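The first-moment calculation of Eq. (1), combined with the sequential readout and Mod $N^{1/2}$ position counter described above, can be sketched in software as follows. This is a minimal illustration rather than the VHDL used in this work: the function name `first_moment_centroid`, the row-by-row readout order, the choice of positions counted from 1, and the test frame are all assumptions made for the example, and the frame is not the one shown in Fig. 1.

```python
def first_moment_centroid(pixels, side):
    """Return (C(x), C(y)) for a side x side array read out row by row.

    `pixels` is a flat list of light levels I_n. Positions r_xn and r_yn
    are counted 1..side from the array origin (an assumed convention).
    """
    if len(pixels) != side * side:
        raise ValueError("expected side * side pixel values")

    sum_x = 0   # accumulates r_xn * I_n
    sum_y = 0   # accumulates r_yn * I_n
    total = 0   # accumulates I_n (the parallel accumulator)

    for n, level in enumerate(pixels):
        r_x = n % side + 1    # Mod sqrt(N) counter: column position
        r_y = n // side + 1   # row position, advances once per row
        sum_x += r_x * level
        sum_y += r_y * level
        total += level

    if total == 0:
        raise ValueError("no light detected, centroid undefined")
    return sum_x / total, sum_y / total


# Hypothetical 4 x 4 frame with a uniform 2 x 2 spot in the middle.
frame = [0, 0, 0, 0,
         0, 1, 1, 0,
         0, 1, 1, 0,
         0, 0, 0, 0]
cx, cy = first_moment_centroid(frame, 4)
print(cx, cy)  # the spot is symmetric about the array centre
```

The sketch uses floating-point division for clarity, whereas a hardware divider would produce a fixed-width result.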
The output of the multiplier in Fig. 2 is continually accumulated via an adder block and the result is divided by the total photocurrent acquired via a separate and parallel running accumulator. The result of the division is a 7 bit digital representation of the x centroid coordinate. A second centroid processing block calculates, in parallel, the y-coordinate centroid.

## 3.2. Hardware Emulation System

The hardware emulation system allows optical processing algorithms to be debugged prior to CMOS foundry fabrication. This system, shown in Fig. 3, consists of two printed circuit boards: a main motherboard and a smaller daughter board. The motherboard contains a single channel 16 bit analogue to digital converter (ADC), a Field Programmable Gate Array (FPGA), a serial PROM for storing the FPGA configuration file, a MAX232 for a PC serial interface, LED displays for debug purposes and miscellaneous switches for initiating various test routines under user control. The second, smaller, daughter board contains an optical front end customised for the centroid application and uses a commercially available, $5 \times 5$, common cathode photodiode array from Centronics (part number MD25-5T).

Fig. 3. Block diagram of centroid emulation hardware.

The FPGA employed on the motherboard is a Xilinx Spartan series device (XCS40-3PQ208C), which contains approximately 40,000 system gates and has an erasable and hence reusable architecture. This reusable facility allows several iterations of the firmware before transferring to a mask programmable device. The FPGA is programmed using VHDL, a high level hardware description language whose designs are synthesised onto the digital logic gates of the FPGA. VHDL is a powerful language and can express complex processing and control in a few lines of code. Synthesiser software allows the VHDL design to be easily mapped to both the FPGA and the chosen CMOS IC foundry.
Due to VHDL's technology independence, migration to other CMOS foundries, as feature lengths reduce, is easily catered for.

The software used to implement the VHDL code on the FPGA is the Xilinx Foundation series 2.1i CAD tools running on a Pentium III 500 MHz PC. This software is intuitive to use and allows both pre and post layout simulation so as to check for timing problems.

The centroid is computed by multiplexing each photodiode output into a current to voltage converter on the daughter board. The current to voltage converter simply consists of an op-amp in the transimpedance mode, i.e., with a feedback resistance. An op-amp with low input bias current is necessary. The input bias current should be significantly smaller than the photocurrent that is to be converted, because the large feedback resistance will convert this input bias current into a dc offset voltage at the output of the op-amp for every pixel. Because the same offset is added to every pixel, it significantly affects the centroid algorithm by shifting the centroid position towards the centre of the array. The CMOS amplifier TLC2274 was selected because of its low input bias current of 1 pA, its rail-to-rail operation and its low noise (9 nV/$\sqrt{\text{Hz}}$). The converted voltage is then digitised by the ADC on the motherboard.

The FPGA then performs the centroid computation by implementing the block diagram of Fig. 2 in VHDL. Each centroid is updated on the FPGA every N + 5 clock cycles. Since N, the number of photo-diodes, is 25 and the clock speed is 40 kHz, a new centroid is calculated every 30 clock cycles, i.e., once every 0.75 ms (when this design is transferred to a fully integrated mixed analogue and digital process with a 40 MHz clock, for example, a centroid will be computed once every 0.75 $\mu$s). The MAX232 device on the motherboard passes the centroid data via its serial port to a PC at a baud rate of 38 kbit/s, thus allowing centroid data to be continuously logged onto disk.
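Two of the quantitative points made in this section, the centring effect of a dc offset common to every pixel and the N + 5 cycle update period, can be checked with a short sketch. The function names, the five-pixel test row and the offset value of 0.1 are hypothetical and chosen only for illustration; positions are again assumed to be counted from 1.

```python
def centroid_1d(levels):
    """First-moment centroid of one row of light levels, positions 1..len."""
    total = sum(levels)
    if total == 0:
        raise ValueError("no light detected, centroid undefined")
    return sum((i + 1) * v for i, v in enumerate(levels)) / total


def update_period(n_pixels, clock_hz, overhead_cycles=5):
    """Time per centroid in seconds, for N + overhead clock cycles."""
    return (n_pixels + overhead_cycles) / clock_hz


# Spot entirely on the last pixel of a 5-pixel row (hypothetical data).
row = [0.0, 0.0, 0.0, 0.0, 1.0]
offset = 0.1  # assumed dc offset added to every pixel

ideal = centroid_1d(row)
biased = centroid_1d([v + offset for v in row])
print(ideal, biased)  # the biased value is pulled towards the centre (3)

# Update period for N = 25 pixels: 30 cycles per centroid.
period_fpga = update_period(25, 40e3)  # 40 kHz emulation clock
period_asic = update_period(25, 40e6)  # 40 MHz integrated clock
print(period_fpga, period_asic, 1.0 / period_fpga)
```

With a 40 kHz clock the sketch gives the 0.75 ms period quoted above, which corresponds to an update rate of roughly 1.3 kHz.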
Once the emulation hardware confirms the satisfactory performance, the design can be converted into a mask programmed CMOS integrated circuit.

## 3.3. Design of on Chip CMOS Photodetectors

This work is the first phase of a three-phase design process to implement a fully integrated CMOS wavefront processor. This first phase involves the design of the CMOS based optical photodetector array. The second phase will integrate both the centroid processing and photodetector array, whilst the third and final phase will integrate several centroid processors onto a single CMOS IC so as to compute the overall wavefront. The aim of this first phase is therefore to replace the commercial Centronics array with our own CMOS photodetector array and again compute the centroid via the previously proven FPGA centroid algorithm.

In a standard CMOS process, several photodetectors exist as a result of the naturally occurring source and drain regions and N and P-type substrates. The cross-sectional difference between each of these is shown in Figs. 4(a)–(d). Figures 4(a) and (b) illustrate the shallow (diffusion-substrate or diffusion-well respectively) photodiodes. These have good spectral response at shorter wavelengths, as these wavelengths do not penetrate very far into the substrate. They also possess good substrate noise immunity due to the presence of the deep field oxide (FOX) implants. Figure 4(c) shows a deep or N-well to p-substrate photodiode. This has good responsivity due to its wide depletion region caused by the relatively low carrier concentration in the n-well and p-substrate. Since it is deep it is also able to collect the minority carriers photogenerated deep in the substrate, provided that they are generated within a diffusion length of the depletion region. The deep photodiode has good spectral response at longer wavelengths. This is due to the fact that light of longer wavelength penetrates deeper into the n-well.
Figure 4(d) shows the combination of both deep and shallow photodiodes, thus maximising the collection of both short and long wavelength photons.

Fig. 4. Four different photodiode configurations available in a standard CMOS process.

In order to characterise the CMOS process for photosensitivity, all photodiode types were included on the CMOS IC. In addition a $5 \times 5$ photo-diode array (pixel size of 100 $\mu$m square) was laid out on the same chip, where each photodetector contained both deep and shallow photodiodes as in Fig. 4(d). All structures were designed with a guard-ring around every pixel to minimize leakage current to neighbouring pixels. The CAD tools used were Mentor Graphics version C4 running on a Sun Workstation. Schematics were entered via "Design Architect" and layout was implemented using "IC station." CMOS Silicon Libraries (0.7 $\mu$m mixed analogue and digital) were provided by Mietec Alcatel via Europractice. A photomicrograph of the fabricated device is shown in Fig. 5, which clearly illustrates the $5 \times 5$ photodiode array in the centre of the chip.

Fig. 5. Photomicrograph of CMOS photodetector chip.

## 4. Results

## 4.1. Centroid Maps for Commercial Photodetector Array

As a test of the VHDL centroid algorithm the Centronics photodiode array was connected to the FPGA board and the FPGA was programmed to implement the centroid processing as previously illustrated in Fig. 2. A 20 $\mu$m diameter laser beam (a frequency-doubled YAG laser at 532 nm with approximate output power of 0.86 mW) was scanned across the array at a speed of 2000 $\mu$m/s. Centroid values were computed by the FPGA at about 1.3 kHz and serially transmitted in real time to a PC.

Fig. 6. Image map of x and y centroids for the Centronics array.

Figure 6 shows a grey scale map of the centroid values successfully recorded at each position on the array. The dark regions correspond to larger centroid coordinates whilst lighter regions correspond to small centroid coordinates.
As expected, as we scan in the x-direction, the x-centroid values increase while the y-centroid values remain constant, and vice versa. Since the laser beam size is less than the size of one pixel, a stepped appearance can be seen as the beam moves across the array, passing from one discrete detector to another.

## 4.2. CMOS Photodetector Array

The responsivity of the CMOS photodiodes was recorded under reverse bias using the frequency-doubled YAG laser at 532 nm. Responsivity values of all three diodes varied from 0.36 A/W to 0.48 A/W. This compares favourably with the Centronics photodiode value of approximately 0.4 A/W and with previously published material on CMOS photodetectors by Forcheimer [8]. Relative sensitivity was also recorded as a function of wavelength. All three photodiode structures exhibited good sensitivity over the range 450 nm to 900 nm. The shallow photodiode, as expected, continued to function at short wavelengths since it is able to capture those photons absorbed at shallower depths.

Fig. 7. Measured position vs. actual position for different beam sizes.

Fig. 8. Image map of x and y centroids for different beam sizes.

Centroid values were again calculated in real time by the FPGA with the 532 nm laser scanned across the custom made CMOS array. Figure 7 shows the y-coordinate centroid values plotted as a function of pixel position for different beam diameters. The array extends from $-250~\mu$m to $250~\mu$m. Near the edges we can see non-linearity effects as the beam falls off the edge of the array. This effect is more pronounced for larger beam sizes because these will fall off the edge first. For very small beam sizes, we obtain discrete steps in the waveform, as we would expect when the beam passes from one pixel to another. The steeper rise in centroid value occurs when the beam lands in-between two pixels. As the beam size increases the response becomes more linear in the centre of the array.
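The dependence of the response on beam size can be reproduced qualitatively with a simple one-dimensional model. The sketch below is not the measurement procedure used here: it assumes a Gaussian beam profile, a 100% fill factor with no gaps between the 100 $\mu$m pixels, noise-free detection, and hypothetical beam widths (`sigma_um` of 5 $\mu$m and 60 $\mu$m). The function names are likewise illustrative.

```python
import math

PITCH_UM = 100.0  # pixel size quoted in the text
N_PIX = 5         # pixels along one axis of the array
HALF_UM = N_PIX * PITCH_UM / 2.0  # array spans -250 um to +250 um


def pixel_signals(beam_pos_um, sigma_um):
    """Fraction of a 1-D Gaussian beam collected by each pixel in a row."""
    scale = sigma_um * math.sqrt(2.0)
    signals = []
    for k in range(N_PIX):
        lo = -HALF_UM + k * PITCH_UM   # left edge of pixel k
        hi = lo + PITCH_UM             # right edge of pixel k
        a = (lo - beam_pos_um) / scale
        b = (hi - beam_pos_um) / scale
        signals.append(0.5 * (math.erf(b) - math.erf(a)))
    return signals


def measured_position(beam_pos_um, sigma_um):
    """First-moment centroid in um, using pixel centres as positions."""
    signals = pixel_signals(beam_pos_um, sigma_um)
    total = sum(signals)
    if total == 0.0:
        raise ValueError("beam misses the array, centroid undefined")
    centres = [-HALF_UM + (k + 0.5) * PITCH_UM for k in range(N_PIX)]
    return sum(c * s for c, s in zip(centres, signals)) / total


# Compare a narrow and a wide beam at a few actual positions.
for sigma in (5.0, 60.0):
    row = [round(measured_position(x, sigma), 2) for x in (10.0, 30.0, 50.0)]
    print(sigma, row)
```

Under these assumptions the narrow beam stays locked to the pixel centre until it reaches a pixel boundary and then jumps, whereas the wide beam tracks the actual position closely in the centre of the array; near the edge the wide beam is truncated and the measured position falls short of the actual one.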
Figure 8 shows the grey scale centroid maps for both the x and y centroid coordinates obtained in real time by the CMOS array via the FPGA. Again the darker regions correspond to large centroid coordinates and vice-versa for light regions. Again we can see the stepped appearance with small beam size and a smoother appearance for larger beam size.

## 5. Conclusions

This paper demonstrates the ability of a custom-made CMOS photodiode array, together with FPGA-based processing, to compute in real time the centroid of a scanning light beam. Centroid values are computed once every 0.75 ms, which when transferred to a fully integrated mixed analogue and digital CMOS process operating at a conservative 40 MHz will scale to 0.75 $\mu$s. This rate of update of centroid position is more than adequate for the proposed Shack-Hartmann wavefront sensor. In addition, CMOS photodiodes have been designed and fabricated having sensitivities in the range 0.36–0.48 A/W, operating from 450 nm to 900 nm.

## Acknowledgments

The authors are grateful for the financial support of the University of Nottingham Alumni Fund, University of Nottingham International Office, University of Nottingham in Malaysia and the Engineering and Physical Sciences Research Council (EPSRC), UK.

## References

1. Tyson, R. K., *Principles of Adaptive Optics*. Academic Press, 1991.
2. Turner, R. M. and Johnson, K. M., "CMOS Photodetectors for correlation peak location." *IEEE Photonics Technology Letters* 6, pp. 552–554, April 1994.
3. de Lima Monteiro, D. W., Vdovin, G. and Sarro, P. M., "Integration of a Hartmann-Shack wavefront sensor," in *Proceedings of 2nd International Workshop on Adaptive Optics for Industry and Medicine*, World Scientific, pp. 215–220, 1999.
4. Deweerth, S. P., "Analog VLSI circuits for stimulus localization and centroid computation." *International Journal of Computer Vision* 8(2), pp. 191–202, 1992.
5. Standley, D. L., "An object position and orientation IC with embedded imager." *IEEE Journal of Solid-State Circuits* 26, pp. 1853–1859, December 1991.
6. Gonnason, W. R., Haslett, J. W. and Trofimenkoff, F. N., "A low cost high resolution optical position sensor." *IEEE Transactions on Instrumentation and Measurement* 39, pp. 658–663, August 1990.
7. Pain, B., Sun, E. and Yang, G., "CMOS APS with integrated centroid computation circuits." NASA Technical Brief, from JPL New Technology Report NPO-20715 24(9), September 2000.
8. Forcheimer, R., Chen, K., Svensson, C. and Odmark, A., "Single chip image sensors with a digital processor array." *Journal of VLSI Signal Processing* 5, pp. 121–131, 1993.

**Boon Hean Pui** graduated in 1999 with a first class honours degree in electrical and electronic engineering from the University of Nottingham, UK and was awarded both the Sir Basil Blackwell and IEE prizes for academic excellence. He is currently pursuing a Ph.D. research programme in optical VLSI processors funded by the University of Nottingham in UK and Malaysia (UNIM). To date he has successfully fabricated an integrated optical sensor using the $0.7~\mu$m CMOS design via Alcatel and has developed an FPGA optical prototyping system that is used for emulating new optical algorithms prior to fabrication.

**Barrie Hayes-Gill** (Ph.D. Nottingham, 1979) worked in industry as a semiconductor product engineer at Texas Instruments, AEI semiconductor devices and Marconi Electronic Devices Limited, specialising in CMOS integrated circuit design, manufacture and test. In 1982 he was appointed lecturer in the Department of Electrical and Electronic Engineering at Nottingham Trent University. He then moved to the Department of Electrical and Electronic Engineering at the University of Nottingham, UK where he has been lecturing VLSI integrated circuit design since 1986. His research work has centred around electronic systems, telecommunications and the application of semicustom and full-custom integrated circuits.
To date he has managed the design of over 30 CMOS VLSI custom integrated circuits for various commercial and research applications. He has lectured at 5 international locations on VLSI design and has been a Project Monitoring Officer on behalf of the DTI for the design, fabrication and test of HBT devices operating at 40 Gbits/s. He is a member of both the Institute of Electrical and Electronics Engineers and the Institution of Electrical Engineers, is a Chartered Engineer and Eur Ing.

**Matthew Clark** (Ph.D. Imperial College, 1997). After graduation Matthew Clark went to Imperial College to work as a research associate, constructing and automating an experiment to measure the complex refractive index of liquids. He stayed to research his Ph.D. on computer generated holography, developing a new technique for the optimization of computer generated holograms and fabricating them at the microstructure facility at Rutherford Appleton Laboratory. In 1996 he joined the School of Electrical Engineering at Nottingham University as an RA on EPSRC grants No. GR/K82369 and GR/M64505/01, developing novel diffractive elements for laser ultrasonics and an ultrastable optical profilometer. He was awarded an EPSRC Advanced Fellowship in 2000 titled "Adaptive Acoustics", looking at all aspects of acoustic wavefront aberrations, techniques to overcome the aberrations and their effects on conventional measurements. He has substantial experience in designing and optimizing optical algorithms and architectures and in microstructure design and fabrication.

**Prof. M. G. Somekh** (Ph.D. Lancaster, 1981). Mike Somekh joined Nottingham University from University College London in 1989. He was promoted to professor of optical engineering in 1995. His principal research is the development and analysis of novel optical and ultrasonic instrumentation for characterisation of tissue and materials.
In addition to many successful EPSRC funded research projects he is active in ensuring that the systems developed are properly exploited.

**Chung Wah See** (Ph.D. UCL, 1986). In his Ph.D. and Post-Doctoral period (1981–1988), he worked on a number of optical systems for high contrast and high sensitivity imaging, with surface height sensitivity considerably better than 1 Å. From October 1988 to September 1992 he worked as a senior research engineer at Rank Taylor Hobson Ltd. He joined the Department of Electrical and Electronic Engineering, University of Nottingham in October 1992 as a Lecturer, where he has initiated and conducted many research projects in the optical engineering group.

**Steve Morgan** (Ph.D. Nottingham, 1996). His Ph.D. in optical engineering investigated the use of continuous wave optical techniques for imaging and spectroscopy of scattering media, such as body tissue. This was followed by research into modelling of optical microscope systems using vector diffraction theory. In 1998 he was awarded an EPSRC Advanced Fellowship. His current research interests include biomedical optics and optical microscopy.