# A 5.26 Mflips Programmable Analogue Fuzzy Logic Controller in a Standard CMOS 2.4µ Technology

Carlos Dualibe<sup>1,2</sup>, Paul Jespers<sup>2</sup> and Michel Verleysen<sup>2</sup>

<sup>1</sup>Lab. de Microelectrónica, Universidad católica de Córdoba, Ob. Trejo 323, 5000-Córdoba, Argentina <sup>2</sup>Microelectronics Lab., Université catholique de Louvain, Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium.

#### **Abstract**

A complete digitally-programmable analogue Fuzzy Logic Controller (FLC) is presented. New functional blocks were designed and others improved with the aim of optimising speed while keeping a reasonable accuracy, as needed in several analogue Signal Processing applications.

A nine-rule, two-input, one-output prototype was fabricated in a standard 2.4 µm CMOS technology and successfully tested. The measurements agree well with the expected performance, namely: 5.26 Mflips (mega fuzzy logic inferences per second) at the pin terminals (@CL = 13 pF), 933 µW power consumption per rule (@Vdd = 5 V) and 5 to 6 bits of precision. Since the circuit is intended as a subsystem embedded in an application chip (@CL $\leq$ 5 pF), over 8 Mflips may be expected.

# 1. Introduction

In recent years, the application of Fuzzy Logic has extended beyond classical Process Control. Signal Processing appears to be another niche where this soft-computing technique can address a broad range of applications. As real-time processing requires ever faster, more autonomous and less power-hungry circuits, on-chip controllers become an attractive option.

Digital Fuzzy Logic chips provide sufficient performance for general applications, but, at an acceptable power consumption level, their speed is limited compared with that of their analogue counterparts.

In this work, a low-power analogue Fuzzy Logic controller is introduced. It is intended for embedded subsystems such as those required in medium-accuracy Analogue Signal Processing applications (e.g. channel equalisation, non-linear filtering [1]).

It has been shown that analogue current-mode FLCs [2] lend themselves to simple rule-evaluation and aggregation circuits that can work at a reasonable speed. If some of the unwanted intermediate current-to-voltage and/or voltage-to-current converters can be avoided, the delay through cascaded operators can be shortened further and higher speeds achieved. This matters particularly in the design of fuzzifier [2, 3] and defuzzifier [4] circuits, since these normally interact with a voltage-mode environment.

In addition, some building blocks can be shared without altering functionality, which reduces die area and power consumption. The resulting lower-complexity layout brings an additional gain in speed.

With all these aspects in mind, new operators were designed and others optimised, yielding a flexible, high-performance controller despite the limits imposed by the technology used for the demonstrator.

# 2. Architecture of the controller

Because it offers a good trade-off between simplicity and accuracy, a zero-order Sugeno architecture (in which the consequents are singletons) was chosen. Figure 1 shows the block diagram of a two-input, one-output controller, highlighting the three well-known basic fuzzy operations (fuzzification, rule evaluation and defuzzification), which are performed concurrently.

The MIN inference operation can also be expressed as $\mathrm{MIN}(A, B, \ldots) = 1 - \mathrm{MAX}(1-A, 1-B, \ldots)$. Therefore, for the general case of a q-input, m-rule controller, the first two operations are performed by a set of Complementary Fuzzy Membership Functions (CFMF) per input, shared by several rules, followed by m q-input MAX operators. Complementing the outputs of the MAX operators then yields the firing degree of each rule as a current signal Ii.
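The complement-MAX-complement identity can be checked numerically. The following Python sketch is a purely behavioural model, not a circuit simulation: membership degrees are normalised to [0, 1], and the label names N, Z, P and the degree values are hypothetical.

```python
from itertools import product


def fuzzy_min_via_max(*degrees):
    """MIN(A, B, ...) evaluated as 1 - MAX(1 - A, 1 - B, ...)."""
    complements = [1.0 - d for d in degrees]  # complemented membership values
    return 1.0 - max(complements)             # complement of the MAX output


# Hypothetical normalised membership degrees, one dict per input (labels N, Z, P)
mu = [{"N": 0.2, "Z": 0.7, "P": 0.0},
      {"N": 0.0, "Z": 0.4, "P": 0.6}]

# Nine rules: one per pair (label of input 1, label of input 2)
rules = list(product("NZP", repeat=2))
firing = {rule: fuzzy_min_via_max(mu[0][rule[0]], mu[1][rule[1]]) for rule in rules}
```

For example, the rule (Z, P) fires with degree 1 - MAX(0.3, 0.4) = 0.6, which equals MIN(0.7, 0.6).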

In the last stage, each current Ii is replicated (n+1) times through unit-gain mirrors, where n is the resolution of the discrete singleton value $\alpha_i$ of the rule consequents. This value is encoded by the state of the switches $C_{n-1}, \ldots, C_0$.

Finally, a common current-mode Digital-to-Analogue converter (D/A), used as a weighting operator, together with an analogue divider computes the centre of gravity (COG), yielding the defuzzified output value

$$Vo = k\,\frac{\sum_{i=1}^{m} \alpha_i\, I_i}{\sum_{i=1}^{m} I_i}, \qquad (1)$$

where k is a constant with the dimension of a voltage, set by the transfer function of the divider itself.
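As a behavioural illustration of formula (1), the sketch below computes the zero-order Sugeno centre of gravity from normalised singleton values and rule currents. The value of k and the example currents are assumptions chosen only for illustration.

```python
def cog_output(alphas, currents, k=1.0):
    """Eq. (1): Vo = k * sum(alpha_i * I_i) / sum(I_i) (zero-order Sugeno COG)."""
    den = sum(currents)
    if den <= 0:
        raise ValueError("no rule fires: the sum of the I_i must be positive")
    num = sum(a * i for a, i in zip(alphas, currents))
    return k * num / den


# Two active rules with hypothetical currents; k = 1 V is also hypothetical
vo = cog_output([0.25, 0.75], [10e-6, 30e-6], k=1.0)  # approximately 0.625
```

Because both numerator and denominator scale with the rule currents, the output depends only on their relative values, which is why the absolute value of Io does not affect the defuzzified result.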

#### 2.1 Complementary fuzzy membership functions

Low-power fuzzy controllers need relatively low transconductance values in their membership function circuits, a requirement that CMOS triode transconductors meet well. The circuit of the complementary fuzzifier is depicted in Figure 2. It is composed of two almost linear regulated-cascode transconductors (ML1, ML3, DAL and MR1, MR3, DAR), each controlling one edge of the CFMF, whose shape is nearly an inverted trapezoid. Transistors ML2, MR2 have fixed large sizes, so that their gate overdrive voltage (Vgs-VTn) can be neglected. Reference voltages VKL, VKR define the knees where the current of each transconductor begins, respectively, falling towards zero or rising towards Io. Slopes and knees are independently programmable.
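A minimal behavioural sketch of the CFMF, assuming ideal linear edges clamped between 0 and Io and assuming the two edge currents simply add; the real transconductors are only almost linear. The knee and slope values are hypothetical, and Io = 10 µA is the tail current quoted in Section 3.

```python
def cfmf_current(vin, vkl, vkr, gm_left, gm_right, io=10e-6):
    """Idealised complementary membership current (inverted trapezoid), in A."""
    # Left edge: io below the knee vkl, then a linear fall (slope gm_left) to zero
    left = min(io, max(0.0, io - gm_left * (vin - vkl)))
    # Right edge: zero below the knee vkr, then a linear rise (slope gm_right) to io
    right = min(io, max(0.0, gm_right * (vin - vkr)))
    # Assumption: the two edge currents add and the sum saturates at io
    return min(io, left + right)


def membership_degree(vin, vkl, vkr, gm_left, gm_right, io=10e-6):
    """Complementing the CFMF current gives the (trapezoidal) membership degree."""
    return 1.0 - cfmf_current(vin, vkl, vkr, gm_left, gm_right, io) / io


# Hypothetical settings: knees at 2.0 V and 3.0 V, slopes of 20 uA/V
args = (2.0, 3.0, 20e-6, 20e-6)
degrees = [round(membership_degree(v, *args), 3) for v in (1.5, 2.25, 2.75, 3.25, 4.0)]
# degrees -> [0.0, 0.5, 1.0, 0.5, 0.0]
```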

The drain-source voltage drops Vds of transistors ML1, MR1 are kept constant over a wide range of the input voltage Vin, and their magnitudes are set by the deliberately increased offset voltages of the differential amplifiers DAL, DAR. Since these offsets are smaller than the saturation drain-source voltage Vds<sub>sat</sub> of ML1, MR1, these transistors are forced to operate in the triode region. Their transconductance gm, which defines the slopes of the trapezoid, is therefore given by

$$gm = \frac{\partial Id}{\partial Vgs} = \mu Cox \frac{W}{L} Vds. \tag{2}$$

Figure 3 shows the schematic of the differential amplifiers. Sizing $(W/L)_{M1} > (W/L)_{M2}$ establishes an offset voltage between their inputs ($V_- - V_+ = Vds$), which is linearly controlled by the voltage source Vs as follows:

$$Vds = \left(\sqrt{\frac{(W/L)_{M5}}{2(W/L)_{M2}}} - \sqrt{\frac{(W/L)_{M5}}{2(W/L)_{M1}}}\right) (Vdd - |VTp| - Vs). \quad (3)$$

In this way, the slopes can be tuned electrically, which, when analogue storage is available, is an advantage over the typical four-transistor CFMF operators [3]. In the latter, the input transistors are saturated and the tail current Io must be fixed (Io $\equiv$ logical '1'). Even if, in both cases, the slopes are programmed discretely through a set of input transistors of different sizes, the area cost differs: calling N the ratio between the maximum and minimum desired slopes, the ratio between the maximum and minimum transistor sizes needed in our case is N, from (2), whereas for the saturated operators it is $N^2$, because the transconductance of a saturated transistor biased at a fixed current grows only with the square root of W/L. Moreover, in this version of the controller, a few discrete values of Vs were combined with the input transistor sizes to extend the range of slopes at a low cost in silicon area. Figure 4 shows some measured curves; a slope ratio $N\approx9.5$ was easily obtained.
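The sizing argument can be illustrated with equations (2) and (3). The sketch below reads the DAL/DAR and ML1 aspect ratios from Table 1 (interpreting "2x 8/6.4" as two devices in parallel, an assumption), while |VTp|, µCox and the chosen Vs are hypothetical values. The saturated case uses the standard square-law expression as a stand-in for the operators of [3].

```python
from math import sqrt


def dal_offset_vds(wl_m1, wl_m2, wl_m5, vdd, vtp_abs, vs):
    """Eq. (3): DA input offset, i.e. the Vds imposed on ML1/MR1, linear in Vs."""
    coeff = sqrt(wl_m5 / (2 * wl_m2)) - sqrt(wl_m5 / (2 * wl_m1))
    return coeff * (vdd - vtp_abs - vs)


def gm_triode(u_cox, wl, vds):
    """Eq. (2): triode transconductance, proportional to W/L at fixed Vds."""
    return u_cox * wl * vds


def gm_saturated(u_cox, wl, i_bias):
    """Square-law saturated device at fixed current: grows only with sqrt(W/L)."""
    return sqrt(2 * u_cox * wl * i_bias)


# Aspect ratios from Table 1 (DAL, DAR); "2x 8/6.4" read as two devices in parallel
WL_M1, WL_M2, WL_M5 = 2 * 8 / 6.4, 8 / 6.4, 40 / 4
VDD = 5.0        # supply voltage quoted in the text [V]
VTP_ABS = 0.9    # hypothetical |VTp| [V]
U_COX = 50e-6    # hypothetical process parameter uCox [A/V^2]
IO = 10e-6       # tail current quoted in Section 3 [A]

vds = dal_offset_vds(WL_M1, WL_M2, WL_M5, VDD, VTP_ABS, vs=3.5)  # hypothetical Vs

# Smallest and largest ML1 sizes in Table 1: 3.2/8 and 12/4 (a 7.5:1 size ratio)
wl_small, wl_large = 3.2 / 8, 12 / 4
n_triode = gm_triode(U_COX, wl_large, vds) / gm_triode(U_COX, wl_small, vds)
n_saturated = gm_saturated(U_COX, wl_large, IO) / gm_saturated(U_COX, wl_small, IO)
```

With the same 7.5:1 spread of device sizes, the triode transconductor spans a 7.5:1 slope range, while a saturated device spans only about 2.7:1; equivalently, a slope ratio N costs a size ratio of N² in the saturated case.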

#### 2.2 Multiple-input MAX operator

The Winner-Take-All circuit presented in [5] was adopted, with some modifications, for the MAX operator. The circuit depicted in Figure 5 is composed of q current-controlled voltage sources (M1, M2, M3, M4 and M5), connected to a common node Vc, each striving to impose its own voltage, which is proportional to its controlling current. Transistors Mc1, Mc2, connected as a cascode diode common to all cells, convey the highest current to the output. The gate voltages of the M1 transistors in the losing cells fall, turning those transistors off. Diode-connected transistors M2, M3 keep the gate voltage of each losing M1 at a level of at least 2VTn, which shortens the recovery delay of a cell when it passes from loser to winner. Since transistors M4, M5 are cascoded, an accurate replica of the winner current is ensured.

#### 2.3 Consequents and defuzzifier

Singletons: the consequent of rule i is a discrete singleton $\alpha_i$, smaller than 1, given by

$$\alpha_i = (C_{n-1})_i\, 2^{-1} + (C_{n-2})_i\, 2^{-2} + \dots + (C_0)_i\, 2^{-n}, \qquad (4)$$

where i ranges from 1 to m and the coefficients $(C_{n-1})_i, \ldots, (C_0)_i$ take binary values.
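Equation (4) maps the switch states to a binary fraction. A short Python sketch, with the bits listed from $(C_{n-1})_i$ down to $(C_0)_i$ (an ordering convention assumed here for illustration):

```python
def singleton_value(bits):
    """Eq. (4): bits listed MSB first, i.e. [(C_{n-1})_i, ..., (C_0)_i]."""
    if any(b not in (0, 1) for b in bits):
        raise ValueError("coefficients must be binary")
    return sum(b * 2.0 ** -(k + 1) for k, b in enumerate(bits))


alpha = singleton_value([1, 0, 1, 1, 0])   # 0.5 + 0.125 + 0.0625 = 0.6875
alpha_max = singleton_value([1] * 5)       # 1 - 2**-5 = 0.96875
```

With n = 5, as for the 5-bit consequents of the fabricated prototype, the largest singleton is 1 - 2⁻⁵, consistent with $\alpha_i$ being smaller than 1.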

In Figure 1, the outputs of the (n+1) current mirrors of all m consequents are summed column-wise, giving the following (n+1) values

$$\Big(\sum_i I_i\Big);\ \Big(\sum_i (C_{n-1})_i\, I_i\Big);\ \dots;\ \Big(\sum_i (C_0)_i\, I_i\Big). \qquad (5)$$

All these terms except the first are weighted and summed in the common D/A, whose circuit is shown in Figure 8. Therefore, the output current Iout of the D/A is equal to

$$\Big(\sum_i (C_{n-1})_i\, 2^{-1} I_i\Big) + \dots + \Big(\sum_i (C_0)_i\, 2^{-n} I_i\Big) = \sum_i \alpha_i\, I_i. \qquad (6)$$
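The equivalence between the column-wise sums of (5), weighted once by a shared D/A as in (6), and a rule-by-rule weighting can be checked with the behavioural sketch below. The codes and currents are hypothetical, and bit k of each MSB-first code is weighted by 2^-(k+1), as in (4).

```python
def column_sums(codes, currents):
    """Eq. (5): sum of I_i, then one column sum per bit C_{n-1} ... C_0."""
    n = len(codes[0])
    total = sum(currents)
    per_bit = [sum(code[k] * i for code, i in zip(codes, currents)) for k in range(n)]
    return total, per_bit


def common_da(per_bit):
    """Eq. (6): one shared D/A weights bit column k (MSB first) by 2**-(k+1)."""
    return sum(col * 2.0 ** -(k + 1) for k, col in enumerate(per_bit))


# Hypothetical 5-bit consequent codes (MSB first) and rule currents I_i
codes = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 1], [1, 1, 1, 1, 1]]
currents = [3e-6, 5e-6, 1e-6]

total, per_bit = column_sums(codes, currents)
i_out = common_da(per_bit)
# Same result computed rule by rule, as one local D/A per rule would do
direct = sum(sum(b * 2.0 ** -(k + 1) for k, b in enumerate(code)) * i
             for code, i in zip(codes, currents))
```

The two results coincide because the weighting is linear: summing the currents first and weighting once is equivalent to weighting each rule separately and summing afterwards.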

Most approaches in the literature perform this operation with one D/A per rule. Sharing a single D/A, as done here, saves a large amount of silicon area compared with the local D/A approach. In addition, the input capacitance of each consequent is reduced by a factor $2^n/(n+1)$. Finally, since the layout of the whole defuzzifier becomes smaller, routing capacitances also decrease. As a result, a considerable gain in speed is achieved.
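For the 5-bit consequents of the fabricated prototype (Section 3), and reading the factor as $2^n/(n+1)$, the capacitance reduction evaluates to roughly five:

$$\left.\frac{2^n}{n+1}\right|_{n=5} = \frac{32}{6} \approx 5.3$$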

Moreover, to improve its accuracy, the converter was built with non-minimum-size transistors without spending much silicon area.

Analogue divider: a novel current-input, voltage-output divider [4] was specially designed to carry out the division in formula (1). The circuit is shown in Figure 6. With equally sized transistors in each row of the circuit, the division is actually performed by transistors M1, M2, M3 in the bottom row, all of which are constrained to operate in the triode region. Their drain-source voltage drops Vds are matched by the common-gate transistors M4, M5, M6, which convey the same current, as guaranteed by the upper PMOS cascode mirror (M7 to M12). While Vb1 and Vbo are fixed bias voltages, the gate voltage Vout of M3 self-adjusts so that the drain current of M6 matches the current imposed by the PMOS cascode-mirror branch M9, M12. In this way, the following relation holds [4]:

$$(Vout - Vbo) = Vo = (Vb1 - Vbo)\,\frac{IN}{ID}. \qquad (7)$$

Here IN and ID denote the numerator and denominator input currents. Referring Vout to Vbo thus yields a two-quadrant divider. Since the divider itself behaves as a current-to-voltage converter, no extra interface converters are needed either at the inputs [6] or at the output [7, 8]. Figure 7 displays some results measured with an HP4145 semiconductor parameter analyser: relative errors are below 2%, the output offset (IN = 0) is lower than 1.6 mV, and the division takes no more than 60 ns.
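Comparing (7) with (1) suggests that, if IN is the D/A output $\sum \alpha_i I_i$ and ID is $\sum I_i$, the constant k of formula (1) corresponds to Vb1 - Vbo. The sketch below models the ideal divider under that assumed wiring; the bias voltages and currents are hypothetical.

```python
def divider_vo(i_n, i_d, vb1, vbo):
    """Eq. (7): Vo = Vout - Vbo = (Vb1 - Vbo) * IN / ID (ideal divider)."""
    if i_d <= 0:
        raise ValueError("ID must be positive")
    return (vb1 - vbo) * i_n / i_d


VB1, VBO = 3.0, 2.0   # hypothetical bias voltages, so Vb1 - Vbo = 1 V
alphas = [0.25, 0.75]
currents = [10e-6, 30e-6]

# Assumed wiring: IN = D/A output sum(alpha_i * I_i), ID = sum(I_i)
i_n = sum(a * i for a, i in zip(alphas, currents))
i_d = sum(currents)
vo = divider_vo(i_n, i_d, VB1, VBO)   # approximately 0.625 V: eq. (1) with k = Vb1 - Vbo
```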

# 3. Performance and conclusions

The fabricated two-input, one-output controller has nine fixed rules and three fuzzy labels per input. Each four-parameter CFMF is 18-bit programmable (2×5 bits for the knees and 2×4 bits for the slopes). The tail current Io was set to $10\mu A$, the input voltages range from 1.3 V to 4.5 V, and the consequents are 5-bit programmable. Figures 9 and 10 show the simulated and measured output surfaces, respectively, for a particular setting of the controller. The RMS error between these surfaces remains below 2.7% in all measured chips.
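Assuming that each of the three labels per input has its own CFMF and each rule its own consequent register, the programming figures above add up as follows. This is a bookkeeping sketch only; the on-chip organisation of the storage is not described here.

```python
INPUTS = 2
LABELS_PER_INPUT = 3
RULES = 9
KNEE_BITS, SLOPE_BITS = 5, 4
CONSEQUENT_BITS = 5

cfmf_bits = 2 * KNEE_BITS + 2 * SLOPE_BITS              # two knees + two slopes
fuzzifier_bits = INPUTS * LABELS_PER_INPUT * cfmf_bits  # one CFMF per label (assumed)
consequent_bits = RULES * CONSEQUENT_BITS               # one singleton per rule
total_bits = fuzzifier_bits + consequent_bits
singleton_step = 2.0 ** -CONSEQUENT_BITS                # resolution of alpha_i, eq. (4)
```

The nine fixed rules also match a complete rule base of 3² label combinations.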

Figure 11 shows the measured step response of the controller to a 0.55 V input step at Vin1 while Vin2 remains constant (Vin1 on Ch1: 500 mV/div; Vo on Ch2: 200 mV/div). The total measured delay is around 190 ns (@CL = 13 pF). Extrapolating this result to CL ≤ 5 pF, the delay should remain below 125 ns, so that more than 8 Mflips would be achieved inside the chip. This speed can be attributed to the design strategy adopted, namely: the avoidance of intermediate voltage-to-current and/or current-to-voltage converters, the reduced complexity of the defuzzifier and the simplicity of the divider.
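The quoted inference rates appear to be the reciprocals of the measured and extrapolated delays, assuming one inference completes per delay. A quick check:

```python
def mflips(delay_s):
    """Inference rate in Mflips, assuming one inference completes per delay."""
    return 1.0 / delay_s / 1e6


measured = mflips(190e-9)       # pin terminals, CL = 13 pF: about 5.26
extrapolated = mflips(125e-9)   # embedded load, CL <= 5 pF: 8.0
```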

Figure 12 shows a microphotograph of the controller. Its core occupies $3040 \times 1500 \ \mu m^2$, including the digital storage circuits, which account for 50% of the total area. The chip dissipates 8.4 mW at Vdd = 5 V. Tables 1 to 3 list the aspect ratios of the main transistors.
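The per-rule power quoted in the abstract is consistent with dividing the chip dissipation equally among the nine rules. The sketch below reproduces that arithmetic together with the core area; the equal split is an assumption.

```python
CHIP_POWER_W = 8.4e-3          # total dissipation at Vdd = 5 V
RULES = 9
CORE_W_UM, CORE_H_UM = 3040, 1500
STORAGE_FRACTION = 0.5         # digital storage share of the core area

power_per_rule_uw = CHIP_POWER_W / RULES * 1e6          # equal split assumed, ~933 uW
core_area_mm2 = CORE_W_UM * CORE_H_UM / 1e6             # um^2 -> mm^2, 4.56 mm^2
non_storage_area_mm2 = core_area_mm2 * (1 - STORAGE_FRACTION)
```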

Experimental results confirm that the controller is suitable for the high-speed, low-power embedded subsystems required in several Analogue Signal Processing application chips.

# 4. Acknowledgements

M. Verleysen is a research associate of the Belgian National Fund for Scientific Research (FNRS).

The authors thank the Universidad católica de Córdoba, Argentina, and the Belgian association ARAMIS for their financial support.

# 5. References

- [1] Mancuso M., D'Alto V., De Luca R., Poluzzi R. and Rizzotto G., "Fuzzy logic based image processing in IQTV environment", *IEEE Trans. on Consumer Electronics*, 1995, Vol. 41, N° 3, pp. 917-923.
- [2] Vidal-Verdú F., Navas R. and Rodríguez-Vázquez A., "A 16 rules @2.5 Mflips mixed-signal programmable fuzzy controller CMOS-1µ chip", *Proc. of the ESSCIRC'96*, 1996, pp. 156-159.
- [3] Guo S., Peters L. and Surmann H., "Design and application of an analog fuzzy logic controller", *IEEE Trans. on Fuzzy Systems*, 1996, N° 4, pp. 429-438.
- [4] Dualibe C., Verleysen M. and Jespers P., "Two-quadrant CMOS analogue divider", *Electronics Letters*, June 1998, pp. 1164-1165.
- [5] Lazzaro J., Ryckebusch S., Mahowald M. and Mead C., "Winner-take-all networks of order N complexity", *Proc. 1988 IEEE Conf. on Neural Information Processing Natural and Synthetic*, Denver, 1988, pp. 703-711.
- [6] Huertas J., Sanchez-Solano S., Baturone I. and Barriga A., "Integrated circuit implementation of fuzzy controllers", *IEEE JSSC*, 1996, Vol. 31, N° 7, pp. 1051-1058.
- [7] Liu D., Huang Y. and Wu Y., "Modular current-mode defuzzification circuit for fuzzy logic controllers", *Electronics Letters*, August 1994, Vol. 30, pp. 1287-1288.
- [8] Marshall G. and Collins S., "Fuzzy logic architecture using subthreshold analogue floating-gate devices", *IEEE Trans. on Fuzzy Systems*, February 1997, Vol. 5, N° 1, pp. 32-43.

| CFMF            | $M_{R1,L1}$                 | $M_{R2,L2}$ | $M_{R3,L3}$ |
|-----------------|-----------------------------|-------------|-------------|
| $(W/L)$ [µm/µm] | { 3.2/8; 3.2/8; 9/8; 12/4 } | 35/3.2      | 15/3.2      |

| DAL, DAR        | M1        | M2    | M3, M4 | M5   | M6   |
|-----------------|-----------|-------|--------|------|------|
| $(W/L)$ [µm/µm] | 2 × 8/6.4 | 8/6.4 | 30/4   | 40/4 | 24/4 |

Table 1. CFMF circuit sizing

| MAXIMUM           | M1   | M2, M3 | M4     | M5  |
|-------------------|------|--------|--------|-----|
| $(W/L)$ [µm/µm] | 4/10 | 12/2.4 | 12/2.4 | 8/4 |

Table 2. Maximum circuit sizing

| DIVIDER         | M1-M3 | M4-M6  | M7-M12 |
|-----------------|-------|--------|--------|
| $(W/L)$ [µm/µm] | 8/8   | 15/6.4 | 22/4   |

Table 3. Divider circuit sizing





Fig. 8. Weighting D/A circuit

Fig. 9. Target output of the controller

Fig. 10. Measured output of the controller



Fig. 11. Measured step response of the controller



Fig. 12. Chip photograph: FLC and isolated CFMF