
# A voltage scalable 0.26 V, 64 kb 8T SRAM with Vmin lowering techniques and deep sleep mode

Kim, Tony Tae-Hyoung; Liu, Jason.; Kim, Chris H.

2009

Kim, T. H., Liu, J., & Kim, C. H. (2009). A voltage scalable 0.26 V, 64 kb 8T SRAM with Vmin lowering techniques and deep sleep mode. IEEE Journal of Solid State Circuits. 44(6), 1785-1795.

https://hdl.handle.net/10356/90787

https://doi.org/10.1109/JSSC.2009.2020201

© 2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. http://www.ieee.org/portal/site


# A Voltage Scalable 0.26 V, 64 kb 8T SRAM With $V_{min}$ Lowering Techniques and Deep Sleep Mode

Tae-Hyoung Kim, Student Member, IEEE, Jason Liu, Member, IEEE, and Chris H. Kim, Member, IEEE

Abstract—A voltage scalable 0.26 V, 64 kb 8T SRAM with 512 cells per bitline is implemented in a 130 nm CMOS process. Utilization of the reverse short channel effect in a SRAM cell design improves cell write margin and read performance without the aid of peripheral circuits. A marginal bitline leakage compensation (MBLC) scheme compensates for the bitline leakage current which becomes comparable to a read current at subthreshold supply voltages. The MBLC allows us to lower  $V_{\rm min}$  to 0.26 V and also eliminates the need for precharged read bitlines. A floating read bitline and write bitline scheme reduces the leakage power consumption. A deep sleep mode minimizes the standby leakage power consumption without compromising the hold mode cell stability. Finally, an automatic wordline pulse width control circuit tracks PVT variations and shuts off the bitline leakage current upon completion of a read operation.

Index Terms—Bitline leakage compensation, floating bitlines, low-voltage SRAM design, minimum operation voltage, sleep mode.

## I. INTRODUCTION

SUBTHRESHOLD logic circuits are becoming increasingly popular in ultra-low-power applications where minimal power consumption is the primary design constraint [1]–[4]. Static CMOS logic consumes roughly an order of magnitude less power when operating in subthreshold, compared with normal strong-inversion operation. However, the MOS current becomes an exponential function of gate and threshold voltage in the subthreshold regime. This leads to an exponential increase in MOS current variability under process, voltage, and temperature (PVT) fluctuations.
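The exponential sensitivity described above can be illustrated with a first-order subthreshold model, $I \propto \exp((V_{gs} - V_{\rm th})/(n v_T))$. The following is only a sketch: the slope factor `n = 1.5` and the 300 K temperature are assumed values for illustration, not parameters taken from this work.

```python
import math

def subthreshold_current_ratio(delta_vth, n=1.5, temp_k=300.0):
    """Ratio I(Vth + delta_vth) / I(Vth) under I ~ exp((Vgs - Vth) / (n * vT)).

    n and temp_k are assumed illustrative values, not data from the paper.
    """
    k_over_q = 8.617333e-5          # Boltzmann constant over electron charge, in V/K
    v_t = k_over_q * temp_k         # thermal voltage, about 25.9 mV at 300 K
    return math.exp(-delta_vth / (n * v_t))

# A symmetric threshold shift produces an asymmetric, multiplicative current change.
for shift_mv in (-30, 0, 30):
    ratio = subthreshold_current_ratio(shift_mv * 1e-3)
    print(f"dVth = {shift_mv:+d} mV -> current x{ratio:.2f}")
```

Under these assumptions a threshold shift of a few tens of millivolts already changes the current by roughly a factor of two, which is why PVT tracking matters so much more in subthreshold than in strong inversion.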

SRAMs with a wide range of supply voltages are necessary for achieving high performance during normal modes while minimizing power consumption during low voltage modes [5]. For a reliable operation from the strong-inversion region down to the subthreshold region, key memory design metrics such as noise margin, speed, and power consumption need to be examined across this range of supply voltages. In the subthreshold region, conventional 6-T SRAMs fail to deliver the density and yield requirements due to the reduced read static noise margin (SNM), poor writability, limited number of cells per bitline, reduced bitline sensing margin, and increased impact of PVT

Manuscript received October 27, 2008; revised March 04, 2009. Current version published May 28, 2009.

The authors are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: thkim@umn.edu)

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2009.2020201



Fig. 1. (a) Previous 8T SRAM cell [5]. (b)–(d) Previous 10T SRAM cells [6], [9], [11].

variations. Decoupled SRAM cells have been proposed to make the read mode SNM equal to the hold mode SNM by isolating the SRAM cell nodes from the bitline [6]–[9]. Writability has been improved in prior designs by using a boosted wordline voltage that increases the drivability of the write devices or a collapsed cell supply that weakens the storage devices [6], [10].

In this work, we demonstrate a voltage scalable 0.26 V, 64 kb SRAM with 512 cells per bitline using several circuit techniques that can be activated at ultra-low voltages to expand the operating range. Those novel techniques include the following: (i) 8T SRAM cell utilizing the reverse short channel effect (RSCE) for improved writability and read performance; (ii) the marginal bitline leakage compensation (MBLC) scheme for improved read sensitivity and precharge elimination; (iii) floating read bitlines (RBL) and write bitlines (WBL) to minimize bitline leakage; (iv) deep sleep mode for reducing standby cell leakage; and (v) automatic read wordline pulse width control for improved bitline sensing margin and lower leakage power.

## II. PREVIOUS SUBTHRESHOLD SRAM CIRCUIT TECHNIQUES

Designing subthreshold SRAMs is challenging due to the degraded cell stability, small $I_{\rm on}$-to-$I_{\rm off}$ ratio, and large current variations. In this section we will discuss several circuit techniques that have been proposed to mitigate these design issues associated with subthreshold SRAMs.

Cell read stability is a critical design parameter in SRAMs. A decoupled cell is inevitable for subthreshold SRAMs as it achieves the maximum read SNM at a given supply voltage for



Fig. 2. (a) Normalized $V_{\rm th}$ versus channel length shows that RSCE is more severe in scaled technologies. (b) Normalized current drivability and delay versus channel length.

the same area constraint. Fig. 1 shows previous 8T and 10T SRAM cells with decoupled cell nodes [5], [6], [9], [11]. Most of these cells use the 6T SRAM structure for data storage and write operation. All minimum sized devices can be used because the operation is no longer limited by the read stability problem, as the separate read port decouples the cell node from the read bitline. No disturbance current flows between the storage transistors and the read bitline making the read SNM equal to the ideal hold SNM.

Reliably sensing the read bitline voltage is another critical challenge for subthreshold SRAMs. Verma *et al.* proposed using a redundant sense amplifier to improve the failure probability for bitline sensing [12]. Instead of using a single sense amplifier, two half-sized amplifiers were used to perform the single ended bitline sensing. It is claimed that the failure rate of the two half-sized sense amplifiers is smaller than that of a single sense amplifier when one out of the two half-sized sense amplifiers is selected through a start-up selection routine. However, this technique can only be used when the failure rate of the half-sized sense amplifier is low enough, which limits the minimum operational voltage. Zhai *et al.* proposed using a transmission gate as the access device [13]. However, the read disturbance problem will likely become worse for this SRAM cell due to the current through the access path.

Finally, the write operation becomes problematic in subthreshold SRAMs as the variation in current worsens due to its larger sensitivity to PVT parameters. In particular, weak write access transistors and strong pull-up PMOS transistors can cause a cell write failure to occur. To avoid this problem, write access transistors must be strong enough to overwrite the cell data even under the worst case PVT parameters. Various write margin improvement techniques have been proposed to make the access devices stronger compared to the storage devices [6], [10], [12]. The collapsed supply rail scheme lowers the cell supply voltage during write operations weakening the current drivability of the PMOS storage transistors. However, the lowered supply voltage degrades the cell stability of SRAM cells in the pseudo-write mode (also known as the half-select mode) because the PMOS storage devices also become weaker due to the shared supply node. Furthermore, the SRAM cell stability is already close to the data retention limit in subthreshold SRAMs



Fig. 3. Schematic and layout of the proposed 8T SRAM cell utilizing RSCE.



Fig. 4. (a) Write margin improvement at different supply voltages by utilizing RSCE. (b) Read performance improvement utilizing RSCE.

making this scheme infeasible even with a column-by-column supply control. Alternatively, the boosted wordline scheme increases the drive strength of the write access transistors to improve cell write margin over that of pull-up PMOS transistors under process variations. However, this scheme cannot be used for column-muxed array architectures because it also increases the drive strength of the write access transistors in pseudo write mode. It also requires additional circuitry to generate and route the boosted supply voltage.

## III. $V_{\rm min}$ LOWERING CIRCUIT TECHNIQUES

### A. 8T SRAM Cell Utilizing the Reverse Short Channel Effect

The reverse short channel effect (RSCE) is observed in modern CMOS devices due to the HALO pocket implants used to compensate for the  $V_{\rm th}$  roll-off. This causes the  $V_{\rm th}$  to



Fig. 5. Marginal bitline leakage compensation (MBLC) scheme.



Fig. 6. Schematic of sense amplifier with trip point trimming circuits.

increase with decreasing channel length, as shown in Fig. 2(a). RSCE becomes pronounced at lower supply voltages due to the significantly reduced drain induced barrier lowering (DIBL) effect [9]. Since device current is an exponential function of  $V_{\rm th}$ , RSCE increases the current per width as the channel length increases at sub-0.6 V [Fig. 2(b)]. Unlike using wider devices to increase drive current, increasing the channel length can improve drive current without increasing the junction area or bitline capacitance. Fig. 3 shows the schematic and layout of the proposed 8T SRAM cell. A minimum sized conventional 6T SRAM cell structure is used for data storage and write operation. Two NMOS devices are used for the read path with the cell node being isolated from the read bitline (RBL). The



Fig. 7. The best case sensing margin occurs when the accessed bitline and the replica bitline have identical leakage currents. Conversely, the sensing margin is worst for an all-'0' column which has the minimum bitline leakage.

proposed 8T SRAM cell uses a 3X longer channel length in the write access devices and a 2X longer channel length in the read path devices (Fig. 3). The 3X longer channel length offers a 2.4X higher drive current [Fig. 2(b)]. However, the improved current drivability reduces the stability of the half-selected cells. Circuit techniques such as the write-back scheme that we proposed in [9] can be adopted to remove this issue. (The write-back scheme was not implemented in this test chip.) The 2X longer channel length in the read path devices improves the



Fig. 8. (a) RBL voltage when the accessed column has the same data as replica column. (b) RBL voltage with different column data. (c) RBL voltage with different column data after applying optimal body biasing (this work).



Fig. 9. (a) Data dependent bitline leakage compensation using the floating write bitline voltage as the body bias. The nominal corner is used for simulation with the supply level of 0.2 V at room temperature. (b) Impact of cell current degradation on sensing margin.

read speed without incurring additional cell area penalty. The proposed SRAM cell also has a smaller variation due to the larger device sizes [14]. The proposed 8T SRAM cell utilizing RSCE has an area overhead of 20% compared to a conventional all minimum sized device 8T cell (Fig. 3) [14]. Fig. 4 shows the simulated results of write margin improvement and read performance. Compared to previous 8T cells, the proposed cell improves write margin by 66 mV (33%) and boosts read performance by 56.9% at 0.2 V without any increase in the bitline capacitance or the need for additional peripheral circuitry. Utilization of RSCE for improving current drivability is effective when the supply voltage is around or below  $V_{\rm th}$ . The improvement of write margin and read performance becomes more significant as the supply voltage decreases to these levels because of the stronger impact of RSCE on device current.
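As a rough consistency check on the 2.4X drive-current figure, one can ask how large a threshold reduction RSCE would have to provide for a 3X longer device to still deliver more current at the same width. The sketch below assumes the first-order model $I \propto (W/L)\exp(-V_{\rm th}/(n v_T))$ with an assumed slope factor `n = 1.5` at room temperature; it ignores DIBL and mobility effects, and the resulting number is an estimate under those assumptions, not a value reported in this work.

```python
import math

def implied_vth_drop(current_gain, length_ratio, n=1.5, v_t=0.02585):
    """Threshold reduction (in volts) needed for a device that is `length_ratio`
    times longer to deliver `current_gain` times the current at equal width.

    Model assumption: I ~ (W / L) * exp(-Vth / (n * vT)). n and vT are illustrative.
    """
    # The 1/L penalty must be overcome before any net gain appears.
    return n * v_t * math.log(current_gain * length_ratio)

drop = implied_vth_drop(current_gain=2.4, length_ratio=3.0)
print(f"Implied Vth reduction: {drop * 1e3:.1f} mV")
```

The point of the sketch is that a threshold shift of well under 100 mV is enough, in this simplified model, to outweigh the 3X length penalty once the supply is at or below $V_{\rm th}$.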

### B. Marginal Bitline Leakage Compensation (MBLC) Scheme

At low supply voltages, the transistor $I_{\rm on}$-to-$I_{\rm off}$ ratio decreases exponentially, which can cause the bitline leakage current to become significant compared to the SRAM cell read current. This makes it increasingly difficult to detect the cell data, as the inactive cells' leakage current can offset the read bitline voltage level. In addition, the amount of the bitline leakage is a function of the column data, which makes it even more challenging to distinguish the SRAM cell current from the bitline leakage current. To tackle this issue, Agawa *et al.* proposed a bitline leakage current compensation scheme using analog circuitry and MOS capacitors [15]. In this technique, the bitline leakage of each accessed column is measured during the precharge time using a PMOS diode. The diode voltage drop is stored in a capacitor and is used to inject an equal compensation current into the bitline when the read wordline signal is asserted. However, this technique cannot be used when the supply voltage is near or lower than the threshold voltage, as the voltage drop cannot be reliably sensed. In addition, the peripheral circuitry required for each bitline incurs a significant area overhead for the SRAM. In this work, we propose a marginal bitline leakage compensation (MBLC) technique suitable for bitline leakage compensation in ultra-low-voltage SRAMs.

The MBLC scheme shown in Fig. 5 compensates for the RBL leakage in the unaccessed cells using a replica bitline with dedicated control circuits. The RBL voltage is tuned to settle just above the sense amplifier (SA) trip point by turning on the marginal compensation devices, which is based on the replica bitline circuit. When a logic '0' is read, only a small swing is required to change the SA output, which is beneficial when the cell current is comparable to the bitline leakage current. The logic level of RBL during read operation is decided by the static balance between the cell read current ($I_{\rm cell}$), the pull-down leakage current ($I_{\rm bl\_leak}$), and this marginal compensation current ($I_{\rm cmp}$) as shown in Fig. 5. The marginal compensation current should be large enough to produce logic '1' for the worst case pull-down leakage current, while still being small enough to produce logic '0' for the pull-down cell current and



Fig. 10. (a) RBL waveforms for a conventional precharged bitline. (b) RBL\_REPLICA waveforms of the proposed MBLC scheme for maximum and minimum bitline leakage cases.



Fig. 11. The proposed MBLC scheme improves sensing margin compared with the conventional precharged bitline. The conventional precharged bitline fails in read operations. (a) Sensing margin of this work at different corners. (b) Sensing margin of this work at different temperatures.

the smallest bitline leakage. The replica bitline generates the marginal compensation current to be used in an array (Fig. 5).
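The two-sided requirement on the marginal compensation current can be written as a simple window check. This is a simplified sketch that treats the static balance as a pure current comparison and ignores the SA trip point and any voltage dependence of the currents; the function name and the numeric values in the usage lines are hypothetical.

```python
def compensation_window(i_cell, i_leak_min, i_leak_max):
    """Return (lower, upper) bounds on the compensation current, or None.

    Simplified static-balance assumption: RBL resolves to '1' when the pull-up
    compensation current exceeds the total pull-down current, else to '0'.
    """
    lower = i_leak_max            # must overcome worst-case leakage to read '1'
    upper = i_cell + i_leak_min   # must lose to cell current plus least leakage to read '0'
    return (lower, upper) if upper > lower else None

# Hypothetical currents in nA, chosen only to exercise both outcomes.
print(compensation_window(i_cell=10.0, i_leak_min=1.0, i_leak_max=6.0))
print(compensation_window(i_cell=4.0, i_leak_min=1.0, i_leak_max=6.0))
```

In this simplified view a valid window exists only while the cell current exceeds the spread between the largest and smallest bitline leakage, which is the spread the data-dependent body biasing of Section III-C is meant to shrink.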

A feedback loop controls the strength of the marginal compensation current charging RBL\_REPLICA up to a point where the SA output switches to '1' by progressively turning on the marginal compensation devices. Cell data in the replica bitline is hardwired to generate the maximum bitline leakage. This configuration was chosen to emulate the large bitline leakage current and small RBL sensing margin condition. Initially, the SA output is '0' because bitline leakage current pulls down RBL\_REPLICA and cmp(3:0) is initialized with '1's, turning off all marginal compensation devices. An increasing number of compensation devices are then turned on raising the level of RBL\_REPLICA until the SA output switches to '1'. The digital code from the replica bitline is used in array bitlines to generate the compensation current. The compensation devices are activated only during the short read windows because RBL voltage is determined by the static current balance. This is different from the conventional strong-inversion SRAM read operation where the device $I_{\rm on}$-to-$I_{\rm off}$ ratio is sufficiently large and bitline voltages are decided by the dynamic operation, discharging the precharged bitlines conditionally.
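The calibration loop described above can be sketched behaviorally as follows. Several details here are assumptions made for illustration rather than facts from the design: the compensation strength is taken to grow in equal steps of `i_unit`, the cmp code is modeled as active-low (all ones meaning all devices off, as the text states for the initial condition), and the SA is reduced to an ideal current comparator.

```python
def calibrate_mblc(i_leak, i_unit, sa_trip_margin=0.0, n_bits=4):
    """Step up the compensation strength until the replica SA output flips to 1.

    Returns (cmp_code, units_on), or None if full strength is still too weak.
    Equal current steps, active-low coding, and the comparator-style SA model
    are illustrative assumptions.
    """
    max_code = (1 << n_bits) - 1
    for units_on in range(max_code + 1):
        cmp_code = max_code - units_on   # all ones = every compensation device off
        i_cmp = units_on * i_unit
        sa_out = 1 if i_cmp > i_leak + sa_trip_margin else 0
        if sa_out == 1:
            return cmp_code, units_on
    return None

# Hypothetical currents in arbitrary units.
print(calibrate_mblc(i_leak=4.5, i_unit=1.0))
print(calibrate_mblc(i_leak=4.5, i_unit=1.0, sa_trip_margin=2.0))
```

Raising `sa_trip_margin` in this model plays the role of a higher SA trip point: the loop needs more steps before the output flips, so more compensation devices end up enabled.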

Additional margin for '1' can be built into the SAs by selectively turning on extra precharge devices in the accessed bitline and providing more compensation current. This margin can be used to make all RBLs have a large enough compensation current to reliably generate data '1' without a pull-down cell current, accounting for within-die variations. The marginal precharging level can also be trimmed by changing the trip point of the SA. Fig. 6 shows the simplified schematic of the SA implemented in our design. By turning on additional devices here, we can change the SA trip point, which in turn adjusts the marginal precharging level.

However, a fixed compensation current can be problematic because the ideal compensation currents for the bitlines can be different from the replica bitline leakage due to the data-dependent bitline leakage current. Section III-C describes the column data dependency of the compensation current and a circuit technique to deal with this issue.

### C. Column Data Dependency of MBLC Current

The optimal compensation current depends on the data pattern in a column because the amount of bitline leakage is also a function of this data. Since the replica bitline generates the marginal compensation current for the column data pattern resulting in the worst case bitline leakage, a method for incorporating column data dependency must be devised. In this work, data dependency was accounted for by connecting the body of the compensating PMOS devices to the floating WBL voltage, which is also determined by the data pattern stored in the SRAM column. The floating WBL is possible because this bitline does



Fig. 12. Power reduction using floating read and write bitlines. It is assumed that the probability of writing a '0' is equal to that of writing a '1'.

not need to be precharged during non-write operations as it does in conventional SRAMs.

The column data patterns of the replica bitline and the array bitlines are shown in Fig. 7. The best case bitline has the same data as the replica bitline. In this scenario, the compensation current will be identical to the bitline leakage current. On the other hand, the column data pattern giving rise to the minimum bitline leakage causes the worst-case discrepancy between the compensation current and the actual bitline leakage. Fig. 8 illustrates the change of RBL voltage due to column data patterns, and the principle of using body biasing to incorporate this dependency. The accessed column and replica column have the same RBL signal levels when they contain the same data [Fig. 8(a)]. However, the difference in the column data pattern will raise the RBL level due to the imbalance between the compensation current and the bitline leakage current, which degrades sensing margin [Fig. 8(b)]. This is inevitable as the replica bitline has to be hardwired with the data pattern generating the largest compensation current for reliable read operations with large bitline leakage current. To solve this problem, the floating WBL voltage which changes with the column data is used as the body bias of the marginal compensation devices.

The floating WBL voltage rises with more cells in the column storing data '1', which in turn decreases the amount of marginal compensation current by weakening the forward body bias in



Fig. 13. (a) Conventional sleep mode. (b) Proposed deep sleep mode.

the PMOS compensation devices. The decreased compensation current cancels out the difference between the required bitline leakage current and the provided compensation current, which makes the RBL similar to that in the replica bitline [Fig. 8(c)]. Simulation results for this compensation scheme are illustrated in Fig. 9(a). As shown here, the body bias control using the floating WBL tracks the column data pattern and moves the compensation current close to the optimal matching bitline leakage. The maximum error in the compensation current was only 7.13% without considering within-die variations. A smaller cell read current due to within-die variations can increase the RBL level reducing the sensing margin for data '0'. Fig. 9(b) shows the impact of cell current degradation on the sensing margin. These simulation results show that the MBLC scheme ensures correct operation until the cell current is reduced by 64%.

Fig. 10 compares the proposed MBLC scheme to the conventional precharged bitline scheme during read operations. In the conventional scheme [Fig. 10(a)], the bitline leakage discharges RBL at a rate comparable to the cell current, which reduces bitline sensing margin. Furthermore, the RBL discharging speed of data '1' with the maximum bitline leakage is faster than that of data '0' with the minimum bitline leakage current. A sense amplifier cannot detect the read data correctly from a single ended bitline in this case. However, the proposed MBLC scheme generates a compensation current that tracks the column data and static bitline levels, making the bitline sensing margin constant over time. Fig. 10(b) shows RBL\_REPLICA waveforms with two different hardwired patterns. The body bias control of the compensation devices enhances the sensing margin of data '0' in the minimum bitline leakage condition. The change of RBL\_REPLICA is shown as the MBLC control circuit adjusts the compensation current. Simulated RBL sensing margins for different process corners and temperatures are illustrated in Fig. 11. The sensing margin decreases as temperature increases since the bitline leakage increases faster than the cell current.

## IV. ACTIVE LEAKAGE REDUCTION AND DEEP SLEEP MODE

### A. Floating Read/Write Bitlines

Leakage current in inactive memory cells accounts for most of the SRAM power consumption. Circuit techniques for leakage control are particularly critical for reducing the total power consumption in the subthreshold region. RBL leakage is one of the most dominant leakage components and is inevitable in conventional memories where bitlines are precharged to VDD. In our design, the RBLs are left floating without being precharged whenever the read wordline (RWL) is low. This is possible because the RBL level is decided by the static current balance between the bitline leakage current, compensation current and cell read current. During the read operation, the MBLC scheme provides the compensation pull-up current to generate logic high or low levels in the RBL with large sensing margin. The static operation in deciding RBL makes the precharging operation unnecessary. During the non-read operation, however, the floating RBL level is determined by the strong pull-down leakage current formed by the read path in the SRAM cells and the negligible pull-up leakage current through the compensation devices. This makes RBL converge to GND, eliminating the leakage current from RBL, as shown in Fig. 12 (top).

Like the floating RBL, write bitlines (WBL and WBLB) are also left floating when WWL is low so that they will automatically settle to levels which minimize the leakage current as shown in Fig. 12 (middle). Forcing a specific voltage will break the balance of leakage current flowing through pull-up and pull-down devices and make one larger than the other. During a write operation, WBL is driven by the write driver. Therefore, precharging WBL is also redundant. The proposed scheme has no energy overhead during the write operation compared to the conventional scheme due to the same voltage swing. This is based on the assumption that the probability of writing a data '1' and a data '0' are the same. WBL and WBLB are not at the same





Fig. 14. (a) Leaky current path at the interface circuit in deep sleep mode. (b) Simulated leakage reduction.

level. If WBL is higher than WBLB, writing a '0' to WBL and a '1' to WBLB will consume more energy than the conventional write operation. Writing the opposite data, however, will save energy because both WBL and WBLB have smaller swings. Assuming the same probability of writing a '0' and a '1', the energy consumption in write operations can be calculated by the equations in Fig. 12 (middle). The leakage power reduction achieved using the floating RBL and WBL is summarized in Fig. 12 (bottom). A total SRAM leakage reduction of 44% to 60% can be obtained by using the floating RBLs and WBLs. The power reduction varies because the column data pattern changes the floating WBL voltage, which in turn changes the leakage reduction.
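The equal-probability argument for write energy can be checked with a small swing calculation. The sketch below is not a transcription of the equations in Fig. 12; it assumes that write energy is proportional to the total bitline voltage swing and takes the conventional swing to be VDD, as the text does. The floating bitline levels used in the example are hypothetical.

```python
def write_swing(v_wbl, v_wblb, vdd, data):
    """Total bitline swing when writing `data` (0 or 1) onto WBL.

    v_wbl and v_wblb are the floating levels before the write driver turns on.
    """
    if data == 0:
        # WBL is driven down to 0 and WBLB is driven up to VDD.
        return v_wbl + (vdd - v_wblb)
    # WBL is driven up to VDD and WBLB is driven down to 0.
    return (vdd - v_wbl) + v_wblb

def average_swing(v_wbl, v_wblb, vdd):
    """Mean swing when '0' and '1' are written with equal probability."""
    return 0.5 * (write_swing(v_wbl, v_wblb, vdd, 0) +
                  write_swing(v_wbl, v_wblb, vdd, 1))

# Hypothetical floating levels with WBL above WBLB at a 0.3 V supply.
vdd, v_wbl, v_wblb = 0.3, 0.2, 0.1
print(write_swing(v_wbl, v_wblb, vdd, 0))   # larger than VDD
print(write_swing(v_wbl, v_wblb, vdd, 1))   # smaller than VDD
print(average_swing(v_wbl, v_wblb, vdd))    # equals VDD up to rounding
```

Whatever the floating levels are, the two cases average to VDD in this model, which matches the claim that the floating scheme adds no write energy overhead when both data values are equally likely.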

### B. Deep Sleep Mode

Sleep transistors are popular for reducing SRAM leakage current in standby mode by collapsing the virtual supply rails [16], [17]. However, due to the fact that the voltage margin is already close to the functionality limit, it is difficult to use conventional footer sleep transistors for subthreshold SRAM designs. In this work, we propose a deep sleep mode illustrated in Fig. 13(b) to reduce the standby leakage in subthreshold memory designs. VDDC and VSSC represent virtual supply and virtual ground voltages of the SRAM array. Instead of collapsing VSSC for a sleep mode as shown in Fig. 13(a), the proposed scheme raises both VDDC and VSSC while



Fig. 15. Read wordline pulse width control for PVT tracking.



Fig. 16. Within-die variation causes read failures when array bitlines are slower than the replica bitline. Failure rate is reduced by adding more delay to give enough timing margin under within-die variation.

keeping the cell voltage, VDDC-VSSC, constant to reduce leakage while maintaining the same cell stability in the deep sleep mode. SRAM cell leakage is reduced due to the negative VGS in the write access transistors and the increased threshold voltage of the pull-down NMOS devices due to the reverse body bias. However, raising both VDDC and VSSC increases the floating write bitline voltages (WBL and WBLB) because they are decided by the column data pattern and the SRAM cell node voltages. If both VDDC and VSSC are raised excessively, the pull-up path in the interfacing circuit becomes leaky and a current starts to flow from the write bitlines to the virtual supply nodes. Fig. 14 highlights the leaky current path and illustrates the normalized SRAM leakage reduction using the proposed deep sleep mode. The leakage current decreases as VDDC and VSSC are raised. By applying an optimal supply voltage (VDDC = 0.83 V, VSSC = 0.60 V), 87% reduction in the cell leakage was obtained during the deep sleep mode. Half '0's and half '1's are assumed in the simulation. Raising VDDC and



Fig. 17. Test chip architecture.



Fig. 18. (a) Measured SRAM total power consumption. (b) SRAM leakage current versus supply voltage. (c) Normalized leakage current at different temperatures.

VSSC beyond the optimal point increases the leakage current exponentially.
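Because the deep sleep mode shifts both virtual rails together, each VSSC setting fixes VDDC once the cell voltage is chosen. The short sketch below tabulates the operating points quoted in this paper (the simulated optimum above and the two measured settings reported in Section V) and converts each reported reduction into a remaining-leakage fraction. The helper name is hypothetical, and the simulated figure refers to cell leakage while the measured figures refer to total SRAM leakage, so the rows are not directly comparable.

```python
def deep_sleep_rails(v_cell, vssc):
    """Virtual rails (VDDC, VSSC) that keep VDDC - VSSC equal to the cell voltage."""
    return v_cell + vssc, vssc

# (VSSC in volts, reported leakage reduction, what the figure refers to)
reported_points = [
    (0.60, 0.87, "simulated cell leakage"),
    (0.45, 0.69, "measured total leakage"),
    (0.20, 0.58, "measured total leakage"),
]

for vssc, reduction, label in reported_points:
    vddc, _ = deep_sleep_rails(v_cell=0.23, vssc=vssc)
    remaining = 1.0 - reduction
    print(f"VSSC={vssc:.2f} V  VDDC={vddc:.2f} V  "
          f"remaining={remaining:.0%} ({label})")
```

The first row reproduces the quoted pair VDDC = 0.83 V and VSSC = 0.60 V, confirming that the simulated optimum corresponds to a 0.23 V cell voltage.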

### C. Automatic Wordline Pulse Width Control

The RWL activation time should be long enough for the sense amplifier to function reliably, but it should be turned off soon after the read operation is finished to cut off the marginal compensation current and reduce the power consumption. In order to address this tradeoff, we propose a scheme to automatically adjust the read wordline pulse width based on PVT variations (Fig. 15). A replica bitline generates the wordline pulse width needed for the SA to precisely capture the read data. The cell data in the replica bitline is hardwired so that an RBL\_REPLICA pulse is generated for each read cycle. The delayed SA output RD\_FIN from the replica bitline disables the read wordline and shuts off the marginal precharge devices. By doing so, the RWL is only enabled until the read operation is completed, saving the RBL leakage power. Another issue to be considered is the impact of within-die variations on the wordline pulse width. Fig. 16 shows a failure scenario where read data, D(i), arrives later than the read data from the replica bitline due to within-die variations. To address this problem, an eight FO1 inverter delay chain is inserted in the replica bitline path to provide enough timing margin for a correct read operation. With this additional timing margin, the proposed SRAM is tolerant to a cell current variation of up to 50%.
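The role of the added delay chain can be illustrated with a simple timing model. This is a sketch under stated assumptions: the array read delay is taken to scale inversely with cell current, and the delay values in the example are hypothetical numbers chosen so that the margin works out to 50%; they are not measurements from the test chip.

```python
def rwl_pulse_width(t_replica, t_fo1, n_stages=8):
    """Read wordline pulse width: replica read delay plus the inverter chain delay."""
    return t_replica + n_stages * t_fo1

def read_succeeds(t_replica, t_fo1, cell_current_ratio, n_stages=8):
    """True if the accessed bitline resolves before the RWL pulse ends.

    cell_current_ratio is the accessed-cell current divided by the replica-cell
    current. Assumes read delay is inversely proportional to cell current.
    """
    t_array = t_replica / cell_current_ratio
    return t_array <= rwl_pulse_width(t_replica, t_fo1, n_stages)

# Hypothetical delays in ns: the eight-stage chain doubles the pulse width.
t_replica, t_fo1 = 100.0, 12.5
print(read_succeeds(t_replica, t_fo1, cell_current_ratio=1.0))  # nominal cell
print(read_succeeds(t_replica, t_fo1, cell_current_ratio=0.5))  # 50% weaker cell
print(read_succeeds(t_replica, t_fo1, cell_current_ratio=0.4))  # too weak for this margin
```

In this model the tolerated current degradation is set by the ratio of the chain delay to the replica delay, and because both are built from on-chip devices the margin would be expected to track PVT variations together with the bitline delay.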

## V. EXPERIMENTAL RESULTS

A 64 kb SRAM was fabricated in a 130 nm CMOS technology with a nominal supply voltage of 1.2 V. Fig. 17 shows the architecture of the implemented SRAM. It consists of two SRAM cell arrays, each with 512 rows and 64 columns, 16 IOs, and replica bitlines and added delay for the proposed MBLC and wordline pulse width control. Each cell array is divided into eight sub-blocks generating one bit per sub-block, and a sub-block is composed of eight columns. Fig. 18 shows the measured power consumption and leakage current. We observed SRAM cells functional down to 0.23 V running at 100 kHz and consuming 4.3  $\mu$ W [Fig. 18(a)]. At 0.4 V, the operation frequency was 6.7 MHz with a power consumption of 10.8  $\mu$ W. The measured SRAM leakage currents from different dies are shown in Fig. 18(b). The leakage current in SRAM array is



Fig. 19. Leakage current reduction in deep sleep mode.

around 5X of that in peripheral circuits. Variation in leakage current was 2.0X at 0.3 V due to its exponential dependency on device threshold voltage. The normalized leakage current measured at different temperatures is shown in Fig. 18(c). The leakage current at 110 °C is 3.4X larger than that at 27 °C when the supply voltage is 0.23 V. Fig. 19 illustrates the normalized leakage current reduction achieved using the proposed deep sleep mode. The total SRAM leakage including the array and peripheral components was reduced by 69% in the deep sleep mode by raising the VSSC to 0.45 V while maintaining the cell voltage of 0.23 V. The initial leakage reduction is large when raising VSSC due to the strong negative VGS effect in conjunction with the reverse body biasing effect. A 58% leakage reduction was achievable using a VSSC of 0.2 V during the deep sleep mode. The smaller offset in VDDC and VSSC improves the efficiency and area overhead of the charge pumps that can be used to generate the voltage on-chip [18]. In this test chip, we used an external supply for the higher supply voltages needed during the deep sleep mode.
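The measured operating points reported at the start of this section can be converted into energy per clock cycle, and the array organization can be checked against the stated capacity. The sketch below uses only numbers quoted in the text; the function name is hypothetical.

```python
def energy_per_cycle(power_w, freq_hz):
    """Average energy per clock cycle in joules (total power divided by frequency)."""
    return power_w / freq_hz

# Measured points quoted in the text: (total power in W, frequency in Hz).
measured = {
    "0.23 V": (4.3e-6, 100e3),
    "0.40 V": (10.8e-6, 6.7e6),
}
for vdd, (power, freq) in measured.items():
    print(f"{vdd}: {energy_per_cycle(power, freq) * 1e12:.2f} pJ per cycle")

# Organization check: two arrays of 512 rows by 64 columns, eight sub-blocks per array.
arrays, rows, cols, sub_blocks = 2, 512, 64, 8
total_bits = arrays * rows * cols
print(total_bits, total_bits // 1024, "kb;", arrays * sub_blocks, "IOs")
```

The energy per cycle comes out higher at 0.23 V than at 0.4 V. Since these are total power figures, this is presumably because leakage is integrated over a much longer cycle at 100 kHz, although the text does not break the power down into active and leakage components at these points.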

Fig. 20 shows the shmoo plot of a single SRAM cell when the proposed MBLC scheme is on and off. When the MBLC scheme is off, a conventional fixed precharge device is used. The $V_{\rm min}$ of the SRAM cell under test is improved from 0.28 V to 0.23 V by activating the MBLC scheme. Fig. 21 illustrates the measured $V_{\rm min}$ of each SRAM cell for read and write operations from an 8-by-8 mini-subarray. $V_{\rm min}$ for read operation ranges from 0.24 V to 0.26 V and $V_{\rm min}$ for write operation ranges from 0.18 V to 0.20 V. We have also tested the feedback control circuit for the MBLC scheme, which compensates for the bitline leakage on the fly. The 4-bit counter used in the MBLC requires up to 16 clock cycles to generate the optimal precharge strength. Fig. 22 shows SA outputs with two different trip points to mimic two different compensation currents. It is shown that an SA with a higher trip point requires additional cycles to turn on more compensation devices. Similarly, more devices should be turned on for a larger bitline leakage current due to process variations. The die photo and chip performance summary are given in Fig. 23. The proposed MBLC and read wordline pulse width control scheme incur an area overhead of 1.3%.

Fig. 20. Shmoo plot for an SRAM cell with a 0.23 V $V_{\rm min}$.

Fig. 21. $V_{\rm min}$ for read and write from an 8-by-8 mini-subarray.

Fig. 22. Output waveforms from marginal bitline leakage compensation control circuit.
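The counter-based feedback described above can be sketched as a simple behavioral model. This is not the authors' circuit: it assumes the compensation strength is proportional to the counter code, that one code is evaluated per clock cycle, and that the SA output flips once the compensation current reaches a required level set by the bitline leakage and the SA trip point. The function name, the `unit_current` parameter, and the arbitrary current units are all hypothetical.

```python
def calibrate(required_current, unit_current=1.0, bits=4):
    """Step a counter until the compensation current is sufficient.

    Returns (code, cycles, locked):
      code   - final counter value
      cycles - clock cycles used, one per code evaluated
      locked - False if even the maximum code was not enough

    Assumption: compensation current = code * unit_current.
    """
    n_codes = 2 ** bits
    for code in range(n_codes):
        cycles = code + 1
        if code * unit_current >= required_current:
            return code, cycles, True
    return n_codes - 1, n_codes, False


# A higher trip point (modeled as a larger required current) needs more cycles.
low_trip = calibrate(3.0)
high_trip = calibrate(6.5)
print(low_trip, high_trip)
```

In this model a 4-bit counter never needs more than 16 cycles, which matches the bound quoted in the text, and a larger required current (a higher trip point or more bitline leakage) settles at a higher code after more cycles.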

Die photo labels: Array (512 × 64), Decoder, Ckt, Array (512 × 64).

| Parameter               | Value                                           |
|-------------------------|-------------------------------------------------|
| Technology              | 130 nm 8-metal CMOS                             |
| SRAM area               | 0.72 × 0.85 mm<sup>2</sup>                      |
| VCC                     | ≥ 0.26 V                                        |
| Size                    | 64 kb                                           |
| Performance             | 100 kHz @ 0.23 V, 27 °C<br>15 MHz @ 0.60 V, 27 °C |
| Power consumption       | 4.3 μA @ 0.23 V, 27 °C<br>34 μA @ 0.60 V, 27 °C |
| Deep sleep mode leakage | 69% reduction                                   |

Fig. 23. Chip microphotograph and performance summary.
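The 64 kb size in the summary follows from the array organization given for Fig. 17. The short check below works through that arithmetic. The constant names are hypothetical, and the address split (row bits plus column-select bits) is an inference from one bit being produced per eight-column sub-block, not something the text states.

```python
# Organization taken from the description of Fig. 17.
ARRAYS = 2
ROWS = 512
COLUMNS = 64
SUB_BLOCKS_PER_ARRAY = 8
COLUMNS_PER_SUB_BLOCK = 8


def capacity_bits():
    """Total number of cells across both arrays."""
    return ARRAYS * ROWS * COLUMNS


def io_width():
    """One bit per sub-block, summed over both arrays."""
    return ARRAYS * SUB_BLOCKS_PER_ARRAY


def address_bits():
    """Row bits plus column-select bits (assumed 8:1 selection per sub-block)."""
    row_bits = (ROWS - 1).bit_length()
    col_bits = (COLUMNS_PER_SUB_BLOCK - 1).bit_length()
    return row_bits + col_bits


print(capacity_bits() // 1024, "kb")
print(io_width(), "IOs")
print(address_bits(), "address bits")
```

The sub-block count and width also reproduce the 64 columns per array (8 × 8), and the capacity divided by the IO width gives 4096 words, which agrees with the 12 address bits inferred above.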

### VI. CONCLUSION

Lowering V<sub>min</sub> and reducing leakage become more important in applications where energy dissipation is the primary design constraint. This paper proposes circuit techniques for lowering V<sub>min</sub> and minimizing leakage. Utilizing RSCE in the read and write ports of the SRAM cell improves write margin and read performance. The MBLC scheme lowers V<sub>min</sub> by compensating for bitline leakage and improving the bitline sensing margin. The proposed floating bitline scheme and deep sleep mode reduce leakage current during normal operation and in standby mode, respectively. An automatic read wordline pulse width control scheme improves readability and reduces wasted read power by tracking PVT variations. A 64 kb SRAM fabricated in a 130 nm CMOS technology with 512 cells per bitline verifies the V<sub>min</sub> lowering and leakage reduction achieved by the proposed circuit techniques. These techniques facilitate a superior minimum energy solution through improved leakage reduction and enhanced SRAM performance.

### REFERENCES

- [1] H. Kim, H. Soeleman, and K. Roy, "An ultra-low power DLMS filter for hearing aid applications," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 6, pp. 1058–1067, Dec. 2003.
- [2] A. Bryant *et al.*, "Low-power CMOS at Vdd = 4 kT/q," in *Proc. IEEE Device Research Conf.*, 2001, pp. 22–23.
- [3] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proc. IEEE Int. Symp. Low Power Electronics and Design (ISLPED)*, Aug. 2005, pp. 20–25.
- [4] A. Wang and A. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 310–319, Jan. 2005.
- [5] L. Chang *et al.*, "A 5.3 GHz 8T-SRAM with operation down to 0.41 V in 65 nm CMOS," in *Symp. VLSI Circuits Dig.*, Jun. 2007, pp. 252–253.
- [6] B. H. Calhoun and A. Chandrakasan, "A 256 kb sub-threshold SRAM using 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig.*, Feb. 2006, pp. 628–629.
- [7] L. Chang *et al.*, "Stable SRAM cell design for the 32 nm node and beyond," in *Symp. VLSI Technology Dig.*, Jun. 2005, pp. 128–129.
- [8] J. Chen, L. T. Clark, and T. Chen, "An ultra-low-power memory with a subthreshold power supply voltage," *IEEE J. Solid-State Circuits*, vol. 41, no. 10, pp. 2344–2353, Oct. 2006.
- [9] T. Kim, J. Liu, J. Keane, and C. Kim, "A 0.2 V, 480 kb subthreshold SRAM with 1 k cells per bitline for ultra-low-voltage computing," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 518–529, Feb. 2008.
- [10] M. Yamaoka *et al.*, "90-nm process-variation adaptive embedded SRAM modules with power-line-floating write technique," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 705–711, Mar. 2006.
- [11] I. Chang, J. Kim, S. Park, and K. Roy, "A 32 kb 10T subthreshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig.*, Feb. 2008, pp. 388–389.

- [12] N. Verma and A. Chandrakasan, "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 141–149, Jan. 2008.
- [13] B. Zhai, D. Blaauw, D. Sylvester, and S. Hanson, "A sub-200 mV 6T SRAM in 0.13 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig.*, Feb. 2007, pp. 332–333.
- [14] T. Kim, J. Liu, and C. Kim, "An 8T subthreshold SRAM cell utilizing reverse short channel effect for write margin and read performance improvement," in *Proc. IEEE Custom Integrated Circuits Conf.*, Oct. 2007, pp. 241–244.
- [15] K. Agawa, H. Hara, T. Takayanagi, and T. Kuroda, "A bitline leakage compensation scheme for low-voltage SRAMs," *IEEE J. Solid-State Circuits*, vol. 36, no. 5, pp. 726–734, May 2001.
- [16] K. Zhang *et al.*, "SRAM design on 65-nm CMOS technology with dynamic sleep transistor for leakage reduction," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 895–901, Apr. 2005.
- [17] Y. Wang *et al.*, "A 1.1 GHz 12 μA/Mb-leakage SRAM design in 65 nm ultra-low-power CMOS technology with integrated leakage reduction for mobile applications," *IEEE J. Solid-State Circuits*, vol. 43, no. 1, pp. 172–179, Jan. 2008.
- [18] H. Lee and P. Mok, "Switching noise and shoot-through current reduction techniques for switched-capacitor voltage doubler," *IEEE J. Solid-State Circuits*, vol. 40, no. 5, pp. 1136–1146, May 2005.



**Tae-Hyoung Kim** (S'06) received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Korea, in 1999 and 2001, respectively. He joined the Department of Electrical and Computer Engineering at the University of Minnesota, Minneapolis, in 2005 to pursue the Ph.D. degree.

In 2001, he joined the Device Solution Network Division, Samsung Electronics, Yong-in, Korea. From 2001 to 2005, he performed research on the design of high-speed SRAM memories. In summer 2007 and 2008, he was with IBM T. J. Watson Research Center, Yorktown Heights, NY, where he worked on NBTI/PBTI-induced frequency degradation measurement circuit and impact of aging on SRAM mismatch. His research interests include low-power and high-performance VLSI circuit design in nanoscale technologies.

Mr. Kim received the 2008 AMD/CICC Student Scholarship Award, 2008 Departmental Research Fellowship from University of Minnesota, 2008 DAC/ISSCC Student Design Contest Award, 2008 Samsung Humantec Thesis Award (Bronze Prize), 2005 ETRI Journal Paper of the Year Award, 2001 Samsung Humantec Thesis Award (Honor Prize), and 1999 Samsung Humantec Thesis Award (Silver Prize).



**Jason Liu** (M'07) received the B.S. degree in electrical engineering and the B.S. degree in computer engineering from the University of Michigan, Ann Arbor, in 2003 and 2004, respectively. From 2004 to 2005, he worked in the VLSI Circuit Design group at IBM Rochester, where he was a member of the Broadway microprocessor design team, which designed the CPU for the Nintendo Wii. He received the M.S. degree in electrical engineering from the University of Minnesota, Minneapolis, in 2007.

His research interests include high-performance and low-power VLSI circuit design. He is currently working for a start-up in Los Angeles, California.



**Chris H. Kim** (M'04) received the B.S. degree in electrical engineering and the M.S. degree in biomedical engineering from Seoul National University, Seoul, Korea. He received the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN.

He spent a year at Intel Corporation where he performed research on variation-tolerant circuits, on-die leakage sensor design and crosstalk noise analysis. He joined the electrical and computer engineering faculty at University of Minnesota, Minneapolis, in 2004.

Prof. Kim is the recipient of the NSF CAREER Award, McKnight Foundation Land-Grant Professorship, 3M Non-Tenured Faculty Award, DAC/ISSCC Student Design Contest Awards, IBM Faculty Partnership Awards, IEEE Circuits and Systems Society Outstanding Young Author Award, ISLPED Low Power Design Contest Award, Intel Ph.D. Fellowship, and Magoon's Award for Excellence in Teaching. He is an author or coauthor of 60+ journal and conference papers and has served as a technical program committee member for numerous circuit design conferences. His current research interests include digital, mixed-signal, and memory circuit design for silicon and non-silicon technologies.