# A Recursive Switched-Capacitor DC-DC Converter Achieving $2^{N}-1$ Ratios With High Efficiency Over a Wide Output Voltage Range 

Loai G. Salem, Student Member, IEEE, and Patrick P. Mercier, Member, IEEE


#### Abstract

A Recursive Switched-Capacitor (RSC) topology is introduced that enables reconfiguration among $2^{N}-1$ conversion ratios while achieving minimal capacitive charge-sharing loss for a given silicon area. All $2^{N}-1$ ratios are realized by strategically interconnecting $N$ 2:1 SC cells either in series, in parallel, or in a stacked configuration such that the number of input and ground connections is maximized in order to minimize cascaded losses. Importantly, all ratios are dynamically reconfigurable without disconnecting a single capacitor, all while ensuring optimal capacitance/conductance relative-sizing. The RSC topology is inherently regular, enabling recursive inter-cell connection and recursive binary-slicing that implement ratio-reconfiguration with minimum complexity and losses. A scalable all-digital binary search controller is employed to perform ratio-reconfiguration among the available $2^{N}-1$ ratios without using any ratio-threshold generation circuitry. To validate the topology, a 4 bit RSC is fully integrated in $0.25 \mu \mathrm{~m}$ bulk CMOS using MIM capacitors, achieving greater than $70 \%$ efficiency over a 0.8-2.2 V output voltage range with $85.8 \%$ peak-efficiency from a 2.5 V input supply. Compared to a co-fabricated three-ratio ($1 / 3$, $1 / 2$, $2 / 3$) Series-Parallel SC converter, the RSC achieves a $40.4 \%$ larger output operating range (from 0.04 to 2.2 V), and fills the efficiency drops between the three ratios by $8 \%$ with a $940 \Omega$ load.


Index Terms-Binary search, DC-DC converter, dynamic voltage scaling (DVS), portable electronics, power electronics, power management, pulse frequency modulation (PFM), recursive algorithm, switched-capacitor, voltage regulator, wide voltage range.

## I. Introduction

TODAY'S digital integrated circuits achieve a balance between performance and energy efficiency through dynamic voltage scaling (DVS) of individual processing cores in accordance with performance needs. As the number of voltage domains increases in today's system-on-chips (SoCs), generation of each supply voltage must occur not only efficiently, but within a small area. While linear regulators are compact and achieve fast response times [1], [2], their efficiencies are determined by the ratio of output to input voltage, potentially limiting system-level energy efficiency [3]-[5]. On the other hand, switched-inductor DC-DC converters can achieve high efficiencies, yet typically require large off-chip inductors [6] or increased packaging complexity [7]-[11], limiting their ability to power many independent voltage domains in a small volume. To simultaneously address the efficiency/size trade-off, fully integrated switched capacitor (SC) DC-DC converters utilize high-$Q$ capacitors available in typical CMOS processes to convert and regulate power in an energy- and area-efficient manner [4], [12]-[19].

Unlike switched-inductor DC-DC converters, however, SC converters are only efficient at discrete ratios of input-to-output voltages, constricting efficient DVS operation to small supply voltage ranges. Increasing the number of reconfigurable ratios can solve this; however, doing so introduces two main challenges: capacitance utilization and relative sizing. In a fully integrated SC converter, the achievable efficiency is limited by the amount of committed capacitance; disabling even a small fraction of such capacitance significantly lowers the efficiency. Additionally, ensuring optimal relative sizing among the constituent capacitors can improve efficiency considerably [20]. Unfortunately, the complexity of conventional topologies, including the number of necessary capacitors and reconfiguration switches, increases significantly with the number of ratios, making simultaneous $100 \%$ capacitance utilization and optimal relative sizing extremely challenging. Thus, most SC converter designs employ only a small number of ratios [12], [16], [21]-[23], often resulting in large efficiency drops between the available ratios.

This paper presents the first demonstration of a SC converter that is reconfigurable among $2^{N}-1$ ratios without disconnecting a single capacitor while ensuring optimal relative sizing and high efficiency across a large output voltage range. The proposed Recursive SC DC-DC converter topology (RSC) [24], shown in Fig. 1, recursively divides the delivered output charge across $N$ 2:1 cells connected in cascade to generate $N$ bit ratios. By maximizing the number of input voltage and ground connections, charge-sharing losses are minimized, and in fact become a convergent geometric series with minimal additional losses incurred beyond 4 bit ratios. Given the inherently modular nature of the converter, $100 \%$ capacitance utilization is ensured by reconfiguring cell connections either in cascade (for high resolutions) or parallel (for lower resolutions), with binary slicing of the largest cascaded cell in order to enable reconfiguration among odd and even resolutions, all while ensuring optimal relative sizing.

Fig. 1. The Recursive switched-capacitor realization. (a) Examples of the ratios $1 / 4,3 / 8$, and $5 / 16$. (b) Recursive SC topology pseudo-code generator. Each SC cell comprises two out-of-phase 2:1 SC for a well-posed SC network.

This paper is organized as follows. Section II introduces the RSC topology and discusses its theoretical performance compared with prior topologies. Section III presents architectural implementation details of a 4 bit RSC converter, while Section IV presents detailed circuit design. Experimental results of the test chip that verify the predicted performance are provided in Section V.

## II. Recursive Switched-Capacitor Topology

The most basic RSC building block is a $2: 1 \mathrm{SC}$ converter. As shown in Fig. 1(a), a 2:1 SC can be considered as a three-port circuit that includes two input ports, $I N_{\text {top }}$ and $I N_{\text {bottom }}$, which receive the high and low input voltages, respectively, and an output port $M I D$ that provides the average of the voltages at the input ports, i.e., $\left(I N_{\text {top }}+I N_{\text {bottom }}\right) / 2$. The $2: 1 \mathrm{SC}$ cell loads its output port current equally on the two input ports ( $I N_{\text {top }}, I N_{\text {bottom }}$ ). The following subsections discuss how 2:1 SC building cells can be connected to realize $2^{N}-1$ conversion ratios while minimizing losses.

## A. Topology Definition and Steady-State Loss Analysis

Fig. 1(b) shows the Recursive SC topology pseudo-code. Starting with a single $2: 1 \mathrm{SC}$ that divides the converter input voltage, $V_{\mathrm{in}}$, into two intervals ( 0 -to- $V_{\mathrm{in}} / 2, V_{\mathrm{in}} / 2$-to- $V_{\mathrm{in}}$ ), the topology inserts a $2: 1 \mathrm{SC}$ cell in series between the previous cell output $M I D$ and the converter ground 0 , or stacked between $V_{\text {in }}$ and the output $M I D$ of the previous $2: 1$ cell, repeatedly, until the desired binary conversion ratio $m / 2^{N}$ is realized, where $m<2^{N}$. Fig. 1(a) demonstrates examples of the ratios $1 / 4,3 / 8$, and $5 / 16$ at 2,3 , and 4 bit resolutions, respectively.
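The recursive construction can be illustrated with a short sketch. It is not the pseudo-code of Fig. 1(b), which is not reproduced here; it assumes ideal cells whose $M I D$ voltage is the exact average of the two input ports, and the function names `rsc_connections` and `ideal_ratio` are hypothetical. Working backwards from the target ratio, a ratio above $1 / 2$ implies that the last cell is stacked on $V_{\mathrm{in}}$, and a ratio below $1 / 2$ implies that it is in series with ground.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def rsc_connections(m, n_bits):
    """Connections of stages 2..N ("series" or "stack") for the ratio m / 2**n_bits.

    Illustrative sketch: assumes ideal 2:1 cells and an odd numerator m.
    """
    if n_bits < 1 or not 0 < m < 2 ** n_bits or m % 2 == 0:
        raise ValueError("m must be odd with 0 < m < 2**n_bits")
    ratio = Fraction(m, 2 ** n_bits)
    steps = []
    # Walk backwards from the output stage towards the first 2:1 cell.
    while ratio != HALF:
        if ratio > HALF:
            steps.append("stack")    # cell between V_in and the previous MID
            ratio = 2 * ratio - 1
        else:
            steps.append("series")   # cell between the previous MID and ground
            ratio = 2 * ratio
    steps.reverse()
    return steps

def ideal_ratio(steps):
    """Forward check: ideal conversion ratio produced by a list of connections."""
    ratio = HALF
    for step in steps:
        ratio = (1 + ratio) / 2 if step == "stack" else ratio / 2
    return ratio

if __name__ == "__main__":
    for m, n in [(1, 2), (3, 3), (5, 4), (11, 4)]:
        steps = rsc_connections(m, n)
        print(f"{m}/{2 ** n}: {steps} -> {ideal_ratio(steps)}")
```

Under these assumptions, every stage after the first has one port tied to either $V_{\mathrm{in}}$ or ground, which is the property the loss analysis below relies on.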

The proposed topology minimizes cascaded losses by maximizing the number of input voltage, $V_{\mathrm{in}}$, and ground, 0 , connections. Specifically, each $2: 1$ stage $C i$ has at least one input port connected to either the input voltage $V_{\text {in }}$ or the converter ground 0 , and thus each stage loads half of its output charge $q_{i}$ on the input supply, $V_{\mathrm{in}}$, or ground, 0 , instead of loading such charge on a previous cascaded stage. For example, Fig. 2 illustrates two different configurations that both realize an 11/16 ratio. In Fig. 2(a), the last stage $C 4$ loads half of the output charge $q_{\text {out }}$ on the second stage, $C 2$, which in turn loads the first stage, $C 1$, with $3 q_{\text {out }} / 8$. The third stage, $C 3$, loads the first stage by an additional $q_{\text {out }} / 4$, and thus the total charge delivered by the first stage is $5 q_{\text {out }} / 8$. In contrast, the RSC converter employs the configuration shown in Fig. 2(b), where the $I N_{\text {top }}$ of $C 4$ and the $I N_{\text {bottom }}$ of $C 3$ are directly connected to the converter input, $V_{\mathrm{in}}$, and the ground, 0 , respectively, and therefore, the charges loaded on $C 2$ and $C 1$ are both reduced by $q_{\text {out }} / 2$. For an arbitrary recursion depth $N$, each stage is loaded with a charge $q_{i}$ that is a binary-weighted fraction of the total output charge, $q_{\text {out }}$, such that $q_{i}=q_{\text {out }} / 2^{N-i}$, where $i$ is the stage order in the cascade.

Fig. 2. Charge flow through two inter-cell connections to realize the same ratio 11/16. (a) Non-optimal cascading connection. (b) Proposed RSC optimal connection with minimal inter-stage loading. Bold blocks are loaded with more charge than the corresponding blocks in (b) with the RSC connection. Bold arrows represent the extra loading charge.

It is known that the intrinsic loss mechanisms in a SC converter can be modeled by a finite output resistance, $R_{\text {out }}$, in either the slow or fast-switching limit (SSL or FSL, respectively): $R_{\text {SSL }}$, where the charge-sharing loss dominates, and $R_{\text {FSL }}$, where the switches' on-resistance dominates the losses [20], [25]. In the SSL, the total energy loss through the converter can be found by adding the charge-sharing loss across each capacitor $C_{i}$, $\left(q_{i} / 2\right)^{2} / C_{i}$. Normalizing the resulting charge-sharing power loss by the squared output current $I_{L}^{2}$, i.e., $\left(q_{\text {out }} f_{\mathrm{sw}}\right)^{2}$, the equivalent output resistance $R_{\text {SSL }}$ can be calculated as

$$
\begin{equation*}
R_{\mathrm{SSL}}=\sum_{i=1}^{N}\left(\frac{1}{2^{N-i+1}}\right)^{2} \frac{1}{f_{\mathrm{sw}} C_{i}} \tag{1}
\end{equation*}
$$

where $C_{i}$ is the total capacitance of the two flying capacitors per stage. The derived $R_{\mathrm{SSL}}$ is for a symmetric RSC, where each cell consists of two oppositely phased $2: 1 \mathrm{SC}$, which eliminates any charge-balance DC capacitor between the cascaded stages. Similarly, at the FSL, the current through each switch becomes the delivered current by that stage, which is a binary weighted fraction of the load current, $I_{L}$ (i.e., $I_{L} / 2^{N-i}$ ). Thus, the equivalent output resistance $R_{\text {FSL }}$, for a $50 \%$ duty-cycle converter clock, is

$$
\begin{equation*}
R_{\mathrm{FSL}}=\sum_{i=1}^{N} \sum_{j=1}^{4} \frac{1}{2}\left(\frac{1}{2^{N-i}}\right)^{2} R_{i, j} \tag{2}
\end{equation*}
$$

where the summation over $j$ accounts for the four switches per stage $i$, and each switch resistance $R_{i, j}$ results from two parallel switches in a symmetric RSC of eight switches. The total equivalent output resistance $R_{\text {out }}$ at a given switching frequency, $f_{\mathrm{sw}}$, occurring between the two asymptotes can be approximated by the Euclidean norm of the two limits, $R_{\mathrm{SSL}}$ and $R_{\mathrm{FSL}}$ [20]. From (1) and (2), the RSC equivalent output resistance $R_{\text {out }}$ only depends on $N$ and does not change across the resolution ratios.

Allocating a larger capacitance, $C_{i}$, for each stage results in a lower voltage swing, $\Delta V_{i}$, and lower charge-sharing loss, as dictated by (1). Given the limited available capacitance in a fully integrated SC converter, it is important to find the relative sizing of each stage capacitance $C_{i}$ from the total available on-die capacitance $C_{\text {tot }}$ to realize the minimal $R_{\mathrm{SSL}}$. For fully integrated capacitors with a single voltage rating and with no stacking of switches to block higher voltages, the optimal capacitance and conductance relative sizing matches the relative charge transferred through each capacitor or switch [20], [26], [27], and hence is a binary-weighted fraction of the total available capacitance $C_{\text {tot }}$ and conductance $G_{\text {tot }}$ :

$$
\begin{align*}
C_{i} & =\left(\frac{2^{i-1}}{2^{N}-1}\right) C_{\mathrm{tot}}  \tag{3}\\
G_{i} & =\frac{1}{4}\left(\frac{2^{i-1}}{2^{N}-1}\right) G_{\mathrm{tot}} \tag{4}
\end{align*}
$$

With such optimal sizing, the equivalent output impedance at the two asymptotes can be found as

$$
\begin{align*}
R_{\mathrm{SSL}}^{*} & =\frac{1}{f_{\mathrm{sw}} C_{\mathrm{tot}}}\left(1-\frac{1}{2^{N}}\right)^{2}  \tag{5}\\
R_{\mathrm{FSL}}^{*} & =\frac{2}{G_{\mathrm{tot}}}\left(1-\frac{1}{2^{N}}\right)^{2} \tag{6}
\end{align*}
$$
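As a numerical sanity check, the sketch below substitutes the binary-weighted capacitances of (3) into the sum of (1) and compares the result with the closed form of (5), using exact rational arithmetic. Resistances are expressed in units of $1 /\left(f_{\mathrm{sw}} C_{\mathrm{tot}}\right)$; the helper names are illustrative and not part of the original work.

```python
from fractions import Fraction

def optimal_caps(n_bits):
    """Stage capacitances of Eq. (3), as fractions of C_tot (stage 1 first)."""
    return [Fraction(2 ** (i - 1), 2 ** n_bits - 1) for i in range(1, n_bits + 1)]

def r_ssl(n_bits, caps):
    """Eq. (1) in units of 1 / (f_sw * C_tot); caps[i - 1] is C_i / C_tot."""
    return sum(
        Fraction(1, 2 ** (n_bits - i + 1)) ** 2 / caps[i - 1]
        for i in range(1, n_bits + 1)
    )

def r_ssl_closed_form(n_bits):
    """Eq. (5) in the same units."""
    return (1 - Fraction(1, 2 ** n_bits)) ** 2

if __name__ == "__main__":
    for n in range(1, 6):
        summed = r_ssl(n, optimal_caps(n))
        # The sum of (1) with the sizing of (3) should reproduce (5) exactly.
        assert summed == r_ssl_closed_form(n)
        print(n, summed, float(summed / r_ssl_closed_form(1)))
```

For $N=4$, the binary-weighted sizing gives $225 / 256$ in these units, whereas splitting $C_{\text {tot }}$ equally across the four stages would give $85 / 64$, which illustrates why the relative sizing matters.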

To realize the highest possible efficiency for a given silicon area, it is desired to select the SC topology that incurs the lowest charge-sharing loss, $R_{\mathrm{SSL}}$, to deliver the same $q_{\text {out }}$ and conversion ratio. The power available from a SC converter normalized by the power available from a $2: 1 \mathrm{SC}$, using the same silicon area, can be used as a metric to compare various SC topologies in the SSL and FSL. After assigning the capacitors appropriate optimal relative sizing, the SSL normalized power available from a topology at a conversion ratio $m / n$ becomes $M_{\text {SSL }}=\left(m / n / \sum_{i} a_{c, i}\right)^{2}$, where $m<n$ and $a_{c, i}$ is the fraction of the output charge $q_{\text {out }}$ that flows through the capacitor $C_{i}$.
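The metric can be evaluated for the RSC with a few lines of code. The sketch below assumes that the charge multipliers of the optimally sized symmetric RSC sum to $1-2^{-N}$, a value inferred here from (5) rather than stated explicitly in the text; under that assumption $M_{\mathrm{SSL}}$ reduces to $\left(m /\left(2^{N}-1\right)\right)^{2}$. The function names are hypothetical.

```python
from fractions import Fraction

def m_ssl(ratio, charge_multiplier_sum):
    """SSL power-available metric: (ratio / sum of charge multipliers) squared."""
    return (ratio / charge_multiplier_sum) ** 2

def m_ssl_rsc(m, n_bits):
    """M_SSL of an N-bit RSC at the ratio m / 2**n_bits.

    Assumption: the charge multipliers sum to 1 - 2**-n_bits, inferred from (5).
    """
    multiplier_sum = 1 - Fraction(1, 2 ** n_bits)
    return m_ssl(Fraction(m, 2 ** n_bits), multiplier_sum)

if __name__ == "__main__":
    # A plain 2:1 converter is the normalization reference, so its metric is 1.
    print(m_ssl(Fraction(1, 2), Fraction(1, 2)))
    for m in (1, 5, 11, 15):
        print(f"{m}/16:", m_ssl_rsc(m, 4))
```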

Fig. 3 compares five conventional SC topologies [28]-[30], as well as a successive approximation (SAR) SC converter [31] and the proposed RSC converter, using the established SSL metric, $M_{\mathrm{SSL}}$, where the capacitors of each topology are assigned the optimal relative sizing. The charge multiplier vectors of the various topologies can be found in [32] and through the analysis in [20]. The topologies are compared up to 5 bit binary conversion ratios. The SP topology $M_{\mathrm{SSL}}$ is also shown at the ratios $1 / 6,1 / 5,2 / 7,1 / 3,2 / 5,3 / 7$, while the Fibonacci topology $M_{\text {SSL }}$ is shown for the Fibonacci series ratios $1 / 21,1 / 13, \ldots, 1 / 2$. All topologies, with the exception of the Ladder topology, have the same $M_{\text {SSL }}$ at the minimum and maximum conversion ratios within each resolution, e.g., $1 / 2$, $1 / 4,1 / 8,1 / 16,1 / 32$ and $3 / 4,7 / 8,15 / 16,31 / 32$, respectively. As shown in Fig. 3(b), due to the binary division of the output charge across the various stages, the RSC cascading loss converges to an upper limit, $1 /\left(f_{\mathrm{sw}} C_{\text {tot }}\right)$, at large resolutions $N$, without further $M_{\text {SSL }}$ degradation. The other topologies exhibit an $M_{\text {SSL }}$ eye opening with higher resolutions $N$ for ratios $m_{\text {odd }} / 2^{N}$, where the SSL loss becomes the summation of a divergent series.

Fig. 3. The SSL power-available metric, $M_{\text {SSL }}$, for the seven topologies at binary ratios up to 5 bit resolution. The topology with the highest available power at a given ratio incurs the lowest charge-sharing loss for a given silicon area. (a) Power-available metric for the seven topologies. (b) Power-available metric for SP, symmetric RSC, and symmetric SAR topologies.

Fig. 4. The $R_{\text {SSL }}^{*}$ for the SP and symmetric RSC versus the binary ratios using a 1 F total capacitance and for a SC converter operated at 1 Hz .

Fig. 4 shows the $R_{\text {SSL }}$, using a 1 F total capacitance and at 1 Hz switching frequency for the SP and the RSC topologies, with capacitors of optimal relative-sizing, across binary ratios up to 5 bit resolutions. The RSC normalized $R_{\text {SSL }}^{*}$ saturates at an upper limit of $4 \times R_{\text {SSL }}$ of a $1 / 2$ ratio. Fig. 5 shows the FSL optimal-voltage metric [20] for the seven topologies at the same binary ratios as previously discussed. In general for fully integrated converters, capacitors consume most of the die area, and thus topologies that achieve the lowest SSL loss for a given silicon area (i.e., topologies with the highest $M_{\text {SSL }}$ ) are desired.

Fig. 5. The FSL performance metric $M_{\mathrm{FSL}}$ of the seven topologies at binary conversion ratios up to 5 bit resolution.

## B. Open-Loop Power Stage Optimization

After defining the optimal relative sizing of individual RSC components, it is critical to select the total switch area $A_{\text {sw }}$ and switching frequency $f_{\text {sw }}$ that result in the maximum efficiency for a given load $I_{L}$ and input voltage $V_{\mathrm{in}}$. In a fully integrated SC, the charge-sharing SSL loss constitutes the major loss component. To decrease the SSL loss, either the available capacitance or the switching frequency, and hence switching parasitics, should be increased. In integrated converters, capacitance is not typically considered as a variable in the optimization process, and the maximum available capacitance for a given silicon area is implemented. The maximum efficiency over the design space $\left(A_{\mathrm{sw}}, f_{\mathrm{sw}}\right)$ can be found by minimizing the total losses arising from the intrinsic SC $R_{\text {out }}$, and the switching losses that result from the power switches gate drive as well as the capacitor bottom-plate losses. The drain parasitics of the switches are treated as part of the capacitors bottom-plate parasitics.

Since a RSC consists of individual $2: 1 \mathrm{SC}$ cells that provide binary-weighted currents $I_{i}=I_{L} / 2^{N-i}$, it can be shown that the optimal switching frequency $f_{\mathrm{sw}}^{*}$ and total conductance $G_{\mathrm{tot}}^{*}$ are given by ${ }^{1}$

$$
\begin{align*}
f_{\mathrm{sw}}^{*} & =\frac{1}{4 \sqrt[3]{2}} \sqrt[3]{\frac{G_{\mathrm{on}}}{C_{\text {gate }} V_{\mathrm{gate}}^{2}}\left(\frac{I_{L}}{C_{\mathrm{tot}}}\right)^{2}} \cdot \sqrt[3]{\left(\frac{2^{N}-1}{2^{N-1}}\right)^{2}}  \tag{7}\\
\frac{G_{\mathrm{tot}}^{*}}{C_{\mathrm{tot}}} & =4 \sqrt[3]{4} \sqrt[3]{\frac{G_{\mathrm{on}}}{C_{\text {gate }} V_{\mathrm{gate}}^{2}}\left(\frac{I_{L}}{C_{\mathrm{tot}}}\right)^{2}} \cdot \sqrt[3]{\left(\frac{2^{N}-1}{2^{N-1}}\right)^{2}} \tag{8}
\end{align*}
$$

where $G_{\text {on }}$ and $C_{\text {gate }}$ are the switch conductance density in $\mathrm{S} / \mathrm{m}$ and the switch gate capacitance per unit width in $\mathrm{F} / \mathrm{m}$, respectively. $V_{\text {gate }}$ is the gate drive voltage, and $G_{\text {tot }}^{*} / C_{\text {tot }}$ is the optimal total conductance per unit capacitance. Essentially, $G_{\mathrm{tot}}^{*} / C_{\mathrm{tot}}$ sets the intersection point of the SSL and FSL loss components, or the SC corner frequency. The first term in (7)-(8) depends on the technology conductance per gate drive energy loss, and the load current density per unit capacitance. The second term depends on the resolution $N$, where at 1 bit resolution the optimal values correspond to a $2: 1 \mathrm{SC}$ converter. On the other hand, with a larger number of cascaded stages $N$, the optimal $f_{\text {sw }}$ and total conductance density reach an upper limit of approximately $60 \%$ above the optimal values of a $2: 1$ SC converter utilizing the available $C_{\text {tot }}$. Essentially, the allocated capacitance of the last stage at large $N$ becomes $C_{\text {tot }} / 2$ while supplying $I_{L}$ load current, and thus the optimal design point shifts by $\sqrt[3]{4}$. From (8), the optimal total switch area does not change from one ratio to another within a given resolution $N$, simplifying the implementation of a reconfigurable SC. However, a small change in the optimal total conductance results when the bottom-plate parasitics are significant, and an average total switch width across the various ratios slightly affects the optimal efficiency. The optimal total loss per unit ampere becomes:

$$
\begin{equation*}
\frac{P_{\mathrm{loss}}^{*}}{I_{L}}=3 \sqrt[3]{2} \sqrt[3]{\frac{\frac{I_{L}}{C_{\mathrm{tot}}}}{\frac{G_{\mathrm{on}}}{C_{\text {gate }} V_{\mathrm{gate}}^{2}}}} \cdot \sqrt[3]{\left(\frac{2^{N}-1}{2^{N-1}}\right)^{4}} \tag{9}
\end{equation*}
$$

Fig. 6. Resolution reduction from 4 bit to 1 bit and 2 bit, using output selection multiplexer (left) and recursive inter-cell connection (right). The dashed cells are disabled when realizing lower resolutions.

The minimum loss at the optimal design point depends on the ratio of the current density $I_{L} / C_{\text {tot }}$ to the switch conductance per gate loss, and the required resolution $N$. However, the efficiency $\left(1+P_{\mathrm{loss}} /\left(I_{L} V_{\text {out }}\right)\right)^{-1}$ depends on the desired ratio and increases with larger output voltages $V_{\text {out }}$. For arbitrarily large resolutions $N$, the loss per ampere in (9) saturates at about $2.5 \times$ the loss of a $2: 1 \mathrm{SC}$ that utilizes the same available $C_{\text {tot }}$.
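The resolution-dependent terms in (7)-(9) are easy to tabulate. The sketch below evaluates only those terms, normalized to the 1 bit (2:1) case, together with the efficiency expression quoted above; the technology-dependent first terms are deliberately left out, and the function names are illustrative.

```python
def resolution_factor(n_bits):
    """(2^N - 1) / 2^(N - 1): the N-dependent quantity shared by (7)-(9)."""
    return (2 ** n_bits - 1) / 2 ** (n_bits - 1)

def design_point_scale(n_bits):
    """Scaling of the optimal f_sw and G_tot / C_tot in (7)-(8) relative to N = 1."""
    return resolution_factor(n_bits) ** (2.0 / 3.0)

def loss_scale(n_bits):
    """Scaling of the optimal loss per ampere in (9) relative to N = 1."""
    return resolution_factor(n_bits) ** (4.0 / 3.0)

def efficiency(loss_per_ampere, v_out):
    """Efficiency (1 + P_loss / (I_L * V_out))^-1, with P_loss / I_L given in volts."""
    return 1.0 / (1.0 + loss_per_ampere / v_out)

if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8, 16):
        print(n, round(design_point_scale(n), 3), round(loss_scale(n), 3))
```

As $N$ grows, `resolution_factor` approaches 2, so the two scales approach $\sqrt[3]{4}$ and $\sqrt[3]{16}$, consistent with the roughly $60 \%$ and $2.5 \times$ limits quoted in the text.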

## III. Recursive Resolution-Reconfiguration Architecture

In order to achieve the highest possible efficiency for a given silicon area, the various ratios must be realized while ensuring $100 \%$ utilization of the available on-die capacitance. Additionally, the optimal relative sizing of the constituent capacitors and switches should be guaranteed. Unlike conventional topologies, the proposed RSC topology inherently enables recursive inter-cell connection and recursive binary slicing that can simultaneously achieve both conditions with low complexity.

## A. Recursive Inter-Cell Connection

The proposed recursive inter-cell connection brings individual cells in parallel instead of disabling them when realizing lower-resolution ratios. Fig. 6 summarizes the challenge of lowering the resolution in a 4 bit RSC. The converter consists of four 2:1 SC cells connected in succession $C 1, C 2, C 3, C 4$ to realize $m_{\text {odd }} / 2^{4}$ ratios. As shown, the cells are allocated optimal binary sizing of the total available capacitance, $C_{\text {tot }}$, and conductance, $G_{\text {tot }}$. One method to realize a $1 / 2$ ratio from the 4 bit RSC is to route the output from the first stage using an output selection multiplexer and disabling all other stages. While this will produce the correct output voltage, such an approach wastes the available capacitance in the last three cells $C 2, C 3$, and $C 4$, resulting in a $14 / 15$ ( $93.33 \%$ ) reduction in the available capacitance for charge transfer, thereby incurring a $15 \times$ penalty in $R_{\mathrm{SSL}}$.

Fig. 7. Resolution reduction from 4 bit to 3 bit and from 3 bit to 2 bit, using output selection multiplexer (left) and recursive slicing with recursive inter-cell connection (right).

On the other hand, the Recursive implementation connects the four $2: 1 \mathrm{SC}$ cells in parallel when a $1 / 2$ ratio is desired, as shown in Fig. 6, which results in $100 \%$ capacitance usage and the minimum possible $\Delta V$ for a given output charge and silicon area. Similarly, to lower the resolution from 4 bit to 2 bit, the cascade of the last two cells $C 3$ and $C 4$ is brought in parallel to the cascade of the first two cells $C 1$ and $C 2$, as shown in Fig. 6, ensuring optimal relative sizing, i.e., $1 / 3: 2 / 3$, and $100 \%$ capacitance usage.
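The capacitance bookkeeping behind these statements can be reproduced as follows. The sketch assumes the binary sizing of (3) for $N=4$ and that the $R_{\mathrm{SSL}}$ of a $1 / 2$ ratio is inversely proportional to the capacitance in use; variable names are illustrative.

```python
from fractions import Fraction

N_BITS = 4

# Binary-weighted cell capacitances C1..C4 as fractions of C_tot, per Eq. (3).
c1, c2, c3, c4 = (Fraction(2 ** (i - 1), 2 ** N_BITS - 1) for i in range(1, N_BITS + 1))

# Output-multiplexer approach at a 1/2 ratio: only C1 transfers charge.
wasted_fraction = 1 - c1
r_ssl_penalty = 1 / c1   # assumes R_SSL is proportional to 1 / (capacitance in use)

# Recursive approach at 2 bit: cascade (C1, C2) in parallel with cascade (C3, C4).
stage_1 = c1 + c3
stage_2 = c2 + c4

if __name__ == "__main__":
    print("wasted capacitance:", wasted_fraction, "penalty:", r_ssl_penalty)
    print("2 bit stage sizing:", stage_1, stage_2)
```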

## B. Recursive Cell Slicing

Recursive-slicing breaks down the largest cell in a cascade into binary weighted sub-cells to enable even-to-odd and odd-to-even resolution reconfiguration, all while satisfying optimal sizing. For example, instead of disabling the fourth cell $C 4$ to realize a 3 bit resolution in a 4 bit SC converter, which wastes more than half of the total capacitance, one or more of the four available cells are sliced to realize six cells in total, and the resulting cells are then arranged in two parallel cascades of three cells each. In general terms, it can be shown that recursively slicing the last cell in the cascade $\mathrm{C} N$ into $(N-1)$ binary weighted cells results in the optimal solution. Such slicing achieves the optimal relative sizing when lowering the resolution, with a minimum number of sliced sub-cells and thus complexity. The resulting binary sliced sub-cells are connected in cascade, while operating in parallel with the cascade of the original $(N-1)$ stages. For example, in the 4 bit converter shown in Fig. 7, the fourth cell $C 4$ is sliced into three sub-cells of binary weights $(1 / 7,2 / 7,4 / 7)$, and arranged in parallel to the original cascade of the stages, $C 1, C 2, C 3$ to achieve $m_{\text {odd }} / 8$ ratios.

Similarly, when lowering the resolution further from three bits to two bits for $m_{\text {odd }} / 4$ ratios, the last cells $C 3$ and $C 4_{3}$, which in parallel represent the last stage in the 3 bit cascade, are each binary sliced into two sub-cells, $\left(C 3_{1}, C 3_{2}\right)$, and $\left(C 4_{31}, C 4_{32}\right)$, respectively. Fig. 7 shows the sizing and connections of the resulting eight cells in the topology implemented in this paper. The relative sizing should be as close as possible to the illustrated weighting to achieve the peak performance; however, the optimal efficiency is not critically sensitive to mismatches between the various charge-transfer capacitors. It should be noted that, technically, only four cells are needed in order to realize all resolutions up to 4 bits; however, in order to guarantee $100 \%$ total capacitance utilization among all the possible resolutions while achieving optimal relative sizing, eight cells in total are instead employed.
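The slicing scheme can be checked with exact fractions. The sketch below assumes that each sliced cell is divided with binary weights (for instance $1 / 3: 2 / 3$ for a two-way slice) and that, at 2 bit resolution, the four cascades are $\left(C 1, C 2\right)$, $\left(C 3_{1}, C 3_{2}\right)$, $\left(C 4_{1}, C 4_{2}\right)$, and $\left(C 4_{31}, C 4_{32}\right)$; this grouping is inferred from the description rather than read from Fig. 7.

```python
from fractions import Fraction

def binary_weights(n):
    """Binary weights 1 : 2 : ... : 2**(n - 1), normalized so that they sum to 1."""
    return [Fraction(2 ** k, 2 ** n - 1) for k in range(n)]

# Four stages of the 4 bit converter, as fractions of C_tot.
c1, c2, c3, c4 = binary_weights(4)

# C4 sliced into three binary sub-cells; C3 and C4_3 each sliced into two.
c4_1, c4_2, c4_3 = (c4 * w for w in binary_weights(3))
c3_1, c3_2 = (c3 * w for w in binary_weights(2))
c4_31, c4_32 = (c4_3 * w for w in binary_weights(2))

# Per-stage capacitance at each resolution (cells in the same stage act in parallel).
stages = {
    4: [c1, c2, c3, c4_1 + c4_2 + c4_31 + c4_32],
    3: [c1 + c4_1, c2 + c4_2, c3_1 + c3_2 + c4_31 + c4_32],
    2: [c1 + c3_1 + c4_1 + c4_31, c2 + c3_2 + c4_2 + c4_32],
    1: [c1 + c2 + c3_1 + c3_2 + c4_1 + c4_2 + c4_31 + c4_32],
}

if __name__ == "__main__":
    for n_bits, sizes in stages.items():
        # Full utilization and optimal binary sizing at every resolution.
        assert sum(sizes) == 1 and sizes == binary_weights(n_bits)
        print(n_bits, [str(s) for s in sizes])
```

Under these assumptions, all eight cells stay connected at every resolution and each stage keeps the binary weighting of (3).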

## C. Inter-Cell Reconfiguration Switches

This section discusses the implementation details to generate the desired ratios with a minimum set of programming switches, and hence minimum added parasitics. The required inter-cell reconfiguration switches can be divided into two main categories: switches to implement ratio-programming within a specific recursion depth $N$, and switches for resolution reconfiguration.

Fig. 8. Two 2:1 SC cells interconnection through ratio-reconfiguration switches. $V_{\text {int }}$ is the inter-cell intermediate node.

1) Ratio-Reconfiguration Switches: Fig. 8 illustrates a simplified schematic of two $2: 1 \mathrm{SC}$ cells connected in parallel. By operating the four switches in each $2: 1$ cell from the non-overlapped clock phases, $\Phi_{1}$ and $\Phi_{2}$, the $1 / 2$ ratio is realized. In order to realize $1 / 4$ and $3 / 4$ conversion ratios in a 2 bit RSC, the two cells in Fig. 8 are either connected in cascade or in stack through the added four reconfiguration switches $r_{1}, r_{2}, r_{3}$, and $r_{4}$. To realize a $1 / 4$ conversion ratio, the second cell is connected between the output port $M I D_{1}$ of the first cell and the converter ground, 0 . This is accomplished through the three reconfiguration switches $r_{2}, r_{3}$, and $r_{4}$. The first cell output side (i.e., $V_{\text {out }}$ ) switches $s 2_{1}$ and $s 3_{1}$ are disabled and replaced by the reconfiguration switches $r_{2}$ and $r_{3}$, and hence $r_{2}, r_{3}$ are operated through $\Phi_{2}$ and $\Phi_{1}$, respectively. As a result, the first cell output charge is routed to the intermediate node $V_{\text {int }}$ between the two cells instead of the converter output $V_{\text {out }}$. To cascade both cells, the second cell input port $I N_{\mathrm{top}_{2}}$ is reconfigured to the intermediate node $V_{\text {int }}$ between the two cells instead of the converter input voltage $V_{\mathrm{in}}$. The switch $s 4_{2}$ is disabled and the reconfiguration switch $r_{4}$ is operated in its place through the same clock phase $\Phi_{2}$. Similarly, to realize the $3 / 4$ conversion ratio, the first cell charge is routed to the intermediate node $V_{\text {int }}$ through the switches $r_{2}$ and $r_{3}$, and the reconfiguration switch $r_{1}$ is operated in place of $s 1_{2}$. With such inter-cell connection, no extra series reconfiguration switches are required.
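The switch substitutions described above can be summarized as a lookup table. The sketch below only records which switches are clocked for each 2 bit ratio, as stated in the text; the string labels (e.g., `s2_1` for $s 2_{1}$) are a hypothetical naming, and clock phases are omitted because they are not given for every switch.

```python
# The four clocked switches of each of the two 2:1 cells, e.g. "s2_1" for s2 of cell 1.
CELL_SWITCHES = {f"s{k}_{cell}" for cell in (1, 2) for k in (1, 2, 3, 4)}

# For each ratio: cell switches that are disabled and the reconfiguration
# switches that are clocked in their place.
RATIO_CONFIG = {
    "1/2": {"disabled": set(), "reconfig": set()},
    "1/4": {"disabled": {"s2_1", "s3_1", "s4_2"}, "reconfig": {"r2", "r3", "r4"}},
    "3/4": {"disabled": {"s2_1", "s3_1", "s1_2"}, "reconfig": {"r2", "r3", "r1"}},
}

def active_switches(ratio):
    """Set of switches that are clocked when the given ratio is selected."""
    config = RATIO_CONFIG[ratio]
    return (CELL_SWITCHES - config["disabled"]) | config["reconfig"]

if __name__ == "__main__":
    for ratio in RATIO_CONFIG:
        print(ratio, sorted(active_switches(ratio)))
```

In every configuration exactly eight switches are clocked, since each reconfiguration switch replaces one cell switch rather than being added in series with it.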

The proposed inter-cell reconfiguration switches are scalable. By replicating the same four connections between each pair of consecutive cells in an $N$-stage cascade, reconfiguration among the various ratios with a resolution of $m_{\text {odd }} / 2^{N}$ can be realized. The conductance of the right half switches, $r_{1}$ and $r_{4}$, is double the conductance of the left half switches, $r_{2}$ and $r_{3}$, for optimal binary sizing.

2) Resolution Reconfiguration Switches: Reconfiguration of the recursion depth (i.e., resolution) can be implemented through the same four ratio-reconfiguration switches; no additional programming switches are required. During resolution reconfiguration, the function of the reconfiguration switch pair $r_{2}$ and $r_{3}$ in Fig. 8 is changed from routing the cell output charge to $V_{\text {int }}$, to instead extracting charge from the intermediate node. Fig. 9 illustrates the operation of the ratio-reconfiguration switches to reduce the resolution from 3 bit to 2 bit in a RSC. As shown in Fig. 9(a), the converter connects three $2: 1$ cells in cascade through the reconfiguration switch blocks $R_{1,2}$ and $R_{2,3}$. The 3 bit converter employs two sub-cells $C 3_{1}$ and $C 3_{2}$ to realize the third cell $C 3$ in the cascade, for maximum resource utilization. The reconfiguration switch pairs $\left(r 1_{3_{1}}, r 1_{3_{2}}\right)$ and $\left(r 4_{3_{1}}, r 4_{3_{2}}\right)$ are operated in parallel, to connect the two sub-cells $C 3_{1}$ and $C 3_{2}$ as one cell in series or stack with the second cell $C 2$. As shown in Fig. 9(b), to connect the sub-cells $C 3_{1}$ and $C 3_{2}$ in cascade, the inter-cell switches $r 1_{3_{1}}$ and $r 4_{3_{1}}$ are operated in place of the switches $s 2_{3_{1}}$ and $s 3_{3_{1}}$ in order to route the output of cell $C 3_{1}$ to the intermediate node $V_{\mathrm{int2}}$, while the reconfiguration switch $r 1_{3_{2}}$ or $r 4_{3_{2}}$ is operated in place of the switch $s 1_{3_{2}}$ or $s 4_{3_{2}}$, respectively, to realize $3 / 4$ or $1 / 4$ ratios. A similar procedure is followed for the reconfiguration block $R_{1,2}$ to connect the cells $C 1$ and $C 2$ in cascade. Finally, the second cell $C 2$ output-side switches $s 2_{2}$ and $s 3_{2}$ are operated in place of the reconfiguration switches $r 2_{2}$ and $r 3_{2}$, and a 2 bit resolution is realized as shown in Fig. 9(b).

## IV. Circuit Implementation

In order to validate the performance of the proposed RSC topology, a 4 bit RSC converter that realizes 15 ratios is implemented in a $0.25 \mu \mathrm{~m}$ bulk CMOS process. Importantly, the RSC topology is inherently modular. Thus, design of the converter requires custom implementation of only two SC building blocks.

## A. 4-Bit Power Stage Block Diagram

Fig. 10 shows the recursive block diagram of the implemented 4 bit power stage, consisting of the two basic $2: 1$ building blocks: boundary and transfer cells. These two building blocks are connected together to implement four reconfigurable stages: $C 1, C 2, C 3$, and $C 4$. The capacitance and conductance of the last two stages, $C 3$ and $C 4$, are recursively binary-sliced to achieve $100 \%$ capacitance utilization and optimal relative sizing across the various ratios at any resolution. The fourth cell, $C 4$, consists of three binary-sized sub-cells $C 4_{1}, C 4_{2}$, and $C 4_{3}$, while the sub-cell $C 4_{3}$ is further sliced into two sub-cells, $C 4_{3_{1}}$ and $C 4_{3_{2}}$. Similarly, the third cell $C 3$ comprises two binary weighted sub-cells, $C 3_{1}$ and $C 3_{2}$. The eight total cells are interconnected at four intermediate nodes, $V_{\text {int1 }}, V_{\text {int2 }}$, $V_{\mathrm{int} 3_{1}}$, and $V_{\mathrm{int} 3_{2}}$, through four reconfiguration blocks, $R_{1,2}$, $R_{2,3}, R_{4_{1}, 4_{2}}$, and $R_{4_{2}, 4_{3}}$, along with a half reconfiguration block $R_{3,4}$.

As shown in Fig. 10, two reconfiguration-switch blocks, $R_{1,2}$ and $R_{2,3}$, are employed between the three stages $C 1, C 2$, and $C 3$ to realize recursive interconnection across the various resolutions up to 3 bit operation. Similarly, another two reconfiguration-switch blocks, $R_{4_{1}, 4_{2}}$ and $R_{4_{2}, 4_{3}}$, are used to interconnect the sub-cells of the fourth stage, $C 4$, for 3 bit resolution or lower. Instead of using the typical 4-switch reconfiguration block, a 2-switch reconfiguration block $R_{3,4}$ is used to cascade the third and fourth stages, $C 3$ and $C 4$. The 2-switch reconfiguration block includes only the two switches that deliver charge to an intermediate node, and hence can be considered as a half reconfiguration block. Since the nodes $V_{\mathrm{int} 3_{1}}$ and $V_{\mathrm{int} 3_{2}}$ should be separate when cascading the sub-cells $C 4_{1}, C 4_{2}$, and $C 4_{3}$ to realize the 3 bit resolution, the reconfiguration block $R_{3,4}$ is further sliced into two sub-blocks, $R_{3,4_{\mathrm{a}}}$ and $R_{3,4_{\mathrm{b}}}$, to enable node isolation as illustrated in Fig. 10. The two sub-blocks $R_{3,4_{\mathrm{a}}}, R_{3,4_{\mathrm{b}}}$ have relative conductance of $3: 4$, respectively, to match the relative sizing between the sub-cells $\left(C 4_{1}, C 4_{2}\right)$, and $\left(C 4_{3}\right)$. Each switch in the implemented five reconfiguration blocks is assigned the optimal binary sizing of the total available conductance $G_{\text {tot }}$, which matches the relative charge that it routes.

Fig. 9. Realization of 2 bit resolution from 3 bit resolution RSC using the same ratio-reconfiguration switches. (a) 3 bit RSC. (b) 2 bit RSC.

Fig. 10. Recursive implementation block diagram of the 4 bit RSC converter. The implemented RSC comprises four stages of eight cells $C i$ and five reconfiguration switch blocks $R_{i, i+1}$.

## B. Reconfiguration Costs

In the implemented 4 bit converter, boundary cells extract charge from the converter input voltage $V_{\mathrm{in}}$ (e.g., $C 1$ and $C 4_{1}$), or deliver charge to the converter output $V_{\text {out }}$ (e.g., $C 4_{3_{1}}$ and $C 4_{3_{2}}$), across all the ratios. Therefore, these boundary cells only need an extra reconfiguration switch pair to deliver charge to a neighboring cell, or to shuttle the charge from a neighboring cell to the converter output $V_{\text {out }}$. On the other hand, transfer cells perform charge displacement from one stage to the next (e.g., $C 2, C 3_{1}, C 3_{2}$, and $C 4_{2}$). Thus, transfer cells employ four reconfiguration switches to extract the charge from one stage and deliver it to the next. Since all the switches are binary weighted to match the relative charge shuttled through a cell, the contribution of the extra four reconfiguration switches in a transfer cell to the flying capacitor bottom-plate parasitics matches the contribution of the original four switches of the $2: 1$ cell. In a boundary cell, this contribution is half that of the original switches.

Fig. 11. Boundary and transfer cells schematic.

In total, four cells each contribute normalized added drain parasitics of $1 / 2$, while the remaining four cells add $100 \%$. The average normalized added drain parasitics from the reconfiguration switches are therefore less than unity, approximately $77.6 \%$ of the drain parasitics of the original switches. It should be noted that, in general, the drain parasitics constitute a small percentage of the gate capacitance.

## C. Programmable-Port SC Boundary and Transfer Cells

In Fig. 10, each 2:1 SC cell is represented with a single capacitor and four switches. However, in the actual implementation, each cell includes two capacitors and eight switches to implement two out-of-phase $2: 1$ cells. A port state, $\left(I N_{\text {top }}, I N_{\text {bottom }}, M I D\right)$, can be defined for each cell. A boundary cell operates in one of four port-states: $\left(V_{\mathrm{in}}, 0, V_{\mathrm{out}}\right)$, $\left(V_{\mathrm{in}}, 0, V_{\mathrm{INT}}\right)$, $\left(V_{\mathrm{INT}}, 0, V_{\mathrm{out}}\right)$, and $\left(V_{\mathrm{in}}, V_{\mathrm{INT}}, V_{\mathrm{out}}\right)$, where $I N T$ represents an inter-cell node. The first state is the typical case where the cell divides the converter input $V_{\text {in }}$ by two. In the second state, the cell extracts charge from $V_{\text {in }}$ and delivers it to a neighboring cell. On the other hand, for a boundary cell to deliver charge to the output $V_{\text {out }}$ from a neighbor, the cell input or ground port is routed from the intermediate node, $I N T$, instead of $V_{\text {in }}$ or 0, which results in the last two states, $\left(V_{\mathrm{INT}}, 0, V_{\mathrm{out}}\right)$ and $\left(V_{\mathrm{in}}, V_{\mathrm{INT}}, V_{\mathrm{out}}\right)$.

Fig. 11 illustrates the implemented standard boundary cell. Two $180^{\circ}$ phase-shifted $2: 1 \mathrm{SC}$ cells are used to guarantee continuous input current through the cell input port, eliminating the need for a bypass capacitance. Since the intermediate node DC level is reconfigured at binary ratios of the input voltage, a transmission gate is used to implement the switches, with the exception of the $V_{\text {in }}$ and ground, 0 , switches. The switches $M_{n 1,2}, M_{o 2,4}, M_{o 1,3}$, and $M_{p 1,2}$ are the original switches of the $2: 1 \mathrm{SC}$ converter which implement the typical port-state ( $V_{\text {in }}, 0, V_{\text {out }}$ ). A pair of reconfiguration switches can be operated as output-side switches or input-side switches by controlling their driving phases. For instance, by operating the switches $M_{i 1}, M_{i 2}$, in Fig. 11, from the non-overlapped clock phases $\Phi_{1}, \Phi_{2}$, respectively, the switches $M_{i 1}, M_{i 2}$ act as output port switches. On the other hand, by driving $M_{i 1}$ from $\Phi_{2}$, and disabling $M_{i 2}$ and $M_{p 1}$, the switch $M_{i 1}$ is operated as an input-side switch and hence the cell input port becomes connected to $V_{\text {int }}$. A similar explanation can be followed to connect the cell ground port $I N_{\text {bottom }}$ to $V_{\text {int }}$ using $M_{i 2}$. Fig. 12 illustrates the four states of a boundary cell and the implemented cell decoder functional table.

|  | 2:1 Cell State |  |  | $\mathrm{C}_{2} \mathrm{C}_{1} \mathrm{C}_{0}$ | $S_{\text {o }}$ | $S_{M n}$ | $S_{M p}$ | $M_{i 1}$ |  | $M_{i 2}$ |  | $M_{i 21}$ |  | $M_{i 22}$ |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | $\mathrm{IN}_{\text {top }}$ | $\mathrm{IN}_{\text {bottom }}$ | MID |  |  |  |  | So | S1 | So | S1 | So | S1 | So | S1 |
| Boundary \& Transfer Cell | $V_{\text {in }}$ | 0 | $V_{\text {out }}$ | 011 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | $V_{\text {in }}$ | 0 | $V_{\text {int }}$ | 000 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
|  | $V_{\text {int }}$ | 0 | $V_{\text {out }}$ | 010 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | $V_{\text {in }}$ | $V_{\text {int }}$ | $V_{\text {out }}$ | 001 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| Transfer Cell | $V_{\text {int }}$ | 0 | $V_{\text {int2 }}$ | 110 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
|  | $V_{\text {in }}$ | $V_{\text {int }}$ | $V_{\text {int2 }}$ | 101 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |

Fig. 12. Boundary and transfer cells decoder truth table.

Fig. 13. Recursive switched-capacitor voltage regulator implementation, comprising eight cells of binary weights and two control loops. (a) Block diagram of the overall RSC chip with binary search controller. (b) 4 bit Recursive SC test chip photo.

The transfer cell is designed using the boundary cell as a starting point. At lower resolution ratios, a transfer cell acts as a boundary cell and hence incorporates the same port-states of the boundary cell. On the other hand, a transfer cell requires two additional states to shuttle charge from one stage to the next. In such cases, the transfer cell input or ground port is connected to the previous cell output port, which is connected to an intermediate node denoted as $V_{\text {int }}$ in Fig. 11, while the transfer cell output port is connected to the next stage input/ground port intermediate node $V_{\mathrm{int} 2}$. Thus, two additional port-states, $\left(I N_{\text {top }}, I N_{\text {bottom }}, M I D\right)$, are required for a transfer cell, $\left(V_{\mathrm{int}}, 0, V_{\mathrm{int} 2}\right)$ and $\left(V_{\mathrm{in}}, V_{\mathrm{int}}, V_{\mathrm{int} 2}\right)$, respectively. Fig. 12 illustrates the additional two states and the selection signals generated from the transfer cell decoder.
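The decoder table of Fig. 12 can be read as a lookup from the 3 bit cell code $C_{2} C_{1} C_{0}$ to a port state. The following Python sketch transcribes only the code and port-state columns of that table; the names `PORT_STATES`, `TRANSFER_ONLY`, and `port_state` are illustrative and do not come from the original design, and the switch-select columns are deliberately omitted.

```python
# Port states (IN_top, IN_bottom, MID) keyed by the cell code C2C1C0,
# transcribed from the code and port-state columns of the Fig. 12 table.
PORT_STATES = {
    "011": ("Vin", "0", "Vout"),      # plain 2:1 division of Vin
    "000": ("Vin", "0", "Vint"),      # extract charge from Vin to a neighbor
    "010": ("Vint", "0", "Vout"),     # input port routed from the inter-cell node
    "001": ("Vin", "Vint", "Vout"),   # ground port routed from the inter-cell node
    "110": ("Vint", "0", "Vint2"),    # transfer cell only
    "101": ("Vin", "Vint", "Vint2"),  # transfer cell only
}

# Codes that the table lists under "Transfer Cell" only.
TRANSFER_ONLY = {"110", "101"}


def port_state(code, is_transfer_cell):
    """Return the (IN_top, IN_bottom, MID) tuple for a 3 bit cell code.

    Hypothetical helper: it rejects codes absent from the table and
    transfer-only codes applied to a boundary cell.
    """
    if code not in PORT_STATES:
        raise ValueError(f"code {code!r} is not listed in the decoder table")
    if code in TRANSFER_ONLY and not is_transfer_cell:
        raise ValueError(f"code {code!r} is only defined for transfer cells")
    return PORT_STATES[code]
```

In the table as extracted, the most significant bit $C_{2}$ is set only for the two transfer-only states, which is why a boundary cell needs just the four codes with $C_{2}=0$.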

## D. Output Voltage Regulation

Fig. 13(a) shows the overall block diagram of the implemented 4 bit RSC converter chip. Two control loops are implemented in the proposed converter: an inner fine-grain loop and an outer coarse-grain loop. The inner loop, working within a single conversion ratio, should modulate either the switching frequency, $f_{\mathrm{sw}}$, or the switched capacitance (i.e., digital capacitance modulation, or DCM [4]) for fine-grain linear output voltage regulation and adaptation under load variations. Frequency modulation is chosen in this work to simplify the implementation complexity, as individual control of split sub-cells is not required in this case. The outer loop, implemented in an all-digital fashion, reconfigures the unloaded conversion ratio to minimize the range over which linear regulation is performed, thereby minimizing efficiency degradation.

1) Inner Fine-Grain Controller: The T flip-flop employed in Fig. 13(a) guarantees a $50 \%$ duty-cycle input clock to the nonoverlap phase generator. A Strong-Arm comparator running at $f_{\text {comp }}$ provides the clock input to the T flip-flop, as shown in Fig. 13(a). The comparator sampling clock is produced by an on-chip current-starved oscillator that is set to twice the maximum switching frequency of the power stage; since the power stage switching frequency across all 15 ratios does not exceed 8 MHz, the current-starved oscillator is set to 16 MHz through an external bias, $V_{B}$.

2) Outer Coarse-Grain Controller: Coarse-grain control in reconfigurable SC converters typically switches between discrete ratios by using a resistor string to generate ratio threshold levels [21], [31]. However, a large number of ratios requires a prohibitively large resistor string that must also take into account $R_{\text {out }}$ variation across the different ratios in order to avoid deadlock. In this work, the power stage itself is used to produce the threshold levels. By operating the SC at the maximum $f_{\mathrm{sw}}$ and scanning through the available ratios using binary search, the optimal ratio (i.e., the ratio that provides the required output level $V_{\text {ref }}$ with minimum resistive voltage drop) can be located. The block diagram of the implemented binary search controller is shown in Fig. 14. A 4 bit shift register, whose contents are supplied to the ratio-decoder, holds the current ratio state of the SC power stage, as shown in Fig. 13(a). Once $S T R O B E$ is asserted, $R S T$ is triggered and the power stage is reconfigured into the $1 / 2$ ratio. Then, $E N$ is asserted, initiating the binary search procedure. As a result, the $C L K$ signal is routed directly from the on-chip oscillator, switching the power stage at 8 MHz to provide the minimal output resistance, $R_{\text {out }}$.

Fig. 14. Recursive binary search controller block diagram.

Fig. 15. Measured and model-predicted efficiency, at 2 mA fixed load current, of the fabricated 4 bit RSC versus the output voltage at an input voltage of 2.5 V. The measured three-ratio efficiency is at 1.86 mA current and the same input voltage.

The proposed ratio-state code, shown in Fig. 14, registers consecutive comparison decisions and enables a recursive implementation of the binary controller. Once the counter overflows ( $O V R$ is asserted), the 4 bit shift-register stores the present fine-grain controller comparison decision with $V_{\text {ref }}$. If the comparator output, $C O M P$, is zero, the present power stage output is lower than the desired level, $V_{\text {ref }}$, and the SC is reconfigured into a larger binary-ratio at the next resolution configuration, $\left(1+R_{i-1}\right) / 2$, once the comparison decision 0 is registered at the $O V R$ edge. On the other hand, when $C O M P$
is 1 , the 4 bit register shifts in 1 and the power stage is reconfigured to the lower next-resolution binary-ratio $\left(R_{i-1}\right) / 2$, where $i-1$ is the previous search iteration.
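The search procedure can be illustrated with a short behavioral sketch in Python. It is not the authors' logic: it assumes a conventional successive-approximation search over the codes $k / 16$, a fixed output droop standing in for the ratio-dependent $R_{\text {out }}$ drop at maximum $f_{\mathrm{sw}}$, and a single back-off step at the end. The function name `coarse_search` and the `droop` parameter are hypothetical, and the ratio-state code of Fig. 14 may encode the steps differently.

```python
def coarse_search(v_ref, v_in=2.5, bits=4, droop=0.05):
    """Behavioral sketch of a binary-search ratio selection.

    Returns (final_code, visited_codes), where a code k means ratio k / 2**bits.
    `droop` is an assumed constant loaded-output drop in volts (illustrative).
    """
    full = 1 << bits

    def loaded_vout(code):
        # Output at maximum switching frequency, modeled as ideal minus droop.
        return v_in * code / full - droop

    code = full >> 1          # RST places the power stage in the 1/2 ratio
    visited = [code]
    step = full >> 2
    while step >= 1:
        comp = 1 if loaded_vout(code) > v_ref else 0
        # COMP = 0: output too low, move to a larger ratio; COMP = 1: move lower.
        code = code - step if comp else code + step
        visited.append(code)
        step >>= 1

    # Back-off: if the last ratio cannot reach v_ref, return to the next larger one.
    if loaded_vout(code) < v_ref and code < full - 1:
        code += 1
        visited.append(code)
    return code, visited
```

With $V_{\text {ref }}=2 \mathrm{~V}$ and the assumed 50 mV droop, the sketch visits the codes 8, 12, 14, 13 and then backs off to 14, i.e., the ratios $1 / 2$, $3 / 4$, $7 / 8$, $13 / 16$, and finally $7 / 8$.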

## V. Experimental Verification

The proposed 4 bit Recursive SC converter was fabricated in a $0.25 \mu \mathrm{~m}$ bulk CMOS process using $0.9 \mathrm{fF} / \mu \mathrm{m}^{2}$ MIM capacitors and thin-oxide 2.5 V MOS transistors; a die photo is shown in Fig. 13(b). The RSC occupies $4.645 \mathrm{~mm}^{2}$ for a total capacitance of 3 nF . A three-ratio ( $1 / 3,1 / 2,2 / 3$ ) series-parallel (SP) SC converter was fabricated using the same technology to enable normalized performance comparison with the prototyped RSC. The implemented three-ratio SP is optimized for the same current density $0.5 \mathrm{~mA} / \mathrm{mm}^{2}$ as the prototyped 4 bit RSC.

Fig. 16. Measured and model-predicted efficiency with external resistive load, modeling a digital load under DVS operation, of the three-ratio SP and the 4 bit RSC across the output voltage, at an input voltage of 2.5 V.

Fig. 17. Measured RSC and three-ratio SC switching frequency $f_{\text {sw }}$ across $V_{\text {out }}$, using the same external resistive load in Fig. 16.

Fig. 18. Measured RSC efficiency versus the load current at $1 / 2$ ratio, while supplying 1.15 V output voltage $V_{\text {out }}$.

Fig. 19. Measured and predicted weighted-average efficiency versus the load current density, from 0.215 to $215 \mathrm{~mA} / \mathrm{mm}^{2}$, for the fabricated RSC and SP in $0.25 \mu \mathrm{~m}$ bulk CMOS. A load of equal probability power-state is assumed. When indicated, an ideal LDO is assumed to fill the efficiency gaps over the $0-2.5$ V output range. (a) Average efficiency of RSC and SP versus current density. (b) Model-predicted RSC and SP efficiency across $V_{\text {out }}$, versus different current densities.

Fig. 15 shows the measured efficiency of the developed RSC and three-ratio SP converters, along with the results of a numerical model developed for the RSC, three-ratio SP, and 7 bit SAR topologies, with models based on the work in [32]. In addition, an ideal LDO is included for comparison. All converters are shown for a 2.5 V input voltage $V_{\mathrm{in}}$ and a 2 mA constant load current, except for the SP, which has a 1.86 mA load current to ensure equal current density. The efficiency of the RSC is measured for the following 14 ratios $(1 / 8, 3 / 16, 1 / 4, 5 / 16, 3 / 8, 7 / 16, 1 / 2, 9 / 16, 5 / 8, 11 / 16, 3 / 4, 13 / 16, 7 / 8, 15 / 16)$ over an output voltage ranging from 0.1 V to 2.2 V. Interestingly, the efficiency of the RSC at the $9 / 16$ ratio falls below the RSC $1 / 2$ ratio efficiency, since the $R_{\mathrm{SSL}}$ of the $9 / 16$ ratio is $3.5 \times$ larger than that of the $1 / 2$ ratio. The RSC and SP SC converters both achieve a peak efficiency of $85 \%$, and the numerical models are each within $1 \%$ of measurement results across the output voltage range. The large number of ratios afforded by the RSC topology enables a $50 \%$ expanded output voltage range ($0.1-2.2 \mathrm{~V}$ in contrast to $0.2-1.6 \mathrm{~V}$ for the SP), while achieving $6.4 \%$ and $3.5 \%$ higher efficiency at 0.79 V and 1.2 V output voltages, respectively, compared to the SP converter. The measured RSC also achieves $17.7 \%$ higher efficiency than an ideal LDO at 1.6 V. On the other hand, the SP peak efficiencies at the $1 / 3$ and $2 / 3$ ratios (at 0.68 V and 1.5 V output voltages) exceed the RSC by $5.6 \%$ and $5.3 \%$, respectively. The implemented RSC essentially takes the average of the three-ratio efficiency over the $0.52-1.6 \mathrm{~V}$ output range, filling the gaps between the three ratios $(1 / 3, 1 / 2, 2 / 3)$ and maintaining a flatter efficiency profile. The 4 bit RSC achieves greater than $70 \%$ efficiency over the $0.9-2.2 \mathrm{~V}$ output range with an efficiency improvement of $28 \%$ over the 7 bit SAR.
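The benefit of finer ratio spacing can be illustrated with the standard ideal bounds: an ideal LDO has efficiency $V_{\text {out }} / V_{\text {in }}$, and an SC converter regulated linearly below an unloaded ratio $M$ is bounded by $V_{\text {out }} /\left(M V_{\text {in }}\right)$. The Python sketch below evaluates these bounds only; it ignores switching and bottom-plate losses, so it is not a model of the measured curves, and the helper names are illustrative.

```python
from fractions import Fraction

# Ratio sets named in the text: 15 ratios k/16 for the 4 bit RSC, three for the SP.
RSC_RATIOS = [Fraction(k, 16) for k in range(1, 16)]
SP_RATIOS = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]


def ldo_efficiency(v_out, v_in=2.5):
    """Ideal LDO efficiency, V_out / V_in."""
    return v_out / v_in


def efficiency_bound(v_out, ratios, v_in=2.5):
    """Ideal linear-regulation bound V_out / (M * V_in).

    M is the smallest ratio whose unloaded output M * V_in is at least v_out.
    Returns None when no ratio in the set can reach v_out.
    """
    usable = [m for m in ratios if float(m) * v_in >= v_out]
    if not usable:
        return None
    m = min(usable)
    return v_out / (float(m) * v_in)
```

For example, at a 1 V output the bound is 0.8 for the three-ratio set (using $1 / 2$) and about 0.914 for the 15-ratio set (using $7 / 16$), while a 2 V output is out of reach of the three-ratio set altogether.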

Fig. 16 shows the measured and numerically modeled efficiency given a $940 \Omega$ resistive load for the RSC and a $1 \mathrm{k} \Omega$ load for the SP in order to mimic the operation of a CMOS digital load under DVS conditions. At 0.8 V and 1.2 V output voltages, the three-ratio SC achieves $59 \%$ and $68.7 \%$ efficiencies, while the 15-ratio RSC achieves $8 \%$ and $7.6 \%$ higher efficiencies at the same voltages, respectively. The RSC delivers a dynamic voltage operating range from 0.04 V to 2.16 V, which is $40.4 \%$ larger than the three-ratio SC output range from 0.09 V to 1.6 V, thereby enabling wider-range DVS operation. The measured operating frequency of the RSC and SP with the external resistive load is shown in Fig. 17. The RSC is switched over a $45 \times$ dynamic range, from 200 kHz to 9 MHz, to realize the $0.04-2.16 \mathrm{~V}$ output voltage range. In contrast, the SP requires a $100 \times$ frequency dynamic range, from 100 kHz to 10 MHz, to produce $V_{\text {out }}$ from 0.09 V to 1.6 V.
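The range and frequency figures quoted for the resistive-load measurement follow from simple arithmetic, which the Python sketch below reproduces from the endpoint values given in the text. The function names are illustrative, and the load-current helper just applies Ohm's law to the stated resistor values.

```python
def relative_expansion(range_a, range_b):
    """Fractional increase of span a over span b; each range is (v_min, v_max)."""
    span_a = range_a[1] - range_a[0]
    span_b = range_b[1] - range_b[0]
    return span_a / span_b - 1.0


def dynamic_range(f_min_hz, f_max_hz):
    """Ratio of the highest to the lowest switching frequency."""
    return f_max_hz / f_min_hz


def resistive_load_current(v_out, r_load_ohm):
    """Load current drawn by a resistor at a given output voltage."""
    return v_out / r_load_ohm


# Output ranges with the resistive load: RSC 0.04-2.16 V, SP 0.09-1.6 V.
rsc_vs_sp = relative_expansion((0.04, 2.16), (0.09, 1.6))  # about 0.404

# Switching-frequency spans quoted for Fig. 17.
rsc_freq = dynamic_range(200e3, 9e6)
sp_freq = dynamic_range(100e3, 10e6)
```

Because the load is resistive, the load current rises with $V_{\text {out }}$; for instance, `resistive_load_current(0.8, 940)` is roughly 0.85 mA.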

Fig. 18 shows the measured efficiency of the $1 / 2$ RSC conversion ratio versus the load current at an output voltage of 1.15 V . In this case, greater than $80 \%$ efficiency is achieved for load currents ranging from $30 \mu \mathrm{~A}$ to 1 mA . These results illustrate the primary advantage of a frequency modulation control, where the switching frequency, as well as switching parasitics loss, scales with the load current.
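The scaling of switching frequency with load current can be sketched under a simplifying assumption that is not taken from the measurements: the $1 / 2$ ratio operates purely in the slow-switching limit with $R_{\text {out }} \approx 1 /\left(4 f_{\mathrm{sw}} C_{\text {tot }}\right)$, the standard expression for a 2:1 cell, and all of the 3 nF is active. The Python function below, with the hypothetical name `pfm_switching_frequency`, solves for the frequency that yields the required droop; the resulting numbers are illustrative only.

```python
def pfm_switching_frequency(i_load, v_out=1.15, v_in=2.5, ratio=0.5, c_tot=3e-9):
    """Switching frequency needed to hold v_out at a given load current.

    Assumes SSL-only operation of a 2:1 stage, R_out = 1 / (4 * f_sw * C_tot),
    and neglects switch resistance and parasitic losses (illustrative model).
    """
    droop = ratio * v_in - v_out          # voltage dropped across R_out
    if droop <= 0 or i_load <= 0:
        raise ValueError("v_out must lie below ratio * v_in and i_load must be positive")
    r_out = droop / i_load                # output resistance the loop must present
    return 1.0 / (4.0 * c_tot * r_out)
```

In this model the frequency is directly proportional to the load current, so frequency-dependent switching losses shrink together with the delivered power, which is the behavior the text attributes to frequency modulation control.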

The peak efficiencies of the RSC and the three-ratio SP for various power/current densities are essentially identical, since both deliver the same $1 / 2$ ratio. In DVS applications, system battery life is a key parameter, and for a digital load of uniform-probability power states, the system energy efficiency is essentially the weighted-average efficiency of the converter over the output voltage range. The weighted-average efficiency is given by $\int P\left(V_{\text {out }}\right) \cdot V_{\text {out }} \eta\left(V_{\text {out }}\right) \mathrm{d} V_{\text {out }}$, where $P\left(V_{\text {out }}\right)$ is the probability of a given power state and the integration is over the achievable converter range. Fig. 19 shows the measured and numerically modeled weighted-average efficiencies across the output voltage range, plotted versus current density. As shown in Fig. 19(a), the measured weighted-average efficiency of the RSC exceeds the SP weighted average by $6.9 \%$ at the same current density of $0.23 \mathrm{~mA} / \mathrm{mm}^{2}$. The modeled efficiency of the RSC maintains a higher weighted-average efficiency across different current densities, and approaches a $2.5 \%$ higher average than the SP at $16 \mathrm{~mA} / \mathrm{mm}^{2}$. Note that the modeled and measured results diverge beyond the nominal current density of $0.5 \mathrm{~mA} / \mathrm{mm}^{2}$, as the model assumes optimal total switch width given the increased current density, while the fabricated chips have fixed total conductance.

Since the SP converter can only deliver voltages up to 1.6 V, another weighted-average efficiency metric is calculated assuming that an ideal LDO is used to fill any efficiency gap. With an LDO, the RSC still exceeds the SP measured weighted average by $3.3 \%$ at $0.23 \mathrm{~mA} / \mathrm{mm}^{2}$. At $16 \mathrm{~mA} / \mathrm{mm}^{2}$ and above, the LDO performance dominates the RSC and the SP efficiency and both converge to the same value. As shown in Fig. 19(b), the RSC outperforms the SP converter at higher power densities until the LDO performance dominates.

All presented numerically modeled results employ MIM capacitors with a $1.4 \%$ bottom-plate parasitic capacitance ratio. If MOS capacitors were employed in place of MIM capacitors, the $10 \%$ bottom-plate parasitics in this technology would degrade the efficiency by $12.5 \%$ across the output voltage range for 3 nF of total flying capacitance. On the other hand, if a higher density MIM capacitance were available, for example with a MIM density of $4 \mathrm{fF} / \mu \mathrm{m}^{2}$ and a $4 \times$ lower bottom-plate ratio, the efficiency of both the RSC and SP converters would increase at each discrete ratio. However, due to severe linear regulation away from the nominal three ratios in the SP topology, the efficiency between these ratios only marginally improves. On the other hand, the RSC converter has explicit ratios between these gaps, and thus the efficiency of the RSC topology at these voltages is increased. For example, with $4 \mathrm{fF} / \mu \mathrm{m}^{2}$ MIM capacitors, the weighted-average efficiency of the RSC exceeds the three-ratio SP by $9 \%$ at $0.23 \mathrm{~mA} / \mathrm{mm}^{2}$, or by $6.8 \%$ when including an ideal LDO. In this example, the RSC and SP weighted averages converge at $60 \mathrm{~mA} / \mathrm{mm}^{2}$, which is $3.8 \times$ larger than the $0.9 \mathrm{fF} / \mu \mathrm{m}^{2}$ MIM capacitor case. Migrating to a more modern technology node with higher density MIM [13], [14], MOS [4], [17], [22], ferroelectric [21], or deep-trench capacitors [16], [18], [19] and lower parasitic switches will thus enable improved performance of the RSC over the SP topology at larger current densities.

Fig. 20. Coarse-controller measured transient response. (a) Stair control voltage response. (b) Transient response after strobe activation while $V_{\text {ref }}=2 \mathrm{~V}$.

TABLE I
Comparison With Previously Published Fully Integrated SC Converters

| Work | [21] | [17] | [31] | 3-Ratio SP | 4-bit RSC |
| :--- | :---: | :---: | :---: | :---: | :---: |
| Technology | 130 nm | 65 nm | 180 nm | $0.25 \mu \mathrm{~m}$ | $0.25 \mu \mathrm{~m}$ |
| Capacitor Type | Ferroelectric | Bulk PMOS | On-chip | MIM | MIM |
| Chip Area [$\mathrm{mm}^{2}$] | 0.366 | 0.64 | 1.69 | 4.33 | 4.645 |
| Total Capacitance [nF] | 8 | 3.88 | 2.24 | 2.8 | 3 |
| Topology | 1, 2/3, 1/2, 1/3 step-down | 1/3, 2/5 SP | 7-bit SAR | 2/3, 1/2, 1/3 SP | 4-bit RSC |
| $V_{\text {in }}$ [V] | 1.5 | 3-4 | 3.4-4.3 | 2.5 | 2.5 |
| $V_{\text {out }}$ [V] | 0.4-1.1 | 1 | 0.9-1.5 | 0.2-1.6 | 0.1-2.2 |
| Quoted Efficiency ($\eta$) | $93 \%$ | $74 \%$ | $72 \%$ | $85 \%$ | $85 \%$ |
| Load Current @ $\eta$ | 1 mA | 32 mA | $10 \mu \mathrm{~A}$ | 1.86 mA | 2 mA |

Fig. 20(a) shows the control response to a variable staircase voltage reference, $V_{\text {ref }}$. The control voltage $V_{\text {ref }}$ is changed every $500 \mu \mathrm{s}$ with variable step sizes of up to 650 mV. Fig. 20(b) details the transient coarse controller response when the strobe signal is activated while the SC is initially producing a 2 V output voltage. Here, the SC power stage phase clock, $clk$, is switched at the maximum frequency while the coarse controller cycles through the various binary ratios until the output reaches the desired level after $8 \mu \mathrm{~s}$. In the third cycle of this example, the coarse controller reaches the $13 / 16$ ratio, which cannot produce the desired level $V_{\text {ref }}=2 \mathrm{~V}$, given the converter $R_{\text {out }}$. Thus, a fourth correction cycle automatically results and the Back-Off logic returns the power stage to the correct $7 / 8$ ratio. Finally, the coarse controller hands off the regulation operation to the fine-level frequency controller, where $clk$ goes back to a normal frequency. Table I provides a comparison of the implemented prototypes with recent work.

## VI. Conclusion

A Recursive SC converter topology is presented that achieves a flattened efficiency profile over a wide voltage range by employing $2^{N}-1$ ratios in an intelligent and modular manner. Compared to a co-fabricated three-ratio series-parallel converter, the proposed 4 bit RSC achieves a wider operating range and a higher weighted-average efficiency. To achieve high efficiency with a large number of ratios, the RSC topology maximizes the number of connections to the converter input supply and ground in order to minimize both the charge shuttled through the converter flying capacitors and the cascaded losses. Unlike conventional SC topologies, the RSC SSL loss converges to an upper limit of $1 /\left(f_{\mathrm{sw}} C_{\mathrm{tot}}\right)$ and becomes fixed for arbitrarily high resolutions $N$. The RSC loss for large resolutions $N$ thus saturates at approximately $4 \times$ the loss of a $2: 1 \mathrm{SC}$ that utilizes the same available $C_{\mathrm{tot}}$. By employing both recursive inter-cell connection and recursive slicing, all possible resolutions, $N$, and hence their ratios, can be realized without disconnecting a single capacitor and while satisfying optimal relative sizing of the constituent capacitors and switches, thereby ensuring high efficiency even at larger values of $N$. The inherent regularity and modularity of the RSC topology simplifies the implementation of arbitrarily large resolutions with $2^{N}-1$ possible ratios, resulting in opportunities to achieve greater than 15 ratios in future work.

## References

[1] G. Rincon-Mora and P. Allen, "A low-voltage, low quiescent current, low drop-out regulator," IEEE J. Solid-State Circuits, vol. 33, no. 1, pp. 36-44, Jan. 1998.
[2] K. N. Leung and P. Mok, "A capacitor-free CMOS low-dropout regulator with damping-factor-control frequency compensation," IEEE J. Solid-State Circuits, vol. 38, no. 10, pp. 1691-1702, Oct. 2003.
[3] Y. K. Ramadass and A. P. Chandrakasan, "Minimum energy tracking loop with embedded DC-DC converter enabling ultra-low-voltage operation down to 250 mV in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 256-265, Jan. 2008.
[4] Y. K. Ramadass, A. A. Fayed, and A. P. Chandrakasan, "A fully-integrated switched-capacitor step-down DC-DC converter with digital capacitance modulation in 45 nm CMOS," IEEE J. Solid-State Circuits, vol. 45, no. 12, pp. 2557-2565, Dec. 2010.
[5] B. Calhoun and A. Chandrakasan, "Ultra-dynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 238-245, Jan. 2006.
[6] S. Bandyopadhyay, Y. K. Ramadass, and A. P. Chandrakasan, " $20 \mu \mathrm{~A}$ to 100 mA DC-DC converter with 2.8-4.2 V battery supply for portable applications in 45 nm CMOS," IEEE J. Solid-State Circuits, vol. 46, no. 12, pp. 2807-2820, Dec. 2011.
[7] P. Hazucha et al., "A 233-MHz $80 \%-87 \%$ efficient four-phase DC-DC converter utilizing air-core inductors on package," IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 838-845, Apr. 2005.
[8] G. Schrom et al., "A 100 MHz eight-phase buck converter delivering 12 A in $25 \mathrm{~mm}^{2}$ using air-core inductors," in Proc. 22nd Annu. IEEE Applied Power Electronics Conf. and Exposition, APEC'07, Feb. 2007, pp. 727-730.
[9] P. Li, L. Xue, P. Hazucha, T. Karnik, and R. Bashirullah, "A delay-locked loop synchronization scheme for high-frequency multiphase hysteretic DC-DC converters," IEEE J. Solid-State Circuits, vol. 44, no. 11, pp. 3131-3145, Nov. 2009.
[10] N. Sturcken et al., "A switched-inductor integrated voltage regulator with nonlinear feedback and network-on-chip load in 45 nm SOI," IEEE J. Solid-State Circuits, vol. 47, no. 8, pp. 1935-1945, Aug. 2012.
[11] C. Huang and P. K. T. Mok, "A $100 \mathrm{MHz} 82.4 \%$ efficiency package-bondwire based four-phase fully-integrated buck converter with flying capacitor for area reduction," IEEE J. Solid-State Circuits, vol. 48, no. 12, pp. 2977-2988, Dec. 2013.
[12] H.-P. Le, S. R. Sanders, and E. Alon, "Design techniques for fully integrated switched-capacitor DC-DC converters," IEEE J. Solid-State Circuits, vol. 46, no. 9, pp. 2120-2131, Sep. 2011.
[13] T. M. V. Breussegem and M. S. J. Steyaert, "Monolithic capacitive DC-DC converter with single boundary-multiphase control and voltage domain stacking in 90 nm CMOS," IEEE J. Solid-State Circuits, vol. 46, no. 7, pp. 1715-1727, Jul. 2011.
[14] R. Jain et al., "A 0.45-1 V fully-integrated distributed switched capacitor DC-DC converter with high density MIM capacitor in 22 nm tri-gate CMOS," IEEE J. Solid-State Circuits, vol. 49, no. 4, pp. 917-927, Apr. 2014.
[15] D. Somasekhar et al., "Multi-phase 1 GHz voltage doubler charge pump in 32 nm logic process," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 751-758, Apr. 2010.
[16] L. Chang, R. K. Montoye, B. L. Ji, A. J. Weger, K. G. Stawiasz, and R. H. Dennard, "A fully-integrated switched-capacitor $2: 1$ voltage converter with regulation capability and $90 \%$ efficiency at $2.3 \mathrm{~A} / \mathrm{mm}^{2}$," in 2010 IEEE Symp. VLSI Circuits Dig., Jun. 2010, pp. 55-56.
[17] H.-P. Le, J. Crossley, S. R. Sanders, and E. Alon, "A sub-ns response fully integrated battery-connected switched-capacitor voltage regulator delivering $0.19 \mathrm{~W} / \mathrm{mm}^{2}$ at $73 \%$ efficiency," in 2013 IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2013, pp. 372-373.
[18] T. M. Andersen et al., "A $4.6 \mathrm{~W} / \mathrm{mm}^{2}$ power density $86 \%$ efficiency on-chip switched capacitor DC-DC converter in 32 nm SOI CMOS," in Proc. 28th Annu. IEEE Applied Power Electronics Conf. and Exposition, APEC 2013, Mar. 2013, pp. 692-699.
[19] T. M. Andersen et al., "A sub-ns response on-chip switched-capacitor DC-DC voltage regulator delivering $3.7 \mathrm{~W} / \mathrm{mm}^{2}$ at $90 \%$ efficiency using deep-trench capacitors in 32 nm SOI CMOS," in 2014 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 90-91.
[20] M. D. Seeman and S. R. Sanders, "Analysis and optimization of switched-capacitor DC-DC converters," IEEE Trans. Power Electronics, vol. 23, no. 2, pp. 841-851, Mar. 2008.
[21] D. El-Damak, S. Bandyopadhyay, and A. P. Chandrakasan, "A 93\% efficiency reconfigurable switched-capacitor DC-DC converter using on-chip ferroelectric capacitors," in 2013 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2013, pp. 374-375.
[22] Y. K. Ramadass and A. P. Chandrakasan, "Voltage scalable switched capacitor DC-DC converter for ultra-low-power on-chip applications," in Proc. 2007 IEEE Power Electronics Specialists Conf., 2007, pp. 2353-2359.
[23] T. V. Breussegem and M. Steyaert, "A 82\% efficiency $0.5 \%$ ripple 16-phase fully integrated capacitive voltage doubler," in 2009 Symp. VLSI Circuits Dig., pp. 198-199.
[24] L. G. Salem and P. P. Mercier, "An 85\%-efficiency fully integrated 15-ratio recursive switched-capacitor DC-DC converter with 0.1-to-2.2 V output voltage range," in 2014 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2014, pp. 88-89.
[25] M. Evzelman and S. Ben-Yaakov, "Average-current-based conduction losses model of switched capacitor converters," IEEE Trans. Power Electronics, vol. 28, no. 7, pp. 3341-3352, Jul. 2013.
[26] M. D. Seeman, V. W. Ng, H.-P. Le, M. John, E. Alon, and S. R. Sanders, "A comparative analysis of switched-capacitor and inductor-based DC-DC conversion technologies," in Proc. IEEE 12th Workshop on Control and Modeling for Power Electronics (COMPEL), Jun. 2010, pp. 1-7.
[27] S. R. Sanders, E. Alon, H.-P. Le, M. D. Seeman, M. John, and V. W. Ng, "The road to fully integrated DC-DC conversion via the switched-capacitor approach," IEEE Trans. Power Electronics, vol. 28, no. 9, pp. 4146-4155, Sep. 2013.
[28] J. Brugler, "Theoretical performance of voltage multiplier circuits," IEEE J. Solid-State Circuits, vol. 6, no. 3, pp. 132-135, Jun. 1971.
[29] J. Dickson, "On-chip high-voltage generation in MNOS integrated circuits using an improved voltage multiplier technique," IEEE J. Solid-State Circuits, vol. 11, no. 3, pp. 374-378, Jun. 1976.
[30] M. Makowski and D. Maksimovic, "Performance limits of switched-capacitor DC-DC converters," in Proc. IEEE Power Electronics Specialist Conf., PESC'95, 1995, vol. 2, pp. 1215-1221.
[31] S. Bang, A. Wang, B. Giridhar, D. Blaauw, and D. Sylvester, "A fully integrated successive-approximation switched-capacitor DC-DC converter with 31 mV output voltage resolution," in 2013 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2013, pp. 370-371.
[32] M. D. Seeman, "A design methodology for switched-capacitor DC-DC converters," Ph.D. dissertation, EECS Department, Univ. California, Berkeley, CA, USA, 2009.


Loai G. Salem (S'11) received the B.Sc. degree in electronics and communications engineering from Cairo University, Cairo, Egypt, in 2008, and the M.Sc. degree in microelectronics system design from Nile University, Egypt, in 2011. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the University of California at San Diego, La Jolla, CA, USA.

His research interests include fully integrated power management, high-frequency DC-DC conversion, and energy-efficient mixed-signal integrated circuits.


Patrick P. Mercier (S'04-M'12) received the B.Sc. degree in electrical and computer engineering from the University of Alberta, Edmonton, AB, Canada, in 2006, and the S.M. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2008 and 2012, respectively.

He is currently an Assistant Professor at the University of California at San Diego (UCSD) in the Department of Electrical and Computer Engineering. His research interests include the design of energy-efficient microsystems, focusing on the design of RF circuits, power converters, and sensor interfaces for miniaturized systems and biomedical applications.
Prof. Mercier was a co-recipient of the 2009 ISSCC Jack Kilby Award for Outstanding Student Paper at ISSCC 2010. He also received a Natural Sciences and Engineering Research Council of Canada (NSERC) Julie Payette fellowship in 2006, NSERC Postgraduate Scholarships in 2007 and 2009, an Intel Ph.D. Fellowship in 2009, a Graduate Teaching Award in Electrical and Computer Engineering at UCSD in 2013, and the Hellman Fellowship Award in 2014. He currently serves as an Associate Editor of the IEEE Transactions on Biomedical Circuits and Systems.


[^0]:    Manuscript received April 21, 2014; revised July 03, 2014; accepted August 06, 2014. This paper was approved by Guest Editor Makoto Nagata.
    The authors are with the Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA 92093 USA (e-mail: lgsalem@ucsd.edu; pmercier@ucsd.edu).

    Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/JSSC.2014.2353791

[^1]:    ${ }^{1}$ A simple addition of the two loss limits, $R_{\text {SSL }}$ and $R_{\text {FSL }}$, is used to express the intrinsic RSC loss, which overestimates the total $R_{\text {out }}$. To obtain simple, intuitive expressions, negligible bottom-plate parasitics are assumed, and the equivalent load resistance $R_{L}$ is assumed to be larger than $R_{\text {out }}$. The formulas for the optimal switch width and frequency of a 2:1 SC converter in [12] are used in the derivation.

