# A Fault Tolerant, Area-Efficient Architecture for Shor's Factoring Algorithm

#### Mark Whitney, Nemanja Isailovic, Yatish Patel, John Kubiatowicz

Univ. of California, Berkeley

June 23, 2009

# Shor's Factoring Algorithm



- Killer app for quantum computing
  - Runtime polynomial in the number of bits
- Key component of modular exponentiation: the quantum adder
  - 1024-bit number: billions of operations, 100,000s of qubits
- High failure rates require strong fault tolerance
  - Previous area estimates for this design were 0.9 m<sup>2</sup>
  - 95% of operations are for fault tolerance
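The adder is built entirely from reversible gates. To make that concrete, the sketch below simulates one well-known ripple-carry construction, the Cuccaro et al. MAJ/UMA adder, on classical bits. The choice of this particular adder is an assumption for illustration, not necessarily the design used in this work. Because CNOT and Toffoli act on basis states as XOR updates, a plain bit list is enough to check the arithmetic; superposition, noise, and fault tolerance are ignored, and the function names are hypothetical.

```python
from typing import List, Tuple


def cnot(bits: List[int], ctrl: int, tgt: int) -> None:
    """CNOT on a classical basis state: tgt ^= ctrl."""
    bits[tgt] ^= bits[ctrl]


def toffoli(bits: List[int], c1: int, c2: int, tgt: int) -> None:
    """Toffoli on a classical basis state: tgt ^= (c1 AND c2)."""
    bits[tgt] ^= bits[c1] & bits[c2]


def maj(bits: List[int], c: int, b: int, a: int) -> None:
    """Leave majority(a, b, c), i.e. the next carry, on wire a."""
    cnot(bits, a, b)
    cnot(bits, a, c)
    toffoli(bits, c, b, a)


def uma(bits: List[int], c: int, b: int, a: int) -> None:
    """Undo MAJ on wires a and c, and leave the sum bit a ^ b ^ c on wire b."""
    toffoli(bits, c, b, a)
    cnot(bits, a, c)
    cnot(bits, c, b)


def ripple_add(x: int, y: int, n: int) -> Tuple[int, int, int]:
    """Add two n-bit integers; return (x + y, restored x, carry-in ancilla value)."""
    # Wire layout: 0 = carry-in ancilla, 1..n = x bits, n+1..2n = y bits, 2n+1 = carry-out.
    a = [1 + i for i in range(n)]
    b = [1 + n + i for i in range(n)]
    z = 2 * n + 1
    bits = [0] * (2 * n + 2)
    for i in range(n):
        bits[a[i]] = (x >> i) & 1
        bits[b[i]] = (y >> i) & 1
    # Forward sweep: carries ripple up through the a wires.
    maj(bits, 0, b[0], a[0])
    for i in range(1, n):
        maj(bits, a[i - 1], b[i], a[i])
    cnot(bits, a[n - 1], z)  # copy the final carry to the carry-out wire
    # Backward sweep: restore the a wires and write the sum bits onto the b wires.
    for i in range(n - 1, 0, -1):
        uma(bits, a[i - 1], b[i], a[i])
    uma(bits, 0, b[0], a[0])
    total = sum(bits[b[i]] << i for i in range(n)) + (bits[z] << n)
    restored_x = sum(bits[a[i]] << i for i in range(n))
    return total, restored_x, bits[0]
```

This construction uses 2n + 2 wires and O(n) gates, but its depth grows linearly with n because each carry waits for the previous one. Carry-lookahead adders reduce that latency at the cost of extra qubits.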







Elements of fault tolerance:

- Encode qubits in a quantum error correcting code (QEC)
- Add error correction modules
- 3 pieces to the error correction operation:
  - Ancilla (helper "parity" bit) production
  - Error syndrome extraction
  - Qubit correction
- ≥ 90% of QEC gates are spent on ancilla production
- ≥ 85% of all gates are spent on ancilla production
- Efficient design requires managing ancilla production
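The syndrome and correction steps can be illustrated with a classical sketch. It assumes the [[7,1,3]] Steane code, a common choice that the slides do not name. Each of that code's X-type and Z-type check sets is the parity-check matrix of the classical [7,4,3] Hamming code, so a single bit flip on data qubit j produces a 3-bit syndrome equal to j in binary. The helper names are hypothetical. The sketch ignores ancilla preparation and measurement errors, which are exactly the expensive parts discussed above.

```python
from typing import List, Tuple

# Parity-check matrix of the classical [7,4,3] Hamming code. Column j (1-indexed)
# is j written in binary, most significant bit in the first row. The Steane code
# uses these three checks for both its X-type and Z-type stabilizers.
H: List[List[int]] = [[(j >> b) & 1 for j in range(1, 8)] for b in (2, 1, 0)]


def syndrome(error: List[int]) -> Tuple[int, ...]:
    """Evaluate the three parity checks (mod 2) on a 7-bit error pattern."""
    return tuple(sum(h * e for h, e in zip(row, error)) % 2 for row in H)


def correct(error: List[int]) -> List[int]:
    """Single-error correction: read the syndrome as a position and flip that bit."""
    s0, s1, s2 = syndrome(error)
    position = 4 * s0 + 2 * s1 + s2  # 0 means the checks detected no error
    fixed = list(error)
    if position:
        fixed[position - 1] ^= 1
    return fixed
```

Any single flip is removed, but two flips (for example on the first two qubits) are "corrected" into a weight-3 pattern with zero syndrome, i.e. an undetected logical error. That is why errors must be corrected before they accumulate.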

# Reducing FT Overhead

Two ways to reduce overhead:

- Improve the ancilla supply: make production more efficient
- Decrease the demand for ancilla
- Our contributions:
  - Qalypso: more flexible, efficient ancilla production
  - Error correction optimization to reduce the number of correction steps



- Compute and QEC ancilla regions in a single tile
- Interconnection network connects tiles
  - Previous work: QLA, LQLA
- Specialized memory and compute regions
  - Previous work: CQLA, CQLA+
- Our contribution, Qalypso: flexible memory, compute, and QEC regions
- Parameters: number of tiles, compute/memory distribution






- Input: application circuit (Shor's, etc.)
- Output: fault-tolerant spatial layout of the circuit
- Advantages:
  - Detailed layout allows accurate comparison of designs
  - Can search a large space of configurations
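A configuration search of this kind starts from a grid over the Qalypso parameters listed earlier: the number of tiles and the compute/memory distribution. The sketch below only enumerates such a grid. The `TileConfig` class, the `config_space` function, the rule of keeping at least one compute tile, and the example ranges are all hypothetical. A real flow would lay out and simulate every point instead of just listing it.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical design point: total tiles and how many of them are compute tiles."""
    num_tiles: int
    compute_tiles: int

    @property
    def memory_tiles(self) -> int:
        return self.num_tiles - self.compute_tiles


def config_space(tile_counts: Iterable[int],
                 compute_fractions: Iterable[float]) -> List[TileConfig]:
    """Enumerate distinct (tile count, compute/memory split) points."""
    seen = set()
    for n, f in product(tile_counts, compute_fractions):
        # Round the requested fraction to whole tiles, keeping 1..n compute tiles.
        compute = max(1, min(n, round(n * f)))
        seen.add(TileConfig(n, compute))
    return sorted(seen, key=lambda c: (c.num_tiles, c.compute_tiles))
```

For example, `config_space([4, 8], [0.25, 0.5, 1.0])` yields six points, from 4 tiles with 1 compute tile up to 8 all-compute tiles. Deduplicating with a set matters because different fractions can round to the same split on small tile counts.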

Basic metrics:

- Probability of no error in the output: p<sub>success</sub>
- Area
- Delay of a single run of the circuit: D<sub>single</sub>
- Design trades off area, D<sub>single</sub>, and p<sub>success</sub>
- ADCR: Area-Delay to Correct Result
  - $E(\text{Delay}) = D_{single} \times E_{correct}(\text{runs}) = \frac{D_{single}}{p_{success}}$
  - $\text{ADCR} = \text{Area} \times E(\text{Delay}) = \frac{\text{Area} \times D_{single}}{p_{success}}$
  - Measures the area efficiency of probabilistic circuits
- ADCR<sub>optimal</sub>: best ADCR over all possible configurations
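Once a configuration's area, D<sub>single</sub>, and p<sub>success</sub> are known, the metric is simple to compute. The number of runs until the first correct result is geometrically distributed with mean 1/p<sub>success</sub>, which gives the E(Delay) term. The sketch assumes those three numbers come from elsewhere (for example a layout and simulation flow); the `DesignPoint` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class DesignPoint:
    """Hypothetical record of one evaluated configuration (any consistent units)."""
    name: str
    area: float
    d_single: float   # delay of a single run of the circuit
    p_success: float  # probability that a single run has no error in its output


def expected_delay(d_single: float, p_success: float) -> float:
    """E(Delay) = D_single * E_correct(runs) = D_single / p_success."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    # Runs until the first correct result are geometric with mean 1 / p_success.
    return d_single / p_success


def adcr(point: DesignPoint) -> float:
    """Area-Delay to Correct Result: Area * E(Delay)."""
    return point.area * expected_delay(point.d_single, point.p_success)


def adcr_optimal(points: Iterable[DesignPoint]) -> Tuple[DesignPoint, float]:
    """Lowest ADCR over all supplied configurations, with the winning point."""
    best = min(points, key=adcr)
    return best, adcr(best)
```

Because ADCR divides by p<sub>success</sub>, a smaller design is not automatically better: halving the area while also halving p<sub>success</sub> leaves ADCR unchanged.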

## Architectural Evaluation on Random Circuits

- ADCR<sub>optimal</sub> comparison on random circuits
- Compare the best-performing architectures: Qalypso, LQLA, and CQLA+
- Qalypso has significantly lower ADCR than previous work
  - 4x lower latency than LQLA
  - 2x smaller area than CQLA+

Qalypso directs ancilla production to performance-critical tiles

# Error Correction Optimization

- Standard approach: insert error correction steps everywhere
- Our approach:
  - Identify critical error paths
  - Add correction steps where they are "most important"



- Identifying critical error paths
  - Each gate propagates the maximum error distance (EDist) of its inputs to all of its outputs
  - Each gate adds 1 to the error distance
  - *EDist<sub>max</sub>* corresponds to the critical paths
- Corrections reduce error counts
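The propagation rules above amount to a single pass over the operations in program order. The sketch assumes a hypothetical circuit format (a list of `("gate", qubits)` and `("correct", qubits)` tuples). It also assumes, which the slide does not state, that a correction resets the corrected qubit's error distance to 0.

```python
from typing import List, Sequence, Tuple

Op = Tuple[str, Tuple[int, ...]]  # ("gate", qubits) or ("correct", qubits)


def error_distances(num_qubits: int, ops: Sequence[Op]) -> Tuple[List[int], int]:
    """Return the final EDist of every qubit and EDist_max over the whole circuit."""
    edist = [0] * num_qubits
    edist_max = 0
    for kind, qubits in ops:
        if kind == "correct":
            for q in qubits:
                edist[q] = 0  # assumption: a correction clears accumulated error distance
        elif kind == "gate":
            # Every output inherits the worst input EDist, plus 1 for this gate.
            d = max(edist[q] for q in qubits) + 1
            for q in qubits:
                edist[q] = d
            edist_max = max(edist_max, d)
        else:
            raise ValueError(f"unknown operation kind: {kind!r}")
    return edist, edist_max
```

EDist<sub>max</sub> is tracked as a running maximum because a later correction can lower a qubit's final value even though that qubit was on a critical path earlier in the circuit.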



- Analogous to delay in synchronous circuits
- Apply a classical retiming approach: recorrection
  - Corrections balance EDist, just as synchronizing registers balance delay



- Add correction ops, starting at the front of the circuit
  - Add enough corrections to reach a target EDist<sub>threshold</sub>
- Find the minimal number of corrections such that *EDist<sub>max</sub>* ≤ *EDist<sub>threshold</sub>*
- Find the *EDist<sub>threshold</sub>* that gives the desired *p<sub>success</sub>*, area, *D<sub>single</sub>*, or ADCR
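One simple way to meet a target threshold is a greedy forward sweep. It walks the gates from the front of the circuit and corrects an input only when the next gate would otherwise push EDist past EDist<sub>threshold</sub>. This is a hypothetical heuristic, not claimed to be the authors' algorithm or to be minimal on every circuit. It assumes every gate adds 1 to the maximum input EDist, as on the earlier slide, and that a correction resets a qubit's EDist to 0. The function name and circuit format are illustrative.

```python
from typing import List, Sequence, Tuple

Op = Tuple[str, Tuple[int, ...]]  # ("gate", qubits) or ("correct", qubits)


def insert_corrections(num_qubits: int, gates: Sequence[Tuple[int, ...]],
                       threshold: int) -> Tuple[List[Op], int, int]:
    """Greedy forward sweep; returns (ops with corrections, #corrections, EDist_max)."""
    if threshold < 1:
        raise ValueError("threshold must be >= 1 because every gate adds 1")
    edist = [0] * num_qubits
    ops: List[Op] = []
    n_corrects = 0
    edist_max = 0
    for qubits in gates:
        if max(edist[q] for q in qubits) + 1 > threshold:
            # Invariant: every EDist is <= threshold, so only inputs sitting exactly
            # at the threshold need a correction for this gate to stay within it.
            for q in qubits:
                if edist[q] >= threshold:
                    ops.append(("correct", (q,)))
                    edist[q] = 0  # assumption: a correction resets EDist to 0
                    n_corrects += 1
        d = max(edist[q] for q in qubits) + 1
        for q in qubits:
            edist[q] = d
        edist_max = max(edist_max, d)
        ops.append(("gate", tuple(qubits)))
    return ops, n_corrects, edist_max
```

Sweeping the threshold shows the tradeoff from the slides directly. On a single-qubit chain of six gates, thresholds 1 through 6 need 5, 2, 1, 1, 1, and 0 corrections. Choosing EDist<sub>threshold</sub> would then mean evaluating each resulting circuit (for example its ADCR) and keeping the best.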

### Optimization and Adder Performance

1024-bit Quantum Carry Lookahead Adder (QCLA)

- Higher *EDist*<sub>threshold</sub> = fewer corrections, more optimization
- ≥ 10x improvement for CQLA+ and LQLA
- 6x improvement for Qalypso

# Shor's Performance

- Full Shor's factorization of a 1024-bit number: billions of operations
- Includes the quantum ripple carry adder (QRCA) design
- More than 2x reduction in latency from optimization (optimized QCLA is best)

- 5x reduction in area from optimization (optimized QRCA is best)

Best optimized Qalypso 1024-bit design is approximately 0.01 m<sup>2</sup>
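Taking the two area figures quoted in this talk at face value (0.9 m<sup>2</sup> from previous estimates, approximately 0.01 m<sup>2</sup> for the best optimized Qalypso design), the reduction is roughly

$$\frac{0.9\,\text{m}^2}{0.01\,\text{m}^2} = 90 \approx 10^{2},$$

about two orders of magnitude. This is back-of-the-envelope arithmetic on the quoted numbers, not a separately reported result.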

### Conclusion

- ADCR: a new metric for evaluating fault-tolerant quantum circuits
  - Area-efficiency metric that combines reliability, area, and delay
- Qalypso outperforms other QC architectures
  - Detailed layout and simulation allow accurate comparison
  - CAD flow enables automated search of the configuration space
- Error correction optimization reduces area and latency
  - Minimal impact on reliability
- Together: orders of magnitude area improvement for Shor's factoring