
Showing papers by "Moinuddin K. Qureshi" published in 2023


Proceedings Article
17 Jun 2023
TL;DR: Astrea is presented as the first real-time MWPM decoder for surface codes up to distance 7: it brute-force searches the few hundred possible pairings to decode accurately (1ns average, 456ns worst-case). A greedy variant, Astrea-G, filters and reorders the search to extend real-time decoding to d = 9 within 1μs.
Abstract: Quantum devices suffer from high error rates, which makes them ineffective for running practical applications. Quantum computers can be made fault tolerant using Quantum Error Correction (QEC), which protects quantum information by encoding logical qubits using data qubits and parity qubits. The data qubits collectively store the quantum information, and the parity qubits are measured periodically to produce a syndrome, which is decoded by a classical decoder to identify the location and type of errors. To prevent errors from accumulating and causing a logical error, decoders must accurately identify errors in real time, necessitating the use of hardware solutions because software decoders are slow. Ideally, a real-time decoder must match the performance of the Minimum-Weight Perfect Matching (MWPM) decoder. However, due to the complexity of the underlying Blossom algorithm, state-of-the-art real-time decoders either use lookup tables, which are not scalable, or use approximate decoding, which significantly increases logical error rates. In this paper, we leverage two key insights to enable practical real-time MWPM decoding. First, for near-term implementations of surface codes (with redundancies up to distance d = 7), the Hamming weight of the syndromes tends to be quite small (less than or equal to 10). For this regime, we propose Astrea, which simply performs a brute-force search over the few hundred possible options to perform accurate decoding within a few nanoseconds (1ns average, 456ns worst-case), thus representing the first decoder able to perform MWPM in real time up to distance 7. Second, even for codes that produce syndromes with higher Hamming weights (e.g., d = 9), the search for optimal pairings can be made more efficient by simply discarding the weights that denote a probability significantly lower than the logical error rate of the code. For this regime, we propose a greedy design called Astrea-G, which filters high-cost weights and reorders the search from high-likelihood pairings to low-likelihood pairings to produce the most likely decoding within 1μs (average 450ns). Our evaluations show that Astrea-G provides logical error rates similar to those of software-based MWPM for codes up to d = 9 while meeting the real-time decoding latency constraints.
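
The brute-force idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the function names, the dense weight-matrix input, and the `max_edge_weight` cutoff (a loose stand-in for the filtering of high-cost weights attributed to Astrea-G) are all assumptions made for illustration, and boundary matching in a real surface code is ignored. For w flipped syndrome bits there are (w-1)!! ways to pair them, for example 105 for w = 8 and 945 for w = 10, which is why exhaustive search stays tractable at small Hamming weights.

```python
from math import inf


def count_perfect_matchings(n):
    """Number of ways to pair up n items (n even): (n-1)!! = (n-1)(n-3)...1."""
    if n % 2:
        raise ValueError("n must be even")
    total = 1
    for k in range(n - 1, 0, -2):
        total *= k
    return total


def brute_force_mwpm(weights, max_edge_weight=inf):
    """Exhaustively search all perfect matchings of a small complete graph.

    weights[i][j] is the symmetric cost of pairing node i with node j.
    max_edge_weight is a hypothetical cutoff: pairs costlier than this are
    skipped, which shrinks the search but may leave no valid matching.
    Returns (best_cost, best_pairs); (inf, []) if nothing survives the cutoff.
    """
    n = len(weights)
    if n % 2:
        raise ValueError("need an even number of nodes")
    best_cost, best_pairs = inf, []

    def search(remaining, cost, pairs):
        nonlocal best_cost, best_pairs
        if not remaining:
            if cost < best_cost:
                best_cost, best_pairs = cost, list(pairs)
            return
        # Always pair the first unmatched node, so each matching is visited once.
        first, rest = remaining[0], remaining[1:]
        for idx, partner in enumerate(rest):
            w = weights[first][partner]
            if w > max_edge_weight:
                continue  # filtered out as too unlikely
            pairs.append((first, partner))
            search(rest[:idx] + rest[idx + 1:], cost + w, pairs)
            pairs.pop()

    search(list(range(n)), 0, [])
    return best_cost, best_pairs


# Illustrative 4-node example with made-up weights.
W = [
    [0, 1, 4, 5],
    [1, 0, 6, 2],
    [4, 6, 0, 1],
    [5, 2, 1, 0],
]
cost, pairs = brute_force_mwpm(W)
```

In this toy example the three candidate matchings cost 2, 6, and 11, so the search returns the pairing (0, 1), (2, 3). A hardware decoder would evaluate such candidates in parallel rather than recursively.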

Journal Article
TL;DR: CoaXiaL as discussed by the authors is a server design that overcomes memory bandwidth limitations by replacing all the DDR interfaces to the processor with the more pin-efficient CXL interface.
Abstract: The memory system is a major performance determinant for server processors. Ever-growing core counts and datasets demand higher bandwidth and capacity as well as lower latency from the memory system. To keep up with growing demands, DDR, the dominant processor interface to memory over the past two decades, has offered higher bandwidth with every generation. However, because each parallel DDR interface requires a large number of on-chip pins, the processor's memory bandwidth is ultimately constrained by its pin count, which is a scarce resource. With limited bandwidth, multiple memory requests typically contend for each memory channel, resulting in significant queuing delays that often overshadow DRAM's service time and degrade performance. We present CoaXiaL, a server design that overcomes memory bandwidth limitations by replacing all DDR interfaces to the processor with the more pin-efficient CXL interface. The widespread adoption and industrial momentum of CXL make such a transition possible, offering 4× higher bandwidth per pin compared to DDR at a modest latency overhead. We demonstrate that, for a broad range of workloads, CXL's latency premium is more than offset by its higher bandwidth. As CoaXiaL distributes memory requests across more channels, it drastically reduces queuing delays and thereby both the average value and the variance of memory access latency. Our evaluation with a variety of workloads shows that CoaXiaL improves the performance of manycore throughput-oriented servers by 1.52× on average and by up to 3×.
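
The trade-off described in the abstract, a fixed latency premium against lower queuing delay, can be made concrete with a back-of-the-envelope Python sketch. It models each memory channel as an independent M/M/1 queue, which is a simplifying assumption and not the evaluation methodology of the paper; the service time, request rate, channel counts, and 30 ns interface premium are all hypothetical numbers chosen for illustration.

```python
def mean_access_latency_ns(request_rate_per_ns, channels,
                           service_time_ns, interface_latency_ns):
    """Mean memory access latency under a per-channel M/M/1 model.

    Requests are assumed to spread evenly across channels. The mean time
    waiting in an M/M/1 queue is rho / (mu - lambda), where lambda is the
    per-channel arrival rate, mu the service rate, and rho = lambda / mu.
    """
    per_channel_rate = request_rate_per_ns / channels
    service_rate = 1.0 / service_time_ns
    utilization = per_channel_rate / service_rate
    if utilization >= 1.0:
        raise ValueError("channel is saturated; queue grows without bound")
    queuing_delay_ns = utilization / (service_rate - per_channel_rate)
    return interface_latency_ns + service_time_ns + queuing_delay_ns


# Hypothetical baseline: one heavily loaded channel, no interface premium.
baseline_ns = mean_access_latency_ns(
    request_rate_per_ns=0.08, channels=1,
    service_time_ns=10.0, interface_latency_ns=0.0)

# Hypothetical alternative: 4x the channels for the same pins, +30 ns premium.
more_channels_ns = mean_access_latency_ns(
    request_rate_per_ns=0.08, channels=4,
    service_time_ns=10.0, interface_latency_ns=30.0)
```

With these made-up inputs the single channel runs at 80% utilization and spends about 40 ns queuing, for roughly 50 ns in total, while the four-channel case runs at 20% utilization and totals roughly 42.5 ns despite the added 30 ns. At low load the comparison reverses, which is consistent with the abstract's claim being about bandwidth-bound workloads.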