Author

Carlo Condo

Bio: Carlo Condo is an academic researcher from Huawei. The author has contributed to research on the topics of decoding methods and polar codes. The author has an h-index of 20 and has co-authored 97 publications receiving 1311 citations. Previous affiliations of Carlo Condo include McGill University and the Polytechnic University of Turin.


Papers
Journal ArticleDOI
TL;DR: The struggles of designing a family of polar codes able to satisfy the demands of 5G systems are illustrated, with particular attention to rate flexibility and low decoding latency.
Abstract: Polar codes have attracted the attention of academia and industry alike in the past decade, such that the 5th generation wireless systems (5G) standardization process of the 3rd Generation Partnership Project (3GPP) chose polar codes as a channel coding scheme. In this tutorial, we provide a description of the encoding process of polar codes adopted by the 5G standard. We illustrate the struggles of designing a family of polar codes able to satisfy the demands of 5G systems, with particular attention to rate flexibility and low decoding latency. The result of these efforts is an elaborate framework that applies novel coding techniques to provide a solid channel code for New Radio (NR) requirements.
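For readers new to the construction, the heart of the encoding step described above is the Kronecker-power transform. Below is a minimal Python sketch, assuming a toy N = 8 code with an illustrative frozen set; it ignores the bit-reversal permutation and the 5G-specific reliability sequence, interleaving, and rate matching.

```python
import numpy as np

def polar_encode(u, n):
    """Encode a length-2**n vector u with the polar transform G = F^(kron n),
    where F = [[1, 0], [1, 1]], all arithmetic mod 2."""
    F = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, F)
    return u.dot(G) % 2

# Toy example: N = 8, K = 4. The frozen positions here are illustrative only;
# the 5G standard fixes them via a nested reliability sequence.
n, N = 3, 8
info_positions = [3, 5, 6, 7]          # assumed most-reliable indices for this toy case
u = np.zeros(N, dtype=np.uint8)        # frozen bits set to 0
u[info_positions] = [1, 0, 1, 1]       # K = 4 information bits
print(polar_encode(u, n))
```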

197 citations

Journal ArticleDOI
TL;DR: In this paper, the authors improve SSCL and SSCL-SPC by proving that the list size imposes a specific number of path splits required to decode Rate-1 and single parity-check codes, so that the number of splits can be limited while guaranteeing exactly the same error-correction performance as if the paths were forked at each bit estimation.
Abstract: Polar codes have gained a significant amount of attention during the past few years and have been selected as a coding scheme for the next generation of mobile broadband standard. Among decoding schemes, successive-cancellation list (SCL) decoding provides a reasonable tradeoff between error-correction performance and hardware implementation complexity when used to decode polar codes, at the cost of limited throughput. The simplified SCL (SSCL) and its extension SSCL-SPC increase the speed of decoding by removing redundant calculations when encountering particular information and frozen bit patterns (Rate-1 and single parity-check codes), while keeping the error-correction performance unaltered. In this paper, we improve SSCL and SSCL-SPC by proving that the list size imposes a specific number of path splits required to decode Rate-1 and single parity-check codes. Thus, the number of splits can be limited while guaranteeing exactly the same error-correction performance as if the paths were forked at each bit estimation. We call the new decoding algorithms Fast-SSCL and Fast-SSCL-SPC. Moreover, we show that the number of path forks in a practical application can be tuned to achieve the desired speed, while keeping the error-correction performance almost unchanged. Hardware architectures implementing both algorithms are then described and implemented: it is shown that our design can achieve a throughput of 1.86 Gb/s, higher than the best state-of-the-art decoders.
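The central result above bounds the work in Rate-1 (all-information) nodes: with list size L, only min(L − 1, node length) bit estimations need to fork paths, while the remaining bits follow the hard decision on their LLRs. A back-of-the-envelope sketch of the saving, assuming an arbitrary node length of 32 (SSCL forks at every bit of such a node):

```python
def rate1_splits(list_size: int, node_len: int) -> int:
    """Path forks required in a Rate-1 node under Fast-SSCL:
    min(L - 1, node length); all remaining bits are hard-decided."""
    return min(list_size - 1, node_len)

# SSCL splits paths at every bit of a Rate-1 node; Fast-SSCL stops early.
node_len = 32  # arbitrary example length
for L in (2, 4, 8):
    print(f"L={L}: SSCL forks {node_len} times, "
          f"Fast-SSCL forks {rate1_splits(L, node_len)} times")
```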

149 citations

Journal ArticleDOI
TL;DR: A speed-up technique for successive-cancellation list decoding of polar codes that is exact for a list size of 2, while its approximations bring negligible error-correction performance degradation (<0.05 dB) for other list sizes.
Abstract: Polar codes are a recently discovered family of capacity-achieving error-correcting codes. Among the proposed decoding algorithms, successive-cancellation list decoding guarantees the best error-correction performance with codes of moderate lengths, but it yields low throughput. Speed-up techniques have been proposed in the past: most of them rely on approximations that degrade the error-correction capability of the algorithm. We propose a speed-up technique for successive-cancellation list decoding of polar codes that is exact for a list size of 2, while its approximations bring negligible error-correction performance degradation (<0.05 dB) for other list sizes. The resulting decoder achieves a speed-up of up to 3.16×, at the cost of 14.2% in area occupation.
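For context, any SCL speed-up must preserve the path metrics that rank candidate decoding paths. The sketch below shows the standard hardware-friendly LLR-based update (due to Balatsoukas-Stimming et al., not a contribution of this paper): a path is penalized only when its bit choice disagrees with the sign of the LLR.

```python
def update_path_metric(pm: float, llr: float, bit: int) -> float:
    """LLR-based SCL path-metric update: penalize a path by |llr|
    only when the estimated bit contradicts the LLR's hard decision."""
    hard_decision = 0 if llr >= 0 else 1
    return pm + abs(llr) if bit != hard_decision else pm

# A path that always follows the hard decisions keeps its metric unchanged.
pm = 0.0
for llr, bit in [(2.3, 0), (-1.1, 1), (0.4, 1)]:
    pm = update_path_metric(pm, llr, bit)
print(pm)  # 0.4: only the last bit went against the LLR sign
```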

110 citations

Journal ArticleDOI
TL;DR: This paper proposes an efficient computational method, inspired by the computational core of fully connected neural networks, to process the convolutional layers of state-of-the-art deep CNNs within strict latency requirements; the method is implemented and customized for VGG and VGG-based networks, which have shown state-of-the-art performance on different classification/recognition data sets.
Abstract: In the past few years, the demand for real-time hardware implementations of deep neural networks (DNNs), especially convolutional neural networks (CNNs), has dramatically increased, thanks to their excellent performance on a wide range of recognition and classification tasks. When considering real-time action recognition and video/image classification systems, latency is of paramount importance. Therefore, applications strive to maximize accuracy while keeping the latency under a given application-specific maximum: in most cases, this threshold cannot exceed a few hundred milliseconds. Until now, research on DNNs has mainly focused on achieving better classification or recognition accuracy, whereas very few works in the literature take into account the computational complexity of the model. In this paper, we propose an efficient computational method, inspired by the computational core of fully connected neural networks, to process the convolutional layers of state-of-the-art deep CNNs within strict latency requirements. To this end, we implemented our method customized for VGG and VGG-based networks, which have shown state-of-the-art performance on different classification/recognition data sets. The implementation results in 65-nm CMOS technology show that the proposed accelerator can process the convolutional layers of VGGNet up to 9.5 times faster than state-of-the-art accelerators reported to date, while occupying 3.5 mm².
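The paper's core idea is to feed convolutional layers to a compute core designed for fully connected layers. A common software analogue, sketched here under the assumption of stride 1 and no padding (the paper's exact dataflow may differ), is im2col lowering: unroll input patches into a matrix so the whole layer becomes one matrix multiplication.

```python
import numpy as np

def conv2d_as_matmul(x, w):
    """Lower a convolution to a single matrix product (im2col), the usual way
    of mapping conv layers onto a fully-connected / matrix-multiply core.
    x: (C, H, W) input, w: (F, C, k, k) filters, stride 1, no padding."""
    C, H, W = x.shape
    F, _, k, _ = w.shape
    Ho, Wo = H - k + 1, W - k + 1
    # Each output position becomes one column of unrolled patch values.
    cols = np.empty((C * k * k, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + k, j:j + k].ravel()
    out = w.reshape(F, -1) @ cols            # one big matrix multiply
    return out.reshape(F, Ho, Wo)

# Tiny shape check on random data
x = np.random.randn(3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
print(conv2d_as_matmul(x, w).shape)  # (4, 6, 6)
```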

93 citations

Proceedings ArticleDOI
19 Mar 2017
TL;DR: In this paper, a faster decoder for the Rate-1 node is proposed to reduce the number of time-steps needed to decode polar codes while exactly matching the error-correction performance of SCL and SSCL.
Abstract: Polar codes are capacity-achieving error-correcting codes that can be decoded through the successive-cancellation algorithm. To improve its error-correction performance, a list-based version called successive-cancellation list (SCL) has been proposed in the past, which however substantially increases the number of time-steps in the decoding process. The simplified SCL (SSCL) decoding algorithm exploits constituent codes within the polar code structure to greatly reduce the required number of time-steps without introducing any error-correction performance loss. In this paper, we propose a faster approach to decode one of these constituent codes, the Rate-1 node. We use this Rate-1 node decoder to develop Fast-SSCL. We demonstrate that only a list-size-bound number of bits needs to be estimated in Rate-1 nodes, and that Fast-SSCL exactly matches the error-correction performance of SCL and SSCL. This technique can greatly reduce the total number of time-steps needed for polar code decoding: analysis on a set of case studies shows that Fast-SSCL requires up to 66.6% fewer time-steps than SSCL and 88.1% fewer than SCL.

69 citations


Cited by
Journal ArticleDOI
20 Mar 2020
TL;DR: This article reviews mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification, and answers the question of how to leverage these methods in the design of neural network accelerators, presenting the state-of-the-art hardware architectures.
Abstract: Domain-specific hardware is becoming a promising topic against the backdrop of the improvement slowdown of general-purpose processors due to the foreseeable end of Moore's Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain, witnessing successful applications in a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs is achieved at the cost of high memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the concept of DNN compression was naturally proposed and is widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators for gaining extremely high performance. However, the number of related works is huge and the reported approaches are quite divergent. This research chaos motivates us to provide a comprehensive survey of the recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, involving both the high-level algorithms and their applications in hardware design. In this article, we review mainstream compression approaches such as compact models, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. In the end, we discuss several existing issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and give promising topics in this field as well as the possible challenges. This article attempts to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate the various methods, and confidently get started in the right way.
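As one concrete instance of the data-quantization family the survey covers, the sketch below shows plain uniform symmetric quantization of a weight tensor; the bit width and tensor shape are arbitrary illustrative choices.

```python
import numpy as np

def quantize_uniform(w, bits=8):
    """Uniform symmetric quantization: map float weights to signed integers
    with a single per-tensor scale, so inference can run in low precision."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_uniform(w)
err = np.abs(w - q.astype(np.float32) * scale).mean()
print(f"mean absolute quantization error: {err:.5f}")
```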

499 citations

Journal ArticleDOI
TL;DR: A survey of various techniques suggested for compressing and accelerating ML and DL models is presented; the challenges of the existing techniques are discussed and future research directions in the field are provided.
Abstract: In recent years, machine learning (ML) and deep learning (DL) have shown remarkable improvement in computer vision, natural language processing, stock prediction, forecasting, and audio processing, to name a few. The size of the trained DL model is large for these complex tasks, which makes it difficult to deploy on resource-constrained devices. For instance, the size of the pre-trained VGG16 model trained on the ImageNet dataset is more than 500 MB. Resource-constrained devices such as mobile phones and Internet-of-Things devices have limited memory and computation power. For real-time applications, the trained models should be deployed on resource-constrained devices. Popular convolutional neural network models have millions of parameters, which leads to an increase in the size of the trained model. Hence, it becomes essential to compress and accelerate these models before deploying them on resource-constrained devices, while compromising the model accuracy as little as possible. It is a challenging task to retain the same accuracy after compressing the model. To address this challenge, in the last couple of years many researchers have suggested different techniques for model compression and acceleration. In this paper, we present a survey of various techniques suggested for compressing and accelerating ML and DL models. We also discuss the challenges of the existing techniques and provide future research directions in the field.
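As a concrete example of the compression techniques this survey categorizes, the sketch below implements magnitude-based weight pruning, one of the simplest of them; the 90% sparsity target and tensor shape are arbitrary illustrative choices.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Magnitude-based pruning: zero out the smallest-magnitude weights,
    keeping roughly (1 - sparsity) of them."""
    k = int(w.size * sparsity)
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(256, 256)
pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"kept {mask.mean():.1%} of weights")
```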

221 citations
