
Showing papers on "Codebook published in 2020"


Journal ArticleDOI
TL;DR: A convolutional neural network for MIMO (CNN-MIMO) accepts an imperfect channel matrix as input and outputs the analog precoder and combiners; the proposed DL framework outperforms conventional techniques.
Abstract: In multi-user millimeter wave (mmWave) multiple-input-multiple-output (MIMO) systems, hybrid precoding is a crucial task to lower the complexity and cost while achieving a sufficient sum-rate. Previous works on hybrid precoding were usually based on optimization or greedy approaches. These methods either incur high complexity or achieve sub-optimum performance. Moreover, the performance of these methods mostly relies on the quality of the channel data. In this work, we propose a deep learning (DL) framework to improve the performance and reduce the computation time as compared to conventional techniques. Specifically, we design a convolutional neural network for MIMO (CNN-MIMO) that accepts an imperfect channel matrix as input and gives the analog precoder and combiners at the output. The procedure includes two main stages. First, we develop an exhaustive search algorithm to select the analog precoder and combiners from a predefined codebook by maximizing the achievable sum-rate. Then, the selected precoder and combiners are used as output labels in the training stage of CNN-MIMO, yielding the input-output pairs. We evaluate the performance of the proposed method through extensive simulations and show that the proposed DL framework outperforms conventional techniques. Overall, CNN-MIMO provides a robust hybrid precoding scheme in the presence of imperfections in the channel matrix. On top of this, the proposed approach requires less computation time than the optimization- and codebook-based approaches.
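To make the label-generation step concrete, here is a minimal numpy sketch of the exhaustive codebook search described above, assuming a single-stream narrowband model where the rate for a precoder/combiner pair (f, w) is log2(1 + SNR * |w^H H f|^2); the DFT-style codebooks, channel, and SNR below are illustrative placeholders, not the paper's setup.

```python
import numpy as np

def best_pair(H, F, W, snr=1.0):
    """Return codebook indices (i, j) maximizing log2(1 + snr*|w^H H f|^2)."""
    best_rate, best_ij = -np.inf, (0, 0)
    for i in range(F.shape[1]):
        for j in range(W.shape[1]):
            gain = abs(W[:, j].conj() @ H @ F[:, i]) ** 2  # effective channel gain
            rate = np.log2(1.0 + snr * gain)
            if rate > best_rate:
                best_rate, best_ij = rate, (i, j)
    return best_ij, best_rate

# Toy example: random channel, DFT-style codebooks (placeholders).
rng = np.random.default_rng(0)
Nt, Nr = 8, 4
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
F = np.fft.fft(np.eye(Nt)) / np.sqrt(Nt)   # Nt candidate precoders (columns)
W = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)   # Nr candidate combiners (columns)
(i, j), r = best_pair(H, F, W, snr=10.0)
print(f"best precoder {i}, combiner {j}, rate {r:.2f} bits/s/Hz")
```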

89 citations


Journal ArticleDOI
Junho Cho1
TL;DR: Variable-length prefix-free codes are constructed that are optimal (or near-optimal) in the sense that no (or few) other codes of the same cardinality can achieve a smaller expected energy per code symbol for the same resolution rate.
Abstract: In this paper, we construct variable-length prefix-free codes that are optimal (or near-optimal) in the sense that no (or few) other codes of the same cardinality can achieve a smaller expected energy per code symbol for the same resolution rate. Under stringent constraints of 4096 codewords or below per codebook, the constructed codes yield an energy per code symbol within a few tenths of a dB of the unconstrained theoretic lower bound, across a wide range of resolution rates with fine granularity. We also propose a framing method that allows variable-length codes to be transmitted using a fixed-length frame. The penalty caused by framing is studied using simulations and analysis, showing that the energy per code symbol is kept within 0.3 dB of the unconstrained theoretic limit for some tested codes with a large frame length. When the proposed method is used to implement probabilistic constellation shaping for communications in the additive white Gaussian noise channel, simulations show that between 0.21 dB and 0.98 dB of shaping gains are achieved relative to uniform 4-, 8-, 16-, and 32-ary quadrature amplitude modulation.
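For intuition, the two figures of merit, energy per code symbol and resolution rate, can be evaluated for a toy prefix-free code. The sketch below assumes dyadic codeword probabilities p_i = 2^(-b_i) (b_i source bits per codeword) and amplitude-squared symbol energies; the codewords and probabilities are invented for illustration, not taken from the paper.

```python
import numpy as np

# Toy prefix-free code over amplitude symbols {1, 3} (symbol energy = amplitude^2).
# Codeword i is used with dyadic probability p_i = 2**(-b_i), where b_i is the
# number of source bits it encodes (Kraft equality: sum p_i = 1).
codewords = [(1,), (3, 1), (3, 3, 1), (3, 3, 3)]
p = np.array([0.5, 0.25, 0.125, 0.125])

lengths = np.array([len(c) for c in codewords], dtype=float)
energies = np.array([sum(s * s for s in c) for c in codewords], dtype=float)

avg_len = (p * lengths).sum()
energy_per_symbol = (p * energies).sum() / avg_len
resolution_rate = (p * -np.log2(p)).sum() / avg_len   # bits per code symbol
print(f"E/symbol = {energy_per_symbol:.3f}, rate = {resolution_rate:.3f} b/sym")
```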

45 citations


Journal ArticleDOI
TL;DR: A novel framework of hierarchical gradient quantization that is proved to guarantee model convergence by analyzing the convergence rate as a function of quantization bits, and to substantially reduce the communication overhead compared with the state-of-the-art signSGD scheme.
Abstract: Edge machine learning involves the deployment of learning algorithms at the wireless network edge so as to leverage massive mobile data for enabling intelligent applications. The mainstream edge learning approach, federated learning, has been developed based on distributed gradient descent. In this approach, stochastic gradients are computed at edge devices and then transmitted to an edge server for updating a global AI model. Since each stochastic gradient is typically high-dimensional, communication overhead becomes a bottleneck for edge learning. To address this issue, we propose a novel framework of hierarchical gradient quantization and study its effect on the learning performance. First, the framework features a practical hierarchical architecture for decomposing the stochastic gradient into its norm and normalized block gradients, and efficiently quantizes them using a uniform quantizer and a low-dimensional Grassmannian codebook, respectively. Subsequently, the quantized normalized block gradients are scaled and cascaded to yield the quantized normalized stochastic gradient using a so-called hinge vector, which is compressed using another low-dimensional Grassmannian quantizer designed under the criterion of minimum distortion. The other feature of the framework is a bit-allocation scheme for reducing the distortion, which divides the total quantization bits to determine the resolutions of the low-dimensional quantizers. The framework is proved to guarantee model convergence by analyzing the convergence rate as a function of quantization bits. Furthermore, by simulation, our design is shown to substantially reduce the communication overhead compared with the state-of-the-art signSGD scheme, while achieving similar learning accuracies.
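A simplified sketch of the quantizer's first stage follows: the gradient is split into blocks, its norm is uniformly quantized, and each normalized block is mapped to the nearest codeword of a unit-norm codebook. A random codebook stands in for the paper's Grassmannian design, and the hinge-vector compression and bit-allocation steps are omitted.

```python
import numpy as np

def quantize_gradient(g, n_blocks=4, codebook_size=16, norm_levels=64, rng=None):
    """Hierarchical quantization sketch: norm + per-block direction codewords."""
    rng = rng or np.random.default_rng(0)
    blocks = np.split(g, n_blocks)                       # equal-size blocks
    d = blocks[0].size
    # Random unit-norm codebook as a stand-in for a Grassmannian design.
    C = rng.standard_normal((codebook_size, d))
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    # Uniformly quantize the gradient norm on [0, norm_max].
    norm, norm_max = np.linalg.norm(g), 10.0
    q_norm = np.round(norm / norm_max * (norm_levels - 1)) * norm_max / (norm_levels - 1)
    idx, scales = [], []
    for b in blocks:
        u = b / (np.linalg.norm(b) + 1e-12)              # normalized block direction
        k = int(np.argmax(np.abs(C @ u)))                # nearest codeword (up to sign)
        sign = np.sign(C[k] @ u)
        idx.append(k)
        scales.append(sign * np.linalg.norm(b) / norm)   # relative block scale
    # Reconstruct: scaled, cascaded block codewords times the quantized norm.
    g_hat = q_norm * np.concatenate([s * C[k] for k, s in zip(idx, scales)])
    return g_hat, idx, q_norm

g = np.random.default_rng(1).standard_normal(32)
g_hat, idx, q_norm = quantize_gradient(g)
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```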

44 citations


Journal ArticleDOI
TL;DR: Simulation results show that the proposed multiuser beam training scheme can approach the performance of beam sweeping but with significantly reduced beam training overhead.

Abstract: In this article, multiuser beam training based on a hierarchical codebook for millimeter wave massive multi-input multi-output is investigated, where the base station (BS) simultaneously performs beam training with multiple user equipments (UEs). For the UEs, an alternative minimization method with a closed-form expression (AMCF) is proposed to design the hierarchical codebook under the constant modulus constraint. To speed up the convergence of the AMCF, an initialization method based on the Zadoff-Chu sequence is proposed. For the BS, a simultaneous multiuser beam training scheme based on an adaptively designed hierarchical codebook is proposed, where the codewords in the current layer of the codebook are designed according to the beam training results of the previous layer. The codewords at the BS are designed with multiple mainlobes, each covering a spatial region for one or more UEs. Simulation results verify the effectiveness of the proposed hierarchical codebook design schemes and show that the proposed multiuser beam training scheme can approach the performance of beam sweeping but with significantly reduced beam training overhead.
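The Zadoff-Chu initialization relies on the standard constant-modulus ZC sequence; a minimal sketch of its generation (root u, odd length N) is shown below. How AMCF uses it as a starting point is specific to the paper and not reproduced here.

```python
import numpy as np

def zadoff_chu(u, N):
    """Root-u Zadoff-Chu sequence of odd length N (constant modulus)."""
    n = np.arange(N)
    return np.exp(-1j * np.pi * u * n * (n + 1) / N)

z = zadoff_chu(u=1, N=63)
print(np.allclose(np.abs(z), 1.0))        # constant modulus
spec = np.abs(np.fft.fft(z))
print(np.isclose(spec.min(), spec.max())) # flat magnitude spectrum (CAZAC property)
```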

38 citations


Posted Content
TL;DR: In this article, a comparative approach is taken to study two types of overloaded CD-NOMA, i.e., sparse code multiple access (SCMA) and dense code multiple access (DCMA), which differ in whether or not their codebooks are sparse.
Abstract: This paper is focused on code-domain non-orthogonal multiple access (CD-NOMA), which is an emerging paradigm to support massive connectivity for future machine-type wireless networks. We take a comparative approach to study two types of overloaded CD-NOMA, i.e., sparse code multiple access (SCMA) and dense code multiple access (DCMA), which differ in whether or not their codebooks are sparse. By analysing their individual diversity orders (DO) in Rayleigh fading channels, it is found that DCMA can be designed to enjoy full DO, equal to the maximum number of resource nodes in the system. This is in contrast to SCMA, whose error rate suffers from a limited DO equal to the codebook sparsity (i.e., the effective number of resource nodes occupied by each user). We conduct theoretical analysis of the codebook design criteria and propose to use a generalized sphere decoder for DCMA detection. We numerically evaluate the two types of multiple access schemes under "4x6" (i.e., six users communicate over four subcarriers) and "5x10" NOMA settings and reveal that DCMA gives rise to significantly improved error rate performance in Rayleigh fading channels, whilst having decoding complexity comparable to that of SCMA.

36 citations


Journal ArticleDOI
TL;DR: The simulation results reveal that the proposed cryptosystem with cipher block chaining, cipher feedback, and output feedback modes can efficiently and effectively hide all information of the color images with few details even in the presence of some input blocks with similar data.
Abstract: Recently, extensive research has been conducted to augment privacy and security requirements for cybersecurity applications in wireless communication networks. This is attributed to the fact that conventional security processes are not appropriate for robust, efficient, and reliable multimedia streaming over insecure media. Therefore, this paper presents an efficient color image cryptosystem based on RC6 with different modes of operation. The proposed cryptosystem is composed of two phases: encryption and decryption. The encryption phase starts by decomposing the color plain image with few details into its RGB components, which, in turn, are segmented into 128-bit blocks. These blocks are then enciphered using RC6 with an appropriate mode of operation. After that, the corresponding enciphered blocks of the RGB components are multiplexed to construct the final cipher image. This scenario is reversed in the decryption phase. The performance of the proposed cryptosystem is gauged via simulation using a set of encryption quality metrics. The simulation results reveal that the proposed cryptosystem with cipher block chaining (CBC), cipher feedback (CFB), and output feedback (OFB) modes can efficiently and effectively hide all information of the color images with few details, even in the presence of some input blocks with similar data. On the other hand, the results show that the electronic codebook (ECB) mode is not effective at all in hiding all details of the images. Finally, the obtained results confirm the applicability of the proposed cryptosystem and its efficiency in encrypting images in terms of security, encryption quality, and noise immunity.
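Why ECB fails on images with repeated blocks while chained modes succeed can be seen with a toy stand-in block cipher (emphatically not RC6): ECB maps equal plaintext blocks to equal ciphertext blocks, while CBC XORs in the previous ciphertext so equal blocks encrypt differently.

```python
import os

def toy_encrypt_block(block, key):
    """Stand-in 8-byte block 'cipher' (XOR + byte rotation); NOT RC6, demo only."""
    x = bytes(b ^ k for b, k in zip(block, key))
    return x[3:] + x[:3]

def ecb(blocks, key):
    return [toy_encrypt_block(b, key) for b in blocks]

def cbc(blocks, key, iv):
    out, prev = [], iv
    for b in blocks:
        x = bytes(p ^ q for p, q in zip(b, prev))   # chain in previous ciphertext
        c = toy_encrypt_block(x, key)
        out.append(c)
        prev = c
    return out

key, iv = os.urandom(8), os.urandom(8)
blocks = [b"AAAAAAAA", b"AAAAAAAA", b"BBBBBBBB"]        # similar image blocks
print("ECB:", [c.hex() for c in ecb(blocks, key)])      # first two identical: leaks
print("CBC:", [c.hex() for c in cbc(blocks, key, iv)])  # all distinct
```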

36 citations


Journal ArticleDOI
TL;DR: This study considers a spectrum sharing architecture, wherein a multiple-input multiple-output communication system cooperatively coexists with a surveillance radar, and guarantees the radar performance on all of the range-azimuth cells of the patrolled region under signal-dependent and signal-independent interference.
Abstract: In this study, we consider a spectrum sharing architecture, wherein a multiple-input multiple-output communication system cooperatively coexists with a surveillance radar. The degrees of freedom for system design are the transmit powers of both systems, the receive linear filters used for pulse compression and interference mitigation at the radar, and the space-time communication codebook. The design criterion is the maximization of the mutual information between the input and output symbols of the communication system, subject to constraints aimed at safeguarding the radar performance. Unlike previous studies, we do not require any time-synchronization between the two systems, and we guarantee the radar performance on all of the range-azimuth cells of the patrolled region under signal-dependent (endogenous) and signal-independent (exogenous) interference. This leads to a non-convex problem, and an approximate solution is thus introduced using a block coordinate ascent method. A thorough analysis is provided to show the merits of the proposed approach and emphasize the inherent tradeoff among the achievable mutual information, the density of scatterers in the environment, and the number of protected radar cells.
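The block coordinate ascent idea, fix all blocks but one and optimize that block in closed form, can be illustrated on a toy concave objective; the paper's per-block updates (transmit powers, radar filters, communication codebook) are, of course, far more involved.

```python
# Toy block coordinate ascent on f(x, y) = -(x-1)^2 - (y-2)^2 + 0.5*x*y,
# where each coordinate update is available in closed form.
x, y = 0.0, 0.0
for _ in range(30):
    x = 1.0 + 0.25 * y        # argmax over x with y fixed
    y = 2.0 + 0.25 * x        # argmax over y with x fixed
print(round(x, 4), round(y, 4))   # converges to the joint maximizer (1.6, 2.4)
```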

32 citations


Journal ArticleDOI
TL;DR: A novel low-complexity beam training scheme is explored by fully exploiting the sparse nature of the mmWave channel and the strong correlations of large-scale antenna arrays to reduce beam search space.
Abstract: We consider a 5G millimeter-wave (mmWave) massive MIMO system with a subconnected hybrid structure, where each transceiver unit (TXRU) is connected to a subarray at the transmitter (TX) or the receiver (RX). Beam training is the first step for codebook-based beamforming in the absence of ideal channel state information (CSI). The complexity of the optimal exhaustive search beam training increases exponentially with the number of TXRUs (or subarrays) at the TX and the RX, which is infeasible in practice. The conventional beam training schemes are mainly based on the idea of mathematical search algorithms and novel beam codebook designs. In this study, we explore a novel low-complexity beam training scheme by fully exploiting the sparse nature of the mmWave channel and the strong correlations of large-scale antenna arrays. Specifically, we propose a training scheme to reduce the beam search space for 5G mmWave systems, which consists of the following two key stages: 1) a beam subset optimization method from the perspective of array gain for capturing the energy of dominant channel propagation paths, and 2) a class of linear iterative beam search algorithms within the optimized beam subset utilizing the idea of limiting the degrees of freedom of the subarray indices during beam training. The range of the optimized beam subset size and the linear search condition are also given. The performance of the two proposed algorithms, beam subset optimization-single subarray linear search and beam subset optimization-subarray pair linear search, is evaluated in a 5G mmWave system. Analysis shows that the complexity of the proposed scheme is linear in the number of channel propagation paths, which can considerably reduce the complexity of conventional schemes. Simulation results have verified the near-optimal performance of the proposed schemes when compared with exhaustive search as well as other existing training methods.

31 citations


Journal ArticleDOI
TL;DR: Deep neural networks (DNNs) are introduced to replace an end-to-end limited feedback procedure including the pilot-aided channel training process, channel codebook design, and beamforming vector selection, showing a 1 dB symbol error rate (SER) gain with reduced computational complexity.
Abstract: We study a deep learning (DL) based limited feedback method for multi-antenna systems. Deep neural networks (DNNs) are introduced to replace an end-to-end limited feedback procedure including the pilot-aided channel training process, channel codebook design, and beamforming vector selection. The DNNs are trained to yield binary feedback information as well as an efficient beamforming vector which maximizes the effective channel gain. Compared to conventional limited feedback schemes, the proposed DL method shows a 1 dB symbol error rate (SER) gain with reduced computational complexity.

30 citations


Proceedings ArticleDOI
21 Jun 2020
TL;DR: This work designs reconstruction codes that reconstruct a codeword from N distinct noisy reads and shows that the redundancy of certain reconstruction codes is within one bit of optimality.
Abstract: The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. The common setup assumes the codebook to be the entire space, and the problem is to determine the minimum number of distinct reads required to reconstruct the transmitted codeword. Motivated by modern storage devices, we study a variant of the problem where the number of noisy reads N is fixed. Specifically, we design reconstruction codes that reconstruct a codeword from N distinct noisy reads. We focus on channels that introduce a single edit error (i.e. a single substitution, insertion, or deletion) and their variants, and design reconstruction codes for all values of N. In particular, for the case of a single edit, we show that as the number of noisy reads increases, the number of redundant bits required can be gracefully reduced from log n + O(1) to log log n + O(1), and then to O(1), where n denotes the length of a codeword. We also show that the redundancy of certain reconstruction codes is within one bit of optimality.
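For the substitution-only case, the simplest reconstruction rule, a coordinate-wise majority vote over the N reads, already conveys why more reads permit less redundancy. A minimal sketch (not the paper's codes, which also handle insertions and deletions):

```python
from collections import Counter

def reconstruct_majority(reads):
    """Coordinate-wise majority vote over N equal-length noisy reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

codeword = "1011001110"
reads = ["1011001110", "1011101110", "0011001110"]  # each has at most 1 substitution
print(reconstruct_majority(reads) == codeword)       # True with N = 3 reads
```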

29 citations


Journal ArticleDOI
TL;DR: The proposed RSSI-based unsupervised deep learning method for designing the hybrid beamforming in massive MIMO systems not only greatly increases the spectral efficiency, but also achieves near-optimal sum-rate and outperforms other state-of-the-art full-CSI solutions.
Abstract: Hybrid beamforming is a promising technique to reduce the complexity and cost of massive multiple-input multiple-output (MIMO) systems while providing a high data rate. However, the hybrid precoder design is a challenging task requiring channel state information (CSI) feedback and the solution of a complex optimization problem. This paper proposes a novel RSSI-based unsupervised deep learning method to design the hybrid beamforming in massive MIMO systems. Furthermore, we propose i) a method to design the synchronization signal (SS) for initial access (IA); and ii) a method to design the codebook for the analog precoder. We also evaluate the system performance through a realistic channel model in various scenarios. We show that the proposed method not only greatly increases the spectral efficiency, especially in frequency-division duplex (FDD) communication by using partial CSI feedback, but also achieves near-optimal sum-rate and outperforms other state-of-the-art full-CSI solutions.

Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, a holistically-guided decoder is introduced to obtain high-resolution semantic-rich feature maps via the multi-scale features from the encoder, and the decoding task is converted to novel codebook generation and codeword assembly tasks.
Abstract: Both performance and efficiency are important to semantic segmentation. State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone networks to extract high-resolution feature maps for achieving high segmentation performance. However, because many convolution operations are conducted on the high-resolution feature maps, such dilatedFCN-based methods result in large computational complexity and memory consumption. To balance the performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from the encoder. However, the performances of existing encoder-decoder methods are far from comparable with the dilatedFCN-based methods. In this paper, we propose the EfficientFCN, whose backbone is a common ImageNet pretrained network without any dilated convolution. A holistically-guided decoder is introduced to obtain the high-resolution semantic-rich feature maps via the multi-scale features from the encoder. The decoding task is converted to novel codebook generation and codeword assembly tasks, which take advantage of the high-level and low-level features from the encoder. Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost. Extensive experiments on PASCAL Context, PASCAL VOC, and ADE20K validate the effectiveness of the proposed EfficientFCN.
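The codebook generation and codeword assembly steps can be caricatured as weighted global pooling followed by per-pixel recombination. The numpy toy below is our own simplification, not the released EfficientFCN code: codewords are attention-weighted averages of encoder features, and each high-resolution location mixes them with predicted coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
C, HW, K = 64, 16 * 16, 8        # channels, low-res positions, codewords

feats = rng.standard_normal((HW, C))            # flattened encoder features
attn = np.exp(rng.standard_normal((HW, K)))     # predicted attention logits
attn /= attn.sum(axis=0, keepdims=True)         # normalize over positions

# Codebook generation: each codeword is a weighted average of encoder features.
codebook = attn.T @ feats                       # (K, C)

# Codeword assembly: each high-resolution location mixes the codewords.
HW_hi = 32 * 32
assembly = rng.standard_normal((HW_hi, K))      # predicted mixing coefficients
upsampled = assembly @ codebook                 # (HW_hi, C) semantic-rich map
print(codebook.shape, upsampled.shape)
```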

Posted Content
TL;DR: An efficient online machine learning framework that learns how to adapt the codebook beam patterns to the specific deployment, surrounding environment, user distribution, and hardware characteristics is developed by designing a novel complex-valued neural network architecture.
Abstract: Millimeter wave (mmWave) and massive MIMO systems are intrinsic components of 5G and beyond. These systems rely on using beamforming codebooks for both initial access and data transmission. Current beam codebooks, however, generally consist of a large number of narrow beams that scan all possible directions, even if these directions are never used. This leads to very large training overhead. Further, these codebooks do not normally account for the hardware impairments or the possible non-uniform array geometries, and their calibration is an expensive process. To overcome these limitations, this paper develops an efficient online machine learning framework that learns how to adapt the codebook beam patterns to the specific deployment, surrounding environment, user distribution, and hardware characteristics. This is done by designing a novel complex-valued neural network architecture in which the neuron weights directly model the beamforming weights of the analog phase shifters, accounting for the key hardware constraints such as the constant-modulus and quantized-angle constraints. This model learns the codebook beams through online and self-supervised training, avoiding the need for explicit channel state information. This respects the practical situations where the channel is either unavailable, imperfect, or hard to obtain, especially in the presence of hardware impairments. Simulation results highlight the capability of the proposed solution in learning environment- and hardware-aware beam codebooks, which can significantly reduce the training overhead, enhance the achievable data rates, and improve the robustness against possible hardware impairments.
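The hardware constraint at the heart of this model, analog weights w_n = e^(j*theta_n)/sqrt(N) with theta_n restricted to quantized values, is easy to emulate on its own. Below is a small numpy sketch that projects arbitrary complex weights onto that feasible set and checks the resulting beam pattern for a half-wavelength uniform linear array; the parameters are illustrative, not the paper's learned codebooks.

```python
import numpy as np

def project_constant_modulus(w, bits=3):
    """Project complex weights onto {exp(j*theta)/sqrt(N)} with 2**bits phases."""
    N, L = w.size, 2 ** bits
    step = 2 * np.pi / L
    theta_q = np.round(np.angle(w) / step) * step
    return np.exp(1j * theta_q) / np.sqrt(N)

def array_response(N, angle):
    """Half-wavelength ULA steering vector."""
    n = np.arange(N)
    return np.exp(1j * np.pi * n * np.sin(angle)) / np.sqrt(N)

N = 16
target = np.deg2rad(30)
w = project_constant_modulus(array_response(N, target), bits=3)
angles = np.deg2rad(np.linspace(-90, 90, 361))
pattern = np.abs(np.array([w.conj() @ array_response(N, a) for a in angles]))
print("beam peak at %.1f deg" % np.rad2deg(angles[int(pattern.argmax())]))
```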

Proceedings ArticleDOI
04 May 2020
TL;DR: In this paper, a collaborative quantization (CQ) scheme is proposed to jointly learn the codebook of LPC coefficients and the corresponding residuals, which achieves much higher quality than its predecessor at 9 kbps with even lower model complexity.
Abstract: Scalability and efficiency are desired in neural speech codecs, which support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC into a neural network, but bridges the computational capacity of advanced neural network models and traditional, yet efficient and domain-specific, digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ can scale up to 24 kbps, where it outperforms AMR-WB and Opus. As a neural waveform codec, CQ models have fewer than 1 million parameters, significantly fewer than many other generative models.

Proceedings ArticleDOI
06 Jul 2020
TL;DR: It is shown that joint beam tracking and beam pattern adaptation provides a significant SNR gain compared to the beam tracking only schemes, especially as the user mobility increases.
Abstract: In this paper, we consider a joint beam tracking and pattern optimization problem for massive multiple input multiple output (MIMO) systems, in which the base station (BS) selects a beamforming codebook and performs adaptive beam tracking taking into account the user mobility. A joint adaptation scheme is developed in a two-phase reinforcement learning framework which utilizes practical signaling and feedback information. In particular, an inner agent adjusts the transmission beam index for a given beamforming codebook based on short-term instantaneous signal-to-noise ratio (SNR) rewards. In addition, an outer agent selects the beamforming codebook based on long-term SNR rewards. Simulation results demonstrate that the proposed online learning outperforms conventional codebook-based beamforming schemes using the same amount of feedback information. It is further shown that joint beam tracking and beam pattern adaptation provides a significant SNR gain compared to beam-tracking-only schemes, especially as the user mobility increases.
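The inner/outer agent structure can be conveyed with a toy two-level epsilon-greedy bandit, where the outer agent scores codebooks and the inner agent scores beams within the chosen codebook from noisy SNR rewards. This is a deliberately crude stand-in for the paper's two-phase reinforcement learning design.

```python
import numpy as np

rng = np.random.default_rng(0)
# 3 candidate codebooks x 8 beams each; true mean SNRs are unknown to the agents.
true_snr = rng.uniform(0.0, 10.0, size=(3, 8))
Q_outer, N_outer = np.zeros(3), np.zeros(3)
Q_inner, N_inner = np.zeros((3, 8)), np.zeros((3, 8))
eps = 0.1

for t in range(5000):
    # Outer agent: pick a codebook; inner agent: pick a beam within it.
    c = rng.integers(3) if rng.random() < eps else int(Q_outer.argmax())
    b = rng.integers(8) if rng.random() < eps else int(Q_inner[c].argmax())
    r = true_snr[c, b] + rng.standard_normal()       # noisy SNR feedback
    N_inner[c, b] += 1
    Q_inner[c, b] += (r - Q_inner[c, b]) / N_inner[c, b]
    N_outer[c] += 1
    Q_outer[c] += (r - Q_outer[c]) / N_outer[c]

c = int(Q_outer.argmax())
print("learned codebook/beam:", c, int(Q_inner[c].argmax()))
print("best codebook/beam:   ", np.unravel_index(true_snr.argmax(), true_snr.shape))
```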

Journal ArticleDOI
TL;DR: The SCMA scheme is implemented to jointly optimize the codebook and power allocation in the downlink D2D enabled cellular network, with the aim to maximize the system data rate.
Abstract: According to advanced wireless network standards, device-to-device (D2D) communication underlaying the conventional cellular network is considered a promising technology to improve network performance. Specifically, this hybrid architecture provides efficient resource allocation for cellular and D2D users while increasing the flexible utilization of spectrum resources. Recently, sparse code multiple access (SCMA) has been proposed as an efficient non-orthogonal multiple access technology for the 5G network paradigm. The SCMA scheme enhances the spectral efficiency and supports massive connectivity and diverse applications by enabling system overloading. Thus, in this paper, SCMA technology is applied to a D2D enabled cellular network, with the goal of utilizing the overloading feature of the SCMA scheme to support massive device connectivity while enhancing the overall network performance. The SCMA scheme is implemented to jointly optimize the codebook and power allocation in the downlink D2D enabled cellular network, with the aim of maximizing the system data rate. This joint optimization problem is solved by decomposing the original problem into two sub-problems: codebook allocation and power allocation. For the codebook allocation, the rate-aware codebook selection scheme for the D2D system (RACBS-D2D) is proposed using a conflict graph. For the power allocation, a geometric water-filling (GWF) method is utilized to propose the iterative GWF-based power allocation (IGWFPA) scheme. The performance of the proposed schemes is evaluated through simulations that reveal the benefits of the proposed solutions under different scenarios.
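The power-allocation subproblem has the classic water-filling structure: p_i = max(0, mu - 1/g_i) with the water level mu set by the power budget. The paper's geometric water-filling (GWF) computes this without search; the bisection version below yields the same allocation on illustrative gains.

```python
import numpy as np

def water_filling(gains, p_total, iters=60):
    """Allocate p_i = max(0, mu - 1/g_i) s.t. sum(p_i) = p_total (bisection on mu)."""
    inv = 1.0 / np.asarray(gains, dtype=float)
    lo, hi = 0.0, p_total + inv.max()
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(0.0, mu - inv).sum() > p_total:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.0, 0.5 * (lo + hi) - inv)

g = [2.0, 1.0, 0.25, 0.1]          # illustrative channel gains
p = water_filling(g, p_total=4.0)
print(np.round(p, 3), "sum =", round(p.sum(), 3))
print("sum rate:", round(np.log2(1 + np.array(g) * p).sum(), 3))
```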

Patent
Wang Yi1, Zhang Shichang1, Li Yingyang
29 Apr 2020
TL;DR: In this article, a method for an uplink transmission of a user equipment (UE) in a wireless communication system and the UE are provided, including receiving, from a base station (BS), control signaling; grouping physical downlink shared channels (PDSCHs) based on the control signaling; determining a hybrid automatic repeat request acknowledgement/negative acknowledgement (HARQ-ACK/NACK) codebook for each group of PDSCHs; and transmitting HARQ-ACK/NACK information corresponding to the HARQ-ACK/NACK codebook.
Abstract: A method for an uplink transmission of a user equipment (UE) in a wireless communication system and the UE are provided. The method includes receiving, from a base station (BS), control signaling; grouping physical downlink shared channels (PDSCHs) based on the control signaling; determining a hybrid automatic repeat request acknowledgement/negative acknowledgement (HARQ-ACK/NACK) codebook for each grouping of the PDSCHs; and transmitting HARQ-ACK/NACK information corresponding to the HARQ-ACK/NACK codebook. The UE includes a transceiver configured to receive control signaling from a BS; and a processor configured to: group PDSCHs based on the control signaling; and determine an HARQ-ACK/NACK codebook for each group of PDSCHs, wherein the transceiver is further configured to transmit HARQ-ACK/NACK information corresponding to the HARQ-ACK/NACK codebook.

Posted Content
TL;DR: The EfficientFCN is proposed, whose backbone is a common ImageNet pre-trained network without any dilated convolution, and achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost.
Abstract: Both performance and efficiency are important to semantic segmentation. State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone networks to extract high-resolution feature maps for achieving high segmentation performance. However, because many convolution operations are conducted on the high-resolution feature maps, such dilatedFCN-based methods result in large computational complexity and memory consumption. To balance the performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from the encoder. However, the performances of existing encoder-decoder methods are far from comparable with the dilatedFCN-based methods. In this paper, we propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution. A holistically-guided decoder is introduced to obtain the high-resolution semantic-rich feature maps via the multi-scale features from the encoder. The decoding task is converted to novel codebook generation and codeword assembly tasks, which take advantage of the high-level and low-level features from the encoder. Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost. Extensive experiments on PASCAL Context, PASCAL VOC, and ADE20K validate the effectiveness of the proposed EfficientFCN.

Proceedings ArticleDOI
26 May 2020
TL;DR: In this article, a neural network architecture is proposed to learn beam codebooks that adapt to the surrounding environment and the user locations, which leads to noticeable gains compared to classical codebook design approaches.
Abstract: Scaling the number of antennas up is a key characteristic of current and future wireless communication systems. The hardware cost and power consumption, however, motivate large-scale MIMO systems, especially at millimeter wave (mmWave) bands, to rely on analog-only or hybrid analog/digital transceiver architectures. With these architectures, mmWave base stations normally use pre-defined beamforming codebooks for both initial access and data transmissions. Current beam codebooks, however, generally adopt single-lobe narrow beams and scan the entire angular space. This leads to high beam training overhead and loss in the achievable beamforming gains. In this paper, we propose a new machine learning framework for learning beamforming codebooks in hardware-constrained large-scale MIMO systems. More specifically, we develop a neural network architecture that accounts for the hardware constraints and learns beam codebooks that adapt to the surrounding environment and the user locations. Simulation results highlight the capability of the proposed solution in learning multi-lobe beams and reducing the codebook size, which leads to noticeable gains compared to classical codebook design approaches.

Journal ArticleDOI
TL;DR: A new training algorithm is developed that addresses the dimming constraints through a dual formulation of the optimization, so that the DL-based VLC transceiver with universal dimming support can be optimized via an end-to-end training procedure.
Abstract: This paper studies a deep learning (DL) framework for the design of a binary modulated visible light communication (VLC) transceiver with universal dimming support. The dimming control for the optical binary signal boils down to a combinatorial codebook design so that the average Hamming weight of the binary codewords matches an arbitrary dimming target. An unsupervised DL technique is employed for obtaining a neural network to replace the encoder-decoder pair that recovers the message from the optically transmitted signal. In such a task, a novel stochastic binarization method is developed to generate the set of binary codewords from continuous-valued neural network outputs. For universal support of arbitrary dimming targets, the DL-based VLC transceiver is trained with multiple dimming constraints, which turns out to be a constrained training optimization that is very challenging to handle with existing DL methods. We develop a new training algorithm that addresses the dimming constraints through a dual formulation of the optimization. Based on the developed algorithm, the resulting VLC transceiver can be optimized via the end-to-end training procedure. Numerical results verify that the proposed codebook outperforms theoretically best constant-weight codebooks under various VLC setups.
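The stochastic binarization step can be sketched in isolation: continuous network outputs in (0, 1) are sampled to bits (with a straight-through gradient during training), and the dimming constraint becomes a penalty tying the average Hamming weight to the target. The toy below reflects these assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binarize(p):
    """Sample b_i ~ Bernoulli(p_i); during training, gradients would pass
    straight through to p (straight-through estimator)."""
    return (rng.random(p.shape) < p).astype(float)

def dimming_penalty(p, target):
    """Penalize deviation of the average Hamming weight from the dimming target."""
    return (p.mean() - target) ** 2

logits = rng.standard_normal(32)
p = 1.0 / (1.0 + np.exp(-logits))       # network outputs in (0, 1)
codeword = stochastic_binarize(p)
print("empirical weight:", codeword.mean(), "penalty:", dimming_penalty(p, 0.5))
```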

Journal ArticleDOI
TL;DR: To improve the accuracy of approximate nearest neighbor (ANN) search, a new PQ-based method named product quantization with dual codebooks (DCPQ) is proposed, which can achieve higher accuracy compared with the standard PQ and its variants.
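For reference, the standard PQ baseline that DCPQ builds on splits each vector into subvectors and learns an independent k-means codebook per subspace; a vector is then encoded as the tuple of nearest sub-codeword indices. A self-contained numpy sketch (the dual-codebook refinement itself is not reproduced):

```python
import numpy as np

def kmeans(X, k, iters=20, rng=None):
    """Tiny k-means for demonstration purposes."""
    rng = rng or np.random.default_rng(0)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        a = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (a == j).any():
                C[j] = X[a == j].mean(0)
    return C

def pq_train(X, m=4, k=16):
    """Standard product quantization: one k-means codebook per subspace."""
    return [kmeans(sub, k) for sub in np.split(X, m, axis=1)]

def pq_encode(x, codebooks):
    """Encode a vector as the tuple of nearest sub-codeword indices."""
    parts = np.split(x, len(codebooks))
    return [int(((C - s) ** 2).sum(1).argmin()) for s, C in zip(parts, codebooks)]

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 32))
cbs = pq_train(X, m=4, k=16)
print(pq_encode(X[0], cbs))      # 4 sub-codeword indices encode a 32-d vector
```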

Posted Content
TL;DR: This paper proposes a Transformer-based VQ-VAE for unsupervised unit discovery and a Transformer-based inverter for speech synthesis given the extracted codebook, for the ZeroSpeech 2020 challenge.
Abstract: In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019. The main theme in this challenge is to build a speech synthesizer without any textual information or phonetic labels. In order to tackle those challenges, we build a system that must address two major components: 1) given speech audio, extract subword units in an unsupervised way, and 2) re-synthesize the audio from novel speakers. The system also needs to balance the codebook performance between the ABX error rate and the bitrate compression rate. Our main contribution here is that we propose a Transformer-based VQ-VAE for unsupervised unit discovery and a Transformer-based inverter for the speech synthesis given the extracted codebook. Additionally, we also explore several regularization methods to improve performance even further.

Proceedings ArticleDOI
25 Oct 2020
TL;DR: In this paper, a Transformer-based VQ-VAE was proposed for unsupervised subword unit discovery and a Transformer-based inverter for the speech synthesis given the extracted codebook.
Abstract: In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019. The main theme in this challenge is to build a speech synthesizer without any textual information or phonetic labels. In order to tackle those challenges, we build a system that must address two major components: 1) given speech audio, extract subword units in an unsupervised way, and 2) re-synthesize the audio from novel speakers. The system also needs to balance the codebook performance between the ABX error rate and the bitrate compression rate. Our main contribution here is that we propose a Transformer-based VQ-VAE for unsupervised unit discovery and a Transformer-based inverter for the speech synthesis given the extracted codebook. Additionally, we also explore several regularization methods to improve performance even further.

Journal ArticleDOI
TL;DR: A beam pairing algorithm is proposed that reduces inter-beam interference and simplifies beam and user selection in MU-MIMO; the proposed scheme performs better than an existing hybrid precoding scheme based on channel reconstruction.
Abstract: We propose a hybrid beamforming scheme with partial interfering beam feedback for a codebook-based multi-user multiple-input multiple-output (MU-MIMO) system, where users feed back information only about the top- $p$ transmit beams. For the analog part of the precoding, we consider two codebooks, the conventional Discrete Fourier Transform (DFT) codebook with uniform amplitude beamforming vectors and the Taylor codebook with non-uniform amplitude beamforming vectors. For the digital precoding part, the effective channel matrix is approximated and used for zero-forcing (ZF). We also propose a beam pairing algorithm that results in reduced inter-beam interference and simplifies beam and user selection in MU-MIMO. When $p$ is equal to the number of beams in the transmit codebook, the proposed scheme includes an existing scheme with full effective channel matrix feedback as a special case. Numerical results show that the proposed hybrid beamforming performs better than an existing hybrid precoding scheme based on channel reconstruction.
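The DFT analog codebook and the zero-forcing digital stage are both one-liners; the sketch below pairs each user with its strongest distinct DFT beam (a crude stand-in for the paper's beam pairing and top-p feedback) and applies ZF on the resulting effective channel, using toy dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, K = 16, 4                                   # BS antennas, users

F = np.fft.fft(np.eye(Nt)) / np.sqrt(Nt)        # DFT analog codebook (columns)
H = (rng.standard_normal((K, Nt)) + 1j * rng.standard_normal((K, Nt))) / np.sqrt(2)

# Assign each user its strongest not-yet-taken DFT beam (toy beam pairing).
gains = np.abs(H @ F)
beam_idx = []
for k in range(K):
    beam_idx.append(next(i for i in np.argsort(-gains[k]) if i not in beam_idx))

Fa = F[:, beam_idx]                             # (Nt, K) analog precoder
Heff = H @ Fa                                   # (K, K) effective channel

Wd = np.linalg.pinv(Heff)                       # zero-forcing digital precoder
Wd /= np.linalg.norm(Fa @ Wd, axis=0)           # unit transmit power per stream
print(np.round(np.abs(Heff @ Wd), 3))           # ~diagonal: interference nulled
```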

Journal ArticleDOI
TL;DR: The results show that introducing modulation to the SPARC design can significantly reduce decoding complexity without sacrificing error performance, and that the modulated codes are asymptotically capacity achieving for the complex AWGN channel.
Abstract: This paper studies a generalization of sparse superposition codes (SPARCs) for communication over the complex AWGN channel. In a SPARC, the codebook is defined in terms of a design matrix, and each codeword is generated by multiplying the design matrix with a sparse message vector. In the standard SPARC construction, information is encoded in the locations of the non-zero entries of the message vector. In this paper we generalize the construction and consider modulated SPARCs, where information is encoded in both the locations and the values of the non-zero entries of the message vector. We focus on the case where the non-zero entries take values from a Phase Shift Keying (PSK) constellation. We propose a computationally efficient Approximate Message Passing (AMP) decoder, and obtain analytical bounds on the state evolution parameters which predict the error performance of the decoder. Using these bounds we show that PSK-modulated SPARCs are asymptotically capacity achieving for the complex AWGN channel, with either spatial coupling or power allocation. We also provide numerical simulation results to demonstrate the error performance at finite code lengths. These results show that introducing modulation to the SPARC design can significantly reduce decoding complexity without sacrificing error performance.
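The encoder described here is only a few lines: the message picks one column per section of the design matrix, and the modulated variant additionally picks a PSK value for that entry. A minimal sketch with illustrative dimensions (power allocation and spatial coupling omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
L, M, n = 8, 16, 64                # sections, section size, code length
K = 4                              # PSK constellation order

A = rng.standard_normal((n, L * M)) + 1j * rng.standard_normal((n, L * M))
A /= np.sqrt(2 * n)                # i.i.d. CN(0, 1/n) design matrix

# Message: per section, a column index (log2 M bits) and a PSK phase (log2 K bits).
cols = rng.integers(0, M, size=L)
phases = rng.integers(0, K, size=L)

beta = np.zeros(L * M, dtype=complex)
for l in range(L):
    beta[l * M + cols[l]] = np.exp(2j * np.pi * phases[l] / K)  # PSK-valued entry

x = A @ beta                       # transmitted SPARC codeword
bits = L * (np.log2(M) + np.log2(K))
print(f"rate: {bits / n:.2f} bits per channel use")
```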

Proceedings ArticleDOI
25 Oct 2020
TL;DR: This paper introduces an important extension to VQ-VAE for learning F0-related suprasegmental information simultaneously along with traditional phone features, and shows that the proposed extension reduces F0 distortion of reconstructed speech for all unseen test speakers, and results in significantly higher preference scores from a listening test.
Abstract: Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful representation learning framework that can discover discrete groups of features from a speech signal without supervision. Until now, the VQ-VAE architecture has modeled individual types of speech features, such as only phones or only F0. This paper introduces an important extension to VQ-VAE for learning F0-related suprasegmental information simultaneously along with traditional phone features. The proposed framework uses two encoders such that the F0 trajectory and speech waveform are both input to the system, and therefore two separate codebooks are learned. We used a WaveRNN vocoder as the decoder component of VQ-VAE. Our speaker-independent VQ-VAE was trained with raw speech waveforms from multi-speaker Japanese speech databases. Experimental results show that the proposed extension reduces the F0 distortion of reconstructed speech for all unseen test speakers, and results in significantly higher preference scores from a listening test. We additionally conducted experiments using single-speaker Mandarin speech to demonstrate the advantages of our architecture in another language which relies heavily on F0.

Posted Content
TL;DR: This paper focuses on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts, and shows that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs, issues that can be successfully overcome by increasing the learning rate.
Abstract: In this paper we demonstrate methods for reliable and efficient training of discrete representations using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial representations of speech, applicable to unsupervised voice conversion, and to reach state-of-the-art performance on unit discovery tasks. For unsupervised representation learning, they became viable alternatives to continuous latent variable models such as the Variational Auto-Encoder (VAE). However, training deep discrete variable models is challenging, due to the inherent non-differentiability of the discretization operation. In this paper we focus on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts. It quantizes encoder outputs with on-line $k$-means clustering. We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs. We demonstrate that these problems can be successfully overcome by increasing the learning rate for the codebook and periodic data-dependent codeword re-initialization. As a result, we achieve more robust training across different tasks, and significantly increase the usage of latent codewords even for large codebooks. This has practical benefit, for instance, in unsupervised representation learning, where large codebooks may lead to disentanglement of latent representations.
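The periodic data-dependent re-initialization amounts to simple bookkeeping: track how often each codeword is selected over a window, then reseed stale codewords from recent encoder outputs. A small numpy sketch under those assumptions (the exact schedule and learning-rate handling in the paper differ):

```python
import numpy as np

def reinit_dead_codewords(codebook, usage, encoder_outputs, min_usage=1, rng=None):
    """Replace codewords used fewer than `min_usage` times in the last window
    with randomly drawn encoder outputs (data-dependent re-initialization)."""
    rng = rng or np.random.default_rng(0)
    dead = np.flatnonzero(usage < min_usage)
    if dead.size:
        picks = rng.choice(len(encoder_outputs), size=dead.size, replace=False)
        codebook[dead] = encoder_outputs[picks]
    return codebook, dead

rng = np.random.default_rng(1)
codebook = rng.standard_normal((8, 4))
usage = np.array([5, 0, 3, 0, 9, 1, 0, 2])      # selection counts, last window
z = rng.standard_normal((100, 4))               # recent encoder outputs
codebook, dead = reinit_dead_codewords(codebook, usage, z)
print("re-initialized codewords:", dead)
```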

Journal ArticleDOI
TL;DR: S-SOM improves the properties of input density mapping, vector quantization, and clustering of the standard SOM in the presence of outliers by upgrading the learning rule in order to smooth the representation of outlying input vectors onto the map.

Proceedings ArticleDOI
01 Dec 2020
TL;DR: This paper proposes several lossless encoding techniques that exploit different characteristics of the quantized gradients during distributed DNN training, and shows their effectiveness on 5 different DNN models across three different data-sets, and compares them with classic state-of-the-art Elias-based encoding techniques.
Abstract: Distributed stochastic algorithms, equipped with gradient compression techniques, such as codebook quantization, are becoming increasingly popular and considered state-of-the-art in training large deep neural network (DNN) models. However, communicating the quantized gradients in a network requires efficient encoding techniques. For this, practitioners generally use Elias encoding-based techniques without considering their computational overhead or data-volume. In this paper, based on Huffman coding, we propose several lossless encoding techniques that exploit different characteristics of the quantized gradients during distributed DNN training. Then, we show their effectiveness on 5 different DNN models across three different data-sets, and compare them with classic state-of-the-art Elias-based encoding techniques. Our results show that the proposed Huffman-based encoders (i.e., RLH, SH, and SHS) can reduce the encoded data-volume by up to 5.1×, 4.32×, and 3.8×, respectively, compared to the Elias-based encoders.
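Huffman coding over the empirical symbol statistics of a quantized gradient is standard; the sketch below builds a code with heapq and compares the encoded size against a fixed-length baseline. It does not reproduce the paper's RLH/SH/SHS variants.

```python
import heapq
from collections import Counter
import numpy as np

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bitstring) from observed frequencies."""
    freq = Counter(symbols)
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]   # left branch prepends 0
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]   # right branch prepends 1
        heapq.heappush(heap, [lo[0] + hi[0], i, {**lo[2], **hi[2]}])
        i += 1
    return heap[0][2]

rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
q = np.clip(np.round(g), -3, 3).astype(int)     # 7-level codebook quantization
code = huffman_code(q.tolist())
encoded_bits = sum(len(code[s]) for s in q.tolist())
print(f"huffman: {encoded_bits} bits vs fixed-length: {len(q) * 3} bits")
```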

Posted Content
TL;DR: An upper bound on the ergodic rate loss with maximum ratio transmission is derived, showing that the rate loss can be cut down by optimizing the feedback bit allocation during codebook generation, and a bit partitioning strategy that is adaptive to diverse environment and system parameters is proposed.
Abstract: In frequency division duplexing systems, the base station (BS) acquires downlink channel state information (CSI) via channel feedback, which has not been adequately investigated in the presence of a reconfigurable intelligent surface (RIS). In this study, we examine the limited channel feedback scheme by proposing a novel cascaded codebook and an adaptive bit partitioning strategy. The RIS segments the channel between the BS and the mobile station into two sub-channels, each with line-of-sight (LoS) and non-LoS (NLoS) paths. To quantize the path gains, the cascaded codebook is synthesized by two sub-codebooks whose codewords are cascaded from LoS and NLoS components. This enables the proposed cascaded codebook to cater to the different distributions of LoS and NLoS path gains by flexibly using different feedback bits to design the codeword structure. On the basis of the proposed cascaded codebook, we derive an upper bound on the ergodic rate loss with maximum ratio transmission and show that the rate loss can be cut down by optimizing the feedback bit allocation during codebook generation. To minimize the upper bound, we propose a bit partitioning strategy that is adaptive to diverse environment and system parameters. Extensive simulations are presented to show the superiority and robustness of the cascaded codebook and the efficiency of the adaptive bit partitioning scheme.