
Showing papers by "Qualcomm" published in 2021


Journal ArticleDOI
TL;DR: Versatile Video Coding (VVC) was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today's media content and emerging applications, as mentioned in this paper.
Abstract: Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today’s media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies for achieving significant bit rate reductions in the neighborhood of 50% over its predecessor for equal video quality, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding, e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.

250 citations


Journal ArticleDOI
19 Jan 2021
TL;DR: This article summarizes these developments in video coding standardization after AVC, and focuses on providing an overview of the first version of VVC, including comparisons against HEVC.
Abstract: In the last 17 years, since the finalization of the first version of the now-dominant H.264/Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC) standard in 2003, two major new generations of video coding standards have been developed. These include the standards known as High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). HEVC was finalized in 2013, repeating the ten-year cycle time set by its predecessor and providing about 50% bit-rate reduction over AVC. The cycle was shortened by three years for the VVC project, which was finalized in July 2020, yet again achieving about a 50% bit-rate reduction over its predecessor (HEVC). This article summarizes these developments in video coding standardization after AVC. It especially focuses on providing an overview of the first version of VVC, including comparisons against HEVC. Besides further advances in hybrid video compression, as in previous development cycles, the broad versatility of the application domain that is highlighted in the title of VVC is explained. Included in VVC is the support for a wide range of applications beyond the typical standard- and high-definition camera-captured content codings, including features to support computer-generated/screen content, high dynamic range content, multilayer and multiview coding, and support for immersive media such as 360° video.

246 citations


Proceedings Article
03 Mar 2021
TL;DR: A new simple approach for image compression: instead of storing the RGB values for each pixel of an image, the weights of a neural network overfitted to the image are stored, and this approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights.
Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
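The encode/decode recipe in this abstract is simple enough to sketch numerically. The toy below (not the authors' code) overfits a tiny MLP that maps normalized (row, col) coordinates to a grayscale value, then "decodes" by evaluating the network at every pixel. The 4x4 ramp image, network width, and training hyperparameters are all illustrative stand-ins, and the weight-quantization step of the actual method is omitted.

```python
import numpy as np

# Toy version of the pipeline: fit pixel coordinate -> intensity, then
# decode by evaluating the MLP at every pixel location.

rng = np.random.default_rng(0)
H, W = 4, 4
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
X = np.stack([rows, cols], -1).reshape(-1, 2) / (H - 1)    # coords in [0, 1]
image = ((rows + cols) / (2 * (H - 1))).reshape(-1, 1)     # smooth 4x4 ramp

# One-hidden-layer MLP, trained by plain batch gradient descent.
W1 = rng.normal(0.0, 1.0, (2, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, (32, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

mse0 = float(((forward(X)[1] - image) ** 2).mean())   # error before fitting
for _ in range(3000):
    h, pred = forward(X)
    err = (pred - image) / len(X)          # gradient of 0.5 * MSE w.r.t. pred
    dh = (err @ W2.T) * (1.0 - h ** 2)     # backprop through tanh
    W2 -= 0.1 * (h.T @ err); b2 -= 0.1 * err.sum(0)
    W1 -= 0.1 * (X.T @ dh);  b1 -= 0.1 * dh.sum(0)

decoded = forward(X)[1].reshape(H, W)      # "decoding" = evaluate everywhere
mse = float(((decoded.reshape(-1, 1) - image) ** 2).mean())
```

Storing the 129 weight scalars of this toy costs more than its 16 pixels, which is why the approach only pays off at realistic resolutions and with quantized, entropy-coded weights.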

73 citations


Journal ArticleDOI
01 Aug 2021
TL;DR: Wirelessly networked and powered electronic microchips that can autonomously perform neural sensing and electrical microstimulation and can potentially be scaled to 770 neurograins using a customized time-division multiple access protocol are reported.
Abstract: Multichannel electrophysiological sensors and stimulators—particularly those used to study the nervous system—are usually based on monolithic microelectrode arrays. However, the architecture of such arrays limits flexibility in electrode placement and scaling to a large number of nodes, especially across non-contiguous locations. Here we report wirelessly networked and powered electronic microchips that can autonomously perform neural sensing and electrical microstimulation. The microchips, which we term neurograins, have an ~1 GHz electromagnetic transcutaneous link to an external telecom hub, providing bidirectional communication and control at the individual device level. To illustrate the potential of the approach, we show that 48 neurograins can be individually addressed on a rat cortical surface and used for the acute recording of neural activity. Theoretical calculations and experimental measurements show that the link configuration could potentially be scaled to 770 neurograins using a customized time-division multiple access protocol. Wirelessly powered microchips, which have an ~1 GHz electromagnetic transcutaneous link to an external telecom hub, can be used for multichannel in vivo neural sensing, stimulation and data acquisition.

69 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This article proposed a boundary-aware loss term for semantic segmentation using an inverse-transformation network, which can efficiently learn the degree of parametric transformations between estimated and target boundaries.
Abstract: We present a novel boundary-aware loss term for semantic segmentation using an inverse-transformation network, which efficiently learns the degree of parametric transformations between estimated and target boundaries. This plug-in loss term complements the cross-entropy loss in capturing boundary transformations and allows consistent and significant performance improvement on segmentation backbone models without increasing their size and computational complexity. We analyze the quantitative and qualitative effects of our loss function on three indoor and outdoor segmentation benchmarks, including Cityscapes, NYU-Depth-v2, and PASCAL, integrating it into the training phase of several backbone networks in both single-task and multi-task settings. Our extensive experiments show that the proposed method consistently outperforms baselines, and even sets the new state-of-the-art on two datasets.

60 citations


Journal ArticleDOI
TL;DR: In this article, a pressure-responsive element based on membrane deflection and a battery-free, wireless mode of operation capable of multi-site measurements at strategic locations across the body is introduced.
Abstract: Capabilities for continuous monitoring of pressures and temperatures at critical skin interfaces can help to guide care strategies that minimize the potential for pressure injuries in hospitalized patients or in individuals confined to the bed. This paper introduces a soft, skin-mountable class of sensor system for this purpose. The design includes a pressure-responsive element based on membrane deflection and a battery-free, wireless mode of operation capable of multi-site measurements at strategic locations across the body. Such devices yield continuous, simultaneous readings of pressure and temperature in a sequential readout scheme from a pair of primary antennas mounted under the bedding and connected to a wireless reader and a multiplexer located at the bedside. Experimental evaluation of the sensor and the complete system includes benchtop measurements and numerical simulations of the key features. Clinical trials involving two hemiplegic patients and a tetraplegic patient demonstrate the feasibility, functionality and long-term stability of this technology in operating hospital settings.

56 citations


Journal ArticleDOI
TL;DR: An overview of the technologies for in-loop processing and filtering in the Versatile Video Coding (VVC) standard, which comprise luma mapping with chroma scaling, deblocking filter, sample adaptive offset, adaptive loop filter and cross-component adaptive loop filter.
Abstract: This paper presents an overview of the technologies for in-loop processing and filtering in the Versatile Video Coding (VVC) standard. These processes comprise luma mapping with chroma scaling, deblocking filter, sample adaptive offset, adaptive loop filter and cross-component adaptive loop filter. They are qualified as “in-loop” because they are applied inside the encoding and decoding loops, before storing the pictures in the decoded picture buffer. The filters are complementary and address different purposes. Luma mapping with chroma scaling aims at adaptively modifying the coded samples distribution for improved coding efficiency. The deblocking filter aims at reducing blocking discontinuities. Sample adaptive offset mostly aims at reducing artifacts resulting from the quantization of transform coefficients. Adaptive loop filter and cross-component adaptive loop filter are adaptive filters that can enhance the reconstructed signal, using for instance Wiener-filter encoding approaches. The paper provides an overview of the in-loop filtering process and a detailed description of the filtering algorithms. Objective compression efficiency results are provided for each filter, with indication of cumulative coding gains. Subjective benefits are illustrated. Implementation issues considered during the design of the VVC in-loop filters are also discussed.
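To make the deblocking idea concrete, here is a heavily simplified sketch of the underlying heuristic only: the real VVC filter derives boundary strengths and filter lengths from coding modes and applies clipping, none of which is modeled here, and the threshold and tap weights below are arbitrary choices.

```python
# Heuristic only: smooth across a block boundary when the step is small
# (likely a quantization artifact), leave it intact when it is large
# (likely a real image edge). Threshold and taps are arbitrary, not VVC's.

def deblock_1d(samples, boundary, threshold=8):
    p0, q0 = samples[boundary - 1], samples[boundary]
    if abs(p0 - q0) >= threshold:              # strong step: keep the edge
        return list(samples)
    out = list(samples)
    # Simple low-pass across the boundary (not the standard's exact taps).
    out[boundary - 1] = (samples[boundary - 2] + 2 * p0 + q0 + 2) // 4
    out[boundary] = (p0 + 2 * q0 + samples[boundary + 1] + 2) // 4
    return out

artifact = [50, 50, 50, 50, 54, 54, 54, 54]    # small step: blocking artifact
true_edge = [10, 10, 10, 10, 200, 200, 200, 200]  # large step: real edge
```

Running the filter on both rows shows the intent: the small step is softened while the large one passes through untouched.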

56 citations


Journal ArticleDOI
TL;DR: The intra prediction and mode coding of the Versatile Video Coding (VVC) standard is presented and a bitrate saving of 25% on average is reported over H.265/HEVC using an objective metric.
Abstract: This paper presents the intra prediction and mode coding of the Versatile Video Coding (VVC) standard. This standard was collaboratively developed by the Joint Video Experts Team (JVET). It follows the traditional architecture of a hybrid block-based codec that was also the basis of previous standards. Almost all intra prediction features of VVC either contain substantial modifications in comparison with its predecessor H.265/HEVC or were newly added. The key aspects of these tools are the following: 65 angular intra prediction modes with block shape-adaptive directions and 4-tap interpolation filters are supported as well as the DC and Planar mode, Position Dependent Prediction Combination is applied for most of these modes, Multiple Reference Line Prediction can be used, an intra block can be further subdivided by the Intra Subpartition mode, Matrix-based Intra Prediction is supported, and the chroma prediction signal can be generated by the Cross Component Linear Model method. Finally, the intra prediction mode in VVC is coded separately for luma and chroma. Here, a Most Probable Mode list containing six modes is applied for luma. The individual compression performance of tools is reported in this paper. For the full VVC intra codec, a bitrate saving of 25% on average is reported over H.265/HEVC using an objective metric. Significant subjective benefits are illustrated with specific examples.
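As a toy illustration of intra prediction (not VVC's actual process, which adds 65 angular modes, Planar, PDPC, MRL, ISP, MIP and CCLM on top), the simplest mode, DC, fills a block with the rounded mean of the reconstructed reference samples above and to the left. The reference values below are made-up 8-bit luma samples.

```python
# DC mode only: fill the block with the rounded integer mean of the
# reference samples above (top) and to the left of the block.

def dc_predict(top_refs, left_refs, width, height):
    refs = list(top_refs) + list(left_refs)
    dc = (sum(refs) + len(refs) // 2) // len(refs)   # integer mean, rounded
    return [[dc] * width for _ in range(height)]

pred = dc_predict([100, 102, 98, 104], [96, 100, 98, 102], 4, 4)
```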

55 citations


Journal ArticleDOI
TL;DR: To reduce hardware decoder complexity, virtual pipeline data unit constraints are introduced, which forbid certain multi-type tree splits, and a local dual tree is described that reduces the number of small chroma intra blocks.
Abstract: Versatile Video Coding (VVC) is the latest video coding standard jointly developed by ITU-T VCEG and ISO/IEC MPEG. In this paper, technical details and experimental results for the VVC block partitioning structure are provided. Among all the new technical aspects of VVC, the block partitioning structure is identified as one of the most substantial changes relative to the previous video coding standards and provides the most significant coding gains. The new partitioning structure is designed using a more flexible scheme. Each coding tree unit (CTU) is either treated as one coding unit or split into multiple coding units by one or more recursive quaternary tree partitions followed by one or more recursive multi-type tree splits. The latter can be horizontal binary tree split, vertical binary tree split, horizontal ternary tree split, or vertical ternary tree split. A CTU dual tree for intra-coded slices is described on top of the new block partitioning structure, allowing separate coding trees for luma and chroma. Also, a new way of handling picture boundaries is presented. Additionally, to reduce hardware decoder complexity, virtual pipeline data unit constraints are introduced, which forbid certain multi-type tree splits. Finally, a local dual tree is described, which reduces the number of small chroma intra blocks.
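The split types named above are easy to sketch as coordinate arithmetic. The toy below hard-codes one illustrative tree (a quad split of a CTU, then one vertical binary and one horizontal ternary multi-type split); in a real encoder the split decisions come from rate-distortion search, and the VPDU and local-dual-tree constraints are not modeled.

```python
# Coding units as (x, y, width, height) tuples inside a 128x128 CTU.

def quad_split(x, y, w, h):
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]

def binary_split_v(x, y, w, h):
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]

def ternary_split_h(x, y, w, h):        # 1/4, 1/2, 1/4 ratio as in VVC's TT
    return [(x, y, w, h // 4), (x, y + h // 4, w, h // 2),
            (x, y + 3 * h // 4, w, h // 4)]

q = quad_split(0, 0, 128, 128)
leaves = [q[1], q[2]]                   # two quadrants stay whole CUs
leaves += binary_split_v(*q[0])         # multi-type: vertical binary
leaves += ternary_split_h(*q[3])        # multi-type: horizontal ternary
```

The resulting seven leaf coding units tile the CTU exactly, which is the invariant any legal partitioning must satisfy.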

50 citations


Journal ArticleDOI
TL;DR: The vaping-induced pulmonary disease outbreak spawned increased coverage about the dangers of vaping and internet searches for vaping cessation, and the ratio of observed to expected search volumes during the outbreak era was forecast with historical trends.
Abstract: Background In the latter half of 2019, an outbreak of pulmonary disease in the USA resulted in 2807 hospitalisations and 68 deaths, as of 18 February 2020. Given the severity of the outbreak, we assessed whether articles during the outbreak era more frequently warned about the dangers of vaping and whether internet searches for vaping cessation increased. Methods Using Tobacco Watcher, a media monitoring platform that automatically identifies and categorises news articles from sources across the globe, we obtained all articles that (a) discussed the outbreak and (b) primarily warned about the dangers of vaping. We obtained internet search trends originating from the USA that mentioned ‘quit’ or ‘stop’ and ‘e cig(s),’ ‘ecig(s),’ ‘e-cig(s),’ ‘e cigarette(s),’ ‘e-cigarette(s),’ ‘electronic cigarette(s),’ ‘vape(s),’ ‘vaping’ or ‘vaper(s)’ from Google Trends (eg, ‘how do I quit vaping?’). All data were obtained from 1 January 2014 to 18 February 2020 and ARIMA models were used with historical trends to forecast the ratio of observed to expected search volumes during the outbreak era. Results News of the vaping-induced pulmonary disease outbreak was first reported on 25 July 2019 with 195 articles, culminating in 44 512 articles by 18 February 2020. On average, news articles warning about the dangers of vaping were 130% (95% prediction interval (PI): −15 to 417) and searches for vaping cessation were 76% (95% PI: 28 to 182) higher than expected levels for the days during the period when the sources of the outbreak were unknown (25 July to 27 September 2019). News and searches stabilised just after the US Centers for Disease Control and Prevention reported that a primary source of the outbreak was an additive used in marijuana vapes on 27 September 2019. In sum, there were 12 286 articles archived in Tobacco Watcher primarily warning about the dangers of vaping and 1 025 000 cessation searches following the outbreak. 
Conclusion The vaping-induced pulmonary disease outbreak spawned increased coverage about the dangers of vaping and internet searches for vaping cessation. Resources and strategies that respond to this elevated interest should become a priority among public health leaders.

47 citations


Proceedings ArticleDOI
27 Apr 2021
TL;DR: In this paper, a conditional early exiting framework is proposed to automatically determine the earliest point in processing where an inference is sufficiently reliable and generate on-the-fly supervision signals to the gates to provide a dynamic trade-off between accuracy and computational cost.
Abstract: In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a simple sampling strategy combined with conditional early exiting to enable efficient recognition. Our model automatically learns to process fewer frames for simpler videos and more frames for complex ones. To achieve this, we employ a cascade of gating modules to automatically determine the earliest point in processing where an inference is sufficiently reliable. We generate on-the-fly supervision signals to the gates to provide a dynamic trade-off between accuracy and computational cost. Our proposed model outperforms competing methods on three large-scale video benchmarks. In particular, on ActivityNet1.3 and mini-kinetics, we outperform the state-of-the-art efficient video recognition methods with 1.3× and 2.1× fewer GFLOPs, respectively. Additionally, our method sets a new state of the art for efficient video understanding on the HVU benchmark.
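A minimal sketch of the exiting mechanism, with a stand-in classifier and a fixed confidence threshold in place of the paper's learned gating modules: frames are consumed one at a time, and processing stops as soon as the running prediction clears the threshold.

```python
# Early exit: average per-class scores over the frames seen so far and
# stop once the best class is confident enough. Classifier, gate, and
# threshold are toys, not the paper's trained components.

def recognize(frames, classify, threshold=0.9):
    """Return (prediction, frames_used)."""
    totals = {}
    for used, frame in enumerate(frames, start=1):
        for cls, score in classify(frame).items():
            totals[cls] = totals.get(cls, 0.0) + score
        avg = {c: s / used for c, s in totals.items()}
        best = max(avg, key=avg.get)
        if avg[best] >= threshold:          # gate: confident enough to exit
            return best, used
    return best, used                       # fell through: used all frames

# Toy "video": an unambiguous clip should exit after the first frame.
toy_classify = lambda f: ({"cat": 0.95, "dog": 0.05} if f == "cat"
                          else {"cat": 0.4, "dog": 0.6})
label, used = recognize(["cat"] * 8, toy_classify)
```

An ambiguous clip (scores never reaching the threshold) instead runs to the end, which is exactly the "fewer frames for simpler videos, more for complex ones" behavior the abstract describes.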

Journal ArticleDOI
TL;DR: This brief presents a review of developments in spin-transfer-torque magnetoresistive random access memory (STT-MRAM) sensing over the past 20 years from a circuit design perspective; key breakthroughs for achieving the optimal reference scheme, read disturbance prevention, read energy reduction, accurate yield estimation, and overcoming other non-idealities are discussed.
Abstract: This brief presents a review of developments in spin-transfer-torque magnetoresistive random access memory (STT-MRAM) sensing over the past 20 years from a circuit design perspective. Various sensing schemes are categorized and described according to the data-cell variation-tolerant characteristics, pre-amplifiers, and offset tolerance. Key breakthroughs for achieving the optimal reference scheme, read disturbance prevention, read energy reduction, accurate yield estimation, and overcoming other non-idealities are discussed. This review is intended to facilitate further enhancement of STT-MRAM sensing in advanced technology nodes, thereby fulfilling STT-MRAM’s potential as a universal memory.

Journal ArticleDOI
TL;DR: The experimental results on VVC reference software show that average 4.5% and 3.6% overall coding gain can be achieved by the VVC transform coding tools for All Intra and Random Access configurations, respectively.
Abstract: In the past decade, the development of transform coding techniques has achieved significant progress and several advanced transform tools have been adopted in the new generation Versatile Video Coding (VVC) standard. In this paper, a brief history of transform coding development during VVC standardization is presented, and the transform coding tools in the VVC standard are described in detail together with their initial design, incremental improvements and implementation aspects. To improve coding efficiency, four new transform coding techniques are introduced in VVC, namely Multiple Transform Selection (MTS), Low-Frequency Non-separable Secondary Transform (LFNST), Sub-Block Transform (SBT), and a large (64-point) type-2 DCT. The experimental results on VVC reference software (VTM-9.0) show that average 4.5% and 3.6% overall coding gain can be achieved by the VVC transform coding tools for All Intra and Random Access configurations, respectively.
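The reason transform tools pay off can be shown with the one transform the abstract names explicitly, the type-2 DCT: on a smooth block it concentrates nearly all signal energy into a few low-frequency coefficients, which then quantize cheaply. The 8-point floating-point example below is generic, not VVC's integer approximation.

```python
import math

# Orthonormal 8-point type-2 DCT applied to a smooth 1-D "row" of samples.

def dct2_matrix(n):
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def transform(matrix, vec):
    return [sum(r[i] * vec[i] for i in range(len(vec))) for r in matrix]

block = [10, 12, 14, 16, 18, 20, 22, 24]        # smooth ramp of samples
coeffs = transform(dct2_matrix(8), block)
energy = sum(c * c for c in coeffs)              # total coefficient energy
low2 = coeffs[0] ** 2 + coeffs[1] ** 2           # energy in first 2 coeffs
```

Because the transform is orthonormal, total energy is preserved, and for this ramp more than 99% of it lands in the first two coefficients.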

Journal ArticleDOI
TL;DR: The paper provides an overview of the quantization and entropy coding methods in the Versatile Video Coding (VVC) standard and discusses motivations and implementation aspects.
Abstract: The paper provides an overview of the quantization and entropy coding methods in the Versatile Video Coding (VVC) standard. Special focus is laid on techniques that improve coding efficiency relative to the methods included in the High Efficiency Video Coding (HEVC) standard: The inclusion of trellis-coded quantization, the advanced context modeling for entropy coding of transform coefficient levels, the arithmetic coding engine with multi-hypothesis probability estimation, and the joint coding of chroma residuals. Besides a description of the design concepts, the paper also discusses motivations and implementation aspects. The effectiveness of the quantization and entropy coding methods specified in VVC is validated by experimental results.
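One of the named techniques, multi-hypothesis probability estimation, can be sketched in a few lines: per context, two exponentially adapting probability estimates with different rates are maintained and their average drives the arithmetic coder. The adaptation rates below are illustrative stand-ins, not VVC's actual context-dependent window sizes.

```python
# Two hypotheses per context: a fast-adapting and a slow-adapting estimate
# of the probability of a '1' bin; the coder uses their average.

def estimate(bits, fast_rate=1 / 8, slow_rate=1 / 64):
    p_fast = p_slow = 0.5
    history = []
    for b in bits:
        history.append((p_fast + p_slow) / 2)   # estimate used for this bin
        p_fast += fast_rate * (b - p_fast)      # reacts quickly, noisy
        p_slow += slow_rate * (b - p_slow)      # reacts slowly, stable
    return history

ests = estimate([1] * 100)                      # a strongly biased context
```

On a biased bin sequence the combined estimate climbs steadily toward the true probability: the fast hypothesis supplies responsiveness, the slow one stability.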

Proceedings ArticleDOI
18 Oct 2021
TL;DR: Uncovering TRR (U-TRR) as discussed by the authors is an experimental methodology to analyze in-DRAM TRR implementations, based on the new observation that data retention failures in DRAM enable a side channel that leaks information on how TRR refreshes potential victim rows.
Abstract: The RowHammer vulnerability in DRAM is a critical threat to system security. To protect against RowHammer, vendors commit to security-through-obscurity: modern DRAM chips rely on undocumented, proprietary, on-die mitigations, commonly known as Target Row Refresh (TRR). At a high level, TRR detects and refreshes potential RowHammer-victim rows, but its exact implementations are not openly disclosed. Security guarantees of TRR mechanisms cannot be easily studied due to their proprietary nature. To assess the security guarantees of recent DRAM chips, we present Uncovering TRR (U-TRR), an experimental methodology to analyze in-DRAM TRR implementations. U-TRR is based on the new observation that data retention failures in DRAM enable a side channel that leaks information on how TRR refreshes potential victim rows. U-TRR allows us to (i) understand how logical DRAM rows are laid out physically in silicon; (ii) study undocumented on-die TRR mechanisms; and (iii) combine (i) and (ii) to evaluate the RowHammer security guarantees of modern DRAM chips. We show how U-TRR allows us to craft RowHammer access patterns that successfully circumvent the TRR mechanisms employed in 45 DRAM modules of the three major DRAM vendors. We find that the DRAM modules we analyze are vulnerable to RowHammer, having bit flips in up to 99.9% of all DRAM rows.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Zhang et al. as discussed by the authors proposed a novel deep face relighting method that predicts the ratio (quotient) image between a source image and the target image with the desired lighting, allowing them to relight the image while maintaining the local facial details.
Abstract: Existing face relighting methods often struggle with two problems: maintaining the local facial details of the subject and accurately removing and synthesizing shadows in the relit image, especially hard shadows. We propose a novel deep face relighting method that addresses both problems. Our method learns to predict the ratio (quotient) image between a source image and the target image with the desired lighting, allowing us to relight the image while maintaining the local facial details. During training, our model also learns to accurately modify shadows by using estimated shadow masks to emphasize the high-contrast shadow borders. Furthermore, we introduce a method to use the shadow mask to estimate the ambient light intensity in an image, and are thus able to leverage multiple datasets during training with different global lighting intensities. With quantitative and qualitative evaluations on the Multi-PIE and FFHQ datasets, we demonstrate that our proposed method faithfully maintains the local facial details of the subject and can accurately handle hard shadows while achieving state-of-the-art face relighting performance.
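The quotient-image idea the abstract builds on can be illustrated with a toy multiplicative model: if an image is roughly albedo times shading, then multiplying the source by the ratio of target to source shading changes the lighting while leaving the albedo detail intact. All values below are invented 2x2 examples, not data from the paper.

```python
# Toy multiplicative image model: image = albedo * shading. Relighting by
# a quotient image multiplies the source by target_shading / source_shading.

def mul(a, b):
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

albedo = [[0.8, 0.4], [0.6, 0.2]]            # local facial detail to preserve
src_shading = [[1.0, 1.0], [0.5, 0.5]]       # e.g. lower half in shadow
tgt_shading = [[0.5, 0.5], [1.0, 1.0]]       # desired lighting

source = mul(albedo, src_shading)            # the observed source image
ratio = [[t / s for t, s in zip(rt, rs)]
         for rt, rs in zip(tgt_shading, src_shading)]
relit = mul(source, ratio)                   # relight via the quotient image
target = mul(albedo, tgt_shading)            # ground-truth relit image
```

In this idealized setting the relit image reproduces the target exactly; the paper's contribution is predicting such a ratio image (and handling hard shadows) with a learned network on real faces.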

Journal ArticleDOI
TL;DR: This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode.
Abstract: In an increasingly connected world, consumer video experiences have diversified away from traditional broadcast video into new applications with increased use of non-camera-captured content such as computer screen desktop recordings or animations created by computer rendering, collectively referred to as screen content. There has also been increased use of graphics and character content that is rendered and mixed or overlaid together with camera-generated content. The emerging Versatile Video Coding (VVC) standard, in its first version, addresses this market change by the specification of low-level coding tools suitable for screen content. This is in contrast to its predecessor, the High Efficiency Video Coding (HEVC) standard, where highly efficient screen content support is only available in extension profiles of its version 4. This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode. The specification of these coding tools in the first version of VVC enables the VVC reference software implementation (VTM) to achieve average bit-rate savings of about 41% to 61% relative to the HEVC test model (HM) reference software implementation using the Main 10 profile for 4:2:0 screen content test sequences. Compared to the HM using the Screen-Extended Main 10 profile and the same 4:2:0 test sequences, the VTM provides about 19% to 25% bit-rate savings. The same comparison with 4:4:4 test sequences revealed bit-rate savings of about 13% to 27% for Y′CBCR and of about 6% to 14% for R′G′B′ screen content. Relative to the HM without the HEVC version 4 screen content coding extensions, the bit-rate savings for 4:4:4 test sequences are about 33% to 64% for Y′CBCR and 43% to 66% for R′G′B′ screen content.
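Of the five tools, the palette mode is the easiest to sketch: screen content often uses few distinct colors, so a block can be sent as a small palette plus per-pixel indices. The toy below omits escape samples, palette prediction, and index run-length coding, all of which the real tool includes.

```python
# Palette-mode sketch: extract the block's distinct colors as a palette,
# then represent each pixel by its palette index.

def palette_encode(block):
    palette = sorted({px for row in block for px in row})
    index = {color: i for i, color in enumerate(palette)}
    return palette, [[index[px] for px in row] for row in block]

def palette_decode(palette, indices):
    return [[palette[i] for i in row] for row in indices]

block = [["red", "red", "blue"],
         ["red", "white", "blue"],
         ["red", "red", "blue"]]
palette, indices = palette_encode(block)
```

Here nine pixels collapse to a three-entry palette and small indices, which is where the savings on rendered text and graphics come from.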

Journal ArticleDOI
TL;DR: This work is the first to holistically consider the key practical constraints of analog beamforming codebooks, a minimal number of radio frequency (RF) chains, limited channel knowledge, beam alignment, and a limited receive dynamic range.
Abstract: Full-duplex millimeter wave (mmWave) communication has shown increasing promise for self-interference cancellation via hybrid precoding and combining. This paper proposes a novel mmWave multiple-input multiple-output (MIMO) design for configuring the analog and digital beamformers of a full-duplex transceiver. This work is the first to holistically consider the key practical constraints of analog beamforming codebooks, a minimal number of radio frequency (RF) chains, limited channel knowledge, beam alignment, and a limited receive dynamic range. To prevent self-interference from saturating receive components, such as LNAs and ADCs, a design framework is developed that limits the degree of self-interference on a per-antenna and per-RF chain basis. We present a means for constructing analog beamforming candidates from beam alignment measurements to afford our design greater flexibility in its aim to reduce self-interference. Numerical results evaluate the design in a variety of settings and validate the need to prevent receiver-side saturation. These results and corresponding insights serve as useful design references and benchmarks for practical full-duplex mmWave transceivers.

Proceedings ArticleDOI
30 Aug 2021
TL;DR: In this paper, a broadcasted residual learning method is proposed to achieve high accuracy with small model size and computational load, which can effectively represent useful audio features with much less computation than conventional convolutional neural networks.
Abstract: Keyword spotting is an important research field because it plays a key role in device wake-up and user interaction on smart devices. However, it is challenging to minimize errors while operating efficiently in devices with limited resources such as mobile phones. We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load. Our method configures most of the residual functions as 1D temporal convolution while still allowing 2D convolution, using a broadcasted-residual connection that expands the temporal output to the frequency-temporal dimension. This residual mapping enables the network to effectively represent useful audio features with much less computation than conventional convolutional neural networks. We also propose a novel network architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted residual learning and describe how to scale up the model according to the target device's resources. BC-ResNets achieve state-of-the-art 98.0% and 98.7% top-1 accuracy on Google speech command datasets v1 and v2, respectively, and consistently outperform previous approaches, using fewer computations and parameters.
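A shape-level sketch of the broadcasted-residual connection, with a fixed 3-tap moving average standing in for a learned 1-D temporal convolution: the frequency axis is collapsed, a cheap temporal operation runs on the resulting 1-D feature, and the output is broadcast back over frequency as a residual.

```python
import numpy as np

# The expensive 2-D work is replaced by a 1-D temporal op whose result is
# broadcast over the frequency axis and added as a residual.

def broadcasted_residual(x):
    temporal = x.mean(axis=0)                            # collapse freq: (time,)
    kernel = np.array([0.25, 0.5, 0.25])                 # stand-in 1-D "conv"
    filtered = np.convolve(temporal, kernel, mode="same")
    return x + filtered[None, :]                         # broadcast over freq

x = np.arange(12, dtype=float).reshape(3, 4)             # (freq, time) features
y = broadcasted_residual(x)
```

The point of the pattern is visible in the shapes: the temporal branch touches a (time,)-sized signal instead of the full (freq, time) map, yet its output still updates every frequency bin.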

Proceedings ArticleDOI
23 Apr 2021
TL;DR: Skip-Convolution as mentioned in this paper proposes skip-convolutions to leverage the large amount of redundancies in video streams and save computations by replacing all convolutions with skip-convolutions in two state-of-the-art architectures.
Abstract: We propose Skip-Convolutions to leverage the large amount of redundancies in video streams and save computations. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the model prediction, e.g. foreground regions, or it can be safely skipped, e.g. background regions. These gates can either be implemented as an efficient network trained jointly with convolution kernels, or can simply skip the residuals based on their magnitude. Gating functions can also incorporate block-wise sparsity structures, as required for efficient implementation on hardware platforms. By replacing all convolutions with Skip-Convolutions in two state-of-the-art architectures, namely EfficientDet and HRNet, we reduce their computational cost consistently by a factor of 3–4× for two different tasks, without any accuracy drop. Extensive comparisons with existing model compression, as well as image and video efficiency methods demonstrate that Skip-Convolutions set a new state-of-the-art by effectively exploiting the temporal redundancies in videos.
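The magnitude-based variant of the gate can be sketched directly. Note the simplification: reusing a cached output near a changed pixel is only approximate (the paper's residual formulation handles this exactly), and the toy 3x3 mean filter and zero threshold are stand-ins for a real convolution layer and a tuned gate.

```python
# Recompute the (stand-in) convolution only where the frame-to-frame
# residual is large; elsewhere reuse the cached output from the previous
# frame. Static regions therefore cost nothing.

def conv3x3_mean(frame, i, j):
    vals = [frame[a][b]
            for a in range(max(0, i - 1), min(len(frame), i + 2))
            for b in range(max(0, j - 1), min(len(frame[0]), j + 2))]
    return sum(vals) / len(vals)

def skip_conv(prev_frame, prev_out, frame, tau):
    out, skipped = [], 0
    for i, row in enumerate(frame):
        out_row = []
        for j, px in enumerate(row):
            if abs(px - prev_frame[i][j]) <= tau:   # gate: residual is small
                out_row.append(prev_out[i][j])      # reuse cached output
                skipped += 1
            else:
                out_row.append(conv3x3_mean(frame, i, j))
        out.append(out_row)
    return out, skipped

frame0 = [[0] * 4 for _ in range(4)]
out0 = [[conv3x3_mean(frame0, i, j) for j in range(4)] for i in range(4)]
frame1 = [row[:] for row in frame0]
frame1[1][1] = 10                                   # a single moving pixel
out1, skipped = skip_conv(frame0, out0, frame1, tau=0)
```

With one changed pixel, 15 of the 16 positions are skipped, mirroring the background/foreground behavior the abstract describes.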

Posted Content
TL;DR: This paper presents ENCO, an efficient structure learning method for directed, acyclic causal graphs leveraging observational and interventional data that can efficiently recover graphs with hundreds of nodes, an order of magnitude larger than what was previously possible, while handling deterministic variables and latent confounders.
Abstract: Learning the structure of a causal graphical model using both observational and interventional data is a fundamental problem in many scientific fields. A promising direction is continuous optimization for score-based methods, which efficiently learn the causal graph in a data-driven manner. However, to date, those methods require constrained optimization to enforce acyclicity or lack convergence guarantees. In this paper, we present ENCO, an efficient structure learning method for directed, acyclic causal graphs leveraging observational and interventional data. ENCO formulates the graph search as an optimization of independent edge likelihoods, with the edge orientation being modeled as a separate parameter. Consequently, we can provide convergence guarantees of ENCO under mild conditions without constraining the score function with respect to acyclicity. In experiments, we show that ENCO can efficiently recover graphs with hundreds of nodes, an order of magnitude larger than what was previously possible, while handling deterministic variables and latent confounders.

Journal ArticleDOI
TL;DR: An overview of the VVC high-level syntax (HLS), which forms its system and transport interface, is given, and comparisons to the HLS designs in High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), the previous major video coding standards, are included.
Abstract: Versatile Video Coding (VVC), a.k.a. ITU-T H.266 | ISO/IEC 23090-3, is the new generation video coding standard that has just been finalized by the Joint Video Experts Team (JVET) of ITU-T VCEG and ISO/IEC MPEG at its 19th meeting ending on July 1, 2020. This paper gives an overview of the VVC high-level syntax (HLS), which forms its system and transport interface. Comparisons to the HLS designs in High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), the previous major video coding standards, are included. When discussing new HLS features introduced into VVC or differences relative to HEVC and AVC, the reasoning behind the design differences and the benefits they bring are described. The HLS of VVC enables newer and more versatile use cases such as video region extraction, composition and merging of content from multiple coded video bitstreams, and viewport-adaptive 360° immersive media.

Journal ArticleDOI
TL;DR: In this paper, a two-element phase-locked loop (PLL)-coupled array for the implementation of millimeter-wave (mm-wave) and sub-terahertz (sub-THz) phased arrays is presented.
Abstract: A new two-element phase-locked loop (PLL)-coupled array for the implementation of millimeter-wave (mm-wave) and sub-terahertz (sub-THz) phased arrays is presented. This architecture avoids using a lossy phase shifter to create the required phase shift between adjacent elements in a phased-array system. The required phase shift is generated by utilizing a dual nested loop PLL. The two PLL loops work together to stabilize the frequency and create the required phase shift. Moreover, the architecture can be scaled simply by adding more unit cells. A 112–121-GHz two-element phased array is designed and fabricated in a standard 65-nm CMOS process. It consumes 147-mW power and provides a phase-shift range of 46.7°, from 58.53° to 105.2°, at 117 GHz.

Journal Article
TL;DR: The learned-threshold pruning (LTP) method makes per-layer thresholds trainable via gradient descent, hence computationally efficient and scalable to deeper networks, and effectively prunes newer compact architectures such as EfficientNet, MobileNetV2 and MixNet.
Abstract: This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they are set as input. Making thresholds trainable also makes LTP computationally efficient, hence scalable to deeper networks. For example, it takes 30 epochs for LTP to prune ResNet50 on ImageNet by a factor of 9.1. This is in contrast to other methods that search for per-layer thresholds via a computationally intensive iterative pruning and fine-tuning process. Additionally, with a novel differentiable L0 regularization, LTP is able to operate effectively on architectures with batch-normalization. This is important since L1 and L2 penalties lose their regularizing effect in networks with batch-normalization. Finally, LTP generates a trail of progressively sparser networks from which the desired pruned network can be picked based on sparsity and performance requirements. These features allow LTP to achieve competitive compression rates on ImageNet networks such as AlexNet (26.4× compression with 79.1% Top-5 accuracy) and ResNet50 (9.1× compression with 92.0% Top-5 accuracy). We also show that LTP effectively prunes modern compact architectures, such as EfficientNet, MobileNetV2 and MixNet.
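The core soft-pruning step can be sketched in a few lines of numpy. This is a hedged illustration only, assuming a sigmoid keep-mask on squared weight magnitudes; `soft_prune`, `tau` and `temp` are our names, and the actual LTP method learns `tau` per layer by backpropagating through such a mask together with its differentiable L0 penalty.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_prune(w, tau, temp=1e-3):
    """Differentiable soft pruning: weights whose squared magnitude falls
    below the threshold tau are smoothly driven toward zero, so gradients
    can flow to both the weights and the threshold."""
    keep = sigmoid((w ** 2 - tau) / temp)   # soft keep-mask in (0, 1)
    return w * keep, keep

def sparsity(keep, cutoff=0.5):
    """Fraction of weights the soft mask would remove at a 0.5 cutoff."""
    return float(np.mean(keep < cutoff))
```

As `temp` shrinks, the mask approaches a hard threshold at `|w| = sqrt(tau)`, which is what makes the threshold itself trainable by gradient descent.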

Journal ArticleDOI
TL;DR: A novel group-sparse Bayesian learning (G-SBL) scheme is conceived for channel estimation that exploits the frequency-domain (FD) correlation of the channel’s frequency response (CFR) while transmitting pilots on only a few subcarriers, thus it has a reduced pilot overhead.
Abstract: Sparse, group-sparse and online channel estimation is conceived for millimeter wave (mmWave) multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. We exploit the angular sparsity of the mmWave channel impulse response (CIR) to achieve improved estimation performance. First, a sparse Bayesian learning (SBL)-based technique is developed for the estimation of each individual subcarrier’s quasi-static channel, which leads to an improved performance versus complexity trade-off in comparison to conventional channel estimation. Then a novel group-sparse Bayesian learning (G-SBL) scheme is conceived for reducing the channel estimation mean square error (MSE). The salient aspect of our G-SBL technique is that it exploits the frequency-domain (FD) correlation of the channel’s frequency response (CFR) while transmitting pilots on only a few subcarriers, and thus has a reduced pilot overhead. A low complexity (LC) version of G-SBL, termed LCG-SBL, is also developed that reduces the computational cost of the G-SBL significantly. Subsequently, an online G-SBL (O-SBL) variant is designed for the estimation of doubly-selective mmWave MIMO OFDM channels, which has low processing delay and exploits temporal correlation as well. This is followed by the design of a hybrid transmit precoder and receive combiner, which can operate directly on the estimated beamspace domain CFRs, together with a limited channel state information (CSI) feedback. Our simulation results confirm the accuracy of the analysis.
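For readers unfamiliar with SBL itself, the basic EM-style update that such schemes build on can be sketched in numpy. This is the textbook single-measurement version with a fixed noise variance, not the paper's method: the G-SBL and O-SBL variants additionally couple the hyperparameters across subcarriers and time, which this sketch does not attempt.

```python
import numpy as np

def sbl(Phi, y, sigma2=1e-2, n_iter=50):
    """Basic sparse Bayesian learning for y = Phi @ x + noise via EM
    updates of per-coefficient prior variances (hyperparameters gamma)."""
    m, n = Phi.shape
    gamma = np.ones(n)                       # per-coefficient prior variances
    for _ in range(n_iter):
        # Posterior covariance and mean of the sparse vector x
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ Phi.T @ y / sigma2
        # EM hyperparameter update; vanishing gammas prune coefficients
        gamma = np.maximum(mu ** 2 + np.diag(Sigma), 1e-10)  # numerical floor
    return mu, gamma
```

Coefficients whose `gamma` collapses toward zero are effectively pruned from the model, which is how SBL induces sparsity without an explicit L0/L1 penalty.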

Journal ArticleDOI
TL;DR: This work formulates a decentralized partially observable Markov decision process with a novel reward structure that provides long-term proportional fairness in terms of throughput, and incorporates these features into a distributed reinforcement learning framework for contention-based spectrum access.
Abstract: The increasing number of wireless devices operating in unlicensed spectrum motivates the development of intelligent adaptive approaches to spectrum access. We consider decentralized contention-based medium access for base stations (BSs) operating on unlicensed shared spectrum, where each BS autonomously decides whether or not to transmit on a given resource. The contention decision attempts to maximize not its own downlink throughput, but rather a network-wide objective. We formulate this problem as a decentralized partially observable Markov decision process with a novel reward structure that provides long term proportional fairness in terms of throughput. We then introduce a two-stage Markov decision process in each time slot that uses information from spectrum sensing and reception quality to make a medium access decision. Finally, we incorporate these features into a distributed reinforcement learning framework for contention-based spectrum access. Our formulation provides decentralized inference, online adaptability and also caters to partial observability of the environment through recurrent Q-learning. Empirically, we find its maximization of the proportional fairness metric to be competitive with a genie-aided adaptive energy detection threshold, while being robust to channel fading and small contention windows.
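As a small illustration of the reward design, proportional fairness is conventionally expressed as the sum of the logarithms of per-node average throughputs; the abstract does not spell out its exact reward, so this is an assumption, and the function name is ours.

```python
import numpy as np

def proportional_fairness(avg_throughputs):
    """Network-wide proportional-fairness utility: the sum of the logs of
    the per-BS average throughputs. Starving any single BS drives the
    utility toward -inf, so maximizing it trades total throughput
    against fairness across base stations."""
    t = np.asarray(avg_throughputs, dtype=float)
    return float(np.sum(np.log(t)))
```

For a fixed total throughput the utility prefers an even split: two BSs sharing 2 units of throughput equally score log 1 + log 1 = 0, while a 1.9/0.1 split scores about -1.66.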

Journal ArticleDOI
TL;DR: The technical details of each coding tool are presented, the design elements are highlighted with consideration of typical hardware implementations, and the visual quality improvement is demonstrated and analyzed.
Abstract: Efficient representation and coding of fine-granular motion information is one of the key research areas for exploiting inter-frame correlation in video coding. Representative techniques towards this direction are affine motion compensation (AMC), decoder-side motion vector refinement (DMVR), and subblock-based temporal motion vector prediction (SbTMVP). Fine-granular motion information is derived at subblock level for all the three coding tools. In addition, the obtained inter prediction can be further refined by two optical flow-based coding tools, the bi-directional optical flow (BDOF) for bi-directional inter prediction and the prediction refinement with optical flow (PROF) exclusively used in combination with AMC. The aforementioned five coding tools have been extensively studied and finally adopted in the Versatile Video Coding (VVC) standard. This paper presents technical details of each tool and highlights the design elements with the consideration of typical hardware implementations. Following the common test conditions defined by Joint Video Experts Team (JVET) for the development of VVC, 5.7% bitrate reduction on average is achieved by the five tools. For test sequences characterized by large and complex motion, up to 13.4% bitrate reduction is observed. Additionally, visual quality improvement is demonstrated and analyzed.

Proceedings ArticleDOI
Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyu Woong Hwang
06 Jun 2021
TL;DR: SubSpectral Normalization (SSN) splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group, removing the inter-frequency deflection while the network learns a frequency-aware characteristic.
Abstract: Convolutional Neural Networks are widely used in various machine learning domains. In image processing, the features can be obtained by applying 2D convolution to all spatial dimensions of the input. However, in the audio case, frequency domain input like Mel-Spectrogram has different and unique characteristics in the frequency dimension. Thus, there is a need for a method that allows the 2D convolution layer to handle the frequency dimension differently. In this work, we introduce SubSpectral Normalization (SSN), which splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group. SSN also includes an affine transformation that can be applied to each group. Our method removes the inter-frequency deflection while the network learns a frequency-aware characteristic. In the experiments with audio data, we observed that SSN can efficiently improve the network’s performance.
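A minimal numpy sketch of the normalization itself follows, assuming batch-norm-style statistics computed per channel and per sub-band over the batch, frequency and time axes; the axis layout and names are our assumptions, not the authors' code.

```python
import numpy as np

def subspectral_norm(x, groups, gamma=None, beta=None, eps=1e-5):
    """SubSpectral Normalization sketch: split the frequency axis of a
    (batch, channels, freq, time) tensor into sub-band groups and
    normalize each group separately, with an optional per-group affine."""
    b, c, f, t = x.shape
    assert f % groups == 0, "frequency bins must divide evenly into groups"
    xg = x.reshape(b, c, groups, f // groups, t)
    mean = xg.mean(axis=(0, 3, 4), keepdims=True)   # per (channel, group) stats
    var = xg.var(axis=(0, 3, 4), keepdims=True)
    out = (xg - mean) / np.sqrt(var + eps)
    if gamma is not None:                           # affine params: shape (c, groups)
        out = out * gamma[None, :, :, None, None] + beta[None, :, :, None, None]
    return out.reshape(b, c, f, t)
```

With `groups=1` this reduces to ordinary batch normalization over (batch, freq, time); larger `groups` lets each sub-band carry its own statistics and affine transform.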

Proceedings ArticleDOI
19 Sep 2021
TL;DR: PLONQ, a progressive neural image compression scheme which pushes the boundary of variable bitrate compression by allowing quality scalable coding with a single bitstream, is presented and it outperforms SPIHT, a well-known wavelet-based progressive image codec.
Abstract: We present PLONQ, a progressive neural image compression scheme which pushes the boundary of variable bitrate compression by allowing quality scalable coding with a single bitstream. In contrast to existing learned variable bitrate solutions which produce separate bitstreams for each quality, it enables easier rate-control and requires less storage. Leveraging the latent scaling based variable bitrate solution, we introduce nested quantization, a method that defines multiple quantization levels with nested quantization grids, and progressively refines all latents from the coarsest to the finest quantization level. To achieve finer progressiveness in between any two quantization levels, latent elements are incrementally refined with an importance ordering defined in the rate-distortion sense. To the best of our knowledge, PLONQ is the first learning-based progressive image coding scheme and it outperforms SPIHT, a well-known wavelet-based progressive image codec.
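The nesting property is easy to sketch: halving the quantization step at each level yields grids in which every coarse reconstruction point also lies on all finer grids, so a decoder can refine a coarse latent without re-sending it. The following is a schematic numpy illustration under that assumption; the actual PLONQ scheme additionally refines latent elements in a rate-distortion-aware importance order.

```python
import numpy as np

def nested_quantize(latents, base_step=1.0, levels=3):
    """Progressively quantize latents with nested uniform grids: each
    level halves the step size, and every coarse grid point is also a
    fine grid point, so refinements only add bits to the stream."""
    recons = []
    for k in range(levels):
        step = base_step / (2 ** k)
        recons.append(np.round(latents / step) * step)
    return recons
```

Each level's reconstruction error is bounded by half its step size, so distortion shrinks monotonically in the bound as more refinement levels are decoded.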

Journal ArticleDOI
TL;DR: In this paper, a real-time tracking framework for polyp region segmentation in colonoscopic video frames is proposed, where the saliency map is composed of four probability maps generated by incorporating the characteristics associated with the polyps, and the elliptical shape of polyps is used by the particles for final refinement using an active contour (AC) model.
Abstract: In this article, an automatic polyp detection system for endoscopic video frames is proposed. Manual inspection of each frame for polyp localization in colonoscopic video poses many challenges. This work proposes a real-time tracking framework for polyp region segmentation in large volumes of acquired colonoscopic video frames. In our work, the polyp region in the frame is first roughly detected by a saliency map, followed by a modified tracking mechanism for localization. The work suggests the use of a visual saliency map as the measurement model for tracking. The saliency map is composed of four probability maps generated by incorporating the characteristics associated with polyps. The elliptical shape of polyps is used by the particles for final refinement using an active contour (AC) model. The tracking efficiency and the segmentation score achieved using the proposed method suggest that it can be used for polyp detection and localization. The proposed method achieves an average Dice score of 66.06% on the CVC-ClinicDB dataset. Our method can be employed on both online and offline endoscopic video sequences. A GUI based on the proposed method is also designed as an automatic polyp detection system.