
Showing papers by "Qualcomm" published in 2021


Journal ArticleDOI
TL;DR: Versatile Video Coding (VVC) was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today's media content and emerging applications, as mentioned in this paper.
Abstract: Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today’s media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies for achieving significant bit rate reductions in the neighborhood of 50% over its predecessor for equal video quality, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding, e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.

250 citations


Journal ArticleDOI
19 Jan 2021
TL;DR: This article summarizes these developments in video coding standardization after AVC, and focuses on providing an overview of the first version of VVC, including comparisons against HEVC.
Abstract: In the last 17 years, since the finalization of the first version of the now-dominant H.264/Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC) standard in 2003, two major new generations of video coding standards have been developed. These include the standards known as High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). HEVC was finalized in 2013, repeating the ten-year cycle time set by its predecessor and providing about 50% bit-rate reduction over AVC. The cycle was shortened by three years for the VVC project, which was finalized in July 2020, yet again achieving about a 50% bit-rate reduction over its predecessor (HEVC). This article summarizes these developments in video coding standardization after AVC. It especially focuses on providing an overview of the first version of VVC, including comparisons against HEVC. Besides further advances in hybrid video compression, as in previous development cycles, the broad versatility of the application domain that is highlighted in the title of VVC is explained. Included in VVC is the support for a wide range of applications beyond the typical standard- and high-definition camera-captured content codings, including features to support computer-generated/screen content, high dynamic range content, multilayer and multiview coding, and support for immersive media such as 360° video.

246 citations


Proceedings Article
03 Mar 2021
TL;DR: A new simple approach for image compression: instead of storing the RGB values for each pixel of an image, the weights of a neural network overfitted to the image are stored, and this approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights.
Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
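The encode/decode recipe in this abstract is simple enough to sketch numerically. The toy below (not the authors' code) overfits a tiny MLP that maps normalized (row, col) coordinates to a grayscale value, then "decodes" by evaluating the network at every pixel. The 4x4 ramp image, network width, and training hyperparameters are all illustrative stand-ins, and the weight-quantization step of the actual method is omitted.

```python
import numpy as np

# Toy version of the pipeline: fit pixel coordinate -> intensity, then
# decode by evaluating the MLP at every pixel location.

rng = np.random.default_rng(0)
H, W = 4, 4
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
X = np.stack([rows, cols], -1).reshape(-1, 2) / (H - 1)    # coords in [0, 1]
image = ((rows + cols) / (2 * (H - 1))).reshape(-1, 1)     # smooth 4x4 ramp

# One-hidden-layer MLP, trained by plain batch gradient descent.
W1 = rng.normal(0.0, 1.0, (2, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, (32, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

mse0 = float(((forward(X)[1] - image) ** 2).mean())   # error before fitting
for _ in range(3000):
    h, pred = forward(X)
    err = (pred - image) / len(X)          # gradient of 0.5 * MSE w.r.t. pred
    dh = (err @ W2.T) * (1.0 - h ** 2)     # backprop through tanh
    W2 -= 0.1 * (h.T @ err); b2 -= 0.1 * err.sum(0)
    W1 -= 0.1 * (X.T @ dh);  b1 -= 0.1 * dh.sum(0)

decoded = forward(X)[1].reshape(H, W)      # "decoding" = evaluate everywhere
mse = float(((decoded.reshape(-1, 1) - image) ** 2).mean())
```

Storing the 129 weight scalars of this toy costs more than its 16 pixels, which is why the approach only pays off at realistic resolutions and with quantized, entropy-coded weights.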

73 citations


Journal ArticleDOI
01 Aug 2021
TL;DR: Wirelessly networked and powered electronic microchips that can autonomously perform neural sensing and electrical microstimulation and can potentially be scaled to 770 neurograins using a customized time-division multiple access protocol are reported.
Abstract: Multichannel electrophysiological sensors and stimulators—particularly those used to study the nervous system—are usually based on monolithic microelectrode arrays. However, the architecture of such arrays limits flexibility in electrode placement and scaling to a large number of nodes, especially across non-contiguous locations. Here we report wirelessly networked and powered electronic microchips that can autonomously perform neural sensing and electrical microstimulation. The microchips, which we term neurograins, have an ~1 GHz electromagnetic transcutaneous link to an external telecom hub, providing bidirectional communication and control at the individual device level. To illustrate the potential of the approach, we show that 48 neurograins can be individually addressed on a rat cortical surface and used for the acute recording of neural activity. Theoretical calculations and experimental measurements show that the link configuration could potentially be scaled to 770 neurograins using a customized time-division multiple access protocol. Wirelessly powered microchips, which have an ~1 GHz electromagnetic transcutaneous link to an external telecom hub, can be used for multichannel in vivo neural sensing, stimulation and data acquisition.

69 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This article proposed a boundary-aware loss term for semantic segmentation using an inverse-transformation network, which can efficiently learn the degree of parametric transformations between estimated and target boundaries.
Abstract: We present a novel boundary-aware loss term for semantic segmentation using an inverse-transformation network, which efficiently learns the degree of parametric transformations between estimated and target boundaries. This plug-in loss term complements the cross-entropy loss in capturing boundary transformations and allows consistent and significant performance improvement on segmentation backbone models without increasing their size and computational complexity. We analyze the quantitative and qualitative effects of our loss function on three indoor and outdoor segmentation benchmarks, including Cityscapes, NYU-Depth-v2, and PASCAL, integrating it into the training phase of several backbone networks in both single-task and multi-task settings. Our extensive experiments show that the proposed method consistently outperforms baselines, and even sets the new state-of-the-art on two datasets.

60 citations


Journal ArticleDOI
TL;DR: In this article, a pressure-responsive element based on membrane deflection and a battery-free, wireless mode of operation capable of multi-site measurements at strategic locations across the body is introduced.
Abstract: Capabilities for continuous monitoring of pressures and temperatures at critical skin interfaces can help to guide care strategies that minimize the potential for pressure injuries in hospitalized patients or in individuals confined to the bed. This paper introduces a soft, skin-mountable class of sensor system for this purpose. The design includes a pressure-responsive element based on membrane deflection and a battery-free, wireless mode of operation capable of multi-site measurements at strategic locations across the body. Such devices yield continuous, simultaneous readings of pressure and temperature in a sequential readout scheme from a pair of primary antennas mounted under the bedding and connected to a wireless reader and a multiplexer located at the bedside. Experimental evaluation of the sensor and the complete system includes benchtop measurements and numerical simulations of the key features. Clinical trials involving two hemiplegic patients and a tetraplegic patient demonstrate the feasibility, functionality and long-term stability of this technology in operating hospital settings.

56 citations


Journal ArticleDOI
TL;DR: An overview of the technologies for in-loop processing and filtering in the Versatile Video Coding (VVC) standard, which comprise luma mapping with chroma scaling, deblocking filter, sample adaptive offset, adaptive loop filter and cross-component adaptive loop filter.
Abstract: This paper presents an overview of the technologies for in-loop processing and filtering in the Versatile Video Coding (VVC) standard. These processes comprise luma mapping with chroma scaling, deblocking filter, sample adaptive offset, adaptive loop filter and cross-component adaptive loop filter. They are qualified as “in-loop” because they are applied inside the encoding and decoding loops, before storing the pictures in the decoded picture buffer. The filters are complementary and address different purposes. Luma mapping with chroma scaling aims at adaptively modifying the coded samples distribution for improved coding efficiency. The deblocking filter aims at reducing blocking discontinuities. Sample adaptive offset mostly aims at reducing artifacts resulting from the quantization of transform coefficients. Adaptive loop filter and cross-component adaptive loop filter are adaptive filters that can enhance the reconstructed signal, using for instance Wiener-filter encoding approaches. The paper provides an overview of the in-loop filtering process and a detailed description of the filtering algorithms. Objective compression efficiency results are provided for each filter, with indication of cumulative coding gains. Subjective benefits are illustrated. Implementation issues considered during the design of the VVC in-loop filters are also discussed.
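To make the deblocking idea concrete, here is a heavily simplified sketch of the underlying heuristic only: the real VVC filter derives boundary strengths and filter lengths from coding modes and applies clipping, none of which is modeled here, and the threshold and tap weights below are arbitrary choices.

```python
# Heuristic only: smooth across a block boundary when the step is small
# (likely a quantization artifact), leave it intact when it is large
# (likely a real image edge). Threshold and taps are arbitrary, not VVC's.

def deblock_1d(samples, boundary, threshold=8):
    p0, q0 = samples[boundary - 1], samples[boundary]
    if abs(p0 - q0) >= threshold:              # strong step: keep the edge
        return list(samples)
    out = list(samples)
    # Simple low-pass across the boundary (not the standard's exact taps).
    out[boundary - 1] = (samples[boundary - 2] + 2 * p0 + q0 + 2) // 4
    out[boundary] = (p0 + 2 * q0 + samples[boundary + 1] + 2) // 4
    return out

artifact = [50, 50, 50, 50, 54, 54, 54, 54]    # small step: blocking artifact
true_edge = [10, 10, 10, 10, 200, 200, 200, 200]  # large step: real edge
```

Running the filter on both rows shows the intent: the small step is softened while the large one passes through untouched.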

56 citations


Journal ArticleDOI
TL;DR: The intra prediction and mode coding of the Versatile Video Coding (VVC) standard is presented and a bitrate saving of 25% on average is reported over H.265/HEVC using an objective metric.
Abstract: This paper presents the intra prediction and mode coding of the Versatile Video Coding (VVC) standard. This standard was collaboratively developed by the Joint Video Experts Team (JVET). It follows the traditional architecture of a hybrid block-based codec that was also the basis of previous standards. Almost all intra prediction features of VVC either contain substantial modifications in comparison with its predecessor H.265/HEVC or were newly added. The key aspects of these tools are the following: 65 angular intra prediction modes with block shape-adaptive directions and 4-tap interpolation filters are supported as well as the DC and Planar mode, Position Dependent Prediction Combination is applied for most of these modes, Multiple Reference Line Prediction can be used, an intra block can be further subdivided by the Intra Subpartition mode, Matrix-based Intra Prediction is supported, and the chroma prediction signal can be generated by the Cross Component Linear Model method. Finally, the intra prediction mode in VVC is coded separately for luma and chroma. Here, a Most Probable Mode list containing six modes is applied for luma. The individual compression performance of tools is reported in this paper. For the full VVC intra codec, a bitrate saving of 25% on average is reported over H.265/HEVC using an objective metric. Significant subjective benefits are illustrated with specific examples.
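As a toy illustration of intra prediction (not VVC's actual process, which adds 65 angular modes, Planar, PDPC, MRL, ISP, MIP and CCLM on top), the simplest mode, DC, fills a block with the rounded mean of the reconstructed reference samples above and to the left. The reference values below are made-up 8-bit luma samples.

```python
# DC mode only: fill the block with the rounded integer mean of the
# reference samples above (top) and to the left of the block.

def dc_predict(top_refs, left_refs, width, height):
    refs = list(top_refs) + list(left_refs)
    dc = (sum(refs) + len(refs) // 2) // len(refs)   # integer mean, rounded
    return [[dc] * width for _ in range(height)]

pred = dc_predict([100, 102, 98, 104], [96, 100, 98, 102], 4, 4)
```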

55 citations


Journal ArticleDOI
TL;DR: To reduce hardware decoder complexity, virtual pipeline data unit constraints are introduced, which forbid certain multi-type tree splits, and a local dual tree is described that reduces the number of small chroma intra blocks.
Abstract: Versatile Video Coding (VVC) is the latest video coding standard jointly developed by ITU-T VCEG and ISO/IEC MPEG. In this paper, technical details and experimental results for the VVC block partitioning structure are provided. Among all the new technical aspects of VVC, the block partitioning structure is identified as one of the most substantial changes relative to the previous video coding standards and provides the most significant coding gains. The new partitioning structure is designed using a more flexible scheme. Each coding tree unit (CTU) is either treated as one coding unit or split into multiple coding units by one or more recursive quaternary tree partitions followed by one or more recursive multi-type tree splits. The latter can be horizontal binary tree split, vertical binary tree split, horizontal ternary tree split, or vertical ternary tree split. A CTU dual tree for intra-coded slices is described on top of the new block partitioning structure, allowing separate coding trees for luma and chroma. Also, a new way of handling picture boundaries is presented. Additionally, to reduce hardware decoder complexity, virtual pipeline data unit constraints are introduced, which forbid certain multi-type tree splits. Finally, a local dual tree is described, which reduces the number of small chroma intra blocks.
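The split types named above are easy to sketch as coordinate arithmetic. The toy below hard-codes one illustrative tree (a quad split of a CTU, then one vertical binary and one horizontal ternary multi-type split); in a real encoder the split decisions come from rate-distortion search, and the VPDU and local-dual-tree constraints are not modeled.

```python
# Coding units as (x, y, width, height) tuples inside a 128x128 CTU.

def quad_split(x, y, w, h):
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]

def binary_split_v(x, y, w, h):
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]

def ternary_split_h(x, y, w, h):        # 1/4, 1/2, 1/4 ratio as in VVC's TT
    return [(x, y, w, h // 4), (x, y + h // 4, w, h // 2),
            (x, y + 3 * h // 4, w, h // 4)]

q = quad_split(0, 0, 128, 128)
leaves = [q[1], q[2]]                   # two quadrants stay whole CUs
leaves += binary_split_v(*q[0])         # multi-type: vertical binary
leaves += ternary_split_h(*q[3])        # multi-type: horizontal ternary
```

The resulting seven leaf coding units tile the CTU exactly, which is the invariant any legal partitioning must satisfy.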

50 citations


Journal ArticleDOI
TL;DR: The vaping-induced pulmonary disease outbreak spawned increased coverage about the dangers of vaping and internet searches for vaping cessation, and the ratio of observed to expected search volumes during the outbreak era was forecast with historical trends.
Abstract: Background In the latter half of 2019, an outbreak of pulmonary disease in the USA resulted in 2807 hospitalisations and 68 deaths, as of 18 February 2020. Given the severity of the outbreak, we assessed whether articles during the outbreak era more frequently warned about the dangers of vaping and whether internet searches for vaping cessation increased. Methods Using Tobacco Watcher, a media monitoring platform that automatically identifies and categorises news articles from sources across the globe, we obtained all articles that (a) discussed the outbreak and (b) primarily warned about the dangers of vaping. We obtained internet search trends originating from the USA that mentioned ‘quit’ or ‘stop’ and ‘e cig(s),’ ‘ecig(s),’ ‘e-cig(s),’ ‘e cigarette(s),’ ‘e-cigarette(s),’ ‘electronic cigarette(s),’ ‘vape(s),’ ‘vaping’ or ‘vaper(s)’ from Google Trends (eg, ‘how do I quit vaping?’). All data were obtained from 1 January 2014 to 18 February 2020 and ARIMA models were used with historical trends to forecast the ratio of observed to expected search volumes during the outbreak era. Results News of the vaping-induced pulmonary disease outbreak was first reported on 25 July 2019 with 195 articles, culminating in 44 512 articles by 18 February 2020. On average, news articles warning about the dangers of vaping were 130% (95% prediction interval (PI): −15 to 417) and searches for vaping cessation were 76% (95% PI: 28 to 182) higher than expected levels for the days during the period when the sources of the outbreak were unknown (25 July to 27 September 2019). News and searches stabilised just after the US Centers for Disease Control and Prevention reported that a primary source of the outbreak was an additive used in marijuana vapes on 27 September 2019. In sum, there were 12 286 articles archived in Tobacco Watcher primarily warning about the dangers of vaping and 1 025 000 cessation searches following the outbreak. 
Conclusion The vaping-induced pulmonary disease outbreak spawned increased coverage about the dangers of vaping and internet searches for vaping cessation. Resources and strategies that respond to this elevated interest should become a priority among public health leaders.

47 citations


Proceedings ArticleDOI
27 Apr 2021
TL;DR: In this paper, a conditional early exiting framework is proposed to automatically determine the earliest point in processing where an inference is sufficiently reliable and generate on-the-fly supervision signals to the gates to provide a dynamic trade-off between accuracy and computational cost.
Abstract: In this paper, we propose a conditional early exiting framework for efficient video recognition. While existing works focus on selecting a subset of salient frames to reduce the computation costs, we propose to use a simple sampling strategy combined with conditional early exiting to enable efficient recognition. Our model automatically learns to process fewer frames for simpler videos and more frames for complex ones. To achieve this, we employ a cascade of gating modules to automatically determine the earliest point in processing where an inference is sufficiently reliable. We generate on-the-fly supervision signals to the gates to provide a dynamic trade-off between accuracy and computational cost. Our proposed model outperforms competing methods on three large-scale video benchmarks. In particular, on ActivityNet1.3 and mini-kinetics, we outperform the state-of-the-art efficient video recognition methods with 1.3× and 2.1× fewer GFLOPs, respectively. Additionally, our method sets a new state of the art for efficient video understanding on the HVU benchmark.
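A minimal sketch of the exiting mechanism, with a stand-in classifier and a fixed confidence threshold in place of the paper's learned gating modules: frames are consumed one at a time, and processing stops as soon as the running prediction clears the threshold.

```python
# Early exit: average per-class scores over the frames seen so far and
# stop once the best class is confident enough. Classifier, gate, and
# threshold are toys, not the paper's trained components.

def recognize(frames, classify, threshold=0.9):
    """Return (prediction, frames_used)."""
    totals = {}
    for used, frame in enumerate(frames, start=1):
        for cls, score in classify(frame).items():
            totals[cls] = totals.get(cls, 0.0) + score
        avg = {c: s / used for c, s in totals.items()}
        best = max(avg, key=avg.get)
        if avg[best] >= threshold:          # gate: confident enough to exit
            return best, used
    return best, used                       # fell through: used all frames

# Toy "video": an unambiguous clip should exit after the first frame.
toy_classify = lambda f: ({"cat": 0.95, "dog": 0.05} if f == "cat"
                          else {"cat": 0.4, "dog": 0.6})
label, used = recognize(["cat"] * 8, toy_classify)
```

An ambiguous clip (scores never reaching the threshold) instead runs to the end, which is exactly the "fewer frames for simpler videos, more for complex ones" behavior the abstract describes.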

Journal ArticleDOI
TL;DR: This brief presents a review of developments in spin-transfer-torque magnetoresistive random access memory (STT-MRAM) sensing over the past 20 years from a circuit design perspective; key breakthroughs for achieving the optimal reference scheme, read disturbance prevention, read energy reduction, accurate yield estimation, and overcoming other non-idealities are discussed.
Abstract: This brief presents a review of developments in spin-transfer-torque magnetoresistive random access memory (STT-MRAM) sensing over the past 20 years from a circuit design perspective. Various sensing schemes are categorized and described according to the data-cell variation-tolerant characteristics, pre-amplifiers, and offset tolerance. Key breakthroughs for achieving the optimal reference scheme, read disturbance prevention, read energy reduction, accurate yield estimation, and overcoming other non-idealities are discussed. This review is intended to facilitate further enhancement of STT-MRAM sensing in advanced technology nodes, thereby fulfilling STT-MRAM’s potential as a universal memory.

Journal ArticleDOI
TL;DR: The experimental results on VVC reference software show that average 4.5% and 3.6% overall coding gain can be achieved by the VVC transform coding tools for All Intra and Random Access configurations, respectively.
Abstract: In the past decade, the development of transform coding techniques has achieved significant progress and several advanced transform tools have been adopted in the new generation Versatile Video Coding (VVC) standard. In this paper, a brief history of transform coding development during VVC standardization is presented, and the transform coding tools in the VVC standard are described in detail together with their initial design, incremental improvements and implementation aspects. To improve coding efficiency, four new transform coding techniques are introduced in VVC, namely Multiple Transform Selection (MTS), Low-Frequency Non-separable Secondary Transform (LFNST), Sub-Block Transform (SBT), and a large (64-point) type-2 DCT. The experimental results on VVC reference software (VTM-9.0) show that average 4.5% and 3.6% overall coding gain can be achieved by the VVC transform coding tools for All Intra and Random Access configurations, respectively.
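The reason transform tools pay off can be shown with the one transform the abstract names explicitly, the type-2 DCT: on a smooth block it concentrates nearly all signal energy into a few low-frequency coefficients, which then quantize cheaply. The 8-point floating-point example below is generic, not VVC's integer approximation.

```python
import math

# Orthonormal 8-point type-2 DCT applied to a smooth 1-D "row" of samples.

def dct2_matrix(n):
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def transform(matrix, vec):
    return [sum(r[i] * vec[i] for i in range(len(vec))) for r in matrix]

block = [10, 12, 14, 16, 18, 20, 22, 24]        # smooth ramp of samples
coeffs = transform(dct2_matrix(8), block)
energy = sum(c * c for c in coeffs)              # total coefficient energy
low2 = coeffs[0] ** 2 + coeffs[1] ** 2           # energy in first 2 coeffs
```

Because the transform is orthonormal, total energy is preserved, and for this ramp more than 99% of it lands in the first two coefficients.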

Journal ArticleDOI
TL;DR: The paper provides an overview of the quantization and entropy coding methods in the Versatile Video Coding (VVC) standard and discusses motivations and implementation aspects.
Abstract: The paper provides an overview of the quantization and entropy coding methods in the Versatile Video Coding (VVC) standard. Special focus is laid on techniques that improve coding efficiency relative to the methods included in the High Efficiency Video Coding (HEVC) standard: The inclusion of trellis-coded quantization, the advanced context modeling for entropy coding of transform coefficient levels, the arithmetic coding engine with multi-hypothesis probability estimation, and the joint coding of chroma residuals. Besides a description of the design concepts, the paper also discusses motivations and implementation aspects. The effectiveness of the quantization and entropy coding methods specified in VVC is validated by experimental results.
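One of the named techniques, multi-hypothesis probability estimation, can be sketched in a few lines: per context, two exponentially adapting probability estimates with different rates are maintained and their average drives the arithmetic coder. The adaptation rates below are illustrative stand-ins, not VVC's actual context-dependent window sizes.

```python
# Two hypotheses per context: a fast-adapting and a slow-adapting estimate
# of the probability of a '1' bin; the coder uses their average.

def estimate(bits, fast_rate=1 / 8, slow_rate=1 / 64):
    p_fast = p_slow = 0.5
    history = []
    for b in bits:
        history.append((p_fast + p_slow) / 2)   # estimate used for this bin
        p_fast += fast_rate * (b - p_fast)      # reacts quickly, noisy
        p_slow += slow_rate * (b - p_slow)      # reacts slowly, stable
    return history

ests = estimate([1] * 100)                      # a strongly biased context
```

On a biased bin sequence the combined estimate climbs steadily toward the true probability: the fast hypothesis supplies responsiveness, the slow one stability.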

Proceedings ArticleDOI
18 Oct 2021
TL;DR: Uncovering TRR (U-TRR) as discussed by the authors is an experimental methodology to analyze in-DRAM TRR implementations, based on the new observation that data retention failures in DRAM enable a side channel that leaks information on how TRR refreshes potential victim rows.
Abstract: The RowHammer vulnerability in DRAM is a critical threat to system security. To protect against RowHammer, vendors commit to security-through-obscurity: modern DRAM chips rely on undocumented, proprietary, on-die mitigations, commonly known as Target Row Refresh (TRR). At a high level, TRR detects and refreshes potential RowHammer-victim rows, but its exact implementations are not openly disclosed. Security guarantees of TRR mechanisms cannot be easily studied due to their proprietary nature. To assess the security guarantees of recent DRAM chips, we present Uncovering TRR (U-TRR), an experimental methodology to analyze in-DRAM TRR implementations. U-TRR is based on the new observation that data retention failures in DRAM enable a side channel that leaks information on how TRR refreshes potential victim rows. U-TRR allows us to (i) understand how logical DRAM rows are laid out physically in silicon; (ii) study undocumented on-die TRR mechanisms; and (iii) combine (i) and (ii) to evaluate the RowHammer security guarantees of modern DRAM chips. We show how U-TRR allows us to craft RowHammer access patterns that successfully circumvent the TRR mechanisms employed in 45 DRAM modules of the three major DRAM vendors. We find that the DRAM modules we analyze are vulnerable to RowHammer, having bit flips in up to 99.9% of all DRAM rows.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Zhang et al. as discussed by the authors proposed a novel deep face relighting method that predicts the ratio (quotient) image between a source image and the target image with the desired lighting, allowing them to relight the image while maintaining the local facial details.
Abstract: Existing face relighting methods often struggle with two problems: maintaining the local facial details of the subject and accurately removing and synthesizing shadows in the relit image, especially hard shadows. We propose a novel deep face relighting method that addresses both problems. Our method learns to predict the ratio (quotient) image between a source image and the target image with the desired lighting, allowing us to relight the image while maintaining the local facial details. During training, our model also learns to accurately modify shadows by using estimated shadow masks to emphasize the high-contrast shadow borders. Furthermore, we introduce a method to use the shadow mask to estimate the ambient light intensity in an image, and are thus able to leverage multiple datasets during training with different global lighting intensities. With quantitative and qualitative evaluations on the Multi-PIE and FFHQ datasets, we demonstrate that our proposed method faithfully maintains the local facial details of the subject and can accurately handle hard shadows while achieving state-of-the-art face relighting performance.
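The quotient-image idea the abstract builds on can be illustrated with a toy multiplicative model: if an image is roughly albedo times shading, then multiplying the source by the ratio of target to source shading changes the lighting while leaving the albedo detail intact. All values below are invented 2x2 examples, not data from the paper.

```python
# Toy multiplicative image model: image = albedo * shading. Relighting by
# a quotient image multiplies the source by target_shading / source_shading.

def mul(a, b):
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

albedo = [[0.8, 0.4], [0.6, 0.2]]            # local facial detail to preserve
src_shading = [[1.0, 1.0], [0.5, 0.5]]       # e.g. lower half in shadow
tgt_shading = [[0.5, 0.5], [1.0, 1.0]]       # desired lighting

source = mul(albedo, src_shading)            # the observed source image
ratio = [[t / s for t, s in zip(rt, rs)]
         for rt, rs in zip(tgt_shading, src_shading)]
relit = mul(source, ratio)                   # relight via the quotient image
target = mul(albedo, tgt_shading)            # ground-truth relit image
```

In this idealized setting the relit image reproduces the target exactly; the paper's contribution is predicting such a ratio image (and handling hard shadows) with a learned network on real faces.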

Journal ArticleDOI
TL;DR: This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode.
Abstract: In an increasingly connected world, consumer video experiences have diversified away from traditional broadcast video into new applications with increased use of non-camera-captured content such as computer screen desktop recordings or animations created by computer rendering, collectively referred to as screen content. There has also been increased use of graphics and character content that is rendered and mixed or overlaid together with camera-generated content. The emerging Versatile Video Coding (VVC) standard, in its first version, addresses this market change by the specification of low-level coding tools suitable for screen content. This is in contrast to its predecessor, the High Efficiency Video Coding (HEVC) standard, where highly efficient screen content support is only available in extension profiles of its version 4. This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode. The specification of these coding tools in the first version of VVC enables the VVC reference software implementation (VTM) to achieve average bit-rate savings of about 41% to 61% relative to the HEVC test model (HM) reference software implementation using the Main 10 profile for 4:2:0 screen content test sequences. Compared to the HM using the Screen-Extended Main 10 profile and the same 4:2:0 test sequences, the VTM provides about 19% to 25% bit-rate savings. The same comparison with 4:4:4 test sequences revealed bit-rate savings of about 13% to 27% for Y′CBCR and of about 6% to 14% for R′G′B′ screen content. Relative to the HM without the HEVC version 4 screen content coding extensions, the bit-rate savings for 4:4:4 test sequences are about 33% to 64% for Y′CBCR and 43% to 66% for R′G′B′ screen content.
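Of the five tools, the palette mode is the easiest to sketch: screen content often uses few distinct colors, so a block can be sent as a small palette plus per-pixel indices. The toy below omits escape samples, palette prediction, and index run-length coding, all of which the real tool includes.

```python
# Palette-mode sketch: extract the block's distinct colors as a palette,
# then represent each pixel by its palette index.

def palette_encode(block):
    palette = sorted({px for row in block for px in row})
    index = {color: i for i, color in enumerate(palette)}
    return palette, [[index[px] for px in row] for row in block]

def palette_decode(palette, indices):
    return [[palette[i] for i in row] for row in indices]

block = [["red", "red", "blue"],
         ["red", "white", "blue"],
         ["red", "red", "blue"]]
palette, indices = palette_encode(block)
```

Here nine pixels collapse to a three-entry palette and small indices, which is where the savings on rendered text and graphics come from.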

Journal ArticleDOI
TL;DR: This work is the first to holistically consider the key practical constraints of analog beamforming codebooks, a minimal number of radio frequency (RF) chains, limited channel knowledge, beam alignment, and a limited receive dynamic range.
Abstract: Full-duplex millimeter wave (mmWave) communication has shown increasing promise for self-interference cancellation via hybrid precoding and combining. This paper proposes a novel mmWave multiple-input multiple-output (MIMO) design for configuring the analog and digital beamformers of a full-duplex transceiver. This work is the first to holistically consider the key practical constraints of analog beamforming codebooks, a minimal number of radio frequency (RF) chains, limited channel knowledge, beam alignment, and a limited receive dynamic range. To prevent self-interference from saturating receive components, such as LNAs and ADCs, a design framework is developed that limits the degree of self-interference on a per-antenna and per-RF chain basis. We present a means for constructing analog beamforming candidates from beam alignment measurements to afford our design greater flexibility in its aim to reduce self-interference. Numerical results evaluate the design in a variety of settings and validate the need to prevent receiver-side saturation. These results and corresponding insights serve as useful design references and benchmarks for practical full-duplex mmWave transceivers.

Proceedings ArticleDOI
30 Aug 2021
TL;DR: In this paper, a broadcasted residual learning method is proposed to achieve high accuracy with small model size and computational load, which can effectively represent useful audio features with much less computation than conventional convolutional neural networks.
Abstract: Keyword spotting is an important research field because it plays a key role in device wake-up and user interaction on smart devices. However, it is challenging to minimize errors while operating efficiently in devices with limited resources such as mobile phones. We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load. Our method configures most of the residual functions as 1D temporal convolution while still allowing 2D convolution, using a broadcasted-residual connection that expands the temporal output to the frequency-temporal dimension. This residual mapping enables the network to effectively represent useful audio features with much less computation than conventional convolutional neural networks. We also propose a novel network architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted residual learning and describe how to scale up the model according to the target device's resources. BC-ResNets achieve state-of-the-art 98.0% and 98.7% top-1 accuracy on Google speech command datasets v1 and v2, respectively, and consistently outperform previous approaches, using fewer computations and parameters.
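A shape-level sketch of the broadcasted-residual connection, with a fixed 3-tap moving average standing in for a learned 1-D temporal convolution: the frequency axis is collapsed, a cheap temporal operation runs on the resulting 1-D feature, and the output is broadcast back over frequency as a residual.

```python
import numpy as np

# The expensive 2-D work is replaced by a 1-D temporal op whose result is
# broadcast over the frequency axis and added as a residual.

def broadcasted_residual(x):
    temporal = x.mean(axis=0)                            # collapse freq: (time,)
    kernel = np.array([0.25, 0.5, 0.25])                 # stand-in 1-D "conv"
    filtered = np.convolve(temporal, kernel, mode="same")
    return x + filtered[None, :]                         # broadcast over freq

x = np.arange(12, dtype=float).reshape(3, 4)             # (freq, time) features
y = broadcasted_residual(x)
```

The point of the pattern is visible in the shapes: the temporal branch touches a (time,)-sized signal instead of the full (freq, time) map, yet its output still updates every frequency bin.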

Proceedings ArticleDOI
23 Apr 2021
TL;DR: Skip-Convolution as mentioned in this paper proposes skip-convolutions to leverage the large amount of redundancies in video streams and save computations by replacing all convolutions with skip-convolutions in two state-of-the-art architectures.
Abstract: We propose Skip-Convolutions to leverage the large amount of redundancies in video streams and save computations. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the model prediction, e.g. foreground regions, or it can be safely skipped, e.g. background regions. These gates can either be implemented as an efficient network trained jointly with convolution kernels, or can simply skip the residuals based on their magnitude. Gating functions can also incorporate block-wise sparsity structures, as required for efficient implementation on hardware platforms. By replacing all convolutions with Skip-Convolutions in two state-of-the-art architectures, namely EfficientDet and HRNet, we reduce their computational cost consistently by a factor of 3–4× for two different tasks, without any accuracy drop. Extensive comparisons with existing model compression, as well as image and video efficiency methods demonstrate that Skip-Convolutions set a new state-of-the-art by effectively exploiting the temporal redundancies in videos.
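The magnitude-based variant of the gate can be sketched directly. Note the simplification: reusing a cached output near a changed pixel is only approximate (the paper's residual formulation handles this exactly), and the toy 3x3 mean filter and zero threshold are stand-ins for a real convolution layer and a tuned gate.

```python
# Recompute the (stand-in) convolution only where the frame-to-frame
# residual is large; elsewhere reuse the cached output from the previous
# frame. Static regions therefore cost nothing.

def conv3x3_mean(frame, i, j):
    vals = [frame[a][b]
            for a in range(max(0, i - 1), min(len(frame), i + 2))
            for b in range(max(0, j - 1), min(len(frame[0]), j + 2))]
    return sum(vals) / len(vals)

def skip_conv(prev_frame, prev_out, frame, tau):
    out, skipped = [], 0
    for i, row in enumerate(frame):
        out_row = []
        for j, px in enumerate(row):
            if abs(px - prev_frame[i][j]) <= tau:   # gate: residual is small
                out_row.append(prev_out[i][j])      # reuse cached output
                skipped += 1
            else:
                out_row.append(conv3x3_mean(frame, i, j))
        out.append(out_row)
    return out, skipped

frame0 = [[0] * 4 for _ in range(4)]
out0 = [[conv3x3_mean(frame0, i, j) for j in range(4)] for i in range(4)]
frame1 = [row[:] for row in frame0]
frame1[1][1] = 10                                   # a single moving pixel
out1, skipped = skip_conv(frame0, out0, frame1, tau=0)
```

With one changed pixel, 15 of the 16 positions are skipped, mirroring the background/foreground behavior the abstract describes.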

Posted Content
TL;DR: This paper presents ENCO, an efficient structure learning method for directed, acyclic causal graphs leveraging observational and interventional data that can efficiently recover graphs with hundreds of nodes, an order of magnitude larger than what was previously possible, while handling deterministic variables and latent confounders.
Abstract: Learning the structure of a causal graphical model using both observational and interventional data is a fundamental problem in many scientific fields. A promising direction is continuous optimization for score-based methods, which efficiently learn the causal graph in a data-driven manner. However, to date, those methods require constrained optimization to enforce acyclicity or lack convergence guarantees. In this paper, we present ENCO, an efficient structure learning method for directed, acyclic causal graphs leveraging observational and interventional data. ENCO formulates the graph search as an optimization of independent edge likelihoods, with the edge orientation being modeled as a separate parameter. Consequently, we can provide convergence guarantees of ENCO under mild conditions without constraining the score function with respect to acyclicity. In experiments, we show that ENCO can efficiently recover graphs with hundreds of nodes, an order of magnitude larger than what was previously possible, while handling deterministic variables and latent confounders.

Journal ArticleDOI
TL;DR: An overview of the VVC high-level syntax (HLS), which forms its system and transport interface, is given, and comparisons to the HLS designs in High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), the previous major video coding standards, are included.
Abstract: Versatile Video Coding (VVC), a.k.a. ITU-T H.266 | ISO/IEC 23090-3, is the new generation video coding standard that has just been finalized by the Joint Video Experts Team (JVET) of ITU-T VCEG and ISO/IEC MPEG at its 19th meeting ending on July 1, 2020. This paper gives an overview of the VVC high-level syntax (HLS), which forms its system and transport interface. Comparisons to the HLS designs in High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), the previous major video coding standards, are included. When discussing new HLS features introduced into VVC or differences relative to HEVC and AVC, the reasoning behind the design differences and the benefits they bring are described. The HLS of VVC enables newer and more versatile use cases such as video region extraction, composition and merging of content from multiple coded video bitstreams, and viewport-adaptive 360° immersive media.

Journal ArticleDOI
TL;DR: In this paper, a two-element phase-locked loop (PLL)-coupled array for the implementation of millimeter-wave (mm-wave) and sub-terahertz (sub-THz) phased arrays is presented.
Abstract: A new two-element phase-locked loop (PLL)-coupled array for the implementation of millimeter-wave (mm-wave) and sub-terahertz (sub-THz) phased arrays is presented. This architecture avoids using a lossy phase shifter to create the required phase shift between adjacent elements in a phased-array system. The required phase shift is generated by utilizing a dual nested loop PLL. The two PLL loops work together to stabilize the frequency and create the required phase shift. Moreover, the architecture can be scaled simply by adding more unit cells. A 112–121-GHz two-element phased array is designed and fabricated in a standard 65-nm CMOS process. It consumes 147-mW power and provides a phase-shift range of 46.7°, from 58.53° to 105.2°, at 117 GHz.

Journal Article
TL;DR: The learned-threshold pruning (LTP) method makes per-layer thresholds trainable via gradient descent, hence computationally efficient and scalable to deeper networks, and effectively prunes newer compact architectures such as EfficientNet, MobileNetV2 and MixNet.
Abstract: This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they are set as input. Making thresholds trainable also makes LTP computationally efficient, hence scalable to deeper networks. For example, it takes 30 epochs for LTP to prune ResNet50 on ImageNet by a factor of 9.1. This is in contrast to other methods that search for per-layer thresholds via a computationally intensive iterative pruning and fine-tuning process. Additionally, with a novel differentiable L0 regularization, LTP is able to operate effectively on architectures with batch-normalization. This is important since L1 and L2 penalties lose their regularizing effect in networks with batch-normalization. Finally, LTP generates a trail of progressively sparser networks from which the desired pruned network can be picked based on sparsity and performance requirements. These features allow LTP to achieve competitive compression rates on ImageNet networks such as AlexNet (26.4× compression with 79.1% Top-5 accuracy) and ResNet50 (9.1× compression with 92.0% Top-5 accuracy). We also show that LTP effectively prunes modern compact architectures, such as EfficientNet, MobileNetV2 and MixNet.
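The core soft-pruning step can be sketched in a few lines of numpy. This is a hedged illustration only, assuming a sigmoid keep-mask on squared weight magnitudes; `soft_prune`, `tau` and `temp` are our names, and the actual LTP method learns `tau` per layer by backpropagating through such a mask together with its differentiable L0 penalty.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_prune(w, tau, temp=1e-3):
    """Differentiable soft pruning: weights whose squared magnitude falls
    below the threshold tau are smoothly driven toward zero, so gradients
    can flow to both the weights and the threshold."""
    keep = sigmoid((w ** 2 - tau) / temp)   # soft keep-mask in (0, 1)
    return w * keep, keep

def sparsity(keep, cutoff=0.5):
    """Fraction of weights the soft mask would remove at a 0.5 cutoff."""
    return float(np.mean(keep < cutoff))
```

As `temp` shrinks, the mask approaches a hard threshold at `|w| = sqrt(tau)`, which is what makes the threshold itself trainable by gradient descent.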

Journal ArticleDOI
TL;DR: A novel group-sparse Bayesian learning (G-SBL) scheme is conceived for channel estimation that exploits the frequency-domain (FD) correlation of the channel’s frequency response (CFR) while transmitting pilots on only a few subcarriers, thus it has a reduced pilot overhead.
Abstract: Sparse, group-sparse and online channel estimation is conceived for millimeter wave (mmWave) multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. We exploit the angular sparsity of the mmWave channel impulse response (CIR) to achieve improved estimation performance. First, a sparse Bayesian learning (SBL)-based technique is developed for the estimation of each individual subcarrier’s quasi-static channel, which leads to an improved performance versus complexity trade-off in comparison to conventional channel estimation. Then a novel group-sparse Bayesian learning (G-SBL) scheme is conceived for reducing the channel estimation mean square error (MSE). The salient aspect of our G-SBL technique is that it exploits the frequency-domain (FD) correlation of the channel’s frequency response (CFR) while transmitting pilots on only a few subcarriers, and thus has a reduced pilot overhead. A low complexity (LC) version of G-SBL, termed LCG-SBL, is also developed that reduces the computational cost of the G-SBL significantly. Subsequently, an online G-SBL (O-SBL) variant is designed for the estimation of doubly-selective mmWave MIMO OFDM channels, which has low processing delay and exploits temporal correlation as well. This is followed by the design of a hybrid transmit precoder and receive combiner, which can operate directly on the estimated beamspace domain CFRs, together with a limited channel state information (CSI) feedback. Our simulation results confirm the accuracy of the analysis.
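For readers unfamiliar with SBL itself, the basic EM-style update that such schemes build on can be sketched in numpy. This is the textbook single-measurement version with a fixed noise variance, not the paper's method: the G-SBL and O-SBL variants additionally couple the hyperparameters across subcarriers and time, which this sketch does not attempt.

```python
import numpy as np

def sbl(Phi, y, sigma2=1e-2, n_iter=50):
    """Basic sparse Bayesian learning for y = Phi @ x + noise via EM
    updates of per-coefficient prior variances (hyperparameters gamma)."""
    m, n = Phi.shape
    gamma = np.ones(n)                       # per-coefficient prior variances
    for _ in range(n_iter):
        # Posterior covariance and mean of the sparse vector x
        Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(1.0 / gamma))
        mu = Sigma @ Phi.T @ y / sigma2
        # EM hyperparameter update; vanishing gammas prune coefficients
        gamma = np.maximum(mu ** 2 + np.diag(Sigma), 1e-10)  # numerical floor
    return mu, gamma
```

Coefficients whose `gamma` collapses toward zero are effectively pruned from the model, which is how SBL induces sparsity without an explicit L0/L1 penalty.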

Journal ArticleDOI
TL;DR: This work formulates a decentralized partially observable Markov decision process with a novel reward structure that provides long-term proportional fairness in terms of throughput, and incorporates these features into a distributed reinforcement learning framework for contention-based spectrum access.
Abstract: The increasing number of wireless devices operating in unlicensed spectrum motivates the development of intelligent adaptive approaches to spectrum access. We consider decentralized contention-based medium access for base stations (BSs) operating on unlicensed shared spectrum, where each BS autonomously decides whether or not to transmit on a given resource. The contention decision attempts to maximize not its own downlink throughput, but rather a network-wide objective. We formulate this problem as a decentralized partially observable Markov decision process with a novel reward structure that provides long term proportional fairness in terms of throughput. We then introduce a two-stage Markov decision process in each time slot that uses information from spectrum sensing and reception quality to make a medium access decision. Finally, we incorporate these features into a distributed reinforcement learning framework for contention-based spectrum access. Our formulation provides decentralized inference, online adaptability and also caters to partial observability of the environment through recurrent Q-learning. Empirically, we find its maximization of the proportional fairness metric to be competitive with a genie-aided adaptive energy detection threshold, while being robust to channel fading and small contention windows.
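As a small illustration of the reward design, proportional fairness is conventionally expressed as the sum of the logarithms of per-node average throughputs; the abstract does not spell out its exact reward, so this is an assumption, and the function name is ours.

```python
import numpy as np

def proportional_fairness(avg_throughputs):
    """Network-wide proportional-fairness utility: the sum of the logs of
    the per-BS average throughputs. Starving any single BS drives the
    utility toward -inf, so maximizing it trades total throughput
    against fairness across base stations."""
    t = np.asarray(avg_throughputs, dtype=float)
    return float(np.sum(np.log(t)))
```

For a fixed total throughput the utility prefers an even split: two BSs sharing 2 units of throughput equally score log 1 + log 1 = 0, while a 1.9/0.1 split scores about -1.66.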

Journal ArticleDOI
TL;DR: The technical details of each coding tool are presented, the design elements are highlighted with consideration of typical hardware implementations, and the visual quality improvement is demonstrated and analyzed.
Abstract: Efficient representation and coding of fine-granular motion information is one of the key research areas for exploiting inter-frame correlation in video coding. Representative techniques towards this direction are affine motion compensation (AMC), decoder-side motion vector refinement (DMVR), and subblock-based temporal motion vector prediction (SbTMVP). Fine-granular motion information is derived at subblock level for all the three coding tools. In addition, the obtained inter prediction can be further refined by two optical flow-based coding tools, the bi-directional optical flow (BDOF) for bi-directional inter prediction and the prediction refinement with optical flow (PROF) exclusively used in combination with AMC. The aforementioned five coding tools have been extensively studied and finally adopted in the Versatile Video Coding (VVC) standard. This paper presents technical details of each tool and highlights the design elements with the consideration of typical hardware implementations. Following the common test conditions defined by Joint Video Experts Team (JVET) for the development of VVC, 5.7% bitrate reduction on average is achieved by the five tools. For test sequences characterized by large and complex motion, up to 13.4% bitrate reduction is observed. Additionally, visual quality improvement is demonstrated and analyzed.

Proceedings ArticleDOI
Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyu Woong Hwang
06 Jun 2021
TL;DR: SubSpectral Normalization (SSN) splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group, removing the inter-frequency deflection while the network learns a frequency-aware characteristic.
Abstract: Convolutional Neural Networks are widely used in various machine learning domains. In image processing, the features can be obtained by applying 2D convolution to all spatial dimensions of the input. However, in the audio case, frequency domain input like Mel-Spectrogram has different and unique characteristics in the frequency dimension. Thus, there is a need for a method that allows the 2D convolution layer to handle the frequency dimension differently. In this work, we introduce SubSpectral Normalization (SSN), which splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group. SSN also includes an affine transformation that can be applied to each group. Our method removes the inter-frequency deflection while the network learns a frequency-aware characteristic. In the experiments with audio data, we observed that SSN can efficiently improve the network’s performance.
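A minimal numpy sketch of the normalization itself follows, assuming batch-norm-style statistics computed per channel and per sub-band over the batch, frequency and time axes; the axis layout and names are our assumptions, not the authors' code.

```python
import numpy as np

def subspectral_norm(x, groups, gamma=None, beta=None, eps=1e-5):
    """SubSpectral Normalization sketch: split the frequency axis of a
    (batch, channels, freq, time) tensor into sub-band groups and
    normalize each group separately, with an optional per-group affine."""
    b, c, f, t = x.shape
    assert f % groups == 0, "frequency bins must divide evenly into groups"
    xg = x.reshape(b, c, groups, f // groups, t)
    mean = xg.mean(axis=(0, 3, 4), keepdims=True)   # per (channel, group) stats
    var = xg.var(axis=(0, 3, 4), keepdims=True)
    out = (xg - mean) / np.sqrt(var + eps)
    if gamma is not None:                           # affine params: shape (c, groups)
        out = out * gamma[None, :, :, None, None] + beta[None, :, :, None, None]
    return out.reshape(b, c, f, t)
```

With `groups=1` this reduces to ordinary batch normalization over (batch, freq, time); larger `groups` lets each sub-band carry its own statistics and affine transform.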

Proceedings ArticleDOI
19 Sep 2021
TL;DR: PLONQ, a progressive neural image compression scheme which pushes the boundary of variable bitrate compression by allowing quality scalable coding with a single bitstream, is presented and it outperforms SPIHT, a well-known wavelet-based progressive image codec.
Abstract: We present PLONQ, a progressive neural image compression scheme which pushes the boundary of variable bitrate compression by allowing quality scalable coding with a single bitstream. In contrast to existing learned variable bitrate solutions which produce separate bitstreams for each quality, it enables easier rate-control and requires less storage. Leveraging the latent scaling based variable bitrate solution, we introduce nested quantization, a method that defines multiple quantization levels with nested quantization grids, and progressively refines all latents from the coarsest to the finest quantization level. To achieve finer progressiveness in between any two quantization levels, latent elements are incrementally refined with an importance ordering defined in the rate-distortion sense. To the best of our knowledge, PLONQ is the first learning-based progressive image coding scheme and it outperforms SPIHT, a well-known wavelet-based progressive image codec.
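The nesting property is easy to sketch: halving the quantization step at each level yields grids in which every coarse reconstruction point also lies on all finer grids, so a decoder can refine a coarse latent without re-sending it. The following is a schematic numpy illustration under that assumption; the actual PLONQ scheme additionally refines latent elements in a rate-distortion-aware importance order.

```python
import numpy as np

def nested_quantize(latents, base_step=1.0, levels=3):
    """Progressively quantize latents with nested uniform grids: each
    level halves the step size, and every coarse grid point is also a
    fine grid point, so refinements only add bits to the stream."""
    recons = []
    for k in range(levels):
        step = base_step / (2 ** k)
        recons.append(np.round(latents / step) * step)
    return recons
```

Each level's reconstruction error is bounded by half its step size, so distortion shrinks monotonically in the bound as more refinement levels are decoded.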

Journal ArticleDOI
TL;DR: In this paper, a real-time tracking framework for polyp region segmentation in colonoscopic video frames is proposed, where the saliency map is composed of four probability maps generated by incorporating the characteristics associated with the polyps, and the elliptical shape of polyps is used by the particles for final refinement using an active contour (AC) model.
Abstract: In this article, an automatic polyp detection system for endoscopic video frames is proposed. Manual inspection of each frame for polyp localization in colonoscopic video poses many challenges. This work proposes a real-time tracking framework for polyp region segmentation in large volumes of acquired colonoscopic video frames. In our work, the polyp region in the frame is first roughly detected by a saliency map, followed by a modified tracking mechanism for localization. The work suggests the use of a visual saliency map as the measurement model for tracking. The saliency map is composed of four probability maps generated by incorporating the characteristics associated with polyps. The elliptical shape of polyps is used by the particles for final refinement using an active contour (AC) model. The tracking efficiency and the segmentation score achieved using the proposed method suggest that it can be used for polyp detection and localization. The proposed method achieves an average Dice score of 66.06% on the CVC-ClinicDB dataset. Our method can be employed on both online and offline endoscopic video sequences. A GUI based on the proposed method is also designed as an automatic polyp detection system.