Bio: Li Li is an academic researcher from University of Science and Technology of China. The author has contributed to research in topics: Point cloud & Motion compensation. The author has an hindex of 17, co-authored 81 publications receiving 1215 citations. Previous affiliations of Li Li include University of Missouri & University of Missouri–Kansas City.
TL;DR: Experimental results show that the proposed λ-domain rate control can achieve the target bitrates more accurately than the original rate control algorithm in the HEVC reference software as well as obtain significant R-D performance gain.
Abstract: Rate control is a useful tool for video coding, especially in real-time communication applications. Most of existing rate control algorithms are based on the R-Q model, which characterizes the relationship between bitrate R and quantization Q , under the assumption that Q is the critical factor on rate control. However, with the video coding schemes becoming more and more flexible, it is very difficult to accurately model the R-Q relationship. In fact, we find that there exists a more robust correspondence between R and the Lagrange multiplier λ . Therefore, in this paper, we propose a novel λ -domain rate control algorithm based on the R-λ model, and implement it in the newest video coding standard high efficiency video coding (HEVC). Experimental results show that the proposed λ -domain rate control can achieve the target bitrates more accurately than the original rate control algorithm in the HEVC reference software as well as obtain significant R-D performance gain. Thanks to the high accurate rate control algorithm, hierarchical bit allocation can be enabled in the implemented video coding scheme, which can bring additional R-D performance gain. Experimental results demonstrate that the proposed λ -domain rate control algorithm is effective for HEVC, which outperforms the R-Q model based rate control in HM-8.0 (HEVC reference software) by 0.55 dB on average and up to 1.81 dB for low delay coding structure, and 1.08 dB on average and up to 3.77 dB for random access coding structure. The proposed λ -domain rate control algorithm has already been adopted by Joint Collaborative Team on Video Coding and integrated into the HEVC reference software.
••11 Jul 2016
TL;DR: Experimental results show the superior performance of the pseudo-sequence-based scheme, which achieves as high as 6.6 dB gain compared with directly encoding the raw image by the legacy JPEG.
Abstract: We propose a pseudo-sequence-based scheme for light field image compression. In our scheme, the raw image captured by a light field camera is decomposed into multiple views according to the lenslet array of that camera. These views constitute a pseudo sequence like video, and the redundancy between views is exploited by a video encoder. The specific coding order of views, prediction structure, and rate allocation have been investigated for encoding the pseudo sequence. Experimental results show the superior performance of our scheme, which achieves as high as 6.6 dB gain compared with directly encoding the raw image by the legacy JPEG.
TL;DR: A new CNN structure for up-sampling is explored, which features deconvolution of feature maps, multi-scale fusion, and residue learning, making the network both compact and efficient.
Abstract: Inspired by the recent advances of image super-resolution using convolutional neural network (CNN), we propose a CNN-based block up-sampling scheme for intra frame coding. A block can be down-sampled before being compressed by normal intra coding, and then up-sampled to its original resolution. Different from previous studies on down/up-sampling-based coding, the up-sampling methods in our scheme have been designed by training CNN instead of hand-crafted. We explore a new CNN structure for up-sampling, which features deconvolution of feature maps, multi-scale fusion, and residue learning, making the network both compact and efficient. We also design different networks for the up-sampling of luma and chroma components, respectively, where the chroma up-sampling CNN utilizes the luma information to boost its performance. In addition, we design a two-stage up-sampling process, the first stage being within the block-by-block coding loop, and the second stage being performed on the entire frame, so as to refine block boundaries. We also empirically study how to set the coding parameters of down-sampled blocks for pursuing the frame-level rate-distortion optimization. Our proposed scheme is implemented into the high-efficiency video coding (HEVC) reference software, and a comprehensive set of experiments have been performed to evaluate our methods. Experimental results show that our scheme achieves significant bits saving compared with the HEVC anchor, especially at low bit rates, leading to on average 5.5% BD-rate reduction on common test sequences and on average 9.0% BD-rate reduction on ultrahigh definition test sequences.
TL;DR: The requirements of image CR are translated into operable optimization targets for training CNN-CR and the visual quality of the compact resolved image is ensured by constraining its difference from a naively downsampled version and the information loss of imageCR is measured by upsampling/super-resolving the compact-resolved image and comparing that to the original image.
Abstract: We study the dual problem of image super-resolution (SR), which we term image compact-resolution (CR). Opposite to image SR that hallucinates a visually plausible high-resolution image given a low-resolution input, image CR provides a low-resolution version of a high-resolution image, such that the low-resolution version is both visually pleasing and as informative as possible compared to the high-resolution image. We propose a convolutional neural network (CNN) for image CR, namely, CNN-CR, inspired by the great success of CNN for image SR. Specifically, we translate the requirements of image CR into operable optimization targets for training CNN-CR: the visual quality of the compact resolved image is ensured by constraining its difference from a naively downsampled version and the information loss of image CR is measured by upsampling/super-resolving the compact-resolved image and comparing that to the original image. Accordingly, CNN-CR can be trained either separately or jointly with a CNN for image SR. We explore different training strategies as well as different network structures for CNN-CR. Our experimental results show that the proposed CNN-CR clearly outperforms simple bicubic downsampling and achieves on average 2.25 dB improvement in terms of the reconstruction quality on a large collection of natural images. We further investigate two applications of image CR, i.e., low-bit-rate image compression and image retargeting. Experimental results show that the proposed CNN-CR helps achieve significant bits saving than High Efficiency Video Coding when applied to image compression and produce visually pleasing results when applied to image retargeting.
TL;DR: A simplified affine motion model-based coding framework to overcome the limitation of a translational motion model and maintain low-computational complexity is studied.
Abstract: In this paper, we study a simplified affine motion model-based coding framework to overcome the limitation of a translational motion model and maintain low-computational complexity. The proposed framework mainly has three key contributions. First, we propose to reduce the number of affine motion parameters from 6 to 4. The proposed four-parameter affine motion model can not only handle most of the complex motions in natural videos, but also save the bits for two parameters. Second, to efficiently encode the affine motion parameters, we propose two motion prediction modes, i.e., an advanced affine motion vector prediction scheme combined with a gradient-based fast affine motion estimation algorithm and an affine model merge scheme, where the latter attempts to reuse the affine motion parameters (instead of the motion vectors) of neighboring blocks. Third, we propose two fast affine motion compensation algorithms. One is the one-step sub-pixel interpolation that reduces the computations of each interpolation. The other is the interpolation-precision-based adaptive block size motion compensation that performs motion compensation at the block level rather than the pixel level to reduce the number of interpolation. Our proposed techniques have been implemented based on the state-of-the-art high-efficiency video coding standard, and the experimental results show that the proposed techniques altogether achieve, on average, 11.1% and 19.3% bits saving for random access and low-delay configurations, respectively, on typical video sequences that have rich rotation or zooming motions. Meanwhile, the computational complexity increases of both the encoder and the decoder are within an acceptable range.
TL;DR: A comprehensive overview and discussion of research in light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data are presented.
Abstract: Light field imaging has emerged as a technology allowing to capture richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene integrating the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher dimensional representation of visual data offers powerful capabilities for scene understanding, and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, material classification, etc. On the other hand, the high-dimensionality of light fields also brings up new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We focus on all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.
TL;DR: The evolution and development of neural network-based compression methodologies are introduced for images and video respectively and the joint compression on semantic and visual information is tentatively explored to formulate high efficiency signal representation structure for both human vision and machine vision.
Abstract: In recent years, the image and video coding technologies have advanced by leaps and bounds. However, due to the popularization of image and video acquisition devices, the growth rate of image and video data is far beyond the improvement of the compression ratio. In particular, it has been widely recognized that there are increasing challenges of pursuing further coding performance improvement within the traditional hybrid coding framework. Deep convolution neural network which makes the neural network resurge in recent years and has achieved great success in both artificial intelligent and signal processing fields, also provides a novel and promising solution for image and video compression. In this paper, we provide a systematic, comprehensive and up-to-date review of neural network-based image and video compression techniques. The evolution and development of neural network-based compression methodologies are introduced for images and video respectively. More specifically, the cutting-edge video coding techniques by leveraging deep learning and HEVC framework are presented and discussed, which promote the state-of-the-art video coding performance substantially. Moreover, the end-to-end image and video coding frameworks based on neural networks are also reviewed, revealing interesting explorations on next generation image and video coding frameworks/standards. The most significant research works on the image and video coding related topics using neural networks are highlighted, and future trends are also envisioned. In particular, the joint compression on semantic and visual information is tentatively explored to formulate high efficiency signal representation structure for both human vision and machine vision, which are the two dominant signal receptors in the age of artificial intelligence.
TL;DR: This article reviews both datasets and visual attention modelling approaches for 360° video/image, which either utilize the spherical characteristics or visual attention models, and overviews the compression approaches.
Abstract: Nowadays, 360° video/image has been increasingly popular and drawn great attention. The spherical viewing range of 360° video/image accounts for huge data, which pose the challenges to 360° video/image processing in solving the bottleneck of storage, transmission, etc. Accordingly, the recent years have witnessed the explosive emergence of works on 360° video/image processing. In this article, we review the state-of-the-art works on 360° video/image processing from the aspects of perception, assessment and compression. First, this article reviews both datasets and visual attention modelling approaches for 360° video/image. Second, we survey the related works on both subjective and objective visual quality assessment (VQA) of 360° video/image. Third, we overview the compression approaches for 360° video/image, which either utilize the spherical characteristics or visual attention models. Finally, we summarize this overview article and outlook the future research trends on 360° video/image processing.
31 Dec 2020
TL;DR: The authors discusses various works by different researchers on linear regression and polynomial regression and compares their performance using the best approach to optimize prediction and precision, and almost all of the articles analyzed in this review is focused on datasets; in order to determine a model's efficiency, it must be correlated with the actual values obtained for the explanatory variables.
Abstract: Perhaps one of the most common and comprehensive statistical and machine learning algorithms are linear regression. Linear regression is used to find a linear relationship between one or more predictors. The linear regression has two types: simple regression and multiple regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance using the best approach to optimize prediction and precision. Almost all of the articles analyzed in this review is focused on datasets; in order to determine a model's efficiency, it must be correlated with the actual values obtained for the explanatory variables.