Fully Connected Network-Based Intra Prediction for Image Coding

doi:10.1109/TIP.2018.2817044

Journal Article•DOI•

Jiahao Li¹, Bin Li², Jizheng Xu², Ruiqin Xiong¹, Wen Gao¹

Peking University¹, Microsoft²

19 Mar 2018-IEEE Transactions on Image Processing (IEEE)-Vol. 27, Iss: 7, pp 3236-3247

TL;DR: This paper proposes using a fully connected network to learn an end-to-end mapping from neighboring reconstructed pixels to the current block, with the aim of generating better predictions than traditional single line-based methods.

Abstract: This paper proposes a deep learning method for intra prediction. Different from traditional methods utilizing some fixed rules, we propose using a fully connected network to learn an end-to-end mapping from neighboring reconstructed pixels to the current block. In the proposed method, the network is fed by multiple reference lines. Compared with traditional single line-based methods, more contextual information of the current block is utilized. For this reason, the proposed network has the potential to generate better prediction. In addition, the proposed network has good generalization ability on different bitrate settings. The model trained from a specified bitrate setting also works well on other bitrate settings. Experimental results demonstrate the effectiveness of the proposed method. When compared with high efficiency video coding reference software HM-16.9, our network can achieve an average of 3.4% bitrate saving. In particular, the average result of 4K sequences is 4.5% bitrate saving, where the maximum one is 7.4%.
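The mapping described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the L-shaped context layout, the layer sizes, the random (untrained) weights, and the function names `gather_reference`, `dense`, `prelu`, `predict_block`, and `random_layers` are hypothetical and are not taken from the paper. The sketch only shows the general idea of flattening several reference lines into one input vector and letting fully connected layers output every pixel of the block at once.

```python
import random


def gather_reference(image, top, left, block, lines):
    """Collect an L-shaped context around a block (assumed layout, not the paper's).

    Takes `lines` rows above the block (including the above-left corner)
    and `lines` columns to the left of the block, flattened into one list.
    """
    ref = []
    for y in range(top - lines, top):              # rows above the block
        for x in range(left - lines, left + block):
            ref.append(float(image[y][x]))
    for y in range(top, top + block):              # columns left of the block
        for x in range(left - lines, left):
            ref.append(float(image[y][x]))
    return ref


def dense(v, weights, bias):
    """One fully connected layer: out[j] = sum_i weights[j][i] * v[i] + bias[j]."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def prelu(v, a):
    """Element-wise PReLU with a single shared negative slope `a`."""
    return [x if x > 0 else a * x for x in v]


def predict_block(ref, layers, block):
    """Run the reference vector through the layers and reshape to block x block."""
    h = ref
    for i, (weights, bias, slope) in enumerate(layers):
        h = dense(h, weights, bias)
        if i < len(layers) - 1:                    # no activation on the output layer
            h = prelu(h, slope)
    return [h[r * block:(r + 1) * block] for r in range(block)]


def random_layers(sizes, seed=0):
    """Untrained placeholder weights; a real model would be learned from data."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        std = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out, 0.25))
    return layers


# Usage: a 4x4 block at (4, 4) with 2 reference lines on a synthetic image.
image = [[(x + y) % 256 for x in range(16)] for y in range(16)]
ref = gather_reference(image, top=4, left=4, block=4, lines=2)
layers = random_layers([len(ref), 16, 4 * 4])
pred = predict_block(ref, layers, block=4)
```

With a single reference line (`lines=1`) the input shrinks to the pixels immediately adjacent to the block, which is the information available to traditional single line-based prediction; increasing `lines` is what gives the network more context.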

Citations

Journal Article•DOI•

Image and Video Compression With Neural Networks: A Review

Siwei Ma¹, Xinfeng Zhang², Chuanmin Jia¹, Zhenghui Zhao¹, Shiqi Wang³, Shanshe Wang¹

Peking University¹, Chinese Academy of Sciences², City University of Hong Kong³

01 Jun 2020-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: The evolution and development of neural network-based compression methodologies are introduced for images and video respectively and the joint compression on semantic and visual information is tentatively explored to formulate high efficiency signal representation structure for both human vision and machine vision.

Abstract: In recent years, image and video coding technologies have advanced by leaps and bounds. However, due to the popularization of image and video acquisition devices, the growth rate of image and video data is far beyond the improvement of the compression ratio. In particular, it has been widely recognized that there are increasing challenges in pursuing further coding performance improvement within the traditional hybrid coding framework. The deep convolutional neural network, which has driven the resurgence of neural networks in recent years and has achieved great success in both artificial intelligence and signal processing, also provides a novel and promising solution for image and video compression. In this paper, we provide a systematic, comprehensive and up-to-date review of neural network-based image and video compression techniques. The evolution and development of neural network-based compression methodologies are introduced for images and video respectively. More specifically, cutting-edge video coding techniques that leverage deep learning within the HEVC framework are presented and discussed, which promote the state-of-the-art video coding performance substantially. Moreover, end-to-end image and video coding frameworks based on neural networks are also reviewed, revealing interesting explorations on next generation image and video coding frameworks/standards. The most significant research works on image and video coding related topics using neural networks are highlighted, and future trends are also envisioned. In particular, the joint compression of semantic and visual information is tentatively explored to formulate a high efficiency signal representation structure for both human vision and machine vision, which are the two dominant signal receptors in the age of artificial intelligence.

235 citations

Cites background or methods from "Fully Connected Network-Based Intra..."

...The network structure of IPFCN [70]....
[...]
...proposed a new intra prediction mode using fully connected network (IPFCN) [70], which competes with the existing 35 HEVC intra prediction modes....
[...]
...TABLE I THE CODING PERFORMANCE OF IPFCN [70] UNDER COMMON TEST CONDITION WITH FULL LENGTH SEQUENCE....
[...]

Journal Article•DOI•

Nonlinear Transform Coding

Johannes Ballé¹, Philip A. Chou¹, David Minnen¹, Saurabh Singh¹, Nick Johnston¹, Eirikur Agustsson¹, Sung Jin Hwang¹, George Toderici¹

Google¹

01 Feb 2021-IEEE Journal of Selected Topics in Signal Processing

TL;DR: A novel variant of entropy-constrained vector quantization, based on artificial neural networks, as well as learned entropy models, is introduced to assess the empirical rate–distortion performance of nonlinear transform coding methods.

Abstract: We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate–distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate–distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate–distortion trade-off of nonlinear transforms, introducing a simplified one.

123 citations

Journal Article•DOI•

Learning a Convolutional Neural Network for Image Compact-Resolution

Yue Li¹, Dong Liu¹, Houqiang Li¹, Li Li², Zhu Li², Feng Wu¹

University of Science and Technology of China¹, University of Missouri–Kansas City²

01 Mar 2019-IEEE Transactions on Image Processing

TL;DR: The requirements of image CR are translated into operable optimization targets for training CNN-CR: the visual quality of the compact-resolved image is ensured by constraining its difference from a naively downsampled version, and the information loss of image CR is measured by upsampling/super-resolving the compact-resolved image and comparing the result to the original image.

Abstract: We study the dual problem of image super-resolution (SR), which we term image compact-resolution (CR). In contrast to image SR, which hallucinates a visually plausible high-resolution image given a low-resolution input, image CR provides a low-resolution version of a high-resolution image, such that the low-resolution version is both visually pleasing and as informative as possible compared to the high-resolution image. We propose a convolutional neural network (CNN) for image CR, namely, CNN-CR, inspired by the great success of CNN for image SR. Specifically, we translate the requirements of image CR into operable optimization targets for training CNN-CR: the visual quality of the compact-resolved image is ensured by constraining its difference from a naively downsampled version, and the information loss of image CR is measured by upsampling/super-resolving the compact-resolved image and comparing that to the original image. Accordingly, CNN-CR can be trained either separately or jointly with a CNN for image SR. We explore different training strategies as well as different network structures for CNN-CR. Our experimental results show that the proposed CNN-CR clearly outperforms simple bicubic downsampling and achieves on average 2.25 dB improvement in terms of the reconstruction quality on a large collection of natural images. We further investigate two applications of image CR, i.e., low-bit-rate image compression and image retargeting. Experimental results show that the proposed CNN-CR helps achieve significant bit savings over High Efficiency Video Coding when applied to image compression and produces visually pleasing results when applied to image retargeting.

104 citations

Journal Article•DOI•

Deep Learning-Based Video Coding: A Review and a Case Study

Dong Liu¹, Yue Li¹, Jianping Lin¹, Houqiang Li¹, Feng Wu¹

University of Science and Technology of China¹

05 Feb 2020-ACM Computing Surveys

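TL;DR: This paper reviews representative work on deep learning for image/video coding and presents a case study of Deep Learning Video Coding (DLVC), a prototype codec featuring two convolutional neural network (CNN)-based tools: an in-loop filter and block adaptive resolution coding.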

Abstract: The past decade has witnessed the great success of deep learning in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. We review the representative works about using deep learning for image/video coding, an actively developing research area since 2015. We divide the related works into two categories: new coding schemes that are built primarily upon deep networks, and deep network-based coding tools that shall be used within traditional coding schemes. For deep schemes, pixel probability modeling and auto-encoder are the two approaches, that can be viewed as predictive coding and transform coding, respectively. For deep tools, there have been several techniques using deep learning to perform intra-picture prediction, inter-picture prediction, cross-channel prediction, probability distribution prediction, transform, post- or in-loop filtering, down- and up-sampling, as well as encoding optimizations. In the hope of advocating the research of deep learning-based video coding, we present a case study of our developed prototype video codec, Deep Learning Video Coding (DLVC). DLVC features two deep tools that are both based on convolutional neural network (CNN), namely CNN-based in-loop filter and CNN-based block adaptive resolution coding. The source code of DLVC has been released for future research.

69 citations

Journal Article•DOI•

Artificial Intelligence in the Creative Industries: A Review

Nantheera Anantrasirichai¹, David Bull¹

University of Bristol¹

24 Jul 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is concluded that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.

Abstract: This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.

68 citations

References

Book Chapter•DOI•

Learning internal representations by error propagation

David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams

01 Jan 1988

TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.

Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

17,604 citations

"Fully Connected Network-Based Intra..." refers background in this paper

...The fully connected network, also known as multi-layer perceptron [15], is widely used in various tasks....
[...]

Book•

Learning internal representations by error propagation

David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams

03 Jan 1986

TL;DR: This work presents the generalized delta rule for learning internal representations by error propagation, together with simulation results and some further generalizations.

Abstract: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion

13,579 citations

Posted Content•

Caffe: Convolutional Architecture for Fast Feature Embedding

Yangqing Jia¹, Evan Shelhamer², Jeff Donahue², Sergey Karayev², Jonathan Long², Ross Girshick², Sergio Guadarrama², Trevor Darrell²

Google¹, University of California, Berkeley²

20 Jun 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

12,531 citations

Posted Content•

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He¹, Xiangyu Zhang², Shaoqing Ren¹, Jian Sun¹

Microsoft¹, Xi'an Jiaotong University²

06 Feb 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.

Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
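As a small illustration of the activation described in this abstract, the sketch below implements PReLU as f(x) = x for x > 0 and f(x) = a·x otherwise, and checks the quoted relative improvement from the two error rates given above. The function names and the default slope of 0.25 are assumptions made for this example; in a trained network the slope `a` is a learned parameter rather than a fixed constant.

```python
def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, slope `a` for non-positive inputs.

    The default a=0.25 is only an illustrative starting value (assumption);
    setting a=0 gives the ordinary ReLU.
    """
    return x if x > 0 else a * x


def relative_improvement(baseline_error, new_error):
    """Fraction by which `new_error` reduces `baseline_error`."""
    return (baseline_error - new_error) / baseline_error


# Error rates quoted in the abstract: GoogLeNet 6.66%, PReLU-nets 4.94%.
gain = relative_improvement(6.66, 4.94)
print(f"relative improvement: {gain:.1%}")
```

The computed ratio is (6.66 − 4.94) / 6.66 ≈ 0.258, consistent with the roughly 26% relative improvement stated in the abstract.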

11,866 citations

"Fully Connected Network-Based Intra..." refers methods in this paper

...In this paper, we take the parametric rectified linear unit (PReLU) [36] as the non-linear...
[...]

Proceedings Article•DOI•

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He¹, Xiangyu Zhang², Shaoqing Ren¹, Jian Sun¹

Microsoft¹, Xi'an Jiaotong University²

07 Dec 2015

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on ImageNet 2012 classification dataset.

Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.

11,732 citations