
Showing papers by "Xu Jizheng published in 2018"


Journal Article•DOI•
Jiahao Li1, Bin Li2, Xu Jizheng2, Ruiqin Xiong1, Wen Gao1 •
TL;DR: This paper proposes using a fully connected network to learn an end-to-end mapping from neighboring reconstructed pixels to the current block, which can generate better prediction than traditional single-line-based methods.
Abstract: This paper proposes a deep learning method for intra prediction. Different from traditional methods utilizing fixed rules, we propose using a fully connected network to learn an end-to-end mapping from neighboring reconstructed pixels to the current block. In the proposed method, the network is fed by multiple reference lines. Compared with traditional single-line-based methods, more contextual information of the current block is utilized. For this reason, the proposed network has the potential to generate better prediction. In addition, the proposed network has good generalization ability across different bitrate settings: a model trained at one bitrate setting also works well at others. Experimental results demonstrate the effectiveness of the proposed method. Compared with the High Efficiency Video Coding (HEVC) reference software HM-16.9, our network achieves an average bitrate saving of 3.4%. In particular, the average saving on 4K sequences is 4.5%, with a maximum of 7.4%.

143 citations
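The mapping described in the abstract (multiple reference lines in, predicted block out) can be sketched as a plain feed-forward pass. This is an illustrative NumPy toy, not the paper's trained network; the layer sizes, the 8x8 block size, and the 100-pixel context length are arbitrary assumptions.

```python
import numpy as np

def predict_block(context, w1, b1, w2, b2):
    """Map flattened reference-line pixels to an 8x8 predicted block
    via a two-layer fully connected network (hypothetical sketch)."""
    h = np.maximum(0, context @ w1 + b1)   # ReLU hidden layer
    return (h @ w2 + b2).reshape(8, 8)     # predicted 8x8 block

rng = np.random.default_rng(0)
ctx = rng.random(100)                      # flattened multi-line context pixels
w1, b1 = rng.standard_normal((100, 64)), np.zeros(64)
w2, b2 = rng.standard_normal((64, 64)), np.zeros(64)
block = predict_block(ctx, w1, b1, w2, b2)
```

In the actual method the weights would be learned end-to-end from reconstructed-context/block pairs; here they are random, so only the shapes are meaningful.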


Book Chapter•DOI•
08 Sep 2018
TL;DR: This work presents an instance segmentation scheme based on pixel affinity information, i.e., the likelihood that two pixels belong to the same instance; it uses two neural networks with similar structures and a graph merge algorithm to cluster pixels into instances.
Abstract: We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to the same instance. In our scheme, we use two neural networks with similar structures. One predicts the pixel-level semantic score and the other is designed to derive pixel affinities. Regarding pixels as the vertexes and affinities as the edges, we then propose a simple yet effective graph merge algorithm to cluster pixels into instances. Experiments show that our scheme generates fine-grained instance masks. With Cityscapes training data, the proposed scheme achieves 27.3 AP on the test set.

91 citations
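The graph merge step described above (pixels as vertexes, affinities as edges) can be sketched with a union-find pass that merges any pixel pair whose affinity clears a threshold. This is a simplified illustration; the paper's actual merge algorithm, threshold value, and affinity format are not specified here, so the 0.5 threshold and dict-of-pairs input are assumptions.

```python
def merge_instances(n_pixels, affinities, threshold=0.5):
    """Cluster pixels into instances by merging pairs whose predicted
    affinity exceeds a threshold (simplified union-find sketch)."""
    parent = list(range(n_pixels))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), a in affinities.items():
        if a > threshold:
            parent[find(i)] = find(j)      # merge the two clusters
    return [find(i) for i in range(n_pixels)]

# five pixels; strong affinities chain 0-1-2 and 3-4, weak link 2-3 is ignored
labels = merge_instances(5, {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.7, (2, 3): 0.1})
```

Pixels {0, 1, 2} end up in one instance and {3, 4} in another, mirroring how high-affinity edges pull pixels into the same cluster.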


Proceedings Article•
02 Dec 2018
TL;DR: Experimental results demonstrate that the proposed frequency-domain dynamic pruning scheme outperforms previous spatial-domain counterparts by a large margin, achieving a compression ratio of 8.4x and a theoretical inference speed-up of 9.2x for ResNet-110, with accuracy even better than the reference model on CIFAR-10.
Abstract: Deep convolutional neural networks have demonstrated their power in a variety of applications. However, their storage and computational requirements have largely restricted their deployment on mobile devices. Recently, pruning of unimportant parameters has been used for both network compression and acceleration. Considering that there is spatial redundancy within most filters in a CNN, we propose a frequency-domain dynamic pruning scheme to exploit the spatial correlations. The frequency-domain coefficients are pruned dynamically in each iteration, and different frequency bands are pruned discriminatively, given their different importance to accuracy. Experimental results demonstrate that the proposed scheme can outperform previous spatial-domain counterparts by a large margin. Specifically, it can achieve a compression ratio of 8.4x and a theoretical inference speed-up of 9.2x for ResNet-110, while the accuracy is even better than that of the reference model on CIFAR-10.

84 citations
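The idea of pruning filter coefficients in the frequency domain with band-dependent thresholds can be sketched as follows. This is a minimal single-filter illustration using a 2-D DCT and diagonal frequency bands; the paper's dynamic (per-iteration, recoverable) pruning and its actual band grouping and thresholds are not reproduced.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= np.sqrt(1 / n)
    m[1:] *= np.sqrt(2 / n)
    return m

def prune_filter(f, band_thresholds):
    """Prune a square 2-D filter in the frequency domain: transform with
    the DCT, zero coefficients below a per-band threshold, transform back."""
    n = f.shape[0]
    d = dct_matrix(n)
    coef = d @ f @ d.T                               # forward 2-D DCT
    band = np.add.outer(np.arange(n), np.arange(n))  # band index = i + j
    for b, t in enumerate(band_thresholds):
        coef[(band == b) & (np.abs(coef) < t)] = 0.0  # band-wise pruning
    return d.T @ coef @ d                             # inverse 2-D DCT

f = np.array([[1.0, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.05]])
pruned = prune_filter(f, band_thresholds=[0.01, 0.2, 0.2, 0.2, 0.2])
```

Low-frequency bands can be given smaller thresholds than high-frequency bands, reflecting the abstract's point that bands differ in importance to accuracy.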


Posted Content•
TL;DR: This paper presents an instance segmentation scheme based on pixel affinity information, the relationship of two pixels belonging to the same instance, using two neural networks with similar structures.
Abstract: We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to the same instance. In our scheme, we use two neural networks with similar structures. One predicts the pixel-level semantic score and the other is designed to derive pixel affinities. Regarding pixels as the vertexes and affinities as the edges, we then propose a simple yet effective graph merge algorithm to cluster pixels into instances. Experimental results show that our scheme can generate fine-grained instance masks. With Cityscapes training data, the proposed scheme achieves 27.3 AP on the test set.

67 citations


Journal Article•DOI•
TL;DR: This paper proposes an efficient multiple-line-based intra-prediction scheme to improve coding efficiency and designs several fast algorithms to speed up the encoding process.
Abstract: Traditional intra prediction usually utilizes the nearest reference line to generate the predicted block, relying on strong spatial correlation. However, this kind of single-line-based method does not always work well, for at least two reasons. One is incoherence caused by signal noise or by the texture of other objects, where this texture deviates from the inherent texture of the current block. The other is that the nearest reference line usually has worse reconstruction quality in block-based video coding. To address these two issues, this paper proposes an efficient multiple-line-based intra-prediction scheme to improve coding efficiency. Besides the nearest reference line, further reference lines are also utilized. The further reference lines, with relatively higher quality, can provide potentially better prediction. At the same time, residue compensation is introduced to calibrate the prediction of boundary regions in a block when further reference lines are utilized. To speed up the encoding process, this paper designs several fast algorithms. The experimental results show that, compared with HM-16.9, the proposed fast search method achieves a 2.0% bit saving on average and up to 3.7%, with an encoding time increase of 112%.

44 citations


Journal Article•DOI•
TL;DR: This paper presents a variable block-sized signal-dependent transform (SDT) design based on the High Efficiency Video Coding (HEVC) framework, together with a fast algorithm for transform derivation; experimental results show the effectiveness of the SDTs for different block sizes.
Abstract: Transform, as one of the most important modules of mainstream video coding systems, has seemed very stable over the past several decades. However, recent developments indicate that offering more transform options can yield coding efficiency benefits. In this paper, we go further and investigate how coding efficiency can be improved over the state-of-the-art method by adapting a transform to each block. We present a variable block-sized signal-dependent transforms (SDTs) design based on the High Efficiency Video Coding (HEVC) framework. For a coding block ranging from $4\times4$ to $32\times32$, we collect a set of similar blocks from the reconstructed area and use them to derive the Karhunen–Loeve transform. We avoid sending overhead bits to signal the transform by performing the same derivation procedure at the decoder. In this way, the transform for every block is tailored to its statistics, i.e., signal-dependent. To make the large block-sized SDTs feasible, we present a fast algorithm for transform derivation. Experimental results show the effectiveness of the SDTs for different block sizes, leading to up to 23.3% bit saving. On average, we achieve BD-rate savings of 2.2%, 2.4%, 3.3%, and 7.1% under the AI-Main10, RA-Main10, LB-Main10, and LP-Main10 configurations, respectively, compared with the HEVC test model HM-12. The proposed scheme has also been adopted into the joint exploration test model for the exploration of a potential future video coding standard.

29 citations
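The core of the SDT scheme, deriving a Karhunen–Loeve transform from similar blocks collected in the reconstructed area, can be sketched with an eigen-decomposition of the sample covariance. This toy omits the paper's similar-block search and its fast derivation algorithm; the 4x4 block size and the random stand-in "similar blocks" are placeholders.

```python
import numpy as np

def derive_klt(similar_blocks):
    """Derive a KLT (eigenbasis of the sample covariance) from blocks
    collected in the reconstructed area (simplified sketch)."""
    x = np.stack([b.ravel() for b in similar_blocks])  # rows = vectorized blocks
    cov = np.cov(x, rowvar=False)                      # sample covariance
    vals, vecs = np.linalg.eigh(cov)                   # ascending eigenvalues
    return vecs[:, ::-1]                               # columns by decreasing variance

rng = np.random.default_rng(1)
blocks = [rng.random((4, 4)) for _ in range(32)]       # stand-in similar blocks
klt = derive_klt(blocks)                               # 16x16 transform for 4x4 blocks
```

Because the decoder collects the same blocks and runs the same derivation, no overhead bits are needed to signal the transform, which is the key property the abstract describes.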


Posted Content•
Tao Hu1, Honggang Qi, Cong Huang, Qingming Huang, Yan Lu, Xu Jizheng •
TL;DR: The proposed Weakly Supervised Bilinear Attention Network (WS-BAN) achieves state-of-the-art performance on multiple fine-grained classification datasets, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft, demonstrating its effectiveness.
Abstract: For fine-grained visual classification, objects usually share a similar geometric structure but present varied local appearance and pose. Therefore, localizing and extracting discriminative local features play a crucial role in accurate category prediction. Existing works either pay attention to limited object parts or train isolated networks for localization and classification. In this paper, we propose the Weakly Supervised Bilinear Attention Network (WS-BAN) to address these issues. It jointly generates a set of attention maps (region-of-interest maps) to indicate the locations of an object's parts and extracts sequential part features by Bilinear Attention Pooling (BAP). Besides, we propose attention regularization and attention dropout to weakly supervise the generation of attention maps. WS-BAN can be trained end-to-end and achieves state-of-the-art performance on multiple fine-grained classification datasets, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft, demonstrating its effectiveness.

13 citations
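Bilinear Attention Pooling as described (attention maps selecting part features from feature maps) reduces to a matrix product between flattened attention maps and flattened feature maps. A minimal NumPy sketch with assumed tensor shapes (C feature channels, M attention maps, HxW spatial grid):

```python
import numpy as np

def bilinear_attention_pool(features, attention_maps):
    """Bilinear Attention Pooling sketch: pool the feature maps under each
    attention map, yielding one feature vector per attended object part."""
    c, h, w = features.shape            # features: (C, H, W)
    m = attention_maps.shape[0]         # attention_maps: (M, H, W)
    f = features.reshape(c, h * w)
    a = attention_maps.reshape(m, h * w)
    return a @ f.T / (h * w)            # (M, C): one pooled vector per map

rng = np.random.default_rng(2)
parts = bilinear_attention_pool(rng.random((64, 8, 8)), rng.random((4, 8, 8)))
```

Stacking the M pooled vectors gives the "sequential part features" the abstract mentions; the paper's attention regularization and dropout act on the attention maps before this pooling step and are not shown here.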


Proceedings Article•DOI•
27 Mar 2018
TL;DR: The intra block copy tool in AV1 uses a hash matching method at the encoder side to efficiently search for the predictor in the reconstructed regions of the current picture, achieving a 12.2% bitrate saving over the non-hash-based search.
Abstract: Screen content coding plays an important role in many applications. To meet the growing demands of screen content coding, the emerging AV1 video codec incorporates several coding tools specially designed for screen content, exploiting its distinctive characteristics. Among these tools, intra block copy exploits the characteristic that repeating patterns frequently occur in screen content. This paper presents the intra block copy technology in AV1. In particular, to efficiently search for the predictor in the reconstructed regions of the current picture, AV1 uses a hash matching method at the encoder side. For the generation of the hash table, a bottom-up manner is adopted to reduce redundant computation and thus decrease the encoding time. In addition, several constraints are imposed to facilitate hardware design. Experimental results demonstrate that the intra block copy in AV1 brings 27.1% bitrate saving for screen content. Compared with the non-hash-based intra block copy, the hash-based method achieves a 12.2% bitrate saving.

12 citations
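Hash-based predictor search as described can be sketched as: hash every block in the reconstructed area into a table, then look up the current block's hash to find candidate predictors. The toy hash below is a stand-in (AV1's actual block hashing and its bottom-up table construction are not reproduced), and the positions and pixel lists are illustrative.

```python
def block_hash(pixels):
    """Cheap rolling hash of a block's pixel values (stand-in for the
    block hashes an encoder would actually use; illustrative only)."""
    h = 0
    for p in pixels:
        h = (h * 31 + p) & 0xFFFFFFFF
    return h

def build_hash_table(blocks):
    """Map hash -> candidate block positions in the reconstructed area."""
    table = {}
    for pos, pixels in blocks.items():
        table.setdefault(block_hash(pixels), []).append(pos)
    return table

def find_predictor(table, pixels):
    """Return candidate positions whose blocks hash-match the current block."""
    return table.get(block_hash(pixels), [])

# screen content has repeating patterns: the same block appears at (0,0) and (8,0)
table = build_hash_table({(0, 0): [1, 2, 3, 4], (8, 0): [1, 2, 3, 4], (0, 8): [9, 9, 9, 9]})
matches = find_predictor(table, [1, 2, 3, 4])
```

A real encoder would verify candidates pixel-by-pixel after the hash match, since distinct blocks can collide on a short hash.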


Posted Content•
TL;DR: In this paper, a self-iterative regressor is trained to learn the descent directions for samples from coarse stages to fine stages, and parameters are iteratively updated by the same regressor.
Abstract: Cascaded Regression (CR) based methods have been proposed to solve the facial landmark detection problem; they learn a series of descent directions through multiple cascaded regressors separately trained in coarse and fine stages. They outperform traditional gradient descent based methods in both accuracy and running speed. However, cascaded regression is not robust enough because each regressor's training data comes from the output of the previous regressor. Moreover, training multiple regressors requires a lot of computing resources, especially for deep learning based methods. In this paper, we develop a Self-Iterative Regression (SIR) framework to improve model efficiency. Only one self-iterative regressor is trained to learn the descent directions for samples from coarse stages to fine stages, and parameters are iteratively updated by the same regressor. Specifically, we propose a Landmarks-Attention Network (LAN) as our regressor, which concurrently learns features around each landmark and obtains the holistic location increment. By doing so, not only are the remaining regressors removed, simplifying the training process, but the number of model parameters is also significantly decreased. The experiments demonstrate that with only 3.72M model parameters, our proposed method achieves state-of-the-art performance.

8 citations
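The self-iterative idea, one regressor repeatedly applied to its own output instead of a cascade of separately trained regressors, can be sketched in a few lines. The toy "regressor" below just moves the estimate halfway toward a fixed target; the real LAN regressor predicts landmark location increments from image features.

```python
def self_iterative_regression(x0, regressor, steps=4):
    """Self-Iterative Regression sketch: one regressor predicts a location
    increment and is applied repeatedly to its own output."""
    x = x0
    for _ in range(steps):
        x = x + regressor(x)   # same regressor at every stage
    return x

# toy regressor pulling the estimate halfway toward a fixed target (10.0)
target = 10.0
estimate = self_iterative_regression(0.0, lambda x: 0.5 * (target - x))
# 0 -> 5 -> 7.5 -> 8.75 -> 9.375 after four iterations
```

In a cascade, each of the four steps would be a separately trained regressor; here one function serves all stages, which is the parameter saving the abstract highlights.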


Patent•
Liu Hongbin, Li Zhang, Zhang Kai, Xu Jizheng, Wang Yue 
22 May 2018
TL;DR: In this paper, the authors propose a method that divides a high-resolution video into first low-resolution sub-videos for a plurality of display areas, and then encodes the first low-resolution sub-videos to obtain sub-video code streams.
Abstract: The embodiment of the invention provides a video processing method and apparatus, relating to the technical field of video processing. The method comprises the following steps: dividing a high-resolution video into first low-resolution sub-videos of a plurality of display areas; encoding the first low-resolution sub-videos to obtain sub-video code streams; dividing each sub-video code stream into a plurality of code stream units; sorting the plurality of code stream units of the sub-video code streams to obtain a low-resolution target video stream; and sending the target video stream to a video playing device for decoding and playing. With this video processing method and apparatus, the high-resolution video can be divided into a plurality of low-resolution sub-videos to reduce the video resolution; the plurality of low-resolution sub-videos are encoded and arranged to form the low-resolution target video stream, and the low-resolution target video stream is sent to the video playing device for decoding and playing, so as to improve transmission efficiency. In this way, the video playing device can decode and play the high-resolution video through a general-purpose video decoder.

5 citations


Journal Article•DOI•
TL;DR: The concept of diversity is introduced for the reference picture set (RPS) to help formulate the RPM problem, and the bit saving of the proposed scheme is 4.9% on average and up to 13.7%, without increasing encoding time.
Abstract: Screen content coding plays an important role in many applications. Conventional reference picture management (RPM) strategies developed for natural content may not work well for screen content. This is because many regions in screen content remain static for a long time, causing a lot of repetitive contents to stay in the decoded picture buffer. The repetitive contents are not conducive to inter prediction, but still occupy valuable memory. This paper proposes a diversity-based RPM scheme for screen content coding. The concept of diversity is introduced for the reference picture set (RPS) to help formulate the RPM problem. By maximizing the diversity of RPS, more potentially better predictions are provided. Better compression performance can then be achieved. Meanwhile, the proposed scheme is nonnormative and compatible with existing video coding standards, such as High Efficiency Video Coding. The experimental results show that, for low delay screen content coding, the bit saving of the proposed scheme is 4.9% on average and up to 13.7%, without increasing encoding time.
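The diversity-maximizing selection described above can be sketched as a greedy farthest-point choice over candidate reference pictures. The scalar "content" distance and the seeding rule are illustrative assumptions; the paper's actual diversity measure for reference picture sets is not reproduced.

```python
def select_reference_set(candidates, distance, k):
    """Greedy sketch of diversity-based reference picture selection: keep
    adding the candidate farthest (most diverse) from the set chosen so far."""
    chosen = [candidates[0]]          # seed with the most recent picture
    rest = list(candidates[1:])
    while rest and len(chosen) < k:
        best = max(rest, key=lambda c: min(distance(c, s) for s in chosen))
        chosen.append(best)
        rest.remove(best)
    return chosen

# toy pictures represented by scalar "content" values; 0,1,2 are near-duplicates
pics = [0, 1, 2, 10, 11]
rps = select_reference_set(pics, lambda a, b: abs(a - b), k=2)
```

The near-duplicate candidates (1 and 2) are skipped in favor of the dissimilar picture 11, mirroring how the scheme avoids filling the decoded picture buffer with repetitive static content.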

Patent•
Bin Li1, Xu Jizheng, Yan Lu•
27 Dec 2018
TL;DR: In this paper, a method and apparatus for real-time screen sharing are provided: when a certain condition is satisfied between an image encoded by a first device and an image decoded by a second device, the first device pauses processing images, and if the pause lasts a certain length of time, a parameter associated with the image compression ratio is adjusted.
Abstract: In implementations of the subject matter described herein, there are provided a method and apparatus for real-time screen sharing. During screen sharing between two devices, if a certain condition is satisfied between an image encoded by the first device and an image decoded by the second device, the first device pauses processing images. If the pause lasts a certain length of time, a parameter associated with the image compression ratio is adjusted. After the first device resumes image processing, the adjusted parameter is used to encode a new image captured on the first device. According to implementations of the subject matter described herein, the transmission code rate during screen sharing can be controlled according to the duration of the pause in image processing. The implementations of the subject matter described herein can reduce the transmission latency of screen sharing, thereby effectively ensuring the user experience.

Patent•
23 Mar 2018
TL;DR: This patent applies per-component color-space adjustment coefficients to quantization parameters when switching from RGB to YCoCg, to increase coding efficiency when switching between color spaces during encoding and decoding.
Abstract: FIELD: data processing. SUBSTANCE: The invention relates to adaptive coding and decoding. Encoding of image elements includes a quantization correction or scaling for the color components of the second color space, in accordance with per-component color-space adjustment coefficients, when the color space switches from RGB to YCoCg between two of the elements. The adjustment includes adjusting the final or intermediate quantization parameter (QP) values for the color components of the YCoCg color space, with the QpY variable indicating the intermediate QP value for the first color component of the RGB color space; the per-component color-space correction factors adjust QpY by -5, -3, and -5 for the Y, Co, and Cg components, respectively. EFFECT: the technical result is increased coding efficiency when switching between color spaces during encoding and decoding. 18 cl, 17 dwg
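The per-component QP offsets stated in the abstract (-5 for Y, -3 for Co, -5 for Cg) can be expressed as a trivial adjustment function. The function name and interface are hypothetical; only the offset values come from the text.

```python
def adjust_qp_for_ycocg(qp_y):
    """Per-component QP adjustment sketch for an RGB -> YCoCg switch,
    using the offsets stated in the abstract (-5, -3, -5 for Y, Co, Cg)."""
    offsets = {"Y": -5, "Co": -3, "Cg": -5}
    return {comp: qp_y + off for comp, off in offsets.items()}

qps = adjust_qp_for_ycocg(32)   # {'Y': 27, 'Co': 29, 'Cg': 27}
```

The offsets compensate for the dynamic-range change of the YCoCg components relative to RGB, so that quantization error stays comparable after the color-space switch.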