
Showing papers by "Xu Jizheng published in 2018"


Journal Article•DOI•
Jiahao Li1, Bin Li2, Xu Jizheng2, Ruiqin Xiong1, Wen Gao1 •
TL;DR: This paper proposes using a fully connected network to learn an end-to-end mapping from neighboring reconstructed pixels to the current block, which can generate better prediction than traditional single-line-based methods.
Abstract: This paper proposes a deep learning method for intra prediction. Different from traditional methods utilizing fixed rules, we propose using a fully connected network to learn an end-to-end mapping from neighboring reconstructed pixels to the current block. In the proposed method, the network is fed by multiple reference lines. Compared with traditional single-line-based methods, more contextual information of the current block is utilized. For this reason, the proposed network has the potential to generate better prediction. In addition, the proposed network has good generalization ability across different bitrate settings: a model trained at one bitrate setting also works well at others. Experimental results demonstrate the effectiveness of the proposed method. Compared with the High Efficiency Video Coding (HEVC) reference software HM-16.9, our network achieves an average bitrate saving of 3.4%. In particular, the average saving on 4K sequences is 4.5%, with a maximum of 7.4%.

143 citations
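The mapping described in the abstract (multiple reference lines in, predicted block out) can be sketched as a plain feed-forward pass. This is an illustrative NumPy toy, not the paper's trained network; the layer sizes, the 8x8 block size, and the 100-pixel context length are arbitrary assumptions.

```python
import numpy as np

def predict_block(context, w1, b1, w2, b2):
    """Map flattened reference-line pixels to an 8x8 predicted block
    via a two-layer fully connected network (hypothetical sketch)."""
    h = np.maximum(0, context @ w1 + b1)   # ReLU hidden layer
    return (h @ w2 + b2).reshape(8, 8)     # predicted 8x8 block

rng = np.random.default_rng(0)
ctx = rng.random(100)                      # flattened multi-line context pixels
w1, b1 = rng.standard_normal((100, 64)), np.zeros(64)
w2, b2 = rng.standard_normal((64, 64)), np.zeros(64)
block = predict_block(ctx, w1, b1, w2, b2)
```

In the actual method the weights would be learned end-to-end from reconstructed-context/block pairs; here they are random, so only the shapes are meaningful.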


Book Chapter•DOI•
08 Sep 2018
TL;DR: This work presents an instance segmentation scheme based on pixel affinity information, i.e., the likelihood that two pixels belong to the same instance; it uses two neural networks with similar structures and a graph merge algorithm to cluster pixels into instances.
Abstract: We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to the same instance. In our scheme, we use two neural networks with similar structures. One predicts the pixel-level semantic score and the other is designed to derive pixel affinities. Regarding pixels as the vertexes and affinities as the edges, we then propose a simple yet effective graph merge algorithm to cluster pixels into instances. Experiments show that our scheme generates fine-grained instance masks. With Cityscapes training data, the proposed scheme achieves 27.3 AP on the test set.

91 citations
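The graph merge step described above (pixels as vertexes, affinities as edges) can be sketched with a union-find pass that merges any pixel pair whose affinity clears a threshold. This is a simplified illustration; the paper's actual merge algorithm, threshold value, and affinity format are not specified here, so the 0.5 threshold and dict-of-pairs input are assumptions.

```python
def merge_instances(n_pixels, affinities, threshold=0.5):
    """Cluster pixels into instances by merging pairs whose predicted
    affinity exceeds a threshold (simplified union-find sketch)."""
    parent = list(range(n_pixels))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), a in affinities.items():
        if a > threshold:
            parent[find(i)] = find(j)      # merge the two clusters
    return [find(i) for i in range(n_pixels)]

# five pixels; strong affinities chain 0-1-2 and 3-4, weak link 2-3 is ignored
labels = merge_instances(5, {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.7, (2, 3): 0.1})
```

Pixels {0, 1, 2} end up in one instance and {3, 4} in another, mirroring how high-affinity edges pull pixels into the same cluster.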


Proceedings Article•
02 Dec 2018
TL;DR: Experimental results demonstrate that the proposed frequency-domain dynamic pruning scheme outperforms previous spatial-domain counterparts by a large margin, achieving a compression ratio of 8.4x and a theoretical inference speed-up of 9.2x for ResNet-110, with accuracy even better than the reference model on CIFAR-10.
Abstract: Deep convolutional neural networks have demonstrated their power in a variety of applications. However, their storage and computational requirements have largely restricted their deployment on mobile devices. Recently, pruning of unimportant parameters has been used for both network compression and acceleration. Considering that there is spatial redundancy within most filters in a CNN, we propose a frequency-domain dynamic pruning scheme to exploit the spatial correlations. The frequency-domain coefficients are pruned dynamically in each iteration, and different frequency bands are pruned discriminatively, given their different importance to accuracy. Experimental results demonstrate that the proposed scheme can outperform previous spatial-domain counterparts by a large margin. Specifically, it can achieve a compression ratio of 8.4x and a theoretical inference speed-up of 9.2x for ResNet-110, while the accuracy is even better than that of the reference model on CIFAR-10.

84 citations
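The idea of pruning filter coefficients in the frequency domain with band-dependent thresholds can be sketched as follows. This is a minimal single-filter illustration using a 2-D DCT and diagonal frequency bands; the paper's dynamic (per-iteration, recoverable) pruning and its actual band grouping and thresholds are not reproduced.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= np.sqrt(1 / n)
    m[1:] *= np.sqrt(2 / n)
    return m

def prune_filter(f, band_thresholds):
    """Prune a square 2-D filter in the frequency domain: transform with
    the DCT, zero coefficients below a per-band threshold, transform back."""
    n = f.shape[0]
    d = dct_matrix(n)
    coef = d @ f @ d.T                               # forward 2-D DCT
    band = np.add.outer(np.arange(n), np.arange(n))  # band index = i + j
    for b, t in enumerate(band_thresholds):
        coef[(band == b) & (np.abs(coef) < t)] = 0.0  # band-wise pruning
    return d.T @ coef @ d                             # inverse 2-D DCT

f = np.array([[1.0, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.05]])
pruned = prune_filter(f, band_thresholds=[0.01, 0.2, 0.2, 0.2, 0.2])
```

Low-frequency bands can be given smaller thresholds than high-frequency bands, reflecting the abstract's point that bands differ in importance to accuracy.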


Posted Content•
TL;DR: This paper presents an instance segmentation scheme based on pixel affinity information, the relationship of two pixels belonging to the same instance, using two neural networks with similar structures.
Abstract: We present an instance segmentation scheme based on pixel affinity information, which is the relationship of two pixels belonging to the same instance. In our scheme, we use two neural networks with similar structures. One predicts the pixel-level semantic score and the other is designed to derive pixel affinities. Regarding pixels as the vertexes and affinities as the edges, we then propose a simple yet effective graph merge algorithm to cluster pixels into instances. Experimental results show that our scheme can generate fine-grained instance masks. With Cityscapes training data, the proposed scheme achieves 27.3 AP on the test set.

67 citations


Journal Article•DOI•
TL;DR: This paper proposes an efficient multiple-line-based intra-prediction scheme to improve coding efficiency and designs several fast algorithms to speed up the encoding process.
Abstract: Traditional intra prediction usually utilizes the nearest reference line to generate the predicted block, relying on strong spatial correlation. However, this kind of single-line-based method does not always work well, for at least two reasons. One is incoherence caused by signal noise or by the texture of other objects, where this texture deviates from the inherent texture of the current block. The other is that the nearest reference line usually has worse reconstruction quality in block-based video coding. To address these two issues, this paper proposes an efficient multiple-line-based intra-prediction scheme to improve coding efficiency. Besides the nearest reference line, further reference lines are also utilized. The further reference lines, with relatively higher quality, can provide potentially better prediction. At the same time, residue compensation is introduced to calibrate the prediction of boundary regions in a block when further reference lines are utilized. To speed up the encoding process, this paper designs several fast algorithms. The experimental results show that, compared with HM-16.9, the proposed fast search method achieves a 2.0% bit saving on average and up to 3.7%, with an encoding time increase of 112%.

44 citations


Journal Article•DOI•
TL;DR: This paper presents a variable block-sized signal-dependent transform (SDT) design based on the High Efficiency Video Coding (HEVC) framework, together with a fast algorithm for transform derivation; experimental results show the effectiveness of the SDTs for different block sizes.
Abstract: Transform, as one of the most important modules of mainstream video coding systems, has seemed very stable over the past several decades. However, recent developments indicate that offering more transform options can yield coding efficiency benefits. In this paper, we go further and investigate how coding efficiency can be improved over the state-of-the-art method by adapting a transform to each block. We present a variable block-sized signal-dependent transforms (SDTs) design based on the High Efficiency Video Coding (HEVC) framework. For a coding block ranging from $4\times4$ to $32\times32$, we collect a set of similar blocks from the reconstructed area and use them to derive the Karhunen–Loeve transform. We avoid sending overhead bits to signal the transform by performing the same derivation procedure at the decoder. In this way, the transform for every block is tailored to its statistics, i.e., signal-dependent. To make the large block-sized SDTs feasible, we present a fast algorithm for transform derivation. Experimental results show the effectiveness of the SDTs for different block sizes, leading to up to 23.3% bit saving. On average, we achieve BD-rate savings of 2.2%, 2.4%, 3.3%, and 7.1% under the AI-Main10, RA-Main10, LB-Main10, and LP-Main10 configurations, respectively, compared with the HEVC test model HM-12. The proposed scheme has also been adopted into the joint exploration test model for the exploration of a potential future video coding standard.

29 citations
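The core of the SDT scheme, deriving a Karhunen–Loeve transform from similar blocks collected in the reconstructed area, can be sketched with an eigen-decomposition of the sample covariance. This toy omits the paper's similar-block search and its fast derivation algorithm; the 4x4 block size and the random stand-in "similar blocks" are placeholders.

```python
import numpy as np

def derive_klt(similar_blocks):
    """Derive a KLT (eigenbasis of the sample covariance) from blocks
    collected in the reconstructed area (simplified sketch)."""
    x = np.stack([b.ravel() for b in similar_blocks])  # rows = vectorized blocks
    cov = np.cov(x, rowvar=False)                      # sample covariance
    vals, vecs = np.linalg.eigh(cov)                   # ascending eigenvalues
    return vecs[:, ::-1]                               # columns by decreasing variance

rng = np.random.default_rng(1)
blocks = [rng.random((4, 4)) for _ in range(32)]       # stand-in similar blocks
klt = derive_klt(blocks)                               # 16x16 transform for 4x4 blocks
```

Because the decoder collects the same blocks and runs the same derivation, no overhead bits are needed to signal the transform, which is the key property the abstract describes.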


Posted Content•
Tao Hu1, Honggang Qi, Cong Huang, Qingming Huang, Yan Lu, Xu Jizheng •
TL;DR: The proposed Weakly Supervised Bilinear Attention Network (WS-BAN) achieves state-of-the-art performance on multiple fine-grained classification datasets, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft, demonstrating its effectiveness.
Abstract: For fine-grained visual classification, objects usually share a similar geometric structure but present varied local appearance and pose. Therefore, localizing and extracting discriminative local features play a crucial role in accurate category prediction. Existing works either pay attention to limited object parts or train isolated networks for localization and classification. In this paper, we propose the Weakly Supervised Bilinear Attention Network (WS-BAN) to address these issues. It jointly generates a set of attention maps (region-of-interest maps) to indicate the locations of an object's parts and extracts sequential part features by Bilinear Attention Pooling (BAP). Besides, we propose attention regularization and attention dropout to weakly supervise the generation of attention maps. WS-BAN can be trained end-to-end and achieves state-of-the-art performance on multiple fine-grained classification datasets, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft, demonstrating its effectiveness.

13 citations
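Bilinear Attention Pooling as described (attention maps selecting part features from feature maps) reduces to a matrix product between flattened attention maps and flattened feature maps. A minimal NumPy sketch with assumed tensor shapes (C feature channels, M attention maps, HxW spatial grid):

```python
import numpy as np

def bilinear_attention_pool(features, attention_maps):
    """Bilinear Attention Pooling sketch: pool the feature maps under each
    attention map, yielding one feature vector per attended object part."""
    c, h, w = features.shape            # features: (C, H, W)
    m = attention_maps.shape[0]         # attention_maps: (M, H, W)
    f = features.reshape(c, h * w)
    a = attention_maps.reshape(m, h * w)
    return a @ f.T / (h * w)            # (M, C): one pooled vector per map

rng = np.random.default_rng(2)
parts = bilinear_attention_pool(rng.random((64, 8, 8)), rng.random((4, 8, 8)))
```

Stacking the M pooled vectors gives the "sequential part features" the abstract mentions; the paper's attention regularization and dropout act on the attention maps before this pooling step and are not shown here.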


Proceedings Article•DOI•
27 Mar 2018
TL;DR: The intra block copy tool in AV1 uses a hash matching method at the encoder side to efficiently search for the predictor in the reconstructed regions of the current picture, achieving a 12.2% bitrate saving over the non-hash-based search.
Abstract: Screen content coding plays an important role in many applications. To meet the growing demands of screen content coding, the emerging AV1 video codec incorporates several coding tools specially designed for screen content, exploiting its distinctive characteristics. Among these tools, intra block copy exploits the characteristic that repeating patterns frequently occur in screen content. This paper presents the intra block copy technology in AV1. In particular, to efficiently search for the predictor in the reconstructed regions of the current picture, AV1 uses a hash matching method at the encoder side. For the generation of the hash table, a bottom-up manner is adopted to reduce redundant computation and thus decrease the encoding time. In addition, several constraints are imposed to facilitate hardware design. Experimental results demonstrate that the intra block copy in AV1 brings 27.1% bitrate saving for screen content. Compared with the non-hash-based intra block copy, the hash-based method achieves a 12.2% bitrate saving.

12 citations
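Hash-based predictor search as described can be sketched as: hash every block in the reconstructed area into a table, then look up the current block's hash to find candidate predictors. The toy hash below is a stand-in (AV1's actual block hashing and its bottom-up table construction are not reproduced), and the positions and pixel lists are illustrative.

```python
def block_hash(pixels):
    """Cheap rolling hash of a block's pixel values (stand-in for the
    block hashes an encoder would actually use; illustrative only)."""
    h = 0
    for p in pixels:
        h = (h * 31 + p) & 0xFFFFFFFF
    return h

def build_hash_table(blocks):
    """Map hash -> candidate block positions in the reconstructed area."""
    table = {}
    for pos, pixels in blocks.items():
        table.setdefault(block_hash(pixels), []).append(pos)
    return table

def find_predictor(table, pixels):
    """Return candidate positions whose blocks hash-match the current block."""
    return table.get(block_hash(pixels), [])

# screen content has repeating patterns: the same block appears at (0,0) and (8,0)
table = build_hash_table({(0, 0): [1, 2, 3, 4], (8, 0): [1, 2, 3, 4], (0, 8): [9, 9, 9, 9]})
matches = find_predictor(table, [1, 2, 3, 4])
```

A real encoder would verify candidates pixel-by-pixel after the hash match, since distinct blocks can collide on a short hash.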


Posted Content•
TL;DR: In this paper, a self-iterative regressor is trained to learn the descent directions for samples from coarse stages to fine stages, and parameters are iteratively updated by the same regressor.
Abstract: Cascaded Regression (CR) based methods have been proposed to solve the facial landmark detection problem; they learn a series of descent directions through multiple cascaded regressors separately trained in coarse and fine stages. They outperform traditional gradient descent based methods in both accuracy and running speed. However, cascaded regression is not robust enough because each regressor's training data comes from the output of the previous regressor. Moreover, training multiple regressors requires a lot of computing resources, especially for deep learning based methods. In this paper, we develop a Self-Iterative Regression (SIR) framework to improve model efficiency. Only one self-iterative regressor is trained to learn the descent directions for samples from coarse stages to fine stages, and parameters are iteratively updated by the same regressor. Specifically, we propose a Landmarks-Attention Network (LAN) as our regressor, which concurrently learns features around each landmark and obtains the holistic location increment. By doing so, not only are the remaining regressors removed, simplifying the training process, but the number of model parameters is also significantly decreased. The experiments demonstrate that with only 3.72M model parameters, our proposed method achieves state-of-the-art performance.

8 citations
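The self-iterative idea, one regressor repeatedly applied to its own output instead of a cascade of separately trained regressors, can be sketched in a few lines. The toy "regressor" below just moves the estimate halfway toward a fixed target; the real LAN regressor predicts landmark location increments from image features.

```python
def self_iterative_regression(x0, regressor, steps=4):
    """Self-Iterative Regression sketch: one regressor predicts a location
    increment and is applied repeatedly to its own output."""
    x = x0
    for _ in range(steps):
        x = x + regressor(x)   # same regressor at every stage
    return x

# toy regressor pulling the estimate halfway toward a fixed target (10.0)
target = 10.0
estimate = self_iterative_regression(0.0, lambda x: 0.5 * (target - x))
# 0 -> 5 -> 7.5 -> 8.75 -> 9.375 after four iterations
```

In a cascade, each of the four steps would be a separately trained regressor; here one function serves all stages, which is the parameter saving the abstract highlights.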


Patent•
Liu Hongbin, Li Zhang, Zhang Kai, Xu Jizheng, Wang Yue 
22 May 2018
TL;DR: In this paper, the authors propose a method that divides a high-resolution video into first low-resolution sub-videos for a plurality of display areas, and then encodes the first low-resolution sub-videos to obtain sub-video code streams.
Abstract: The embodiment of the invention provides a video processing method and apparatus, relating to the technical field of video processing. The method comprises the following steps: dividing a high-resolution video into first low-resolution sub-videos of a plurality of display areas; encoding the first low-resolution sub-videos to obtain sub-video code streams; dividing each sub-video code stream into a plurality of code stream units; sorting the plurality of code stream units of the sub-video code streams to obtain a low-resolution target video stream; and sending the target video stream to a video playing device for decoding and playing. With this video processing method and apparatus, the high-resolution video can be divided into a plurality of low-resolution sub-videos to reduce the video resolution; the plurality of low-resolution sub-videos are encoded and arranged to form the low-resolution target video stream, and the low-resolution target video stream is sent to the video playing device for decoding and playing, so as to improve transmission efficiency. In this way, the video playing device can decode and play the high-resolution video through a general-purpose video decoder.

5 citations


Journal Article•DOI•
TL;DR: The concept of diversity is introduced for the reference picture set (RPS) to help formulate the RPM problem, and the bit saving of the proposed scheme is 4.9% on average and up to 13.7%, without increasing encoding time.
Abstract: Screen content coding plays an important role in many applications. Conventional reference picture management (RPM) strategies developed for natural content may not work well for screen content. This is because many regions in screen content remain static for a long time, causing a lot of repetitive contents to stay in the decoded picture buffer. The repetitive contents are not conducive to inter prediction, but still occupy valuable memory. This paper proposes a diversity-based RPM scheme for screen content coding. The concept of diversity is introduced for the reference picture set (RPS) to help formulate the RPM problem. By maximizing the diversity of RPS, more potentially better predictions are provided. Better compression performance can then be achieved. Meanwhile, the proposed scheme is nonnormative and compatible with existing video coding standards, such as High Efficiency Video Coding. The experimental results show that, for low delay screen content coding, the bit saving of the proposed scheme is 4.9% on average and up to 13.7%, without increasing encoding time.
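The diversity-maximizing selection described above can be sketched as a greedy farthest-point choice over candidate reference pictures. The scalar "content" distance and the seeding rule are illustrative assumptions; the paper's actual diversity measure for reference picture sets is not reproduced.

```python
def select_reference_set(candidates, distance, k):
    """Greedy sketch of diversity-based reference picture selection: keep
    adding the candidate farthest (most diverse) from the set chosen so far."""
    chosen = [candidates[0]]          # seed with the most recent picture
    rest = list(candidates[1:])
    while rest and len(chosen) < k:
        best = max(rest, key=lambda c: min(distance(c, s) for s in chosen))
        chosen.append(best)
        rest.remove(best)
    return chosen

# toy pictures represented by scalar "content" values; 0,1,2 are near-duplicates
pics = [0, 1, 2, 10, 11]
rps = select_reference_set(pics, lambda a, b: abs(a - b), k=2)
```

The near-duplicate candidates (1 and 2) are skipped in favor of the dissimilar picture 11, mirroring how the scheme avoids filling the decoded picture buffer with repetitive static content.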

Patent•
Bin Li1, Xu Jizheng, Yan Lu•
27 Dec 2018
TL;DR: In this paper, a method and apparatus for real-time screen sharing are provided: when a certain condition is satisfied between an image encoded by a first device and an image decoded by a second device, the first device pauses processing images, and if the pause lasts a certain length of time, a parameter associated with the image compression ratio is adjusted.
Abstract: In implementations of the subject matter described herein, there are provided a method and apparatus for real-time screen sharing. During screen sharing between two devices, if a certain condition is satisfied between an image encoded by the first device and an image decoded by the second device, the first device pauses processing images. If the pause lasts a certain length of time, a parameter associated with the image compression ratio is adjusted. After the first device resumes image processing, the adjusted parameter is used to encode a new image captured on the first device. According to implementations of the subject matter described herein, the transmission code rate during screen sharing can be controlled according to the duration of the pause in image processing. The implementations of the subject matter described herein can reduce the transmission latency of screen sharing, thereby effectively ensuring the user experience.

Patent•
23 Mar 2018
TL;DR: This patent applies per-component color-space adjustment coefficients to quantization parameters when switching from RGB to YCoCg, to increase coding efficiency when switching between color spaces during encoding and decoding.
Abstract: FIELD: data processing. SUBSTANCE: The invention relates to adaptive coding and decoding. Encoding of image elements includes a quantization correction or scaling for the color components of the second color space, in accordance with per-component color-space adjustment coefficients, when the color space switches from RGB to YCoCg between two of the elements. The adjustment includes adjusting the final or intermediate quantization parameter (QP) values for the color components of the YCoCg color space, with the QpY variable indicating the intermediate QP value for the first color component of the RGB color space; the per-component color-space correction factors adjust QpY by -5, -3, and -5 for the Y, Co, and Cg components, respectively. EFFECT: the technical result is increased coding efficiency when switching between color spaces during encoding and decoding. 18 cl, 17 dwg
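The per-component QP offsets stated in the abstract (-5 for Y, -3 for Co, -5 for Cg) can be expressed as a trivial adjustment function. The function name and interface are hypothetical; only the offset values come from the text.

```python
def adjust_qp_for_ycocg(qp_y):
    """Per-component QP adjustment sketch for an RGB -> YCoCg switch,
    using the offsets stated in the abstract (-5, -3, -5 for Y, Co, Cg)."""
    offsets = {"Y": -5, "Co": -3, "Cg": -5}
    return {comp: qp_y + off for comp, off in offsets.items()}

qps = adjust_qp_for_ycocg(32)   # {'Y': 27, 'Co': 29, 'Cg': 27}
```

The offsets compensate for the dynamic-range change of the YCoCg components relative to RGB, so that quantization error stays comparable after the color-space switch.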