Author

Baoliang Chen

Other affiliations: Xidian University
Bio: Baoliang Chen is an academic researcher from City University of Hong Kong. The author has contributed to research in the topics of Computer science & Engineering. The author has an h-index of 4 and has co-authored 16 publications receiving 36 citations. Previous affiliations of Baoliang Chen include Xidian University.

Papers
Journal ArticleDOI
TL;DR: This work proposes a no-reference video quality assessment method, aiming to achieve high-generalization capability in cross-content, -resolution and -frame rate quality prediction, and proposes a pyramid temporal aggregation module by involving the short-term and long-term memory to aggregate the frame-level quality.
Abstract: In this work, we propose a no-reference video quality assessment method, aiming to achieve high-generalization capability in cross-content, -resolution and -frame rate quality prediction. In particular, we evaluate the quality of a video by learning effective feature representations in spatial-temporal domain. In the spatial domain, to tackle the resolution and content variations, we impose the Gaussian distribution constraints on the quality features. The unified distribution can significantly reduce the domain gap between different video samples, resulting in more generalized quality feature representation. Along the temporal dimension, inspired by the mechanism of visual perception, we propose a pyramid temporal aggregation module by involving the short-term and long-term memory to aggregate the frame-level quality. Experiments show that our method outperforms the state-of-the-art methods on cross-dataset settings, and achieves comparable performance on intra-dataset configurations, demonstrating the high-generalization capability of the proposed method. The codes are released at.
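The pyramid temporal aggregation described above can be sketched as follows. This is a minimal illustration only: the window sizes, the mean/min pooling choices, and the final fusion are assumptions for the sketch, not the paper's exact design, and `pyramid_temporal_aggregate` is a hypothetical name.

```python
import numpy as np

def pyramid_temporal_aggregate(frame_scores, window_sizes=(4, 16)):
    """Aggregate per-frame quality scores at several temporal scales.

    Short windows stand in for short-term memory (local fluctuations);
    the whole-sequence pool stands in for long-term memory. All pooling
    choices here are illustrative assumptions.
    """
    scores = np.asarray(frame_scores, dtype=float)
    levels = []
    for w in window_sizes:
        # Mean-pool within non-overlapping windows of length w, then
        # keep the worst (minimum) window as the level score, reflecting
        # that viewers weight low-quality moments heavily.
        n = len(scores) // w * w
        windows = scores[:n].reshape(-1, w).mean(axis=1)
        levels.append(windows.min())
    levels.append(scores.mean())   # long-term: whole-sequence mean
    return float(np.mean(levels))  # fuse the pyramid levels
```

With constant per-frame quality the aggregate equals that constant, while a short low-quality segment pulls the aggregate down more than a plain mean would.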

36 citations

Journal ArticleDOI
TL;DR: This work proposes a framework that eliminates the influence of inherent variance from acquisition cameras at the feature level, leading to a generalized face spoofing detection model that is highly adaptive to different acquisition devices.
Abstract: There has been an increasing consensus in learning based face anti-spoofing that the divergence in terms of camera models is causing a large domain gap in real application scenarios. We describe a framework that eliminates the influence of inherent variance from acquisition cameras at the feature level, leading to the generalized face spoofing detection model that could be highly adaptive to different acquisition devices. In particular, the framework is composed of two branches. The first branch aims to learn the camera invariant spoofing features via feature level decomposition in the high frequency domain. Motivated by the fact that the spoofing features exist not only in the high frequency domain, in the second branch the discrimination capability of extracted spoofing features is further boosted from the enhanced image based on the recomposition of the high-frequency and low-frequency information. Finally, the classification results of the two branches are fused together by a weighting strategy. Experiments show that the proposed method can achieve better performance in both intra-dataset and cross-dataset settings, demonstrating the high generalization capability in various application scenarios.
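The high-/low-frequency decomposition that underlies both branches can be sketched with an ideal circular filter in the Fourier domain. The filter shape, the cutoff radius, and the function name `split_frequency` are illustrative assumptions; the paper's actual decomposition is learned at the feature level.

```python
import numpy as np

def split_frequency(img, radius=8):
    """Split a grayscale image into low- and high-frequency parts
    using an ideal circular low-pass filter in the Fourier domain.
    Filter shape and radius are illustrative, not the paper's."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = img - low  # recomposition: low + high reconstructs img
    return low, high
```

In the spirit of the abstract, the first branch would extract spoofing cues from `high`, while the second branch would work on an enhanced recomposition such as `low + alpha * high` (with `alpha` a hypothetical boosting weight).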

29 citations

Proceedings ArticleDOI
12 Apr 2018
TL;DR: This paper adopts a CNN to acquire a high-quality edge map from the input low-resolution (LR) depth image and uses it as the weight of the regularization term in a total variation (TV) model for super-resolution.
Abstract: In this paper, we propose single depth image super-resolution using convolutional neural networks (CNN). We adopt a CNN to acquire a high-quality edge map from the input low-resolution (LR) depth image. We use the high-quality edge map as the weight of the regularization term in a total variation (TV) model for super-resolution. First, we interpolate the LR depth image using bicubic interpolation and extract its low-quality edge map. Then, we get the high-quality edge map from the low-quality one using the CNN. Since the CNN output often contains broken edges and holes, we refine it using the low-quality edge map. Guided by the high-quality edge map, we upsample the input LR depth image in the TV model. The edge-based guidance in TV effectively removes noise in depth while minimizing jagged artifacts and preserving sharp edges. Various experiments on the Middlebury stereo dataset and the Laser Scan dataset demonstrate the superiority of the proposed method over state-of-the-art methods in both qualitative and quantitative measurements.
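The edge-weighted TV regularization can be sketched as plain gradient descent on a data-fidelity term plus a weighted, smoothed TV penalty. The optimizer, step size, and weighting convention (small weight on edges so they are preserved, large weight in flat regions so noise is removed) are assumptions for this sketch; `edge_weighted_tv` is a hypothetical name and the paper's exact solver is not reproduced here.

```python
import numpy as np

def edge_weighted_tv(depth_init, weight, lam=0.1, iters=200, step=0.2):
    """Regularize an upsampled depth map with a weighted (smoothed)
    total-variation penalty via gradient descent. `weight` should be
    small on edges and large in flat regions; all hyperparameters
    are illustrative."""
    d = depth_init.astype(float).copy()
    eps = 1e-3  # smoothing constant to keep the TV gradient finite
    for _ in range(iters):
        gx = np.diff(d, axis=1, append=d[:, -1:])
        gy = np.diff(d, axis=0, append=d[-1:, :])
        mag = np.sqrt(gx ** 2 + gy ** 2 + eps)
        # divergence of weight * normalized gradient (TV subgradient)
        px, py = weight * gx / mag, weight * gy / mag
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        # descend on 0.5*||d - depth_init||^2 + lam * weighted TV
        d -= step * ((d - depth_init) - lam * div)
    return d
```

With a uniform weight this reduces to ordinary TV denoising; in the paper's setting the weight map would come from the CNN-refined edge map.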

18 citations

Journal ArticleDOI
TL;DR: The proposed variational fusion of time-of-flight (TOF) and stereo data for depth estimation using edge-selective joint filtering (ESJF) successfully produces HR depth maps and outperforms the state of the art in preserving edges and removing noise.
Abstract: In this paper, we propose variational fusion of time-of-flight (TOF) and stereo data for depth estimation using edge-selective joint filtering (ESJF). ESJF is able to adaptively select edges for depth upsampling from the TOF depth map, stereo matching-based disparity map, and stereo images. We adopt ESJF to produce high-resolution (HR) depth maps with accurate edge information from low-resolution ones captured by the TOF camera. First, we measure confidences of TOF and stereo data based on a Gaussian function to be used as fusion weights. Then, we upsample the TOF depth map using ESJF and extract vertical and horizontal discontinuity maps from it. Finally, we perform variational fusion of TOF and stereo depth data guided by the discontinuity maps. Experimental results show that the proposed method successfully produces HR depth maps and outperforms the state of the art in preserving edges and removing noise.
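The Gaussian confidence weighting used as fusion weights can be sketched as follows. The per-pixel error inputs, the sigma value, and the name `fuse_depths` are assumptions for illustration; the paper's variational fusion additionally involves the ESJF-derived discontinuity maps, which this sketch omits.

```python
import numpy as np

def fuse_depths(tof_depth, stereo_depth, tof_err, stereo_err, sigma=0.05):
    """Fuse ToF and stereo depth maps with Gaussian confidence weights.
    `tof_err`/`stereo_err` are hypothetical per-pixel residuals (e.g.
    amplitude-based ToF error, stereo matching cost); sigma is an
    illustrative choice."""
    c_tof = np.exp(-tof_err ** 2 / (2 * sigma ** 2))
    c_st = np.exp(-stereo_err ** 2 / (2 * sigma ** 2))
    # confidence-weighted average; epsilon guards against zero weights
    return (c_tof * tof_depth + c_st * stereo_depth) / (c_tof + c_st + 1e-12)
```

Where one sensor's residual vanishes, its Gaussian confidence dominates and the fused depth follows that sensor.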

17 citations

Journal ArticleDOI
TL;DR: This paper develops the first unsupervised domain adaptation-based no-reference quality assessment method for SCIs, leveraging rich subjective ratings of natural images (NIs), and introduces three types of losses that complementarily and explicitly regularize the ranking feature space in a progressive manner.
Abstract: In this paper, we investigate the capability of transferring the quality of natural scene images to images that are not acquired by optical cameras (e.g., screen content images, SCIs), rooted in the widely accepted view that the human visual system has adapted and evolved through the perception of the natural environment. Here, we develop the first unsupervised domain adaptation-based no-reference quality assessment method for SCIs, leveraging rich subjective ratings of natural images (NIs). In general, it is a non-trivial task to directly transfer the quality prediction model from NIs to a new type of content (i.e., SCIs) that holds dramatically different statistical characteristics. Inspired by the transferability of pair-wise relationships, the proposed quality measure operates on the philosophy of improving transferability and discriminability simultaneously. In particular, we introduce three types of losses which complementarily and explicitly regularize the feature space of ranking in a progressive manner. Regarding feature discriminatory capability enhancement, we propose a center-based loss to rectify the classifier and improve its prediction capability not only for the source domain (NIs) but also for the target domain (SCIs). For feature discrepancy minimization, the maximum mean discrepancy (MMD) is imposed on the extracted ranking features of NIs and SCIs. Furthermore, to enhance feature diversity, we introduce a correlation penalization between different feature dimensions, leading to features with lower rank and higher diversity. Experiments show that our method achieves higher performance on different source-target settings based on a lightweight convolutional neural network. The proposed method also sheds light on learning quality assessment measures for unseen application-specific content without cumbersome and costly subjective evaluations.
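The MMD term imposed between the NI and SCI ranking features can be sketched as a squared maximum mean discrepancy under an RBF kernel. Batch shapes, the single-bandwidth kernel, and the name `rbf_mmd2` are illustrative assumptions; the paper's training setup is not reproduced here.

```python
import numpy as np

def rbf_mmd2(x, y, gamma=1.0):
    """Squared maximum mean discrepancy between two feature batches
    (rows are samples) under an RBF kernel with bandwidth parameter
    gamma. A minimal numpy sketch of the discrepancy term; gamma is
    an illustrative choice."""
    def k(a, b):
        # pairwise squared distances, then RBF kernel matrix
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

Matching feature batches give an MMD of zero, while batches drawn from well-separated distributions give a large positive value; minimizing this quantity pulls the two feature distributions together.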

16 citations


Cited by
Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this article, the authors measure the quality of depth map upsampling using renderings of resulting 3D surfaces, and demonstrate that a simple visual appearance-based loss, when used with either a trained CNN or simply a deep prior, yields significantly improved 3D shapes as measured by a number of existing perceptual metrics.
Abstract: RGBD images, combining high-resolution color and lower-resolution depth from various types of depth sensors, are increasingly common. One can significantly improve the resolution of depth maps by taking advantage of color information; deep learning methods make combining color and depth information particularly easy. However, fusing these two sources of data may lead to a variety of artifacts. If depth maps are used to reconstruct 3D shapes, e.g., for virtual reality applications, the visual quality of upsampled images is particularly important. The main idea of our approach is to measure the quality of depth map upsampling using renderings of resulting 3D surfaces. We demonstrate that a simple visual appearance-based loss, when used with either a trained CNN or simply a deep prior, yields significantly improved 3D shapes, as measured by a number of existing perceptual metrics. We compare this approach with a number of existing optimization and learning-based techniques.
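The idea of judging upsampled depth through the appearance of the resulting surface can be sketched with a crude stand-in: derive normals from each depth map and compare their Lambertian shadings under a fixed directional light. The light direction, the shading model, and the name `shading_loss` are assumptions for illustration; the paper renders full 3D surfaces rather than this per-pixel proxy.

```python
import numpy as np

def shading_loss(depth_pred, depth_gt, light=(0.3, 0.3, 0.9)):
    """Compare two depth maps through a simple Lambertian shading of
    their surface normals under a fixed directional light. A crude,
    illustrative stand-in for a rendered-surface appearance loss."""
    def shade(d):
        gx = np.gradient(d, axis=1)
        gy = np.gradient(d, axis=0)
        # surface normal of z = d(x, y), normalized per pixel
        n = np.stack([-gx, -gy, np.ones_like(d)], axis=-1)
        n /= np.linalg.norm(n, axis=-1, keepdims=True)
        l = np.asarray(light, float)
        l /= np.linalg.norm(l)
        return np.clip(n @ l, 0.0, None)  # clamped Lambertian term
    return float(np.mean((shade(depth_pred) - shade(depth_gt)) ** 2))
```

Two depth maps that differ by a constant offset shade identically (zero loss), whereas a tilted surface shades differently from a flat one, so the loss responds to geometry rather than raw depth values.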

31 citations

Journal ArticleDOI
TL;DR: A novel DCNN is proposed to progressively reconstruct the high-resolution depth map guided by the intensity image, where the multi-scale intensity features are extracted to provide guidance for the refinement of depth features as their resolutions are gradually enhanced.

28 citations

Journal ArticleDOI
TL;DR: Novel confidence estimation techniques for ToF data are introduced, and accurate confidence cues allow even the simplest fusion strategies in the literature to outperform state-of-the-art data fusion schemes.
Abstract: Time-of-Flight (ToF) sensors and stereo vision systems are two widely used technologies for depth estimation. Due to their rather complementary strengths and limitations, the two sensors are often combined to infer more accurate depth maps. A key research issue in this field is how to estimate the reliability of the sensed depth data. While this problem has been widely studied for stereo systems, it has seldom been considered for ToF sensors. Therefore, starting from the work done for stereo data, in this paper we first introduce novel confidence estimation techniques for ToF data. Moreover, we also show how learning-based confidence metrics jointly trained on the two sensors yield better performance. Finally, deploying different fusion frameworks, we show how confidence estimation can be exploited to guide the fusion of depth data from the two sensors. Experimental results show that accurate confidence cues allow even the simplest fusion strategies known in the literature to outperform state-of-the-art data fusion schemes.

24 citations

Posted Content
TL;DR: This work demonstrates that a simple visual appearance-based loss, when used with either a trained CNN or simply a deep prior, yields significantly improved 3D shapes, as measured by a number of existing perceptual metrics.
Abstract: RGBD images, combining high-resolution color and lower-resolution depth from various types of depth sensors, are increasingly common. One can significantly improve the resolution of depth maps by taking advantage of color information; deep learning methods make combining color and depth information particularly easy. However, fusing these two sources of data may lead to a variety of artifacts. If depth maps are used to reconstruct 3D shapes, e.g., for virtual reality applications, the visual quality of upsampled images is particularly important. The main idea of our approach is to measure the quality of depth map upsampling using renderings of resulting 3D surfaces. We demonstrate that a simple visual appearance-based loss, when used with either a trained CNN or simply a deep prior, yields significantly improved 3D shapes, as measured by a number of existing perceptual metrics. We compare this approach with a number of existing optimization and learning-based techniques.

19 citations

Journal ArticleDOI
TL;DR: The proposed variational fusion of time-of-flight (TOF) and stereo data for depth estimation using edge-selective joint filtering (ESJF) successfully produces HR depth maps and outperforms the state of the art in preserving edges and removing noise.
Abstract: In this paper, we propose variational fusion of time-of-flight (TOF) and stereo data for depth estimation using edge-selective joint filtering (ESJF). ESJF is able to adaptively select edges for depth upsampling from the TOF depth map, stereo matching-based disparity map, and stereo images. We adopt ESJF to produce high-resolution (HR) depth maps with accurate edge information from low-resolution ones captured by the TOF camera. First, we measure confidences of TOF and stereo data based on a Gaussian function to be used as fusion weights. Then, we upsample the TOF depth map using ESJF and extract vertical and horizontal discontinuity maps from it. Finally, we perform variational fusion of TOF and stereo depth data guided by the discontinuity maps. Experimental results show that the proposed method successfully produces HR depth maps and outperforms the state of the art in preserving edges and removing noise.

17 citations