
Showing papers by "Guangming Shi" published in 2018


Proceedings Article•DOI•
Guimei Cao1, Xuemei Xie1, Wenzhe Yang1, Quan Liao1, Guangming Shi1, Jinjian Wu1 •
10 Apr 2018
TL;DR: This paper proposes a multi-level feature fusion method that introduces contextual information into the Single Shot Multibox Detector (SSD), improving accuracy on small objects while keeping detection fast.
Abstract: Small object detection is a challenging task in computer vision because small objects offer limited resolution and information. To address this problem, the majority of existing methods sacrifice speed for improved accuracy. In this paper, we aim to detect small objects at a fast speed, using the Single Shot Multibox Detector (SSD), the best object detector with respect to the accuracy-vs-speed trade-off, as the base architecture. We propose a multi-level feature fusion method for introducing contextual information in SSD in order to improve the accuracy for small objects. For the fusion operation itself, we design two feature fusion modules, a concatenation module and an element-sum module, which differ in the way they add contextual information. Experimental results show that these two fusion modules obtain higher mAP on PASCAL VOC2007 than baseline SSD by 1.6 and 1.7 points respectively, with 2-3 points of improvement on some small object categories. Their testing speeds are 43 and 40 FPS respectively, exceeding the state-of-the-art Deconvolutional Single Shot Detector (DSSD) by 29.4 and 26.4 FPS.

174 citations
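
The difference between the two fusion modules named in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: feature maps are plain nested lists indexed as `[channel][row][col]`, and nearest-neighbour upsampling stands in for whatever upsampling the paper actually uses. The function names are hypothetical and are not taken from the paper.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling of every channel by an integer factor.

    Assumption: a stand-in for the paper's upsampling step, used only so the
    deep (low-resolution) map matches the shallow map's spatial size.
    """
    return [
        [
            [row[c // factor] for c in range(len(row) * factor)]
            for row in channel
            for _ in range(factor)  # repeat each row `factor` times
        ]
        for channel in fmap
    ]


def fuse_concat(shallow: FeatureMap, deep_up: FeatureMap) -> FeatureMap:
    """Concatenation fusion: stack the channels of both maps."""
    return shallow + deep_up


def fuse_sum(shallow: FeatureMap, deep_up: FeatureMap) -> FeatureMap:
    """Element-sum fusion: add the maps position by position.

    Both maps must already have the same number of channels and the same size.
    """
    return [
        [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(c1, c2)]
        for c1, c2 in zip(shallow, deep_up)
    ]


# Toy example: one 4x4 shallow channel and one 2x2 deep (context) channel.
shallow = [[[1.0] * 4 for _ in range(4)]]
deep = [[[1.0, 2.0], [3.0, 4.0]]]
deep_up = upsample_nearest(deep, 2)

concat_out = fuse_concat(shallow, deep_up)  # 2 channels, 4x4 each
sum_out = fuse_sum(shallow, deep_up)        # 1 channel, 4x4
```

Concatenation keeps both sources intact and doubles the channel count, so a later layer can learn how to weight them; element-wise summation keeps the channel count fixed and mixes the context in directly.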


Journal Article•DOI•
Guangming Shi1, Tao Huang1, Weisheng Dong1, Jinjian Wu1, Xuemei Xie1 •
TL;DR: This paper proposes to model the sparse component with a Gaussian scale mixture (GSM) model, which has the advantages of jointly estimating the variances of the sparse coefficients (and hence the regularization parameters) and the unknown sparse coefficients, leading to significant estimation accuracy improvements for background subtraction.
Abstract: Recovering the background and foreground parts from video frames has important applications in video surveillance. Under the assumption that the background parts are stationary and the foreground parts are sparse, most existing methods are based on the framework of robust principal component analysis (RPCA), i.e., modeling the background and foreground parts as low-rank and sparse matrices, respectively. However, in realistic complex scenarios, the conventional $\ell_1$ norm sparse regularizer often fails to characterize the varying sparsity of the foreground components well. How to select the sparsity regularizer parameters adaptively according to the local statistics is critical to the success of the RPCA framework for the background subtraction task. In this paper, we propose to model the sparse component with a Gaussian scale mixture (GSM) model. Compared with the conventional $\ell_1$ norm, the GSM-based sparse model has the advantage of jointly estimating the variances of the sparse coefficients (and hence the regularization parameters) and the unknown sparse coefficients, leading to significant improvements in estimation accuracy. Moreover, considering that the foreground parts are highly structured, a structured extension of the GSM model is further developed. Specifically, the input frame is divided into many homogeneous regions using superpixel segmentation. By characterizing the set of sparse coefficients in each homogeneous region with the same GSM prior, the local dependencies among the sparse coefficients can be effectively exploited, leading to further improvements for background subtraction. Experimental results on several challenging scenarios show that the proposed method performs much better than most existing background subtraction methods in terms of both performance and speed.

37 citations
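
For orientation, the textbook RPCA problem that the abstract refers to, and the generic form of a Gaussian scale mixture, can be written as below. The symbols are assumptions chosen for illustration ($D$ for the matrix of stacked frames, $L$ for the low-rank background, $S$ for the sparse foreground, $\theta_i$ and $\alpha_i$ for the GSM factors); the paper's own notation and exact objective may differ.

```latex
% Conventional RPCA: low-rank background plus l1-sparse foreground
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad D = L + S

% Generic GSM model for one sparse coefficient s_i:
% a Gaussian variable alpha_i scaled by a positive hidden multiplier theta_i
s_i = \theta_i \,\alpha_i, \qquad \alpha_i \sim \mathcal{N}(0, 1), \quad \theta_i > 0
```

In the $\ell_1$ form a single $\lambda$ applies to every coefficient, whereas in the GSM form each $\theta_i$ acts as a locally estimated scale, which is how the regularization strength can adapt to local statistics.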


Journal Article•DOI•
TL;DR: Through hardware experiments, for the first time to the authors' knowledge, a snapshot system that allows for simultaneous depth and spectral imaging with a cross-modal stereo system is demonstrated.
Abstract: This letter presents a novel approach for simultaneous depth and spectral imaging with a cross-modal stereo system. Two images of the target scene are captured at the same time: one compressively sampled hyperspectral measurement and one panchromatic measurement. The underlying hyperspectral cube is first reconstructed by leveraging the compressive sensing theory, during which a self-adaptive dictionary is learned from the panchromatic measurement to facilitate the reconstruction. The depth information of the scene is then recovered by estimating a disparity map between the hyperspectral cube and the panchromatic measurement through stereo matching. This disparity map, once obtained, is used to align the hyperspectral and panchromatic measurements to boost the hyperspectral reconstruction in an iterative manner. Through hardware experiments, for the first time to our knowledge, we demonstrate a snapshot system that allows for simultaneous depth and spectral imaging. The proposed system is capable of recording depth and spectral videos of dynamic scenes.

33 citations


Journal Article•DOI•
TL;DR: Experimental results show that the proposed hybrid SR method significantly outperforms existing model-based SR methods and is highly competitive with current state-of-the-art learning-based SR methods in terms of both subjective and objective image quality.
Abstract: Recovering a high-resolution (HR) image from its low-resolution (LR) version is an ill-posed inverse problem. Learning an accurate prior of HR images is of great importance for solving this inverse problem. Existing super-resolution (SR) methods either learn a non-parametric image prior from training data (a large set of LR/HR patch pairs) or estimate a parametric prior from the LR image analytically. Both methods have their limitations: the former lacks flexibility when dealing with different SR settings, while the latter often fails to adapt to spatially varying image structures. In this paper, we propose to take a hybrid approach toward image SR by combining those two lines of ideas: that is, a parametric sparse prior of HR images is learned from the training set as well as the input LR image. By exploiting the strengths of both worlds, we can recover the sparse codes, and therefore the HR image patches, more accurately than conventional sparse coding approaches. Experimental results show that the proposed hybrid SR method significantly outperforms existing model-based SR methods and is highly competitive with current state-of-the-art learning-based SR methods in terms of both subjective and objective image quality.

33 citations


Posted Content•
TL;DR: Experimental results show that the proposed convolutional CS framework substantially outperforms previous state-of-the-art CS methods in terms of both PSNR and visual quality.
Abstract: Compressive sensing (CS), aiming to reconstruct an image/signal from a small set of random measurements, has attracted considerable attention in recent years. Due to the high dimensionality of images, previous CS methods mainly work on image blocks to avoid the huge requirements of memory and computation, i.e., image blocks are measured with Gaussian random matrices, and the whole images are recovered from the reconstructed image blocks. Though efficient, such methods suffer from serious blocking artifacts. In this paper, we propose a convolutional CS framework that senses the whole image using a set of convolutional filters. Instead of reconstructing individual blocks, the whole image is reconstructed from the linear convolutional measurements. Specifically, the convolutional CS is implemented based on a convolutional neural network (CNN), which performs both the convolutional CS and nonlinear reconstruction. Through end-to-end training, the sensing filters and the reconstruction network can be jointly optimized. To facilitate the design of the CS reconstruction network, a novel two-branch CNN inspired by a sparsity-based CS reconstruction model is developed. Experimental results show that the proposed method substantially outperforms previous state-of-the-art CS methods in terms of both PSNR and visual quality.

31 citations


Posted Content•
TL;DR: This work develops a GAN-based approach toward image demosaicing in which a discriminator network with both perceptual and adversarial loss functions is used for quality assurance, and proposes to optimize the perceptual quality of reconstructed images with the proposed GAN in an end-to-end manner.
Abstract: Image demosaicing - one of the most important early stages in digital camera pipelines - addresses the problem of reconstructing a full-resolution image from so-called color-filter-arrays. Despite tremendous progress made in the past decade, a fundamental issue that remains to be addressed is how to assure the visual quality of reconstructed images, especially in the presence of noise corruption. Inspired by recent advances in generative adversarial networks (GAN), we present a novel deep learning approach toward joint demosaicing and denoising (JDD) with perceptual optimization in order to ensure the visual quality of reconstructed images. The key contributions of this work include: 1) we have developed a GAN-based approach toward image demosaicing in which a discriminator network with both perceptual and adversarial loss functions is used for quality assurance; 2) we propose to optimize the perceptual quality of reconstructed images by the proposed GAN in an end-to-end manner. Such end-to-end optimization of GAN is particularly effective for jointly exploiting the gain brought by each modular component (e.g., residue learning in the generative network and perceptual loss in the discriminator network). Our extensive experimental results have shown convincingly improved performance over existing state-of-the-art methods in terms of both subjective and objective quality metrics with a comparable computational cost.

30 citations


Journal Article•DOI•
TL;DR: This paper presents a variable block-sized signal-dependent transform (SDT) design based on the High Efficiency Video Coding (HEVC) framework, together with a fast algorithm for transform derivation; experiments show the effectiveness of the SDTs for different block sizes.
Abstract: The transform, as one of the most important modules of mainstream video coding systems, has seemed very stable over the past several decades. However, recent developments indicate that bringing more options for the transform can lead to coding efficiency benefits. In this paper, we go further to investigate how the coding efficiency can be improved over the state-of-the-art method by adapting a transform for each block. We present a variable block-sized signal-dependent transform (SDT) design based on the High Efficiency Video Coding (HEVC) framework. For a coding block ranging from $4\times4$ to $32\times32$, we collect a quantity of similar blocks from the reconstructed area and use them to derive the Karhunen–Loeve transform. We avoid sending overhead bits to denote the transform by performing the same procedure at the decoder. In this way, the transform for every block is tailored according to its statistics, to be signal-dependent. To make the large block-sized SDTs feasible, we present a fast algorithm for transform derivation. Experimental results show the effectiveness of the SDTs for different block sizes, which leads to up to 23.3% bit-saving. On average, we achieve BD-rate saving of 2.2%, 2.4%, 3.3%, and 7.1% under AI-Main10, RA-Main10, RA-Main10, and LP-Main10 configurations, respectively, compared with the test model HM-12 of HEVC. The proposed scheme has also been adopted into the joint exploration test model for the exploration of potential future video coding standard.

29 citations
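
The Karhunen–Loeve transform mentioned in the abstract has a standard textbook derivation, sketched below. The notation is an assumption made for illustration: $x_1, \dots, x_N$ are the collected similar blocks flattened into vectors, $\mu$ is their mean, and $U$ holds the eigenvectors of their covariance. Whether the paper subtracts the mean or applies further simplifications in its fast algorithm is not stated in the abstract.

```latex
% Sample mean and covariance of the N similar blocks
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad
C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{\top}

% Eigendecomposition: columns of U are the KLT basis vectors
C = U \Lambda U^{\top}, \qquad U^{\top} U = I

% Forward and inverse transform of a block x
y = U^{\top} (x - \mu), \qquad x = U y + \mu
```

Because the encoder and decoder can both compute $C$ from the same already reconstructed pixels, they arrive at the same $U$, which is why no overhead bits are needed to signal the transform.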


Journal Article•DOI•
TL;DR: Investigating the alterations in baseline brain activity in patients with pED, as indexed by the amplitude of low-frequency (0.01–0.08 Hz) fluctuation (ALFF) may shed light on the neural pathology underlying pED.
Abstract: Recent neuroimaging studies have elucidated many interesting and promising findings on sexuality regarding the neural underpinnings of both normal and abnormal sexual processes. Psychogenic erectile dysfunction (pED) constitutes a major part of male sexual dysfunction in China, but the understanding of the central mechanism of pED is still in its infancy. It is commonly appreciated that pED is a functional disorder, which can be attributed predominantly or exclusively to psychological factors, such as anxiety, depression, loss of self-esteem, and psychosocial stresses. Most previous studies probed the central response in the brain of pED patients using sexual-related stimuli. However, little attention has been given to a more fundamental issue: whether the baseline brain activity is altered in pED or not. With rs-fMRI data, the current study aimed to explain the central mechanism behind pED by investigating the alterations in baseline brain activity in patients with pED, as indexed by the amplitude of low-frequency (0.01–0.08 Hz) fluctuation (ALFF). After the psychological screening and urological examination procedure, 26 pED patients and 26 healthy matched controls were enrolled. Our results revealed significantly lower baseline brain activity in the right anterior insula and right orbitofrontal cortex for pED patients (multiple comparison corrected). Additionally, the voxel-wise correlation analysis showed that ALFF of the right anterior insula was correlated with the outcomes of erectile function (multiple comparison corrected). Our results implied there was impaired cognitive and motivational processing of sexual stimuli in pED patients. Our current findings may shed light on the neural pathology underlying pED. We hope that our study has provided a new angle on pED research by investigating resting state brain activity. Furthermore, we suggest that the current study may put forward a more subtle conception of insular influence on pED, which may help foster new specific, mechanistic insights.

27 citations
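
As a rough illustration of the ALFF index named in the abstract, the sketch below averages the spectral amplitude of a single voxel's time series over the 0.01–0.08 Hz band. It is a simplified sketch, not the study's pipeline: the function name `alff`, the repetition time `tr`, the naive DFT, and the amplitude normalization are all assumptions, and real analyses add preprocessing steps (detrending, nuisance regression, standardization across the brain) that are omitted here.

```python
import cmath
import math
from typing import Sequence


def alff(signal: Sequence[float], tr: float,
         f_lo: float = 0.01, f_hi: float = 0.08) -> float:
    """Mean spectral amplitude of `signal` inside the band [f_lo, f_hi] Hz.

    `tr` is the sampling interval in seconds (hypothetical parameter name).
    A naive DFT is used so the example needs only the standard library.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component

    amplitudes = []
    for k in range(1, n // 2 + 1):
        freq = k / (n * tr)  # frequency of DFT bin k in Hz
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(centered))
            amplitudes.append(abs(coeff) / n)
    return sum(amplitudes) / len(amplitudes) if amplitudes else 0.0


# Synthetic check: a 0.05 Hz oscillation lies inside the band,
# a 0.2 Hz oscillation lies outside it.
tr = 2.0
n = 200
slow = [math.sin(2 * math.pi * 0.05 * t * tr) for t in range(n)]
fast = [math.sin(2 * math.pi * 0.2 * t * tr) for t in range(n)]
alff_slow = alff(slow, tr)
alff_fast = alff(fast, tr)
```

With these synthetic inputs the in-band oscillation yields a clearly larger value than the out-of-band one, which is the property the index relies on when comparing baseline activity between groups.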


Journal Article•DOI•
TL;DR: A novel robust tensor approximation (RTA) framework with Laplacian Scale Mixture (LSM) modeling for three-dimensional data and beyond is proposed; experimental results on three datasets show that the proposed algorithm better preserves the sharpness of important image structures and outperforms several existing state-of-the-art image/video denoising methods.
Abstract: Sparse and low-rank models have been widely studied in the literature of signal processing and computer vision. However, as the dimensionality of datasets increases (e.g., multispectral images, dynamic MRI images, and video sequences), the optimality of vector and matrix-based data representations and modeling tools becomes questionable. Inspired by recent advances in sparse and low-rank tensor analysis, we propose a novel robust tensor approximation (RTA) framework with Laplacian Scale Mixture (LSM) modeling for three-dimensional (3-D) data and beyond. Our technical contributions are summarized as follows: first, conceptually similar to robust PCA, we consider its tensor extension here, i.e., low-rank tensor approximation in the presence of outliers modeled by sparse noise; second, building upon previous work on tensor sparsity, we propose to model tensor coefficients with an LSM prior and formulate a maximum a posteriori estimation problem for noisy observations. Both the unknown sparse coefficients and the hidden LSM parameters can be efficiently estimated by the method of alternating optimization; and third, we have derived closed-form solutions for both subproblems and developed computationally efficient denoising techniques for multiframe images and video. Experimental results on three datasets have shown that the proposed algorithm can better preserve the sharpness of important image structures and outperform several existing state-of-the-art image/video denoising methods (e.g., BM4D/VBM4D and tensor dictionary learning).

26 citations


Journal Article•DOI•
TL;DR: A no-reference quality index for the multiply distorted images using the biorder structure degradation and the nonlocal statistics and the superiority of the proposed metric to the state-of-the-art metrics is demonstrated.
Abstract: In the past decade, extensive image quality metrics have been proposed. The majority of them are tailored for the images that contain a specific type of distortion. However, in practice, the images are usually degraded by different types of distortions simultaneously. This poses great challenges to the existing quality metrics. Motivated by this, this paper proposes a no-reference quality index for the multiply distorted images using the biorder structure degradation and the nonlocal statistics. The design philosophy is inspired by the fact that the human visual system (HVS) is highly sensitive to the degradations of both the spatial contrast and the spatial distribution, which are prone to be changed by the joint effects of the multiple distortions. Specifically, the multiresolution representation of the image is first built by downsampling to simulate the hierarchical property of the HVS. Then, the structure degradation is calculated to measure the spatial contrast. Considering the fact that the human visual cortex has the separate mechanisms to perceive the first- and second-order structures, dubbed biorder structures, the degradations of biorder structures are calculated to account for the spatial contrast, producing the first group of the quality-aware features. Furthermore, the nonlocal self-similarity statistics is calculated to measure the spatial distribution, producing the second group of features. Finally, all the features are fed into the random forest regression model to learn the quality model for the multiply distorted images. Extensive experimental results conducted on the three public databases demonstrate the superiority of the proposed metric to the state-of-the-art metrics. Moreover, the proposed metric is also advantageous over the existing metrics in terms of the generalization ability.

25 citations


Journal Article•DOI•
TL;DR: A hierarchical hash design and the corresponding block matching scheme to significantly reduce the complexity of hash-based block matching is proposed, which greatly reduces complexity without compromising coding efficiency.
Abstract: In the latest High Efficiency Video Coding (HEVC) development, i.e., HEVC screen content coding extensions (HEVC-SCC), a hash-based inter-motion search/block matching scheme is adopted in the reference test model, which brings significant coding gains to code screen content. However, the hash table generation itself may take up to half the encoding time and is thus too complex for practical usage. In this paper, we propose a hierarchical hash design and the corresponding block matching scheme to significantly reduce the complexity of hash-based block matching. The hierarchical structure in the proposed scheme allows large block calculation to use the results of small blocks. Thus, we avoid redundant computation among blocks with different sizes, which greatly reduces complexity without compromising coding efficiency. The experimental results show that compared with the hash-based block matching scheme in the HEVC-SCC test model (SCM)-6.0, the proposed scheme reduces about 77% of hash processing time, which leads to 12% and 16% encoding time savings in random access (RA) and low-delay B coding structures. The proposed scheme has been adopted into the latest SCM. A parallel implementation of the proposed hash table generation on graphics processing unit (GPU) is also presented to show the high parallelism of the proposed scheme, which achieves more than 30 frames/s for 1080p sequences and 60 frames/s for 720p sequences. With the fast hash-based block matching integrated into x265 and the hash table generated on GPU, the encoder can achieve 11.8% and 14.0% coding gains on average for RA and low-delay P coding structures, respectively, for real-time encoding.
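
The core idea, that the hash of a large block can reuse the hashes of the smaller blocks it contains, can be sketched as follows. Everything concrete here is an assumption for illustration: CRC32 from Python's `zlib` stands in for the actual hash function, frames are small nested lists of 8-bit samples, and the function names are hypothetical. The abstract does not specify how sub-block results are combined.

```python
import zlib
from typing import Dict, List, Tuple

Frame = List[List[int]]               # 8-bit samples, indexed as [row][col]
HashMap = Dict[Tuple[int, int], int]  # (y, x) of top-left corner -> hash


def base_hashes(frame: Frame, size: int) -> HashMap:
    """Hash every size x size block directly from its pixels."""
    height, width = len(frame), len(frame[0])
    out: HashMap = {}
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            pixels = bytes(frame[y + dy][x + dx]
                           for dy in range(size) for dx in range(size))
            out[(y, x)] = zlib.crc32(pixels)
    return out


def next_level(prev: HashMap, prev_size: int,
               height: int, width: int) -> HashMap:
    """Hash every block of twice the size from its four sub-block hashes.

    No pixel is read again: each large block costs four lookups and one
    hash over 16 bytes, regardless of the block size.
    """
    size = prev_size * 2
    out: HashMap = {}
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            parts = (prev[(y, x)], prev[(y, x + prev_size)],
                     prev[(y + prev_size, x)],
                     prev[(y + prev_size, x + prev_size)])
            out[(y, x)] = zlib.crc32(
                b"".join(p.to_bytes(4, "big") for p in parts))
    return out


# Toy 8x8 frame with a pattern that repeats every 4 samples in both
# directions, mimicking the repeated content typical of screen video.
frame = [[(y % 4) * 4 + (x % 4) for x in range(8)] for y in range(8)]
level2 = base_hashes(frame, 2)
level4 = next_level(level2, 2, 8, 8)
level8 = next_level(level4, 4, 8, 8)
```

Identical blocks receive identical hashes at every level, so a block-matching search can look up candidates in the table for the block size it needs instead of comparing pixels.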

Journal Article•DOI•
TL;DR: This letter utilized the finite-element method with tetrahedral elements to get a better modeling of the human head and the radiative cooling is also taken into account by using the radiation boundary condition.
Abstract: The influences of electromagnetic radiation from mobile phones on human health have aroused significant public concern. One of the most serious consequences is the increment of temperature of tissues and organs. Due to the specificity of the research object, numerical methods for electromagnetic and thermal analysis are usually adopted instead of trials. Most of the existing studies adopt a finite-difference time-domain method for electromagnetic and thermal simulation. Moreover, the radiative cooling phenomenon is neglected during the thermal simulation. In this letter, we utilized the finite-element method with tetrahedral elements to get a better modeling of the human head. Furthermore, the radiative cooling is also taken into account by using the radiation boundary condition. Numerical results validate the accuracy and capability of the proposed method.

Proceedings Article•DOI•
Tao Huang1, Fang Fang Wu1, Weisheng Dong1, Guangming Shi1, Xin Li2 •
01 Aug 2018
TL;DR: A lightweight convolutional neural network is proposed for the joint demosaicking and denoising (JDD) problem, in which a densely connected network is trained in an end-to-end manner to learn the mapping from the noisy low-resolution space (CFA image) to the clean high-resolution space (color image).
Abstract: Color demosaicking and image denoising each play an important role in digital cameras. Conventional model-based methods often fail around areas of strong texture and produce disturbing visual artifacts such as aliasing and zippering. Recently developed deep learning based methods are capable of obtaining images of better quality, though at the price of high computational cost, which makes them unsuitable for real-time applications. In this paper, we propose a lightweight convolutional neural network for the joint demosaicking and denoising (JDD) problem with the following salient features. First, the densely connected network is trained in an end-to-end manner to learn the mapping from the noisy low-resolution space (CFA image) to the clean high-resolution space (color image). Second, the concepts of deep residue learning and aggregated residual transformations are extended from image denoising and classification to JDD, supporting more efficient training. Third, the design of our end-to-end network architecture is inspired by a rigorous analysis of JDD using sparsity models. Experimental results for both demosaicking-only and JDD tasks have shown that the proposed method performs much better than existing state-of-the-art methods (i.e., higher visual quality, smaller training set and lower computational cost).

Journal Article•DOI•
TL;DR: A distributed optimization framework with decomposition-coordination to solve the utility maximization problem of the cellular network system with mobile data offloading and proposes two distributed algorithms that can achieve the global optimization solution under different scenarios.
Abstract: Mobile data offloading allows alternative network systems such as Wi-Fi hotspot access points (APs) to offload the traffic that is originally targeted for the cellular network operators (CNOs). It can alleviate cellular network congestion and enhance users’ quality-of-service. One of the main challenges of mobile data offloading is to develop simple and distributed traffic allocation strategies that can optimally allocate the cellular data traffic from multiple CNOs to the APs. In this work, we propose a distributed optimization framework with decomposition-coordination to solve this problem. In our framework, the utility maximization problem of the cellular network system with mobile data offloading is formulated as a nonsmooth convex optimization problem. We then divide this problem into a set of subproblems, and each of them is solved by a CNO or an AP using its local information. All the subproblems are coordinated with each other by a virtual data offloading coordinator (VDOC), which can collect the intermediate calculation values from CNOs and APs. The VDOC will then feed back the coordination results calculated from the collected values to the corresponding CNOs and APs. We propose two distributed algorithms that can achieve the globally optimal solution under different scenarios. The first one is the multiblock proximal Jacobi alternating direction method of multipliers (ProxJ-ADMM). In this algorithm, the communication between the VDOC and CNOs or APs is assumed to be fully synchronized, and the VDOC will only feed back the coordination result after successfully receiving all the values from APs and CNOs. We relax this assumption by introducing the second algorithm, referred to as the distributed asynchronized ADMM (Async-ADMM) algorithm. In this algorithm, the coordination between the VDOC and APs or CNOs does not need to be perfectly synchronized, and the VDOC will feed back a coordination result whenever it receives a value from at least one AP or CNO. We prove that both algorithms can achieve the globally optimal solution. We present numerical results to verify the performance of our proposed approaches under various network settings and conditions.

Proceedings Article•DOI•
01 Feb 2018
TL;DR: Unlike previous methods that recover images block by block, the proposed framework rebuilds the structure information destroyed in the measurement part of CS, so that the block effect is removed from the reconstructed images.
Abstract: Compressive sensing (CS) theory makes it possible to acquire measurements of a scene at a sub-Nyquist rate and recover the scene image from these under-sampled measurements. In recent years, CS has been improved greatly through the application of deep learning technology. In conventional methods, a block-based mechanism is used to recover images from measurements, which usually causes a block effect in reconstructed images. In this paper, we propose a novel CNN-based network for CS to solve this problem. In the measurement part, the input is measured block by block to acquire the measurements, while in the recovery part, all the measurements from one image are used simultaneously to reconstruct the full image. Unlike previous methods that recover images block by block, the proposed framework rebuilds the structure information destroyed in the measurement part. The block effect is removed accordingly. Experiments show that there is no block effect at all in the reconstructed images. On a standard dataset, our method shows significant improvements in reconstruction results compared with existing state-of-the-art methods.

Journal Article•DOI•
TL;DR: This paper proposes a model inspired by the observations in three-dimensional environment to better present the influence of depth, and demonstrates that the proposed method can outperform the state-of-the-art fixation prediction algorithms on several public data sets for stereoscopic saliency estimation.

Proceedings Article•DOI•
Xuemei Xie1, Wenzhe Yang1, Guimei Cao1, Jianxiu Yang1, Zhifu Zhao1, Shu Chen1, Quan Liao1, Guangming Shi1 •
01 Sep 2018
TL;DR: This paper proposes a single-shot vehicle detector, which focuses on accurate and real-time vehicle detection in UAV imagery, and proposes a dynamic training strategy (DTS) which constructs the network to learn more discriminative features of hard examples, via using cross entropy and focal loss function alternately.
Abstract: Fast and accurate vehicle detection in unmanned aerial vehicle (UAV) imagery is a meaningful but challenging task, playing an important role in a wide range of applications. Because vehicles in UAV imagery are tiny, offer few features, vary in scale, and form imbalanced samples, current deep learning methods used in this task cannot achieve satisfactory performance in both accuracy and speed, which is obviously a classical trade-off problem. In this paper, we propose a single-shot vehicle detector, which focuses on accurate and real-time vehicle detection in UAV imagery. We make contributions in the following two aspects: 1) presenting a multi-scale feature fusion module to combine the high resolution but semantically weak features with the low resolution but semantically strong features, aiming to introduce context information to enhance the feature representation of the small vehicles; 2) proposing a dynamic training strategy (DTS) which constructs the network to learn more discriminative features of hard examples, via using cross entropy and focal loss function alternately. Experimental results show that our method can achieve 90.8% accuracy in UAV images and can run at 59 FPS on a single NVIDIA 1080Ti GPU for the small vehicle detection in UAV images.
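
The two loss functions that the dynamic training strategy alternates between can be compared with a short sketch. The focal loss below is the commonly used form without the optional class-balancing weight; the alternation rule in `loss_for_step` and its `period` parameter are purely hypothetical, since the abstract does not say how or when the switch happens.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Cross-entropy loss given the predicted probability of the true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss: cross entropy scaled by (1 - p_true) ** gamma.

    Well-classified examples (p_true close to 1) are down-weighted, so hard
    examples dominate the gradient. With gamma = 0 it equals cross entropy.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def loss_for_step(step: int, p_true: float, period: int = 1) -> float:
    """Hypothetical schedule: switch between the two losses every `period` steps."""
    use_focal = (step // period) % 2 == 1
    return focal_loss(p_true) if use_focal else cross_entropy(p_true)


easy, hard = 0.9, 0.1
easy_ratio = focal_loss(easy) / cross_entropy(easy)  # (1 - 0.9) ** 2
hard_ratio = focal_loss(hard) / cross_entropy(hard)  # (1 - 0.1) ** 2
```

For an easy example the focal loss is a small fraction of the cross entropy, while for a hard example it stays close to it, which is the mechanism that shifts training effort toward hard examples.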

Proceedings Article•DOI•
Xuemei Xie1, Jiang Du1, Guangming Shi1, Jianxiu Yang1, Wan Liu1, Wang Li1 •
10 Apr 2018
TL;DR: An efficient approach for "DVS image" noise removal is proposed based on the K-SVD algorithm; the algorithm is adapted to specific applications and can deal with "DVS images" containing different amounts of noise.
Abstract: A Dynamic Vision Sensor (DVS) is an event-based camera that captures changing pixels and records the scene in the form of events. In this paper, we use a unique approach to visualize the events a DVS captures as "DVS images". A DVS is sensitive enough to capture objects moving at high speed, but noise is also captured. In order to improve the quality, we remove the noise from those images. Unlike in traditional images, the noise and objects in "DVS images" are both composed of distributed points, so it is hard to use traditional methods to remove the noise. This paper proposes an efficient approach for "DVS image" noise removal. It is based on the K-SVD algorithm, and we improve the algorithm according to certain applications. The proposed framework can deal with "DVS images" containing different amounts of noise. Experiments show that the proposed method works well on both a fixed DVS and a moving DVS.

Journal Article•DOI•
TL;DR: A weighted RDO scheme for SC coding (SCC), in which the repeating characteristics are taken into account when deciding RD tradeoff for each block, and a hash-based design to approximate the results to avoid the complexity of direct search.
Abstract: Unlike camera-captured video, screen content (SC) often contains a lot of repeating patterns, which makes some blocks used as references much more important than others. However, conventional rate-distortion optimization (RDO) schemes in video coding do not consider the dependence among image blocks, which often leads to a locally optimal parameter selection, especially for SC. In this paper, we present a weighted RDO scheme for SC coding (SCC), in which the repeating characteristics are taken into account when deciding the RD tradeoff for each block. For each block, we estimate the number of times it is referenced by the current picture and following pictures and, based on this number, set a proper weight in the RDO process to reflect its importance from a global point of view. To estimate the number of references, we propose a hash-based method that approximates the result and avoids the complexity of a direct search. Experimental results show that, compared with the High Efficiency Video Coding SCC reference software, bit savings of 10.1%, 14.5%, and 2.2% on average and up to 25.7%, 39.8%, and 4.6% can be achieved by considering the weights provided by our scheme for hierarchical-B, IBBB, and all intra coding structures, respectively. Thanks to our hash-based design, the complexity increase brought by the proposed scheme is marginal.
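
To make the idea of a weighted RD tradeoff concrete, the conventional Lagrangian cost and one plausible weighted variant are written out below. Only the first line is standard; the weighted form and the symbols $w$ and $n$ are assumptions for illustration, since the abstract does not give the paper's actual weighting formula.

```latex
% Conventional per-block rate-distortion cost
J = D + \lambda R

% One plausible weighted variant (assumption): distortion of a block that is
% referenced more often is penalized more heavily
J_w = w(n)\, D + \lambda R, \qquad w(n) \ge 1 \ \text{non-decreasing in } n
```

Here $D$ is the block distortion, $R$ its bit cost, $\lambda$ the Lagrange multiplier, and $n$ the estimated number of times the block is referenced. A larger $w(n)$ pushes the encoder to spend more bits on blocks whose quality propagates to many later blocks.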

Journal Article•DOI•
TL;DR: A novel attended visual content degradation-based RR IQA method is introduced and experimental results demonstrate that the proposed method uses only several values and performs consistently with the subjective perception.
Abstract: Reduced-reference (RR) image quality assessment (IQA), which aims to use a small amount of the reference information but achieve high accuracy, is greatly demanded in quality-orientated systems. In order to design a better RR IQA model which performs consistently with the subjective perception, the inner mechanism of the human visual system (HVS) is usually investigated and imitated. In this paper, the attention mechanism is thoroughly analyzed and used for RR IQA modeling. Generally, the HVS is more sensitive to the distortion on the attended region than that on the unattended region. Thus, the saliency of each region is calculated to highlight its importance, and a saliency weighted local structure (SWLS)-based histogram is created for visual structure degradation measurement. Meanwhile, the distortion may cause attention shift (changing the attended region). In other words, the difference of attention between the reference and distorted images can efficiently represent the quality degradation. Therefore, the attention distribution is analyzed with the salient map, and an orientation located global saliency (OLGS)-based histogram is built for attention shift measurement. Finally, combining the quality degradations from both SWLS and OLGS, a novel attended visual content degradation-based RR IQA method is introduced. Experimental results demonstrate that the proposed method uses only several values (18 values) and performs consistently with the subjective perception. Moreover, the proposed attention procedure can be easily extended to the existing RR IQA models and improve their performances. The source code of the proposed method will be available at http://web.xidian.edu.cn/wjj/en/index.html

Proceedings Article•DOI•
Wenfei Wan1, Jinjian Wu1, Guangming Shi1, Yongbo Li1, Weisheng Dong1 •
01 Jul 2018
TL;DR: This work proposes a more accurate perception structure measurement and uses their similarity comparisons to evaluate the SR algorithms, demonstrating that the proposed method performs well consistent with the human visual perception.
Abstract: With the outstanding performance of deep learning based single image super-resolution (SISR) methods, the traditional SISR evaluation metrics (e.g., PSNR and SSIM, which measure the per-pixel differences and simple structure similarities respectively) are facing great challenges. When assessing SISR algorithms, they are generally hardly consistent with the human visual system (HVS). According to psychological studies, the HVS presents different sensitivities to the plain, edge and texture regions, which are difficult to identify and measure accurately with the existing quality indexes, especially for SR images. To deal with this problem, we first build a SISR subjective assessment database including several major deep learning based SR methods. Then we propose a more accurate perception structure measurement and use their similarity comparisons to evaluate the SR algorithms. Experimental results on the databases demonstrate that the proposed method is well consistent with human visual perception.

Book Chapter•DOI•
23 Nov 2018
TL;DR: This paper proposes perceptual CS to obtain high-level structured recovery, and employs perceptual loss, defined on feature level, to enhance the structure information of the recovered images.
Abstract: Compressive sensing (CS) works to acquire measurements at a sub-Nyquist rate and recover the scene images. Existing CS methods always recover the scene images at the pixel level. This leads to over-smoothed recovered images that lack structure information, especially at a low measurement rate. To overcome this drawback, in this paper, we propose perceptual CS to obtain high-level structured recovery. Our task no longer focuses on the pixel level. Instead, we work to achieve a better visual effect. In detail, we employ a perceptual loss, defined at the feature level, to enhance the structure information of the recovered images. Experiments show that our method achieves better visual results with stronger structure information than existing CS methods at the same measurement rate.

Journal Article•DOI•
Guangming Shi1, Li Ruodai1, Fu Li1, Yi Niu1, Lili Yang1 •
TL;DR: This study focused on evaluating the correspondence retrieval in a coding-free binary grid pattern and proposed graph based topological labelling (GBTL) algorithm, which significantly improved the robustness and ease of work, while achieving comparable precision.

Book Chapter•DOI•
Jinjian Wu1, Ke Zhang1, Yuxin Zhang1, Xuemei Xie1, Guangming Shi1 •
03 Oct 2018
TL;DR: This work introduces a novel event coherence detection algorithm for high-speed object tracking, which can accurately track small objects moving at high speed and performs efficiently.
Abstract: High-speed object tracking is still a great challenge for video processing. Traditional cameras can hardly capture the motion trajectory of a high-speed moving object. With a differential logarithmic photodetector and nanosecond response latency to fast stimuli, the dynamic vision sensor (DVS) is extremely sensitive to moving objects (especially objects moving at high speed). However, existing object tracking algorithms, which are limited by their frame-by-frame processing mode, are no longer suitable for DVS. In this work, we introduce a novel event coherence detection algorithm for high-speed object tracking. The moving target is determined by judging the coherence of the events according to the event distribution at a certain moment. Experimental results demonstrate that the proposed algorithm can accurately track small objects moving at high speed. Meanwhile, the proposed algorithm performs efficiently and can run in real time.

Journal Article•DOI•
Yang Jiao1, Yi Niu1, Lin Liu1, Guanghui Zhao1, Guangming Shi1, Fu Li1 •
TL;DR: This paper introduces a new SAR image visualization algorithm to map the high dynamic range SAR amplitude values to low dynamic range displays via reflectivity distortion preserved entropy maximization, with designed objective to present the maximal amount of information content in the displayed image.
Abstract: The visualization of synthetic aperture radar (SAR) images plays a critical role in remote sensing applications. To effectively obtain the image suitable for human observation, this paper introduces a new SAR image visualization algorithm to map the high dynamic range SAR amplitude values to low dynamic range displays via reflectivity distortion preserved entropy maximization. Its design objective is to present the maximal amount of information content in the displayed image, making it optimal in an information-theoretic sense, while restricting the upper bound of the reflection distortion caused by tone mapping. The resulting optimization problem can be modeled graph-theoretically as a $K$-edges maximum weight path problem in a directed acyclic graph, and it can be solved efficiently by dynamic programming in real time. Empirical evidence is provided to demonstrate the superior visual quality obtained by our new visualization technique.
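
The dynamic program named in this abstract can be sketched in generic form. The sketch below only shows a maximum weight path with exactly `k` edges in a directed acyclic graph whose nodes are already in topological order. How the paper derives edge weights from entropy and the distortion bound is not given in the abstract, so the weights here are placeholders and the function name is hypothetical.

```python
import math


def max_weight_k_edge_path(weights, k):
    """Best path from node 0 to node n-1 that uses exactly k edges.

    weights[i][j] is the weight of edge i -> j for i < j, or None if
    the edge is absent. Returns (total_weight, path), or (-inf, [])
    when no such path exists.
    """
    n = len(weights)
    neg = -math.inf
    # best[e][j]: best weight of a path from node 0 to node j with e edges.
    best = [[neg] * n for _ in range(k + 1)]
    prev = [[-1] * n for _ in range(k + 1)]
    best[0][0] = 0.0
    for e in range(1, k + 1):
        for j in range(1, n):
            for i in range(j):
                w = weights[i][j]
                if w is None or best[e - 1][i] == neg:
                    continue
                cand = best[e - 1][i] + w
                if cand > best[e][j]:
                    best[e][j] = cand
                    prev[e][j] = i
    if best[k][n - 1] == neg:
        return neg, []
    # Walk the predecessor table backwards to recover the path.
    path = [n - 1]
    node = n - 1
    for e in range(k, 0, -1):
        node = prev[e][node]
        path.append(node)
    path.reverse()
    return best[k][n - 1], path
```

The table has `(k + 1) * n` entries and each entry scans at most `n` predecessors, so this version runs in O(k n^2) time.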

Proceedings Article•DOI•
01 Aug 2018
TL;DR: Instead of recovering RGB images channel by channel uniformly, the proposed framework adopts non-uniform sampling in different channels in YCbCr color space, which greatly enhances the performance on CS for color images and gives a powerful ability to better capture the structure information.
Abstract: We propose a novel compressive sensing framework for color images. Recently, compressive sensing (CS) has gained popularity with the development of deep learning. To the best of our knowledge, existing methods all deal with RGB images channel by channel. This brings redundancy of measurements. In this paper, we do a breakthrough work. Instead of recovering RGB images channel by channel uniformly, we adopt non-uniform sampling in different channels in YCbCr color space. The luminance component takes up more measurements while the other channels take up fewer in the proposed framework. It greatly enhances the performance of CS for color images. Moreover, perceptual loss gives a powerful ability to better capture the structure information. We give the measurement rate at 2% as an example in the experiments, and the results show the proposed method outperforms all the existing methods with better structure of images.

Journal Article•DOI•
TL;DR: A novel DDL method, named CW-DDL, to learn a discriminative dictionary for classification by exploiting class-wise coding coefficients, which shows superior performance to related DDL methods on several benchmark datasets and, coupled with CNN features, also leads to state-of-the-art performance on a more challenging dataset.

Proceedings Article•DOI•
Jinjian Wu1, Man Zhang1, Xuemei Xie1, Guangming Shi1, Zuoming Sun •
01 Sep 2018
TL;DR: This work attempts to measure the image quality with its visual entropy degradation by measuring the degradations on the joint probability distributions of the joint entropy equation, and introduces a novel BIQA method.
Abstract: Blind image quality assessment (BIQA), which needs nothing about the reference image, is greatly desired in the automatic visual signal processing system. However, without the guide of the reference image, BIQA is extremely difficult to realize. Distortions degrade the visual contents and cause quality degradation during image perception. From the perspective of information theory, distortions decrease the amount of visual information that an image contains. Therefore, we attempt to measure the image quality with its visual entropy degradation. Inspired by the response in the local receptive field of the retina, the local orientation and intensity features are extracted for visual content representation. Next, the joint probability distributions of the two features are calculated for visual entropy estimation. By deducing the joint entropy equation, the joint probability distributions are decoupled and analyzed. Finally, by measuring the degradations on these decoupled distributions, a novel BIQA method is introduced. Experimental results on the publicly available databases demonstrate that the proposed BIQA performs highly consistent with the human perception.
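
The abstract does not state how the joint entropy equation is decoupled. One standard decomposition, assumed here purely for illustration, is the chain rule H(X, Y) = H(X) + H(Y | X). The sketch below estimates these terms from two quantized feature sequences. The function names and the mapping of X to orientation and Y to intensity are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def joint_entropy_terms(orientation, intensity):
    """Return (H(X, Y), H(X), H(Y | X)) for two equal-length label sequences."""
    n = len(orientation)
    h_joint = entropy(Counter(zip(orientation, intensity)))
    h_x = entropy(Counter(orientation))
    # H(Y | X) = sum over x of p(x) * H(Y | X = x), computed directly.
    by_x = defaultdict(Counter)
    for x, y in zip(orientation, intensity):
        by_x[x][y] += 1
    h_y_given_x = sum((sum(c.values()) / n) * entropy(c)
                      for c in by_x.values())
    return h_joint, h_x, h_y_given_x
```

Because the conditional term is computed directly instead of by subtraction, comparing `h_joint` with `h_x + h_y_given_x` is a quick consistency check on the estimated distributions.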

Posted Content•
TL;DR: A novel framework of implementing hybrid structured sparse coding processes by deep convolutional neural networks and shows that the proposed hybrid image restoration method performs comparably with and often better than the current state-of-the-art techniques.
Abstract: State-of-the-art approaches toward image restoration can be classified into model-based and learning-based. The former - best represented by sparse coding techniques - strive to exploit intrinsic prior knowledge about the unknown high-resolution images; while the latter - popularized by recently developed deep learning techniques - leverage external image prior from some training dataset. It is natural to explore their middle ground and pursue a hybrid image prior capable of achieving the best in both worlds. In this paper, we propose a systematic approach of achieving this goal called Structured Analysis Sparse Coding (SASC). Specifically, a structured sparse prior is learned from extrinsic training data via a deep convolutional neural network (in a similar way to previous learning-based approaches); meantime another structured sparse prior is internally estimated from the input observation image (similar to previous model-based approaches). Two structured sparse priors will then be combined to produce a hybrid prior incorporating the knowledge from both domains. To manage the computational complexity, we have developed a novel framework of implementing hybrid structured sparse coding processes by deep convolutional neural networks. Experimental results show that the proposed hybrid image restoration method performs comparably with and often better than the current state-of-the-art techniques.

Journal Article•DOI•
01 Jan 2018
TL;DR: This paper proposes a new method for high accuracy and real time spike detection by exploiting the structural features of spikes and introducing a simple and effective measure to further reduce the influence of background noise.
Abstract: Recordings of extracellular spikes have been widely used in various fields ranging from basic neuroscientific research to clinical applications. However, in the extracellular recording system, accurately detecting spikes from the recorded signal in real time is still a major challenge. Although the existing algorithms for online spike detection have made great progress, there still remains much room for improvement in terms of accuracy. In this paper, we propose a new method for high accuracy and real time spike detection. Concretely, a differential operator is first employed to accentuate spikes in the signal for its simplicity and strong capacity to detect significant changes. Then, by exploiting the structural features of spikes, a resolution parameter is introduced to improve the performance of the differential operator. Finally, a simple and effective measure is utilized to further reduce the influence of background noise, which makes spike detection more accurate. The results on simulated and real data show that the proposed method is able to precisely detect spikes while maintaining low computational complexity.
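
The pipeline described in this abstract can be sketched under stated assumptions. The abstract does not define the resolution parameter or the noise-reduction measure, so this sketch treats the resolution parameter as the lag `k` of a difference `x[n] - x[n-k]` and uses a median-based threshold as a stand-in for noise handling. Both choices, and all names below, are hypothetical.

```python
import statistics


def detect_spikes(signal, k=2, thresh_factor=4.0):
    """Return sample indices where the lag-k difference exceeds a threshold.

    Assumptions: k plays the role of the resolution parameter, and the
    noise level is estimated as median(|d|) / 0.6745, a common robust
    estimate of the standard deviation for Gaussian noise. A single
    spike can be reported twice, at its rising and its falling edge.
    """
    # Differential operator with lag k accentuates abrupt changes.
    abs_diff = [abs(signal[n] - signal[n - k]) for n in range(k, len(signal))]
    sigma = statistics.median(abs_diff) / 0.6745
    threshold = thresh_factor * sigma
    return [i + k for i, a in enumerate(abs_diff) if a > threshold]
```

The cost is one subtraction and one comparison per sample plus a median, which matches the low-complexity goal stated in the abstract. An online variant would need a running noise estimate instead of a median over the whole recording.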