Showing papers by "Junhui Hou" published in 2021


Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed an underwater image enhancement network via medium transmission-guided multi-color space embedding, which enriches the diversity of feature representations by incorporating the characteristics of different color spaces into a unified structure.
Abstract: Underwater images suffer from color casts and low contrast due to wavelength- and distance-dependent attenuation and scattering. To solve these two degradation issues, we present an underwater image enhancement network via medium transmission-guided multi-color space embedding, called Ucolor. Concretely, we first propose a multi-color space encoder network, which enriches the diversity of feature representations by incorporating the characteristics of different color spaces into a unified structure. Coupled with an attention mechanism, the most discriminative features extracted from multiple color spaces are adaptively integrated and highlighted. Inspired by underwater imaging physical models, we design a medium transmission (indicating the percentage of the scene radiance reaching the camera)-guided decoder network to enhance the response of the network toward quality-degraded regions. As a result, our network can effectively improve the visual quality of underwater images by exploiting multi-color space embedding and the advantages of both physical model-based and learning-based methods. Extensive experiments demonstrate that our Ucolor achieves superior performance against state-of-the-art methods in terms of both visual quality and quantitative metrics. The code is publicly available at: https://li-chongyi.github.io/Proj_Ucolor.html.
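To make the transmission-guided design concrete, here is a minimal PyTorch sketch of the idea: three encoder branches for different color-space renderings of the input, channel attention over their concatenated features, and decoder features reweighted by the reverse medium transmission map. All module names, layer sizes, and the (1 - t) weighting form are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the Ucolor idea: encode several color-space views of the
# same image, fuse them with channel attention, and weight decoder features by
# a medium-transmission map so degraded regions get a stronger response.
import torch
import torch.nn as nn

class ColorBranch(nn.Module):
    """One encoder branch for a single color-space rendering of the input."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.net(x)

class UcolorSketch(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.rgb, self.hsv, self.lab = ColorBranch(ch), ColorBranch(ch), ColorBranch(ch)
        # Channel attention over the concatenated multi-color-space features.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(3 * ch, 3 * ch, 1), nn.Sigmoid())
        self.decode = nn.Sequential(
            nn.Conv2d(3 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1))

    def forward(self, rgb, hsv, lab, transmission):
        feats = torch.cat([self.rgb(rgb), self.hsv(hsv), self.lab(lab)], dim=1)
        feats = feats * self.attn(feats)        # highlight discriminative features
        # Reverse medium transmission: low transmission = heavily degraded
        # region, so the (1 - t) term amplifies the response there.
        feats = feats * (1.0 - transmission) + feats
        return self.decode(feats)

x = torch.rand(1, 3, 64, 64)
t = torch.rand(1, 1, 64, 64)                    # medium transmission in [0, 1]
print(UcolorSketch()(x, x, x, t).shape)         # torch.Size([1, 3, 64, 64])
```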

200 citations


Journal ArticleDOI
TL;DR: An attention steered interweave fusion network (ASIF-Net) is proposed to detect salient objects, which progressively integrates cross-modal and cross-level complementarity from the RGB image and corresponding depth map via steering of an attention mechanism.
Abstract: Salient object detection from RGB-D images is an important yet challenging vision task, which aims at detecting the most distinctive objects in a scene by combining color information and depth constraints. Unlike prior fusion manners, we propose an attention steered interweave fusion network (ASIF-Net) to detect salient objects, which progressively integrates cross-modal and cross-level complementarity from the RGB image and corresponding depth map via steering of an attention mechanism. Specifically, the complementary features from RGB-D images are jointly extracted and hierarchically fused in a dense and interweaved manner. Such a manner breaks down the barriers of inconsistency existing in the cross-modal data and also sufficiently captures the complementarity. Meanwhile, an attention mechanism is introduced to locate the potential salient regions in an attention-weighted fashion, which helps in highlighting the salient objects and suppressing the cluttered background regions. Instead of focusing only on pixelwise saliency, we also ensure that the detected salient objects have the objectness characteristics (e.g., complete structure and sharp boundary) by incorporating the adversarial learning that provides a global semantic constraint for RGB-D salient object detection. Quantitative and qualitative experiments demonstrate that the proposed method performs favorably against 17 state-of-the-art saliency detectors on four publicly available RGB-D salient object detection datasets. The code and results of our method are available at https://github.com/Li-Chongyi/ASIF-Net.

188 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: Li et al. as discussed by the authors proposed CorrNet3D to drive the learning of dense correspondence between 3D shapes by means of deformation-like reconstruction to overcome the need for annotated data.
Abstract: Motivated by the intuition that one can transform two aligned point clouds to each other more easily and meaningfully than a misaligned pair, we propose CorrNet3D – the first unsupervised and end-to-end deep learning-based framework – to drive the learning of dense correspondence between 3D shapes by means of deformation-like reconstruction to overcome the need for annotated data. Specifically, CorrNet3D consists of a deep feature embedding module and two novel modules called correspondence indicator and symmetric deformer. Taking a pair of raw point clouds as input, our model first learns the pointwise features and passes them into the indicator to generate a learnable correspondence matrix used to permute the input pair. The symmetric deformer, with an additional regularized loss, transforms the two permuted point clouds to each other to drive the unsupervised learning of the correspondence. The extensive experiments on both synthetic and real-world datasets of rigid and non-rigid 3D shapes show our CorrNet3D outperforms state-of-the-art methods to a large extent, including those taking meshes as input. CorrNet3D is a flexible framework in that it can be easily adapted to supervised learning if annotated data are available. The source code and pre-trained model will be available at https://github.com/ZENGYIMINGEAMON/CorrNet3D.git.
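The correspondence indicator can be pictured in a few lines of PyTorch: pointwise embeddings of the two clouds yield a similarity matrix, a row-wise softmax turns it into a soft permutation, and a reconstruction-style loss on the permuted cloud drives learning without labels. The random feature stand-in and the temperature below are assumptions for illustration, not the published architecture.

```python
# Hedged sketch of a CorrNet3D-style correspondence indicator.
import torch

def soft_correspondence(feat_a, feat_b, tau=0.05):
    """feat_*: (N, C) pointwise embeddings. Returns an (N, N) soft permutation."""
    sim = feat_a @ feat_b.T / tau
    return torch.softmax(sim, dim=1)

n, c = 1024, 64
pts_a, pts_b = torch.rand(n, 3), torch.rand(n, 3)
feat_a, feat_b = torch.rand(n, c), torch.rand(n, c)  # stand-in for a learned encoder
P = soft_correspondence(feat_a, feat_b)
pts_b_permuted = P @ pts_b          # re-orders B so row i matches point i of A
# An unsupervised reconstruction loss then asks a deformer to map the permuted
# cloud onto the other one, which drives P toward a true correspondence.
loss = ((pts_b_permuted - pts_a) ** 2).mean()
print(P.shape, loss.item())
```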

43 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a self-training approach named Crowd-SDNet that enables a typical object detector trained only with point-level annotations to estimate both the center points and sizes of crowded objects.
Abstract: In this article, we propose a novel self-training approach named Crowd-SDNet that enables a typical object detector trained only with point-level annotations (i.e., objects are labeled with points) to estimate both the center points and sizes of crowded objects. Specifically, during training, we utilize the available point annotations to supervise the estimation of the center points of objects directly. Based on a locally-uniform distribution assumption, we initialize pseudo object sizes from the point-level supervisory information, which are then leveraged to guide the regression of object sizes via a crowdedness-aware loss. Meanwhile, we propose a confidence and order-aware refinement scheme to continuously refine the initial pseudo object sizes such that the ability of the detector is increasingly boosted to detect and count objects in crowds simultaneously. Moreover, to address extremely crowded scenes, we propose an effective decoding method to improve the detector’s representation ability. Experimental results on the WiderFace benchmark show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks, i.e., our method improves the average precision by more than 10% and reduces the counting error by 31.2%. Besides, our method obtains the best results on the crowd counting and localization datasets (i.e., ShanghaiTech and NWPU-Crowd) and vehicle counting datasets (i.e., CARPK and PUCPR+) compared with state-of-the-art counting-by-detection methods. The code will be publicly available at https://github.com/WangyiNTU/Point-supervised-crowd-detection .
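The pseudo-size initialization can be illustrated with a toy NumPy function: under a locally uniform crowd assumption, an object's initial size is seeded from the distance to its nearest annotated neighbor. The scale factor and the plain nearest-neighbor rule are assumptions for illustration; the paper refines these initial sizes during training rather than keeping them fixed.

```python
# A small sketch of pseudo-size initialization from point-level labels.
import numpy as np

def init_pseudo_sizes(points, scale=0.5):
    """points: (N, 2) center annotations -> (N,) initial pseudo sizes."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)       # ignore self-distance
    return scale * dist.min(axis=1)      # half the nearest-neighbor spacing

pts = np.random.rand(50, 2) * 640        # fake point labels on a 640px image
sizes = init_pseudo_sizes(pts)
print(sizes.shape, sizes.mean())
# In the paper these sizes are then continuously refined by the confidence
# and order-aware scheme and a crowdedness-aware loss.
```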

42 citations


Journal ArticleDOI
TL;DR: This work proposes a new semisupervised model, which is able to simultaneously learn the similarity matrix with supervisory information and generate the clustering results, such that the mutual enhancement effect of the two tasks can produce better clustering performance.
Abstract: As a variant of non-negative matrix factorization (NMF), symmetric NMF (SymNMF) can generate the clustering result without additional post-processing, by decomposing a similarity matrix into the product of a clustering indicator matrix and its transpose. However, the similarity matrix in the traditional SymNMF methods is usually predefined, resulting in limited clustering performance. Considering that the quality of the similarity graph is crucial to the final clustering performance, we propose a new semisupervised model, which is able to simultaneously learn the similarity matrix with supervisory information and generate the clustering results, such that the mutual enhancement effect of the two tasks can produce better clustering performance. Our model fully utilizes the supervisory information in the form of pairwise constraints and propagates it to obtain an informative similarity matrix. The proposed model is finally formulated as a non-negativity-constrained optimization problem. Also, we propose an iterative method to solve it with the convergence theoretically proven. Extensive experiments validate the superiority of the proposed model when compared with nine state-of-the-art NMF models.
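For reference, here is the plain SymNMF baseline the model builds on, with a fixed, predefined similarity matrix A and the standard damped multiplicative update for A ≈ HHᵀ. The fixed A is exactly the limitation the semisupervised model above removes by learning A from pairwise constraints; the damping value is the usual choice, not taken from this paper.

```python
# Minimal SymNMF baseline: factor a similarity matrix A ≈ H H^T and read
# cluster labels directly from H, with no extra post-processing.
import numpy as np

def symnmf(A, k, iters=200, beta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(iters):
        AH, HHtH = A @ H, H @ (H.T @ H)
        # Damped multiplicative update keeps H non-negative.
        H *= (1 - beta) + beta * AH / np.maximum(HHtH, 1e-10)
    return H

A = np.random.rand(100, 100); A = (A + A.T) / 2   # toy symmetric similarity
H = symnmf(A, k=3)
labels = H.argmax(axis=1)                         # clustering indicator
print(labels[:10])
```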

38 citations


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a progressive zero-centric residual network (PZRes-Net) for hyperspectral image (HSI) super-resolution, which merges a low-resolution HSI (LR-HSI) and a high-resolution multispectral image (HR-MSI).
Abstract: This paper explores the problem of hyperspectral image (HSI) super-resolution that merges a low resolution HSI (LR-HSI) and a high resolution multispectral image (HR-MSI). The cross-modality distribution of the spatial and spectral information makes the problem challenging. Inspired by the classic wavelet decomposition-based image fusion, we propose a novel lightweight deep neural network-based framework, namely progressive zero-centric residual network (PZRes-Net), to address this problem efficiently and effectively. Specifically, PZRes-Net learns a high resolution and zero-centric residual image, which contains high-frequency spatial details of the scene across all spectral bands, from both inputs in a progressive fashion along the spectral dimension. And the resulting residual image is then superimposed onto the up-sampled LR-HSI in a mean-value invariant manner, leading to a coarse HR-HSI, which is further refined by exploring the coherence across all spectral bands simultaneously. To learn the residual image efficiently and effectively, we employ spectral-spatial separable convolution with dense connections. In addition, we propose zero-mean normalization implemented on the feature maps of each layer to realize the zero-mean characteristic of the residual image. Extensive experiments over both real and synthetic benchmark datasets demonstrate that our PZRes-Net outperforms state-of-the-art methods to a significant extent in terms of four quantitative metrics as well as visual quality, e.g., our PZRes-Net improves the PSNR by more than 3 dB, while saving 2.3× parameters and consuming 15× fewer FLOPs. The code is publicly available at https://github.com/zbzhzhy/PZRes-Net.
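The zero-mean normalization and mean-value invariant superposition can be sketched in a few lines of PyTorch: subtracting each band's spatial mean makes the learned residual zero-centric, so adding it to the upsampled LR-HSI leaves every band's mean unchanged. The one-layer residual head below is a placeholder, not the paper's dense separable-convolution network.

```python
# Sketch of the zero-centric residual idea used in PZRes-Net.
import torch
import torch.nn as nn

class ZeroMean2d(nn.Module):
    """Removes the per-channel spatial mean of a feature map."""
    def forward(self, x):
        return x - x.mean(dim=(2, 3), keepdim=True)

residual_head = nn.Sequential(nn.Conv2d(31, 31, 3, padding=1), ZeroMean2d())
lr_hsi_up = torch.rand(1, 31, 64, 64)     # LR-HSI upsampled to the HR grid
residual = residual_head(lr_hsi_up)       # zero-centric high-frequency details
coarse_hr_hsi = lr_hsi_up + residual      # mean of each band is unchanged
print(torch.allclose(coarse_hr_hsi.mean(dim=(2, 3)),
                     lr_hsi_up.mean(dim=(2, 3)), atol=1e-5))
```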

32 citations


Journal ArticleDOI
TL;DR: In this paper, the authors performed subjective and objective Point Cloud Quality Assessment (PCQA) in an immersive environment and studied the effect of geometry and texture attributes in compression distortion, using a head-mounted display (HMD) with six degrees of freedom.
Abstract: In this paper, we focus on subjective and objective Point Cloud Quality Assessment (PCQA) in an immersive environment and study the effect of geometry and texture attributes in compression distortion. Using a Head-Mounted Display (HMD) with six degrees of freedom, we establish a subjective PCQA database, named SIAT Point Cloud Quality Database (SIAT-PCQD). Our database consists of 340 distorted point clouds compressed by the MPEG point cloud encoder with the combination of 20 sequences and 17 pairs of geometry and texture quantization parameters. The impact of distorted geometry and texture attributes is further discussed in this paper. Then, we propose two projection-based objective quality evaluation methods, i.e., a weighted view projection based model and a patch projection based model. Our subjective database and findings can be used in point cloud processing, transmission, and coding, especially for virtual reality applications. The subjective dataset has been released in the public repository.

31 citations


Journal ArticleDOI
TL;DR: This paper presents a new full-reference image quality assessment (IQA) method for conducting the perceptual quality evaluation of the light field images, called the symmetry and depth feature-based model (SDFM).
Abstract: This paper presents a new full-reference image quality assessment (IQA) method for conducting the perceptual quality evaluation of light field (LF) images, called the symmetry and depth feature-based model (SDFM). Specifically, the radial symmetry transform is first employed on the luminance components of the reference and distorted LF images to extract their symmetry features for capturing the spatial quality of each view of an LF image. Second, the depth feature extraction scheme is designed to explore the geometry information inherited in an LF image for modeling its LF structural consistency across views. The similarity measurements are subsequently conducted on the comparison of their symmetry and depth features separately, which are further combined to achieve the quality score for the distorted LF image. Note that the proposed SDFM, which explores the symmetry and depth features, is consistent with the human visual system, which identifies objects by sensing their structures and geometries. Extensive simulation results on the dense light fields dataset have clearly shown that the proposed SDFM outperforms multiple classical and recently developed IQA algorithms on quality evaluation of LF images.

28 citations


Journal ArticleDOI
TL;DR: Flexible-PU as discussed by the authors proposes an end-to-end learning-based framework to generate dense point clouds from given sparse point clouds to model the underlying geometric structures of objects/scenes.
Abstract: This paper addresses the problem of generating dense point clouds from given sparse point clouds to model the underlying geometric structures of objects/scenes. To tackle this challenging issue, we propose a novel end-to-end learning-based framework. Specifically, by taking advantage of the linear approximation theorem, we first formulate the problem explicitly, which boils down to determining the interpolation weights and high-order approximation errors. Then, we design a lightweight neural network to adaptively learn unified and sorted interpolation weights as well as the high-order refinements, by analyzing the local geometry of the input point cloud. The proposed method can be interpreted by the explicit formulation, and thus is more memory-efficient than existing ones. In sharp contrast to the existing methods that work only for a pre-defined and fixed upsampling factor, the proposed framework only requires a single neural network with one-time training to handle various upsampling factors within a typical range, which is highly desired in real-world applications. In addition, we propose a simple yet effective training strategy to enable this flexibility. Moreover, our method can handle non-uniformly distributed and noisy data well. Extensive experiments on both synthetic and real-world data demonstrate the superiority of the proposed method over state-of-the-art methods both quantitatively and qualitatively. The code will be publicly available at https://github.com/ninaqy/Flexible-PU.
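The interpolation view of upsampling is easy to sketch: each new point is a learned convex combination of its input neighbors plus a small high-order refinement, so one network can serve any upsampling factor r. The weights, refinements, and neighbor indices below are random stand-ins for what the network would predict.

```python
# Hedged sketch of upsampling as learned interpolation plus refinement.
import torch

def upsample(points, weights, refinement, neighbor_idx):
    """points: (N, 3); weights: (M, K) rows summing to 1; neighbor_idx: (M, K);
    refinement: (M, 3). Returns (M, 3) upsampled points, M = r * N."""
    neighbors = points[neighbor_idx]                     # (M, K, 3)
    interpolated = (weights.unsqueeze(-1) * neighbors).sum(dim=1)
    return interpolated + refinement                     # high-order correction

n, k, r = 256, 8, 4
pts = torch.rand(n, 3)
idx = torch.randint(0, n, (r * n, k))                    # stand-in for kNN grouping
w = torch.softmax(torch.rand(r * n, k), dim=1)           # unified, sorted in the paper
delta = 0.01 * torch.randn(r * n, 3)
print(upsample(pts, w, delta, idx).shape)                # torch.Size([1024, 3])
```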

27 citations


Journal ArticleDOI
TL;DR: In this paper, a tensor low-rank norm was proposed for multi-view spectral clustering (MVSC), which explicitly imposes a symmetric low-rank constraint and a structured sparse low-rank constraint on the frontal and horizontal slices of the tensor, respectively, to characterize intra-view and inter-view relationships.
Abstract: This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling. Unlike the existing methods that all adopt an off-the-shelf tensor low-rank norm without considering the special characteristics of the tensor in MVSC, we design a novel structured tensor low-rank norm tailored to MVSC. Specifically, we explicitly impose a symmetric low-rank constraint and a structured sparse low-rank constraint on the frontal and horizontal slices of the tensor to characterize the intra-view and inter-view relationships, respectively. Moreover, the two constraints could be jointly optimized to achieve mutual refinement. On the basis of the novel tensor low-rank norm, we formulate MVSC as a convex low-rank tensor recovery problem, which is then efficiently solved with an augmented Lagrange multiplier-based method iteratively. Extensive experimental results on seven commonly used benchmark datasets show that the proposed method outperforms state-of-the-art methods to a significant extent. Impressively, our method is able to produce perfect clustering. In addition, the parameters of our method can be easily tuned, and the proposed model is robust to different datasets, demonstrating its potential in practice. The code is available at https://github.com/jyh-learning/MVSC-TLRR.
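One building block of such an augmented-Lagrangian solver is singular value thresholding (SVT), the proximal operator of the nuclear norm, applied slice-wise to encourage low-rank slices. The threshold value and the slice-by-slice loop below are a simplification for illustration, not the paper's full structured norm.

```python
# SVT applied to the frontal slices of a toy multi-view tensor.
import numpy as np

def svt(M, tau):
    """Proximal operator of the nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

tensor = np.random.rand(50, 50, 4)     # views stacked as frontal slices
low_rank = np.stack([svt(tensor[:, :, v], tau=1.0)
                     for v in range(tensor.shape[2])], axis=2)
print(np.linalg.matrix_rank(low_rank[:, :, 0]), "<=",
      np.linalg.matrix_rank(tensor[:, :, 0]))
```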

Proceedings ArticleDOI
17 Oct 2021
TL;DR: Zhang et al. as mentioned in this paper proposed an attention-driven graph clustering network (AGCN), which exploits a heterogeneity-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature.
Abstract: The combination of the traditional convolutional network (i.e., an auto-encoder) and the graph convolutional network has attracted much attention in clustering, in which the auto-encoder extracts the node attribute feature and the graph convolutional network captures the topological graph feature. However, the existing works (i) lack a flexible combination mechanism to adaptively fuse those two kinds of features for learning the discriminative representation and (ii) overlook the multi-scale information embedded at different layers for subsequent cluster assignment, leading to inferior clustering results. To this end, we propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN). Specifically, AGCN exploits a heterogeneity-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature. Moreover, AGCN develops a scale-wise fusion module to adaptively aggregate the multi-scale features embedded at different layers. Based on a unified optimization framework, AGCN can jointly perform feature learning and cluster assignment in an unsupervised fashion. Compared with the existing deep clustering methods, our method is more flexible and effective since it comprehensively considers the numerous and discriminative information embedded in the network and directly produces the clustering results. Extensive quantitative and qualitative results on commonly used benchmark datasets validate that our AGCN consistently outperforms state-of-the-art methods.
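The heterogeneity-wise fusion can be pictured as a tiny attention module: per-node coefficients are learned over the auto-encoder feature and the GCN feature, and the fused representation is their weighted sum. The feature dimension and the one-layer scorer below are illustrative assumptions.

```python
# Sketch of AGCN-style attention-driven fusion of two node features.
import torch
import torch.nn as nn

class HeterogeneityFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=1))
    def forward(self, z_ae, z_gcn):
        a = self.score(torch.cat([z_ae, z_gcn], dim=1))   # (N, 2) attention
        return a[:, :1] * z_ae + a[:, 1:] * z_gcn         # adaptive fusion

z_ae, z_gcn = torch.rand(100, 16), torch.rand(100, 16)    # attribute / graph features
print(HeterogeneityFusion(16)(z_ae, z_gcn).shape)         # torch.Size([100, 16])
```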

Journal ArticleDOI
TL;DR: This paper formulates video summarization (VS) as block sparse representation by considering each video frame as a block containing a number of patches, and designs a simultaneous version of the block-based OMP (Orthogonal Matching Pursuit) algorithm, namely SBOMP, to solve the proposed model.
Abstract: In recent years, sparse representation has been successfully utilized for video summarization (VS). However, most of the sparse representation based VS methods characterize each video frame with global features. As a result, some important local details could be neglected by global features, which may compromise the performance of summarization. In this paper, we propose to partition each video frame into a number of patches and characterize each patch with global features. Instead of concatenating the features of each patch and utilizing conventional sparse representation, we formulate the VS problem with such video frame representation as block sparse representation by considering each video frame as a block containing a number of patches. By taking the reconstruction constraint into account, we devise a simultaneous version of block-based OMP (Orthogonal Matching Pursuit) algorithm, namely SBOMP, to solve the proposed model. The proposed model is further extended to a neighborhood based model which considers temporally adjacent frames as a super block. This is one of the first sparse representation based VS methods taking both spatial and temporal contexts into account with blocks. Experimental results on two widely used VS datasets have demonstrated that our proposed methods present clear superiority over existing sparse representation based VS methods and are highly comparable to some deep learning ones requiring supervision information for extra model training.
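A compact, simplified simultaneous OMP conveys the selection mechanism: greedily pick the dictionary frames that best explain the residual of all target columns at once. Block handling is flattened here for brevity; the paper's SBOMP scores whole patch-blocks rather than single columns.

```python
# Simplified simultaneous OMP for keyframe selection.
import numpy as np

def somp(D, Y, n_select):
    """D: (d, n_frames) column-normalized dictionary; Y: (d, m) signals.
    Returns indices of selected frames (the summary)."""
    residual, chosen = Y.copy(), []
    for _ in range(n_select):
        scores = np.abs(D.T @ residual).sum(axis=1)   # joint correlation
        scores[chosen] = -np.inf                      # no re-selection
        chosen.append(int(scores.argmax()))
        Ds = D[:, chosen]
        coef, *_ = np.linalg.lstsq(Ds, Y, rcond=None)
        residual = Y - Ds @ coef                      # re-fit, update residual
    return chosen

D = np.random.rand(128, 300); D /= np.linalg.norm(D, axis=0)  # 300 candidate frames
Y = np.random.rand(128, 300)                                  # the video itself
print(somp(D, Y, n_select=10))                                # 10 keyframes
```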

Journal ArticleDOI
TL;DR: SP-DLRR as discussed by the authors is composed of two modules, i.e., the classification-guided superpixel segmentation and the discriminative low-rank representation, which are iteratively conducted.
Abstract: In this paper, we propose a novel classification scheme for the remotely sensed hyperspectral image (HSI), namely SP-DLRR, by comprehensively exploring its unique characteristics, including the local spatial information and low-rankness. SP-DLRR is mainly composed of two modules, i.e., the classification-guided superpixel segmentation and the discriminative low-rank representation, which are iteratively conducted. Specifically, by utilizing the local spatial information and incorporating the predictions from a typical classifier, the first module segments pixels of an input HSI (or its restoration generated by the second module) into superpixels. According to the resulting superpixels, the pixels of the input HSI are then grouped into clusters and fed into our novel discriminative low-rank representation model with an effective numerical solution. Such a model is capable of increasing the intra-class similarity by suppressing the spectral variations locally while promoting the inter-class discriminability globally, leading to a restored HSI with more discriminative pixels. Experimental results on three benchmark datasets demonstrate the significant superiority of SP-DLRR over state-of-the-art methods, especially for the case with an extremely limited number of training pixels.

Journal ArticleDOI
TL;DR: Guo et al. as discussed by the authors proposed a deep learning-based framework for the reconstruction of high-quality light fields (LFs) from acquisitions via learned coded apertures, which incorporates the measurement observation into the deep learning framework elegantly to avoid relying entirely on data-driven priors for LF reconstruction.
Abstract: In this paper, we propose a novel learning-based framework for the reconstruction of high-quality light fields (LFs) from acquisitions via learned coded apertures. The proposed method incorporates the measurement observation into the deep learning framework elegantly to avoid relying entirely on data-driven priors for LF reconstruction. Specifically, we first formulate the compressive LF reconstruction as an inverse problem with an implicit regularization term. Then, we construct the regularization term with a deep efficient spatial-angular separable convolutional sub-network in the form of local and global residual learning to comprehensively explore the signal distribution free from the limited representation ability and inefficiency of deterministic mathematical modeling. Furthermore, we extend this pipeline to LF denoising and spatial super-resolution, which could be considered as variants of coded aperture imaging equipped with different degradation matrices. Extensive experimental results demonstrate that the proposed methods outperform state-of-the-art approaches to a significant extent both quantitatively and qualitatively, i.e., the reconstructed LFs not only achieve much higher PSNR/SSIM but also preserve the LF parallax structure better on both real and synthetic LF benchmarks. The code will be publicly available at https://github.com/MantangGuo/DRLF.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, the authors propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations, which not only makes the solution space well-constrained but also enables their method to be solved iteratively with a recurrent framework.
Abstract: Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data. In this paper, we resolve these two challenges simultaneously. First, we propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations. This representation not only makes the solution space well-constrained but also enables our method to be solved iteratively with a recurrent framework, which greatly reduces the difficulty of learning. Second, we introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images so that our full framework can be trained end-to-end without ground truth supervision. Extensive experiments on several different datasets demonstrate that our proposed method outperforms the previous state-of-the-art by a large margin.
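The core representation is easy to sketch: K rigid transforms (R_k, t_k) blended by per-point weights give a well-constrained non-rigid warp. The random orthonormal frames and softmax weights below stand in for what the recurrent network would estimate.

```python
# Sketch of a non-rigid warp as a point-wise blend of rigid transforms.
import torch

def blend_rigid(points, rotations, translations, weights):
    """points: (N, 3); rotations: (K, 3, 3); translations: (K, 3);
    weights: (N, K) rows summing to 1. Returns the warped (N, 3) cloud."""
    # Apply every rigid transform to every point: (N, K, 3).
    transformed = torch.einsum('kij,nj->nki', rotations, points) + translations
    return (weights.unsqueeze(-1) * transformed).sum(dim=1)

n, k = 512, 8
pts = torch.rand(n, 3)
R = torch.linalg.qr(torch.randn(k, 3, 3)).Q      # random orthonormal frames
t = 0.1 * torch.randn(k, 3)
w = torch.softmax(torch.randn(n, k), dim=1)      # per-point blending weights
print(blend_rigid(pts, R, t, w).shape)           # torch.Size([512, 3])
```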

Journal ArticleDOI
TL;DR: In this paper, the authors proposed analytical distortion and rate models for the geometry and color information, formulated the joint bit allocation problem as a constrained convex optimization problem, and solved it with an interior point method.
Abstract: In video-based 3D point cloud compression, the quality of the reconstructed 3D point cloud depends on both the geometry and color distortions. Finding an optimal allocation of the total bitrate between the geometry coder and the color coder is a challenging task due to the large number of possible solutions. To solve this bit allocation problem, we first propose analytical distortion and rate models for the geometry and color information. Using these models, we formulate the joint bit allocation problem as a constrained convex optimization problem and solve it with an interior point method. Experimental results show that the rate-distortion performance of the proposed solution is close to that obtained with exhaustive search but at only 0.66% of its time complexity.
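An illustrative version of the allocation problem: with assumed exponential rate-distortion models D(R) = a·exp(-b·R) for geometry and color (the paper derives its own analytical models; these forms and parameters are placeholders), the total-rate-constrained problem is convex and a general-purpose solver finds the split.

```python
# Toy rate-constrained bit allocation between geometry and color coders.
import numpy as np
from scipy.optimize import minimize

a_g, b_g = 10.0, 0.8      # assumed geometry R-D model parameters
a_c, b_c = 6.0, 0.5       # assumed color R-D model parameters
R_total = 8.0             # total bitrate budget

def distortion(r):
    return a_g * np.exp(-b_g * r[0]) + a_c * np.exp(-b_c * r[1])

res = minimize(distortion, x0=[R_total / 2, R_total / 2],
               constraints=[{'type': 'ineq',
                             'fun': lambda r: R_total - r[0] - r[1]}],
               bounds=[(0, R_total), (0, R_total)])
print("geometry bits: %.2f  color bits: %.2f" % tuple(res.x))
```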

Journal ArticleDOI
TL;DR: In this article, the authors proposed a maximum entropy subspace clustering network (MESC-Net) which maximizes the entropy of the affinity matrix to promote the connectivity within each subspace, in which its elements corresponding to the same subspace are uniformly and densely distributed.
Abstract: Deep subspace clustering networks have attracted much attention in subspace clustering, in which an auto-encoder non-linearly maps the input data into a latent space, and a fully connected layer named self-expressiveness module is introduced to learn the affinity matrix via a typical regularization term (e.g., sparse or low-rank). However, the adopted regularization terms ignore the connectivity within each subspace, limiting their clustering performance. In addition, the adopted framework suffers from the coupling issue between the auto-encoder module and the self-expressiveness module, making the network training non-trivial. To tackle these two issues, we propose a novel deep subspace clustering method named Maximum Entropy Subspace Clustering Network (MESC-Net). Specifically, MESC-Net maximizes the entropy of the affinity matrix to promote the connectivity within each subspace, in which its elements corresponding to the same subspace are uniformly and densely distributed. Meanwhile, we design a novel framework to explicitly decouple the auto-encoder module and the self-expressiveness module. Besides, we also theoretically prove that the learned affinity matrix satisfies the block-diagonal property under the assumption of independent subspaces. Extensive quantitative and qualitative results on commonly used benchmark datasets validate that MESC-Net significantly outperforms state-of-the-art methods. The code is publicly available.
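The maximum-entropy idea can be sketched as a regularizer: penalize low entropy of each row of the affinity matrix C so connections within a subspace spread uniformly instead of collapsing onto a few points. The softmax parametrization and diagonal masking below are assumptions to keep rows non-negative and normalized, not the paper's exact formulation.

```python
# Sketch of an entropy regularizer on a self-expressiveness affinity matrix.
import torch

def entropy_regularizer(logits):
    """logits: (N, N) raw self-expressiveness scores."""
    n = logits.shape[0]
    # Mask the diagonal so no point trivially represents itself.
    logits = logits.masked_fill(torch.eye(n, dtype=torch.bool), float('-inf'))
    C = torch.softmax(logits, dim=1)                 # affinity rows sum to 1
    entropy = -(C * torch.log(C + 1e-12)).sum(dim=1)
    return C, -entropy.mean()                        # minimize negative entropy

C, reg = entropy_regularizer(torch.randn(64, 64))
print(C.shape, reg.item())    # the reg term drives uniform, dense rows
```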

Journal ArticleDOI
TL;DR: This article proposes a dynamic regularization method for CNNs that can dynamically adjust the regularization strength in the training procedure, thereby balancing the underfitting and overfitting of CNNs.
Abstract: Regularization is commonly used for alleviating overfitting in machine learning. For convolutional neural networks (CNNs), regularization methods, such as DropBlock and Shake-Shake, have illustrated the improvement in the generalization performance. However, these methods lack a self-adaptive ability throughout training. That is, the regularization strength is fixed to a predefined schedule, and manual adjustments are required to adapt to various network architectures. In this article, we propose a dynamic regularization method for CNNs. Specifically, we model the regularization strength as a function of the training loss. According to the change of the training loss, our method can dynamically adjust the regularization strength in the training procedure, thereby balancing the underfitting and overfitting of CNNs. With dynamic regularization, a large-scale model is automatically regularized by the strong perturbation, and vice versa. Experimental results show that the proposed method can improve the generalization capability on off-the-shelf network architectures and outperform state-of-the-art regularization methods.
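A toy version of the schedule conveys the mechanism: make the regularization strength a function of the training loss, so a model that still underfits is perturbed gently and one whose loss has collapsed is perturbed strongly. The linear mapping and cap below are illustrative choices, not the paper's exact function.

```python
# Toy dynamic regularization schedule driven by the training loss.
def dynamic_strength(train_loss, loss_at_start, max_strength=0.3):
    # Strength grows as the loss shrinks relative to its initial value.
    progress = max(0.0, 1.0 - train_loss / loss_at_start)
    return max_strength * progress

for loss in [2.3, 1.5, 0.7, 0.2]:                # a fake loss trajectory
    print(loss, "->", round(dynamic_strength(loss, loss_at_start=2.3), 3))
# The returned value would drive, e.g., a DropBlock-style drop probability
# at each training step instead of a fixed, predefined schedule.
```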

Journal ArticleDOI
TL;DR: Extensive experimental results show that the proposed DP-GLPCA can produce much higher clustering accuracy than state-of-the-art constrained clustering methods, and it can converge to a Karush-Kuhn-Tucker point.
Abstract: In this article, we propose a novel model for constrained clustering, namely, the dissimilarity propagation-guided graph-Laplacian principal component analysis (DP-GLPCA). By fully utilizing a limited amount of weakly supervisory information in the form of pairwise constraints, the proposed DP-GLPCA is capable of capturing both the local and global structures of input samples to exploit their characteristics for excellent clustering. More specifically, we first formulate a convex semisupervised low-dimensional embedding model by incorporating a new dissimilarity regularizer into GLPCA (i.e., an unsupervised dimensionality reduction model), in which both the similarity and dissimilarity between low-dimensional representations are enforced with the constraints to improve their discriminability. An efficient iterative algorithm based on the inexact augmented Lagrange multiplier is designed to solve it with the global convergence guaranteed. Furthermore, we innovatively propose to propagate the cannot-link constraints (i.e., dissimilarity) to refine the dissimilarity regularizer to be more informative. The resulting DP model is iteratively solved, and we also prove that it can converge to a Karush–Kuhn–Tucker point. Extensive experimental results over nine commonly used benchmark data sets show that the proposed DP-GLPCA can produce much higher clustering accuracy than state-of-the-art constrained clustering methods. Besides, the effectiveness and advantage of the proposed DP model are experimentally verified. To the best of our knowledge, this is the first work to investigate DP, in contrast to existing pairwise constraint propagation, which propagates similarity. The code is publicly available at https://github.com/jyh-learning/DP-GLPCA.

Journal ArticleDOI
TL;DR: This paper proposes a new method, namely global-local balanced low-rank approximation (GLB-LRA), which can increase the similarity between pixels belonging to an identical category while promoting the discriminability between pixels of different categories.
Abstract: This paper explores the problem of recovering the discriminative representation of a hyperspectral remote sensing image (HRSI), which suffers from spectral variations, to boost its classification accuracy. To tackle this challenge, we propose a new method, namely global-local balanced low-rank approximation (GLB-LRA), which can increase the similarity between pixels belonging to an identical category while promoting the discriminability between pixels of different categories. Specifically, by taking advantage of the particular structural spatial information of HRSIs, we exploit the low-rankness of an HRSI robustly in both spatial and spectral domains from the perspective of local and global balance. We mathematically formulate GLB-LRA as an explicit optimization problem and propose an iterative algorithm to solve it efficiently. Experimental results over three commonly-used benchmark datasets demonstrate the significant superiority of our method over state-of-the-art methods.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a physically-interpretable, compact, efficient, and end-to-end learning-based framework, namely AGD-Net, to recover hyperspectral images from single RGB images.
Abstract: This paper investigates the problem of recovering hyperspectral (HS) images from single RGB images. To tackle such a severely ill-posed problem, we propose a physically-interpretable, compact, efficient, and end-to-end learning-based framework, namely AGD-Net. Precisely, by taking advantage of the imaging process, we first formulate the problem explicitly based on the classic gradient descent algorithm. Then, we design a lightweight neural network with a multi-stage architecture to mimic the formed amended gradient descent process, in which efficient convolution and novel spectral zero-mean normalization are proposed to effectively extract spatial-spectral features for regressing an initialization, a basic gradient, and an incremental gradient. Besides, based on the approximate low-rank property of HS images, we propose a novel rank loss to promote the similarity between the global structures of reconstructed and ground-truth HS images, which is optimized with our singular value weighting strategy during training. Moreover, AGD-Net, a single network after one-time training, is flexible to handle the reconstruction with various spectral response functions. Extensive experiments over three commonly-used benchmark datasets demonstrate that AGD-Net can improve the reconstruction quality by more than 1.0 dB on average while saving 67× parameters and 32× FLOPs, compared with state-of-the-art methods. The code will be publicly available at https://github.com/zbzhzhy/GD-Net .
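The amended gradient descent unrolling can be sketched in PyTorch: each stage applies the data-fidelity gradient of ½‖Rx − y‖² (R: spectral response, y: the RGB observation) plus a learned incremental gradient. The tiny convolutional increment, initialization, and stage count are assumptions for illustration, not the published architecture.

```python
# Sketch of an unrolled, amended gradient descent stage for RGB-to-HS recovery.
import torch
import torch.nn as nn

class AGDStage(nn.Module):
    def __init__(self, bands=31):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))               # learned step size
        self.increment = nn.Conv2d(bands, bands, 3, padding=1)     # learned amendment
    def forward(self, x, y, R):
        # Basic gradient of 0.5 * ||R x - y||^2 with respect to x.
        Rx = torch.einsum('cb,nbhw->nchw', R, x)
        grad = torch.einsum('cb,nbhw->nchw', R.T, Rx - y)
        return x - self.alpha * (grad + self.increment(x))

bands = 31
R = torch.rand(3, bands) / bands                  # toy spectral response function
y = torch.rand(1, 3, 32, 32)                      # input RGB image
x = torch.einsum('cb,nbhw->nchw', R.T, y)         # crude initialization (R^T y)
for stage in [AGDStage(bands) for _ in range(3)]: # 3 unrolled stages
    x = stage(x, y, R)
print(x.shape)                                    # torch.Size([1, 31, 32, 32])
```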

Journal ArticleDOI
TL;DR: A joint PCP model for constrained SC is proposed that simultaneously learns a propagation matrix and an affinity matrix; the model is formulated as a bounded symmetric graph regularized low-rank matrix completion problem, and the optimized affinity matrix exhibits an ideal appearance under some conditions.
Abstract: Constrained spectral clustering (SC) based on pairwise constraint propagation has attracted much attention due to its good performance. All the existing methods could be generally cast as the following two steps, i.e., a small number of pairwise constraints are first propagated to the whole data under the guidance of a predefined affinity matrix, and the affinity matrix is then refined in accordance with the resulting propagation and finally adopted for SC. Such a stepwise manner, however, overlooks the fact that the two steps indeed depend on each other, i.e., the two steps form a "chicken-and-egg" problem, leading to suboptimal performance. To this end, we propose a joint PCP model for constrained SC by simultaneously learning a propagation matrix and an affinity matrix. Especially, it is formulated as a bounded symmetric graph regularized low-rank matrix completion problem. We also show that the optimized affinity matrix by our model exhibits an ideal appearance under some conditions. Extensive experimental results in terms of constrained SC, semisupervised classification, and propagation behavior validate the superior performance of our model compared with state-of-the-art methods.

Journal ArticleDOI
TL;DR: In this article, a semisupervised affinity matrix learning method is proposed to learn an affinity matrix of data samples under the supervision of a small number of pairwise constraints (PCs) by observing that both the matrix encoding PCM and the empirically constructed affinity matrix (EAM) express the similarity between samples.
Abstract: This article explores the problem of semisupervised affinity matrix learning, that is, learning an affinity matrix of data samples under the supervision of a small number of pairwise constraints (PCs). By observing that both the matrix encoding PCs, called pairwise constraint matrix (PCM) and the empirically constructed affinity matrix (EAM), express the similarity between samples, we assume that both of them are generated from a latent affinity matrix (LAM) that can depict the ideal pairwise relation between samples. Specifically, the PCM can be thought of as a partial observation of the LAM, while the EAM is a fully observed one but corrupted with noise/outliers. To this end, we innovatively cast the semisupervised affinity matrix learning as the recovery of the LAM guided by the PCM and EAM, which is technically formulated as a convex optimization problem. We also provide an efficient algorithm for solving the resulting model numerically. Extensive experiments on benchmark datasets demonstrate the significant superiority of our method over state-of-the-art ones when used for constrained clustering and dimensionality reduction. The code is publicly available at https://github.com/jyh-learning/LAM.

Posted Content
TL;DR: Zhang et al. as mentioned in this paper proposed an end-to-end learning-based framework to reconstruct hyperspectral (HS) images from single RGB images captured by commercial cameras, without using paired ground-truth HS and RGB images during training.
Abstract: This paper investigates the problem of reconstructing hyperspectral (HS) images from single RGB images captured by commercial cameras, without using paired HS and RGB images during training. To tackle this challenge, we propose a new lightweight and end-to-end learning-based framework. Specifically, on the basis of the intrinsic imaging degradation model of RGB images from HS images, we progressively spread the differences between input RGB images and re-projected RGB images from recovered HS images via effective unsupervised camera spectral response function estimation. To enable the learning without paired ground-truth HS images as supervision, we adopt the adversarial learning manner and boost it with a simple yet effective L1 gradient clipping scheme. Besides, we embed the semantic information of input RGB images to locally regularize the unsupervised learning, which is expected to promote pixels with identical semantics to have consistent spectral signatures. In addition to conducting quantitative experiments over two widely-used datasets for HS image reconstruction from synthetic RGB images, we also evaluate our method by applying recovered HS images from real RGB images to HS-based visual tracking. Extensive results show that our method significantly outperforms state-of-the-art unsupervised methods and even exceeds the latest supervised method under some settings. The source code is publicly available at this https URL.

Posted Content
TL;DR: In this article, a lightweight neural network is proposed to dynamically learn weights for interpolating neighboring pixels from input views to synthesize each pixel of novel views independently and recover the spatial correlation between the independently synthesized pixels of each novel view by referring to that of input views using a geometry-based spatial refinement module.
Abstract: In this paper, we tackle the problem of dense light field (LF) reconstruction from sparsely-sampled ones with wide baselines and propose a learnable model, namely dynamic interpolation, to replace the commonly-used geometry warping operation. Specifically, with the estimated geometric relation between input views, we first construct a lightweight neural network to dynamically learn weights for interpolating neighbouring pixels from input views to synthesize each pixel of novel views independently. In contrast to the fixed and content-independent weights employed in the geometry warping operation, the learned interpolation weights implicitly incorporate the correspondences between the source and novel views and adapt to different image content information. Then, we recover the spatial correlation between the independently synthesized pixels of each novel view by referring to that of input views using a geometry-based spatial refinement module. We also constrain the angular correlation between the novel views through a disparity-oriented LF structure loss. Experimental results on LF datasets with wide baselines show that the reconstructed LFs achieve much higher PSNR/SSIM and preserve the LF parallax structure better than state-of-the-art methods. The source code is publicly available at this https URL.

Posted Content
TL;DR: Wang et al. as mentioned in this paper proposed an adaptive attribute and structure subspace clustering network (AASSC-Net) to simultaneously consider the attribute and structure information in an adaptive graph fusion manner.
Abstract: Deep self-expressiveness-based subspace clustering methods have demonstrated effectiveness. However, existing works only consider the attribute information to conduct the self-expressiveness, which may limit the clustering performance. In this paper, we propose a novel adaptive attribute and structure subspace clustering network (AASSC-Net) to simultaneously consider the attribute and structure information in an adaptive graph fusion manner. Specifically, we first exploit an auto-encoder to represent input data samples with latent features for the construction of an attribute matrix. We also construct a mixed signed and symmetric structure matrix to capture the local geometric structure underlying data samples. Then, we perform self-expressiveness on the constructed attribute and structure matrices to learn their affinity graphs separately. Finally, we design a novel attention-based fusion module to adaptively leverage these two affinity graphs to construct a more discriminative affinity graph. Extensive experimental results on commonly used benchmark datasets demonstrate that our AASSC-Net significantly outperforms state-of-the-art methods. In addition, we conduct comprehensive ablation studies to discuss the effectiveness of the designed modules. The code will be publicly available at this https URL.

Proceedings ArticleDOI
17 Oct 2021
TL;DR: In this paper, a cycle-consistent reconstruction network (CR-Net) is proposed to reconstruct the 4D light field (LF) from 2D measurements captured by the coded aperture camera by progressively eliminating the residuals between projected measurements from the reconstructed LF and input measurements.
Abstract: This paper investigates the 4-D light field (LF) reconstruction from 2-D measurements captured by the coded aperture camera. To tackle such an ill-posed inverse problem, we propose a cycle-consistent reconstruction network (CR-Net). To be specific, based on the intrinsic linear imaging model of the coded aperture, CR-Net reconstructs an LF through progressively eliminating the residuals between the projected measurements from the reconstructed LF and input measurements. Moreover, to address the crucial issue of extracting representative features from high-dimensional LF data efficiently and effectively, we formulate the problem in a probability space and propose to approximate a posterior distribution of a set of carefully-defined LF processing events, including both layer-wise spatial-angular feature extraction and network-level feature aggregation. Through droppath from a densely-connected template network, we derive an adaptively learned spatial-angular fusion strategy, which is sharply contrasted with existing manners that combine spatial and angular features empirically. Extensive experiments on both simulated measurements and measurements by a real coded aperture camera demonstrate the significant advantage of our method over state-of-the-art ones, i.e., our method improves the reconstruction quality by 4.5 dB.
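The cycle-consistent unrolling can be pictured in a few lines: given the coded aperture's linear projection, each iteration re-projects the current LF estimate, compares it with the captured measurement, and maps the residual back through a learned corrector. The per-view code, one-layer corrector, initialization, and stage count below are random placeholders for illustration, not the paper's network.

```python
# Hedged sketch of residual-driven LF reconstruction from a coded measurement.
import torch
import torch.nn as nn

views, stages = 25, 4                           # e.g., 5x5 angular resolution
phi = torch.rand(1, views, 1, 1)                # per-view aperture code
corrector = nn.Conv2d(1, views, 3, padding=1)   # stand-in refinement network

def project(lf):
    """Forward imaging model: code-weighted sum of views -> 2-D measurement."""
    return (phi * lf).sum(dim=1, keepdim=True)

measurement = torch.rand(1, 1, 32, 32)          # what the camera captured
lf = measurement.repeat(1, views, 1, 1)         # naive initialization
for _ in range(stages):
    residual = measurement - project(lf)        # cycle-consistency check
    lf = lf + corrector(residual)               # progressively remove residual
print(lf.shape)                                 # torch.Size([1, 25, 32, 32])
```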

Posted Content
TL;DR: APNT-Fusion as discussed by the authors proposes an attention-guided progressive neural texture fusion (APNT)-based HDR restoration model which aims to address content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise and blur.
Abstract: High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an Attention-guided Progressive Neural Texture Fusion (APNT-Fusion) HDR restoration model which aims to address these issues within one framework. An efficient two-stream structure is proposed which separately focuses on texture feature transfer over saturated regions and multi-exposure tonal and texture feature fusion. A neural feature transfer mechanism is proposed which establishes spatial correspondence between different exposures based on multi-scale VGG features in the masked saturated HDR domain for discriminative contextual clues over the ambiguous image areas. A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner. In addition, we introduce several novel attention mechanisms, i.e., the motion attention module detects and suppresses the content discrepancies among the reference images; the saturation attention module facilitates differentiating the misalignment caused by saturation from those caused by motion; and the scale attention module ensures texture blending consistency between different coder/decoder scales. We carry out comprehensive qualitative and quantitative evaluations and ablation studies, which validate that these novel modules work coherently under the same framework and outperform state-of-the-art methods.