
Showing papers by "Yulan Guo published in 2018"


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the authors propose a network architecture that incorporates all steps of stereo matching, including matching cost calculation, matching cost aggregation, disparity calculation, and disparity refinement, and achieves state-of-the-art performance on the KITTI 2012 and KITTI 2015 benchmarks while maintaining a very fast running time.
Abstract: Stereo matching algorithms usually consist of four steps, including matching cost calculation, matching cost aggregation, disparity calculation, and disparity refinement. Existing CNN-based methods only adopt CNN to solve parts of the four steps, or use different networks to deal with different steps, making it difficult to obtain an overall optimal solution. In this paper, we propose a network architecture to incorporate all steps of stereo matching. The network consists of three parts. The first part calculates the multi-scale shared features. The second part performs matching cost calculation, matching cost aggregation and disparity calculation to estimate the initial disparity using shared features. The initial disparity and the shared features are used to calculate the feature constancy that measures the correctness of the correspondence between two input images. The initial disparity and the feature constancy are then fed into a sub-network to refine the initial disparity. The proposed method has been evaluated on the Scene Flow and KITTI datasets. It achieves state-of-the-art performance on the KITTI 2012 and KITTI 2015 benchmarks while maintaining a very fast running time. Source code is available at http://github.com/leonzfa/iResNet.
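The feature-constancy idea above can be illustrated with a minimal sketch: warp one view's features by the estimated disparity and measure the residual difference. This is an assumption-laden toy (plain scalar features on a single scan line, integer disparities), not the paper's network, which computes this on learned multi-scale CNN features with sub-pixel disparities.

```python
def feature_constancy(left_feats, right_feats, disparity):
    """Feature-constancy sketch for one scan line (illustrative only).
    A left pixel x with disparity d should correspond to right pixel
    x - d, so the absolute feature difference after warping measures
    the correctness of the correspondence (0 = perfect match)."""
    errors = []
    for x, d in enumerate(disparity):
        xr = x - d
        if 0 <= xr < len(right_feats):
            errors.append(abs(left_feats[x] - right_feats[xr]))
        else:
            errors.append(float("inf"))  # warped outside the image
    return errors

# Toy scan line: the right view is the left view shifted by a disparity of 2.
left = [0.1, 0.5, 0.9, 0.3, 0.7]
right = [0.9, 0.3, 0.7, 0.0, 0.0]   # right[x - 2] == left[x]
print(feature_constancy(left, right, [2, 2, 2, 2, 2]))
```

Pixels whose correspondence is correct yield zero error, which is exactly the signal the refinement sub-network consumes alongside the initial disparity.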

252 citations


Journal ArticleDOI
TL;DR: Both simulated and real data experiments demonstrate that the proposed SSTV-LRTF method achieves superior performance for HSI mixed-noise removal, as compared to the state-of-the-art TV regularized and LR-based methods.
Abstract: Several bandwise total variation (TV) regularized low-rank (LR)-based models have been proposed to remove mixed noise in hyperspectral images (HSIs). These methods convert high-dimensional HSI data into 2-D data based on LR matrix factorization. This strategy introduces the loss of useful multiway structure information. Moreover, these bandwise TV-based methods exploit the spatial information in a separate manner. To cope with these problems, we propose a spatial–spectral TV regularized LR tensor factorization (SSTV-LRTF) method to remove mixed noise in HSIs. On the one hand, the hyperspectral data are assumed to lie in an LR tensor, which can exploit the inherent tensorial structure of hyperspectral data. The LRTF-based method can effectively separate the LR clean image from sparse noise. On the other hand, HSIs are assumed to be piecewise smooth in the spatial domain. The TV regularization is effective in preserving the spatial piecewise smoothness and removing Gaussian noise. These facts inspire the integration of the LRTF with TV regularization. To address the limitations of bandwise TV, we use the SSTV regularization to simultaneously consider the local spatial structure and the spectral correlation of neighboring bands. Both simulated and real data experiments demonstrate that the proposed SSTV-LRTF method achieves superior performance for HSI mixed-noise removal, as compared to the state-of-the-art TV regularized and LR-based methods.
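The distinction between bandwise TV and a spatial–spectral penalty can be sketched on a tiny cube. This is a simplification under stated assumptions: it sums absolute first-order differences along both spatial axes and the spectral axis, whereas the paper's SSTV uses its own difference operators and weights inside the LRTF model.

```python
def sstv(hsi):
    """Simplified spatial-spectral TV penalty on a hyperspectral cube
    given as nested lists indexed [band][row][col]. Bandwise TV would
    only sum spatial differences within each band; the extra spectral
    term couples neighboring bands."""
    B, H, W = len(hsi), len(hsi[0]), len(hsi[0][0])
    total = 0.0
    for b in range(B):
        for i in range(H):
            for j in range(W):
                v = hsi[b][i][j]
                if i + 1 < H:        # vertical spatial difference
                    total += abs(hsi[b][i + 1][j] - v)
                if j + 1 < W:        # horizontal spatial difference
                    total += abs(hsi[b][i][j + 1] - v)
                if b + 1 < B:        # spectral difference (the "SS" part)
                    total += abs(hsi[b + 1][i][j] - v)
    return total

flat = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]  # constant cube
print(sstv(flat))  # 0.0: piecewise-smooth data incurs no penalty
```

A constant (piecewise-smooth) cube costs nothing, while isolated noise spikes raise the penalty in all three directions, which is why minimizing it suppresses Gaussian noise without destroying smooth structure.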

144 citations


Journal ArticleDOI
TL;DR: Experimental results show that road boundaries can be robustly extracted with an average completeness over 95%, an average correctness over 98%, and an average quality over 94% on two data sets.
Abstract: Effective extraction of road boundaries plays a significant role in intelligent transportation applications, including autonomous driving, vehicle navigation, and mapping. This paper presents a new method to automatically extract 3-D road boundaries from mobile laser scanning (MLS) data. The proposed method includes two main stages: supervoxel generation and 3-D road boundary extraction. Supervoxels are generated by selecting smooth points as seeds and assigning points into facets centered on these seeds using several attributes (e.g., geometric, intensity, and spatial distance). 3-D road boundaries are then extracted using the alpha-shape algorithm and the graph-cuts-based energy minimization algorithm. The proposed method was tested on two data sets acquired by a RIEGL VMX-450 MLS system. Experimental results show that road boundaries can be robustly extracted with an average completeness over 95%, an average correctness over 98%, and an average quality over 94% on the two data sets. The effectiveness and superiority of the proposed method over the state-of-the-art methods are demonstrated.

73 citations


Book ChapterDOI
02 Dec 2018
TL;DR: This paper proposes an end-to-end trainable video SR framework to super-resolve both images and optical flows and demonstrates that HR optical flows provide more accurate correspondences than their LR counterparts and improve both accuracy and consistency performance.
Abstract: Video super-resolution (SR) aims to generate a sequence of high-resolution (HR) frames with plausible and temporally consistent details from their low-resolution (LR) counterparts. The generation of accurate correspondence plays a significant role in video SR. Traditional video SR methods have demonstrated that simultaneous SR of both images and optical flows can provide accurate correspondences and better SR results. However, existing deep learning based methods use LR optical flows for correspondence generation. In this paper, we propose an end-to-end trainable video SR framework to super-resolve both images and optical flows. Specifically, we first propose an optical flow reconstruction network (OFRnet) to infer HR optical flows in a coarse-to-fine manner. Then, motion compensation is performed according to the HR optical flows. Finally, compensated LR inputs are fed to a super-resolution network (SRnet) to generate the SR results. Extensive experiments demonstrate that HR optical flows provide more accurate correspondences than their LR counterparts and improve both accuracy and temporal consistency. Comparative results on the Vid4 and DAVIS-10 datasets show that our framework achieves state-of-the-art performance.
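The motion-compensation step above can be sketched as backward warping with a flow field. This hypothetical helper uses nearest-neighbor sampling on a 2-D list for brevity; the actual framework warps with sub-pixel (bilinear) sampling driven by the HR flows that OFRnet infers.

```python
def warp(frame, flow_x, flow_y):
    """Backward warping for motion compensation (illustrative sketch,
    not the paper's OFRnet/SRnet code). For each target pixel, the flow
    points back to its source location in the neighboring frame."""
    H, W = len(frame), len(frame[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            src_i = round(i + flow_y[i][j])  # nearest-neighbor sampling
            src_j = round(j + flow_x[i][j])
            if 0 <= src_i < H and 0 <= src_j < W:
                out[i][j] = frame[src_i][src_j]
    return out

# A uniform flow of -1 in x pulls each pixel from its left neighbor,
# shifting the content one pixel to the right.
f = [[0, 1, 2], [3, 4, 5]]
zeros = [[0] * 3 for _ in range(2)]
minus1 = [[-1] * 3 for _ in range(2)]
print(warp(f, minus1, zeros))
```

The more accurate the flow, the better the warped neighbor aligns with the reference frame, which is why HR flows give the SR network cleaner correspondences than LR flows.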

67 citations


Journal ArticleDOI
TL;DR: An accurate and robust method for infrared small target detection is proposed; it uses multiscale gray and variance difference measures to enhance small targets, alleviate the impact of background fluctuation, and improve robustness.
Abstract: As a long-standing problem, infrared small target detection is challenging due to the dimness of targets and the complexity of backgrounds. Considering the limitations of traditional approaches, we propose an accurate and robust method for infrared small target detection using multiscale gray and variance difference measures. A multiscale adaptive gray difference measure is first used to enhance small targets and improve detection accuracy. Then, a multiscale variance difference measure is proposed to alleviate the impact of background fluctuation and improve the robustness of our method. By integrating these two measures, targets can be extracted accurately using threshold-adaptive segmentation. Extensive experiments have been conducted on datasets with various scenes. Results have demonstrated the effectiveness of our method and its superior performance compared to state-of-the-art methods.
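The gray difference measure can be illustrated with a minimal center-versus-surround sketch. This is a simplified stand-in under stated assumptions (fixed scale set, mean of the full surrounding ring, no adaptivity); the paper's measure is scale-adaptive and paired with the variance difference term.

```python
def gray_difference(img, i, j, scales=(1, 2)):
    """Toy multiscale gray difference at pixel (i, j): compare the
    center against the mean gray level of its surrounding ring at each
    scale s (a (2s+1)x(2s+1) window minus the center) and keep the
    maximum response. Small bright targets stand out against their
    local background; flat background responds near zero."""
    H, W = len(img), len(img[0])
    best = 0.0
    for s in scales:
        ring, n = 0.0, 0
        for di in range(-s, s + 1):
            for dj in range(-s, s + 1):
                if di == 0 and dj == 0:
                    continue
                ii, jj = i + di, j + dj
                if 0 <= ii < H and 0 <= jj < W:
                    ring += img[ii][jj]
                    n += 1
        if n:
            best = max(best, img[i][j] - ring / n)
    return best

img = [[10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10],
       [10, 10, 90, 10, 10],   # a single bright "small target"
       [10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10]]
print(gray_difference(img, 2, 2))  # large response at the target
print(gray_difference(img, 0, 0))  # near-zero response on background
```

Thresholding such a response map is what the abstract's threshold-adaptive segmentation step then does.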

35 citations


Posted Content
TL;DR: Wang et al. propose an end-to-end trainable video super-resolution framework to super-resolve both images and optical flows, in which an optical flow reconstruction network (OFRnet) infers HR optical flows in a coarse-to-fine manner.
Abstract: Video super-resolution (SR) aims to generate a sequence of high-resolution (HR) frames with plausible and temporally consistent details from their low-resolution (LR) counterparts. The generation of accurate correspondence plays a significant role in video SR. Traditional video SR methods have demonstrated that simultaneous SR of both images and optical flows can provide accurate correspondences and better SR results. However, existing deep learning based methods use LR optical flows for correspondence generation. In this paper, we propose an end-to-end trainable video SR framework to super-resolve both images and optical flows. Specifically, we first propose an optical flow reconstruction network (OFRnet) to infer HR optical flows in a coarse-to-fine manner. Then, motion compensation is performed according to the HR optical flows. Finally, compensated LR inputs are fed to a super-resolution network (SRnet) to generate the SR results. Extensive experiments demonstrate that HR optical flows provide more accurate correspondences than their LR counterparts and improve both accuracy and temporal consistency. Comparative results on the Vid4 and DAVIS-10 datasets show that our framework achieves state-of-the-art performance.

18 citations


Proceedings ArticleDOI
01 Aug 2018
TL;DR: A multi-scale feature learning block is first introduced to obtain informative contextual features in 3D point clouds and a global and local feature aggregation block is extended to improve the feature learning ability of the network.
Abstract: Semantic segmentation of 3D scenes is a fundamental problem in 3D computer vision. In this paper, we propose a deep neural network for 3D semantic segmentation of raw point clouds. A multi-scale feature learning block is first introduced to obtain informative contextual features in 3D point clouds. A global and local feature aggregation block is then extended to improve the feature learning ability of the network. Based on these strategies, a powerful architecture named 3DMAX-Net is finally provided for semantic segmentation in raw 3D point clouds. Experiments have been conducted on the Stanford large-scale 3D Indoor Spaces Dataset using only geometry information. Experimental results have clearly shown the superiority of the proposed network.
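The idea of multi-scale contextual features on raw points can be sketched with a hand-crafted stand-in. This is illustrative only: 3DMAX-Net learns such features with a network, whereas the sketch below concatenates a trivial per-scale statistic (neighbor count at several hypothetical radii) into a multi-scale descriptor.

```python
def multiscale_features(points, radii=(0.5, 1.0, 2.0)):
    """Toy multi-scale descriptor for raw 3-D points: for each point,
    count neighbors within several radii and concatenate the counts.
    Each radius captures context at a different spatial extent, which
    is the intuition behind a multi-scale feature learning block."""
    feats = []
    for p in points:
        f = []
        for r in radii:
            n = sum(
                1 for q in points
                if q is not p
                and sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r
            )
            f.append(n)
        feats.append(f)
    return feats

# Three points clustered on a line plus one isolated outlier.
pts = [(0, 0, 0), (0.3, 0, 0), (0.9, 0, 0), (5, 5, 5)]
print(multiscale_features(pts))
```

Clustered points get distinct per-scale signatures while the outlier reads as empty at every scale; a learned network replaces the counts with informative feature vectors per scale.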

15 citations


Journal ArticleDOI
TL;DR: This paper presents SISE as a mobile crowdsourcing system that uses a new abstraction for indoor general entities and their semantics, enGraph, to automatically update changed semantics of indoor floorplans using images and inertial data, and proposes efficient methods to generate enGraph.
Abstract: Indoor semantic floorplans are important for a range of location based service (LBS) applications, attracting many research efforts in recent years. In many cases, out-of-date indoor semantic floorplans gradually degrade and can even break down LBS performance. Thus, it is important to automatically update the semantics of indoor floorplans as the environment changes. However, little research has focused on the continuous semantic updating problem. This paper presents SISE, a mobile crowdsourcing system that uses a new abstraction for indoor general entities and their semantics, enGraph, to automatically update changed semantics of indoor floorplans using images and inertial data. We first propose efficient methods to generate the enGraph, so that an image can be associated with an indoor semantic floorplan. Accordingly, we formulate the enGraph matching problem and then propose a quality-based maximum common subgraph matching algorithm so that entities extracted from an image can be matched to entities in the indoor semantic floorplan. Furthermore, we propose a quadrant comparison algorithm and a region-shrinking-based localization algorithm to detect and localize changed entities. Thus, new semantics can be labeled and out-of-date semantics can be removed. Extensive experiments have been conducted on real and synthetic data. Experimental results show that 80 percent of out-of-date semantics of indoor general entities can be updated by SISE.

10 citations


Journal ArticleDOI
TL;DR: This letter proposes a semi-online MOT method using online discriminative appearance learning and tracklet association with a sliding window, which improves by 8.31% and 12.38% in terms of Multiple Object Tracking Accuracy and Multiple Object Tracking Precision, respectively, over the baseline.
Abstract: Online multiple object tracking (MOT) is highly challenging when multiple objects have similar appearances or are under long occlusion. In this letter, we propose a semi-online MOT method using online discriminative appearance learning and tracklet association with a sliding window. We connect similar detections of neighboring frames in a temporal window, and improve the appearance features by online discriminative appearance learning. Then, tracklet association is performed by minimizing a subgraph decomposition cost. Occlusions and missed detections are recovered after tracklet stitching. Our method has been tested on two public datasets. Experimental results have demonstrated a significant performance improvement: the proposed method improves by 8.31% and 12.38% in terms of Multiple Object Tracking Accuracy and Multiple Object Tracking Precision, respectively, compared to the baseline.
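The tracklet-stitching step can be illustrated with a deliberately simple greedy matcher. This sketch is an assumption: the paper minimizes a subgraph decomposition cost over a sliding window, while the toy below links temporally compatible tracklets by scalar appearance distance only.

```python
def stitch(tracklets, thresh=0.3):
    """Greedy tracklet stitching sketch. Each tracklet is a dict with
    'start'/'end' frame indices and a scalar appearance 'feat'.
    Tracklet A may be linked to tracklet B when A ends before B starts
    and their appearance distance is small, recovering identities
    across occlusions and missed detections."""
    pairs, used = [], set()
    # candidate links, ordered by appearance distance
    cands = sorted(
        ((abs(a["feat"] - b["feat"]), i, j)
         for i, a in enumerate(tracklets)
         for j, b in enumerate(tracklets)
         if a["end"] < b["start"]),
        key=lambda t: t[0],
    )
    for d, i, j in cands:
        if d <= thresh and i not in used and j not in used:
            pairs.append((i, j))
            used.update({i, j})
    return pairs

t = [
    {"start": 0, "end": 10, "feat": 0.2},    # object A, before occlusion
    {"start": 0, "end": 12, "feat": 0.9},    # object B, before occlusion
    {"start": 15, "end": 30, "feat": 0.25},  # object A, after occlusion
    {"start": 16, "end": 30, "feat": 0.85},  # object B, after occlusion
]
print(stitch(t))  # links each object's two halves despite the gap
```

The discriminative appearance learning in the letter exists precisely to make those feature distances small for the same identity and large across identities, so simple association costs become reliable.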

9 citations


Book ChapterDOI
23 Nov 2018
TL;DR: A local adaptive contrast measure for robust infrared small target detection using gray and variance difference is proposed, which achieves better detection performance than state-of-the-art approaches.
Abstract: Infrared small target detection plays an important role in infrared monitoring and early warning systems. This paper proposes a local adaptive contrast measure for robust infrared small target detection using gray and variance difference. First, a size-adaptive gray-level target enhancement process is performed. Then, an improved multiscale variance difference method is proposed for target enhancement and cloud clutter removal. To demonstrate the effectiveness of the proposed approach, a test dataset consisting of two infrared image sequences with different backgrounds was collected. Experiments on the test dataset demonstrate that the proposed infrared small target detection method can achieve better detection performance than the state-of-the-art approaches.

3 citations


Proceedings ArticleDOI
01 Aug 2018
TL;DR: The proposed Complementary Learners with Instance-specific Proposals (CLIP) tracker consists of three main components, including a translation filter, a scale filter, and an error correction module, which aims to provide an excellent real-time inference.
Abstract: Correlation filter based trackers have been extensively investigated for their superior efficiency and fairly good robustness. However, it remains challenging to achieve long-term tracking when the object is under occlusion and severe deformation. In this paper, we propose a tracker named Complementary Learners with Instance-specific Proposals (CLIP). The CLIP tracker consists of three main components, including a translation filter, a scale filter, and an error correction module. Complementary features are incorporated into the translation filter to cope with illumination changes and deformation, and an adaptive updating mechanism is proposed to prevent model corruption. The translation filter aims to provide excellent real-time inference. Furthermore, the error correction module is activated to correct localization errors using an instance-specific proposal generator, especially when the target suffers from dramatic appearance changes. Experimental results on the OTB, Temple-Color 128 and UAV20L datasets demonstrate that the CLIP tracker performs favorably against existing competitive trackers in terms of accuracy and robustness. Moreover, our proposed CLIP tracker runs at the speed of 33 fps on the OTB. It is highly suitable for real-time applications.

Proceedings ArticleDOI
01 Aug 2018
TL;DR: An augmented descriptor is proposed by combining ORB feature and the context descriptor to increase its discriminability and matching performance and achieves higher precision/recall and faster speed than the original algorithm proposed by Antonio et al.
Abstract: Visual loop closure is important for pose tracking and relocalization in many robotics and Augmented Reality (AR) systems. For large and highly repetitive environments, sparse keypoint-based methods face several challenges, especially the discriminability of descriptors. In this paper, we propose an augmented descriptor that combines the ORB feature and a context descriptor to increase its discriminability and matching performance. An end-to-end network is adopted to perform simultaneous feature learning and code hashing for the context. In addition, feature position clustering is used to reduce the number of contexts, and hash mapping is adopted to reduce the dimensionality of ORB features. Finally, the context descriptors and the dimensionality-reduced ORB features are stacked. Experimental results on the NewCollege and TUM datasets demonstrate that our algorithm achieves higher precision/recall and faster speed than the original algorithm proposed by Antonio et al. [1].
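Matching such a stacked binary descriptor can be sketched as comparing each part with Hamming distance. The weight `w` below is a hypothetical parameter for illustration; the paper simply stacks the reduced ORB bits and the hashed context code into one descriptor.

```python
def hamming(a, b):
    """Bit-level Hamming distance between two binary descriptors
    packed as Python ints."""
    return bin(a ^ b).count("1")

def stacked_distance(orb_a, ctx_a, orb_b, ctx_b, w=0.5):
    """Sketch of matching an augmented descriptor: compare the ORB bits
    and the hashed context bits separately with Hamming distance and
    combine them with a hypothetical weight w."""
    return hamming(orb_a, orb_b) + w * hamming(ctx_a, ctx_b)

# Two keypoints with identical ORB bits but different surrounding
# context are no longer confused once the context code is stacked on.
print(stacked_distance(0b1011, 0b0001, 0b1011, 0b1110))
```

In a repetitive environment this is exactly the failure mode being fixed: ORB-identical keypoints (distance 0 on the first term) are separated by their context term.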

Patent
06 Nov 2018
TL;DR: In this article, a semi-online visual multi-target tracking method based on a tracklet graph association model is proposed: multi-frame detections of multiple targets are associated into short tracklets within a short time window, and the appearance features and motion speeds of the initial and ending periods of each tracklet are extracted.
Abstract: The invention discloses a semi-online visual multi-target tracking method based on a tracklet graph association model. Multi-frame detections of multiple targets are associated into short tracklets within a short time window, and the appearance features and motion speeds of the initial and ending periods of each tracklet are extracted. After the mutual affinity between tracklets is evaluated, the tracklets are further associated into long tracks through an undirected graph model, and partial results are output after each batch. The method strikes a balance between the high accuracy of offline tracking, which cannot run in real time, and the lower accuracy of real-time online tracking, and is fast, simple, and robust. By building an appearance model of each target, the appearance features become more discriminative; by fixing the temporal order before tracklet association, appearance similarity is analyzed on temporally close features, which effectively reduces the number of identity switches without increasing latency, so the algorithm achieves high precision.