Author

Yue Wang

Bio: Yue Wang is an academic researcher from Zhejiang University. The author has contributed to research in topics: Pose & Graph (abstract data type). The author has an h-index of 18 and has co-authored 123 publications receiving 1,112 citations. Previous affiliations of Yue Wang include Zhejiang University of Technology & Stanford University.


Papers
Proceedings ArticleDOI
01 May 2017
TL;DR: The authors construct a dense reference map from the sparse laser range data, redefining the depth estimation task as estimating the distance between the real depth and the reference depth.
Abstract: Many standard robotic platforms are equipped with at least a fixed 2D laser range finder and a monocular camera. Although those platforms do not have sensors for 3D depth sensing, knowledge of depth is essential in many robotics activities. Therefore, there has recently been increasing interest in depth estimation from monocular images. As this task is inherently ambiguous, the data-driven estimated depth might be unreliable in robotics applications. In this paper, we attempt to improve the precision of monocular depth estimation by introducing 2D planar observations from the existing laser range finder at no extra cost. Specifically, we construct a dense reference map from the sparse laser range data, redefining the depth estimation task as estimating the distance between the real depth and the reference depth. To solve the problem, we construct a novel residual-of-residual neural network and tightly combine the classification and regression losses for continuous depth estimation. Experimental results suggest that our method achieves considerable improvement over state-of-the-art methods on both NYUD2 and KITTI, validating the effectiveness of our method in leveraging the additional sensory information. We further demonstrate the potential use of our method in obstacle avoidance, where it provides more comprehensive depth information than solutions using a monocular camera or a 2D laser range finder alone.
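As a rough illustration of the combined classification and regression objective described above, here is a minimal sketch assuming PyTorch and a discretisation of the residual depth into bins; the bin edges, tensor shapes, and weighting factor are assumptions for illustration, not the paper's actual implementation.

```python
# Hedged sketch (not the paper's code): one way to combine classification and
# regression losses for continuous depth, assuming a network that predicts
# logits over discretised residual-depth bins.
import torch
import torch.nn.functional as F

def combined_depth_loss(logits, target_residual, bin_centers, alpha=0.5):
    """logits: (B, K, H, W) scores over K residual-depth bins.
    target_residual: (B, H, W) ground-truth depth minus reference depth.
    bin_centers: (K,) residual value represented by each bin."""
    # Classification term: which bin does the true residual fall into?
    target_bin = torch.bucketize(target_residual, bin_centers)
    target_bin = target_bin.clamp(max=bin_centers.numel() - 1)
    cls_loss = F.cross_entropy(logits, target_bin)

    # Regression term: expected residual under the softmax vs. the true residual.
    probs = F.softmax(logits, dim=1)                              # (B, K, H, W)
    expected = (probs * bin_centers.view(1, -1, 1, 1)).sum(dim=1)
    reg_loss = F.smooth_l1_loss(expected, target_residual)

    return alpha * cls_loss + (1.0 - alpha) * reg_loss
```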

113 citations

Proceedings ArticleDOI
Huan Yin, Li Tang, Xiaqing Ding, Yue Wang, Rong Xiong
26 Jun 2018
TL;DR: A semi-handcrafted representation learning method for LiDAR point clouds using siamese LocNets, which casts the place recognition problem as a similarity modeling problem, together with a global localization framework built on range-only observations.
Abstract: Global localization in 3D point clouds is a challenging problem of estimating the pose of vehicles without any prior knowledge. In this paper, a solution to this problem is presented by achieving place recognition and metric pose estimation in the global prior map. Specifically, we present a semi-handcrafted representation learning method for LiDAR point clouds using siamese LocNets, which casts the place recognition problem as a similarity modeling problem. With the representations learned by LocNet, a global localization framework with range-only observations is proposed. To demonstrate the performance and effectiveness of our global localization system, the KITTI dataset is employed for comparison with other algorithms, and our long-term multi-session datasets are used for evaluation. The results show that our system can achieve high accuracy.
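To make the similarity-modeling idea concrete, a minimal sketch follows, assuming PyTorch; the small MLP encoder, input dimension, and contrastive loss are placeholders standing in for LocNet's semi-handcrafted representation, not the published network.

```python
# Hedged sketch (not LocNet itself): a generic siamese similarity model over
# per-scan descriptors. The input vector stands in for a handcrafted ring
# statistic; sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    def __init__(self, in_dim=80, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x_a, x_b):
        # Shared weights: both branches use the same encoder.
        return self.encoder(x_a), self.encoder(x_b)

def contrastive_loss(z_a, z_b, same_place, margin=1.0):
    """same_place: 1 if the two scans come from the same place, else 0."""
    d = F.pairwise_distance(z_a, z_b)
    pos = same_place * d.pow(2)
    neg = (1 - same_place) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()
```

At query time, the descriptor of the current scan would be matched against the descriptors stored in the prior map (for example by nearest-neighbour search), and the pose of the matched place then provides the position observation used by the localization framework.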

94 citations

Journal ArticleDOI
Huan Yin, Yue Wang, Xiaqing Ding, Li Tang, Shoudong Huang, Rong Xiong
TL;DR: A semi-handcrafted feature learning method for 3D light detection and ranging (LiDAR) point clouds using artificial statistics and a siamese network, which transforms the place recognition problem into a similarity modeling problem and achieves both high accuracy and efficiency for long-term autonomy.
Abstract: Global localization in 3D point clouds is a challenging task for mobile vehicles in outdoor scenarios, requiring the vehicle to localize itself correctly in a given map without prior knowledge of its pose. This is a critical component for autonomous vehicles or robots on the road when handling localization failures. In this paper, based on reduced-dimension scan representations learned from neural networks, a solution to global localization is proposed by achieving place recognition first and then metric pose estimation in the global prior map. Specifically, we present a semi-handcrafted feature learning method for 3D light detection and ranging (LiDAR) point clouds using artificial statistics and a siamese network, which transforms the place recognition problem into a similarity modeling problem. Additionally, the dimension-reduced representations of the sensor data require less storage space and make searching easier. With the representations learned by the networks and the global poses, a prior map is built and used in the localization framework. In the localization step, position-only observations obtained from place recognition are used in a particle filter algorithm to achieve precise pose estimation. To demonstrate the effectiveness of our place recognition and localization approach, the KITTI benchmark and our multi-session datasets are employed for comparison with other geometric-based algorithms. The results show that our system can achieve both high accuracy and efficiency for long-term autonomy.
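The particle-filter step with position-only observations can be sketched as follows; this is a generic sketch assuming NumPy, with illustrative noise parameters and state layout, not the paper's implementation.

```python
# Hedged sketch (not the paper's code): a particle filter fusing odometry with
# position-only observations from place recognition.
import numpy as np

def predict(particles, odom_delta, motion_std=0.2):
    """particles: (N, 3) array of [x, y, yaw]; odom_delta: [dx, dy, dyaw] in the robot frame."""
    dx, dy, dyaw = odom_delta
    cos_t, sin_t = np.cos(particles[:, 2]), np.sin(particles[:, 2])
    particles[:, 0] += cos_t * dx - sin_t * dy
    particles[:, 1] += sin_t * dx + cos_t * dy
    particles[:, 2] += dyaw
    particles += np.random.normal(0.0, motion_std, particles.shape)
    return particles

def update(particles, weights, observed_xy, obs_std=2.0):
    """observed_xy: position of the recognised place in the prior map (no heading)."""
    d = np.linalg.norm(particles[:, :2] - observed_xy, axis=1)
    weights *= np.exp(-0.5 * (d / obs_std) ** 2)
    weights /= weights.sum() + 1e-12
    return weights

def resample(particles, weights):
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```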

84 citations

Posted Content
TL;DR: The proposed deep architecture achieves scene classification results superior to the state-of-the-art on the publicly available SUN RGB-D dataset, and the performance of semantic segmentation, used as the regularizer, also reaches a new record with refinement derived from predicted scene labels.
Abstract: Scene classification is a fundamental perception task for environmental understanding in today's robotics. In this paper, we attempt to exploit deep learning, a popular machine learning technique, to enhance scene understanding, particularly in robotics applications. As scene images have larger diversity than iconic object images, it is more challenging for deep learning methods to automatically learn features from scene images with fewer samples. Inspired by human scene understanding based on object knowledge, we address the problem of scene classification by encouraging deep neural networks to incorporate object-level information. This is implemented with a regularization of semantic segmentation. With only 5 thousand training images, as opposed to 2.5 million, we show that the proposed deep architecture achieves scene classification results superior to the state-of-the-art on the publicly available SUN RGB-D dataset. In addition, the performance of semantic segmentation, used as the regularizer, also reaches a new record with refinement derived from predicted scene labels. Finally, we apply our model trained on the SUN RGB-D dataset to images captured by a mobile robot to classify scenes in our university, demonstrating the generalization ability of the proposed algorithm.
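A minimal sketch of the segmentation-as-regularizer idea, assuming PyTorch; the backbone, head sizes, class counts, and the weighting factor lambda are illustrative placeholders rather than the paper's architecture.

```python
# Hedged sketch (not the paper's network): a scene classifier regularised by a
# semantic-segmentation head that shares the same backbone features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneNet(nn.Module):
    def __init__(self, num_scenes=19, num_object_classes=37):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.scene_head = nn.Linear(64, num_scenes)
        # Per-pixel object labels act as the regulariser.
        self.seg_head = nn.Conv2d(64, num_object_classes, 1)

    def forward(self, x):
        feat = self.backbone(x)                          # (B, 64, H/4, W/4)
        scene = self.scene_head(feat.mean(dim=(2, 3)))   # global average pooling
        seg = self.seg_head(feat)
        return scene, seg

def joint_loss(scene_logits, scene_labels, seg_logits, seg_labels, lam=0.5):
    cls = F.cross_entropy(scene_logits, scene_labels)
    seg = F.cross_entropy(seg_logits, seg_labels)        # segmentation as regulariser
    return cls + lam * seg
```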

76 citations

Proceedings ArticleDOI
16 May 2016
TL;DR: Inspired by human scene understanding based on object knowledge, this paper addresses the problem of scene classification by encouraging deep neural networks to incorporate object-level information, implemented with a regularization of semantic segmentation.
Abstract: Scene classification is a fundamental perception task for environmental understanding in today's robotics. In this paper, we attempt to exploit deep learning, a popular machine learning technique, to enhance scene understanding, particularly in robotics applications. As scene images have larger diversity than iconic object images, it is more challenging for deep learning methods to automatically learn features from scene images with fewer samples. Inspired by human scene understanding based on object knowledge, we address the problem of scene classification by encouraging deep neural networks to incorporate object-level information. This is implemented with a regularization of semantic segmentation. With only 5 thousand training images, as opposed to 2.5 million, we show that the proposed deep architecture achieves scene classification results superior to the state-of-the-art on the publicly available SUN RGB-D dataset. In addition, the performance of semantic segmentation, used as the regularizer, also reaches a new record with refinement derived from predicted scene labels. Finally, we apply our model trained on the SUN RGB-D dataset to a set of images captured in our university using a mobile robot, demonstrating the generalization ability of the proposed algorithm.

72 citations


Cited by
Journal ArticleDOI


08 Dec 2001 - BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at the time: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Proceedings ArticleDOI
19 Feb 2018
TL;DR: In this article, the authors make the observation that the performance of multi-task learning is strongly dependent on the relative weighting between each task's loss, and propose a principled approach to weight multiple loss functions by considering the homoscedastic uncertainty of each task.
Abstract: Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.
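As a worked illustration of the homoscedastic-uncertainty weighting (a sketch in the spirit of the paper, not the authors' released code), each task loss L_i can be scaled by a learnable log-variance s_i = log(sigma_i^2), giving a total loss of sum_i exp(-s_i) * L_i + s_i; assuming PyTorch:

```python
# Hedged sketch of uncertainty-based loss weighting: the task losses passed in
# are placeholders; only the weighting scheme is the point.
import torch
import torch.nn as nn

class UncertaintyWeighting(nn.Module):
    def __init__(self, num_tasks=2):
        super().__init__()
        # One learnable log-variance per task, initialised to 0 (sigma^2 = 1).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage: weighting = UncertaintyWeighting(num_tasks=2)
#        loss = weighting([seg_loss, depth_loss]); loss.backward()
```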

1,515 citations

Posted Content
TL;DR: In this article, the authors propose to weigh multiple loss functions by considering the homoscedastic uncertainty of each task and demonstrate their model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image.
Abstract: Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.

900 citations

Journal Article
TL;DR: A new approach to visual navigation under changing conditions, dubbed SeqSLAM, which removes the need for global matching performance from the vision front-end: instead it must only pick the best match within any short sequence of images.
Abstract: Learning and then recognizing a route, whether travelled during the day or at night, in clear or inclement weather, and in summer or winter, is a challenging task for state-of-the-art algorithms in computer vision and robotics. In this paper, we present a new approach to visual navigation under changing conditions dubbed SeqSLAM. Instead of calculating the single location most likely given a current image, our approach calculates the best candidate matching location within every local navigation sequence. Localization is then achieved by recognizing coherent sequences of these "local best matches". This approach removes the need for global matching performance from the vision front-end; instead it must only pick the best match within any short sequence of images. The approach is applicable over environment changes that render traditional feature-based techniques ineffective. Using two car-mounted camera datasets we demonstrate the effectiveness of the algorithm and compare it to one of the most successful feature-based SLAM algorithms, FAB-MAP. The perceptual change in the datasets is extreme: repeated traverses through environments during the day and then in the middle of the night, at times separated by months or years and in opposite seasons, and in clear weather and extremely heavy rain. While the feature-based method fails, the sequence-based algorithm is able to match trajectory segments at 100% precision with recall rates of up to 60%.
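The sequence-matching idea can be sketched as follows, assuming NumPy; the descriptor format, sequence length, and velocity range are illustrative parameters, not SeqSLAM's published configuration.

```python
# Hedged sketch of SeqSLAM-style sequence matching (not the original code):
# score whole trajectory segments rather than single frames.
import numpy as np

def sequence_match(query_desc, map_desc, seq_len=10, velocities=(0.8, 1.0, 1.25)):
    """query_desc: (Q, D) descriptors of recent images; map_desc: (M, D) mapped images.
    Returns the map index best matching the last query image, scored over the
    whole recent sequence rather than a single frame."""
    # Pairwise difference matrix between the query sequence and the map.
    diff = np.linalg.norm(query_desc[:, None, :] - map_desc[None, :, :], axis=2)  # (Q, M)
    Q, M = diff.shape
    q_idx = np.arange(Q - seq_len, Q)            # indices of the recent sequence

    best_score, best_m = np.inf, -1
    for m_end in range(seq_len, M):
        for v in velocities:
            # Assume the vehicle moved through the map at roughly constant speed v.
            m_idx = np.round(m_end - v * (Q - 1 - q_idx)).astype(int)
            if m_idx.min() < 0:
                continue
            score = diff[q_idx, m_idx].sum()     # sum of differences along the line
            if score < best_score:
                best_score, best_m = score, m_end
    return best_m, best_score
```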

686 citations

Proceedings ArticleDOI
01 Nov 2018
TL;DR: A temporal-aware pipeline to automatically detect deepfake videos is proposed that uses a convolutional neural network to extract frame-level features and a recurrent neural network that learns to classify if a video has been subject to manipulation or not.
Abstract: In recent months, a free machine-learning-based software tool has made it easy to create believable face swaps in videos that leave few traces of manipulation, in what are known as "deepfake" videos. Scenarios where these realistic fake videos are used to create political distress, blackmail someone, or fake terrorism events are easily envisioned. This paper proposes a temporal-aware pipeline to automatically detect deepfake videos. Our system uses a convolutional neural network (CNN) to extract frame-level features. These features are then used to train a recurrent neural network (RNN) that learns to classify whether a video has been subject to manipulation or not. We evaluate our method against a large set of deepfake videos collected from multiple video websites. We show how our system can achieve competitive results in this task while using a simple architecture.
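A minimal sketch of the frame-level CNN feeding a recurrent classifier, assuming PyTorch; the layer sizes, sequence handling, and two-class head are illustrative, not the paper's exact pipeline.

```python
# Hedged sketch of a CNN + RNN video-manipulation detector: a small CNN stands
# in for the frame-level feature extractor, and an LSTM summarises the clip.
import torch
import torch.nn as nn

class DeepfakeDetector(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                     # per-frame feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)    # real vs. manipulated

    def forward(self, video):
        # video: (B, T, 3, H, W) -> run the CNN on every frame independently.
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)                  # (B*T, 3, H, W)
        feats = self.cnn(frames).view(b, t, -1)       # (B, T, feat_dim)
        _, (h_n, _) = self.rnn(feats)                 # last hidden state summarises the clip
        return self.classifier(h_n[-1])               # (B, 2) logits
```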

645 citations