Author

Jian Sun

Bio: Jian Sun is an academic researcher from Xi'an Jiaotong University. The author has contributed to research in topics including object detection and computer science. The author has an h-index of 109 and has co-authored 360 publications receiving 239,387 citations. Previous affiliations of Jian Sun include the French Institute for Research in Computer Science and Automation and Tsinghua University.


Papers
Patent
Jian Sun, Kaiming He, Xiaoou Tang
01 Feb 2010
TL;DR: In this article, techniques and technologies for de-hazing hazy images are described; some of the disclosed methods include removing the effects of the haze from a hazy image and outputting the recovered, de-hazed image.
Abstract: Techniques and technologies for de-hazing hazy images are described. Some techniques provide for determining the effects of the haze and removing the same from an image to recover a de-hazed image. Thus, the de-hazed image does not contain the effects of the haze. Some disclosed technologies allow for similar results. This document also discloses systems and methods for de-hazing images. Some of the disclosed de-hazing systems include an image capture device for capturing the hazy image and a processor for removing the effects of the haze from the hazy image. These systems store the recovered, de-hazed images in a memory and/or display the de-hazed images on a display. Some of the disclosed methods include removing the effects of the haze from a hazy image and outputting the recovered, de-hazed image.
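The patent text does not spell out an algorithm. As a rough illustration of this line of work, here is a minimal sketch of a dark-channel-prior style de-hazing pipeline based on the standard haze model I = J·t + A·(1 − t); the patch size, percentile, and lower bound on transmission are illustrative assumptions, not values taken from the patent.

```python
# A minimal sketch of a dark-channel-prior style de-hazing pipeline.
# The patent does not specify an algorithm; this follows the standard
# haze model I = J*t + A*(1 - t). Parameter values are illustrative.
import numpy as np
from scipy.ndimage import minimum_filter

def dehaze(image, patch=15, omega=0.95, t_min=0.1):
    """image: float RGB array in [0, 1]; returns a de-hazed estimate."""
    # Dark channel: per-pixel minimum over color channels and a local patch.
    dark = minimum_filter(image.min(axis=2), size=patch)
    # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels.
    n = max(1, int(dark.size * 0.001))
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = image[idx].mean(axis=0)
    # Transmission estimate from the dark channel of the normalized image.
    norm_dark = minimum_filter((image / A).min(axis=2), size=patch)
    t = 1.0 - omega * norm_dark
    # Recover the scene radiance J = (I - A) / max(t, t_min) + A.
    J = (image - A) / np.maximum(t, t_min)[..., None] + A
    return np.clip(J, 0.0, 1.0)
```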

99 citations

Journal ArticleDOI
01 Jul 2006
TL;DR: A new matting algorithm called joint Bayesian flash matting is applied to robustly recover the matte from flash/no-flash images, even for scenes in which the foreground and the background are similar or the background is complex.
Abstract: In this paper, we propose a novel approach to extract mattes using a pair of flash/no-flash images. Our approach, which we call flash matting, was inspired by the simple observation that the most noticeable difference between the flash and no-flash images is the foreground object if the background scene is sufficiently distant. We apply a new matting algorithm called joint Bayesian flash matting to robustly recover the matte from flash/no-flash images, even for scenes in which the foreground and the background are similar or the background is complex. Experimental results involving a variety of complex indoor and outdoor scenes show that it is easy to extract high-quality mattes using an off-the-shelf, flash-equipped camera. We also describe extensions to flash matting for handling more general scenes.
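The central observation lends itself to a short sketch: when the background is distant, the flash/no-flash difference image is approximately the alpha-weighted, flash-lit foreground, so it yields a strong foreground cue. The snippet below turns that cue into a rough trimap; the thresholds are illustrative assumptions, and the paper's joint Bayesian solver that actually recovers the matte is not reproduced here.

```python
# A minimal sketch of the core flash-matting observation: for a distant
# background, I_flash - I_noflash ~ alpha * F', the flash-only foreground.
# Thresholds are illustrative; the paper's joint Bayesian refinement of
# alpha from this cue is not reproduced here.
import numpy as np

def flash_difference_trimap(flash, noflash, fg_thresh=0.15, bg_thresh=0.02):
    """flash, noflash: registered float RGB arrays in [0, 1].
    Returns a trimap: 1 = foreground, 0 = background, 0.5 = unknown."""
    diff = np.linalg.norm(flash - noflash, axis=2)  # flash-only intensity
    trimap = np.full(diff.shape, 0.5)
    trimap[diff > fg_thresh] = 1.0   # strong flash response -> foreground
    trimap[diff < bg_thresh] = 0.0   # no flash response -> distant background
    return trimap
```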

98 citations

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper proposes a simple and effective approach that uses both keypoint and line segment correspondences as data terms, which not only helps guide the estimation to a correct warp in low-texture conditions but also prevents the undesired distortion induced by warping.
Abstract: To break down the geometry assumptions of traditional motion models (e.g., homography, affine), warping-based motion models have recently become very popular and are adopted in many recent applications (e.g., image stitching, video stabilization). With high degrees of freedom, the accuracy of such a model relies heavily on its data terms (keypoint correspondences). In low-texture environments (e.g., indoor scenes) where keypoint features are insufficient or unreliable, the warping model is often erroneously estimated. In this paper we propose a simple and effective approach that uses both keypoint and line segment correspondences as data terms. Line segments are a prominent feature in man-made environments and supply rich geometric and structural information about a scene, which not only helps guide the estimation to a correct warp in low-texture conditions but also prevents the undesired distortion induced by warping. The two feature types complement each other, benefiting a wider range of scenes. Our method is general and can be ported to many existing applications. Experiments demonstrate that using dual features yields more robust and accurate results, especially for low-texture images.
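To make the combined data term concrete, here is a minimal sketch that fits a single global affine warp from both keypoint matches and line-segment constraints via linear least squares. The affine simplification and the weight w_line are assumptions for illustration; the paper uses a mesh-based warping model with far more degrees of freedom.

```python
# A minimal sketch of combining keypoint and line-segment correspondences
# in one least-squares data term, here for a single global affine warp
# rather than the paper's mesh-based model. w_line is an assumed weight.
import numpy as np

def fit_affine_dual(points_src, points_dst,
                    line_samples_src, line_normals, line_offsets, w_line=1.0):
    """points_src/dst: (N, 2) matched keypoints. line_samples_src: (M, 2)
    points sampled on source segments; each must map onto the target line
    n . x = d given by line_normals (M, 2, unit) and line_offsets (M,)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(points_src, points_dst):
        rows.append([x, y, 1, 0, 0, 0]); rhs.append(u)   # x' = u
        rows.append([0, 0, 0, x, y, 1]); rhs.append(v)   # y' = v
    for (x, y), (nx, ny), d in zip(line_samples_src, line_normals, line_offsets):
        # Point-to-line constraint: n . (A p + t) = d, linear in the params.
        rows.append([w_line * nx * x, w_line * nx * y, w_line * nx,
                     w_line * ny * x, w_line * ny * y, w_line * ny])
        rhs.append(w_line * d)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return theta.reshape(2, 3)   # rows: [a11 a12 tx; a21 a22 ty]
```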

97 citations

Journal ArticleDOI
TL;DR: This work presents a novel area-preservation mapping/flattening method using the optimal mass transport technique, based on the Monge-Brenier theory, which significantly reduces the complexity of the problem, and improves the efficiency, flexibility and scalability during visualization.
Abstract: We present a novel area-preservation mapping/flattening method using the optimal mass transport technique, based on the Monge-Brenier theory. Our optimal transport map approach is rigorous and solid in theory, efficient and parallel in computation, yet general for various applications. By comparison with the conventional Monge-Kantorovich approach, our method reduces the number of variables from O(n²) to O(n), and converts the optimal mass transport problem to a convex optimization problem, which can now be efficiently carried out by Newton's method. Furthermore, our framework includes the area weighting strategy that enables users to completely control and adjust the size of areas everywhere in an accurate and quantitative way. Our method significantly reduces the complexity of the problem, and improves the efficiency, flexibility and scalability during visualization. Our framework, by combining conformal mapping and optimal mass transport mapping, serves as a powerful tool for a broad range of applications in visualization and graphics, especially for medical imaging. We provide a variety of experimental results to demonstrate the efficiency, robustness and efficacy of our novel framework.
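As a toy illustration of an optimal transport map (not the paper's area-preserving surface flattening), the 1D case has a closed form: the Monge map is T = F2^(-1) ∘ F1, the target inverse CDF composed with the source CDF. The sketch below computes it from samples; the paper's contribution is the far harder 2D case posed as a convex Monge-Brenier problem solved with Newton's method.

```python
# A toy 1D optimal transport map, for intuition only: the Monge map
# between 1D densities is T = F2^{-1} o F1. The paper's method handles
# the 2D area-preserving case via a convex Monge-Brenier formulation.
import numpy as np

def monge_map_1d(src_samples, dst_samples, x):
    """Map value x under the optimal 1D transport between two empirical
    distributions given by samples."""
    src = np.sort(np.asarray(src_samples))
    dst = np.asarray(dst_samples)
    q = np.searchsorted(src, x) / len(src)          # F1(x), empirical CDF
    return np.quantile(dst, np.clip(q, 0.0, 1.0))   # F2^{-1}(q)
```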

96 citations

Posted Content
TL;DR: Experiments show that, even with the effective ResNet and Faster R-CNN systems, the design of NoCs is an essential element for the 1st-place winning entries in the ImageNet and MS COCO challenges 2015.
Abstract: Most object detectors contain two important components: a feature extractor and an object classifier. The feature extractor has rapidly evolved with significant research efforts leading to better deep convolutional architectures. The object classifier, however, has not received much attention and many recent systems (like SPPnet and Fast/Faster R-CNN) use simple multi-layer perceptrons. This paper demonstrates that carefully designing deep networks for object classification is just as important. We experiment with region-wise classifier networks that use shared, region-independent convolutional features. We call them "Networks on Convolutional feature maps" (NoCs). We discover that aside from deep feature maps, a deep and convolutional per-region classifier is of particular importance for object detection, whereas the latest superior image classification models (such as ResNets and GoogLeNets) do not directly lead to good detection accuracy without such a per-region classifier. We show by experiments that, even with the effective ResNet and Faster R-CNN systems, the design of NoCs is an essential element for the 1st-place winning entries in the ImageNet and MS COCO challenges 2015.
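A minimal PyTorch sketch of the NoC idea follows: all regions share one convolutional feature map, and the per-region classifier is itself deep and convolutional rather than a plain multi-layer perceptron. The backbone stride, layer sizes, and class count are illustrative assumptions.

```python
# A minimal sketch of a "Network on Convolutional feature maps" (NoC)
# style head: regions share one feature map, and the per-region
# classifier is deep and convolutional. Sizes are illustrative.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class NoCHead(nn.Module):
    def __init__(self, in_channels=256, num_classes=81):
        super().__init__()
        # Convolutional per-region layers (the "NoC"), then a small MLP.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(256 * 7 * 7, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes),
        )

    def forward(self, feature_map, boxes):
        # boxes: Tensor[K, 5] of (batch_index, x1, y1, x2, y2) in image coords.
        regions = roi_align(feature_map, boxes, output_size=(7, 7),
                            spatial_scale=1.0 / 16)  # stride-16 features
        return self.fc(self.conv(regions))
```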

92 citations


Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
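The reformulation can be stated in a few lines of code. Below is a minimal sketch of the basic two-layer residual block: the stacked layers learn a residual F(x) and the block outputs F(x) + x through an identity shortcut (channel counts are illustrative).

```python
# A minimal sketch of the basic residual block: the layers learn a
# residual F(x) and the block outputs F(x) + x via an identity shortcut.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(residual + x)   # shortcut: output = F(x) + x
```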

123,388 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
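The paper's argument for small filters is easy to verify by counting weights: a stack of three 3x3 layers covers the same 7x7 receptive field as a single 7x7 layer but with fewer parameters and two extra non-linearities. A quick check, with an example channel count:

```python
# Weight counts for a stack of three 3x3 conv layers vs one 7x7 layer,
# both with C input and output channels (C = 512 is an example value).
C = 512
stack_3x3 = 3 * (3 * 3 * C * C)   # three 3x3 layers: 27 * C^2 weights
single_7x7 = 7 * 7 * C * C        # one 7x7 layer:    49 * C^2 weights
print(stack_3x3, single_7x7)      # 7077888 vs 12845056, ~45% fewer weights
```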

55,235 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Book ChapterDOI
05 Oct 2015
TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; the network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is broad agreement that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
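The architecture reduces to a few components: a contracting path, an expanding path, and skip connections that concatenate encoder features into the decoder. Below is a minimal two-level sketch in that spirit; the channel counts are illustrative, and the paper's network is deeper and uses unpadded convolutions.

```python
# A minimal two-level sketch of the U-Net idea: contracting path for
# context, expanding path for localization, and skip connections that
# concatenate encoder features into the decoder. Padded convolutions
# are a simplification; the paper uses unpadded ones.
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 64)
        self.enc2 = double_conv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = double_conv(128, 64)   # 64 upsampled + 64 skip channels
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        s1 = self.enc1(x)                  # contracting path, full resolution
        s2 = self.enc2(self.pool(s1))      # bottleneck at half resolution
        u = self.up(s2)                    # expanding path
        u = self.dec1(torch.cat([u, s1], dim=1))  # skip connection
        return self.head(u)
```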

49,590 citations

Posted Content
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

44,703 citations