SciSpace (formerly Typeset)
Author

Jian Sun

Bio: Jian Sun is an academic researcher from Xi'an Jiaotong University. The author has contributed to research in the topics of object detection and computer science. The author has an h-index of 109 and has co-authored 360 publications receiving 239,387 citations. Previous affiliations of Jian Sun include the French Institute for Research in Computer Science and Automation and Tsinghua University.


Papers
Proceedings Article
08 Jan 2022
TL;DR: This paper proposes a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences, and generates fightback soft labels for regularization during training.
Abstract: Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes. It causes severe biases of the head classes (with majority samples) against the tailed ones. This makes “how to appropriately define and alleviate the bias” one of the most important issues. Prior works mainly use label distribution or mean score information to indicate a coarse-grained bias. In this paper, we exploit the confusion matrix, which carries fine-grained misclassification details, to relieve the pairwise biases, generalizing the coarse one. To this end, we propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences. PCB generates fightback soft labels for regularization during training. Besides, an iterative learning paradigm is developed to support a progressive and smooth regularization in such debiasing. PCB can be plugged into any existing method as a complement. Experimental results on LVIS demonstrate that our method achieves state-of-the-art performance without bells and whistles. Superior results across various architectures show its generalization ability. The code and trained models are available at https://github.com/megvii-research/PCB.
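A minimal sketch of the bookkeeping the abstract describes: a confusion matrix updated during training by accumulating each sample's predicted class probabilities into the row of its ground-truth class, then row-normalized to expose pairwise biases. Accumulating soft probabilities (rather than argmax counts), the function names, and the strongest_confuser query are illustrative assumptions; PCB's fightback soft-label rule and iterative paradigm are not reproduced here (see the linked repository for the actual method).

```python
from typing import List

Matrix = List[List[float]]


def new_confusion(num_classes: int) -> Matrix:
    """All-zero confusion matrix: rows = ground-truth class, columns = predicted class."""
    return [[0.0] * num_classes for _ in range(num_classes)]


def accumulate(conf: Matrix, label: int, probs: List[float]) -> None:
    """Add one sample's predicted class probabilities to the row of its true class."""
    row = conf[label]
    for j, p in enumerate(probs):
        row[j] += p


def row_normalize(conf: Matrix) -> Matrix:
    """Turn accumulated mass into estimates of P(predicted = j | true = i)."""
    out = []
    for row in conf:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def strongest_confuser(norm: Matrix, true_class: int) -> int:
    """Return the wrong class that absorbs the most probability mass for true_class."""
    row = norm[true_class]
    candidates = [j for j in range(len(row)) if j != true_class]
    return max(candidates, key=lambda j: row[j])
```

Off-diagonal entries of the normalized matrix are pairwise (class i toward class j) biases, which is finer-grained than a single per-class frequency or mean score.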

3 citations

Posted Content
TL;DR: In this article, a canonical view representation is proposed that aligns the arbitrary view features to a set of learnable reference view features using optimal transport; the aligned features are further aggregated to generate a rich and robust 3D shape representation for shape recognition.
Abstract: In this paper, we focus on recognizing 3D shapes from arbitrary views, i.e., arbitrary numbers and positions of viewpoints. It is a challenging and realistic setting for view-based 3D shape recognition. We propose a canonical view representation to tackle this challenge. We first transform the original features of arbitrary views to a fixed number of view features, dubbed canonical view representation, by aligning the arbitrary view features to a set of learnable reference view features using optimal transport. In this way, each 3D shape with arbitrary views is represented by a fixed number of canonical view features, which are further aggregated to generate a rich and robust 3D shape representation for shape recognition. We also propose a canonical view feature separation constraint to enforce that the view features in canonical view representation can be embedded into scattered points in a Euclidean space. Experiments on the ModelNet40, ScanObjectNN, and RGBD datasets show that our method achieves competitive results under the fixed viewpoint settings, and significantly outperforms the applicable methods under the arbitrary view setting.
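A minimal sketch of mapping an arbitrary number of view features to a fixed number of canonical ones, assuming entropic optimal transport solved with Sinkhorn iterations, uniform marginals, a squared Euclidean cost, and fixed (not learned) reference features. Each canonical feature is the transport-weighted average of the input views matched to its reference. The solver, cost, eps, and function names are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List

Vec = List[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sinkhorn(cost: List[Vec], eps: float = 0.1, iters: int = 200) -> List[Vec]:
    """Entropic OT plan between uniform masses on rows (views) and columns (references)."""
    n, k = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on the N input views
    b = [1.0 / k] * k  # uniform mass on the K reference views
    kern = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * k
    for _ in range(iters):
        # Alternately rescale so row sums match a, then column sums match b.
        u = [a[i] / sum(kern[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [b[j] / sum(kern[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kern[i][j] * v[j] for j in range(k)] for i in range(n)]


def canonical_views(views: List[Vec], refs: List[Vec], eps: float = 0.1,
                    iters: int = 200) -> List[Vec]:
    """Return len(refs) features regardless of how many views are given."""
    cost = [[sq_dist(x, r) for r in refs] for x in views]
    plan = sinkhorn(cost, eps, iters)
    dim = len(views[0])
    out = []
    for j in range(len(refs)):
        mass = sum(plan[i][j] for i in range(len(views)))
        out.append([sum(plan[i][j] * views[i][d] for i in range(len(views))) / mass
                    for d in range(dim)])
    return out
```

Three views or five views both yield exactly len(refs) canonical features, which is what lets a fixed-size aggregator follow.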

3 citations

Posted Content
TL;DR: In this article, an implicit feature pyramid network (i-FPN) is proposed that uses an implicit function, recently introduced in deep equilibrium models (DEQ), to model the transformation of FPN.
Abstract: In this paper, we present an implicit feature pyramid network (i-FPN) for object detection. Existing FPNs stack several cross-scale blocks to obtain a large receptive field. We propose to use an implicit function, recently introduced in deep equilibrium models (DEQ), to model the transformation of FPN. We develop a residual-like iteration to update the hidden states efficiently. Experimental results on the MS COCO dataset show that i-FPN can significantly boost detection performance compared to baseline detectors with ResNet-50-FPN: +3.4, +3.2, +3.5, +4.2, and +3.2 mAP on RetinaNet, Faster-RCNN, FCOS, ATSS, and AutoAssign, respectively.
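The abstract replaces a stack of cross-scale blocks with the equilibrium of a single implicit function, reached by a residual-like iteration. A minimal sketch of that idea, assuming a damped update z <- z + alpha * (f(z) - z) and a toy two-dimensional layer tanh(W z + x) standing in for a real FPN block; fixed_point, toy_layer, the weights, and alpha are hypothetical, and i-FPN's actual solver and block design are not reproduced.

```python
import math
from typing import Callable, List

Vector = List[float]


def fixed_point(f: Callable[[Vector], Vector], z0: Vector, alpha: float = 0.5,
                tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve z = f(z) with the damped, residual-style update z <- z + alpha * (f(z) - z)."""
    z = list(z0)
    for _ in range(max_iter):
        fz = f(z)
        z_new = [zi + alpha * (fi - zi) for zi, fi in zip(z, fz)]
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


def toy_layer(x: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a cross-scale block: f(z) = tanh(W z + x)."""
    w = [[0.5, 0.1], [0.2, 0.3]]  # small weights keep f a contraction

    def f(z: Vector) -> Vector:
        return [math.tanh(sum(wij * zj for wij, zj in zip(row, z)) + xi)
                for row, xi in zip(w, x)]

    return f
```

Because the toy layer is a contraction, the equilibrium is unique and does not depend on the starting state, mimicking an "infinitely deep" weight-tied stack at the cost of a few solver iterations.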

3 citations

Posted Content
05 Jun 2015
TL;DR: In this article, the spectral structure of the Laplace-Beltrami operator (LBO) on Riemannian manifolds is estimated from a discrete Laplace operator constructed from a set of sample points.
Abstract: The spectral structure of the Laplace-Beltrami operator (LBO) on manifolds has been widely used in many applications, including spectral clustering, dimensionality reduction, mesh smoothing, compression and editing, shape segmentation, matching, and parameterization. Typically, the underlying Riemannian manifold is unknown and is often given by a set of sample points. The spectral structure of the LBO is then estimated from a discrete Laplace operator constructed from this set of sample points. In our previous papers, we proposed the point integral method to discretize the LBO from point clouds, which is also capable of solving the eigenproblem. One fundamental issue is then the convergence of the eigensystem of the discrete Laplacian to that of the LBO. In this paper, for compact manifolds isometrically embedded in Euclidean spaces, possibly with boundary, we show that the eigenvalues and eigenvectors obtained by the point integral method converge to the eigenvalues and eigenfunctions of the LBO with the Neumann boundary condition, and in addition, we give an estimate of the convergence rate. This result provides a solid mathematical foundation for the point integral method in the computation of Laplacian spectra from point clouds.
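The point integral method itself is not reproduced here. As a simpler illustration of the convergence question the abstract raises, the sketch below applies a heat-kernel graph Laplacian (a swapped-in discretization, on a closed curve with no boundary) to samples of the unit circle, where the continuous LBO has eigenfunctions cos(k*theta) with eigenvalues k^2. The normalization constant, the bandwidth t, and all function names are assumptions chosen for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def circle_samples(n: int) -> Tuple[List[float], List[Point]]:
    """n equally spaced samples on the unit circle (a closed 1-D manifold of length 2*pi)."""
    thetas = [2.0 * math.pi * i / n for i in range(n)]
    return thetas, [(math.cos(th), math.sin(th)) for th in thetas]


def heat_kernel_laplacian(points: List[Point], f: List[float], t: float,
                          volume: float, dim: int = 1) -> List[float]:
    """Apply (L f)_i = c * sum_j exp(-|x_i - x_j|^2 / (4t)) * (f_i - f_j)."""
    n = len(points)
    # volume / n is the manifold measure per sample; the remaining factor makes
    # L approximate -Delta for a dim-dimensional manifold as t -> 0.
    c = volume / (n * t * (4.0 * math.pi * t) ** (dim / 2.0))
    out = []
    for (xi, yi), fi in zip(points, f):
        acc = 0.0
        for (xj, yj), fj in zip(points, f):
            w = math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (4.0 * t))
            acc += w * (fi - fj)
        out.append(c * acc)
    return out


def rayleigh_quotient(points: List[Point], f: List[float], t: float,
                      volume: float) -> float:
    """Eigenvalue estimate <f, L f> / <f, f>."""
    lf = heat_kernel_laplacian(points, f, t, volume)
    return sum(a * b for a, b in zip(f, lf)) / sum(a * a for a in f)
```

With n = 400 samples and t = 1e-3, the quotients for cos(theta), cos(2 theta), and cos(3 theta) come out within about 1% of 1, 4, and 9; shrinking t (together with denser sampling) reduces the bias, which is the kind of behavior a convergence-rate estimate quantifies.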

2 citations

Journal Article
TL;DR: A fast inter-robot loop closure selection method is proposed that integrates the consistency and topology relationship of inter-robot measurements, both of which conform to the continuity characteristics of similar scenes and spatiotemporal consistency.
Abstract: This paper presents a robust method based on graph topology to find the topologically correct and consistent subset of inter-robot relative pose measurements for multi-robot map fusion. The absence of a good prior on relative poses makes it difficult to distinguish inliers from outliers, and once wrong inter-robot loop closures are used to optimize the pose graph, they can seriously corrupt the fused global map. Existing works mainly rely on consistency in the spatial dimension to select inter-robot measurements, an assumption that does not always hold. In this paper, we propose a fast inter-robot loop closure selection method that integrates the consistency and topology relationship of inter-robot measurements, both of which conform to the continuity characteristics of similar scenes and spatiotemporal consistency. First, a clustering method integrating the topology correctness of inter-robot loop closures is proposed to split the entire measurement set into multiple clusters. Then, our method decomposes the traditional high-dimensional consistency matrix into sub-matrix blocks corresponding to the overlapping trajectory regions. Finally, we define a weight function to find the topologically correct and consistent subset with the maximum cardinality, convert the weight function into a maximum clique problem from graph theory, and solve it. We evaluate the performance of our method in simulation and in a real-world experiment. Compared to state-of-the-art methods, the results show that our method achieves competitive accuracy while reducing computation time by 75%.
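The last step of the abstract reduces subset selection to a maximum clique problem. A minimal sketch, assuming each inter-robot loop closure is a vertex and an edge means two closures are pairwise consistent; it uses Bron-Kerbosch with pivoting as a swapped-in exact solver and omits the paper's clustering, block decomposition, and weight function. consistency_graph and the toy edge set in the example are hypothetical.

```python
from typing import Callable, Dict, Set

Graph = Dict[int, Set[int]]


def consistency_graph(n: int, consistent: Callable[[int, int], bool]) -> Graph:
    """Undirected graph with an edge (i, j) whenever closures i and j agree."""
    adj: Graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if consistent(i, j):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def max_clique(adj: Graph) -> Set[int]:
    """Largest set of mutually consistent closures (Bron-Kerbosch with pivoting)."""
    best: Set[int] = set()

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        # Pivot on the vertex with most neighbours in p to skip redundant branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best
```

For example, with four mutually consistent closures 0 to 3 and two outliers 4 and 5 that agree only with closure 0 and with each other, the largest clique is {0, 1, 2, 3}, so the outliers are rejected.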

2 citations


Cited by
Proceedings Article
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
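A minimal sketch of the residual reformulation: the block learns a residual function F(x) and outputs ReLU(F(x) + x), so driving its weights to zero yields an identity mapping on non-negative inputs instead of collapsing the signal. Small dense layers stand in for the convolutions and batch normalization of actual ResNets; all names are illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """y = ReLU(F(x) + x) with residual F(x) = W2 ReLU(W1 x); the shortcut adds x unchanged."""
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


def zeros(n: int) -> Matrix:
    return [[0.0] * n for _ in range(n)]
```

Stacking many zero-weight residual blocks leaves a non-negative input untouched, whereas a plain zero-weight layer would map it to zero; this is one intuition for why very deep residual stacks remain easy to optimize.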

123,388 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
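Why very small filters pay off can be checked with receptive-field arithmetic: each additional stride-1 3x3 layer widens the receptive field by 2 pixels, so three stacked 3x3 layers see a 7x7 window while using 27C^2 weights instead of the 49C^2 of a single 7x7 layer with C input and output channels. This comparison is made in the full VGG paper rather than in the abstract above; the helper names are illustrative and biases are ignored.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3, stride: int = 1) -> int:
    """Receptive field (in input pixels) of num_layers identical conv layers."""
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input strides
        jump *= stride
    return rf


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weights of num_layers k x k conv layers mapping C channels to C channels (no biases)."""
    return num_layers * kernel * kernel * channels * channels
```

For C = 64, three 3x3 layers need 110,592 weights against 200,704 for one 7x7 layer with the same 7x7 receptive field, and they interleave three nonlinearities instead of one.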

55,235 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Book Chapter
05 Oct 2015
TL;DR: Ronneberger et al. proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; the network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is broad consensus that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
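The symmetric contracting and expanding paths fix how input and output sizes relate. A minimal sketch, assuming the configuration given in the full U-Net paper rather than in the abstract above: four resolution levels, two unpadded 3x3 convolutions per level, 2x2 max pooling on the way down, and 2x2 up-convolutions on the way up. The function name and the padded switch are hypothetical.

```python
def unet_output_size(size: int, depth: int = 4, convs_per_level: int = 2,
                     kernel: int = 3, padded: bool = False) -> int:
    """Spatial output size of a U-Net-style encoder/decoder for a square input."""
    # Each unpadded k x k convolution trims (k - 1) pixels; padded ones keep the size.
    shrink = 0 if padded else convs_per_level * (kernel - 1)
    # Contracting path: convolutions, then 2x2 max pooling halves the size.
    for level in range(depth):
        size -= shrink
        if size <= 0 or size % 2:
            raise ValueError(f"size {size} at level {level} cannot be halved cleanly")
        size //= 2
    size -= shrink  # bottleneck convolutions
    # Expanding path: 2x2 up-convolution doubles the size, then convolutions trim it.
    for _ in range(depth):
        size = size * 2 - shrink
    return size
```

Under these assumptions a 572x572 input gives a 388x388 output, while a padded variant keeps 512x512 unchanged; an unpadded 512x512 input cannot be pooled cleanly at the third level and raises an error.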

49,590 citations

Posted Content
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

44,703 citations