scispace - formally typeset
Search or ask a question
Author

Qingming Huang

Bio: Qingming Huang is an academic researcher from Chinese Academy of Sciences. The author has contributed to research in topics: Computer science & Feature (computer vision). The author has an hindex of 62, co-authored 603 publications receiving 16684 citations. Previous affiliations of Qingming Huang include Association for Computing Machinery & Intel.


Papers
More filters
Proceedings ArticleDOI
19 Apr 2019
TL;DR: CenterNet as discussed by the authors detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall by enriching information collected by both the top-left and bottom-right corners and providing more recognizable information from the central regions.
Abstract: In object detection, keypoint-based approaches often experience the drawback of a large number of incorrect object bounding boxes, arguably due to the lack of an additional assessment inside cropped regions. This paper presents an efficient solution that explores the visual patterns within individual cropped regions with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules, cascade corner pooling, and center pooling, that enrich information collected by both the top-left and bottom-right corners and provide more recognizable information from the central regions. On the MS-COCO dataset, CenterNet achieves an AP of 47.0 %, outperforming all existing one-stage detectors by at least 4.9%. Furthermore, with a faster inference speed than the top-ranked two-stage detectors, CenterNet demonstrates a comparable performance to these detectors. Code is available at https://github.com/Duankaiwen/CenterNet.

1,199 citations

Posted Content
TL;DR: This paper presents an efficient solution that explores the visual patterns within individual cropped regions with minimal costs, and builds the framework upon a representative one-stage keypoint-based detector named CornerNet, which improves both precision and recall.
Abstract: In object detection, keypoint-based approaches often suffer a large number of incorrect object bounding boxes, arguably due to the lack of an additional look into the cropped regions. This paper presents an efficient solution which explores the visual patterns within each cropped region with minimal costs. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, named CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision and recall. Accordingly, we design two customized modules named cascade corner pooling and center pooling, which play the roles of enriching information collected by both top-left and bottom-right corners and providing more recognizable information at the central regions, respectively. On the MS-COCO dataset, CenterNet achieves an AP of 47.0%, which outperforms all existing one-stage detectors by at least 4.9%. Meanwhile, with a faster inference speed, CenterNet demonstrates quite comparable performance to the top-ranked two-stage detectors. Code is available at this https URL.

1,136 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: A novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection and applies the proposed framework to optimize existing multi-level feature aggregation models and significantly improve their efficiency and accuracy.
Abstract: Existing state-of-the-art salient object detection networks rely on aggregating multi-level features of pre-trained convolutional neural networks (CNNs). However, compared to high-level features, low-level features contribute less to performance. Meanwhile, they raise more computational cost because of their larger spatial resolutions. In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. On the one hand, the framework constructs partial decoder which discards larger resolution features of shallow layers for acceleration. On the other hand, we observe that integrating features of deep layers will obtain relatively precise saliency map. Therefore we directly utilize generated saliency map to recurrently optimize features of deep layers. This strategy efficiently suppresses distractors in the features and significantly improves their representation ability. Experiments conducted on five benchmark datasets exhibit that the proposed model not only achieves state-of-the-art but also runs much faster than existing models. Besides, we apply the proposed framework to optimize existing multi-level feature aggregation models and significantly improve their efficiency and accuracy.

758 citations

Book ChapterDOI
Matej Kristan1, Ales Leonardis2, Jiří Matas3, Michael Felsberg4, Roman Pflugfelder5, Luka Cehovin1, Tomas Vojir3, Gustav Häger4, Alan Lukežič1, Gustavo Fernandez5, Abhinav Gupta6, Alfredo Petrosino7, Alireza Memarmoghadam8, Alvaro Garcia-Martin9, Andres Solis Montero10, Andrea Vedaldi11, Andreas Robinson4, Andy J. Ma12, Anton Varfolomieiev13, A. Aydin Alatan14, Aykut Erdem15, Bernard Ghanem16, Bin Liu, Bohyung Han17, Brais Martinez18, Chang-Ming Chang19, Changsheng Xu20, Chong Sun21, Daijin Kim17, Dapeng Chen22, Dawei Du20, Deepak Mishra23, Dit-Yan Yeung24, Erhan Gundogdu25, Erkut Erdem15, Fahad Shahbaz Khan4, Fatih Porikli26, Fatih Porikli27, Fei Zhao20, Filiz Bunyak28, Francesco Battistone7, Gao Zhu27, Giorgio Roffo29, Gorthi R. K. Sai Subrahmanyam23, Guilherme Sousa Bastos30, Guna Seetharaman31, Henry Medeiros32, Hongdong Li27, Honggang Qi20, Horst Bischof33, Horst Possegger33, Huchuan Lu21, Hyemin Lee17, Hyeonseob Nam34, Hyung Jin Chang35, Isabela Drummond30, Jack Valmadre11, Jae-chan Jeong36, Jaeil Cho36, Jae-Yeong Lee36, Jianke Zhu37, Jiayi Feng20, Jin Gao20, Jin-Young Choi, Jingjing Xiao2, Ji-Wan Kim36, Jiyeoup Jeong, João F. Henriques11, Jochen Lang10, Jongwon Choi, José M. Martínez9, Junliang Xing20, Junyu Gao20, Kannappan Palaniappan28, Karel Lebeda38, Ke Gao28, Krystian Mikolajczyk35, Lei Qin20, Lijun Wang21, Longyin Wen19, Luca Bertinetto11, Madan Kumar Rapuru23, Mahdieh Poostchi28, Mario Edoardo Maresca7, Martin Danelljan4, Matthias Mueller16, Mengdan Zhang20, Michael Arens, Michel Valstar18, Ming Tang20, Mooyeol Baek17, Muhammad Haris Khan18, Naiyan Wang24, Nana Fan39, Noor M. Al-Shakarji28, Ondrej Miksik11, Osman Akin15, Payman Moallem8, Pedro Senna30, Philip H. S. Torr11, Pong C. Yuen12, Qingming Huang20, Qingming Huang39, Rafael Martin-Nieto9, Rengarajan Pelapur28, Richard Bowden38, Robert Laganiere10, Rustam Stolkin2, Ryan Walsh32, Sebastian B. Krah, Shengkun Li19, Shengping Zhang39, Shizeng Yao28, Simon Hadfield38, Simone Melzi29, Siwei Lyu19, Siyi Li24, Stefan Becker, Stuart Golodetz11, Sumithra Kakanuru23, Sunglok Choi36, Tao Hu20, Thomas Mauthner33, Tianzhu Zhang20, Tony P. Pridmore18, Vincenzo Santopietro7, Weiming Hu20, Wenbo Li40, Wolfgang Hübner, Xiangyuan Lan12, Xiaomeng Wang18, Xin Li39, Yang Li37, Yiannis Demiris35, Yifan Wang21, Yuankai Qi39, Zejian Yuan22, Zexiong Cai12, Zhan Xu37, Zhenyu He39, Zhizhen Chi21 
08 Oct 2016
TL;DR: The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment.
Abstract: The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers being published at major computer vision conferences and journals in the recent years. The number of tested state-of-the-art trackers makes the VOT 2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. The VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).

744 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A novel CNN based tracking framework is proposed, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to hedge several CNN based trackers into a single stronger one.
Abstract: In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, as features from a certain CNN layer characterize an object of interest from only one aspect or one level, the performance of such trackers trained with features from one layer (usually the second to last layer) can be further improved. In this paper, we propose a novel CNN based tracking framework, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to hedge several CNN based trackers into a single stronger one. Extensive experiments on a benchmark dataset of 100 challenging image sequences demonstrate the effectiveness of the proposed algorithm compared to several state-of-theart trackers.

736 citations


Cited by
More filters
Journal ArticleDOI
18 Jun 2018
TL;DR: This work proposes a novel architectural unit, which is term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
Abstract: The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ${\sim }$ ∼ 25 percent. Models and code are available at https://github.com/hujie-frank/SENet .

14,807 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the capability of global context information by different-region-based context aggregation through the pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet) to produce good quality results on the scene parsing task.
Abstract: Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective to produce good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in ImageNet scene parsing challenge 2016, PASCAL VOC 2012 benchmark and Cityscapes benchmark. A single PSPNet yields the new record of mIoU accuracy 85.4% on PASCAL VOC 2012 and accuracy 80.2% on Cityscapes.

10,189 citations

Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
06 Jun 1986-JAMA
TL;DR: The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or her own research.
Abstract: I have developed "tennis elbow" from lugging this book around the past four weeks, but it is worth the pain, the effort, and the aspirin. It is also worth the (relatively speaking) bargain price. Including appendixes, this book contains 894 pages of text. The entire panorama of the neural sciences is surveyed and examined, and it is comprehensive in its scope, from genomes to social behaviors. The editors explicitly state that the book is designed as "an introductory text for students of biology, behavior, and medicine," but it is hard to imagine any audience, interested in any fragment of neuroscience at any level of sophistication, that would not enjoy this book. The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or

7,563 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations