scispace - formally typeset
Search or ask a question
Author

Houqiang Li

Bio: Houqiang Li is an academic researcher from University of Science and Technology of China. The author has contributed to research in topics: Computer science & Motion compensation. The author has an hindex of 57, co-authored 520 publications receiving 12325 citations. Previous affiliations of Houqiang Li include China University of Science and Technology & Nanjing Medical University.


Papers
More filters
Book ChapterDOI
Matej Kristan1, Ales Leonardis2, Jiří Matas3, Michael Felsberg4  +155 moreInstitutions (47)
23 Jan 2019
TL;DR: The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative; results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years.
Abstract: The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking subchallenge has been introduced to the set of standard VOT sub-challenges. The new subchallenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both standard short-term and the new long-term tracking subchallenges. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).

639 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: Liu et al. as discussed by the authors presented a unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visualsemantic embedding.
Abstract: Automatically describing video content with natural language is a fundamental challenge of computer vision. Re-current Neural Networks (RNNs), which models sequence dynamics, has attracted increasing attention on visual interpretation. However, most existing approaches generate a word locally with the given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct but the semantics (e.g., subjects, verbs or objects) are not true. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given previous words and visual content, while the latter is to create a visual-semantic embedding space for enforcing the relationship between the semantics of the entire sentence and visual content. The experiments on YouTube2Text dataset show that our proposed LSTM-E achieves to-date the best published performance in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performances are also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.

563 citations

Proceedings ArticleDOI
Matej Kristan1, Ales Leonardis2, Jiri Matas3, Michael Felsberg4, Roman Pflugfelder5, Luka Čehovin Zajc1, Tomas Vojir3, Gustav Häger4, Alan Lukezic1, Abdelrahman Eldesokey4, Gustavo Fernandez5, Alvaro Garcia-Martin6, Andrej Muhič1, Alfredo Petrosino7, Alireza Memarmoghadam8, Andrea Vedaldi9, Antoine Manzanera10, Antoine Tran10, A. Aydin Alatan11, Bogdan Mocanu, Boyu Chen12, Chang Huang, Changsheng Xu13, Chong Sun12, Dalong Du, David Zhang, Dawei Du13, Deepak Mishra, Erhan Gundogdu14, Erhan Gundogdu11, Erik Velasco-Salido, Fahad Shahbaz Khan4, Francesco Battistone, Gorthi R. K. Sai Subrahmanyam, Goutam Bhat4, Guan Huang, Guilherme Sousa Bastos, Guna Seetharaman15, Hongliang Zhang16, Houqiang Li17, Huchuan Lu12, Isabela Drummond, Jack Valmadre9, Jae-chan Jeong18, Jaeil Cho18, Jae-Yeong Lee18, Jana Noskova, Jianke Zhu19, Jin Gao13, Jingyu Liu13, Ji-Wan Kim18, João F. Henriques9, José M. Martínez, Junfei Zhuang20, Junliang Xing13, Junyu Gao13, Kai Chen21, Kannappan Palaniappan22, Karel Lebeda, Ke Gao22, Kris M. Kitani23, Lei Zhang, Lijun Wang12, Lingxiao Yang, Longyin Wen24, Luca Bertinetto9, Mahdieh Poostchi22, Martin Danelljan4, Matthias Mueller25, Mengdan Zhang13, Ming-Hsuan Yang26, Nianhao Xie16, Ning Wang17, Ondrej Miksik9, Payman Moallem8, Pallavi Venugopal M, Pedro Senna, Philip H. S. Torr9, Qiang Wang13, Qifeng Yu16, Qingming Huang13, Rafael Martin-Nieto, Richard Bowden27, Risheng Liu12, Ruxandra Tapu, Simon Hadfield27, Siwei Lyu28, Stuart Golodetz9, Sunglok Choi18, Tianzhu Zhang13, Titus Zaharia, Vincenzo Santopietro, Wei Zou13, Weiming Hu13, Wenbing Tao21, Wenbo Li28, Wengang Zhou17, Xianguo Yu16, Xiao Bian24, Yang Li19, Yifan Xing23, Yingruo Fan20, Zheng Zhu13, Zhipeng Zhang13, Zhiqun He20 
01 Jul 2017
TL;DR: The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative; results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years.
Abstract: The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The VOT2017 goes beyond its predecessors by (i) improving the VOT public dataset and introducing a separate VOT2017 sequestered dataset, (ii) introducing a realtime tracking experiment and (iii) releasing a redesigned toolkit that supports complex experiments. The dataset, the evaluation kit and the results are publicly available at the challenge website1.

485 citations

Posted Content
TL;DR: A novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual- semantic embedding and outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
Abstract: Automatically describing video content with natural language is a fundamental challenge of multimedia. Recurrent Neural Networks (RNN), which models sequence dynamics, has attracted increasing attention on visual interpretation. However, most existing approaches generate a word locally with given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct but the semantics (e.g., subjects, verbs or objects) are not true. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given previous words and visual content, while the latter is to create a visual-semantic embedding space for enforcing the relationship between the semantics of the entire sentence and visual content. Our proposed LSTM-E consists of three components: a 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics. The experiments on YouTube2Text dataset show that our proposed LSTM-E achieves to-date the best reported performance in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. We also demonstrate that LSTM-E is superior in predicting Subject-Verb-Object (SVO) triplets to several state-of-the-art techniques.

419 citations

Journal ArticleDOI
TL;DR: A novel marker-controlled watershed based on mathematical morphology is proposed, which can effectively segment clustered cells with less oversegmentation and design a tracking method based on modified mean shift algorithm, in which several kernels with adaptive scale, shape, and direction are designed.
Abstract: It is important to observe and study cancer cells' cycle progression in order to better understand drug effects on cancer cells. Time-lapse microscopy imaging serves as an important method to measure the cycle progression of individual cells in a large population. Since manual analysis is unreasonably time consuming for the large volumes of time-lapse image data, automated image analysis is proposed. Existing approaches dealing with time-lapse image data are rather limited and often give inaccurate analysis results, especially in segmenting and tracking individual cells in a cell population. In this paper, we present a new approach to segment and track cell nuclei in time-lapse fluorescence image sequence. First, we propose a novel marker-controlled watershed based on mathematical morphology, which can effectively segment clustered cells with less oversegmentation. To further segment undersegmented cells or to merge oversegmented cells, context information among neighboring frames is employed, which is proved to be an effective strategy. Then, we design a tracking method based on modified mean shift algorithm, in which several kernels with adaptive scale, shape, and direction are designed. Finally, we combine mean-shift and Kalman filter to achieve a more robust cell nuclei tracking method than existing ones. Experimental results show that our method can obtain 98.8% segmentation accuracy, 97.4% cell division tracking accuracy, and 97.6% cell tracking accuracy

391 citations


Cited by
More filters
Proceedings ArticleDOI
07 Dec 2015
TL;DR: A minor contribution, inspired by recent advances in large-scale image search, an unsupervised Bag-of-Words descriptor is proposed that yields competitive accuracy on VIPeR, CUHK03, and Market-1501 datasets, and is scalable on the large- scale 500k dataset.
Abstract: This paper contributes a new high quality dataset for person re-identification, named "Market-1501". Generally, current datasets: 1) are limited in scale, 2) consist of hand-drawn bboxes, which are unavailable under realistic settings, 3) have only one ground truth and one query image for each identity (close environment). To tackle these problems, the proposed Market-1501 dataset is featured in three aspects. First, it contains over 32,000 annotated bboxes, plus a distractor set of over 500K images, making it the largest person re-id dataset to date. Second, images in Market-1501 dataset are produced using the Deformable Part Model (DPM) as pedestrian detector. Third, our dataset is collected in an open system, where each identity has multiple images under each camera. As a minor contribution, inspired by recent advances in large-scale image search, this paper proposes an unsupervised Bag-of-Words descriptor. We view person re-identification as a special task of image search. In experiment, we show that the proposed descriptor yields competitive accuracy on VIPeR, CUHK03, and Market-1501 datasets, and is scalable on the large-scale 500k dataset.

3,564 citations

Book ChapterDOI
01 Jan 2011
TL;DR: Weakconvergence methods in metric spaces were studied in this article, with applications sufficient to show their power and utility, and the results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables.
Abstract: The author's preface gives an outline: "This book is about weakconvergence methods in metric spaces, with applications sufficient to show their power and utility. The Introduction motivates the definitions and indicates how the theory will yield solutions to problems arising outside it. Chapter 1 sets out the basic general theorems, which are then specialized in Chapter 2 to the space C[0, l ] of continuous functions on the unit interval and in Chapter 3 to the space D [0, 1 ] of functions with discontinuities of the first kind. The results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables. " The book develops and expands on Donsker's 1951 and 1952 papers on the invariance principle and empirical distributions. The basic random variables remain real-valued although, of course, measures on C[0, l ] and D[0, l ] are vitally used. Within this framework, there are various possibilities for a different and apparently better treatment of the material. More of the general theory of weak convergence of probabilities on separable metric spaces would be useful. Metrizability of the convergence is not brought up until late in the Appendix. The close relation of the Prokhorov metric and a metric for convergence in probability is (hence) not mentioned (see V. Strassen, Ann. Math. Statist. 36 (1965), 423-439; the reviewer, ibid. 39 (1968), 1563-1572). This relation would illuminate and organize such results as Theorems 4.1, 4.2 and 4.4 which give isolated, ad hoc connections between weak convergence of measures and nearness in probability. In the middle of p. 16, it should be noted that C*(S) consists of signed measures which need only be finitely additive if 5 is not compact. On p. 239, where the author twice speaks of separable subsets having nonmeasurable cardinal, he means "discrete" rather than "separable." Theorem 1.4 is Ulam's theorem that a Borel probability on a complete separable metric space is tight. Theorem 1 of Appendix 3 weakens completeness to topological completeness. After mentioning that probabilities on the rationals are tight, the author says it is an

3,554 citations

Journal ArticleDOI
TL;DR: A broad survey of the recent advances in convolutional neural networks can be found in this article, where the authors discuss the improvements of CNN on different aspects, namely, layer design, activation function, loss function, regularization, optimization and fast computation.

3,125 citations

01 Jan 2006

3,012 citations