Author

Wu Liu

Bio: Wu Liu is an academic researcher from Beijing University of Posts and Telecommunications. The author has contributed to research in topics: Computer science & Deep learning. The author has an h-index of 27 and has co-authored 129 publications receiving 2845 citations. Previous affiliations of Wu Liu include Chinese Academy of Sciences & Deakin University.


Papers
Book ChapterDOI
08 Oct 2016
TL;DR: This paper proposes a novel deep-learning-based approach to PROgressive Vehicle re-IDentification, called “PROVID”, which treats vehicle Re-Id as two progressive search processes: a coarse-to-fine search in the feature space, and a near-to-distant search in the real-world surveillance environment.
Abstract: While re-identification (Re-Id) of persons has attracted intensive attention, vehicles, a significant object class in urban video surveillance, are often overlooked by the vision community. Most existing methods for vehicle Re-Id achieve only limited performance, as they predominantly focus on the generic appearance of a vehicle while neglecting its unique identifiers (e.g., the license plate). In this paper, we propose a novel deep-learning-based approach to PROgressive Vehicle re-IDentification, called “PROVID”. Our approach treats vehicle Re-Id as two progressive search processes: a coarse-to-fine search in the feature space, and a near-to-distant search in the real-world surveillance environment. The first process employs the appearance attributes of a vehicle for coarse filtering, and then exploits a Siamese neural network for license plate verification to accurately identify vehicles. The near-to-distant process retrieves vehicles the way a human would, searching from nearby to faraway cameras and from close to distant times. Moreover, to facilitate progressive vehicle Re-Id research, we collect the largest dataset to date, named VeRi-776, from large-scale urban surveillance videos; it contains not only a massive number of vehicles with diverse attributes and a high recurrence rate, but also sufficient license plates and spatiotemporal labels. A comprehensive evaluation on VeRi-776 shows that our approach outperforms state-of-the-art methods by 9.28% in terms of mAP.
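The license-plate verification step described above can be sketched with a minimal Siamese-style verifier. This is an illustration under assumed names and thresholds, not the paper's implementation: `plate_verification`, the distance threshold, and the margin are all hypothetical.

```python
import numpy as np

def plate_verification(feat_a: np.ndarray, feat_b: np.ndarray,
                       threshold: float = 0.5) -> bool:
    """Decide whether two plate embeddings belong to the same vehicle.

    feat_a, feat_b: embeddings produced by the two shared-weight branches
    of a Siamese network; same plate -> small Euclidean distance.
    """
    dist = np.linalg.norm(feat_a - feat_b)
    return bool(dist < threshold)

def contrastive_loss(dist: float, same: bool, margin: float = 1.0) -> float:
    """Contrastive objective commonly used to train Siamese verifiers:
    pull same-identity pairs together, push different-identity pairs
    at least `margin` apart."""
    if same:
        return dist ** 2
    return max(margin - dist, 0.0) ** 2
```

At inference time only the distance-threshold decision is needed; the contrastive term is used during training to shape the embedding space.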

450 citations

Proceedings ArticleDOI
11 Jul 2016
TL;DR: A large-scale benchmark dataset for vehicle Re-Id in real-world urban surveillance scenarios, named “VeRi”, which contains over 40,000 bounding boxes of 619 vehicles captured by 20 cameras in unconstrained traffic scenes, along with a proposed baseline that combines color, texture, and high-level semantic information extracted by a deep neural network.
Abstract: The vehicle, as a significant object class in urban surveillance, attracts massive attention in the computer vision field across tasks such as detection, tracking, and classification. Among them, vehicle re-identification (Re-Id) is an important yet emerging topic, which not only faces the challenges of large intra-class and subtle inter-class differences of vehicles across multiple cameras, but also suffers from the complicated environments of urban surveillance scenarios. Besides, existing vehicle-related datasets all neglect the requirements of vehicle Re-Id: 1) massive numbers of vehicles captured in real-world traffic environments; and 2) a recurrence rate high enough to enable cross-camera vehicle search. To facilitate vehicle Re-Id research, we propose a large-scale benchmark dataset for vehicle Re-Id in the real-world urban surveillance scenario, named “VeRi”. It contains over 40,000 bounding boxes of 619 vehicles captured by 20 cameras in unconstrained traffic scenes. Moreover, each vehicle is captured by 2∼18 cameras under different viewpoints, illuminations, and resolutions to provide a high recurrence rate for vehicle Re-Id. Finally, we evaluate six competitive vehicle Re-Id methods on VeRi and propose a baseline that combines the color, texture, and high-level semantic information extracted by a deep neural network.
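The baseline described above fuses color, texture, and high-level semantic features. A minimal late-fusion sketch is shown below; the dictionary keys and the fusion weights are hypothetical, not the weights used in the paper.

```python
import numpy as np

def fused_distance(query: dict, gallery: dict,
                   weights=(0.3, 0.3, 0.4)) -> float:
    """Late fusion of per-feature distances between a query vehicle and
    one gallery vehicle. `query`/`gallery` map the (hypothetical) keys
    'color', 'texture', 'semantic' to feature vectors; a smaller fused
    distance means a more likely match."""
    total = 0.0
    for w, key in zip(weights, ("color", "texture", "semantic")):
        total += w * np.linalg.norm(query[key] - gallery[key])
    return total
```

Ranking the gallery by this fused distance yields the cross-camera search result for a given query vehicle.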

397 citations

Journal ArticleDOI
TL;DR: This paper proposes PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks, which not only utilizes the multimodality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also considers vehicle reidentification as two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space.
Abstract: Compared with person reidentification, which has attracted concentrated attention, vehicle reidentification is an important yet emerging problem in video surveillance that has been relatively neglected by the multimedia and vision communities. Since most existing approaches mainly consider the general vehicle appearance for reidentification while overlooking distinct vehicle identifiers, such as the license plate number, they attain suboptimal performance. In this paper, we propose PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks. In particular, our framework not only utilizes the multimodality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also considers vehicle reidentification as two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space. Furthermore, to evaluate our progressive search framework and facilitate related research, we construct the VeRi dataset, which is the most comprehensive dataset built from real-world surveillance videos. It not only provides large numbers of vehicles with varied labels and sufficient cross-camera recurrences but also contains license plate numbers and contextual information. Extensive experiments on the VeRi dataset demonstrate both the accuracy and efficiency of our progressive vehicle reidentification framework.
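The near-to-distant search in physical space can be sketched as a re-ranking that prefers candidates from nearby cameras and close timestamps. This is a simplified illustration; the score form and the weights `alpha`, `beta`, `gamma` are assumptions, not values from the paper.

```python
import numpy as np

def progressive_rank(appearance_dist, cam_hops, time_gap_s,
                     alpha=1.0, beta=0.1, gamma=0.001):
    """Combine appearance distance with spatiotemporal proximity so that
    candidates seen by nearby cameras at close times are ranked first.

    appearance_dist: per-candidate feature-space distance to the query
    cam_hops:        per-candidate camera distance from the query camera
    time_gap_s:      per-candidate time gap to the query, in seconds
    Returns candidate indices, best match first.
    """
    scores = (alpha * np.asarray(appearance_dist, dtype=float)
              + beta * np.asarray(cam_hops, dtype=float)
              + gamma * np.asarray(time_gap_s, dtype=float))
    return np.argsort(scores)
```

With equal appearance distances, the candidate from the closer camera is searched first, which is the "near-to-distant" behavior the framework describes.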

339 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A multi-task deep visual-semantic embedding model that can automatically select query-dependent video thumbnails according to both visual and side information, demonstrated to be effective on a dataset of 1,000 query-thumbnail pairs labeled by 191 Amazon Mechanical Turk workers.
Abstract: Given the tremendous growth of online videos, the video thumbnail, as the common visualization form of video content, is becoming increasingly important to users' browsing and searching experience. However, conventional methods for video thumbnail selection often fail to produce satisfying results, as they ignore the side semantic information (e.g., title, description, and query) associated with the video. As a result, the selected thumbnail cannot always represent the video's semantics, and the click-through rate is adversely affected even when the retrieved videos are relevant. In this paper, we develop a multi-task deep visual-semantic embedding model that can automatically select query-dependent video thumbnails according to both visual and side information. Different from most existing methods, the proposed approach employs the deep visual-semantic embedding model to directly compute the similarity between the query and video thumbnails by mapping them into a common latent semantic space, where even unseen query-thumbnail pairs can be correctly matched. In particular, we train the embedding model by exploring large-scale, freely accessible click-through video and image data, and employ a multi-task learning strategy to holistically exploit the query-thumbnail relevance across these two highly related datasets. Finally, a thumbnail is selected by fusing both the representativeness and query-relevance scores. Evaluations on a dataset of 1,000 query-thumbnail pairs labeled by 191 Amazon Mechanical Turk workers demonstrate the effectiveness of our proposed method.
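The selection step described above, scoring frames by similarity in a shared latent space and fusing with a representativeness score, can be sketched as follows. The projection matrices `W_q`, `W_v` stand in for the learned embedding; their names, the fusion weight `lam`, and `select_thumbnail` itself are hypothetical.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_thumbnail(query_vec, frame_feats, W_q, W_v, rep_scores, lam=0.5):
    """Project the query and each candidate frame into a common latent
    space, fuse query relevance with representativeness, and return the
    index of the best frame."""
    q = W_q @ query_vec                              # embed the query
    sims = [cosine(q, W_v @ f) for f in frame_feats]  # embed + compare frames
    fused = [lam * s + (1 - lam) * r for s, r in zip(sims, rep_scores)]
    return int(np.argmax(fused))
```

Because both modalities live in the same space, a previously unseen query can still be scored against any thumbnail, which is the key property the abstract highlights.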

251 citations

Proceedings ArticleDOI
20 Mar 2016
TL;DR: A Siamese neural network based gait recognition framework that automatically extracts robust and discriminative gait features for human identification and impressively outperforms state-of-the-art models.
Abstract: Owing to its remarkable characteristics of remote access, robustness, and security, gait recognition has gained significant attention in biometrics-based human identification. However, existing methods mainly employ handcrafted gait features, which cannot handle well the indistinctive inter-class differences and large intra-class variations of human gait in real-world situations. In this paper, we develop a Siamese neural network based gait recognition framework to automatically extract robust and discriminative gait features for human identification. Different from a conventional deep neural network, the Siamese network employs distance metric learning to drive the similarity metric to be small for pairs of gaits from the same person, and large for pairs from different persons. In particular, to learn an effective model with limited training data, we composite gait energy images instead of using raw gait sequences. Consequently, experiments on the world's largest gait database show that our framework impressively outperforms the state of the art.
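The gait energy image mentioned above is conventionally the per-pixel average of an aligned sequence of binary silhouettes, which compresses a whole gait cycle into one input and so eases training with limited data. A minimal sketch (the function name and array layout are assumptions):

```python
import numpy as np

def gait_energy_image(silhouettes: np.ndarray) -> np.ndarray:
    """Average a gait cycle of aligned binary silhouettes, shaped (T, H, W),
    into a single gait energy image of shape (H, W) with values in [0, 1].
    Pixels that are foreground in every frame approach 1; pixels that are
    never foreground stay 0."""
    return silhouettes.astype(np.float32).mean(axis=0)
```

Pairs of such energy images are then fed to the two branches of the Siamese network for metric learning.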

156 citations


Cited by
Journal ArticleDOI
TL;DR: A comprehensive survey of recent achievements in generic object detection brought about by deep learning techniques, covering many aspects of the field: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

1,897 citations

Journal ArticleDOI
TL;DR: A comprehensive review of recent Kinect-based computer vision algorithms and applications covering topics including preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping.
Abstract: With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.

1,513 citations

Posted Content
TL;DR: Multi-Task Learning (MTL) is a learning paradigm in machine learning that aims to leverage useful information contained in multiple related tasks to improve the generalization performance of all the tasks; this paper surveys MTL from the perspectives of algorithmic modeling, applications, and theoretical analyses.
Abstract: Multi-Task Learning (MTL) is a learning paradigm in machine learning whose aim is to leverage useful information contained in multiple related tasks to help improve the generalization performance of all the tasks. In this paper, we give a survey of MTL from the perspectives of algorithmic modeling, applications, and theoretical analyses. For algorithmic modeling, we give a definition of MTL and then classify different MTL algorithms into five categories, including the feature learning approach, low-rank approach, task clustering approach, task relation learning approach, and decomposition approach, and discuss the characteristics of each approach. To further improve the performance of learning tasks, MTL can be combined with other learning paradigms, including semi-supervised learning, active learning, unsupervised learning, reinforcement learning, multi-view learning, and graphical models. When the number of tasks is large or the data dimensionality is high, we review online, parallel, and distributed MTL models, as well as dimensionality reduction and feature hashing, to reveal their computational and storage advantages. Many real-world applications use MTL to boost their performance, and we review representative works in this paper. Finally, we present theoretical analyses and discuss several future directions for MTL.
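The feature learning approach to MTL is often realized as hard parameter sharing: one shared representation feeds several task-specific heads. A minimal forward-pass sketch, with all shapes and weight names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # batch of 4 inputs, 8 features each
W_shared = rng.normal(size=(8, 16))  # shared feature extractor (learned jointly)
W_task_a = rng.normal(size=(16, 3))  # head for task A (e.g., 3-way classification)
W_task_b = rng.normal(size=(16, 1))  # head for task B (e.g., scalar regression)

h = np.maximum(x @ W_shared, 0)      # shared ReLU representation used by all tasks
out_a = h @ W_task_a                 # task-A logits, shape (4, 3)
out_b = h @ W_task_b                 # task-B prediction, shape (4, 1)
```

Training minimizes a weighted sum of the per-task losses, so gradients from both heads shape the shared extractor, which is how information transfers across related tasks.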

1,202 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparisons, and applications.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also because of the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model, and has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparisons, and applications. Furthermore, challenges in knowledge distillation are briefly reviewed, and directions for future research are discussed.
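The core soft-target objective of knowledge distillation, matching the student's temperature-softened output distribution to the teacher's, can be sketched as below. This illustrates the standard Hinton-style formulation, not any specific method from the survey; the temperature value is a hypothetical hyperparameter.

```python
import numpy as np

def softmax(z, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T (higher T -> softer)."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T: float = 4.0) -> float:
    """KL divergence between the teacher's and student's softened
    distributions, scaled by T^2 to keep gradient magnitudes comparable
    across temperatures. Zero when the student matches the teacher."""
    p = softmax(teacher_logits, T)   # teacher soft targets
    q = softmax(student_logits, T)   # student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

In practice this soft-target term is combined with the ordinary cross-entropy on ground-truth labels when training the student.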

1,027 citations