Author

Hefeng Wu

Other affiliations: Sun Yat-sen University
Bio: Hefeng Wu is an academic researcher from Guangdong University of Foreign Studies. The author has contributed to research in topics: Sketch & Feature learning. The author has an h-index of 7 and has co-authored 18 publications receiving 141 citations. Previous affiliations of Hefeng Wu include Sun Yat-sen University.

Papers
Journal Article
Tianshui Chen1, Liang Lin1, Xiaolu Hui1, Riquan Chen1, Hefeng Wu1 
TL;DR: The authors propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks to facilitate multi-label analysis and reduce the dependency on training samples.
Abstract: Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works utilize RNN/LSTM to implicitly capture sequential region/label dependencies, which cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large amounts of training samples and have limited generalization ability to new categories. To address these issues, we propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks. The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependency on training samples. Specifically, it first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide learning semantic-specific features to initialize the graph, and it exploits a graph propagation network to explore graph node interactions, enabling learning contextualized image feature representations. We conduct extensive experiments on the traditional multi-label image recognition (MLR) and multi-label few-shot learning (ML-FSL) tasks and show that our KGGR framework outperforms the current state-of-the-art methods.
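As a rough illustration of what "statistical label co-occurrence" could look like in practice, the sketch below estimates, for each ordered label pair, how often the second label appears in images that contain the first. This is only an assumption about the statistic: the abstract does not say how the KGGR knowledge graph is weighted, and the function name `cooccurrence_probabilities` and the toy annotations are hypothetical.

```python
from collections import Counter


def cooccurrence_probabilities(label_sets, labels):
    """Estimate P(b present | a present) from per-image label sets.

    Hypothetical helper: the conditional-probability form is an assumption,
    not something stated in the abstract.
    """
    count = Counter()       # number of images containing each label
    pair_count = Counter()  # number of images containing both labels
    for present in label_sets:
        present = set(present)
        for a in present:
            count[a] += 1
            for b in present:
                if a != b:
                    pair_count[(a, b)] += 1

    probs = {}
    for a in labels:
        for b in labels:
            if a != b and count[a] > 0:
                probs[(a, b)] = pair_count[(a, b)] / count[a]
    return probs


# Toy annotations (made up for illustration only).
annotations = [{"person", "dog"}, {"person", "bicycle"}, {"person"}, {"dog"}]
probs = cooccurrence_probabilities(annotations, ["person", "dog", "bicycle"])
```

Each entry `probs[(a, b)]` could then serve as a directed edge weight from label `a` to label `b` in a label graph.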

46 citations

Journal Article
TL;DR: This paper proposes a novel embedding method termed focus ranking that can be easily unified into a CNN for jointly learning image representations and metrics in the context of fine-grained fabric image retrieval and shows the superiority of the proposed model over existing metric embedding models.

30 citations

Journal Article
TL;DR: A deep learning-based MPT approach is presented that learns instance-aware representations of tracked persons and robustly infers their states online, demonstrating excellent performance in comparison with recent online MPT methods.

29 citations

Journal Article
TL;DR: A scale-communicative aggregation network (SCANet) for crowd counting that contains different streams of convolutional neural networks, and a multi-scale structural similarity metric along with Euclidean distance is introduced for optimizing the quality of generated density maps.

22 citations

Journal Article
TL;DR: The Multi-column Point-CNN (MCPNet) (1) directly takes sampled points as its input to reduce computational costs, and (2) adopts multiple columns with different filter sizes to better capture the structures of sketches.

16 citations


Cited by
Proceedings Article
15 Jun 2019
TL;DR: An attention-injective deformable convolutional network for crowd understanding that addresses the accuracy degradation problem of highly congested noisy scenes; it is more effective at capturing crowd features and more resistant to various noises.
Abstract: We propose an attention-injective deformable convolutional network called ADCrowdNet for crowd understanding that can address the accuracy degradation problem of highly congested noisy scenes. ADCrowdNet contains two concatenated networks. An attention-aware network called Attention Map Generator (AMG) first detects crowd regions in images and computes the congestion degree of these regions. Based on detected crowd regions and congestion priors, a multi-scale deformable network called Density Map Estimator (DME) then generates high-quality density maps. With the attention-aware training scheme and multi-scale deformable convolutional scheme, the proposed ADCrowdNet achieves the capability of being more effective to capture the crowd features and more resistant to various noises. We have evaluated our method on four popular crowd counting datasets (ShanghaiTech, UCF_CC_50, WorldEXPO'10, and UCSD) and an extra vehicle counting dataset TRANCOS, and our approach beats existing state-of-the-art approaches on all of these datasets.

247 citations

Journal Article
TL;DR: This survey carefully examines various graph-based deep learning architectures in many traffic applications to discuss their shared deep learning techniques, clarifying the utilization of each technique in traffic tasks.
Abstract: In recent years, various deep learning architectures have been proposed to solve complex challenges (e.g. spatial dependency, temporal dependency) in the traffic domain, and they have achieved satisfactory performance. These architectures are composed of multiple deep learning techniques in order to tackle various challenges in traffic tasks. Traditionally, convolutional neural networks (CNNs) are utilized to model spatial dependency by decomposing the traffic network into grids. However, many traffic networks are graph-structured in nature. In order to utilize such spatial information fully, it is more appropriate to formulate traffic networks as graphs mathematically. Recently, various novel deep learning techniques, called graph neural networks (GNNs), have been developed to process graph data. More and more works combine GNNs with other deep learning techniques to construct an architecture dealing with various challenges in a complex traffic task, where GNNs are responsible for extracting spatial correlations in the traffic network. These graph-based architectures have achieved state-of-the-art performance. To provide a comprehensive and clear picture of this emerging trend, this survey carefully examines various graph-based deep learning architectures in many traffic applications. We first give guidelines to formulate a traffic problem based on graphs and construct graphs from various kinds of traffic datasets. Then we decompose these graph-based architectures to discuss their shared deep learning techniques, clarifying the utilization of each technique in traffic tasks. In addition, we summarize some common traffic challenges and the corresponding graph-based deep learning solutions to each challenge. Finally, we provide benchmark datasets, open-source code and future research directions in this rapidly growing field.

115 citations

Journal Article
TL;DR: This work proposes a method to address MOT by defining a dissimilarity measure based on object motion, appearance, structure, and size, which can achieve state-of-the-art results in both benchmarks.
Abstract: The objective of multiple object tracking (MOT) is to assign a unique track identity to all the objects of interest in a video, across the whole sequence. Tracking-by-detection is the most common approach used in addressing the MOT problem. In this work, we propose a method to address MOT by defining a dissimilarity measure based on object motion, appearance, structure, and size. We calculate the appearance and structure-based dissimilarity measure by matching histograms following a grid architecture. Motion and size for each track are predicted using the information from the track's history. These dissimilarity values are then used in the Hungarian algorithm, in the data association step for track identity assignment. In addition, we introduce a method to address any false detection in stable tracks. The proposed method runs in real time following an online approach. We evaluate our method on both the MOT17 benchmark dataset for pedestrian tracking and the KITTI benchmark dataset for vehicle tracking using the same system parameters to verify the robustness of the proposed method. The method can achieve state-of-the-art results in both benchmarks.
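The data association step described above amounts to an assignment problem: given a matrix of track-to-detection dissimilarities, pick the one-to-one matching with the lowest total cost. The sketch below is not the paper's implementation and does not implement the Hungarian algorithm itself; it finds the same optimal assignment by brute force over permutations, which is only practical for very small matrices. The function name `associate` and the cost values are hypothetical, and the code assumes there are at least as many detections as tracks.

```python
from itertools import permutations


def associate(cost):
    """Return the minimum-cost one-to-one matching of tracks to detections.

    cost[i][j] is the dissimilarity between track i and detection j.
    Brute-force stand-in for the Hungarian algorithm; assumes
    len(cost) <= len(cost[0]).
    """
    n_tracks = len(cost)
    n_dets = len(cost[0])
    best_total, best_perm = float("inf"), None
    # Try every way of giving each track a distinct detection.
    for perm in permutations(range(n_dets), n_tracks):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm)), best_total


# Made-up dissimilarities for 2 tracks and 3 detections.
cost = [
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.9],
]
matches, total = associate(cost)
```

Here `matches` pairs track 0 with detection 0 and track 1 with detection 1, leaving detection 2 unmatched. A real tracker would replace the brute-force search with a polynomial-time solver and would typically reject matches whose dissimilarity exceeds a threshold.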

111 citations

Posted Content
TL;DR: The Classification Transformer (C-Tran) is proposed, a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels.
Abstract: Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image. In this work we propose the Classification Transformer (C-Tran), a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels. Our approach consists of a Transformer encoder trained to predict a set of target labels given an input set of masked labels, and visual features from a convolutional neural network. A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels as positive, negative, or unknown during training. Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome. Moreover, because our model explicitly represents the uncertainty of labels during training, it is more general by allowing us to produce improved results for images with partial or extra label annotations during inference. We demonstrate this additional capability in the COCO, Visual Genome, News500, and CUB image datasets.
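To make the ternary label encoding concrete, the sketch below marks each label as positive, negative, or unknown, hiding a random subset as unknown in the way a label mask training objective might. The numeric codes (+1, -1, 0), the function name `mask_labels`, and the masking fraction are assumptions for illustration; the abstract only states that labels are represented as positive, negative, or unknown.

```python
import random


def mask_labels(labels, mask_fraction, rng):
    """Encode labels as +1 (positive), -1 (negative), or 0 (unknown).

    labels: dict mapping label name -> bool (ground truth).
    A random subset of round(mask_fraction * len(labels)) labels is
    hidden as unknown; the numeric codes are an illustrative assumption.
    """
    names = sorted(labels)
    n_masked = round(mask_fraction * len(names))
    masked = set(rng.sample(names, n_masked))
    return {
        name: 0 if name in masked else (1 if labels[name] else -1)
        for name in names
    }


# Made-up ground truth for one image.
truth = {"person": True, "dog": False, "bicycle": True, "car": False}
encoded = mask_labels(truth, mask_fraction=0.5, rng=random.Random(0))
```

During training, a model would receive the known entries as input and be asked to predict the entries encoded as unknown; at inference time, any partial annotations could be supplied the same way.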

103 citations

Posted Content
TL;DR: This paper collects the first three releases of the MOTChallenge and provides a categorization of state-of-the-art trackers and a broad error analysis, to help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.
Abstract: Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data, and create a framework for the standardized evaluation of multiple object tracking methods. The benchmark is focused on multiple people tracking, since pedestrians are by far the most studied object in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with numerous state-of-the-art results that were submitted in the last years, (ii) MOT16, which contains new challenging videos, and (iii) MOT17, which extends MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third releases not only offer a significant increase in the number of labeled boxes but also provide labels for multiple object classes besides pedestrians, as well as the level of visibility for every single object of interest. We finally provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.

96 citations