Author

Zhiyong Wang

Bio: Zhiyong Wang is an academic researcher from the University of Sydney. The author has contributed to research topics including automatic summarization and image retrieval, has an h-index of 24, and has co-authored 160 publications receiving 2,407 citations. Previous affiliations of Zhiyong Wang include Hong Kong Polytechnic University and Information Technology University.


Papers
Proceedings ArticleDOI
14 Jun 2020
TL;DR: A simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D are presented; coupling them yields a powerful feature extractor named MS-G3D, based on which the model outperforms previous state-of-the-art methods on three large-scale datasets.
Abstract: Spatial-temporal graphs have been widely used by skeleton-based action recognition algorithms to model human action dynamics. To capture robust movement patterns from these graphs, long-range and multi-scale context aggregation and spatial-temporal dependency modeling are critical aspects of a powerful feature extractor. However, existing methods have limitations in achieving (1) unbiased long-range joint relationship modeling under multi-scale operators and (2) unobstructed cross-spacetime information flow for capturing complex spatial-temporal dependencies. In this work, we present (1) a simple method to disentangle multi-scale graph convolutions and (2) a unified spatial-temporal graph convolutional operator named G3D. The proposed multi-scale aggregation scheme disentangles the importance of nodes in different neighborhoods for effective long-range modeling. The proposed G3D module leverages dense cross-spacetime edges as skip connections for direct information propagation across the spatial-temporal graph. By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets: NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400.
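The disentangling idea in this abstract can be illustrated with a minimal numpy sketch: instead of the plain adjacency powers A^k, whose neighborhoods overlap and bias aggregation toward nearby joints, each scale k keeps only the joints whose shortest-path distance is exactly k. The function names and the toy 4-joint chain below are illustrative choices, not the paper's code.

```python
import numpy as np

def hop_distance_matrix(A):
    """All-pairs shortest-path hop counts of an unweighted graph."""
    n = A.shape[0]
    dist = np.full((n, n), np.inf)
    reach = np.eye(n, dtype=bool)
    dist[reach] = 0
    P = np.eye(n)
    for k in range(1, n):
        P = P @ A                     # counts of k-step walks
        new = (P > 0) & ~reach        # first reached at exactly k hops
        dist[new] = k
        reach |= new
    return dist

def disentangled_adjacency(A, num_scales):
    """k-hop adjacency matrices A_k with A_k[i, j] = 1 iff the shortest
    path between joints i and j has exactly k hops, so each scale sees a
    disjoint neighborhood ring rather than the overlapping ones that
    plain powers A^k produce."""
    dist = hop_distance_matrix(A)
    return [(dist == k).astype(float) for k in range(num_scales)]

# A 4-joint chain 0-1-2-3 as a toy skeleton graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scales = disentangled_adjacency(A, num_scales=3)
```

At scale 2, joint 0 now attends only to joint 2 (exactly two hops away), not to joint 1 again, which is the unbiased long-range modeling the abstract refers to.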

331 citations

Posted Content
TL;DR: Wang et al. propose a unified spatial-temporal graph convolutional operator named G3D, together with a multi-scale aggregation scheme that disentangles the importance of nodes in different neighborhoods for effective long-range modeling.

285 citations

Journal ArticleDOI
08 Apr 2003
TL;DR: Experimental results on 1400 leaf images from 140 plants show that the proposed approach can achieve a better retrieval performance than both the curvature scale space (CSS) method and the modified Fourier descriptor (MFD) method.
Abstract: The authors present an efficient two-stage approach for leaf image retrieval using simple shape features: the centroid-contour distance (CCD) curve, eccentricity, and the angle code histogram (ACH). In the first stage, images dissimilar to the query image are filtered out using eccentricity to reduce the search space; fine retrieval then follows in the second stage, using all three sets of features on the reduced search space. Unlike eccentricity and ACH, the CCD curve is neither scaling-invariant nor rotation-invariant. Normalisation is therefore required for the CCD curve to achieve scaling invariance, and starting-point location is required to achieve rotation invariance in the similarity measure between CCD curves. A thinning-based method is proposed to locate the starting points of leaf image contours, making the approach more computationally efficient. The method can also benefit other shape representations that are sensitive to starting points, by reducing matching time in image recognition and retrieval. Experimental results on 1400 leaf images from 140 plants show that the proposed approach achieves better retrieval performance than both the curvature scale space (CSS) method and the modified Fourier descriptor (MFD) method. In addition, the two-stage approach achieves performance comparable to an exhaustive search, but with much reduced computational complexity.
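The two invariance fixes described in this abstract can be sketched in a few lines of numpy. Scale invariance comes from dividing the CCD curve by its maximum; for rotation invariance, the paper locates a starting point via a thinning-based method, which is replaced here by a brute-force search over circular shifts at matching time. All names and the elliptical toy contour are illustrative.

```python
import numpy as np

def ccd_curve(contour, num_samples=64):
    """Centroid-contour distance (CCD) curve of an ordered closed contour.

    Distances from the shape centroid to each boundary point are
    resampled to a fixed length and divided by their maximum, giving the
    scale invariance the raw CCD curve lacks; rotation invariance is
    handled at matching time instead.
    """
    contour = np.asarray(contour, dtype=float)
    d = np.linalg.norm(contour - contour.mean(axis=0), axis=1)
    idx = np.linspace(0, len(d) - 1, num_samples)
    d = np.interp(idx, np.arange(len(d)), d)
    return d / d.max()

def ccd_distance(c1, c2):
    """Match two CCD curves under an unknown starting point by taking the
    best L2 distance over all circular shifts of one curve."""
    return min(np.linalg.norm(np.roll(c1, s) - c2) for s in range(len(c1)))

# A toy elliptical "leaf" outline; scaling it must not change the curve.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
leaf = np.stack([2 * np.cos(t), np.sin(t)], axis=1)
```

The shift search is what the thinning-based starting-point locator avoids: fixing the starting point once per contour removes the per-comparison loop over shifts.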

255 citations

Journal ArticleDOI
TL;DR: This paper formulates the video summarization task as a novel minimum sparse reconstruction (MSR) problem, in which the original video sequence is best reconstructed with as few selected keyframes as possible.
Abstract: The rapid growth of video data demands both effective and efficient video summarization methods so that users can quickly browse and comprehend a large amount of video content. In this paper, we formulate the video summarization task as a novel minimum sparse reconstruction (MSR) problem: the original video sequence should be best reconstructed with as few selected keyframes as possible. Unlike the recently proposed convex-relaxation-based sparse dictionary selection method, our method uses the true sparse constraint, the L0 norm, instead of the relaxed L2,1 norm, so that keyframes are directly selected as a sparse dictionary that can well reconstruct all the video frames. An on-line version is further developed owing to the real-time efficiency of the proposed MSR principle. In addition, a percentage of reconstruction (POR) criterion is proposed to intuitively guide users in obtaining a summary of appropriate length. Experimental results on two benchmark datasets with various types of videos demonstrate that the proposed methods outperform the state of the art.

Highlights:
- A minimum sparse reconstruction (MSR) based video summarization (VS) model is constructed.
- An L0 norm based constraint is imposed to ensure real sparsity.
- Two efficient and effective MSR based VS algorithms are proposed, for off-line and on-line applications respectively.
- A scalable strategy is designed to provide flexibility for practical applications.
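A tiny greedy sketch conveys the MSR idea and the POR stopping criterion: frames join the dictionary one at a time, each step adding the frame whose inclusion most lowers the least-squares reconstruction error of the whole video, until the percentage of reconstruction reaches a target. This greedy loop is an illustrative stand-in for the paper's L0-constrained solver, not the authors' algorithm, and the two-scene toy data is invented.

```python
import numpy as np

def msr_keyframes(X, por_target=0.95):
    """Greedy minimum-sparse-reconstruction keyframe selection sketch.

    X holds one feature vector per frame. Selection stops once the
    percentage of reconstruction (POR) reaches `por_target`.
    """
    X = np.asarray(X, dtype=float)
    n, total = len(X), np.sum(np.asarray(X, float) ** 2)
    selected, por = [], 0.0
    while len(selected) < n and por < por_target:
        best, best_err = None, np.inf
        for i in range(n):
            if i in selected:
                continue
            D = X[selected + [i]].T                      # (dim, k) dictionary
            C, *_ = np.linalg.lstsq(D, X.T, rcond=None)  # reconstruction codes
            err = np.sum((X.T - D @ C) ** 2)
            if err < best_err:
                best, best_err = i, err
        selected.append(best)
        por = 1.0 - best_err / total                     # POR criterion
    return sorted(selected), por

# Two "scenes": four frames spanning a rank-2 space need two keyframes.
frames = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
keyframes, por = msr_keyframes(frames)
```

Because the toy frames span a rank-2 space, one keyframe per scene already reconstructs the whole sequence, so the POR threshold halts selection at two frames.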

177 citations

Journal ArticleDOI
TL;DR: This work proposes a keypoint-based framework to address the keyframe selection problem so that local features can be employed in selecting keyframes, and introduces two criteria, coverage and redundancy, based on keypoint matching in the selection process.
Abstract: Keyframe selection has been crucial for effective and efficient video content analysis. While most existing approaches represent individual frames with global features, we, for the first time, propose a keypoint-based framework for the keyframe selection problem, so that local features can be employed in selecting keyframes. In general, the selected keyframes should be representative of the video content and contain minimal redundancy. Therefore, we introduce two criteria, coverage and redundancy, based on keypoint matching in the selection process. Comprehensive experiments demonstrate that our approach outperforms the state of the art.
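The coverage and redundancy criteria can be sketched as a greedy set-cover over matched keypoints. In a real system the per-frame sets would come from matching local descriptors such as SIFT across frames; here plain integer IDs stand in for those matches, and the thresholds and selection rule are illustrative choices rather than the paper's exact procedure.

```python
def select_keyframes(frame_keypoints, coverage_target=0.9, redundancy_max=0.5):
    """Greedy keyframe selection driven by keypoint matching.

    Coverage: each step adds the frame contributing the most
    not-yet-covered keypoints. Redundancy: a frame sharing more than
    `redundancy_max` of its keypoints with the current summary is skipped.
    """
    universe = set().union(*frame_keypoints)
    covered, selected = set(), []
    while len(covered) < coverage_target * len(universe):
        candidates = []
        for i, kps in enumerate(frame_keypoints):
            if i in selected or not kps:
                continue
            if selected and len(kps & covered) / len(kps) > redundancy_max:
                continue                      # too redundant w.r.t. summary
            candidates.append((len(kps - covered), i))
        if not candidates or max(candidates)[0] == 0:
            break
        _, best = max(candidates)
        selected.append(best)
        covered |= frame_keypoints[best]
    return selected

# Four toy frames described by their matched-keypoint IDs.
video = [{1, 2, 3}, {3, 4}, {5, 6, 7, 8}, {1, 2}]
summary = select_keyframes(video)
```

On this toy input the frame {1, 2} is rejected as fully redundant once {1, 2, 3} is in the summary, while {3, 4} is still accepted because half of its keypoints are new.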

134 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: Almost 300 key theoretical and empirical contributions from the current decade related to image retrieval and automatic image annotation are surveyed, the spawning of related subfields is discussed, along with the adaptation of existing image retrieval techniques to build systems that can be useful in the real world.
Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations

Proceedings ArticleDOI
01 Dec 2007
TL;DR: This paper employs a probabilistic neural network (PNN) with image and data processing techniques to implement general-purpose automated leaf recognition for plant classification with an accuracy greater than 90%.
Abstract: In this paper, we employ a probabilistic neural network (PNN) with image and data processing techniques to implement general-purpose automated leaf recognition for plant classification. 12 leaf features are extracted and orthogonalized into 5 principal variables, which constitute the input vector of the PNN. The PNN is trained on 1800 leaves to classify 32 kinds of plants with an accuracy greater than 90%. Compared with other approaches, our algorithm is an accurate artificial-intelligence approach that is fast in execution and easy to implement.
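The PNN named in this abstract is essentially a Parzen-window classifier, which a short numpy sketch can show: the pattern layer places one Gaussian kernel per training sample, the summation layer averages kernels per class, and the output layer picks the class with the largest estimated density. The 12 leaf features and the PCA reduction to 5 variables are not reproduced; toy 2-D features stand in for them, and all names are illustrative.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Minimal probabilistic neural network (Parzen-window classifier)."""
    X_train, y_train = np.asarray(X_train, float), np.asarray(y_train)
    classes = np.unique(y_train)
    preds = []
    for x in np.asarray(X_test, float):
        # Pattern layer: one Gaussian kernel per training sample.
        k = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * sigma ** 2))
        # Summation layer: average the kernels of each class.
        dens = [k[y_train == c].mean() for c in classes]
        # Output layer: largest estimated class density wins.
        preds.append(classes[int(np.argmax(dens))])
    return np.array(preds)

# Two toy "species" separated in feature space.
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]]
y = [0, 0, 1, 1]
pred = pnn_predict(X, y, [[0.1, 0.0], [4.9, 5.1]])
```

Training a PNN is just storing the samples, which matches the abstract's claim of fast execution and easy implementation; the cost moves to prediction time instead.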

823 citations

Proceedings ArticleDOI
09 Nov 2015
TL;DR: This paper presents the techniques employed in the team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition In the Wild.
Abstract: This paper presents the techniques employed in our team's submissions to the 2015 Emotion Recognition in the Wild contest, for the sub-challenge of Static Facial Expression Recognition in the Wild. The objective of this sub-challenge is to classify the emotions expressed by the primary human subject in static images extracted from movies. We follow a transfer learning approach for deep Convolutional Neural Network (CNN) architectures. Starting from a network pre-trained on the generic ImageNet dataset, we perform supervised fine-tuning on the network in a two-stage process, first on datasets relevant to facial expressions, followed by the contest's dataset. Experimental results show that this cascading fine-tuning approach achieves better results, compared to a single stage fine-tuning with the combined datasets. Our best submission exhibited an overall accuracy of 48.5% in the validation set and 55.6% in the test set, which compares favorably to the respective 35.96% and 39.13% of the challenge baseline.
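The cascading fine-tuning idea can be shown in miniature with plain logistic regression: pre-train on a large generic problem, then continue training from those weights on a small related target set with a lower learning rate. The blobs, learning rates, and step counts below are invented stand-ins for ImageNet, the expression datasets, and the paper's CNN training schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def gd_logreg(W, X, y, lr, steps):
    """Full-batch gradient descent on two-class logistic loss."""
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ W)))
        W = W - lr * X.T @ (p - y) / len(X)
    return W

def blobs(center, n):
    """Two Gaussian blobs mirrored about the origin, labelled 1 / 0."""
    c = np.asarray(center, float)
    X = np.concatenate([rng.normal(c, 0.5, size=(n, 2)),
                        rng.normal(-c, 0.5, size=(n, 2))])
    return X, np.concatenate([np.ones(n), np.zeros(n)])

# Stage 1: "pre-train" on a large generic problem (ImageNet's stand-in).
Xg, yg = blobs((2.0, 0.0), 100)
W = gd_logreg(np.zeros(2), Xg, yg, lr=0.5, steps=200)

# Stage 2: fine-tune on a small related target set, reusing the
# pre-trained weights with a lower learning rate -- the cascading
# idea from the abstract in miniature.
Xt, yt = blobs((2.0, 1.0), 10)
W = gd_logreg(W, Xt, yt, lr=0.1, steps=50)

Xe, ye = blobs((2.0, 1.0), 50)
accuracy = float(np.mean(((Xe @ W) > 0) == ye))
```

The point of the staging is that the small target set only has to nudge already-useful weights, rather than fit a model from scratch.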

570 citations