Author

Syed Afaq Ali Shah

Bio: Syed Afaq Ali Shah is an academic researcher at Murdoch University. He has contributed to research on deep learning and convolutional neural networks, has an h-index of 16, and has co-authored 66 publications that have received 825 citations. His previous affiliations include Central Queensland University and the University of Western Australia.

Papers published on a yearly basis

Papers
Book
13 Feb 2018
TL;DR: This self-contained guide benefits readers who want both to understand the theory behind CNNs and to gain hands-on experience applying CNNs in computer vision, providing a comprehensive introduction to CNNs.
Abstract: Computer vision has become increasingly important and effective in recent years due to its wide-ranging applications in areas as diverse as smart surveillance and monitoring, health and medicine, sports and recreation, robotics, drones, and self-driving cars. Visual recognition tasks, such as image classification, localization, and detection, are the core building blocks of many of these applications, and recent developments in Convolutional Neural Networks (CNNs) have led to outstanding performance in these state-of-the-art visual recognition tasks and systems. As a result, CNNs now form the crux of deep learning algorithms in computer vision. This self-contained guide will benefit those who seek both to understand the theory behind CNNs and to gain hands-on experience in applying CNNs in computer vision. It provides a comprehensive introduction to CNNs, starting with the essential concepts behind neural networks: training, regularization, and optimization of CNNs. The book also discusses a wide range of loss functions, network layers, and popular CNN architectures, reviews the different techniques for the evaluation of CNNs, and presents some popular CNN tools and libraries that are commonly used in computer vision. Further, this text describes and discusses case studies related to the application of CNNs in computer vision, including image classification, object detection, semantic segmentation, scene understanding, and image generation. This book is ideal for undergraduate and graduate students, as no prior background knowledge in the field is required to follow the material, as well as for new researchers, developers, engineers, and practitioners who are interested in gaining a quick understanding of CNN models.
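To make the topics above concrete, here is a minimal sketch (not taken from the book) of the pipeline its chapters cover: a small convolutional network, a standard classification loss, L2 regularization via weight decay, and one optimization step in PyTorch. All layer sizes and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Small CNN for 32x32 RGB image classification."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learn local filters
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            weight_decay=1e-4)            # L2 regularization
criterion = nn.CrossEntropyLoss()                         # common classification loss

# One training step on dummy data.
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```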

402 citations

Journal ArticleDOI
TL;DR: An Iterative Deep Learning Model (IDLM) that automatically and hierarchically learns discriminative representations from raw face and object images is proposed that achieves the best performance on all these datasets.
Abstract: We present a novel technique for image set based face/object recognition, where each gallery and query example contains a face/object image set captured under different viewpoints, backgrounds, facial expressions, resolutions, and illumination levels. While several image set classification approaches have been proposed in recent years, most of them represent each image set as a single linear subspace, a mixture of linear subspaces, or a Lie group of a Riemannian manifold. These techniques make prior assumptions about the specific category of geometric surface on which images of the set are believed to lie, which can result in a loss of discriminative information for classification. This paper alleviates these limitations by proposing an Iterative Deep Learning Model (IDLM) that automatically and hierarchically learns discriminative representations from raw face and object images. In the proposed approach, low-level translationally invariant features are learnt by the Pooled Convolutional Layer (PCL). The latter is followed by Artificial Neural Networks (ANNs) applied iteratively in a hierarchical fashion to learn a discriminative non-linear feature representation of the input image sets. The proposed technique was extensively evaluated for the task of image set based face and object recognition on the YouTube Celebrities, Honda/UCSD, CMU MoBo, and ETH-80 (object) datasets. Experimental results and comparisons with state-of-the-art methods show that our technique achieves the best performance on all these datasets.
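As a rough illustration of the IDLM idea described above, the sketch below pairs a pooled convolutional layer with a stack of small ANNs applied iteratively. Filter counts, depths, input size, and the sigmoid non-linearity are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PooledConvLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=5)
        self.pool = nn.MaxPool2d(4)  # pooling gives translational invariance

    def forward(self, x):
        return self.pool(torch.relu(self.conv(x))).flatten(1)

class IDLMSketch(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, num_iters: int = 3):
        super().__init__()
        self.pcl = PooledConvLayer()
        # Small ANNs applied iteratively, in a hierarchical fashion,
        # to refine the pooled convolutional features.
        self.anns = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
            for _ in range(num_iters)
        )
        self.out = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        h = self.pcl(x)
        for ann in self.anns:      # hierarchical non-linear refinement
            h = ann(h)
        return self.out(h)

# 24x24 grayscale crops -> 8 channels * 5 * 5 = 200 features after conv+pool.
model = IDLMSketch(feat_dim=200, num_classes=10)
logits = model(torch.randn(4, 1, 24, 24))
```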

70 citations

Proceedings Article
03 Dec 2018
TL;DR: A new variant of LSTM is derived, in which the convolutional structures are embedded only into the input-to-state transition of the LSTM; the evaluation demonstrates that the spatial convolutions in the three gates scarcely contribute to spatiotemporal feature fusion, and that the attention mechanisms embedded in the input and output gates cannot improve the feature fusion.
Abstract: Convolutional long short-term memory (LSTM) networks have been widely used for action/gesture recognition, and different attention mechanisms have also been embedded into the LSTM or the convolutional LSTM (ConvLSTM) networks. Based on the previous gesture recognition architectures, which combine the three-dimensional convolutional neural network (3DCNN) and ConvLSTM, this paper explores the effects of attention mechanisms in ConvLSTM. Several variants of ConvLSTM are evaluated: (a) removing the convolutional structures of the three gates in ConvLSTM, (b) applying the attention mechanism on the input of ConvLSTM, and (c) reconstructing the input and (d) output gates, respectively, with the modified channel-wise attention mechanism. The evaluation results demonstrate that the spatial convolutions in the three gates scarcely contribute to the spatiotemporal feature fusion, and the attention mechanisms embedded into the input and output gates cannot improve the feature fusion. In other words, ConvLSTM mainly contributes to the temporal fusion along the recurrent steps to learn the long-term spatiotemporal features, when taking spatial or spatiotemporal features as input. On this basis, a new variant of LSTM is derived, in which the convolutional structures are embedded only into the input-to-state transition of the LSTM. The code of the LSTM variants is publicly available.
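The derived variant is easy to express in code. Below is a hedged sketch of a ConvLSTM-style cell in which spatial convolution appears only in the input-to-state transition, while the state-to-state path uses a 1x1 (channel-only) transform; the paper's exact recurrent formulation may differ, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class InputConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        # Input-to-state: spatial convolution producing all four gates at once.
        self.x2s = nn.Conv2d(in_ch, 4 * hid_ch, k, padding=k // 2)
        # State-to-state: 1x1 transform mixing channels only, no spatial conv.
        self.h2s = nn.Conv2d(hid_ch, 4 * hid_ch, kernel_size=1)

    def forward(self, x, state):
        h, c = state
        gates = self.x2s(x) + self.h2s(h)
        i, f, g, o = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)      # standard LSTM cell update
        h = o * torch.tanh(c)
        return h, c

cell = InputConvLSTMCell(in_ch=64, hid_ch=32)
h = c = torch.zeros(2, 32, 14, 14)
for x_t in torch.randn(5, 2, 64, 14, 14):  # 5 time steps of feature maps
    h, c = cell(x_t, (h, c))
```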

69 citations

Journal ArticleDOI
TL;DR: An effective deep architecture for continuous gesture recognition is presented and a balanced squared hinge loss function is proposed to deal with the imbalance between boundaries and nonboundaries.
Abstract: Continuous gesture recognition aims at recognizing the ongoing gestures from continuous gesture sequences and is more meaningful for practical scenarios, where the start and end frames of each gesture instance are generally unknown. This paper presents an effective deep architecture for continuous gesture recognition. First, continuous gesture sequences are segmented into isolated gesture instances using the proposed temporal dilated Res3D network. A balanced squared hinge loss function is proposed to deal with the imbalance between boundaries and non-boundaries. Temporal dilation preserves the temporal information for dense detection of the boundaries at fine granularity, and the large temporal receptive field makes the segmentation results more reasonable and effective. Then, the recognition network is constructed from the 3-D convolutional neural network (3DCNN), the convolutional long short-term memory network (ConvLSTM), and the 2-D convolutional neural network (2DCNN) for isolated gesture recognition. The "3DCNN-ConvLSTM-2DCNN" architecture is more effective for learning long-term and deep spatiotemporal features. The proposed segmentation and recognition networks obtain a Jaccard index of 0.7163 on the ChaLearn LAP ConGD dataset, which is 0.106 higher than the winner of the 2017 ChaLearn LAP Large-Scale Continuous Gesture Recognition Challenge.
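As an illustration of the loss described above, here is a hedged sketch of a balanced squared hinge loss for frame-level boundary detection: a squared hinge on +/-1 targets with inverse-frequency weights that rebalance the rare boundary frames against the many non-boundary frames. The paper's exact weighting scheme may differ.

```python
import torch

def balanced_squared_hinge(scores: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """scores: raw frame scores; targets: +1 (boundary) or -1 (non-boundary)."""
    pos = (targets > 0).float()
    neg = 1.0 - pos
    # Inverse-frequency weights so both classes contribute equally.
    w = pos / pos.sum().clamp(min=1) + neg / neg.sum().clamp(min=1)
    hinge = torch.clamp(1.0 - targets * scores, min=0.0) ** 2
    return (w * hinge).sum() / 2

scores = torch.randn(100)                       # per-frame boundary scores
targets = torch.where(torch.rand(100) < 0.1,    # ~10% boundary frames
                      torch.ones(100), -torch.ones(100))
loss = balanced_squared_hinge(scores, targets)
```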

62 citations

Journal ArticleDOI
TL;DR: This work proposes an efficient cascaded V-Net model that takes full advantage of features from the first-stage network, and combines stacked small and large kernels with an inception-like structure to help the model learn more patterns.
Abstract: Multi-organ segmentation is a challenging task due to the label imbalance and structural differences between different organs. In this work, we propose an efficient cascaded V-Net model to improve the performance of multi-organ segmentation by establishing dense Block-Level Skip Connections (BLSC) across the cascaded V-Net. Our model can take full advantage of features from the first-stage network and make the cascaded structure more efficient. We also combine stacked small and large kernels with an inception-like structure to help our model learn more patterns, which produces superior results for multi-organ segmentation. In addition, some small organs are commonly occluded by large organs and have unclear boundaries with other surrounding tissues, which makes them hard to segment. We therefore first locate the small organs through a multi-class network and crop them randomly with the surrounding region, then segment them with a single-class network. We evaluated our model on the SegTHOR 2019 challenge unseen testing set and the Multi-Atlas Labeling Beyond the Cranial Vault challenge validation set. Our model achieves average Dice score gains of 1.62 and 3.90 percentage points over traditional cascaded networks on these two datasets, respectively. For hard-to-segment small organs, such as the esophagus in the SegTHOR 2019 challenge, our technique achieves a Dice score gain of 5.63 percentage points, and four organs in the Multi-Atlas Labeling Beyond the Cranial Vault challenge achieve a gain of 5.27 percentage points in average Dice score.
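The sketch below illustrates the cross-stage skip idea in code: each block of the second-stage network also receives the matching block's features from the first-stage network via concatenation. Channel counts, depths, and the two-block stages are assumptions for illustration, not the paper's V-Net configuration.

```python
import torch
import torch.nn as nn

def block(in_ch, out_ch):
    return nn.Sequential(nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.ReLU())

class CascadeWithBLSC(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        self.stage1 = nn.ModuleList([block(1, ch), block(ch, ch)])
        # Stage-2 blocks take their own input concatenated with stage-1 features.
        self.stage2 = nn.ModuleList([block(1 + ch, ch), block(2 * ch, ch)])
        self.head = nn.Conv3d(ch, 2, 1)     # e.g. background/organ logits

    def forward(self, x):
        feats, h = [], x
        for b in self.stage1:               # first pass; cache block outputs
            h = b(h)
            feats.append(h)
        h = x
        for b, skip in zip(self.stage2, feats):
            h = b(torch.cat([h, skip], dim=1))  # dense cross-stage skip
        return self.head(h)

out = CascadeWithBLSC()(torch.randn(1, 1, 16, 32, 32))
```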

49 citations


Cited by
Journal ArticleDOI
TL;DR: This work proposes EdgeConv, a new neural network module suitable for CNN-based high-level tasks on point clouds, including classification and segmentation; EdgeConv acts on graphs dynamically computed in each layer of the network.
Abstract: Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insights from CNNs to the point-cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems, affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks, including ModelNet40, ShapeNetPart, and S3DIS.
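A compact sketch of the EdgeConv operation described above (greatly simplified relative to the full DGCNN implementation): build a kNN graph in feature space at each layer, form edge features [x_i, x_j - x_i], apply a shared MLP, and max-pool over neighbors. The single-linear-layer MLP and the value of k are illustrative assumptions.

```python
import torch
import torch.nn as nn

def knn(x: torch.Tensor, k: int) -> torch.Tensor:
    """x: (B, N, C) point features -> (B, N, k) neighbor indices."""
    dist = torch.cdist(x, x)                                  # pairwise distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop self

class EdgeConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_ch, out_ch), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        idx = knn(x, self.k)                               # dynamic graph
        nbrs = torch.gather(
            x.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C))     # (B, N, k, C)
        center = x.unsqueeze(2).expand_as(nbrs)
        edge = torch.cat([center, nbrs - center], dim=-1)  # [x_i, x_j - x_i]
        return self.mlp(edge).max(dim=2).values            # max over neighbors

feats = EdgeConv(3, 64)(torch.randn(2, 1024, 3))           # (2, 1024, 64)
```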

3,727 citations

01 Jan 2006

3,012 citations

Posted Content
TL;DR: In this paper, a new neural network module called EdgeConv is proposed for CNN-based high-level tasks on point clouds including classification and segmentation, which is differentiable and can be plugged into existing architectures.
Abstract: Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insights from CNNs to the point-cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems, affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks, including ModelNet40, ShapeNetPart, and S3DIS.

1,048 citations

Posted Content
TL;DR: This work proposes the Learning without Forgetting method, which uses only new-task data to train the network while preserving its original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques.
Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new-task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaptation techniques, and performs similarly to multitask learning, which uses the original task data that we assume to be unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new-task performance.
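The Learning without Forgetting recipe can be sketched as follows: record the old task head's outputs on the new data before training, then optimize the new-task loss plus a distillation loss that keeps the old-task outputs stable. The toy backbone, temperature, and loss weight below are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def lwf_losses(old_logits_now, old_logits_recorded, new_logits, new_labels,
               T: float = 2.0, lam: float = 1.0):
    # New-task supervised loss.
    ce = F.cross_entropy(new_logits, new_labels)
    # Distillation: soften both sets of old-task logits with temperature T.
    p_old = F.softmax(old_logits_recorded / T, dim=1)
    log_p = F.log_softmax(old_logits_now / T, dim=1)
    distill = F.kl_div(log_p, p_old, reduction="batchmean") * T * T
    return ce + lam * distill

shared = nn.Linear(128, 64)                  # shared backbone (toy stand-in)
old_head, new_head = nn.Linear(64, 10), nn.Linear(64, 5)

x, y_new = torch.randn(8, 128), torch.randint(0, 5, (8,))
with torch.no_grad():                        # responses of the original model,
    recorded = old_head(shared(x))           # recorded before fine-tuning

h = shared(x)
loss = lwf_losses(old_head(h), recorded, new_head(h), y_new)
loss.backward()
```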

1,037 citations

Book ChapterDOI
20 Dec 2013

780 citations