scispace - formally typeset
Search or ask a question

Showing papers by "Stephen J. Maybank published in 2021"


Journal ArticleDOI
TL;DR: A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, distillation algorithms and applications is provided.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapid increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher–student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.

105 citations


Journal ArticleDOI
TL;DR: An efficient decomposition and pruning scheme via constructing a compressed-aware block that can automatically minimize the rank of the weight matrix and identify the redundant channels and is further compressed and optimized by a novel Pruning & Merging module.
Abstract: Model compression methods have become popular in recent years, which aim to alleviate the heavy load of deep neural networks (DNNs) in real-world applications. However, most of the existing compression methods have two limitations: 1) they usually adopt a cumbersome process, including pretraining, training with a sparsity constraint, pruning/decomposition, and fine-tuning. Moreover, the last three stages are usually iterated multiple times. 2) The models are pretrained under explicit sparsity or low-rank assumptions, which are difficult to guarantee wide appropriateness. In this article, we propose an efficient decomposition and pruning (EDP) scheme via constructing a compressed-aware block that can automatically minimize the rank of the weight matrix and identify the redundant channels. Specifically, we embed the compressed-aware block by decomposing one network layer into two layers: a new weight matrix layer and a coefficient matrix layer. By imposing regularizers on the coefficient matrix, the new weight matrix learns to become a low-rank basis weight, and its corresponding channels become sparse. In this way, the proposed compressed-aware block simultaneously achieves low-rank decomposition and channel pruning by only one single data-driven training stage. Moreover, the network of architecture is further compressed and optimized by a novel Pruning & Merging (PM) module which prunes redundant channels and merges redundant decomposed layers. Experimental results (17 competitors) on different data sets and networks demonstrate that the proposed EDP achieves a high compression ratio with acceptable accuracy degradation and outperforms state-of-the-arts on compression rate, accuracy, inference time, and run-time memory.

30 citations


Journal ArticleDOI
TL;DR: In this article, a feedback graph convolutional network (FGCN) is proposed to extract spatial-temporal features for action recognition in a coarse-to-fine process.
Abstract: Skeleton-based action recognition has attracted considerable attention since the skeleton data is more robust to the dynamic circumstances and complicated backgrounds than other modalities. Recently, many researchers have used the Graph Convolutional Network (GCN) to model spatial-temporal features of skeleton sequences by an end-to-end optimization. However, conventional GCNs are feedforward networks for which it is impossible for the shallower layers to access semantic information in the high-level layers. In this paper, we propose a novel network, named Feedback Graph Convolutional Network (FGCN). This is the first work that introduces a feedback mechanism into GCNs for action recognition. Compared with conventional GCNs, FGCN has the following advantages: (1) A multi-stage temporal sampling strategy is designed to extract spatial-temporal features for action recognition in a coarse to fine process; (2) A Feedback Graph Convolutional Block (FGCB) is proposed to introduce dense feedback connections into the GCNs. It transmits the high-level semantic features to the shallower layers and conveys temporal information stage by stage to model video level spatial-temporal features for action recognition; (3) The FGCN model provides predictions on-the-fly. In the early stages, its predictions are relatively coarse. These coarse predictions are treated as priors to guide the feature learning in later stages, to obtain more accurate predictions. Extensive experiments on three datasets, NTU-RGB+D, NTU-RGB+D120 and Northwestern-UCLA, demonstrate that the proposed FGCN is effective for action recognition. It achieves the state-of-the-art performance on all three datasets.

25 citations


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a motion offset estimation framework to model pixel-wise displacements of the latent sharp image at multiple timepoints, which can recover dense, non-linear exposure trajectories, which significantly reduce temporal disorder and ill-posed problems.
Abstract: Motion blur in dynamic scenes is an important yet challenging research topic. Recently, deep learning methods have achieved impressive performance for dynamic scene deblurring. However, the motion information contained in a blurry image has yet to be fully explored and accurately formulated because: (i) the ground truth of dynamic motion is difficult to obtain; (ii) the temporal ordering is destroyed during the exposure; and (iii) the motion estimation from a blurry image is highly ill-posed. By revisiting the principle of camera exposure, motion blur can be described by the relative motions of sharp content with respect to each exposed position. In this paper, we define exposure trajectories, which represent the motion information contained in a blurry image and explain the causes of motion blur. A novel motion offset estimation framework is proposed to model pixel-wise displacements of the latent sharp image at multiple timepoints. Under mild constraints, our method can recover dense, (non-)linear exposure trajectories, which significantly reduce temporal disorder and ill-posed problems. Finally, experiments demonstrate that the recovered exposure trajectories not only capture accurate and interpretable motion information from a blurry image, but also benefit motion-aware image deblurring and warping-based video extraction tasks. Codes are available on https://github.com/yjzhang96/Motion-ETR.

12 citations


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a deep regression architecture with progressive reinitialization and a new error-driven learning loss function to explicitly address the issues related to the initial alignment estimation and the final learning objective.
Abstract: Regression-based face alignment involves learning a series of mapping functions to predict the true landmark from an initial estimation of the alignment. Most existing approaches focus on learning efficacious mapping functions from some feature representations to improve performance. The issues related to the initial alignment estimation and the final learning objective, however, receive less attention. This work proposes a deep regression architecture with progressive reinitialization and a new error-driven learning loss function to explicitly address the above two issues. Given an image with a rough face detection result, the full face region is firstly mapped by a supervised spatial transformer network to a normalized form and trained to regress coarse positions of landmarks. Then, different face parts are further respectively reinitialized to their own normalized states, followed by another regression sub-network to refine the landmark positions. To deal with the inconsistent annotations in existing training datasets, we further propose an adaptive landmark-weighted loss function. It dynamically adjusts the importance of different landmarks according to their learning errors during training without depending on any hyper-parameters manually set by trial and error. The whole deep architecture permits training from end to end, and extensive experimental comparisons demonstrate its effectiveness and efficiency.

6 citations



Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors presented 3D Furniture shape with TextURE (3D-FUTURE): a richly annotated and large-scale repository of 3D furniture shapes in the household scenario.
Abstract: The 3D CAD shapes in current 3D benchmarks are mostly collected from online model repositories. Thus, they typically have insufficient geometric details and less informative textures, making them less attractive for comprehensive and subtle research in areas such as high-quality 3D mesh and texture recovery. This paper presents 3D Furniture shape with TextURE (3D-FUTURE): a richly-annotated and large-scale repository of 3D furniture shapes in the household scenario. At the time of this technical report, 3D-FUTURE contains 9992 modern 3D furniture shapes with high-resolution textures and detailed attributes. To support the studies of 3D modeling from images, we couple the CAD models with 20,240 scene images. The room scenes are designed by professional designers or generated by an industrial scene creating system. Given the well-organized 3D-FUTURE and its characteristics, we provide a package of baseline experiments, such as joint 2D instance segmentation and 3D object pose estimation, image-based 3D shape retrieval, 3D object reconstruction from a single image, texture recovery for 3D shapes, and furniture composition, to facilitate related future researches on our database.

1 citations