Ayan Kumar Bhunia
Other affiliations: Future Institute of Engineering and Management, Beijing University of Posts and Telecommunications, Nanyang Technological University
Bio: Ayan Kumar Bhunia is an academic researcher from University of Surrey. The author has contributed to research in topics: Computer science & Image retrieval. The author has an hindex of 14, co-authored 63 publications receiving 772 citations. Previous affiliations of Ayan Kumar Bhunia include Future Institute of Engineering and Management & Beijing University of Posts and Telecommunications.
TL;DR: In this article, a mutual channel loss (MC-Loss) is proposed for fine-grained image categorization, which consists of two channel-specific components: a discriminality component and a diversity component.
Abstract: The key to solving fine-grained image categorization is finding discriminate and local regions that correspond to subtle visual traits. Great strides have been made, with complex networks designed specifically to learn part-level discriminate feature representations. In this paper, we show that it is possible to cultivate subtle details without the need for overly complicated network designs or training mechanisms – a single loss is all it takes. The main trick lies with how we delve into individual feature channels early on, as opposed to the convention of starting from a consolidated feature map. The proposed loss function, termed as mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component. The discriminality component forces all feature channels belonging to the same class to be discriminative, through a novel channel-wise attention mechanism. The diversity component additionally constraints channels so that they become mutually exclusive across the spatial dimension. The end result is therefore a set of feature channels, each of which reflects different locally discriminative regions for a specific class. The MC-Loss can be trained end-to-end, without the need for any bounding-box/part annotations, and yields highly discriminative regions during inference. Experimental results show our MC-Loss when implemented on top of common base networks can achieve state-of-the-art performance on all four fine-grained categorization datasets (CUB-Birds, FGVC-Aircraft, Flowers-102, and Stanford Cars). Ablative studies further demonstrate the superiority of the MC-Loss when compared with other recently proposed general-purpose losses for visual classification, on two different base networks. Codes are available at: https://github.com/dongliangchang/Mutual-Channel-Loss .
23 Aug 2020
TL;DR: PMG-Progressive multi-granularity training as mentioned in this paper proposes a progressive training strategy that effectively fuses features from different granularities, and a random jigsaw patch generator that encourages the network to learn features at specific granularity.
Abstract: Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works are mainly part-driven (either explicitly or implicitly), with the assumption that fine-grained information naturally rests within the parts. In this paper, we take a different stance, and show that part operations are not strictly necessary – the key lies with encouraging the network to learn at different granularities and progressively fusing multi-granularity features together. In particular, we propose: (i) a progressive training strategy that effectively fuses features from different granularities, and (ii) a random jigsaw patch generator that encourages the network to learn features at specific granularities. We evaluate on several standard FGVC benchmark datasets, and show the proposed method consistently outperforms existing alternatives or delivers competitive results. The code is available at https://github.com/PRIS-CV/PMG-Progressive-Multi-Granularity-Training.
TL;DR: A novel method that involves extraction of local and global features using CNN-LSTM framework and weighting them dynamically for script identification is proposed and achieves superior results in comparison to conventional methods.
Abstract: Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video scripts. Because of low image quality, complex background and similar layout of characters shared by some scripts like Greek, Latin, etc., text recognition in those cases become challenging. In this paper, we propose a novel method that involves extraction of local and global features using CNN-LSTM framework and weighting them dynamically for script identification. First, we convert the images into patches and feed them into a CNN-LSTM framework. Attention-based patch weights are calculated applying softmax layer after LSTM. Next, we do patch-wise multiplication of these weights with corresponding CNN to yield local features. Global features are also extracted from last cell state of LSTM. We employ a fusion technique which dynamically weights the local and global features for an individual patch. Experiments have been done in four public script identification datasets: SIW-13, CVSI2015, ICDAR-17 and MLe2e. The proposed framework achieves superior results in comparison to conventional methods.
TL;DR: An efficient word recognition framework by segmenting the handwritten word images horizontally into three zones (upper, middle and lower) and then recognize the corresponding zones to reduce the number of distinct component classes compared to the total number of classes in Indic scripts is proposed.
Abstract: This paper presents a novel approach towards Indic handwritten word recognition using zone-wise information Because of complex nature due to compound characters, modifiers, overlapping and touching, etc, character segmentation and recognition is a tedious job in Indic scripts (eg Devanagari, Bangla, Gurumukhi, and other similar scripts) To avoid character segmentation in such scripts, HMM-based sequence modeling has been used earlier in holistic way This paper proposes an efficient word recognition framework by segmenting the handwritten word images horizontally into three zones (upper, middle and lower) and then recognize the corresponding zones The main aim of this zone segmentation approach is to reduce the number of distinct component classes compared to the total number of classes in Indic scripts As a result, use of this zone segmentation approach enhances the recognition performance of the system The components in middle zone, where characters are mostly touching, are recognized using HMM After the recognition of middle zone, HMM based Viterbi forced alignment is applied to mark the left and right boundaries of the characters in the middle zone Next, the residue components, if any, in upper and lower zones are obtained in a character boundary then the components are combined with the character to achieve the final word level recognition Water reservoir-based properties have been integrated in this framework to improve the zone segmentation and character boundary detection defects while segmentation A novel sliding window-based feature, called Pyramid Histogram of Oriented Gradient (PHOG) is proposed for middle zone recognition PHOG features have been compared with other existing features and found robust for Indic script recognition An exhaustive experiment is performed on two Indic scripts namely, Bangla and Devanagari for the performance evaluation From the experiment, it has been noted that proposed zone-wise recognition improves accuracy with respect to the traditional way of Indic word recognition A novel approach of Indic handwritten word recognition using zone segmentationEfficient PHOG features developed to improve the performance of HMM based middle zone recognitionIntegration of water reservoir concept for better character alignment in a word imageA detailed study of experimental results in Bangla and Devanagari scripts has been performedThe proposed framework outperforms traditional without-zone-segmentation based recognition systems
TL;DR: This work proposes a novel framework for fine-grained visual classification with a progressive training strategy that effectively fuses features from different granularities, and a random jigsaw patch generator that encourages the network to learn features at specificgranularities.
Abstract: Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts, more complementary parts, and parts of various granularities. However, less effort has been placed to which granularities are the most discriminative and how to fuse information cross multi-granularity. In this work, we propose a novel framework for fine-grained visual classification to tackle these problems. In particular, we propose: (i) a progressive training strategy that effectively fuses features from different granularities, and (ii) a random jigsaw patch generator that encourages the network to learn features at specific granularities. We obtain state-of-the-art performances on several standard FGVC benchmark datasets, where the proposed method consistently outperforms existing methods or delivers competitive results. The code will be available at this https URL.
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
TL;DR: Deep Convolutional Neural Networks (CNNs) as mentioned in this paper are a special type of Neural Networks, which has shown exemplary performance on several competitions related to Computer Vision and Image Processing.
Abstract: Deep Convolutional Neural Network (CNN) is a special type of Neural Networks, which has shown exemplary performance on several competitions related to Computer Vision and Image Processing. Some of the exciting application areas of CNN include Image Classification and Segmentation, Object Detection, Video Processing, Natural Language Processing, and Speech Recognition. The powerful learning ability of deep CNN is primarily due to the use of multiple feature extraction stages that can automatically learn representations from the data. The availability of a large amount of data and improvement in the hardware technology has accelerated the research in CNNs, and recently interesting deep CNN architectures have been reported. Several inspiring ideas to bring advancements in CNNs have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and architectural innovations. However, the significant improvement in the representational capacity of the deep CNN is achieved through architectural innovations. Notably, the ideas of exploiting spatial and channel information, depth and width of architecture, and multi-path information processing have gained substantial attention. Similarly, the idea of using a block of layers as a structural unit is also gaining popularity. This survey thus focuses on the intrinsic taxonomy present in the recently reported deep CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, the elementary understanding of CNN components, current challenges, and applications of CNN are also provided.
TL;DR: A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications can be found in this paper.
Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapid increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.
TL;DR: A comprehensive survey of the major applications of deep learning covering variety of areas is presented, study of the techniques and architectures used and further the contribution of that respective application in the real world are presented.
Abstract: Nowadays, deep learning is a current and a stimulating field of machine learning. Deep learning is the most effective, supervised, time and cost efficient machine learning approach. Deep learning is not a restricted learning approach, but it abides various procedures and topographies which can be applied to an immense speculum of complicated problems. The technique learns the illustrative and differential features in a very stratified way. Deep learning methods have made a significant breakthrough with appreciable performance in a wide variety of applications with useful security tools. It is considered to be the best choice for discovering complex architecture in high-dimensional data by employing back propagation algorithm. As deep learning has made significant advancements and tremendous performance in numerous applications, the widely used domains of deep learning are business, science and government which further includes adaptive testing, biological image classification, computer vision, cancer detection, natural language processing, object detection, face recognition, handwriting recognition, speech recognition, stock market analysis, smart city and many more. This paper focuses on the concepts of deep learning, its basic and advanced architectures, techniques, motivational aspects, characteristics and the limitations. The paper also presents the major differences between the deep learning, classical machine learning and conventional learning approaches and the major challenges ahead. The main intention of this paper is to explore and present chronologically, a comprehensive survey of the major applications of deep learning covering variety of areas, study of the techniques and architectures used and further the contribution of that respective application in the real world. Finally, the paper ends with the conclusion and future aspects.