Journal ArticleDOI

A feature fusion based localized multiple kernel learning system for real world image classification

29 Nov 2017-Eurasip Journal on Image and Video Processing (SpringerOpen)-Vol. 2017, Iss: 1, pp 1-11
TL;DR: This paper proposes a feature fusion based multiple kernel learning (MKL) model for image classification: multiple kernels extracted from multiple features address the first challenge, and localized kernel weighting provides a solution for the second.
Abstract: Real-world image classification, which aims to determine the semantic class of unlabeled images, is a challenging task. In this paper, we focus on two challenges of image classification and propose a method to address both of them simultaneously. The first challenge is that representing images by heterogeneous features, such as color, shape, and texture, helps to provide better classification accuracy. The second challenge comes from dissimilarities in the visual appearance of images from the same class (intra-class variance) and similarities between images from different classes (inter-class relationship). In addition to these two challenges, we should note that the feature space of real-world images is highly complex, so images cannot be linearly classified; the kernel trick is effective for classifying them. This paper proposes a feature fusion based multiple kernel learning (MKL) model for image classification. By using multiple kernels extracted from multiple features, we address the first challenge. To provide a solution for the second challenge, we use the idea of localized MKL and assign separate local weights to each kernel. We employed the spatial pyramid match (SPM) representation of images and computed kernel weights based on the χ² kernel. Experimental results demonstrate that the proposed model achieves promising performance.
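
The combination rule behind this model can be illustrated with a short sketch. The Python snippet below is a minimal, hedged illustration rather than the authors' implementation: it assumes χ² base kernels computed from SPM histograms of several feature channels, a fixed weight vector for canonical MKL, and a hypothetical per-sample gating matrix for the localized variant; the paper's actual weight-learning procedure is not reproduced.

import numpy as np
from sklearn.svm import SVC

def chi2_kernel(X, Y, gamma=1.0):
    # chi-square kernel between histogram rows:
    # k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))
    D = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        D[i] = (((x - Y) ** 2) / (x + Y + 1e-10)).sum(axis=1)
    return np.exp(-gamma * D)

def canonical_mkl(kernels, eta):
    # fixed, sample-independent weights: K = sum_m eta_m * K_m
    return sum(w * K for w, K in zip(eta, kernels))

def localized_mkl(kernels, gating):
    # local weights: K(i, j) = sum_m eta_m(x_i) * eta_m(x_j) * K_m(i, j),
    # where gating is an (n_samples, n_kernels) array of per-sample weights.
    K = np.zeros_like(kernels[0])
    for m, Km in enumerate(kernels):
        K += np.outer(gating[:, m], gating[:, m]) * Km
    return K

# Hypothetical usage: X_feats is a list of SPM histogram matrices (one per feature),
# gating holds local weights learned during training, y holds class labels.
# kernels = [chi2_kernel(F, F) for F in X_feats]
# clf = SVC(kernel="precomputed").fit(localized_mkl(kernels, gating), y)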


Citations
Journal ArticleDOI
TL;DR: In this study, eight crop types were identified using gamma naught values and polarimetric parameters calculated from TerraSAR-X (or TanDEM-X) dual-polarimetric (HH/VV) data, and the classification accuracy of four widely used machine-learning algorithms was evaluated.
Abstract: Cropland maps are useful for the management of agricultural fields and the estimation of harvest yield. Some local governments have documented field properties, including crop type and location, based on site investigations. This process, which is generally done manually, is labor-intensive, and remote-sensing techniques can be used as alternatives. In this study, eight crop types (beans, beetroot, grass, maize, potatoes, squash, winter wheat, and yams) were identified using gamma naught values and polarimetric parameters calculated from TerraSAR-X (or TanDEM-X) dual-polarimetric (HH/VV) data. Three indices (difference (D-type), simple ratio (SR), and normalized difference (ND)) were calculated using gamma naught values and m-chi decomposition parameters and were evaluated in terms of crop classification. We also evaluated the classification accuracy of four widely used machine-learning algorithms (kernel-based extreme learning machine, support vector machine, multilayer feedforward neural network (FNN), and random forest) and two multiple-kernel methods (multiple kernel extreme learning machine (MKELM) and multiple kernel learning (MKL)). MKL performed best, achieving an overall accuracy of 92.1%, and proved useful for the identification of crops with small sample sizes. The difference (raw or normalized) between double-bounce scattering and odd-bounce scattering helped to improve the identification of squash and yams fields.
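
As a rough illustration of the three indices, the sketch below assumes the usual definitions of difference, simple ratio, and normalized difference applied to two gamma naught bands (HH and VV); the paper applies the same forms to m-chi decomposition parameters as well, which is not shown here.

import numpy as np

def radar_indices(gamma_hh, gamma_vv, eps=1e-10):
    # gamma_hh, gamma_vv: 2-D arrays of gamma naught backscatter values.
    d = gamma_hh - gamma_vv                                    # difference (D-type)
    sr = gamma_hh / (gamma_vv + eps)                           # simple ratio (SR)
    nd = (gamma_hh - gamma_vv) / (gamma_hh + gamma_vv + eps)   # normalized difference (ND)
    return d, sr, nd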

24 citations


Cites methods from "A feature fusion based localized mu..."

  • ...In the LMKL framework, multiple kernels are used instead of a single kernel, but local weights are computed for kernels in the training phase, unlike canonical MKL where fixed weights for kernels are computed in the training phase and the weighted sum of kernels is computed [60]....

    [...]

Journal ArticleDOI
TL;DR: A novel Neuromorphic Perception Understanding Action (PUA) system is presented that aims to combine the feature extraction benefits of CNNs with the low latency processing of SCNNs, and can deliver robust results of over 96% and 81% for accuracy and Intersection over Union, ensuring such a system can be successfully used within object recognition, classification, and tracking problems.
Abstract: Traditionally, the Perception-Action cycle is the first stage of building an autonomous robotic system and a practical way to implement a low latency reactive system within a low Size, Weight and Power (SWaP) package. However, within complex scenarios, this method can lack contextual understanding about the scene, such as object recognition-based tracking or system attention. Object detection, identification and tracking, along with semantic segmentation and attention, are all modern computer vision tasks in which Convolutional Neural Networks (CNNs) have shown significant success, although such networks often have a large computational overhead and power requirements, which are not ideal in smaller robotics tasks. Furthermore, cloud computing and massively parallel processing like in Graphic Processing Units (GPUs) are outside the specification of many tasks due to their respective latency and SWaP constraints. In response to this, Spiking Convolutional Neural Networks (SCNNs) look to provide the feature extraction benefits of CNNs, while maintaining low latency and power overhead thanks to their asynchronous spiking event-based processing. A novel Neuromorphic Perception Understanding Action (PUA) system is presented that aims to combine the feature extraction benefits of CNNs with the low latency processing of SCNNs. The PUA utilizes a Neuromorphic Vision Sensor for Perception that facilitates asynchronous processing within a Spiking fully Convolutional Neural Network (SpikeCNN) to provide semantic segmentation and Understanding of the scene. The output is fed to a spiking control system providing Actions. With this approach, the aim is to bring features of deep learning into the lower levels of autonomous robotics, while maintaining a biologically plausible STDP rule throughout the learned encoding part of the network. The network is shown to provide a more robust and predictable management of spiking activity with an improved thresholding response. The reported experiments show that this system can deliver robust results of over 96% and 81% for accuracy and Intersection over Union, ensuring such a system can be successfully used within object recognition, classification and tracking problems. This demonstrates that the attention of the system can be tracked accurately, while the asynchronous processing means the controller can give precise track updates with minimal latency.
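
The "biologically plausible STDP rule" mentioned above can be illustrated with a generic pair-based formulation; the exact form and constants used in the paper are not given here, so the snippet below is only an assumed sketch.

import numpy as np

def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    # dt = t_post - t_pre (ms): pre-before-post (dt >= 0) potentiates the synapse,
    # post-before-pre (dt < 0) depresses it, with exponentially decaying magnitude.
    if dt >= 0:
        return a_plus * np.exp(-dt / tau_plus)
    return -a_minus * np.exp(dt / tau_minus)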

9 citations


Cites background from "A feature fusion based localized mu..."

  • ...As the network is only looking for natural spatial structural similarity avoidance of classes which have a large intraclass variance compared to the overall interclass relationship (Zamani and Jamzad, 2017)....

    [...]

Journal ArticleDOI
01 Feb 2022-Optik
TL;DR: In this article, a multispectral pedestrian head direction estimation method is proposed to circumvent the difficulty of estimating head direction at night, where feature vectors are extracted from visible and thermal images of pedestrian heads.

4 citations

Journal ArticleDOI
TL;DR: Zhang et al. proposed an end-to-end multi-task feature fusion method for Chinese painting classification that uses RGB images and an auxiliary gray-level co-occurrence matrix (GLCM) to enhance the discriminative power of the features.
Abstract: Different artists have their unique painting styles, which can hardly be recognized by ordinary people without professional knowledge. How to intelligently analyze such artistic styles via underlying features remains a challenging research problem. In this paper, we propose a novel multi-task feature fusion architecture (MTFFNet) for cognitive classification of traditional Chinese paintings. Specifically, by taking full advantage of a pre-trained DenseNet as backbone, MTFFNet benefits from the fusion of two different types of feature information: semantic and brush stroke features. These features are learned from the RGB images and an auxiliary gray-level co-occurrence matrix (GLCM) in an end-to-end manner, to enhance the discriminative power of the features for the first time. Through extensive experiments, our results demonstrate that our proposed model MTFFNet achieves significantly better classification performance than many state-of-the-art approaches. In this paper, an end-to-end multi-task feature fusion method for Chinese painting classification is proposed. We come up with a new model named MTFFNet, composed of two branches, in which one branch is top-level RGB feature learning and the other branch is low-level brush stroke feature learning. The semantic feature learning branch takes the original image of a traditional Chinese painting as input, extracting the color and semantic information of the image, while the brush feature learning branch takes the GLCM feature map as input, extracting the texture and edge information of the image. A multiple kernel learning SVM (support vector machine) is selected as the final classifier. Evaluated by experiments, this method improves the accuracy of Chinese painting classification and enhances the generalization ability. By adopting the end-to-end multi-task feature fusion method, MTFFNet can extract more semantic features and texture information from the image. When compared with state-of-the-art classification methods for Chinese painting, the proposed method achieves much higher accuracy on our proposed datasets, without lowering speed or efficiency. The proposed method provides an effective solution for cognitive classification of Chinese ink painting, where the accuracy and efficiency of the approach have been fully validated.
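
The brush-stroke branch described above takes a GLCM feature map as input. A minimal way to derive GLCM texture descriptors with scikit-image is sketched below; the backbone, branch architecture, and exact GLCM configuration of MTFFNet are assumptions, not the paper's published settings.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_image, distances=(1,), angles=(0, np.pi / 2)):
    # gray_image: 2-D uint8 array. Build a gray-level co-occurrence matrix and
    # summarize it with standard texture properties.
    glcm = graycomatrix(gray_image, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])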

2 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
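
The matching pipeline described in this abstract (invariant feature extraction followed by fast nearest-neighbor matching with a ratio test) can be reproduced in outline with OpenCV; the Hough-transform clustering and least-squares pose verification stages are omitted from this sketch.

import cv2

def sift_match(img1, img2, ratio=0.75):
    # img1, img2: grayscale images as numpy arrays.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Lowe's ratio test on the two nearest neighbors of each descriptor.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]
    return kp1, kp2, good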

46,906 citations

01 Jan 2011
TL;DR: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images that can then be used to reliably match objects in differing images.
Abstract: The Scale-Invariant Feature Transform (or SIFT) algorithm is a highly robust method to extract and consequently match distinctive invariant features from images. These features can then be used to reliably match objects in differing images. The algorithm was first proposed by Lowe [12] and further developed to increase performance, resulting in the classic paper [13] that served as the foundation for SIFT, which has played an important role in robotic and machine vision in the past decade.

14,708 citations


"A feature fusion based localized mu..." refers background or methods in this paper

  • ...The selected features for Caltech 101 include dense SIFT (scale invariant feature transform) [30], dense color SIFT and SSIM (structural similarity) [35]....

    [...]

  • ...Introducing the bag of word (BoW) model to compute image feature significantly improves the performance of image classification systems [30]....

    [...]

Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.
Abstract: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence. This technique works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting "spatial pyramid" is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks. Specifically, our proposed method exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories. The spatial pyramid framework also offers insights into the success of several recently proposed image descriptions, including Torralba’s "gist" and Lowe’s SIFT descriptors.
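
A bare-bones version of the spatial pyramid representation is sketched below. It assumes local features have already been quantized into visual-word indices with known image coordinates, and uses the standard level weighting (1/2^L for level 0, 1/2^(L-l+1) for level l >= 1); the kernel used to compare pyramids is omitted.

import numpy as np

def spatial_pyramid(words, xy, img_size, vocab_size, levels=2):
    # words: visual-word index per local feature; xy: (n, 2) feature coordinates;
    # img_size: (width, height). Returns the concatenated, weighted cell histograms.
    w, h = img_size
    hists = []
    for l in range(levels + 1):
        cells = 2 ** l
        weight = 1.0 / 2 ** levels if l == 0 else 1.0 / 2 ** (levels - l + 1)
        cx = np.minimum((xy[:, 0] * cells / w).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / h).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                hists.append(weight * np.bincount(words[in_cell], minlength=vocab_size))
    return np.concatenate(hists)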

8,736 citations


"A feature fusion based localized mu..." refers background or methods or result in this paper

  • ...There is a very rich literature on image classification including methods based on bag of word [1, 2], Sparse representation [3–7], and Deep learning [8–10]....

    [...]

  • ...We employed spatial pyramid match (SPM) representation of images and computed kernel weights based on χ² kernel....

    [...]

  • ...The abovementioned SPM based feature vectors were fed to the proposed classifier....

    [...]

  • ...For fair comparison with other works, we followed the experimental setup suggested in [1] and randomly selected 30 images per class for training, leaving the rest for testing....

    [...]

  • ...proposed the spatial pyramid match (SPM) approach to address the mentioned problem [1]....

    [...]

Book
01 Jan 2004
TL;DR: This book provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.
Abstract: Kernel methods provide a powerful and unified framework for pattern discovery, motivating algorithms that can act on general types of data (e.g. strings, vectors or text) and look for general types of relations (e.g. rankings, classifications, regressions, clusters). The application areas range from neural networks and pattern recognition to machine learning and data mining. This book, developed from lectures and tutorials, fulfils two major roles: firstly it provides practitioners with a large toolkit of algorithms, kernels and solutions ready to use for standard pattern discovery problems in fields such as bioinformatics, text analysis, image analysis. Secondly it provides an easy introduction for students and researchers to the growing field of kernel-based pattern analysis, demonstrating with examples how to handcraft an algorithm or a kernel for a new specific application, and covering all the necessary conceptual and mathematical tools to do so.

6,050 citations

Proceedings ArticleDOI
25 Oct 2010
TL;DR: VLFeat is an open and portable library of computer vision algorithms that includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization.
Abstract: VLFeat is an open and portable library of computer vision algorithms. It aims at facilitating fast prototyping and reproducible research for computer vision scientists and students. It includes rigorous implementations of common building blocks such as feature detectors, feature extractors, (hierarchical) k-means clustering, randomized kd-tree matching, and super-pixelization. The source code and interfaces are fully documented. The library integrates directly with MATLAB, a popular language for computer vision research.

3,417 citations


"A feature fusion based localized mu..." refers methods in this paper

  • ...Dense SIFT is calculated over regular grids of 16 × 16 image patches with eight pixels spacing using VLFeat Lib [36]....

    [...]