Journal ArticleDOI

Feature and Region Selection for Visual Learning

05 Jan 2016-IEEE Transactions on Image Processing (IEEE)-Vol. 25, Iss: 3, pp 1084-1094
TL;DR: A method for feature selection and region selection in the visual BoW model is presented; the main idea is to assign latent weights to the features or regions and jointly optimize these latent variables with the parameters of a classifier.
Abstract: Visual learning problems, such as object classification and action recognition, are typically approached using extensions of the popular bag-of-words (BoWs) model. Despite its great success, it is unclear what visual features the BoW model is learning. Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition. To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., support vector machine). There are four main benefits of our approach: 1) our approach accommodates non-linear additive kernels, such as the popular $\chi ^{2}$ and intersection kernel; 2) our approach is able to handle both regions in images and spatio-temporal regions in videos in a unified way; 3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; and 4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results in the PASCAL VOC 2007, MSR Action Dataset II and YouTube illustrate the benefits of our approach.
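The full method optimizes the latent weights under non-linear additive kernels (χ² or intersection) with a reduced gradient solver; the sketch below is only a simplified linear surrogate of that idea, alternating subgradient steps between per-word latent weights and a hinge-loss classifier on toy BoW histograms. All variable names, constants, and the toy data are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the paper's reduced-gradient solver): learn per-word
# latent weights d and a linear classifier jointly on BoW histograms by
# alternating subgradient steps on the regularized hinge loss.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 50                       # samples, vocabulary size
X = rng.random((n, k))               # toy BoW histograms
X /= X.sum(axis=1, keepdims=True)    # L1-normalize each histogram
y = np.sign(X[:, :5].sum(axis=1) - X[:, 5:10].sum(axis=1))  # toy labels in {-1,+1}

d = np.full(k, 1.0 / k)              # latent word weights (kept on the simplex)
w, b = np.zeros(k), 0.0              # linear classifier
C, lr = 1.0, 0.5

for it in range(500):
    Xw = X * d                       # re-weighted features
    margins = y * (Xw @ w + b)
    active = margins < 1             # samples violating the margin
    # Subgradient step for (w, b) on the regularized hinge loss
    grad_w = w - C * (y[active, None] * Xw[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b
    # Subgradient step for d, then a crude projection back onto the simplex
    grad_d = -C * (y[active, None] * X[active] * w).sum(axis=0)
    d = np.clip(d - lr * grad_d, 0, None)
    d /= d.sum() + 1e-12

print("largest latent weights (most 'selected' words):", np.argsort(d)[-5:])
```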
Citations
Proceedings ArticleDOI
01 Nov 2019
TL;DR: A unique approach that utilizes the underlying standard metrics and creates an internal 3D evaluation of fitness factors, as part of enhanced machine learning engine engineering, for an enhanced Weighted Performance Metric.
Abstract: The accuracy metric has become a gold standard for measuring various models and systems. However, recent development and research have shown its limitations, especially during training and testing of machine learning models: it fails to preserve the reliability of the outcome for unbalanced data modeling. This problem is known as the Accuracy Paradox. This paper presents a unique approach that utilizes the underlying standard metrics and creates an internal 3D evaluation of fitness factors as part of enhanced machine learning engine engineering. The model is trained under internal tuning of parameters, and the trade-off is handled as a collateral benefit using built-in parallelism. Moreover, the way this metric is engineered opens up generalized options for data engineers and model creators to incorporate (mathematically) the metrics of their choice, standard or new, based on how the model is trained for any type of data domain. This approach shows the improved integrity of the internal metric measurements using mathematical constructs and a governing algorithm for the enhanced Weighted Performance Metric. Sample experiments on real-world datasets are included, and the results support the contribution of this research.
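The paper's exact enhanced Weighted Performance Metric is not reproduced in this abstract; as a hedged illustration of the general idea of fusing several standard metrics into one weighted score, a minimal sketch (with assumed, arbitrary weights) might look like this:

```python
# Illustrative only: a generic weighted combination of accuracy, precision
# and recall. The weights here are arbitrary assumptions, not the paper's.
from sklearn.metrics import accuracy_score, precision_score, recall_score

def weighted_performance(y_true, y_pred, weights=(0.2, 0.4, 0.4)):
    """Weighted combination of accuracy, precision and recall."""
    scores = (accuracy_score(y_true, y_pred),
              precision_score(y_true, y_pred, zero_division=0),
              recall_score(y_true, y_pred, zero_division=0))
    return sum(w * s for w, s in zip(weights, scores))

print(weighted_performance([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
```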

16 citations

Journal ArticleDOI
TL;DR: Inspired by the successful application of bag-of-words (BoW) to feature representation, this work leverages it at the instance level to model the distributions of the positive and negative classes, and then incorporates the BoW learning and instance labeling in a single optimization formulation.
Abstract: In this paper, we aim at irregular-shape object localization under weak supervision. With over-segmentation, this task can be transformed into a multiple-instance context. However, most multiple-instance learning methods only emphasize the single most positive instance in a positive bag to optimize bag-level classification, which leads to imprecise or incomplete localization. To address this issue, we propose a scheme for instance annotation, where all of the positive instances are detected by labeling each instance in each positive bag. Inspired by the successful application of bag-of-words (BoW) to feature representation, we leverage it at the instance level to model the distributions of the positive and negative classes, and then incorporate the BoW learning and instance labeling in a single optimization formulation. We also demonstrate that the scheme is well suited to weakly supervised localization of irregular-shape objects. Experimental results validate the effectiveness of the approach, compared with existing methods, both for generic instance annotation and for weakly supervised object localization.
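A rough sketch of the instance-level BoW intuition (not the paper's joint optimization): cluster instances into visual words, score each word by how often it occurs in positive versus negative bags, and label every instance in a positive bag rather than only the single most positive one. The toy data, cluster count, and 0.5 threshold are illustrative assumptions.

```python
# Score instances inside positive bags by how often their visual word appears
# in positive vs. negative bags (a BoW-style instance prior). Toy data only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
pos_instances = rng.normal(loc=2.0, size=(100, 8))    # instances from positive bags
neg_instances = rng.normal(loc=0.0, size=(100, 8))    # instances from negative bags

all_instances = np.vstack([pos_instances, neg_instances])
words = KMeans(n_clusters=10, n_init=10, random_state=0).fit(all_instances)

pos_hist = np.bincount(words.predict(pos_instances), minlength=10) + 1  # +1 smoothing
neg_hist = np.bincount(words.predict(neg_instances), minlength=10) + 1
word_score = pos_hist / (pos_hist + neg_hist)          # roughly P(positive | word)

# Label every instance in a positive bag, not just the single most positive one
instance_scores = word_score[words.predict(pos_instances)]
labels = instance_scores > 0.5
print(f"{labels.sum()} of {len(labels)} instances in positive bags labeled positive")
```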

16 citations


Cites background from "Feature and Region Selection for Visual Learning"

  • ...content-based image retrieval [7], action recognition [8], natural scene classification [9] etc....


  • ...This has promoted the emergence of object localization under weak supervision [5], [6], [8], [10]–[12]....


Journal ArticleDOI
TL;DR: The problem is tackled by proposing a novel model based on an optimization algorithm that integrates the Cat Swarm Optimization Algorithm and the Crow Search Algorithm, inheriting the advantages of both.
Abstract: Object detection and localization attract researchers to address the challenges associated with computer vision. The literature presents numerous unsupervised methods to detect and localize objects, but with inaccuracies and inconsistencies. The problem is tackled by proposing a novel model based on an optimization algorithm. The object in the image is detected using Sparse Fuzzy C-Means (Sparse FCM), an enhanced Fuzzy C-Means algorithm designed to handle high-dimensional data. The detected objects are then localized using the proposed Cat Crow Optimization (CCO)-based Deep Convolutional Neural Network. The proposed CCO integrates the Cat Swarm Optimization Algorithm and the Crow Search Algorithm and inherits the advantages of both. The proposed method is evaluated on images from the Visual Object Classes Challenge 2012 dataset. The analysis revealed that the proposed method acquired an average accuracy, precision, and recall of 0.8278, 0.8549, and 0.7911, respectively.
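The CCO algorithm itself is not specified in this abstract; purely as a hedged illustration of how two metaheuristic update rules can be integrated in one population loop, the toy optimizer below alternates a crow-search-style move toward another individual's memory with a cat-swarm-style local seeking perturbation on an arbitrary objective. None of the constants come from the paper.

```python
# Toy hybrid optimizer, not the paper's CCO: alternate a crow-search-style
# update (follow a random individual's memorized best) with a random local
# "seeking" perturbation. Objective and constants are arbitrary.
import numpy as np

def sphere(x):                       # toy objective to minimize
    return float(np.sum(x ** 2))

rng = np.random.default_rng(42)
pop = rng.uniform(-5, 5, size=(20, 4))       # candidate solutions
memory = pop.copy()                           # best position seen by each individual

for it in range(200):
    for i in range(len(pop)):
        if rng.random() < 0.5:                # crow-search-style move toward a memory
            j = rng.integers(len(pop))
            pop[i] = pop[i] + rng.random() * 2.0 * (memory[j] - pop[i])
        else:                                 # cat-swarm-style local seeking move
            pop[i] = pop[i] + rng.normal(scale=0.2, size=pop.shape[1])
        if sphere(pop[i]) < sphere(memory[i]):
            memory[i] = pop[i].copy()

best = memory[np.argmin([sphere(m) for m in memory])]
print("best solution found:", best)
```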

7 citations

Proceedings ArticleDOI
01 May 2017
TL;DR: The method proposed in the paper is to build a drone with a camera to capture images of crops, soils, and flooded areas; those images are then processed to obtain the required results.
Abstract: The method proposed in the paper is to build a drone with a camera to capture images of crops, soils, and flooded areas; those images are then processed to obtain the required results. Building the drone is the first part. The Bag of Words algorithm is then applied to the test image, and specific features are extracted. These features are matched against the features of images in the predefined data set. The data set contains classified categories of different types of crop images. After matching, the result is given as the name of the category to which the test image belongs.
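A compact sketch of the bag-of-words matching pipeline described above, with local descriptors faked as random vectors so the example is self-contained; a real system would extract SIFT/ORB descriptors from the drone images and use the actual crop categories.

```python
# Build a visual vocabulary, turn each image's local descriptors into a
# histogram of visual words, and classify against labeled categories.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def fake_descriptors(center, n=60, dim=32):
    """Stand-in for local descriptors extracted from one image."""
    return rng.normal(loc=center, scale=1.0, size=(n, dim))

# Two toy categories of "crop images", 20 training images each
train_desc = [fake_descriptors(c) for c in (0.0, 3.0) for _ in range(20)]
train_labels = [0] * 20 + [1] * 20

vocab = KMeans(n_clusters=30, n_init=10, random_state=0)
vocab.fit(np.vstack(train_desc))

def bow_histogram(desc):
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

X_train = np.array([bow_histogram(d) for d in train_desc])
clf = LinearSVC().fit(X_train, train_labels)

test_image = fake_descriptors(3.0)                   # unseen "image"
print("predicted category:", clf.predict([bow_histogram(test_image)])[0])
```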

5 citations


Cites background from "Feature and Region Selection for Visual Learning"

  • ...The objective of the research [1], [2] is to classify data by finding discriminative features and analyzing them....


Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of three different machine learning models: logistic regression, random forest, and decision trees to classify, predict, and detect fraudulent credit card transactions, and showed that random forest produces a maximum accuracy of 96% with an area under the curve value of 98.9%.
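A hedged sketch of this kind of comparison (not the authors' data or exact setup): the three classifiers are trained on a synthetic imbalanced dataset and compared on accuracy and ROC AUC.

```python
# Compare logistic regression, a decision tree and a random forest on a
# synthetic imbalanced "fraud" dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95],
                           random_state=0)            # ~5% "fraud" class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    print(f"{name}: accuracy={accuracy_score(y_te, model.predict(X_te)):.3f}, "
          f"AUC={roc_auc_score(y_te, proba):.3f}")
```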

5 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
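A minimal sketch of the descriptor-matching stage of this pipeline using OpenCV's SIFT implementation, a FLANN nearest-neighbour search, and Lowe's ratio test; the image paths are placeholders, and the Hough clustering and pose-verification steps are omitted.

```python
# Detect and describe SIFT keypoints, find approximate nearest neighbours
# with FLANN, and keep matches that pass Lowe's ratio test.
import cv2

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)   # reference view of the object
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)    # cluttered scene to search

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN with a KD-tree index (algorithm=1) for fast approximate matching
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)

good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(f"{len(good)} reliable matches out of {len(matches)} candidates")
```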

46,906 citations

Journal ArticleDOI
TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, including whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confusing.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to the present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state of the art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confusing. The paper concludes with lessons learnt in the three-year history of the challenge, and proposes directions for future improvement and extension.
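As a small illustration of the ranked-list evaluation underlying the VOC classification benchmark, the function below computes a simple non-interpolated average precision; the official protocol used 11-point interpolation in early years and interpolated AP later, so this is a simplification, not the exact VOC code.

```python
# Non-interpolated average precision over a ranked list of confidences.
import numpy as np

def average_precision(scores, labels):
    """AP for one class from confidence scores and 0/1 ground-truth labels."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    n_pos = max(int(labels.sum()), 1)
    # Average the precision at each rank where a positive is retrieved
    return float(precision[labels == 1].sum() / n_pos)

print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1]))  # ~0.806
```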

15,935 citations

Journal ArticleDOI
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
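A toy sketch of the latent-SVM alternation described above (not the released DPM code): each positive example carries several candidate feature vectors as its latent choices, and the loop alternates between fixing the best-scoring candidate per positive and retraining an ordinary linear SVM. Dimensions and data are illustrative.

```python
# Alternate between (1) picking the best latent candidate per positive under
# the current weights and (2) solving the now-convex linear SVM problem.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# 30 positives, each with 5 latent candidates; 60 fixed negatives
pos_candidates = rng.normal(0.0, 1.0, size=(30, 5, 6)) + np.array([2, 0, 0, 0, 0, 0.0])
negatives = rng.normal(0.0, 1.0, size=(60, 6))

w = rng.normal(size=6)                        # initial scoring direction
for it in range(5):
    # Step 1: fix latent values -- best-scoring candidate per positive
    best = pos_candidates[np.arange(len(pos_candidates)),
                          np.argmax(pos_candidates @ w, axis=1)]
    # Step 2: with latents fixed, train a standard linear SVM
    X = np.vstack([best, negatives])
    y = np.array([1] * len(best) + [0] * len(negatives))
    clf = LinearSVC().fit(X, y)
    w = clf.coef_.ravel()

print("final weight vector:", np.round(w, 2))
```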

10,501 citations


"Feature and Region Selection for Vi..." refers methods in this paper

  • ...Many MIL algorithms have been successfully used for weakly-supervised learning, such as MILboost [13], MI-SVM [1], [4], [14], [15] and SparseMIL [16]....


  • ...MIL has been applied to object detection for images [1], [15], time series [1] and videos [2], [5], [18]....


Journal ArticleDOI
TL;DR: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation and presents state-of-the-art algorithms for both of these tasks.
Abstract: This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.
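The gPb-owt-ucm pipeline itself is not reimplemented here; as a quick stand-in showing how an over-segmentation into superpixel-like regions is typically obtained in Python, the snippet below uses scikit-image's Felzenszwalb segmentation on a bundled sample image.

```python
# Over-segment a sample image into superpixel-like regions (stand-in for the
# hierarchical gPb-owt-ucm segmentation used in the cited work).
import numpy as np
from skimage import data, segmentation

image = data.astronaut()                              # bundled sample image
regions = segmentation.felzenszwalb(image, scale=100, sigma=0.8, min_size=50)
print("number of regions:", len(np.unique(regions)))
```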

5,068 citations


"Feature and Region Selection for Vi..." refers methods in this paper

  • ...For images, we used a hierarchical image segmentation to obtain superpixels [33]....


  • ...superpixels [33] or spatio-temporal regions [34]....


Journal ArticleDOI
01 Sep 2005
TL;DR: This paper builds on the idea of the Harris and Förstner interest point operators and detects local structures in space-time where the image values have significant local variations in both space and time, and illustrates how a video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.
Abstract: Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for interpretation of spatio-temporal events. To detect spatio-temporal events, we build on the idea of the Harris and Förstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We estimate the spatio-temporal extents of the detected events by maximizing a normalized spatio-temporal Laplacian operator over spatial and temporal scales. To represent the detected events, we then compute local, spatio-temporal, scale-invariant N-jets and classify each event with respect to its jet descriptor. For the problem of human motion analysis, we illustrate how a video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.
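A heavily simplified sketch of the space-time interest point idea (not Laptev's multi-scale detector): build the 3x3 spatio-temporal second-moment matrix from smoothed gradients of a video volume and threshold an extended Harris response. The random "video" and all constants are placeholders.

```python
# Spatio-temporal Harris-style response on a toy video volume.
import numpy as np
from scipy.ndimage import gaussian_filter

video = np.random.rand(20, 64, 64)                    # (time, height, width) placeholder
gt, gy, gx = np.gradient(gaussian_filter(video, sigma=1.5))

grads = [gx, gy, gt]
mu = np.empty(video.shape + (3, 3))                   # smoothed second-moment matrix
for i in range(3):
    for j in range(3):
        mu[..., i, j] = gaussian_filter(grads[i] * grads[j], sigma=2.0)

k = 0.005
response = np.linalg.det(mu) - k * np.trace(mu, axis1=-2, axis2=-1) ** 3
points = np.argwhere(response > response.mean() + 3 * response.std())
print(f"{len(points)} candidate space-time interest points")
```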

2,684 citations