
Showing papers on "Feature (computer vision)" published in 2014


Proceedings ArticleDOI
23 Jun 2014
TL;DR: R-CNN, as discussed by the authors, combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.

21,729 citations
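To make the two-stage pipeline concrete, here is a minimal Python sketch of R-CNN's structure (proposals, warped crops, CNN features, a per-class linear SVM, then non-maximum suppression). The proposal generator, feature extractor, and SVM weights below are random stand-ins for illustration, not the authors' released code.

import numpy as np

_RNG = np.random.default_rng(0)
_PROJ = _RNG.standard_normal((1024, 4096)).astype(np.float32)  # stand-in for a CNN feature layer

def propose_regions(image, n=200):
    # Stand-in for bottom-up region proposals (the paper uses selective search).
    h, w = image.shape[:2]
    x1 = _RNG.integers(0, w - 64, n)
    y1 = _RNG.integers(0, h - 64, n)
    return np.stack([x1, y1, x1 + _RNG.integers(16, 64, n), y1 + _RNG.integers(16, 64, n)], axis=1)

def cnn_features(crop):
    # Stand-in for the 4096-d feature of a region warped to the CNN's input size.
    flat = np.resize(crop.astype(np.float32).ravel(), 1024)
    return flat @ _PROJ

def nms(boxes, scores, iou_thresh=0.3):
    # Greedy non-maximum suppression: keep the best box, drop heavy overlaps, repeat.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]
    return keep

image = _RNG.integers(0, 256, (480, 640, 3), dtype=np.uint8)
boxes = propose_regions(image)
feats = np.stack([cnn_features(image[y1:y2, x1:x2]) for x1, y1, x2, y2 in boxes])
scores = feats @ _RNG.standard_normal(4096).astype(np.float32)  # one class's linear SVM
detections = boxes[nms(boxes, scores)]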


Proceedings ArticleDOI
23 Jun 2014
TL;DR: In this paper, features extracted from the OverFeat network are used as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets.
Abstract: Recent results indicate that the generic descriptors extracted from convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network, which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine-grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistent superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval, it consistently outperforms low-memory-footprint methods except on the sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in the case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques, e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.

3,346 citations
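The evaluation protocol is simple enough to sketch: a fixed 4096-d CNN feature per image feeds a linear SVM for classification, or an L2 ranking for retrieval. The features below are random stand-ins for pre-extracted descriptors, not actual OverFeat outputs.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 4096)).astype(np.float32)  # stand-in CNN features
y_train = rng.integers(0, 10, 500)
X_test = rng.standard_normal((100, 4096)).astype(np.float32)

clf = LinearSVC(C=1.0).fit(X_train, y_train)   # classification head on frozen features
pred = clf.predict(X_test)

# Retrieval: rank database images by L2 distance to the query's feature.
query = X_test[0]
ranking = np.argsort(np.linalg.norm(X_train - query, axis=1))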


Proceedings Article
23 Feb 2014
TL;DR: A multiscale, sliding-window approach is implemented efficiently within a ConvNet, and localization is learned by predicting object boundaries; bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. The resulting OverFeat framework won the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013.
Abstract: We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.

3,043 citations


Journal ArticleDOI
TL;DR: For a broad family of features, this work finds that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid, and this approximation yields considerable speedups with negligible loss in detection accuracy.
Abstract: Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly. This fundamental insight allows us to design object detection algorithms that are as accurate, and considerably faster, than the state-of-the-art. The computational bottleneck of many modern detectors is the computation of features at every scale of a finely-sampled image pyramid. Our key insight is that one may compute finely sampled feature pyramids at a fraction of the cost, without sacrificing performance: for a broad family of features we find that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid. Extrapolation is inexpensive as compared to direct feature computation. As a result, our approximation yields considerable speedups with negligible loss in detection accuracy. We modify three diverse visual recognition systems to use fast feature pyramids and show results on both pedestrian detection (measured on the Caltech, INRIA, TUD-Brussels and ETH data sets) and general object detection (measured on the PASCAL VOC). The approach is general and is widely applicable to vision algorithms requiring fine-grained multi-scale analysis. Our approximation is valid for images with broad spectra (most natural images) and fails for images with narrow band-pass spectra (e.g., periodic textures).

2,000 citations
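The core approximation can be written in a few lines: compute a channel at one octave scale, then obtain nearby scales by resampling plus a per-channel power-law correction (s/s0)^(-lambda). This sketch uses gradient magnitude as the channel and an illustrative lambda value; the paper estimates lambda per channel from natural-image statistics.

import numpy as np
from scipy.ndimage import zoom

def compute_channel(image):
    # Stand-in channel feature: gradient magnitude.
    gy, gx = np.gradient(image.astype(np.float32))
    return np.hypot(gx, gy)

def approx_channel(channel_at_s0, s, s0, lam=0.11):
    # Approximate the channel at scale s from the one computed explicitly at s0.
    factor = s / s0
    return zoom(channel_at_s0, factor, order=1) * factor ** (-lam)

image = np.random.default_rng(0).random((256, 256))
octave = compute_channel(zoom(image, 0.5, order=1))  # computed explicitly at s0 = 0.5
nearby = approx_channel(octave, s=0.4, s0=0.5)       # extrapolated, not computed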


Book ChapterDOI
06 Sep 2014
TL;DR: In this paper, a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity is proposed.
Abstract: In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.

1,414 citations
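A rough sketch of such a geocentric encoding: each depth pixel is re-expressed as (horizontal disparity, height above ground, angle between the surface normal and gravity). Camera intrinsics, stereo baseline, camera height, and the gravity direction are assumed known here, and the normal estimate is a crude depth-gradient one; the paper's pipeline estimates gravity from the point cloud itself.

import numpy as np

def geocentric_encoding(depth, fx, fy, cx, cy, baseline, camera_height):
    gravity = np.array([0.0, -1.0, 0.0])               # assumed known; image y points down
    h, w = depth.shape
    z = depth
    y = (np.arange(h)[:, None] - cy) * z / fy          # back-projected vertical coordinate
    disparity = baseline * fx / np.maximum(z, 1e-6)    # channel 1: horizontal disparity
    height = camera_height - y                         # channel 2: height above ground
    dzdy, dzdx = np.gradient(z)                        # crude surface normals from depth
    normals = np.dstack([-dzdx, -dzdy, np.ones_like(z)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    angle = np.degrees(np.arccos(np.clip(normals @ gravity, -1, 1)))  # channel 3
    return np.dstack([disparity, height, angle])

depth = np.random.default_rng(0).random((480, 640)) * 4 + 0.5        # synthetic metres
hha = geocentric_encoding(depth, fx=525, fy=525, cx=320, cy=240,
                          baseline=0.075, camera_height=1.2)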


Posted Content
TL;DR: A new geocentric embedding is proposed for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity to facilitate the use of perception in fields like robotics.
Abstract: In this paper we study the problem of object detection for RGB-D images using semantically rich image and depth features. We propose a new geocentric embedding for depth images that encodes height above ground and angle with gravity for each pixel in addition to the horizontal disparity. We demonstrate that this geocentric embedding works better than using raw depth images for learning feature representations with convolutional neural networks. Our final object detection system achieves an average precision of 37.3%, which is a 56% relative improvement over existing methods. We then focus on the task of instance segmentation where we label pixels belonging to object instances found by our detector. For this task, we propose a decision forest approach that classifies pixels in the detection window as foreground or background using a family of unary and binary tests that query shape and geocentric pose features. Finally, we use the output from our object detectors in an existing superpixel classification framework for semantic scene segmentation and achieve a 24% relative improvement over current state-of-the-art for the object categories that we study. We believe advances such as those represented in this paper will facilitate the use of perception in fields like robotics.

1,059 citations


Journal ArticleDOI
TL;DR: FaceWarehouse, a database of 3D facial expressions for visual computing applications, offers for every person a much richer matching collection of expressions than previous 3D facial databases, enabling depiction of most human facial actions.
Abstract: We present FaceWarehouse, a database of 3D facial expressions for visual computing applications. We use Kinect, an off-the-shelf RGBD camera, to capture 150 individuals aged 7-80 from various ethnic backgrounds. For each person, we captured the RGBD data of her different expressions, including the neutral expression and 19 other expressions such as mouth-opening, smile, kiss, etc. For every RGBD raw data record, a set of facial feature points on the color image such as eye corners, mouth contour, and the nose tip are automatically localized, and manually adjusted if better accuracy is required. We then deform a template facial mesh to fit the depth data as closely as possible while matching the feature points on the color image to their corresponding points on the mesh. Starting from these fitted face meshes, we construct a set of individual-specific expression blendshapes for each person. These meshes with consistent topology are assembled as a rank-3 tensor to build a bilinear face model with two attributes: identity and expression. Compared with previous 3D facial databases, for every person in our database, there is a much richer matching collection of expressions, enabling depiction of most human facial actions. We demonstrate the potential of FaceWarehouse for visual computing with four applications: facial image manipulation, face component transfer, real-time performance-based facial image animation, and facial animation retargeting from video to image.

952 citations
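The bilinear model itself is just a tensor contraction: the rank-3 core (vertices x identities x expressions) is contracted with an identity weight vector and an expression weight vector to synthesize a mesh. The core tensor and dimensions below are random illustrations, not the database's actual content.

import numpy as np

rng = np.random.default_rng(0)
n_verts, n_id, n_exp = 3 * 1000, 50, 47          # illustrative sizes
core = rng.standard_normal((n_verts, n_id, n_exp))

w_id = rng.random(n_id); w_id /= w_id.sum()      # identity attribute weights
w_exp = rng.random(n_exp); w_exp /= w_exp.sum()  # expression attribute weights

# V = core x_2 w_id x_3 w_exp (mode-2 and mode-3 tensor products).
vertices = np.einsum('vij,i,j->v', core, w_id, w_exp).reshape(-1, 3)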


Journal ArticleDOI
TL;DR: A novel deep learning method learns a high-level latent and shared feature representation from neuroimaging modalities, hierarchically discovering the complex latent patterns inherent in both MRI and PET.

728 citations


Proceedings ArticleDOI
09 Oct 2014
TL;DR: In this paper, some widely used feature selection and feature extraction techniques are analyzed to determine how effectively they can be used to achieve high performance of learning algorithms and, ultimately, to improve the predictive accuracy of classifiers.
Abstract: Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. Dimensionality reduction is an important area in the field of machine learning and pattern recognition, where many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques are analyzed to determine how effectively they can be used to achieve high performance of learning algorithms and, ultimately, to improve the predictive accuracy of classifiers. We briefly analyze these techniques in order to investigate the strengths and weaknesses of some widely used dimensionality reduction methods.

726 citations
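The two families the paper analyzes differ in kind: feature selection keeps a subset of the original features, while feature extraction (e.g. PCA) projects them onto new axes. A compact scikit-learn illustration:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_digits(return_X_y=True)                        # 64 raw pixel features

X_sel = SelectKBest(f_classif, k=16).fit_transform(X, y)   # selection: 16 of the 64
X_pca = PCA(n_components=16).fit_transform(X)              # extraction: 16 new axes
print(X.shape, X_sel.shape, X_pca.shape)                   # (1797, 64) (1797, 16) (1797, 16)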


Proceedings ArticleDOI
23 Jun 2014
TL;DR: A novel Boosted Deep Belief Network (BDBN) performs the three training stages iteratively in a unified loopy framework; experiments showed that the BDBN framework yielded dramatic improvements in facial expression analysis.
Abstract: A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, feature set, and classifier to achieve good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) for performing the three training stages iteratively in a unified loopy framework. Through the proposed BDBN framework, a set of features that effectively characterizes expression-related facial appearance/shape changes can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and, more importantly, the discriminative capabilities of the selected features are strengthened as well, according to their relative importance to the strong classifier, via a joint fine-tuning process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded dramatic improvements in facial expression analysis.

608 citations


Proceedings ArticleDOI
01 Jan 2014
TL;DR: A customized Convolutional Neural Network with a shallow convolution layer is designed to classify lung image patches with interstitial lung disease; the same architecture can be generalized to perform other medical image or texture classification tasks.
Abstract: Image patch classification is an important task in many different medical imaging applications. In this work, we have designed a customized Convolutional Neural Network (CNN) with a shallow convolution layer to classify lung image patches with interstitial lung disease (ILD). While many feature descriptors have been proposed over the past years, they can be quite complicated and domain-specific. Our customized CNN framework can, on the other hand, automatically and efficiently learn the intrinsic image features from lung image patches that are most suitable for the classification purpose. The same architecture can be generalized to perform other medical image or texture classification tasks.
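A hedged sketch of such a shallow patch classifier in PyTorch, with illustrative layer sizes and a 32x32 patch size (not the authors' exact configuration): one convolution layer, pooling, and a small fully connected classifier.

import torch
import torch.nn as nn

class ShallowPatchCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5),  # the single, shallow convolution layer
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 14 * 14, 64),      # 32x32 input -> 28x28 after conv -> 14x14 after pool
            nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = ShallowPatchCNN()(torch.randn(8, 1, 32, 32))  # a batch of 8 grayscale patches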

Journal ArticleDOI
TL;DR: The utility of a new Multimodal Surface Matching (MSM) algorithm capable of driving alignment using a wide variety of descriptors of brain architecture, function and connectivity is demonstrated.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: It is shown that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al.
Abstract: Haze is one of the major factors that degrade outdoor images. Removing haze from a single image is known to be severely ill-posed, and assumptions made in previous methods do not hold in many situations. In this paper, we systematically investigate different haze-relevant features in a learning framework to identify the best feature combination for image dehazing. We show that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al. [8] from a learning perspective, while other haze-relevant features also contribute significantly in a complementary way. We also find, surprisingly, that the synthetic hazy image patches we use for feature investigation serve well as training data for real-world images, which allows us to train specific models for specific applications. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods on both synthetic and real-world datasets.
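The dark-channel feature singled out here takes only a few lines: for each pixel, the minimum intensity over a local patch across all three color channels, which tends toward zero in haze-free regions (He et al.). The patch size and the synthetic input below are illustrative.

import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    # image: RGB array in [0, 1], shape (H, W, 3).
    per_pixel_min = image.min(axis=2)                 # min over color channels
    return minimum_filter(per_pixel_min, size=patch)  # min over a local patch

hazy = np.random.default_rng(0).random((240, 320, 3))  # stand-in hazy image
dc = dark_channel(hazy)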

Book ChapterDOI
06 Sep 2014
TL;DR: This paper proposes a novel salient color names based color descriptor (SCNCD) to describe colors that outperforms the state-of-the-art performance (without user’s feedback optimization) on two challenging datasets (VIPeR and PRID 450S).
Abstract: Color naming, which relates colors with color names, can help people with a semantic analysis of images in many computer vision applications. In this paper, we propose a novel salient color names based color descriptor (SCNCD) to describe colors. SCNCD utilizes salient color names to guarantee that a higher probability will be assigned to the color name which is nearer to the color. Based on SCNCD, color distributions over color names in different color spaces are then obtained and fused to generate a feature representation. Moreover, the effect of background information is employed and analyzed for person re-identification. With a simple metric learning method, the proposed approach outperforms the state-of-the-art performance (without user’s feedback optimization) on two challenging datasets (VIPeR and PRID 450S). More importantly, the proposed feature can be obtained very fast if we compute SCNCD of each color in advance.
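A simplified sketch of the color-naming idea: describe a color by a probability distribution over color-name prototypes, giving higher probability to nearer names. The prototypes and the softmax-style weighting below are illustrative analogies; SCNCD's actual salient-name assignment differs in detail.

import numpy as np

COLOR_NAMES = {  # illustrative RGB prototypes, not SCNCD's
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "green": (0, 255, 0), "blue": (0, 0, 255), "yellow": (255, 255, 0),
}

def color_name_distribution(rgb, temperature=50.0):
    protos = np.array(list(COLOR_NAMES.values()), dtype=np.float32)
    dist = np.linalg.norm(protos - np.asarray(rgb, dtype=np.float32), axis=1)
    weights = np.exp(-dist / temperature)             # nearer name -> higher probability
    return dict(zip(COLOR_NAMES, weights / weights.sum()))

print(color_name_distribution((200, 30, 40)))         # mass concentrates on "red"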

Journal ArticleDOI
15 Jul 2014 - PLOS ONE
TL;DR: 3D-Slicer segmented tumor volumes provide a better alternative to the manual delineation for feature quantification, as they yield more reproducible imaging descriptors and can be employed for quantitative image feature extraction and image data mining research in large patient cohorts.
Abstract: Due to advances in the acquisition and analysis of medical imaging, it is currently possible to quantify the tumor phenotype. The emerging field of Radiomics addresses this issue by converting medical images into minable data by extracting a large number of quantitative imaging features. One of the main challenges of Radiomics is tumor segmentation. Where manual delineation is time consuming and prone to inter-observer variability, it has been shown that semi-automated approaches are fast and reduce inter-observer variability. In this study, a semiautomatic region growing volumetric segmentation algorithm, implemented in the free and publicly available 3D-Slicer platform, was investigated in terms of its robustness for quantitative imaging feature extraction. Fifty-six 3D-radiomic features, quantifying phenotypic differences based on tumor intensity, shape and texture, were extracted from the computed tomography images of twenty lung cancer patients. These radiomic features were derived from the 3D-tumor volumes defined by three independent observers twice using 3D-Slicer, and compared to manual slice-by-slice delineations of five independent physicians in terms of intra-class correlation coefficient (ICC) and feature range. Radiomic features extracted from 3D-Slicer segmentations had significantly higher reproducibility (ICC = 0.85±0.15, p = 0.0009) compared to the features extracted from the manual segmentations (ICC = 0.77±0.17). Furthermore, we found that features extracted from 3D-Slicer segmentations were more robust, as the range was significantly smaller across observers (p = 3.819e-07), and overlapping with the feature ranges extracted from manual contouring (boundary lower: p = 0.007, higher: p = 5.863e-06). Our results show that 3D-Slicer segmented tumor volumes provide a better alternative to the manual delineation for feature quantification, as they yield more reproducible imaging descriptors. Therefore, 3D-Slicer can be employed for quantitative image feature extraction and image data mining research in large patient cohorts.

Proceedings ArticleDOI
TL;DR: The method yielded the best quantitative results for automatic detection of IDC regions in WSI in terms of F-measure and balanced accuracy; the results also suggest that at least some of the tissue classification mistakes were due less to fundamental problems with the approach than to the inherent limitations in obtaining a highly granular annotation of the diseased area of interest by an expert pathologist.
Abstract: This paper presents a deep learning approach for automatic detection and visual analysis of invasive ductal carcinoma (IDC) tissue regions in whole slide images (WSI) of breast cancer (BCa). Deep learning approaches are learn-from-data methods involving computational modeling of the learning process. This approach is similar to how the human brain works, using different interpretation levels or layers of the most representative and useful features, resulting in a hierarchical learned representation. These methods have been shown to outpace traditional approaches on the most challenging problems in several areas such as speech recognition and object detection. Invasive breast cancer detection is a time-consuming and challenging task, primarily because it involves a pathologist scanning large swathes of benign regions to ultimately identify the areas of malignancy. Precise delineation of IDC in WSI is crucial to the subsequent estimation of tumor aggressiveness grading and prediction of patient outcome. DL approaches are particularly adept at handling these types of problems, especially if a large number of samples are available for training, which would also ensure the generalizability of the learned features and classifier. The DL framework in this paper extends a number of convolutional neural networks (CNN) for visual semantic analysis of tumor regions for diagnosis support. The CNN is trained over a large number of image patches (tissue regions) from WSI to learn a hierarchical part-based representation. The method was evaluated over a WSI dataset from 162 patients diagnosed with IDC. 113 slides were selected for training and 49 slides were held out for independent testing. Ground truth for quantitative evaluation was provided via delineation of the region of cancer by an expert pathologist on the digitized slides. The experimental evaluation was designed to measure classifier accuracy in detecting IDC tissue regions in WSI. Our method yielded the best quantitative results for automatic detection of IDC regions in WSI in terms of F-measure and balanced accuracy (71.80%, 84.23%), in comparison with an approach using handcrafted image features (color, texture and edges, nuclear textural and architecture) and a machine learning classifier for invasive tumor classification using a Random Forest. The best-performing handcrafted features were the fuzzy color histogram (67.53%, 78.74%) and the RGB histogram (66.64%, 77.24%). Our results also suggest that at least some of the tissue classification mistakes (false positives and false negatives) were due less to any fundamental problems associated with the approach than to the inherent limitations in obtaining a very highly granular annotation of the diseased area of interest by an expert pathologist.
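For reference, the two figures of merit quoted above (F-measure and balanced accuracy) can be computed with scikit-learn from per-patch labels (1 = IDC tissue, 0 = benign); the labels here are toy stand-ins, not the paper's data.

from sklearn.metrics import balanced_accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # pathologist annotation per patch
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # classifier output per patch
print(f1_score(y_true, y_pred), balanced_accuracy_score(y_true, y_pred))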

Journal ArticleDOI
TL;DR: This paper proposes to learn affect-salient features for SER using convolutional neural networks (CNN), and shows that this approach leads to stable and robust recognition performance in complex scenes and outperforms several well-established SER features.
Abstract: As an essential way of understanding human emotional behavior, speech emotion recognition (SER) has attracted a great deal of attention in human-centered signal processing. Accuracy in SER heavily depends on finding good affect-related, discriminative features. In this paper, we propose to learn affect-salient features for SER using convolutional neural networks (CNN). The training of the CNN involves two stages. In the first stage, unlabeled samples are used to learn local invariant features (LIF) using a variant of sparse auto-encoder (SAE) with reconstruction penalization. In the second stage, LIF is used as the input to a feature extractor, salient discriminative feature analysis (SDFA), to learn affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality, and discrimination for SER. Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e.g., with speaker and language variation, and environment distortion) and outperforms several well-established SER features.

Posted Content
TL;DR: An architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species recognition is proposed, and a novel graph-based clustering algorithm for learning a compact pose normalization space is proposed.
Abstract: We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object's pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located and normalized by the pose. We perform an empirical study of a number of pose normalization schemes, including an investigation of higher order geometric warping functions. We propose a novel graph-based clustering algorithm for learning a compact pose normalization space. We perform a detailed investigation of state-of-the-art deep convolutional feature implementations and fine-tuning feature learning for fine-grained classification. We observe that a model that integrates lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Our experiments advance state-of-the-art performance on bird species recognition, with a large improvement of correct classification rates over previous methods (75% vs. 55-65%).

Journal ArticleDOI
01 May 2014
TL;DR: Experiments on twenty benchmark datasets show that PSO with the new initialisation strategies and/or the new updating mechanisms can automatically evolve a feature subset with a smaller number of features and higher classification performance than using all features.
Abstract: In classification, feature selection is an important data pre-processing technique, but it is a difficult problem due mainly to the large search space. Particle swarm optimisation (PSO) is an efficient evolutionary computation technique. However, the traditional personal best and global best updating mechanism in PSO limits its performance for feature selection, and the potential of PSO for feature selection has not been fully investigated. This paper proposes three new initialisation strategies and three new personal best and global best updating mechanisms in PSO to develop novel feature selection approaches with the goals of maximising the classification performance, minimising the number of features and reducing the computational time. The proposed initialisation strategies and updating mechanisms are compared with the traditional initialisation and the traditional updating mechanism. Meanwhile, the most promising initialisation strategy and updating mechanism are combined to form a new approach (PSO(4-2)) to address feature selection problems, and it is compared with two traditional feature selection methods and two PSO-based methods. Experiments on twenty benchmark datasets show that PSO with the new initialisation strategies and/or the new updating mechanisms can automatically evolve a feature subset with a smaller number of features and higher classification performance than using all features. PSO(4-2) outperforms the two traditional methods and the two PSO-based algorithms in terms of the computational time, the number of features and the classification performance. The superior performance of this algorithm is due mainly to two elements: the proposed initialisation strategy, which takes advantage of both forward and backward selection to decrease the number of features and the computational time, and the new updating mechanism, which overcomes the limitations of traditional updating mechanisms by taking the number of features into account, thereby reducing the number of features and the computational time.
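For orientation, here is a bare-bones binary PSO for feature selection using the traditional random initialisation and pbest/gbest updates that the paper improves upon: each particle is a 0/1 mask over features, and fitness is cross-validated accuracy on the selected subset. All constants are illustrative.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
n_particles, n_feats, iters = 10, X.shape[1], 15

def fitness(mask):
    # Classification accuracy on the selected feature subset.
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pos = (rng.random((n_particles, n_feats)) < 0.5).astype(float)  # traditional random init
vel = rng.normal(0.0, 1.0, (n_particles, n_feats))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = np.clip(0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos), -6.0, 6.0)
    pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(float)  # sigmoid -> 0/1
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit                       # traditional pbest update
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print(int(gbest.sum()), "features selected; CV accuracy:", pbest_fit.max())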

Journal ArticleDOI
TL;DR: Feature reduction is an essential step before training a machine learning model, avoiding overfitting and thereby improving model prediction accuracy and generalization ability; this review discusses feature reduction techniques used with machine learning in neuroimaging studies.
Abstract: Machine learning techniques are increasingly being used to make relevant predictions and inferences on individual subjects' neuroimaging scan data. Previous studies have mostly focused on categorical discrimination of patients and matched healthy controls and, more recently, on prediction of individual continuous variables such as clinical scores or age. However, these studies are greatly hampered by the large number of predictor variables (voxels) and low number of observations (subjects), also known as the curse-of-dimensionality or small-n-large-p problem. As a result, feature reduction techniques such as feature subset selection and dimensionality reduction are used to remove redundant predictor variables and experimental noise, a process which mitigates the curse-of-dimensionality and small-n-large-p effects. Feature reduction is an essential step before training a machine learning model to avoid overfitting and thereby improve model prediction accuracy and generalization ability. In this review, we discuss feature reduction techniques used with machine learning in neuroimaging studies.

Journal ArticleDOI
TL;DR: The proposed aerial scene classification method can be highly effective in developing a detection system that can be used to automatically scan large-scale high-resolution satellite imagery for detecting large facilities such as a shopping mall.
Abstract: The rich data provided by high-resolution satellite imagery allow us to directly model aerial scenes by understanding their spatial and structural patterns. While pixel- and object-based classification approaches are widely used for satellite image analysis, often these approaches exploit the high-fidelity image data in a limited way. In this paper, we explore an unsupervised feature learning approach for scene classification. Dense low-level feature descriptors are extracted to characterize the local spatial patterns. These unlabeled feature measurements are exploited in a novel way to learn a set of basis functions. The low-level feature descriptors are encoded in terms of the basis functions to generate new sparse representation for the feature descriptors. We show that the statistics generated from the sparse features characterize the scene well producing excellent classification accuracy. We apply our technique to several challenging aerial scene data sets: ORNL-I data set consisting of 1-m spatial resolution satellite imagery with diverse sensor and scene characteristics representing five land-use categories, UCMERCED data set representing twenty one different aerial scene categories with sub-meter resolution, and ORNL-II data set for large-facility scene detection. Our results are highly promising and, on the UCMERCED data set we outperform the previous best results. We demonstrate that the proposed aerial scene classification method can be highly effective in developing a detection system that can be used to automatically scan large-scale high-resolution satellite imagery for detecting large facilities such as a shopping mall.

Posted Content
02 Dec 2014
TL;DR: The Convolution 3D (C3D) feature is proposed: a generic spatio-temporal feature obtained by training a deep 3-dimensional convolutional network on a large annotated video dataset comprising objects, scenes, actions, and other frequently occurring concepts, which encapsulates appearance and motion cues and performs well on different video classification tasks.
Abstract: Videos have become ubiquitous due to the ease of capturing and sharing via social platforms like YouTube, Facebook, Instagram, and others. The computer vision community has tried to tackle various video analysis problems independently. As a consequence, even though some really good hand-crafted features have been proposed, there is a lack of a generic feature for video analysis. On the other hand, the image domain has progressed rapidly by using features from deep convolutional networks. These deep features are proving to be generic and perform well on a variety of image tasks. In this work we propose the Convolution 3D (C3D) feature, a generic spatio-temporal feature obtained by training a deep 3-dimensional convolutional network on a large annotated video dataset comprising objects, scenes, actions, and other frequently occurring concepts. We show that by using spatio-temporal convolutions the trained features encapsulate appearance and motion cues and perform well on different discriminative video classification tasks. C3D has three main advantages. First, it is generic: achieving state-of-the-art results on object recognition, scene classification, and action similarity labeling in videos. Second, it is compact: obtaining better accuracies than the best hand-crafted features and the best deep image features with a lower-dimensional feature descriptor. Third, it is efficient to compute: 91 times faster than current hand-crafted features, and two orders of magnitude faster than current deep-learning based video classification methods.
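A toy illustration of the 3D convolution at the heart of C3D: the kernel slides over space and time, so the output mixes appearance and motion cues. The 3x3x3 kernel matches the paper's homogeneous design; the other shapes here are illustrative.

import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)       # 16 RGB frames of 112x112 pixels
conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)
pool = nn.MaxPool3d(kernel_size=(1, 2, 2))   # pool space only, keeping early temporal detail
out = pool(conv3d(clip))
print(out.shape)                             # torch.Size([1, 64, 16, 56, 56])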

Journal ArticleDOI
TL;DR: The concepts of feature relevance, general procedures, evaluation criteria, and the characteristics of feature selection are introduced, and guidelines are provided to help users select a feature selection algorithm without detailed knowledge of each algorithm.
Abstract: Relevant feature identification has become an essential task for applying data mining algorithms effectively in real-world scenarios. Therefore, many feature selection methods have been proposed in the literature to obtain the relevant feature or feature subsets and achieve the objectives of classification and clustering. This paper introduces the concepts of feature relevance, general procedures, evaluation criteria, and the characteristics of feature selection. A comprehensive overview, categorization, and comparison of existing feature selection methods is also provided, and guidelines are given to help users select a feature selection algorithm without detailed knowledge of each algorithm. We conclude this work with real-world applications, challenges, and future research directions of feature selection.

Proceedings Article
08 Dec 2014
TL;DR: Inspired by recent work on discriminative decorrelation of HOG features, an efficient feature transform that removes correlations in local neighborhoods is proposed, resulting in an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees.
Abstract: Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.
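The decorrelation step can be sketched as estimating a covariance over small feature patches and applying the corresponding (ZCA) whitening transform, after which orthogonal splits face roughly uncorrelated inputs. The patch size and regularizer below are illustrative, not the paper's exact filters.

import numpy as np

rng = np.random.default_rng(0)
raw = rng.standard_normal((10000, 25))          # 10k flattened 5x5 feature patches
patches = raw @ rng.standard_normal((25, 25))   # inject local correlations

mu = patches.mean(axis=0)
cov = np.cov(patches, rowvar=False) + 1e-3 * np.eye(25)   # regularized covariance
evals, evecs = np.linalg.eigh(cov)
whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T         # ZCA whitening transform

decorrelated = (patches - mu) @ whiten
print(np.round(np.cov(decorrelated, rowvar=False)[:3, :3], 2))  # approximately identity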

Proceedings ArticleDOI
04 May 2014
TL;DR: Although convolutional neural networks do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.
Abstract: Content-based music information retrieval tasks have traditionally been solved using engineered features and shallow processing architectures. In recent years, there has been increasing interest in using feature learning and deep architectures instead, thus reducing the required engineering effort and the need for prior knowledge. However, this new approach typically still relies on mid-level representations of music audio, e.g. spectrograms, instead of raw audio signals. In this paper, we investigate whether it is possible to apply feature learning directly to raw audio signals. We train convolutional neural networks using both approaches and compare their performance on an automatic tagging task. Although they do not outperform a spectrogram-based approach, the networks are able to autonomously discover frequency decompositions from raw audio, as well as phase- and translation-invariant feature representations.
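The raw-audio setup reduces to a strided 1D convolution applied directly to the waveform, whose filters are learned end-to-end (and, per the paper, can come to resemble a frequency decomposition). The filter length and stride below are illustrative.

import torch
import torch.nn as nn

waveform = torch.randn(1, 1, 16000)                        # 1 s of 16 kHz mono audio
frontend = nn.Conv1d(1, 32, kernel_size=256, stride=128)   # learned "filterbank"
feature_map = torch.relu(frontend(waveform))
print(feature_map.shape)                                   # torch.Size([1, 32, 124])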

Proceedings ArticleDOI
04 May 2014
TL;DR: In this paper, the multiple instance learning (MIL) framework is used in classification training with deep learning features; automatic feature learning is found to outperform manual features and to achieve performance close to that of the fully supervised approach.
Abstract: This paper studies the effectiveness of accomplishing high-level tasks with a minimum of manual annotation and good feature representations for medical images. In medical image analysis, objects like cells are characterized by significant clinical features. Previously developed features like SIFT and Haar are unable to comprehensively represent such objects. Therefore, feature representation is especially important. In this paper, we study automatic extraction of feature representations through deep learning (DNN). Furthermore, detailed annotation of objects is often an ambiguous and challenging task. We use the multiple instance learning (MIL) framework in classification training with deep learning features. Several interesting conclusions can be drawn from our work: (1) automatic feature learning outperforms manual features; (2) the unsupervised approach can achieve performance close to that of the fully supervised approach (93.56% vs. 94.52%); and (3) the MIL performance with coarse labels (96.30%) exceeds the supervised performance with fine labels (95.40%) using deep learning features.

Journal ArticleDOI
TL;DR: Conventional EEG feature extraction methods are discussed, their performances compared for specific tasks, and the most suitable method for feature extraction recommended based on performance.
Abstract: Technically, a feature represents a distinguishing property, a recognizable measurement, and a functional component obtained from a section of a pattern. Extracted features are meant to minimize the loss of important information embedded in the signal. In addition, they also simplify the amount of resources needed to describe a huge set of data accurately. This is necessary to minimize the complexity of implementation, to reduce the cost of information processing, and to cancel the potential need to compress the information. More recently, a variety of methods have been widely used to extract features from EEG signals; among these methods are time-frequency distributions (TFD), the fast Fourier transform (FFT), eigenvector methods (EM), the wavelet transform (WT), and the autoregressive method (ARM). In general, the analysis of the EEG signal has been the subject of several studies because of its ability to yield an objective mode of recording brain stimulation, which is widely used in brain-computer interface research with applications in medical diagnosis and rehabilitation engineering. The purposes of this paper, therefore, are to discuss some conventional EEG feature extraction methods, to compare their performances for specific tasks, and, finally, to recommend the most suitable method for feature extraction based on performance.
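One of the surveyed techniques, FFT-based spectral features, in miniature: estimate power in the classic EEG bands via Welch's method. The band edges are the conventional ones; the signal here is synthetic.

import numpy as np
from scipy.signal import welch

fs = 256                                              # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.default_rng(0).standard_normal(t.size)

freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
features = {name: psd[(freqs >= lo) & (freqs < hi)].sum() for name, (lo, hi) in bands.items()}
print(features)                                       # alpha dominates: the 10 Hz tone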

Proceedings ArticleDOI
Jifeng Dai, Kaiming He, Jian Sun
TL;DR: This paper proposes a joint method to handle objects and “stuff” (e.g., grass, sky, water) in the same framework and presents state-of-the-art results on benchmarks of PASCAL VOC and the new PASCAL-CONTEXT.
Abstract: The topic of semantic segmentation has witnessed considerable progress due to the powerful features learned by convolutional neural networks (CNNs). The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions. This strategy introduces artificial boundaries on the images and may impact the quality of the extracted features. Besides, the operations on the raw image domain require computing thousands of networks on a single image, which is time-consuming. In this paper, we propose to exploit shape information via masking convolutional features. The proposal segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The CNN features of segments are directly masked out from these maps and used to train classifiers for recognition. We further propose a joint method to handle objects and "stuff" (e.g., grass, sky, water) in the same framework. State-of-the-art results are demonstrated on benchmarks of PASCAL VOC and the new PASCAL-CONTEXT, with a compelling computational speed.
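The central trick, in miniature: rather than masking the image, project the segment mask onto the convolutional feature map and zero out features outside the segment before pooling a region descriptor. The shapes, the stride-16 downsampling, and the max pooling below are illustrative choices.

import numpy as np

feature_map = np.random.default_rng(0).random((256, 14, 14))  # C x H x W from one CNN pass
segment_mask = np.zeros((224, 224), dtype=bool)               # one proposal segment
segment_mask[60:180, 40:160] = True

mask_lowres = segment_mask[::16, ::16]                        # project mask to 14 x 14
masked = feature_map * mask_lowres                            # zero features off-segment
region_feature = masked.reshape(256, -1).max(axis=1)          # pooled region descriptor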

Proceedings ArticleDOI
08 Feb 2014
TL;DR: Various types of features and feature extraction techniques are discussed, explaining which feature extraction technique works better in which scenario, with reference to character recognition applications.
Abstract: Features play a very important role in the area of image processing. Before features are extracted, various image preprocessing techniques such as binarization, thresholding, resizing, and normalization are applied to the sampled image. After that, feature extraction techniques are applied to obtain features that will be useful in classifying and recognizing images. Feature extraction techniques are helpful in various image processing applications, e.g. character recognition. As features define the behavior of an image, they determine its footprint in terms of storage taken, efficiency in classification, and, of course, time consumption. In this paper, we discuss various types of features and feature extraction techniques, explaining in which scenario which feature extraction technique works better. We refer to these features and feature extraction methods in the context of a character recognition application.

Journal ArticleDOI
TL;DR: A novel two-step hierarchical classification approach is proposed in which nonlesions or false positives are rejected in the first step; in the second step, bright lesions are classified as hard exudates and cotton wool spots, and red lesions as hemorrhages and micro-aneurysms.
Abstract: This paper presents a computer-aided screening system (DREAM) that analyzes fundus images with varying illumination and fields of view, and generates a severity grade for diabetic retinopathy (DR) using machine learning. Classifiers such as the Gaussian Mixture model (GMM), k-nearest neighbor (kNN), support vector machine (SVM), and AdaBoost are analyzed for classifying retinopathy lesions from nonlesions. GMM and kNN classifiers are found to be the best classifiers for bright and red lesion classification, respectively. A main contribution of this paper is the reduction in the number of features used for lesion classification by feature ranking using Adaboost where 30 top features are selected out of 78. A novel two-step hierarchical classification approach is proposed where the nonlesions or false positives are rejected in the first step. In the second step, the bright lesions are classified as hard exudates and cotton wool spots, and the red lesions are classified as hemorrhages and micro-aneurysms. This lesion classification problem deals with unbalanced datasets and SVM or combination classifiers derived from SVM using the Dempster-Shafer theory are found to incur more classification error than the GMM and kNN classifiers due to the data imbalance. The DR severity grading system is tested on 1200 images from the publicly available MESSIDOR dataset. The DREAM system achieves 100% sensitivity, 53.16% specificity, and 0.904 AUC, compared to the best reported 96% sensitivity, 51% specificity, and 0.875 AUC, for classifying images as with or without DR. The feature reduction further reduces the average computation time for DR severity per image from 59.54 to 3.46 s.
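The AdaBoost feature-ranking step can be sketched with scikit-learn: fit a boosted ensemble of decision stumps and keep the features with the highest learned importances (30 of 78 in the paper; 10 of 30 on this stand-in dataset, which is not the paper's lesion data).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)                   # stand-in lesion features
booster = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
top = np.argsort(booster.feature_importances_)[::-1][:10]    # rank and keep top features
X_reduced = X[:, top]
print(top)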