Papers on "Feature extraction" published in 2014

Proceedings Article

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik

23 Jun 2014

TL;DR: R-CNN combines high-capacity CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training on an auxiliary task followed by domain-specific fine-tuning yields a significant performance boost.

Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
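
The pipeline in the abstract (bottom-up region proposals, CNN features per region, then per-class scoring) can be sketched as below. This is a minimal illustration under stated assumptions, not the released R-CNN code: the proposal boxes are given, every region is warped to a fixed square (227×227 in the paper, 8×8 here), `toy_features` is a hypothetical stand-in for the fine-tuned CNN, and the class weights are made-up linear scorers playing the role of the paper's class-specific linear SVMs.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]          # grayscale image, rows of pixel values
Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1), upper bounds exclusive


def warp(image: Image, box: Box, size: int) -> Image:
    """Crop `box` and resize it to size x size by nearest-neighbor sampling,
    ignoring aspect ratio (R-CNN likewise warps every proposal to one fixed size)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [[image[y0 + (r * h) // size][x0 + (c * w) // size] for c in range(size)]
            for r in range(size)]


def toy_features(patch: Image) -> List[float]:
    """Hypothetical stand-in for the CNN: mean intensity of each quadrant."""
    half = len(patch) // 2
    feats = []
    for r0 in (0, half):
        for c0 in (0, half):
            vals = [patch[r][c] for r in range(r0, r0 + half) for c in range(c0, c0 + half)]
            feats.append(sum(vals) / len(vals))
    return feats


def score_proposals(image: Image, proposals: List[Box],
                    weights: Dict[str, List[float]], biases: Dict[str, float],
                    size: int = 8) -> List[Dict[str, float]]:
    """For each proposal: warp it, extract features, then score every class linearly."""
    results = []
    for box in proposals:
        f = toy_features(warp(image, box, size))
        results.append({cls: sum(wi * fi for wi, fi in zip(w, f)) + biases[cls]
                        for cls, w in weights.items()})
    return results
```

In a real system the feature vector would come from a CNN pre-trained on a large auxiliary classification task and then fine-tuned on detection data, which is the second insight the abstract highlights.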

21,729 citations

Proceedings Article

Large-Scale Video Classification with Convolutional Neural Networks

Andrej Karpathy¹, George Toderici¹, Sanketh Shetty¹, Thomas Leung¹, Rahul Sukthankar¹, Li Fei-Fei¹

Stanford University¹

23 Jun 2014

TL;DR: This work studies multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information, and suggests a multiresolution, foveated architecture as a promising way of speeding up training.

Abstract: Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
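
The multiresolution, foveated architecture feeds the network two inputs built from the same frame: a low-resolution context stream covering the whole frame and a full-resolution fovea stream covering only its center. A minimal sketch of that input split, assuming 2x downsampling by 2x2 averaging and a center crop of half the frame size per side (so both streams have the same shape); the function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]


def context_stream(frame: Frame) -> Frame:
    """Downsample the whole frame by 2x, averaging non-overlapping 2x2 blocks."""
    h, w = len(frame), len(frame[0])
    return [[(frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]


def fovea_stream(frame: Frame) -> Frame:
    """Crop the central region at full resolution, half the frame size per side."""
    h, w = len(frame), len(frame[0])
    ch, cw = h // 2, w // 2
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in frame[top:top + ch]]


def foveate(frame: Frame) -> Tuple[Frame, Frame]:
    """Return (context, fovea); each holds a quarter of the frame's pixels."""
    return context_stream(frame), fovea_stream(frame)
```

Because each stream carries a quarter of the original pixels, the two streams together process half as many pixels as the full frame, which is one plausible source of the training speed-up.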

4,876 citations

Journal Article

A survey on feature selection methods

Girish Chandrashekar¹, Ferat Sahin¹

Rochester Institute of Technology¹

01 Jan 2014-Computers & Electrical Engineering

TL;DR: The objective is to provide a generic introduction to variable elimination which can be applied to a wide array of machine learning problems and focus on Filter, Wrapper and Embedded methods.
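
The filter family named in the TL;DR scores features independently of any classifier. A minimal sketch, assuming the absolute Pearson correlation with the label as the filter criterion (mutual information is a common alternative); `select_top_k` is a hypothetical helper, not code from the survey.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(X: List[List[float]], y: List[float], k: int) -> List[int]:
    """Filter method: score each column of X by |corr(column, y)|, keep the k best."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
```

A wrapper method would instead evaluate candidate subsets by training the classifier itself, and an embedded method selects features as part of training, for example through L1 regularization.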

3,517 citations

Proceedings Article

CNN Features Off-the-Shelf: An Astounding Baseline for Recognition

Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson

23 Jun 2014

TL;DR: In this paper, features extracted from the OverFeat network are used as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets.

Abstract: Recent results indicate that the generic descriptors extracted from the convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine-grained recognition, attribute detection and image retrieval applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistently superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval, it consistently outperforms low-memory-footprint methods except on the sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in the case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques, e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
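
For retrieval, the abstract compares 4096-D off-the-shelf descriptors by L2 distance. A minimal sketch of that ranking step, assuming features have already been extracted (3-D toy vectors stand in for them) and that each vector is L2-normalized before comparison, which is a common choice but an assumption here; `retrieve` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def retrieve(query: Sequence[float], database: List[Sequence[float]],
             top: int = 3) -> List[Tuple[int, float]]:
    """Rank database items by L2 distance to the query, nearest first."""
    q = l2_normalize(query)
    dists = []
    for idx, item in enumerate(database):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, l2_normalize(item))))
        dists.append((idx, d))
    dists.sort(key=lambda t: t[1])
    return dists[:top]
```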

3,346 citations

Journal Article

Deep Learning-Based Classification of Hyperspectral Data

Yushi Chen¹, Zhouhan Lin¹, Xing Zhao¹, Gang Wang², Yanfeng Gu¹

Harbin Institute of Technology¹, Nanyang Technological University²

26 Jun 2014-IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

TL;DR: The concept of deep learning is introduced into hyperspectral data classification for the first time; a new way of classifying with spatial-dominated information is proposed, and the framework that merges spectral and spatial features is a hybrid of principal component analysis (PCA), a deep learning architecture, and logistic regression.

Abstract: Classification is one of the most popular topics in hyperspectral remote sensing. In the last two decades, a huge number of methods have been proposed to deal with the hyperspectral data classification problem. However, most of them do not hierarchically extract deep features. In this paper, the concept of deep learning is introduced into hyperspectral data classification for the first time. First, we verify the eligibility of stacked autoencoders by following classical spectral information-based classification. Second, a new way of classifying with spatial-dominated information is proposed. We then propose a novel deep learning framework to merge the two features, from which we can get the highest classification accuracy. The framework is a hybrid of principal component analysis (PCA), deep learning architecture, and logistic regression. Specifically, as a deep learning architecture, stacked autoencoders are used to obtain useful high-level features. Experimental results with widely-used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance. In addition, the proposed joint spectral-spatial deep neural network opens a new window for future research, showcasing the deep learning-based methods' huge potential for accurate hyperspectral data classification.

2,071 citations

Proceedings Article

Deep Learning Face Representation from Predicting 10,000 Classes

Yi Sun¹, Xiaogang Wang¹, Xiaoou Tang¹

The Chinese University of Hong Kong¹

23 Jun 2014

TL;DR: It is argued that DeepID can be effectively learned through challenging multi-class face identification tasks, whilst they can be generalized to other tasks (such as verification) and new identities unseen in the training set.

Abstract: This paper proposes to learn a set of high-level feature representations through deep learning, referred to as Deep hidden IDentity features (DeepID), for face verification. We argue that DeepID can be effectively learned through challenging multi-class face identification tasks, whilst they can be generalized to other tasks (such as verification) and new identities unseen in the training set. Moreover, the generalization capability of DeepID increases as more face classes are to be predicted at training. DeepID features are taken from the last hidden layer neuron activations of deep convolutional networks (ConvNets). When learned as classifiers to recognize about 10,000 face identities in the training set and configured to keep reducing the neuron numbers along the feature extraction hierarchy, these deep ConvNets gradually form compact identity-related features in the top layers with only a small number of hidden neurons. The proposed features are extracted from various face regions to form complementary and over-complete representations. Any state-of-the-art classifiers can be learned based on these high-level representations for face verification. A verification accuracy of 97.45% on LFW is achieved with only weakly aligned faces.

2,026 citations

Journal Article

Fast Feature Pyramids for Object Detection

Piotr Dollár¹, Ron Appel², Serge Belongie³, Pietro Perona²

Microsoft¹, California Institute of Technology², Cornell University³

01 Aug 2014-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: For a broad family of features, this work finds that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid, and this approximation yields considerable speedups with negligible loss in detection accuracy.

Abstract: Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly. This fundamental insight allows us to design object detection algorithms that are as accurate as, and considerably faster than, the state of the art. The computational bottleneck of many modern detectors is the computation of features at every scale of a finely-sampled image pyramid. Our key insight is that one may compute finely sampled feature pyramids at a fraction of the cost, without sacrificing performance: for a broad family of features we find that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid. Extrapolation is inexpensive as compared to direct feature computation. As a result, our approximation yields considerable speedups with negligible loss in detection accuracy. We modify three diverse visual recognition systems to use fast feature pyramids and show results on both pedestrian detection (measured on the Caltech, INRIA, TUD-Brussels and ETH data sets) and general object detection (measured on the PASCAL VOC). The approach is general and is widely applicable to vision algorithms requiring fine-grained multi-scale analysis. Our approximation is valid for images with broad spectra (most natural images) and fails for images with narrow band-pass spectra (e.g., periodic textures).
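
The extrapolation idea can be sketched numerically. The full paper (not the abstract) models how a feature statistic changes across scales with a power law, f(I_s) ≈ f(I) · s^(−λ), where λ is estimated per feature type; we assume that model here. The sketch computes the feature exactly only at octaves (s = 1, 1/2, 1/4, ...) and approximates every other scale from the nearest octave; `compute_feature` and the value of `lam` are hypothetical placeholders.

```python
import math
from typing import Callable, Dict, List


def approx_pyramid(compute_feature: Callable[[float], float], scales: List[float],
                   lam: float) -> Dict[float, float]:
    """Compute a feature exactly at octave scales only and extrapolate all others.

    compute_feature(s) stands for the expensive feature statistic of the image
    resampled by factor s (hypothetical); lam is the power-law exponent.
    """
    # octaves 1, 1/2, 1/4, ... down to the smallest requested scale
    n_octaves = int(math.ceil(-math.log2(min(scales)))) + 1
    octaves = [2.0 ** -k for k in range(n_octaves)]
    exact = {o: compute_feature(o) for o in octaves}  # the only expensive calls
    approx = {}
    for s in scales:
        # nearest computed octave, measured in log-scale
        o = min(octaves, key=lambda oc: abs(math.log2(s) - math.log2(oc)))
        approx[s] = exact[o] * (s / o) ** (-lam)  # f(I_s) ~ f(I_o) * (s/o)^(-lam)
    return approx
```

When the statistic follows the power law exactly, the extrapolated values match direct computation; on real images the error depends on how well the law holds, which the abstract notes fails for narrow band-pass spectra.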

2,000 citations

Proceedings Article

SVO: Fast semi-direct monocular visual odometry

Christian Forster¹, Matia Pizzoli¹, Davide Scaramuzza¹

University of Zurich¹

29 Sep 2014

TL;DR: A semi-direct monocular visual odometry algorithm is proposed that is precise, robust, and faster than current state-of-the-art methods; it is applied to micro-aerial-vehicle state estimation in GPS-denied environments.

Abstract: We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need for costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame-rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise and high frame-rate motion estimation brings increased robustness in scenes of little, repetitive, and high-frequency texture. The algorithm is applied to micro-aerial-vehicle state-estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop. We call our approach SVO (Semi-direct Visual Odometry) and release our implementation as open-source software.
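
Operating directly on pixel intensities means motion is estimated by minimizing a photometric error instead of matching extracted features. A toy 1D sketch of that idea, assuming a single translation parameter and linear interpolation between pixels (SVO itself aligns small 2D patches to estimate the full camera pose): Gauss-Newton on the intensity residual recovers a sub-pixel shift. All names are hypothetical.

```python
import math
from typing import List, Sequence


def interp(signal: Sequence[float], x: float) -> float:
    """Linear interpolation of a 1D signal at real-valued position x."""
    i = int(math.floor(x))
    a = x - i
    return (1 - a) * signal[i] + a * signal[i + 1]


def grad(signal: Sequence[float], x: float) -> float:
    """Slope of the linear interpolant at x."""
    i = int(math.floor(x))
    return signal[i + 1] - signal[i]


def align_shift(ref: Sequence[float], cur: Sequence[float], pixels: List[int],
                iters: int = 20) -> float:
    """Estimate t minimizing sum_i (cur(x_i + t) - ref(x_i))^2 by Gauss-Newton."""
    t = 0.0
    for _ in range(iters):
        num = den = 0.0
        for x in pixels:
            r = interp(cur, x + t) - ref[x]   # photometric residual
            g = grad(cur, x + t)              # d r / d t
            num += g * r
            den += g * g
        if den == 0.0:
            break
        step = -num / den
        t += step
        if abs(step) < 1e-9:
            break
    return t
```

The estimate is not restricted to whole pixels, which is the sense in which direct methods reach subpixel precision.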

1,814 citations

Journal Article

Using Fourier transform IR spectroscopy to analyze biological materials

Matthew J. Baker¹, Júlio Trevisan², Paul Bassan³, Rohit Bhargava⁴, Holly J. Butler², Konrad Matthew Dorling⁵, Peter R. Fielden², Simon W. Fogarty², Nigel J. Fullwood², Kelly A. Heys², Caryn Hughes³, Peter Lasch⁶, Pierre L. Martin-Hirsch², Blessing Obinaju², Ganesh D. Sockalingum⁷, Josep Sulé-Suso⁸, Rebecca J. Strong², Michael J. Walsh⁹, Bayden R. Wood¹⁰, Peter Gardner³, Francis Martin²

University of Strathclyde¹, Lancaster University², University of Manchester³, University of Illinois at Urbana–Champaign⁴, University of Central Lancashire⁵, Robert Koch Institute⁶, University of Reims Champagne-Ardenne⁷, Keele University⁸, University of Illinois at Chicago⁹, Monash University, Clayton campus¹⁰

01 Aug 2014-Nature Protocols

TL;DR: This manuscript brings together some of the leaders in this field to allow the standardization of methods and procedures for adapting a multistage approach to a methodology that can be applied to a variety of cell biological questions or used within a clinical setting for disease screening or diagnosis.

Abstract: IR spectroscopy is an excellent method for biological analyses. It enables the nonperturbative, label-free extraction of biochemical information and images toward diagnosis and the assessment of cell functionality. Although not strictly microscopy in the conventional sense, it allows the construction of images of tissue or cell architecture by the passing of spectral data through a variety of computational algorithms. Because such images are constructed from fingerprint spectra, the notion is that they can be an objective reflection of the underlying health status of the analyzed sample. One of the major difficulties in the field has been determining a consensus on spectral pre-processing and data analysis. This manuscript brings together as coauthors some of the leaders in this field to allow the standardization of methods and procedures for adapting a multistage approach to a methodology that can be applied to a variety of cell biological questions or used within a clinical setting for disease screening or diagnosis. We describe a protocol for collecting IR spectra and images from biological samples (e.g., fixed cytology and tissue sections, live cells or biofluids) that assesses the instrumental options available, appropriate sample preparation, different sampling modes as well as important advances in spectral data acquisition. After acquisition, data processing consists of a sequence of steps including quality control, spectral pre-processing, feature extraction and classification of the supervised or unsupervised type. A typical experiment can be completed and analyzed within hours. Example results are presented on the use of IR spectra combined with multivariate data processing.

1,340 citations

Book Chapter

Neural Codes for Image Retrieval

Artem Babenko¹ ², Anton Slesarev¹, Alexander Chigorin¹, Victor Lempitsky³

Yandex¹, Moscow Institute of Physics and Technology², Skolkovo Institute of Science and Technology³

06 Sep 2014

TL;DR: Neural codes are shown to perform competitively even when the convolutional neural network has been trained for an unrelated classification task (e.g. ImageNet), and the improvement in their retrieval performance is evaluated when the network is retrained on a dataset of images similar to those encountered at test time.

Abstract: It has been shown that the activations invoked by an image within the top layers of a large convolutional neural network provide a high-level descriptor of the visual content of the image. In this paper, we investigate the use of such descriptors (neural codes) within the image retrieval application. In the experiments with several standard retrieval benchmarks, we establish that neural codes perform competitively even when the convolutional neural network has been trained for an unrelated classification task (e.g. Image-Net). We also evaluate the improvement in the retrieval performance of neural codes, when the network is retrained on a dataset of images that are similar to images encountered at test time.

1,062 citations

Proceedings Article

BING: Binarized Normed Gradients for Objectness Estimation at 300fps

Ming-Ming Cheng¹, Ziming Zhang², Wen-Yan Lin, Philip H. S. Torr¹

University of Oxford¹, Boston University²

23 Jun 2014

TL;DR: It is observed that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, after suitably resizing their corresponding image windows into a small fixed size, so that a generic objectness measure can be trained.

Abstract: Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundaries can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows into a small fixed size. Based on this observation and for computational reasons, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high-quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals. By increasing the number of proposals and color spaces for computing BING features, our performance can be further improved to 99.5% DR.
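
Following the abstract, each candidate window is resized to 8 × 8 and described by its 64 normed-gradient values, which are then binarized. A minimal sketch, assuming a [-1, 0, 1] gradient filter, clamped borders, and the value min(|gx| + |gy|, 255) per pixel; keeping the top 4 bits of each value as packed bit planes is our assumption about the binarization, and the function names are hypothetical. Packing each bit plane into one 64-bit integer is what makes scoring with a few ADD and SHIFT operations possible.

```python
from typing import List

Patch = List[List[int]]  # 8x8 window of 8-bit intensities, already resized


def normed_gradient(patch: Patch) -> List[int]:
    """64-D NG feature: min(|gx| + |gy|, 255) per pixel from a [-1, 0, 1] filter.
    Borders are handled by clamping indices (our assumption)."""
    n = len(patch)

    def clamp(i: int) -> int:
        return max(0, min(n - 1, i))

    feat = []
    for r in range(n):
        for c in range(n):
            gx = patch[r][clamp(c + 1)] - patch[r][clamp(c - 1)]
            gy = patch[clamp(r + 1)][c] - patch[clamp(r - 1)][c]
            feat.append(min(abs(gx) + abs(gy), 255))
    return feat


def binarize(feat: List[int], n_bits: int = 4) -> List[int]:
    """Pack the top n_bits of every 8-bit value into bit planes (one int per plane),
    most significant plane first; bit i of a plane belongs to feature entry i."""
    planes = []
    for k in range(n_bits):
        bit = 7 - k
        word = 0
        for i, v in enumerate(feat):
            word |= ((v >> bit) & 1) << i
        planes.append(word)
    return planes


def reconstruct(planes: List[int], n_bits: int = 4, length: int = 64) -> List[int]:
    """Approximate the NG values from the planes: sum over k of 2^(7-k) * b_k."""
    return [sum(((planes[k] >> i) & 1) << (7 - k) for k in range(n_bits))
            for i in range(length)]
```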

Proceedings Article

Convolutional Neural Networks for No-Reference Image Quality Assessment

Le Kang¹, Peng Ye¹, Yi Li², David Doermann¹

University of Maryland, College Park¹, NICTA²

23 Jun 2014

TL;DR: A Convolutional Neural Network is described that accurately predicts image quality without a reference image; it achieves state-of-the-art performance on the LIVE dataset and shows excellent generalization ability in cross-dataset experiments.

Abstract: In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image. Taking image patches as input, the CNN works in the spatial domain without using hand-crafted features that are employed by most previous methods. The network consists of one convolutional layer with max and min pooling, two fully connected layers and an output node. Within the network structure, feature learning and regression are integrated into one optimization process, which leads to a more effective model for estimating image quality. This approach achieves state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments. Further experiments on images with local distortions demonstrate the local quality estimation ability of our CNN, which is rarely reported in previous literature.
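
The network in the abstract has a single convolutional layer whose response maps are summarized by max and min pooling before the fully connected layers. A minimal sketch of that first stage, assuming valid (unpadded) convolution and pooling over each whole response map, so every kernel contributes exactly one maximum and one minimum; patch normalization and the fully connected regressor are omitted, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_valid(patch: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation without padding, as CNN layers compute it."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(patch) - kh + 1, len(patch[0]) - kw + 1
    return [[sum(kernel[i][j] * patch[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def max_min_pool(fmap: Matrix) -> Tuple[float, float]:
    """Summarize one response map by its largest and smallest value."""
    values = [v for row in fmap for v in row]
    return max(values), min(values)


def conv_layer_features(patch: Matrix, kernels: List[Matrix]) -> List[float]:
    """Two pooled values per kernel; this vector would feed the fully connected layers."""
    feats = []
    for k in kernels:
        hi, lo = max_min_pool(conv2d_valid(patch, k))
        feats.extend([hi, lo])
    return feats
```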

Journal Article

Data-Driven Grasp Synthesis—A Survey

Jeannette Bohg, Antonio Morales, Tamim Asfour¹, Danica Kragic

Karlsruhe Institute of Technology¹

01 Apr 2014-IEEE Transactions on Robotics

TL;DR: This survey reviews work on data-driven grasp synthesis and the methodologies for sampling and ranking candidate grasps, gives an overview of the different methodologies, and draws a parallel to the classical approaches that rely on analytic formulations.

Abstract: We review the work on data-driven grasp synthesis and the methodologies for sampling and ranking candidate grasps. We divide the approaches into three groups based on whether they synthesize grasps for known, familiar, or unknown objects. This structure allows us to identify common object representations and perceptual processes that facilitate the employed data-driven grasp synthesis technique. In the case of known objects, we concentrate on the approaches that are based on object recognition and pose estimation. In the case of familiar objects, the techniques use some form of a similarity matching to a set of previously encountered objects. Finally, for the approaches dealing with unknown objects, the core part is the extraction of specific features that are indicative of good grasps. Our survey provides an overview of the different methodologies and discusses open problems in the area of robot grasping. We also draw a parallel to the classical approaches that rely on analytic formulations.

Journal Article

Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks

Jonathan Tompson¹, Murphy Stein¹, Yann LeCun¹, Ken Perlin¹

New York University¹

23 Sep 2014-ACM Transactions on Graphics

TL;DR: A novel method is presented for real-time continuous pose recovery of markerless complex articulable objects from a single depth image. It combines a randomized decision forest classifier for image segmentation, a robust method for labeled dataset generation, a convolutional network for dense feature extraction, and finally an inverse kinematics stage for stable real-time pose recovery.

Abstract: We present a novel method for real-time continuous pose recovery of markerless complex articulable objects from a single depth image. Our method consists of the following stages: a randomized decision forest classifier for image segmentation, a robust method for labeled dataset generation, a convolutional network for dense feature extraction, and finally an inverse kinematics stage for stable real-time pose recovery. As one possible application of this pipeline, we show state-of-the-art results for real-time puppeteering of a skinned hand-model.

Journal Article

Feature Extraction and Selection for Emotion Recognition from EEG

Robert Jenke¹, Angelika Peer¹, Martin Buss¹

Technische Universität München¹

01 Jul 2014-IEEE Transactions on Affective Computing

TL;DR: This work reviews feature extraction methods for emotion recognition from EEG based on 33 studies; results also suggest a preference for electrode locations over the parietal and centro-parietal lobes.

Abstract: Emotion recognition from EEG signals allows the direct assessment of the “inner” state of a user, which is considered an important factor in human-machine interaction. Many methods for feature extraction have been studied and the selection of both appropriate features and electrode locations is usually based on neuro-scientific findings. Their suitability for emotion recognition, however, has been tested using a small number of distinct feature sets and on different, usually small data sets. A major limitation is that no systematic comparison of features exists. Therefore, we review feature extraction methods for emotion recognition from EEG based on 33 studies. An experiment is conducted comparing these features using machine learning techniques for feature selection on a self-recorded data set. Results are presented with respect to performance of different feature selection methods, usage of selected feature types, and selection of electrode locations. Features selected by multivariate methods slightly outperform univariate methods. Advanced feature extraction techniques are found to have advantages over commonly used spectral power bands. Results also suggest a preference for locations over the parietal and centro-parietal lobes.
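
The abstract reports that advanced feature extraction techniques outperform plain spectral power bands. As an illustration of one such time-domain descriptor, the sketch computes the Hjorth parameters (activity, mobility, complexity) of a single channel, using first differences in place of derivatives; treating Hjorth parameters as representative of the surveyed features is our assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def variance(x: Sequence[float]) -> float:
    """Population variance of a sequence."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def diff(x: Sequence[float]) -> List[float]:
    """First difference, a discrete stand-in for the time derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth(x: Sequence[float]) -> Tuple[float, float, float]:
    """Return (activity, mobility, complexity) of one EEG channel."""
    dx = diff(x)
    ddx = diff(dx)
    var_x, var_dx, var_ddx = variance(x), variance(dx), variance(ddx)
    mobility = math.sqrt(var_dx / var_x)
    complexity = math.sqrt(var_ddx / var_dx) / mobility
    return var_x, mobility, complexity
```

For a pure sinusoid, mobility approaches the angular frequency per sample and complexity approaches 1.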

Proceedings Article

Discriminative Deep Metric Learning for Face Verification in the Wild

Junlin Hu¹, Jiwen Lu, Yap-Peng Tan¹

Nanyang Technological University¹

23 Jun 2014

TL;DR: The proposed DDML trains a deep neural network which learns a set of hierarchical nonlinear transformations to project face pairs into the same feature subspace, under which the distance of each positive face pair is less than a smaller threshold and that of each negative pair is higher than a larger threshold.

Abstract: This paper presents a new discriminative deep metric learning (DDML) method for face verification in the wild. Different from existing metric learning-based face verification methods which aim to learn a Mahalanobis distance metric to maximize the inter-class variations and minimize the intra-class variations, simultaneously, the proposed DDML trains a deep neural network which learns a set of hierarchical nonlinear transformations to project face pairs into the same feature subspace, under which the distance of each positive face pair is less than a smaller threshold and that of each negative pair is higher than a larger threshold, respectively, so that discriminative information can be exploited in the deep network. Our method achieves very competitive face verification performance on the widely used LFW and YouTube Faces (YTF) datasets.
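
The two-threshold criterion in the abstract can be written as a loss on face pairs. A minimal sketch, assuming the formulation of the full paper as we understand it: labels ℓ = +1 for same-identity and ℓ = −1 for different-identity pairs, a threshold τ with margins τ − 1 and τ + 1 (so the constraint is ℓ(τ − d²) > 1), and the smooth hinge g(z) = (1/β) log(1 + exp(βz)). The deep network is replaced by precomputed embeddings, weight regularization is omitted, and τ = 2, β = 2 are arbitrary values.

```python
import math
from typing import Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smooth_hinge(z: float, beta: float = 2.0) -> float:
    """(1/beta) * log(1 + exp(beta * z)), written to avoid overflow."""
    bz = beta * z
    return (max(bz, 0.0) + math.log1p(math.exp(-abs(bz)))) / beta


def pair_loss(fa: Sequence[float], fb: Sequence[float], label: int,
              tau: float = 2.0, beta: float = 2.0) -> float:
    """label = +1 wants d^2 < tau - 1; label = -1 wants d^2 > tau + 1."""
    d2 = sq_dist(fa, fb)
    return smooth_hinge(1.0 - label * (tau - d2), beta)
```

A positive pair that is far apart, or a negative pair that is close, receives a large loss; pairs already on the correct side of their threshold receive a loss near zero.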

Proceedings Article

A survey of feature selection and feature extraction techniques in machine learning

Samina Khalid¹, Tehmina Khalil¹, Shamila Nasreen²

Bahria University¹, University of Engineering and Technology²

09 Oct 2014

TL;DR: Some widely used feature selection and feature extraction techniques are analyzed to determine how effectively they can be used to achieve high performance of learning algorithms, ultimately improving the predictive accuracy of the classifier.

Abstract: Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is an important area in which many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques are analyzed to determine how effectively they can be used to achieve high performance of learning algorithms, ultimately improving the predictive accuracy of the classifier. A brief analysis of dimensionality reduction techniques is presented to investigate the strengths and weaknesses of some widely used methods.

Proceedings Article

Convolutional Neural Networks for human activity recognition using mobile sensors

Ming Zeng¹, Le T. Nguyen¹, Bo Yu¹, Ole J. Mengshoel¹, Jiang Zhu¹, Pang Wu¹, Joy Zhang¹

Carnegie Mellon University¹

28 Nov 2014

TL;DR: An approach based on Convolutional Neural Networks is proposed to automatically extract discriminative features for activity recognition; CNNs can capture the local dependency and scale invariance of a signal, as has been shown in the speech recognition and image recognition domains.

Abstract: A variety of real-life mobile sensing applications are becoming available, especially in the life-logging, fitness tracking and health monitoring domains. These applications use mobile sensors embedded in smart phones to recognize human activities in order to get a better understanding of human behavior. While progress has been made, human activity recognition remains a challenging task. This is partly due to the broad range of human activities as well as the rich variation in how a given activity can be performed. Using features that clearly separate between activities is crucial. In this paper, we propose an approach to automatically extract discriminative features for activity recognition. Specifically, we develop a method based on Convolutional Neural Networks (CNN), which can capture local dependency and scale invariance of a signal as it has been shown in speech recognition and image recognition domains. In addition, a modified weight sharing technique, called partial weight sharing, is proposed and applied to accelerometer signals to get further improvements. The experimental results on three public datasets, Skoda (assembly line activities), Opportunity (activities in kitchen), Actitracker (jogging, walking, etc.), indicate that our novel CNN-based approach is practical and achieves higher accuracy than existing state-of-the-art methods.
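
Partial weight sharing can be read as a 1D convolution whose kernel is shared only within a segment of the sensor signal rather than across the whole window, so different parts of the window can learn different local patterns. The sketch below follows that reading, with equal-length, non-overlapping segments and one kernel per segment; the segmentation scheme and the function names are assumptions, not the authors' exact layer.

```python
from typing import List, Sequence


def conv1d_valid(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """1D cross-correlation without padding."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def partial_shared_conv(x: Sequence[float],
                        kernels: List[Sequence[float]]) -> List[List[float]]:
    """Split x into len(kernels) equal segments; segment s uses only kernels[s].

    Samples beyond len(kernels) * segment_length are ignored. Full weight
    sharing is the special case where every segment uses the same kernel.
    """
    n_seg = len(kernels)
    seg_len = len(x) // n_seg
    outputs = []
    for s, w in enumerate(kernels):
        segment = x[s * seg_len:(s + 1) * seg_len]
        outputs.append(conv1d_valid(segment, w))
    return outputs
```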

Proceedings Article

Transfer Joint Matching for Unsupervised Domain Adaptation

Mingsheng Long¹, Jianmin Wang¹, Guiguang Ding¹, Jiaguang Sun¹, Philip S. Yu²

Tsinghua University¹, University of Illinois at Chicago²

23 Jun 2014

TL;DR: This paper aims to reduce the domain difference by jointly matching the features and reweighting the instances across domains in a principled dimensionality reduction procedure, and to construct a new feature representation that is invariant to both the distribution difference and the irrelevant instances.

Abstract: Visual domain adaptation, which learns an accurate classifier for a new domain using labeled images from an old domain, has shown promising value in computer vision, yet remains a challenging problem. Most prior works have explored two learning strategies independently for domain adaptation: feature matching and instance reweighting. In this paper, we show that both strategies are important and inevitable when the domain difference is substantially large. We therefore put forward a novel Transfer Joint Matching (TJM) approach to model them in a unified optimization problem. Specifically, TJM aims to reduce the domain difference by jointly matching the features and reweighting the instances across domains in a principled dimensionality reduction procedure, and to construct a new feature representation that is invariant to both the distribution difference and the irrelevant instances. Comprehensive experimental results verify that TJM can significantly outperform competitive methods for cross-domain image recognition problems.
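
Feature matching across domains is commonly measured with Maximum Mean Discrepancy (MMD), which TJM minimizes in its learned subspace according to the full paper (a detail beyond the abstract). With a linear kernel, the empirical MMD reduces to the squared distance between the domain means. The sketch computes it, with optional per-instance source weights standing in for instance reweighting; `linear_mmd` and `source_weights` are hypothetical names.

```python
from typing import List, Optional, Sequence


def weighted_mean(X: List[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> List[float]:
    """Per-dimension mean of the rows of X, optionally weighted per row."""
    if weights is None:
        weights = [1.0] * len(X)
    total = sum(weights)
    dim = len(X[0])
    return [sum(w * x[d] for w, x in zip(weights, X)) / total for d in range(dim)]


def linear_mmd(source: List[Sequence[float]], target: List[Sequence[float]],
               source_weights: Optional[Sequence[float]] = None) -> float:
    """Squared distance between the (optionally reweighted) source mean and the target mean."""
    ms = weighted_mean(source, source_weights)
    mt = weighted_mean(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))
```

In the small example below, down-weighting the outlying source sample drives the discrepancy to zero, which is the intuition behind combining feature matching with instance reweighting.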

Proceedings Article•DOI•

Facial Expression Recognition via a Boosted Deep Belief Network

Ping Liu¹, Shizhong Han¹, Zibo Meng¹, Yan Tong¹•Institutions (1)

University of South Carolina¹

23 Jun 2014

TL;DR: A novel Boosted Deep Belief Network (BDBN) performs the three training stages iteratively in a unified loopy framework; experiments showed that the BDBN framework yielded dramatic improvements in facial expression analysis.

Abstract: A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, feature set, and classifier to achieve good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) for performing the three training stages iteratively in a unified loopy framework. Through the proposed BDBN framework, a set of features that effectively characterizes expression-related facial appearance/shape changes can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and, more importantly, the discriminative capabilities of the selected features are also strengthened according to their relative importance to the strong classifier via a joint fine-tuning process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded dramatic improvements in facial expression analysis.

Journal Article•DOI•

3D Object Recognition in Cluttered Scenes with Local Surface Features: A Survey

Yulan Guo¹, Mohammed Bennamoun², Ferdous Sohel², Min Lu¹, Jianwei Wan¹•Institutions (2)

National University of Defense Technology¹, University of Western Australia²

01 Nov 2014-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper presents a comprehensive survey of existing local surface feature based 3D object recognition methods and lists a number of popular and contemporary databases together with their relevant attributes.

Abstract: 3D object recognition in cluttered scenes is a rapidly growing research area. Based on the types of features used, 3D object recognition methods can broadly be divided into two categories: global and local feature based methods. Intensive research has been done on local surface feature based methods, as they are more robust to occlusion and clutter, which are frequently present in real-world scenes. This paper presents a comprehensive survey of existing local surface feature based 3D object recognition methods. These methods generally comprise three phases: 3D keypoint detection, local surface feature description, and surface matching. This paper provides an extensive literature survey of each phase of the process. It also lists a number of popular and contemporary databases together with their relevant attributes.

Proceedings Article•DOI•

Medical image classification with convolutional neural network

Qing Li¹, Weidong Cai¹, Xiaogang Wang², Yun Zhou³, David Dagan Feng¹, Mei Chen⁴•Institutions (4)

University of Sydney¹, The Chinese University of Hong Kong², Johns Hopkins University School of Medicine³, Carnegie Mellon University⁴

01 Jan 2014

TL;DR: A customized Convolutional Neural Network (CNN) with a shallow convolution layer is designed to classify lung image patches with interstitial lung disease, and the same architecture can be generalized to perform other medical image or texture classification tasks.

Abstract: Image patch classification is an important task in many different medical imaging applications. In this work, we have designed a customized Convolutional Neural Network (CNN) with a shallow convolution layer to classify lung image patches with interstitial lung disease (ILD). While many feature descriptors have been proposed over the past years, they can be quite complicated and domain-specific. Our customized CNN framework, on the other hand, can automatically and efficiently learn the intrinsic image features from lung image patches that are most suitable for the classification purpose. The same architecture can be generalized to perform other medical image or texture classification tasks.

Proceedings Article•DOI•

Investigating Haze-Relevant Features in a Learning Framework for Image Dehazing

Ketan Tang¹, Jianchao Yang², Jue Wang²•Institutions (2)

University of Hong Kong¹, Adobe Systems²

23 Jun 2014

TL;DR: It is shown that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al.

Abstract: Haze is one of the major factors that degrade outdoor images. Removing haze from a single image is known to be severely ill-posed, and assumptions made in previous methods do not hold in many situations. In this paper, we systematically investigate different haze-relevant features in a learning framework to identify the best feature combination for image dehazing. We show that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al. [8] from a learning perspective, while other haze-relevant features also contribute significantly in a complementary way. We also find, surprisingly, that the synthetic hazy image patches we use for feature investigation serve well as training data for real-world images, which allows us to train specific models for specific applications. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods on both synthetic and real-world datasets.
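
For reference, the dark channel of He et al. is $J^{dark}(x) = \min_{y \in \Omega(x)} \min_{c \in \{r,g,b\}} J^c(y)$: the minimum intensity over the three colour channels, then over a local window $\Omega(x)$ around the pixel. Haze-free outdoor patches tend to have a dark channel near zero, and haze raises it. A minimal pure-Python sketch, assuming an RGB image stored as rows of (r, g, b) tuples in [0, 1] and a square window clipped at the image borders; how Tang et al. pool this into a patch-level feature is not covered here.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]


def dark_channel(image: List[List[RGB]], radius: int = 1) -> List[List[float]]:
    """Dark channel: min over colour channels, then min over a (2r+1)x(2r+1) window."""
    h, w = len(image), len(image[0])
    # Step 1: per-pixel minimum over the r, g, b channels.
    min_rgb = [[min(pixel) for pixel in row] for row in image]
    # Step 2: local minimum filter; the window is clipped at the image borders.
    return [
        [
            min(
                min_rgb[y][x]
                for y in range(max(0, i - radius), min(h, i + radius + 1))
                for x in range(max(0, j - radius), min(w, j + radius + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]
```

With `radius=0` the result is just the per-pixel channel minimum; larger windows make the estimate more robust to isolated bright pixels at the cost of blurring it.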

Journal Article•DOI•

Fast Compressive Tracking

Kaihua Zhang¹, Lei Zhang², Ming-Hsuan Yang³•Institutions (3)

Nanjing University of Information Science and Technology¹, Hong Kong Polytechnic University², University of California, Merced³

07 Apr 2014-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A simple yet effective and efficient tracking algorithm uses an appearance model based on features extracted from a multiscale image feature space with a data-independent basis, and performs favorably against state-of-the-art methods on challenging sequences in terms of efficiency, accuracy and robustness.

Abstract: Developing effective and efficient appearance models for robust object tracking is challenging due to factors such as pose variation, illumination change, occlusion, and motion blur. Existing online tracking algorithms often update models with samples from observations in recent frames. Although much success has been demonstrated, numerous issues remain to be addressed. First, while these adaptive appearance models are data-dependent, online algorithms do not have a sufficient amount of data to learn from at the outset. Second, online tracking algorithms often encounter the drift problem: as a result of self-taught learning, misaligned samples are likely to be added and degrade the appearance models. In this paper, we propose a simple yet effective and efficient tracking algorithm with an appearance model based on features extracted from a multiscale image feature space with a data-independent basis. The proposed appearance model employs non-adaptive random projections that preserve the structure of the image feature space of objects. A very sparse measurement matrix is constructed to efficiently extract the features for the appearance model. We compress sample images of the foreground target and the background using the same sparse measurement matrix. The tracking task is formulated as a binary classification via a naive Bayes classifier with online update in the compressed domain. A coarse-to-fine search strategy is adopted to further reduce the computational complexity in the detection procedure. The proposed compressive tracking algorithm runs in real time and performs favorably against state-of-the-art methods on challenging sequences in terms of efficiency, accuracy and robustness.
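
Two components named in the abstract, a very sparse measurement matrix and a naive Bayes classifier in the compressed domain, can be sketched as follows. The sketch assumes the standard very sparse random projection, with entries $\sqrt{s}$, $0$, $-\sqrt{s}$ drawn with probabilities $1/(2s)$, $1-1/s$, $1/(2s)$, and Gaussian class-conditional densities per compressed feature with equal class priors. The sparsity value `s` and all function names are illustrative; the online parameter update and the coarse-to-fine search are omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

SparseRow = Dict[int, float]  # column index -> nonzero entry


def sparse_measurement_matrix(n_rows: int, n_cols: int, s: float, seed: int = 0) -> List[SparseRow]:
    """Entries are +sqrt(s) w.p. 1/(2s), -sqrt(s) w.p. 1/(2s), else 0; only nonzeros are stored."""
    rng = random.Random(seed)
    root_s = math.sqrt(s)
    matrix = []
    for _ in range(n_rows):
        row: SparseRow = {}
        for j in range(n_cols):
            u = rng.random()
            if u < 1 / (2 * s):
                row[j] = root_s
            elif u < 1 / s:
                row[j] = -root_s
        matrix.append(row)
    return matrix


def compress(matrix: List[SparseRow], x: Sequence[float]) -> List[float]:
    """v = R x, touching only the nonzero entries of R."""
    return [sum(val * x[j] for j, val in row.items()) for row in matrix]


def gaussian_log_pdf(v: float, mu: float, sigma: float) -> float:
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)


def naive_bayes_score(v: Sequence[float],
                      target_params: Sequence[Tuple[float, float]],
                      background_params: Sequence[Tuple[float, float]]) -> float:
    """Sum over compressed features of log p(v_i | target) - log p(v_i | background).
    A positive score favours the target; equal class priors are assumed."""
    return sum(
        gaussian_log_pdf(vi, mt, st) - gaussian_log_pdf(vi, mb, sb)
        for vi, (mt, st), (mb, sb) in zip(v, target_params, background_params)
    )
```

Because each row keeps only a handful of nonzeros on average (about `n_cols / s`), compressing a candidate window costs far less than a dense projection, which is what makes per-frame evaluation of many candidates cheap.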

Journal Article•DOI•

Learning Actionlet Ensemble for 3D Human Action Recognition

Jiang Wang¹, Zicheng Liu², Ying Wu¹, Junsong Yuan³•Institutions (3)

Northwestern University¹, Microsoft², Nanyang Technological University³

01 May 2014-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes to characterize human actions with a novel actionlet ensemble model, which represents the interaction of a subset of human joints; the model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both human motion and human-object interactions.

Abstract: Human action recognition is an important yet challenging task. Human actions usually involve human-object interactions, highly articulated motions, high intra-class variations, and complicated temporal structures. Recently developed commodity depth sensors open up new possibilities for dealing with this problem by providing 3D depth data of the scene. This information not only facilitates a rather powerful human motion capturing technique, but also makes it possible to efficiently model human-object interactions and intra-class variations. In this paper, we propose to characterize human actions with a novel actionlet ensemble model, which represents the interaction of a subset of human joints. The proposed model is robust to noise, invariant to translational and temporal misalignment, and capable of characterizing both human motion and human-object interactions. We evaluate the proposed approach on three challenging action recognition datasets captured by Kinect devices, a multiview action recognition dataset captured with Kinect devices, and a dataset captured by a motion capture system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.

Journal Article•DOI•

Click Prediction for Web Image Reranking Using Multimodal Sparse Coding

Jun Yu¹, Yong Rui², Dacheng Tao³•Institutions (3)

Hangzhou Dianzi University¹, Microsoft², University of Technology, Sydney³

11 Mar 2014-IEEE Transactions on Image Processing

TL;DR: A multimodal hypergraph learning-based sparse coding method is proposed for image click prediction, and the obtained click data are applied to the reranking of images; experiments show that click prediction is beneficial to improving the performance of prominent graph-based image reranking algorithms.

Abstract: Image reranking is effective for improving the performance of a text-based image search. However, existing reranking algorithms are limited for two main reasons: 1) the textual meta-data associated with images is often mismatched with their actual visual content and 2) the extracted visual features do not accurately describe the semantic similarities between images. Recently, user click information has been used in image reranking, because clicks have been shown to more accurately describe the relevance of retrieved images to search queries. However, a critical problem for click-based methods is the lack of click data, since only a small number of web images have actually been clicked on by users. Therefore, we aim to solve this problem by predicting image clicks. We propose a multimodal hypergraph learning-based sparse coding method for image click prediction, and apply the obtained click data to the reranking of images. We adopt a hypergraph to build a group of manifolds, which explore the complementarity of different features through a group of weights. Unlike a graph, which has an edge between two vertices, a hypergraph has hyperedges that each connect a set of vertices, which helps preserve the local smoothness of the constructed sparse codes. An alternating optimization procedure is then performed, and the weights of different modalities and the sparse codes are obtained simultaneously. Finally, a voting strategy is used to describe the predicted click as a binary event (click or no click) from the images' corresponding sparse codes. Thorough empirical studies on a large-scale database including nearly 330K images demonstrate the effectiveness of our approach for click prediction when compared with several other methods. Additional image reranking experiments on real-world data show that click prediction is beneficial to improving the performance of prominent graph-based image reranking algorithms.

Journal Article•DOI•

Robust Radiomics feature quantification using semiautomatic volumetric segmentation.

Chintan Parmar¹ ² ³, Emmanuel Rios Velazquez¹ ², Ralph T.H. Leijenaar², Mohammed Jermoumi¹ ⁴, Sara Carvalho², Raymond H. Mak¹, Sushmita Mitra³, B. Uma Shankar³, Ron Kikinis¹, Benjamin Haibe-Kains⁵, Philippe Lambin¹, Hugo J.W.L. Aerts¹ ²•Institutions (5)

Brigham and Women's Hospital¹, Maastricht University², Indian Statistical Institute³, University of Massachusetts Lowell⁴, Princess Margaret Cancer Centre⁵

15 Jul 2014-PLOS ONE

TL;DR: 3D-Slicer segmented tumor volumes provide a better alternative to the manual delineation for feature quantification, as they yield more reproducible imaging descriptors and can be employed for quantitative image feature extraction and image data mining research in large patient cohorts.

Abstract: Due to advances in the acquisition and analysis of medical imaging, it is currently possible to quantify the tumor phenotype. The emerging field of Radiomics addresses this issue by converting medical images into minable data by extracting a large number of quantitative imaging features. One of the main challenges of Radiomics is tumor segmentation. Whereas manual delineation is time-consuming and prone to inter-observer variability, semi-automated approaches have been shown to be fast and to reduce inter-observer variability. In this study, a semiautomatic region growing volumetric segmentation algorithm, implemented in the free and publicly available 3D-Slicer platform, was investigated in terms of its robustness for quantitative imaging feature extraction. Fifty-six 3D radiomic features, quantifying phenotypic differences based on tumor intensity, shape and texture, were extracted from the computed tomography images of twenty lung cancer patients. These radiomic features were derived from the 3D tumor volumes defined twice by each of three independent observers using 3D-Slicer, and compared to manual slice-by-slice delineations by five independent physicians in terms of intra-class correlation coefficient (ICC) and feature range. Radiomic features extracted from 3D-Slicer segmentations had significantly higher reproducibility (ICC = 0.85±0.15, p = 0.0009) than features extracted from the manual segmentations (ICC = 0.77±0.17). Furthermore, we found that features extracted from 3D-Slicer segmentations were more robust, as the range was significantly smaller across observers (p = 3.819e-07) and overlapped with the feature ranges extracted from manual contouring (boundary lower: p = 0.007, higher: p = 5.863e-06). Our results show that 3D-Slicer segmented tumor volumes provide a better alternative to manual delineation for feature quantification, as they yield more reproducible imaging descriptors. Therefore, 3D-Slicer can be employed for quantitative image feature extraction and image data mining research in large patient cohorts.
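
The abstract reports reproducibility as an intra-class correlation coefficient but does not say which ICC form was used. As an assumption, the sketch below implements the one-way random-effects ICC(1,1) on a subjects × segmentations table of one feature's values: $\mathrm{ICC}(1,1) = (MS_B - MS_W)/(MS_B + (k-1)MS_W)$, where $MS_B$ and $MS_W$ are the between-subject and within-subject mean squares and $k$ is the number of segmentations per subject. The function name is hypothetical.

```python
from typing import List, Sequence


def icc_1_1(ratings: List[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1).

    ratings[i][j] is the value of one radiomic feature for subject (tumor) i
    as measured on segmentation j. Every subject needs the same number k >= 2
    of segmentations, there must be at least two subjects, and the result is
    undefined if all values are identical.
    """
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values near 1 mean that differences between tumors dominate differences between segmentations of the same tumor, which is the sense in which a feature is called reproducible.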

Journal Article•DOI•

Joint Embedding Learning and Sparse Regression: A Framework for Unsupervised Feature Selection

Chenping Hou¹, Feiping Nie², Xuelong Li³, Dongyun Yi¹, Yi Wu¹•Institutions (3)

National University of Defense Technology¹, University of Texas at Arlington², Chinese Academy of Sciences³

01 Jun 2014-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This paper proposes a novel unsupervised feature selection framework, termed joint embedding learning and sparse regression (JELSR), in which embedding learning is combined with sparse regression to perform feature selection.

Abstract: Feature selection has attracted considerable research interest during the last few decades. Traditional learning-based feature selection methods separate embedding learning and feature ranking. In this paper, we propose a novel unsupervised feature selection framework, termed joint embedding learning and sparse regression (JELSR), in which embedding learning and sparse regression are performed jointly. Specifically, the proposed JELSR joins embedding learning with sparse regression to perform feature selection. To show the effectiveness of the proposed framework, we also provide a method that uses weights obtained via local linear approximation and adds $\ell_{2,1}$-norm regularization, and we design an effective algorithm to solve the corresponding optimization problem. Furthermore, we discuss the proposed feature selection approach in detail, including convergence analysis, computational complexity, and parameter determination. Overall, the proposed framework not only provides a new perspective on traditional methods but also motivates further research on feature selection. Compared with traditional unsupervised feature selection methods, our approach integrates the merits of embedding learning and sparse regression. Promising experimental results on different kinds of data sets, including image, voice, and biological data, validate the effectiveness of the proposed algorithm.
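
The $\ell_{2,1}$-norm of a $d \times m$ matrix $W$ is $\|W\|_{2,1} = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{m} W_{ij}^2}$, the sum of the Euclidean norms of its rows. Penalizing it drives whole rows of a regression matrix toward zero, so features (one per row) can be ranked by their row norms. The sketch assumes JELSR scores features this way, as is common for $\ell_{2,1}$-regularized selection; it does not solve the JELSR optimization itself, and the matrix in the test is a toy example.

```python
import math
from typing import List, Sequence


def row_norms(W: List[Sequence[float]]) -> List[float]:
    """Euclidean norm of each row of W (one row per original feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W: List[Sequence[float]]) -> float:
    """||W||_{2,1}: sum of the row-wise Euclidean norms."""
    return sum(row_norms(W))


def rank_features(W: List[Sequence[float]]) -> List[int]:
    """Feature indices ordered from largest to smallest row norm."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
```

Selecting the top-ranked indices then keeps the features whose rows survived the row-sparsity penalty.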

Journal Article•DOI•

Image Quality Assessment for Fake Biometric Detection: Application to Iris, Fingerprint, and Face Recognition

Javier Galbally, Sébastien Marcel¹, Julian Fierrez•Institutions (1)

Idiap Research Institute¹

01 Feb 2014-IEEE Transactions on Image Processing

TL;DR: A novel software-based fake detection method that can be used in multiple biometric systems to detect different types of fraudulent access attempts and the experimental results show that the proposed method is highly competitive compared with other state-of-the-art approaches.

Abstract: Ensuring the actual presence of a real, legitimate trait, as opposed to a fake, self-manufactured synthetic or reconstructed sample, is a significant problem in biometric authentication that requires the development of new and efficient protection measures. In this paper, we present a novel software-based fake detection method that can be used in multiple biometric systems to detect different types of fraudulent access attempts. The objective of the proposed system is to enhance the security of biometric recognition frameworks by adding liveness assessment in a fast, user-friendly, and non-intrusive manner through the use of image quality assessment. The proposed approach has a very low degree of complexity, which makes it suitable for real-time applications; it uses 25 general image quality features extracted from one image (i.e., the same image acquired for authentication purposes) to distinguish between legitimate and impostor samples. The experimental results, obtained on publicly available data sets of fingerprint, iris, and 2D face, show that the proposed method is highly competitive compared with other state-of-the-art approaches and that the analysis of the general image quality of real biometric samples reveals highly valuable information that may be very efficiently used to discriminate them from fake traits.
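
The abstract does not list the 25 quality features. One way to obtain full-reference quality measures from a single image is to compare it with a smoothed copy of itself; the sketch below assumes that strategy, uses a 3×3 mean filter as a stand-in for the smoothing step, and computes only two illustrative measures (MSE and PSNR) on a greyscale image with values in [0, 255]. The feature set, the filter, and the function names are assumptions, not the paper's specification.

```python
import math
from typing import Dict, List

Image = List[List[float]]


def mean_filter_3x3(img: Image) -> Image:
    """3x3 box blur; the window is clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def mse(a: Image, b: Image) -> float:
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr(a: Image, b: Image, peak: float = 255.0) -> float:
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


def quality_features(img: Image) -> Dict[str, float]:
    """Full-reference measures between the input and its smoothed copy."""
    smoothed = mean_filter_3x3(img)
    return {"MSE": mse(img, smoothed), "PSNR": psnr(img, smoothed)}
```

An image rich in fine detail loses more under smoothing (higher MSE, lower PSNR) than a flat one; a vector of such measures is what a downstream classifier would separate into real and fake samples.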

Journal Article•DOI•

Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms

Bichen Zheng¹, Sang Won Yoon¹, Sarah S. Lam¹•Institutions (1)

Binghamton University¹

01 Mar 2014-Expert Systems With Applications

TL;DR: A hybrid of K-means and support vector machine (K-SVM) algorithms is developed to diagnose breast cancer based on the extracted tumor features and shows time savings during the training phase.

Abstract: With the development of clinical technologies, different tumor features have been collected for breast cancer diagnosis. Filtering all the pertinent feature information to support clinical disease diagnosis is a challenging and time-consuming task. The objective of this research is to diagnose breast cancer based on the extracted tumor features. Feature extraction and selection are critical to the quality of classifiers built through data mining methods. To extract useful information and diagnose the tumor, a hybrid of K-means and support vector machine (K-SVM) algorithms is developed. The K-means algorithm is used to recognize the hidden patterns of the benign and malignant tumors separately. The membership of each tumor to these patterns is calculated and treated as a new feature in the training model. Then, a support vector machine (SVM) is used to obtain the new classifier to differentiate incoming tumors. Based on 10-fold cross-validation, the proposed methodology improves the accuracy to 97.38% when tested on the Wisconsin Diagnostic Breast Cancer (WDBC) data set from the University of California, Irvine machine learning repository. Six abstract tumor features are extracted from the 32 original features for the training phase. The results not only illustrate the capability of the proposed approach for breast cancer diagnosis but also show time savings during the training phase. Physicians can also benefit from the mined abstract tumor features by better understanding the properties of different types of tumors.
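
A minimal sketch of the feature-extraction half of K-SVM, under stated assumptions: K-means (plain Lloyd iterations, initialized deterministically with the first k points) is run separately on benign and malignant training tumors, and each tumor's membership to the resulting patterns is taken to be its Euclidean distance to every centroid, giving 2k new features when both classes use k clusters. The abstract does not define the membership function, so the distance choice is an assumption, as are the function names. The resulting features would then be passed to an SVM (for example scikit-learn's `sklearn.svm.SVC`), which is not shown.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's algorithm; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def membership_features(x: Point,
                        benign_centroids: List[List[float]],
                        malignant_centroids: List[List[float]]) -> List[float]:
    """Hypothetical membership: distance from x to every benign and malignant centroid."""
    return [dist(x, c) for c in benign_centroids + malignant_centroids]
```

Compressing each tumor into a few distances to class prototypes is what lets the SVM train on far fewer inputs than the original feature vector.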
