
Showing papers on "Feature extraction published in 2012"


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper proposes a new kernel-based method that takes advantage of low-dimensional structures that are intrinsic to many vision datasets, and introduces a metric that reliably measures the adaptability between a pair of source and target domains.
Abstract: In real-world applications of visual recognition, many factors — such as pose, illumination, or image quality — can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. As such, the classifiers often perform poorly on the target domain. Domain adaptation techniques aim to correct the mismatch. Existing approaches have concentrated on learning feature representations that are invariant across domains, and they often do not directly exploit low-dimensional structures that are intrinsic to many vision datasets. In this paper, we propose a new kernel-based method that takes advantage of such structures. Our geodesic flow kernel models domain shift by integrating an infinite number of subspaces that characterize changes in geometric and statistical properties from the source to the target domain. Our approach is computationally advantageous, automatically inferring important algorithmic parameters without requiring extensive cross-validation or labeled data from either domain. We also introduce a metric that reliably measures the adaptability between a pair of source and target domains. For a given target domain and several source domains, the metric can be used to automatically select the optimal source domain to adapt and avoid less desirable ones. Empirical studies on standard datasets demonstrate the advantages of our approach over competing methods.
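The paper derives a closed form for this kernel; as a rough illustration, the numerical sketch below approximates the integral instead, sampling subspaces along the geodesic between the domains' PCA subspaces. It assumes the source and target domains are given as raw feature matrices Xs and Xt; the subspace dimension and step count are illustrative, not the automatically inferred values the paper describes.

```python
import numpy as np
from scipy.linalg import svd, null_space

def geodesic_flow_kernel(Xs, Xt, d=20, n_steps=50):
    """Numerical sketch of a geodesic flow kernel: average projections of
    subspaces sampled along the geodesic joining the source and target PCA
    subspaces, approximating G = integral_0^1 Phi(t) Phi(t)^T dt by a sum."""
    # d-dimensional PCA bases of each domain (columns orthonormal, D x d)
    Ps = svd(Xs - Xs.mean(0), full_matrices=False)[2][:d].T
    Pt = svd(Xt - Xt.mean(0), full_matrices=False)[2][:d].T
    Rs = null_space(Ps.T)                     # orthogonal complement of Ps
    # principal angles theta between the two subspaces
    U1, cos_t, Vt = svd(Ps.T @ Pt)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    sin_t = np.sin(theta)
    B = Rs.T @ Pt @ Vt.T                      # equals -U2 @ diag(sin(theta))
    U2 = -B / np.where(sin_t > 1e-12, sin_t, 1.0)
    # integrate Phi(t) Phi(t)^T along the geodesic from source to target
    G = np.zeros((Ps.shape[0], Ps.shape[0]))
    for t in np.linspace(0.0, 1.0, n_steps):
        Phi = (Ps @ U1 @ np.diag(np.cos(t * theta))
               - Rs @ U2 @ np.diag(np.sin(t * theta)))
        G += Phi @ Phi.T / n_steps
    return G    # domain-invariant similarity: k(x, z) = x @ G @ z
```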

2,154 citations


Journal ArticleDOI
TL;DR: "Radiomics" refers to the extraction and analysis of large amounts of advanced quantitative imaging features with high throughput from medical images obtained with computed tomography, positron emission tomography or magnetic resonance imaging, leading to a very large potential subject pool.

1,608 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: An actionlet ensemble model is learnt to represent each action and to capture the intra-class variance, and novel features that are suitable for depth data are proposed.
Abstract: Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to the state of the art algorithms.

1,578 citations


Journal ArticleDOI
TL;DR: A vocabulary tree is built that discretizes a binary descriptor space and uses the tree to speed up correspondences for geometrical verification, and presents competitive results with no false positives in very different datasets.
Abstract: We propose a novel method for visual place recognition using a bag of words obtained from features from accelerated segment test (FAST) and BRIEF descriptors. For the first time, we build a vocabulary tree that discretizes a binary descriptor space and use the tree to speed up correspondences for geometrical verification. We present competitive results with no false positives on very different datasets, using exactly the same vocabulary and settings. The whole technique, including feature extraction, requires 22 ms per frame in a sequence with 26,300 images, which is one order of magnitude faster than previous approaches.

1,560 citations


Journal ArticleDOI
TL;DR: A novel general strategy for building steganography detectors for digital images by assembling a rich model of the noise component as a union of many diverse submodels formed by joint distributions of neighboring samples from quantized image noise residuals obtained using linear and nonlinear high-pass filters.
Abstract: We describe a novel general strategy for building steganography detectors for digital images. The process starts with assembling a rich model of the noise component as a union of many diverse submodels formed by joint distributions of neighboring samples from quantized image noise residuals obtained using linear and nonlinear high-pass filters. In contrast to previous approaches, we make the model assembly a part of the training process driven by samples drawn from the corresponding cover- and stego-sources. Ensemble classifiers are used to assemble the model as well as the final steganalyzer due to their low computational complexity and ability to efficiently work with high-dimensional feature spaces and large training sets. We demonstrate the proposed framework on three steganographic algorithms designed to hide messages in images represented in the spatial domain: HUGO, the edge-adaptive algorithm by Luo et al., and optimally coded ternary ±1 embedding. For each algorithm, we apply a simple submodel-selection technique to increase the detection accuracy per model dimensionality and show how the detection saturates with increasing complexity of the rich model. By observing the differences between how different submodels engage in detection, an interesting interplay between the embedding and detection is revealed. Steganalysis built around rich image models combined with ensemble classifiers is a promising direction towards automating steganalysis for a wide spectrum of steganographic schemes.
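To make the rich-model idea concrete, a single toy submodel might look like the sketch below: one linear high-pass residual, quantized and truncated, summarized by a co-occurrence histogram of horizontally adjacent residual samples. The paper's models union many such submodels over diverse linear and nonlinear filters; the filter, quantization step, and truncation threshold here are illustrative.

```python
import numpy as np

def residual_cooccurrence(img, q=1.0, T=2):
    """One toy submodel of a rich image model: first-order horizontal
    residual, quantized to [-T, T], described by the joint histogram
    (co-occurrence) of pairs of neighboring residual samples."""
    r = np.diff(img.astype(float), axis=1)             # high-pass residual
    r = np.clip(np.round(r / q), -T, T).astype(int)    # quantize + truncate
    # encode each horizontal pair (r[i, j], r[i, j+1]) as one histogram bin
    pairs = (r[:, :-1] + T) * (2 * T + 1) + (r[:, 1:] + T)
    hist = np.bincount(pairs.ravel(), minlength=(2 * T + 1) ** 2)
    return hist / hist.sum()                           # (2T+1)^2-dim feature
```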

1,553 citations


Journal ArticleDOI
TL;DR: An efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics model of discrete cosine transform (DCT) coefficients, which requires minimal training and adopts a simple probabilistic model for score prediction.
Abstract: We develop an efficient general-purpose blind/no-reference image quality assessment (IQA) algorithm using a natural scene statistics (NSS) model of discrete cosine transform (DCT) coefficients. The algorithm is computationally appealing, given the availability of platforms optimized for DCT computation. The approach relies on a simple Bayesian inference model to predict image quality scores given certain extracted features. The features are based on an NSS model of the image DCT coefficients. The estimated parameters of the model are utilized to form features that are indicative of perceptual quality. These features are used in a simple Bayesian inference approach to predict quality scores. The resulting algorithm, which we name BLIINDS-II, requires minimal training and adopts a simple probabilistic model for score prediction. Given the extracted features from a test image, the quality score that maximizes the probability of the empirically determined inference model is chosen as the predicted quality score of that image. When tested on the LIVE IQA database, BLIINDS-II is shown to correlate highly with human judgments of quality, at a level that is competitive with the popular SSIM index.
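A rough sketch of such DCT-domain NSS features follows: a blockwise DCT, a moment-matching estimate of the generalized-Gaussian shape of the AC coefficients, and a coefficient-of-variation feature, pooled over blocks. This follows the spirit of BLIINDS-II rather than its exact feature set; the block size and pooling percentiles are illustrative.

```python
import numpy as np
from scipy.fft import dctn
from scipy.special import gamma as G

def ggd_shape(x):
    """Moment-matching estimate of the generalized-Gaussian shape
    parameter, a common approximation in NSS-based IQA models."""
    rho = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    g = np.arange(0.2, 10, 0.001)
    r = G(2 / g) ** 2 / (G(1 / g) * G(3 / g))
    return g[np.argmin(np.abs(r - rho))]

def dct_nss_features(img, block=5):
    """Sketch of DCT-domain NSS features in the spirit of BLIINDS-II:
    per-block GGD shape and coefficient of variation, pooled by mean
    and by the most extreme 10% of blocks."""
    H, W = img.shape
    shapes, cvs = [], []
    for i in range(0, H - block + 1, block):
        for j in range(0, W - block + 1, block):
            c = dctn(img[i:i+block, j:j+block], norm='ortho').ravel()[1:]  # drop DC
            shapes.append(ggd_shape(c))
            a = np.abs(c)
            cvs.append(a.std() / (a.mean() + 1e-12))
    shapes, cvs = np.sort(shapes), np.sort(cvs)
    k = max(1, len(shapes) // 10)
    return np.array([shapes.mean(), shapes[:k].mean(), cvs.mean(), cvs[-k:].mean()])
```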

1,484 citations


Journal ArticleDOI
TL;DR: In this study, a complete and up-to-date set of thirty-seven time-domain and frequency-domain features is evaluated, and the results indicate that most time-domain features are superfluous and redundant.
Abstract: Feature extraction is a significant method to extract the useful information hidden in a surface electromyography (EMG) signal and to remove unwanted parts and interference. To succeed in classifying the EMG signal, the selection of a feature vector ought to be carefully considered. However, numerous studies of EMG signal classification have used feature sets containing a number of redundant features. In this study, the properties of a complete and up-to-date set of thirty-seven time-domain and frequency-domain features are examined. The results, which were verified by scatter plots of features, statistical analysis, and a classifier, indicate that most time-domain features are superfluous and redundant. They can be grouped according to mathematical property and information content into four main types: energy and complexity, frequency, prediction model, and time-dependence. On the other hand, all frequency-domain features are calculated from statistical parameters of the EMG power spectral density, and from a class-separability viewpoint their performance is not suitable for an EMG recognition system. Recommendations of features that avoid redundancy in EMG signal classification applications are also proposed in this study.
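For reference, a few of the classic time-domain features covered by such studies are straightforward to compute; the sketch below uses the standard textbook definitions (the dead-zone threshold is illustrative).

```python
import numpy as np

def emg_time_domain_features(x, thresh=1e-5):
    """Classic EMG time-domain features: mean absolute value, RMS,
    waveform length, zero crossings, and slope-sign changes, with a
    small threshold to suppress noise-induced crossings."""
    mav = np.mean(np.abs(x))                    # mean absolute value
    rms = np.sqrt(np.mean(x ** 2))              # root mean square
    wl = np.sum(np.abs(np.diff(x)))             # waveform length
    zc = np.sum((x[:-1] * x[1:] < 0) &
                (np.abs(x[:-1] - x[1:]) >= thresh))    # zero crossings
    d = np.diff(x)
    ssc = np.sum((d[:-1] * d[1:] < 0) &
                 (np.maximum(np.abs(d[:-1]), np.abs(d[1:])) >= thresh))
    return np.array([mav, rms, wl, zc, ssc])
```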

1,151 citations


Journal ArticleDOI
TL;DR: This paper proposes an alternative and well-known machine learning tool, ensemble classifiers implemented as random forests, and argues that they are ideally suited for steganalysis.
Abstract: Today, the most accurate steganalysis methods for digital media are built as supervised classifiers on feature vectors extracted from the media. The tool of choice for the machine learning seems to be the support vector machine (SVM). In this paper, we propose an alternative and well-known machine learning tool, ensemble classifiers implemented as random forests, and argue that they are ideally suited for steganalysis. Ensemble classifiers scale much more favorably with respect to the number of training examples and the feature dimensionality, with performance comparable to the much more complex SVMs. The significantly lower training complexity opens up the possibility for the steganalyst to work with rich (high-dimensional) cover models and train on larger training sets, two key elements that appear necessary to reliably detect modern steganographic algorithms. Ensemble classification is portrayed here as a powerful developer tool that allows fast construction of steganography detectors with markedly improved detection accuracy across a wide range of embedding methods. The power of the proposed framework is demonstrated on three steganographic methods that hide messages in JPEG images.
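A minimal sketch of such an ensemble follows: Fisher linear discriminant base learners, each trained on a random subspace of the features, fused by majority vote. The subspace dimension and learner count are illustrative rather than the paper's tuned values, and binary 0/1 labels are assumed.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

class FLDEnsemble:
    """Random-subspace ensemble of Fisher linear discriminants, in the
    spirit of the paper: each base learner sees a random feature subset;
    decisions are fused by majority vote over 0/1 labels."""
    def __init__(self, n_learners=50, d_sub=200, seed=0):
        self.n_learners, self.d_sub = n_learners, d_sub
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        D = X.shape[1]
        self.models = []
        for _ in range(self.n_learners):
            idx = self.rng.choice(D, size=min(self.d_sub, D), replace=False)
            clf = LinearDiscriminantAnalysis().fit(X[:, idx], y)
            self.models.append((idx, clf))
        return self

    def predict(self, X):
        votes = np.mean([clf.predict(X[:, idx]) for idx, clf in self.models], axis=0)
        return (votes > 0.5).astype(int)    # majority vote
```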

967 citations


Proceedings Article
01 Nov 2012
TL;DR: This paper combines the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows them to use a common framework to train highly-accurate text detector and character recognizer modules.
Abstract: Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly-accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven, scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003.

900 citations


Journal ArticleDOI
TL;DR: This work introduces explicit feature maps for the additive class of kernels, such as the intersection, Hellinger's, and χ2 kernels, commonly used in computer vision, and enables their use in large scale problems.
Abstract: Large scale nonlinear support vector machines (SVMs) can be approximated by linear ones using a suitable feature map. The linear SVMs are in general much faster to learn and evaluate (test) than the original nonlinear SVMs. This work introduces explicit feature maps for the additive class of kernels, such as the intersection, Hellinger's, and χ2 kernels, commonly used in computer vision, and enables their use in large scale problems. In particular, we: 1) provide explicit feature maps for all additive homogeneous kernels along with closed-form expressions for all common kernels; 2) derive corresponding approximate finite-dimensional feature maps based on a spectral analysis; and 3) quantify the error of the approximation, showing that the error is independent of the data dimension and decays exponentially fast with the approximation order for selected kernels such as χ2. We demonstrate that the approximations have indistinguishable performance from the full kernels yet greatly reduce the train/test times of SVMs. We also compare with two other approximation methods: the Nyström approximation of Perronnin et al. [1], which is data dependent, and the explicit map of Maji and Berg [2] for the intersection kernel, which, as in the case of our approximations, is data independent. The approximations are evaluated on a number of standard data sets, including Caltech-101 [3], Daimler-Chrysler pedestrians [4], and INRIA pedestrians [5].
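The χ2 case can be sketched compactly: the kernel signature is κ(λ) = sech(πλ), and the finite-dimensional map samples it at frequencies jL. The sketch below assumes nonnegative input features (as with histograms); the sampling period L and order n are illustrative, and scikit-learn's AdditiveChi2Sampler provides a comparable approximation.

```python
import numpy as np

def chi2_feature_map(X, n=2, L=0.65):
    """Approximate explicit feature map for the chi-squared kernel:
    sample the kernel signature kappa(lambda) = sech(pi*lambda) at
    frequencies j*L, j = 0..n, mapping a d-dim nonnegative feature
    vector to dimension d*(2n+1)."""
    X = np.asarray(X, dtype=float)
    logX = np.log(np.maximum(X, 1e-12))
    feats = [np.sqrt(X * L)]                     # j = 0 term, kappa(0) = 1
    for j in range(1, n + 1):
        kappa = 1.0 / np.cosh(np.pi * j * L)     # sech(pi * j * L)
        amp = np.sqrt(2 * X * L * kappa)
        feats += [amp * np.cos(j * L * logX), amp * np.sin(j * L * logX)]
    return np.concatenate(feats, axis=-1)
```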

804 citations


Proceedings ArticleDOI
14 May 2012
TL;DR: An approach to simultaneous localization and mapping (SLAM) for RGB-D cameras like the Microsoft Kinect that concurrently estimates the trajectory of a hand-held Kinect and generates a dense 3D model of the environment is presented.
Abstract: We present an approach to simultaneous localization and mapping (SLAM) for RGB-D cameras like the Microsoft Kinect. Our system concurrently estimates the trajectory of a hand-held Kinect and generates a dense 3D model of the environment. We present the key features of our approach and evaluate its performance thoroughly on a recently published dataset, including a large set of sequences of different scenes with varying camera speeds and illumination conditions. In particular, we evaluate the accuracy, robustness, and processing time for three different feature descriptors (SIFT, SURF, and ORB). The experiments demonstrate that our system can robustly deal with difficult data in common indoor scenarios while being fast enough for online operation. Our system is fully available as open-source.
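The frontend step of such a system can be sketched with OpenCV using ORB, one of the three descriptors the paper evaluates: detect and describe keypoints in two frames and keep ratio-test matches, which would then be back-projected with depth and fed to a RANSAC pose estimator. The detector parameters and ratio threshold here are illustrative.

```python
import cv2

def match_rgbd_frames(img1, img2):
    """Detect ORB keypoints in two (grayscale) frames and return
    ratio-test-filtered pixel correspondences, the raw material for
    depth back-projection and pose estimation in an RGB-D SLAM frontend."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = [p for p in matcher.knnMatch(d1, d2, k=2) if len(p) == 2]
    good = [m for m, n in pairs if m.distance < 0.75 * n.distance]  # ratio test
    return [(k1[m.queryIdx].pt, k2[m.trainIdx].pt) for m in good]
```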

Proceedings ArticleDOI
16 Jun 2012
TL;DR: GMA solves a joint, relaxed QCQP over different feature spaces to obtain a single (non)linear subspace and is a supervised extension of Canonical Correlation Analysis (CCA), which is useful for cross-view classification and retrieval.
Abstract: This paper presents a general multi-view feature extraction approach that we call Generalized Multiview Analysis, or GMA. GMA has all the desirable properties required for cross-view classification and retrieval: it is supervised, it allows generalization to unseen classes, it is multi-view and kernelizable, it affords an efficient eigenvalue-based solution, and it is applicable to any domain. GMA exploits the fact that most popular supervised and unsupervised feature extraction techniques are the solution of a special form of quadratically constrained quadratic program (QCQP), which can be solved efficiently as a generalized eigenvalue problem. GMA solves a joint, relaxed QCQP over different feature spaces to obtain a single (non)linear subspace. Intuitively, GMA is a supervised extension of Canonical Correlation Analysis (CCA), which is useful for cross-view classification and retrieval. The proposed approach is general and has the potential to replace CCA whenever classification or retrieval is the purpose and label information is available. We outperform previous approaches for text-image retrieval on the Pascal and Wiki text-image datasets. We report state-of-the-art results for pose- and lighting-invariant face recognition on the MultiPIE face dataset, significantly outperforming other approaches.
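The QCQP-to-generalized-eigenproblem reduction is easy to illustrate with plain two-view CCA, which GMA extends with supervised within-view terms. The sketch below solves A w = λ B w with SciPy; the regularizer and embedding dimension are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def cca_geig(X, Y, d=10, reg=1e-4):
    """Two-view CCA as a generalized eigenvalue problem A w = lambda B w:
    the off-diagonal blocks of A hold the cross-covariance, the diagonal
    blocks of B the (regularized) within-view covariances."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n, p = X.shape; q = Y.shape[1]
    Cxx = X.T @ X / n + reg * np.eye(p)
    Cyy = Y.T @ Y / n + reg * np.eye(q)
    Cxy = X.T @ Y / n
    A = np.zeros((p + q, p + q)); B = np.zeros_like(A)
    A[:p, p:] = Cxy; A[p:, :p] = Cxy.T
    B[:p, :p] = Cxx; B[p:, p:] = Cyy
    vals, vecs = eigh(A, B)                      # generalized symmetric eig
    W = vecs[:, np.argsort(vals)[::-1][:d]]      # top-d correlated directions
    return W[:p], W[p:]                          # per-view projections
```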

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A unified model to incorporate traditional low-level features with higher-level guidance to detect salient objects and can be considered as a prototype framework not only for general salient object detection, but also for potential task-dependent saliency applications.
Abstract: Salient object detection is not a pure low-level, bottom-up process. Higher-level knowledge is important even for task-independent image saliency. We propose a unified model to incorporate traditional low-level features with higher-level guidance to detect salient objects. In our model, an image is represented as a low-rank matrix plus sparse noise in a certain feature space, where the non-salient regions (or background) can be explained by the low-rank matrix, and the salient regions are indicated by the sparse noise. To ensure the validity of this model, a linear transform for the feature space is introduced and needs to be learned. Given an image, its low-level saliency is then extracted by identifying the sparse noise when recovering the low-rank matrix. Furthermore, higher-level knowledge is fused to compose a prior map, and is treated as a prior term in the objective function to improve the performance. Extensive experiments show that our model can comfortably achieve comparable performance to the existing methods even without help from high-level knowledge. The integration of top-down priors further improves the performance and achieves state-of-the-art results. Moreover, the proposed model can be considered as a prototype framework not only for general salient object detection, but also for potential task-dependent saliency applications.
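The core decomposition can be sketched with a basic inexact augmented-Lagrangian robust PCA, omitting the learned feature transform and the high-level prior term: the feature matrix F is split into a low-rank background part and a sparse salient part. Parameter defaults follow common RPCA practice and are illustrative.

```python
import numpy as np

def rpca_saliency(F, lam=None, mu=None, n_iter=200):
    """Basic inexact-ALM robust PCA: F ~ L + S with L low rank
    (background) and S sparse (salient). If column i of F is the feature
    of region i, a simple saliency score is the l1 norm of S[:, i]."""
    m, n = F.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    norm2 = np.linalg.norm(F, 2)                   # spectral norm
    mu = mu or 1.25 / norm2
    Y = F / max(norm2, np.abs(F).max() / lam)      # dual variable init
    L = np.zeros_like(F); S = np.zeros_like(F)
    for _ in range(n_iter):
        # singular-value thresholding for the low-rank part
        U, s, Vt = np.linalg.svd(F - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(s - 1 / mu, 0)) @ Vt
        # soft thresholding for the sparse part
        R = F - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y = Y + mu * (F - L - S)
    return L, S
```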

01 Jan 2012
TL;DR: The principles of the LBP method and implementation to perform face recognition are presented and high recognition rates are obtained, especially compared to other face recognition methods.
Abstract: This paper is about providing an efficient face recognition system, i.e., feature extraction and face matching, using the local binary patterns (LBP) method. It is a texture-based algorithm for face recognition which describes the texture and shape of digital images. The preprocessed facial image is first divided into small blocks, from which LBP histograms are formed and then concatenated into a single feature vector. This feature vector plays a vital role in the efficient representation of the face and is used to measure similarities by calculating the distance between images. This paper presents the principles of the method and an implementation to perform face recognition. Experiments have been carried out on the Yale dataset; high recognition rates are obtained, especially compared to other face recognition methods. A few extensions are also investigated and implemented successfully to further improve the performance of the method.
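A minimal version of this descriptor pipeline is sketched below, using scikit-image's uniform LBP, a block grid, and the χ2 histogram distance commonly used for matching; the grid size and LBP radius are illustrative.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_face_descriptor(img, grid=(7, 7), P=8, R=1):
    """Block-wise LBP descriptor: uniform LBP per cell, per-cell
    histograms concatenated into a single feature vector."""
    lbp = local_binary_pattern(img, P, R, method='uniform')
    n_bins = P + 2                      # uniform codes + one non-uniform bin
    H, W = img.shape
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = lbp[i*H//grid[0]:(i+1)*H//grid[0],
                       j*W//grid[1]:(j+1)*W//grid[1]]
            h, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins), density=True)
            hists.append(h)
    return np.concatenate(hists)

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance typically used to compare LBP histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```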

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper uses raw image patches extracted from a set of unlabeled images to learn a dictionary in an unsupervised manner and uses soft-assignment coding with max pooling to obtain effective image representations for quality estimation.
Abstract: In this paper, we present an efficient general-purpose objective no-reference (NR) image quality assessment (IQA) framework based on unsupervised feature learning. The goal is to build a computational model to automatically predict human-perceived image quality without a reference image and without knowing the distortion present in the image. Previous approaches for this problem typically rely on hand-crafted features which are carefully designed based on prior knowledge. In contrast, we use raw image patches extracted from a set of unlabeled images to learn a dictionary in an unsupervised manner, and we use soft-assignment coding with max pooling to obtain effective image representations for quality estimation. The proposed algorithm is computationally appealing, using raw image patches as local descriptors and soft assignment for encoding. Furthermore, unlike previous methods, our unsupervised feature learning strategy enables our method to adapt to different domains. CORNIA (Codebook Representation for No-Reference Image Assessment) is tested on the LIVE database and shown to perform statistically better than the full-reference quality measure, the structural similarity index (SSIM), and to be comparable to state-of-the-art general-purpose NR-IQA algorithms.
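The encoding step can be sketched as below: normalized raw patches, dot-product similarity to a K-means codebook, a positive/negative sign split, and max pooling over all patches of an image. This follows the spirit of CORNIA rather than the released implementation; the codebook size and the omission of whitening are simplifications.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(unlabeled_patches, K=1000):
    """Unsupervised dictionary: K-means centroids of raw patches
    (the paper uses a much larger codebook)."""
    return MiniBatchKMeans(n_clusters=K, n_init=3).fit(unlabeled_patches).cluster_centers_

def soft_assign_encode(patches, codebook):
    """Soft-assignment coding with max pooling: similarity of each
    normalized patch to each codeword, split into positive and negative
    parts, max-pooled over the image's patches into a 2K-dim feature."""
    patches = patches - patches.mean(1, keepdims=True)   # contrast normalize
    patches = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    sim = patches @ codebook.T                           # (n_patches, K)
    pos, neg = np.maximum(sim, 0), np.maximum(-sim, 0)
    return np.concatenate([pos.max(0), neg.max(0)])
```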

Journal ArticleDOI
TL;DR: A novel framework to generate and rank plausible hypotheses for the spatial extent of objects in images using bottom-up computational processes and mid-level selection cues and it is shown that the algorithm can be used, successfully, in a segmentation-based visual object category recognition pipeline.
Abstract: We present a novel framework to generate and rank plausible hypotheses for the spatial extent of objects in images using bottom-up computational processes and mid-level selection cues. The object hypotheses are represented as figure-ground segmentations, and are extracted automatically, without prior knowledge of the properties of individual object classes, by solving a sequence of Constrained Parametric Min-Cut problems (CPMC) on a regular image grid. In a subsequent step, we learn to rank the corresponding segments by training a continuous model to predict how likely they are to exhibit real-world regularities (expressed as putative overlap with ground truth) based on their mid-level region properties, then diversify the estimated overlap score using maximum marginal relevance measures. We show that this algorithm significantly outperforms the state of the art for low-level segmentation in the VOC 2009 and 2010 data sets. In our companion papers [1], [2], we show that the algorithm can be used, successfully, in a segmentation-based visual object category recognition pipeline. This architecture ranked first in the VOC2009 and VOC2010 image segmentation and labeling challenges.

Journal ArticleDOI
TL;DR: This work reduces the size of the descriptors by representing them as short binary strings, learns descriptor invariance from examples, and shows extensive experimental validation demonstrating the advantage of the proposed approach.
Abstract: SIFT-like local feature descriptors are ubiquitously employed in computer vision applications such as content-based retrieval, video analysis, copy detection, object recognition, photo tourism, and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Second, descriptors are usually high dimensional (e.g., SIFT is represented as a 128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor data. We map the descriptor vectors into the Hamming space in which the Hamming metric is used to compare the resulting representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
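The mapping itself is simple once a projection is fixed; the sketch below uses random sign projections purely for illustration, whereas the paper instead learns the projection and thresholds from matching and non-matching descriptor pairs.

```python
import numpy as np

def random_projection(dim, n_bits=64, seed=0):
    """Stand-in for a learned similarity-preserving projection: random
    directions whose sign pattern yields an n_bits binary code."""
    return np.random.default_rng(seed).standard_normal((dim, n_bits))

def to_binary(X, P):
    """Map real descriptors (rows of X) to binary codes by thresholding
    projections at zero."""
    return X @ P > 0                      # boolean matrix, one row per descriptor

def hamming(a, b):
    """Hamming distance between two binary codes (XOR + popcount)."""
    return np.count_nonzero(a ^ b)
```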

Journal ArticleDOI
TL;DR: In this paper, a system is described that uses natural language processing and data mining techniques to extract situation awareness information from Twitter messages generated during various disasters and crises, such as hurricanes and floods.
Abstract: The described system uses natural language processing and data mining techniques to extract situation awareness information from Twitter messages generated during various disasters and crises.

Journal ArticleDOI
TL;DR: A novel image indexing and retrieval algorithm using local tetra patterns (LTrPs) for content-based image retrieval (CBIR) that encodes the relationship between the referenced pixel and its neighbors, based on the directions that are calculated using the first-order derivatives in vertical and horizontal directions.
Abstract: In this paper, we propose a novel image indexing and retrieval algorithm using local tetra patterns (LTrPs) for content-based image retrieval (CBIR). The standard local binary pattern (LBP) and local ternary pattern (LTP) encode the relationship between the referenced pixel and its surrounding neighbors by computing gray-level difference. The proposed method encodes the relationship between the referenced pixel and its neighbors, based on the directions that are calculated using the first-order derivatives in vertical and horizontal directions. In addition, we propose a generic strategy to compute nth-order LTrP using (n - 1)th-order horizontal and vertical derivatives for efficient CBIR and analyze the effectiveness of our proposed algorithm by combining it with the Gabor transform. The performance of the proposed method is compared with the LBP, the local derivative patterns, and the LTP based on the results obtained using benchmark image databases viz., Corel 1000 database (DB1), Brodatz texture database (DB2), and MIT VisTex database (DB3). Performance analysis shows that the proposed method improves the retrieval result from 70.34%/44.9% to 75.9%/48.7% in terms of average precision/average recall on database DB1, and from 79.97% to 85.30% and 82.23% to 90.02% in terms of average retrieval rate on databases DB2 and DB3, respectively, as compared with the standard LBP.
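The first step of the construction, assigning each pixel one of four directions from the signs of its first-order horizontal and vertical derivatives, can be sketched as below; the tetra pattern is then built by comparing each pixel's direction against its neighbors'. The derivative and sign conventions here are illustrative.

```python
import numpy as np

def first_order_directions(img):
    """Per-pixel direction code (1..4) from the signs of the horizontal
    and vertical first-order derivatives, the starting point of the
    local tetra pattern construction."""
    img = img.astype(float)
    dh = np.roll(img, -1, axis=1) - img     # horizontal derivative
    dv = np.roll(img, -1, axis=0) - img     # vertical derivative
    return np.select(
        [(dh >= 0) & (dv >= 0), (dh < 0) & (dv >= 0),
         (dh < 0) & (dv < 0), (dh >= 0) & (dv < 0)],
        [1, 2, 3, 4])
```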

Journal ArticleDOI
TL;DR: This paper created a challenging real-world copy-move dataset, and a software framework for systematic image manipulation, and examined the 15 most prominent feature sets, finding that the keypoint-based features SIFT and SURF, as well as the block-based DCT, DWT, KPCA, PCA, and Zernike features, perform very well.
Abstract: A copy-move forgery is created by copying and pasting content within the same image, and potentially postprocessing it. In recent years, the detection of copy-move forgeries has become one of the most actively researched topics in blind image forensics. A considerable number of different algorithms have been proposed, focusing on different types of postprocessed copies. In this paper, we aim to answer which copy-move forgery detection algorithms and processing steps (e.g., matching, filtering, outlier detection, affine transformation estimation) perform best in various postprocessing scenarios. The focus of our analysis is to evaluate the performance of previously proposed feature sets. We achieve this by casting existing algorithms in a common pipeline. In this paper, we examined the 15 most prominent feature sets. We analyzed the detection performance on a per-image basis and on a per-pixel basis. We created a challenging real-world copy-move dataset, and a software framework for systematic image manipulation. Experiments show that the keypoint-based features SIFT and SURF, as well as the block-based DCT, DWT, KPCA, PCA, and Zernike features, perform very well. These feature sets exhibit the best robustness against various noise sources and downsampling, while reliably identifying the copied regions.
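A bare-bones block-based instance of this common pipeline is sketched below, with DCT block features, lexicographic sorting, and shift-vector voting; real detectors add filtering, outlier removal, and affine estimation on top, and the block size, stride, and feature length here are illustrative.

```python
import numpy as np
from scipy.fft import dctn

def copy_move_candidate(img, block=16, stride=4, min_shift=16):
    """Block-based copy-move sketch: low-frequency DCT feature per block,
    lexicographic sort, and voting over shift vectors between similar
    neighboring blocks; returns the dominant shift and its vote count."""
    H, W = img.shape
    feats, pos = [], []
    for i in range(0, H - block + 1, stride):
        for j in range(0, W - block + 1, stride):
            c = dctn(img[i:i+block, j:j+block], norm='ortho')
            feats.append(c[:3, :3].ravel())          # 9 low-frequency coeffs
            pos.append((i, j))
    feats, pos = np.array(feats), np.array(pos)
    order = np.lexsort(feats.T[::-1])                # lexicographic sort
    shifts = {}
    for a, b in zip(order[:-1], order[1:]):          # compare sorted neighbors
        d = pos[a] - pos[b]
        if np.hypot(*d) >= min_shift:                # skip near-identical blocks
            key = tuple(np.abs(d))
            shifts[key] = shifts.get(key, 0) + 1
    return max(shifts.items(), key=lambda kv: kv[1]) if shifts else None
```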

Journal ArticleDOI
TL;DR: This paper conducts a comparative study on 12 selected image fusion metrics over six multiresolution image fusion algorithms for two different fusion schemes and input images with distortion and relates the results to an image quality measurement.
Abstract: Comparison of image processing techniques is critically important in deciding which algorithm, method, or metric to use for enhanced image assessment. Image fusion is a popular choice for various image enhancement applications such as overlay of two image products, refinement of image resolutions for alignment, and image combination for feature extraction and target recognition. Since image fusion is used in many geospatial and night vision applications, it is important to understand these techniques and provide a comparative study of the methods. In this paper, we conduct a comparative study on 12 selected image fusion metrics over six multiresolution image fusion algorithms for two different fusion schemes and input images with distortion. The analysis can be applied to different image combination algorithms, image processing methods, and over a different choice of metrics that are of use to an image processing expert. The paper relates the results to an image quality measurement based on power spectrum and correlation analysis and serves as a summary of many contemporary techniques for objective assessment of image fusion algorithms.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A complex human activity dataset depicting two-person interactions is created, including synchronized video, depth and motion capture data, and techniques related to Multiple Instance Learning (MIL) are explored, finding that the MIL-based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
Abstract: Human activity recognition has the potential to impact a wide range of applications, from surveillance to human-computer interfaces to content-based video retrieval. Recently, the rapid development of inexpensive depth sensors (e.g., Microsoft Kinect) provides adequate accuracy for real-time full-body human tracking for activity recognition applications. In this paper, we create a complex human activity dataset depicting two-person interactions, including synchronized video, depth and motion capture data. Moreover, we use our dataset to evaluate various features typically used for indexing and retrieval of motion capture data, in the context of real-time detection of interaction activities via Support Vector Machines (SVMs). Experimentally, we find that the geometric relational features based on the distance between all pairs of joints outperform other feature choices. For whole-sequence classification, we also explore techniques related to Multiple Instance Learning (MIL), in which the sequence is represented by a bag of body-pose features. We find that the MIL-based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
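The winning feature is simple to compute from tracked skeletons, as in this sketch; the array shape is an assumption about the data layout.

```python
import numpy as np
from scipy.spatial.distance import pdist

def joint_distance_features(skeleton_seq):
    """Geometric relational features: Euclidean distances between all
    pairs of tracked joints, computed per frame.
    skeleton_seq: array of shape (n_frames, n_joints, 3)."""
    return np.stack([pdist(frame) for frame in skeleton_seq])
    # shape (n_frames, n_joints*(n_joints-1)/2); frames (or statistics
    # pooled over a window) can then be fed to an SVM
```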

Journal ArticleDOI
TL;DR: This paper analyzes key aspects of the various AIA methods, including both feature extraction and semantic learning methods, and provides a comprehensive survey on automatic image annotation.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: In the authors' experiments, it is demonstrated that conditional regression forests outperform regression forests for facial feature detection and close-to-human accuracy is achieved while processing images in real-time.
Abstract: Although facial feature detection from 2D images is a well-studied field, there is a lack of real-time methods that estimate feature points even on low-quality images. Here we propose conditional regression forests for this task. While regression forests learn the relations between facial image patches and the locations of feature points from the entire set of faces, conditional regression forests learn these relations conditional on global face properties. In our experiments, we use the head pose as a global property and demonstrate that conditional regression forests outperform regression forests for facial feature detection. We have evaluated the method on the challenging Labeled Faces in the Wild [20] database, where close-to-human accuracy is achieved while processing images in real-time.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: It is shown that a recognition system using only representations obtained from deep learning can achieve comparable accuracy with a system using a combination of hand-crafted image descriptors, and empirically show that learning weights not only is necessary for obtaining good multilayer representations, but also provides robustness to the choice of the network architecture parameters.
Abstract: Most modern face recognition systems rely on a feature representation given by a hand-crafted image descriptor, such as Local Binary Patterns (LBP), and achieve improved performance by combining several such representations. In this paper, we propose deep learning as a natural source for obtaining additional, complementary representations. To learn features in high-resolution images, we make use of convolutional deep belief networks. Moreover, to take advantage of global structure in an object class, we develop local convolutional restricted Boltzmann machines, a novel convolutional learning model that exploits the global structure by not assuming stationarity of features across the image, while maintaining scalability and robustness to small misalignments. We also present a novel application of deep learning to descriptors other than pixel intensity values, such as LBP. In addition, we compare performance of networks trained using unsupervised learning against networks with random filters, and empirically show that learning weights not only is necessary for obtaining good multilayer representations, but also provides robustness to the choice of the network architecture parameters. Finally, we show that a recognition system using only representations obtained from deep learning can achieve comparable accuracy with a system using a combination of hand-crafted image descriptors. Moreover, by combining these representations, we achieve state-of-the-art results on a real-world face verification database.

Journal ArticleDOI
TL;DR: The results show that the proposed method can effectively extract signal features to diagnose early faults in rotating machinery.

Book
09 Oct 2012
TL;DR: This book is an essential guide to the implementation of image processing and computer vision techniques, with tutorial introductions and sample code in Matlab, and contains extensive new material on Haar wavelets, Viola-Jones, bilateral filtering, SURF, PCA-SIFT, moving object detection and tracking.
Abstract: This book is an essential guide to the implementation of image processing and computer vision techniques, with tutorial introductions and sample code in Matlab. Algorithms are presented and fully explained to enable complete understanding of the methods and techniques demonstrated. As one reviewer noted, "The main strength of the proposed book is the exemplar code of the algorithms." Fully updated with the latest developments in feature extraction, including expanded tutorials and new techniques, this new edition contains extensive new material on Haar wavelets, Viola-Jones, bilateral filtering, SURF, PCA-SIFT, moving object detection and tracking, development of symmetry operators, LBP texture analysis, Adaboost, and a new appendix on color models. Coverage of distance measures, feature detectors, wavelets, level sets and texture tutorials has been extended.

* Named a 2012 Notable Computer Book for Computing Methodologies by Computing Reviews
* Essential reading for engineers and students working in this cutting-edge field
* Ideal module text and background reference for courses in image processing and computer vision
* The only currently available text to concentrate on feature extraction with working implementation and worked-through derivation

Journal ArticleDOI
TL;DR: The patch alignment framework is introduced to linearly combine multiple features in the optimal way and obtain a unified low-dimensional representation of these multiple features for subsequent classification in hyperspectral remote sensing image classification.
Abstract: In hyperspectral remote sensing image classification, multiple features, e.g., spectral, texture, and shape features, are employed to represent pixels from different perspectives. It has been widely acknowledged that properly combining multiple features always results in good classification performance. In this paper, we introduce the patch alignment framework to linearly combine multiple features in the optimal way and obtain a unified low-dimensional representation of these multiple features for subsequent classification. Each feature has its particular contribution to the unified representation, determined by simultaneously optimizing the weights in the objective function. This scheme considers the specific statistical properties of each feature to achieve a physically meaningful unified low-dimensional representation of multiple features. Experiments on the classification of the Hyperspectral Digital Imagery Collection Experiment (HYDICE) and Reflective Optics System Imaging Spectrometer (ROSIS) hyperspectral data sets suggest that this scheme is effective.

Journal ArticleDOI
TL;DR: This paper studies a class of linear transformation techniques based on block-wise transformation of Mel filter-bank log energies (MFLE) which effectively decorrelate the filter-bank log energies and also capture speech information in an efficient manner.

Journal ArticleDOI
TL;DR: This paper explores the effectiveness of sparse representations obtained by learning an overcomplete basis (dictionary) in the context of action recognition in videos and presents the idea of a new local spatio-temporal feature that is distinctive, scale invariant, and fast to compute.
Abstract: This paper explores the effectiveness of sparse representations obtained by learning an overcomplete basis (dictionary) in the context of action recognition in videos. Although this work concentrates on recognizing human movements (physical actions as well as facial expressions), the proposed approach is fairly general and can be used to address other classification problems. In order to model human actions, three overcomplete dictionary learning frameworks are investigated. An overcomplete dictionary is constructed using a set of spatio-temporal descriptors (extracted from the video sequences) in such a way that each descriptor is represented by some linear combination of a small number of dictionary elements. This leads to a more compact and richer representation of the video sequences compared to the existing methods that involve clustering and vector quantization. For each framework, a novel classification algorithm is proposed. Additionally, this work also presents the idea of a new local spatio-temporal feature that is distinctive, scale invariant, and fast to compute. The proposed approach repeatedly achieves state-of-the-art results on several public data sets containing various physical actions and facial expressions.
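A compact sketch of the core representation follows, using scikit-learn's dictionary learning with OMP-based sparse coding in place of the paper's specific learning frameworks; the descriptor dimensionality, dictionary size, and sparsity level are illustrative, and the random data stands in for spatio-temporal descriptors extracted from video.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Learn an overcomplete dictionary from descriptors, then encode each
# descriptor as a sparse combination of a few atoms: the kind of
# representation the paper builds classification on.
descriptors = np.random.randn(5000, 100)        # placeholder descriptors
dico = MiniBatchDictionaryLearning(n_components=400,       # 400 > 100: overcomplete
                                   transform_algorithm='omp',
                                   transform_n_nonzero_coefs=10)
codes = dico.fit(descriptors).transform(descriptors)       # sparse codes
print(codes.shape, np.mean(np.count_nonzero(codes, axis=1)))
```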