scispace - formally typeset

Showing papers on "Feature (machine learning) published in 2009"


Proceedings ArticleDOI
01 Sep 2009
TL;DR: It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.
Abstract: In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer. Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode. This paper addresses three questions: (1) How do the non-linearities that follow the filter banks influence recognition accuracy? (2) Does learning the filter banks in an unsupervised or supervised manner improve performance over random filters or hard-wired filters? (3) Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one. Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper non-linearities and pooling layers are used. Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on the NORB dataset (5.6% error), that unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (> 65%), and the lowest known error rate on the undistorted, unprocessed MNIST dataset (0.53%).

2,317 citations
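The stage the abstract describes (filter bank → rectification → contrast normalization → pooling) can be sketched numerically. The following is an illustrative NumPy mock-up, not the authors' implementation: it uses random filters, absolute-value rectification, a crude per-map normalization standing in for true local contrast normalization, and average pooling.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_stage(image, filters, pool=2, eps=1e-6):
    """One feature-extraction stage: filter bank -> rectification ->
    per-map normalization -> average pooling (simplified sketch)."""
    # Filter bank: valid 2-D cross-correlation with each filter.
    k = filters.shape[1]
    H = image.shape[0] - k + 1
    W = image.shape[1] - k + 1
    maps = np.empty((len(filters), H, W))
    for i, f in enumerate(filters):
        for y in range(H):
            for x in range(W):
                maps[i, y, x] = np.sum(image[y:y + k, x:x + k] * f)
    maps = np.abs(maps)  # rectification
    # Crude per-map standardization (true LCN normalizes over a local window).
    maps = (maps - maps.mean(axis=(1, 2), keepdims=True)) / \
           (maps.std(axis=(1, 2), keepdims=True) + eps)
    # Average pooling over non-overlapping pool x pool windows.
    Hp, Wp = H // pool, W // pool
    pooled = maps[:, :Hp * pool, :Wp * pool].reshape(
        len(filters), Hp, pool, Wp, pool)
    return pooled.mean(axis=(2, 4))

image = rng.normal(size=(16, 16))
filters = rng.normal(size=(4, 5, 5))  # random filter bank, as in the paper
out = feature_stage(image, filters)
print(out.shape)  # (4, 6, 6)
```

Stacking two such stages, with the second operating on the pooled maps of the first, gives the two-stage architecture the paper compares against.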


Proceedings ArticleDOI
07 Sep 2009
TL;DR: It is demonstrated that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings, and that the ranking of the majority of methods is consistent across different datasets.
Abstract: Local space-time features have recently become a popular video representation for action recognition. Several methods for feature localization and description have been proposed in the literature and promising recognition results were demonstrated for a number of action classes. The comparison of existing methods, however, is often limited given the different experimental settings used. The purpose of this paper is to evaluate and compare previously proposed space-time features in a common experimental setup. In particular, we consider four different feature detectors and six local feature descriptors and use a standard bag-of-features SVM approach for action recognition. We investigate the performance of these methods on a total of 25 action classes distributed over three datasets with varying difficulty. Among interesting conclusions, we demonstrate that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings. We also demonstrate a consistent ranking for the majority of methods over different datasets and discuss their advantages and limitations.

1,485 citations
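The bag-of-features pipeline this evaluation relies on can be sketched with synthetic data. The version below is a minimal stand-in: a tiny k-means codebook, word histograms per video, and a check that class histograms separate; a real system would extract actual space-time descriptors and train an SVM on the histograms.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=20):
    """Tiny k-means for building a visual codebook."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers

def bof_histogram(descriptors, codebook):
    """Quantize local descriptors and return a normalized word histogram."""
    d = ((descriptors[:, None, :] - codebook[None]) ** 2).sum(-1)
    words = d.argmin(1)
    h = np.bincount(words, minlength=len(codebook)).astype(float)
    return h / h.sum()

# Two synthetic "action classes" whose local descriptors occupy
# different regions of feature space.
class_a = [rng.normal(0.0, 1.0, size=(50, 8)) for _ in range(5)]
class_b = [rng.normal(4.0, 1.0, size=(50, 8)) for _ in range(5)]
codebook = kmeans(np.vstack(class_a + class_b), k=10)
ha = np.mean([bof_histogram(v, codebook) for v in class_a], axis=0)
hb = np.mean([bof_histogram(v, codebook) for v in class_b], axis=0)
# The two classes produce clearly different word histograms:
print(np.abs(ha - hb).sum() > 0.5)  # True
```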


Proceedings Article
07 Dec 2009
TL;DR: In this paper, the authors apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks and show that the learned features correspond to phones/phonemes.
Abstract: In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied for auditory data. In this paper, we apply convolutional deep belief networks to audio data and empirically evaluate them on various audio classification tasks. In the case of speech data, we show that the learned features correspond to phones/phonemes. In addition, our feature representations learned from unlabeled audio data show very good performance for multiple audio classification tasks. We hope that this paper will inspire more research on deep learning approaches applied to a wide range of audio recognition tasks.

1,094 citations


Journal ArticleDOI
TL;DR: An empirical feature analysis for audio environment characterization is performed, and a matching pursuit algorithm is proposed to obtain effective time-frequency features that supplement MFCCs and yield higher recognition accuracy for environmental sounds.
Abstract: The paper considers the task of recognizing environmental sounds for the understanding of a scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the audio spectral shape. Environmental sounds, such as chirpings of insects and sounds of rain, which are typically noise-like with a broad flat spectrum, may include strong temporal domain signatures. However, only a few temporal-domain features have previously been developed to characterize such diverse audio signals. Here, we perform an empirical feature analysis for audio environment characterization and propose to use the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method utilizes a dictionary of atoms for feature selection, resulting in a flexible, intuitive and physically interpretable set of features. The MP-based features are adopted to supplement the MFCC features to yield higher recognition accuracy for environmental sounds. Extensive experiments are conducted to demonstrate the effectiveness of these joint features for unstructured environmental sound classification, including listening tests to study human recognition capabilities. Our recognition system has been shown to produce performance comparable to that of human listeners.

626 citations
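The core of matching pursuit is short enough to sketch: greedily pick the dictionary atom most correlated with the current residual and subtract its projection. The dictionary below uses random unit-norm atoms purely as stand-ins for the Gabor atoms in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit: the selected (atom index, coefficient)
    pairs serve as a sparse time-frequency feature set. Assumes the
    dictionary rows are unit-norm atoms."""
    residual = signal.copy()
    features = []
    for _ in range(n_atoms):
        corr = dictionary @ residual           # correlation with each atom
        i = np.argmax(np.abs(corr))
        features.append((i, corr[i]))
        residual -= corr[i] * dictionary[i]    # subtract the projection
    return features, residual

# Overcomplete dictionary of unit-norm random atoms.
D = rng.normal(size=(64, 32))
D /= np.linalg.norm(D, axis=1, keepdims=True)
x = 2.0 * D[7] - 1.5 * D[20]                   # signal sparse in the dictionary
feats, r = matching_pursuit(x, D, n_atoms=5)
print(np.linalg.norm(r) < np.linalg.norm(x))   # True: the residual shrinks
```

Each step strictly reduces the residual energy by the squared correlation of the chosen atom, which is why a handful of atoms already summarizes a signal that is sparse in the dictionary.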


Proceedings ArticleDOI
01 Sep 2009
TL;DR: A novel matching, spatio-temporal relationship match, which is designed to measure structural similarity between sets of features extracted from two videos, thereby enabling detection and localization of complex non-periodic activities.
Abstract: Human activity recognition is a challenging task, especially when its background is unknown or changing, and when scale or illumination differs in each video. Approaches utilizing spatio-temporal local features have proved that they are able to cope with such difficulties, but they mainly focused on classifying short videos of simple periodic actions. In this paper, we present a new activity recognition methodology that overcomes the limitations of the previous approaches using local features. We introduce a novel matching, the spatio-temporal relationship match, which is designed to measure structural similarity between sets of features extracted from two videos. Our match hierarchically considers spatio-temporal relationships among feature points, thereby enabling detection and localization of complex non-periodic activities. In contrast to previous approaches that ‘classify’ videos, our approach is designed to ‘detect and localize’ all occurring activities from continuous videos where multiple actors and pedestrians are present. We implement and test our methodology on a newly-introduced dataset containing videos of multiple interacting persons and individual pedestrians. The results confirm that our system is able to recognize complex non-periodic activities (e.g. ‘push’ and ‘hug’) from sets of spatio-temporal features even when multiple activities are present in the scene.

612 citations


Posted Content
TL;DR: Searn as mentioned in this paper is a meta-algorithm that transforms complex structured prediction problems into simple classification problems to which any binary classifier may be applied, and it is able to learn prediction functions for any loss function and any class of features.
Abstract: We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.

478 citations


Journal ArticleDOI
TL;DR: Searn is an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision, and it comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.
Abstract: We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision. Searn is a meta-algorithm that transforms these complex problems into simple classification problems to which any binary classifier may be applied. Unlike current algorithms for structured learning that require decomposition of both the loss function and the feature functions over the predicted structure, Searn is able to learn prediction functions for any loss function and any class of features. Moreover, Searn comes with a strong, natural theoretical guarantee: good performance on the derived classification problems implies good performance on the structured prediction problem.

449 citations


Proceedings Article
07 Dec 2009
TL;DR: This work pursues a similar approach with a richer kind of latent variable (latent features), using a Bayesian nonparametric approach to simultaneously infer the number of features and which entities have each feature, and combines these inferred features with known covariates in order to perform link prediction.
Abstract: As the availability and importance of relational data—such as the friendships summarized on a social networking website—increases, it becomes increasingly important to have good models for such data. The kinds of latent structure that have been considered for use in predicting links in such networks have been relatively limited. In particular, the machine learning community has focused on latent class models, adapting Bayesian nonparametric methods to jointly infer how many latent classes there are while learning which entities belong to each class. We pursue a similar approach with a richer kind of latent variable—latent features—using a Bayesian nonparametric approach to simultaneously infer the number of features at the same time we learn which entities have each feature. Our model combines these inferred features with known covariates in order to perform link prediction. We demonstrate that the greater expressiveness of this approach allows us to improve performance on three datasets.

448 citations


Journal ArticleDOI
TL;DR: This paper presents a novel point-based surface definition combining the simplicity of implicit MLS surfaces [SOS04, Kol05] with the strength of robust statistics, arriving at the new definition by recasting MLS surfaces as local kernel regression.
Abstract: Moving least squares (MLS) is a very attractive tool to design effective meshless surface representations. However, as long as approximations are performed in a least-squares sense, the resulting definitions remain sensitive to outliers and smooth out small or sharp features. In this paper, we address these major issues, and present a novel point based surface definition combining the simplicity of implicit MLS surfaces [SOS04, Kol05] with the strength of robust statistics. To reach this new definition, we review MLS surfaces in terms of local kernel regression, opening the doors to a vast and well-established literature from which we utilize robust kernel regression. Our novel representation can handle sparse sampling, generates a continuous surface better preserving fine details, and can naturally handle any kind of sharp features with controllable sharpness. Finally, it combines ease of implementation with performance competing with other non-robust approaches.

442 citations


Journal Article
TL;DR: This paper extends ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case and improves ISIS by allowing feature deletion in the iterative process.
Abstract: Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently-used techniques are based on independence screening; examples include correlation ranking (Fan & Lv, 2008) or feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan & Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iteratively sure independent screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.

429 citations
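The screening step that the paper builds on is simple to sketch: rank features by the magnitude of their marginal correlation with the response and keep the top d. This is the plain sure independence screening of Fan & Lv (2008), not the iterative ISIS extension or the pseudo-likelihood generalization the paper develops.

```python
import numpy as np

rng = np.random.default_rng(3)

def sis(X, y, d):
    """Sure independence screening: keep the d features whose marginal
    correlation with the response is largest in absolute value."""
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    corr = np.abs(Xc.T @ yc) / len(y)
    return np.argsort(corr)[::-1][:d]

# High-dimensional toy data: only features 0 and 1 drive the response.
n, p = 200, 1000
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)
kept = sis(X, y, d=20)
print(0 in kept and 1 in kept)  # True: the active features survive screening
```

The failure mode motivating ISIS is also visible in this setup: a feature that is marginally uncorrelated with y but jointly important would be missed by this one-shot ranking, which is what the iterative variant with feature deletion addresses.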


Proceedings ArticleDOI
11 Oct 2009
TL;DR: The experimental results demonstrate that the use of an enriched feature set analyzed by PLS reduces the ambiguity among different appearances and provides higher recognition rates when compared to other machine learning techniques.
Abstract: Appearance information is essential for applications such as tracking and people recognition. One of the main problems of using appearance-based discriminative models is the ambiguities among classes when the number of persons being considered increases. To reduce the amount of ambiguity, we propose the use of a rich set of feature descriptors based on color, textures and edges. Another issue regarding appearance modeling is the limited number of training samples available for each appearance. The discriminative models are created using a powerful statistical tool called Partial Least Squares (PLS), responsible for weighting the features according to their discriminative power for each different appearance. The experimental results, based on appearance-based person recognition, demonstrate that the use of an enriched feature set analyzed by PLS reduces the ambiguity among different appearances and provides higher recognition rates when compared to other machine learning techniques.
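How PLS yields per-feature weights can be illustrated with a bare-bones PLS1 (single-response NIPALS) loop on synthetic labels; this is a rough proxy for the paper's use of PLS to weight appearance features by discriminative power, not its actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(4)

def pls1_weights(X, y, n_components=2):
    """PLS1: extract components maximizing covariance with the label,
    returning the per-feature weight vectors of each component."""
    X = X - X.mean(0)
    y = y - y.mean()
    W = np.zeros((X.shape[1], n_components))
    for k in range(n_components):
        w = X.T @ y                     # direction of max covariance
        w /= np.linalg.norm(w)
        t = X @ w                       # component scores
        p = X.T @ t / (t @ t)           # loadings
        X = X - np.outer(t, p)          # deflate X
        y = y - t * (y @ t) / (t @ t)   # deflate y
        W[:, k] = w
    return W

# Binary appearance labels; only the first 3 of 10 features are informative.
n = 300
y = rng.integers(0, 2, n).astype(float)
X = rng.normal(size=(n, 10))
X[:, :3] += 2.0 * y[:, None]
W = pls1_weights(X, y)
# The first component weights the informative features most heavily:
top = np.argsort(np.abs(W[:, 0]))[::-1][:3]
print(sorted(int(i) for i in top))  # [0, 1, 2]
```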

Journal ArticleDOI
TL;DR: The aim of the thesis is to demonstrate that deep generative models that contain many layers of latent variables and millions of parameters can be learned efficiently, and that the learned high-level feature representations can be successfully applied in a wide spectrum of application domains, including visual object recognition, information retrieval, and classification and regression tasks.
Abstract: Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many AI related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that involve many layers of nonlinear processing. The aim of the thesis is to demonstrate that deep generative models that contain many layers of latent variables and millions of parameters can be learned efficiently, and that the learned high-level feature representations can be successfully applied in a wide spectrum of application domains, including visual object recognition, information retrieval, and classification and regression tasks. In addition, similar methods can be used for nonlinear dimensionality reduction. The first part of the thesis focuses on analysis and applications of probabilistic generative models called Deep Belief Networks. We show that these deep hierarchical models can learn useful feature representations from a large supply of unlabeled sensory inputs. The learned high-level representations capture a lot of structure in the input data, which is useful for subsequent problem-specific tasks, such as classification, regression or information retrieval, even though these tasks are unknown when the generative model is being trained. In the second part of the thesis, we introduce a new learning algorithm for a different type of hierarchical probabilistic model, which we call a Deep Boltzmann Machine. Like Deep Belief Networks, Deep Boltzmann Machines have the potential of learning internal representations that become increasingly complex at higher layers, which is a promising way of solving object and speech recognition problems. 
Unlike Deep Belief Networks and many existing models with deep architectures, the approximate inference procedure, in addition to a fast bottom-up pass, can incorporate top-down feedback. This allows Deep Boltzmann Machines to better propagate uncertainty about ambiguous inputs.

DOI
01 Jan 2009
TL;DR: This paper proposes an algorithm called MoSIFT, which detects interest points and encodes not only their local appearance but also explicitly models local motion, and introduces a bigram model to construct a correlation between local features to capture the more global structure of actions.
Abstract: The goal of this paper is to build robust human action recognition for real-world surveillance videos. Local spatio-temporal features around interest points provide compact but descriptive representations for video analysis and motion recognition. Current approaches tend to extend spatial descriptions by adding a temporal component to the appearance descriptor, which only implicitly captures motion information. We propose an algorithm called MoSIFT, which detects interest points and encodes not only their local appearance but also explicitly models local motion. The idea is to detect distinctive local features through local appearance and motion. We construct MoSIFT feature descriptors in the spirit of the well-known SIFT descriptors, made robust to small deformations through grid aggregation. We also introduce a bigram model to construct a correlation between local features to capture the more global structure of actions. The method advances the state-of-the-art result on the KTH dataset to an accuracy of 95.8%. We also applied our approach to 100 hours of surveillance data as part of the TRECVID Event Detection task, with very promising results on recognizing human actions in real-world surveillance videos.

Journal ArticleDOI
TL;DR: Results suggest that human-level expression recognition accuracy in real-life illumination conditions is achievable with machine learning technology.
Abstract: Machine learning approaches have produced some of the highest reported performances for facial expression recognition. However, to date, nearly all automatic facial expression recognition research has focused on optimizing performance on a few databases that were collected under controlled lighting conditions on a relatively small number of subjects. This paper explores whether current machine learning methods can be used to develop an expression recognition system that operates reliably in more realistic conditions. We explore the necessary characteristics of the training data set, image registration, feature representation, and machine learning algorithms. A new database, GENKI, is presented which contains pictures, photographed by the subjects themselves, from thousands of different people in many different real-world imaging conditions. Results suggest that human-level expression recognition accuracy in real-life illumination conditions is achievable with machine learning technology. However, the data sets currently used in the automatic expression recognition literature to evaluate progress may be overly constrained and could potentially lead research into locally optimal algorithmic solutions.

Journal ArticleDOI
TL;DR: According to this study, Random Forests provides the best prediction performance for large datasets and Naive Bayes is the best prediction algorithm for small datasets in terms of the Area Under the Receiver Operating Characteristics Curve (AUC) evaluation parameter.

Proceedings ArticleDOI
31 May 2009
TL;DR: This work proposes a joint model of parsing and named entity recognition, based on a discriminative feature-based constituency parser that produces a consistent output, where the named entity spans do not conflict with the phrasal spans of the parse tree.
Abstract: For many language technology applications, such as question answering, the overall system runs several independent processors over the data (such as a named entity recognizer, a coreference system, and a parser). This easily results in inconsistent annotations, which are harmful to the performance of the aggregate system. We begin to address this problem with a joint model of parsing and named entity recognition, based on a discriminative feature-based constituency parser. Our model produces a consistent output, where the named entity spans do not conflict with the phrasal spans of the parse tree. The joint representation also allows the information from each type of annotation to improve performance on the other, and, in experiments with the OntoNotes corpus, we found improvements of up to 1.36% absolute F1 for parsing, and up to 9.0% F1 for named entity recognition.

Journal ArticleDOI
TL;DR: This work proposes an approach to improving content selection in automatic text summarization using statistical tools; it takes into account several features, including sentence position, positive and negative keywords, sentence centrality, sentence resemblance to the title, sentence inclusion of named entities, sentence inclusion of numerical data, relative sentence length, and aggregated similarity.

Proceedings ArticleDOI
05 Jun 2009
TL;DR: A system for extracting complex events among genes and proteins from biomedical literature, developed in the context of the BioNLP'09 Shared Task on Event Extraction; it defines a wide array of features and makes extensive use of dependency parse graphs.
Abstract: We describe a system for extracting complex events among genes and proteins from biomedical literature, developed in context of the BioNLP'09 Shared Task on Event Extraction. For each event, its text trigger, class, and arguments are extracted. In contrast to the prevailing approaches in the domain, events can be arguments of other events, resulting in a nested structure that better captures the underlying biological statements. We divide the task into independent steps which we approach as machine learning problems. We define a wide array of features and in particular make extensive use of dependency parse graphs. A rule-based post-processing step is used to refine the output in accordance with the restrictions of the extraction task. In the shared task evaluation, the system achieved an F-score of 51.95% on the primary task, the best performance among the participants.

Journal ArticleDOI
TL;DR: The purpose of this study was to extend a previously introduced pattern recognition method for ERP assessment in this application by further extending the feature set and also employing a method for the selection of optimal features.

Proceedings ArticleDOI
31 May 2009
TL;DR: It is demonstrated that allowing different values for these hyperparameters significantly improves performance over both a strong baseline and (Daume III, 2007) within both a conditional random field sequence model for named entity recognition and a discriminatively trained dependency parser.
Abstract: Multi-task learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more principled, better performing model for this problem, based on the use of a hierarchical Bayesian prior. Each domain has its own domain-specific parameter for each feature but, rather than a constant prior over these parameters, the model instead links them via a hierarchical Bayesian global prior. This prior encourages the features to have similar weights across domains, unless there is good evidence to the contrary. We show that the method of (Daume III, 2007), which was presented as a simple "preprocessing step," is actually equivalent, except our representation explicitly separates hyperparameters which were tied in his work. We demonstrate that allowing different values for these hyperparameters significantly improves performance over both a strong baseline and (Daume III, 2007) within both a conditional random field sequence model for named entity recognition and a discriminatively trained dependency parser.
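The equivalence the abstract mentions is to Daume III's feature-augmentation trick, which is simple enough to sketch: copy each feature vector into a shared block plus one block per domain, so a downstream learner can split weight between shared and domain-specific parameters. The function below illustrates that preprocessing step only, not the hierarchical Bayesian model itself.

```python
import numpy as np

def augment(x, domain, n_domains):
    """Feature augmentation for domain adaptation (Daume III, 2007):
    the output has a shared copy of x plus one domain-specific copy,
    with zeros in the blocks belonging to the other domains."""
    d = len(x)
    out = np.zeros(d * (n_domains + 1))
    out[:d] = x                                   # shared copy
    out[d * (domain + 1): d * (domain + 2)] = x   # domain-specific copy
    return out

x = np.array([1.0, 2.0, 3.0])
z = augment(x, domain=1, n_domains=2)
print(z)  # [1. 2. 3. 0. 0. 0. 1. 2. 3.]
```

In the paper's terms, the shared block corresponds to the global prior mean and the per-domain blocks to domain-specific parameters; untying the variances of those two levels is what the hierarchical model adds.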

ReportDOI
01 Mar 2009
TL;DR: The intent is to provide the reader with an introduction to feature extraction and statistical modelling for feature classification in the context of SHM through the application of the Los Alamos National Laboratory’s statistical pattern recognition paradigm for structural health monitoring (SHM).
Abstract: Real-world structures are subjected to operational and environmental condition changes that impose difficulties in detecting and identifying structural damage. The aim of this report is to detect damage in the presence of such operational and environmental condition changes through the application of the Los Alamos National Laboratory’s statistical pattern recognition paradigm for structural health monitoring (SHM). The test structure is a laboratory three-story building, and the damage is simulated through nonlinear effects introduced by a bumper mechanism that simulates a repetitive impact-type nonlinearity. The report reviews and illustrates various statistical principles that have had wide application in many engineering fields. The intent is to provide the reader with an introduction to feature extraction and statistical modelling for feature classification in the context of SHM. In this process, the strengths and limitations of some actual statistical techniques used to detect damage in the structures are discussed. In the hierarchical structure of damage detection, this report is only concerned with the first step of the damage detection strategy, which is the evaluation of the existence of damage in the structure. The data from this study and a detailed description of the test structure are available for download at: http://institute.lanl.gov/ei/software-and-data/.

Journal ArticleDOI
TL;DR: A statistical learning mechanism in a computational model based on a mixture of Gaussians and competition successfully accounted for developmental enhancement and loss of sensitivity to phonetic contrasts, and was used to explore a potentially important distributional property of early speech categories: sparseness.
Abstract: Infants face a difficult problem in acquiring their native language because the acoustic/phonetic variability in the input far exceeds the limited number of distinctive differences that define language-specific phonemes. How do infants attend to the relevant information that distinguishes words? Recent evidence suggests that phonemic categories may be induced, in whole or in part, by a rapid statistical learning mechanism that is sensitive to the distributional properties of phonetic input (Maye, Werker & Gerken, 2002; Maye, Weiss & Aslin, 2008). This evidence suggests that the detailed frequency-of-occurrence of tokens along continuous speech dimensions plays a crucial role in the formation and modification of phonemic categories. The present paper describes a computational model of statistical speech category learning that examines the necessary and sufficient mechanisms needed to account for known empirical data from infants, and the implications of those mechanisms for early speech categories. We demonstrate that statistical learning alone is insufficient: competition is also required. However, once this feature is added to the model, it can account for a number of developmental trajectories in speech category learning. Finally, we examine the possibility that early speech categories are independent and sparsely distributed; that is, they do not fully cover all values along a phonetic dimension.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper proposes a unified action recognition framework fusing local descriptors and holistic features based on frame differencing, bag-of-words and feature fusion, and shows that the proposed approach is effective.
Abstract: In this paper we propose a unified action recognition framework fusing local descriptors and holistic features. The motivation is that the local descriptors and holistic features emphasize different aspects of actions and are suitable for the different types of action databases. The proposed unified framework is based on frame differencing, bag-of-words and feature fusion. We extract two kinds of local descriptors, i.e. 2D and 3D SIFT feature descriptors, both based on 2D SIFT interest points. We apply Zernike moments to extract two kinds of holistic features, one is based on single frames and the other is based on motion energy image. We perform action recognition experiments on the KTH and Weizmann databases, using Support Vector Machines. We apply the leave-one-out and pseudo leave-N-out setups, and compare our proposed approach with state-of-the-art results. Experiments show that our proposed approach is effective. Compared with other approaches our approach is more robust, more versatile, easier to compute and simpler to understand.

Proceedings Article
07 Dec 2009
TL;DR: A new conditional probabilistic graphical model, Conditional Neural Fields (CNF), is proposed, which is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.
Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.
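The gate-layer idea can be shown in isolation: where a CRF scores a sequence position with a linear function of its features, a CNF first passes the features through a small layer of logistic gates. This sketch covers only the node potential; the full model also includes transition terms and end-to-end training, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(5)

def crf_potential(x, w):
    """Linear CRF node potential: a linear function of the input features."""
    return float(w @ x)

def cnf_potential(x, U, w):
    """CNF node potential: K logistic gate functions between input and
    output make the potential a nonlinear function of the features."""
    h = 1.0 / (1.0 + np.exp(-U @ x))  # gate activations (local "neurons")
    return float(w @ h)

x = rng.normal(size=10)        # features at one sequence position
U = rng.normal(size=(5, 10))   # K = 5 gates
w = rng.normal(size=5)
score = cnf_potential(x, U, w)
print(np.isfinite(score))  # True
```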

Book
21 Jul 2009
TL;DR: Deterministic Learning Theory for Identification, Recognition, and Control presents a unified conceptual framework for knowledge acquisition, representation, and knowledge utilization in uncertain dynamic environments and represents a new model of information processing, i.e. a model of dynamical parallel distributed processing (DPDP).
Abstract: Deterministic Learning Theory for Identification, Recognition, and Control presents a unified conceptual framework for knowledge acquisition, representation, and knowledge utilization in uncertain dynamic environments. It provides systematic design approaches for identification, recognition, and control of linear uncertain systems. Unlike many books currently available that focus on statistical principles, this book stresses learning through closed-loop neural control, effective representation and recognition of temporal patterns in a deterministic way. A Deterministic View of Learning in Dynamic Environments: The authors begin with an introduction to the concepts of deterministic learning theory, followed by a discussion of the persistent excitation property of RBF networks. They describe the elements of deterministic learning, and address dynamical pattern recognition and pattern-based control processes. The results are applicable to areas such as detection and isolation of oscillation faults, ECG/EEG pattern recognition, robot learning and control, and security analysis and control of power systems. A New Model of Information Processing: This book elucidates a learning theory which is developed using concepts and tools from the discipline of systems and control. Fundamental knowledge about system dynamics is obtained from dynamical processes, and is then utilized to achieve rapid recognition of dynamical patterns and pattern-based closed-loop control via the so-called internal and dynamical matching of system dynamics. This actually represents a new model of information processing, i.e. a model of dynamical parallel distributed processing (DPDP).

Proceedings Article
01 Dec 2009
TL;DR: The results are at odds with previous studies: audio features do not always outperform lyrics features, and combining lyrics and audio features can improve performance in many mood categories, but not all of them.
Abstract: This research examines the role lyric text can play in improving audio music mood classification. A new method is proposed to build a large ground truth set of 5,585 songs and 18 mood categories based on social tags so as to reflect a realistic, user-centered perspective. A relatively complete set of lyric features and representation models were investigated. The best performing lyric feature set was also compared to a leading audio-based system. In combining lyric and audio sources, hybrid feature sets built with three different feature selection methods were also examined. The results show patterns at odds with findings in previous studies: audio features do not always outperform lyrics features, and combining lyrics and audio features can improve performance in many mood categories, but not all of them.
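As a rough illustration of building a hybrid feature set (hypothetical feature names and scores; the paper evaluates three specific feature selection methods rather than the single ranking used here), one might select top-scoring features from each source and concatenate their values per song:

```python
def select_top_k(features, scores, k):
    """Keep the k highest-scoring feature names (a stand-in for a real
    feature selection method such as information gain)."""
    ranked = sorted(features, key=lambda f: scores[f], reverse=True)
    return ranked[:k]

def hybrid_vector(song, lyric_feats, audio_feats):
    """Concatenate selected lyric and audio feature values for one song."""
    return [song[f] for f in lyric_feats] + [song[f] for f in audio_feats]
```

Each hybrid vector would then go to a mood classifier, one binary model per mood category in a social-tag ground truth set like the one the paper builds.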

Journal ArticleDOI
TL;DR: The problem of unsupervised multi-instance learning is addressed: a multi-instance clustering algorithm named Bamic is proposed and, based on the clustering results, a novel multi-instance prediction algorithm named Bartmip is developed.
Abstract: In the setting of multi-instance learning, each object is represented by a bag composed of multiple instances instead of by a single instance in a traditional learning setting. Previous works in this area only concern multi-instance prediction problems where each bag is associated with a binary (classification) or real-valued (regression) label. However, unsupervised multi-instance learning, where bags are without labels, has not been studied. In this paper, the problem of unsupervised multi-instance learning is addressed and a multi-instance clustering algorithm named Bamic is proposed. Briefly, by regarding bags as atomic data items and using some form of distance metric to measure distances between bags, Bamic adapts the popular k-Medoids algorithm to partition the unlabeled training bags into k disjoint groups of bags. Furthermore, based on the clustering results, a novel multi-instance prediction algorithm named Bartmip is developed. First, each bag is re-represented by a k-dimensional feature vector, where the value of the i-th feature is set to be the distance between the bag and the medoid of the i-th group. In this way, bags are transformed into feature vectors, so that common supervised learners can learn from the transformed vectors, each associated with its original bag's label. Extensive experiments show that Bamic could effectively discover the underlying structure of the data set and Bartmip works quite well on various kinds of multi-instance prediction problems.
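The Bartmip re-representation step lends itself to a short sketch (toy bags and medoids; minimal Hausdorff distance is used here as one plausible bag-level metric, not necessarily the paper's exact choice):

```python
import math

def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: distance between the closest pair of
    instances drawn one from each bag."""
    return min(math.dist(a, b) for a in bag_a for b in bag_b)

def bartmip_vector(bag, medoid_bags):
    """Re-represent a bag as its distances to the k cluster medoids,
    yielding an ordinary k-dimensional feature vector."""
    return [min_hausdorff(bag, m) for m in medoid_bags]
```

Once every bag is mapped this way, any single-instance learner (SVM, decision tree, etc.) can be trained on the resulting vectors with the bags' labels, which is the point of the transformation.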

Journal ArticleDOI
TL;DR: This paper presents a taxonomy that organizes new and more established similarity mechanisms in a coherent framework, noting that research on kernel-based learning is a rich source of novel similarity representations because of its emphasis on encoding domain knowledge in the kernel function.
Abstract: Assessing the similarity between cases is a key aspect of the retrieval phase in case-based reasoning (CBR). In most CBR work, similarity is assessed based on feature value descriptions of cases, using similarity metrics defined over those feature values. In fact, it might be said that this notion of a feature value representation is a defining part of the CBR worldview: it underpins the idea of a problem space with cases located relative to each other in this space. Recently, a variety of similarity mechanisms have emerged that are not founded on this feature space idea. Some of these new similarity mechanisms have emerged in CBR research and some have arisen in other areas of data analysis. Research on kernel-based learning, in particular, is a rich source of novel similarity representations because of its emphasis on encoding domain knowledge in the kernel function. In this paper, we present a taxonomy that organizes these new similarity mechanisms, together with more established ones, in a coherent framework.
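The contrast between feature-value metrics and kernel-based similarity can be sketched as follows (hypothetical cases, with feature values assumed to lie in [0, 1] for the weighted metric; the Gaussian RBF kernel stands in for the many kernels the taxonomy covers):

```python
import math

def weighted_similarity(case_a, case_b, weights):
    """Classic CBR similarity: weighted closeness over shared feature
    values, assuming each value is scaled to [0, 1]."""
    total = sum(weights.values())
    score = sum(w * (1.0 - abs(case_a[f] - case_b[f]))
                for f, w in weights.items())
    return score / total

def rbf_kernel(case_a, case_b, gamma=1.0):
    """Kernel similarity: a Gaussian of the squared Euclidean distance.
    Kernels need not decompose into per-feature difference terms."""
    sq = sum((a - b) ** 2 for a, b in zip(case_a, case_b))
    return math.exp(-gamma * sq)
```

Both functions map a pair of cases to a similarity score, but only the first is tied to the per-feature problem-space view the abstract describes; the kernel can encode richer domain knowledge.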

Patent
27 Feb 2009
TL;DR: A coarse sound representation generator is used to determine the likelihood that a coarse representation of a recognized word is actually that word, by comparing the representation to a database containing the word's known sound, and the likelihood is assigned to the word as a robust confidence level parameter.
Abstract: A method, apparatus, and system are described for a continuous speech recognition engine that includes a fine speech recognizer model, a coarse sound representation generator, and a coarse match generator. The fine speech recognizer model receives a time coded sequence of sound feature frames, applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames. The coarse sound representation generator generates a coarse sound representation of the recognized word. The coarse match generator determines a likelihood of the coarse sound representation actually being the recognized word based on comparing the coarse sound representation of the recognized word to a database containing the known sound of that recognized word and assigns the likelihood as a robust confidence level parameter to that recognized word.

Proceedings ArticleDOI
29 Jun 2009
TL;DR: This paper captures input typing errors via edit distance, shows that a naive approach of invoking an offline edit distance matching algorithm at each step performs poorly, and presents more efficient algorithms.
Abstract: Autocompletion is a useful feature when a user is looking up records in a table. With every letter typed, autocompletion displays the strings in the table that contain the search string typed so far as a prefix. Just as the lookup operation needs to be tolerant to typing errors, we argue that autocompletion needs to be error-tolerant as well. In this paper, we take a first step towards addressing this problem. We capture input typing errors via edit distance. We show that a naive approach of invoking an offline edit distance matching algorithm at each keystroke performs poorly, and we present more efficient algorithms. Our empirical evaluation demonstrates the effectiveness of our algorithms.
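A simplified stand-in for error-tolerant matching (not the paper's incremental algorithms) is the prefix edit distance: a candidate string matches if some prefix of it is within edit distance tau of the query typed so far. A naive per-candidate dynamic program looks like this:

```python
def prefix_edit_distance(query, candidate):
    """Minimum edit distance from query to any prefix of candidate,
    via one Levenshtein DP row per query character."""
    # row[j] = edit distance between the query so far and candidate[:j]
    row = list(range(len(candidate) + 1))
    for i, qc in enumerate(query, 1):
        new = [i]
        for j, cc in enumerate(candidate, 1):
            cost = 0 if qc == cc else 1
            new.append(min(new[j - 1] + 1,      # insertion
                           row[j] + 1,          # deletion
                           row[j - 1] + cost))  # match / substitution
        row = new
    return min(row)  # best alignment against any prefix of candidate

def autocomplete(query, table, tau=1):
    """Strings whose prefixes fall within edit distance tau of the query."""
    return [s for s in table if prefix_edit_distance(query, s) <= tau]
```

Recomputing this per keystroke over the whole table is exactly the naive approach the paper argues against; its contribution is doing the matching incrementally so that each new letter reuses work from the previous one.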