
Showing papers on "Feature vector published in 1998"


Journal ArticleDOI
Tin Kam Ho
TL;DR: A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity.
Abstract: Much of previous attention on decision trees focuses on the splitting criteria and optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods by experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate that to the combined classification accuracy.
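A minimal sketch of the random subspace idea described above, assuming scikit-learn; the class and parameter names are illustrative and this is not Ho's original implementation.

```python
# Random subspace forest sketch: each tree trains on a pseudorandomly chosen
# subset of feature dimensions; predictions combine by majority vote.
# Assumes integer class labels. Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceForest:
    def __init__(self, n_trees=50, subspace_dim=8, seed=0):
        self.n_trees, self.subspace_dim = n_trees, subspace_dim
        self.rng = np.random.default_rng(seed)
        self.trees, self.subspaces = [], []

    def fit(self, X, y):
        for _ in range(self.n_trees):
            dims = self.rng.choice(X.shape[1], self.subspace_dim, replace=False)
            self.trees.append(DecisionTreeClassifier().fit(X[:, dims], y))
            self.subspaces.append(dims)
        return self

    def predict(self, X):
        votes = np.stack([t.predict(X[:, d])
                          for t, d in zip(self.trees, self.subspaces)])
        # majority vote over trees, one column per sample
        return np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```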

5,984 citations


01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach, embodied in CFS (Correlation based Feature Selection), an algorithm that couples a feature evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper, a well-known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.
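A sketch of the CFS merit heuristic and a greedy forward search. The merit formula is the one from the thesis; the use of plain Pearson correlations here is a simplification (the thesis uses symmetrical-uncertainty-style measures for discrete data).

```python
# CFS merit: a subset scores high when its features correlate with the class
# but not with each other. feat_class_corr is a length-n vector, feat_feat_corr
# an n-by-n matrix of pairwise feature correlations. Illustrative sketch.
import numpy as np

def cfs_merit(subset, feat_class_corr, feat_feat_corr):
    k = len(subset)
    r_cf = np.mean([abs(feat_class_corr[i]) for i in subset])
    r_ff = (np.mean([abs(feat_feat_corr[i, j])
                     for i in subset for j in subset if i != j])
            if k > 1 else 0.0)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(feat_class_corr, feat_feat_corr, n_features):
    selected, best = [], -np.inf
    while True:
        candidates = [(cfs_merit(selected + [f], feat_class_corr, feat_feat_corr), f)
                      for f in range(n_features) if f not in selected]
        if not candidates:
            return selected
        merit, f = max(candidates)
        if merit <= best:          # stop when no feature improves the merit
            return selected
        selected.append(f)
        best = merit
```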

3,533 citations


Journal ArticleDOI
TL;DR: An example-based learning approach for locating vertical frontal views of human faces in complex scenes and shows empirically that the distance metric adopted for computing difference feature vectors, and the "nonface" clusters included in the distribution-based model, are both critical for the success of the system.
Abstract: We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face" and "nonface" model clusters. At each image location, a difference feature vector is computed between the local image pattern and the distribution-based model. A trained classifier determines, based on the difference feature vector measurements, whether or not a human face exists at the current image location. We show empirically that the distance metric we adopt for computing difference feature vectors, and the "nonface" clusters we include in our distribution-based model, are both critical for the success of our system.
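A rough sketch of the distribution-based idea: at each image window, measure distances to a few "face" and "nonface" prototype clusters and let a trained classifier decide from that difference feature vector. The plain Euclidean distances and the placeholder classifier are simplifications; the paper's metric combines a Mahalanobis-style component with the distance from the cluster subspace.

```python
# Hypothetical sketch, not the paper's trained system.
import numpy as np

def difference_feature_vector(window, face_centroids, nonface_centroids):
    """window: flattened, intensity-normalized image patch."""
    dists = [np.linalg.norm(window - c)
             for c in list(face_centroids) + list(nonface_centroids)]
    return np.asarray(dists)   # one distance per model cluster

# A trained classifier (e.g. an MLP) would then consume this vector:
# is_face = mlp.predict([difference_feature_vector(w, fc, nfc)])
```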

2,013 citations


Proceedings Article
27 Aug 1998
TL;DR: A new algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring) is introduced, which has a firm mathematical basis, has good clustering properties in data sets with large amounts of noise, allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets and is significantly faster than existing algorithms.
Abstract: Several clustering algorithms can be applied to clustering in large multimedia databases. The effectiveness and efficiency of the existing algorithms, however, is somewhat limited, since clustering in multimedia databases requires clustering high-dimensional feature vectors and since multimedia databases often contain large amounts of noise. In this paper, we therefore introduce a new algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring). The basic idea of our new approach is to model the overall point density analytically as the sum of influence functions of the data points. Clusters can then be identified by determining density-attractors, and clusters of arbitrary shape can be easily described by a simple equation based on the overall density function. The advantages of our new approach are that (1) it has a firm mathematical basis, (2) it has good clustering properties in data sets with large amounts of noise, (3) it allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets and (4) it is significantly faster than existing algorithms. To demonstrate the effectiveness and efficiency of DENCLUE, we perform a series of experiments on a number of different data sets from CAD and molecular biology. A comparison with DBSCAN shows the superiority of our new approach.
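A rough sketch of the density-attractor idea under a Gaussian influence function. The bandwidth, step size, and normalized-gradient ascent are illustrative choices, not DENCLUE's exact hill-climbing rule; points whose climbs converge to the same attractor (above a density threshold) would form one cluster.

```python
import numpy as np

def density(x, data, sigma=1.0):
    # overall density at x: sum of Gaussian influence functions of all points
    return np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * sigma ** 2)).sum()

def hill_climb(x, data, sigma=1.0, step=0.1, iters=100):
    for _ in range(iters):
        w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * sigma ** 2))
        grad = ((data - x) * w[:, None]).sum(axis=0) / sigma ** 2
        x = x + step * grad / (np.linalg.norm(grad) + 1e-12)
    return x  # approximate density-attractor for this starting point
```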

1,298 citations


Proceedings Article
24 Jul 1998
TL;DR: Numerical tests on 6 public data sets show that classifiers trained by the concave minimization approach and those trained by a support vector machine have comparable 10-fold cross-validation correctness.
Abstract: Computational comparison is made between two feature selection approaches for finding a separating plane that discriminates between two point sets in an n-dimensional feature space that utilizes as few of the n features (dimensions) as possible. In the concave minimization approach [19, 5] a separating plane is generated by minimizing a weighted sum of distances of misclassified points to two parallel planes that bound the sets and which determine the separating plane midway between them. Furthermore, the number of dimensions of the space used to determine the plane is minimized. In the support vector machine approach [27, 7, 1, 10, 24, 28], in addition to minimizing the weighted sum of distances of misclassified points to the bounding planes, we also maximize the distance between the two bounding planes that generate the separating plane. Computational results show that feature suppression is an indirect consequence of the support vector machine approach when an appropriate norm is used. Numerical tests on 6 public data sets show that classifiers trained by the concave minimization approach and those trained by a support vector machine have comparable 10-fold cross-validation correctness. However, in all data sets tested, the classifiers obtained by the concave minimization approach selected fewer problem features than those trained by a support vector machine.
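The feature-suppression effect of "an appropriate norm" can be illustrated with scikit-learn's L1-penalized linear SVM, used here as a stand-in for the 1-norm formulations discussed above; data and parameters are toy values.

```python
# An L1 penalty drives many weights to exactly zero, implicitly selecting
# features. Illustrative sketch, not the paper's mathematical programs.
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           random_state=0)
clf = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
print("features kept:", (clf.coef_ != 0).sum(), "of", X.shape[1])
```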

1,074 citations


Proceedings ArticleDOI
14 Apr 1998
TL;DR: The authors investigate the use of two types of features extracted from face images for recognizing facial expressions, and it turns out that five to seven hidden units are probably enough to represent the space of facial expressions.
Abstract: The authors investigate the use of two types of features extracted from face images for recognizing facial expressions. The first type is the geometric positions of a set of fiducial points on a face. The second type is a set of multi-scale and multi-orientation Gabor wavelet coefficients extracted from the face image at the fiducial points. They can be used either independently or jointly. The architecture developed is based on a two-layer perceptron. The recognition performance with different types of features has been compared, which shows that Gabor wavelet coefficients are much more powerful than geometric positions. Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, they have also studied the desired number of hidden units, i.e., the appropriate dimension to represent a facial expression in order to achieve a good recognition rate. It turns out that five to seven hidden units are probably enough to represent the space of facial expressions.
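A sketch of the second feature type: multi-scale, multi-orientation Gabor responses sampled at fiducial points. The kernel construction, filter parameters, and patch handling are illustrative assumptions, not the paper's exact filters; fiducial points are assumed to lie at least 7 pixels from the image border.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotated coordinate
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def gabor_features(image, points, scales=(4, 8, 16), n_orient=6):
    feats = []
    for (r, c) in points:                            # fiducial points (row, col)
        for wl in scales:
            for k in range(n_orient):
                kern = gabor_kernel(15, wl, np.pi * k / n_orient, wl / 2)
                patch = image[r - 7:r + 8, c - 7:c + 8]
                feats.append(float((patch * kern).sum()))
    return np.asarray(feats)     # would feed the two-layer perceptron
```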

637 citations


Book ChapterDOI
Vladimir Vapnik
01 Jan 1998
TL;DR: For the Support Vector method, both the quality of the solution and the complexity of the solution do not depend directly on the dimensionality of the input space, and on the basis of this technique one can obtain a good estimate using a given number of high-dimensional data.
Abstract: This chapter describes the Support Vector technique for function estimation problems such as pattern recognition, regression estimation, and solving linear operator equations. It shows that for the Support Vector method, both the quality of the solution and the complexity of the solution do not depend directly on the dimensionality of the input space. Therefore, on the basis of this technique one can obtain a good estimate using a given number of high-dimensional data.

561 citations


01 Jan 1998
TL;DR: A new feature selection algorithm is described that uses a correlation based heuristic to determine the “goodness” of feature subsets, and its effectiveness is evaluated with three common machine learning algorithms.
Abstract: Machine learning algorithms automatically extract knowledge from machine readable information. Unfortunately, their success is usually dependent on the quality of the data that they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selectors are algorithms that attempt to identify and remove as much irrelevant and redundant information as possible prior to learning. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation based heuristic to determine the "goodness" of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.

515 citations


Journal ArticleDOI
Olli Viikki, Kari Laurila
TL;DR: A segmental feature vector normalization technique is proposed which makes an automatic speech recognition system more robust to environmental changes by normalizing the output of the signal-processing front-end to have similar segmental parameter statistics in all noise conditions.
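A minimal sketch of segmental feature normalization: each segment of front-end feature vectors (e.g. MFCCs) is normalized to zero mean and unit variance so segmental statistics look alike across noise conditions. The fixed, non-overlapping window is an illustrative simplification of the paper's segmental scheme.

```python
import numpy as np

def segmental_normalize(feats, seg_len=100):
    """feats: (n_frames, n_dims) array of front-end feature vectors."""
    out = np.empty_like(feats, dtype=float)
    for start in range(0, len(feats), seg_len):
        seg = feats[start:start + seg_len]
        # per-segment mean/variance normalization of each feature dimension
        out[start:start + seg_len] = (seg - seg.mean(axis=0)) / (seg.std(axis=0) + 1e-8)
    return out
```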

405 citations


Proceedings ArticleDOI
12 May 1998
TL;DR: A new method based on the extraction of 2D-DCT feature vectors is described, and the recognition results are compared with other face recognition approaches.
Abstract: The work presented in this paper focuses on the use of hidden Markov models for face recognition. A new method based on the extraction of 2D-DCT feature vectors is described, and the recognition results are compared with other face recognition approaches. The method introduced significantly reduces the computational complexity of previous HMM-based face recognition systems, while preserving the same recognition rate.
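A sketch of 2D-DCT feature extraction for an HMM observation sequence: slide a window down the face image, take a block 2D DCT, and keep a few low-order coefficients as the feature vector. Window, overlap, and coefficient-selection choices are illustrative, not the paper's exact settings; SciPy is assumed.

```python
import numpy as np
from scipy.fft import dctn

def dct_observations(image, win=12, shift=4, n_coef=10):
    obs = []
    for top in range(0, image.shape[0] - win + 1, shift):
        block = image[top:top + win, :]
        coefs = dctn(block, norm="ortho")
        # keep low-frequency coefficients (rough zig-zag substitute)
        obs.append(coefs[:4, :4].ravel()[:n_coef])
    return np.asarray(obs)   # one feature vector per window position
```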

341 citations


Patent
19 Feb 1998
TL;DR: In this article, a gesture is defined as a hand or body initiated movement of a cursor directing device to outline a particular pattern in particular directions done in particular periods of time, and gestures can be recognized using a radial basis function neural network.
Abstract: A computer implemented method and system for gesture category recognition and training. Generally, a gesture is a hand or body initiated movement of a cursor directing device to outline a particular pattern in particular directions done in particular periods of time. The present invention allows a computer system to accept input data, originating from a user, in the form of gesture data made using the cursor directing device. In one embodiment, a mouse device is used, but the present invention is equally well suited for use with other cursor directing devices (e.g., a track ball, a finger pad, an electronic stylus, etc.). In one embodiment, gesture data is accepted by pressing a key on the keyboard and then moving the mouse (with mouse button pressed) to trace out the gesture. Mouse position information and time stamps are recorded. The present invention then determines a multi-dimensional feature vector based on the gesture data. The feature vector is then passed through a gesture category recognition engine that, in one implementation, uses a radial basis function neural network to associate the feature vector to a pre-existing gesture category. Once identified, a set of user commands that are associated with the gesture category are applied to the computer system. The user commands can originate from an automatic process that extracts commands that are associated with the menu items of a particular application program. The present invention also allows user training so that user-defined gestures, and the computer commands associated therewith, can be programmed into the computer system.
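A hypothetical sketch of turning recorded gesture data (mouse positions with time stamps) into a fixed-length, multi-dimensional feature vector; the resampling count and the translation/scale normalization are illustrative choices, not the patent's exact construction.

```python
import numpy as np

def gesture_feature_vector(points, times, n_samples=16):
    pts = np.asarray(points, dtype=float)           # (n, 2) mouse positions
    pts -= pts.mean(axis=0)                         # translation invariance
    scale = np.abs(pts).max() or 1.0
    pts /= scale                                    # scale invariance
    # resample to a fixed number of points along the recorded time stamps
    t = np.asarray(times, dtype=float)
    t = (t - t[0]) / (t[-1] - t[0])
    grid = np.linspace(0, 1, n_samples)
    resampled = np.column_stack([np.interp(grid, t, pts[:, d]) for d in (0, 1)])
    return resampled.ravel()    # would feed an RBF-network gesture recognizer
```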

Proceedings ArticleDOI
Ramesh A. Gopinath
12 May 1998
TL;DR: It is shown that in some cases sharing parameters across classes can also lead to better discrimination (as evidenced by reduced misclassification error), and some constraints on the parameters are shown to lead to linear discrimination analysis.
Abstract: Maximum likelihood (ML) modeling of multiclass data for classification often suffers from the following problems: (a) data insufficiency implying overtrained or unreliable models, (b) large storage requirement, (c) large computational requirement and/or (d) the ML training is not discriminating between classes. Sharing parameters across classes (or constraining the parameters) clearly tends to alleviate the first three problems. We show that in some cases it can also lead to better discrimination (as evidenced by reduced misclassification error). The parameters considered are the means and variances of the Gaussians and linear transformations of the feature space (or equivalently the Gaussian means). Some constraints on the parameters are shown to lead to linear discrimination analysis (a well-known result) while others are shown to lead to optimal feature spaces (a relatively new result). Applications of some of these ideas to the speech recognition problem are also given.

Journal ArticleDOI
01 Oct 1998
TL;DR: A set of low-level audio features is proposed for characterizing semantic contents of short audio clips, and a neural net classifier was successful in separating five types of TV programs.
Abstract: Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
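A sketch of two representative low-level clip features of the kind described above: short-time energy statistics and zero-crossing rate. The exact feature set in the paper is larger; frame sizes here are illustrative.

```python
import numpy as np

def clip_features(signal, frame_len=512):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energy = np.array([np.mean(f ** 2) for f in frames])
    # zero-crossing rate: sign changes per sample within each frame
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames])
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])
```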

Posted Content
TL;DR: The authors presented an algorithm combining variants of Winnow and weighted-majority voting, and applied it to a problem in the aforementioned class: context-sensitive spelling correction, which is the task of fixing spelling errors that happen to result in valid words, such as substituting "to" for "too", "casual" for "causal", etc.
Abstract: A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts refer to only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. We present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting "to" for "too", "casual" for "causal", etc. We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find: (1) When run with a full (unpruned) set of features, WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition; (2) When compared with other systems in the literature, WinSpell exhibits the highest performance; (3) The primary reason that WinSpell outperforms BaySpell is that WinSpell learns a better linear separator; (4) When run on a test set drawn from a different corpus than the training set was drawn from, WinSpell is better able than BaySpell to adapt, using a strategy we will present that combines supervised learning on the training set with unsupervised learning on the (noisy) test set.
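The multiplicative weight update at the heart of Winnow can be sketched compactly. This is generic Winnow on binary features; WinSpell's full system adds the weighted-majority voting layer and the linguistic feature extraction, both omitted here.

```python
import numpy as np

def winnow_train(X, y, alpha=1.5, beta=0.5, epochs=5):
    """X: binary feature matrix; y: 0/1 labels. Threshold = n_features."""
    n = X.shape[1]
    w, theta = np.ones(n), float(n)
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = int(w @ x >= theta)
            if pred != label:
                # promote active features on a miss, demote on a false alarm
                w[x == 1] *= alpha if label == 1 else beta
    return w, theta
```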

Journal ArticleDOI
TL;DR: The results of this study indicate the potential of using combined morphological and texture features for computer-aided classification of microcalcifications.
Abstract: We are developing computerized feature extraction and classification methods to analyze malignant and benign microcalcifications on digitized mammograms. Morphological features that described the size, contrast, and shape of microcalcifications and their variations within a cluster were designed to characterize microcalcifications segmented from the mammographic background. Texture features were derived from the spatial gray-level dependence (SGLD) matrices constructed at multiple distances and directions from tissue regions containing microcalcifications. A genetic algorithm (GA) based feature selection technique was used to select the best feature subset from the multi-dimensional feature spaces. The GA-based method was compared to the commonly used feature selection method based on the stepwise linear discriminant analysis (LDA) procedure. Linear discriminant classifiers using the selected features as input predictor variables were formulated for the classification task. The discriminant scores output from the classifiers were analyzed by receiver operating characteristic (ROC) methodology and the classification accuracy was quantified by the area, A_z, under the ROC curve. We analyzed a data set of 145 mammographic microcalcification clusters in this study. It was found that the feature subsets selected by the GA-based method are comparable to or slightly better than those selected by the stepwise LDA method. The texture features (A_z=0.84) were more effective than morphological features (A_z=0.79) in distinguishing malignant and benign microcalcifications. The highest classification accuracy (A_z=0.89) was obtained in the combined texture and morphological feature space. The improvement was statistically significant in comparison to classification in either the morphological (p=0.002) or the texture (p=0.04) feature space alone. The classifier using the best feature subset from the combined feature space and an appropriate decision threshold could correctly identify 35% of the benign clusters without missing a malignant cluster. When the average discriminant score from all views of the same cluster was used for classification, the A_z value increased to 0.93 and the classifier could identify 50% of the benign clusters at 100% sensitivity for malignancy. Alternatively, if the minimum discriminant score from all views of the same cluster was used, the A_z value would be 0.90 and a specificity of 32% would be obtained at 100% sensitivity. The results of this study indicate the potential of using combined morphological and texture features for computer-aided classification of microcalcifications.
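A generic sketch of GA-based feature selection of the kind used above: chromosomes are feature-subset bit strings and the fitness function, a stub here, would score a classifier (e.g. by its A_z) trained on that subset. Population size, operators, and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_select(n_features, fitness, pop_size=20, gens=30, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(gens):
        scores = np.array([fitness(ch) for ch in pop])
        # binary tournament selection
        parents = pop[[max(rng.choice(pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # one-point crossover, then bit-flip mutation
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_features)
            children[i, cut:], children[i + 1, cut:] = \
                parents[i + 1, cut:].copy(), parents[i, cut:].copy()
        children ^= (rng.random(children.shape) < p_mut).astype(children.dtype)
        pop = children
    return pop[np.argmax([fitness(ch) for ch in pop])]

# toy usage with a stand-in fitness (replace with classifier A_z):
# best_subset = ga_select(20, lambda ch: -abs(ch.sum() - 5))
```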

Journal ArticleDOI
TL;DR: It is shown that the decision surface can be written as the sum of two orthogonal terms: the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter, for almost all values of the parameter.
Abstract: Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.

Proceedings Article
01 Jul 1998
TL;DR: In this paper, a sparse network of linear separators is proposed for natural language disambiguation, which is based on the Winnow learning algorithm and is shown to perform well in a variety of ambiguity resolution problems.
Abstract: We analyze a few of the commonly used statistics based and machine learning algorithms for natural language disambiguation tasks and observe that they can be recast as learning linear separators in the feature space. Each of the methods makes a priori assumptions which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data driven approach which merely searches for a good linear separator in the feature space, without further assumptions on the domain or a specific problem. We present such an approach, a sparse network of linear separators utilizing the Winnow learning algorithm, and show how to use it in a variety of ambiguity resolution problems. The learning approach presented is attribute-efficient and, therefore, appropriate for domains having a very large number of attributes. In particular, we present an extensive experimental comparison of our approach with other methods on several well studied lexical disambiguation tasks such as context-sensitive spelling correction, prepositional phrase attachment and part of speech tagging. In all cases we show that our approach either outperforms other methods tried for these tasks or performs comparably to the best.

Journal ArticleDOI
TL;DR: Computational tests of three approaches to feature selection via concave minimization have been carried out on publicly available real-world databases and compared with an adaptation of the optimal brain damage method for reducing neural network complexity.
Abstract: The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints. Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage method for reducing neural network complexity. One feature selection algorithm via concave minimization reduced cross-validation error on a cancer prognosis database by 35.4% while reducing problem features from 32 to 4.
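The smoothing idea stated above can be shown in a few lines: the step function that counts nonzero features is approximated on the nonnegative reals by a concave exponential, which tends to the step as its slope parameter grows. The parameter value is illustrative.

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

def concave_exp(x, alpha=5.0):
    # concave approximation on x >= 0; approaches step(x) as alpha grows
    return 1.0 - np.exp(-alpha * x)

x = np.linspace(0, 2, 5)
print(step(x))
print(concave_exp(x))
```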

Book
01 Jan 1998
TL;DR: This work covers neural nets and related model structures for nonlinear system identification, including identification based on fuzzy models, with applications in speech recognition and nonlinear time-series analysis.
Abstract: Preface. 1. Neural Nets and Related Model Structures for Nonlinear System Identification J. Sjoberg, L.S.H. Ngia. 2. Enhanced Multi-Stream Kalman Filter Training for Recurrent Networks L.A. Feldkamp, et al. 3. The Support Vector Method of Function Estimation V. Vapnik. 4. Parametric Density Estimation for the Classification of Acoustic Feature Vectors in Speech Recognition S. Basu, C.A. Micchelli. 5. Wavelet Based Modeling of Nonlinear Systems Yi Yu, et al. 6. Nonlinear Identification Based on Fuzzy Models V. Wertz, S. Yurkovich. 7. Statistical Learning in Control and Matrix Theory M. Vidyasagar. 8. Nonlinear Time-Series Analysis U. Parlitz. 9. The K.U. Leuven Time Series Prediction Competition J.A.K. Suykens, J. Vandewalle. References. Index.

Proceedings ArticleDOI
18 May 1998
TL;DR: The "eigenfaces method", originally used in human face recognition, is introduced, to model the sound frequency distribution features and it is shown that it can be a simple and reliable acoustic identification method if the training samples can be properly chosen and classified.
Abstract: The sound (engine, noise, etc.) of a working vehicle provides an important clue, e.g., for surveillance mission robots, to recognize the vehicle type. In this paper, we introduce the "eigenfaces method", originally used in human face recognition, to model the sound frequency distribution features. We show that it can be a simple and reliable acoustic identification method if the training samples can be properly chosen and classified. We treat the frequency spectra of about 200 ms of sound (a "frame") as a vector in a high-dimensional frequency feature space. In this space, we study the vector distribution for each kind of vehicle sound produced under similar working conditions. A collection of typical sound samples is used as the training data set. The mean frequency vector of the training set is first calculated, and subtracted from each vector in the set. To capture the frequency vectors' variation within the training set, we then calculate the eigenvectors of the covariance matrix of the zero-mean-adjusted sample data set. These eigenvectors represent the principal components of the vector distribution: for each such eigenvector, its corresponding eigenvalue indicates its importance in capturing the variation distribution, with the largest eigenvalues accounting for the most variance within this data set. Thus for each set of training data, its mean vector and its moat important eigenvectors together characterize its sound signature. When a new frame (not in the training set) is tested, its spectrum vector is compared against the mean vector; the difference vector is then projected into the principal component directions, and the residual is found. The coefficients of the unknown vector, in the training set eigenvector basis subspace, identify the unknown vehicle noise in terms of the classes represented in the training set. The magnitude of the residual vector measures the extent to which the unknown vehicle sound cannot be well characterized by the vehicle sounds included in the training set.
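A minimal sketch of the "eigenfaces"-style signature described above: subtract the training mean, take principal components of the spectra (via SVD, which yields the covariance eigenvectors), and score a new frame by its projection coefficients and residual. Dimensions are illustrative.

```python
import numpy as np

def train_signature(spectra, n_components=8):
    """spectra: (n_frames, n_bins) training frequency vectors for one class."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # principal components = rows of Vt from the SVD of the centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def residual(frame, mean, components):
    diff = frame - mean
    coefs = components @ diff                   # projection coefficients
    recon = components.T @ coefs
    return coefs, np.linalg.norm(diff - recon)  # small residual => good match
```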

Proceedings Article
01 Dec 1998
TL;DR: This work investigates the problem of learning a classification task on data represented in terms of their pairwise proximities, which does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors.
Abstract: We investigate the problem of learning a classification task on data represented in terms of their pairwise proximities. This representation does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors, from which pairwise proximities can always be calculated. Our first approach is based on a combined linear embedding and classification procedure resulting in an extension of the Optimal Hyperplane algorithm to pseudo-Euclidean data. As an alternative we present another approach based on a linear threshold model in the proximity values themselves, which is optimized using Structural Risk Minimization. We show that prior knowledge about the problem can be incorporated by the choice of distance measures and examine different metrics W.r.t. their generalization. Finally, the algorithms are successfully applied to protein structure data and to data from the cat's cerebral cortex. They show better performance than K-nearest-neighbor classification.

Proceedings ArticleDOI
05 Oct 1998
TL;DR: This paper describes RIME (Replicated IMage dEtector), an alternative approach to watermarking for detecting unauthorized image copying on the Internet and shows that it can detect image copies effectively.
Abstract: This paper describes RIME (Replicated IMage dEtector), an alternative approach to watermarking for detecting unauthorized image copying on the Internet. RIME profiles Internet images and stores the feature vectors of the images and their URLs in its repository. When a copy detection request is received, RIME matches the requested image's feature vector with the vectors stored in the repository and returns a list of suspect URLs. RIME characterizes each image using Daubechies' wavelets. The wavelet coefficients are stored as the feature vector. RIME uses a multidimensional extensible hashing scheme to index these high-dimensional feature vectors. Our preliminary results show that it can detect image copies effectively: it finds the top suspects and copes well with image format conversion, resampling, and requantization.
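A sketch of the feature step only: characterize an image by low-order Daubechies wavelet coefficients, which survive format conversion and resampling better than raw pixels. PyWavelets (pywt) is assumed, and the coefficient selection is an illustrative guess, not RIME's exact feature vector.

```python
import pywt

def wavelet_signature(gray_image, levels=3, keep=16):
    coeffs = pywt.wavedec2(gray_image, "db4", level=levels)
    approx = coeffs[0]                 # coarsest approximation band
    return approx.ravel()[:keep]       # fixed-length feature vector
```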

Journal ArticleDOI
TL;DR: Three algorithms for the generation of topographic mappings are offered to the practitioner of unsupervised data analysis, based on the minimization of a cost function performed using an EM algorithm and deterministic annealing.

Book ChapterDOI
02 Sep 1998
TL;DR: A new algorithm for Support Vector regression is proposed that automatically adjusts a flexible tube of minimal radius to the data such that at most a fraction of the data points lie outside.
Abstract: A new algorithm for Support Vector regression is proposed. For a priori chosen ν, it automatically adjusts a flexible tube of minimal radius to the data such that at most a fraction ν of the data points lie outside. The algorithm is analysed theoretically and experimentally.
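A quick sketch using the ν-SVR formulation as implemented in scikit-learn, which follows this idea: ν bounds the fraction of points allowed outside the tube while the tube radius adapts to the data. Toy data and parameter values, illustrative only.

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = np.linspace(0, 4, 80).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

model = NuSVR(nu=0.2, C=1.0, kernel="rbf").fit(X, y)
print(model.predict(X[:3]))
```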

Book ChapterDOI
TL;DR: The pixel values of handwritten digits taken from US envelopes are regarded as a feature vector to be used as input to a classifier, which will automatically assign a digit class based on the pixel values.
Abstract: Figure 1 shows some handwritten digits taken from US envelopes. Each image consists of 16 × 16 pixels of greyscale values ranging from 0 to 255. These 256 pixel values are regarded as a feature vector to be used as input to a classifier, which will automatically assign a digit class based on the pixel values.

Journal ArticleDOI
TL;DR: WIPE(TM) (Wavelet Image Pornography Elimination), a system capable of classifying an image as objectionable or benign, is described; it is practical for real-world applications and has demonstrated 96% sensitivity over a test set of 1076 digital photographs found on objectionable news groups.

Patent
13 Feb 1998
TL;DR: A pattern recognition apparatus consisting of a feature extractor, a feature transform module, a recognition section, a dictionary, and a categorizer is presented; the categorizer identifies the category to which the pattern belongs in response to at least one difference value.
Abstract: A pattern recognition apparatus that comprises an input section, a feature extraction module, a feature transform module, a recognition section that includes a recognition dictionary, and a categorizer. The input section receives input patterns that include a pattern belonging to one of plural categories constituting a category set. The feature extraction module expresses features of the pattern as a feature vector. The feature transform module uses transform vector matrices to transform at least part of the feature vector to generate an at least partially transformed feature vector corresponding to each of the categories. The transform vector matrices include a transform vector matrix generated in response to a rival pattern set composed of rival patterns misrecognized as belonging to plural ones of the categories. The plural ones of the categories constitute a category subset. The at least partially transformed feature vector is common to the ones of the categories constituting the category subset. The recognition dictionary stores both matching information and transformed matching information for each of the categories. The transformed matching information has been transformed using the transform vector matrices. The recognition section generates at least one difference value for each of the categories by performing a matching operation between the matching information and the transformed matching information on one hand, and at least one matching vector derived at least from the at least partially transformed feature vector corresponding to each of the categories on the other hand. The categorizer identifies the category to which the pattern belongs in response to the at least one difference value.

Proceedings ArticleDOI
12 May 1998
TL;DR: An approach is proposed that estimates the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance, computed as the sum of all word hypothesis probabilities that represent the occurrence of the same word in more or less the same segment of time.
Abstract: Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. We propose to estimate the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance. The basic idea of our approach is to estimate the posterior word probabilities as the sum of all word hypothesis probabilities which represent the occurrence of the same word in more or less the same segment of time. The word hypothesis probabilities are approximated by paths in a word graph and are computed using a simplified forward-backward algorithm. We present experimental results on the North American Business (NAB'94) and the German Verbmobil recognition tasks.
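The summation step can be sketched in a few lines. Edge posteriors are assumed to come from a forward-backward pass over the word graph, which is omitted; the overlap test is a simplification of "more or less the same segment of time".

```python
def word_confidence(edges, word, t_start, t_end):
    """edges: iterable of (word, start, end, posterior) from the word graph.

    Confidence of `word` over [t_start, t_end] = sum of posteriors of all
    edges carrying the same word that overlap that time segment.
    """
    return sum(p for w, s, e, p in edges
               if w == word and s < t_end and e > t_start)
```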

Book Chapter
01 Jan 1998
TL;DR: A scheme based on the frequency of occurrence of features in both individual images and in the whole collection provides a means of weighting possibly incommensurate features in a compatible manner, and naturally extends to incorporate relevance feedback queries.
Abstract: In this paper we report the application of techniques inspired by text retrieval research to the content-based query of image databases. In particular, we show how the use of an inverted file data structure permits the use of a feature space of $\mathcal{O}(10^4)$ dimensions, by restricting search to the subspace spanned by the features present in the query. A suitably sparse set of colour and texture features is proposed. A scheme based on the frequency of occurrence of features in both individual images and in the whole collection provides a means of weighting possibly incommensurate features in a compatible manner, and naturally extends to incorporate relevance feedback queries. The use of relevance feedback is shown consistently to improve system performance, as measured by precision and recall.
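A sketch of the inverted-file idea for quantized image features: each feature id maps to the images containing it, weights follow a tf-idf-style scheme (one reading of the frequency-of-occurrence weighting described above), and a query only touches the postings of its own features.

```python
import math
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)   # feature id -> {image_id: count}
        self.n_images = 0

    def add(self, image_id, features):      # features: iterable of feature ids
        self.n_images += 1
        for f in features:
            self.postings[f][image_id] = self.postings[f].get(image_id, 0) + 1

    def query(self, features):
        scores = defaultdict(float)
        for f in set(features):
            posting = self.postings.get(f, {})
            if not posting:
                continue
            idf = math.log(self.n_images / len(posting))
            for image_id, tf in posting.items():
                scores[image_id] += tf * idf
        return sorted(scores.items(), key=lambda kv: -kv[1])
```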

Proceedings Article
01 Jan 1998
TL;DR: SVM adapts efficiently in dynamic environments that require frequent additions to the document collection, and allows easy incorporation of new documents into an existing trained system.
Abstract: In this paper, we study the use of the support vector machine (SVM) in text categorization. Unlike other machine learning techniques, it allows easy incorporation of new documents into an existing trained system. Moreover, dimension reduction, which is usually imperative, now becomes optional. Thus, SVM adapts efficiently in dynamic environments that require frequent additions to the document collection. Empirical results on the Reuters-22173 collection are also discussed.
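A minimal sketch of SVM text categorization on a full, unreduced feature space, as the paper argues is feasible; scikit-learn and a toy corpus are assumed, not the Reuters-22173 setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["grain exports rose", "stocks fell on earnings", "wheat crop outlook"]
labels = ["commodities", "markets", "commodities"]

vec = TfidfVectorizer()                       # no dimension reduction applied
clf = LinearSVC().fit(vec.fit_transform(docs), labels)
print(clf.predict(vec.transform(["corn exports"])))
```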