
Showing papers on "Feature vector published in 1998"


Journal ArticleDOI
Tin Kam Ho
TL;DR: A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity.
Abstract: Much of previous attention on decision trees focuses on the splitting criteria and optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods by experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate that to the combined classification accuracy.
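A minimal sketch of the random subspace idea described above, assuming scikit-learn; the class and parameter names are illustrative and this is not Ho's original implementation.

```python
# Random subspace forest sketch: each tree trains on a pseudorandomly chosen
# subset of feature dimensions; predictions combine by majority vote.
# Assumes integer class labels. Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomSubspaceForest:
    def __init__(self, n_trees=50, subspace_dim=8, seed=0):
        self.n_trees, self.subspace_dim = n_trees, subspace_dim
        self.rng = np.random.default_rng(seed)
        self.trees, self.subspaces = [], []

    def fit(self, X, y):
        for _ in range(self.n_trees):
            dims = self.rng.choice(X.shape[1], self.subspace_dim, replace=False)
            self.trees.append(DecisionTreeClassifier().fit(X[:, dims], y))
            self.subspaces.append(dims)
        return self

    def predict(self, X):
        votes = np.stack([t.predict(X[:, d])
                          for t, d in zip(self.trees, self.subspaces)])
        # majority vote over trees, one column per sample
        return np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```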

5,984 citations


01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach, embodied in CFS (Correlation based Feature Selection), an algorithm that couples a feature evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper, a well-known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.
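A sketch of the CFS merit heuristic and a greedy forward search. The merit formula is the one from the thesis; the use of plain Pearson correlations here is a simplification (the thesis uses symmetrical-uncertainty-style measures for discrete data).

```python
# CFS merit: a subset scores high when its features correlate with the class
# but not with each other. feat_class_corr is a length-n vector, feat_feat_corr
# an n-by-n matrix of pairwise feature correlations. Illustrative sketch.
import numpy as np

def cfs_merit(subset, feat_class_corr, feat_feat_corr):
    k = len(subset)
    r_cf = np.mean([abs(feat_class_corr[i]) for i in subset])
    r_ff = (np.mean([abs(feat_feat_corr[i, j])
                     for i in subset for j in subset if i != j])
            if k > 1 else 0.0)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(feat_class_corr, feat_feat_corr, n_features):
    selected, best = [], -np.inf
    while True:
        candidates = [(cfs_merit(selected + [f], feat_class_corr, feat_feat_corr), f)
                      for f in range(n_features) if f not in selected]
        if not candidates:
            return selected
        merit, f = max(candidates)
        if merit <= best:          # stop when no feature improves the merit
            return selected
        selected.append(f)
        best = merit
```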

3,533 citations


Journal ArticleDOI
TL;DR: An example-based learning approach for locating vertical frontal views of human faces in complex scenes and shows empirically that the distance metric adopted for computing difference feature vectors, and the "nonface" clusters included in the distribution-based model, are both critical for the success of the system.
Abstract: We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face" and "nonface" model clusters. At each image location, a difference feature vector is computed between the local image pattern and the distribution-based model. A trained classifier determines, based on the difference feature vector measurements, whether or not a human face exists at the current image location. We show empirically that the distance metric we adopt for computing difference feature vectors, and the "nonface" clusters we include in our distribution-based model, are both critical for the success of our system.
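A rough sketch of the distribution-based idea: at each image window, measure distances to a few "face" and "nonface" prototype clusters and let a trained classifier decide from that difference feature vector. The plain Euclidean distances and the placeholder classifier are simplifications; the paper's metric combines a Mahalanobis-style component with the distance from the cluster subspace.

```python
# Hypothetical sketch, not the paper's trained system.
import numpy as np

def difference_feature_vector(window, face_centroids, nonface_centroids):
    """window: flattened, intensity-normalized image patch."""
    dists = [np.linalg.norm(window - c)
             for c in list(face_centroids) + list(nonface_centroids)]
    return np.asarray(dists)   # one distance per model cluster

# A trained classifier (e.g. an MLP) would then consume this vector:
# is_face = mlp.predict([difference_feature_vector(w, fc, nfc)])
```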

2,013 citations


Proceedings Article
27 Aug 1998
TL;DR: A new algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring) is introduced, which has a firm mathematical basis, has good clustering properties in data sets with large amounts of noise, allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets and is significantly faster than existing algorithms.
Abstract: Several clustering algorithms can be applied to clustering in large multimedia databases. The effectiveness and efficiency of the existing algorithms, however, is somewhat limited, since clustering in multimedia databases requires clustering high-dimensional feature vectors and since multimedia databases often contain large amounts of noise. In this paper, we therefore introduce a new algorithm for clustering in large multimedia databases called DENCLUE (DENsity-based CLUstEring). The basic idea of our new approach is to model the overall point density analytically as the sum of influence functions of the data points. Clusters can then be identified by determining density-attractors, and clusters of arbitrary shape can be easily described by a simple equation based on the overall density function. The advantages of our new approach are that (1) it has a firm mathematical basis, (2) it has good clustering properties in data sets with large amounts of noise, (3) it allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets and (4) it is significantly faster than existing algorithms. To demonstrate the effectiveness and efficiency of DENCLUE, we perform a series of experiments on a number of different data sets from CAD and molecular biology. A comparison with DBSCAN shows the superiority of our new approach.
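A rough sketch of the density-attractor idea under a Gaussian influence function. The bandwidth, step size, and normalized-gradient ascent are illustrative choices, not DENCLUE's exact hill-climbing rule; points whose climbs converge to the same attractor (above a density threshold) would form one cluster.

```python
import numpy as np

def density(x, data, sigma=1.0):
    # overall density at x: sum of Gaussian influence functions of all points
    return np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * sigma ** 2)).sum()

def hill_climb(x, data, sigma=1.0, step=0.1, iters=100):
    for _ in range(iters):
        w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * sigma ** 2))
        grad = ((data - x) * w[:, None]).sum(axis=0) / sigma ** 2
        x = x + step * grad / (np.linalg.norm(grad) + 1e-12)
    return x  # approximate density-attractor for this starting point
```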

1,298 citations


Proceedings Article
24 Jul 1998
TL;DR: Numerical tests on 6 public data sets show that classifiers trained by the concave minimization approach and those trained by a support vector machine have comparable 10-fold cross-validation correctness.
Abstract: Computational comparison is made between two feature selection approaches for finding a separating plane that discriminates between two point sets in an n-dimensional feature space that utilizes as few of the n features (dimensions) as possible. In the concave minimization approach [19, 5] a separating plane is generated by minimizing a weighted sum of distances of misclassified points to two parallel planes that bound the sets and which determine the separating plane midway between them. Furthermore, the number of dimensions of the space used to determine the plane is minimized. In the support vector machine approach [27, 7, 1, 10, 24, 28], in addition to minimizing the weighted sum of distances of misclassified points to the bounding planes, we also maximize the distance between the two bounding planes that generate the separating plane. Computational results show that feature suppression is an indirect consequence of the support vector machine approach when an appropriate norm is used. Numerical tests on 6 public data sets show that classifiers trained by the concave minimization approach and those trained by a support vector machine have comparable 10-fold cross-validation correctness. However, in all data sets tested, the classifiers obtained by the concave minimization approach selected fewer problem features than those trained by a support vector machine.
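The feature-suppression effect of "an appropriate norm" can be illustrated with scikit-learn's L1-penalized linear SVM, used here as a stand-in for the 1-norm formulations discussed above; data and parameters are toy values.

```python
# An L1 penalty drives many weights to exactly zero, implicitly selecting
# features. Illustrative sketch, not the paper's mathematical programs.
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           random_state=0)
clf = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
print("features kept:", (clf.coef_ != 0).sum(), "of", X.shape[1])
```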

1,074 citations


Proceedings ArticleDOI
14 Apr 1998
TL;DR: The authors investigate the use of two types of features extracted from face images for recognizing facial expressions, and it turns out that five to seven hidden units are probably enough to represent the space of facial expressions.
Abstract: The authors investigate the use of two types of features extracted from face images for recognizing facial expressions. The first type is the geometric positions of a set of fiducial points on a face. The second type is a set of multi-scale and multi-orientation Gabor wavelet coefficients extracted from the face image at the fiducial points. They can be used either independently or jointly. The architecture developed is based on a two-layer perceptron. The recognition performance with different types of features has been compared, which shows that Gabor wavelet coefficients are much more powerful than geometric positions. Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, they have also studied the desired number of hidden units, i.e., the appropriate dimension to represent a facial expression in order to achieve a good recognition rate. It turns out that five to seven hidden units are probably enough to represent the space of facial expressions.
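A sketch of the second feature type: multi-scale, multi-orientation Gabor responses sampled at fiducial points. The kernel construction, filter parameters, and patch handling are illustrative assumptions, not the paper's exact filters; fiducial points are assumed to lie at least 7 pixels from the image border.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotated coordinate
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def gabor_features(image, points, scales=(4, 8, 16), n_orient=6):
    feats = []
    for (r, c) in points:                            # fiducial points (row, col)
        for wl in scales:
            for k in range(n_orient):
                kern = gabor_kernel(15, wl, np.pi * k / n_orient, wl / 2)
                patch = image[r - 7:r + 8, c - 7:c + 8]
                feats.append(float((patch * kern).sum()))
    return np.asarray(feats)     # would feed the two-layer perceptron
```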

637 citations


Book ChapterDOI
Vladimir Vapnik
01 Jan 1998
TL;DR: For the Support Vector method, both the quality of the solution and the complexity of the solution do not depend directly on the dimensionality of the input space, and on the basis of this technique one can obtain a good estimate using a given number of high-dimensional data.
Abstract: This chapter describes the Support Vector technique for function estimation problems such as pattern recognition, regression estimation, and solving linear operator equations. It shows that for the Support Vector method, both the quality of the solution and the complexity of the solution do not depend directly on the dimensionality of the input space. Therefore, on the basis of this technique one can obtain a good estimate using a given number of high-dimensional data.

561 citations


01 Jan 1998
TL;DR: A new feature selection algorithm is described that uses a correlation based heuristic to determine the “goodness” of feature subsets, and its effectiveness is evaluated with three common machine learning algorithms.
Abstract: Machine learning algorithms automatically extract knowledge from machine readable information. Unfortunately, their success is usually dependent on the quality of the data that they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selectors are algorithms that attempt to identify and remove as much irrelevant and redundant information as possible prior to learning. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation based heuristic to determine the "goodness" of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.

515 citations


Journal ArticleDOI
Olli Viikki, Kari Laurila
TL;DR: A segmental feature vector normalization technique is proposed which makes an automatic speech recognition system more robust to environmental changes by normalizing the output of the signal-processing front-end to have similar segmental parameter statistics in all noise conditions.
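A minimal sketch of segmental feature normalization: each segment of front-end feature vectors (e.g. MFCCs) is normalized to zero mean and unit variance so segmental statistics look alike across noise conditions. The fixed, non-overlapping window is an illustrative simplification of the paper's segmental scheme.

```python
import numpy as np

def segmental_normalize(feats, seg_len=100):
    """feats: (n_frames, n_dims) array of front-end feature vectors."""
    out = np.empty_like(feats, dtype=float)
    for start in range(0, len(feats), seg_len):
        seg = feats[start:start + seg_len]
        # per-segment mean/variance normalization of each feature dimension
        out[start:start + seg_len] = (seg - seg.mean(axis=0)) / (seg.std(axis=0) + 1e-8)
    return out
```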

405 citations


Proceedings ArticleDOI
12 May 1998
TL;DR: A new method based on the extraction of 2D-DCT feature vectors is described, and the recognition results are compared with other face recognition approaches.
Abstract: The work presented in this paper focuses on the use of hidden Markov models for face recognition. A new method based on the extraction of 2D-DCT feature vectors is described, and the recognition results are compared with other face recognition approaches. The method introduced significantly reduces the computational complexity of previous HMM-based face recognition systems, while preserving the same recognition rate.
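A sketch of 2D-DCT feature extraction for an HMM observation sequence: slide a window down the face image, take a block 2D DCT, and keep a few low-order coefficients as the feature vector. Window, overlap, and coefficient-selection choices are illustrative, not the paper's exact settings; SciPy is assumed.

```python
import numpy as np
from scipy.fft import dctn

def dct_observations(image, win=12, shift=4, n_coef=10):
    obs = []
    for top in range(0, image.shape[0] - win + 1, shift):
        block = image[top:top + win, :]
        coefs = dctn(block, norm="ortho")
        # keep low-frequency coefficients (rough zig-zag substitute)
        obs.append(coefs[:4, :4].ravel()[:n_coef])
    return np.asarray(obs)   # one feature vector per window position
```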

341 citations


Patent
19 Feb 1998
TL;DR: In this article, a gesture is defined as a hand or body initiated movement of a cursor directing device to outline a particular pattern in particular directions done in particular periods of time, and gestures can be recognized using a radial basis function neural network.
Abstract: A computer implemented method and system for gesture category recognition and training. Generally, a gesture is a hand or body initiated movement of a cursor directing device to outline a particular pattern in particular directions done in particular periods of time. The present invention allows a computer system to accept input data, originating from a user, in the form of gesture data made using the cursor directing device. In one embodiment, a mouse device is used, but the present invention is equally well suited for use with other cursor directing devices (e.g., a track ball, a finger pad, an electronic stylus, etc.). In one embodiment, gesture data is accepted by pressing a key on the keyboard and then moving the mouse (with mouse button pressed) to trace out the gesture. Mouse position information and time stamps are recorded. The present invention then determines a multi-dimensional feature vector based on the gesture data. The feature vector is then passed through a gesture category recognition engine that, in one implementation, uses a radial basis function neural network to associate the feature vector to a pre-existing gesture category. Once identified, a set of user commands that are associated with the gesture category are applied to the computer system. The user commands can originate from an automatic process that extracts commands that are associated with the menu items of a particular application program. The present invention also allows user training so that user-defined gestures, and the computer commands associated therewith, can be programmed into the computer system.
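A hypothetical sketch of turning recorded gesture data (mouse positions with time stamps) into a fixed-length, multi-dimensional feature vector; the resampling count and the translation/scale normalization are illustrative choices, not the patent's exact construction.

```python
import numpy as np

def gesture_feature_vector(points, times, n_samples=16):
    pts = np.asarray(points, dtype=float)           # (n, 2) mouse positions
    pts -= pts.mean(axis=0)                         # translation invariance
    scale = np.abs(pts).max() or 1.0
    pts /= scale                                    # scale invariance
    # resample to a fixed number of points along the recorded time stamps
    t = np.asarray(times, dtype=float)
    t = (t - t[0]) / (t[-1] - t[0])
    grid = np.linspace(0, 1, n_samples)
    resampled = np.column_stack([np.interp(grid, t, pts[:, d]) for d in (0, 1)])
    return resampled.ravel()    # would feed an RBF-network gesture recognizer
```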

Proceedings ArticleDOI
Ramesh A. Gopinath
12 May 1998
TL;DR: It is shown that in some cases sharing parameters across classes can also lead to better discrimination (as evidenced by reduced misclassification error), and some constraints on the parameters are shown to lead to linear discrimination analysis.
Abstract: Maximum likelihood (ML) modeling of multiclass data for classification often suffers from the following problems: (a) data insufficiency implying overtrained or unreliable models, (b) large storage requirement, (c) large computational requirement and/or (d) the ML training is not discriminating between classes. Sharing parameters across classes (or constraining the parameters) clearly tends to alleviate the first three problems. We show that in some cases it can also lead to better discrimination (as evidenced by reduced misclassification error). The parameters considered are the means and variances of the Gaussians and linear transformations of the feature space (or equivalently the Gaussian means). Some constraints on the parameters are shown to lead to linear discrimination analysis (a well-known result) while others are shown to lead to optimal feature spaces (a relatively new result). Applications of some of these ideas to the speech recognition problem are also given.

Journal ArticleDOI
01 Oct 1998
TL;DR: A set of low-level audio features is proposed for characterizing semantic contents of short audio clips, and a neural net classifier was successful in separating five types of TV programs.
Abstract: Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing semantic contents of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
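A sketch of two representative low-level clip features of the kind described above: short-time energy statistics and zero-crossing rate. The exact feature set in the paper is larger; frame sizes here are illustrative.

```python
import numpy as np

def clip_features(signal, frame_len=512):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    energy = np.array([np.mean(f ** 2) for f in frames])
    # zero-crossing rate: sign changes per sample within each frame
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames])
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])
```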

Posted Content
TL;DR: The authors presented an algorithm combining variants of Winnow and weighted-majority voting, and applied it to a problem in the aforementioned class: context-sensitive spelling correction, which is the task of fixing spelling errors that happen to result in valid words, such as substituting "to" for "too", "casual" for "causal", etc.
Abstract: A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts refer to only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. We present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting "to" for "too", "casual" for "causal", etc. We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find: (1) When run with a full (unpruned) set of features, WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition; (2) When compared with other systems in the literature, WinSpell exhibits the highest performance; (3) The primary reason that WinSpell outperforms BaySpell is that WinSpell learns a better linear separator; (4) When run on a test set drawn from a different corpus than the training set was drawn from, WinSpell is better able than BaySpell to adapt, using a strategy we will present that combines supervised learning on the training set with unsupervised learning on the (noisy) test set.
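The multiplicative weight update at the heart of Winnow can be sketched compactly. This is generic Winnow on binary features; WinSpell's full system adds the weighted-majority voting layer and the linguistic feature extraction, both omitted here.

```python
import numpy as np

def winnow_train(X, y, alpha=1.5, beta=0.5, epochs=5):
    """X: binary feature matrix; y: 0/1 labels. Threshold = n_features."""
    n = X.shape[1]
    w, theta = np.ones(n), float(n)
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = int(w @ x >= theta)
            if pred != label:
                # promote active features on a miss, demote on a false alarm
                w[x == 1] *= alpha if label == 1 else beta
    return w, theta
```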

Journal ArticleDOI
TL;DR: The results of this study indicate the potential of using combined morphological and texture features for computer-aided classification of microcalcifications.
Abstract: We are developing computerized feature extraction and classification methods to analyze malignant and benign microcalcifications on digitized mammograms. Morphological features that described the size, contrast, and shape of microcalcifications and their variations within a cluster were designed to characterize microcalcifications segmented from the mammographic background. Texture features were derived from the spatial gray-level dependence (SGLD) matrices constructed at multiple distances and directions from tissue regions containing microcalcifications. A genetic algorithm (GA) based feature selection technique was used to select the best feature subset from the multi-dimensional feature spaces. The GA-based method was compared to the commonly used feature selection method based on the stepwise linear discriminant analysis (LDA) procedure. Linear discriminant classifiers using the selected features as input predictor variables were formulated for the classification task. The discriminant scores output from the classifiers were analyzed by receiver operating characteristic (ROC) methodology and the classification accuracy was quantified by the area, A_z, under the ROC curve. We analyzed a data set of 145 mammographic microcalcification clusters in this study. It was found that the feature subsets selected by the GA-based method are comparable to or slightly better than those selected by the stepwise LDA method. The texture features (A_z=0.84) were more effective than morphological features (A_z=0.79) in distinguishing malignant and benign microcalcifications. The highest classification accuracy (A_z=0.89) was obtained in the combined texture and morphological feature space. The improvement was statistically significant in comparison to classification in either the morphological (p=0.002) or the texture (p=0.04) feature space alone. The classifier using the best feature subset from the combined feature space and an appropriate decision threshold could correctly identify 35% of the benign clusters without missing a malignant cluster. When the average discriminant score from all views of the same cluster was used for classification, the A_z value increased to 0.93 and the classifier could identify 50% of the benign clusters at 100% sensitivity for malignancy. Alternatively, if the minimum discriminant score from all views of the same cluster was used, the A_z value would be 0.90 and a specificity of 32% would be obtained at 100% sensitivity. The results of this study indicate the potential of using combined morphological and texture features for computer-aided classification of microcalcifications.
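A generic sketch of GA-based feature selection of the kind used above: chromosomes are feature-subset bit strings and the fitness function, a stub here, would score a classifier (e.g. by its A_z) trained on that subset. Population size, operators, and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_select(n_features, fitness, pop_size=20, gens=30, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(gens):
        scores = np.array([fitness(ch) for ch in pop])
        # binary tournament selection
        parents = pop[[max(rng.choice(pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # one-point crossover, then bit-flip mutation
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_features)
            children[i, cut:], children[i + 1, cut:] = \
                parents[i + 1, cut:].copy(), parents[i, cut:].copy()
        children ^= (rng.random(children.shape) < p_mut).astype(children.dtype)
        pop = children
    return pop[np.argmax([fitness(ch) for ch in pop])]

# toy usage with a stand-in fitness (replace with classifier A_z):
# best_subset = ga_select(20, lambda ch: -abs(ch.sum() - 5))
```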

Journal ArticleDOI
TL;DR: It is shown that the decision surface can be written as the sum of two orthogonal terms: the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter, for almost all values of the parameter.
Abstract: Support Vector Machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed Support Vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this paper we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending only on the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m+1 margin vectors and observe that m+1 SVs are usually sufficient to fully determine the decision surface. For relatively small m this latter result leads to a consistent reduction of the SV number.

Proceedings Article
01 Jul 1998
TL;DR: In this paper, a sparse network of linear separators is proposed for natural language disambiguation, which is based on the Winnow learning algorithm and is shown to perform well in a variety of ambiguity resolution problems.
Abstract: We analyze a few of the commonly used statistics based and machine learning algorithms for natural language disambiguation tasks and observe that they can be recast as learning linear separators in the feature space. Each of the methods makes a priori assumptions which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data driven approach which merely searches for a good linear separator in the feature space, without further assumptions on the domain or a specific problem. We present such an approach, a sparse network of linear separators utilizing the Winnow learning algorithm, and show how to use it in a variety of ambiguity resolution problems. The learning approach presented is attribute-efficient and, therefore, appropriate for domains having a very large number of attributes. In particular, we present an extensive experimental comparison of our approach with other methods on several well studied lexical disambiguation tasks such as context-sensitive spelling correction, prepositional phrase attachment and part of speech tagging. In all cases we show that our approach either outperforms other methods tried for these tasks or performs comparably to the best.

Journal ArticleDOI
TL;DR: Computational tests of three approaches to feature selection via concave minimization have been carried out on publicly available real-world databases and compared with an adaptation of the optimal brain damage method for reducing neural network complexity.
Abstract: The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints. Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage method for reducing neural network complexity. One feature selection algorithm via concave minimization reduced cross-validation error on a cancer prognosis database by 35.4% while reducing problem features from 32 to 4.
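The smoothing idea stated above can be shown in a few lines: the step function that counts nonzero features is approximated on the nonnegative reals by a concave exponential, which tends to the step as its slope parameter grows. The parameter value is illustrative.

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

def concave_exp(x, alpha=5.0):
    # concave approximation on x >= 0; approaches step(x) as alpha grows
    return 1.0 - np.exp(-alpha * x)

x = np.linspace(0, 2, 5)
print(step(x))
print(concave_exp(x))
```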

Book
01 Jan 1998
TL;DR: This work covers neural nets and related model structures for nonlinear system identification, including identification based on fuzzy models, with applications in speech recognition and nonlinear time-series analysis.
Abstract: Preface. 1. Neural Nets and Related Model Structures for Nonlinear System Identification J. Sjoberg, L.S.H. Ngia. 2. Enhanced Multi-Stream Kalman Filter Training for Recurrent Networks L.A. Feldkamp, et al. 3. The Support Vector Method of Function Estimation V. Vapnik. 4. Parametric Density Estimation for the Classification of Acoustic Feature Vectors in Speech Recognition S. Basu, C.A. Micchelli. 5. Wavelet Based Modeling of Nonlinear Systems Yi Yu, et al. 6. Nonlinear Identification Based on Fuzzy Models V. Wertz, S. Yurkovich. 7. Statistical Learning in Control and Matrix Theory M. Vidyasagar. 8. Nonlinear Time-Series Analysis U. Parlitz. 9. The K.U. Leuven Time Series Prediction Competition J.A.K. Suykens, J. Vandewalle. References. Index.

Proceedings ArticleDOI
18 May 1998
TL;DR: The "eigenfaces method", originally used in human face recognition, is introduced, to model the sound frequency distribution features and it is shown that it can be a simple and reliable acoustic identification method if the training samples can be properly chosen and classified.
Abstract: The sound (engine, noise, etc.) of a working vehicle provides an important clue, e.g., for surveillance mission robots, to recognize the vehicle type. In this paper, we introduce the "eigenfaces method", originally used in human face recognition, to model the sound frequency distribution features. We show that it can be a simple and reliable acoustic identification method if the training samples can be properly chosen and classified. We treat the frequency spectra of about 200 ms of sound (a "frame") as a vector in a high-dimensional frequency feature space. In this space, we study the vector distribution for each kind of vehicle sound produced under similar working conditions. A collection of typical sound samples is used as the training data set. The mean frequency vector of the training set is first calculated, and subtracted from each vector in the set. To capture the frequency vectors' variation within the training set, we then calculate the eigenvectors of the covariance matrix of the zero-mean-adjusted sample data set. These eigenvectors represent the principal components of the vector distribution: for each such eigenvector, its corresponding eigenvalue indicates its importance in capturing the variation distribution, with the largest eigenvalues accounting for the most variance within this data set. Thus for each set of training data, its mean vector and its moat important eigenvectors together characterize its sound signature. When a new frame (not in the training set) is tested, its spectrum vector is compared against the mean vector; the difference vector is then projected into the principal component directions, and the residual is found. The coefficients of the unknown vector, in the training set eigenvector basis subspace, identify the unknown vehicle noise in terms of the classes represented in the training set. The magnitude of the residual vector measures the extent to which the unknown vehicle sound cannot be well characterized by the vehicle sounds included in the training set.
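A minimal sketch of the "eigenfaces"-style signature described above: subtract the training mean, take principal components of the spectra (via SVD, which yields the covariance eigenvectors), and score a new frame by its projection coefficients and residual. Dimensions are illustrative.

```python
import numpy as np

def train_signature(spectra, n_components=8):
    """spectra: (n_frames, n_bins) training frequency vectors for one class."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # principal components = rows of Vt from the SVD of the centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def residual(frame, mean, components):
    diff = frame - mean
    coefs = components @ diff                   # projection coefficients
    recon = components.T @ coefs
    return coefs, np.linalg.norm(diff - recon)  # small residual => good match
```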

Proceedings Article
01 Dec 1998
TL;DR: This work investigates the problem of learning a classification task on data represented in terms of their pairwise proximities, which does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors.
Abstract: We investigate the problem of learning a classification task on data represented in terms of their pairwise proximities. This representation does not refer to an explicit feature representation of the data items and is thus more general than the standard approach of using Euclidean feature vectors, from which pairwise proximities can always be calculated. Our first approach is based on a combined linear embedding and classification procedure resulting in an extension of the Optimal Hyperplane algorithm to pseudo-Euclidean data. As an alternative we present another approach based on a linear threshold model in the proximity values themselves, which is optimized using Structural Risk Minimization. We show that prior knowledge about the problem can be incorporated by the choice of distance measures and examine different metrics W.r.t. their generalization. Finally, the algorithms are successfully applied to protein structure data and to data from the cat's cerebral cortex. They show better performance than K-nearest-neighbor classification.

Proceedings ArticleDOI
05 Oct 1998
TL;DR: This paper describes RIME (Replicated IMage dEtector), an alternative approach to watermarking for detecting unauthorized image copying on the Internet and shows that it can detect image copies effectively.
Abstract: This paper describes RIME (Replicated IMage dEtector), an alternative approach to watermarking for detecting unauthorized image copying on the Internet. RIME profiles Internet images and stores the feature vectors of the images and their URLs in its repository. When a copy detection request is received, RIME matches the requested image's feature vector with the vectors stored in the repository and returns a list of suspect URLs. RIME characterizes each image using Daubechies' wavelets. The wavelet coefficients are stored as the feature vector. RIME uses a multidimensional extensible hashing scheme to index these high-dimensional feature vectors. Our preliminary results show that it can detect image copies effectively: it finds the top suspects and copes well with image format conversion, resampling, and requantization.
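A sketch of the feature step only: characterize an image by low-order Daubechies wavelet coefficients, which survive format conversion and resampling better than raw pixels. PyWavelets (pywt) is assumed, and the coefficient selection is an illustrative guess, not RIME's exact feature vector.

```python
import pywt

def wavelet_signature(gray_image, levels=3, keep=16):
    coeffs = pywt.wavedec2(gray_image, "db4", level=levels)
    approx = coeffs[0]                 # coarsest approximation band
    return approx.ravel()[:keep]       # fixed-length feature vector
```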

Journal ArticleDOI
TL;DR: Three algorithms for the generation of topographic mappings are offered to the practitioner of unsupervised data analysis, based on the minimization of a cost function performed using an EM algorithm and deterministic annealing.

Book ChapterDOI
02 Sep 1998
TL;DR: A new algorithm for Support Vector regression is proposed that automatically adjusts a flexible tube of minimal radius to the data such that at most a fraction of the data points lie outside.
Abstract: A new algorithm for Support Vector regression is proposed. For a priori chosen ν, it automatically adjusts a flexible tube of minimal radius to the data such that at most a fraction ν of the data points lie outside. The algorithm is analysed theoretically and experimentally.
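A quick sketch using the ν-SVR formulation as implemented in scikit-learn, which follows this idea: ν bounds the fraction of points allowed outside the tube while the tube radius adapts to the data. Toy data and parameter values, illustrative only.

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = np.linspace(0, 4, 80).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

model = NuSVR(nu=0.2, C=1.0, kernel="rbf").fit(X, y)
print(model.predict(X[:3]))
```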

Book ChapterDOI
TL;DR: The pixel values of handwritten digits taken from US envelopes are regarded as a feature vector to be used as input to a classifier, which will automatically assign a digit class based on the pixel values.
Abstract: Figure 1 shows some handwritten digits taken from US envelopes. Each image consists of 16 × 16 pixels of greyscale values ranging from 0 to 255. These 256 pixel values are regarded as a feature vector to be used as input to a classifier, which will automatically assign a digit class based on the pixel values.

Journal ArticleDOI
TL;DR: WIPE(TM) (Wavelet Image Pornography Elimination), a system capable of classifying an image as objectionable or benign, is described; it is practical for real-world applications and has demonstrated 96% sensitivity over a test set of 1076 digital photographs found on objectionable news groups.

Patent
13 Feb 1998
TL;DR: A pattern recognition apparatus consisting of a feature extractor, a feature transform module, a recognition section, a dictionary, and a categorizer is presented; the categorizer identifies the category to which the pattern belongs in response to at least one difference value.
Abstract: A pattern recognition apparatus that comprises an input section, a feature extraction module, a feature transform module, a recognition section that includes a recognition dictionary, and a categorizer. The input section receives input patterns that include a pattern belonging to one of plural categories constituting a category set. The feature extraction module expresses features of the pattern as a feature vector. The feature transform module uses transform vector matrices to transform at least part of the feature vector to generate an at least partially transformed feature vector corresponding to each of the categories. The transform vector matrices include a transform vector matrix generated in response to a rival pattern set composed of rival patterns misrecognized as belonging to plural ones of the categories. The plural ones of the categories constitute a category subset. The at least partially transformed feature vector is common to the ones of the categories constituting the category subset. The recognition dictionary stores both matching information and transformed matching information for each of the categories. The transformed matching information has been transformed using the transform vector matrices. The recognition section generates at least one difference value for each of the categories by performing a matching operation between the matching information and the transformed matching information on one hand, and at least one matching vector derived at least from the at least partially transformed feature vector corresponding to each of the categories on the other hand. The categorizer identifies the category to which the pattern belongs in response to the at least one difference value.

Proceedings ArticleDOI
12 May 1998
TL;DR: An approach is proposed that estimates the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance, computed as the sum of all word hypothesis probabilities that represent the occurrence of the same word in more or less the same segment of time.
Abstract: Estimates of confidence for the output of a speech recognition system can be used in many practical applications of speech recognition technology. They can be employed for detecting possible errors and can help to avoid undesirable verification turns in automatic inquiry systems. We propose to estimate the confidence in a hypothesized word as its posterior probability, given all acoustic feature vectors of the speaker utterance. The basic idea of our approach is to estimate the posterior word probabilities as the sum of all word hypothesis probabilities which represent the occurrence of the same word in more or less the same segment of time. The word hypothesis probabilities are approximated by paths in a word graph and are computed using a simplified forward-backward algorithm. We present experimental results on the North American Business (NAB'94) and the German Verbmobil recognition tasks.
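The summation step can be sketched in a few lines. Edge posteriors are assumed to come from a forward-backward pass over the word graph, which is omitted; the overlap test is a simplification of "more or less the same segment of time".

```python
def word_confidence(edges, word, t_start, t_end):
    """edges: iterable of (word, start, end, posterior) from the word graph.

    Confidence of `word` over [t_start, t_end] = sum of posteriors of all
    edges carrying the same word that overlap that time segment.
    """
    return sum(p for w, s, e, p in edges
               if w == word and s < t_end and e > t_start)
```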

Book Chapter
01 Jan 1998
TL;DR: A scheme based on the frequency of occurrence of features in both individual images and in the whole collection provides a means of weighting possibly incommensurate features in a compatible manner, and naturally extends to incorporate relevance feedback queries.
Abstract: In this paper we report the application of techniques inspired by text retrieval research to the content-based query of image databases. In particular, we show how the use of an inverted file data structure permits the use of a feature space of $\mathcal{O}(10^4)$ dimensions, by restricting search to the subspace spanned by the features present in the query. A suitably sparse set of colour and texture features is proposed. A scheme based on the frequency of occurrence of features in both individual images and in the whole collection provides a means of weighting possibly incommensurate features in a compatible manner, and naturally extends to incorporate relevance feedback queries. The use of relevance feedback is shown consistently to improve system performance, as measured by precision and recall.
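A sketch of the inverted-file idea for quantized image features: each feature id maps to the images containing it, weights follow a tf-idf-style scheme (one reading of the frequency-of-occurrence weighting described above), and a query only touches the postings of its own features.

```python
import math
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)   # feature id -> {image_id: count}
        self.n_images = 0

    def add(self, image_id, features):      # features: iterable of feature ids
        self.n_images += 1
        for f in features:
            self.postings[f][image_id] = self.postings[f].get(image_id, 0) + 1

    def query(self, features):
        scores = defaultdict(float)
        for f in set(features):
            posting = self.postings.get(f, {})
            if not posting:
                continue
            idf = math.log(self.n_images / len(posting))
            for image_id, tf in posting.items():
                scores[image_id] += tf * idf
        return sorted(scores.items(), key=lambda kv: -kv[1])
```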

Proceedings Article
01 Jan 1998
TL;DR: SVM adapts efficiently in dynamic environments that require frequent additions to the document collection, and allows easy incorporation of new documents into an existing trained system.
Abstract: In this paper, we study the use of the support vector machine (SVM) in text categorization. Unlike other machine learning techniques, it allows easy incorporation of new documents into an existing trained system. Moreover, dimension reduction, which is usually imperative, now becomes optional. Thus, SVM adapts efficiently in dynamic environments that require frequent additions to the document collection. Empirical results on the Reuters-22173 collection are also discussed.
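A minimal sketch of SVM text categorization on a full, unreduced feature space, as the paper argues is feasible; scikit-learn and a toy corpus are assumed, not the Reuters-22173 setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["grain exports rose", "stocks fell on earnings", "wheat crop outlook"]
labels = ["commodities", "markets", "commodities"]

vec = TfidfVectorizer()                       # no dimension reduction applied
clf = LinearSVC().fit(vec.fit_transform(docs), labels)
print(clf.predict(vec.transform(["corn exports"])))
```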