
Showing papers on "Feature vector published in 2000"


Journal ArticleDOI
TL;DR: A new method, generalized discriminant analysis (GDA), is proposed for nonlinear discriminant analysis using a kernel function operator; it is close to support vector machines insofar as it maps the input vectors into a high-dimensional feature space.
Abstract: We present a new method that we call generalized discriminant analysis (GDA) to deal with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the support vector machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high-dimensional feature space. In the transformed space, linear properties make it easy to extend and generalize the classical linear discriminant analysis (LDA) to nonlinear discriminant analysis. The formulation is expressed as an eigenvalue problem resolution. Using a different kernel, one can cover a wide class of nonlinearities. For both simulated data and alternate kernels, we give classification results, as well as the shape of the decision function. The results are confirmed using real data to perform seed classification.

1,743 citations
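The eigenvalue formulation above can be illustrated with a minimal two-class kernel Fisher discriminant in NumPy. This is a sketch of the general idea rather than the paper's GDA (which handles the multi-class case); the RBF kernel and the ridge term `reg` are assumptions added for illustration and numerical stability.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_fisher_discriminant(X, y, gamma=1.0, reg=1e-3):
    """Two-class kernel discriminant (labels 0/1): solve the eigen/linear
    system in the span of the training points and return a scoring function."""
    K = rbf_kernel(X, X, gamma)
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    m0, m1 = K[:, idx0].mean(axis=1), K[:, idx1].mean(axis=1)
    # Within-class scatter in feature space, expressed through the kernel matrix.
    N = np.zeros_like(K)
    for idx in (idx0, idx1):
        Kc = K[:, idx]
        center = np.eye(len(idx)) - np.full((len(idx), len(idx)), 1.0 / len(idx))
        N += Kc @ center @ Kc.T
    N += reg * np.eye(len(X))                      # ridge term for invertibility
    alpha = np.linalg.solve(N, m1 - m0)            # leading direction of N^{-1} M
    def project(Xnew):
        return rbf_kernel(Xnew, X, gamma) @ alpha  # 1-D discriminant score
    return project
```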


Proceedings Article
Mark Hall1
29 Jun 2000
TL;DR: In this article, a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems is described, which often outperforms the ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees.
Abstract: Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.

1,511 citations
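Hall's merit heuristic rewards subsets whose features correlate with the class but not with each other. The sketch below is a simplified version: it uses plain Pearson correlation for both the feature-class and feature-feature measures, which is an assumption; the paper's filter uses a correlation measure suited to mixed discrete and continuous attributes.

```python
import numpy as np

def cfs_merit(subset, X, y):
    """Correlation-based merit of a subset of k features:
    k * r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                        for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(X, y):
    """Greedy forward search over features; stop when the merit stops improving."""
    remaining, chosen, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, j = max((cfs_merit(chosen + [j], X, y), j) for j in remaining)
        if score <= best:
            break
        best, chosen = score, chosen + [j]
        remaining.remove(j)
    return chosen
```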


Proceedings Article
01 Jan 2000
TL;DR: In this article, an inner product in the feature space consisting of all subsequences of length k was introduced for comparing two text documents, where a subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously.
Abstract: We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences which are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how despite this fact the inner product can be efficiently evaluated by a dynamic programming technique. A preliminary experimental comparison of the performance of the kernel compared with a standard word feature space kernel [6] is made showing encouraging results.

1,464 citations
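The subsequence kernel's recursion can be written down directly and memoized. The sketch below follows the standard formulation with decay factor `lam`; it is less efficient than the paper's dynamic-programming evaluation (the inner sum makes it quadratic in the second string per subsequence length), but it computes the same quantity.

```python
from functools import lru_cache

def ssk(s, t, n, lam=0.5):
    """String subsequence kernel K_n(s, t): inner product over all length-n
    subsequences, each weighted by lam raised to its span in the text."""
    @lru_cache(maxsize=None)
    def kprime(i, ls, lt):
        # Auxiliary K'_i over the prefixes s[:ls], t[:lt].
        if i == 0:
            return 1.0
        if ls < i or lt < i:
            return 0.0
        x = s[ls - 1]
        total = lam * kprime(i, ls - 1, lt)
        for j in range(1, lt + 1):
            if t[j - 1] == x:
                total += kprime(i - 1, ls - 1, j - 1) * lam ** (lt - j + 2)
        return total

    def k(ls, lt):
        if ls < n or lt < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(1, lt + 1):
            if t[j - 1] == x:
                total += kprime(n - 1, ls - 1, j - 1) * lam ** 2
        return total

    return k(len(s), len(t))

# Example: ssk("science is organized knowledge", "wisdom is organized life", n=3)
```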


Proceedings Article
01 Jan 2000
TL;DR: An on-line recursive algorithm for training support vector machines, one vector at a time, is presented and interpretation of decremental unlearning in feature space sheds light on the relationship between generalization and geometry of the data.
Abstract: An on-line recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the Kuhn-Tucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decremental "unlearning" offers an efficient method to exactly evaluate leave-one-out generalization performance. Interpretation of decremental unlearning in feature space sheds light on the relationship between generalization and geometry of the data.

1,319 citations
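The adiabatic bookkeeping that preserves the Kuhn-Tucker conditions is too involved for a short sketch; for contrast, the brute-force exact leave-one-out evaluation that decremental unlearning avoids looks like this. The RBF kernel and scikit-learn's SVC are stand-ins, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVC

def loo_error_naive(X, y, C=1.0, gamma=0.5):
    """Exact leave-one-out error by retraining the SVM n times.
    Decremental 'unlearning' obtains the same quantity without full retraining."""
    errors = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        clf = SVC(C=C, kernel="rbf", gamma=gamma).fit(X[mask], y[mask])
        errors += clf.predict(X[i:i + 1])[0] != y[i]
    return errors / len(X)
```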


Proceedings Article
01 Jan 2000
TL;DR: The results show that the use of the Mel scale for modeling music is at least not harmful for this problem, although further experimentation is needed to verify that this is the optimal scale in the general case and whether this transform is valid for music spectra.
Abstract: We examine in some detail Mel Frequency Cepstral Coefficients (MFCCs), the dominant features used for speech recognition, and investigate their applicability to modeling music. In particular, we examine two of the main assumptions of the process of forming MFCCs: the use of the Mel frequency scale to model the spectra; and the use of the Discrete Cosine Transform (DCT) to decorrelate the Mel-spectral vectors. We examine the first assumption in the context of speech/music discrimination. Our results show that the use of the Mel scale for modeling music is at least not harmful for this problem, although further experimentation is needed to verify that this is the optimal scale in the general case. We investigate the second assumption by examining the basis vectors of the theoretically optimal transform to decorrelate music and speech spectral vectors. Our results demonstrate that the use of the DCT to decorrelate vectors is appropriate for both speech and music spectra.

MFCCs for Music Analysis. Of all the human-generated sounds which influence our lives, speech and music are arguably the most prolific. Speech has received much focused attention, and decades of research in this community have led to usable systems and convergence of the features used for speech analysis. In the music community, however, although the field of synthesis is very mature, a dominant paradigm has yet to emerge to solve other problems such as music classification or transcription. Consequently, many representations for music have been proposed (e.g. (Martin1998), (Scheirer1997), (Blum1999)). In this paper, we examine some of the assumptions of Mel Frequency Cepstral Coefficients (MFCCs), the dominant features used for speech recognition, and examine whether these assumptions are valid for modeling music. MFCCs have been used by other authors to model music and audio sounds (e.g. (Blum1999)). These works, however, use cepstral features merely because they have been so successful for speech recognition, without examining the assumptions made in great detail. MFCCs (e.g. see (Rabiner1993)) are short-term spectral features. They are calculated as follows (the steps and assumptions made are explained in more detail in the full paper):
1. Divide the signal into frames.
2. For each frame, obtain the amplitude spectrum.
3. Take the logarithm.
4. Convert to the Mel (a perceptually based) spectrum.
5. Take the discrete cosine transform (DCT).
We seek to determine whether this process is suitable for creating features to model music. We examine only steps 4 and 5 since, as explained in the full paper, the other steps are less controversial. Step 4 calculates the log amplitude spectrum on the so-called Mel scale. This transformation emphasizes lower frequencies, which are perceptually more meaningful for speech. It is possible, however, that the Mel scale may not be optimal for music, as there may be more information in, say, higher frequencies. Step 5 takes the DCT of the Mel spectra. For speech, this approximates principal components analysis (PCA), which decorrelates the components of the feature vectors. We investigate whether this transform is valid for music spectra.

Mel vs Linear Spectral Modeling. To investigate the effect of using the Mel scale, we examine the performance of a simple speech/music discriminator. We use around 3 hours of labeled data from a broadcast news show, divided into 2 hours of training data and 40 minutes of testing data. We convert the data to ‘Mel’ and ‘Linear’ cepstral features and train mixture-of-Gaussian classifiers for each class. We then classify each segment in the test data using these models. This process is described in more detail in the full paper. We find that for this speech/music classification problem, the results are (statistically) significantly better if Mel-based cepstral features rather than linear-based cepstral features are used. However, whether this is simply because the Mel scale models speech better or because it also models music better is not clear. At worst, we can conclude that using the Mel cepstrum to model music in this speech/music discrimination problem is not harmful. Further tests are needed to verify that the Mel cepstrum is appropriate for modeling music in the general case.

Using the DCT to Approximate Principal Components Analysis. We additionally investigate the effectiveness of using the DCT to decorrelate Mel spectral features. The mathematically correct way to decorrelate components is to use PCA (or, equivalently, the KL transform). This transform uses the eigenvectors of the covariance matrix of the data to be modeled as basis vectors. By investigating how closely these vectors approximate cosine functions we can get a feel for how well the DCT approximates PCA. By inspecting the eigenvectors for the Mel log spectra for around 3 hours of speech and 4 hours of music, we see that the DCT is an appropriate transform for decorrelating music (and speech) log spectra.

Future Work. Future work should focus on a more thorough examination of the parameters used to generate MFCC features, such as the sampling rate of the signal, the frequency scaling (Mel or otherwise), and the number of bins to use when smoothing. Also worthy of investigation are the windowing size and frame rate.

Suggested Readings.
Blum, T., Keislar, D., Wheaton, J. and Wold, E., 1999, Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information, U.S. Patent 5,918,223.
Martin, K., 1998, Toward automatic sound source recognition: identifying musical instruments, Proceedings NATO Computational Hearing Advanced Study Institute.
Rabiner, L. and Juang, B., 1993, Fundamentals of Speech Recognition, Prentice-Hall.
Scheirer, E. and Slaney, M., 1997, Construction and evaluation of a robust multifeature speech/music discriminator, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing.

1,189 citations
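The five listed steps can be sketched in NumPy/SciPy. The frame length, hop, filter count, and the mel filterbank construction below are illustrative assumptions; note that many implementations apply the mel filterbank before taking the logarithm, a slight reordering of steps 3 and 4.

```python
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale (mel = 2595*log10(1 + f/700))."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """Steps 1-5: frame, amplitude spectrum, mel filterbank, log, DCT."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]   # step 1
    spec = np.abs(np.fft.rfft(frames * np.hamming(frame_len), axis=1))            # step 2
    melspec = spec @ mel_filterbank(n_filters, frame_len, sr).T                   # step 4
    logmel = np.log(melspec + 1e-10)                                              # step 3
    return dct(logmel, type=2, axis=1, norm="ortho")[:, :n_ceps]                  # step 5
```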


Proceedings Article
01 Jan 2000
TL;DR: The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and real-life problems of face recognition, pedestrian detection and analyzing DNA microarray data.
Abstract: We introduce a method of feature selection for Support Vector Machines. The method is based upon finding those features which minimize bounds on the leave-one-out error. This search can be efficiently performed via gradient descent. The resulting algorithms are shown to be superior to some standard feature selection algorithms on both toy data and real-life problems of face recognition, pedestrian detection and analyzing DNA microarray data.

1,112 citations
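A simplified rendering of the idea: scale each feature by a factor, train an SVM, and descend on a radius-margin style quantity over the scaling factors, letting unhelpful features shrink toward zero. The linear kernel, the crude radius estimate, and the finite-difference gradient are simplifications; the paper derives analytic gradients of specific leave-one-out bounds.

```python
import numpy as np
from sklearn.svm import SVC

def bound(sigma, X, y, C=1.0):
    """Radius-margin style score R^2 * ||w||^2 on feature-scaled data."""
    Xs = X * sigma
    clf = SVC(kernel="linear", C=C).fit(Xs, y)
    w2 = float(clf.coef_ @ clf.coef_.T)
    R2 = ((Xs - Xs.mean(0)) ** 2).sum(1).max()      # crude enclosing-radius estimate
    return R2 * w2

def select_by_scaling(X, y, steps=50, lr=0.05, eps=1e-3):
    """Gradient descent on the bound w.r.t. per-feature scaling factors;
    features whose factors shrink toward zero are effectively discarded."""
    sigma = np.ones(X.shape[1])
    for _ in range(steps):
        base = bound(sigma, X, y)
        grad = np.zeros_like(sigma)
        for j in range(len(sigma)):                  # finite-difference gradient
            s = sigma.copy()
            s[j] += eps
            grad[j] = (bound(s, X, y) - base) / eps
        sigma = np.clip(sigma - lr * grad, 0.0, None)
    return sigma                                     # rank features by magnitude
```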


Journal ArticleDOI
TL;DR: A system that is able to organize vast document collections according to textual similarities is described; it is based on the self-organizing map (SOM) algorithm and uses 500-dimensional vectors of stochastic figures, obtained as random projections of weighted word histograms, as the document features.
Abstract: Describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. As the feature vectors for the documents, statistical representations of their vocabularies are used. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.

1,007 citations
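The two scalability devices, random projection of weighted word histograms to a few hundred dimensions and a self-organizing map over the projected vectors, can each be sketched in a few lines of NumPy. The learning-rate and neighbourhood schedules below are arbitrary choices; the actual system uses far larger maps and several additional shortcuts.

```python
import numpy as np

def random_project(histograms, out_dim=500, seed=0):
    """Project weighted word histograms (docs x vocabulary) to a fixed low dimension."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(histograms.shape[1], out_dim))
    R /= np.linalg.norm(R, axis=0)
    return histograms @ R

def train_som(X, rows=20, cols=20, iters=10000, seed=0):
    """Plain online SOM: pull the best-matching unit and its neighbours toward each sample."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(rows * cols, X.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(1))
        lr = 0.5 * (1 - t / iters)
        radius = max(rows, cols) / 2 * (1 - t / iters) + 1e-9
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(1) / (2 * radius ** 2))
        W += lr * h[:, None] * (x - W)
    return W.reshape(rows, cols, -1)
```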


Journal ArticleDOI
TL;DR: This work presents a new approach to feature extraction in which feature selection and extraction and classifier training are performed simultaneously using a genetic algorithm, and employs this technique in combination with the k nearest neighbor classification rule.
Abstract: Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern affect the success of subsequent classification. Feature extraction is the process of deriving new features from original features to reduce the cost of feature measurement, increase classifier efficiency, and allow higher accuracy. Many feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and classification efficiency, it does not necessarily reduce the number of features to be measured since each new feature may be a linear combination of all of the features in the original pattern vector. Here, we present a new approach to feature extraction in which feature selection and extraction and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a feature weight vector used to scale the individual features in the original pattern vectors. A masking vector is also employed for simultaneous selection of a feature subset. We employ this technique in combination with the k nearest neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection, and linear discriminant analysis. We also present results for the identification of favorable water-binding sites on protein surfaces.

849 citations
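A compact version of the scheme: each chromosome carries a feature-weight vector and a binary masking vector, and fitness is k-nearest-neighbour accuracy on the weighted, masked features. Population size, mutation rate, and the use of scikit-learn's KNeighborsClassifier are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def fitness(weights, mask, X, y, k=3):
    """k-NN accuracy on features scaled by the chromosome's weights and mask."""
    if mask.sum() == 0:
        return 0.0
    Xw = X * (weights * mask)
    return cross_val_score(KNeighborsClassifier(n_neighbors=k), Xw, y, cv=3).mean()

def ga_feature_weighting(X, y, pop=30, gens=40, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.random((pop, d))                     # feature-weight vectors in [0, 1)
    M = rng.integers(0, 2, (pop, d))             # masking vectors for subset selection
    for _ in range(gens):
        scores = np.array([fitness(W[i], M[i], X, y) for i in range(pop)])
        order = np.argsort(scores)[::-1]
        W, M = W[order], M[order]
        for i in range(pop // 2, pop):           # replace the worst half: crossover + mutation
            a, b = rng.integers(0, pop // 2, 2)
            cut = rng.integers(1, d)
            W[i] = np.concatenate([W[a, :cut], W[b, cut:]])
            M[i] = np.concatenate([M[a, :cut], M[b, cut:]])
            mut = rng.random(d) < 0.05
            W[i, mut] = rng.random(mut.sum())
            M[i, mut] ^= 1
    return W[0], M[0]                            # best weight and mask vectors
```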


Proceedings ArticleDOI
05 Jun 2000
TL;DR: A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
Abstract: Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling. By training the network to generate the subword probability posteriors, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, we achieve relative error rate reductions of 35% or more on the multicondition Aurora noisy continuous digits task.

803 citations
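The "tandem" connection can be sketched with off-the-shelf pieces: train a network to output subword posteriors, transform them (log, then decorrelate), and fit per-class Gaussian mixtures on the transformed outputs. The specific transform (log plus whitened PCA), the model sizes, and the frame-level classification below are assumptions; the paper plugs the transformed posteriors into a conventional HMM recognizer on the Aurora task.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def train_tandem(X, y, n_components=4):
    """Discriminative front end (MLP posteriors) feeding Gaussian-mixture class models."""
    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, y)
    feats = np.log(net.predict_proba(X) + 1e-8)           # log posteriors
    pca = PCA(whiten=True).fit(feats)                      # decorrelate / whiten
    Z = pca.transform(feats)
    gmms = {c: GaussianMixture(n_components).fit(Z[y == c]) for c in np.unique(y)}
    return net, pca, gmms

def classify(x, net, pca, gmms):
    """Score a single frame under each class's mixture and pick the best."""
    z = pca.transform(np.log(net.predict_proba(x.reshape(1, -1)) + 1e-8))
    return max(gmms, key=lambda c: gmms[c].score_samples(z)[0])
```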


Journal ArticleDOI
Kyoung-jae Kim1, Ingoo Han1
TL;DR: A genetic algorithms approach to feature discretization and the determination of connection weights for artificial neural networks (ANNs) is proposed to predict the stock price index.
Abstract: This paper proposes genetic algorithms (GAs) approach to feature discretization and the determination of connection weights for artificial neural networks (ANNs) to predict the stock price index. Previous research proposed many hybrid models of ANN and GA for the method of training the network, feature subset selection, and topology optimization. In most of these studies, however, GA is only used to improve the learning algorithm itself. In this study, GA is employed not only to improve the learning algorithm, but also to reduce the complexity in feature space. GA optimizes simultaneously the connection weights between layers and the thresholds for feature discretization. The genetically evolved weights mitigate the well-known limitations of the gradient descent algorithm. In addition, globally searched feature discretization reduces the dimensionality of the feature space and eliminates irrelevant factors. Experimental results show that GA approach to the feature discretization model outperforms the other two conventional models.

669 citations


Journal ArticleDOI
Charu C. Aggarwal1, Philip S. Yu1
16 May 2000
TL;DR: Very general techniques for projected clustering are discussed which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality, which is substantially more general and realistic than currently available techniques.
Abstract: High dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high dimensional data, even the concept of proximity or clustering may not be meaningful. We discuss very general techniques for projected clustering which are able to construct clusters in arbitrarily aligned subspaces of lower dimensionality. The subspaces are specific to the clusters themselves. This definition is substantially more general and realistic than currently available techniques which limit the method to only projections from the original set of attributes. The generalized projected clustering technique may also be viewed as a way of trying to redefine clustering for high dimensional applications by searching for hidden subspaces with clusters which are created by inter-attribute correlations. We provide a new concept of using extended cluster feature vectors in order to make the algorithm scalable for very large databases. The running time and space requirements of the algorithm are adjustable, and are likely to trade off with better accuracy.

Journal ArticleDOI
TL;DR: The wavelet packet transform (WPT) is introduced as an alternative means of extracting time-frequency information from vibration signatures and significantly reduces the long training time that is often associated with the neural network classifier and improves its generalization capability.
Abstract: Condition monitoring of dynamic systems based on vibration signatures has generally relied upon Fourier-based analysis as a means of translating vibration signals in the time domain into the frequency domain. However, Fourier analysis provides a poor representation of signals well localized in time. In this case, it is difficult to detect and identify the signal pattern from the expansion coefficients because the information is diluted across the whole basis. The wavelet packet transform (WPT) is introduced as an alternative means of extracting time-frequency information from vibration signatures. The resulting WPT coefficients provide one with arbitrary time-frequency resolution of a signal. With the aid of statistical-based feature selection criteria, many of the feature components containing little discriminant information could be discarded, resulting in a feature subset having a reduced number of parameters without compromising the classification performance. The extracted reduced dimensional feature vector is then used as input to a neural network classifier. This significantly reduces the long training time that is often associated with the neural network classifier and improves its generalization capability.
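Assuming the PyWavelets WaveletPacket API, the energies of the terminal nodes of a depth-d decomposition give one time-frequency feature per sub-band, and a simple Fisher-style criterion can rank the components before training the neural-network classifier. The wavelet ('db4'), the depth, and the Fisher score itself are illustrative choices rather than the paper's exact criteria.

```python
import numpy as np
import pywt

def wpt_energy_features(signal, wavelet="db4", level=3):
    """Energy of each terminal wavelet-packet node -> one feature per sub-band."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    return np.array([np.sum(np.asarray(n.data) ** 2) for n in nodes])

def fisher_scores(F, y):
    """Between-class over within-class variance, per feature (two classes assumed)."""
    f0, f1 = F[y == 0], F[y == 1]
    return (f0.mean(0) - f1.mean(0)) ** 2 / (f0.var(0) + f1.var(0) + 1e-12)

# Typical use: stack per-signal features, keep only the most discriminant sub-bands,
# then feed the reduced vectors to a neural-network classifier.
# F = np.vstack([wpt_energy_features(s) for s in signals])
# keep = np.argsort(fisher_scores(F, labels))[::-1][:8]
```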

Proceedings ArticleDOI
15 Jun 2000
TL;DR: An approach for image retrieval using a very large number of highly selective features and efficient online learning based on the assumption that each image is generated by a sparse set of visual "causes" and that images which are visually similar share causes.
Abstract: We present an approach for image retrieval using a very large number of highly selective features and efficient online learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual "causes" and that images which are visually similar share causes. We propose a mechanism for computing a very large number of highly selective features which capture some aspects of this causal structure (in our implementation there are over 45,000 highly selective features). At query time a user selects a few example images, and a technique known as "boosting" is used to learn a classification function in this feature space. By construction, the boosting procedure learns a simple classifier which only relies on 20 of the features. As a result a very large database of images can be scanned rapidly, perhaps a million images per second. Finally we will describe a set of experiments performed using our retrieval system on a database of 3000 images.
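The query-time learning step can be approximated with off-the-shelf boosting of decision stumps, stopped after 20 rounds so that the learned classifier consults only about 20 of the many thousands of features. Scikit-learn's AdaBoostClassifier is a stand-in here, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_query_classifier(F_pos, F_neg, rounds=20):
    """Boost weak learners over a very wide feature matrix.
    Each round picks roughly one feature, so 20 rounds -> about 20 active features."""
    X = np.vstack([F_pos, F_neg])
    y = np.concatenate([np.ones(len(F_pos)), np.zeros(len(F_neg))])
    return AdaBoostClassifier(n_estimators=rounds).fit(X, y)

def rank_database(clf, F_db):
    """Score every image; scanning is fast because only the selected features matter."""
    return np.argsort(clf.decision_function(F_db))[::-1]
```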

01 Jan 2000
TL;DR: In this paper, the meaning of object dimension is discussed with a special focus on applications for environmental monitoring, and it is hypothesised that object-based image analysis will trigger new developments towards a full integration of GIS and remote sensing functions.
Abstract: While remote sensing has made enormous progress over recent years and a variety of sensors now deliver medium and high resolution data on an operational basis, a vast majority of applications still rely on basic image processing concepts developed in the early 70s: classification of single pixels in a multi-dimensional feature space. Although the techniques are well developed and sophisticated variations include soft classifiers, subpixel classifiers and spectral un-mixing techniques, it is argued that they do not make use of spatial concepts. Looking at high-resolution images it is very likely that a neighbouring pixel belongs to the same land cover class as the pixel under consideration. Algorithms in physics or mechanical engineering developed over the last twenty years successfully delineate objects based on context-information in an image on the basis of texture or fractal dimension. With the advent of high-resolution satellite imagery, the increasing use of airborne digital data and radar data the need for context-based algorithms and object-oriented image processing is increasing. Recently available commercial products reflect this demand. In a case study, ‘traditional’ pixel based classification methods and context-based methods are compared. Experiences are encouraging and it is hypothesised that object-based image analysis will trigger new developments towards a full integration of GIS and remote sensing functions. If the resulting objects prove to be ‘meaningful’, subsequent application specific analysis can take the attributes of these objects into account. The meaning of object dimension is discussed with a special focus on applications for environmental monitoring.

Journal ArticleDOI
TL;DR: The emerging machine learning technique called support vector machines is proposed as a method for performing nonlinear equalization in communication systems and yields a nonlinear processing method that is somewhat different than the nonlinear decision feedback method whereby the linear feedback filter of the decision feedback equalizer is replaced by a Volterra filter.
Abstract: The emerging machine learning technique called support vector machines is proposed as a method for performing nonlinear equalization in communication systems. The support vector machine has the advantage that a smaller number of parameters for the model can be identified in a manner that does not require the extent of prior information or heuristic assumptions that some previous techniques require. Furthermore, the optimization method of a support vector machine is quadratic programming, which is a well-studied and understood mathematical programming technique. Support vector machine simulations are carried out on nonlinear problems previously studied by other researchers using neural networks. This allows initial comparison against other techniques to determine the feasibility of using the proposed method for nonlinear detection. Results show that support vector machines perform as well as neural networks on the nonlinear problems investigated. A method is then proposed to introduce decision feedback processing to support vector machines to address the fact that intersymbol interference (ISI) data generates input vectors having temporal correlation, whereas a standard support vector machine assumes independent input vectors. Presenting the problem from the viewpoint of the pattern space illustrates the utility of a bank of support vector machines. This approach yields a nonlinear processing method that is somewhat different than the nonlinear decision feedback method whereby the linear feedback filter of the decision feedback equalizer is replaced by a Volterra filter. A simulation using a linear system shows that the proposed method performs equally to a conventional decision feedback equalizer for this problem.
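A minimal simulation of the setting: BPSK symbols pass through a dispersive channel with a memoryless nonlinearity and additive noise, sliding windows of received samples form the input vectors, and an SVM classifies the transmitted symbol. The channel coefficients, the nonlinearity, and the RBF kernel are illustrative assumptions, and no decision feedback is included.

```python
import numpy as np
from sklearn.svm import SVC

def simulate(n, snr_db=15, seed=0):
    """BPSK through a dispersive channel plus a mild nonlinearity and Gaussian noise."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n) * 2 - 1                       # +/-1 symbols
    lin = np.convolve(bits, [0.35, 0.87, 0.35], mode="same")   # ISI channel
    y = lin + 0.2 * lin ** 3                                   # memoryless nonlinearity
    y += rng.normal(scale=10 ** (-snr_db / 20), size=n)        # additive noise
    return bits, y

def windows(y, width=5):
    """Sliding windows of received samples = equalizer input vectors."""
    pad = np.pad(y, width // 2)
    return np.stack([pad[i:i + width] for i in range(len(y))])

bits, rx = simulate(4000)
V = windows(rx)
svm = SVC(kernel="rbf", C=5.0, gamma=1.0).fit(V[:3000], bits[:3000])
ber = np.mean(svm.predict(V[3000:]) != bits[3000:])
print(f"bit error rate: {ber:.4f}")
```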

Journal ArticleDOI
TL;DR: An algorithm is proposed that models images by two dimensional (2-D) hidden Markov models (HMMs) and outperforms CART™, LVQ, and Bayes VQ in classification by context.
Abstract: For block-based classification, an image is divided into blocks, and a feature vector is formed for each block by grouping statistics extracted from the block. Conventional block-based classification algorithms decide the class of a block by examining only the feature vector of this block and ignoring context information. In order to improve classification by context, an algorithm is proposed that models images by two dimensional (2-D) hidden Markov models (HMMs). The HMM considers feature vectors statistically dependent through an underlying state process assumed to be a Markov mesh, which has transition probabilities conditioned on the states of neighboring blocks from both horizontal and vertical directions. Thus, the dependency in two dimensions is reflected simultaneously. The HMM parameters are estimated by the EM algorithm. To classify an image, the classes with maximum a posteriori probability are searched jointly for all the blocks. Applications of the HMM algorithm to document and aerial image segmentation show that the algorithm outperforms CART™, LVQ, and Bayes VQ.

Journal ArticleDOI
01 Feb 2000
TL;DR: WaveCluster is proposed, a novel clustering approach based on wavelet transforms, which satisfies all the above requirements and can effectively identify arbitrarily shaped clusters at different degrees of detail.
Abstract: Many applications require the management of spatial data in a multidimensional feature space. Clustering large spatial databases is an important problem, which tries to find the densely populated regions in the feature space to be used in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shape. It must be insensitive to the noise (outliers) and the order of input data. We propose WaveCluster, a novel clustering approach based on wavelet transforms, which satisfies all the above requirements. Using the multiresolution property of wavelet transforms, we can effectively identify arbitrarily shaped clusters at different degrees of detail. We also demonstrate that WaveCluster is highly efficient in terms of time complexity. Experimental results on very large datasets are presented, which show the efficiency and effectiveness of the proposed approach compared to the other recent clustering methods.

Journal ArticleDOI
09 Nov 2000-Nature
TL;DR: It is shown that one can keep track of a stationary item solely on the basis of its changing appearance—specified by its trajectory along colour, orientation, and spatial frequency dimensions—even when a distractor shares the same spatial location.
Abstract: Visual attention allows an observer to select certain visual information for specialized processing. Selection is readily apparent in ‘tracking’ tasks where even with the eyes fixed, observers can track a target as it moves among identical distractor items [1]. In such a case, a target is distinguished by its spatial trajectory. Here we show that one can keep track of a stationary item solely on the basis of its changing appearance—specified by its trajectory along colour, orientation, and spatial frequency dimensions—even when a distractor shares the same spatial location. This ability to track through feature space bears directly on competing theories of attention, that is, on whether attention can select locations in space [2,3,4], features such as colour or shape [5,6,7], or particular visual objects composed of constellations of visual features. Our results affirm, consistent with a growing body of psychophysical [8,9,10,11,12,13] and neurophysiological [14,15,16] evidence, that attention can indeed select specific visual objects. Furthermore, feature-space tracking extends the definition of visual object [17] to include not only items with well defined spatio-temporal trajectories [18], but also those with well defined featuro-temporal trajectories.

Journal ArticleDOI
TL;DR: The results of this study indicate that the use of neural networks for the integration of large multisource datasets used in regional mineral exploration, and for prediction of mineral prospectivity, offers several advantages over existing methods.
Abstract: A multilayer feed‐forward neural network, trained with a gradient descent, back‐propagation algorithm, is used to estimate the favourability for gold deposits using a raster GIS database for the Tenterfield 1:100 000 sheet area, New South Wales. The database consists of solid geology, regional faults, airborne magnetic and gamma‐ray survey data (U, Th, K and total count channels), and 63 deposit and occurrence locations. Input to the neural network consists of feature vectors formed by combining the values from co‐registered grid cells in each GIS thematic layer. The network was trained using binary target values to indicate the presence or absence of deposits. Although the neural network was trained as a binary classifier, output values for the trained network are in the range [0.1, 0.9] and are interpreted to indicate the degree of similarity of each input vector to a composite of all the deposit vectors used in training. These values are rescaled to produce a multiclass prospectivity map. To validate and assess the effectiveness of the neural‐network method, mineral‐prospectivity maps are also prepared using the empirical weights of evidence and the conceptual fuzzy‐logic methods. The neural‐network method produces a geologically plausible mineral‐prospectivity map similar, but superior, to the fuzzy logic and weights of evidence maps. The results of this study indicate that the use of neural networks for the integration of large multisource datasets used in regional mineral exploration, and for prediction of mineral prospectivity, offers several advantages over existing methods. These include the ability of neural networks to: (i) respond to critical combinations of parameters rather than increase the estimated prospectivity in response to each individual favourable parameter; (ii) combine datasets without the loss of information inherent in existing methods; and (iii) produce results that are relatively unaffected by redundant data, spurious data and data containing multiple populations. Statistical measures of map quality indicate that the neural‐network method performs as well as, or better than, existing methods while using approximately one‐third less data than the weights of evidence method.

Proceedings ArticleDOI
Yihong Gong1, Xin Liu1
15 Jun 2000
TL;DR: From this SVD, the authors are able not only to derive the refined feature space to better cluster visually similar frames, but also to define a metric to measure the amount of visual content contained in each frame cluster using its degree of visual changes.
Abstract: The authors propose a novel technique for video summarization based on singular value decomposition (SVD). For the input video sequence, we create a feature-frame matrix A, and perform the SVD on it. From this SVD, we are able not only to derive the refined feature space to better cluster visually similar frames, but also to define a metric to measure the amount of visual content contained in each frame cluster using its degree of visual changes. Then, in the refined feature space, we find the most static frame cluster, define it as the content unit, and use the content value computed from it as the threshold to cluster the rest of the frames. Based on this clustering result, either the optimal set of keyframes or a summarized motion video with the user-specified time length can be generated to support different user requirements for video browsing and content overview. Our approach ensures that the summarized video representation contains little redundancy, and gives equal attention to the same amount of contents.
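The core computation is an SVD of the feature-frame matrix followed by clustering in the reduced space. In the sketch below, the rank, the k-means clustering, and the keyframe-per-cluster rule are simplified assumptions rather than the authors' exact procedure, which clusters adaptively using the computed content value.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(A, rank=10, n_clusters=12):
    """A: feature-frame matrix with one column per frame.
    Project frames into the rank-r singular space, cluster, and pick one keyframe per cluster."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    frames = (np.diag(S[:rank]) @ Vt[:rank]).T               # refined per-frame vectors
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(frames)
    keyframes = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        centre = frames[idx].mean(0)
        keyframes.append(idx[np.argmin(((frames[idx] - centre) ** 2).sum(1))])
    # Clusters with small spread in this space correspond to visually static segments.
    return sorted(keyframes)
```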

Proceedings ArticleDOI
15 Jun 2000
TL;DR: A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed, characterized using real and artificially generated body postures, showing promising results.
Abstract: A novel approach for estimating articulated body posture and motion from monocular video sequences is proposed. Human pose is defined as the instantaneous two dimensional configuration (i.e. the projection onto the image plane) of a single articulated body in terms of the position of a predetermined set of joints. First, statistical segmentation of the human bodies from the background is performed and low-level visual features are found given the segmented body shape. The goal is to be able to map these generally low-level visual features to body configurations. The system estimates different mappings, each one associated with a specific cluster in the visual feature space. Given a set of body motion sequences for training, unsupervised clustering is obtained via the Expectation Maximization algorithm. For each of the clusters, a function is estimated to build the mapping from low-level features to 2D pose. Given new visual features, a mapping from each cluster is performed to yield a set of possible poses. From this set, the system selects the most likely pose given the learned probability distribution and the visual features. The performance of the proposed approach is characterized using real and artificially generated body postures, showing promising results.

Proceedings ArticleDOI
30 Oct 2000
TL;DR: New technologies for improving retrieval accuracy, such as partial feature vectors and or'ed retrieval among multiple search keys, are proposed, and it is found that the retrieval accuracy increases by more than 20% compared with the previous system.
Abstract: A music retrieval system that accepts hummed tunes as queries is described in this paper. This system uses similarity retrieval because a hummed tune may contain errors. The retrieval result is a list of song names ranked according to the closeness of the match. Our ultimate goal is that the correct song should be first on the list. This means that eventually our system's similarity retrieval should allow for only one correct answer. The most significant improvement our system has over general query-by-humming systems is that all processing of musical information is done based on beats instead of notes. This type of query processing is robust against queries generated from erroneous input. In addition, acoustic information is transcribed and converted into relative intervals and is used for making feature vectors. This increases the resolution of the retrieval system compared with other general systems, which use only pitch direction information. The database currently holds over 10,000 songs, and the retrieval time is at most one second. This level of performance is mainly achieved through the use of indices for retrieval. In this paper, we also report on the results of music analyses of the songs in the database. Based on these results, new technologies for improving retrieval accuracy, such as partial feature vectors and or'ed retrieval among multiple search keys, are proposed. The effectiveness of these technologies is evaluated quantitatively, and it is found that the retrieval accuracy increases by more than 20% compared with the previous system [9]. Practical user interfaces for the system are also described.
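The beat-based, relative-interval representation and the similarity ranking can be sketched as follows; the fixed beat count, the resampling step, and the Euclidean distance are simplified assumptions, and the real system adds indexing, partial feature vectors, and OR'ed retrieval over multiple search keys.

```python
import numpy as np

def beat_feature(pitches, beats_per_vector=16):
    """Resample a pitch contour to a fixed number of beats and take intervals
    relative to the first beat, so the query is transposition-invariant."""
    idx = np.linspace(0, len(pitches) - 1, beats_per_vector).round().astype(int)
    resampled = np.asarray(pitches, dtype=float)[idx]
    return resampled - resampled[0]

def rank_songs(query_pitches, database):
    """database: {song_name: feature_vector}. Rank by distance, closest first;
    ideally the correct song ends up at rank 1."""
    q = beat_feature(query_pitches)
    dists = {name: np.linalg.norm(q - vec) for name, vec in database.items()}
    return sorted(dists, key=dists.get)
```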

Journal ArticleDOI
TL;DR: Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very large problems.
Abstract: A linear support vector machine formulation is used to generate a fast, finitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of sufficiently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which contains a chunk of the data. We prove that this procedure is monotonic and terminates in a finite number of steps at an exact solution that leads to an optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, confirm the theoretical results and demonstrate the ability to handle very large problems.

Proceedings ArticleDOI
01 Feb 2000
TL;DR: An approach for retrieving images from an image database based on content similarity using a multiple-instance learning method known as the diverse density algorithm, which tries to retrieve images with similar feature vectors from the remainder of the database.
Abstract: In this paper, we develop and test an approach for retrieving images from an image database based on content similarity. First, each picture is divided into many overlapping regions. For each region, the sub-picture is filtered and converted into a feature vector. In this way, each picture is represented by a number of different feature vectors. The user selects positive and negative image examples to train the system. During the training, a multiple-instance learning method known as the diverse density algorithm is employed to determine which feature vector in each image best represents the user's concept, and which dimensions of the feature vectors are important. The system tries to retrieve images with similar feature vectors from the remainder of the database. A variation of the weighted correlation statistic is used to determine image similarity. The approach is tested on a medium-sized database of natural scenes as well as single- and multiple-object images.

Proceedings ArticleDOI
01 Jul 2000
TL;DR: An approach to the design of an automatic text summarizer that generates a summary by extracting sentence segments that compares very favorably with other approaches in terms of precision, recall, and classification accuracy.
Abstract: With the proliferation of the Internet and the huge amount of data it transfers, text summarization is becoming more important. We present an approach to the design of an automatic text summarizer that generates a summary by extracting sentence segments. First, sentences are broken into segments by special cue markers. Each segment is represented by a set of predefined features (e.g. location of the segment, average term frequencies of the words occurring in the segment, number of title words in the segment, and the like). Then a supervised learning algorithm is used to train the summarizer to extract important sentence segments, based on the feature vector. Results of experiments on U.S. patents indicate that the performance of the proposed approach compares very favorably with other approaches (including Microsoft Word summarizer) in terms of precision, recall, and classification accuracy.

Journal ArticleDOI
TL;DR: Both Bayesian classifiers and neural networks are employed to test the efficiency of the proposed feature and the achieved identification success using a long word exceeds 95%.

Patent
20 Sep 2000
TL;DR: In this paper, a method for indexing and retrieving manufacturing-specific digital images based on image content comprises three steps, which include two data reductions, the first performed based upon a query vector extracted from a query image, and the second level data reduction can result in a subset of feature vectors comparable to the prototype vector, and further comparable to query vector.
Abstract: A method for indexing and retrieving manufacturing-specific digital images based on image content comprises three steps. First, at least one feature vector can be extracted from a manufacturing-specific digital image stored in an image database. In particular, each extracted feature vector corresponds to a particular characteristic of the manufacturing-specific digital image, for instance, a digital image modality and overall characteristic, a substrate/background characteristic, and an anomaly/defect characteristic. Notably, the extracting step includes generating a defect mask using a detection process. Second, using an unsupervised clustering method, each extracted feature vector can be indexed in a hierarchical search tree. Third, a manufacturing-specific digital image associated with a feature vector stored in the hierarchical search tree can be retrieved, wherein the manufacturing-specific digital image has image content comparably related to the image content of the query image. More particularly, the retrieval step can include two data reductions, the first performed based upon a query vector extracted from a query image. Subsequently, a user can select relevant images resulting from the first data reduction. From the selection, a prototype vector can be calculated, from which a second-level data reduction can be performed. The second-level data reduction can result in a subset of feature vectors comparable to the prototype vector, and further comparable to the query vector. An additional fourth step can include managing the hierarchical search tree by substituting a vector average for several redundant feature vectors encapsulated by nodes in the hierarchical search tree.

Journal ArticleDOI
TL;DR: This paper develops a Bayesian formulation for the shot segmentation problem that is shown to extend the standard thresholding model in an adaptive and intuitive way, leading to improved segmentation accuracy.
Abstract: Content structure plays an important role in the understanding of video. In this paper, we argue that knowledge about structure can be used both as a means to improve the performance of content analysis and to extract features that convey semantic information about the content. We introduce statistical models for two important components of this structure, shot duration and activity, and demonstrate the usefulness of these models with two practical applications. First, we develop a Bayesian formulation for the shot segmentation problem that is shown to extend the standard thresholding model in an adaptive and intuitive way, leading to improved segmentation accuracy. Second, by applying the transformation into the shot duration/activity feature space to a database of movie clips, we also illustrate how the Bayesian model captures semantic properties of the content. We suggest ways in which these properties can be used as a basis for intuitive content-based access to movie libraries.

Journal ArticleDOI
TL;DR: It is shown how an efficient and reliable probabilistic metric derived from the Bhattacharyya distance can be used in order to classify the face feature vectors into person classes.

Journal ArticleDOI
TL;DR: A way of formulating neuro-fuzzy approaches for both feature selection and extraction under unsupervised learning of a fuzzy feature evaluation index for a set of features is demonstrated.
Abstract: Demonstrates a way of formulating neuro-fuzzy approaches for both feature selection and extraction under unsupervised learning. A fuzzy feature evaluation index for a set of features is defined in terms of degree of similarity between two patterns in both the original and transformed feature spaces. A concept of flexible membership function incorporating weighted distance is introduced for computing membership values in the transformed space. Two new layered networks are designed. The tasks of membership computation and minimization of the evaluation index, through an unsupervised learning process, are embedded into them without requiring the information on the number of clusters in the feature space. The network for feature selection results in an optimal order of individual importance of the features. The other one extracts a set of optimum transformed features, by projecting the n-dimensional original space directly to an n'-dimensional (n' < n) transformed space.