
Showing papers on "Feature (machine learning) published in 1996"


Journal ArticleDOI
TL;DR: This paper evaluates the performance both of some texture measures which have been successfully used in various applications and of some new promising approaches proposed recently.

6,650 citations


Book
01 Jan 1996
TL;DR: Professor Ripley brings together two crucial ideas in pattern recognition, statistical methods and machine learning via neural networks, in this self-contained account.
Abstract: From the Publisher: Pattern recognition has long been studied in relation to many different (and mainly unrelated) applications, such as remote sensing, computer vision, space research, and medical imaging. In this book Professor Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. Unifying principles are brought to the fore, and the author gives an overview of the state of the subject. Many examples are included to illustrate real problems in pattern recognition and how to overcome them. This is a self-contained account, ideal both as an introduction for non-specialist readers and as a handbook for the more expert reader.

5,632 citations


Journal ArticleDOI
TL;DR: This work considers the problem of predicting the mutagenic activity of small molecules: a property that is related to carcinogenicity, and an important consideration in developing less hazardous drugs, and compares the predictive power of the logical theories constructed against benchmarks set by regression, neural, and tree-based methods.

353 citations


Journal ArticleDOI
TL;DR: A radial basis function network architecture is developed that learns the correlation of facial feature motion patterns and human expressions through a hierarchical approach which at the highest level identifies expressions, at the mid level determines motion of facial features, and at the low level recovers motion directions.
Abstract: In this paper a radial basis function network architecture is developed that learns the correlation of facial feature motion patterns and human expressions. We describe a hierarchical approach which at the highest level identifies expressions, at the mid level determines motion of facial features, and at the low level recovers motion directions. Individual expression networks were trained to recognize the "smile" and "surprise" expressions. Each expression network was trained by viewing a set of sequences of one expression for many subjects. The trained neural network was then tested for retention, extrapolation, and rejection ability. Success rates were 88% for retention, 88% for extrapolation, and 83% for rejection.

313 citations


Proceedings Article
William W. Cohen
04 Aug 1996
TL;DR: It is argued that many decision tree and rule learning algorithms can be easily extended to set-valued features, and it is shown by example that many real-world learning problems can be efficiently and naturally represented with set-valued features.
Abstract: In most learning systems examples are represented as fixed-length "feature vectors", the components of which are either real numbers or nominal values. We propose an extension of the feature-vector representation that allows the value of a feature to be a set of strings; for instance, to represent a small white and black dog with the nominal features size and species and the set-valued feature color, one might use a feature vector with size=small, species=canis-familiaris and color={white, black}. Since we make no assumptions about the number of possible set elements, this extension of the traditional feature-vector representation is closely connected to Blum's "infinite attribute" representation. We argue that many decision tree and rule learning algorithms can be easily extended to set-valued features. We also show by example that many real-world learning problems can be efficiently and naturally represented with set-valued features; in particular, text categorization problems and problems that arise in propositionalizing first-order representations lend themselves to set-valued features.

281 citations
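Cohen's set-valued representation above can be sketched in a few lines. The dictionary layout and the rule-matching helper are illustrative stand-ins, not code from the paper:

```python
# Sketch of a set-valued feature vector (after Cohen, 1996): ordinary
# nominal features plus one feature whose value is a set of strings.
example = {
    "size": "small",
    "species": "canis-familiaris",
    "color": {"white", "black"},   # the set-valued feature
}

def rule_matches(example, tests):
    """A rule condition on a set-valued feature checks set membership;
    a condition on a nominal feature checks equality. Illustrative only."""
    for feature, required in tests:
        value = example[feature]
        if isinstance(value, set):
            if required not in value:
                return False
        elif value != required:
            return False
    return True

print(rule_matches(example, [("size", "small"), ("color", "white")]))  # True
print(rule_matches(example, [("color", "brown")]))                     # False
```

The membership test is what lets a rule learner treat an unbounded vocabulary of set elements (e.g. the words of a document in text categorization) without fixing the feature space in advance.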


Proceedings ArticleDOI
18 Jun 1996
TL;DR: The results demonstrate that even in the absence of multiple training examples for each class, it is sometimes possible to infer, from a statistical model of the training data, a significantly improved distance function for use in pattern recognition.
Abstract: We consider the problem of feature-based face recognition in the setting where only a single example of each face is available for training. The mixture-distance technique we introduce achieves a recognition rate of 95% on a database of 685 people in which each face is represented by 30 measured distances. This is currently the best recorded recognition rate for a feature-based system applied to a database of this size. By comparison, nearest neighbor search using Euclidean distance yields 84%. In our work a novel distance function is constructed based on local second order statistics as estimated by modeling the training data as a mixture of normal densities. We report on the results from mixtures of several sizes. We demonstrate that a flat mixture of mixtures performs as well as the best model and therefore represents an effective solution to the model selection problem. A mixture perspective is also taken for individual Gaussians to choose between first order (variance) and second order (covariance) models. Here an approximation to flat combination is proposed and seen to perform well in practice. Our results demonstrate that even in the absence of multiple training examples for each class, it is sometimes possible to infer, from a statistical model of the training data, a significantly improved distance function for use in pattern recognition.

266 citations
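The intuition behind a statistically derived distance can be shown with the paper's "first order (variance)" case, greatly simplified to a single diagonal Gaussian. This is a hedged sketch, not the mixture-distance itself; all names are illustrative:

```python
def variance_distance(x, mean, var):
    """Squared distance under a diagonal ("first order") Gaussian model:
    each feature deviation is scaled by that feature's variance. A much
    simplified stand-in for the paper's mixture-based distance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

# Toy 3-feature example: Euclidean distance would score probe_b ten times
# farther than probe_a, but variance scaling down-weights the deviation in
# the noisy third feature, treating the two probes as equally close.
mean = [10.0, 20.0, 30.0]
var = [1.0, 1.0, 100.0]
probe_a = [11.0, 20.0, 30.0]   # off by 1 in a low-variance feature
probe_b = [10.0, 20.0, 40.0]   # off by 10 in a high-variance feature
print(variance_distance(probe_a, mean, var))  # 1.0
print(variance_distance(probe_b, mean, var))  # 1.0
```

The statistics (mean, variance) are estimated once from the pooled training data, which is why the approach works even with a single example per class.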


Book
15 Jul 1996
TL;DR: Genetic Algorithms for Pattern Recognition covers a broad range of applications in science and technology, describing the integration of genetic algorithms in pattern recognition and machine learning problems to build intelligent recognition systems.
Abstract: From the Publisher: Solving pattern recognition problems involves an enormous amount of computational effort. By applying genetic algorithms - a computational method based on the way chromosomes in DNA recombine - these problems are more efficiently and more accurately solved. Genetic Algorithms for Pattern Recognition covers a broad range of applications in science and technology, describing the integration of genetic algorithms in pattern recognition and machine learning problems to build intelligent recognition systems. The articles, written by leading experts from around the world, accomplish several objectives: they provide insight into the theory of genetic algorithms; they develop pattern recognition theory in light of genetic algorithms; and they illustrate applications in artificial neural networks and fuzzy logic. The cross-sectional view of current research presented in Genetic Algorithms for Pattern Recognition makes it a unique text, ideal for graduate students and researchers.

234 citations


Patent
18 Apr 1996
TL;DR: In a speech recognition system, a microphone converts an input utterance speech composed of a plurality of words into a speech signal, and a feature extractor extracts predetermined acoustic feature parameters from the converted speech signal as discussed by the authors.
Abstract: In a word clustering apparatus for clustering words, a plurality of words is clustered to obtain a total tree diagram of a word dictionary representing a word clustering result, where the total tree diagram includes tree diagrams of an upper layer, a middle layer and a lower layer. In a speech recognition apparatus, a microphone converts an input utterance speech composed of a plurality of words into a speech signal, and a feature extractor extracts predetermined acoustic feature parameters from the converted speech signal. Then, a speech recognition controller executes a speech recognition process on the extracted acoustic feature parameters with reference to a predetermined Hidden Markov Model and the obtained total tree diagram of the word dictionary, and outputs a result of the speech recognition.

204 citations


Proceedings ArticleDOI
C. Podilchuk, Xiaoyu Zhang
07 May 1996
TL;DR: An automatic face recognition system which is VQ-based is described and the effects of feature selection, feature dimensionality and codebook size on recognition performance in the VQ framework are examined.
Abstract: Face recognition has many applications ranging from security access to video indexing by content. We describe an automatic face recognition system which is VQ-based and examine the effects of feature selection, feature dimensionality and codebook size on recognition performance in the VQ framework. In particular, we examine DCT-based feature vectors in such a system. DCT-based feature vectors have the additional appeal that the recognition can be performed directly on the bitstream of compressed images which are DCT-based. The system described consists of three parts: a preprocessing step to segment the face, the feature selection process and the classification. Recognition rates for a database of 500 images show promising results.

198 citations


Book
01 Mar 1996
TL;DR: This volume provides a unified approach to the study of predictive learning, i.e., generalization from examples, and contains an up-to-date review and in-depth treatment of major issues and methods related to predictive learning in statistics, Artificial Neural Networks, and pattern recognition.
Abstract: This volume provides a unified approach to the study of predictive learning, i.e., generalization from examples. It contains an up-to-date review and in-depth treatment of major issues and methods related to predictive learning in statistics, Artificial Neural Networks (ANN), and pattern recognition. Topics range from theoretical modeling and adaptive computational methods to empirical comparisons between statistical and ANN methods, and applications. Most contributions fall into one of the three themes: unified framework for the study of predictive learning in statistics and ANNs; similarities and differences between statistical and ANN methods for nonparametric estimation (learning); and fundamental connections between artificial and biological learning systems.

163 citations


Proceedings ArticleDOI
18 Jun 1996
TL;DR: This paper describes an approach for integrating a large number of context-dependent features into a semi-automated tool that provides a learning algorithm for selecting and combining groupings of the data, where groupings can be induced by highly specialized features.
Abstract: Digital library access is driven by features, but the relevance of a feature for a query is not always obvious. This paper describes an approach for integrating a large number of context-dependent features into a semi-automated tool. Instead of requiring universal similarity measures or manual selection of relevant features, the approach provides a learning algorithm for selecting and combining groupings of the data, where groupings can be induced by highly specialized features. The selection process is guided by positive and negative examples from the user. The inherent combinatorics of using multiple features is reduced by a multistage grouping generation, weighting, and collection process. The stages closest to the user are trained fastest and slowly propagate their adaptations back to earlier stages. The weighting stage adapts the collection stage's search space across uses, so that, in later interactions, good groupings are found given few examples from the user.

Proceedings ArticleDOI
03 Oct 1996
TL;DR: This paper examines a maximum a-posteriori decoding strategy for feature-based recognizers and develops a normalization criterion that is useful for a segment-based Viterbi or A* search.
Abstract: Most current speech recognizers use an observation space which is based on a temporal sequence of "frames" (e.g. Mel-cepstra). There is another class of recognizer which further processes these frames to produce a segment-based network, and represents each segment by fixed-dimensional "features". In such feature-based recognizers, the observation space takes the form of a temporal network of feature vectors, so that a single segmentation of an utterance uses a subset of all possible feature vectors. In this paper, we examine a maximum a-posteriori decoding strategy for feature-based recognizers and develop a normalization criterion that is useful for a segment-based Viterbi or A* search. We report experimental results for the task of phonetic recognition on the TIMIT corpus, where we achieved context-independent and context-dependent (using diphones) results on the core test set of 64.1% and 69.5% respectively.

Patent
01 Oct 1996
TL;DR: In this article, a character recognition system includes a character input device such as a stylus and tablet or optical scanner, and a processor, which determines which of a number of model characters best matches the inputted character.
Abstract: A character recognition system includes a character input device, such as a stylus and tablet or optical scanner, for receiving inputted characters, and a processor. The processor determines which of a number of model characters best matches the inputted character. To that end, the processor compares each inputted character to each of a plurality of classes into which the model characters are organized. Specifically, the processor extracts a feature value vector from the inputted character, and compares it to the mean feature value vector of each class. The processor recognizes the inputted character as the model character corresponding to the mean feature value vector which is closest to the feature value vector of the inputted character. The processor also constructs the database from multiple specimens of each model character. The processor organizes the specimens of each model character into multiple classes. The processor then determines the mean feature value vector of each class.
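The patent's matching step is a nearest-mean classifier over class mean feature vectors. A minimal sketch, with illustrative names and toy 2-dimensional features rather than anything taken from the patent:

```python
def nearest_mean_class(feature_vector, class_means):
    """Assign the input character to the class whose mean feature value
    vector is closest in squared Euclidean distance, mirroring the
    comparison step described in the patent. Names are illustrative."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(class_means, key=lambda c: sqdist(feature_vector, class_means[c]))

# Toy database: mean feature vectors built from specimens of each model
# character (here just two classes with 2 features each).
class_means = {"A": [0.9, 0.1], "B": [0.1, 0.9]}
print(nearest_mean_class([0.8, 0.2], class_means))  # A
print(nearest_mean_class([0.2, 0.8], class_means))  # B
```

In the patent the means come from organizing multiple specimens of each model character into classes; here they are given directly to keep the sketch short.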

PatentDOI
TL;DR: In this paper, a language identification and verification system is described whereby language identification is determined by finding the closest match of a speech utterance to multiple speaker sets, and a language decision is arrived at based on the closest match between the unknown speech features and the speech features of well-matched reference speakers in a particular language.
Abstract: A language identification and verification system is described whereby language identification is determined by finding the closest match of a speech utterance to multiple speaker sets. The language identification and verification system is implemented through use of a speaker identification/verification system as a baseline to find a set of well-matched speakers in each of a plurality of languages. A comparison of unknown speech to speech features from such well-matched speakers is then made, and a language decision is arrived at based on the closest match between the unknown speech features and the speech features of such well-matched reference speakers in a particular language. Prior-art language identification systems base speech features on short-term spectral features determined at a system frame rate, which seriously limits their resolution and accuracy; to avoid this problem, the invention uses speech features derived from vocalic or syllabic nuclei, from which related phonetic speech features may then be extracted. Detection of such vocalic centers or syllabic nuclei is accomplished using a trained back-error-propagation multi-level neural network.

Journal ArticleDOI
J. Bala, K. De Jong, J. Huang, Halleh Vafaie, Harry Wechsler
TL;DR: Experimental results are presented to show how increasing the amount of learning significantly improves feature set evolution for difficult visual recognition problems involving satellite and facial image data.
Abstract: This paper describes a hybrid methodology that integrates genetic algorithms (GAs) and decision tree learning in order to evolve useful subsets of discriminatory features for recognizing complex visual concepts. A GA is used to search the space of all possible subsets of a large set of candidate discrimination features. Candidate feature subsets are evaluated by using C4.5, a decision tree learning algorithm, to produce a decision tree based on the given features using a limited amount of training data. The classification performance of the resulting decision tree on unseen testing data is used as the fitness of the underlying feature subset. Experimental results are presented to show how increasing the amount of learning significantly improves feature set evolution for difficult visual recognition problems involving satellite and facial image data. In addition, we also report on the extent to which other more subtle aspects of the Baldwin effect are exhibited by the system.
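The GA search over feature subsets can be sketched as a bitstring evolution loop. This is a toy version under stated assumptions: in the paper, fitness is the test accuracy of a C4.5 tree built on the candidate subset; here `fitness` is any caller-supplied stand-in for that evaluation, and the operators are generic one-point crossover and point mutation:

```python
import random

random.seed(0)  # deterministic toy run

def evolve_feature_subsets(num_features, fitness, pop_size=20, generations=30):
    """Toy GA over feature-subset bitstrings (1 = feature included).
    `fitness` plays the role of the paper's C4.5 evaluation step."""
    pop = [[random.randint(0, 1) for _ in range(num_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # truncation selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, num_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(num_features)] ^= 1  # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Stand-in fitness: features 0-2 are discriminative, the rest add noise,
# so selection should keep the first three bits on.
best = evolve_feature_subsets(8, lambda bits: sum(bits[:3]) - 0.1 * sum(bits[3:]))
print(best)
```

The design mirrors the paper's wrapper structure (GA proposes subsets, an external learner scores them) without committing to its specific GA parameters.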

Journal ArticleDOI
TL;DR: A generic approach for solving problems in pattern recognition based on the synthesis of accurate multiclass discriminators from large numbers of very inaccurate weak models through the use of discrete stochastic processes is introduced.
Abstract: We will introduce a generic approach for solving problems in pattern recognition based on the synthesis of accurate multiclass discriminators from large numbers of very inaccurate "weak" models through the use of discrete stochastic processes. Contrary to the standard expectation held for the many statistical and heuristic techniques normally associated with the field, a significant feature of this method of "stochastic modeling" is its resistance to so-called "overtraining." The drop in performance of any stochastic model in going from training to test data remains comparable to that of the component weak models from which it is synthesized; and since these component models are very simple, their performance drop is small, resulting in a stochastic model whose performance drop is also small despite its high level of accuracy.

Book
01 Feb 1996
TL;DR: In this article, the basic ideas of and some synergisms between probabilistic, fuzzy, and computational neural networks models as they apply to pattern recognition are discussed and a brief discussion of the relationship of both approaches to statistical pattern recognition methodologies is provided.
Abstract: Fuzzy sets were introduced by Zadeh in 1965 to represent and manipulate data and information that possess nonstatistical uncertainty. Computational neural networks were first discussed by McCulloch and Pitts in 1943 as a means of imitating the power of biologic systems for data and information processing. Probabilistic models for data analysis are, of course, several hundred years old. This article discusses the basic ideas of and some synergisms between probabilistic, fuzzy, and computational neural networks models as they apply to pattern recognition. We also provide a brief discussion of the relationship of both approaches to statistical pattern recognition methodologies.

Patent
Hirohiko Sagawa, Masahiro Abe
09 Sep 1996
TL;DR: In this paper, a continuous sign-language recognition apparatus and input apparatus are described which employ expressions of template patterns of sign-language words to realize high-speed and highly accurate sign-language recognition.
Abstract: A continuous sign-language recognition apparatus and input apparatus which employ expressions of template patterns of sign-language words to realize high-speed and highly accurate sign-language recognition. Each of the component patterns constituting a template pattern of a sign-language word is expressed by a feature vector representing the pattern when the pattern is a static pattern, or by a time-series pattern of a feature vector when the pattern is a dynamic pattern. Also, using the template patterns of sign-language words, different matching methods are applied to the static and dynamic patterns for recognizing each component pattern, and the respective results are integrated on the basis of a temporal overlap of the respective component patterns to continuously recognize sign-language words.

Journal ArticleDOI
TL;DR: The development of a high-performance alphabet recognizer that has been evaluated on studio quality as well as on telephone-bandwidth speech is presented, based on context-dependent phoneme hidden Markov models (HMMs), which have been found to outperform whole-word models by as much as 8%.
Abstract: Alphabet recognition is needed in many applications for retrieving information associated with the spelling of a name, such as telephone numbers, addresses, etc. This is a difficult recognition task due to the acoustic similarities existing between letters in the alphabet (e.g., the E-set letters). This paper presents the development of a high-performance alphabet recognizer that has been evaluated on studio quality as well as on telephone-bandwidth speech. Unlike previously proposed systems, the alphabet recognizer presented is based on context-dependent phoneme hidden Markov models (HMMs), which have been found to outperform whole-word models by as much as 8%. The proposed recognizer incorporates a series of new approaches to tackle the problems associated with the confusions occurring between the stop consonants in the E-set and the confusions between the nasals (i.e., letters M and N). First, a new feature representation is proposed for improved stop consonant discrimination, and second, two subspace approaches are proposed for improved nasal discrimination. The subspace approach was found to yield a 45% error-rate reduction in nasal discrimination. Various other techniques are also proposed, yielding a 97.3% speaker-independent performance on alphabet recognition and 95% speaker-independent performance on E-set recognition. A telephone alphabet recognizer was also developed using context-dependent HMMs. When tested on the recognition of 300 last names (which are contained in a list of 50,000 common last names) spelled by 300 speakers, the recognizer achieved 91.7% correct letter recognition with 1.1% letter insertions.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: This paper considers a pattern recognition approach to accurate camera-based color measurements using three common color spaces and two sets of test images to evaluate Swain and Ballard's color indexing method.
Abstract: This paper considers a pattern recognition approach to accurate camera-based color measurements. The performances of Swain and Ballard's color indexing method based on 3-dimensional histograms and of a simplified version based on three 1-dimensional histograms are evaluated using three common color spaces and two sets of test images.
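The evaluated matching measure, Swain and Ballard's histogram intersection, has a compact standard formulation: sum the bin-wise minima of the image and model histograms, normalized by the model's total count. A minimal sketch (toy 4-bin histograms, not the paper's color spaces):

```python
def histogram_intersection(image_hist, model_hist):
    """Swain-and-Ballard-style histogram intersection: for each bin take
    the smaller of the two counts, sum, and normalize by the model's
    total count, giving a match score in [0, 1]."""
    return (sum(min(i, m) for i, m in zip(image_hist, model_hist))
            / sum(model_hist))

# Toy 4-bin color histograms.
model = [10, 0, 5, 5]
image = [8, 2, 5, 5]
print(histogram_intersection(image, model))  # 0.9
print(histogram_intersection(model, model))  # 1.0
```

The paper's comparison of 3-dimensional versus three 1-dimensional histograms changes only how the bins are formed; the intersection score itself is computed the same way.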

Journal ArticleDOI
TL;DR: An integrated approach to feature and architecture selection for single-hidden-layer feedforward neural networks trained via backpropagation is presented, which employs a likelihood-ratio test statistic as a model selection criterion.
Abstract: In this paper, we present an integrated approach to feature and architecture selection for single-hidden-layer feedforward neural networks trained via backpropagation. In our approach, we adopt a statistical model building perspective in which we analyze neural networks within a nonlinear regression framework. The algorithm presented in this paper employs a likelihood-ratio test statistic as a model selection criterion. This criterion is used in a sequential procedure aimed at selecting the best neural network given an initial architecture as determined by heuristic rules. Application results for an object recognition problem demonstrate the selection algorithm's effectiveness in identifying reduced neural networks with equivalent prediction accuracy.

Proceedings ArticleDOI
07 May 1996
TL;DR: The discriminatory power of different segments of a human face is studied and a new scheme for face recognition is proposed: an efficient projection-based feature extraction and classification scheme for recognition of human faces.
Abstract: The discriminatory power of different segments of a human face is studied and a new scheme for face recognition is proposed. We first focus on the linear discriminant analysis (LDA) of human faces in spatial and wavelet domains, which enables us to objectively evaluate the significance of visual information in different parts of the face for identifying the person. The results of this study can be compared with subjective psychovisual findings. The LDA of faces also provides us with a small set of features that carry the most relevant information for face recognition. The features are obtained through the eigenvector analysis of scatter matrices with the objective of maximizing between-class variations and minimizing within-class variations. The result is an efficient projection-based feature extraction and classification scheme for recognition of human faces. For a midsize database of faces, excellent classification accuracy is achieved with only four features.
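The scatter-matrix objective (maximize between-class variation, minimize within-class variation) reduces, for a single feature and two classes, to the Fisher criterion. A minimal sketch of that one-dimensional case; the data and function names are illustrative, not from the paper:

```python
def fisher_score(values_a, values_b):
    """Per-feature Fisher criterion: between-class variation divided by
    within-class variation. High scores mark features that separate the
    two classes well, the same objective the paper's scatter-matrix
    eigenanalysis optimizes in the multi-dimensional case."""
    def mean(xs): return sum(xs) / len(xs)
    def var(xs, m): return sum((x - m) ** 2 for x in xs) / len(xs)
    ma, mb = mean(values_a), mean(values_b)
    within = var(values_a, ma) + var(values_b, mb)
    return (ma - mb) ** 2 / within

# A feature that separates the two classes vs. one that does not.
good = fisher_score([1.0, 1.1, 0.9], [3.0, 3.1, 2.9])
bad = fisher_score([1.0, 3.0, 2.0], [1.1, 2.9, 2.1])
print(good > bad)  # True
```

The full LDA generalizes this ratio to vectors via the between- and within-class scatter matrices, whose leading eigenvectors give the small set of projection features the abstract mentions.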

Journal ArticleDOI
TL;DR: Computer simulation results with 80 test images of 20 persons show that the proposed neuro-fuzzy method yields higher recognition rate than the conventional ones.

Journal Article
TL;DR: This paper formally distinguishes three types of features: primary, contextual, and irrelevant features, and formally defines what it means for one feature to be context-sensitive to another feature.
Abstract: A large body of research in machine learning is concerned with supervised learning from examples. The examples are typically represented as vectors in a multi- dimensional feature space (also known as attribute-value descriptions). A teacher partitions a set of training examples into a finite number of classes. The task of the learning algorithm is to induce a concept from the training examples. In this paper, we formally distinguish three types of features: primary, contextual, and irrelevant features. We also formally define what it means for one feature to be context-sensitive to another feature. Context-sensitive features complicate the task of the learner and potentially impair the learner's performance. Our formal definitions make it possible for a learner to automatically identify context-sensitive features. After context-sensitive features have been identified, there are several strategies that the learner can employ for managing the features; however, a discussion of these strategies is outside of the scope of this paper. The formal definitions presented here correct a flaw in previously proposed definitions. We discuss the relationship between our work and a formal definition of relevance.


Journal ArticleDOI
Diane J. Litman1
TL;DR: This article used machine learning for classifying cue phrases as discourse or sentential in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition.
Abstract: Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: A comparison between an off-line and an on-line recognition system using the same databases and system design is presented, which uses a sliding window technique which avoids any segmentation before recognition.
Abstract: Off-line handwriting recognition has wider applications than on-line recognition, yet it seems to be a harder problem. While on-line recognition is based on pen trajectory data, off-line recognition has to rely on pixel data only. We present a comparison between an off-line and an on-line recognition system using the same databases and system design. Both systems use a sliding window technique which avoids any segmentation before recognition. The recognizer is a hybrid system containing a neural network and a hidden Markov model. New normalization and feature extraction techniques for the off-line recognition are presented, including a connectionist approach for non-linear core height estimation. Results for uppercase, cursive and mixed case word recognition are reported. Finally a system combining the on- and off-line recognition is presented.

Proceedings ArticleDOI
03 Oct 1996
TL;DR: A novel approach to speech recognition which is based on phonetic features as basic recognition units and the delayed synchronisation of these features within a higher-level prosodic domain, viz. the syllable is described.
Abstract: Describes a novel approach to speech recognition which is based on phonetic features as basic recognition units and the delayed synchronisation of these features within a higher-level prosodic domain, viz. the syllable. The object of this approach is to avoid a rigid segmentation of the speech signal as it is usually carried out by standard segment-based recognition systems. The architectural setup of the system is described, as well as evaluation tests carried out on a medium-sized corpus of spontaneous speech (German). Syllable and phoneme recognition results are given and compared to recognition rates obtained by a standard triphone-based HMM recogniser trained and tested on the same data set.

Journal ArticleDOI
TL;DR: This paper looks at LSI from a different perspective, comparing it to statistical regression and Bayesian methods; the relationships found can be useful in explaining the performance of LSI and in suggesting variations on the LSI approach.
Abstract: Latent Semantic Indexing (LSI) is an effective automated method for determining if a document is relevant to a reader based on a few words or an abstract describing the reader's needs. A particular feature of LSI is its ability to deal automatically with synonyms. LSI generally is explained in terms of a mathematical concept called the Singular Value Decomposition and statistical methods such as factor analysis. This paper looks at LSI from a different perspective, comparing it to statistical regression and Bayesian methods. The relationships found can be useful in explaining the performance of LSI and in suggesting variations on the LSI approach.
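The Singular Value Decomposition machinery behind LSI can be stated compactly. This is the standard truncated-SVD formulation of LSI, not notation taken from this particular paper:

```latex
% A is the t x d term-document matrix; keep only the k largest
% singular values to obtain the rank-k latent-semantic approximation:
\[
A \;\approx\; A_k \;=\; U_k \,\Sigma_k\, V_k^{\mathsf{T}},
\qquad
\hat{q} \;=\; \Sigma_k^{-1} U_k^{\mathsf{T}} q ,
\]
% where q is a query's term vector, \hat{q} its k-dimensional latent
% representation, and documents (rows of V_k) are ranked by cosine
% similarity to \hat{q}. Synonyms end up near each other in the
% latent space, which is how LSI handles them automatically.
```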

Proceedings Article
Paul Davidsson1
04 Jun 1996
TL;DR: A novel method for learning characteristic decision trees is applied to the problem of learning the decision mechanism of coin-sorting machines by augmenting each leaf of the decision tree with a subtree that imposes further restrictions on the values of each feature in that leaf.
Abstract: A novel method for learning characteristic decision trees is applied to the problem of learning the decision mechanism of coin-sorting machines. Decision trees constructed by ID3-like algorithms are unable to detect instances of categories not present in the set of training examples. Instead of being rejected, such instances are assigned to one of the classes actually present in the training set. To solve this problem the algorithm must learn characteristic, rather than discriminative, category descriptions. In addition, the ability to control the degree of generalization is identified as an essential property of such algorithms. A novel method using the information about the statistical distribution of the feature values that can be extracted from the training examples is developed to meet these requirements. The central idea is to augment each leaf of the decision tree with a subtree that imposes further restrictions on the values of each feature in that leaf.
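The central idea, restricting each leaf by the feature-value distribution of the examples that reached it, can be sketched with a simple per-feature range test. This is one plausible instance of such a restriction, with illustrative names and toy coin data, not Davidsson's exact subtree construction:

```python
def leaf_limits(training_examples):
    """Per-feature [min, max] ranges of the training examples that reached
    a leaf; the paper augments each leaf with restrictions of this kind,
    extracted from the statistical distribution of feature values."""
    return [(min(col), max(col)) for col in zip(*training_examples)]

def accept(instance, limits, slack=0.0):
    """Accept only instances inside the (optionally widened) ranges;
    `slack` plays the role of the controllable degree of generalization.
    Instances outside every known category's ranges get rejected."""
    return all(lo - slack <= x <= hi + slack
               for x, (lo, hi) in zip(instance, limits))

# Coins that reached a hypothetical "10 cents" leaf: (diameter, weight).
limits = leaf_limits([(17.9, 2.2), (18.0, 2.3), (18.1, 2.25)])
print(accept((18.0, 2.24), limits))  # True: inside the observed ranges
print(accept((21.0, 5.0), limits))   # False: unseen coin type, rejected
```

A plain ID3-style tree would have forced the (21.0, 5.0) coin into one of the known classes; the leaf restriction is what turns the discriminative description into a characteristic one.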