Showing papers on "Decision tree model published in 2000"


Proceedings ArticleDOI
31 Jul 2000
TL;DR: Initial results are presented showing that a tree-based model derived from a tree-annotated corpus improves on a tree model derived from an unannotated corpus, and that a tree-based stochastic model with a hand-crafted grammar outperforms both.
Abstract: Previous stochastic approaches to generation do not include a tree-based representation of syntax. While this may be adequate or even advantageous for some applications, other applications profit from using as much syntactic knowledge as is available, leaving to a stochastic model only those issues that are not determined by the grammar. We present initial results showing that a tree-based model derived from a tree-annotated corpus improves on a tree model derived from an unannotated corpus, and that a tree-based stochastic model with a hand-crafted grammar outperforms both.

209 citations


Proceedings Article
11 Apr 2000

144 citations


Journal ArticleDOI
TL;DR: A new (easier and more modular) proof of the results of BNS and Chung is given, yielding a simpler way to prove lower bounds for the multi-party communication complexity of a function.
Abstract: The "Number on the Forehead" model of multi-party communication complexity was first suggested by Chandra, Furst and Lipton. The best known lower bound, for an explicit function (in this model), is a lower bound of $ \Omega(n/2^k) $ , where n is the size of the input of each player, and k is the number of players (first proved by Babai, Nisan and Szegedy). This lower bound has many applications in complexity theory. Proving a better lower bound, for an explicit function, is a major open problem. Based on the result of BNS, Chung gave a sufficient criterion for a function to have large multi-party communication complexity (up to $ \Omega(n/2^k) $ ). In this paper, we use some of the ideas of BNS and Chung, together with some new ideas, resulting in a new (easier and more modular) proof for the results of BNS and Chung. This gives a simpler way to prove lower bounds for the multi-party communication complexity of a function.

87 citations


Patent
03 Feb 2000
TL;DR: A method and apparatus are presented for automatically locating sources of semantic error in a multi-agent system based on setup connection tree information, and for informing the appropriate agents so that they can avoid using the faulty resources in the future.
Abstract: A method and apparatus for automatically locating sources of semantic error in a multi-agent system based on setup connection tree information, and informing the appropriate agents so that they can avoid using the faulty resources in the future. The setup connection tree model is established based on patterns of agent actions for expressing the logical relationship between available resources in the disjunctive normal form (d.n.f.). A table is used to record different sets of resources for use in the resource selection process. Thus, faulty resources can be located by means of induction. A global database is also maintained for updating information on semantic errors in the system.

63 citations


Journal ArticleDOI
W. Reichl, Wu Chou
TL;DR: A new two-level segmental clustering approach is devised which combines the decision tree based state tying with agglomerative clustering of rare acoustic phonetic events.
Abstract: Methods of improving the robustness and accuracy of acoustic modeling using decision tree based state tying are described. A new two-level segmental clustering approach is devised which combines the decision tree based state tying with agglomerative clustering of rare acoustic phonetic events. In addition, a unified maximum likelihood framework for incorporating both phonetic and nonphonetic features in decision tree based state tying is presented. In contrast to other heuristic data separation methods, which often lead to training data depletion, a tagging scheme is used to attach various features of interest and the selection of these features in the decision tree is data driven. Finally, two methods of using multiple-mixture parameterization to improve the quality of the evaluation function in decision tree state tying are described. One method is based on the approach of k-means fitting and the other method is based on a novel use of a local multilevel optimal subtree. Both methods provide more accurate likelihood evaluation in decision tree clustering and are consistent with the structure of the decision tree. Experimental results on Wall Street Journal corpora demonstrate that the proposed approaches lead to a significant improvement in model quality and recognition performance.

59 citations
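
As a concrete illustration of the likelihood criterion that drives decision tree based state tying, the sketch below computes the standard single-Gaussian log-likelihood approximation for a pooled cluster of HMM states and the gain of a candidate question. The function names and the toy occupancy/mean/variance statistics are hypothetical, and the paper's multiple-mixture refinements and non-phonetic tagging scheme are not modelled.

```python
import numpy as np

def pooled_stats(occ, means, variances):
    """Pool per-state sufficient statistics (occupancy, mean, diagonal variance)."""
    occ = np.asarray(occ, dtype=float)            # shape (S,)
    means = np.asarray(means, dtype=float)        # shape (S, d)
    variances = np.asarray(variances, dtype=float)
    gamma = occ.sum()
    mu = (occ[:, None] * means).sum(axis=0) / gamma
    # Pooled E[x^2] minus the pooled mean squared gives the pooled variance.
    second = (occ[:, None] * (variances + means ** 2)).sum(axis=0) / gamma
    return gamma, mu, second - mu ** 2

def cluster_loglik(occ, means, variances):
    """Single-Gaussian log-likelihood approximation for a tied cluster of states."""
    gamma, _, var = pooled_stats(occ, means, variances)
    d = var.size
    return -0.5 * gamma * (d * np.log(2.0 * np.pi) + d + np.log(var).sum())

def question_gain(occ, means, variances, answer_yes):
    """Likelihood gain of splitting a cluster by a (hypothetical) phonetic question."""
    yes = np.asarray(answer_yes, dtype=bool)
    parent = cluster_loglik(occ, means, variances)
    child_yes = cluster_loglik(occ[yes], means[yes], variances[yes])
    child_no = cluster_loglik(occ[~yes], means[~yes], variances[~yes])
    return child_yes + child_no - parent

# Toy example: four context-dependent states with 2-dimensional features.
occ = np.array([120.0, 80.0, 150.0, 60.0])
means = np.array([[0.1, 0.2], [0.0, 0.3], [1.1, -0.4], [1.0, -0.5]])
variances = np.ones((4, 2))
print(question_gain(occ, means, variances, answer_yes=[True, True, False, False]))
```

In tree growing, the question with the largest positive gain is chosen at each node and splitting stops once the gain falls below a threshold.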


01 May 2000
TL;DR: Three results in the context of surface realization are presented: a stochastic tree model derived from a parsed corpus outperforms a tree model derived from an unannotated corpus, and exploiting a hand-crafted grammar in conjunction with a tree model outperforms a tree model without a grammar.
Abstract: Srinivas Bangalore and Owen Rambow, AT&T Labs-Research, B233, 180 Park Ave, PO Box 971, Florham Park, NJ 07932-0971, USA; srini,rambow@research.att.com. Previous stochastic approaches to sentence realization do not include a tree-based representation of syntax. While this may be adequate or even advantageous for some applications, other applications profit from using as much syntactic knowledge as is available, leaving to a stochastic model only those issues that are not determined by the grammar. In this paper, we present three results in the context of surface realization: a stochastic tree model derived from a parsed corpus outperforms a tree model derived from an unannotated corpus; exploiting a hand-crafted grammar in conjunction with a tree model outperforms a tree model without a grammar; and exploiting a tree model in conjunction with a linear language model outperforms just the tree model.

49 citations


Journal ArticleDOI
TL;DR: The class of all information systems (finite and infinite) is described for which this algorithm has polynomial time complexity depending on the number of columns (attributes) in decision tables.
Abstract: An algorithm is considered which for a given decision table constructs a decision tree with minimal depth. The class of all information systems (finite and infinite) is described for which this algorithm has polynomial time complexity depending on the number of columns (attributes) in decision tables.

38 citations
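
For readers unfamiliar with the object being optimized, the sketch below computes the exact minimal depth of a decision tree for a tiny decision table by brute-force recursion with memoization. It is a reference implementation for toy tables only (exponential in general), not the polynomial-time algorithm analyzed in the paper; the table contents are made up.

```python
from functools import lru_cache

# A decision table: each row is (attribute values, decision label).
ROWS = [
    ((0, 0, 1), 'a'),
    ((0, 1, 1), 'a'),
    ((1, 0, 0), 'b'),
    ((1, 1, 0), 'c'),
    ((0, 1, 0), 'c'),
]
NUM_ATTRS = 3

@lru_cache(maxsize=None)
def min_depth(row_ids):
    """Exact minimal depth of a decision tree separating the given rows."""
    decisions = {ROWS[i][1] for i in row_ids}
    if len(decisions) <= 1:
        return 0                      # a single leaf suffices
    best = float('inf')
    for a in range(NUM_ATTRS):
        groups = {}
        for i in row_ids:
            groups.setdefault(ROWS[i][0][a], []).append(i)
        if len(groups) <= 1:
            continue                  # attribute does not split these rows
        depth = 1 + max(min_depth(tuple(sorted(g))) for g in groups.values())
        best = min(best, depth)
    return best

print(min_depth(tuple(range(len(ROWS)))))   # minimal depth for the toy table
```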


Proceedings Article
30 Jun 2000
TL;DR: A Markov random field (MRF) approach based on frequent sets and maximum entropy is studied, and it is found that the MRF model provides substantially more accurate probability estimates than the other methods but is more expensive from a computational and memory viewpoint.
Abstract: Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents containing words are just three typical examples. Real-time query selectivity estimation (the problem of estimating the number of rows in the data satisfying a given predicate) is an important practical problem for such databases. We investigate the application of probabilistic models to this problem. In particular, we study a Markov random field (MRF) approach based on frequent sets and maximum entropy, and compare it to the independence model and the Chow-Liu tree model. We find that the MRF model provides substantially more accurate probability estimates than the other methods but is more expensive from a computational and memory viewpoint. To alleviate the computational requirements we show how one can apply bucket elimination and clique tree approaches to take advantage of structure in the models and in the queries. We provide experimental results on two large real-world transaction datasets.

30 citations
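
To make the selectivity-estimation task concrete, the sketch below compares the independence-model estimate for a conjunctive query against the true selectivity on synthetic binary transaction data; the item probabilities and query are made up, and the MRF and Chow-Liu models discussed above are not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical binary transaction matrix: rows are baskets, columns are items.
data = (rng.random((10000, 4)) < [0.30, 0.10, 0.25, 0.05]).astype(np.uint8)

def independence_estimate(data, query_items):
    """Estimate P(all items in query_items are 1) assuming item independence."""
    marginals = data[:, query_items].mean(axis=0)
    return float(np.prod(marginals))

def true_selectivity(data, query_items):
    """Exact fraction of rows satisfying the conjunctive predicate."""
    return float(data[:, query_items].all(axis=1).mean())

query = [0, 1, 2]   # predicate: item0 = 1 AND item1 = 1 AND item2 = 1
print("independence estimate:", independence_estimate(data, query))
print("true selectivity     :", true_selectivity(data, query))
```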


Proceedings Article
20 Aug 2000
TL;DR: A strong form of the tree model property is used to boost the performance of resolution-based first-order theorem provers on the so-called relational translations of modal formulas.
Abstract: We use a strong form of the tree model property to boost the performance of resolution-based first-order theorem provers on the so-called relational translations of modal formulas. We provide both the mathematical underpinnings and experimental results concerning our improved translation method.

21 citations


Book ChapterDOI
TL;DR: The class of all information systems (finite and infinite) is described for which this algorithm has polynomial time complexity depending on the number of columns (attributes) in decision tables.
Abstract: An algorithm is considered which for a given decision table constructs a decision tree with minimal number of nodes. The class of all information systems (finite and infinite) is described for which this algorithm has polynomial time complexity depending on the number of columns (attributes) in decision tables.

14 citations


Proceedings Article
28 Jun 2000
TL;DR: Proper PAC-learnability of decision trees under the uniform distribution without membership queries is established: the constructed decision tree is at most the size of the smallest decision tree that can represent f, and the construction can be obtained in quasi-polynomial time.
Abstract: Decision trees are popular representations of Boolean functions. We show that, given an alternative representation of a Boolean function f, say as a read-once branching program, one can find a decision tree T which approximates f to any desired amount of accuracy. Moreover, the size of the decision tree is at most that of the smallest decision tree which can represent f and this construction can be obtained in quasi-polynomial time. We also extend this result to the case where one has access only to a source of random evaluations of the Boolean function f instead of a complete representation. In this case, we show that a similar approximation can be obtained with any specified amount of confidence (as opposed to the absolute certainty of the former case.) This latter result implies proper PAC-learnability of decision trees under the uniform distribution without using membership queries.

Book ChapterDOI
21 Jun 2000
TL;DR: Boosting of tree-based classifiers has been interfaced to the GIS GRASS to create predictive classification models from digital maps and the best performance is obtained without controlling tree sizes, which indicates that there is a strong interaction between input variables.
Abstract: Boosting of tree-based classifiers has been interfaced to the Geographical Information System (GIS) GRASS to create predictive classification models from digital maps. On a risk management problem in landscape ecology, the boosted tree model performs better than either a single classifier or bagging. This results in an improved digital map of the risk of human exposure to tick-borne diseases in Trentino (Italian Alps), given sampling on 388 sites and the use of several overlaid georeferenced databases. Margin distributions are compared for bagging and boosting. Boosting is confirmed to give the most accurate model on two additional and independent test sets of reported cases of bites on humans and of infestation measured on roe deer. An interesting feature of combining classification models within a GIS is the visualization through maps of the single elements of the combination: each boosting step map focuses on different details of the data distribution. In this problem, the best performance is obtained without controlling tree sizes, which indicates that there is a strong interaction between input variables.
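
A minimal sketch of boosting tree-based classifiers in the spirit of the study above, using scikit-learn's AdaBoost with decision trees on made-up site covariates; the feature semantics and labels are hypothetical, and no GIS interface is involved. A depth limit is placed on the base trees only so that the toy example does not fit its training set in a single round, whereas the study reports its best results without controlling tree size.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical site covariates (e.g. elevation, slope, vegetation index).
X = rng.normal(size=(388, 3))
# Hypothetical presence/absence labels with some dependence on the covariates.
y = ((X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=388)) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3), n_estimators=50)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```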

Book ChapterDOI
01 Jan 2000
TL;DR: Missing values of a partial dissimilarity are estimated using a tree model, and the quality of the estimated values is measured using simulated noisy tree distances.
Abstract: In phylogeny, one tries to approximate a given dissimilarity by a tree distance. In some cases, especially when comparing biological sequences, some dissimilarity values cannot be evaluated and only a partial dissimilarity with undefined values is available. In that case one can develop a sequential method to reconstruct a weighted tree or to evaluate the missing values using a tree model. In this paper we study the latter approach and measure the quality of the estimated values using simulated noisy tree distances.

01 Jan 2000
TL;DR: This thesis investigates variable complexity algorithms and proposes two fast algorithms based on fast distance metric computation or fast matching approaches that allow computational scalability in distance computation with graceful degradation in the overall image quality.
Abstract: In this thesis we investigate variable complexity algorithms. The complexities of these algorithms are input-dependent, i.e., the type of input determines the complexity required to complete the operation. The key idea is to enable the algorithm to classify the inputs so that unnecessary operations can be pruned. The goal of the design of a variable complexity algorithm is to minimize the average complexity over all possible input types, including the cost of classifying the inputs. We study two of the fundamental operations in standard image/video compression, namely, the discrete cosine transform (DCT) and motion estimation (ME). We first explore variable complexity in the inverse DCT by testing for zero inputs. The test structure can also be optimized for minimal total complexity for given input statistics. In this case, the larger the number of zero coefficients, i.e., the coarser the quantization stepsize, the greater the complexity reduction. As a consequence, tradeoffs between complexity and distortion can be achieved. For the direct DCT we propose a variable complexity fast approximation algorithm. The variable complexity part computes only those DCT coefficients that will not be quantized to zero according to the classification results (in addition, the quantizer can benefit from this information by bypassing its operations for zero coefficients). The classification structure can also be optimized for given input statistics. On the other hand, the fast approximation part approximates the DCT coefficients with much less complexity. The complexity can be scaled, i.e., it allows more complexity reduction at lower-quality coding, and can be made quantization-dependent to keep the distortion degradation at a certain level. In video coding, ME is the part of the encoder that requires the most complexity, and therefore achieving significant complexity reduction in ME has always been a goal in video coding research. We propose two fast algorithms based on fast distance metric computation or fast matching approaches. Both of our algorithms allow computational scalability in distance computation with graceful degradation in the overall image quality. The first algorithm exploits hypothesis testing in fast metric computation, whereas the second algorithm uses thresholds obtained from partial distances in hierarchical candidate elimination. (Abstract shortened by UMI.)
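
A minimal sketch of the variable-complexity idea for the inverse DCT: test which coefficients are zero and skip their basis-vector contributions, so coarser quantization (more zeros) means less work. A plain 1-D length-8 orthonormal IDCT is used for clarity; the thesis's optimized test structures and the 2-D case are not reproduced.

```python
import numpy as np

N = 8
# Precompute the 1-D IDCT basis: x[n] = sum_k s_k X[k] cos(pi (2n+1) k / (2N)).
n = np.arange(N)
BASIS = np.array([np.cos(np.pi * (2 * n + 1) * k / (2 * N)) for k in range(N)])
SCALE = np.full(N, np.sqrt(2.0 / N))
SCALE[0] = np.sqrt(1.0 / N)

def idct_skip_zeros(coeffs):
    """Inverse DCT that only accumulates contributions of nonzero coefficients."""
    x = np.zeros(N)
    for k, c in enumerate(coeffs):
        if c == 0.0:
            continue                   # variable complexity: prune zero inputs
        x += SCALE[k] * c * BASIS[k]
    return x

# Coarsely quantized block: most high-frequency coefficients are zero.
coeffs = np.array([80.0, -12.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0])
print(idct_skip_zeros(coeffs))
```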

Journal ArticleDOI
TL;DR: A non-discrete tree model that has been designed to be easier to evolve than more commonly used Boolean trees, encoded by a genome that contains binary data as well as real-valued data is presented.
Abstract: Boolean keyword trees can be used for information filtering. When they are generated by an evolutionary algorithm they can adapt to changes in the user's interests. This paper presents a non-discrete tree model that has been designed to be easier to evolve than more commonly used Boolean trees. Each tree is encoded by a genome that contains binary data as well as real-valued data. The results of the experiments are promising and suggest that the proposed model indeed offers an improvement.

Journal ArticleDOI
Raymond L. Major
01 Feb 2000
TL;DR: This work introduces a practical algorithm that forms a finite number of features using a decision tree in a polynomial amount of time and shows empirically that this procedure forms many features that subsequently appear in a tree and the new features aid in producing simpler trees when concepts are being learned from certain problem domains.
Abstract: Using decision trees as a concept description language, we examine the time complexity for learning Boolean functions with polynomial-sized disjunctive normal form expressions when feature construction is performed on an initial decision tree containing only primitive attributes. A shortcoming of several feature-construction algorithms found in the literature is that it is difficult to develop time complexity results for them. We illustrate a way to determine a limit on the number of features to use for building more concise trees within a standard amount of time. We introduce a practical algorithm that forms a finite number of features using a decision tree in a polynomial amount of time. We show empirically that our procedure forms many features that subsequently appear in a tree and the new features aid in producing simpler trees when concepts are being learned from certain problem domains. Expert systems developers can use a method such as this to create a knowledge base of information that contains specific knowledge in the form of If-Then rules.
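
To give a flavour of tree-based feature construction, the sketch below fits a small decision tree on primitive Boolean attributes, turns each root-to-leaf path into a conjunctive Boolean feature, and refits a tree on the augmented attribute set. The data, the depth limit, and the path-to-conjunction encoding are illustrative assumptions, not the specific algorithm introduced in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Hypothetical primitive Boolean attributes and a small DNF-style target concept.
X = rng.integers(0, 2, size=(500, 6))
y = ((X[:, 0] & X[:, 1]) | (X[:, 2] & (1 - X[:, 3]))).astype(int)

def leaf_conjunctions(tree):
    """Collect each root-to-leaf path of a fitted sklearn tree as a list of
    (attribute index, attribute_is_one) tests, i.e. a Boolean conjunction."""
    t = tree.tree_
    paths = []

    def walk(node, tests):
        if t.children_left[node] == -1:                      # leaf node
            if tests:
                paths.append(tests)
            return
        a = t.feature[node]
        walk(t.children_left[node], tests + [(a, False)])    # attribute == 0 branch
        walk(t.children_right[node], tests + [(a, True)])    # attribute == 1 branch

    walk(0, [])
    return paths

def conjunction_features(X, paths):
    """Evaluate each path conjunction as a new 0/1 feature."""
    cols = []
    for tests in paths:
        col = np.ones(len(X), dtype=bool)
        for a, is_one in tests:
            col &= (X[:, a] == 1) if is_one else (X[:, a] == 0)
        cols.append(col.astype(int))
    return np.column_stack(cols)

base = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
augmented = np.hstack([X, conjunction_features(X, leaf_conjunctions(base))])

plain = DecisionTreeClassifier(random_state=0).fit(X, y)
refit = DecisionTreeClassifier(random_state=0).fit(augmented, y)
print("nodes without constructed features:", plain.tree_.node_count)
print("nodes with constructed features   :", refit.tree_.node_count)
```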

Proceedings ArticleDOI
30 Aug 2000
TL;DR: This paper documents experiments using a GA to optimise the parameters of a dynamic neural tree model: two fitness functions were created from selected clustering measures, and a population of genotypes specifying the model's parameters was evolved.
Abstract: This paper documents experiments performed using a GA to optimise the parameters of a dynamic neural tree model. Two fitness functions were created from two selected clustering measures, and a population of genotypes specifying parameters of the model was evolved. This process mirrors genomic evolution and ontogeny. It is shown that the evolved parameter values improved performance.

Journal Article
TL;DR: This paper analyzes different implementations of an algorithm and predicts the relative performance differences among them by combining memory complexity analysis with data movement/floating-point operation ratio analysis.
Abstract: Memory systems have become more and more complicated with the many efforts to bridge the large speed gap between processor and main memory. It is now difficult to obtain high performance from a processor or a large parallel processing system without considering the specific features of the memory system. Thus time and space complexity alone are no longer enough to explain why different forms of the same algorithm exhibit such different performance on the same platform; the complexity of memory systems must be incorporated into the analysis of algorithms. In 1996, Sun Jiachang first presented the concept of memory complexity. The complexity of an algorithm is taken to consist of its computational complexity and its memory complexity: computational complexity comprises time complexity and space complexity, which are basic characteristics of the algorithm, while memory complexity is a varying characteristic that changes with different implementations of the same algorithm and with different platforms. The purpose of algorithmic optimization is to reduce the memory complexity, while reducing the computational complexity requires new algorithmic research. In this paper, we analyze the different implementations of an algorithm and predict the relative performance differences among them by combining memory complexity analysis with data movement/floating-point operation ratio analysis. Further analysis including remote communication in parallel processing will be future work.
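
As a back-of-the-envelope companion to the memory-complexity argument, the sketch below estimates the data movement/floating-point operation ratio for blocked matrix multiplication under a simple cost model (three b-by-b blocks resident in fast memory, whole-block transfers). The cost model and numbers are illustrative assumptions, not results from the paper.

```python
def matmul_traffic(n, b):
    """Approximate words moved from memory for blocked n x n matmul with b x b
    blocks, assuming three blocks fit in fast memory and whole blocks are moved."""
    blocks_per_dim = n // b
    # For each of the (n/b)^2 blocks of C: stream n/b blocks of A and of B,
    # plus read and write the C block once.
    words_ab = blocks_per_dim ** 2 * (2 * blocks_per_dim * b * b)
    words_c = 2 * n * n
    return words_ab + words_c

def flops(n):
    return 2 * n ** 3          # multiply-add count for dense matmul

n = 1024
for b in (1, 8, 64, 256):      # b = 1 approximates an unblocked implementation
    traffic = matmul_traffic(n, b)
    print(f"block {b:4d}: words moved ~ {traffic:.3e}, "
          f"flops/word ~ {flops(n) / traffic:.1f}")
```

The same floating-point work is done in every case; only the data movement per flop changes, which is the kind of difference the memory complexity analysis above is meant to capture.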

Book ChapterDOI
18 Apr 2000
TL;DR: This work describes a new oblique decision tree induction algorithm, VQTree, which uses Learning Vector Quantization to form a non-parametric model of the training set, and obtains a set of hyperplanes which are used as oblique splits in the nodes of a decision tree.
Abstract: We describe a new oblique decision tree induction algorithm. The VQTree algorithm uses Learning Vector Quantization to form a non-parametric model of the training set, and from that obtains a set of hyperplanes which are used as oblique splits in the nodes of a decision tree. We use a set of public data sets to compare VQTree with two existing decision tree induction algorithms, C5.0 and OC1. Our experiments show that VQTree produces compact decision trees with higher accuracy than either C5.0 or OC1 on some datasets.
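
The geometric core of turning vector-quantization prototypes into oblique splits can be shown in a few lines: the set of points equidistant from two prototypes is the hyperplane bisecting them, which can serve as a test w·x + b > 0 at a tree node. In the sketch below the prototypes are simply class means on made-up data rather than LVQ-trained codebook vectors, and the full VQTree induction procedure is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up two-class data (class means stand in for LVQ codebook vectors).
X0 = rng.normal(loc=[0.0, 0.0], scale=0.6, size=(100, 2))
X1 = rng.normal(loc=[2.0, 1.0], scale=0.6, size=(100, 2))

p0, p1 = X0.mean(axis=0), X1.mean(axis=0)   # class prototypes

# Perpendicular-bisector hyperplane between the prototypes: w.x + b = 0,
# with w.x + b > 0 exactly when x is closer to p0 than to p1.
w = p0 - p1
b = (p1 @ p1 - p0 @ p0) / 2.0

def oblique_split(X):
    """Boolean test usable as an oblique decision-tree node."""
    return X @ w + b > 0

X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
pred = np.where(oblique_split(X), 0, 1)
print("split accuracy on the toy data:", (pred == y).mean())
```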

01 Jan 2000
TL;DR: A technique for measuring the tradeoff between predictive performance and available run time system resources is presented and an algorithm for pruning the ensemble meta-classifier is described as a means to reduce its size while preserving its accuracy.
Abstract: In this paper we study methods that combine multiple classification models learned over separate data sets in a distributed database setting. Numerous studies posit that such approaches provide the means to efficiently scale learning to large datasets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system resources. The final ensemble meta-classifier may consist of a large collection of base classifiers that require increased memory resources while also slowing down classification throughput. Here, we present a technique for measuring the tradeoff between predictive performance and available run time system resources and we describe an algorithm for pruning (i.e. discarding a subset of the available base classifiers) the ensemble meta-classifier as a means to reduce its size while preserving its accuracy. The algorithm is independent of the method used initially when computing the meta-classifier. It is based on decision tree pruning methods and relies on the mapping of an arbitrary ensemble meta-classifier to a decision tree model. Through an extensive empirical study on meta-classifiers computed over two real data sets, we illustrate our pruning algorithm to be a robust approach to discarding classification models without degrading the overall predictive performance of an ensemble computed over those that remain after pruning.
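
The pruning algorithm described above works by mapping the meta-classifier to a decision tree and applying decision tree pruning; as a simpler, generic point of comparison, the sketch below prunes an ensemble by greedy backward elimination on a held-out set, dropping base classifiers as long as majority-vote accuracy does not degrade. The data, the chunking into disjoint training sets, and the acceptance rule are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def majority_vote(classifiers, X):
    votes = np.stack([clf.predict(X) for clf in classifiers])
    return (votes.mean(axis=0) > 0.5).astype(int)

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

# Base classifiers trained on disjoint chunks, mimicking a distributed setting.
chunks = np.array_split(np.arange(len(X_train)), 10)
ensemble = [DecisionTreeClassifier(random_state=i).fit(X_train[idx], y_train[idx])
            for i, idx in enumerate(chunks)]

# Greedy backward elimination: drop a classifier whenever the pruned ensemble's
# held-out accuracy is at least as good as the current one.
kept = list(ensemble)
best = (majority_vote(kept, X_hold) == y_hold).mean()
improved = True
while improved and len(kept) > 1:
    improved = False
    for clf in list(kept):
        trial = [c for c in kept if c is not clf]
        acc = (majority_vote(trial, X_hold) == y_hold).mean()
        if acc >= best:
            kept, best, improved = trial, acc, True
            break

print(f"kept {len(kept)} of {len(ensemble)} classifiers, "
      f"held-out accuracy {best:.3f}")
```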

Proceedings ArticleDOI
07 Apr 2000
TL;DR: The background and justification for a new approach to studying computation and computational complexity is presented, focusing on categories of problems and categories of solutions which provide the logical definition on which to base an algorithm.
Abstract: We present the background and justification for a new approach to studying computation and computational complexity. We focus on categories of problems and categories of solutions which provide the logical definition on which to base an algorithm. Computational capability is introduced via a formalization of computation termed a model of computation. The concept of algorithm is formalized using the methods of Traub, Wasilkowski and Wozniakowski, from which we can formalize the differences between deterministic, non-deterministic, and heuristic algorithms. Finally, we introduce our measure of complexity: the Hartley entropy measure. We provide many examples to amplify the concepts introduced.