Learning Algorithms for Grammars of Variable Arity Trees
13 Dec 2007, pp. 98–103
TL;DR: This paper gives algorithms for inferring local, single-type, and regular tree grammars from a set of sample input trees, and also considers the use of negative samples in the inference process.
Abstract: Grammatical inference is the technique by which a grammar that best describes a given set of input samples is inferred. This paper considers the inference of tree grammars from a set of sample input trees. Inference of grammars for fixed arity trees is well studied; in this paper we extend the method to give algorithms for inferring grammars for variable arity trees. We give algorithms for inference of local, single-type, and regular grammars and also consider the use of negative samples. The variable arity trees we consider can be used to represent XML documents, and the algorithms we give can be used for validation as well as for schema inference.
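To make the local-grammar case concrete, the sketch below (Python, with illustrative names; not the authors' code) infers a local tree grammar from variable arity sample trees by recording, for each node label, the child-label sequences seen in the samples, then reuses the inferred grammar for validation. The paper's algorithms additionally generalize these sequences into content models and cover single-type and regular grammars, which this sketch omits.

```python
# Minimal sketch of local tree grammar inference for variable arity trees.
# Representation and names are illustrative assumptions, not the paper's code.

from collections import defaultdict

def collect_rules(tree, rules):
    """Record, for each node label, the child-label sequences seen in samples.

    A tree is (label, [children]); children may be any number (variable arity).
    """
    label, children = tree
    rules[label].add(tuple(child[0] for child in children))
    for child in children:
        collect_rules(child, rules)

def infer_local_grammar(samples):
    """A local tree grammar maps each label to its allowed child sequences."""
    rules = defaultdict(set)
    for tree in samples:
        collect_rules(tree, rules)
    return dict(rules)

def validates(tree, grammar):
    """Check a tree against the inferred grammar (schema validation)."""
    label, children = tree
    if tuple(c[0] for c in children) not in grammar.get(label, set()):
        return False
    return all(validates(c, grammar) for c in children)

# Example: two XML-like sample trees with differing arity under "book".
samples = [
    ("book", [("title", []), ("author", [])]),
    ("book", [("title", []), ("author", []), ("author", [])]),
]
grammar = infer_local_grammar(samples)
assert validates(samples[0], grammar)
```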
Citations
[...]
TL;DR: This paper presents a systems engineering view of the requirements specification process and a method for the flowdown process; a case study based on an electric Unmanned Aerial Vehicle scenario demonstrates how top-level requirements for performance, cost, and safety flow down to the health management level and specify quantitative requirements for prognostic algorithm performance.
Abstract: Prognostics and Health Management (PHM) principles have considerable promise to change the game of lifecycle cost of engineering systems at high safety levels by providing a reliable estimate of future system states. This estimate is a key for planning and decision making in an operational setting. While technology solutions have made considerable advances, the tie-in into the systems engineering process is lagging behind, which delays fielding of PHM-enabled systems. The derivation of specifications from high level requirements for algorithm performance to ensure quality predictions is not well developed. From an engineering perspective some key parameters driving the requirements for prognostics performance include: (1) maximum allowable Probability of Failure (PoF) of the prognostic system to bound the risk of losing an asset, (2) tolerable limits on proactive maintenance to minimize missed opportunity of asset usage, (3) lead time to specify the amount of advanced warning needed for actionable decisions, and (4) required confidence to specify when prognosis is sufficiently good to be used. This paper takes a systems engineering view towards the requirements specification process and presents a method for the flowdown process. A case study based on an electric Unmanned Aerial Vehicle (e-UAV) scenario demonstrates how top level requirements for performance, cost, and safety flow down to the health management level and specify quantitative requirements for prognostic algorithm performance.
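As a toy illustration of the four drivers listed in the abstract (hypothetical parameter names and thresholds, not taken from the paper), a flowed-down prognostic requirement set can be expressed as simple threshold checks:

```python
# Hedged illustration (not from the paper): checking a prognostic algorithm's
# reported performance against flowed-down requirements. All thresholds are
# hypothetical examples of the four drivers named in the abstract.

def meets_requirements(pof, proactive_fraction, lead_time_h, confidence,
                       max_pof=1e-4, max_proactive=0.05,
                       min_lead_time_h=2.0, min_confidence=0.95):
    """Return True if all four flowed-down prognostic requirements hold."""
    return (pof <= max_pof and                      # (1) bounded risk of loss
            proactive_fraction <= max_proactive and # (2) limited missed usage
            lead_time_h >= min_lead_time_h and      # (3) actionable warning
            confidence >= min_confidence)           # (4) trustworthy prognosis

# Example: 5e-5 PoF, 3% missed usage, 2.5 h warning, 96% confidence.
print(meets_requirements(5e-5, 0.03, 2.5, 0.96))  # True
```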
32 citations
Journal Article
[...]
TL;DR: This paper considers bottom-up tree automata and discusses the sequential distributed version of this model, finding that the ∗-mode does not increase the accepting power, whereas the other modes do.
Abstract: Tree automata have been defined to accept trees. Different types of acceptance like bottom-up, top-down, tree walking have been considered in the literature. In this paper, we consider bottom-up tree automata and discuss the sequential distributed version of this model. Generally, this type of distribution is called cooperative distributed automata or the blackboard model. We define the traditional five modes of cooperation, viz. ∗-mode, t-mode, = k, ≥ k, ≤ k (k ≥ 1) modes on bottom-up tree automata. We discuss the accepting power of cooperative distributed tree automata under these modes of cooperation. We find that the ∗-mode does not increase the power, whereas the other modes do. We discuss a few results comparing the acceptance power under different modes of cooperation.
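For readers unfamiliar with the base model, here is a minimal bottom-up tree automaton in Python (illustrative names; the cooperative distributed machinery with the five modes is omitted). States are assigned leaves-to-root by a transition table, and a tree is accepted if its root state is final.

```python
# A minimal single bottom-up tree automaton sketch (no cooperation). The
# distributed modes above (*-mode, t-mode, =k, >=k, <=k) would let several
# such automata take turns on one tree; that machinery is omitted here.

def run_bottom_up(tree, delta, finals):
    """Assign states leaves-to-root; accept if the root's state is final.

    tree: (label, [children]); delta: {(label, child_state_tuple): state}.
    """
    def state_of(node):
        label, children = node
        child_states = tuple(state_of(c) for c in children)
        return delta.get((label, child_states))  # None = undefined move, rejects
    return state_of(tree) in finals

# Example: accept trees whose leaves are all 'a' under binary 'f' nodes.
delta = {
    ("a", ()): "qa",
    ("f", ("qa", "qa")): "qa",
}
print(run_bottom_up(("f", [("a", []), ("a", [])]), delta, {"qa"}))  # True
```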
2 citations
References
[...]
TL;DR: The CCR ratio form introduced by Charnes, Cooper, and Rhodes as part of their Data Envelopment Analysis approach comprehends both technical and scale inefficiencies via the optimal value of the ratio form, obtained directly from the data without requiring a priori specification of weights or explicit delineation of assumed functional forms relating inputs and outputs.
Abstract: In management contexts, mathematical programming is usually used to evaluate a collection of possible alternative courses of action en route to selecting one which is best. In this capacity, mathematical programming serves as a planning aid to management. Data Envelopment Analysis reverses this role and employs mathematical programming to obtain ex post facto evaluations of the relative efficiency of management accomplishments, however they may have been planned or executed. Mathematical programming is thereby extended for use as a tool for control and evaluation of past accomplishments as well as a tool to aid in planning future activities. The CCR ratio form introduced by Charnes, Cooper and Rhodes, as part of their Data Envelopment Analysis approach, comprehends both technical and scale inefficiencies via the optimal value of the ratio form, as obtained directly from the data without requiring a priori specification of weights and/or explicit delineation of assumed functional forms of relations between inputs and outputs. A separation into technical and scale efficiencies is accomplished by the methods developed in this paper without altering the latter conditions for use of DEA directly on observational data. Technical inefficiencies are identified with failures to achieve best possible output levels and/or usage of excessive amounts of inputs. Methods for identifying and correcting the magnitudes of these inefficiencies, as supplied in prior work, are illustrated. In the present paper, a new separate variable is introduced which makes it possible to determine whether operations were conducted in regions of increasing, constant or decreasing returns to scale in multiple input and multiple output situations. The results are discussed and related not only to classical single output economics but also to more modern versions of economics which are identified with "contestable market theories."
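For reference, the CCR ratio form mentioned above can be stated as the standard DEA fractional program (textbook notation, not quoted from this paper): for decision making unit o with inputs x_{io} and outputs y_{ro}, weights u_r and v_i are chosen to maximize the output/input ratio while keeping every unit's ratio at most 1.

```latex
% Standard CCR fractional program in textbook DEA notation (stated here for
% reference; not reproduced from the paper under discussion).
\[
\max_{u,v}\; h_o \;=\; \frac{\sum_{r=1}^{s} u_r\, y_{ro}}{\sum_{i=1}^{m} v_i\, x_{io}}
\quad\text{subject to}\quad
\frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1,
\;\; j = 1,\dots,n, \qquad u_r,\, v_i \ge \varepsilon > 0 .
\]
```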
13,542 citations
[...]
TL;DR: AUC exhibits a number of desirable properties when compared to overall accuracy: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreases as both AUC and the number of test samples increase; independence of the decision threshold; and invariance to a priori class probabilities.
Abstract: In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic Discriminant Function) on six "real world" medical diagnostics data sets. We compare the use of AUC with the more conventional overall accuracy and find that AUC exhibits a number of desirable properties: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreases as both AUC and the number of test samples increase; independence of the decision threshold; and invariance to a priori class probabilities. The paper concludes with the recommendation that AUC be used in preference to overall accuracy for "single number" evaluation of machine learning algorithms.
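AUC itself has a simple rank-statistic form: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half. A minimal sketch with illustrative names:

```python
# AUC as the normalized Mann-Whitney U statistic: the fraction of
# (positive, negative) pairs ranked correctly, ties counting half.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```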
4,289 citations
[...]
TL;DR: It is shown theoretically and empirically that AUC is a better measure (in a precisely defined sense) than accuracy; well-established claims in machine learning that rest on accuracy are then reevaluated using AUC, yielding interesting and surprising new results.
Abstract: The area under the ROC (receiver operating characteristics) curve, or simply AUC, has been traditionally used in medical diagnosis since the 1970s. It has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments were given as to why AUC should be preferred over accuracy. We establish formal criteria for comparing two different measures for learning algorithms and we show theoretically and empirically that AUC is a better measure (defined precisely) than accuracy. We then reevaluate well-established claims in machine learning based on accuracy using AUC and obtain interesting and surprising new results. For example, it has been well-established and accepted that Naive Bayes and decision trees are very similar in predictive accuracy. We show, however, that Naive Bayes is significantly better than decision trees in AUC. The conclusions drawn in this paper may make a significant impact on machine learning and data mining applications.
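As a toy illustration of why AUC can be the finer measure (illustrative scores only, not data from the paper): two classifiers with identical accuracy at a 0.5 threshold can still differ in AUC.

```python
# Two score sets with equal accuracy at threshold 0.5 but different AUC,
# so AUC separates learners that accuracy ranks as equal. Toy data only.

def accuracy(scores, labels, thr=0.5):
    return sum((s >= thr) == bool(y) for s, y in zip(scores, labels)) / len(labels)

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

labels  = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.6, 0.1]
model_b = [0.6, 0.4, 0.9, 0.1]
print(accuracy(model_a, labels), accuracy(model_b, labels))  # 0.5 0.5
print(auc(model_a, labels), auc(model_b, labels))            # 0.75 0.5
```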
1,162 citations
Posted Content
[...]
TL;DR: The ROC convex hull (ROCCH) method combines techniques from ROC analysis, decision analysis, and computational geometry, and adapts them to the particulars of analyzing learned classifiers.
Abstract: In real-world environments it usually is difficult to specify target operating conditions precisely, for example, target misclassification costs. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. In some cases, the performance of the hybrid actually can surpass that of the best known classifier. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. The hybrid also is efficient to build, to store, and to update. The hybrid is based on a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull (ROCCH) method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. Finally, we point to empirical evidence that a robust hybrid classifier indeed is needed for many real-world problems.
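A minimal sketch of the ROCCH idea (illustrative, not the authors' implementation): keep only the classifiers on the upper convex hull of ROC points (FPR, TPR); then, for given misclassification costs and class priors, the best hull point maximizes TPR − slope · FPR, where slope = (c_fp · p_neg) / (c_fn · p_pos) is the iso-performance line slope.

```python
# ROC convex hull sketch: dominated classifiers fall below the hull; target
# operating conditions pick a hull point via the iso-performance slope.

def roc_hull(points):
    """Upper convex hull of (fpr, tpr) points, plus the trivial (0,0), (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop hull[-1] if it lies on or below the segment hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) >= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def best_for_conditions(hull, c_fp, c_fn, p_pos):
    """Pick the hull point minimizing expected cost for the target conditions."""
    slope = (c_fp * (1 - p_pos)) / (c_fn * p_pos)
    return max(hull, key=lambda pt: pt[1] - slope * pt[0])

classifiers = [(0.1, 0.5), (0.3, 0.8), (0.5, 0.7), (0.7, 0.95)]
hull = roc_hull(classifiers)
print(hull)  # (0.5, 0.7) is dominated and falls off the hull
print(best_for_conditions(hull, c_fp=1, c_fn=5, p_pos=0.5))  # (0.7, 0.95)
```

With false negatives five times as costly as false positives, the selected point is the most liberal hull classifier, matching the hybrid-classifier argument in the abstract.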
1,114 citations
Related Papers (5)
[...]
[...]
[...]