Proceedings ArticleDOI

Learning Algorithms for Grammars of Variable Arity Trees

13 Dec 2007-pp 98-103
TL;DR: This paper gives algorithms for inferring local, single-type, and regular tree grammars from a set of sample input trees, and also considers the use of negative samples in the inference.
Abstract: Grammatical inference is the technique by which a grammar that best describes a given set of input samples is inferred. This paper considers the inference of tree grammars from a set of sample input trees. Inference of grammars for fixed-arity trees is well studied; in this paper we extend the method to give algorithms for inference of grammars for variable-arity trees. We give algorithms for inference of local, single-type, and regular grammars and also consider the use of negative samples. The variable-arity trees we consider can be used for representation of XML documents, and the algorithms we have given can be used for validation as well as for schema inference.
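The local-grammar case can be pictured with a minimal sketch (not the paper's algorithm; all names and the tree encoding are invented): each node label maps to the set of child-label sequences observed in the positive samples, and a tree validates if every node's child sequence was seen during inference.

```python
# Minimal sketch: infer a "local" tree grammar from positive samples of
# variable-arity trees.  A tree is (label, [children]); names are invented.
def infer_local_grammar(trees):
    rules = {}  # label -> set of observed child-label sequences
    def visit(node):
        label, children = node
        rules.setdefault(label, set()).add(tuple(c[0] for c in children))
        for child in children:
            visit(child)
    for tree in trees:
        visit(tree)
    return rules

def validates(tree, rules):
    # A tree is valid if every node's child sequence was seen in training.
    label, children = tree
    if tuple(c[0] for c in children) not in rules.get(label, set()):
        return False
    return all(validates(child, rules) for child in children)

samples = [
    ("book", [("title", []), ("author", [])]),
    ("book", [("title", []), ("author", []), ("author", [])]),
]
grammar = infer_local_grammar(samples)
```

A real inference algorithm generalizes the finite set of observed child sequences to a regular language (yielding local, single-type, or regular grammars of increasing power); this sketch only memorizes them.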


Citations
Proceedings ArticleDOI
29 Jan 2012
TL;DR: A systems engineering view of the requirements specification process and a method for the flowdown process are presented; a case study based on an electric Unmanned Aerial Vehicle scenario demonstrates how top-level requirements for performance, cost, and safety flow down to the health management level and yield quantitative requirements for prognostic algorithm performance.
Abstract: Prognostics and Health Management (PHM) principles have considerable promise to change the game of lifecycle cost of engineering systems at high safety levels by providing a reliable estimate of future system states. This estimate is a key for planning and decision making in an operational setting. While technology solutions have made considerable advances, the tie-in into the systems engineering process is lagging behind, which delays fielding of PHM-enabled systems. The derivation of specifications from high level requirements for algorithm performance to ensure quality predictions is not well developed. From an engineering perspective some key parameters driving the requirements for prognostics performance include: (1) maximum allowable Probability of Failure (PoF) of the prognostic system to bound the risk of losing an asset, (2) tolerable limits on proactive maintenance to minimize missed opportunity of asset usage, (3) lead time to specify the amount of advanced warning needed for actionable decisions, and (4) required confidence to specify when prognosis is sufficiently good to be used. This paper takes a systems engineering view towards the requirements specification process and presents a method for the flowdown process. A case study based on an electric Unmanned Aerial Vehicle (e-UAV) scenario demonstrates how top level requirements for performance, cost, and safety flow down to the health management level and specify quantitative requirements for prognostic algorithm performance.
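As a purely hypothetical illustration of such a flowed-down requirement (thresholds, units, and names invented, not taken from the paper), a prognostic prediction might be checked against a lead-time and confidence specification before it is used for decision making:

```python
# Hypothetical illustration only: checking a prognostic prediction against
# flowed-down lead-time and confidence requirements.  Thresholds, units,
# and names are invented.
def meets_requirements(rul_samples, lead_time, min_confidence):
    # confidence that remaining useful life (RUL) exceeds the required
    # lead time for an actionable decision
    confidence = sum(r > lead_time for r in rul_samples) / len(rul_samples)
    return confidence >= min_confidence

rul_estimates = [120, 95, 110, 130, 80, 105]  # e.g. flight-minutes
assert meets_requirements(rul_estimates, lead_time=60, min_confidence=0.9)
assert not meets_requirements(rul_estimates, lead_time=100, min_confidence=0.9)
```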

35 citations

Journal Article
TL;DR: This paper considers bottom-up tree automata and discusses the sequential distributed version of this model, and finds that the ∗-mode does not increase the power, whereas the other modes increase the power.
Abstract: Tree automata have been defined to accept trees. Different types of acceptance like bottom-up, top-down, tree walking have been considered in the literature. In this paper, we consider bottom-up tree automata and discuss the sequential distributed version of this model. Generally, this type of distribution is called cooperative distributed automata or the blackboard model. We define the traditional five modes of cooperation, viz. ∗-mode, t-mode, = k, ≥ k, ≤ k (k ≥ 1) modes on bottom-up tree automata. We discuss the accepting power of cooperative distributed tree automata under these modes of cooperation. We find that the ∗-mode does not increase the power, whereas the other modes increase the power. We discuss a few results comparing the acceptance power under different modes of cooperation.

2 citations

References
Journal ArticleDOI
TL;DR: This work presents a formal framework for XML schema languages based on regular tree grammars that helps to describe, compare, and implement such schema languages in a rigorous manner.
Abstract: On the basis of regular tree grammars, we present a formal framework for XML schema languages. This framework helps to describe, compare, and implement such schema languages in a rigorous manner. Our main results are as follows: (1) a simple framework to study three classes of tree languages (local, single-type, and regular); (2) classification and comparison of schema languages (DTD, W3C XML Schema, and RELAX NG) based on these classes; (3) efficient document validation algorithms for these classes; and (4) other grammatical concepts and advanced validation algorithms relevant to an XML model (e.g., binarization, derivative-based validation).
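A bottom-up validation pass over the regular class can be sketched as follows (an illustrative toy, not the paper's validator: it assumes fixed child-type sequences rather than full regular expressions over types, and the grammar and names are invented):

```python
# Illustrative toy: bottom-up typing against a tiny regular tree grammar.
# Productions map a nonterminal to a label and a fixed sequence of child
# nonterminals (real regular tree grammars allow a regular expression over
# child types).  Grammar and names are invented.
def types_of(node, productions):
    label, children = node
    child_type_sets = [types_of(child, productions) for child in children]
    result = set()
    for nonterminal, (lab, child_nts) in productions:
        if lab != label or len(child_nts) != len(children):
            continue
        if all(nt in ts for nt, ts in zip(child_nts, child_type_sets)):
            result.add(nonterminal)
    return result

# Doc -> doc(Sec Sec), Sec -> sec()
productions = [("Doc", ("doc", ["Sec", "Sec"])), ("Sec", ("sec", []))]
document = ("doc", [("sec", []), ("sec", [])])
assert "Doc" in types_of(document, productions)
```

A document is accepted when its root can be assigned a start nonterminal; the local and single-type classes restrict how many competing types a label may have.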

495 citations


"Learning Algorithms for Grammars of..." refers to background, methods, or results in this paper

  • ...XML Schema in terms of power and that regular tree grammar was similar in power to the Relax NG. The inferred grammars were later validated against the input samples using an algorithm given by [6] and it was found that all samples were validated, further showing the accuracy of the inferred grammar....

  • ...In this paper we show how local, single-type as well as regular tree grammars [6] can be inferred from a set of positive samples of trees having variable arity....

  • ...Variable arity trees can be generated using special types of grammars [6]....

  • ...This is also in agreement with the claims made in [6] that single-type grammar corresponded to the...

  • ...This supports the claim made in [6] that local tree grammars are similar in power to the DTDs....

Journal ArticleDOI
16 May 2000
TL;DR: The results of the experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases.
Abstract: XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Definition (DTD) which plays the role of a schema for an XML data collection. DTDs contain valuable information on the structure of documents and thus have a crucial role in the efficient storage of XML data, as well as the effective formulation and optimization of XML queries. In this paper, we propose XTRACT, a novel system for inferring a DTD schema for a database of XML documents. Since the DTD syntax incorporates the full expressive power of regular expressions, naive approaches typically fail to produce concise and intuitive DTDs. Instead, the XTRACT inference algorithms employ a sequence of sophisticated steps that involve: (1) finding patterns in the input sequences and replacing them with regular expressions to generate “general” candidate DTDs, (2) factoring candidate DTDs using adaptations of algorithms from the logic optimization literature, and (3) applying the Minimum Description Length (MDL) principle to find the best DTD among the candidates. The results of our experiments with real-life and synthetic DTDs demonstrate the effectiveness of XTRACT's approach in inferring concise and semantically meaningful DTD schemas for XML databases.
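The MDL principle in step (3) can be illustrated with a toy cost comparison (the encoding below is invented, not XTRACT's): a verbatim disjunction of the examples has zero per-example cost but a grammar that grows with the data, while a general expression is short but must encode each example's choices.

```python
import math

# Toy MDL comparison (encoding invented, not XTRACT's): total cost is the
# candidate DTD's length plus the bits needed to encode each example
# given that candidate.
def mdl_cost(candidate, examples, bits_per_example):
    return len(candidate) + sum(bits_per_example(e) for e in examples)

examples = ["a", "ab", "abb", "abbb"]
# Candidate 1: enumerate the examples verbatim -- no per-example cost,
# but the grammar itself grows with the data.
verbatim = "|".join(examples)
cost_verbatim = mdl_cost(verbatim, examples, lambda e: 0.0)
# Candidate 2: "ab*" -- short grammar, but each example must encode its
# repeat count (roughly log2 of its length here).
general = "ab*"
cost_general = mdl_cost(general, examples, lambda e: math.log2(len(e)))
assert cost_general < cost_verbatim  # MDL prefers the concise, general DTD
```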

240 citations


"Learning Algorithms for Grammars of..." refers to methods in this paper

  • ...Other regular expression inference algorithms, such as XTRACT [4] and [2], would also work in its place....

Proceedings ArticleDOI
01 Aug 2000
TL;DR: There is a point/line duality between the ROC representation and the expected-cost representation, allowing most techniques used in ROC analysis to be readily reproduced in the cost space.
Abstract: This paper proposes an alternative to ROC representation, in which the expected cost of a classifier is represented explicitly. This expected cost representation maintains many of the advantages of ROC representation, but is easier to understand. It allows the experimenter to immediately see the range of costs and class frequencies where a particular classifier is the best and quantitatively how much better it is than other classifiers. This paper demonstrates there is a point/line duality between the two representations. A point in ROC space representing a classifier becomes a line segment spanning the full range of costs and class frequencies. This duality produces equivalent operations in the two spaces, allowing most techniques used in ROC analysis to be readily reproduced in the cost space.
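The duality can be sketched numerically (a minimal illustration with invented operating points): a classifier at ROC point (FPR, TPR) becomes a line over the probability-cost axis, and which classifier is cheaper depends on where along that axis you operate.

```python
# Minimal illustration of the point/line duality (operating points invented):
# a classifier at ROC point (fpr, tpr) becomes a line over the
# probability-cost axis pc in [0, 1].
def expected_cost_line(fpr, tpr):
    # normalized expected cost as a function of probability-cost pc
    return lambda pc: (1 - tpr) * pc + fpr * (1 - pc)

conservative = expected_cost_line(0.1, 0.7)
liberal = expected_cost_line(0.4, 0.95)
# Which classifier is cheaper depends on the operating conditions:
assert conservative(0.2) < liberal(0.2)  # rare/cheap positives
assert liberal(0.8) < conservative(0.8)  # frequent/costly positives
```

Each ROC point thus spans the full range of costs and class frequencies as one line, and the lower envelope of all such lines plays the role of the ROC convex hull.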

200 citations

Proceedings ArticleDOI
01 Sep 2006
TL;DR: The algorithm iDTD (infer DTD) is presented, which learns SOREs from strings by first inferring an automaton by known techniques and then translating that automaton to a corresponding SORE, possibly repairing the automaton when no equivalent SORE can be found.
Abstract: We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XML documents, a problem which basically reduces to learning concise regular expressions from positive example strings. We identify two such classes: single occurrence regular expressions (SOREs) and chain regular expressions (CHAREs). Both classes capture the vast majority of the regular expressions occurring in practical DTDs and are succinct by definition. We present the algorithm iDTD (infer DTD) that learns SOREs from strings by first inferring an automaton by known techniques and then translating that automaton to a corresponding SORE, possibly repairing the automaton when no equivalent SORE can be found. In the process, we introduce a novel automaton to regular expression rewrite technique which is of independent interest. We show that iDTD outperforms existing systems in accuracy, conciseness and speed. In a scenario where only a very small amount of XML data is available, for instance when generated by Web service requests or by answers to queries, iDTD produces regular expressions which are too specific. Therefore, we introduce a novel learning algorithm CRX that directly infers CHAREs (which form a subclass of SOREs) without going through an automaton representation. We show that CRX performs very well within its target class on very small data sets. Finally, we discuss incremental computation, noise, numerical predicates, and the generation of XML Schemas.
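A loose sketch in the spirit of CHARE inference (not the paper's CRX algorithm; it assumes every sample lists element names in one consistent order, and all names are invented):

```python
from collections import Counter

# Loose sketch in the spirit of CHARE inference (not the paper's CRX):
# assumes every sample lists element names in one consistent order; each
# name is emitted with ?, +, or * depending on the observed counts.
def infer_chare(samples):
    order = []
    for sample in samples:
        for name in sample:
            if name not in order:
                order.append(name)
    parts = []
    for name in order:
        counts = [Counter(sample)[name] for sample in samples]
        if min(counts) >= 1:
            parts.append(name if max(counts) == 1 else name + "+")
        else:
            parts.append(name + "?" if max(counts) == 1 else name + "*")
    return " ".join(parts)

samples = [["title", "author"], ["title", "author", "author", "year"]]
assert infer_chare(samples) == "title author+ year?"
```

Real CHARE inference derives the factor order from the data rather than assuming it, and SORE inference goes further by first building an automaton from the samples.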

142 citations