Showing papers on "Context-sensitive grammar published in 2014"
••
01 Apr 2014
TL;DR: This work presents a synchronous grammar formalism in which it is easy to write rules by hand and also acquire them automatically from dependency parses of aligned English and Simple English sentences, which is optimised for monolingual translation.
Abstract: We present an approach to text simplification based on synchronous dependency grammars. The higher level of abstraction afforded by dependency representations allows for a linguistically sound treatment of complex constructs requiring reordering and morphological change, such as conversion of passive voice to active. We present a synchronous grammar formalism in which it is easy to write rules by hand and also acquire them automatically from dependency parses of aligned English and Simple English sentences. The grammar formalism is optimised for monolingual translation in that it reuses ordering information from the source sentence where appropriate. We demonstrate the superiority of our approach over a leading contemporary system based on quasi-synchronous tree substitution grammars, both in terms of expressivity and performance.
87 citations
••
TL;DR: A new formalism for CFGs is presented that borrows from PEGs the use of parsing expressions and the recognition-based semantics, and it is shown how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs.
32 citations
••
TL;DR: A family of distributional learning algorithms for context-free grammars is extended to the class of Parallel Multiple Context-Free Grammars (pmcfgs), which are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.
Abstract: Natural languages require grammars beyond context-free for their description. Here we extend a family of distributional learning algorithms for context-free grammars to the class of Parallel Multiple Context-Free Grammars (pmcfgs). These grammars have two additional operations beyond the simple context-free operation of concatenation: the ability to interleave strings of symbols, and the ability to copy or duplicate strings. This allows the grammars to generate some non-semilinear languages, which are outside the class of mildly context-sensitive grammars. These grammars, if augmented with a suitable feature mechanism, are capable of representing all of the syntactic phenomena that have been claimed to exist in natural language.
We present a learning algorithm for a large subclass of these grammars that includes all regular languages but not all context-free languages. This algorithm relies on a generalisation of the notion of a distribution as a function from tuples of strings to entire sentences; we define nonterminals using finite sets of these functions. Our learning algorithm uses a nonprobabilistic learning paradigm which allows for membership queries as well as positive samples; it runs in polynomial time.
28 citations
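The interleaving and copy operations described above are what take pmcfgs beyond context-freeness. A minimal sketch in Python (the rule S → f(N) with f(x) = xx is our illustrative notation, not taken from the paper) of how the copy operation alone generates the non-context-free copy language {ww}:

```python
from itertools import product

def copy_language(max_w_len):
    """Enumerate {ww : w in {a,b}*, |w| <= max_w_len}, the language of a
    hypothetical PMCFG rule S -> f(N) with f(x) = x x: the string derived
    by N is duplicated, which no context-free grammar can express."""
    out = set()
    for n in range(max_w_len + 1):
        for w in product("ab", repeat=n):
            s = "".join(w)
            out.add(s + s)  # the copy operation f(x) = x x
    return out

print(sorted(copy_language(2), key=len))
```

Every string in the result splits into two identical halves, the hallmark of copying.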
••
TL;DR: Two equivalent definitions of grammars with left contexts are given and their basic properties are established, including a transformation to a normal form and a cubic-time parsing algorithm, with a square-time version for unambiguous grammars.
Abstract: The paper introduces an extension of context-free grammars equipped with an operator for referring to the left context of the substring being defined. For example, a rule A → a & ◁B defines a symbol a, as long as it is preceded by a string defined by B. The conjunction operator in this example is taken from conjunctive grammars (Okhotin, 2001), which are an extension of ordinary context-free grammars that maintains most of their practical properties, including many parsing algorithms. This paper gives two equivalent definitions of grammars with left contexts, by logical deduction and by language equations, and establishes their basic properties, including a transformation to a normal form and a cubic-time parsing algorithm, with a square-time version for unambiguous grammars.
26 citations
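To make the left-context operator concrete, here is a minimal Python sketch of the licensing condition expressed by a rule A → a & ◁B. The function name and the choice of context language L(B) = {bⁿ} are illustrative assumptions, not taken from the paper:

```python
def licensed_by_left_context(s, pos, in_B):
    """Rule A -> a & ◁B: the terminal 'a' at position pos of s is
    derivable as A only if the entire left context s[:pos] is in L(B)."""
    return s[pos] == "a" and in_B(s[:pos])

# Hypothetical context language L(B) = {b^n : n >= 0}
in_B = lambda w: all(c == "b" for c in w)

print(licensed_by_left_context("bba", 2, in_B))  # True: left context "bb" is in L(B)
print(licensed_by_left_context("aba", 2, in_B))  # False: left context "ab" is not
```

The point is that the rule constrains the whole prefix to the left of the symbol, something an ordinary context-free rule cannot express.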
•
TL;DR: The context-free S languages can be obtained from the deterministic one-way S automaton languages by way of the delta operations on languages, introduced in this paper.
Abstract: Context-free S grammars are introduced, for arbitrary (storage) type S, as a uniform framework for recursion-based grammars, automata, and transducers, viewed as programs. To each occurrence of a nonterminal of a context-free S grammar an object of type S is associated, which can be acted upon by tests and operations, as indicated in the rules of the grammar. Taking particular storage types gives particular formalisms, such as indexed grammars, top-down tree transducers, attribute grammars, etc. Context-free S grammars are equivalent to pushdown S automata. The context-free S languages can be obtained from the deterministic one-way S automaton languages by way of the delta operations on languages, introduced in this paper.
24 citations
••
TL;DR: The approach to Object Grammars is implemented as one of the foundations of the Ensō system.
23 citations
••
TL;DR: This research presents a grammar rule analysis method to provide a more systematic development process for grammar rules and aims to improve the quality of the rules and in turn have a major impact on the quality of the designs generated.
Abstract: The use of generative design grammars for computational design synthesis has been shown to be successful in many application areas. The development of advanced search and optimization strategies to guide the computational synthesis process is an active research area with great improvements in the last decades. The development of the grammar rules, however, often resembles an art rather than a science. Poor grammars drive the need for problem specific and sophisticated search and optimization algorithms that guide the synthesis process toward valid and optimized designs in a reasonable amount of time. Instead of tuning search algorithms for inferior grammars, this research focuses on designing better grammars to not unnecessarily burden the search process. It presents a grammar rule analysis method to provide a more systematic development process for grammar rules. The goal of the grammar rule analysis method is to improve the quality of the rules and in turn have a major impact on the quality of the designs generated. Four different grammars for automated gearbox synthesis are used as a case study to validate the developed method and show its potential.
20 citations
••
TL;DR: Methodological considerations on crucial issues in areas of string and graph grammar-based syntactic methods are made and recommendations concerning an enhancement of context-free grammars as well as constructing parsable and inducible classes of graph grammars are formulated.
Abstract: Fundamental open problems, which are frontiers of syntactic pattern recognition are discussed in the paper. Methodological considerations on crucial issues in areas of string and graph grammar-based syntactic methods are made. As a result, recommendations concerning an enhancement of context-free grammars as well as constructing parsable and inducible classes of graph grammars are formulated.
19 citations
•
TL;DR: This embedding result has several important consequences: it not only provides a simple new proof theory for the calculus, thereby clarifying the proof-theoretic foundations of hybrid type-logical grammars, but, since the translation is simple and direct, it also provides several new parsing strategies for hybridType-Logical Grammars.
Abstract: In this article we show that hybrid type-logical grammars are a fragment of first-order linear logic. This embedding result has several important consequences: it not only provides a simple new proof theory for the calculus, thereby clarifying the proof-theoretic foundations of hybrid type-logical grammars, but, since the translation is simple and direct, it also provides several new parsing strategies for hybrid type-logical grammars. Furthermore, NP-completeness of hybrid type-logical grammars follows immediately. The main embedding result also sheds new light on problems with lambda grammars/abstract categorial grammars, and shows that lambda grammars/abstract categorial grammars suffer from problems of over-generation and from problems at the syntax-semantics interface unlike any other categorial grammar.
15 citations
••
05 Apr 2014
TL;DR: It is shown that all tree languages generated by order-2 unsafe grammars are context-sensitive, which also implies that all unsafe order-3 word languages are context-sensitive.
Abstract: Higher-order grammars were extensively studied in the 1980s, and interest in them has revived recently in the context of higher-order model checking and program verification, where higher-order grammars are used as models of higher-order functional programs. A lot of theoretical questions remain open, however, for unsafe higher-order grammars (grammars without the so-called safety condition). In this paper, we show that all tree languages generated by order-2 unsafe grammars are context-sensitive. This also implies that all unsafe order-3 word languages are context-sensitive. The proof involves novel techniques based on the typed lambda-calculus, such as type-based grammar transformation.
14 citations
••
TL;DR: A semantics for building grammars from a modularised specification, in which modules are able to delete productions from imported nonterminals, is established, allowing a precise answer to the question ‘what character-level language does this grammar generate?’ in the face of difficult issues.
••
01 Sep 2014
TL;DR: The results of the paper lead to a natural generalization of the model intersection theorem for definite logic programs to the more general class of normal logic programs.
Abstract: We derive two novel theorems regarding pre-fixed points of non-monotonic functions and demonstrate that they have immediate applications in logic programming and formal grammars. In particular, the results of the paper lead to a natural generalization of the model intersection theorem for definite logic programs to the more general class of normal logic programs. Moreover, the obtained results also offer the first, to our knowledge, model intersection result for Boolean grammars.
••
01 Apr 2014
TL;DR: The notion of a phenominator is introduced as a way to encode the term structure of a functor separately from its “string support”, then employed to analyze a range of coordination phenomena typically left unaddressed by Linear Logic-based Curryesque frameworks.
Abstract: Linear Categorial Grammar (LinCG) is a sign-based, Curryesque, relational, logical categorial grammar (CG) whose central architecture is based on linear logic. Curryesque grammars separate the abstract combinatorics (tectogrammar) of linguistic expressions from their concrete, audible representations (phenogrammar). Most of these grammars encode linear order in string-based lambda terms, in which there is no obvious way to distinguish right from left. Without some notion of directionality, grammars are unable to differentiate, say, subject and object for purposes of building functorial coordinate structures. We introduce the notion of a phenominator as a way to encode the term structure of a functor separately from its “string support”. This technology is then employed to analyze a range of coordination phenomena typically left unaddressed by Linear Logic-based Curryesque frameworks.
•
TL;DR: It is argued that even those newer AnBn grammars cannot test the learning of syntactic hierarchy; this knowledge nonetheless serves to interpret recent animal studies, which make surprising claims about animals’ ability to handle center embedding.
Abstract: Recent artificial-grammar learning (AGL) paradigms driven by the Chomsky hierarchy paved the way for direct comparisons between humans and animals in the learning of center embedding ([A[AB]B]). The AnBn grammars used by the first generation of such research lacked a crucial property of center embedding, where the pairs of elements are explicitly matched ([A1 [A2 B2] B1]). This type of indexing is implemented in the second-generation AnBn grammars. This paper reviews recent studies using such grammars. Against the premises of these studies, we argue that even those newer AnBn grammars cannot test the learning of syntactic hierarchy. These studies nonetheless provide detailed information about the conditions under which human adults can learn an AnBn grammar with indexing. This knowledge serves to interpret recent animal studies, which make surprising claims about animals’ ability to handle center embedding.
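The difference between the two generations of AnBn stimuli can be made concrete in a few lines of Python (a sketch; the A1/B1 token notation is ours, mirroring the paper's bracketing):

```python
def first_gen(n):
    """First-generation AnBn string: n A's then n B's, with no
    pairing between individual A's and B's."""
    return ["A"] * n + ["B"] * n

def second_gen(n):
    """Second-generation AnBn string with explicit indexing,
    A1 A2 ... An Bn ... B2 B1, so each Ai is matched with its Bi
    as in center embedding [A1 [A2 B2] B1]."""
    return [f"A{i}" for i in range(1, n + 1)] + [f"B{i}" for i in range(n, 0, -1)]

print(second_gen(3))  # ['A1', 'A2', 'A3', 'B3', 'B2', 'B1']
```

Only the second form carries the nested dependencies whose learning is at issue; a learner could accept first-generation strings simply by counting.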
•
01 Aug 2014
TL;DR: This work formalizes and generalizes some existing mechanisms for dealing with discontinuous phrase structures and non-projective dependency structures, and introduces the concept of hybrid grammars, which are extensions of synchronous grammars obtained by coupling of lexical elements.
Abstract: We introduce the concept of hybrid grammars, which are extensions of synchronous grammars, obtained by coupling of lexical elements. One part of a hybrid grammar generates linear structures, another generates hierarchical structures, and together they generate discontinuous structures. This formalizes and generalizes some existing mechanisms for dealing with discontinuous phrase structures and non-projective dependency structures. Moreover, it allows us to separate the degree of discontinuity from the time complexity of parsing.
••
TL;DR: This paper focuses on parallel communicating grammar systems (PCGSs) with context-free components, and it is proved that the class of Szilard languages of centralized (returning or non-returning) PCGSs is included in NC1.
••
14 Jul 2014
TL;DR: An online judge for context-free grammars is implemented, and methods based on hashing, SAT, and automata that perform well in practice are designed and implemented.
Abstract: We implement an online judge for context-free grammars. Our system contains a list of problems describing formal languages, and asking for grammars generating them. A submitted proposal grammar receives a verdict of acceptance or rejection depending on whether the judge determines that it is equivalent to the reference solution grammar provided by the problem setter. Since equivalence of context-free grammars is an undecidable problem, we consider a maximum length l and only test equivalence of the generated languages up to words of length l. This length restriction is very often sufficient for the well-meant submissions. Since this restricted problem is still NP-complete, we design and implement methods based on hashing, SAT, and automata that perform well in practice.
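The bounded equivalence test described above can be illustrated with a naive fixed-point enumeration; this is a plain-Python sketch of the idea only, not the hashing/SAT/automata machinery the authors actually use, and the two grammars for {aⁿbⁿ} are our own examples:

```python
def language_up_to(grammar, start, l):
    """All strings of length <= l generated by a CFG, via fixed-point
    iteration. grammar maps each nonterminal to a list of right-hand
    sides (tuples of symbols); symbols not in grammar are terminals."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                combos = {""}
                for sym in rhs:
                    pool = lang[sym] if sym in grammar else {sym}
                    combos = {u + v for u in combos for v in pool
                              if len(u) + len(v) <= l}
                if not combos <= lang[nt]:
                    lang[nt] |= combos
                    changed = True
    return lang[start]

# Two hypothetical grammars for {a^n b^n : n >= 0}
g1 = {"S": [("a", "S", "b"), ()]}
g2 = {"S": [("A",)], "A": [("a", "A", "b"), ()]}
print(language_up_to(g1, "S", 6) == language_up_to(g2, "S", 6))  # True
```

Comparing the two generated sets decides equivalence up to length l; as the paper notes, even this restricted problem is NP-complete in general, so this brute-force version only works for tiny grammars.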
••
28 May 2014
TL;DR: A new variant of P2DCFGs that generates picture arrays in a leftmost way is introduced, and the generative power of these generators, also when rewriting is regulated by control languages, is examined.
Abstract: Considering a large variety of approaches in generating picture languages, the notion of pure two-dimensional context-free grammar P2DCFG represents a simple yet expressive non-isometric language generator of picture arrays. In the present paper, we introduce a new variant of P2DCFGs that generates picture arrays in a leftmost way. We concentrate our attention on determining their generative power by comparing it with the power of other picture generators. We also examine the power of these generators that regulate rewriting by control languages.
••
16 Aug 2014
TL;DR: A generalization of linear indexed grammars is defined that is equivalent to simple context-free tree grammars in the same way that linear indexed grammars are equivalent to tree-adjoining grammars.
Abstract: I define a generalization of linear indexed grammars that is equivalent to simple context-free tree grammars in the same way that linear indexed grammars are equivalent to tree-adjoining grammars.
••
16 Aug 2014
TL;DR: It is shown by example that Clark's algorithm may converge to a grammar that does not define the input language; a non-obvious modification of the algorithm is provided and proved correct.
Abstract: A. Clark [2] has shown that the class of languages which have a context-free grammar whose nonterminals can be defined by a finite set of contexts can be identified in the limit, given an enumeration of the language and a test for membership. We show by example that Clark's algorithm may converge to a grammar that does not define the input language. We review the theoretical background, provide a non-obvious modification of the algorithm and prove its correctness.
••
07 Jul 2014
TL;DR: This paper provides a synthesis and extension of work that unifies two approaches to such language relations: the automata-theoretic approach based on tree transducers that transform trees to their counterparts in the relation, and the grammatical approach based on synchronous grammars that derive pairs of trees in the relation.
Abstract: We tend to think of the study of language as proceeding by characterizing the strings and structures of a language, and we think of natural-language processing as using those structures to build systems of utility in manipulating the language. But many language-related problems are more fruitfully viewed as requiring the specification of a relation between two languages, rather than the specification of a single language. In this paper, we provide a synthesis and extension of work that unifies two approaches to such language relations: the automata-theoretic approach based on tree transducers that transform trees to their counterparts in the relation, and the grammatical approach based on synchronous grammars that derive pairs of trees in the relation. In particular, we characterize synchronous tree-substitution grammars and synchronous tree-adjoining grammars in terms of bimorphisms, which have previously been used to characterize tree transducers. In the process, we provide new approaches to formalizing the various concepts: a metanotation for describing varieties of tree automata and transducers in equational terms; a rigorous formalization of tree-adjoining and tree-substitution grammars and their synchronous counterparts, using trees over ranked alphabets; and generalizations of tree-adjoining grammar allowing multiple adjunction.
••
TL;DR: This work investigates how to transfer the concept of synchronisation to grammars by defining grammar teams that agree on the generation of shared terminal symbols based on a novel notion of competence.
Abstract: In CD grammar systems, the rewriting process is distributed over component grammars that take turns in the derivation of new symbols. Team automata however collaborate by synchronising their actions. Here we investigate how to transfer this concept of synchronisation to grammars by defining grammar teams that agree on the generation of shared terminal symbols based on a novel notion of competence. We first illustrate this idea for the case of regular grammars and next propose an extension to the case of context-free grammars.
••
TL;DR: It is demonstrated that any recursively enumerable language can be generated by one-sided random context grammars with no more than two right random context rules.
••
31 Mar 2014
TL;DR: The main result is the undecidability of the emptiness problem for grammars restricted to a one-symbol alphabet, which is proved by simulating a Turing machine by a cellular automaton with feedback.
Abstract: The paper considers a family of formal grammars that extends linear context-free grammars with an operator for referring to the left context of a substring being defined, as well as with a conjunction operation (as in linear conjunctive grammars). These grammars are proved to be computationally equivalent to an extension of one-way real-time cellular automata with an extra data channel. The main result is the undecidability of the emptiness problem for grammars restricted to a one-symbol alphabet, which is proved by simulating a Turing machine by a cellular automaton with feedback. The same construction proves the \(\Sigma^0_2\)-completeness of the finiteness problem for these grammars.
••
TL;DR: Permutation grammars with rules of length at most n are said to be of order n and generate the family of permutation languages $Perm_n$; this paper extends Nagy's strict inclusion $Perm_2 \subsetneq Perm_3$ by proving that $Perm_{4n-2} \subsetneq Perm_{4n-1}$ for n ≥ 1.
Abstract: Permutation grammars are an extension of context-free grammars with rules having the same symbols on both sides but possibly in a different order. An example of a permutation rule of length 3 is ABC → CBA. If these non-context-free rules are of length at most n, then we say that the permutation grammar is of order n, and all such grammars generate a family of permutation languages $Perm_n$. In 2010 Nagy showed that there exists a language that cannot be generated by a grammar of order 2, but rules of length 3 are enough; in other words, a strict inclusion $Perm_2 \subsetneq Perm_3$ was obtained. We extend this result by proving that $Perm_{4n-2} \subsetneq Perm_{4n-1}$ for n ≥ 1.
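A single derivation step under a permutation rule such as ABC → CBA is easy to sketch in Python (function and variable names are ours, for illustration only):

```python
def apply_permutation_rule(form, lhs, rhs):
    """Apply a permutation rule (e.g. ABC -> CBA) at the leftmost
    occurrence of lhs in the sentential form. rhs must be a
    permutation of lhs, so only the order of symbols changes."""
    assert sorted(lhs) == sorted(rhs), "not a permutation rule"
    i = form.find(lhs)
    if i < 0:
        return None  # rule not applicable
    return form[:i] + rhs + form[i + len(lhs):]

print(apply_permutation_rule("xABCy", "ABC", "CBA"))  # xCBAy
```

Because both sides contain the same multiset of symbols, such rules never change the length of a sentential form, only the order of its symbols.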
••
30 Jul 2014
TL;DR: A new weighted tree transducer formalism is suggested and it is proved that the transformations of the restricted grammars are precisely those of the linear and nondeleting instances of these transducers.
Abstract: In this paper, we consider weighted synchronous context-free tree grammars and identify a certain syntactic restriction of these grammars. We suggest a new weighted tree transducer formalism and prove that the transformations of the restricted grammars are precisely those of the linear and nondeleting instances of these transducers.
•
21 Jul 2014
TL;DR: The experience of using Triple Graph Grammars (TGG) to synchronize models of the rich and complex Architecture Analysis and Design Language (AADL), an aerospace standard of the Society of Automotive Engineers, provides a validation of the TGG approach for synchronizing models of large meta-models, but shows that model synchronization remains a challenging task.
Abstract: We report our experience of using Triple Graph Grammars (TGG) to synchronize models of the rich and complex Architecture Analysis and Design Language (AADL), an aerospace standard of the Society of Automotive Engineers. A synchronization layer has been developed between the OSATE (Open Source AADL Tool Environment) textual editor and the Adele graphical editor in order to improve their integration. Adele has been designed to support editing AADL models in a way that does not necessarily follow the structure of the language, but is adapted to the way designers think. For this reason, it operates on a different meta-model than OSATE. As a result, changes on the graphical model must be propagated automatically to the textual model to ensure consistency of the models. Since Adele does not cover the complete AADL language, this must be done without re-instantiation of the objects to avoid losing the information not represented in the graphical part. The TGG language implemented in the MoTE tool has been used to synchronize the tools. Our results provide a validation of the TGG approach for synchronizing models of large meta-models, but also show that model synchronization remains a challenging task, since several improvements of the TGG language and its tool were required to succeed.
•
01 May 2014
TL;DR: How a Constraint Grammar with linguist-written rules can be optimized and ported to another language using a Machine Learning technique and the effects of rule movements, sorting, grammar-sectioning and systematic rule modifications are discussed and quantitatively evaluated.
Abstract: In this paper, we describe how a Constraint Grammar with linguist-written rules can be optimized and ported to another language using a Machine Learning technique. The effects of rule movements, sorting, grammar-sectioning and systematic rule modifications are discussed and quantitatively evaluated. Statistical information is used to provide a baseline and to enhance the core of manual rules. The best-performing parameter combinations achieved part-of-speech F-scores of over 92 for a grammar ported from English to Danish, a considerable advance over both the statistical baseline (85.7), and the raw ported grammar (86.1). When the same technique was applied to an existing native Danish CG, error reduction was 10% (F=96.94).
•
TL;DR: It is proved that computing distances corresponds to solving undecidable questions: this is the case for the L1, L2 norm, the variation distance and the Kullback-Leibler divergence.
Abstract: Probabilistic context-free grammars (PCFGs) are used to define distributions over strings, and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics, and pattern recognition. A common important question is that of comparing the distributions generated or modelled by these grammars: this is done through checking language equivalence and computing distances. Two PCFGs are language equivalent if every string has identical probability with both grammars. This also means that the distance (whichever norm is used) is null. It is known that the language equivalence problem is interreducible with that of multiple ambiguity for context-free grammars, a long-standing open question. In this work, we prove that computing distances corresponds to solving undecidable questions: this is the case for the L1 and L2 norms, the variation distance and the Kullback-Leibler divergence. Two more results are less negative: (1) the most probable string can be computed, and (2) the Chebyshev distance (where the distance between two distributions is the maximum difference of probabilities over all strings) is interreducible with the language equivalence problem.
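While the exact distances are undecidable, over a truncated support they are of course computable. A small Python sketch with a hypothetical one-rule PCFG, S → a S (probability p) | a (probability 1 − p), whose string distribution is geometric in the string length:

```python
def string_probs(p, max_n):
    """P(a^n) under the toy PCFG  S -> a S (prob p) | a (prob 1 - p):
    a geometric distribution over string lengths."""
    return {"a" * n: p ** (n - 1) * (1 - p) for n in range(1, max_n + 1)}

def l1_distance(d1, d2):
    """L1 norm over the union of the two (truncated) supports."""
    keys = d1.keys() | d2.keys()
    return sum(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in keys)

def chebyshev_distance(d1, d2):
    """Maximum difference of probabilities over all (truncated) strings."""
    keys = d1.keys() | d2.keys()
    return max(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in keys)

d1, d2 = string_probs(0.5, 12), string_probs(0.25, 12)
print(chebyshev_distance(d1, d2))  # 0.25, attained at the string "a"
```

On this toy pair the Chebyshev distance is attained at the shortest string, where P("a") is 0.5 under one grammar and 0.75 under the other; the truncation is an illustration only and says nothing about the undecidable exact problem.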
••
01 Aug 2014
TL;DR: This work generalizes Solomonoff’s stochastic context-free grammar induction method to context-sensitive grammars, and applies it to transfer learning problem by means of an efficient update algorithm.
Abstract: We generalize Solomonoff’s stochastic context-free grammar induction method to context-sensitive grammars, and apply it to the transfer learning problem by means of an efficient update algorithm. The stochastic grammar serves as a guiding program distribution which improves future probabilistic induction approximations by learning about the training sequence of problems. The stochastic grammar is updated by extrapolating from the initial grammar and the solution corpus. We introduce a data structure to represent derivations and efficient algorithms to compute an updated grammar, which modify production probabilities and add new productions that represent past solutions.