
Showing papers on "Formal grammar published in 2008"


Journal ArticleDOI
Joakim Nivre1
TL;DR: This article presents a general framework for describing and analyzing algorithms for deterministic incremental dependency parsing, formalized as transition systems, and shows that all four algorithms give competitive accuracy, although the non-projective list-based algorithm generally outperforms the projective algorithms for languages with a non-negligible proportion of non-projective constructions.
Abstract: Parsing algorithms that process the input from left to right and construct a single derivation have often been considered inadequate for natural language parsing because of the massive ambiguity typically found in natural language grammars. Nevertheless, it has been shown that such algorithms, combined with treebank-induced classifiers, can be used to build highly accurate disambiguating parsers, in particular for dependency-based syntactic representations. In this article, we first present a general framework for describing and analyzing algorithms for deterministic incremental dependency parsing, formalized as transition systems. We then describe and analyze two families of such algorithms: stack-based and list-based algorithms. In the former family, which is restricted to projective dependency structures, we describe an arc-eager and an arc-standard variant; in the latter family, we present a projective and a non-projective variant. For each of the four algorithms, we give proofs of correctness and complexity. In addition, we perform an experimental evaluation of all algorithms in combination with SVM classifiers for predicting the next parsing action, using data from thirteen languages. We show that all four algorithms give competitive accuracy, although the non-projective list-based algorithm generally outperforms the projective algorithms for languages with a non-negligible proportion of non-projective constructions. However, the projective algorithms often produce comparable results when combined with the technique known as pseudo-projective parsing. The linear time complexity of the stack-based algorithms gives them an advantage with respect to efficiency both in learning and in parsing, but the projective list-based algorithm turns out to be equally efficient in practice. 
Moreover, when the projective algorithms are used to implement pseudo-projective parsing, they sometimes become less efficient in parsing (but not in learning) than the non-projective list-based algorithm. Although most of the algorithms have been partially described in the literature before, this is the first comprehensive analysis and evaluation of the algorithms within a unified framework.
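The arc-standard stack-based system described above can be sketched in a few lines. This is a hypothetical simplification for illustration (unlabeled arcs and a fixed transition sequence instead of a learned classifier), not Nivre's implementation:

```python
# Minimal sketch of an arc-standard stack-based transition system.
# Configurations are (stack, buffer, arcs); transitions operate on the
# top two stack items or the buffer front. Illustrative only.

def parse(n_words, transitions):
    """Apply a transition sequence; return the set of (head, dependent) arcs."""
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    for t in transitions:
        if t == "SHIFT":
            stack.append(buffer.pop(0))       # move buffer front onto the stack
        elif t == "LEFT-ARC":
            s0, s1 = stack[-1], stack[-2]     # top governs the item below it
            arcs.add((s0, s1))
            del stack[-2]
        elif t == "RIGHT-ARC":
            s0, s1 = stack.pop(), stack[-1]   # item below governs the top
            arcs.add((s1, s0))
    return arcs

# "the cat sleeps": 0 is an artificial root, words are indexed 1..3
seq = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "LEFT-ARC", "RIGHT-ARC"]
print(parse(3, seq))  # {(2, 1), (3, 2), (0, 3)}
```

In the article itself, the next transition is predicted at each step by a treebank-induced classifier rather than given in advance.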

489 citations


Journal ArticleDOI
Mehdi Dastani1
TL;DR: A BDI-based agent-oriented programming language, called 2APL (A Practical Agent Programming Language), that facilitates the implementation of multi-agent systems consisting of individual agents that may share and access external environments.
Abstract: This article presents a BDI-based agent-oriented programming language, called 2APL (A Practical Agent Programming Language). This programming language facilitates the implementation of multi-agent systems consisting of individual agents that may share and access external environments. It realizes an effective integration of declarative and imperative style programming by introducing and integrating declarative beliefs and goals with events and plans. It also provides practical programming constructs to allow the generation, repair, and (different modes of) execution of plans based on beliefs, goals, and events. The formal syntax and semantics of the programming language are given and its relation with existing BDI-based agent-oriented programming languages is discussed.

394 citations


Journal ArticleDOI
TL;DR: A general method is presented for comparative semantics of FDs grounded in Harel and Rumpe's guidelines for defining formal visual languages and in Krogstie et al.'s semiotic quality framework.
Abstract: Feature diagrams (FDs) are a family of popular modelling languages, mainly used for managing variability in software product lines. FDs were first introduced by Kang et al. as part of the feature-oriented domain analysis (FODA) method back in 1990. Since then, various extensions of FODA FDs were devised to compensate for purported ambiguity and lack of precision and expressiveness. Recently, the authors surveyed these notations and provided them with a generic formal syntax and semantics, called free feature diagrams (FFDs). The authors also started investigating the comparative semantics of FFD with respect to other recent formalisations of FD languages. Those results were targeted at improving the quality of FD languages and making the comparison between them more objective. The previous results are recalled in a self-contained, better illustrated and better motivated fashion. Most importantly, a general method is presented for comparative semantics of FDs grounded in Harel and Rumpe's guidelines for defining formal visual languages and in Krogstie et al.'s semiotic quality framework. Since this method is actually applicable to other visual languages, FDs are also used as a language (re)engineering exemplar throughout the paper.

72 citations


Book ChapterDOI
28 Sep 2008
TL;DR: This paper proposes a novel approach for defining the formal semantics of a modeling language based on the Alloy language, and offers the prospect of making formal definitions easier, hopefully paving the way for a wider adoption of formal techniques in the definition of modeling languages.
Abstract: To define the formal semantics of a modeling language, one normally starts from the abstract syntax and then defines the static semantics and dynamic semantics. Having a formal semantics is important for reasoning about the language but also for building tools for the language. In this paper we propose a novel approach for this task based on the Alloy language. With the help of a concrete example language, we contrast this approach with traditional methods based on formal languages, type checking, meta-modeling and operational semantics. Although both Alloy and traditional techniques yield a formal semantics of the language, the Alloy-based approach has two key advantages: a uniform notation, and immediate automatic analyzability using the Alloy analyzer. Together with the simplicity of Alloy, our approach offers the prospect of making formal definitions easier, hopefully paving the way for a wider adoption of formal techniques in the definition of modeling languages.

44 citations


Book ChapterDOI
17 Mar 2008

31 citations



Proceedings Article
13 Jul 2008
TL;DR: A time- and space-efficient incremental arc-consistency algorithm for context-free grammars is devised, showing how to filter a sequence of monotonically tightening problems in cubic time and quadratic space.
Abstract: With the introduction of constraints based on finite automata a new line of research has opened where constraints are based on formal languages. Recently, constraints based on grammars higher up in the Chomsky hierarchy were introduced. We devise a time- and space-efficient incremental arc-consistency algorithm for context-free grammars. Particularly, we show how to filter a sequence of monotonically tightening problems in cubic time and quadratic space. Experiments on a scheduling problem show orders of magnitude improvements in time and space consumption.
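As a rough illustration of cubic-time reasoning over a context-free grammar, here is a basic CYK membership test; the paper's incremental arc-consistency propagator is more involved and is not reproduced here, and the grammar and names below are illustrative:

```python
# Basic CYK membership test for a CFG in Chomsky normal form -- a
# cubic-time baseline for the kind of context-free reasoning a
# grammar constraint performs.

def cyk(word, rules, start="S"):
    """rules: list of (lhs, rhs) pairs with rhs a 1- or 2-tuple of symbols."""
    n = len(word)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, a in enumerate(word):                  # length-1 spans
        table[i][1] = {lhs for lhs, rhs in rules if rhs == (a,)}
    for span in range(2, n + 1):                  # O(n^3) overall
        for i in range(n - span + 1):
            for k in range(1, span):              # split point
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in table[i][k] \
                            and rhs[1] in table[i + k][span - k]:
                        table[i][span].add(lhs)
    return start in table[0][n]

# a^n b^n in CNF: S -> A T | A B, T -> S B, A -> a, B -> b
g = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B")),
     ("A", ("a",)), ("B", ("b",))]
print(cyk("aabb", g))  # True
print(cyk("aab", g))   # False
```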

26 citations


Journal ArticleDOI
TL;DR: In this article, the computational power of pure insertion grammars of weight 3 was studied and it was shown that a pure insertion grammar with weight 3 can characterize all recursively enumerable languages.

23 citations


01 Jan 2008
TL;DR: This report defines the syntax and formal semantics of the Chi 2.0 formalism, a process algebra that combines a high expressivity and ease of modeling with a formal semantics, and introduces new syntax to ensure better readability and easier modeling.
Abstract: This report defines the syntax and formal semantics of the Chi 2.0 formalism. The Chi formalism integrates concepts from dynamics and control theory with concepts from computer science, in particular from process algebra and hybrid automata. It combines a high expressivity and ease of modeling with a formal semantics. The Chi language is defined by means of an abstract and a concrete syntax. The purpose of the abstract syntax is to allow a straightforward definition of the structured operational semantics (SOS), which associates a hybrid transition system with a Chi process. The Chi semantics is compositional, and bisimulation is a congruence for all operators. The concrete syntax offers modeling equivalents for the elements of the abstract syntax, and it introduces new syntax to ensure better readability and easier modeling. The meaning of the concrete syntax is defined by means of a mapping to the abstract syntax. The Chi language provides among others discrete, continuous, and algebraic variables, and equation process terms for modeling differential algebraic equations (DAEs), including fully implicit or switched DAEs. Steady state initialization can be specified, and higher index DAEs in Chi are equivalent to the corresponding index 1 DAEs, obtained after differentiation of the hidden constraints. The invariant process term in Chi corresponds to invariants in hybrid automata. The following operators are provided (among others): the parallel composition, alternative composition (choice), and sequential composition operators; and the recursion scope operator for modeling automata. The parallel composition operator allows shared variables, shared synchronizing and non-synchronizing action labels, and shared CSP channels for synchronous communication. Two main ways of expressing urgency are provided: First, action labels and channels can be declared as urgent. Delaying is possible only if, and for as long as, no urgent actions are enabled.
Synchronizing actions are enabled only when the guards of all participating actions in a parallel composition are enabled. Second, urgency can be defined locally by means of the time can progress (tcp) process term, which allows delays for as long as the tcp predicate is true. Scope operators are available for hierarchical modeling. They are used to declare local variables, local action labels, and local channels. Process definition and instantiation provide process re-use and encapsulation. Hybrid automata and networks of hybrid automata can easily be expressed in Chi. Since Chi is a process algebra, its operators can be arbitrarily combined, resulting in a high modeling flexibility.

18 citations


Patent
Lei Fu, Jin Huang, Zhongjun He, Yajuan Lu, Qun Liu 
25 Jun 2008
TL;DR: In this article, a translation method merging sentence pattern templates and statistical machine translation technology is presented, which solves the problem that statistical machine translation systems cannot translate sentences having fixed sentence pattern structures very well.
Abstract: The present invention discloses a translation method merging sentence pattern templates and statistical machine translation technology. The method comprises that: A. a sentence pattern template is configured, and a sentence pattern template library is established; B. the configured sentence pattern template is used to match an input source language; if the matching succeeds, the input source language is transformed into a sentence containing source language words and target language words, and Step D is executed, otherwise Step C is executed; C. the input source language is split into clauses according to punctuation marks, and the split clauses are matched and transformed into sentences containing source language words and target language words; D. the sentences containing source language words and target language words obtained by matching are output to a statistical machine translation system to be translated, and then translation results are obtained. The present invention solves the problem that statistical machine translation systems cannot translate sentences having fixed sentence pattern structures very well, and ensures that the translation of sentences having fixed sentence pattern structures is smoother.

18 citations


Proceedings ArticleDOI
30 Jun 2008
TL;DR: The new version of a tool to assist in teaching formal languages and automata theory can now also simulate push-down automata and Turing machines.
Abstract: In this paper we present the new version of a tool to assist in teaching formal languages and automata theory. In the previous version the tool provided algorithms for regular expressions, finite automata and context free grammars. The new version can also simulate push-down automata and Turing machines.
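The core of the simulations such teaching tools offer is very small; for flavor, here is a toy deterministic finite-automaton simulator (a hypothetical illustration, not the tool described in the paper):

```python
# Toy DFA simulator: delta maps (state, symbol) pairs to next states.

def run_dfa(delta, start, accepting, word):
    """Return True if the DFA accepts the word."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # deterministic step
    return state in accepting

# DFA over {0, 1} accepting strings with an even number of 0s
delta = {("even", "0"): "odd", ("even", "1"): "even",
         ("odd", "0"): "even", ("odd", "1"): "odd"}
print(run_dfa(delta, "even", {"even"}, "0110"))  # True
print(run_dfa(delta, "even", {"even"}, "011"))   # False
```

Push-down automata and Turing machines extend this loop with a stack and a tape, respectively.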

Book ChapterDOI
07 Jul 2008
TL;DR: This work brings together the most pertinent features of Conceptual Graphs and Euler diagram-based Spider diagrams, creating a new hybrid notation called Conceptual Spider Diagrams, and provides formal syntax and semantics of this new notation, together with examples demonstrating its capabilities.
Abstract: Conceptual Graphs are a common knowledge representation system which are used in conjunction with an explicit type hierarchy of the domain. However, this means the interpretation of information expressed in conceptual graphs requires the combined use of information from different sources, which is not always an easy cognitive task. Though it is possible to explicitly represent the type hierarchy with Conceptual Graphs with Cuts, this less natural expression of the type hierarchy information is not as easy to interpret and soon takes up a lot of space. Now, one of the main advantages of Euler diagram-based notations like Spider diagrams is the natural diagrammatic representation of hierarchies. However, Spider diagrams lack facilities such as the ability to represent general relationships between objects which is necessary for knowledge representation tasks. We bring together the most pertinent features of both of these notations, creating a new hybrid notation called Conceptual Spider Diagrams. We provide formal syntax and semantics of this new notation, together with examples demonstrating its capabilities.

Book ChapterDOI
07 Apr 2008
TL;DR: This work embodies the Chomsky hierarchy itself within an infinite complete lattice of algebras that ranges from dioids to quantales and includes many of the forms of Kleene algebra that have been considered in the literature.
Abstract: The algebraic approach to formal language and automata theory is a continuation of the earliest traditions in these fields, which had sought to represent languages, translations and other computations as expressions (e.g. regular expressions) in suitably-defined algebras, and grammars, automata and transitions as relational and equational systems over these algebras that have such expressions as their solutions. The possibility of a comprehensive foundation cast in this form, following such results as the algebraic reformulation of the Parikh Theorem, has been recognized by the Applications of Kleene Algebra (AKA) conference from the time of its inception in 2001. Here, we take another step in this direction by embodying the Chomsky hierarchy, itself, within an infinite complete lattice of algebras that ranges from dioids to quantales, and includes many of the forms of Kleene algebras that have been considered in the literature. A notable feature of this development is the generalization of the Chomsky hierarchy, including type 1 languages, to arbitrary monoids.

Journal ArticleDOI
27 Feb 2008
TL;DR: The database helps treebank annotators and grammar developers to share precise knowledge about the grammatical status of words that constitute the treebank, allowing for consistent large-scale treebanking and grammar development.
Abstract: We have constructed a large scale and detailed database of lexical types in Japanese from a treebank that includes detailed linguistic information. The database helps treebank annotators and grammar developers to share precise knowledge about the grammatical status of words that constitute the treebank, allowing for consistent large-scale treebanking and grammar development. In addition, it clarifies what lexical types are needed for precise Japanese NLP on the basis of the treebank. In this paper, we report on the motivation and methodology of the database construction.

Book ChapterDOI
16 Sep 2008
TL;DR: It is proved that if the underlying derivation mode is the t-mode derivation, then some variants of these CD grammar systems determine the class of random context ET0L languages.
Abstract: We introduce some new cooperation protocols for cooperating distributed (CD) grammar systems. They depend on the number of different nonterminals present in the sentential form when a component has finished its work, i.e. on the final competence or efficiency of the grammar on the string (the competence is large if the number of the different nonterminals is small). We prove that if the underlying derivation mode is the t-mode derivation, then some variants of these systems determine the class of random context ET0L languages. If these CD grammar systems use the k-step limited derivations (for k ≥ 3) as underlying derivations, they are able to generate any recursively enumerable language.

Journal ArticleDOI
TL;DR: The results of a 24-month effort to reduce the SequenceL language to a very small set of primitives are introduced, including comparisons with other languages, the formal syntax and semantics, and the traces of several example problems run with a prototype interpreter developed in 2006.
Abstract: SequenceL is a concise, high-level language with a simple semantics that provides for the automatic derivation of many iterative and parallel control structures. The semantics repeatedly applies a “Normalize-Transpose-Distribute” operation to functions and operators until base cases are discovered. Base cases include the grounding of variables and the application of built-in operators to operands of appropriate types. This article introduces the results of a 24-month effort to reduce the language to a very small set of primitives. Included are comparisons with other languages, the formal syntax and semantics, and the traces of several example problems run with a prototype interpreter developed in 2006.
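The "Normalize-Transpose-Distribute" idea from the abstract can be sketched as follows. This is an illustrative reconstruction, not the SequenceL semantics: scalars are normalized (repeated) to match sequence arguments, the argument rows are transposed, and the operator is distributed recursively until base cases (all-scalar argument lists) are reached.

```python
# Sketch of a Normalize-Transpose-Distribute evaluation step.

def ntd(op, *args):
    if not any(isinstance(a, list) for a in args):
        return op(*args)                              # base case: ground scalars
    n = max(len(a) for a in args if isinstance(a, list))
    norm = [a if isinstance(a, list) else [a] * n     # normalize scalars
            for a in args]
    return [ntd(op, *row) for row in zip(*norm)]      # transpose + distribute

add = lambda x, y: x + y
print(ntd(add, [1, 2, 3], 10))            # [11, 12, 13]
print(ntd(add, [[1, 2], [3]], [10, 20]))  # [[11, 12], [23]]
```

Note how the nested case needs no explicit loops: iteration falls out of the repeated NTD application, which is the control-structure derivation the abstract describes.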

Proceedings ArticleDOI
21 Sep 2008
TL;DR: An introduction to the Forlan formal language theory toolset, which was designed to facilitate sophisticated experimentation with formal languages, and an extended example of the kind of experimentation that Forlan makes possible.
Abstract: We give an introduction to the Forlan formal language theory toolset, which was designed to facilitate sophisticated experimentation with formal languages. Forlan is embedded in the functional programming language Standard ML, a language whose notation and concepts are similar to those of mathematics. It is strongly typed and interactive, properties that help make experimentation robust, simple and enjoyable. We give an extended example of the kind of experimentation that Forlan makes possible. It involves the use of closure properties/algorithms for regular languages/finite automata and a "difference" function on strings of zeros and ones.

Book ChapterDOI
01 Jun 2008
TL;DR: A new language definition model is introduced and investigated, based on agreement or consensus between similar strings, which includes the regular languages and also interesting non-semilinear languages.
Abstract: A new language definition model is introduced and investigated, based on agreement or consensus between similar strings. Considering a regular set of strings over a bipartite alphabet made by pairs of unmarked/marked symbols, a match relation is introduced, in order to specify when such strings agree. Then a regular set over the bipartite alphabet can be interpreted as defining another language over the unmarked alphabet, called the consensual language. A string is in the consensual language if a set of corresponding matching strings is in the original language. The family defined by this approach includes the regular languages and also interesting non-semilinear languages. The word problem can be solved in polynomial time, using a multi-counter machine. Closure properties of consensual languages are proved for intersection with regular sets and inverse alphabetical homomorphism.
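A hypothetical reconstruction of the match relation sketched in the abstract: strings over the bipartite alphabet are encoded as lists of (letter, marked) pairs, and a set of such strings agrees when the underlying letters coincide position by position and each position is marked in exactly one string. This is illustrative only; see the paper for the actual definitions.

```python
# Toy consensus check over a bipartite (unmarked/marked) alphabet.

def consensus(strings):
    """Return the underlying unmarked word if the strings match, else None."""
    if len({len(s) for s in strings}) != 1:
        return None                       # matching strings must align
    word = []
    for column in zip(*strings):
        letters = {letter for letter, _ in column}
        marks = sum(marked for _, marked in column)
        if len(letters) != 1 or marks != 1:
            return None                   # same letter, exactly one mark
        word.append(letters.pop())
    return "".join(word)

s1 = [("a", True), ("b", False)]   # marks position 0
s2 = [("a", False), ("b", True)]   # marks position 1
print(consensus([s1, s2]))         # "ab"
print(consensus([s1, s1]))         # None (position 0 marked in both strings)
```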

Journal ArticleDOI
TL;DR: GSPML is shown to be well-defined, with a structural operational semantics and a hypergraph grammar syntax, and is presented as a solution that satisfies all of the pragmatic criteria.
Abstract: Existing visual modeling paradigms do not adequately cover the visual modeling of security protocols: sequences of interactions between principals in a security system. A visual formalism for security protocol modeling should not only be well-defined but also satisfy certain pragmatic criteria: support for compositional, comprehensive, laconic, and lucid models. Candidate techniques from the OMG's Model Driven Architecture, based largely on UML 2.0, lack a formal syntax and semantics. Well-defined visual formalisms outside of UML have shortcomings with respect to one or more of the pragmatic criteria. We present the GSPML visual formalism as a solution that satisfies all of the pragmatic criteria. We show that GSPML is well-defined with structural operational semantics and a hypergraph grammar syntax.

Journal ArticleDOI
TL;DR: This paper demonstrates that if a scattered context grammar generates its sentences in this way, it can be converted to a scattered context grammar without erasing productions; in general, however, this is not possible.
Abstract: A scattered context grammar erases nonterminals in a generalized k-limited way in a successful derivation, where k is a positive integer, if in every sentential form of a derivation, each of its substrings consisting of nonterminals from which the grammar derives empty strings is of length k or less. This paper demonstrates that if a scattered context grammar generates its sentences in this way, it can be converted to a scattered context grammar without erasing productions; in general, however, this is not possible.

Journal ArticleDOI
TL;DR: It is proved that every recursively enumerable language is generated by a multi-parallel grammar with no more than seven nonterminals and four selectors of length five.

01 Jan 2008
TL;DR: This paper deals mainly with iconic gesture in two-agent route description dialogue and focuses largely on the interface of word semantics and gesture in the Bielefeld Speech-and-Gesture-Alignment Corpus.
Abstract: This paper deals mainly with iconic gesture in two-agent route description dialogue and focuses largely on the interface of word semantics and gesture. The modelling tools used come from formal semantics and pragmatics. The empirical background of the study is a partly annotated corpus of ca. 5,000 gestures collected in the Bielefeld Speech-and-Gesture-Alignment Corpus (SAGA). The approach taken is entirely new: an interface comprising word meaning and gesture meaning is constructed, the point of contact being the temporal overlap between gesture and speech in the annotated data. Gesture meaning is computed via a mapping rep from the set of annotation predicates onto a meaning representation. There is a discussion concerning the trade-off between context-free vs. context-dependent word meaning and gesture meaning. The interfaced speech-gesture meaning is represented in a dynamic semantics format easily grafted onto a formal syntax fragment. (∗MM stands for multi-modal.)

Book ChapterDOI
01 Jan 2008

Journal Article
TL;DR: In this article, both theories of explicit-implicit issues and empirical studies on formal explicit and implicit grammar teaching are presented, along with issues in such studies that deserve attention, with the aim of informing future research and classroom practice.
Abstract: In the field of SLA, the explicit-implicit dimension has long been one of the controversial issues and focuses for researchers. It provides a relatively fresh theoretical as well as empirical angle on formal grammar instruction. This paper overviews both theories of explicit-implicit issues and empirical studies on formal explicit and implicit grammar teaching, and presents some issues in this kind of study that deserve attention, with the aim of providing some help to future research and to the real classroom.

Journal Article
TL;DR: The generalized scattered context grammars as discussed by the authors are based upon sequences of productions whose left-hand sides are formed by nonterminal strings, not just single nonterminals, and their derivations over sentential forms containing no more than k occurrences of non-terminals.
Abstract: The present paper introduces and discusses generalized scattered context grammars that are based upon sequences of productions whose left-hand sides are formed by nonterminal strings, not just single nonterminals. It places two restrictions on the derivations in these grammars. More specifically, let k be a positive integer. The first restriction requires that all rewritten symbols occur within the first k symbols of the first continuous block of nonterminals in the sentential form during every derivation step. The other restriction defines derivations over sentential forms containing no more than k occurrences of nonterminals. As its main result, the paper demonstrates that both restrictions decrease the generative power of these grammars to the power of context-free grammars.

01 Jan 2008
TL;DR: FOOLPROOF is intended as a component toolkit for implementation of formal languages with binding structures, which provides a coherent collection of components for many common language processing tasks, in particular those related to binding structures.
Abstract: FOOLPROOF is intended as a component toolkit for implementation of formal languages with binding structures. It provides a coherent collection of components for many common language processing tasks, in particular those related to binding structures. FOOLPROOF consists of: a meta-language for specifying signatures with variable bindings; a signature editor for constructing well-formed signatures; a small collection of interfaces for manipulating syntax trees and binding structures at various levels of detail; a set of generic components for processing syntax trees with binding structures, in particular for: copying, substitution, editing, matching, unification and rewriting; a generator which maps signature specifications to signature-specific classes. FOOLPROOF is being implemented in Object Pascal and will eventually take the form of a component library for the Delphi environment.

Book ChapterDOI
01 Jan 2008

DOI
01 Jan 2008
TL;DR: Differences in the methods of instruction did not lead to a difference in the participants' attitudes about error correction, and learners' and teachers' views about these two were close in many respects; however, the status of error correction diminished in learners' views as their proficiency levels improved.
Abstract: Grammar instruction and error correction are among the most hotly debated issues in second as well as foreign language education. Second language researchers and language educators have expressed different and sometimes contradictory ideas about them. Some believe error correction and grammar instruction are not only beneficial but also necessary. Some others believe that only appropriate incorporation of them in the syllabus can lead to improvement in learning. And still a third group conceives of them as a waste of time and detrimental to the learning process. To gain a better understanding of teachers' and learners' perceptions regarding error correction and the role of formal grammar instruction on learning, opinions of 51 teachers and 627 adolescent and adult learners were surveyed by means of two equivalent questionnaires. The participants received two different kinds of treatment in terms of materials, grammar instruction and error correction moves. In one group, learners received more explicit grammar instruction and systematic error correction, while in the other group the focus was on meaning and no systematic correction was provided. The analysis of the data obtained from the questionnaires revealed that differences in the methods of instruction did not lead to a difference in the participants' attitudes about error correction and/or grammar instruction on learning. Also, learners' and teachers' views about these two were close in many respects; however, the status of error correction diminished in the learners' views as they improved their proficiency levels. On the other hand, more proficient learners gave more credence to grammar instruction in their learning.

Proceedings ArticleDOI
17 Jun 2008
TL;DR: A formal syntax and operational semantics for activity diagrams are provided to allow for fully executable models, and extensions inspired by the scenario-based language of live sequence charts are proposed, including a distinction between possible and mandatory behavior.
Abstract: Errors, inconsistencies, incompletenesses and ambiguities in the requirements specification are major reasons for the failure of IT projects. Since the new major version 2 of the UML, the suitability of activity diagrams for modeling requirements has increased significantly. UML 2 activity diagrams are based upon a completely reengineered metamodel including many new features and an improved semantic precision. We provide a formal syntax and operational semantics for activity diagrams to allow for fully executable models. Inspired by the scenario-based language of live sequence charts, some extensions for activity diagrams are proposed including a distinction between possible and mandatory behavior. The proposed semantics paves the way for formal reasoning and tool development that allows for early prototyping and validation by simulation.

Journal Article
TL;DR: In this paper, a formal grammar of generics is proposed to make security specifications more precise and coherent, allowing to reach the assurance level more easily, and this approach allows to create an IT security development tool in which the specification means are implemented as the design library elements.
Abstract: The paper deals with the modelling of IT (Information Technology) security development process to be compliant with the Common Criteria (ISO/IEC 15408) family of standards. The paper concerns a more extensive project of the IT security development framework (ITSDF) but special attention is paid to improving and extending the means used to build the IT security specification. A dedicated language is proposed to define specification means for development stages other than the security requirement elaboration stage for which the standard does not provide specification means. The proposed means, called “enhanced generics”, can be used to specify items for the security environment, objectives, environmental requirements and the security functions. The enhanced generics have similar features as the components of the security requirements. They have a well defined structure and allow operations (iteration) on themselves. This solution, instead of the informal textual descriptors, ensures better preciseness of the security features description. The key element of the paper is the formal grammar of generics. It allows to derive all possible “legal” enhanced generics. They have semiformal character and features, similarly to the functional and assurance components defined by the standard to specify the security requirements. The paper, concluding an earlier informal approach, introduces syntax and semantics definitions based on the formal grammar and approach used to define the OCL language formally. The proposed means make security specifications more precise and coherent, allowing to reach the assurance level more easily. Moreover, this approach allows to create an IT security development tool in which the specification means are implemented as the design library elements.