scispace - formally typeset

Showing papers on "Formal language published in 2004"


Book
01 Jan 2004
TL;DR: This book describes applications in databases, complexity theory, and formal languages, as well as other branches of computer science, and highlights the computer science aspects of the subject.
Abstract: Emphasizes the computer science aspects of the subject. Details applications in databases, complexity theory, and formal languages, as well as other branches of computer science.

977 citations


Journal ArticleDOI
28 Sep 2004
TL;DR: A finer-grained concurrent model, the mK-calculus, is considered, where interactions have to be at most binary, and it is shown how to embed the coarser-grained language in the latter, a property which the authors call self-assembly.
Abstract: A language of formal proteins, the K-calculus, is introduced. Interactions are modeled at the domain level, bonds are represented by means of shared names, and reactions are required to satisfy a causality requirement of monotonicity. An example of a simplified signalling pathway is introduced to illustrate how standard biological events can be expressed in our protein language. A more comprehensive example, the lactose operon, is also developed, bringing some confidence in the formalism considered as a modeling language. Then a finer-grained concurrent model, the mK-calculus, is considered, where interactions have to be at most binary. We show how to embed the coarser-grained language in the latter, a property which we call self-assembly. Finally we show how the finer-grained language can itself be encoded in π-calculus, a standard foundational language for concurrency theory.

550 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components, and are here proven equivalent in effective recognition power.
Abstract: For decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternative, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. Where CFGs express nondeterministic choice between alternatives, PEGs instead use prioritized choice. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing grammars, they are reducible to two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, which are here proven equivalent in effective recognition power.

467 citations
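The prioritized-choice idea described above can be sketched with a pair of toy parser combinators; the names and API here are invented for illustration, not taken from the paper.

```python
def lit(s):
    # Match a literal prefix; return (matched, rest) or None on failure.
    def p(inp):
        return (s, inp[len(s):]) if inp.startswith(s) else None
    return p

def choice(*parsers):
    # PEG prioritized choice: commit to the FIRST alternative that succeeds,
    # unlike a CFG's nondeterministic choice between alternatives.
    def p(inp):
        for parser in parsers:
            r = parser(inp)
            if r is not None:
                return r
        return None
    return p

g = choice(lit("if"), lit("iffy"))
print(g("iffy!"))  # ('if', 'fy!') -- later alternatives are never tried
```

Because the choice is ordered, `g` can never match "iffy"; ambiguity is resolved by construction, which is exactly the trade-off the abstract describes.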


Journal ArticleDOI
TL;DR: This paper starts with a gradual introduction to GF, going through a sequence of simpler formalisms until the full power is reached; this is followed by a systematic presentation of the GF formalism and outlines of the main algorithms: partial evaluation and parser generation.
Abstract: Grammatical Framework (GF) is a special-purpose functional language for defining grammars. It uses a Logical Framework (LF) for a description of abstract syntax, and adds to this a notation for defining concrete syntax. GF grammars themselves are purely declarative, but can be used both for linearizing syntax trees and parsing strings. GF can describe both formal and natural languages. The key notion of this description is a grammatical object, which is not just a string, but a record that contains all information on inflection and inherent grammatical features such as number and gender in natural languages, or precedence in formal languages. Grammatical objects have a type system, which helps to eliminate run-time errors in language processing. In the same way as a LF, GF uses dependent types in abstract syntax to express semantic conditions, such as well-typedness and proof obligations. Multilingual grammars, where one abstract syntax has many parallel concrete syntaxes, can be used for reliable and meaning-preserving translation. They can also be used in authoring systems, where syntax trees are constructed in an interactive editor similar to proof editors based on LF. While being edited, the trees can simultaneously be viewed in different languages. This paper starts with a gradual introduction to GF, going through a sequence of simpler formalisms till the full power is reached. The introduction is followed by a systematic presentation of the GF formalism and outlines of the main algorithms: partial evaluation and parser generation. The paper concludes by brief discussions of the Haskell implementation of GF, existing applications, and related work.

260 citations
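GF's key notions of one abstract syntax with several concrete syntaxes, and of grammatical objects as records carrying inherent features, can be caricatured in a few lines; the tree, lexicons, and feature names below are invented and much simpler than GF's actual record types.

```python
tree = ("Mod", "Green", "House")  # abstract syntax: adjective modifying a noun

# Concrete syntaxes: lexical entries are records; French nouns carry an
# inherent gender feature that the adjective must agree with.
english = {"House": {"s": "house"}, "Green": {"s": "green"}}
french  = {"House": {"s": "maison", "g": "fem"},
           "Green": {"s": {"masc": "vert", "fem": "verte"}}}

def lin_en(tree):
    _, adj, noun = tree
    return f"{english[adj]['s']} {english[noun]['s']}"

def lin_fr(tree):
    _, adj, noun = tree
    n = french[noun]
    a = french[adj]["s"][n["g"]]  # adjective agrees with the noun's gender
    return f"{n['s']} {a}"        # and follows the noun in this toy grammar

print(lin_en(tree))  # green house
print(lin_fr(tree))  # maison verte
```

The same abstract tree linearizes differently in each language, which is the mechanism behind the meaning-preserving translation mentioned in the abstract.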


Journal ArticleDOI
Oliver Board1
TL;DR: This work defines a formal language in which belief and belief revision statements can be expressed in dynamic models of interactive reasoning, and presents soundness and completeness theorems linking the two.

161 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: The goal is to formalize the study of template engines, thus, providing a common nomenclature, a means of classifying template generational power, and a way to leverage interesting results from formal language theory.
Abstract: The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development. This situation is due mostly to a lack of formal definition of separation and fear that enforcing separation emasculates a template's power. I show that not only is strict separation a worthy design principle, but that we can enforce separation while providing a potent template engine. I demonstrate my StringTemplate engine, used to build jGuru.com and other commercial sites, at work solving some nontrivial generational tasks. My goal is to formalize the study of template engines, thus, providing a common nomenclature, a means of classifying template generational power, and a way to leverage interesting results from formal language theory. I classify three types of restricted templates analogous to Chomsky's type 1..3 grammar classes and formally define separation including the rules that embody separation. Because this paper provides a clear definition of model-view separation, template engine designers may no longer blindly claim enforcement of separation. Moreover, given theoretical arguments and empirical evidence, programmers no longer have an excuse to entangle model and view.

155 citations
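A strict-separation template in the spirit described above allows only attribute references, so the template cannot compute and all logic stays in the model; this sketch is an invented miniature, not StringTemplate's actual syntax or engine.

```python
import re

def render(template, attrs):
    # Replace $name$ holes with model values; unknown holes render empty.
    # The template language has no conditionals, loops, or expressions,
    # so view logic cannot leak into it.
    return re.sub(r"\$(\w+)\$", lambda m: str(attrs.get(m.group(1), "")), template)

page = "<h1>$title$</h1><p>$body$</p>"
print(render(page, {"title": "News", "body": "Hello"}))
# <h1>News</h1><p>Hello</p>
```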


Journal ArticleDOI
TL;DR: The paper studies failure diagnosis of discrete-event systems (DESs) with linear-time temporal logic (LTL) specifications; LTL-based specifications make specifying failures easier and more user-friendly than formal language/automata-based specifications.
Abstract: The paper studies failure diagnosis of discrete-event systems (DESs) with linear-time temporal logic (LTL) specifications. The LTL formulas are used for specifying failures in the system. The LTL-based specifications make the specification specifying process easier and more user-friendly than the formal language/automata-based specifications; and they can capture the failures representing the violation of both liveness and safety properties, whereas the prior formal language/automaton-based specifications can capture the failures representing the violation of only the safety properties (such as the occurrence of a faulty event or the arrival at a failed state). Prediagnosability and diagnosability of DESs in the temporal logic setting are defined. The problem of testing prediagnosability and diagnosability is reduced to the problem of model checking. An algorithm for the test of prediagnosability and diagnosability, and the synthesis of a diagnoser is obtained. The complexity of the algorithm is exponential in the length of each specification LTL formula, and polynomial in the number of system states and the number of specifications. The requirement of nonexistence of unobservable cycles in the system, which is needed for the diagnosis algorithms in prior methods to work, is relaxed. Finally, a simple example is given for illustration.

153 citations
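The flavor of an LTL safety check over observed behaviors can be illustrated with a finite-trace monitor for the past-time property "every alarm is preceded by a fault"; the event names are invented, and this is far simpler than the paper's diagnoser synthesis via model checking.

```python
def violates_safety(trace):
    # Past-time safety check: every "alarm" event must be preceded,
    # somewhere earlier in the trace, by a "fault" event.
    seen_fault = False
    for ev in trace:
        if ev == "fault":
            seen_fault = True
        if ev == "alarm" and not seen_fault:
            return True
    return False

print(violates_safety(["start", "alarm"]))           # True -- alarm with no prior fault
print(violates_safety(["start", "fault", "alarm"]))  # False
```

A liveness failure (something good never happens) cannot be caught by such a finite-trace monitor, which is why the paper's LTL setting is strictly more expressive than safety-only automaton specifications.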


Journal Article
TL;DR: The current implementation of Obol finds unique non-trivial definitions for around half of the terms in the GO, and has been used to find 223 missing relationships, which have since been added to the ontology.
Abstract: Ontologies are intended to capture and formalize a domain of knowledge. The ontologies comprising the Open Biological Ontologies (OBO) project, which includes the Gene Ontology (GO), are formalizations of various domains of biological knowledge. Ontologies within OBO typically lack computable definitions that serve to differentiate a term from other similar terms. The computer is unable to determine the meaning of a term, which presents problems for tools such as automated reasoners. Reasoners can be of enormous benefit in managing a complex ontology. OBO term names frequently implicitly encode the kind of definitions that can be used by computational tools, such as automated reasoners. The definitions encoded in the names are not easily amenable to computation, because the names are ostensibly natural language phrases designed for human users. These names are highly regular in their grammar, and can thus be treated as valid sentences in some formal or computable language. With a description of the rules underlying this formal language, term names can be parsed to derive computable definitions, which can then be reasoned over. This paper describes the effort to elucidate that language, called Obol, and the attempts to reason over the resulting definitions. The current implementation finds unique non-trivial definitions for around half of the terms in the GO, and has been used to find 223 missing relationships, which have since been added to the ontology. Obol has utility as an ontology maintenance tool, and as a means of generating computable definitions for a whole ontology. The software is available under an open-source license from: http://www.fruitfly.org/~cjm/obol.

113 citations
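The core move, parsing a regular term name into a computable definition, can be sketched with one invented rule (X "of" Y yields genus X with an "of"-differentia); Obol's actual grammar is much richer than this.

```python
def parse_term(name):
    # Toy rule: "<genus> of <target>" -> genus plus an 'of' differentia.
    # Names without the pattern are treated as atomic genus terms.
    if " of " in name:
        genus, target = name.split(" of ", 1)
        return {"genus": genus, "differentia": ("of", target)}
    return {"genus": name}

print(parse_term("regulation of transcription"))
# {'genus': 'regulation', 'differentia': ('of', 'transcription')}
```

Once term names are in this structured form, a reasoner can compare definitions and propose missing relationships of the kind the abstract reports.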


Journal ArticleDOI
TL;DR: Obol as discussed by the authors is an ontology maintenance tool for generating computable definitions for a whole ontology, which can be used as a means of reasoning over the resulting definitions.
Abstract: Ontologies are intended to capture and formalize a domain of knowledge. The ontologies comprising the Open Biological Ontologies (OBO) project, which includes the Gene Ontology (GO), are formalizations of various domains of biological knowledge. Ontologies within OBO typically lack computable definitions that serve to differentiate a term from other similar terms. The computer is unable to determine the meaning of a term, which presents problems for tools such as automated reasoners. Reasoners can be of enormous benefit in managing a complex ontology. OBO term names frequently implicitly encode the kind of definitions that can be used by computational tools, such as automated reasoners. The definitions encoded in the names are not easily amenable to computation, because the names are ostensibly natural language phrases designed for human users. These names are highly regular in their grammar, and can thus be treated as valid sentences in some formal or computable language. With a description of the rules underlying this formal language, term names can be parsed to derive computable definitions, which can then be reasoned over. This paper describes the effort to elucidate that language, called Obol, and the attempts to reason over the resulting definitions. The current implementation finds unique non-trivial definitions for around half of the terms in the GO, and has been used to find 223 missing relationships, which have since been added to the ontology. Obol has utility as an ontology maintenance tool, and as a means of generating computable definitions for a whole ontology. The software is available under an open-source license from: http://www.fruitfly.org/~cjm/obol. Supplementary material for this article can be found at: http://www.interscience.wiley.com/jpages/1531-6912/suppmat.

112 citations


Journal ArticleDOI
TL;DR: It is proved to be undecidable whether there exists a function which, given the n observations corresponding to a behavior ρ ∈ L, decides whether ρ is in K; this result is used to show undecidability of a decentralized supervisory control problem in the discrete event system framework.

101 citations


Book ChapterDOI
04 Jul 2004
TL;DR: By using a high-performance implementation of rewriting logic such as Maude, a language’s formal specification can be automatically transformed into an efficient interpreter and several limitations of both SOS and equational semantics are overcome.
Abstract: Formal semantic definitions of concurrent languages, when specified in a well-suited semantic framework and supported by generic and efficient formal tools, can be the basis of powerful software analysis tools. Such tools can be obtained for free from the semantic definitions; in our experience in just the few weeks required to define a language’s semantics even for large languages like Java. By combining, yet distinguishing, both equations and rules, rewriting logic semantic definitions unify both the semantic equations of equational semantics (in their higher-order denotational version or their first-order algebraic counterpart) and the semantic rules of SOS. Several limitations of both SOS and equational semantics are thus overcome within this unified framework. By using a high-performance implementation of rewriting logic such as Maude, a language’s formal specification can be automatically transformed into an efficient interpreter. Furthermore, by using Maude’s breadth first search command, we also obtain for free a semi-decision procedure for finding failures of safety properties; and by using Maude’s LTL model checker, we obtain, also for free, a decision procedure for LTL properties of finite-state programs. These possibilities, and the competitive performance of the analysis tools thus obtained, are illustrated by means of a concurrent Caml-like language; similar experience with Java (source and JVM) programs is also summarized.
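The claim that a rewriting-style semantics is directly executable can be sketched with two Peano-addition rules applied until a normal form is reached; the term encoding below is invented, and Maude's actual notation and rewriting engine differ.

```python
# Terms: "0", successor ("s", t), and ("add", x, y).
# Rules: add(0, y) -> y   and   add(s(x), y) -> s(add(x, y)).
def rewrite(term):
    if isinstance(term, tuple) and term[0] == "add":
        _, x, y = term
        x, y = rewrite(x), rewrite(y)
        if x == "0":
            return y                                  # add(0, y) -> y
        if isinstance(x, tuple) and x[0] == "s":
            return ("s", rewrite(("add", x[1], y)))   # add(s(x), y) -> s(add(x, y))
    if isinstance(term, tuple) and term[0] == "s":
        return ("s", rewrite(term[1]))
    return term

two = ("s", ("s", "0"))
print(rewrite(("add", two, two)))  # ('s', ('s', ('s', ('s', '0'))))
```

The rules alone determine the interpreter's behavior, which is the sense in which the abstract says analysis tools come "for free" from the semantic definition.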

Journal ArticleDOI
01 Apr 2004
TL;DR: It is argued that some preliminary “good properties” obtained may plead in favour of the use of analogy in the study of formal languages in relationship with natural language.
Abstract: In this paper, we advocate a study of analogies between strings of symbols for their own sake. We show how some sets of strings, i.e., some formal languages, may be characterized by use of analogies. We argue that some preliminary “good properties” obtained may plead in favour of the use of analogy in the study of formal languages in relationship with natural language.
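A proportional analogy a : b :: c : d between strings can be solved under one simple invented assumption, that b rewrites a suffix of a; the paper's actual notion of string analogy is more general than this heuristic.

```python
def solve(a, b, c):
    # The longest common prefix of a and b determines a suffix rewrite
    # sa -> sb; apply the same rewrite to c, or fail with None.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    sa, sb = a[p:], b[p:]
    if not c.endswith(sa):
        return None
    return c[: len(c) - len(sa)] + sb

print(solve("walk", "walked", "talk"))  # talked
print(solve("mouse", "mice", "house"))  # hice -- a purely formal, not morphological, answer
```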

Journal ArticleDOI
TL;DR: An outline of a formalization of classes of information fusion systems in terms of category theory and formal languages is provided, which should lead to the development of tools that could be used by software engineers to formally derive designs of fusion systems.

DOI
01 Sep 2004
TL;DR: A short introduction to term rewriting and to termination of rewriting is given, and a number of contributions to this field are presented: new methods for proving termination and refinements of existing ones, a tool for proving termination, and a methodology and tools for certification of termination proofs.
Abstract: In programming, termination of a program/algorithm means that its evaluation will eventually terminate, regardless of the input it receives. It is an important property and is required for total correctness. In general the problem is undecidable. Term rewriting is a formal way of specifying computation and as such it can be seen as a generic model for programming languages. Termination, here meaning lack of infinite sequences, is a well-studied concept in this context. There exist a number of methods for proving termination as well as a number of tools for doing that automatically. There is ongoing work on applying this methodology and tools to proving termination of programs in actual programming languages. In this thesis we first give a short introduction to term rewriting and to termination of rewriting. Subsequently we present a number of contributions to this field, which fall into the following categories: proposing new methods for proving termination and refining the existing ones, developing a tool for proving termination, and proposing a methodology and tools for certification of termination proofs, i.e., formal verification of proofs produced by the existing tools for proving termination.
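One classical method in this field, a polynomial interpretation that strictly decreases across each rule, can be sketched for the Peano-addition rules add(0, y) → y and add(s(x), y) → s(add(x, y)); the particular interpretation below is a standard textbook choice, not one taken from this thesis.

```python
# Interpretations: [0] = 1, [s](x) = x + 1, [add](x, y) = 2x + y + 1.
# Every rule instance maps its left-hand side to a strictly larger number
# than its right-hand side, so no infinite rewrite sequence can exist.
def interp(t):
    if t == "0":
        return 1
    if t[0] == "s":
        return interp(t[1]) + 1
    if t[0] == "add":
        return 2 * interp(t[1]) + interp(t[2]) + 1

lhs = ("add", ("s", "0"), "0")   # add(s(0), 0)
rhs = ("s", ("add", "0", "0"))   # s(add(0, 0))
print(interp(lhs), interp(rhs))  # 6 5 -- the measure drops
```

Automated termination provers search for such interpretations (and many stronger orderings) mechanically, which is what the tools mentioned in the abstract do.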

Book
01 Jun 2004
TL;DR: Vol. 1: Algorithms; Computational Complexity; Distributed Computing; Natural Computing. Vol. 2: Formal Specification; Logic in Computer Science; Concurrency; Formal Language Theory.
Abstract: Vol. 1: Algorithms; Computational Complexity; Distributed Computing; Natural Computing. Vol. 2: Formal Specification; Logic in Computer Science; Concurrency; Formal Language Theory.

01 Jan 2004
TL;DR: Research and applications in computer science are creating the need for precise definitions of the concepts that make up the authors' world, and the use of ontologies to specify semantics is emerging as a promising technique for software integration.
Abstract: Research and applications in computer science are creating the need for precise definitions of the concepts that make up our world. Web searching is handicapped by the limitations of specifying search criteria in terms of keywords rather than concepts. Automated natural language understanding, both oral and written, is severely limited by the ambiguity of language. Software engineering is limited by the need for engineers to define concepts to model the world. Computers exist in a world similar to Europe in the Middle Ages in which tiny principalities each had their own language or dialect. Worse yet, these dialects are impoverished and they enable the computers to say only very specific and limited things. In order to enable continued progress in ecommerce and software integration, we must give computers a common language with a richness that more closely approaches that of human language. Integrating the meaning (or semantics) of databases and programs is crucial for creating software that is reliable and scalable. The use of ontologies to specify semantics is emerging as a promising technique for software integration. Creators of different components often assume they understand the terms in the same way. The reality is that this is rarely the case. Even the best-documented code has implicit assumptions and ambiguity in the definition and usage of terms. Research in several areas including computer science, artificial intelligence, philosophy, library science and linguistics is helping to meet these needs. All these fields have experience with creating precise and standard descriptions and terminology for the things that make up our world (Sowa 2000). Current research is hampered by several issues. Computer scientists and philosophers lack consensus in their communities for creating the very large, wide-coverage ontologies that are needed, although they have the necessary formal languages to do so.
Librarians and linguists have the charter to create large ontologies, but those ontologies have typically lacked the formal definitions needed for reasoning and decision-making. In fact, it is probably fair to claim that to date no group has taken full advantage of the vast body of historical work in this area. One group that has developed a large formal ontology is Cycorp. However, it has released only a small part to the public, retains proprietary rights to the vast bulk of the ontology (Lenat 1995), and the contents of the ontology have not been subject to extensive peer review.

Proceedings ArticleDOI
19 Jul 2004
TL;DR: It is argued that any pure design methodology will face insurmountable difficulties in today's open and complex MAS; a methodology based on experimental method - scientific foundations for MAS construction and control - is recommended instead.
Abstract: We highlight the limitations of formal methods by exhibiting two results in recursive function theory: that there is no effective means of finding a program that satisfies a given formal specification; or checking that a program meets a specification. We exhibit a simple MAS which has all the power of a Turing machine. We argue that any pure design methodology will face insurmountable difficulties in today's open and complex MAS. We recommend instead a methodology based on experimental method - scientific foundations for MAS construction and control.

Journal ArticleDOI
TL;DR: Several characterizations are given of pairs of words having the same Parikh matrix.
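For the ordered alphabet a < b, the Parikh matrix is a 3x3 upper-triangular matrix recording the counts of a, of b, and of the scattered subword ab; the sketch below computes it and exhibits two distinct words sharing a matrix (the invented words are only examples).

```python
def parikh_matrix(w):
    # Counts of 'a', 'b', and of 'ab' as a scattered subword.
    a = b = ab = 0
    for ch in w:
        if ch == "a":
            a += 1
        elif ch == "b":
            b += 1
            ab += a  # every earlier 'a' pairs with this 'b'
    return [[1, a, ab], [0, 1, b], [0, 0, 1]]

print(parikh_matrix("abab"))                           # [[1, 2, 3], [0, 1, 2], [0, 0, 1]]
print(parikh_matrix("abba") == parikh_matrix("baab"))  # True: distinct but matrix-equivalent
```

Such matrix-equivalent pairs are exactly the objects whose characterizations papers in this area study.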

Journal ArticleDOI
TL;DR: The usual regular constructs (concatenation, etc.) are supplemented with superposition, inducing a useful notion of entailment, distinct from that given by models of predicate logic.
Abstract: Events in natural language semantics are characterized in terms of regular languages, each string in which can be regarded as a temporal sequence of observations. The usual regular constructs (concatenation, etc.) are supplemented with superposition, inducing a useful notion of entailment, distinct from that given by models of predicate logic.
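Superposition in this sense combines two equally long strings of observation snapshots position by position; the snapshots below, sets of invented fact labels, are only an illustration of the construction.

```python
def superpose(s1, s2):
    # Componentwise union of two observation sequences of equal length.
    assert len(s1) == len(s2)
    return [a | b for a, b in zip(s1, s2)]

rain  = [{"rain"}, {"rain"}, set()]
drive = [set(), {"drive"}, {"drive"}]
print(superpose(rain, drive))  # each position now records both observations
```

A string s then entails t when superposing t onto s adds no new observations, which gives the non-model-theoretic notion of entailment the abstract mentions.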

01 May 2004
TL;DR: This work presents the first synthesis of tree transducers and synchronous tree-substitution and -adjoining grammars, using the framework of bimorphisms as the generalizing formalism in which all can be embedded.
Abstract: Tree transducer formalisms were developed in the formal language theory community as generalizations of finite-state transducers from strings to trees. Independently, synchronous tree-substitution and -adjoining grammars arose in the computational linguistics community as a means to augment strictly syntactic formalisms to provide for parallel semantics. We present the first synthesis of these two independently developed approaches to specifying tree relations, unifying their respective literatures for the first time, by using the framework of bimorphisms as the generalizing formalism in which all can be embedded. The central result is that synchronous tree-substitution grammars are equivalent to bimorphisms where the component homomorphisms are linear and complete.
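A bimorphism component, a linear and complete tree homomorphism, maps each constructor to a target tree with the translated subtrees plugged in (each subtree used exactly once); the adjective-noun reordering below is an invented toy case, not an example from the paper.

```python
def hom(t):
    # Leaves pass through; each labeled node maps to a target pattern
    # with recursively translated children.
    if isinstance(t, str):
        return t
    label, *kids = t
    kids = [hom(k) for k in kids]
    if label == "NP_ADJ":            # adjective-noun order flips in the target
        adj, noun = kids
        return ("NP", noun, adj)
    return (label, *kids)

print(hom(("NP_ADJ", "red", "ball")))  # ('NP', 'ball', 'red')
```

Pairing two such homomorphisms over a shared center language yields a tree relation, which is the bimorphism construction the abstract builds on.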


Journal ArticleDOI
TL;DR: It is shown that deletion along trajectories serves as an inverse to shuffle on trajectories, which leads to results on the decidability of certain language equations, including those of the form L ⧢_T X = R, where L, R are regular languages and X is unknown.
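Shuffle on trajectories, and deletion along the same trajectory as its inverse, can be sketched directly: a trajectory over {0,1} dictates, position by position, whether the next output symbol is drawn from the first or the second word.

```python
def shuffle_on(t, x, y):
    # '0' takes the next symbol of x, '1' the next symbol of y.
    out, i, j = "", 0, 0
    for step in t:
        if step == "0":
            out += x[i]; i += 1
        else:
            out += y[j]; j += 1
    return out

def delete_on(t, w):
    # Deleting the '1' positions along the same trajectory recovers x,
    # which is the inverse relationship the result above establishes.
    return "".join(ch for step, ch in zip(t, w) if step == "0")

print(shuffle_on("0101", "ab", "cd"))                     # acbd
print(delete_on("0101", shuffle_on("0101", "ab", "cd")))  # ab
```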

Book ChapterDOI
01 Jan 2004
TL;DR: This chapter begins with an introduction to the main concepts of formal methods, and the use of semiformal notations and their integration with formal methods is covered as well.
Abstract: This chapter begins with an introduction to the main concepts of formal methods. Languages and tools for developing formal system models are also described, while the use of semiformal notations and their integration with formal methods is covered as well. At the end of the chapter, an overview of the current status of formal methods in embedded system design is presented.

Journal ArticleDOI
TL;DR: Weak Type Theory is a refinement of de Bruijn's Mathematical Vernacular and hence WTT is faithful to the mathematician's language yet is formal and avoids ambiguities, and acts as an intermediary between the language of mathematicians and that of logicians.
Abstract: We provide a syntax and a derivation system for a formal language of mathematics called Weak Type Theory (WTT). We give the metatheory of WTT and a number of illustrative examples. WTT is a refinement of de Bruijn's Mathematical Vernacular (MV) and hence: – WTT is faithful to the mathematician's language yet is formal and avoids ambiguities. – WTT is close to the usual way in which mathematicians express themselves in writing. – WTT has a syntax based on linguistic categories instead of set/type theoretic constructs. More so than MV however, WTT has a precise abstract syntax whose derivation rules resemble those of modern type theory enabling us to establish important desirable properties of WTT such as strong normalisation, decidability of type checking and subject reduction. The derivation system allows one to establish that a book written in WTT is well-formed following the syntax of WTT, and has great resemblance with ordinary mathematics books. WTT (like MV) is weak as regards correctness: the rules of WTT only concern linguistic correctness, its types are purely linguistic so that the formal translation into WTT is satisfactory as a readable, well-organized text. In WTT, logico-mathematical aspects of truth are disregarded. This separates concerns and means that WTT – can be easily understood by either a mathematician, a logician or a computer scientist, and – acts as an intermediary between the language of mathematicians and that of logicians.

Journal ArticleDOI
TL;DR: A typed formal language is designed for encoding natural language expressions that can cope with phenomena such as under-specification and granularity change and be used to answer a wide range of time-related queries.
Abstract: Automatic extraction and reasoning over temporal properties in natural language discourse has not had wide use in practical systems due to its demand for a rich and compositional, yet inference-friendly, representation of time. Motivated by our study of temporal expressions from the Penn Treebank corpora, we address the problem by proposing a two-level constraint-based framework for processing and reasoning over temporal information in natural language. Within this framework, temporal expressions are viewed as partial assignments to the variables of an underlying calendar constraint system, and multiple expressions together describe a temporal constraint-satisfaction problem (TCSP). To support this framework, we designed a typed formal language for encoding natural language expressions. The language can cope with phenomena such as under-specification and granularity change. The constraint problems can be solved using various constraint propagation and search methods, and the solutions can then be used to answer a wide range of time-related queries.
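The two-level view, temporal expressions as partial assignments to calendar variables that jointly form a constraint problem, can be sketched as naive merging of partial assignments; the variable names and granularities below are invented examples, not the paper's calendar system.

```python
# "in October" and "on a Friday in 2004" as partial variable assignments:
e1 = {"month": 10}
e2 = {"weekday": "Fri", "year": 2004}

def combine(*exprs):
    # Merge partial assignments; conflicting values mean the combined
    # temporal description is unsatisfiable.
    merged = {}
    for e in exprs:
        for k, v in e.items():
            if k in merged and merged[k] != v:
                return None
            merged[k] = v
    return merged

print(combine(e1, e2))  # {'month': 10, 'weekday': 'Fri', 'year': 2004}
```

Real constraint propagation over calendar granularities (days within months, weeks within years) is what the paper's TCSP framework adds on top of this merging step.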


Journal ArticleDOI
TL;DR: A new formalism for representation of finite languages, referred to as the class of IDL-expressions, is proposed, which combines concepts that were only considered in isolation in existing formalisms and is compared with more standard ones.
Abstract: We propose a formalism for representation of finite languages, referred to as the class of IDL-expressions, which combines concepts that were only considered in isolation in existing formalisms. The suggested applications are in natural language processing, more specifically in surface natural language generation and in machine translation, where a sentence is obtained by first generating a large set of candidate sentences, represented in a compact way, and then filtering such a set through a parser. We study several formal properties of IDL-expressions and compare this new formalism with more standard ones. We also present a novel parsing algorithm for IDL-expressions and prove a non-trivial upper bound on its time complexity.
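Two of the ingredients, disjunction and interleaving over word sequences, can be sketched as follows (`d` and `i` are invented stand-ins for the paper's operators; the lock operator and the compact graph representation are omitted).

```python
def d(*langs):
    # Disjunction: union of candidate phrase sets.
    return [w for L in langs for w in L]

def i(x, y):
    # Interleaving: all ways of merging two word sequences while
    # preserving the internal order of each.
    if not x:
        return [y]
    if not y:
        return [x]
    return ([[x[0]] + r for r in i(x[1:], y)] +
            [[y[0]] + r for r in i(x, y[1:])])

print(i(["a", "b"], ["c"]))      # [['a', 'b', 'c'], ['a', 'c', 'b'], ['c', 'a', 'b']]
print(d([["hello"]], [["hi"]]))  # [['hello'], ['hi']]
```

An IDL-expression keeps such sets implicit rather than enumerated, which is what makes the representation compact enough to filter through a parser.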

Journal ArticleDOI
TL;DR: P/O systems are introduced to generate any recursively enumerable language and a class of languages between the context-free and context-sensitive ones is obtained.

01 Jan 2004
TL;DR: A new prototype of a semantic Service Oriented Architecture (SOA) called Spec Services, in which services can register with a service manager a powerful syntactic or even semantic description of their capabilities, designed to easily support integration of additional formal languages.
Abstract: This paper describes a new prototype of a semantic Service Oriented Architecture (SOA) called Spec Services. Instead of publishing their API through a protocol like SOAP, as Web Services do, services can register with a service manager a powerful syntactic or even semantic description of their capabilities. The client entity then sends a syntactic or semantic description of its requirements to the service manager, which tries to find an appropriate previously registered service and bind them together. Today our service manager can deal with two languages: regular expressions, probably the most powerful syntactic-only description language, and Prolog, which is purely semantic. Nevertheless, the implementation has been designed from the beginning with evolution in mind, i.e., to easily support integration of additional formal languages.
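The regular-expression half of the matchmaking can be sketched as a registry that binds a client request to the first service whose capability pattern matches; all service names and patterns below are invented.

```python
import re

registry = []

def register(name, pattern):
    # A service publishes a regex description of the requests it handles.
    registry.append((name, re.compile(pattern)))

def bind(request):
    # The manager binds the request to the first matching capability.
    for name, pat in registry:
        if pat.fullmatch(request):
            return name
    return None

register("weather", r"weather in \w+")
register("echo", r"echo .*")
print(bind("weather in Paris"))  # weather
print(bind("echo hello"))        # echo
```

A semantic (Prolog-style) description would replace the regex match with a provability check, which is the second language the prototype supports.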

Journal ArticleDOI
TL;DR: An effective characterization of the “until-since hierarchy” of linear temporal logic over finite models (strings) is provided, that is, it is shown how to compute for a given temporal property of strings the minimal nesting depth in “ until” and “since” required to express it.
Abstract: We provide an effective characterization of the “until-since hierarchy” of linear temporal logic over finite models (strings), that is, we show how to compute for a given temporal property of strings the minimal nesting depth in “until” and “since” required to express it. This settles the most prominent classification problem for linear temporal logic. Our characterization of the individual levels of the “until-since hierarchy” is algebraic: for each n, we present a decidable class of finite semigroups and show that a temporal property is expressible with nesting depth at most n if and only if the syntactic semigroup of the formal language associated with the property belongs to the class provided. The core of our algebraic characterization is a new description of substitution in linear temporal logic in terms of block products of finite semigroups.
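The algebraic object behind such characterizations can be computed concretely: each word induces a function on a DFA's states, and the finitely many such functions form the transition monoid, closely related to the syntactic semigroup the abstract classifies. The two-state automaton below is an invented example.

```python
def compose(f, g):
    # Apply f then g, as functions on state indices.
    return tuple(g[f[q]] for q in range(len(f)))

# DFA over {a, b} with states {0, 1}: 'a' swaps the states, 'b' maps both to 1.
letters = {"a": (1, 0), "b": (1, 1)}

monoid = {(0, 1)}        # start from the identity function
frontier = [(0, 1)]
while frontier:          # close under composition with each letter
    f = frontier.pop()
    for g in letters.values():
        h = compose(f, g)
        if h not in monoid:
            monoid.add(h)
            frontier.append(h)
print(sorted(monoid))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Deciding membership of this finite semigroup in the classes the paper defines is what yields the computable nesting-depth characterization.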