
Showing papers on "Formal language published in 2010"


Proceedings ArticleDOI
06 Apr 2010
TL;DR: A method and a tool, called Rex, for symbolically expressing and analyzing regular expression constraints; Rex is implemented using the SMT solver Z3 and evaluated experimentally.
Abstract: Constraints in the form of regular expressions over strings are ubiquitous. They occur often in programming languages like Perl and C#, in SQL in the form of LIKE expressions, and in web applications. Providing support for regular expression constraints in program analysis and testing has several useful applications. We introduce a method and a tool, called Rex, for symbolically expressing and analyzing regular expression constraints. Rex is implemented using the SMT solver Z3, and we provide an experimental evaluation of Rex.
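Rex solves such constraints symbolically with Z3. As a rough illustration of the problem it addresses (not of Rex's method), the following sketch uses Python's standard-library `re` to find, by brute-force enumeration, a shortest string satisfying several regex constraints at once:

```python
import itertools
import re

def solve(patterns, alphabet="abc", max_len=4):
    """Find a shortest string matching every pattern (brute force).

    A toy illustration of regex-constraint solving; Rex does this
    symbolically with the SMT solver Z3 instead of enumerating.
    """
    compiled = [re.compile(p) for p in patterns]
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if all(c.fullmatch(s) for c in compiled):
                return s
    return None  # no solution within the length bound

print(solve([r"a[bc]*", r".*cc"]))  # -> 'acc'
```

Enumeration blows up exponentially in the string length; the point of a symbolic tool like Rex is precisely to avoid this.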

149 citations


Proceedings ArticleDOI
27 Sep 2010
TL;DR: The need for Techne is motivated, the language is introduced through examples, and its formalization is sketched.
Abstract: Techne is an abstract requirements modeling language that lays formal foundations for new modeling languages applicable during early phases of the requirements engineering process. During these phases, the requirements problem for the system-to-be is being structured, its candidate solutions described and compared in terms of how desirable they are to stakeholders. We motivate the need for Techne, introduce it through examples, and sketch its formalization.

140 citations


Proceedings Article
23 Aug 2010
TL;DR: The most mature of these novel languages are presented and compared, showing how they can balance the disadvantages of natural languages and formal languages for knowledge representation, and how domain specialists can be supported in writing specifications in controlled natural language.
Abstract: This paper presents a survey of research in controlled natural languages that can be used as high-level knowledge representation languages. Over the past 10 years or so, a number of machine-oriented controlled natural languages have emerged that can be used as high-level interface languages to various kinds of knowledge systems. These languages are relevant to the area of computational linguistics since they have two very interesting properties: firstly, they look informal like natural languages and are therefore easier to write and understand by humans than formal languages; secondly, they are precisely defined subsets of natural languages and can be translated automatically (and often deterministically) into a formal target language and then be used for automated reasoning. We present and compare the most mature of these novel languages, show how they can balance the disadvantages of natural languages and formal languages for knowledge representation, and discuss how domain specialists can be supported in writing specifications in controlled natural language.

129 citations


Proceedings Article
11 Jul 2010
TL;DR: The Topic-Aspect Model is presented, a Bayesian mixture model which jointly discovers topics and aspects and can generate token assignments in both of these dimensions, rather than assuming words come from only one of two orthogonal models.
Abstract: This paper presents the Topic-Aspect Model (TAM), a Bayesian mixture model which jointly discovers topics and aspects. We broadly define an aspect of a document as a characteristic that spans the document, such as an underlying theme or perspective. Unlike previous models which cluster words by topic or aspect, our model can generate token assignments in both of these dimensions, rather than assuming words come from only one of two orthogonal models. We present two applications of the model. First, we model a corpus of computational linguistics abstracts, and find that the scientific topics identified in the data tend to include both a computational aspect and a linguistic aspect. For example, the computational aspect of GRAMMAR emphasizes parsing, whereas the linguistic aspect focuses on formal languages. Secondly, we show that the model can capture different viewpoints on a variety of topics in a corpus of editorials about the Israeli-Palestinian conflict. We show both qualitative and quantitative improvements in TAM over two other state-of-the-art topic models.

128 citations


Book ChapterDOI
15 Jul 2010
TL;DR: This paper presents libalf, a comprehensive, open-source library for learning formal languages. libalf covers various well-known learning techniques for finite automata as well as novel learning algorithms.
Abstract: This paper presents libalf, a comprehensive, open-source library for learning formal languages. libalf covers various well-known learning techniques for finite automata (e.g. Angluin's L*, Biermann, RPNI, etc.) as well as novel learning algorithms (such as for NFA and visibly one-counter automata). libalf is flexible and allows easily interchanging learning algorithms and combining domain-specific features in a plug-and-play fashion. Its modular design and C++ implementation make it a suitable platform for adding and engineering further learning algorithms for new target models (e.g., Büchi automata).
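Algorithms like Angluin's L*, one of the techniques libalf implements, learn a DFA from membership queries by distinguishing prefixes through their behaviour on a set of suffixes. A minimal sketch of that observation-table idea, with a hypothetical membership oracle for the language "even number of a's" (this is not libalf's actual API):

```python
import itertools

def member(w):
    """Membership oracle (the 'teacher'): even number of 'a's."""
    return w.count("a") % 2 == 0

ALPHABET = "ab"

def words_up_to(n):
    for k in range(n + 1):
        for t in itertools.product(ALPHABET, repeat=k):
            yield "".join(t)

# An L*-style observation table: each prefix gets a row of membership
# answers over a fixed suffix set; distinct rows become DFA states.
suffixes = list(words_up_to(2))

def row(prefix):
    return tuple(member(prefix + s) for s in suffixes)

states = {row(p) for p in words_up_to(3)}
print(len(states))  # 2 distinct rows -> 2-state minimal DFA
```

A full learner would also close the table and handle counterexamples from equivalence queries; the sketch shows only how rows of query answers induce the states of the hypothesis automaton.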

106 citations


Book
Yuri I. Manin1
29 Apr 2010
TL;DR: A text covering provability, computability, and model theory, including formal languages, the continuum problem and forcing, recursive functions, and Gödel's incompleteness theorem.
Abstract: PROVABILITY.- Introduction to Formal Languages.- Truth and Deducibility.- The Continuum Problem and Forcing.- The Continuum Problem and Constructible Sets.- COMPUTABILITY.- Recursive Functions and Church's Thesis.- Diophantine Sets and Algorithmic Undecidability.- PROVABILITY AND COMPUTABILITY.- Gödel's Incompleteness Theorem.- Recursive Groups.- Constructive Universe and Computation.- MODEL THEORY.- Model Theory.

67 citations


Journal ArticleDOI
TL;DR: This paper shows how formal languages can be enhanced and used to model the complex regulations of the shift construction problem, and how specialized graph structures derived from them can be searched efficiently using a Large Neighbourhood Search.
Abstract: The challenge in shift scheduling lies in the construction of a set of work shifts, which are subject to specific regulations, in order to cover fluctuating staff demands. This problem becomes harder when multi-skill employees can perform many different activities during the same shift. In this paper, we show how formal languages (such as regular and context-free languages) can be enhanced and used to model the complex regulations of the shift construction problem. From these languages we can derive specialized graph structures that can be searched efficiently. The overall shift scheduling problem can then be solved using a Large Neighbourhood Search. These approaches are able to return near-optimal solutions on traditional single-activity problems and they scale well on large instances containing up to 10 activities.
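As a toy analogue of the paper's idea, a shift regulation can be written as a regular expression and the valid shifts enumerated from it; the actual system compiles such languages into graph structures searched with Large Neighbourhood Search rather than enumerating. The rule below ("work blocks of 2-4 units separated by single breaks") is an invented example:

```python
import itertools
import re

# Hypothetical regulation: 2-4 consecutive work units ('w'),
# work blocks separated by exactly one break ('b').
RULE = re.compile(r"w{2,4}(?:bw{2,4})*")

def valid_shifts(length):
    """Enumerate all shifts of a given length that satisfy the rule."""
    return ["".join(t) for t in itertools.product("wb", repeat=length)
            if RULE.fullmatch("".join(t))]

print(sorted(valid_shifts(6)))  # -> ['wwbwww', 'wwwbww']
```

The language of the rule is exactly the set of feasible shifts, which is why automaton/graph representations of it support efficient search.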

56 citations


01 Jan 2010
TL;DR: This paper shows how a question-answering system can be constructed using first-order logic as its language and a resolution-type theorem-prover as its deductive mechanism, and presents one particular approach in detail.
Abstract: This paper shows how a question-answering system can be constructed using first-order logic as its language and a resolution-type theorem-prover as its deductive mechanism. A working computer program, QA3, based on these ideas is described. The performance of the program compares favorably with several other general question-answering systems. 1. QUESTION ANSWERING. A question-answering system accepts information about some subject areas and answers questions by utilizing this information. The type of question-answering system considered in this paper is ideally one having the following features: 1. A language general enough to describe any reasonable question-answering subjects and express desired questions and answers. 2. The ability to search efficiently the stored information and recognize items that are relevant to a particular query. 3. The ability to derive an answer that is not stored explicitly, but that is derivable by the use of moderate effort from the stored facts. 4. Interactions between subject areas; for example, if the system has facts about Subject A and Subject B, then it should be able to answer a question that requires the use of both sets of facts. 5. The capability of allowing the user to add new facts or replace old facts conveniently. This paper argues the case for formal methods to achieve such a system and presents one particular approach in detail. A natural language facility is not one of the properties sought after or discussed (although Coles, 1968, has added to the program described here a translator from a subset of English to first-order logic). The name 'question-answering system' requires clarification. The system described above might be named an 'advice taker' or a 'multi-purpose problem-solving system' or 'general problem-solving system'. McCarthy (1958) proposed using formal languages and deduction to construct such a system, and suggested allowing the user to give hints or advice on how to answer a question; he referred to the proposed system as an 'advice taker'. Research on 'multi-purpose' or 'general problem-solving' systems tends to differ from question answering as described above by placing more emphasis on solving deeper, more difficult problems and less emphasis on user interaction, formality, and efficient retrieval of relevant facts from a large database. The situation is further confused by the use of 'question-answering' to refer sometimes to natural language systems, sometimes to information retrieval systems having little deductive ability, and sometimes to systems with deductive ability limited to the propositional calculus.
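QA3 answers questions by resolution-based refutation. A minimal propositional sketch of that mechanism (an assumed clause encoding, not QA3's actual code): to answer "does B hold?", negate it and saturate with resolution until the empty clause appears.

```python
from itertools import combinations

def resolve(c1, c2):
    """All resolvents of two clauses (literals are ints; -x negates x)."""
    out = []
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in r for l in r):  # drop tautologies
                out.append(frozenset(r))
    return out

def entails(kb, query):
    """Refutation: KB |- query iff KB + {not query} yields the empty clause."""
    clauses = {frozenset(c) for c in kb} | {frozenset({-query})}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:  # empty clause derived: contradiction found
                    return True
                new.add(r)
        if new <= clauses:  # fixpoint reached, no refutation
            return False
        clauses |= new

# Facts: A; A -> B  (clauses {A}, {not A, B}); question: B?
print(entails([{1}, {-1, 2}], 2))  # -> True
```

QA3 works in full first-order logic, so it additionally needs unification and answer extraction; the propositional loop above shows only the refutation skeleton.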

55 citations


Journal ArticleDOI
01 Feb 2010
TL;DR: This paper presents a methodology to synthesize model editors equipped with automatic completion from a modeling language’s declarative specification consisting of a meta-model with a visual syntax, powered by a first-order relational logic engine implemented in ALLOY.
Abstract: Integrated development environments such as Eclipse allow users to write programs quickly by presenting a set of recommendations for code completion. Similarly, word processing tools such as Microsoft Word present corrections for grammatical errors in sentences. Both of these existing structure editors use a set of constraints, expressed in the form of a natural language grammar (syntax-directed editing) or a formal grammar (language-directed editing), to restrict or correct the user and to aid document completion. Taking this idea further, in this paper we present an integrated software system capable of generating recommendations for model completion of partial models built in editors for domain-specific modeling languages. We present a methodology to synthesize model editors equipped with automatic completion from a modeling language's declarative specification, consisting of a meta-model with a visual syntax. This meta-model-directed completion feature is powered by a first-order relational logic engine implemented in ALLOY. We incorporate automatic completion in the generative tool AToM3. We use the finite state machines modeling language as a concise running example. Our approach leverages a correct-by-construction philosophy that renders subsequent simulation of models considerably less error-prone.

54 citations


Proceedings ArticleDOI
11 Jul 2010
TL;DR: The theory of regular cost functions over finite trees is developed, a quantitative extension of the notion of regular languages of trees, and nondeterministic and alternating finite tree cost automata for describing cost functions are introduced.
Abstract: We develop the theory of regular cost functions over finite trees, a quantitative extension of the notion of regular languages of trees: cost functions map each input (tree) to a value in ω+1, and are considered modulo an equivalence relation which forgets about specific values, but preserves boundedness of functions on all subsets of the domain. We introduce nondeterministic and alternating finite tree cost automata for describing cost functions. We show that all these forms of automata are effectively equivalent. We also provide decision procedures for them. Finally, following Büchi's seminal idea, we use cost automata to provide decision procedures for cost monadic logic, a quantitative extension of monadic second-order logic.
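Over words rather than trees (a simplification of the paper's setting), a cost automaton can be sketched as a finite automaton equipped with a counter; the cost of an input is then, for instance, its longest run of a's, and boundedness of such a function over a language is the kind of property the theory decides:

```python
def cost(word):
    """Toy cost function: a counter automaton tracking the longest run
    of 'a's. Cost functions in the paper map inputs into omega+1 and
    are compared only up to boundedness, not exact values."""
    best = run = 0
    for ch in word:
        run = run + 1 if ch == "a" else 0  # counter: increment or reset
        best = max(best, run)
    return best

print(cost("aabaaab"))  # -> 3
```

On the language b* this function is bounded (always 0), while on a*b* it is unbounded; that distinction, not the individual values, is what the equivalence relation in the abstract preserves.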

50 citations


Book ChapterDOI
06 Jul 2010
TL;DR: The existence of a minimum recognizer is proved in a very general setting which applies in particular to any BA of subsets of a discrete space, and an equational characterization of BAs of languages closed under quotients is given, extending the known results on regular languages to nonregular languages.
Abstract: We propose a new approach to the notion of recognition, which departs from the classical definitions by three specific features. First, it does not rely on automata. Secondly, it applies to any Boolean algebra (BA) of subsets rather than to individual subsets. Thirdly, topology is the key ingredient. We prove the existence of a minimum recognizer in a very general setting which applies in particular to any BA of subsets of a discrete space. Our main results show that this minimum recognizer is a uniform space whose completion is the dual of the original BA in Stone-Priestley duality; in the case of a BA of languages closed under quotients, this completion, called the syntactic space of the BA, is a compact monoid if and only if all the languages of the BA are regular. For regular languages, one recovers the notions of a syntactic monoid and of a free profinite monoid. For nonregular languages, the syntactic space is no longer a monoid but is still a compact space. Further, we give an equational characterization of BA of languages closed under quotients, which extends the known results on regular languages to nonregular languages. Finally, we generalize all these results from BAs to lattices, in which case the appropriate structures are partially ordered.

Book ChapterDOI
01 Nov 2010
TL;DR: This paper focuses on automated generation of runtime monitors from temporal properties, with a focus on minimizing runtime overhead, rather than monitor size or monitor-generation time.
Abstract: SystemC is a modeling language built as an extension of C++. Its growing popularity and the increasing complexity of designs have motivated research efforts aimed at the verification of SystemC models using assertion-based verification (ABV), where the designer asserts properties that capture the design intent in a formal language such as PSL or SVA. The model then can be verified against the properties using runtime or formal verification techniques. In this paper we focus on automated generation of runtime monitors from temporal properties. Our focus is on minimizing runtime overhead, rather than monitor size or monitor-generation time. We identify four issues in monitor generation: state minimization, alphabet representation, alphabet minimization, and monitor encoding. We conduct extensive experimentation on a synthetic workload and identify a configuration that offers the best performance in terms of runtime overhead.
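A toy runtime monitor (not one of the paper's generated monitors) for a request-response property over a finite trace illustrates what gets synthesized from a temporal formula; the paper's four issues (state minimization, alphabet representation, alphabet minimization, monitor encoding) all determine how cheaply each such step runs.

```python
def monitor(trace):
    """Toy monitor for 'every req is eventually followed by an ack'
    (counting semantics): scan the trace once with constant work per
    event, reporting a violation if requests remain pending at the end."""
    pending = 0
    for event in trace:
        if event == "req":
            pending += 1
        elif event == "ack" and pending > 0:
            pending -= 1
    return pending == 0

print(monitor(["req", "idle", "ack"]))  # -> True
print(monitor(["req", "req", "ack"]))   # -> False
```

Real PSL/SVA monitors are compiled automata rather than hand-written scanners, but the per-event-cost concern is the same: the monitor's work on each simulation step is the runtime overhead the paper minimizes.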

Book ChapterDOI
10 Oct 2010
TL;DR: A generic extension of the popular branching-time logic CTL is introduced which refines the temporal until and release operators with formal languages and shows that even with context-free languages on the until operator the logic still allows for polynomial time model-checking despite the significant increase in expressive power.
Abstract: We introduce a generic extension of the popular branching-time logic CTL which refines the temporal until and release operators with formal languages. For instance, a language may determine the moments along a path that an until property may be fulfilled. We consider several classes of languages leading to logics with different expressive power and complexity, whose importance is motivated by their use in model checking, synthesis, abstract interpretation, etc. We show that even with context-free languages on the until operator the logic still allows for polynomial time model-checking despite the significant increase in expressive power. This makes the logic a promising candidate for applications in verification. In addition, we analyse the complexity of satisfiability and compare the expressive power of these logics to CTL* and extensions of PDL.
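The refinement keeps classic CTL model checking as its base case. For the plain until operator, E[p U q] is a least fixpoint computed by backward propagation; a minimal sketch over an explicit transition relation (the example system is hypothetical):

```python
def check_EU(states, succ, p, q):
    """States satisfying E[p U q]: least fixpoint starting from q,
    repeatedly adding p-states with some successor already in the set."""
    sat = set(q)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in p and s not in sat and any(t in sat for t in succ[s]):
                sat.add(s)
                changed = True
    return sat

# Tiny Kripke structure: 0 -> 1 -> 2, with a self-loop on 2.
states = {0, 1, 2}
succ = {0: {1}, 1: {2}, 2: {2}}
p, q = {0, 1}, {2}
print(sorted(check_EU(states, succ, p, q)))  # -> [0, 1, 2]
```

The logic of the paper constrains *which* positions along the path may witness the until via a formal language; the fixpoint above is the unrefined special case where that language allows every position.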

BookDOI
03 Dec 2010
TL;DR: The book is a collection of papers going deep into classical topics in computer science inspired formal languages, as well as other ones showing new concepts and problems motivated in linguistics and biology.
Abstract: There are not many interdisciplinary scientific fields as formal language theory In this volume, it is presented as the very intersection point between Mathematics, Computer Science, Linguistics and Biology The book is a collection of papers going deep into classical topics in computer science inspired formal languages, as well as other ones showing new concepts and problems motivated in linguistics and biology The papers are organized in four sections: Grammars and Grammar Systems, Automata, Languages and Combinatorics, and Models of Molecular Computing They clearly prove the power, wealth and vitality of the theory nowadays and sketch some trends for its future development The volume is intended for an audience of computer scientists, computational linguists, theoretical biologists and any other people interested in dealing with the problems and challenges of interdisciplinarity

Journal ArticleDOI
TL;DR: The families of languages defined by components of unique, least and greatest solutions of such systems are shown to coincide with the classes of recursive, recursively enumerable and co-recursively enumerable sets, respectively.

Journal ArticleDOI
TL;DR: In this paper, the authors present two approaches for detecting symmetry in Rebeca models: one that detects symmetry in the topology of interconnections among objects and another one which exploits specific data structures to reflect internal symmetry.
Abstract: Rebeca is an actor-based language with formal semantics which is suitable for modeling concurrent and distributed systems and protocols. Due to its object model, partial order and symmetry detection and reduction techniques can be efficiently applied to dynamic Rebeca models. We present two approaches for detecting symmetry in Rebeca models: one that detects symmetry in the topology of inter-connections among objects, and another that exploits specific data structures to reflect symmetry in the internal structure of an object. The former approach is novel in that it does not require any input from the modeler and can deal with dynamic changes of topology. This approach is potentially applicable to a wide range of modeling languages for distributed and reactive systems. We have also developed a model checking tool that implements all of the above-mentioned techniques. The evaluation results show significant improvements in model size and model-checking time.

Journal ArticleDOI
TL;DR: The rule-based composite event query language XChangeEQ is described, designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation.
Abstract: Web systems, Web services, and Web-based publish/subscribe systems communicate events as XML messages and in many cases, require composite event detection: it is not sufficient to react to single event messages, but events have to be considered in relation to other events that are received over time. This entails a need for expressive, high-level languages for querying composite events. Emphasizing language design and formal semantics, we describe the rule-based composite event query language XChangeEQ. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation. Semantics are provided as a model theory with accompanying fixpoint theory, an approach that is established for rule languages but has not been applied to event queries so far. Because they are highly declarative, thus easy to understand and well suited for query optimization, such semantics are desirable for event queries.
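A toy flavour of composite event querying (invented events, not XChangeEQ syntax): detect every "order" event that is followed by a matching "payment" within a time window, combining event composition with a temporal relationship in one pass over the stream.

```python
def composite_matches(stream, window):
    """Pair each ('order', id, t) with the first ('payment', id, t2)
    such that t < t2 <= t + window. Events are (kind, id, time) tuples."""
    orders = {}   # open orders awaiting payment
    matches = []
    for kind, eid, t in stream:
        if kind == "order":
            orders[eid] = t
        elif kind == "payment" and eid in orders:
            if 0 < t - orders[eid] <= window:
                matches.append((eid, orders[eid], t))
            del orders[eid]  # each order is consumed once
    return matches

stream = [("order", 1, 0), ("order", 2, 1),
          ("payment", 1, 3), ("payment", 2, 9)]
print(composite_matches(stream, window=5))  # -> [(1, 0, 3)]
```

A declarative language like XChangeEQ expresses such queries as rules with a formal model-theoretic semantics instead of hand-coded scans, which is what makes them amenable to optimization.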

Journal ArticleDOI
TL;DR: This paper studies formal power series over a quantale with coefficients in the algebra of all languages over a given alphabet, and the representation of fuzzy languages by these formal power series; regular operations on fuzzy languages are shown to be representable by regular operations on power series, by means of operations on ordinary languages.

Posted Content
TL;DR: A survey of non-commutative rational functions, their realization theory and their applications can be found in this paper, where a difference-differential calculus is developed for further analysis.
Abstract: Noncommutative rational functions appeared in many contexts in system theory and control, from the theory of finite automata and formal languages to robust control and LMIs. We survey the construction of noncommutative rational functions, their realization theory and some of their applications. We also develop a difference-differential calculus as a tool for further analysis.

Proceedings ArticleDOI
01 Nov 2010
TL;DR: This investigation shows that the full satisfiability problem is ExpTime-complete in the full scenario, NP-complete if the authors drop isa between relationships, and NLogSpace-complete if they further drop covering over classes.
Abstract: UML class diagrams (UCDs) are the de-facto standard formalism for the analysis and design of information systems. By adopting formal language techniques to capture constraints expressed by UCDs one can exploit automated reasoning tools to detect relevant properties, such as schema and class satisfiability and subsumption between classes. Among the reasoning tasks of interest, the basic one is detecting full satisfiability of a diagram, i.e., whether there exists an instantiation of the diagram where all classes and associations of the diagram are non-empty and all the constraints of the diagram are respected. In this paper we establish tight complexity results for full satisfiability for various fragments of UML class diagrams. This investigation shows that the full satisfiability problem is ExpTime-complete in the full scenario, NP-complete if we drop isa between relationships, and NLogSpace-complete if we further drop covering over classes.

Proceedings ArticleDOI
17 Aug 2010
TL;DR: This paper presents a novel approach to multiformalism compositional modeling, that is based on the possibility of freely specifying the dynamics of the elements of a formal modeling language in an open framework, by the application of consolidated metamodeling foundations to the description of models.
Abstract: The design and the requirements of modern computer-based systems have reached a complexity level that calls for the use of models for the verification of non functional requirements since the beginning of their design cycle. Such systems are however too complex to be modeled directly in a simple unstructured formal language like Queueing Networks or Petri Nets. SIMTHESys (Structured Infrastructure for Multiformalism modeling and Testing of Heterogeneous formalisms and Extensions for SYStems) is a novel approach to multiformalism compositional modeling, that is based on the possibility of freely specifying the dynamics of the elements of a formal modeling language in an open framework. This is obtained by the application of consolidated metamodeling foundations to the description of models, together with the concept of behavior as a bridge between formalism dynamics and solution techniques. In this paper the main concepts of the SIMTHESys approach are presented, together with a running example of how SIMTHESys copes with performance evaluation of multiformalism models.

Journal ArticleDOI
TL;DR: It is proved that every rational language of words indexed by linear orderings is definable in monadic second-order logic, and it is shown that the converse is true for the class of languages indexed by countable scattered linear ordering, but false in the general case.
Abstract: We prove that every rational language of words indexed by linear orderings is definable in monadic second-order logic. We also show that the converse is true for the class of languages indexed by countable scattered linear orderings, but false in the general case. As a corollary we prove that the inclusion problem for rational languages of words indexed by countable linear orderings is decidable.

Journal ArticleDOI
01 Jan 2010
TL;DR: It is proved that the expressive power of 5′ → 3′ WK-automata increases with every additional run that they can make, both for deterministic and non-deterministic machines.
Abstract: 5′ → 3′ WK-automata are Watson-Crick automata whose two heads start on opposite ends of the input word and always run in opposite directions. One full reading in both directions is called a run. We prove that the expressive power of these automata increases with every additional run that they can make, both for deterministic and non-deterministic machines. This defines two incomparable infinite hierarchies of language classes between the regular and the context-sensitive languages. These hierarchies are complemented with classes defined by several restricted variants of 5′ → 3′ WK-automata like stateless automata. Finally we show that several standard problems are undecidable for languages accepted by 5′ → 3′ WK-automata in only one run, for example the emptiness and the finiteness problems.
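The two heads of a 5′ → 3′ WK-automaton read the word from opposite ends. A minimal sketch of a single run of one such (hypothetical) automaton, which checks that a DNA string equals its own reverse complement:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def one_run_accepts(dna):
    """Simulate one run of a toy 5'->3' WK-automaton: two heads start
    at opposite ends and move inward, each step comparing the left
    symbol against the complement of the right symbol."""
    left, right = 0, len(dna) - 1
    while left < right:
        if COMPLEMENT[dna[left]] != dna[right]:
            return False
        left += 1
        right -= 1
    return True

print(one_run_accepts("ACGT"))  # -> True  (ACGT is its own reverse complement)
print(one_run_accepts("AAGT"))  # -> False
```

The paper's hierarchy result says that allowing the heads to sweep the word several times, with states carried between runs, strictly increases what such machines can recognize; the sketch above is the single-run base case.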

Journal ArticleDOI
03 Mar 2010-PLOS ONE
TL;DR: A formal language that allows for transposing biological information precisely and rigorously into machine-readable information is proposed, which is grounded on a particular type of non-classical logic and can be used to write algorithms and computer programs.
Abstract: We propose a formal language that allows for transposing biological information precisely and rigorously into machine-readable information. This language, which we call Zsyntax (where Z stands for the Greek word ζωή, life), is grounded on a particular type of non-classical logic, and it can be used to write algorithms and computer programs. We present it as a first step towards a comprehensive formal language for molecular biology in which any biological process can be written and analyzed as a sort of logical “deduction”. Moreover, we illustrate the potential value of this language, both in the field of text mining and in that of biological prediction.

Book
01 Jan 2010
TL;DR: This computer science book represents scattered information by formal languages and gives an in-depth discussion of scattered context grammars as formal means that process these languages with a focus on applications in linguistics.
Abstract: This computer science book represents scattered information by formal languages and gives an in-depth discussion of scattered context grammars as formal means that process these languages. It is primarily meant as a monograph on these grammars, which represent an important trend of today's formal language theory. The text maintains a balance between fundamental concepts, theoretical results, and applications of these grammars. From a theoretical viewpoint, it introduces several variants of scattered context grammatical models. Based on these models, it demonstrates the concepts, methods, and techniques employed in handling scattered pieces of information with enough rigor to make them quite clear. It also explains a close relation between the subject of the book and several important mathematical fields, such as algebra and graph theory. From a more practical point of view, this book describes scattered information processing by fundamental information technologies. Throughout this book, several in-depth case studies and examples are carefully presented. Whilst discussing various methods concerning grammatical processing of scattered information, the text illustrates their applications with a focus on applications in linguistics.

Journal ArticleDOI
TL;DR: The smallest class of languages containing the singletons and closed under Boolean operations, product and shuffle is studied, along with smaller classes, including the smallest class containing the languages composed of a single word of length 2 which is closed under Boolean operations and shuffle by a letter.
Abstract: There is an increasing interest in the shuffle product on formal languages, mainly because it is a standard tool for modeling process algebras. It still remains a mysterious operation on regular languages. Antonio Restivo proposed as a challenge to characterize the smallest class of languages containing the singletons and closed under Boolean operations, product and shuffle. This problem is still widely open, but we present some partial results on it. We also study some other smaller classes, including the smallest class containing the languages composed of a single word of length 2 which is closed under Boolean operations and shuffle by a letter (resp. shuffle by a letter and by the star of a letter). The proof techniques have both an algebraic and a combinatorial flavor.
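The shuffle of two words is the set of all their interleavings that preserve the letter order of each word; a short recursive sketch computing it:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def shuffle(u, v):
    """All interleavings of u and v preserving the order within each."""
    if not u:
        return {v}
    if not v:
        return {u}
    # Either the first letter of u or the first letter of v comes next.
    return ({u[0] + w for w in shuffle(u[1:], v)} |
            {v[0] + w for w in shuffle(u, v[1:])})

print(sorted(shuffle("ab", "cd")))
# -> ['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
```

With pairwise-distinct letters, words of lengths m and n have C(m+n, m) interleavings (here C(4, 2) = 6); the shuffle of two *languages* is the union of the shuffles of their word pairs, and it is the closure properties of this operation that the paper investigates.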

Journal ArticleDOI
18 Jan 2010
TL;DR: The findings of a study conducted to identify learning difficulties for some of the FLAT topics are reported on.
Abstract: Students taking courses on formal languages and automata theory (FLAT) usually do not find these courses interesting and experience difficulty in grasping the different concepts. While there has been a vast amount of research into methodologies to assist students to conceptualize FLAT topics, there has been no research into the actual learning difficulties experienced by students with the different topics. This paper reports on the findings of a study conducted to identify these learning difficulties for some of the FLAT topics.

Proceedings Article
15 Jul 2010
TL;DR: This paper presents a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars, based on a generalisation of distributional learning and capable of representing all regular languages, some but not all context-free languages, and some non-context-free languages.
Abstract: A central problem for NLP is grammar induction: the development of unsupervised learning algorithms for syntax. In this paper we present a lattice-theoretic representation for natural language syntax, called Distributional Lattice Grammars. These representations are objective or empiricist, based on a generalisation of distributional learning, and are capable of representing all regular languages, some but not all context-free languages and some non-context-free languages. We present a simple algorithm for learning these grammars together with a complete self-contained proof of the correctness and efficiency of the algorithm.
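Distributional learning groups substrings by the contexts in which they occur; substrings sharing contexts across a sample are evidence that they belong to the same grammatical category. A minimal sketch on a hypothetical sample from the language a^n b^n:

```python
def contexts(sub, sample):
    """All (left, right) contexts in which sub occurs within the sample."""
    out = set()
    for w in sample:
        for i in range(len(w) - len(sub) + 1):
            if w[i:i + len(sub)] == sub:
                out.add((w[:i], w[i + len(sub):]))
    return out

sample = ["ab", "aabb", "aaabbb"]  # positive examples of a^n b^n

# 'ab' and 'aabb' share the context ('a', 'b'): distributional
# evidence that both belong to the same category (here, S).
shared = contexts("ab", sample) & contexts("aabb", sample)
print(("a", "b") in shared)  # -> True
```

Distributional Lattice Grammars organize exactly this substring/context duality into a lattice, rather than computing it by brute force as above.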

Journal ArticleDOI
TL;DR: An extension of Spatial OCL is proposed, based on a geometric model for objects with vague shapes and an adverbial approach for modelling topological constraints involving regions with broad boundaries, which simplifies the formal modelling of these complex constraints.
Abstract: Integrity constraints can control topological relations of objects in spatial databases. These constraints can be modelled using formal languages such as the spatial extension of the Object Constraint Language (Spatial OCL). This language allows the expression of topological integrity constraints involving crisp spatial objects, but it does not support constraints involving spatial objects with vague shapes (e.g. forest stand, pollution zone, valley or lake). In this paper, we propose an extension of Spatial OCL based on (1) a geometric model for objects with vague shapes, and (2) an adverbial approach for modelling topological constraints involving regions with broad boundaries. This new language simplifies the formal modelling of these complex constraints. Our approach has been implemented in a code generator. A case study in the field of agricultural spreading activities is also presented in the paper. AOCL OVS takes into account the shape vagueness of spread parcels and improves spatial reasoning about them.

Journal ArticleDOI
01 Jan 2010
TL;DR: A time- and space-efficient incremental arc-consistency algorithm for context-free grammars, investigate when logic combinations of grammar constraints are tractable, and show how to exploit non-constant size Grammars and reorderings of languages.
Abstract: With the introduction of the Regular Membership Constraint, a new line of research has opened where constraints are based on formal languages. This paper is taking the next step, namely to investigate constraints based on grammars higher up in the Chomsky hierarchy. We devise a time- and space-efficient incremental arc-consistency algorithm for context-free grammars, investigate when logic combinations of grammar constraints are tractable, show how to exploit non-constant size grammars and reorderings of languages, and study where the boundaries run between regular, context-free, and context-sensitive grammar filtering.