# A Canonical Semi-Deterministic Transducer

TL;DR: The existence of a canonical form for semi-deterministic transducers with sets of pairwise incomparable output strings is proved, an algorithm is developed that learns such transducers from translation queries, and it is shown that no learning algorithm for them can rely on domain knowledge alone.

Abstract: We prove the existence of a canonical form for semi-deterministic transducers with sets of pairwise incomparable output strings. Based on this, we develop an algorithm which learns semi-deterministic transducers given access to translation queries. We also prove that there is no learning algorithm for semi-deterministic transducers that uses only domain knowledge.
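For concreteness, the incomparability condition on output sets can be checked directly. The sketch below assumes "pairwise incomparable" means that no output string is a proper prefix of another (the prefix order is one natural reading; the paper may define the order differently):

```python
def pairwise_incomparable(outputs):
    """Return True if no string in `outputs` is a proper prefix of
    another. Assumes incomparability under the prefix order, which
    is an illustrative reading of the abstract, not the paper's
    definition."""
    outs = sorted(set(outputs))
    for i, u in enumerate(outs):
        for v in outs[i + 1:]:
            # Lexicographic sorting guarantees that if u is a prefix
            # of v, then v appears after u, so one pass suffices.
            if v.startswith(u):
                return False
    return True
```

A canonical form would fix a unique representative among the transducers whose transitions all carry such incomparable output sets.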

##### Citations


TL;DR: It is established that unbounded circumambient processes, phonological processes for which crucial information in the environment may appear unboundedly far away on both sides of a target, are common in tonal phonology but rare in segmental phonology, and it is argued that this typological asymmetry is best characterised by positing that tone is more computationally complex than segmental phonology.

Abstract: This paper establishes that unbounded circumambient processes, phonological processes for which crucial information in the environment may appear unboundedly far away on both sides of a target, are common in tonal phonology, but rare in segmental phonology. It then argues that this typological asymmetry is best characterised by positing that tone is more computationally complex than segmental phonology. The evidence for the asymmetry is based around attestations of unbounded tonal plateauing, but it is also shown how the ‘sour-grapes’ harmony pathology is unbounded circumambient. The paper argues that such processes are not weakly deterministic, which contrasts with previous typological work on segmental phonology. Positing that weak determinism bounds segmental phonology but not tonal phonology thus captures the typological asymmetry. It is also discussed why this explanation is superior to any offered by Optimality Theory.

55 citations


TL;DR: A computational investigation of a range of morphological operations, which includes various types of affixation, reduplication, and non-concatenative morphology, indicates that many of these operations require less than the power of regular relations.

Abstract: This paper presents a computational investigation of a range of morphological operations. These operations are first represented as morphological maps, or functions that take a stem as input and return an output with the operation applied (e.g., the ing-suffixation map takes the input ‘dɹɪŋk’ and returns ‘dɹɪŋk+ɪŋ’). Given such representations, each operation can be classified in terms of the computational complexity needed to map a given input to its correct output. The set of operations analyzed includes various types of affixation, reduplication, and non-concatenative morphology. The results indicate that many of these operations require less than the power of regular relations (i.e., they are subregular functions), the exception being total reduplication. A comparison of the maps that fall into different complexity classes raises important questions for our overall understanding of the computational nature of phonology, morphology, and the morpho-phonological interface.

27 citations
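The contrast the abstract draws between ordinary affixation and total reduplication can be illustrated with two toy string maps (the function names below are ours, for illustration only, and do not come from the paper):

```python
def ing_suffix(stem: str) -> str:
    # Affixation appends a bounded amount of material; such maps are
    # computable by finite-state (subsequential) functions.
    return stem + "+ɪŋ"

def total_redup(stem: str) -> str:
    # Total reduplication copies an unboundedly long stem; per the
    # abstract, this is the one operation surveyed that exceeds the
    # power of regular relations.
    return stem + "+" + stem
```

The intuition: a finite-state device cannot remember an arbitrarily long stem in order to emit a second copy of it, while appending a fixed suffix requires no such memory.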


01 Oct 2015

TL;DR: This book provides a thorough introduction to the subfield of theoretical computer science known as grammatical inference from a computational linguistic perspective and summarizes the major lessons and open questions that grammatical inference brings to computational linguistics.

Abstract: This book provides a thorough introduction to the subfield of theoretical computer science known as grammatical inference from a computational linguistic perspective. Grammatical inference provides principled methods for developing computationally sound algorithms that learn structure from strings of symbols. The relationship to computational linguistics is natural because many research problems in computational linguistics are learning problems on words, phrases, and sentences: What algorithm can take as input some finite amount of data (for instance a corpus, annotated or otherwise) and output a system that behaves "correctly" on specific tasks? Throughout the text, the key concepts of grammatical inference are interleaved with illustrative examples drawn from problems in computational linguistics. Special attention is paid to the notion of "learning bias." In the context of computational linguistics, such bias can be thought to reflect common (ideally universal) properties of natural languages. This bias can be incorporated either by identifying a learnable class of languages which contains the language to be learned or by using particular strategies for optimizing parameter values. Examples are drawn largely from two linguistic domains (phonology and syntax) which span major regions of the Chomsky Hierarchy (from regular to context-sensitive classes). The conclusion summarizes the major lessons and open questions that grammatical inference brings to computational linguistics.

20 citations


01 Jul 2019

TL;DR: This paper proves that for every class C of stochastic languages defined with the coemission product of finitely many probabilistic, deterministic finite-state acceptors (PDFA) and every data sequence D of finitely many i.i.d. strings, the Maximum Likelihood Estimate of D with respect to C can be found efficiently by locally optimizing the parameter values.

Abstract: This paper proves that for every class C of stochastic languages defined with the coemission product of finitely many probabilistic, deterministic finite-state acceptors (PDFA) and for every data sequence D of finitely many strings drawn i.i.d. from some stochastic language, the Maximum Likelihood Estimate of D with respect to C can be found efficiently by locally optimizing the parameter values. We show that a consequence of the co-emission product is that each PDFA behaves like an independent factor in a joint distribution. Thus, the likelihood function decomposes in a natural way. We also show that the negative log likelihood function is convex. These results are motivated by the study of Strictly k-Piecewise (SPk) Stochastic Languages, which form a class of stochastic languages which is both linguistically motivated and naturally understood in terms of the coemission product of certain PDFAs.

4 citations
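The factorization the abstract describes can be sketched concretely: if each PDFA assigns a string a probability independently, the joint log-likelihood under the coemission product decomposes into a sum of per-machine terms (up to the product's normalization constant, which this sketch ignores). The PDFA encoding below is hypothetical, chosen only to make the decomposition visible:

```python
import math

def pdfa_log_prob(transitions, finals, string):
    """Log-probability of `string` under a single PDFA.
    transitions[(state, symbol)] -> (next_state, probability)
    finals[state] -> halting probability
    (a hypothetical encoding, for illustration only)"""
    state, logp = 0, 0.0
    for sym in string:
        state, p = transitions[(state, sym)]
        logp += math.log(p)
    return logp + math.log(finals[state])

def coemission_log_likelihood(machines, data):
    # Each PDFA behaves like an independent factor in the joint
    # distribution, so the log-likelihood is a sum over machines;
    # the normalization of the coemission product is omitted here.
    return sum(pdfa_log_prob(t, f, w) for t, f in machines for w in data)
```

Because the objective separates into per-machine sums of logs of parameters, each factor can be optimized locally, which is the efficiency claim in the abstract.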


TL;DR: An algorithm for inferring nondeterministic functional transducers is presented; it has much in common with well-known algorithms such as RPNI and OSTIA, and it is argued to be a generalisation of both.

Abstract: The purpose of this paper is to present an algorithm for inferring nondeterministic functional transducers. It has much in common with other well-known algorithms such as RPNI and OSTIA; indeed, we will argue that this algorithm is a generalisation of both of them. Functional transducers are all those nondeterministic transducers whose regular relation is a function. Epsilon transitions as well as subsequential output can be eliminated for such machines, except that the output for the empty string is lost. Learning partial functional transducers from negative examples is equivalent to learning total ones from positive-only data.

##### References


05 Nov 1984

TL;DR: This paper regards learning as the phenomenon of knowledge acquisition in the absence of explicit programming, and gives a precise methodology for studying this phenomenon from a computational viewpoint.

Abstract: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learnt using it in a reasonable (polynomial) number of steps. We find that inherent algorithmic complexity appears to set serious limits to the range of concepts that can be so learnt. The methodology and results suggest concrete principles for designing realistic learning systems.

5,311 citations


TL;DR: It was found that the class of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learnable from a text.

Abstract: Language learnability has been investigated. This refers to the following situation: A class of possible languages is specified, together with a method of presenting information to the learner about an unknown language, which is to be chosen from the class. The question is now asked, “Is the information sufficient to determine which of the possible languages is the unknown language?” Many definitions of learnability are possible, but only the following is considered here: Time is quantized and has a finite starting time. At each time the learner receives a unit of information and is to make a guess as to the identity of the unknown language on the basis of the information received so far. This process continues forever. The class of languages will be considered learnable with respect to the specified method of information presentation if there is an algorithm that the learner can use to make his guesses, the algorithm having the following property: Given any language of the class, there is some finite time after which the guesses will all be the same and they will be correct. In this preliminary investigation, a language is taken to be a set of strings on some finite alphabet. The alphabet is the same for all languages of the class. Several variations of each of the following two basic methods of information presentation are investigated: A text for a language generates the strings of the language in any order such that every string of the language occurs at least once. An informant for a language tells whether a string is in the language, and chooses the strings in some order such that every string occurs at least once. It was found that the class of context-sensitive languages is learnable from an informant, but that not even the class of regular languages is learnable from a text.

3,460 citations
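Gold's positive results rest on identification by enumeration: guess the first hypothesis consistent with the data seen so far, ordered so that convergence is guaranteed in the limit. A minimal sketch over a toy finite class follows, with hypotheses ordered so any subset precedes its supersets (which prevents overgeneralizing from a text, where only positive examples arrive); the language names and class are ours, for illustration:

```python
def enumeration_learner(hypotheses, text):
    """Identification by enumeration: after each datum, guess the
    first hypothesis consistent with everything seen so far.
    `hypotheses` is a list of (name, language-as-set) pairs ordered
    so that any subset precedes its supersets; `text` presents only
    strings belonging to the target language."""
    seen, guesses = set(), []
    for w in text:
        seen.add(w)
        guesses.append(next(n for n, lang in hypotheses if seen <= lang))
    return guesses
```

Once every string distinguishing the target from earlier hypotheses has appeared in the text, the guesses stabilize on the correct language, which is exactly Gold's criterion of identification in the limit.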


Yale University

TL;DR: This work considers the problem of using queries to learn an unknown concept; several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries.

Abstract: We consider the problem of using queries to learn an unknown concept. Several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries. Examples are given of efficient learning methods using various subsets of these queries for formal domains, including the regular languages, restricted classes of context-free languages, the pattern languages, and restricted types of propositional formulas. Some general lower bound techniques are given. Equivalence queries are compared with Valiant's criterion of probably approximately correct identification under random sampling.

1,797 citations
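The membership-query protocol is easy to state in miniature: the learner asks, for chosen instances, whether each belongs to the unknown concept. Over a finite domain this trivially yields exact identification, as the toy sketch below shows (the function name and domain are ours; Angluin's results concern far richer classes where exhaustive querying is infeasible):

```python
def learn_by_membership(domain, member):
    """Exactly identify an unknown concept over a finite domain by
    issuing one membership query per element. A toy instance of the
    query model only; efficient algorithms must get by with far
    fewer queries over infinite or exponentially large domains."""
    return {x for x in domain if member(x)}
```

An equivalence query, by contrast, proposes an entire hypothesis and receives either "correct" or a counterexample, which is what makes learning over large domains tractable in Angluin's framework.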


02 Nov 2011

TL;DR: This book presents the foundations of set theory and infinitary combinatorics, followed by easy consistency proofs, defining definability, the constructible sets, and forcing.

Abstract: The Foundations of Set Theory. Infinitary Combinatorics. The Well-Founded Sets. Easy Consistency Proofs. Defining Definability. The Constructible Sets. Forcing. Iterated Forcing. Bibliography. Indexes.

1,506 citations