
Showing papers on "Natural language published in 2006"


Proceedings ArticleDOI
17 Jul 2006
TL;DR: The Natural Language Toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the Python language.
Abstract: The Natural Language Toolkit is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. NLTK is written in Python and distributed under the GPL open source license. Over the past year the toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the Python language. This paper reports on the simplified toolkit and explains how it is used in teaching NLP.
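For readers new to the toolkit, a minimal sketch of the kind of task NLTK supports, tokenizing and part-of-speech tagging a sentence, is shown below; it uses the present-day NLTK API (the 2006 interfaces described in the paper differ in detail) and assumes the required tokenizer and tagger resources have been downloaded.

```python
# Minimal NLTK sketch: tokenize and POS-tag one sentence.
# Uses the modern NLTK API; resource names vary across NLTK versions.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer models
nltk.download("averaged_perceptron_tagger", quiet=True)  # tagger model

sentence = "The Natural Language Toolkit supports teaching and research in NLP."
tokens = nltk.word_tokenize(sentence)   # ['The', 'Natural', 'Language', 'Toolkit', ...]
tagged = nltk.pos_tag(tokens)           # [('The', 'DT'), ('Natural', 'NNP'), ...]
print(tagged)
```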

2,835 citations


Journal ArticleDOI
TL;DR: This article is an attempt to validate the output of Leximancer, using a set of evaluation criteria taken from content analysis that are appropriate for knowledge discovery tasks.
Abstract: The Leximancer system is a relatively new method for transforming lexical co-occurrence information from natural language into semantic patterns in an unsupervised manner. It employs two stages of co-occurrence information extraction, semantic and relational, using a different algorithm for each stage. The algorithms used are statistical, but they employ nonlinear dynamics and machine learning. This article is an attempt to validate the output of Leximancer, using a set of evaluation criteria taken from content analysis that are appropriate for knowledge discovery tasks.
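Leximancer itself is proprietary, but the first stage described here, gathering lexical co-occurrence statistics from raw text, can be sketched as below; the window size and tokenization are illustrative assumptions, not the system's actual settings.

```python
# Sketch of lexical co-occurrence counting within a sliding window.
# Window size and tokenization are illustrative; Leximancer's algorithms differ.
from collections import Counter
import re

def cooccurrence_counts(text, window=5):
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts

text = ("natural language text yields semantic patterns; "
        "language patterns emerge from natural text")
print(cooccurrence_counts(text).most_common(3))
```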

1,034 citations


Proceedings ArticleDOI
17 Jul 2006
TL;DR: It is shown that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models.
Abstract: We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
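For context on the equivalence claimed here, interpolated Kneser-Ney for bigrams can be written directly from counts, as in the sketch below; the fixed discount is an illustrative constant rather than one estimated from count-of-count statistics.

```python
# Bigram interpolated Kneser-Ney smoothing (standard formulation, illustrative fixed discount d).
from collections import Counter, defaultdict

def train_kn(tokens, d=0.75):
    bigram_c = Counter(zip(tokens, tokens[1:]))
    history_c = Counter(tokens[:-1])          # counts of each word as a bigram history
    followers = defaultdict(set)              # distinct continuations of each history
    preceders = defaultdict(set)              # distinct histories preceding each word
    for h, w in bigram_c:
        followers[h].add(w)
        preceders[w].add(h)
    n_bigram_types = len(bigram_c)

    def p_continuation(w):
        return len(preceders[w]) / n_bigram_types

    def p_kn(w, h):
        if history_c[h] == 0:                 # unseen history: back off to continuation prob.
            return p_continuation(w)
        discounted = max(bigram_c[(h, w)] - d, 0) / history_c[h]
        lam = d * len(followers[h]) / history_c[h]
        return discounted + lam * p_continuation(w)

    return p_kn

p_kn = train_kn("the cat sat on the mat the cat ate".split())
print(p_kn("cat", "the"), p_kn("mat", "the"))
```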

580 citations


Journal ArticleDOI
TL;DR: The authors argue that language selection depends on a set of factors that vary according to the experience of the bilinguals, the demands of the production task, and the degree of activity of the nontarget language.
Abstract: Bilingual speech requires that the language of utterances be selected prior to articulation. Past research has debated whether the language of speaking can be determined in advance of speech planning and, if not, the level at which it is eventually selected. We argue that the reason that it has been difficult to come to an agreement about language selection is that there is not a single locus of selection. Rather, language selection depends on a set of factors that vary according to the experience of the bilinguals, the demands of the production task, and the degree of activity of the nontarget language. We demonstrate that it is possible to identify some conditions that restrict speech planning to one language alone and others that open the process to cross-language influences. We conclude that the presence of language nonselectivity at all levels of planning spoken utterances renders the system itself fundamentally nonselective.

539 citations


Patent
04 Aug 2006
TL;DR: In this article, a conversational human-machine interface is presented that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context and domain knowledge and to invoke prior information to interpret a spoken utterance or a received non-spoken message.
Abstract: A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command.

430 citations


Journal Article
Shi Bing1
TL;DR: Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks.

Abstract: Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. Different automatic learning algorithms for text categorization achieve different levels of classification accuracy. Very accurate text classifiers can be learned automatically from training examples.
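As a concrete illustration of learning a text categorizer from labelled examples (not the specific algorithms surveyed in this entry), a bag-of-words naive Bayes pipeline in scikit-learn looks roughly like the sketch below; the tiny training set and categories are invented.

```python
# Hypothetical sketch: learning a text categorizer from a handful of labelled examples.
# Training documents and category labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "stock markets fell sharply on interest rate fears",
    "the central bank raised rates again",
    "the striker scored twice in the final",
    "the home team won the championship match",
]
labels = ["finance", "finance", "sports", "sports"]

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(docs, labels)
print(classifier.predict(["rates rose after the bank meeting"]))  # expected: ['finance']
```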

384 citations


Proceedings Article
01 Apr 2006
TL;DR: This paper provides a simple algorithm to compute tree kernels in linear average running time, and a study on the classification properties of diverse tree kernels shows that kernel combinations always improve the traditional methods.

Abstract: In recent years tree kernels have been proposed for the automatic learning of natural language applications. Unfortunately, they show (a) an inherent super-linear complexity and (b) a lower accuracy than traditional attribute/value methods. In this paper, we show that tree kernels are very helpful in the processing of natural language as (a) we provide a simple algorithm to compute tree kernels in linear average running time and (b) our study on the classification properties of diverse tree kernels shows that kernel combinations always improve the traditional methods. Experiments with Support Vector Machines on the predicate argument classification task provide empirical support to our thesis.
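To make the notion of a tree kernel concrete, the sketch below gives the standard recursive formulation that counts shared fragments between two parse trees (in the style of Collins and Duffy); note this is the textbook quadratic version, not the linear-average-time algorithm the paper proposes.

```python
# Subset-tree kernel sketch: counts common fragments between two parse trees.
# Textbook O(|T1|*|T2|) recursion; trees are (label, children...) tuples, leaves are strings.
def nodes(tree):
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from nodes(child)

def production(tree):
    return (tree[0], tuple(c[0] if isinstance(c, tuple) else c for c in tree[1:]))

def delta(n1, n2, lam):
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

t1 = ("S", ("NP", "John"), ("VP", ("V", "runs")))
t2 = ("S", ("NP", "Mary"), ("VP", ("V", "runs")))
print(tree_kernel(t1, t2))  # shared S, VP and V fragments contribute
```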

335 citations


Patent
14 Aug 2006
TL;DR: A human-computer interface system and methods for providing intelligent, adaptive, multimodal interaction with users while accomplishing tasks on their behalf in some particular domain or combination of domains as mentioned in this paper.
Abstract: A human-computer interface system and methods for providing intelligent, adaptive, multimodal interaction with users while accomplishing tasks on their behalf in some particular domain or combination of domains. Specifically, this system accepts user input via natural language text, mouse actions, human speech, whistles, gestures, pedal movements, facial or postural changes, and conveys results via natural language text, automatically-generated speech, and displays of graphs, tables, animation, video, and mechanical and chemical effectors that convey heat, tactile sensation, taste and smell.

311 citations


Proceedings ArticleDOI
17 Jul 2006
TL;DR: This paper used temporal reasoning as an over-sampling method to dramatically expand the amount of training data, resulting in predictive accuracy on link labeling as high as 93% using a Maximum Entropy classifier on human annotated data.
Abstract: This paper investigates a machine learning approach for temporally ordering and anchoring events in natural language texts. To address data sparseness, we used temporal reasoning as an over-sampling method to dramatically expand the amount of training data, resulting in predictive accuracy on link labeling as high as 93% using a Maximum Entropy classifier on human annotated data. This method compared favorably against a series of increasingly sophisticated baselines involving expansion of rules derived from human intuitions.
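The oversampling idea, using temporal reasoning to derive extra labelled links from the hand-annotated ones, can be illustrated with its simplest case, transitivity of BEFORE, as sketched below; the event names and seed annotations are invented.

```python
# Sketch of closure-based oversampling: infer additional BEFORE links by transitivity.
# Event names and seed annotations are hypothetical; full temporal closure covers more relations.
def transitive_closure(before_pairs):
    closure = set(before_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

annotated = {("wake_up", "breakfast"), ("breakfast", "commute"), ("commute", "meeting")}
expanded = transitive_closure(annotated)
print(len(annotated), "annotated links ->", len(expanded), "training links")
# Inferred links such as (wake_up, meeting) become additional training examples.
```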

293 citations


Proceedings ArticleDOI
17 Jul 2006
TL;DR: This work presents a new approach for mapping natural language sentences to their formal meaning representations using string-kernel-based classifiers, which compares favorably to other existing systems and is particularly robust to noise.
Abstract: We present a new approach for mapping natural language sentences to their formal meaning representations using string-kernel-based classifiers. Our system learns these classifiers for every production in the formal language grammar. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these string classifiers. Our experiments on two real-world data sets show that this approach compares favorably to other existing systems and is particularly robust to noise.

253 citations


Journal ArticleDOI
TL;DR: This essay questions the "inference" model, which turns to a multilingual writer's first language and culture to explain difficulties in composing an essay in English, and its "correlationist" modification, arguing that such approaches fail to do justice to the mediation that complicates the realization of texts in different languages and to the creativity of multilingual writers.
Abstract: The dominant approaches to studying multilingual writing have been hampered by monolingualist assumptions that conceive literacy as a unidirectional acquisition of competence, preventing us from fully understanding the resources multilinguals bring to their texts. In this essay, I attempt to change the questions and frameworks of such inquiry in order to do justice to the creativity of multilingual writers. How do teachers and researchers of English writing orient to linguistic and cultural difference in the essays they read? In what I will call the "inference" model, if they see a peculiar tone, style, organization, or discourse, many teachers instinctively turn to the first language (L1) or "native" culture (C1) of the writer for an explanation. This was the practice of some early versions of contrastive rhetoric (see Kaplan). Even now, sympathetic scholars in our field seek explanations from L1 or C1 for what they perceive as difficulties for multilingual writers in composing an essay in English (see Fox). Among other problems, the writer is treated as being conditioned so strongly by L1 and C1 that even when he or she writes in another language, those influences are supposed to manifest themselves in the new text. There is also the misleading assumption that one can unproblematically describe the traditions of L1 literacy by studying the English essay of a multilingual writer (even if the writer is a student in a developmental writing program). While the inference model fails to acknowledge the different types of mediation that can complicate the realization of texts in different languages, some scholars have now slightly modified their approach in what I call a "correlationist" model. They study the texts in L1 descriptively before they draw on this information to

BookDOI
15 Jan 2006
TL;DR: This volume presents in-depth introductions to major aspects of language documentation, including overviews on fieldwork ethics and data processing, guidelines for the basic annotation of digitally-stored multimedia corpora and a discussion on how to build and maintain a language archive.
Abstract: Language documentation is a rapidly emerging new field in linguistics which is concerned with the methods, tools and theoretical underpinnings for compiling a representative and lasting multipurpose record of a natural language. This volume presents in-depth introductions to major aspects of language documentation, including overviews on fieldwork ethics and data processing, guidelines for the basic annotation of digitally-stored multimedia corpora and a discussion on how to build and maintain a language archive. It combines theoretical and practical considerations and makes specific suggestions for the most common problems encountered in language documentation. Key features: a textbook introduction to language documentation that considers all common problems.

Book
Joakim Nivre1
01 Jan 2006
TL;DR: This book provides an in-depth description of the framework of inductive dependency parsing, a methodology for robust and efficient syntactic analysis of unrestricted natural language text.
Abstract: This book provides an in-depth description of the framework of inductive dependency parsing, a methodology for robust and efficient syntactic analysis of unrestricted natural language text. This me ...

Book
06 Jan 2006
TL;DR: With a facsimile of the Siloam Inscription by Euting, J., a table of alphabets by Lidzbarski, M., and a dictionary of all the alphabetic characters.
Abstract: With a facsimile of the Siloam Inscription by: Euting, J.; a table of alphabets by: Lidzbarski, M.;

Book
01 Jan 2006
TL;DR: The relation of language to thought is discussed, along with the claim that linguistics is not psychology, positions on psychological reality, and arguments for the representational thesis.
Abstract: I. Linguistics is not psychology. II. Positions on psychological reality. III. 'Philosophical' arguments for the representational thesis. IV. The relation of language to thought. V. Language use and acquisition.

Journal ArticleDOI
TL;DR: This paper presents an approach to ontology‐based GI retrieval that contributes to solving existing problems of semantic heterogeneity and hides most of the complexity of the required procedure from the requester.
Abstract: Discovering and accessing suitable geographic information (GI) in the open and distributed environments of current Spatial Data Infrastructures (SDIs) is a crucial task. Catalogues provide searchable repositories of information descriptions, but the mechanisms to support GI retrieval are still insufficient. Problems of semantic heterogeneity caused by the ambiguity of natural language can arise during keyword‐based search in catalogues and when formulating a query to access the discovered data. In this paper, we present an approach to ontology‐based GI retrieval that contributes to solving existing problems of semantic heterogeneity and hides most of the complexity of the required procedure from the requester. A query language and graphical user interface allow a requester to intuitively formulate a query using a well‐known domain vocabulary. From this query, an ontology concept is derived, which is then used to search a catalogue for a data source that provides all the information required to answer the ...

Patent
Brian Tunning1, Evan Gridley1
06 Jun 2006
TL;DR: In this article, a PIM application provides a single page natural language interface for entering and managing PIM data, which can be associated with a task, calendar, contact, or other data type.
Abstract: A PIM application provides a single page natural language interface for entering and managing PIM data. The natural language interface may receive a natural language entry as a text character string. The entry may be associated with a task, calendar, contact or other PIM data type. The received entries are processed (for example, parsed) to determine the PIM data type and other information. The original entry is not discarded from the natural language interface as a result of processing. After processing one or more received natural language entries, the entries remain in the natural language interface to be viewed and managed. The entry is maintained so it can be managed with other natural language entries provided in the interface.
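The kind of lightweight processing the patent describes can be pictured with the hypothetical sketch below, which assigns a free-text entry to a calendar, contact, or task type; the patterns and type names are invented, not taken from the patent.

```python
# Hypothetical sketch: classify a natural language PIM entry by simple pattern matching.
# Patterns and data-type names are invented for illustration.
import re

def classify_entry(entry):
    text = entry.lower()
    if re.search(r"\b(\d{1,2}(:\d{2})?\s*(am|pm)|tomorrow|monday|tuesday|wednesday|"
                 r"thursday|friday|saturday|sunday)\b", text):
        return "calendar"
    if re.search(r"\b(phone|email|call)\b|@", text):
        return "contact"
    return "task"

entries = [
    "Lunch with Sam tomorrow at 12:30pm",
    "Jane Doe, email jane@example.com",
    "Buy printer paper",
]
for entry in entries:
    # The original entry text is kept alongside its inferred type, as the patent describes.
    print(classify_entry(entry), "-", entry)
```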

Proceedings ArticleDOI
08 Jun 2006
TL;DR: The best performing systems at the TREC Question Answering track employ parsing for analyzing sentences in order to identify the query focus, to extract relations and to disambiguate meanings of words.
Abstract: Parsing natural language is an essential step in several applications that involve document analysis, e.g. knowledge extraction, question answering, summarization, filtering. The best performing systems at the TREC Question Answering track employ parsing for analyzing sentences in order to identify the query focus, to extract relations and to disambiguate meanings of words.

Journal ArticleDOI
TL;DR: The computational theory of perceptions (CTP) as discussed by the authors is based on the methodology of computing with words, which is inspired by the remarkable human capability to perform a wide variety of physical and mental tasks without any measurements and any computations.
Abstract: Interest in issues relating to consciousness has grown markedly during the last several years. And yet, nobody can claim that consciousness is a well-understood concept that lends itself to precise analysis. It may be argued that, as a concept, consciousness is much too complex to fit into the conceptual structure of existing theories based on Aristotelian logic and probability theory. An approach suggested in this paper links consciousness to perceptions and perceptions to their descriptors in a natural language. In this way, those aspects of consciousness which relate to reasoning and concept formation are linked to what is referred to as the methodology of computing with words (CW). Computing, in its usual sense, is centered on manipulation of numbers and symbols. In contrast, computing with words, or CW for short, is a methodology in which the objects of computation are words and propositions drawn from a natural language (e.g., small, large, far, heavy, not very likely, the price of gas is low and declining, Berkeley is near San Francisco, it is very unlikely that there will be a significant increase in the price of oil in the near future, etc.). Computing with words is inspired by the remarkable human capability to perform a wide variety of physical and mental tasks without any measurements and any computations. Familiar examples of such tasks are parking a car, driving in heavy traffic, playing golf, riding a bicycle, understanding speech, and summarizing a story. Underlying this remarkable capability is the brain's crucial ability to manipulate perceptions--perceptions of distance, size, weight, color, speed, time, direction, force, number, truth, likelihood, and other characteristics of physical and mental objects. Manipulation of perceptions plays a key role in human recognition, decision and execution processes. As a methodology, computing with words provides a foundation for a computational theory of perceptions: a theory which may have an important bearing on how humans make--and machines might make--perception-based rational decisions in an environment of imprecision, uncertainty, and partial truth. A basic difference between perceptions and measurements is that, in general, measurements are crisp, whereas perceptions are fuzzy. One of the fundamental aims of science has been and continues to be that of progressing from perceptions to measurements. Pursuit of this aim has led to brilliant successes. We have sent men to the moon; we can build computers that are capable of performing billions of computations per second; we have constructed telescopes that can explore the far reaches of the universe; and we can date the age of rocks that are millions of years old. But alongside the brilliant successes stand conspicuous underachievements and outright failures. We cannot build robots that can move with the agility of animals or humans; we cannot automate driving in heavy traffic; we cannot translate from one language to another at the level of a human interpreter; we cannot create programs that can summarize non-trivial stories; our ability to model the behavior of economic systems leaves much to be desired; and we cannot build machines that can compete with children in the performance of a wide variety of physical and cognitive tasks. It may be argued that underlying the underachievements and failures is the unavailability of a methodology for reasoning and computing with perceptions rather than measurements. 
An outline of such a methodology--referred to as a computational theory of perceptions--is presented in this paper. The computational theory of perceptions (CTP) is based on the methodology of CW. In CTP, words play the role of labels of perceptions, and, more generally, perceptions are expressed as propositions in a natural language. CW-based techniques are employed to translate propositions expressed in a natural language into what is called the Generalized Constraint Language (GCL). In this language, the meaning of a proposition is expressed as a generalized constraint, X isr R, where X is the constrained variable, R is the constraining relation, and isr is a variable copula in which r is an indexing variable whose value defines the way in which R constrains X. Among the basic types of constraints are possibilistic, veristic, probabilistic, random set, Pawlak set, fuzzy graph, and usuality. The wide variety of constraints in GCL makes GCL a much more expressive language than the language of predicate logic. In CW, the initial and terminal data sets, IDS and TDS, are assumed to consist of propositions expressed in a natural language. These propositions are translated, respectively, into antecedent and consequent constraints. Consequent constraints are derived from antecedent constraints through the use of rules of constraint propagation. The principal constraint propagation rule is the generalized extension principle. (ABSTRACT TRUNCATED)
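As a small worked illustration of the canonical form mentioned above (constructed here, not quoted from the paper), the proposition "Carol is young" translates into a possibilistic generalized constraint on the variable Age(Carol):

```latex
% Illustrative GCL translation of "Carol is young" (canonical form: X isr R).
% With r left blank, the constraint is possibilistic in Zadeh's notation.
\[
\underbrace{\mathrm{Age}(\mathrm{Carol})}_{X}\ \text{is}\ \underbrace{\mathit{young}}_{R},
\qquad
\mathrm{Poss}\{\mathrm{Age}(\mathrm{Carol}) = u\} = \mu_{\mathit{young}}(u),
\]
% where \mu_{young} is the membership function of the fuzzy set "young".
```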

Book ChapterDOI
05 Nov 2006
TL;DR: This paper introduces GINO, a guided input natural language ontology editor that allows users to edit and query ontologies in a language akin to English, and argues that the use of guided entry overcomes the habitability problem, which adversely affects most natural language systems.

Abstract: The casual user is typically overwhelmed by the formal logic of the Semantic Web. The gap between the end user and the logic-based scaffolding has to be bridged if the Semantic Web's capabilities are to be utilized by the general public. This paper proposes that controlled natural languages offer one way to bridge the gap. We introduce GINO, a guided input natural language ontology editor that allows users to edit and query ontologies in a language akin to English. It uses a small static grammar, which it dynamically extends with elements from the loaded ontologies. The usability evaluation shows that GINO is well-suited for novice users when editing ontologies. We believe that the use of guided entry overcomes the habitability problem, which adversely affects most natural language systems. Additionally, the approach's dynamic grammar generation allows for easy adaptation to new ontologies.

Proceedings ArticleDOI
20 Aug 2006
TL;DR: The authors used deep linguistic structures instead of surface text patterns to extract pairs of a given semantic relation from text documents and applied them to a corpus to find new pairs, and demonstrated the benefits of their approach by extensive experiments with their prototype system LEILA.
Abstract: The World Wide Web provides a nearly endless source of knowledge, which is mostly given in natural language. A first step towards exploiting this data automatically could be to extract pairs of a given semantic relation from text documents - for example all pairs of a person and her birthdate. One strategy for this task is to find text patterns that express the semantic relation, to generalize these patterns, and to apply them to a corpus to find new pairs. In this paper, we show that this approach profits significantly when deep linguistic structures are used instead of surface text patterns. We demonstrate how linguistic structures can be represented for machine learning, and we provide a theoretical analysis of the pattern matching approach. We show the benefits of our approach by extensive experiments with our prototype system LEILA.

Patent
17 Jan 2006
TL;DR: A natural language generation (NLG) software system that generates rich, content-sensitive human language descriptions based on unparsed raw domain-specific data is described in this paper.
Abstract: The invention is directed to a natural language generation (NLG) software system that generates rich, content-sensitive human language descriptions based on unparsed raw domain-specific data. In one embodiment, the NLG software system may include a data parser/normalizer, a comparator, a language engine, and a document generator. The data parser/normalizer may be configured to retrieve specification information for items to be described by the NLG software system, to extract pertinent information from the raw specification information, and to convert and normalize the extracted information so that the items may be compared specification by specification. The comparator may be configured to use the normalized data from the data parser/normalizer to compare the specifications of the items using comparison functions and interpretation rules to determine outcomes of the comparisons. The language engine may be configured to cycle through all or a subset of the normalized specification information, to retrieve all sentence templates associated with each of the item specifications, to call the comparator to compute or retrieve the results of the comparisons between the item specifications, and to recursively generate every possible syntactically legal sentence associated with the specifications based on the retrieved sentence templates. The document generator may be configured to select one or more discourse models having instructions regarding the selection, organization and modification of the generated sentences, and to apply the instructions of the discourse model to the generated sentences to generate a natural language description of the selected items.
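A hypothetical miniature of the comparator-plus-template idea described here is sketched below: normalized specifications are compared and the outcomes fill sentence templates; the product data, preference rules, and templates are all invented for illustration.

```python
# Hypothetical miniature of template-based NLG over compared item specifications.
# Item data, comparison rules, and sentence templates are invented for illustration.
specs = {
    "Camera A": {"weight_g": 380, "zoom_x": 10},
    "Camera B": {"weight_g": 495, "zoom_x": 12},
}
templates = {
    "weight_g": "{winner} is lighter than {loser} ({w_val} g vs. {l_val} g).",
    "zoom_x":   "{winner} offers more optical zoom than {loser} ({w_val}x vs. {l_val}x).",
}
prefer_lower = {"weight_g": True, "zoom_x": False}   # lower weight wins, higher zoom wins

def compare_and_describe(a, b):
    sentences = []
    for attr, template in templates.items():
        va, vb = specs[a][attr], specs[b][attr]
        if va == vb:
            continue                                  # no contrastive sentence for a tie
        a_wins = (va < vb) if prefer_lower[attr] else (va > vb)
        winner, loser = (a, b) if a_wins else (b, a)
        w_val, l_val = (va, vb) if a_wins else (vb, va)
        sentences.append(template.format(winner=winner, loser=loser, w_val=w_val, l_val=l_val))
    return " ".join(sentences)

print(compare_and_describe("Camera A", "Camera B"))
```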


Journal ArticleDOI
TL;DR: Evidence that analogical comparison is instrumental in language learning is reviewed, suggesting a larger role for general learning processes in the acquisition of language.
Abstract: The acquisition of language has long stood as a challenge to general learning accounts, leading many theorists to propose domain-specific knowledge and processes to explain language acquisition. Here we review evidence that analogical comparison is instrumental in language learning, suggesting a larger role for general learning processes in the acquisition of language.

Patent
14 Aug 2006
TL;DR: In this article, linguistic analysis is used to identify queries that use different natural language formations to request similar information, and common intent categories are identified for the queries requesting similar information. Intent responses can then be provided that are associated with the identified intent categories.
Abstract: Linguistic analysis is used to identify queries that use different natural language formations to request similar information. Common intent categories are identified for the queries requesting similar information. Intent responses can then be provided that are associated with the identified intent categories. An intent management tool can be used for identifying new intent categories, identifying obsolete intent categories, or refining existing intent categories.

Proceedings ArticleDOI
12 Jul 2006
TL;DR: The successful implementation of the parsing capabilities that are part of the functional version of the SPARCLE authoring utility is presented, including a set of grammars, executed on a shallow parser, that are designed to identify the rule elements in privacy policy rules.
Abstract: Today organizations do not have good ways of linking their written privacy policies with the implementation of those policies. To assist organizations in addressing this issue, our human-centered research has focused on understanding organizational privacy management needs, and, based on those needs, creating a usable and effective policy workbench called SPARCLE. SPARCLE will enable organizational users to enter policies in natural language, parse the policies to identify policy elements and then generate a machine readable (XML) version of the policy. In the future, SPARCLE will then enable mapping of policies to the organization's configuration and provide audit and compliance tools to ensure that the policy implementation operates as intended. In this paper, we present the strategies employed in the design and implementation of the natural language parsing capabilities that are part of the functional version of the SPARCLE authoring utility. We have created a set of grammars which execute on a shallow parser that are designed to identify the rule elements in privacy policy rules. We present empirical usability evaluation data from target organizational users of the SPARCLE system and highlight the parsing accuracy of the system with the organizations' privacy policies. The successful implementation of the parsing capabilities is an important step towards our goal of providing a usable and effective method for organizations to link the natural language version of privacy policies to their implementation, and subsequent verification through compliance auditing of the enforcement logs.
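A hypothetical sketch of the shallow-parsing step described here, pulling rule elements (user category, action, data category, purpose) out of a policy sentence and emitting them as XML, is shown below; the pattern and element names are invented and are not SPARCLE's actual grammars or schema.

```python
# Hypothetical sketch: shallow extraction of privacy-rule elements into XML.
# The regular expression and XML element names are invented; SPARCLE's grammars and schema differ.
import re
import xml.etree.ElementTree as ET

RULE_PATTERN = re.compile(
    r"(?P<user>.+?) (?:can|may) (?P<action>\w+) (?P<data>.+?) (?:for|to) (?P<purpose>.+)\.",
    re.IGNORECASE,
)

def policy_rule_to_xml(sentence):
    match = RULE_PATTERN.match(sentence)
    if match is None:
        return None
    rule = ET.Element("rule")
    for element in ("user", "action", "data", "purpose"):
        ET.SubElement(rule, element).text = match.group(element).strip()
    return ET.tostring(rule, encoding="unicode")

print(policy_rule_to_xml(
    "Billing staff can access customer account records for payment processing."
))
```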

Journal ArticleDOI
TL;DR: The focus of this work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention.
Abstract: Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, extracting users' intents for call classification, to combinations of users' intents and named entities. In this paper, we present the SLU system of VoiceTone® (a service provided by AT&T), which includes extending statistical classifiers to seamlessly integrate hand-crafted classification rules with the rules learned from data, and developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone®.

Proceedings ArticleDOI
26 Sep 2006
TL;DR: A better way to use synonym substitution is proposed, one that is no longer entirely guided by the mark-insertion process, but is also guided by a resilience requirement, subject to a maximum allowed distortion constraint.
Abstract: Information-hiding in natural language text has mainly consisted of carrying out approximately meaning-preserving modifications on the given cover text until it encodes the intended mark. A major technique for doing so has been synonym-substitution. In these previous schemes, synonym substitutions were done until the text "confessed", i.e., carried the intended mark message. We propose here a better way to use synonym substitution, one that is no longer entirely guided by the mark-insertion process: It is also guided by a resilience requirement, subject to a maximum allowed distortion constraint. Previous schemes for information hiding in natural language text did not use numeric quantification of the distortions introduced by transformations, they mainly used heuristic measures of quality based on conformity to a language model (and not in reference to the original cover text). When there are many alternatives to carry out a substitution on a word, we prioritize these alternatives according to a quantitative resilience criterion and use them in that order. In a nutshell, we favor the more ambiguous alternatives. In fact not only do we attempt to achieve the maximum ambiguity, but we want to simultaneously be as close as possible to the above-mentioned distortion limit, as that prevents the adversary from doing further transformations without exceeding the damage threshold; that is, we continue to modify the document even after the text has "confessed" to the mark, for the dual purpose of maximizing ambiguity while deliberately getting as close as possible to the distortion limit. The quantification we use makes possible an application of the existing information-theoretic framework, to the natural language domain, which has unique challenges not present in the image or audio domains. The resilience stems from both (i) the fact that the adversary does not know where the changes were made, and (ii) the fact that automated disambiguation is a major difficulty faced by any natural language processing system (what is bad news for the natural language processing area, is good news for our scheme's resilience). In addition to the above mentioned design and analysis, another contribution of this paper is the description of the implementation of the scheme and of the experimental data obtained.

Proceedings ArticleDOI
05 Apr 2006
TL;DR: Simple techniques based on comparing corpus frequencies, coupled with large quantities of data, are shown to be effective for identifying the events underlying changes in global moods.
Abstract: We describe a method for discovering irregularities in temporal mood patterns appearing in a large corpus of blog posts, and labeling them with a natural language explanation. Simple techniques based on comparing corpus frequencies, coupled with large quantities of data, are shown to be effective for identifying the events underlying changes in global moods.
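The frequency-comparison idea can be pictured as a simple ratio between a mood's share of posts in one time slice and its background share, as in the sketch below; the counts and threshold are invented, and the paper's actual scoring is more involved.

```python
# Sketch: flag moods whose share in a time slice far exceeds their background share.
# Counts and threshold are invented for illustration.
def mood_spikes(slice_counts, background_counts, threshold=2.0):
    slice_total = sum(slice_counts.values())
    background_total = sum(background_counts.values())
    spikes = {}
    for mood, count in slice_counts.items():
        slice_rate = count / slice_total
        background_rate = background_counts.get(mood, 1) / background_total
        ratio = slice_rate / background_rate
        if ratio >= threshold:
            spikes[mood] = round(ratio, 2)
    return spikes

background = {"happy": 5000, "sad": 3000, "excited": 1500, "anxious": 800}
one_day    = {"happy": 40,   "sad": 35,   "excited": 60,   "anxious": 9}
print(mood_spikes(one_day, background))   # 'excited' stands out on this invented day
```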

Journal ArticleDOI
TL;DR: This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages.
Abstract: This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages. It utilizes the Minimum Description Length analysis described in Goldsmith (2001), and has been implemented in software that is available for downloading and testing.
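To illustrate the Minimum Description Length intuition the algorithm builds on, the toy scorer below compares the cost of listing whole words against the cost of a stem-plus-suffix analysis; the flat per-character cost model is a deliberate simplification of Goldsmith's formulation.

```python
# Toy MDL comparison: cost of listing whole words vs. a stem + suffix analysis.
# The flat per-character cost is a simplification; Goldsmith (2001) uses a fuller coding scheme.
COST_PER_CHAR = 5.0   # illustrative bits per character

def listing_cost(words):
    return sum(COST_PER_CHAR * len(w) for w in words)

def signature_cost(stems, suffixes):
    # Each stem and suffix is stored once; pointer costs for the words are ignored here.
    return (sum(COST_PER_CHAR * len(s) for s in stems)
            + sum(COST_PER_CHAR * len(x) for x in suffixes))

words = ["jump", "jumps", "jumped", "jumping", "walk", "walks", "walked", "walking"]
stems, suffixes = ["jump", "walk"], ["", "s", "ed", "ing"]

print("whole-word listing cost:", listing_cost(words))
print("stem + suffix analysis cost:", signature_cost(stems, suffixes))
# The shorter description wins, so the stem/suffix analysis is preferred for this toy lexicon.
```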