
Showing papers in "Computational Linguistics in 2010"


Journal ArticleDOI
TL;DR: The Distributional Memory approach is shown to be tenable despite the constraints imposed by its multi-purpose nature, and performs competitively against task-specific algorithms recently reported in the literature for the same tasks, and against several state-of-the-art methods.
Abstract: Research into corpus-based semantics has focused on the development of ad hoc models that treat single tasks, or sets of closely related tasks, as unrelated challenges to be tackled by extracting different kinds of distributional information from the corpus. As an alternative to this "one task, one model" approach, the Distributional Memory framework extracts distributional information once and for all from the corpus, in the form of a set of weighted word-link-word tuples arranged into a third-order tensor. Different matrices are then generated from the tensor, and their rows and columns constitute natural spaces to deal with different semantic problems. In this way, the same distributional information can be shared across tasks such as modeling word similarity judgments, discovering synonyms, concept categorization, predicting selectional preferences of verbs, solving analogy problems, classifying relations between word pairs, harvesting qualia structures with patterns or example pairs, predicting the typical properties of concepts, and classifying verbs into alternation classes. Extensive empirical testing in all these domains shows that a Distributional Memory implementation performs competitively against task-specific algorithms recently reported in the literature for the same tasks, and against our implementations of several state-of-the-art methods. The Distributional Memory approach is thus shown to be tenable despite the constraints imposed by its multi-purpose nature.
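
As a rough, hedged sketch of the tuple-to-matrix idea (not the authors' implementation; the tuples, weights, and helper names below are invented for illustration), the same sparse word-link-word tensor can be sliced into different matrix views, each serving a different family of tasks:

```python
# A minimal sketch of the word-link-word tensor idea: distributional
# information is stored once as weighted tuples, then "matricized" into
# different views for different tasks. Tuple weights here are invented.
from collections import defaultdict

# Third-order tensor as a sparse dict: (word, link, word) -> weight
tuples = {
    ("dog", "subj_of", "bark"): 3.2,
    ("dog", "obj_of", "walk"): 1.7,
    ("teacher", "subj_of", "explain"): 2.9,
    ("teacher", "obj_of", "praise"): 1.1,
}

def word_by_link_word(tensor):
    """Matricize into a word x (link, word) matrix: rows are word vectors
    usable for similarity, synonym discovery, and categorization tasks."""
    matrix = defaultdict(dict)
    for (w1, link, w2), weight in tensor.items():
        matrix[w1][(link, w2)] = weight
    return matrix

def word_word_by_link(tensor):
    """Alternative view: (word, word) pairs as rows, links as columns,
    usable for relation classification or analogy-style tasks."""
    matrix = defaultdict(dict)
    for (w1, link, w2), weight in tensor.items():
        matrix[(w1, w2)][link] = weight
    return matrix

if __name__ == "__main__":
    print(word_by_link_word(tuples)["dog"])
    print(word_word_by_link(tuples)[("teacher", "explain")])
```

The word-by-link-word view gives row vectors for word-level tasks, while the pair-by-link view supports relational tasks, which is the sense in which one extraction pass serves many problems.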

671 citations


Journal ArticleDOI
TL;DR: A comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods is conducted, which also conveys an appreciation for the importance and potential use of paraphrases in the field of NLP research.
Abstract: The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language (words, phrases, and sentences) is an important part of natural language processing (NLP) and is being increasingly employed to improve the performance of several NLP applications. In this article, we attempt to conduct a comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods, while also conveying an appreciation for the importance and potential use of paraphrases in the field of NLP research. Recent work done in manual and automatic construction of paraphrase corpora is also examined. We also discuss the strategies used for evaluating paraphrase generation techniques and briefly explore some future trends in paraphrase generation.

308 citations


Journal ArticleDOI
TL;DR: This work presents a vector space–based model for selectional preferences that predicts plausibility scores for argument headwords and obtains consistent benefits from using the disambiguation and semantic role information provided by a semantically tagged primary corpus.
Abstract: We present a vector space-based model for selectional preferences that predicts plausibility scores for argument headwords. It does not require any lexical resources (such as WordNet). It can be trained either on one corpus with syntactic annotation, or on a combination of a small semantically annotated primary corpus and a large, syntactically analyzed generalization corpus. Our model is able to predict inverse selectional preferences, that is, plausibility scores for predicates given argument heads. We evaluate our model on one NLP task (pseudo-disambiguation) and one cognitive task (prediction of human plausibility judgments), gauging the influence of different parameters and comparing our model against other model classes. We obtain consistent benefits from using the disambiguation and semantic role information provided by a semantically tagged primary corpus. As for parameters, we identify settings that yield good performance across a range of experimental conditions. However, frequency remains a major influence of prediction quality, and we also identify more robust parameter settings suitable for applications with many infrequent items.
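
A minimal sketch of one common vector-space formulation of selectional preferences (a prototype/centroid variant, not necessarily the exact model evaluated in the article; the toy vectors and seen-argument lists are invented):

```python
# Plausibility of an argument head as cosine similarity to the centroid of
# heads previously observed in that predicate slot. Toy values only.
import numpy as np

space = {
    "wine":  np.array([0.9, 0.1, 0.2]),
    "beer":  np.array([0.8, 0.2, 0.1]),
    "idea":  np.array([0.1, 0.9, 0.3]),
    "stone": np.array([0.2, 0.1, 0.9]),
}
seen_heads = {("drink", "obj"): ["wine", "beer"]}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def plausibility(pred, role, head):
    """Score a candidate head for a predicate slot against the prototype
    (centroid) of heads observed in that slot."""
    prototype = np.mean([space[h] for h in seen_heads[(pred, role)]], axis=0)
    return cosine(space[head], prototype)

print(plausibility("drink", "obj", "beer"))   # high
print(plausibility("drink", "obj", "idea"))   # low
```

Inverse preferences (plausibility of a predicate given an argument head) follow the same recipe with the roles of predicate slot and head swapped.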

115 citations


Journal ArticleDOI
TL;DR: A discourse-informed model which is capable of producing document compressions that are coherent and informative is presented, inspired by theories of local coherence and formulated within the framework of integer linear programming.
Abstract: Sentence compression holds promise for many applications ranging from summarization to subtitle generation. The task is typically performed on isolated sentences without taking the surrounding context into account, even though most applications would operate over entire documents. In this article we present a discourse-informed model which is capable of producing document compressions that are coherent and informative. Our model is inspired by theories of local coherence and formulated within the framework of integer linear programming. Experimental results show significant improvements over a state-of-the-art discourse agnostic approach.
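
For readers unfamiliar with the ILP framing, here is a deliberately tiny sketch of compression as constrained optimization, using the PuLP library as an assumed solver interface; the article's model adds discourse-level constraints and operates over whole documents, which this toy does not attempt:

```python
# A toy ILP in the spirit of compression-as-optimization. Binary variables
# decide which tokens to keep; the relevance scores are invented.
import pulp

tokens = ["the", "minister", "yesterday", "announced", "new", "cuts"]
relevance = [0.1, 0.9, 0.3, 0.8, 0.4, 0.9]
max_len = 4

prob = pulp.LpProblem("sentence_compression", pulp.LpMaximize)
keep = [pulp.LpVariable(f"keep_{i}", cat="Binary") for i in range(len(tokens))]

# Objective: keep the most relevant tokens.
prob += pulp.lpSum(relevance[i] * keep[i] for i in range(len(tokens)))
# Length constraint on the compression.
prob += pulp.lpSum(keep) <= max_len
# A crude well-formedness constraint: if the subject is kept, keep the verb.
prob += keep[3] >= keep[1]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([t for t, k in zip(tokens, keep) if k.value() == 1])
```

Discourse-informed compression replaces the invented relevance scores and the single grammatical constraint with scores and constraints derived from local-coherence theory across sentences.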

91 citations


Journal ArticleDOI
TL;DR: The proposed method is compared with regression methods and a state-of-the-art classification method, and an application called Terrace is presented, which retrieves texts with readability similar to that of a given input text.
Abstract: This article presents a novel approach for readability assessment through sorting. A comparator that judges the relative readability between two texts is generated through machine learning, and a given set of texts is sorted by this comparator. Our proposal is advantageous because it solves the problem of a lack of training data: the construction of the comparator only requires training data annotated with two reading levels. The proposed method is compared with regression methods and a state-of-the-art classification method. Moreover, we present our application, called Terrace, which retrieves texts with readability similar to that of a given input text.
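
A hedged sketch of the comparator-and-sort idea (the features, training pairs, and use of scikit-learn below are illustrative assumptions, not the article's setup):

```python
# Train a binary comparator on feature differences of text pairs with known
# relative reading levels, then sort arbitrary texts with that comparator.
from functools import cmp_to_key
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(text):
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / len(words)
    return np.array([len(words), avg_word_len])

easy = ["the cat sat on the mat", "we like to play in the sun"]
hard = ["photosynthesis converts electromagnetic radiation into chemical energy",
        "the epistemological ramifications remain fundamentally contested"]

# Training pairs: label 1 if the first text is harder than the second.
X, y = [], []
for h in hard:
    for e in easy:
        X.append(features(h) - features(e)); y.append(1)
        X.append(features(e) - features(h)); y.append(0)

clf = LogisticRegression().fit(np.array(X), y)

def harder(a, b):
    """Comparator: positive if a reads as harder than b."""
    return 1 if clf.predict([features(a) - features(b)])[0] == 1 else -1

texts = ["the cat sat on the mat",
         "the epistemological ramifications remain fundamentally contested",
         "we like to play in the sun"]
print(sorted(texts, key=cmp_to_key(harder)))
```

Sorting with a learned comparator never requires absolute reading-level annotations, which is the core advantage the abstract highlights.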

90 citations


Journal ArticleDOI
Stefan Riezler1, Yi Liu1
TL;DR: It is shown in an extrinsic evaluation in a real-world Web search task that the combination of a query-to-snippet translation model with a query language model achieves improved contextual query expansion compared to a state-of-the-art query expansion model that is trained on the same query log data.
Abstract: Long queries often suffer from low recall in Web search due to conjunctive term matching. The chances of matching words in relevant documents can be increased by rewriting query terms into new terms with similar statistical properties. We present a comparison of approaches that deploy user query logs to learn rewrites of query terms into terms from the document space. We show that the best results are achieved by adopting the perspective of bridging the "lexical chasm" between queries and documents by translating from a source language of user queries into a target language of Web documents. We train a state-of-the-art statistical machine translation model on query-snippet pairs from user query logs, and extract expansion terms from the query rewrites produced by the monolingual translation system. We show in an extrinsic evaluation in a real-world Web search task that the combination of a query-to-snippet translation model with a query language model achieves improved contextual query expansion compared to a state-of-the-art query expansion model that is trained on the same query log data.
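
As a toy illustration of combining a query-to-snippet translation model with a query language model (all probabilities and table names below are invented; the article learns them from query-snippet pairs in user logs):

```python
# Score candidate expansion terms by mixing a monolingual "translation"
# probability with a query language model probability.
import math

# p(snippet_term | query_term) from a query-to-snippet translation table
translation = {
    "cheap": {"cheap": 0.5, "inexpensive": 0.2, "affordable": 0.2, "budget": 0.1},
}
# p(term | rest of query) from a query language model
query_lm = {
    ("flights",): {"inexpensive": 0.01, "affordable": 0.03, "budget": 0.04},
}

def expansion_score(term, query_term, context, lam=0.5):
    """Log-linear mix of translation probability and query LM probability."""
    p_tm = translation[query_term].get(term, 1e-9)
    p_lm = query_lm[context].get(term, 1e-9)
    return lam * math.log(p_tm) + (1 - lam) * math.log(p_lm)

candidates = ["inexpensive", "affordable", "budget"]
ranked = sorted(candidates,
                key=lambda t: expansion_score(t, "cheap", ("flights",)),
                reverse=True)
print(ranked)  # expansion terms for the query "cheap flights"
```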

88 citations


Journal ArticleDOI
TL;DR: A model of syntactic processing that operates successfully within severe constraints, by recognizing constituents in a right-corner transformed representation and mapping this representation to random variables in a Hierarchic Hidden Markov Model, a factored time-series model which probabilistically models the contents of a bounded memory store over time.
Abstract: Human syntactic processing shows many signs of taking place within a general-purpose short-term memory. But this kind of memory is known to have a severely constrained storage capacity, possibly as few as three or four distinct elements. This article describes a model of syntactic processing that operates successfully within these severe constraints, by recognizing constituents in a right-corner transformed representation (a variant of left-corner parsing) and mapping this representation to random variables in a Hierarchic Hidden Markov Model, a factored time-series model which probabilistically models the contents of a bounded memory store over time. Evaluations of the coverage of this model on a large syntactically annotated corpus of English sentences, and the accuracy of a bounded-memory parsing strategy based on this model, suggest this model may be cognitively plausible.

74 citations


Journal ArticleDOI
TL;DR: HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment is described, finding that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance.
Abstract: In this article we describe HiFST, a lattice-based decoder for hierarchical phrase-based translation and alignment. The decoder is implemented with standard Weighted Finite-State Transducer (WFST) operations as an alternative to the well-known cube pruning procedure. We find that the use of WFSTs rather than k-best lists requires less pruning in translation search, resulting in fewer search errors, better parameter optimization, and improved translation performance. The direct generation of translation lattices in the target language can improve subsequent rescoring procedures, yielding further gains when applying long-span language models and Minimum Bayes Risk decoding. We also provide insights as to how to control the size of the search space defined by hierarchical rules. We show that shallow-n grammars, low-level rule catenation, and other search constraints can help to match the power of the translation system to specific language pairs.

70 citations


Journal ArticleDOI
TL;DR: An approach to the automatic creation of extractive summaries of literary short stories is presented, relying on assorted surface indicators about clauses in the short story; an evaluation suggests that the summaries are helpful in achieving the original objective.
Abstract: We present an approach to the automatic creation of extractive summaries of literary short stories. The summaries are produced with a specific objective in mind: to help a reader decide whether she would be interested in reading the complete story. To this end, the summaries give the user relevant information about the setting of the story without revealing its plot. The system relies on assorted surface indicators about clauses in the short story, the most important of which are those related to the aspectual type of a clause and to the main entities in a story. Fifteen judges evaluated the summaries on a number of extrinsic and intrinsic measures. The outcome of this evaluation suggests that the summaries are helpful in achieving the original objective.

58 citations


Journal ArticleDOI
TL;DR: A novel string-to-dependency algorithm for statistical machine translation that employs a target dependency language model during decoding to exploit long distance word relations, which cannot be modeled with a traditional n-gram language model.
Abstract: We propose a novel string-to-dependency algorithm for statistical machine translation. This algorithm employs a target dependency language model during decoding to exploit long distance word relations, which cannot be modeled with a traditional n-gram language model. Experiments show that the algorithm achieves significant improvement in MT performance over a state-of-the-art hierarchical string-to-string system on NIST MT06 and MT08 newswire evaluation sets.

58 citations


Journal ArticleDOI
TL;DR: A passage retrieval system that uses off-the-shelf retrieval technology with a re-ranking step incorporating structural information is extended, based on relatively lightweight overlap measures incorporating syntactic constituents, cue words, and document structure.
Abstract: While developing an approach to why-QA, we extended a passage retrieval system that uses off-the-shelf retrieval technology with a re-ranking step incorporating structural information. We get significantly higher scores in terms of MRR@150 (from 0.25 to 0.34) and success@10. The 23% improvement that we reach in terms of MRR is comparable to the improvement reached on different QA tasks by other researchers in the field, although our re-ranking approach is based on relatively lightweight overlap measures incorporating syntactic constituents, cue words, and document structure.
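
A lightweight sketch of the re-ranking step (the weights, cue-word list, and overlap measure are illustrative stand-ins, not the article's trained configuration):

```python
# Re-rank retrieved passages for why-questions by combining the retrieval
# engine's score with shallow overlap and cue-word features.
import re

CUE_WORDS = {"because", "since", "due", "therefore", "reason"}

def tokens(text):
    return re.findall(r"\w+", text.lower())

def overlap(a, b):
    a_set, b_set = set(tokens(a)), set(tokens(b))
    return len(a_set & b_set) / max(len(a_set), 1)

def rerank(question, passages, w_base=1.0, w_overlap=2.0, w_cue=0.5):
    """Combine the base retrieval score with overlap and cue-word features."""
    scored = []
    for base_score, passage in passages:
        cues = sum(1 for w in tokens(passage) if w in CUE_WORDS)
        score = (w_base * base_score
                 + w_overlap * overlap(question, passage)
                 + w_cue * cues)
        scored.append((score, passage))
    return [p for _, p in sorted(scored, reverse=True)]

passages = [(1.2, "The bridge closed due to storm damage."),
            (1.5, "The bridge is 2 km long and was built in 1932.")]
print(rerank("Why was the bridge closed?", passages))
# The causal passage outranks the higher-scored but non-causal one.
```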

Journal ArticleDOI
TL;DR: Three modifications to the MT training data are presented to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones.
Abstract: This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-the-art syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation.

Journal ArticleDOI
TL;DR: A discriminative framework for word alignment based on a linear model that achieves state-of-the-art alignment quality on three word alignment shared tasks for five language pairs with varying divergence and richness of resources and improves translation performance for various statistical machine translation systems.
Abstract: Word alignment plays an important role in many NLP tasks as it indicates the correspondence between words in a parallel text. Although widely used to align large bilingual corpora, generative models are hard to extend to incorporate arbitrary useful linguistic information. This article presents a discriminative framework for word alignment based on a linear model. Within this framework, all knowledge sources are treated as feature functions, which depend on a source language sentence, a target language sentence, and the alignment between them. We describe a number of features that could produce symmetric alignments. Our model is easy to extend and can be optimized with respect to evaluation metrics directly. The model achieves state-of-the-art alignment quality on three word alignment shared tasks for five language pairs with varying divergence and richness of resources. We further show that our approach improves translation performance for various statistical machine translation systems.
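
A bare-bones sketch of the linear-model idea (feature functions and weights below are invented stand-ins for the knowledge sources the article describes; the real model is trained to optimize alignment metrics directly):

```python
# Score a candidate alignment as a weighted sum of feature functions of the
# source sentence, target sentence, and the links between them.
def features(src, tgt, alignment, lexicon):
    n_links = len(alignment)
    lex_matches = sum(1 for i, j in alignment if (src[i], tgt[j]) in lexicon)
    distortion = sum(abs(i - j) for i, j in alignment)
    return {"links": n_links, "lexicon": lex_matches, "distortion": distortion}

def score(src, tgt, alignment, weights, lexicon):
    f = features(src, tgt, alignment, lexicon)
    return sum(weights[name] * value for name, value in f.items())

src = ["das", "haus"]
tgt = ["the", "house"]
lexicon = {("das", "the"), ("haus", "house")}
weights = {"links": 0.5, "lexicon": 2.0, "distortion": -0.3}

candidates = [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
best = max(candidates, key=lambda a: score(src, tgt, a, weights, lexicon))
print(best)  # the monotone, lexicon-supported alignment wins
```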

Journal ArticleDOI
TL;DR: In this first study of novel blends, an accuracy of 40% is achieved on the task of inferring a blend's source words, which corresponds to a reduction in error rate of 39% over an informed baseline.
Abstract: Newly coined words pose problems for natural language processing systems because they are not in a system's lexicon, and therefore no lexical information is available for such words. A common way to form new words is lexical blending, as in cosmeceutical, a blend of cosmetic and pharmaceutical. We propose a statistical model for inferring a blend's source words drawing on observed linguistic properties of blends; these properties are largely based on the recognizability of the source words in a blend. We annotate a set of 1,186 recently coined expressions which includes 515 blends, and evaluate our methods on a 324-item subset. In this first study of novel blends we achieve an accuracy of 40% on the task of inferring a blend's source words, which corresponds to a reduction in error rate of 39% over an informed baseline. We also give preliminary results showing that our features for source word identification can be used to distinguish blends from other kinds of novel words.
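
A hedged sketch of candidate generation for blend source words (the tiny lexicon and the recognizability score are illustrative; the article's statistical model uses richer features):

```python
# Split the blend at each position, look up lexicon words sharing the prefix
# and suffix, and prefer splits where a large fraction of each source word
# remains recognizable.
LEXICON = ["cosmetic", "pharmaceutical", "costume", "cosmos", "critical"]

def candidate_sources(blend, lexicon=LEXICON):
    candidates = []
    for split in range(2, len(blend) - 1):
        prefix, suffix = blend[:split], blend[split:]
        firsts = [w for w in lexicon if w.startswith(prefix)]
        seconds = [w for w in lexicon if w.endswith(suffix)]
        for w1 in firsts:
            for w2 in seconds:
                # Recognizability proxy: fraction of each source word kept.
                score = len(prefix) / len(w1) + len(suffix) / len(w2)
                candidates.append((score, w1, w2))
    return sorted(candidates, reverse=True)

print(candidate_sources("cosmeceutical")[:3])
# Top candidate: ('cosmetic', 'pharmaceutical')
```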

Journal ArticleDOI
TL;DR: A multi-level approach to presenting user-tailored information in spoken dialogues is described which brings together for the first time multi-attribute decision models, strategic content planning, surface realization that incorporates prosody prediction, and unit selection synthesis that takes the resulting prosodic structure into account.
Abstract: Generating responses that take user preferences into account requires adaptation at all levels of the generation process. This article describes a multi-level approach to presenting user-tailored information in spoken dialogues which brings together for the first time multi-attribute decision models, strategic content planning, surface realization that incorporates prosody prediction, and unit selection synthesis that takes the resulting prosodic structure into account. The system selects the most important options to mention and the attributes that are most relevant to choosing between them, based on the user model. Multiple options are selected when each offers a compelling trade-off. To convey these trade-offs, the system employs a novel presentation strategy which straightforwardly lends itself to the determination of information structure, as well as the contents of referring expressions. During surface realization, the prosodic structure is derived from the information structure using Combinatory Categorial Grammar in a way that allows phrase boundaries to be determined in a flexible, data-driven fashion. This approach to choosing pitch accents and edge tones is shown to yield prosodic structures with significantly higher acceptability than baseline prosody prediction models in an expert evaluation. These prosodic structures are then shown to enable perceptibly more natural synthesis using a unit selection voice that aims to produce the target tunes, in comparison to two baseline synthetic voices. An expert evaluation and f0 analysis confirm the superiority of the generator-driven intonation and its contribution to listeners' ratings.

Journal ArticleDOI
TL;DR: This article uses the Posterior Regularization framework to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model, and presents an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints.
Abstract: Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graca, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.

Journal ArticleDOI
TL;DR: The model uses coarse-grained semantic classes for S internally, the effect of using different levels of granularity on WSD performance is explored, and performance on noun disambiguation is better than that of most previously reported unsupervised systems and close to the best supervised systems.
Abstract: We introduce a generative probabilistic model, the noisy channel model, for unsupervised word sense disambiguation. In our model, each context C is modeled as a distinct channel through which the speaker intends to transmit a particular meaning S using a possibly ambiguous word W. To reconstruct the intended meaning the hearer uses the distribution of possible meanings in the given context P(S|C) and possible words that can express each meaning P(W|S). We assume P(W|S) is independent of the context and estimate it using WordNet sense frequencies. The main problem of unsupervised WSD is estimating context-dependent P(S|C) without access to any sense-tagged text. We show one way to solve this problem using a statistical language model based on large amounts of untagged text. Our model uses coarse-grained semantic classes for S internally and we explore the effect of using different levels of granularity on WSD performance. The system outputs fine-grained senses for evaluation, and its performance on noun disambiguation is better than most previously reported unsupervised systems and close to the best supervised systems.
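
The decoding step follows directly from the noisy-channel decomposition: choose the sense S maximizing P(S|C) * P(W|S). A toy sketch with invented probabilities (the article estimates P(W|S) from WordNet sense frequencies and P(S|C) from a language model over untagged text):

```python
# Noisy-channel WSD decoding: pick the sense maximizing P(S|C) * P(W|S).
p_sense_given_context = {          # P(S | C) for a river context
    "financial_institution": 0.1,
    "sloping_ground": 0.9,
}
p_word_given_sense = {             # P(W = "bank" | S)
    "financial_institution": 0.6,
    "sloping_ground": 0.3,
}

def disambiguate(word_probs, context_probs):
    """Return the sense maximizing P(S|C) * P(W|S)."""
    return max(context_probs,
               key=lambda s: context_probs[s] * word_probs.get(s, 0.0))

print(disambiguate(p_word_given_sense, p_sense_given_context))
# -> 'sloping_ground' for "bank" in a river context
```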

Journal ArticleDOI
TL;DR: This article delimits what the focus of paraphrase extraction and coreference resolution tasks should be, and to what extent they can help each other, in terms of similarities and differences in their linguistic nature.
Abstract: By providing a better understanding of paraphrase and coreference in terms of similarities and differences in their linguistic nature, this article delimits what the focus of paraphrase extraction and coreference resolution tasks should be, and to what extent they can help each other. We argue for the relevance of this discussion to Natural Language Processing.

Journal ArticleDOI
TL;DR: This article introduces an alternative method of measuring the semantic distance between texts that integrates distributional information and ontological knowledge within a network flow formalism, and develops a new measure of semantic coherence that enables us to account for the performance difference across the three data sets.
Abstract: Many NLP applications entail that texts are classified based on their semantic distance (how similar or different the texts are). For example, comparing the text of a new document to that of documents of known topics can help identify the topic of the new text. Typically, a distributional distance is used to capture the implicit semantic distance between two pieces of text. However, such approaches do not take into account the semantic relations between words. In this article, we introduce an alternative method of measuring the semantic distance between texts that integrates distributional information and ontological knowledge within a network flow formalism. We first represent each text as a collection of frequency-weighted concepts within an ontology. We then make use of a network flow method which provides an efficient way of explicitly measuring the frequency-weighted ontological distance between the concepts across two texts. We evaluate our method in a variety of NLP tasks, and find that it performs well on two of three tasks. We develop a new measure of semantic coherence that enables us to account for the performance difference across the three data sets, shedding light on the properties of a data set that lends itself well to our method.

Journal ArticleDOI
Emiel Krahmer1
TL;DR: Language technology has been particularly successful for tasks where huge amounts of textual data is available to which statistical machine learning techniques can be applied, and mainstream computational linguistics is now a successful, application-oriented discipline which is particularly good at extracting information from sequences of words.
Abstract: Sometimes I am amazed by how much the field of computational linguistics has changed in the past 15 to 20 years. In the mid-nineties, I was working in a research institute where language and speech technologists worked in relatively close quarters. Speech technology seemed on the verge of a major breakthrough; this was around the time that Bill Gates was quoted in Business Week as saying that speech was not just the future of Windows, but the future of computing itself. At the same time, language technology was, well, nowhere. Bill Gates certainly wasn’t championing language technology in those days. And while the possible applications of speech technology seemed endless (who would use a keyboard in 2010, when speech-driven user interfaces would have replaced traditional computers?) the language people were thinking hard about possible applications for their admittedly somewhat immature technologies. Predicting the future is a tricky thing. No major breakthrough came for speech technology; I am still typing this. However, language technology did change almost beyond recognition. Perhaps one of the main reasons for this has been the explosive growth of the internet, which helped language technology in two different ways. On the one hand it instigated the development and refinement of techniques needed for searching in document collections of unprecedented size, on the other it resulted in a large increase of freely available text data. Recently, language technology has been particularly successful for tasks where huge amounts of textual data is available to which statistical machine learning techniques can be applied (Halevy, Norvig, and Pereira 2009). As a result of these developments, mainstream computational linguistics is now a successful, application-oriented discipline which is particularly good at extracting information from sequences of words. But there is more to language than that. For speakers, words are the result of a complex speech production process; for listeners they are what starts off the similarly complex comprehension process. However, in many current applications no attention is given to the processes by which words are produced nor to the processes by which they can be understood. Language is treated as a product not as a process, in the terminology of Clark (1996). In addition, we use language not only as a vehicle for factual information exchange; speakers may have all sorts of other intentions with their words; they may want to convince others to do or buy something, they may want to induce a particular emotion in the addressee, etc. These days, most of computational linguistics (with a few notable exceptions, more about which below) has little to say

Journal ArticleDOI
TL;DR: A more accurate description of the authors' work is presented, the straw man argument used in Sproat (2010) is pointed out, and a more complete characterization of the Indus script debate is provided.
Abstract: In a recent Last Words column (Sproat 2010), Richard Sproat laments the reviewing practices of “general science journals” after dismissing our work and that of Lee, Jonathan, and Ziman (2010) as “useless” and “trivially and demonstrably wrong.” Although we expect such categorical statements to have already raised some red flags in the minds of readers, we take this opportunity to present a more accurate description of our work, point out the straw man argument used in Sproat (2010), and provide a more complete characterization of the Indus script debate. A separate response by Lee and colleagues in this issue provides clarification of issues not covered here.

Journal ArticleDOI
TL;DR: Until recently nobody had argued that statistical techniques could be used to determine that a symbol system is linguistic, and it was therefore quite a surprise when a short article by Rajesh Rao of the University of Washington and colleagues appeared in Science.
Abstract: Few archaeological finds are as evocative as artifacts inscribed with symbols. Whenever an archaeologist finds a potsherd or a seal impression that seems to have symbols scratched or impressed on the surface, it is natural to want to "read" the symbols. And if the symbols come from an undeciphered or previously unknown symbol system it is common to ask what language the symbols supposedly represent and whether the system can be deciphered. Of course the first question that really should be asked is whether the symbols are in fact writing. A writing system, as linguists usually define it, is a symbol system that is used to represent language. Familiar examples are alphabets such as the Latin, Greek, Cyrillic or Hangul alphabets, alphasyllabaries such as Devanagari or Tamil, syllabaries such as Cherokee or Kana, and morphosyllabic systems like Chinese characters. But symbol systems that do not encode language abound: European heraldry, mathematical notation, labanotation (used to represent dance), and boy scout merit badges are all examples of symbol systems that represent things, but do not function as part of a system that represents language. Whether an unknown system is writing or not is a difficult question to answer. It can only be answered definitively in the affirmative if one can develop a verifiable decipherment into some language or languages. Statistical techniques have been used in decipherment for years, but these have always been used under the assumption that the system one is dealing with is writing, and the techniques are used to uncover patterns or regularities that might aid in the decipherment. Patterns of symbol distribution might suggest that a symbol system is not linguistic: for example, odd repetition patterns might make it seem that a symbol system is unlikely to be writing. But until recently nobody had argued that statistical techniques could be used to determine that a system is linguistic. It was therefore quite a surprise when, in April 2009, there appeared in Science a short article by Rajesh Rao of the University of Washington and colleagues.


Journal ArticleDOI
TL;DR: This article investigates whether the Brown et al. (1993) word alignment algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.
Abstract: Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) have provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.

Journal ArticleDOI
TL;DR: Initially, as the size of the text increases, the hapax/vocabulary ratio decreases; however, after the text size reaches about 3,000,000 words, the ratio starts to increase steadily, and a computer simulation shows that as the text size continues to increase, the ratio would approach 1.0.
Abstract: In the known literature, hapax legomena in an English text or a collection of texts roughly account for about 50% of the vocabulary. This sort of constancy is baffling. The 100-million-word British National Corpus was used to study this phenomenon. The result reveals that the hapax/vocabulary ratio follows a U-shaped pattern. Initially, as the size of text increases, the hapax/vocabulary ratio decreases; however, after the text size reaches about 3,000,000 words, the hapax/vocabulary ratio starts to increase steadily. A computer simulation shows that as the text size continues to increase, the hapax/vocabulary ratio would approach 1.
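
The statistic itself is straightforward to trace on any corpus; a short sketch follows (the synthetic Zipf-like data is only a placeholder, and the U-shaped pattern reported in the article emerges only on much larger, real corpora):

```python
# Trace the hapax/vocabulary ratio as a token stream grows.
import random
from collections import Counter

def hapax_ratio_curve(tokens, step=100_000):
    counts = Counter()
    curve = []
    for i, tok in enumerate(tokens, 1):
        counts[tok] += 1
        if i % step == 0:
            vocab = len(counts)
            hapaxes = sum(1 for c in counts.values() if c == 1)
            curve.append((i, hapaxes / vocab))
    return curve

# Example with synthetic Zipf-like data (illustrative only):
vocab = [f"w{k}" for k in range(1, 50_000)]
weights = [1.0 / k for k in range(1, 50_000)]
tokens = random.choices(vocab, weights=weights, k=500_000)
print(hapax_ratio_curve(tokens))
```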

Journal ArticleDOI
TL;DR: This book is a textbook on computational linguistics for science and engineering students; it also serves as practical documentation for the NLTK library; and, finally, it attempts to provide an introduction to programming and algorithm design for humanities students.

Journal ArticleDOI
TL;DR: In this speech, I hope you’ll find an appreciation for some of the ideas and where they came from, but also a trajectory that continues forward and suggests some solutions to problems not yet solved.
Abstract: Good morning. I want to thank the ACL for awarding me the 2010 Lifetime Achievement Award. I’m honored to be included in the ranks of my respected colleagues who have received this award previously. I want to talk to you this morning about the evolution of some ideas that I think are important, with a little bit of historical and biographical context thrown in. I hope you’ll find in what I say not only an appreciation for some of the ideas and where they came from, but also a trajectory that continues forward and suggests some solutions to problems not yet solved.

Journal ArticleDOI
TL;DR: In his article "Ancient symbols and computational linguistics" (Sproat 2010), Professor Sproat raised two concerns over a method: first, that the method is unable to detect random but non-equiprobable systems; and second, that it misclassifies kudurru texts.
Abstract: In his article "Ancient symbols and computational linguistics" (Sproat 2010), Professor Sproat raised two concerns over a method that we have proposed for analyzing small data sets of symbols using entropy (Lee, Jonathan, and Ziman 2010): first, that the method is unable to detect random but non-equiprobable systems; and second, that it misclassifies kudurru texts. We address these concerns in the following response.
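
For readers who want to see the kind of statistic at issue, here is a minimal conditional (bigram) entropy estimate for a small symbol sequence; the smoothing and corpus details in the original papers differ, so this is only a sketch:

```python
# Conditional (bigram) entropy of a symbol sequence from raw counts.
import math
from collections import Counter

def conditional_entropy(symbols):
    """H(next symbol | current symbol), estimated without smoothing."""
    bigrams = Counter(zip(symbols, symbols[1:]))
    unigrams = Counter(symbols[:-1])
    total = sum(bigrams.values())
    h = 0.0
    for (a, b), n_ab in bigrams.items():
        p_ab = n_ab / total
        p_b_given_a = n_ab / unigrams[a]
        h -= p_ab * math.log2(p_b_given_a)
    return h

print(conditional_entropy(list("ABABABAB")))      # 0.0: perfectly predictable
print(conditional_entropy(list("ABCDABDCACBD")))  # higher: less regular
```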

Journal ArticleDOI
TL;DR: This article presents a parsing algorithm that improves on the baseline parsing method and runs in polynomial time when both the fan-out and rank of the input grammar are bounded, and offers an optimal, efficient algorithm for factorizing a grammar to produce a strongly equivalent TL-MCTAG grammar with the rank minimized.
Abstract: Tree-Local Multi-Component Tree-Adjoining Grammar (TL-MCTAG) is an appealing formalism for natural language representation because it arguably allows the encapsulation of the appropriate domain of locality within its elementary structures. Its multicomponent structure allows modeling of lexical items that may ultimately have elements far apart in a sentence, such as quantifiers and wh-words. When used as the base formalism for a synchronous grammar, its flexibility allows it to express both the close relationships and the divergent structure necessary to capture the links between the syntax and semantics of a single language or the syntax of two different languages. Its limited expressivity provides constraints on movement and, we posit, may have generated additional popularity based on a misconception about its parsing complexity. Although TL-MCTAG was shown to be equivalent in expressivity to TAG when it was first introduced, the complexity of TL-MCTAG is still not well understood. This article offers a thorough examination of the problem of TL-MCTAG recognition, showing that even highly restricted forms of TL-MCTAG are NP-complete to recognize. However, in spite of the provable difficulty of the recognition problem, we offer several algorithms that can substantially improve processing efficiency. First, we present a parsing algorithm that improves on the baseline parsing method and runs in polynomial time when both the fan-out and rank of the input grammar are bounded. Second, we offer an optimal, efficient algorithm for factorizing a grammar to produce a strongly equivalent TL-MCTAG grammar with the rank of the grammar minimized.