
Showing papers by "Mark Johnson published in 2011"


Journal ArticleDOI
TL;DR: This work presents a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages, and discusses two stochastic processes that can be used as adaptors in this framework to produce power-law distributions over word frequencies.
Abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
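As a minimal sketch of the generator-plus-adaptor idea, the fragment below draws word tokens from a two-parameter (Pitman-Yor) Chinese restaurant process whose table labels are supplied by an arbitrary generator. It is an illustration under our own simplifying assumptions, not the authors' implementation; the name `pyp_adaptor` and the toy vocabulary are hypothetical.

```python
import random

def pyp_adaptor(generator, n_tokens, discount=0.5, concentration=1.0):
    """Draw n_tokens word tokens from a two-parameter (Pitman-Yor) Chinese
    restaurant process whose table labels come from `generator` (the first
    stage of the two-stage framework).  Token frequencies produced this way
    follow a power law even when the generator itself is uniform."""
    labels, counts, tokens = [], [], []
    for n in range(n_tokens):
        # probability of seating the new customer at a fresh table
        p_new = (concentration + discount * len(labels)) / (n + concentration)
        if not labels or random.random() < p_new:
            labels.append(generator())   # new table: ask the generator for a label
            counts.append(1)
            k = len(labels) - 1
        else:
            # existing table k is chosen with probability proportional to (count_k - discount)
            k = random.choices(range(len(labels)),
                               weights=[c - discount for c in counts])[0]
            counts[k] += 1
        tokens.append(labels[k])
    return tokens

# Toy usage: a uniform generator over 1000 made-up word types
vocab = ["w%d" % i for i in range(1000)]
sample = pyp_adaptor(lambda: random.choice(vocab), 10000)
```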

86 citations


Proceedings Article
27 Jul 2011
TL;DR: This paper shows that the grounded task of learning a semantic parser from ambiguous training data can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state-of-the-art results.
Abstract: It is often assumed that 'grounded' learning tasks are beyond the scope of grammatical inference techniques. In this paper, we show that the grounded task of learning a semantic parser from ambiguous training data as discussed in Kim and Mooney (2010) can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state-of-the-art results. We further show that additionally letting our model learn the language's canonical word order improves its performance and leads to the highest semantic parsing f-scores yet reported in the literature.

48 citations


Proceedings Article
19 Jun 2011
TL;DR: It is shown that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance.
Abstract: Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy-channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker, operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker and examine different optimisation strategies. We obtain a disfluency detection f-score of 0.838, which improves upon the current state-of-the-art.
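A minimal sketch of the reranking step, under our own assumptions rather than the paper's actual feature set: each candidate analysis from the n-best list carries a dictionary of feature values, and the reranker returns the candidate with the highest weighted feature sum. All names and numbers below are hypothetical.

```python
def rerank(candidates, weights):
    """Rescore the n-best analyses of a noisy-channel model with a log-linear
    (linear in the features) reranker and return the highest-scoring one."""
    def score(feats):
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return max(candidates, key=score)

# Hypothetical 2-best list for one utterance: each analysis records the channel
# and language-model log probabilities plus a count of tokens marked disfluent.
candidates = [
    {"channel_logp": -12.3, "lm_logp": -20.1, "disfluent_tokens": 1},
    {"channel_logp": -11.9, "lm_logp": -24.7, "disfluent_tokens": 0},
]
weights = {"channel_logp": 1.0, "lm_logp": 0.8, "disfluent_tokens": -0.1}
best = rerank(candidates, weights)
```

In the paper the feature weights are learned; optimising them for f-score rather than log loss is the change the abstract reports as helpful for disfluency detection.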

41 citations


01 Dec 2011
TL;DR: A novel online algorithm for the word segmentation models of Goldwater et al. (2009) which is, to the authors' knowledge, the first published particle filter for this kind of model, and which comes with a theoretical guarantee of optimality as the number of particles goes to infinity.
Abstract: Bayesian models are usually learned using batch algorithms that have to iterate multiple times over the full dataset. This is both computationally expensive and, from a cognitive point of view, highly implausible. We present a novel online algorithm for the word segmentation models of Goldwater et al. (2009) which is, to our knowledge, the first published version of a particle filter for this kind of model. Also, in contrast to other proposed algorithms, it comes with a theoretical guarantee of optimality as the number of particles goes to infinity. While this is, of course, a theoretical point, a first experimental evaluation of our algorithm shows that, as predicted, its performance improves with the use of more particles, and that it performs competitively with other online learners proposed in Pearl et al. (2011).
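The skeleton below illustrates the general shape of such an online particle-filter learner under our own assumptions; it is not the paper's algorithm. Each particle holds one hypothesised lexicon, and the model-specific proposal distribution and importance weight are abstracted into the hypothetical callables `propose` and `weight`.

```python
import copy
import random
from collections import Counter

def particle_filter(utterances, n_particles, propose, weight):
    """Sequential (online) Monte Carlo learner for word segmentation: each
    particle carries one hypothesised lexicon; for every incoming utterance a
    segmentation is proposed, the particle is importance-weighted, and the
    population is resampled so that high-probability lexicons survive."""
    particles = [Counter() for _ in range(n_particles)]
    for utt in utterances:
        weights = []
        for lexicon in particles:
            segmentation = propose(lexicon, utt)    # e.g. sample word boundaries
            weights.append(weight(lexicon, utt, segmentation))
            lexicon.update(segmentation)            # add the segmented words
        # multinomial resampling; copies keep the particles independent afterwards
        particles = [copy.deepcopy(p)
                     for p in random.choices(particles, weights=weights, k=n_particles)]
    return particles
```

Resampling after every utterance keeps the population concentrated on high-probability lexicons, and the quality of the approximation grows with the number of particles, which matches the behaviour the abstract reports.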

32 citations


01 Dec 2011
TL;DR: This paper investigates whether predictors derived from Latent Semantic Analysis, language models, and Roark’s parser are significant in modeling the N400m and shows that predictors based on the 4-gram language model and the pairwise-priming language model are highly correlated with the manual annotation of contextual plausibility.
Abstract: The N400 is a human neuroelectric response to semantic incongruity in on-line sentence processing, and implausibility in context has been identified as one of the factors that influence the size of the N400. In this paper we investigate whether predictors derived from Latent Semantic Analysis, language models, and Roark’s parser are significant in modeling the N400m (the neuromagnetic version of the N400). We also investigate the significance of a novel pairwise-priming language model based on the IBM Model 1 translation model. Our experiments show that all the predictors are significant. Moreover, we show that predictors based on the 4-gram language model and the pairwise-priming language model are highly correlated with the manual annotation of contextual plausibility, suggesting that these predictors are capable of playing the same role as the manual annotations in predicting the N400m response. We also show that the proposed predictors can be grouped into two clusters of significant predictors, suggesting that each cluster captures a different characteristic of the N400m response.
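One standard way of turning a language model into a per-word predictor of this kind is surprisal, the negative log probability of each word given its recent context. The sketch below assumes a hypothetical `ngram_logprob(word, context)` function standing in for a trained 4-gram model; it is an illustration, not the authors' feature pipeline.

```python
def surprisal_predictor(sentence, ngram_logprob, order=4):
    """Per-word surprisal under an n-gram language model: -log P(w_i | context),
    where the context is the previous order-1 words.  Values like these can be
    regressed against a per-word brain response such as the N400m."""
    values = []
    for i, word in enumerate(sentence):
        context = tuple(sentence[max(0, i - order + 1):i])
        values.append(-ngram_logprob(word, context))
    return values
```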

23 citations


01 Dec 2011
TL;DR: This paper examines the ways in which parallelism can be used to speed the parsing of dense PCFGs, focusing on two kinds of parallelism: Symmetric Multi-Processing (SMP) parallelism on shared-memory multicore CPUs, and Single-Instruction Multiple-Thread (SIMT) parallelism on GPUs.
Abstract: This paper examines the ways in which parallelism can be used to speed the parsing of dense PCFGs. We focus on two kinds of parallelism here: Symmetric Multi-Processing (SMP) parallelism on shared-memory multicore CPUs, and Single-Instruction Multiple-Thread (SIMT) parallelism on GPUs. We describe how to achieve speed-ups over an already very efficient baseline parser using both kinds of technology. For our dense PCFG parsing task we obtained a 60× speed-up using SMP and SSE parallelism coupled with a cache-sensitive algorithm design, parsing section 24 of the Penn WSJ treebank in a little over 2 seconds.
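For a dense grammar the CKY inside computation reduces to tensor contractions, which is exactly the structure that SSE/SMP threads or GPU warps can exploit. The sketch below is our own vectorised illustration of that inner loop using NumPy, not the authors' parser; `rules` and `lexicon` are hypothetical stand-ins for a real dense PCFG.

```python
import numpy as np

def inside_chart(words, lexicon, rules):
    """Dense CKY inside pass.  `rules[A, B, C]` holds P(A -> B C) for every
    triple of non-terminal indices, and `lexicon(word)` returns a vector of
    P(A -> word).  The einsum over the full rule tensor is the data-parallel
    inner loop that can be spread across cores or GPU threads."""
    n, N = len(words), rules.shape[0]
    chart = np.zeros((n + 1, n + 1, N))
    for i, w in enumerate(words):
        chart[i, i + 1] = lexicon(w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            total = np.zeros(N)
            for k in range(i + 1, j):
                # one dense tensor contraction per split point
                total += np.einsum('abc,b,c->a', rules, chart[i, k], chart[k, j])
            chart[i, j] = total
    return chart
```

Each contraction is independent of the others at the same span width, so they can be distributed across threads; laying out `chart` so that these reads stay in cache is the kind of cache-sensitive design the abstract refers to.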

18 citations


01 Dec 2011
TL;DR: This paper looks at the idea of using Latent Dirichlet Allocation as a feature clustering technique over lexical features to see whether there is any evidence that smaller-scale features do cluster into more coherent latent factors, and investigates their effect in a classification task.
Abstract: Native language identification (NLI) is the task of determining the native language of an author writing in a second language. Several pieces of earlier work have found that features such as function words, part-of-speech n-grams and syntactic structure are helpful in NLI, perhaps representing characteristic errors of different native language speakers. This paper looks at the idea of using Latent Dirichlet Allocation as a feature clustering technique over lexical features to see whether there is any evidence that these smaller-scale features do cluster into more coherent latent factors, and investigates their effect in a classification task. We find that although (not unexpectedly) classification accuracy decreases, there is some evidence of coherent clustering, which could help with much larger syntactic feature spaces.
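As a concrete illustration of the feature-clustering idea, the sketch below runs scikit-learn's LDA over function-word counts and uses the resulting document-topic mixtures as a compressed feature vector. The tiny corpus, the function-word list, and the number of latent factors are all hypothetical; in the NLI setting each essay would also carry its author's native-language label for the downstream classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical second-language essays (real NLI corpora contain thousands).
essays = [
    "the flight of the plane is in the morning and it is a long trip",
    "a trip to the city is good for it and that is a plan",
]

# Restrict the lexical features to a small function-word vocabulary, then let
# LDA compress the sparse counts into a handful of latent factors that can be
# fed to a classifier in place of the raw features.
function_words = ["the", "of", "and", "a", "to", "in", "is", "that", "it", "for"]
counts = CountVectorizer(vocabulary=function_words).fit_transform(essays)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
latent_features = lda.fit_transform(counts)   # one 5-dimensional vector per essay
```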

17 citations


Journal ArticleDOI
TL;DR: The evolution of my own research in statistical parsing led me away from focusing on any specific linguistic theory and towards discovering which types of information are important for specific linguistic processes, rather than the details of exactly how this information should be formalised.
Abstract: I start by explaining what I take computational linguistics to be, and discuss the relationship between its scientific side and its engineering applications. Statistical techniques have revolutionised many scientific fields in the past two decades, including computational linguistics. I describe the evolution of my own research in statistical parsing and how that led me away from focusing on the details of any specific linguistic theory, and to concentrate instead on discovering which types of information (i.e., features) are important for specific linguistic processes, rather than on the details of exactly how this information should be formalised. I end by describing some of the ways that ideas from computational linguistics, statistics and machine learning may have an impact on linguistics in the future.

12 citations


Journal ArticleDOI
TL;DR: This special topic contains several papers presenting some of the recent developments in the area of grammar induction and language learning, as applied to various problems in Natural Language Processing, including supervised and unsupervised parsing and statistical machine translation.
Abstract: Grammar induction refers to the process of learning grammars and languages from data; this finds a variety of applications in syntactic pattern recognition, the modeling of natural language acquisition, data mining and machine translation. This special topic contains several papers presenting some of the recent developments in the area of grammar induction and language learning, as applied to various problems in Natural Language Processing, including supervised and unsupervised parsing and statistical machine translation.

12 citations


Proceedings Article
05 Jul 2011
TL;DR: Experimental results show that the proposed algorithm provides significantly improved orientation estimates in the presence of magnetic anomalies and sensor movement, as well as increased stability and reduced noise compared to the original CKF algorithm.
Abstract: Localization in outdoor environments is widespread, providing improved functionality and benefits for many applications. Extending this ability to in-building environments is a critical prerequisite for many location-aware services. Low-cost inertial sensors have the potential to improve the performance of new and existing localization systems. A challenging problem in inertial navigation is the accurate tracking of orientation. In this paper we propose a new orientation tracking algorithm that uses a complementary Kalman filter (CKF) for obtaining a reference orientation. A feedback loop is developed, which estimates the gyro bias from the CKF reference. The final orientation is the direct integration of the gyro, after removing the estimated bias. Experimental results show that the proposed algorithm provides significantly improved orientation estimates in the presence of magnetic anomalies and sensor movement, as well as increased stability and reduced noise compared to the original CKF algorithm. Position calculations, using a zero-velocity update process, demonstrate the advantages of the enhanced orientation estimate as clear improvements in the resulting localization.
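A one-axis sketch of the bias-feedback idea described above, under our own simplifying assumptions: the complementary Kalman filter is abstracted into a stream of reference angles, a gyro-bias estimate is nudged toward agreement with that reference, and the reported orientation is the direct integration of the bias-corrected rate. The function and the fixed `gain` are hypothetical, not the authors' filter.

```python
def track_orientation(gyro_rates, reference_angles, dt, gain=0.01):
    """Single-axis illustration: integrate the bias-corrected gyro rate, and
    slowly update the bias estimate from the discrepancy between the
    integrated angle and the reference orientation (assumed to come from a
    complementary Kalman filter)."""
    angle, bias = 0.0, 0.0
    angles = []
    for rate, ref in zip(gyro_rates, reference_angles):
        angle += (rate - bias) * dt      # direct integration of the corrected rate
        bias += gain * (angle - ref)     # nudge the bias estimate toward the reference
        angles.append(angle)
    return angles
```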

7 citations


01 Dec 2011
TL;DR: This paper introduces tree transducers as a unifying theory for semantic parsing models based on tree transformations as well as a variant of the inside-outside algorithm with variational Bayesian estimation that achieves higher raw accuracy than existing generative and discriminative approaches on a standard data set.
Abstract: This paper introduces tree transducers as a unifying theory for semantic parsing models based on tree transformations. Many existing models use tree transformations, but implement specialized training and smoothing methods, which makes it difficult to modify or extend the models. By connecting to the rich literature on tree automata, we show how semantic parsing models can be developed using completely general estimation methods. We demonstrate the approach by reframing and extending one state-of-the-art model as a tree automaton. Using a variant of the inside-outside algorithm with variational Bayesian estimation, our generative model achieves higher raw accuracy than existing generative and discriminative approaches on a standard data set.
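The variational Bayesian variant of inside-outside mentioned in the abstract typically amounts to replacing the M-step's simple normalisation of expected rule counts with a digamma-based update, as in mean-field VB for PCFGs. The sketch below shows that replacement under our own assumptions about the data layout; `expected_counts` and the symmetric prior `alpha` are hypothetical.

```python
import numpy as np
from scipy.special import digamma

def vb_m_step(expected_counts, alpha=0.1):
    """Mean-field variational-Bayes M-step: instead of normalising expected
    rule counts directly, each rule weight becomes
    exp(digamma(count + alpha) - digamma(total)), which acts like a sparse
    Dirichlet prior over the rules (or transducer transitions) sharing a
    left-hand side.  expected_counts maps each left-hand side to an array of
    expected counts, one entry per rule."""
    weights = {}
    for lhs, counts in expected_counts.items():
        counts = np.asarray(counts, dtype=float)
        total = digamma(counts.sum() + alpha * len(counts))
        weights[lhs] = np.exp(digamma(counts + alpha) - total)
    return weights
```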

01 Jan 2011
TL;DR: An elicited-imitation study of 2-year-olds’ prosodic organization of function words showed that the function word was produced as an independent prosodic unit, in contrast to the adult model being imitated.
Abstract: English-speaking children have acquired phrase-final lengthening by the age of 2, but other aspects of prosodic organization appear to be later acquired. This study investigated 2-year-olds’ prosodic organization of function words in an elicited imitation task. In particular, we wanted to know if children would prosodify pronouns 1) as part of a trochaic foot with the preceding word, or 2) as a separate prosodic word. The results showed that the function word was produced as an independent prosodic unit, in contrast to the adult model being imitated. Implications for a developmental model of speech planning and production are discussed.