
Showing papers by "Mark Johnson published in 2011"


Journal ArticleDOI
TL;DR: This work presents a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages, and discusses two stochastic processes that can be used as adaptors in this framework to produce power-law distributions over word frequencies.
Abstract: Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
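As a minimal sketch of the generator-plus-adaptor idea, the fragment below draws word tokens from a two-parameter (Pitman-Yor) Chinese restaurant process whose table labels are supplied by an arbitrary generator. It is an illustration under our own simplifying assumptions, not the authors' implementation; the name `pyp_adaptor` and the toy vocabulary are hypothetical.

```python
import random

def pyp_adaptor(generator, n_tokens, discount=0.5, concentration=1.0):
    """Draw n_tokens word tokens from a two-parameter (Pitman-Yor) Chinese
    restaurant process whose table labels come from `generator` (the first
    stage of the two-stage framework).  Token frequencies produced this way
    follow a power law even when the generator itself is uniform."""
    labels, counts, tokens = [], [], []
    for n in range(n_tokens):
        # probability of seating the new customer at a fresh table
        p_new = (concentration + discount * len(labels)) / (n + concentration)
        if not labels or random.random() < p_new:
            labels.append(generator())   # new table: ask the generator for a label
            counts.append(1)
            k = len(labels) - 1
        else:
            # existing table k is chosen with probability proportional to (count_k - discount)
            k = random.choices(range(len(labels)),
                               weights=[c - discount for c in counts])[0]
            counts[k] += 1
        tokens.append(labels[k])
    return tokens

# Toy usage: a uniform generator over 1000 made-up word types
vocab = ["w%d" % i for i in range(1000)]
sample = pyp_adaptor(lambda: random.choice(vocab), 10000)
```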

86 citations


Proceedings Article
27 Jul 2011
TL;DR: This paper shows that the grounded task of learning a semantic parser from ambiguous training data can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state-of-the-art results.
Abstract: It is often assumed that 'grounded' learning tasks are beyond the scope of grammatical inference techniques. In this paper, we show that the grounded task of learning a semantic parser from ambiguous training data as discussed in Kim and Mooney (2010) can be reduced to a Probabilistic Context-Free Grammar learning task in a way that gives state-of-the-art results. We further show that additionally letting our model learn the language's canonical word order improves its performance and leads to the highest semantic parsing f-scores yet reported in the literature.

48 citations


Proceedings Article
19 Jun 2011
TL;DR: It is shown that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance.
Abstract: Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy-channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker, operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker and examine different optimisation strategies. We obtain a disfluency detection f-score of 0.838, which improves upon the current state-of-the-art.
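A minimal sketch of the reranking step, under our own assumptions rather than the paper's actual feature set: each candidate analysis from the n-best list carries a dictionary of feature values, and the reranker returns the candidate with the highest weighted feature sum. All names and numbers below are hypothetical.

```python
def rerank(candidates, weights):
    """Rescore the n-best analyses of a noisy-channel model with a log-linear
    (linear in the features) reranker and return the highest-scoring one."""
    def score(feats):
        return sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return max(candidates, key=score)

# Hypothetical 2-best list for one utterance: each analysis records the channel
# and language-model log probabilities plus a count of tokens marked disfluent.
candidates = [
    {"channel_logp": -12.3, "lm_logp": -20.1, "disfluent_tokens": 1},
    {"channel_logp": -11.9, "lm_logp": -24.7, "disfluent_tokens": 0},
]
weights = {"channel_logp": 1.0, "lm_logp": 0.8, "disfluent_tokens": -0.1}
best = rerank(candidates, weights)
```

In the paper the feature weights are learned; optimising them for f-score rather than log loss is the change the abstract reports as helpful for disfluency detection.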

41 citations


01 Dec 2011
TL;DR: A novel online algorithm for the word segmentation models of Goldwater et al. (2009) which is, to the authors' knowledge, the first published particle filter for this kind of model, and which comes with a theoretical guarantee of optimality as the number of particles goes to infinity.
Abstract: Bayesian models are usually learned using batch algorithms that have to iterate multiple times over the full dataset. This is both computationally expensive and, from a cognitive point of view, highly implausible. We present a novel online algorithm for the word segmentation models of Goldwater et al. (2009) which is, to our knowledge, the first published version of a particle filter for this kind of model. Also, in contrast to other proposed algorithms, it comes with a theoretical guarantee of optimality as the number of particles goes to infinity. While this is, of course, a theoretical point, a first experimental evaluation of our algorithm shows that, as predicted, its performance improves with the use of more particles, and that it performs competitively with other online learners proposed in Pearl et al. (2011).
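The skeleton below illustrates the general shape of such an online particle-filter learner under our own assumptions; it is not the paper's algorithm. Each particle holds one hypothesised lexicon, and the model-specific proposal distribution and importance weight are abstracted into the hypothetical callables `propose` and `weight`.

```python
import copy
import random
from collections import Counter

def particle_filter(utterances, n_particles, propose, weight):
    """Sequential (online) Monte Carlo learner for word segmentation: each
    particle carries one hypothesised lexicon; for every incoming utterance a
    segmentation is proposed, the particle is importance-weighted, and the
    population is resampled so that high-probability lexicons survive."""
    particles = [Counter() for _ in range(n_particles)]
    for utt in utterances:
        weights = []
        for lexicon in particles:
            segmentation = propose(lexicon, utt)    # e.g. sample word boundaries
            weights.append(weight(lexicon, utt, segmentation))
            lexicon.update(segmentation)            # add the segmented words
        # multinomial resampling; copies keep the particles independent afterwards
        particles = [copy.deepcopy(p)
                     for p in random.choices(particles, weights=weights, k=n_particles)]
    return particles
```

Resampling after every utterance keeps the population concentrated on high-probability lexicons, and the quality of the approximation grows with the number of particles, which matches the behaviour the abstract reports.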

32 citations


01 Dec 2011
TL;DR: This paper investigates whether predictors derived from Latent Semantic Analysis, language models, and Roark’s parser are significant in modeling the N400m and shows that predictors based on the 4-gram language model and the pairwise-priming language model are highly correlated with the manual annotation of contextual plausibility.
Abstract: The N400 is a human neuroelectric response to semantic incongruity in on-line sentence processing, and implausibility in context has been identified as one of the factors that influence the size of the N400. In this paper we investigate whether predictors derived from Latent Semantic Analysis, language models, and Roark’s parser are significant in modeling the N400m (the neuromagnetic version of the N400). We also investigate the significance of a novel pairwise-priming language model based on the IBM Model 1 translation model. Our experiments show that all the predictors are significant. Moreover, we show that predictors based on the 4-gram language model and the pairwise-priming language model are highly correlated with the manual annotation of contextual plausibility, suggesting that these predictors are capable of playing the same role as the manual annotations in predicting the N400m response. We also show that the proposed predictors can be grouped into two clusters of significant predictors, suggesting that each cluster captures a different characteristic of the N400m response.
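One standard way of turning a language model into a per-word predictor of this kind is surprisal, the negative log probability of each word given its recent context. The sketch below assumes a hypothetical `ngram_logprob(word, context)` function standing in for a trained 4-gram model; it is an illustration, not the authors' feature pipeline.

```python
def surprisal_predictor(sentence, ngram_logprob, order=4):
    """Per-word surprisal under an n-gram language model: -log P(w_i | context),
    where the context is the previous order-1 words.  Values like these can be
    regressed against a per-word brain response such as the N400m."""
    values = []
    for i, word in enumerate(sentence):
        context = tuple(sentence[max(0, i - order + 1):i])
        values.append(-ngram_logprob(word, context))
    return values
```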

23 citations


01 Dec 2011
TL;DR: This paper examines the ways in which parallelism can be used to speed the parsing of dense PCFGs, focusing on two kinds of parallelism: Symmetric Multi-Processing (SMP) parallelism on shared-memory multicore CPUs, and Single-Instruction Multiple-Thread (SIMT) parallelism on GPUs.
Abstract: This paper examines the ways in which parallelism can be used to speed the parsing of dense PCFGs. We focus on two kinds of parallelism here: Symmetric Multi-Processing (SMP) parallelism on shared-memory multicore CPUs, and Single-Instruction Multiple-Thread (SIMT) parallelism on GPUs. We describe how to achieve speed-ups over an already very efficient baseline parser using both kinds of technology. For our dense PCFG parsing task we obtained a 60× speed-up using SMP and SSE parallelism coupled with a cache-sensitive algorithm design, parsing section 24 of the Penn WSJ treebank in a little over 2 seconds.
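For a dense grammar the CKY inside computation reduces to tensor contractions, which is exactly the structure that SSE/SMP threads or GPU warps can exploit. The sketch below is our own vectorised illustration of that inner loop using NumPy, not the authors' parser; `rules` and `lexicon` are hypothetical stand-ins for a real dense PCFG.

```python
import numpy as np

def inside_chart(words, lexicon, rules):
    """Dense CKY inside pass.  `rules[A, B, C]` holds P(A -> B C) for every
    triple of non-terminal indices, and `lexicon(word)` returns a vector of
    P(A -> word).  The einsum over the full rule tensor is the data-parallel
    inner loop that can be spread across cores or GPU threads."""
    n, N = len(words), rules.shape[0]
    chart = np.zeros((n + 1, n + 1, N))
    for i, w in enumerate(words):
        chart[i, i + 1] = lexicon(w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            total = np.zeros(N)
            for k in range(i + 1, j):
                # one dense tensor contraction per split point
                total += np.einsum('abc,b,c->a', rules, chart[i, k], chart[k, j])
            chart[i, j] = total
    return chart
```

Each contraction is independent of the others at the same span width, so they can be distributed across threads; laying out `chart` so that these reads stay in cache is the kind of cache-sensitive design the abstract refers to.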

18 citations


01 Dec 2011
TL;DR: This paper looks at the idea of using Latent Dirichlet Allocation as a feature clustering technique over lexical features to see whether there is any evidence that smaller-scale features do cluster into more coherent latent factors, and investigates their effect in a classification task.
Abstract: Native language identification (NLI) is the task of determining the native language of an author writing in a second language. Several pieces of earlier work have found that features such as function words, part-of-speech n-grams and syntactic structure are helpful in NLI, perhaps representing characteristic errors of different native language speakers. This paper looks at the idea of using Latent Dirichlet Allocation as a feature clustering technique over lexical features to see whether there is any evidence that these smaller-scale features do cluster into more coherent latent factors, and investigates their effect in a classification task. We find that although (not unexpectedly) classification accuracy decreases, there is some evidence of coherent clustering, which could help with much larger syntactic feature spaces.
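As a concrete illustration of the feature-clustering idea, the sketch below runs scikit-learn's LDA over function-word counts and uses the resulting document-topic mixtures as a compressed feature vector. The tiny corpus, the function-word list, and the number of latent factors are all hypothetical; in the NLI setting each essay would also carry its author's native-language label for the downstream classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical second-language essays (real NLI corpora contain thousands).
essays = [
    "the flight of the plane is in the morning and it is a long trip",
    "a trip to the city is good for it and that is a plan",
]

# Restrict the lexical features to a small function-word vocabulary, then let
# LDA compress the sparse counts into a handful of latent factors that can be
# fed to a classifier in place of the raw features.
function_words = ["the", "of", "and", "a", "to", "in", "is", "that", "it", "for"]
counts = CountVectorizer(vocabulary=function_words).fit_transform(essays)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
latent_features = lda.fit_transform(counts)   # one 5-dimensional vector per essay
```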

17 citations


Journal ArticleDOI
TL;DR: The evolution of my own research in statistical parsing led me away from focusing on any specific linguistic theory and towards discovering which types of information are important for specific linguistic processes, rather than the details of exactly how this information should be formalised.
Abstract: I start by explaining what I take computational linguistics to be, and discuss the relationship between its scientific side and its engineering applications. Statistical techniques have revolutionised many scientific fields in the past two decades, including computational linguistics. I describe the evolution of my own research in statistical parsing and how that led me away from focusing on the details of any specific linguistic theory, and to concentrate instead on discovering which types of information (i.e., features) are important for specific linguistic processes, rather than on the details of exactly how this information should be formalised. I end by describing some of the ways that ideas from computational linguistics, statistics and machine learning may have an impact on linguistics in the future.

12 citations


Journal ArticleDOI
TL;DR: This special topic contains several papers presenting some of the recent developments in the area of grammar induction and language learning, as applied to various problems in Natural Language Processing, including supervised and unsupervised parsing and statistical machine translation.
Abstract: Grammar induction refers to the process of learning grammars and languages from data; this finds a variety of applications in syntactic pattern recognition, the modeling of natural language acquisition, data mining and machine translation. This special topic contains several papers presenting some of the recent developments in the area of grammar induction and language learning, as applied to various problems in Natural Language Processing, including supervised and unsupervised parsing and statistical machine translation.

12 citations


Proceedings Article
05 Jul 2011
TL;DR: Experimental results show that the proposed algorithm provides significantly improved orientation estimates in the presence of magnetic anomalies and sensor movement, as well as increased stability and reduced noise compared to the original CKF algorithm.
Abstract: Localization in outdoor environments is widespread, providing improved functionality and benefits for many applications. Extending this ability to in-building environments is a critical prerequisite for many location-aware services. Low-cost inertial sensors have the potential to improve the performance of new and existing localization systems. A challenging problem in inertial navigation is the accurate tracking of orientation. In this paper we propose a new orientation tracking algorithm that uses a complementary Kalman filter (CKF) for obtaining a reference orientation. A feedback loop is developed, which estimates the gyro bias from the CKF reference. The final orientation is the direct integration of the gyro, after removing the estimated bias. Experimental results show that the proposed algorithm provides significantly improved orientation estimates in the presence of magnetic anomalies and sensor movement, as well as increased stability and reduced noise compared to the original CKF algorithm. Position calculations, using a zero-velocity update process, demonstrate the advantages of the enhanced orientation estimate as clear improvements in the resulting localization.
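A one-axis sketch of the bias-feedback idea described above, under our own simplifying assumptions: the complementary Kalman filter is abstracted into a stream of reference angles, a gyro-bias estimate is nudged toward agreement with that reference, and the reported orientation is the direct integration of the bias-corrected rate. The function and the fixed `gain` are hypothetical, not the authors' filter.

```python
def track_orientation(gyro_rates, reference_angles, dt, gain=0.01):
    """Single-axis illustration: integrate the bias-corrected gyro rate, and
    slowly update the bias estimate from the discrepancy between the
    integrated angle and the reference orientation (assumed to come from a
    complementary Kalman filter)."""
    angle, bias = 0.0, 0.0
    angles = []
    for rate, ref in zip(gyro_rates, reference_angles):
        angle += (rate - bias) * dt      # direct integration of the corrected rate
        bias += gain * (angle - ref)     # nudge the bias estimate toward the reference
        angles.append(angle)
    return angles
```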

7 citations


01 Dec 2011
TL;DR: This paper introduces tree transducers as a unifying theory for semantic parsing models based on tree transformations as well as a variant of the inside-outside algorithm with variational Bayesian estimation that achieves higher raw accuracy than existing generative and discriminative approaches on a standard data set.
Abstract: This paper introduces tree transducers as a unifying theory for semantic parsing models based on tree transformations. Many existing models use tree transformations, but implement specialized training and smoothing methods, which makes it difficult to modify or extend the models. By connecting to the rich literature on tree automata, we show how semantic parsing models can be developed using completely general estimation methods. We demonstrate the approach by reframing and extending one state-of-the-art model as a tree automaton. Using a variant of the inside-outside algorithm with variational Bayesian estimation, our generative model achieves higher raw accuracy than existing generative and discriminative approaches on a standard data set.
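The variational Bayesian variant of inside-outside mentioned in the abstract typically amounts to replacing the M-step's simple normalisation of expected rule counts with a digamma-based update, as in mean-field VB for PCFGs. The sketch below shows that replacement under our own assumptions about the data layout; `expected_counts` and the symmetric prior `alpha` are hypothetical.

```python
import numpy as np
from scipy.special import digamma

def vb_m_step(expected_counts, alpha=0.1):
    """Mean-field variational-Bayes M-step: instead of normalising expected
    rule counts directly, each rule weight becomes
    exp(digamma(count + alpha) - digamma(total)), which acts like a sparse
    Dirichlet prior over the rules (or transducer transitions) sharing a
    left-hand side.  expected_counts maps each left-hand side to an array of
    expected counts, one entry per rule."""
    weights = {}
    for lhs, counts in expected_counts.items():
        counts = np.asarray(counts, dtype=float)
        total = digamma(counts.sum() + alpha * len(counts))
        weights[lhs] = np.exp(digamma(counts + alpha) - total)
    return weights
```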

01 Jan 2011
TL;DR: An elicited-imitation study of 2-year-olds’ prosodic organization of function words showed that the function word was produced as an independent prosodic unit, in contrast to the adult model being imitated.
Abstract: English-speaking children have acquired phrase-final lengthening by the age of 2, but other aspects of prosodic organization appear to be later acquired. This study investigated 2-year-olds’ prosodic organization of function words in an elicited imitation task. In particular, we wanted to know if children would prosodify pronouns 1) as part of a trochaic foot with the preceding word, or 2) as a separate prosodic word. The results showed that the function word was produced as an independent prosodic unit, in contrast to the adult model being imitated. Implications for a developmental model of speech planning and production are discussed.