
Showing papers by "Mark Johnson" published in 2015


Proceedings ArticleDOI
01 Sep 2015
TL;DR: A new set of non-monotonic transitions is described that permits a partial parse state to derive a larger set of completed parse trees than in previous work, allowing the parser to escape the “garden paths” that can trap monotonic greedy transition-based dependency parsers.
Abstract: Transition-based dependency parsers usually use transition systems that monotonically extend partial parse states until they identify a complete parse tree. Honnibal et al. (2013) showed that greedy onebest parsing accuracy can be improved by adding additional non-monotonic transitions that permit the parser to “repair” earlier parsing mistakes by “over-writing” earlier parsing decisions. This increases the size of the set of complete parse trees that each partial parse state can derive, enabling such a parser to escape the “garden paths” that can trap monotonic greedy transition-based dependency parsers. We describe a new set of non-monotonic transitions that permits a partial parse state to derive a larger set of completed parse trees than previous work, which allows our parser to escape from a larger set of garden paths. A parser with our new nonmonotonic transition system has 91.85% directed attachment accuracy, an improvement of 0.6% over a comparable parser using the standard monotonic arc-eager transitions.
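
For context, the standard monotonic arc-eager transition system that these non-monotonic transitions extend can be sketched as follows. This is a generic illustration in Python (the class and method names are ours), not the parser described in the paper.

# Minimal sketch of the standard (monotonic) arc-eager transition system.
# Illustrative only; the paper's contribution is a non-monotonic extension
# of a system like this one.
class ArcEagerState:
    def __init__(self, n_words):
        self.stack = []                          # indices of partially processed words
        self.buffer = list(range(n_words))       # indices of words not yet read
        self.heads = {}                          # dependent index -> head index

    def shift(self):                             # push the next buffer word onto the stack
        self.stack.append(self.buffer.pop(0))

    def left_arc(self):                          # stack top becomes a dependent of the next buffer word
        assert self.stack[-1] not in self.heads
        self.heads[self.stack.pop()] = self.buffer[0]

    def right_arc(self):                         # next buffer word becomes a dependent of the stack top
        child = self.buffer.pop(0)
        self.heads[child] = self.stack[-1]
        self.stack.append(child)

    def reduce(self):                            # pop a stack word that already has a head
        assert self.stack[-1] in self.heads
        self.stack.pop()

In this monotonic system each word receives at most one head and no decision is ever revisited; the paper's non-monotonic transitions relax exactly this restriction, so later transitions can overwrite earlier attachments.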

557 citations


Journal ArticleDOI
TL;DR: This article extends two Dirichlet multinomial topic models by incorporating latent feature vector representations of words, trained on very large corpora, to improve the word-topic mapping learnt on a smaller corpus.
Abstract: Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
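
One way to write the word-generation distribution of such a latent-feature topic model is as a two-component mixture of the usual topic-word multinomial and a log-bilinear term over pre-trained word vectors. The notation below (mixture weight \lambda, topic vector \tau_t, word vector \omega_w) is ours and is only a hedged reconstruction of the idea, not necessarily the paper's exact formulation:

% Hedged sketch: topic t emits word w either from the standard multinomial \phi_t
% or from a softmax over dot products with pre-trained word vectors.
P(w \mid t) \;=\; (1-\lambda)\,\mathrm{Mult}(w \mid \phi_t)
\;+\; \lambda\,\frac{\exp(\tau_t \cdot \omega_w)}{\sum_{w'} \exp(\tau_t \cdot \omega_{w'})}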

276 citations


01 Dec 2015
TL;DR: It is found that removing all words except nouns improved the topics’ semantic coherence, and that model training is faster when the articles are reduced to nouns only.
Abstract: This study compared three topic models trained on three versions of a news corpus. The first model was generated from the raw news corpus, the second from the lemmatised version of the news corpus, and the third from the lemmatised news corpus reduced to nouns only. We found that removing all words except nouns improved the topics’ semantic coherence. Using the measures developed by Lau et al. (2014), the average observed topic coherence improved by 6% and the average word intrusion detection improved by 8% for the noun-only corpus, compared to modelling the raw corpus. Similar improvements on these measures were obtained by simply lemmatising the news corpus; however, model training is faster when the articles are reduced to nouns only.
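
A minimal sketch of this kind of preprocessing, using spaCy to build the three corpus variants (raw, lemmatised, and lemmatised nouns only); the paper does not specify these exact tools, so treat this purely as an illustration.

# Hedged sketch: derive the three corpus variants compared in the paper.
# spaCy is our choice here, not necessarily the authors' toolchain.
import spacy

nlp = spacy.load("en_core_web_sm")

def corpus_variants(articles):
    raw, lemmatised, nouns_only = [], [], []
    for doc in nlp.pipe(articles):
        tokens = [t for t in doc if t.is_alpha]
        raw.append([t.text.lower() for t in tokens])
        lemmatised.append([t.lemma_.lower() for t in tokens])
        nouns_only.append([t.lemma_.lower() for t in tokens if t.pos_ == "NOUN"])
    return raw, lemmatised, nouns_only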

63 citations


Book ChapterDOI
26 Mar 2015
TL;DR: This article reviews the main classes of probabilistic grammars and points to some active areas of research.
Abstract: Formal grammars are widely used in speech recognition, language translation, and language understanding systems. Grammars rich enough to accommodate natural language generate multiple interpretations of typical sentences. These ambiguities are a fundamental challenge to practical application. Grammars can be equipped with probability distributions, and the various parameters of these distributions can be estimated from data (e.g., acoustic representations of spoken words or a corpus of hand-parsed sentences). The resulting probabilistic grammars help to interpret spoken or written language unambiguously. This article reviews the main classes of probabilistic grammars and points to some active areas of research.
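
As a toy illustration of how rule probabilities resolve ambiguity, a PCFG scores each parse by the product of the probabilities of the rules it uses; the grammar and numbers below are invented for this example and do not come from the article.

# Toy PCFG example: two parses of "saw the man with the telescope" differ only
# in where the PP attaches, so their scores differ only in the rules shown.
# Rules used identically by both parses (subject NP, lexical rules) are omitted.
# All probabilities are invented for illustration.
from math import prod

rule_prob = {
    "S -> NP VP": 1.0,
    "VP -> V NP": 0.7,        # PP attaches inside the object noun phrase
    "VP -> V NP PP": 0.3,     # PP attaches to the verb phrase
    "NP -> Det N": 0.6,
    "NP -> Det N PP": 0.4,
    "PP -> P NP": 1.0,
}

noun_attachment = ["S -> NP VP", "VP -> V NP", "NP -> Det N PP", "PP -> P NP", "NP -> Det N"]
verb_attachment = ["S -> NP VP", "VP -> V NP PP", "NP -> Det N", "PP -> P NP", "NP -> Det N"]

print(prod(rule_prob[r] for r in noun_attachment))   # 0.168
print(prod(rule_prob[r] for r in verb_attachment))   # 0.108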

38 citations


Proceedings Article
25 Jan 2015
TL;DR: A simple topic model is presented that uses generalised Mallows models and incomplete topic orderings to incorporate document-level ordering regularity into the probabilistic generative process; the model is reparameterised so that a point-wise sampling algorithm from the Bayesian word segmentation literature can be used for inference.
Abstract: Documents from the same domain usually discuss similar topics in a similar order. However, the number of topics and the exact topics discussed in each individual document can vary. In this paper we present a simple topic model that uses generalised Mallows models and incomplete topic orderings to incorporate this ordering regularity into the probabilistic generative process of the new model. We show how to reparameterise the new model so that a point-wise sampling algorithm from the Bayesian word segmentation literature can be used for inference. This algorithm jointly samples not only the topic orders and the topic assignments but also topic segmentations of documents. Experimental results show that our model performs significantly better than the other ordering-based topic models on nearly all the corpora that we used, and competitively with other state-of-the-art topic segmentation models on corpora that have a strong ordering regularity.
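
To make the generalised Mallows model over topic orderings concrete, here is a toy sampler using the standard inversion-count parameterisation; it illustrates the distribution only and is not the inference algorithm described in the paper.

# Toy generalised Mallows model (GMM) sampler over orderings of n topics.
# v[j] counts how many topics with a larger canonical index precede topic j,
# and P(ordering) is proportional to exp(-sum_j rho[j] * v[j]).
import math
import random

def sample_gmm_ordering(n, rho):
    ordering = []
    for j in range(n - 1, -1, -1):                 # place topics n-1, n-2, ..., 0
        max_v = n - 1 - j                          # larger-indexed topics already placed
        weights = [math.exp(-rho[j] * v) for v in range(max_v + 1)]
        v = random.choices(range(max_v + 1), weights=weights)[0]
        ordering.insert(v, j)                      # exactly v larger-indexed topics precede topic j
    return ordering

print(sample_gmm_ordering(5, rho=[2.0] * 5))       # large rho concentrates on the canonical order 0..4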

19 citations


Proceedings ArticleDOI
01 May 2015
TL;DR: This work describes a new algorithm for incremental transition-based Combinatory Categorial Grammar parsing and introduces two new actions in the shift-reduce paradigm based on the idea of ‘revealing’ the required information during parsing.
Abstract: Incremental parsers have potential advantages for applications like language modeling for machine translation and speech recognition. We describe a new algorithm for incremental transition-based Combinatory Categorial Grammar parsing. As English CCGbank derivations are mostly right branching and non-incremental, we design our algorithm based on the dependencies resolved rather than the derivation. We introduce two new actions in the shift-reduce paradigm based on the idea of ‘revealing’ (Pareschi and Steedman, 1987) the required information during parsing. On the standard CCGbank test data, our algorithm achieved improvements of 0.88% in labeled and 2.0% in unlabeled F-score over a greedy non-incremental shift-reduce parser.

16 citations


01 Dec 2015
TL;DR: A simple and effective way of incorporating FreeBase’s notable types into a state-of-the-art relation extraction system is described; results show that the notable type information improves relation extraction more than NER labels alone across a wide range of entity types and relations.
Abstract: Relation extraction is the task of extracting predicate-argument relationships between entities from natural language text. This paper investigates whether background information about entities available in knowledge bases such as FreeBase can be used to improve the accuracy of a state-of-the-art relation extraction system. We describe a simple and effective way of incorporating FreeBase’s notable types into a state-of-the-art relation extraction system (Riedel et al., 2013). Experimental results show that our notable type-based system achieves an average 7.5% weighted MAP score improvement. To understand where the notable type information contributes the most, we perform a series of ablation experiments. Results show that the notable type information improves relation extraction more than NER labels alone across a wide range of entity types and relations.
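
A hedged sketch of the general idea of augmenting an entity pair's representation with knowledge-base type information alongside NER labels; the feature templates and lookup tables below are invented for illustration and are not part of the Riedel et al. (2013) factorisation model itself.

# Invented illustration: add FreeBase notable-type features next to NER features
# for an entity pair. Not the actual system described in the paper.
def pair_features(e1, e2, ner, notable_type):
    """ner and notable_type map an entity id to a label string."""
    return [
        "ner:%s|%s" % (ner[e1], ner[e2]),                        # coarse NER labels
        "ntype:%s|%s" % (notable_type[e1], notable_type[e2]),    # FreeBase notable types
    ]

ner = {"m.01": "PERSON", "m.02": "ORGANIZATION"}
notable_type = {"m.01": "/music/composer", "m.02": "/education/university"}
print(pair_features("m.01", "m.02", ner, notable_type))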

15 citations


01 Dec 2015
TL;DR: A new approach is presented by incorporating word vectors to directly optimize the maximum a posteriori (MAP) estimation in a topic model to improve the assignments of topics to words.
Abstract: Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature word vectors have been used to obtain high performance in many natural language processing (NLP) tasks. In this paper, we present a new approach by incorporating word vectors to directly optimize the maximum a posteriori (MAP) estimation in a topic model. Preliminary results show that the word vectors induced from the experimental corpus can be used to improve the assignments of topics to words.

11 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: A joint model of word segmentation and phonological alternations is described that takes unsegmented utterances as input and infers word segmentations and underlying phonological representations, setting a new state-of-the-art for this corpus for word segmentation, identification of underlying forms, and identification of /d/ and /t/ deletions.
Abstract: This paper describes a joint model of word segmentation and phonological alternations, which takes unsegmented utterances as input and infers word segmentations and underlying phonological representations. The model is a Maximum Entropy or log-linear model, which can express a probabilistic version of Optimality Theory (OT; Prince and Smolensky (2004)), a standard phonological framework. The features in our model are inspired by OT's Markedness and Faithfulness constraints. Following the OT principle that such features indicate "violations", we require their weights to be non-positive. We apply our model to a modified version of the Buckeye corpus (Pitt et al., 2007) in which the only phonological alternations are deletions of word-final /d/ and /t/ segments. The model sets a new state-of-the-art for this corpus for word segmentation, identification of underlying forms, and identification of /d/ and /t/ deletions. We also show that the OT-inspired sign constraints on feature weights are crucial for accurate identification of deleted /d/s; without them our model posits approximately 10 times more deleted underlying /d/s than appear in the manually annotated data.
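
The sign constraint on feature weights can be pictured as a simple projection step after each gradient update; the snippet below is a generic sketch of that idea (our own illustration, not the training procedure used in the paper).

# Generic sketch: keep MaxEnt feature weights non-positive, in the spirit of the
# OT-inspired "violation" features described above, by projecting after each step.
import numpy as np

def projected_gradient_step(weights, gradient, learning_rate=0.1):
    weights = weights + learning_rate * gradient   # ordinary ascent step on the objective
    return np.minimum(weights, 0.0)                # violation features may only lower a score

w = np.zeros(4)
g = np.array([0.5, -1.2, 0.3, -0.1])               # made-up gradient for illustration
print(projected_gradient_step(w, g))                # [ 0.   -0.12  0.   -0.01]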

9 citations


Journal ArticleDOI
TL;DR: Change in clustering and classification results due to the DMM and LF-DMM bugs.

5 citations


Proceedings ArticleDOI
01 Jul 2015
TL;DR: This paper presents an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc), develops a point-wise sampling algorithm for posterior inference in the new formulation, and further improves the sampler’s efficiency by exploiting sparsity and parallelising inference.
Abstract: Most existing topic models make the bag-of-words assumption that words are generated independently, and so ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora. This paper studies how to simultaneously learn both collocations and their topic assignments. We present an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc) (Johnson, 2010), and develop a point-wise sampling algorithm for posterior inference in this new formulation. We further improve the efficiency of the sampling algorithm by exploiting sparsity and parallelising inference. Experimental results on text classification, information retrieval and human evaluation tasks across a range of datasets show that this reformulation scales to hundreds of thousands of documents while maintaining the good performance of the AG-colloc model.

Proceedings Article
01 Nov 2015
TL;DR: The transponder is formed from two planar microstrip patch antennas and a frequency multiplier circuit, and its minute size and weight make it an attractive option for tracking small objects or animals.
Abstract: A harmonic transponder design suitable for microsensing systems is presented. The transponder is a passive device and operates at millimeter-wave frequencies to reduce its size and weight. The receive band is 38–38.5 GHz and the transmit band is 76–77 GHz. The transponder is formed from two planar microstrip patch antennas and a frequency multiplier circuit, and its minute size and weight make it an attractive option for tracking small objects or animals.

01 Dec 2015
TL;DR: This paper studies POS-dependent morphological segmentation in the Adaptor Grammars framework and shows that the segmentation F1-score improves when the tags are used, with gold-standard tags leading to the biggest improvement.
Abstract: The utility of using morphological features in part-of-speech (POS) tagging is well established in the literature. However, the usefulness of exploiting information about POS tags for morphological segmentation is less clear. In this paper we study POS-dependent morphological segmentation in the Adaptor Grammars framework. We experiment with three different scenarios: without POS tags, with gold-standard tags and with automatically induced tags, and show that the segmentation F1-score improves when the tags are used. We show that the gold-standard tags lead to the biggest improvement, as expected. However, using automatically induced tags also brings some improvement over the tag-independent baseline.