Other affiliations: University of Sussex, Royal Holloway, University of London, University of Geneva ...read more
Bio: Alexander Clark is an academic researcher from King's College London. The author has contributed to research in topics: Grammar induction & Natural language. The author has an hindex of 26, co-authored 105 publications receiving 2775 citations. Previous affiliations of Alexander Clark include University of Sussex & Royal Holloway, University of London.
Papers published on a yearly basis
••12 Apr 2003
TL;DR: Algorithms for clustering words into classes from unlabelled text using unsupervised algorithms, based on distributional and morphological information, are discussed, showing how the use of morphological Information can improve the performance on rare words.
Abstract: In this paper we discuss algorithms for clustering words into classes from unlabelled text using unsupervised algorithms, based on distributional and morphological information. We show how the use of morphological information can improve the performance on rare words, and that this is robust across a wide range of languages.
01 Jan 2010
TL;DR: The Handbook of Computational Linguistics and Natural Language Processing provides a comprehensive overview of the concepts, methodologies, and applications being undertaken today in computational linguistics and natural language processing.
Abstract: The best part of compiling this handbook has been the opportunity that it has given each of us to observe in detail and in perspective the wonderful burst of creativity that has taken hold of our field in recent years.
TL;DR: It is argued that the results of a set of large-scale experiments using crowd-sourced acceptability judgments that demonstrate gradience to be a pervasive feature inacceptability judgments support the view that linguistic knowledge can be intrinsically probabilistic.
Abstract: The question of whether humans represent grammatical knowledge as a binary condition on membership in a set of well-formed sentences, or as a probabilistic property has been the subject of debate among linguists, psychologists, and cognitive scientists for many decades. Acceptability judgments present a serious problem for both classical binary and probabilistic theories of grammaticality. These judgements are gradient in nature, and so cannot be directly accommodated in a binary formal grammar. However, it is also not possible to simply reduce acceptability to probability. The acceptability of a sentence is not the same as the likelihood of its occurrence, which is, in part, determined by factors like sentence length and lexical frequency. In this paper, we present the results of a set of large-scale experiments using crowd-sourced acceptability judgments that demonstrate gradience to be a pervasive feature in acceptability judgments. We then show how one can predict acceptability judgments on the basis of probability by augmenting probabilistic language models with an acceptability measure. This is a function that normalizes probability values to eliminate the confounding factors of length and lexical frequency. We describe a sequence of modeling experiments with unsupervised language models drawn from state-of-the-art machine learning methods in natural language processing. Several of these models achieve very encouraging levels of accuracy in the acceptability prediction task, as measured by the correlation between the acceptability measure scores and mean human acceptability values. We consider the relevance of these results to the debate on the nature of grammatical competence, and we argue that they support the view that linguistic knowledge can be intrinsically probabilistic.
25 Jan 2011
TL;DR: In this article, a theory-internal APS is proposed to determine the nature of primary Linguistic Data, which is based on the positive-evidence-only APS.
Abstract: Preface. 1 Introduction: Nativism in Linguistic Theory. 1.1 Historical Development. 1.2 The Rationalist-Empiricist Debate. 1.3 Nativism and Cognitive Modularity. 1.4 Connectionism, Nonmodularity, and Antinativism. 1.5 Adaptation and the Evolution of Natural Language. 1.6 Summary and Conclusions. 2 Clarifying the Argument from the Poverty of the Stimulus. 2.1 Formulating the APS. 2.2 Empiricist Learning versus Nativist Learning. 2.3 Our Version of the APS. 2.4 A Theory-Internal APS. 2.5 Evidence for the APS: Auxiliary Inversion as a Paradigm Case. 2.6 Debate on the PLD. 2.7 Learning Theory and Indispensable Data. 2.8 A Second Empirical Case: Anaphoric One. 2.9 Summary and Conclusions. 3 The Stimulus: Determining the Nature of Primary Linguistic Data. 3.1 Primary Linguistic Data. 3.2 Negative Evidence. 3.3 Semantic, Contextual, and Extralinguistic Evidence. 3.4 Prosodic Information. 3.5 Summary and Conclusions. 4 Learning in the Limit: The Gold Paradigm. 4.1 Formal Models of Language Acquisition. 4.2 Mathematical Models of Learnability. 4.3 The Gold Paradigm of Learnability. 4.4 Critique of the Positive-Evidence-Only APS in IIL. 4.5 Proper Positive Results. 4.6 Variants of the Gold Model. 4.7 Implications of Gold's Results for Linguistic Nativism. 4.8 Summary and Conclusions. 5 Probabilistic Learning Theory for Language Acquisition. 5.1 Chomsky's View of Statistical Learning. 5.2 Basic Assumptions of Statistical Learning Theory. 5.3 Learning Distributions. 5.4 Probabilistic Versions of the IIL Framework. 5.5 PAC Learning. 5.6 Consequences of PAC Learnability. 5.7 Problems with the Standard Model. 5.8 Summary and Conclusions. 6 A Formal Model of Indirect Negative Evidence. 6.1 Introduction. 6.2. From Low Probability to Ungrammaticality. 6.3 Modeling the DDA. 6.4 Applying the Functional Lower Bound. 6.5 Summary and Conclusions. 7 Computational Complexity and Efficient Learning. 7.1 Basic Concepts of Complexity 7.2 Efficient Learning. 7.3 Negative Results. 7.4 Interpreting Hardness Results. 7.5 Summary and Conclusions. 8 Positive Results in Efficient Learning. 8.1 Regular Languages. 8.2 Distributional Methods. 8.3 Distributional Learning of Context-Free Languages. 8.4 Lattice-Based Formalisms. 8.5 Arguments against Distributional Learning. 8.6 Summary and Conclusions. 9 Grammar Induction through Implemented Machine Learning. 9.1 Supervised Learning. 9.2Unsupervised Learning. 9.3 Summary and Conclusions. 10 Parameters in Linguistic Theory and Probabilistic Language Models. 10.1 Learnability of Parametric Models of Syntax. 10.2 UG Parameters and Language Variation. 10.3 Parameters in Probabilistic Language Models. 10.4 Inferring Constraints on Hypothesis Spaces with Hierarchical Bayesian Models. 10.5 Summary and Conclusions. 11 A Brief Look at Some Biological and Psychological Evidence. 11.1 Developmental Arguments. 11.2 Genetic Factors: Inherited Language Disorders. 11.3 Experimental Learning of Artificial Languages. 11.4 Summary and Conclusions. 12 Conclusion. 12.1 Summary. 12.2 Conclusions. References. Author Index. Subject Index.
••22 Sep 2008
TL;DR: It is established that all context free languages that satisfy two constraints on the context distributions can be identified in the limit by this approach.
Abstract: We present a polynomial algorithm for the inductive inference of a large class of context free languages, that includes all regular languages. The algorithm uses a representation which we call Binary Feature Grammars based on a set of features, capable of representing richly structured context free languages as well as some context sensitive languages. More precisely, we focus on a particular case of this representation where the features correspond to contexts appearing in the language. Using the paradigm of positive data and a membership oracle, we can establish that all context free languages that satisfy two constraints on the context distributions can be identified in the limit by this approach. The polynomial time algorithm we propose is based on a generalisation of distributional learning and uses the lattice of context occurrences. The formalism and the algorithm seem well suited to natural language and in particular to the modelling of first language acquisition.
01 Mar 1999
TL;DR: This target article summarizes decades of cross-linguistic work by typologists and descriptive linguists, showing just how few and unprofound the universal characteristics of language are, once the authors honestly confront the diversity offered to us by the world's 6,000 to 8,000 languages.
Abstract: Talk of linguistic universals has given cognitive scientists the impression that languages are all built to a common pattern. In fact, there are vanishingly few universals of language in the direct sense that all languages exhibit them. Instead, diversity can be found at almost every level of linguistic organization. This fundamentally changes the object of enquiry from a cognitive science perspective. This target article summarizes decades of cross-linguistic work by typologists and descriptive linguists, showing just how few and unprofound the universal characteristics of language are, once we honestly confront the diversity offered to us by the world's 6,000 to 8,000 languages. After surveying the various uses of "universal," we illustrate the ways languages vary radically in sound, meaning, and syntactic organization, and then we examine in more detail the core grammatical machinery of recursion, constituency, and grammatical relations. Although there are significant recurrent patterns in organization, these are better explained as stable engineering solutions satisfying multiple design constraints, reflecting both cultural-historical factors and the constraints of human cognition. Linguistic diversity then becomes the crucial datum for cognitive science: we are the only species with a communication system that is fundamentally variable at all levels. Recognizing the true extent of structural diversity in human language opens up exciting new research directions for cognitive scientists, offering thousands of different natural experiments given by different languages, with new opportunities for dialogue with biological paradigms concerned with change and diversity, and confronting us with the extraordinary plasticity of the highest human skills.
TL;DR: By exploring the consistency and complementary properties of different views, multi-View learning is rendered more effective, more promising, and has better generalization ability than single-view learning.
Abstract: In recent years, a great many methods of learning from multi-view data by considering the diversity of different views have been proposed. These views may be obtained from multiple sources or different feature subsets. In trying to organize and highlight similarities and differences between the variety of multi-view learning approaches, we review a number of representative multi-view learning algorithms in different areas and classify them into three groups: 1) co-training, 2) multiple kernel learning, and 3) subspace learning. Notably, co-training style algorithms train alternately to maximize the mutual agreement on two distinct views of the data; multiple kernel learning algorithms exploit kernels that naturally correspond to different views and combine kernels either linearly or non-linearly to improve learning performance; and subspace learning algorithms aim to obtain a latent subspace shared by multiple views by assuming that the input views are generated from this latent subspace. Though there is significant variance in the approaches to integrating multiple views to improve learning performance, they mainly exploit either the consensus principle or the complementary principle to ensure the success of multi-view learning. Since accessing multiple views is the fundament of multi-view learning, with the exception of study on learning a model from multiple views, it is also valuable to study how to construct multiple views and how to evaluate these views. Overall, by exploring the consistency and complementary properties of different views, multi-view learning is rendered more effective, more promising, and has better generalization ability than single-view learning.
01 Jan 1999
TL;DR: Second language acquisition research has been extensively studied in the literature as discussed by the authors, with a focus on second language acquisition in the context of English as a Second Language Learning (ESL) programs.
Abstract: Acknowledgements Introduction PART ONE - BACKGROUND Introduction 1. Second language acquisition research: an overview PART TWO - THE DESCRIPTION OF LEARNER LANGUAGE Introduction 2. Learner errors and error analysis 3. Developmental patterns: order and sequence in second language acquisition 4. Variability in learner language 5. Pragmatic aspects of learner language PART THREE - EXPLAINING SECOND LANGUAGE ACQUISITION: EXTERNAL FACTORS Introduction 6. Social factors and second language acquisition 7. Input and interaction and second language acquisition PART FOUR - EXPLAINING SECOND LANGUAGE ACQUISITION: INTERNAL FACTORS Introduction 8. Language transfer 9. Cognitive accounts of second language acquisition 10. Linguistic universals and second language acquisition PART FIVE - EXPLAINING INDIVIDUAL DIFFERENCES IN SECOND LANGUAGE ACQUISITION Introduction 11. Individual learner differences 12. Learning strategies PART SIX - CLASSROOM SECOND LANGUAGE ACQUISITION Introduction 13. Classroom interaction and second language acquisition 14. Formal instruction and second language acquisition PART SEVEN - CONCLUSION Introduction 15. Data, theory, and applications in second language acquisition research Glossary Bibliography Author index Subject index
TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results as mentioned in this paper, which is also popularly used in sentiment analysis in recent years.
Abstract: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.