Author

Marc Alexander

Bio: Marc Alexander is an academic researcher from the University of Glasgow. He has contributed to research on topics including thesauri (information retrieval) and the history of English, has an h-index of 8, and has co-authored 44 publications receiving 196 citations.

Papers

01 Jan 2011

22 citations

Journal ArticleDOI


TL;DR: This new semantic tagger is built on existing NLP tools and incorporates a large-scale historical English thesaurus linked to the Oxford English Dictionary; it is capable of annotating lexical units with a historically valid, highly fine-grained semantic categorization scheme containing about 225,000 semantic concepts and 4,033 thematic semantic categories.
Abstract: Automatic extraction and analysis of meaning-related information from natural language data has been an important issue in a number of research areas, such as natural language processing (NLP), text mining, corpus linguistics, and data science. An important aspect of such information extraction and analysis is the semantic annotation of language data using a semantic tagger. In practice, various semantic annotation tools have been designed to carry out different levels of semantic annotation, such as topics of documents, semantic role labeling, named entities or events. Currently, the majority of existing semantic annotation tools identify and tag partial core semantic information in language data, but they tend to be applicable only for modern language corpora. While such semantic analyzers have proven useful for various purposes, a semantic annotation tool that is capable of annotating deep semantic senses of all lexical units, or all-words tagging, is still desirable for a deep, comprehensive semantic analysis of language data. With large-scale digitization efforts underway, delivering historical corpora with texts dating from the last 400 years, a particularly challenging aspect is the need to adapt the annotation in the face of significant word meaning change over time. In this paper, we report on the development of a new semantic tagger (the Historical Thesaurus Semantic Tagger), and discuss challenging issues we faced in this work. This new semantic tagger is built on existing NLP tools and incorporates a large-scale historical English thesaurus linked to the Oxford English Dictionary. Employing contextual disambiguation algorithms, this tool is capable of annotating lexical units with a historically-valid highly fine-grained semantic categorization scheme that contains about 225,000 semantic concepts and 4,033 thematic semantic categories. In terms of novelty, it is adapted for processing historical English data, with rich information about historical usage of words and a spelling variant normalizer for historical forms of English. Furthermore, it is able to make use of knowledge about the publication date of a text to adapt its output. In our evaluation, the system achieved encouraging accuracies ranging from 77.12% to 91.08% on individual test texts. Applying time-sensitive methods improved results by as much as 3.54% and by 1.72% on average.
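The abstract's time-sensitive tagging idea, using a text's publication date to constrain which candidate senses are even considered, can be sketched in a few lines. The data model, category labels, date ranges and prior weights below are hypothetical illustrations, not the Historical Thesaurus Semantic Tagger's actual implementation or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sense:
    """One candidate sense of a word (hypothetical, simplified record)."""
    label: str                  # semantic category label, e.g. "travel/vehicle"
    first_cited: int            # earliest year the sense is attested
    last_cited: Optional[int]   # None means the sense is still current
    prior: float                # corpus-derived prior probability of the sense


def historically_valid(sense: Sense, text_year: int) -> bool:
    """A sense is a plausible reading only if it was in use when the text was written."""
    return sense.first_cited <= text_year and (
        sense.last_cited is None or text_year <= sense.last_cited
    )


def tag(senses: List[Sense], text_year: int) -> Sense:
    """Pick the most probable sense among those valid for the text's date,
    falling back to the overall best sense if the date filter removes them all."""
    candidates = [s for s in senses if historically_valid(s, text_year)] or senses
    return max(candidates, key=lambda s: s.prior)


# Illustration: "car" in a 1750 text vs. a 1950 text.
car_senses = [
    Sense("wheeled vehicle/chariot", first_cited=1300, last_cited=1900, prior=0.2),
    Sense("motor car", first_cited=1896, last_cited=None, prior=0.8),
]
print(tag(car_senses, 1750).label)  # -> wheeled vehicle/chariot
print(tag(car_senses, 1950).label)  # -> motor car
```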

16 citations

Book ChapterDOI


01 May 2014

16 citations

Journal ArticleDOI


TL;DR: 'The Linguistic DNA of Modern Western Thought' is an AHRC-funded project that aims to identify the discursive concepts that shape thought, culture and society in a particular period.
Abstract: This article describes the background and premises of the AHRC-funded project, ‘The Linguistic DNA of Modern Western Thought’. We offer an empirical, encyclopaedic approach to historical semantics regarding ‘conceptual history’, i.e. the history of concepts that shape thought, culture and society in a particular period. We relate the project to traditional work in conceptual and semantic history and define our object of study as the discursive concept, a category of meaning encoded linguistically as a cluster of expressions that co-occur in discourse. We describe our principal data source, EEBO-TCP, and introduce our key research interests, namely, the contexts of conceptual change, the semantic structure of lexical fields and the nature of lexicalisation pressure. We outline our computational processes, which build upon the theoretical definition of discursive concepts, to discover the linguistically encoded forms underpinning the discursive concepts we seek to identify in EEBO-TCP. Finally, we share preliminary results via a worked example, exploring the discursive contexts in which paradigmatic terms of key cultural concepts emerge. We consider the extent to which particular genres, discourses and users in the early modern period make paradigms, and examine the extent to which these contexts determine the characteristics of key concepts.
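As a rough illustration of the article's notion of a discursive concept, a cluster of expressions that co-occur in discourse, the sketch below scores word pairs by pointwise mutual information over fixed-size windows. PMI, the window size and the frequency threshold are assumptions chosen for illustration only; they are not a description of the Linguistic DNA project's actual processing of EEBO-TCP.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List


def pmi_pairs(
    docs: List[List[str]], window: int = 20, min_count: int = 2
) -> Dict[FrozenSet[str], float]:
    """Score word pairs by pointwise mutual information over fixed-size windows.

    Highly scored pairs are candidates for grouping into a cluster of
    co-occurring expressions; this is only a toy stand-in for the project's
    own discovery procedures.
    """
    word_windows = Counter()   # number of windows containing each word
    pair_windows = Counter()   # number of windows containing each word pair
    n_windows = 0
    for tokens in docs:
        for start in range(0, max(len(tokens) - window + 1, 1)):
            win = set(tokens[start:start + window])
            n_windows += 1
            word_windows.update(win)
            pair_windows.update(frozenset(p) for p in combinations(sorted(win), 2))
    scores = {}
    for pair, count in pair_windows.items():
        if count < min_count:
            continue
        w1, w2 = tuple(pair)
        p_pair = count / n_windows
        p_w1 = word_windows[w1] / n_windows
        p_w2 = word_windows[w2] / n_windows
        scores[pair] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Example: pairs that repeatedly share a window rise to the top.
docs = [
    "the commonwealth of england and the liberty of the people".split(),
    "liberty and the commonwealth were debated in parliament".split(),
]
best = sorted(pmi_pairs(docs, window=6, min_count=2).items(), key=lambda kv: -kv[1])
print(best[:3])
```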

15 citations

Proceedings Article


01 May 2014
TL;DR: This poster describes experiences processing the two-billion-word Hansard corpus using a fairly standard NLP pipeline on a high performance cluster and discusses the gains and benefits of using high-performance machinery rather than relatively cheap commodity hardware.
Abstract: This poster describes experiences processing the two-billion-word Hansard corpus using a fairly standard NLP pipeline on a high-performance cluster. Herein we report how we were able to parallelise and apply a "traditional" single-threaded batch-oriented application to a platform that differs greatly from that for which it was originally designed. We start by discussing the tagging toolchain, its specific requirements and properties, and its performance characteristics. This is contrasted with a description of the cluster on which it was to run, and specific limitations are discussed such as the overhead of using SAN-based storage. We then go on to discuss the nature of the Hansard corpus, and describe which properties of this corpus in particular prove challenging for use on the system architecture used. The solution for tagging the corpus is then described, along with performance comparisons against a naive run on commodity hardware. We discuss the gains and benefits of using high-performance machinery rather than relatively cheap commodity hardware. Our poster provides a valuable scenario for large-scale NLP pipelines and lessons learnt from the experience.
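The general pattern the poster describes, applying a single-threaded, batch-oriented tagger to a very large corpus by splitting it into independent chunks and tagging them in parallel, can be sketched as follows. The chunk layout and the run_tagger stub are hypothetical; the real toolchain, SAN storage constraints and cluster scheduler discussed in the poster are not modelled here.

```python
import multiprocessing as mp
from pathlib import Path


def run_tagger(chunk_path: Path) -> Path:
    """Stand-in for one invocation of the single-threaded tagging toolchain.
    In practice this would shell out to the real tagger; here it just copies
    the text so the example stays self-contained."""
    out_path = chunk_path.with_suffix(".tagged")
    out_path.write_text(chunk_path.read_text())
    return out_path


def tag_corpus(chunk_dir: str, workers: int = 8) -> list:
    """Tag every chunk of a pre-split corpus in parallel.

    Each chunk is independent, so an embarrassingly parallel map is the whole
    strategy; on an HPC cluster the same split would typically become a job
    array with one job per chunk instead of a local process pool."""
    chunks = sorted(Path(chunk_dir).glob("*.txt"))
    with mp.Pool(processes=workers) as pool:
        return pool.map(run_tagger, chunks)


if __name__ == "__main__":
    tag_corpus("hansard_chunks", workers=8)
```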

11 citations


Cited by

01 Jan 1964
TL;DR: This work reports a series of experiments on perceiving, imaging and remembering, develops a theory of remembering, and then treats remembering as a study in social psychology, including a discussion of the notion of a collective unconscious.
Abstract: Part I. Experimental Studies: 2. Experiment in psychology; 3. Experiments on perceiving; III. Experiments on imaging; 4-8. Experiments on remembering: (a) The method of description; (b) The method of repeated reproduction; (c) The method of picture writing; (d) The method of serial reproduction; (e) The method of serial reproduction: picture material; 9. Perceiving, recognizing, remembering; 10. A theory of remembering; 11. Images and their functions; 12. Meaning. Part II. Remembering as a Study in Social Psychology: 13. Social psychology; 14. Social psychology and the matter of recall; 15. Social psychology and the manner of recall; 16. Conventionalism; 17. The notion of a collective unconscious; 18. The basis of social recall; 19. A summary and some conclusions.

5,549 citations

Journal ArticleDOI


2,095 citations

Journal ArticleDOI


1,778 citations


01 Jan 1983
TL;DR: Style in Fiction applies the findings of modern linguistics to the systematic stylistic analysis of literary prose, building on Leech's earlier A Linguistic Guide to English Poetry (1969).
Abstract: Style in Fiction is an excellent textbook that applies the latest findings of modern linguistics to the systematic stylistic analysis of literary works. As linguistics has continued to develop, researchers have increasingly applied the results of modern linguistic research to stylistic analysis. In 1969, Geoffrey N. Leech published A Linguistic Guide to English Poetry, which used the perspective of modern linguistics to …

515 citations

Journal ArticleDOI


186 citations