UCD: Diachronic Text Classification with Character, Word, and Syntactic N-grams
Terrence Szymanski, Gerard Lynch
- pp 879-883
TL;DR
This work extracts n-gram features from the text at the letter, word, and syntactic levels, uses them to train a classifier on date-labeled training data, and incorporates date probabilities of syntactic features estimated from a very large external corpus of books.
Abstract:
We present our submission to SemEval-2015 Task 7: Diachronic Text Evaluation, in which we approach the task of assigning a date to a text as a multi-class classification problem. We extract n-gram features from the text at the letter, word, and syntactic levels, and use these to train a classifier on date-labeled training data. We also incorporate date probabilities of syntactic features as estimated from a very large external corpus of books. Our system achieved the highest performance of all systems on subtask 2: identifying texts by specific time language use.
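The multi-class framing in the abstract can be illustrated with a minimal sketch. It covers only character and word n-grams; syntactic n-grams would need a parser and the external book corpus, so they are left out. Everything here is an assumption made for illustration: the function and class names, the n-gram orders, the date bins, the toy training sentences, and the choice of a multinomial Naive Bayes classifier, which stands in for whatever classifier the authors actually used.

```python
from collections import Counter, defaultdict
import math


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count character 1- to 3-grams and word 1- to 2-grams, tagged by type."""
    feats = Counter()
    for n in (1, 2, 3):
        feats.update(f"c{n}:{g}" for g in char_ngrams(text.lower(), n))
    for n in (1, 2):
        feats.update(f"w{n}:{g}" for g in word_ngrams(text, n))
    return feats


class NaiveBayesDater:
    """Multinomial Naive Bayes over date-bin labels with add-one smoothing.

    Assumption: a stand-in classifier, not the one used in the paper.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior of the date bin
            for feat, freq in feats.items():
                if feat in self.vocab:  # ignore features never seen in training
                    score += freq * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training data: each text is labeled with a hypothetical date bin.
train_texts = [
    "thou art most kind, good sir",
    "hast thou seen the king's horse",
    "the computer crashed again today",
    "send me an email about the website",
]
train_labels = ["1600-1700", "1600-1700", "1950-2000", "1950-2000"]

model = NaiveBayesDater().fit(train_texts, train_labels)
print(model.predict("thou art most kind"))
```

Treating each date bin as an unordered class mirrors the "multi-class classification" framing, but it ignores the fact that date bins are ordered; a real system would also need far more training data than this toy set.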
Citations
Journal ArticleDOI
A Bayesian Model of Diachronic Meaning Change
Lea Frermann, Mirella Lapata
TL;DR: A dynamic Bayesian model of diachronic meaning change is presented, which infers temporal word representations as a set of senses and their prevalence, and which is shown to perform on par with highly optimized task-specific systems.
Posted Content
Survey of Computational Approaches to Lexical Semantic Change
TL;DR: This article focuses on diachronic conceptual change as an extension of semantic change and surveys recent computational techniques for tackling lexical semantic change.
Proceedings ArticleDOI
Temporal Word Analogies: Identifying Lexical Replacement with Diachronic Word Embeddings
TL;DR: It is shown that temporal word analogies can effectively be modeled with diachronic word embeddings, provided that the independent embedding spaces from each time period are appropriately transformed into a common vector space.
Proceedings Article
Modeling Language Change in Historical Corpora: The Case of Portuguese
TL;DR: This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification, reporting results of 99.8% accuracy using word unigram features with a Support Vector Machines classifier to predict the publication date of documents in time intervals of both one century and half a century.
Dissertation
Bayesian Models of Category Acquisition and Meaning Development
TL;DR: This thesis focuses on categories acquired from natural language stimuli, using nouns as a stand-in for their reference concepts, and their linguistic contexts as a representation of the concepts’ features, and presents a Bayesian model which jointly learns categories and structured featural representations.
References
Journal ArticleDOI
The WEKA data mining software: an update
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Book ChapterDOI
A Simple Approach to Ordinal Classification
Eibe Frank, Mark Hall
TL;DR: This paper presents a simple method that enables standard classification algorithms to make use of ordering information in class attributes and shows that it outperforms the naive approach, which treats the class values as an unordered set.
Proceedings Article
Syntactic Stylometry for Deception Detection
TL;DR: This paper investigates syntactic stylometry for deception detection, adding a somewhat unconventional angle to prior literature and demonstrating that features derived from Context Free Grammar (CFG) parse trees consistently improve detection performance over several baselines that are based only on shallow lexico-syntactic features.
Proceedings Article
Syntactic Annotations for the Google Books NGram Corpus
TL;DR: This paper presents a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries in eight languages. The new edition is intended to facilitate the study of linguistic trends, especially those related to the evolution of syntax.
N-gram-based author profiles for authorship attribution
TL;DR: This work presents a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, which is motivated by an almost-forgotten, pioneering method from 1976.