Open Access Proceedings Article

UCD : Diachronic Text Classification with Character, Word, and Syntactic N-grams

TLDR
This work extracts n-gram features from the text at the letter, word, and syntactic level and uses them to train a classifier on date-labeled training data. It also incorporates date probabilities of syntactic features, as estimated from a very large external corpus of books.
Abstract
We present our submission to SemEval-2015 Task 7: Diachronic Text Evaluation, in which we approach the task of assigning a date to a text as a multi-class classification problem. We extract n-gram features from the text at the letter, word, and syntactic level, and use these to train a classifier on date-labeled training data. We also incorporate date probabilities of syntactic features as estimated from a very large external corpus of books. Our system achieved the highest performance of all systems on subtask 2: identifying texts by specific time language use.
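The classification setup described in the abstract can be illustrated with a minimal sketch. It covers only character and word n-grams; syntactic n-grams require a parser and are left out. The n-gram orders, the whitespace tokenizer, the naive Bayes classifier, and the class name `NaiveBayesDater` are all assumptions made for illustration, not the authors' actual configuration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, using whitespace tokenization."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(text):
    """Count character 1-3-grams and word 1-2-grams (orders are an assumption)."""
    feats = Counter()
    for n in (1, 2, 3):
        feats.update(f"c{n}:{g}" for g in char_ngrams(text.lower(), n))
    for n in (1, 2):
        feats.update(f"w{n}:{g}" for g in word_ngrams(text, n))
    return feats


class NaiveBayesDater:
    """Hypothetical multinomial naive Bayes over date-interval labels.

    Each date interval is treated as one class, so dating a text becomes
    multi-class classification. Add-one smoothing handles unseen features.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = featurize(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = featurize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each observed feature.
            score = math.log(doc_count / total_docs)
            for feat, freq in feats.items():
                score += freq * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data; the interval labels are illustrative only.
train_texts = [
    "thou hast thy sword",
    "thou art thy lord",
    "you have your phone",
    "you are online",
]
train_labels = ["1600-1700", "1600-1700", "1900-2000", "1900-2000"]
model = NaiveBayesDater().fit(train_texts, train_labels)
```

Calling `model.predict(text)` returns the date interval whose training texts best match the n-gram profile of `text`. Prefixing each feature with its type (`c2:`, `w1:`, and so on) keeps character and word n-grams from colliding in the shared feature space.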


Citations
Journal Article

A Bayesian Model of Diachronic Meaning Change

TL;DR: A dynamic Bayesian model of diachronic meaning change is presented that infers temporal word representations as a set of senses and their prevalence; it performs on par with highly optimized task-specific systems.
Posted Content

Survey of Computational Approaches to Lexical Semantic Change

TL;DR: This article, an extract of a survey of recent computational techniques for tackling lexical semantic change, focuses on diachronic conceptual change as an extension of semantic change.
Proceedings Article

Temporal Word Analogies: Identifying Lexical Replacement with Diachronic Word Embeddings

TL;DR: It is shown that temporal word analogies can effectively be modeled with diachronic word embeddings, provided that the independent embedding spaces from each time period are appropriately transformed into a common vector space.
Proceedings Article

Modeling Language Change in Historical Corpora: The Case of Portuguese

TL;DR: This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification, reporting results of 99.8% accuracy using word unigram features with a Support Vector Machines classifier to predict the publication date of documents in time intervals of both one century and half a century.
Dissertation

Bayesian Models of Category Acquisition and Meaning Development

Lea Frermann
TL;DR: This thesis focuses on categories acquired from natural language stimuli, using nouns as a stand-in for their reference concepts, and their linguistic contexts as a representation of the concepts’ features, and presents a Bayesian model which jointly learns categories and structured featural representations.
References
Journal Article

The WEKA data mining software: an update

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Book Chapter

A Simple Approach to Ordinal Classification

TL;DR: This paper presents a simple method that enables standard classification algorithms to make use of ordering information in class attributes and shows that it outperforms the naive approach, which treats the class values as an unordered set.
Proceedings Article

Syntactic Stylometry for Deception Detection

TL;DR: This paper investigates syntactic stylometry for deception detection, adding a somewhat unconventional angle to prior literature and demonstrating that features derived from Context Free Grammar (CFG) parse trees consistently improve detection performance over several baselines that are based only on shallow lexico-syntactic features.
Proceedings Article

Syntactic Annotations for the Google Books NGram Corpus

TL;DR: A new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages, is presented, which will facilitate the study of linguistic trends, especially those related to the evolution of syntax.

N-gram-based author profiles for authorship attribution

TL;DR: This work presents a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, which is motivated by an almost-forgotten, pioneering method from 1976.