UCD : Diachronic Text Classification with Character, Word, and Syntactic N-grams

doi:10.18653/V1/S15-2148

Open Access Proceedings Article

Terrence Szymanski, +1 more

- pp 879-883

TLDR

This work extracts n-gram features from the text at the letter, word, and syntactic level, uses them to train a classifier on date-labeled training data, and incorporates date probabilities of syntactic features estimated from a very large external corpus of books.

Abstract:

We present our submission to SemEval-2015 Task 7: Diachronic Text Evaluation, in which we approach the task of assigning a date to a text as a multi-class classification problem. We extract n-gram features from the text at the letter, word, and syntactic level, and use these to train a classifier on date-labeled training data. We also incorporate date probabilities of syntactic features as estimated from a very large external corpus of books. Our system achieved the highest performance of all systems on subtask 2: identifying texts by specific time language use.
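
The feature-extraction and classification steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the classifier here is a small multinomial naive Bayes model chosen only because it fits in a few lines, the syntactic n-grams and the external book-corpus probabilities are left out because they need a parser and a large dataset, and all names (`char_ngrams`, `word_ngrams`, `extract_features`, `NaiveBayesDater`) and the toy training texts and date ranges are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Overlapping n-grams over lowercased, whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count character 2- and 3-grams and word 1- and 2-grams, prefixed by level."""
    feats = Counter()
    for n in (2, 3):
        feats.update("c:" + g for g in char_ngrams(text.lower(), n))
    for n in (1, 2):
        feats.update("w:" + g for g in word_ngrams(text, n))
    return feats


class NaiveBayesDater:
    """Multinomial naive Bayes over n-gram counts, one class per date range.

    Assumption: a stand-in classifier for illustration, not the paper's model.
    """

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # documents per date range
        self.feat_counts = {}               # date range -> n-gram counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feat_counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.feat_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / n_docs)
            for feat, k in feats.items():
                # Add-one smoothing keeps unseen n-grams from zeroing a class.
                score += k * math.log((counts[feat] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, invented training data: two date ranges treated as unordered classes.
train_texts = [
    "thou art a knave and thy words are vile",
    "hath he not spoken unto thee",
    "the computer network is offline today",
    "she sent an email about the internet",
]
train_labels = ["1700-1750", "1700-1750", "1950-2000", "1950-2000"]

model = NaiveBayesDater().fit(train_texts, train_labels)
print(model.predict("thou hath spoken"))
print(model.predict("the internet is offline"))
```

Prefixing each feature with its level (`c:` or `w:`) keeps character and word n-grams from colliding in one shared count table. Syntactic n-grams could be added the same way under a third prefix once parse output is available.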

Citations

Journal Article

A Bayesian Model of Diachronic Meaning Change

Lea Frermann, +1 more

- 19 Feb 2016 -

Transactions of the Association for Comp...

TL;DR: A dynamic Bayesian model of diachronic meaning change is presented, which infers temporal word representations as a set of senses and their prevalence; the model performs on par with highly optimized task-specific systems.

Posted Content

Survey of Computational Approaches to Lexical Semantic Change

Nina Tahmasebi, +2 more

- 15 Nov 2018 -

arXiv: Computation and Language

TL;DR: This article surveys recent computational techniques for tackling lexical semantic change and focuses on diachronic conceptual change as an extension of semantic change.

Proceedings Article

Temporal Word Analogies: Identifying Lexical Replacement with Diachronic Word Embeddings

Terrence Szymanski

TL;DR: It is shown that temporal word analogies can effectively be modeled with diachronic word embeddings, provided that the independent embedding spaces from each time period are appropriately transformed into a common vector space.

Proceedings Article

Modeling Language Change in Historical Corpora: The Case of Portuguese

Marcos Zampieri, +2 more

TL;DR: This paper presents a number of experiments to model changes in a historical Portuguese corpus composed of literary texts for the purpose of temporal text classification, reporting results of 99.8% accuracy using word unigram features with a Support Vector Machines classifier to predict the publication date of documents in time intervals of both one century and half a century.

Dissertation

Bayesian Models of Category Acquisition and Meaning Development

Lea Frermann

TL;DR: This thesis focuses on categories acquired from natural language stimuli, using nouns as a stand-in for their reference concepts, and their linguistic contexts as a representation of the concepts’ features, and presents a Bayesian model which jointly learns categories and structured featural representations.

References

Journal Article

The WEKA data mining software: an update

Mark Hall, +5 more

- 16 Nov 2009 -

Sigkdd Explorations

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

Book Chapter

A Simple Approach to Ordinal Classification

Eibe Frank, +1 more

TL;DR: This paper presents a simple method that enables standard classification algorithms to make use of ordering information in class attributes and shows that it outperforms the naive approach, which treats the class values as an unordered set.

Proceedings Article

Syntactic Stylometry for Deception Detection

Song Feng, +2 more

TL;DR: This paper investigates syntactic stylometry for deception detection, adding a somewhat unconventional angle to prior literature and demonstrating that features derived from Context Free Grammar (CFG) parse trees consistently improve detection performance over several baselines based only on shallow lexico-syntactic features.

Proceedings Article

Syntactic Annotations for the Google Books NGram Corpus

Yuri Lin, +5 more

TL;DR: This paper presents a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries in eight languages. The new edition will facilitate the study of linguistic trends, especially those related to the evolution of syntax.

N-gram-based author profiles for authorship attribution

Vlado Kešelj, +3 more

TL;DR: This work presents a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, motivated by an almost-forgotten, pioneering method from 1976.

Related Papers (5)

Modeling Language Change in Historical Corpora: The Case of Portuguese

SemEval 2015, Task 7: Diachronic Text Evaluation

Exploring Lexical and Syntactic Features for Language Variety Identification

Native Language Identification: a Simple n-gram Based Approach

A POS-Based Word Prediction System for the Persian Language