scispace - formally typeset
Search or ask a question
Author

Terrence Szymanski

Other affiliations: University of Michigan
Bio: Terrence Szymanski is an academic researcher from University College Dublin. The author has contributed to research in topics: Newspaper & Word embedding. The author has an hindex of 5, co-authored 10 publications receiving 354 citations. Previous affiliations of Terrence Szymanski include University of Michigan.

Papers
More filters
Proceedings Article
09 Jun 2018
TL;DR: This paper surveys the current state of academic research related to diachronic word embeddings and semantic shifts detection, and proposes several axes along which these methods can be compared, and outlines the main challenges before this emerging subfield of NLP.
Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

191 citations

Posted Content
TL;DR: A survey of the current state of academic research related to diachronic word embeddings and semantic shifts detection can be found in this article, where the authors discuss the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models.
Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

124 citations

Proceedings ArticleDOI
04 Aug 2017
TL;DR: It is shown that temporal word analogies can effectively be modeled with diachronic word embeddings, provided that the independent embedding spaces from each time period are appropriately transformed into a common vector space.
Abstract: This paper introduces the concept of temporal word analogies: pairs of words which occupy the same semantic space at different points in time. One well-known property of word embeddings is that they are able to effectively model traditional word analogies (“word w1 is to word w2 as word w3 is to word w4”) through vector addition. Here, I show that temporal word analogies (“word w1 at time t𝛼 is like word w2 at time t𝛽”) can effectively be modeled with diachronic word embeddings, provided that the independent embedding spaces from each time period are appropriately transformed into a common vector space. When applied to a diachronic corpus of news articles, this method is able to identify temporal word analogies such as “Ronald Reagan in 1987 is like Bill Clinton in 1997”, or “Walkman in 1987 is like iPod in 2007”.

52 citations

Proceedings ArticleDOI
05 Jun 2015
TL;DR: This work extracts n-gram features from the text at the letter, word, and syntactic level, and uses these to train a classifier on date-labeled training data, and incorporates date probabilities of syntactic features as estimated from a very large external corpus of books.
Abstract: We present our submission to SemEval-2015 Task 7: Diachronic Text Evaluation, in which we approach the task of assigning a date to a text as a multi-class classification problem. We extract n-gram features from the text at the letter, word, and syntactic level, and use these to train a classifier on date-labeled training data. We also incorporate date probabilities of syntactic features as estimated from a very large external corpus of books. Our system achieved the highest performance of all systems on subtask 2: identifying texts by specific time language use.

16 citations

Posted Content
TL;DR: In this article, the authors present a software tool that employs state-of-the-art NLP and machine learning techniques to help newspaper editors compose effective headlines for online publication.
Abstract: We present a software tool that employs state-of-the-art natural language processing (NLP) and machine learning techniques to help newspaper editors compose effective headlines for online publication. The system identifies the most salient keywords in a news article and ranks them based on both their overall popularity and their direct relevance to the article. The system also uses a supervised regression model to identify headlines that are likely to be widely shared on social media. The user interface is designed to simplify and speed the editor's decision process on the composition of the headline. As such, the tool provides an efficient way to combine the benefits of automated predictors of engagement and search-engine optimization (SEO) with human judgments of overall headline quality.

13 citations


Cited by
More filters
17 Dec 2010
TL;DR: The authors survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000, using a corpus of digitized texts containing about 4% of all books ever printed.
Abstract: L'article, publie dans Science, sur une des premieres utilisations analytiques de Google Books, fondee sur les n-grammes (Google Ngrams) We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can ...

735 citations

Journal ArticleDOI
TL;DR: A broad multidisciplinary overview of the area of bias in AI systems is provided, focusing on technical challenges and solutions as well as to suggest new research directions towards approaches well‐grounded in a legal frame.
Abstract: Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training, and deployment to ensure social good while still benefiting from the huge potential of the AI technology. The goal of this survey is to provide a broad multidisciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions as well as to suggest new research directions towards approaches well‐grounded in a legal frame. In this survey, we focus on data‐driven AI, as a large part of AI is powered nowadays by (big) data and powerful machine learning algorithms. If otherwise not specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the bases of demographic features such as race, sex, and so forth.

271 citations

Proceedings Article
09 Jun 2018
TL;DR: This paper surveys the current state of academic research related to diachronic word embeddings and semantic shifts detection, and proposes several axes along which these methods can be compared, and outlines the main challenges before this emerging subfield of NLP.
Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

191 citations

Journal ArticleDOI
TL;DR: A dynamic Bayesian model of diachronic meaning change is presented, which infers temporal word representations as a set of senses and their prevalence and reveals that it performs on par with highly optimized task-specific systems.
Abstract: Word meanings change over time and an automated procedure for extracting this information from text would be useful for historical exploratory studies, information retrieval or question answering. We present a dynamic Bayesian model of diachronic meaning change, which infers temporal word representations as a set of senses and their prevalence. Unlike previous work, we explicitly model language change as a smooth, gradual process. We experimentally show that this modeling decision is beneficial: our model performs competitively on meaning change detection tasks whilst inducing discernible word senses and their development over time. Application of our model to the SemEval-2015 temporal classification benchmark datasets further reveals that it performs on par with highly optimized task-specific systems.

152 citations

Posted Content
TL;DR: A survey of the current state of academic research related to diachronic word embeddings and semantic shifts detection can be found in this article, where the authors discuss the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models.
Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

124 citations