
Showing papers by "Nello Cristianini published in 2020"


Journal ArticleDOI
TL;DR: The main contribution of this paper is to identify convergent social and technical trends that are leading towards social regulation by algorithms, and to discuss the possible social, political, and ethical consequences of taking this path.
Abstract: Autonomous mechanisms have been proposed to regulate certain aspects of society and are already being used to regulate business organisations. We take seriously recent proposals for algorithmic regulation of society, and we identify the existing technologies that can be used to implement them, most of them originally introduced in business contexts. We build on the notion of the ‘social machine’ and connect it to various ongoing trends and ideas, including crowdsourced task-work, social compilers, mechanism design, reputation management systems, and social scoring. After showing how all the building blocks of algorithmic regulation are already in place, we discuss the possible implications for human autonomy and social order. The main contribution of this paper is to identify convergent social and technical trends that are leading towards social regulation by algorithms, and to discuss the possible social, political, and ethical consequences of taking this path.

13 citations


Journal ArticleDOI
TL;DR: The tool uses scalable algorithms to extract trends from textual corpora and make them available for real-time search and discovery, presenting users with an interface to explore the data.
Abstract: Recent studies have shown that macroscopic patterns of continuity and change over the course of centuries can be detected through the analysis of time series extracted from massive textual corpora. Similar data-driven approaches have already revolutionised the natural sciences and are widely believed to hold similar potential for the humanities and social sciences, driven by the mass-digitisation projects currently under way and by the ever-increasing number of documents that are "born digital". As such, new interactive tools are required to discover and extract macroscopic patterns from these vast quantities of textual data. Here we present History Playground, an interactive web-based tool for discovering trends in massive textual corpora. The tool uses scalable algorithms to extract trends from textual corpora and make them available for real-time search and discovery, presenting users with an interface to explore the data. The tool includes algorithms for standardisation, regression, change-point detection in the relative frequencies of n-grams, multi-term indices, and comparison of trends across different corpora.
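
The abstract names the tool's core trend-extraction steps (relative frequencies, standardisation, change-point detection) without giving code. The following is a minimal Python sketch of those steps, assuming yearly n-gram counts as input; the function names and the naive least-squares split-point detector are illustrative assumptions, not History Playground's published implementation.

import numpy as np

def relative_frequency(ngram_counts, total_counts):
    """Yearly counts of one n-gram divided by total tokens per year."""
    return np.asarray(ngram_counts, float) / np.asarray(total_counts, float)

def standardise(series):
    """Z-score a series so trends from different corpora are comparable."""
    return (series - series.mean()) / series.std()

def change_point(series):
    """Index that best splits the series into two constant segments
    (minimum total squared error): a naive change-point detector."""
    best_idx, best_err = 1, np.inf
    for i in range(1, len(series)):
        left, right = series[:i], series[i:]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_idx, best_err = i, err
    return best_idx

# Example: an n-gram whose usage jumps at year index 10.
counts = [5] * 10 + [50] * 10
totals = [10_000] * 20
print(change_point(standardise(relative_frequency(counts, totals))))  # -> 10

Running the same detector on standardised series drawn from two different corpora would extend this toy version toward the kind of cross-corpus comparison the abstract describes.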

7 citations


Book ChapterDOI
05 Jun 2020
TL;DR: The authors introduce the notion of a "concept" as a list of words with shared semantic content, and analyse the learnability of concepts, defined as the capability of a classifier to recognize unseen members of a concept after training on a random subset of it.
Abstract: Word embeddings are widely used in many Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper we introduce the notion of a "concept" as a list of words that have shared semantic content. We use this notion to analyse the learnability of certain concepts, defined as the capability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpus and fixed hyperparameters. We find that all embedding methods capture the semantic content of those word lists, but fastText performs better than the others.
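
To make the learnability measure concrete, the following Python sketch trains a classifier on a random half of a concept's words (against randomly drawn out-of-concept words) and scores the held-out members by ROC AUC. The word-to-vector dictionary, the negative-sampling scheme, and the split are illustrative assumptions, not the paper's exact protocol.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def concept_learnability(embeddings, concept_words, vocab, train_frac=0.5, seed=0):
    """embeddings: dict mapping word -> vector; concept_words: the word list
    defining the concept; vocab: words to draw negatives from (assumed large
    enough). Returns the ROC AUC on held-out concept members."""
    rng = np.random.default_rng(seed)
    concept = [w for w in concept_words if w in embeddings]
    rng.shuffle(concept)
    k = int(len(concept) * train_frac)
    train_pos, test_pos = concept[:k], concept[k:]

    # Negatives: random in-vocabulary words outside the concept list.
    negatives = [w for w in vocab if w in embeddings and w not in concept_words]
    neg = list(rng.choice(negatives, size=2 * len(concept), replace=False))
    train_neg, test_neg = neg[: len(train_pos)], neg[len(train_pos):]

    X_train = np.array([embeddings[w] for w in train_pos + train_neg])
    y_train = np.array([1] * len(train_pos) + [0] * len(train_neg))
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    X_test = np.array([embeddings[w] for w in test_pos + test_neg])
    y_test = np.array([1] * len(test_pos) + [0] * len(test_neg))
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

Repeating this over many random splits would give a distribution of AUC scores suitable for the kind of hypothesis testing the abstract describes.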

2 citations


Journal ArticleDOI
06 Jul 2020
TL;DR: A novel approach to fact-checking natural language text that combines knowledge extraction to establish a knowledge base, logical inference to check claims not explicitly mentioned in the text, and a re-querying approach that enables continuous learning.
Abstract: Increasing concern about the prevalence of false information and fake news has led to calls for automated fact-checking systems capable of verifying the truthfulness of statements, especially on the internet. Most previous automated fact-checking systems have relied only on grammar rules to determine properties of the language used in statements. Here, we demonstrate a novel approach to the fact-checking of natural language text that combines knowledge extraction to establish a knowledge base, logical inference to check claims not explicitly mentioned in the text by verifying the consistency of a set of beliefs with established trusted knowledge, and a re-querying approach that enables continuous learning. This addresses the limitations of existing systems: the approach tests the consistency of presented facts or claims using probabilistic soft logic and a knowledge base that is continuously updated through continuous-learning strategies. We demonstrate the approach on the task of checking facts about family-tree relationships against a corpus of web resources concerned with the UK Royal Family.
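
As an illustration of the consistency-checking idea, the following hard-logic Python sketch closes a tiny knowledge base of family-tree triples under hand-written inference rules and labels claims as supported or unknown. The paper itself uses probabilistic soft logic over a continuously updated knowledge base, so the crisp rules and royal-family triples below are simplified assumptions.

# Tiny illustrative knowledge base of (subject, relation, object) triples.
KB = {
    ("Elizabeth II", "parent_of", "Charles III"),
    ("Charles III", "parent_of", "William"),
}

def derive(kb):
    """Close the KB under two family-tree rules:
    parent(A,B) & parent(B,C) -> grandparent(A,C); parent(A,B) -> child(B,A)."""
    facts = set(kb)
    changed = True
    while changed:
        changed = False
        parents = [(a, b) for (a, r, b) in facts if r == "parent_of"]
        new = {(a, "grandparent_of", c) for (a, b) in parents
               for (b2, c) in parents if b == b2}
        new |= {(b, "child_of", a) for (a, b) in parents}
        if not new <= facts:
            facts |= new
            changed = True
    return facts

def check(claim, kb):
    """Label a claim 'supported' if derivable, else 'unknown' (a real system
    would also test for contradictions and re-query sources to learn more)."""
    return "supported" if claim in derive(kb) else "unknown"

print(check(("Elizabeth II", "grandparent_of", "William"), KB))  # supported
print(check(("William", "parent_of", "Elizabeth II"), KB))       # unknown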