Author

Nello Cristianini

Bio: Nello Cristianini is an academic researcher from University of Bristol. The author has contributed to research in topics: Kernel method & Support vector machine. The author has an h-index of 51 and has co-authored 183 publications receiving 46640 citations. Previous affiliations of Nello Cristianini include Royal Holloway, University of London & University of California, Davis.


Papers
Book ChapterDOI
24 Oct 2018
TL;DR: A rigorous way to measure some of these biases is presented, based on the use of word lists created for social psychology applications, and a simple projection can significantly reduce the effects of embedding bias.
Abstract: Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered “from the wild” and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing this problem. We present a rigorous way to measure some of these biases, based on the use of word lists created for social psychology applications; we observe how gender bias in occupations reflects actual gender bias in the same occupations in the real world; and finally we demonstrate how a simple projection can significantly reduce the effects of embedding bias. All this is part of an ongoing effort to understand how trust can be built into AI systems.

21 citations
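
The projection step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the bias direction is estimated or which projection is used, so everything here is an assumption: the direction is taken as the difference between the mean vectors of two word lists, the projection is a plain orthogonal projection, and the 3-dimensional vectors are made-up toy values rather than real embeddings.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [a / norm for a in v]


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def bias_direction(group_a, group_b):
    """Assumed estimate: unit vector between the two group means."""
    mean_a = mean_vector(group_a)
    mean_b = mean_vector(group_b)
    return normalize([a - b for a, b in zip(mean_a, mean_b)])


def remove_component(v, direction):
    """Orthogonal projection: subtract the part of v along a unit direction."""
    scale = dot(v, direction)
    return [a - scale * d for a, d in zip(v, direction)]


# Hypothetical toy embeddings for two word lists and one occupation word.
group_a = [[1.0, 0.2, 0.0], [0.8, 0.1, 0.1]]
group_b = [[-1.0, 0.2, 0.0], [-0.8, 0.1, 0.1]]
occupation = [0.5, 0.7, 0.3]

direction = bias_direction(group_a, group_b)
debiased = remove_component(occupation, direction)

print("component along bias direction before:", dot(occupation, direction))
print("component along bias direction after: ", dot(debiased, direction))
```

After the projection the vector has no component along the estimated direction, while its other coordinates are unchanged. Whether this matches the paper's procedure in detail cannot be determined from the abstract alone.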

Book ChapterDOI
18 Aug 2004
TL;DR: This paper demonstrates a case study in which many algorithms and kernels are mixed and matched, for a cross-language text analysis task.
Abstract: Kernel Methods are a class of algorithms for pattern analysis with a number of convenient features. They can deal in a uniform way with a multitude of data types and can be used to detect many types of relations in data. Importantly for applications, they have a modular structure, in that any kernel function can be used with any kernel-based algorithm. This means that customized solutions can be easily developed from a standard library of kernels and algorithms. This paper demonstrates a case study in which many algorithms and kernels are mixed and matched, for a cross-language text analysis task. All the software is available online.

19 citations
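
The modular structure described in the abstract above, where any kernel function can be combined with any kernel-based algorithm, can be sketched as follows. This is not the software the paper refers to: the kernel perceptron, the two kernels, and the toy data sets are assumptions chosen only to show one algorithm accepting interchangeable kernels.

```python
def linear_kernel(x, z):
    """Linear kernel with a constant term, so a bias is learned implicitly."""
    return 1.0 + sum(a * b for a, b in zip(x, z))


def quadratic_kernel(x, z):
    """Inhomogeneous polynomial kernel of degree 2."""
    return linear_kernel(x, z) ** 2


def decision_value(kernel, points, labels, alphas, x):
    """Kernel expansion: sum_i alpha_i * y_i * k(x_i, x)."""
    return sum(a * y * kernel(p, x) for a, y, p in zip(alphas, labels, points))


def train_kernel_perceptron(kernel, points, labels, max_epochs=1000):
    """Kernel perceptron; the kernel is the only data-dependent plug-in."""
    alphas = [0.0] * len(points)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision_value(kernel, points, labels, alphas, x) <= 0:
                alphas[i] += 1.0  # mistake-driven update
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas


def predict(kernel, points, labels, alphas, x):
    """Sign of the kernel expansion, returned as +1 or -1."""
    return 1 if decision_value(kernel, points, labels, alphas, x) > 0 else -1


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
or_labels = [-1, 1, 1, 1]    # linearly separable
xor_labels = [-1, 1, 1, -1]  # needs a nonlinear kernel

# Same algorithm, two different kernels.
or_alphas = train_kernel_perceptron(linear_kernel, points, or_labels)
xor_alphas = train_kernel_perceptron(quadratic_kernel, points, xor_labels)

or_predictions = [
    predict(linear_kernel, points, or_labels, or_alphas, x) for x in points
]
xor_predictions = [
    predict(quadratic_kernel, points, xor_labels, xor_alphas, x) for x in points
]
print("OR  with linear kernel:   ", or_predictions)
print("XOR with quadratic kernel:", xor_predictions)
```

The training routine never inspects the data directly; it only calls `kernel(x, z)`. Swapping in a kernel for strings or other data types would leave the algorithm untouched, which is the design property the abstract highlights.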

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This study investigates seasonal fluctuations in mood and mental health by analyzing the access logs of Wikipedia pages and the content of Twitter in the UK over a period of four years, finding that both negative affect on Twitter and access to mental health pages on Wikipedia follow an annual cycle, both peaking during the winter months.
Abstract: Understanding changes in the mood and mental health of large populations is a challenge, with the need for large numbers of samples to uncover any regular patterns within the data. The use of data generated by online activities of healthy individuals offers the opportunity to perform such observations on the large scales and for the long periods that are required. Various studies have previously examined circadian fluctuations of mood in this way. In this study, we investigate seasonal fluctuations in mood and mental health by analyzing the access logs of Wikipedia pages and the content of Twitter in the UK over a period of four years. By using standard methods of Natural Language Processing, we extract daily indicators of negative affect, anxiety, anger and sadness from Twitter and compare this with the overall daily traffic to Wikipedia pages about mental health disorders. We show that both negative affect on Twitter and access to mental health pages on Wikipedia follow an annual cycle, both peaking during the winter months. Breaking this down into specific moods and pages, we find that peak access to the Wikipedia page for Seasonal Affective Disorder coincides with the peak period for the sadness indicator in Twitter content, with both most over-expressed in November and December. A period of heightened anger and anxiety on Twitter partly overlaps with increased information seeking about stress, panic and eating disorders on Wikipedia in the late winter and early spring. Finally, we compare Twitter mood indicators with various weather time series, finding that negative affect and anger can be partially explained in terms of the climatic temperature and photoperiod, sadness can be partially explained by the photoperiod and the perceived change in the photoperiod, while anxiety is partially explained by the level of precipitation. Using these multiple sources of data allows us to have access to inexpensive, although indirect, information about collective variations in mood over long periods of time, in turn helping us to begin to separate out the various possible causes of these fluctuations.

19 citations

Proceedings Article
01 Dec 1997
TL;DR: Perceptron Decision Trees (also known as Linear Machine DTs, etc.) are analysed in order that data-dependent Structural Risk Minimisation can be applied and it is indicated that choosing the maximal margin hyperplanes at the decision nodes will improve the generalization.
Abstract: Perceptron Decision Trees (also known as Linear Machine DTs, etc.) are analysed in order that data-dependent Structural Risk Minimisation can be applied. Data-dependent analysis is performed which indicates that choosing the maximal margin hyperplanes at the decision nodes will improve the generalization. The analysis uses a novel technique to bound the generalization error in terms of the margins at individual nodes. Experiments performed on real data sets confirm the validity of the approach.

19 citations


Cited by
Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

40,826 citations

Journal ArticleDOI

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research. *Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects *Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods *Includes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks-in an updated, interactive interface. Algorithms in toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

20,196 citations

Journal ArticleDOI
TL;DR: There are several arguments which support the observed high accuracy of SVMs, which are reviewed and numerous examples and proofs of most of the key theorems are given.
Abstract: The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.

15,696 citations