
Showing papers by "Nello Cristianini published in 2012"


Journal ArticleDOI
TL;DR: A general methodology for inferring the occurrence and magnitude of an event or phenomenon by exploring the rich amount of unstructured textual information on the social part of the Web by investigating two case studies of geo-tagged user posts on the microblogging service of Twitter.
Abstract: We present a general methodology for inferring the occurrence and magnitude of an event or phenomenon by exploring the rich amount of unstructured textual information on the social part of the Web. Having geo-tagged user posts on the microblogging service of Twitter as our input data, we investigate two case studies. The first consists of a benchmark problem, where actual levels of rainfall in a given location and time are inferred from the content of tweets. The second one is a real-life task, where we infer regional Influenza-like Illness rates in an effort to detect an emerging epidemic disease in a timely manner. Our analysis builds on a statistical learning framework, which performs sparse learning via the bootstrapped version of LASSO to select a consistent subset of textual features from a large number of candidates. In both case studies, the selected features indicate close semantic correlation with the target topics, and inference, conducted by regression, achieves significant performance, especially given the short length (approximately one year) of Twitter's data time series.
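
A minimal sketch (not the authors' implementation) of the bootstrapped-LASSO idea described above: LASSO is fitted on many bootstrap resamples of a tweet-term frequency matrix, only the features selected in nearly every resample are kept, and an ordinary regression on that stable subset produces the final estimate. All data, names and thresholds below are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def bolasso_select(X, y, n_bootstraps=100, alpha=0.01, keep_fraction=0.9, seed=0):
    """Return indices of features selected by LASSO in at least
    `keep_fraction` of the bootstrap resamples (bootstrapped-LASSO style)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n_samples, size=n_samples)       # resample with replacement
        model = Lasso(alpha=alpha, max_iter=10_000).fit(X[idx], y[idx])
        counts += (model.coef_ != 0)
    return np.flatnonzero(counts >= keep_fraction * n_bootstraps)

# X: rows = (location, day) observations, columns = candidate n-gram frequencies.
# y: ground-truth signal for the same observations (e.g. rainfall level or ILI rate).
X = np.random.rand(200, 1000)                                   # placeholder data
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * np.random.randn(200)

stable = bolasso_select(X, y)
nowcaster = LinearRegression().fit(X[:, stable], y)             # regression on the stable subset
print("selected features:", stable)
```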

209 citations


Proceedings ArticleDOI
16 Apr 2012
TL;DR: A collection of 484 million tweets generated by more than 9.8 million users from the United Kingdom over the past 31 months, a period marked by economic downturn and some social tensions, shows that periodic events such as Christmas and Halloween evoke similar mood patterns every year.
Abstract: Large scale analysis of social media content allows for real time discovery of macro-scale patterns in public opinion and sentiment. In this paper we analyse a collection of 484 million tweets generated by more than 9.8 million users from the United Kingdom over the past 31 months, a period marked by economic downturn and some social tensions. Our findings, besides corroborating our choice of method for the detection of public mood, also present intriguing patterns that can be explained in terms of events and social changes. On the one hand, the time series we obtain show that periodic events such as Christmas and Halloween evoke similar mood patterns every year. On the other hand, we see that a significant increase in negative mood indicators coincides with the announcement of the cuts to public spending by the government, and that this effect is still ongoing. We also detect events such as the riots of summer 2011, as well as a possible calming effect coinciding with the run-up to the royal wedding.
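
A toy illustration of the kind of mood time series described above: daily counts of words from mood lexicons, normalised by tweet volume. The word lists here are placeholders; the study relies on established affect word lists rather than these invented examples.

```python
from collections import Counter, defaultdict
from datetime import date

# Placeholder lexicons; the study uses established emotion word lists (anger, fear, joy, sadness).
MOOD_LEXICONS = {
    "negative": {"angry", "fear", "worried", "cuts"},
    "positive": {"happy", "wedding", "christmas", "joy"},
}

def daily_mood_scores(tweets):
    """tweets: iterable of (date, text). Returns {mood: {date: normalised score}}."""
    volume = Counter()
    hits = defaultdict(Counter)
    for day, text in tweets:
        tokens = text.lower().split()
        volume[day] += len(tokens)
        for mood, lexicon in MOOD_LEXICONS.items():
            hits[mood][day] += sum(1 for t in tokens if t in lexicon)
    return {mood: {day: hits[mood][day] / volume[day] for day in volume}
            for mood in MOOD_LEXICONS}

sample = [(date(2010, 12, 25), "happy christmas everyone"),
          (date(2010, 10, 20), "worried about the spending cuts")]
print(daily_mood_scores(sample))
```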

77 citations


Journal ArticleDOI
TL;DR: Vast data‐streams from social networks like Twitter and Facebook contain a people's opinions, fears and dreams and Thomas Lansdall‐Welfare, Vasileios Lampos and Nello Cristianini exploit a whole new tool for social scientists.
Abstract: Vast data-streams from social networks like Twitter and Facebook contain a people's opinions, fears and dreams. Thomas Lansdall-Welfare, Vasileios Lampos and Nello Cristianini exploit a whole new tool for social scientists.

33 citations


Journal ArticleDOI
TL;DR: An extensive experimental study of Phrase-based Statistical Machine Translation, from the point of view of its learning capabilities, which confirms existing and mostly unpublished beliefs about the learning capabilities and provides insight into the way statistical machine translation learns from data.
Abstract: We present an extensive experimental study of Phrase-based Statistical Machine Translation, from the point of view of its learning capabilities. Very accurate learning curves are obtained, using high-performance computing, and extrapolations of the projected performance of the system under different conditions are provided. Our experiments confirm existing and mostly unpublished beliefs about the learning capabilities of statistical machine translation systems. We also provide insight into the way statistical machine translation learns from data, including the respective influence of translation and language models, the impact of phrase length on performance, and various unlearning and perturbation analyses. Our results support and illustrate the fact that performance improves by a constant amount for each doubling of the data, across different language pairs and different systems. This fundamental limitation seems to be a direct consequence of Zipf's law governing textual data. Although the rate of improvement may depend on both the data and the estimation method, it is unlikely that the general shape of the learning curve will change without major changes in the modeling and inference phases. Possible research directions that address this issue include the integration of linguistic rules or the development of active learning procedures.
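
The "constant improvement per doubling of the data" observation corresponds to a learning curve that is linear in the logarithm of the corpus size. A hedged sketch of fitting such a curve to (corpus size, BLEU) measurements and extrapolating; the numbers below are invented for illustration only.

```python
import numpy as np

# Invented (corpus size in sentence pairs, BLEU) measurements, for illustration only.
sizes = np.array([25_000, 50_000, 100_000, 200_000, 400_000, 800_000])
bleu  = np.array([18.1,   19.9,   21.8,    23.7,    25.5,    27.4])

# BLEU ~ a + b * log2(size): b is the gain per doubling of the training data.
b, a = np.polyfit(np.log2(sizes), bleu, deg=1)
print(f"gain per doubling of data: {b:.2f} BLEU")
print(f"extrapolated BLEU at 3.2M pairs: {a + b * np.log2(3_200_000):.1f}")
```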

17 citations




Journal ArticleDOI
TL;DR: The authors' results correctly reproduce all the established major language groups and subgroups present in the dataset, are compatible with the Indo-European benchmark tree and also include some of the supported higher-level structures.
Abstract: We apply to the task of linguistic phylogenetic inference a successful cognate identification learning model based on point accepted mutation (PAM)-like matrices. We train our system and employ the learned parameters for measuring the lexical distance between languages. We estimate phylogenetic trees using distance-based methods on an Indo-European database. Our results correctly reproduce all the established major language groups and subgroups present in the dataset, are compatible with the Indo-European benchmark tree and also include some of the supported higher-level structures. We review and compare other studies reported in the literature with regard to recognized aspects of the Indo-European language family.
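
A hedged sketch of the final, tree-building step only: given a matrix of pairwise lexical distances between languages, a standard distance-based clustering (average linkage here, as a stand-in for the distance-based methods the paper uses) yields a tree. The distances below are invented; in the paper they come from PAM-like cognate-identification scores over word lists.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram

languages = ["English", "German", "Dutch", "French", "Spanish", "Italian"]

# Invented pairwise lexical distances, for illustration only.
D = np.array([
    [0.00, 0.45, 0.48, 0.75, 0.78, 0.77],
    [0.45, 0.00, 0.30, 0.74, 0.77, 0.76],
    [0.48, 0.30, 0.00, 0.73, 0.76, 0.75],
    [0.75, 0.74, 0.73, 0.00, 0.40, 0.38],
    [0.78, 0.77, 0.76, 0.40, 0.00, 0.28],
    [0.77, 0.76, 0.75, 0.38, 0.28, 0.00],
])

# Average-linkage tree as a stand-in for the paper's distance-based methods.
tree = linkage(squareform(D), method="average")
dendrogram(tree, labels=languages, no_plot=True)   # set no_plot=False to draw the tree
print(tree)                                        # merge order and heights
```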

9 citations


Proceedings Article
23 Apr 2012
TL;DR: A web tool that allows users to explore news stories concerning the 2012 US Presidential Elections via an interactive interface based on concepts of "narrative analysis", where the key actors of a narration are identified, along with their relations, in what are sometimes called "semantic triplets".
Abstract: We present a web tool that allows users to explore news stories concerning the 2012 US Presidential Elections via an interactive interface. The tool is based on concepts of "narrative analysis", where the key actors of a narration are identified, along with their relations, in what are sometimes called "semantic triplets" (one example of a triplet of this kind is "Romney Criticised Obama"). The network of actors and their relations can be mined for insights about the structure of the narration, including the identification of the key players, of the network of political support of each of them, a representation of the similarity of their political positions, and other information concerning their role in the media narration of events. The interactive interface allows users to retrieve the news reports supporting the relations of interest.
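
A hedged sketch of how subject-verb-object triplets of the kind mentioned above ("Romney Criticised Obama") can be pulled from parsed sentences and aggregated into an actor network. It uses spaCy's dependency labels rather than the authors' own pipeline, and handles only the simplest sentence patterns.

```python
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")   # small English pipeline with a dependency parser

def svo_triplets(text):
    """Extract naive (subject, verb, object) triplets from simple declarative sentences."""
    triplets = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
                objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
                for s in subjects:
                    for o in objects:
                        triplets.append((s.text, token.lemma_, o.text))
    return triplets

news = "Romney criticised Obama over the economy. Obama defended the health care reform."
edges = Counter(svo_triplets(news))
print(edges)   # weighted actor-relation-actor edges; a narrative network can be built from these
```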

8 citations


Proceedings Article
01 Jan 2012
TL;DR: This paper compares the effectiveness of various approaches to graph construction by building graphs of 800,000 vertices based on the Reuters corpus, showing that relation-based classification is competitive with Support Vector Machines, which can be considered as state of the art.
Abstract: The efficient annotation of documents in vast corpora calls for scalable methods of text classification. Representing the documents in the form of graph vertices, rather than in the form of vectors in a bag of words space, allows for the necessary information to be pre-computed and stored. It also fundamentally changes the problem definition, from a content-based to a relation-based classification problem. Efficiently creating a graph where nearby documents are likely to have the same annotation is the central task of this paper. We compare the effectiveness of various approaches to graph construction by building graphs of 800,000 vertices based on the Reuters corpus, showing that relation-based classification is competitive with Support Vector Machines, which can be considered as state of the art. We further show that the combination of our relation-based approach and Support Vector Machines leads to an improvement over the methods individually.
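
A minimal stand-in (not the paper's pipeline) for the two routes being compared: documents become TF-IDF vectors, a k-nearest-neighbour graph is built over them and labels are propagated along its edges, while a linear SVM serves as the content-based baseline. The corpus, labels and parameters below are toy values.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.semi_supervised import LabelSpreading
from sklearn.svm import LinearSVC

docs = ["oil prices rise", "crude exports fall", "parliament passes budget",
        "election results announced", "stocks rally on oil news"]
labels = np.array([0, 0, 1, 1, -1])        # -1 marks the unlabelled document

X = TfidfVectorizer().fit_transform(docs)

# Relation-based route: k-NN graph over documents, labels propagated along the edges.
graph_clf = LabelSpreading(kernel="knn", n_neighbors=2).fit(X.toarray(), labels)
print("graph prediction:", graph_clf.transduction_[-1])

# Content-based baseline: a linear SVM trained on the labelled documents only.
svm = LinearSVC().fit(X[labels != -1], labels[labels != -1])
print("SVM prediction:", svm.predict(X[labels == -1]))
```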

7 citations



Journal ArticleDOI
TL;DR: The design of an autonomous agent that can teach itself how to translate from a foreign language, by first assembling its own training set, then using it to improve its vocabulary and language model is described.
Abstract: We describe the design of an autonomous agent that can teach itself how to translate from a foreign language, by first assembling its own training set and then using it to improve its vocabulary and language model. The key idea is that a Statistical Machine Translation package can be used for the Cross-Language Retrieval Task of assembling a training set from a vast amount of available text (e.g. a large multilingual corpus, or the Web), and then trained on that data, repeating the process several times. The stability issues related to such a feedback loop are addressed by a mathematical model connecting statistical and control-theoretic aspects of the system. We test it on controlled-environment and real-world tasks, showing that this agent can indeed improve its translation performance autonomously and in a stable fashion when seeded with a very small initial training set. We develop a multiprocessor version of the agent that directly accesses the Web using a Web search engine, taking advantage of the large amount of data available there. The modelling approach we develop for this agent is general, and we believe that it will be useful for an entire class of self-learning autonomous agents working on the Web.
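
A heavily hedged, structure-only sketch of the feedback loop described above. The three helpers are stubs standing in for an SMT toolkit training run, a held-out evaluation, and a cross-language retrieval step against a large corpus or the Web; only the shape of the loop (retrieve, retrain, check for stability, repeat) reflects the abstract.

```python
# Structural sketch only: train_smt, evaluate and retrieve_bilingual_text are stubs,
# not calls into any real SMT package or search engine.

def train_smt(corpus):
    return {"size": len(corpus)}                        # stub "model"

def evaluate(model):
    return 1.0 - 1.0 / (1 + model["size"])              # stub quality score that saturates

def retrieve_bilingual_text(model, batch=100):
    return [("src sentence", "tgt sentence")] * batch   # stub retrieved sentence pairs

def self_training_loop(seed_corpus, iterations=10, min_gain=1e-3):
    corpus = list(seed_corpus)                           # very small initial training set
    model = train_smt(corpus)
    score = evaluate(model)
    for _ in range(iterations):
        corpus.extend(retrieve_bilingual_text(model))    # model-driven retrieval of new data
        model = train_smt(corpus)                        # retrain on the enlarged set
        new_score = evaluate(model)
        if new_score - score < min_gain:                 # crude stability / stopping check
            break
        score = new_score
    return model

print(self_training_loop([("hello", "bonjour")]))
```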

01 Jan 2012
TL;DR: This work analyses and characterises the way in which the in-domain and out-of-domain performance of PBSMT is impacted when the amount of training data increases, and indicates that the translation model contributes about 30% more to the performance gain than the language model.
Abstract: The performance of Phrase-Based Statistical Machine Translation (PBSMT) systems mostly depends on training data. Many papers have investigated how to create new resources in order to increase the size of the training corpus in an attempt to improve PBSMT performance. In this work, we analyse and characterise the way in which the in-domain and out-of-domain performance of PBSMT is impacted when the amount of training data increases. Two different PBSMT systems, Moses and Portage, two of the largest parallel corpora, the Giga (French-English) and UN (Chinese-English) datasets, and several in- and out-of-domain test sets were used to build high quality learning curves showing consistent logarithmic growth in performance. These results are stable across language pairs, PBSMT systems and domains. We also analyse the respective impact of additional training data for estimating the language and translation models. Our proposed model approximates learning curves very well and indicates that the translation model contributes about 30% more to the performance gain than the language model.
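
A hedged sketch of the kind of analysis described: fit a curve that is linear in the logarithms of the translation-model and language-model training sizes, then compare the two per-doubling coefficients. The grid of points below is invented purely so the fit is readable; it is not the paper's data.

```python
import numpy as np

# Invented grid of (TM size, LM size, BLEU) points, for illustration only.
tm   = np.array([1e5, 2e5, 4e5, 1e5, 2e5, 4e5, 1e5, 2e5, 4e5])
lm   = np.array([1e6, 1e6, 1e6, 2e6, 2e6, 2e6, 4e6, 4e6, 4e6])
bleu = np.array([21.0, 22.3, 23.6, 22.0, 23.3, 24.6, 23.0, 24.3, 25.6])

# BLEU ~ c + b_tm*log2(tm) + b_lm*log2(lm): per-doubling gains of the two models.
A = np.column_stack([np.ones_like(tm), np.log2(tm), np.log2(lm)])
c, b_tm, b_lm = np.linalg.lstsq(A, bleu, rcond=None)[0]
print(f"gain per doubling: TM {b_tm:.2f} BLEU, LM {b_lm:.2f} BLEU "
      f"(ratio {b_tm / b_lm:.2f})")
```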

Journal ArticleDOI
TL;DR: The NetCover algorithm is presented, a method for the reconstruction of networks based on the order of nodes visited by a stochastic branching process, and it is shown that, crucially, the neighbourhood of each node may be inferred in turn, with global consistency between network and data achieved through purely local considerations.
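
A generic stand-in for the local step described in the TL;DR, not the NetCover algorithm itself: each node's incoming neighbourhood is inferred on its own from observed cascades (ordered lists of visited nodes), by greedily choosing a small set of earlier-appearing nodes that accounts for every cascade in which the node occurs. The greedy set-cover heuristic and the toy cascades are assumptions made for illustration.

```python
# Cascades are ordered lists of visited node labels. For a given node, candidate parents
# are the nodes that appear before it in some cascade; we greedily pick parents until
# every cascade containing the node is "covered" by at least one chosen parent.

def infer_parents(node, cascades):
    relevant = [c for c in cascades if node in c and c.index(node) > 0]
    uncovered = set(range(len(relevant)))
    candidates = {u for c in relevant for u in c[:c.index(node)]}
    parents = set()
    while uncovered:
        # pick the candidate that covers the most still-uncovered cascades
        best = max(candidates - parents,
                   key=lambda u: sum(1 for i in uncovered
                                     if u in relevant[i][:relevant[i].index(node)]))
        parents.add(best)
        uncovered = {i for i in uncovered
                     if best not in relevant[i][:relevant[i].index(node)]}
    return parents

cascades = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["d", "a", "b"]]
print({v: infer_parents(v, cascades) for v in "abcd"})
```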

Proceedings Article
01 Jan 2012
TL;DR: It is discovered that UK tabloids and the website of the “People” magazine contain more appealing content for all audiences than broadsheet newspapers, news aggregators and newswires, and that this measure of readers’ preferences correlates with a measure of linguistic subjectivity at the level of outlets.
Abstract: We model readers' preferences for online news, and use these models to compare different news outlets with each other. The models are based on linear scoring functions, and are inferred by exploiting aggregate behavioural information about readers' click choices for textual content of six given news outlets over one year. We generate one model per outlet, and while not extremely accurate, owing to the limited information available, these models are shown to predict the click choices of readers, as well as to be stable over time. We use those six audience preference models in several ways: to compare how the audiences' preferences of different outlets relate to each other; to score different news topics with respect to user appeal; to rank a large number of other news outlets with respect to their content appeal to all audiences; and to explain this measure by relating it to other metrics. We discover that UK tabloids and the website of the "People" magazine contain more appealing content for all audiences than broadsheet newspapers, news aggregators and newswires, and that this measure of readers' preferences correlates with a measure of linguistic subjectivity at the level of outlets.
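
A hedged sketch, not the authors' model, of a linear scoring function learned from aggregate click behaviour: clicked headlines should score above headlines shown alongside them, which can be approximated by classifying the feature difference of each clicked/non-clicked pair. The click pairs, features and headlines below are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical click data: (clicked headline, non-clicked headline shown alongside it).
pairs = [
    ("celebrity wedding shock", "central bank holds interest rates"),
    ("football star transfer rumour", "parliament debates fisheries policy"),
    ("royal baby latest pictures", "quarterly inflation figures released"),
]

vec = CountVectorizer().fit([h for p in pairs for h in p])

# Pairwise reduction: learn w so that w . x_clicked > w . x_shown, by classifying
# the feature difference of each pair (and its negation) as 1 / 0.
diffs, targets = [], []
for clicked, shown in pairs:
    d = (vec.transform([clicked]) - vec.transform([shown])).toarray()[0]
    diffs.extend([d, -d])
    targets.extend([1, 0])

model = LogisticRegression().fit(np.array(diffs), np.array(targets))
w = model.coef_[0]                       # the linear scoring function for this audience

def appeal(headline):
    return float(vec.transform([headline]).toarray()[0] @ w)

print(appeal("celebrity gossip roundup"), appeal("bond market update"))
```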