Showing papers by "Nello Cristianini" published in 2009

Open Access

Journal Article

Alessia Mammone¹, Marco Turchi², Nello Cristianini²•Institutions (2)

Sapienza University of Rome¹, University of Bristol²

01 Nov 2009 - Wiley Interdisciplinary Reviews: Computational Statistics

TL;DR: Support vector machines are a family of machine learning methods originally introduced for the problem of classification and later generalized to various other situations, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision.

Abstract: Support vector machines (SVMs) are a family of machine learning methods, originally introduced for the problem of classification and later generalized to various other situations. They are based on principles of statistical learning theory and convex optimization, and are currently used in various domains of application, including bioinformatics, text categorization, and computer vision. Copyright © 2009 John Wiley & Sons, Inc.

323 citations

Estimating the Sentence-Level Quality of Machine Translation Systems

Lucia Specia, Marco Turchi, Nicola Cancedda, Nello Cristianini, Marc Dymetman

01 Jan 2009

TL;DR: Results show that the proposed method allows obtaining good estimates and that identifying a reduced set of relevant features plays an important role in predicting the quality of sentences produced by machine translation systems when reference translations are not available.

Abstract: We investigate the problem of predicting the quality of sentences produced by machine translation systems when reference translations are not available. The problem is addressed as a regression task and a method that takes into account the contribution of different features is proposed. We experiment with this method for translations produced by various MT systems and different language pairs, annotated with quality scores both automatically and manually. Results show that our method allows obtaining good estimates and that identifying a reduced set of relevant features plays an important role. The experiments also highlight a number of outstanding features that were consistently selected as the most relevant and could be used in different ways to improve MT performance or to enhance MT evaluation.

267 citations

From frequent itemsets to informative patterns

Arianna Gallo, Alessia Mammone, Tijl De Bie, Marco Turchi, Nello Cristianini

01 Jan 2009

10 citations

Learning to translate: a statistical and computational analysis.

Marco Turchi, Tijl De Bie, Nello Cristianini

01 Jan 2009

TL;DR: This paper presented an extensive experimental study of phrase-based statistical machine translation, from the point of view of its learning capabilities, and provided very accurate Learning Curves, using high-performance computing, and extrapolations of the projected performance of the system under different conditions.

Abstract: We present an extensive experimental study of Phrase-based Statistical Machine Translation, from the point of view of its learning capabilities. Very accurate Learning Curves are obtained, using high-performance computing, and extrapolations of the projected performance of the system under different conditions are provided. Our experiments confirm existing and mostly unpublished beliefs about the learning capabilities of statistical machine translation systems. We also provide insight into the way statistical machine translation learns from data, including the respective influence of translation and language models, the impact of phrase length on performance, and various unlearning and perturbation analyses. Our results support and illustrate the fact that performance improves by a constant amount for each doubling of the data, across different language pairs, and different systems. This fundamental limitation seems to be a direct consequence of Zipf's law governing textual data. Although the rate of improvement may depend on both the data and the estimation method, it is unlikely that the general shape of the learning curve will change without major changes in the modeling and inference phases. Possible research directions that address this issue include the integration of linguistic rules or the development of active learning procedures.

7 citations

Book Chapter

Inference and Validation of Networks

Ilias Flaounas¹, Marco Turchi¹, Tijl De Bie¹, Nello Cristianini¹•Institutions (1)

University of Bristol¹

30 Aug 2009

TL;DR: A statistical methodology to validate the results of network inference algorithms, based on principles of statistical testing and machine learning, is developed and presented for the case of inferring a network of news outlets based on their preference of stories to cover.

Abstract: We develop a statistical methodology to validate the result of network inference algorithms, based on principles of statistical testing and machine learning. The comparison of results with reference networks, by means of similarity measures and null models, allows us to measure the significance of results, as well as their predictive power. The use of Generalised Linear Models allows us to explain the results in terms of available ground truth which we expect to be partially relevant. We present these methods for the case of inferring a network of News Outlets based on their preference of stories to cover. We compare three simple network inference methods and show how our technique can be used to choose between them. All the methods presented here can be directly applied to other domains where network inference is used.

7 citations

Proceedings Article

An Intelligent Agent That Autonomously Learns How to Translate

Marco Turchi¹, Tijl De Bie¹, Nello Cristianini¹•Institutions (1)

University of Bristol¹

15 Sep 2009

TL;DR: This paper describes the design of an autonomous agent that can teach itself how to translate from a foreign language, by first assembling its own training set, then using it to improve its vocabulary and language model.

Abstract: We describe the design of an autonomous agent that can teach itself how to translate from a foreign language, by first assembling its own training set, then using it to improve its vocabulary and language model. The key idea is that a Statistical Machine Translation package can be used for the Cross-Language Retrieval Task of assembling a training set from a vast amount of available text (e.g. a large multilingual corpus, or the Web) and then train on that data, repeating that process several times. The stability issues related to such a feedback loop are addressed by a mathematical model, connecting statistical and control-theoretic aspects of the system. We test it on real-world tasks, showing that indeed this agent can improve its translation performance autonomously and in a stable fashion, when seeded with a very small initial training set. The modelling approach we develop for this agent is general, and we believe will be useful for an entire class of self-learning autonomous agents working on the Web.

7 citations

Book Chapter

Analysis of Text Patterns Using Kernel Methods

Alessia Mammone, Marco Turchi, Nello Cristianini

15 Jun 2009

3 citations