
Showing papers by "Nello Cristianini published in 2002"


Journal ArticleDOI
TL;DR: A novel kernel for comparing two text documents is introduced: an inner product in the feature space generated by all subsequences of length k, which can be evaluated efficiently by a dynamic programming technique.
Abstract: We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be evaluated efficiently by a dynamic programming technique. Experimental comparisons with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered, for comparison with the subsequences kernel under different decay factors. For larger documents and datasets, the paper introduces an approximation technique that is shown to deliver good approximations efficiently.
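As a concrete illustration of the dynamic programming evaluation, here is a minimal Python sketch of the standard subsequence-kernel recursion; the function name ssk, the decay parameter lam and the memoised helpers are illustrative choices rather than the paper's code, and this is the naive O(k|s||t|^2) version, not the paper's optimised recursion or its approximation technique.

    from functools import lru_cache

    def ssk(s, t, k, lam=0.5):
        """Subsequence kernel K_k(s, t) with decay factor lam, via memoised recursion."""

        @lru_cache(maxsize=None)
        def k_prime(i, m, n):
            # Auxiliary kernel K'_i evaluated on the prefixes s[:m] and t[:n].
            if i == 0:
                return 1.0
            if min(m, n) < i:
                return 0.0
            x = s[m - 1]
            total = lam * k_prime(i, m - 1, n)
            for j in range(1, n + 1):            # match the last char of s[:m] inside t[:n]
                if t[j - 1] == x:
                    total += k_prime(i - 1, m - 1, j - 1) * lam ** (n - j + 2)
            return total

        @lru_cache(maxsize=None)
        def k_main(m, n):
            # Kernel K_k on the prefixes s[:m] and t[:n].
            if min(m, n) < k:
                return 0.0
            x = s[m - 1]
            total = k_main(m - 1, n)
            for j in range(1, n + 1):
                if t[j - 1] == x:
                    total += k_prime(k - 1, m - 1, j - 1) * lam ** 2
            return total

        return k_main(len(s), len(t))

    # In practice the kernel is normalised:
    # ssk(s, t, k) / (ssk(s, s, k) * ssk(t, t, k)) ** 0.5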

1,281 citations


Proceedings Article
01 Jan 2002
TL;DR: The problem of learning a semantic representation of a text document from data is addressed, in the situation where a corpus of unlabeled paired documents is available, each pair being formed by a short English document and its French translation.
Abstract: The problem of learning a semantic representation of a text document from data is addressed, in the situation where a corpus of unlabeled paired documents is available, each pair being formed by a short English document and its French translation. This representation can then be used for any retrieval, categorization or clustering task, in both a standard and a cross-lingual setting. By using kernel functions, in this case simple bag-of-words inner products, each part of the corpus is mapped to a high-dimensional space. The correlations between the two spaces are then learnt by using kernel Canonical Correlation Analysis: a set of maximally correlated directions is found in the first and in the second space. Since we assume the two representations are completely independent apart from the semantic content, any correlation between them should reflect some semantic similarity; certain patterns of English words that relate to a specific meaning should correlate, across the corpus, with certain patterns of French words corresponding to the same meaning. Using the semantic representation obtained in this way, we first demonstrate that the correlations detected between the two versions of the corpus are significantly higher than random, and hence that a representation based on such features does capture statistical patterns that should reflect semantic information. We then use this representation in both cross-language and single-language retrieval tasks, observing performance that is consistently and significantly superior to LSI on the same data.
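To show the shape of the computation, below is a minimal Python sketch of regularized kernel CCA in the dual, assuming Kx and Ky are already centred bag-of-words Gram matrices for the English and French halves of the corpus; the function name kernel_cca, the regularizer kappa and the eigensolver choice are illustrative, and the exact regularization used in the paper may differ.

    import numpy as np
    from scipy.linalg import eigh

    def kernel_cca(Kx, Ky, kappa=0.1, n_components=5):
        """Regularized kernel CCA: find dual directions alpha, beta whose images
        Kx @ alpha and Ky @ beta are maximally correlated."""
        n = Kx.shape[0]
        Z = np.zeros((n, n))
        # Cross-covariance blocks (off-diagonal) ...
        A = np.block([[Z, Kx @ Ky],
                      [Ky @ Kx, Z]])
        # ... against regularized within-view covariances (block diagonal).
        B = np.block([[Kx @ Kx + kappa * np.eye(n), Z],
                      [Z, Ky @ Ky + kappa * np.eye(n)]])
        # Generalized symmetric eigenproblem A w = rho B w; largest rho = top correlations.
        vals, vecs = eigh(A, B)
        order = np.argsort(-vals)[:n_components]
        alphas = vecs[:n, order]    # dual directions for the English space
        betas = vecs[n:, order]     # dual directions for the French space
        return vals[order], alphas, betas

A new document is then represented by its kernel evaluations against the training corpus projected onto these dual directions, which is the semantic representation used for retrieval.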

287 citations


01 Jan 2002
TL;DR: Both theoretical and experimental evidence is given to show that improving the alignment leads to a reduction in generalization error of standard classifiers.
Abstract: Alignment has recently been proposed as a method for measuring the degree of agreement between a kernel and a learning task (Cristianini et al., 2001). Previous approaches to optimizing kernel alignment have required the eigendecomposition of the kernel matrix, which can be computationally prohibitive, especially for large kernel matrices. In this paper we propose a general method for optimizing alignment over a linear combination of kernels. We apply the approach to give both transductive and inductive algorithms based on the Incomplete Cholesky factorization of the kernel matrix. The Incomplete Cholesky factorization is equivalent to performing a Gram-Schmidt orthogonalization of the training points in the feature space. The alignment optimization method adapts the feature space to increase its training set alignment. Regularization is required to ensure this alignment is also retained for the test set. Both theoretical and experimental evidence is given to show that improving the alignment leads to a reduction in generalization error of standard classifiers.
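To make the two ingredients concrete, the Python sketch below shows the empirical alignment between a kernel matrix and the label matrix yy', and a pivoted incomplete Cholesky factorization K ~ G G^T, which corresponds to the Gram-Schmidt orthogonalization of the training points mentioned above; the function names and the tolerance parameter tol are illustrative and not taken from the paper.

    import numpy as np

    def alignment(K, y):
        """Empirical alignment A(K, yy') = <K, yy'>_F / (||K||_F ||yy'||_F)."""
        Y = np.outer(y, y)
        return np.sum(K * Y) / (np.linalg.norm(K, 'fro') * np.linalg.norm(Y, 'fro'))

    def incomplete_cholesky(K, tol=1e-6, max_rank=None):
        """Pivoted incomplete Cholesky: returns G (n x m) with K approx G @ G.T,
        equivalent to Gram-Schmidt on the training points in feature space."""
        n = K.shape[0]
        max_rank = max_rank or n
        d = np.diag(K).astype(float)        # residual diagonal
        G = np.zeros((n, max_rank))
        for j in range(max_rank):
            i = int(np.argmax(d))           # pivot: largest residual norm
            if d[i] <= tol:
                return G[:, :j]             # remaining residual is negligible
            G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
            d = d - G[:, j] ** 2
        return G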

94 citations


01 Jan 2002
TL;DR: This paper addresses the problem of measuring the degree of agreement between a kernel and a learning task, and derives a series of algorithms for adapting a kernel in two important machine learning problems: regression and classification with uneven datasets.
Abstract: In this paper we address the problem of measuring the degree of agreement between a kernel and a learning task. The quantity that we use to capture this notion is alignment (Cristianini et al., 2001). We motivate its theoretical properties, and derive a series of algorithms for adapting a kernel in two important machine learning problems: regression and classification with uneven datasets. We also propose a novel inductive algorithm within the framework of kernel alignment that can be used for kernel combination and kernel selection. The algorithms presented have been tested on both artificial and real-world datasets.
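As one simple illustration of kernel combination within this framework (not necessarily the algorithm derived in the paper), the Python sketch below chooses non-negative weights mu for a linear combination K(mu) = sum_i mu_i K_i by projected gradient ascent on its alignment with the target yy'; the function name, learning rate and step count are illustrative.

    import numpy as np

    def combine_kernels_by_alignment(kernels, y, lr=0.01, steps=200):
        """Pick non-negative weights mu so that K(mu) = sum_i mu_i * K_i has high
        alignment with the target matrix yy' (projected gradient ascent sketch)."""
        Y = np.outer(y, y)
        normY = np.linalg.norm(Y, 'fro')
        mu = np.ones(len(kernels)) / len(kernels)
        for _ in range(steps):
            K = sum(m * Ki for m, Ki in zip(mu, kernels))
            normK = np.linalg.norm(K, 'fro')
            a = np.sum(K * Y) / (normK * normY)          # current alignment
            # Gradient of the alignment with respect to each weight mu_i.
            grad = np.array([(np.sum(Ki * Y) - a * normY * np.sum(Ki * K) / normK)
                             / (normK * normY) for Ki in kernels])
            mu = np.maximum(mu + lr * grad, 1e-12)       # projection: keep weights positive
            mu /= mu.sum()                               # remove scale invariance
        return mu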

44 citations



Journal ArticleDOI
TL;DR: This special issue arose from the NIPS 2000 workshop on New Directions in Kernel Methods; it provides a rapid but refereed route to publication for papers presented at the workshop less than a year ago and supports the fledgling Journal of Machine Learning Research.
Abstract: This special issue arose from a workshop held at NIPS 2000 on New Directions in Kernel Methods, though not all the submissions received were from talks at the workshop. With the great help of around forty referees we selected the following ten papers from some 28 submissions, an acceptance rate of 36%. The high number of submissions we received illustrates the vitality and popularity of the field of kernel methods in machine learning. We are pleased to be able to support the fledgling Journal of Machine Learning Research in this way and to provide a rapid but refereed route to publication for the papers presented at the workshop less than a year ago. The papers in the special issue cover a wide range of topics in kernel-based learning machines, but mostly reflect three of the main current research directions: exporting the design principles of standard Support Vector Machines to a variety of other algorithms, producing alternative and more efficient implementations, and deepening the theoretical understanding of kernel methods. The first five papers in the special issue describe extensions of the basic algorithms: Kernel Partial Least Squares Regression in RKHS by Roman Rosipal and Leonard J. Trejo describes the development of kernel partial least squares regression. This technique is similar to kernel PCA or latent semantic kernels, but the projection is chosen by modeling the relationship between input and output variables. The paper compares performance of a number of different projection methods and obtains encouraging results, particularly in terms of the number of dimensions required to obtain a certain level of performance. In Support Vector Clustering, Asa Ben-Hur, David Horn, Hava T. Siegelmann and Vladimir Vapnik present a novel clustering method using Support Vector Machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where the minimal enclosing sphere can be calculated. When mapped back to data space, this sphere can separate into several components, each enclosing a separate cluster of points. A simple algorithm for identifying these clusters is discussed and evaluated experimentally. One-Class SVMs for Document Classification by Larry M. Manevitz and Malik Yousef provides extensive experimentation comparing the SVM approach to one-class classification of text documents with more traditional methods such as nearest neighbour, naive Bayes and one more advanced neural network method based on ‘bottleneck’ compression. The
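For the support vector clustering step summarised above, the minimal enclosing sphere in feature space is obtained from the standard dual problem; the LaTeX sketch below uses the usual notation, with beta the dual variables and C the soft-margin constant, and goes no further than what the editorial itself describes.

    \max_{\beta} \;\; \sum_i \beta_i K(x_i, x_i) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
    \quad \text{s.t.} \quad \sum_i \beta_i = 1, \;\; 0 \le \beta_i \le C,

    R^2(x) = K(x, x) - 2 \sum_i \beta_i K(x_i, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j).

The sphere's preimage in data space is the level set where R(x) equals the sphere radius, and its connected components delimit the clusters.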

3 citations