
Showing papers by "Nello Cristianini published in 2002"


Journal ArticleDOI
TL;DR: A novel kernel for comparing two text documents is introduced: an inner product in the feature space generated by all subsequences of length k, which can be evaluated efficiently by a dynamic programming technique.
Abstract: We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text, though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those occurrences that are close to contiguous. A direct computation of this feature vector would involve a prohibitive amount of computation even for modest values of k, since the dimension of the feature space grows exponentially with k. The paper describes how, despite this fact, the inner product can be evaluated efficiently by a dynamic programming technique. Experimental comparisons with a standard word feature space kernel (Joachims, 1998) show positive results on modestly sized datasets. The case of contiguous subsequences is also considered, for comparison with the subsequences kernel under different decay factors. For larger documents and datasets, the paper introduces an approximation technique that is shown to deliver good approximations efficiently.
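As a concrete illustration of the dynamic programming evaluation, here is a minimal Python sketch of the standard subsequence-kernel recursion; the function name ssk, the decay parameter lam and the memoised helpers are illustrative choices rather than the paper's code, and this is the naive O(k|s||t|^2) version, not the paper's optimised recursion or its approximation technique.

    from functools import lru_cache

    def ssk(s, t, k, lam=0.5):
        """Subsequence kernel K_k(s, t) with decay factor lam, via memoised recursion."""

        @lru_cache(maxsize=None)
        def k_prime(i, m, n):
            # Auxiliary kernel K'_i evaluated on the prefixes s[:m] and t[:n].
            if i == 0:
                return 1.0
            if min(m, n) < i:
                return 0.0
            x = s[m - 1]
            total = lam * k_prime(i, m - 1, n)
            for j in range(1, n + 1):            # match the last char of s[:m] inside t[:n]
                if t[j - 1] == x:
                    total += k_prime(i - 1, m - 1, j - 1) * lam ** (n - j + 2)
            return total

        @lru_cache(maxsize=None)
        def k_main(m, n):
            # Kernel K_k on the prefixes s[:m] and t[:n].
            if min(m, n) < k:
                return 0.0
            x = s[m - 1]
            total = k_main(m - 1, n)
            for j in range(1, n + 1):
                if t[j - 1] == x:
                    total += k_prime(k - 1, m - 1, j - 1) * lam ** 2
            return total

        return k_main(len(s), len(t))

    # In practice the kernel is normalised:
    # ssk(s, t, k) / (ssk(s, s, k) * ssk(t, t, k)) ** 0.5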

1,281 citations


Proceedings Article
01 Jan 2002
TL;DR: The problem of learning a semantic representation of a text document from data is addressed, in the situation where a corpus of unlabeled paired documents is available, each pair being formed by a short English document and its French translation.
Abstract: The problem of learning a semantic representation of a text document from data is addressed, in the situation where a corpus of unlabeled paired documents is available, each pair being formed by a short English document and its French translation. This representation can then be used for any retrieval, categorization or clustering task, in both a standard and a cross-lingual setting. By using kernel functions, in this case simple bag-of-words inner products, each part of the corpus is mapped to a high-dimensional space. The correlations between the two spaces are then learnt by using kernel Canonical Correlation Analysis: a set of maximally correlated directions is found in the first and in the second space. Since we assume the two representations are completely independent apart from the semantic content, any correlation between them should reflect some semantic similarity; certain patterns of English words that relate to a specific meaning should correlate, across the corpus, with certain patterns of French words corresponding to the same meaning. Using the semantic representation obtained in this way, we first demonstrate that the correlations detected between the two versions of the corpus are significantly higher than random, and hence that a representation based on such features does capture statistical patterns that should reflect semantic information. We then use this representation in both cross-language and single-language retrieval tasks, observing performance that is consistently and significantly superior to LSI on the same data.
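To show the shape of the computation, below is a minimal Python sketch of regularized kernel CCA in the dual, assuming Kx and Ky are already centred bag-of-words Gram matrices for the English and French halves of the corpus; the function name kernel_cca, the regularizer kappa and the eigensolver choice are illustrative, and the exact regularization used in the paper may differ.

    import numpy as np
    from scipy.linalg import eigh

    def kernel_cca(Kx, Ky, kappa=0.1, n_components=5):
        """Regularized kernel CCA: find dual directions alpha, beta whose images
        Kx @ alpha and Ky @ beta are maximally correlated."""
        n = Kx.shape[0]
        Z = np.zeros((n, n))
        # Cross-covariance blocks (off-diagonal) ...
        A = np.block([[Z, Kx @ Ky],
                      [Ky @ Kx, Z]])
        # ... against regularized within-view covariances (block diagonal).
        B = np.block([[Kx @ Kx + kappa * np.eye(n), Z],
                      [Z, Ky @ Ky + kappa * np.eye(n)]])
        # Generalized symmetric eigenproblem A w = rho B w; largest rho = top correlations.
        vals, vecs = eigh(A, B)
        order = np.argsort(-vals)[:n_components]
        alphas = vecs[:n, order]    # dual directions for the English space
        betas = vecs[n:, order]     # dual directions for the French space
        return vals[order], alphas, betas

A new document is then represented by its kernel evaluations against the training corpus projected onto these dual directions, which is the semantic representation used for retrieval.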

287 citations


01 Jan 2002
TL;DR: Both theoretical and experimental evidence is given to show that improving the alignment leads to a reduction in generalization error of standard classifiers.
Abstract: Alignment has recently been proposed as a method for measuring the degree of agreement between a kernel and a learning task (Cristianini et al., 2001). Previous approaches to optimizing kernel alignment have required the eigendecomposition of the kernel matrix, which can be computationally prohibitive, especially for large kernel matrices. In this paper we propose a general method for optimizing alignment over a linear combination of kernels. We apply the approach to give both transductive and inductive algorithms based on the Incomplete Cholesky factorization of the kernel matrix. The Incomplete Cholesky factorization is equivalent to performing a Gram-Schmidt orthogonalization of the training points in the feature space. The alignment optimization method adapts the feature space to increase its training set alignment. Regularization is required to ensure this alignment is also retained for the test set. Both theoretical and experimental evidence is given to show that improving the alignment leads to a reduction in generalization error of standard classifiers.
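To make the two ingredients concrete, the Python sketch below shows the empirical alignment between a kernel matrix and the label matrix yy', and a pivoted incomplete Cholesky factorization K ~ G G^T, which corresponds to the Gram-Schmidt orthogonalization of the training points mentioned above; the function names and the tolerance parameter tol are illustrative and not taken from the paper.

    import numpy as np

    def alignment(K, y):
        """Empirical alignment A(K, yy') = <K, yy'>_F / (||K||_F ||yy'||_F)."""
        Y = np.outer(y, y)
        return np.sum(K * Y) / (np.linalg.norm(K, 'fro') * np.linalg.norm(Y, 'fro'))

    def incomplete_cholesky(K, tol=1e-6, max_rank=None):
        """Pivoted incomplete Cholesky: returns G (n x m) with K approx G @ G.T,
        equivalent to Gram-Schmidt on the training points in feature space."""
        n = K.shape[0]
        max_rank = max_rank or n
        d = np.diag(K).astype(float)        # residual diagonal
        G = np.zeros((n, max_rank))
        for j in range(max_rank):
            i = int(np.argmax(d))           # pivot: largest residual norm
            if d[i] <= tol:
                return G[:, :j]             # remaining residual is negligible
            G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
            d = d - G[:, j] ** 2
        return G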

94 citations


01 Jan 2002
TL;DR: This paper addresses the problem of measuring the degree of agreement between a kernel and a learning task, and derives a series of algorithms for adapting a kernel in two important machine learning problems: regression and classification with uneven datasets.
Abstract: In this paper we address the problem of measuring the degree of agreement between a kernel and a learning task. The quantity that we use to capture this notion is alignment (Cristianini et al., 2001). We motivate its theoretical properties, and derive a series of algorithms for adapting a kernel in two important machine learning problems: regression and classification with uneven datasets. We also propose a novel inductive algorithm within the framework of kernel alignment that can be used for kernel combination and kernel selection. The algorithms presented have been tested on both artificial and real-world datasets.
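As one simple illustration of kernel combination within this framework (not necessarily the algorithm derived in the paper), the Python sketch below chooses non-negative weights mu for a linear combination K(mu) = sum_i mu_i K_i by projected gradient ascent on its alignment with the target yy'; the function name, learning rate and step count are illustrative.

    import numpy as np

    def combine_kernels_by_alignment(kernels, y, lr=0.01, steps=200):
        """Pick non-negative weights mu so that K(mu) = sum_i mu_i * K_i has high
        alignment with the target matrix yy' (projected gradient ascent sketch)."""
        Y = np.outer(y, y)
        normY = np.linalg.norm(Y, 'fro')
        mu = np.ones(len(kernels)) / len(kernels)
        for _ in range(steps):
            K = sum(m * Ki for m, Ki in zip(mu, kernels))
            normK = np.linalg.norm(K, 'fro')
            a = np.sum(K * Y) / (normK * normY)          # current alignment
            # Gradient of the alignment with respect to each weight mu_i.
            grad = np.array([(np.sum(Ki * Y) - a * normY * np.sum(Ki * K) / normK)
                             / (normK * normY) for Ki in kernels])
            mu = np.maximum(mu + lr * grad, 1e-12)       # projection: keep weights positive
            mu /= mu.sum()                               # remove scale invariance
        return mu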

44 citations



Journal ArticleDOI
TL;DR: This special issue arose from the NIPS 2000 workshop on New Directions in Kernel Methods; it provides a rapid but refereed route to publication for papers presented at the workshop less than a year ago and supports the fledgling Journal of Machine Learning Research.
Abstract: This special issue arose from a workshop held at NIPS 2000 on New Directions in Kernel Methods, though not all the submissions received were from talks at the workshop. With the great help of around forty referees we selected the following ten papers from some 28 submissions, an acceptance rate of 36%. The high number of submissions we received illustrates the vitality and popularity of the field of kernel methods in machine learning. We are pleased to be able to support the fledgling Journal of Machine Learning Research in this way and to provide a rapid but refereed route to publication for the papers presented at the workshop less than a year ago. The papers in the special issue cover a wide range of topics in kernel-based learning machines, but mostly reflect three of the main current research directions: exporting the design principles of standard Support Vector Machines to a variety of other algorithms, producing alternative and more efficient implementations, and deepening the theoretical understanding of kernel methods. The first five papers in the special issue describe extensions of the basic algorithms: Kernel Partial Least Squares Regression in RKHS by Roman Rosipal and Leonard J. Trejo describes the development of kernel partial least squares regression. This technique is similar to kernel PCA or latent semantic kernels, but the projection is chosen by modeling the relationship between input and output variables. The paper compares performance of a number of different projection methods and obtains encouraging results, particularly in terms of the number of dimensions required to obtain a certain level of performance. In Support Vector Clustering, Asa Ben-Hur, David Horn, Hava T. Siegelmann and Vladimir Vapnik present a novel clustering method using Support Vector Machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where the minimal enclosing sphere can be calculated. When mapped back to data space, this sphere can separate into several components, each enclosing a separate cluster of points. A simple algorithm for identifying these clusters is discussed and evaluated experimentally. One-Class SVMs for Document Classification by Larry M. Manevitz and Malik Yousef provides extensive experimentation comparing the SVM approach to one-class classification of text documents with more traditional methods such as nearest neighbour, naive Bayes and one more advanced neural network method based on ‘bottleneck’ compression. The
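For the support vector clustering step summarised above, the minimal enclosing sphere in feature space is obtained from the standard dual problem; the LaTeX sketch below uses the usual notation, with beta the dual variables and C the soft-margin constant, and goes no further than what the editorial itself describes.

    \max_{\beta} \;\; \sum_i \beta_i K(x_i, x_i) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
    \quad \text{s.t.} \quad \sum_i \beta_i = 1, \;\; 0 \le \beta_i \le C,

    R^2(x) = K(x, x) - 2 \sum_i \beta_i K(x_i, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j).

The sphere's preimage in data space is the level set where R(x) equals the sphere radius, and its connected components delimit the clusters.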

3 citations