
Showing papers by "Ivor W. Tsang published in 2005"


Journal ArticleDOI
TL;DR: This paper shows that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry and, by adopting an efficient approximate MEB algorithm based on core sets, obtains provably approximately optimal solutions; the proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m.
Abstract: Standard SVM training has O(m³) time and O(m²) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such "approximateness" in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2GHz Pentium 4 PC.
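CVM itself works in the kernel-induced feature space and re-solves a small QP over the core set at each step; as a minimal illustration of the core-set idea it builds on, the sketch below runs a Bădoiu–Clarkson-style farthest-point scheme for an approximate MEB in plain Euclidean space. All names are illustrative, and the simple center update is a stand-in for CVM's QP.

```python
import numpy as np

def approx_meb(X, eps=1e-2, max_iter=1000):
    """(1+eps)-approximate minimum enclosing ball via a
    farthest-point (Badoiu-Clarkson style) core-set scheme.
    X: (m, d) array of points. Returns (center, radius, core indices)."""
    core = [0]                    # start the core set from an arbitrary point
    c = X[0].astype(float)
    for t in range(1, max_iter + 1):
        d2 = np.sum((X - c) ** 2, axis=1)
        far = int(np.argmax(d2))             # farthest point from the center
        r = np.sqrt(d2[core].max())          # current radius over the core set
        if np.sqrt(d2[far]) <= (1.0 + eps) * r:
            break                            # every point is (1+eps)-covered
        core.append(far)
        # Simple incremental center update; CVM instead re-solves a
        # small QP over the core set in feature space.
        c += (X[far] - c) / (t + 1)
    r = np.sqrt(np.sum((X[core] - c) ** 2, axis=1).max())
    return c, r, core
```

For example, `c, r, core = approx_meb(np.random.randn(100000, 10))` yields a core set whose size depends on eps but not on m, which is the source of the m-independent space complexity claimed above.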

1,017 citations


Proceedings ArticleDOI
27 Dec 2005
TL;DR: It is shown that RCA can also be kernelized, which yields significant improvements when nonlinearities are needed and makes the method applicable to distance metric learning for structured objects that have no natural vectorial representation.
Abstract: Defining a good distance measure between patterns is of crucial importance in many classification and clustering algorithms. Recently, relevant component analysis (RCA) was proposed as a simple yet powerful method for learning this distance metric. However, it is confined to linear transforms in the input space. In this paper, we show that RCA can also be kernelized, which results in significant improvements when nonlinearities are needed. Moreover, it becomes applicable to distance metric learning for structured objects that have no natural vectorial representation. In addition, it can be used in an incremental setting. Performance of this kernel method is evaluated on both toy and real-world data sets with encouraging results.
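For reference, here is a minimal NumPy sketch of linear RCA, the method the paper kernelizes: estimate the within-chunklet covariance and use its inverse square root as a whitening transform. Function and variable names are illustrative.

```python
import numpy as np

def rca_transform(chunklets):
    """Linear RCA: learn a whitening transform from chunklets
    (small groups of points known to share a class).
    Returns W such that distances are measured as ||W @ (x - y)||."""
    centered = []
    for X in chunklets:                        # X: (n_j, d) array
        centered.append(X - X.mean(axis=0, keepdims=True))
    Z = np.vstack(centered)
    C = Z.T @ Z / len(Z)                       # within-chunklet covariance
    # Inverse square root via eigendecomposition (C is symmetric PSD).
    w, V = np.linalg.eigh(C)
    w = np.maximum(w, 1e-12)                   # guard against singularity
    return V @ np.diag(w ** -0.5) @ V.T
```

Usage: with `W = rca_transform(chunklets)`, the learned metric is `np.linalg.norm(W @ (x - y))`. The kernelized version in the paper replaces this explicit input-space transform with one expressed through kernel evaluations.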

49 citations


Proceedings ArticleDOI
07 Aug 2005
TL;DR: The recently proposed Core Vector Machine algorithm is extended to the regression setting by generalizing the underlying minimum enclosing ball problem, yielding the Core Vector Regression (CVR) algorithm.
Abstract: In this paper, we extend the recently proposed Core Vector Machine algorithm to the regression setting by generalizing the underlying minimum enclosing ball problem. The resultant Core Vector Regression (CVR) algorithm can be used with any linear or nonlinear kernel and obtains provably approximately optimal solutions. Its asymptotic time complexity is linear in the number of training patterns m, while its space complexity is independent of m. Experiments show that CVR has performance comparable to SVR, but is much faster and produces far fewer support vectors on very large data sets. It is also successfully applied to large 3D point sets in computer graphics for the modeling of implicit surfaces.
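CVR itself solves a generalized MEB quadratic program; as a rough, simplified illustration of the core-set pattern in a regression setting, the sketch below greedily grows a core set by the largest ε-insensitive residual and refits a standard SVR (scikit-learn) on the core set only. This is a hypothetical stand-in for exposition, not the paper's algorithm.

```python
import numpy as np
from sklearn.svm import SVR

def coreset_svr(X, y, eps=0.1, budget=500):
    """Illustrative core-set regression: repeatedly add the point with
    the largest eps-insensitive residual and refit on the core set only.
    (CVR instead solves a generalized minimum-enclosing-ball QP.)"""
    core = list(np.random.choice(len(X), size=5, replace=False))
    model = SVR(kernel="rbf", epsilon=eps)
    while len(core) < budget:
        model.fit(X[core], y[core])            # train on the core set only
        resid = np.abs(model.predict(X) - y)   # residuals on all m points
        worst = int(np.argmax(resid))
        if resid[worst] <= eps or worst in core:
            break                              # all points fit within the tube
        core.append(worst)
    return model, core
```

The point of the pattern is that each refit touches only the (small) core set, so the per-iteration cost does not grow with the full training set size.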

48 citations


Proceedings Article
01 Jan 2005
TL;DR: This paper scales up kernel methods by exploiting the "approximateness" in practical SVM implementations: many kernel methods are formulated as equivalent minimum enclosing ball problems in computational geometry, and provably approximately optimal solutions are then obtained efficiently with the use of core sets.
Abstract: Standard SVM training has O(m³) time and O(m²) space complexities, where m is the training set size. In this paper, we scale up kernel methods by exploiting the "approximateness" in practical SVM implementations. We formulate many kernel methods as equivalent minimum enclosing ball problems in computational geometry, and then obtain provably approximately optimal solutions efficiently with the use of core sets. Our proposed Core Vector Machine (CVM) algorithm has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is much faster and can handle much larger data sets than existing scale-up methods. In particular, on our PC with only 512MB of RAM, the CVM with the Gaussian kernel can process the checkerboard data set with 1 million points in less than 13 seconds.
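With a nonlinear kernel, the farthest-point step of the core-set scheme cannot touch feature vectors directly; the squared distance from φ(x) to a ball center c = Σᵢ αᵢ φ(xᵢ) expands entirely into kernel evaluations. A minimal sketch of that expansion (names illustrative):

```python
import numpy as np

def feature_space_dist2(k, x, core_X, alpha, K_core=None):
    """Squared feature-space distance ||phi(x) - c||^2 to the center
    c = sum_i alpha_i * phi(core_X[i]), via the kernel trick:
      k(x, x) - 2 * sum_i alpha_i k(core_X[i], x)
              + sum_{i,j} alpha_i alpha_j k(core_X[i], core_X[j])."""
    if K_core is None:   # Gram matrix of the core set; cache it in practice
        K_core = np.array([[k(a, b) for b in core_X] for a in core_X])
    kx = np.array([k(xi, x) for xi in core_X])
    return k(x, x) - 2.0 * alpha @ kx + alpha @ K_core @ alpha
```

For example, with `rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))`, scanning `feature_space_dist2(rbf, x, core_X, alpha, K_core)` over all points identifies the farthest point in feature space without ever forming φ explicitly.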

39 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: This paper proposes and investigates a cost-effective, distributed algorithm for accurately estimating nodal positions in wireless sensor networks; the algorithm is shown to achieve fast convergence with low estimation error, even for large networks.
Abstract: In wireless sensor networks, estimating nodal positions is important for routing efficiency and location-based services. Traditional techniques based on precise measurements are often expensive and power-inefficient, while approaches based on landmarks often require bandwidth-inefficient flooding and hence are not scalable for large networks. In this paper, we propose and investigate a cost-effective and distributed algorithm to accurately estimate nodal positions for wireless sensor networks. In our algorithm, a node only needs to identify and exchange information with a certain number of neighbors (around 30) in its proximity in order to estimate its relative position accurately. For location identification, only a small number of nodes (around 10) need additional GPS capabilities to accurately estimate the absolute position of every node in the network. Our algorithm is shown to achieve fast convergence with low estimation error, even for large networks.
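As a generic illustration of neighbor-based position refinement (not the paper's exact algorithm), the sketch below has each node nudge its estimate to better match the measured distances to its neighbors, with GPS-equipped anchor nodes held fixed. All names and the gradient scheme are assumptions made for exposition.

```python
import numpy as np

def refine_positions(pos, edges, dists, anchors, lr=0.1, iters=200):
    """Generic iterative refinement of 2D position estimates.
    pos: (n, 2) initial guesses; edges: list of (i, j) neighbor pairs;
    dists: measured distance per edge; anchors: node ids with known
    (GPS) positions, held fixed. Not the paper's exact algorithm."""
    pos = pos.astype(float).copy()
    anchors = set(anchors)
    for _ in range(iters):
        grad = np.zeros_like(pos)
        for (i, j), d in zip(edges, dists):
            delta = pos[i] - pos[j]
            cur = np.linalg.norm(delta) + 1e-12
            # Gradient of (||p_i - p_j|| - d)^2 with respect to p_i.
            g = 2.0 * (cur - d) * delta / cur
            grad[i] += g
            grad[j] -= g
        for node in anchors:
            grad[node] = 0.0            # anchors stay at their GPS positions
        pos -= lr * grad
    return pos
```

In a distributed deployment each node would apply only its own row of this update using locally exchanged neighbor estimates, which matches the abstract's point that roughly 30 neighbors per node suffice.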

29 citations