
Showing papers by "Ivor W. Tsang published in 2005"


Journal ArticleDOI
TL;DR: This paper shows that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry and, by adopting an efficient approximate MEB algorithm based on core sets, obtains provably approximately optimal solutions; the proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m.
Abstract: Standard SVM training has O(m³) time and O(m²) space complexities, where m is the training set size. It is thus computationally infeasible on very large data sets. By observing that practical SVM implementations only approximate the optimal solution by an iterative strategy, we scale up kernel methods by exploiting such "approximateness" in this paper. We first show that many kernel methods can be equivalently formulated as minimum enclosing ball (MEB) problems in computational geometry. Then, by adopting an efficient approximate MEB algorithm, we obtain provably approximately optimal solutions with the idea of core sets. Our proposed Core Vector Machine (CVM) algorithm can be used with nonlinear kernels and has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is as accurate as existing SVM implementations, but is much faster and can handle much larger data sets than existing scale-up methods. For example, CVM with the Gaussian kernel produces superior results on the KDDCUP-99 intrusion detection data, which has about five million training patterns, in only 1.4 seconds on a 3.2GHz Pentium 4 PC.
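CVM itself works in the kernel-induced feature space and re-solves a small QP over the core set at each step; as a minimal illustration of the core-set idea it builds on, the sketch below runs a Bădoiu–Clarkson-style farthest-point scheme for an approximate MEB in plain Euclidean space. All names are illustrative, and the simple center update is a stand-in for CVM's QP.

```python
import numpy as np

def approx_meb(X, eps=1e-2, max_iter=1000):
    """(1+eps)-approximate minimum enclosing ball via a
    farthest-point (Badoiu-Clarkson style) core-set scheme.
    X: (m, d) array of points. Returns (center, radius, core indices)."""
    core = [0]                    # start the core set from an arbitrary point
    c = X[0].astype(float)
    for t in range(1, max_iter + 1):
        d2 = np.sum((X - c) ** 2, axis=1)
        far = int(np.argmax(d2))             # farthest point from the center
        r = np.sqrt(d2[core].max())          # current radius over the core set
        if np.sqrt(d2[far]) <= (1.0 + eps) * r:
            break                            # every point is (1+eps)-covered
        core.append(far)
        # Simple incremental center update; CVM instead re-solves a
        # small QP over the core set in feature space.
        c += (X[far] - c) / (t + 1)
    r = np.sqrt(np.sum((X[core] - c) ** 2, axis=1).max())
    return c, r, core
```

For example, `c, r, core = approx_meb(np.random.randn(100000, 10))` yields a core set whose size depends on eps but not on m, which is the source of the m-independent space complexity claimed above.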

1,017 citations


Proceedings ArticleDOI
27 Dec 2005
TL;DR: It is shown that RCA can also be kernelized, which yields significant improvements when nonlinearities are needed and makes the method applicable to distance metric learning for structured objects that have no natural vectorial representation.
Abstract: Defining a good distance measure between patterns is of crucial importance in many classification and clustering algorithms. Recently, relevant component analysis (RCA) was proposed as a simple yet powerful method for learning this distance metric. However, it is confined to linear transforms in the input space. In this paper, we show that RCA can also be kernelized, which results in significant improvements when nonlinearities are needed. Moreover, it becomes applicable to distance metric learning for structured objects that have no natural vectorial representation. In addition, it can be used in an incremental setting. Performance of this kernel method is evaluated on both toy and real-world data sets with encouraging results.
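For reference, here is a minimal NumPy sketch of linear RCA, the method the paper kernelizes: estimate the within-chunklet covariance and use its inverse square root as a whitening transform. Function and variable names are illustrative.

```python
import numpy as np

def rca_transform(chunklets):
    """Linear RCA: learn a whitening transform from chunklets
    (small groups of points known to share a class).
    Returns W such that distances are measured as ||W @ (x - y)||."""
    centered = []
    for X in chunklets:                        # X: (n_j, d) array
        centered.append(X - X.mean(axis=0, keepdims=True))
    Z = np.vstack(centered)
    C = Z.T @ Z / len(Z)                       # within-chunklet covariance
    # Inverse square root via eigendecomposition (C is symmetric PSD).
    w, V = np.linalg.eigh(C)
    w = np.maximum(w, 1e-12)                   # guard against singularity
    return V @ np.diag(w ** -0.5) @ V.T
```

Usage: with `W = rca_transform(chunklets)`, the learned metric is `np.linalg.norm(W @ (x - y))`. The kernelized version in the paper replaces this explicit input-space transform with one expressed through kernel evaluations.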

49 citations


Proceedings ArticleDOI
07 Aug 2005
TL;DR: The recently proposed Core Vector Machine algorithm is extended to the regression setting by generalizing the underlying minimum enclosing ball problem, yielding the Core Vector Regression (CVR) algorithm.
Abstract: In this paper, we extend the recently proposed Core Vector Machine algorithm to the regression setting by generalizing the underlying minimum enclosing ball problem. The resultant Core Vector Regression (CVR) algorithm can be used with any linear or nonlinear kernel and obtains provably approximately optimal solutions. Its asymptotic time complexity is linear in the number of training patterns m, while its space complexity is independent of m. Experiments show that CVR has performance comparable to SVR, but is much faster and produces far fewer support vectors on very large data sets. It is also successfully applied to large 3D point sets in computer graphics for the modeling of implicit surfaces.
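CVR itself solves a generalized MEB quadratic program; as a rough, simplified illustration of the core-set pattern in a regression setting, the sketch below greedily grows a core set by the largest ε-insensitive residual and refits a standard SVR (scikit-learn) on the core set only. This is a hypothetical stand-in for exposition, not the paper's algorithm.

```python
import numpy as np
from sklearn.svm import SVR

def coreset_svr(X, y, eps=0.1, budget=500):
    """Illustrative core-set regression: repeatedly add the point with
    the largest eps-insensitive residual and refit on the core set only.
    (CVR instead solves a generalized minimum-enclosing-ball QP.)"""
    core = list(np.random.choice(len(X), size=5, replace=False))
    model = SVR(kernel="rbf", epsilon=eps)
    while len(core) < budget:
        model.fit(X[core], y[core])            # train on the core set only
        resid = np.abs(model.predict(X) - y)   # residuals on all m points
        worst = int(np.argmax(resid))
        if resid[worst] <= eps or worst in core:
            break                              # all points fit within the tube
        core.append(worst)
    return model, core
```

The point of the pattern is that each refit touches only the (small) core set, so the per-iteration cost does not grow with the full training set size.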

48 citations


Proceedings Article
01 Jan 2005
TL;DR: This paper scales up kernel methods by exploiting the "approximateness" in practical SVM implementations: many kernel methods are formulated as equivalent minimum enclosing ball problems in computational geometry, and provably approximately optimal solutions are then obtained efficiently with the use of core sets.
Abstract: Standard SVM training has O(m³) time and O(m²) space complexities, where m is the training set size. In this paper, we scale up kernel methods by exploiting the "approximateness" in practical SVM implementations. We formulate many kernel methods as equivalent minimum enclosing ball problems in computational geometry, and then obtain provably approximately optimal solutions efficiently with the use of core sets. Our proposed Core Vector Machine (CVM) algorithm has a time complexity that is linear in m and a space complexity that is independent of m. Experiments on large toy and real-world data sets demonstrate that the CVM is much faster and can handle much larger data sets than existing scale-up methods. In particular, on our PC with only 512MB of RAM, the CVM with the Gaussian kernel can process the checkerboard data set with 1 million points in less than 13 seconds.
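With a nonlinear kernel, the farthest-point step of the core-set scheme cannot touch feature vectors directly; the squared distance from φ(x) to a ball center c = Σᵢ αᵢ φ(xᵢ) expands entirely into kernel evaluations. A minimal sketch of that expansion (names illustrative):

```python
import numpy as np

def feature_space_dist2(k, x, core_X, alpha, K_core=None):
    """Squared feature-space distance ||phi(x) - c||^2 to the center
    c = sum_i alpha_i * phi(core_X[i]), via the kernel trick:
      k(x, x) - 2 * sum_i alpha_i k(core_X[i], x)
              + sum_{i,j} alpha_i alpha_j k(core_X[i], core_X[j])."""
    if K_core is None:   # Gram matrix of the core set; cache it in practice
        K_core = np.array([[k(a, b) for b in core_X] for a in core_X])
    kx = np.array([k(xi, x) for xi in core_X])
    return k(x, x) - 2.0 * alpha @ kx + alpha @ K_core @ alpha
```

For example, with `rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))`, scanning `feature_space_dist2(rbf, x, core_X, alpha, K_core)` over all points identifies the farthest point in feature space without ever forming φ explicitly.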

39 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: This paper proposes and investigates a cost-effective, distributed algorithm for accurately estimating nodal positions in wireless sensor networks; the algorithm is shown to achieve fast convergence with low estimation error, even for large networks.
Abstract: In wireless sensor networks, estimating nodal positions is important for routing efficiency and location-based services. Traditional techniques based on precise measurements are often expensive and power-inefficient, while approaches based on landmarks often require bandwidth-inefficient flooding and hence are not scalable for large networks. In this paper, we propose and investigate a cost-effective and distributed algorithm to accurately estimate nodal positions for wireless sensor networks. In our algorithm, a node only needs to identify and exchange information with a certain number of neighbors (around 30) in its proximity in order to estimate its relative position accurately. For location identification, only a small number of nodes (around 10) need additional GPS capabilities to accurately estimate the absolute position of every node in the network. Our algorithm is shown to achieve fast convergence with low estimation error, even for large networks.
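As a generic illustration of neighbor-based position refinement (not the paper's exact algorithm), the sketch below has each node nudge its estimate to better match the measured distances to its neighbors, with GPS-equipped anchor nodes held fixed. All names and the gradient scheme are assumptions made for exposition.

```python
import numpy as np

def refine_positions(pos, edges, dists, anchors, lr=0.1, iters=200):
    """Generic iterative refinement of 2D position estimates.
    pos: (n, 2) initial guesses; edges: list of (i, j) neighbor pairs;
    dists: measured distance per edge; anchors: node ids with known
    (GPS) positions, held fixed. Not the paper's exact algorithm."""
    pos = pos.astype(float).copy()
    anchors = set(anchors)
    for _ in range(iters):
        grad = np.zeros_like(pos)
        for (i, j), d in zip(edges, dists):
            delta = pos[i] - pos[j]
            cur = np.linalg.norm(delta) + 1e-12
            # Gradient of (||p_i - p_j|| - d)^2 with respect to p_i.
            g = 2.0 * (cur - d) * delta / cur
            grad[i] += g
            grad[j] -= g
        for node in anchors:
            grad[node] = 0.0            # anchors stay at their GPS positions
        pos -= lr * grad
    return pos
```

In a distributed deployment each node would apply only its own row of this update using locally exchanged neighbor estimates, which matches the abstract's point that roughly 30 neighbors per node suffice.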

29 citations