
Showing papers by "Kai-Wei Chang" published in 2010


Journal Article
TL;DR: Fast linear-SVM methods are applied to the explicit form of polynomially mapped data, and the resulting approach improves testing accuracy in a natural language processing (NLP) application under training/testing speed requirements.
Abstract: Kernel techniques have long been used in SVM to handle linearly inseparable problems by transforming data to a high-dimensional space, but training and testing large data sets is often time-consuming. In contrast, we can efficiently train and test much larger data sets using linear SVM without kernels. In this work, we apply fast linear-SVM methods to the explicit form of polynomially mapped data and investigate implementation issues. The approach enjoys fast training and testing, yet may sometimes achieve accuracy close to that of using highly nonlinear kernels. Empirical experiments show that the proposed method is useful for certain large-scale data sets. We successfully apply the proposed method to a natural language processing (NLP) application, improving testing accuracy under training/testing speed requirements.

486 citations
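
As an editorial illustration, here is a minimal sketch of the explicit-mapping idea, written with scikit-learn rather than the authors' LIBLINEAR-based implementation; the synthetic dataset and all parameter values are illustrative assumptions, not the paper's settings.

# Minimal sketch: expand features with an explicit degree-2 polynomial
# mapping, then train an ordinary linear SVM on the expanded data.
# scikit-learn stands in for the authors' LIBLINEAR implementation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A degree-2 mapping turns d features into O(d^2) products x_i * x_j,
# so the expansion is explicit and stays feasible when d is modest.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearSVC(C=1.0, max_iter=10000),
)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))

Because the mapped features are materialized, training and prediction use only linear-SVM machinery and avoid kernel evaluations at test time.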


Journal Article
TL;DR: Extensive comparisons indicate that carefully implemented coordinate descent methods are very suitable for training large document data.
Abstract: Large-scale linear classification is widely used in many areas. The L1-regularized form can be applied for feature selection; however, its non-differentiability makes training more difficult. Although various optimization methods have been proposed in recent years, they have not yet been compared thoroughly. In this paper, we first broadly review existing methods. Then, we discuss state-of-the-art software packages in detail and propose two efficient implementations. Extensive comparisons indicate that carefully implemented coordinate descent methods are very suitable for training large document data.

273 citations
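
A minimal sketch of the setting the paper studies, assuming scikit-learn: solver="liblinear" invokes a coordinate-descent-style method for the L1-regularized problem, and the weights it drives to exactly zero are what make feature selection possible. The dataset and the C value here are illustrative.

# Minimal sketch: L1-regularized logistic regression on sparse document
# data; solver="liblinear" uses a coordinate-descent-style method.
import numpy as np
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.linear_model import LogisticRegression

data = fetch_20newsgroups_vectorized(subset="train")  # sparse document features
X = data.data
y = (data.target == 0).astype(int)  # one-vs-rest task for class 0

clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, y)

# L1 regularization drives most weights exactly to zero; the surviving
# nonzero weights are the selected features.
nonzero = np.count_nonzero(clf.coef_)
print(f"{nonzero} of {clf.coef_.size} features have nonzero weight")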


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This work proposes and analyzes a block minimization framework for data larger than the memory size, and investigates two implementations of the proposed framework for primal and dual SVMs, respectively.
Abstract: Recent advances in linear classification have shown that for applications such as document classification, training can be extremely efficient. However, most existing training methods are designed under the assumption that the data fit in computer memory. These methods cannot be easily applied to data larger than the memory capacity because they require random access to data on disk. We propose and analyze a block minimization framework for data larger than the memory size. At each step, a block of data is loaded from disk and handled by a learning method. We investigate two implementations of the proposed framework, for primal and dual SVMs respectively. Because the data cannot fit in memory, many design considerations differ from those of traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.

42 citations
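
The sketch below illustrates the block-wise pattern the abstract describes, not the paper's actual solver: it assumes the training set has already been split into LIBSVM-format block files (the names block_0.svm ... block_4.svm are hypothetical), and uses SGDClassifier.partial_fit as a stand-in for the paper's warm-started LIBLINEAR sub-problem solvers.

# Minimal sketch of block minimization: stream one block of data from
# disk at a time and update a single persistent model with it.
from sklearn.datasets import load_svmlight_file
from sklearn.linear_model import SGDClassifier

block_files = [f"block_{i}.svm" for i in range(5)]  # hypothetical pre-split blocks
clf = SGDClassifier(loss="hinge", alpha=1e-4)       # hinge loss ~ linear SVM

for epoch in range(10):          # outer passes over all blocks
    for path in block_files:
        # Only this block is in memory; n_features is fixed so blocks
        # parsed separately share one feature space.
        X, y = load_svmlight_file(path, n_features=2**20)
        # Weights persist across calls, so each block refines the same
        # model rather than restarting it (labels assumed to be 0/1).
        clf.partial_fit(X, y, classes=[0, 1])

The paper's implementations additionally compress and shuffle the blocks and solve each block's sub-problem with LIBLINEAR, but the load-a-block, update, move-on loop is the core idea.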


