
Showing papers by "Ethem Alpaydin published in 2011"


Journal Article
TL;DR: Overall, using multiple kernels instead of a single one is useful; combining kernels in a nonlinear or data-dependent way seems more promising than linear combination when fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to different notions of similarity or may use information coming from multiple sources (different representations or different feature subsets). To organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there are differences between them in complexity, as given by the number of stored support vectors, in the sparsity of the solution, as given by the number of used kernels, and in training time complexity. We see that, overall, using multiple kernels instead of a single one is useful, and we believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination when fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
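To make the fixed-weight linear combination concrete, here is a minimal sketch (not from the paper; all function names are illustrative) of combining a linear and a Gaussian kernel matrix with preset weights, assuming NumPy:

```python
import numpy as np

def linear_kernel(X, Y):
    """Linear kernel matrix: K[i, j] = <x_i, y_j>."""
    return X @ Y.T

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def combine_kernels(kernel_matrices, weights):
    """Fixed-weight linear combination: K = sum_m eta_m * K_m.

    With nonnegative weights, the result is again a valid (PSD) kernel.
    """
    return sum(w * K for w, K in zip(weights, kernel_matrices))

X = np.random.RandomState(0).randn(5, 3)
K = combine_kernels([linear_kernel(X, X), rbf_kernel(X, X, gamma=0.5)],
                    weights=[0.3, 0.7])
```

Learned (rather than fixed) weights, and nonlinear or data-dependent combination rules, are precisely what the surveyed algorithms differ on.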

1,762 citations


Journal ArticleDOI
TL;DR: A new strategy for reducing LDA to Hotelling's canonical correlation analysis (CCA), called within-class coupling CCA (WCCCA), is proposed: CCA is applied to pairs of data samples that are most likely to belong to the same class.

26 citations


Book ChapterDOI
17 Jan 2011
TL;DR: The multivariate tests are shown to have higher power than the univariate error test, that is, they can detect differences that the error test cannot; it is also shown how multivariate or univariate pairwise tests can be used as post-hoc tests after MANOVA to find cliques of algorithms, or to order them along separate dimensions.
Abstract: The misclassification error, which is usually used in tests to compare classification algorithms, does not make a distinction between the sources of error, namely false positives and false negatives. Instead of summing these in a single number, we propose to collect multivariate statistics and use multivariate tests on them. Information retrieval uses the measures of precision and recall, and signal detection uses the true positive rate (tpr) and false positive rate (fpr); a multivariate test can likewise use such a pair of values instead of combining them in a single value, such as error or average precision. For example, we can have bivariate tests for (precision, recall) or (tpr, fpr). We propose to use the pairwise test based on Hotelling's multivariate T2 test to compare two algorithms, or multivariate analysis of variance (MANOVA) to compare L > 2 algorithms. In our experiments, we show that the multivariate tests have higher power than the univariate error test, that is, they can detect differences that the error test cannot, and we also discuss how the decisions made by different multivariate tests differ, to point out where to use which. We also show how multivariate or univariate pairwise tests can be used as post-hoc tests after MANOVA to find cliques of algorithms, or to order them along separate dimensions.
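The proposed bivariate pairwise test can be sketched as follows (an illustrative implementation, not the authors' code; the helper name is hypothetical), assuming per-fold (tpr, fpr) vectors from two algorithms, NumPy, and SciPy:

```python
import numpy as np
from scipy import stats

def paired_hotelling_t2(A, B):
    """Paired Hotelling's T^2 test on per-fold bivariate statistics.

    A, B : arrays of shape (n_folds, 2), e.g. (tpr, fpr) per fold.
    Tests H0: the mean difference vector between the two algorithms is zero.
    """
    D = A - B                                  # per-fold difference vectors
    n, p = D.shape
    d_bar = D.mean(axis=0)
    S = np.cov(D, rowvar=False)                # covariance of the differences
    T2 = n * d_bar @ np.linalg.solve(S, d_bar)
    F = (n - p) / (p * (n - 1)) * T2           # T^2 maps to F(p, n - p)
    return T2, F, stats.f.sf(F, p, n - p)

rng = np.random.RandomState(0)
A = rng.rand(10, 2)                            # algorithm 1: (tpr, fpr) per fold
B = A + 0.1 + rng.normal(0, 0.01, (10, 2))     # algorithm 2: shifted copy
T2, F, pval = paired_hotelling_t2(A, B)        # small p-value: reject H0
```

Because the test looks at the whole difference vector, it can flag a difference even when the two error sources cancel out in the scalar misclassification error.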

25 citations


Journal ArticleDOI
TL;DR: This work compares multiple kernel learning and the proposed regularized variant in terms of accuracy, support vector count, and the number of kernels selected, and shows that the proposed variant achieves statistically similar or higher accuracy while using fewer kernel functions and/or support vectors through suitable regularization.

15 citations


Book ChapterDOI
01 Jan 2011
TL;DR: This paper compares the algorithm with the standard ECOC approach, using Neural Networks (NNs) as the base classifiers, and shows that it improves the accuracy for some well-known data sets under different settings.
Abstract: Error Correcting Output Coding (ECOC) is a multiclass classification technique, in which multiple base classifiers (dichotomizers) are trained using subsets of the training data, determined by a preset code matrix. While it is one of the best solutions to multiclass problems, ECOC is suboptimal, as the code matrix and the base classifiers are not learned simultaneously. In this paper, we show an iterative update algorithm that reduces this decoupling. We compare the algorithm with the standard ECOC approach, using Neural Networks (NNs) as the base classifiers, and show that it improves the accuracy for some well-known data sets under different settings.
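For reference, ECOC decoding with a preset code matrix can be sketched as follows (an illustrative example, not the paper's iterative update algorithm; assuming NumPy):

```python
import numpy as np

def ecoc_decode(code_matrix, outputs):
    """Assign each sample to the class whose code row matches best.

    code_matrix : (n_classes, n_dichotomizers) matrix of +/-1 entries.
    outputs     : (n_samples, n_dichotomizers) real-valued dichotomizer outputs.
    Inner-product decoding; with outputs thresholded to +/-1 this is
    equivalent to minimizing Hamming distance to a code row.
    """
    return np.argmax(outputs @ code_matrix.T, axis=1)

# One-vs-all code matrix for 3 classes (rows: classes, columns: dichotomizers)
M = np.array([[+1, -1, -1],
              [-1, +1, -1],
              [-1, -1, +1]])

# Dichotomizer outputs for two test samples
out = np.array([[0.9, -0.8, -0.7],
                [-0.6, -0.9, 0.8]])
pred = ecoc_decode(M, out)   # → array([0, 2])
```

The paper's contribution is to iteratively update the code matrix together with the NN dichotomizers, rather than keeping a matrix like `M` fixed in advance.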

10 citations


Journal ArticleDOI
01 May 2011
TL;DR: This paper discusses learning algorithms together with some example applications, as well as the current challenges and research areas in machine learning.
Abstract: Machine learning is already a mature field with significant theoretical work and an impressive suite of applications. I will discuss learning algorithms together with some example applications, as well as the current challenges and research areas. WIREs Comp Stat 2011, 3: 195-203. DOI: 10.1002/wics.166

8 citations


Journal ArticleDOI
TL;DR: Two kinds of classifier systems are investigated that can estimate how much to weight each base classifier dynamically during the calculation of the overall output for a given test instance. The study shows that, with a well-trained selection unit (referee or gating), one can match or even improve the accuracy obtained by using all the base classifiers, with a drastic decrease in the number of base classifiers used.
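The dynamic-weighting idea can be sketched as follows (an illustrative example, not the paper's architecture; the softmax gating and all names are assumptions), with NumPy:

```python
import numpy as np

def gated_ensemble(base_outputs, gate_scores):
    """Weight each base classifier's output per instance and combine.

    base_outputs : (n_samples, n_classifiers, n_classes) class scores.
    gate_scores  : (n_samples, n_classifiers) raw gating scores, softmax-
                   normalized here so the weights sum to one per instance.
    Returns the predicted class index for each instance.
    """
    w = np.exp(gate_scores - gate_scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    combined = (w[:, :, None] * base_outputs).sum(axis=1)
    return np.argmax(combined, axis=1)

# Two instances, two base classifiers, three classes
outs = np.array([[[0.8, 0.1, 0.1],    # classifier 0 favors class 0
                  [0.1, 0.1, 0.8]],   # classifier 1 favors class 2
                 [[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]]])
gates = np.array([[5.0, 0.0],         # instance 0: trust classifier 0
                  [0.0, 5.0]])        # instance 1: trust classifier 1
pred = gated_ensemble(outs, gates)    # → array([0, 2])
```

When the gate assigns near-zero weight to a classifier for a given instance, that classifier's output need not be computed at all, which is where the reduction in the number of base classifiers used comes from.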

8 citations


Proceedings ArticleDOI
01 Jan 2011
TL;DR: This paper investigates various approaches to data sampling and dimensionality reduction for discriminative language models (DLM), exploits the ranking perceptron and ranking SVM as two alternative discriminative modeling techniques, and applies data sampling to improve their efficiency.
Abstract: This paper investigates various approaches to data sampling and dimensionality reduction for discriminative language models (DLM). As a feature-based language modeling approach, DLM aims to rerank the ASR output with discriminatively trained feature parameters. Using a Turkish morphology based feature set, we examine the use of online Principal Component Analysis (PCA) as a dimensionality reduction method. We exploit the ranking perceptron and ranking SVM as two alternative discriminative modeling techniques, and apply data sampling to improve their efficiency. We obtain a reduction in word error rate (WER) of 0.4%, significant at p < 0.001, over the baseline perceptron result.
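A ranking-perceptron update over N-best lists, the first of the two discriminative techniques, can be sketched as follows (a simplified illustration, not the paper's exact training procedure; assuming NumPy):

```python
import numpy as np

def ranking_perceptron_epoch(w, nbest_feats, oracle_idx, lr=1.0):
    """One epoch of ranking-perceptron updates over N-best lists.

    nbest_feats : list of (n_hypotheses, n_features) arrays, one per utterance.
    oracle_idx  : index of the lowest-WER hypothesis in each list.
    Moves w toward the oracle's features and away from the currently
    top-ranked hypothesis whenever the two differ.
    """
    for feats, oracle in zip(nbest_feats, oracle_idx):
        best = int(np.argmax(feats @ w))        # current top-ranked hypothesis
        if best != oracle:
            w = w + lr * (feats[oracle] - feats[best])
    return w

feats = np.array([[0.0, 1.0],                   # hypothesis 0 (initially ranked first)
                  [1.0, 0.0]])                  # hypothesis 1 (oracle: lowest WER)
w = ranking_perceptron_epoch(np.zeros(2), [feats], oracle_idx=[1])
# after the update, the oracle hypothesis is ranked first
```

Data sampling, as studied in the paper, would reduce the number of (utterance, hypothesis) pairs fed into such an update loop to make training cheaper.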

3 citations