
Showing papers by "Ethem Alpaydin published in 2009"


Journal ArticleDOI
TL;DR: An incremental ensemble achieves higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost while using fewer classifiers.
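This entry concerns incremental construction of classifier ensembles. As a rough illustration of the general idea only, the sketch below greedily adds a classifier from a candidate pool when it improves majority-vote accuracy on a held-out validation set; the pool, the data set, and the add-if-it-improves rule are assumptions for illustration, not the paper's exact algorithm.

```python
# Illustrative sketch of incremental ensemble construction: greedily add a
# classifier from a candidate pool only if it improves validation accuracy.
# This is a hedged reconstruction of the general idea, not the paper's method.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pool (hypothetical choice of base learners).
pool = [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (1, 3, 5)]
pool += [GaussianNB(), KNeighborsClassifier(n_neighbors=5)]
for clf in pool:
    clf.fit(X_tr, y_tr)

def vote_accuracy(ensemble):
    """Majority-vote accuracy of an ensemble on the validation set."""
    votes = np.mean([clf.predict(X_val) for clf in ensemble], axis=0)
    return np.mean((votes >= 0.5).astype(int) == y_val)

ensemble, best_acc = [], 0.0
improved = True
while improved and pool:
    improved = False
    # Try each remaining candidate; keep the one giving the largest gain.
    acc, idx = max((vote_accuracy(ensemble + [c]), i) for i, c in enumerate(pool))
    if acc > best_acc:
        ensemble.append(pool.pop(idx))
        best_acc, improved = acc, True

print(f"{len(ensemble)} classifiers, validation accuracy {best_acc:.3f}")
```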

74 citations


Journal ArticleDOI
TL;DR: The MOST (Multiple Operators using Statistical Tests) framework incrementally modifies MLP structure and checks for improvement using cross-validation; its variants generally find simpler networks with error rates lower than or comparable to those of DNC and CC.
Abstract: We define the problem of optimizing the architecture of a multilayer perceptron (MLP) as a state space search and propose the MOST (Multiple Operators using Statistical Tests) framework that incrementally modifies the structure and checks for improvement using cross-validation. We consider five variants that implement forward/backward search, using single/multiple operators and searching depth-first/breadth-first. On 44 classification and 30 regression datasets, we exhaustively search for the optimal architecture and evaluate the goodness based on: (1) Order, the accuracy with respect to the optimal, and (2) Rank, the computational complexity. We check for the effect of two resampling methods (5 × 2, ten-fold cv), four statistical tests (5 × 2 cv t, ten-fold cv t, Wilcoxon, sign) and two corrections for multiple comparisons (Bonferroni, Holm). We also compare with Dynamic Node Creation (DNC) and Cascade Correlation (CC). Our results show that: (1) on most datasets, networks with few hidden units are optimal, (2) forward searching finds simpler architectures, (3) variants using single node additions (deletions) generally stop early and get stuck in simple (complex) networks, (4) choosing the best of multiple operators finds networks closer to the optimal, (5) MOST variants generally find simpler networks with error rates lower than or comparable to those of DNC and CC.
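A minimal sketch of the forward-search flavor of this idea follows, assuming scikit-learn's MLPClassifier, 10-fold cross-validation, a fixed step of two hidden units, and a paired t-test at the 0.05 level; the actual MOST framework uses multiple operators, 5 × 2 cv tests, and corrections for multiple comparisons that are not reproduced here.

```python
# Simplified forward architecture search in the spirit of MOST: add hidden
# units one step at a time and keep the larger network only if a paired
# t-test over cross-validation folds says it is significantly better.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

def cv_scores(n_hidden):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500, random_state=0)
    return cross_val_score(clf, X, y, cv=10)          # 10-fold cv accuracies

current, scores = 2, cv_scores(2)                     # start from a small network
while True:
    candidate = current + 2                           # forward operator: add 2 units
    cand_scores = cv_scores(candidate)
    t, p = ttest_rel(cand_scores, scores)             # paired test over folds
    if cand_scores.mean() > scores.mean() and p < 0.05:
        current, scores = candidate, cand_scores      # significant improvement: accept
    else:
        break                                         # stop when no significant gain

print(f"selected {current} hidden units, cv accuracy {scores.mean():.3f}")
```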

26 citations


Journal ArticleDOI
TL;DR: The overall accuracy of regression is not better than that of classification, but regression produces fewer false positives, especially when combined with the reject option; in both early and late integration, combining inputs or decisions is useful for increasing accuracy.
Abstract: Background: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the amount of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high. Results: We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615 is used in previous studies, (2) S2783 is the updated version (as of July 2, 2009), also extracted from ProTherm. For the S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For the S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change, and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration. Conclusion: We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. The overall accuracy of regression is not better than that of classification, but it has fewer false positives, especially when combined with the reject option. The server for stability prediction for the three integration approaches and the data sets are available at
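The reject option mentioned above can be illustrated with a simple confidence threshold on class posteriors. In the sketch below, the threshold value, the base classifier, and the data set are assumptions; the paper derives the rejection rule from the misclassification risk rather than a fixed cutoff.

```python
# Illustrative reject option: predict only when the estimated posterior is
# confident enough, otherwise abstain. Threshold and classifier are assumed.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)                  # class posteriors P(C_k | x)

threshold = 0.90                                 # assumed confidence threshold
confident = proba.max(axis=1) >= threshold       # accept only confident cases
pred = proba.argmax(axis=1)

acc_on_accepted = (pred[confident] == y_te[confident]).mean()
print(f"rejected {1 - confident.mean():.1%} of cases, "
      f"accuracy on accepted: {acc_on_accepted:.3f}")
```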

24 citations


Proceedings ArticleDOI
23 Oct 2009
TL;DR: An exhaustive search algorithm calculates the VC-dimension of univariate decision trees with binary features; SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
Abstract: We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of the univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to get VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
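A hedged sketch of how such VC-dimension estimates could drive SRM-based pruning follows: compare a Vapnik-style generalization bound for a subtree against the bound for the leaf that would replace it, and prune when the leaf's bound is no worse. The bound's form is the standard textbook one and the placeholder VC-dimension regressor is an assumption, not the paper's fitted model.

```python
# SRM-style pruning sketch: a subtree (higher VC-dimension, lower training
# error) is replaced by a leaf when the leaf's generalization bound is no
# worse. The VC-dimension "regressor" below is only a stand-in.
import math

def vc_bound(train_err, h, n, delta=0.05):
    """Standard Vapnik-style upper bound on generalization error."""
    eps = math.sqrt((h * (math.log(2 * n / h) + 1) + math.log(4 / delta)) / n)
    return train_err + eps

def estimated_vc_dim(n_nodes, n_features):
    """Placeholder for the paper's regressor over (tree size, #inputs)."""
    return n_nodes * math.ceil(math.log2(n_features) + 1)   # rough assumption

# Hypothetical numbers: a 7-node subtree vs. collapsing it to a single leaf.
n, n_features = 1000, 20
subtree_bound = vc_bound(train_err=0.08, h=estimated_vc_dim(7, n_features), n=n)
leaf_bound    = vc_bound(train_err=0.12, h=1, n=n)

print("prune" if leaf_bound <= subtree_bound else "keep subtree")
```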

21 citations


01 Jan 2009
TL;DR: $f(x) = \sum_{m=1}^{P} \eta_m \sum_{i=1}^{n} \alpha_i y_i \underbrace{\langle \Phi_m(x), \Phi_m(x_i) \rangle}_{K_m(x, x_i)} + b$; unweighted sum: $\eta_m = 1 \ \forall m$; weighted sum: $\sum_{m=1}^{P} \eta_m = 1$ and $\eta_m \geq 0 \ \forall m$.
Abstract: Multiple kernel learning (MKL) uses a convex combination of kernels where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. The localized multiple kernel learning (LMKL) framework extends MKL to allow combining kernels with different weights in different regions of the input space by using a gating model. LMKL extracts the relative importance of kernels in each region, whereas MKL gives their relative importance over the whole input space. In this paper, we generalize the LMKL framework with a kernel-based gating model and derive the learning algorithm for binary classification. Empirical results on toy classification problems are used to illustrate the algorithm. Experiments on two bioinformatics data sets are performed to show that kernel machines can also be localized in a data-dependent way by using kernel values as gating model features. The localized variant achieves significantly higher accuracy on one of the bioinformatics data sets.
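To make the localized combination concrete, the sketch below forms a locally combined kernel K(x_i, x_j) = sum_m eta_m(x_i) K_m(x_i, x_j) eta_m(x_j) with a softmax gating model whose features are kernel values, in the spirit of the kernel-based gating described above. The gating parameters are random placeholders and the alternating training procedure (solving the SVM and updating the gating model) is omitted, so this is only a structural illustration.

```python
# Sketch of the LMKL combination with a softmax, kernel-based gating model.
# Gating parameters are placeholders; training is not shown.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                      # toy data

def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

kernels = [linear_kernel(X, X), rbf_kernel(X, X)]  # P = 2 kernels

# Kernel-based gating: the gating features are the kernel values themselves.
G = np.hstack(kernels)                                        # shape (n, n*P)
V = rng.normal(scale=0.01, size=(G.shape[1], len(kernels)))   # placeholder params

def gating(G, V):
    """Softmax gating: eta[i, m] = weight of kernel m at data point i."""
    scores = G @ V
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

eta = gating(G, V)                                 # shape (n, P)
K_combined = sum(np.outer(eta[:, m], eta[:, m]) * K_m
                 for m, K_m in enumerate(kernels))
print(K_combined.shape)                            # (50, 50), feed to an SVM solver
```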

12 citations