
Showing papers by "Ethem Alpaydin published in 2009"


Journal ArticleDOI
TL;DR: An incremental ensemble achieves higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost while using fewer classifiers.
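This entry concerns incremental construction of classifier ensembles. As a rough illustration of the general idea only, the sketch below greedily adds a classifier from a candidate pool when it improves majority-vote accuracy on a held-out validation set; the pool, the data set, and the add-if-it-improves rule are assumptions for illustration, not the paper's exact algorithm.

```python
# Illustrative sketch of incremental ensemble construction: greedily add a
# classifier from a candidate pool only if it improves validation accuracy.
# This is a hedged reconstruction of the general idea, not the paper's method.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pool (hypothetical choice of base learners).
pool = [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (1, 3, 5)]
pool += [GaussianNB(), KNeighborsClassifier(n_neighbors=5)]
for clf in pool:
    clf.fit(X_tr, y_tr)

def vote_accuracy(ensemble):
    """Majority-vote accuracy of an ensemble on the validation set."""
    votes = np.mean([clf.predict(X_val) for clf in ensemble], axis=0)
    return np.mean((votes >= 0.5).astype(int) == y_val)

ensemble, best_acc = [], 0.0
improved = True
while improved and pool:
    improved = False
    # Try each remaining candidate; keep the one giving the largest gain.
    acc, idx = max((vote_accuracy(ensemble + [c]), i) for i, c in enumerate(pool))
    if acc > best_acc:
        ensemble.append(pool.pop(idx))
        best_acc, improved = acc, True

print(f"{len(ensemble)} classifiers, validation accuracy {best_acc:.3f}")
```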

74 citations


Journal ArticleDOI
TL;DR: The MOST (Multiple Operators using Statistical Tests) framework incrementally modifies MLP structure and checks for improvement using cross-validation; its variants generally find simpler networks with error rates lower than or comparable to those of DNC and CC.
Abstract: We define the problem of optimizing the architecture of a multilayer perceptron (MLP) as a state space search and propose the MOST (Multiple Operators using Statistical Tests) framework that incrementally modifies the structure and checks for improvement using cross-validation. We consider five variants that implement forward/backward search, using single/multiple operators and searching depth-first/breadth-first. On 44 classification and 30 regression datasets, we exhaustively search for the optimal architecture and evaluate the goodness based on: (1) Order, the accuracy with respect to the optimal, and (2) Rank, the computational complexity. We check for the effect of two resampling methods (5 × 2, ten-fold cv), four statistical tests (5 × 2 cv t, ten-fold cv t, Wilcoxon, sign) and two corrections for multiple comparisons (Bonferroni, Holm). We also compare with Dynamic Node Creation (DNC) and Cascade Correlation (CC). Our results show that: (1) on most datasets, networks with few hidden units are optimal, (2) forward searching finds simpler architectures, (3) variants using single node additions (deletions) generally stop early and get stuck in simple (complex) networks, (4) choosing the best of multiple operators finds networks closer to the optimal, (5) MOST variants generally find simpler networks with error rates lower than or comparable to those of DNC and CC.
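A minimal sketch of the forward-search flavor of this idea follows, assuming scikit-learn's MLPClassifier, 10-fold cross-validation, a fixed step of two hidden units, and a paired t-test at the 0.05 level; the actual MOST framework uses multiple operators, 5 × 2 cv tests, and corrections for multiple comparisons that are not reproduced here.

```python
# Simplified forward architecture search in the spirit of MOST: add hidden
# units one step at a time and keep the larger network only if a paired
# t-test over cross-validation folds says it is significantly better.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

def cv_scores(n_hidden):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500, random_state=0)
    return cross_val_score(clf, X, y, cv=10)          # 10-fold cv accuracies

current, scores = 2, cv_scores(2)                     # start from a small network
while True:
    candidate = current + 2                           # forward operator: add 2 units
    cand_scores = cv_scores(candidate)
    t, p = ttest_rel(cand_scores, scores)             # paired test over folds
    if cand_scores.mean() > scores.mean() and p < 0.05:
        current, scores = candidate, cand_scores      # significant improvement: accept
    else:
        break                                         # stop when no significant gain

print(f"selected {current} hidden units, cv accuracy {scores.mean():.3f}")
```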

26 citations


Journal ArticleDOI
TL;DR: The overall accuracy of regression is not better than that of classification, but regression produces fewer false positives, especially when combined with the reject option; in both early and late integration, combining inputs or decisions is useful for increasing accuracy.
Abstract: Background: Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the amount of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high. Results: We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615 is used in previous studies, (2) S2783 is the updated version (as of July 2, 2009), also extracted from ProTherm. For the S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For the S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change, and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration. Conclusion: We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. The overall accuracy of regression is not better than that of classification, but it has fewer false positives, especially when combined with the reject option. The server for stability prediction for the three integration approaches and the data sets are available at
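The reject option mentioned above can be illustrated with a simple confidence threshold on class posteriors. In the sketch below, the threshold value, the base classifier, and the data set are assumptions; the paper derives the rejection rule from the misclassification risk rather than a fixed cutoff.

```python
# Illustrative reject option: predict only when the estimated posterior is
# confident enough, otherwise abstain. Threshold and classifier are assumed.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)                  # class posteriors P(C_k | x)

threshold = 0.90                                 # assumed confidence threshold
confident = proba.max(axis=1) >= threshold       # accept only confident cases
pred = proba.argmax(axis=1)

acc_on_accepted = (pred[confident] == y_te[confident]).mean()
print(f"rejected {1 - confident.mean():.1%} of cases, "
      f"accuracy on accepted: {acc_on_accepted:.3f}")
```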

24 citations


Proceedings ArticleDOI
23 Oct 2009
TL;DR: An exhaustive search algorithm calculates the VC-dimension of univariate decision trees with binary features; SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
Abstract: We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of the univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to get VC-generalization bounds for complexity control using SRM in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
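A hedged sketch of how such VC-dimension estimates could drive SRM-based pruning follows: compare a Vapnik-style generalization bound for a subtree against the bound for the leaf that would replace it, and prune when the leaf's bound is no worse. The bound's form is the standard textbook one and the placeholder VC-dimension regressor is an assumption, not the paper's fitted model.

```python
# SRM-style pruning sketch: a subtree (higher VC-dimension, lower training
# error) is replaced by a leaf when the leaf's generalization bound is no
# worse. The VC-dimension "regressor" below is only a stand-in.
import math

def vc_bound(train_err, h, n, delta=0.05):
    """Standard Vapnik-style upper bound on generalization error."""
    eps = math.sqrt((h * (math.log(2 * n / h) + 1) + math.log(4 / delta)) / n)
    return train_err + eps

def estimated_vc_dim(n_nodes, n_features):
    """Placeholder for the paper's regressor over (tree size, #inputs)."""
    return n_nodes * math.ceil(math.log2(n_features) + 1)   # rough assumption

# Hypothetical numbers: a 7-node subtree vs. collapsing it to a single leaf.
n, n_features = 1000, 20
subtree_bound = vc_bound(train_err=0.08, h=estimated_vc_dim(7, n_features), n=n)
leaf_bound    = vc_bound(train_err=0.12, h=1, n=n)

print("prune" if leaf_bound <= subtree_bound else "keep subtree")
```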

21 citations


01 Jan 2009
TL;DR: $f(x) = \sum_{m=1}^{P} \eta_m \sum_{i=1}^{n} \alpha_i y_i \underbrace{\langle \Phi_m(x), \Phi_m(x_i) \rangle}_{K_m(x, x_i)} + b$; unweighted sum: $\eta_m = 1 \ \forall m$; weighted sum: $\sum_{m=1}^{P} \eta_m = 1$ and $\eta_m \geq 0 \ \forall m$.
Abstract: Multiple kernel learning (MKL) uses a convex combination of kernels where the weight of each kernel is optimized during training. However, MKL assigns the same weight to a kernel over the whole input space. The localized multiple kernel learning (LMKL) framework extends MKL to allow combining kernels with different weights in different regions of the input space by using a gating model. LMKL extracts the relative importance of kernels in each region, whereas MKL gives their relative importance over the whole input space. In this paper, we generalize the LMKL framework with a kernel-based gating model and derive the learning algorithm for binary classification. Empirical results on toy classification problems are used to illustrate the algorithm. Experiments on two bioinformatics data sets are performed to show that kernel machines can also be localized in a data-dependent way by using kernel values as gating model features. The localized variant achieves significantly higher accuracy on one of the bioinformatics data sets.
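To make the localized combination concrete, the sketch below forms a locally combined kernel K(x_i, x_j) = sum_m eta_m(x_i) K_m(x_i, x_j) eta_m(x_j) with a softmax gating model whose features are kernel values, in the spirit of the kernel-based gating described above. The gating parameters are random placeholders and the alternating training procedure (solving the SVM and updating the gating model) is omitted, so this is only a structural illustration.

```python
# Sketch of the LMKL combination with a softmax, kernel-based gating model.
# Gating parameters are placeholders; training is not shown.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                      # toy data

def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

kernels = [linear_kernel(X, X), rbf_kernel(X, X)]  # P = 2 kernels

# Kernel-based gating: the gating features are the kernel values themselves.
G = np.hstack(kernels)                                        # shape (n, n*P)
V = rng.normal(scale=0.01, size=(G.shape[1], len(kernels)))   # placeholder params

def gating(G, V):
    """Softmax gating: eta[i, m] = weight of kernel m at data point i."""
    scores = G @ V
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

eta = gating(G, V)                                 # shape (n, P)
K_combined = sum(np.outer(eta[:, m], eta[:, m]) * K_m
                 for m, K_m in enumerate(kernels))
print(K_combined.shape)                            # (50, 50), feed to an SVM solver
```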

12 citations