
Showing papers on "Statistical learning theory published in 2013"


Journal ArticleDOI
TL;DR: This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the ways in which multiple views are exploited.
Abstract: Multi-view learning, or learning with multiple distinct feature sets, is a rapidly growing direction in machine learning with solid theoretical underpinnings and great practical success. This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the ways in which multiple views are exploited. This survey aims to provide an insightful organization of current developments in the field of multi-view learning, identify their limitations, and give suggestions for further research. One feature of this survey is that we attempt to point out specific open problems which we hope will help promote research on multi-view machine learning.

782 citations


Journal ArticleDOI
TL;DR: A new cost-sensitive algorithm (CSMLP) is presented to improve the discrimination ability of (two-class) MLPs and it is theoretically demonstrated that the incorporation of prior information via the cost parameter may lead to balanced decision boundaries in the feature space.
Abstract: Traditional learning algorithms applied to complex and highly imbalanced training sets may not give satisfactory results when distinguishing between examples of the classes. The tendency is to yield classification models that are biased towards the overrepresented (majority) class. This paper investigates this class imbalance problem in the context of multilayer perceptron (MLP) neural networks. The consequences of the equal cost (loss) assumption on imbalanced data are formally discussed from a statistical learning theory point of view. A new cost-sensitive algorithm (CSMLP) is presented to improve the discrimination ability of (two-class) MLPs. The CSMLP formulation is based on a joint objective function that uses a single cost parameter to distinguish the importance of class errors. The learning rule extends the Levenberg-Marquardt rule, ensuring the computational efficiency of the algorithm. In addition, it is theoretically demonstrated that the incorporation of prior information via the cost parameter may lead to balanced decision boundaries in the feature space. Based on the statistical analysis of results on real data, our approach shows a significant improvement over regular MLPs in the area under the receiver operating characteristic curve and the G-mean measure.
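
As an illustration of the single-cost-parameter idea (not the authors' CSMLP joint objective or its Levenberg-Marquardt update), a minimal NumPy sketch of a cost-weighted squared-error loss for a small two-class MLP might look as follows; the weighting factor lam, the network size and the toy imbalanced data are all assumptions made for the example.

```python
# Hedged sketch: a single cost parameter "lam" re-weights squared errors on the
# minority (+1) vs. majority (-1) class, in the spirit of a cost-sensitive MLP objective.
# This is NOT the CSMLP / Levenberg-Marquardt formulation, just an illustration.
import numpy as np

rng = np.random.default_rng(0)

def forward(X, W1, b1, W2, b2):
    H = np.tanh(X @ W1 + b1)          # hidden layer
    return np.tanh(H @ W2 + b2), H    # output in (-1, 1)

def cost_sensitive_loss(y, y_hat, lam):
    # lam > 1 penalises errors on the minority class (y == +1) more heavily
    w = np.where(y == 1, lam, 1.0)
    return np.mean(w * (y - y_hat.ravel()) ** 2)

# toy imbalanced data: ~95% majority (-1), ~5% minority (+1)
X = rng.normal(size=(400, 2))
y = np.where(rng.random(400) < 0.05, 1.0, -1.0)
X[y == 1] += 2.0                      # shift the minority class

W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lam, lr = 19.0, 0.05                  # e.g. lam ~ (#majority / #minority)

for _ in range(2000):                 # plain gradient descent on the weighted MSE
    y_hat, H = forward(X, W1, b1, W2, b2)
    w = np.where(y == 1, lam, 1.0)[:, None]
    err = w * (y_hat - y[:, None]) * (1 - y_hat ** 2)   # d(loss)/d(pre-activation), up to a constant
    gW2, gb2 = H.T @ err / len(X), err.mean(0)
    dH = (err @ W2.T) * (1 - H ** 2)
    gW1, gb1 = X.T @ dH / len(X), dH.mean(0)
    W1, b1, W2, b2 = W1 - lr * gW1, b1 - lr * gb1, W2 - lr * gW2, b2 - lr * gb2

print("weighted MSE:", cost_sensitive_loss(y, forward(X, W1, b1, W2, b2)[0], lam))
```

Setting lam roughly to the majority/minority ratio is a common heuristic; the paper analyses the effect of the cost parameter on the decision boundary formally.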

195 citations


Book
15 Jun 2013
TL;DR: This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods.
Abstract: This unique text/reference describes in detail the latest advances in unsupervised process monitoring and fault diagnosis with machine learning methods. Abundant case studies throughout the text demonstrate the efficacy of each method in real-world settings. The broad coverage examines such cutting-edge topics as the use of information theory to enhance unsupervised learning in tree-based methods, the extension of kernel methods to multiple kernel learning for feature extraction from data, and the incremental training of multilayer perceptrons to construct deep architectures for enhanced data projections. Topics and features: discusses machine learning frameworks based on artificial neural networks, statistical learning theory and kernel-based methods, and tree-based methods; examines the application of machine learning to steady state and dynamic operations, with a focus on unsupervised learning; describes the use of spectral methods in process fault diagnosis.

104 citations


Journal ArticleDOI
TL;DR: In this article, the generalization performance of online learning algorithms trained on samples coming from a dependent source of data was studied, and it was shown that the generalization error of any stable online algorithm concentrates around its regret, an easily computable statistic of its online performance.
Abstract: We study the generalization performance of online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret (an easily computable statistic of the online performance of the algorithm) when the underlying ergodic process is β- or φ-mixing. We show high-probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory.
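
To make the "easily computable statistic" concrete: the regret of an online algorithm is its cumulative loss minus that of the best fixed predictor in hindsight. A hedged sketch for online gradient descent on the logistic loss is below; the toy data, step size and the batch approximation of the comparator are choices made for illustration and are not taken from the paper.

```python
# Hedged sketch: computing the (normalised) regret of an online algorithm, the statistic
# that the bounds above relate to generalization error. Illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d = 500, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = np.where(rng.random(n) < 1 / (1 + np.exp(-X @ theta_true)), 1.0, -1.0)

def logloss(theta, x, y):
    return np.logaddexp(0.0, -y * (x @ theta))      # log(1 + exp(-y x.theta))

# --- online gradient descent, accumulating the algorithm's loss on the fly ---
theta, eta, online_loss = np.zeros(d), 0.1, 0.0
for t in range(n):
    online_loss += logloss(theta, X[t], y[t])       # suffer loss, then update
    grad = -y[t] * X[t] / (1 + np.exp(y[t] * (X[t] @ theta)))
    theta -= eta / np.sqrt(t + 1) * grad

# --- best fixed predictor in hindsight (batch ERM on the same sequence) ---
best = minimize(lambda th: sum(logloss(th, X[t], y[t]) for t in range(n)),
                np.zeros(d), method="L-BFGS-B")
regret = online_loss - best.fun
print(f"cumulative regret = {regret:.2f}, per-round regret = {regret / n:.4f}")
```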

69 citations


Journal ArticleDOI
TL;DR: A new model integrating the SVR and the ICA for time estimation in NPD projects, in which the ICA is used to tune the parameters of the SVR; results indicate that the presented model achieves high estimation accuracy and leads to effective prediction.
Abstract: Time estimation in new product development (NPD) projects is often a complex problem due to its nonlinearity and the small quantity of data patterns. Support vector regression (SVR) based on statistical learning theory is introduced as a new neural network technique with maximum generalization ability. The SVR has been utilized to solve nonlinear regression problems successfully. However, the applicability of the SVR is highly affected by the difficulty of selecting the SVR parameters appropriately. The imperialist competitive algorithm (ICA), a socio-politically inspired optimization strategy, is employed to solve real-world engineering problems; in contrast to evolutionary algorithms, it is inspired by the competition mechanism among imperialists and colonies. This paper presents a new model integrating the SVR and the ICA for time estimation in NPD projects, in which the ICA is used to tune the parameters of the SVR. A real data set from a case study of an NPD project in a manufacturing industry is presented to demonstrate the performance of the proposed model. In addition, a comparison is provided between the proposed model and conventional techniques, namely nonlinear regression, back-propagation neural networks (BPNN), pure SVR and general regression neural networks (GRNN). The experimental results indicate that the presented model achieves high estimation accuracy and leads to effective prediction. Highlights: proposing a new support vector model to capture data patterns of time intervals; employing the imperialist competitive algorithm to optimize the parameters of the SVR; presenting a real case study in a manufacturing industry in the NPD environment; providing a comparison between the proposed model and conventional techniques.
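
A hedged sketch of the parameter-tuning idea: a simple population-based search (standing in for the ICA, whose imperialist/colony mechanics are not reproduced here) selects (C, epsilon, gamma) of an RBF SVR by cross-validated error. The toy data, population size and move rule are assumptions for illustration only.

```python
# Hedged sketch: tuning SVR hyper-parameters with a very simple population-based search
# standing in for the imperialist competitive algorithm (ICA). The real ICA has
# imperialists, colonies and assimilation moves; only the "population + move towards the
# best" idea is kept here, scored by cross-validated error.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(60, 4))                 # small NPD-like sample
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=60)

def fitness(p):
    C, eps, gamma = np.exp(p)                        # search in log-space
    m = SVR(C=C, epsilon=eps, gamma=gamma)
    return -cross_val_score(m, X, y, cv=5, scoring="neg_mean_squared_error").mean()

pop = rng.uniform(low=[-2, -5, -5], high=[5, 0, 1], size=(20, 3))   # log(C, eps, gamma)
for it in range(30):
    scores = np.array([fitness(p) for p in pop])
    best = pop[scores.argmin()]
    # move every candidate a random fraction towards the current best, plus noise
    pop = pop + rng.uniform(0, 0.5, size=(len(pop), 1)) * (best - pop) \
              + 0.1 * rng.normal(size=pop.shape)

C, eps, gamma = np.exp(best)
print(f"chosen C={C:.3g}, epsilon={eps:.3g}, gamma={gamma:.3g}, CV-MSE={scores.min():.4f}")
```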

61 citations


Journal ArticleDOI
TL;DR: It is proved that even the very best generalized structure-based model is inherently limited in its accuracy, and protein-specific models are always likely to be better.
Abstract: A major goal in computational chemistry has been to discover the set of rules that can accurately predict the binding affinity of any protein-drug complex, using only a single snapshot of its three-dimensional structure. Despite the continual development of structure-based models, predictive accuracy remains low, and the fundamental factors that inhibit the inference of all-encompassing rules have yet to be fully explored. Using statistical learning theory and information theory, here we prove that even the very best generalized structure-based model is inherently limited in its accuracy, and protein-specific models are always likely to be better. Our results refute the prevailing assumption that large data sets and advanced machine learning techniques will yield accurate, universally applicable models. We anticipate that the results will aid the development of more robust virtual screening strategies and scoring function error estimations.

42 citations


Journal ArticleDOI
Wencong Lu1, Xiaobo Ji1, Minjie Li1, Liang Liu1, Baohua Yue1, Liangmiao Zhang1 
TL;DR: Support vector machine (SVM), including support vector classification (SVC) and support vector regression (SVR) based on the statistical learning theory (SLT) proposed by Vapnik, is introduced as a relatively new data mining method to meet the different tasks of materials design in the lab.
Abstract: Materials design is the most important and fundamental work against the background of the Materials Genome Initiative for global competitiveness proposed by the National Science and Technology Council of America. As for the methodologies of materials design, besides thermodynamic and kinetic methods combined with databases, both deductive approaches (the so-called first-principles methods) and inductive approaches based on data mining methods are making great progress because of their successful applications in materials design. In this paper, the support vector machine (SVM), including support vector classification (SVC) and support vector regression (SVR) based on the statistical learning theory (SLT) proposed by Vapnik, is introduced as a relatively new data mining method to meet the different tasks of materials design in our lab. The advantage of using SVM for materials design is discussed based on applications to the formability of perovskite or BaNiO3 structures, the prediction of energy gaps of binary compounds, the prediction of the sintered cold modulus of sialon-corundum castable, the optimization of electric resistances of VPTC semiconductors and the thickness control of In2O3 semiconductor film preparation. The results presented indicate that SVM is an effective modeling tool for small sample sets, with great potential applications in materials design.
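
To illustrate the small-sample setting that motivates SVM here, the following sketch runs leave-one-out cross-validation of an RBF-kernel SVR on a tiny synthetic descriptor table; the descriptors, target and hyperparameters are invented and do not correspond to the materials data sets listed above.

```python
# Hedged sketch: why SVM-type models are attractive for small samples -- leave-one-out
# cross-validation of an RBF-kernel SVR on a toy descriptor table (synthetic numbers,
# not the paper's perovskite / band-gap / castable data).
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(3)
n = 25                                                # small sample size
descriptors = rng.normal(size=(n, 3))                 # e.g. stand-ins for ionic radii, electronegativity, ...
target = 2.0 + descriptors @ np.array([0.8, -0.5, 0.3]) + 0.1 * rng.normal(size=n)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
pred = cross_val_predict(model, descriptors, target, cv=LeaveOneOut())
print("LOO-CV RMSE:", np.sqrt(np.mean((pred - target) ** 2)))
```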

41 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a theoretical analysis for prediction algorithms based on association rules and introduce a problem for which rules are particularly natural, called "sequential event prediction." In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed.
Abstract: We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called "sequential event prediction." In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer's online shopping cart, given his/her past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the "cold start" problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an "adjusted confidence" measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
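
A small sketch of the adjusted-confidence idea described above, in which a constant K added to the denominator of the usual confidence down-weights rules with low left-hand-side support instead of cutting them with a hard minimum-support threshold; the market-basket data and the exact form used here are illustrative, and the paper should be consulted for the precise definition and its Bayesian interpretation.

```python
# Hedged sketch of an "adjusted confidence" score for association rules:
# ordinary confidence #(a and b)/#(a) is shrunk by adding K to the denominator.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter", "apples"},
    {"apples", "bread"},
]

def support(itemset, baskets):
    # number of baskets containing every item of the itemset
    return sum(itemset <= b for b in baskets)

def adjusted_confidence(lhs, rhs, baskets, K=1.0):
    return support(lhs | rhs, baskets) / (support(lhs, baskets) + K)

# score a common rule {milk} -> {bread} and a rare-LHS rule {apples} -> {milk}
for lhs, rhs in [({"milk"}, {"bread"}), ({"apples"}, {"milk"})]:
    conf = support(lhs | rhs, baskets) / support(lhs, baskets)
    adj = adjusted_confidence(lhs, rhs, baskets)
    print(lhs, "->", rhs, f"confidence={conf:.2f}, adjusted={adj:.2f}")
```

The rare-LHS rule keeps a nonzero score but is pulled down relative to the well-supported one, which is the behaviour the "weaker minimum support condition" in the abstract refers to.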

40 citations


Journal Article
TL;DR: In this article, the authors propose a method to align statistical modeling with decision making by propagating uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem.
Abstract: This work proposes a way to align statistical modeling with decision making. We provide a method that propagates the uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem. The method allows us to explore the range of operational costs associated with the set of reasonable statistical models, so as to provide a useful way for practitioners to understand uncertainty. To do this, the operational cost is cast as a regularization term in a learning algorithm's objective function, allowing either an optimistic or pessimistic view of possible costs, depending on the regularization parameter. From another perspective, if we have prior knowledge about the operational cost, for instance that it should be low, this knowledge can help to restrict the hypothesis space, and can help with generalization. We provide a theoretical generalization bound for this scenario. We also show that learning with operational costs is related to robust optimization.
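
A hedged sketch of the mechanism: the operational cost enters the training objective as a regularization term whose sign and magnitude select an optimistic or pessimistic view of possible costs. The logistic loss, the smooth cost surrogate and the sign convention below are choices made for illustration, not the paper's formulation or application.

```python
# Hedged sketch of "learning with operational costs": regularized empirical loss plus
# (or minus) an operational-cost term, so the chosen model leans towards low-cost
# (optimistic) or high-cost (pessimistic) members of the set of reasonable models.
# The cost function (cost of servicing predicted positives) is an invented illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = np.where(X @ np.array([1.0, -1.0, 0.5]) + 0.3 * rng.normal(size=n) > 0, 1.0, -1.0)
unit_cost = rng.uniform(1, 5, size=n)          # cost of acting on each predicted positive

def objective(w, c1, c2):
    margins = y * (X @ w)
    emp_loss = np.mean(np.logaddexp(0.0, -margins))          # logistic loss
    op_cost = np.sum(unit_cost / (1 + np.exp(-X @ w)))       # smooth surrogate of projected cost
    return emp_loss + c1 * np.dot(w, w) + c2 * op_cost / n

for c2, label in [(+0.5, "optimistic  (bias towards low projected cost)"),
                  (-0.5, "pessimistic (bias towards high projected cost)")]:
    w = minimize(objective, np.zeros(d), args=(0.01, c2), method="L-BFGS-B").x
    cost = np.sum(unit_cost[(X @ w) > 0])
    print(f"{label}: projected operational cost = {cost:.1f}")
```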

40 citations


Journal ArticleDOI
TL;DR: In this paper, the PAC-Bayesian approach was used for quantile forecasting of the French GDP and it was shown that the Gibbs estimator actually achieves fast rates of convergence d/n.
Abstract: We establish rates of convergence in statistical learning for time series forecasting. Using the PAC-Bayesian approach, slow rates of convergence √(d/n) for the Gibbs estimator under the absolute loss were given in a previous work [7], where n is the sample size and d the dimension of the set of predictors. Under the same weak dependence conditions, we extend this result to any convex Lipschitz loss function. We also identify a condition on the parameter space that ensures similar rates for the classical penalized ERM procedure. We apply this method to quantile forecasting of the French GDP. Under additional conditions on the loss functions (satisfied by the quadratic loss function) and for uniformly mixing processes, we prove that the Gibbs estimator actually achieves fast rates of convergence d/n. We discuss the optimality of these different rates, pointing out references to lower bounds when they are available. In particular, these results generalize the results of [29] on sparse regression estimation to some autoregression settings.
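
Schematically, the two regimes discussed above take the following oracle-inequality form; constants, confidence terms and the exact dependence and moment conditions are omitted, so this is not the paper's precise statement.

```latex
% Slow-rate regime: convex Lipschitz loss, weakly dependent data.
\[
  R(\hat f_{\mathrm{Gibbs}}) - \inf_{f \in \mathcal{F}} R(f)
  \;\lesssim\; \sqrt{\frac{d}{n}}
  \quad \text{(slow rate)},
\]
% Fast-rate regime: e.g. quadratic loss and a uniformly mixing process.
\[
  R(\hat f_{\mathrm{Gibbs}}) - \inf_{f \in \mathcal{F}} R(f)
  \;\lesssim\; \frac{d}{n}
  \quad \text{(fast rate)}.
\]
```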

36 citations


Proceedings ArticleDOI
12 Jun 2013
TL;DR: This article illustrates recent developments in identification algorithms centered around convex formulations, including regularization, the use of subspace methods and nuclear norms as proxies for rank constraints, and a quite different route to convexity that uses algebraic techniques to manipulate the model parameterizations.
Abstract: System identification is about estimating models of dynamical systems from measured input-output data. Its traditional foundation is basic statistical techniques, such as maximum likelihood estimation and asymptotic analysis of bias and variance and the like. Maximum likelihood estimation relies on minimization of criterion functions that typically are non-convex, and may cause numerical search problems. Recent interest in identification algorithms has focused on techniques that are centered around convex formulations. This is partly the result of developments in machine learning and statistical learning theory. The development concerns issues of regularization for sparsity and for better tuned bias/variance trade-offs. It also involves the use of subspace methods as well as nuclear norms as proxies for rank constraints. A quite different route to convexity is to use algebraic techniques to manipulate the model parameterizations. This article will illustrate all of this recent development.
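
One of the convex formulations alluded to above, regularized impulse-response (FIR) estimation with a decaying kernel prior for the bias/variance trade-off, can be sketched as follows; the data, kernel parameters and noise level are invented for illustration, and the nuclear-norm and subspace variants are not shown.

```python
# Hedged sketch: regularized FIR (impulse-response) estimation, where a ridge-type
# penalty g' P^{-1} g with a decaying "TC"-style kernel P_ij = c * lam**max(i, j)
# tunes the bias/variance trade-off. A simple convex regularized least-squares instance.
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(5)
N, m = 300, 30                                      # data length, FIR order
g_true = 0.8 ** np.arange(m) * np.sin(0.9 * np.arange(m))    # "true" impulse response
u = rng.normal(size=N)                              # input signal
y = np.convolve(u, g_true)[:N] + 0.1 * rng.normal(size=N)    # noisy output

# regression matrix Phi with rows [u_t, u_{t-1}, ..., u_{t-m+1}]
Phi = toeplitz(u, np.r_[u[0], np.zeros(m - 1)])

c, lam = 1.0, 0.9
P = c * lam ** np.maximum.outer(np.arange(1, m + 1), np.arange(1, m + 1))   # decaying kernel
sigma2 = 0.1 ** 2

# regularized least squares: g_hat = (Phi'Phi + sigma2 * P^{-1})^{-1} Phi' y
g_hat = np.linalg.solve(Phi.T @ Phi + sigma2 * np.linalg.inv(P), Phi.T @ y)
g_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]       # unregularized baseline

print("estimation error, plain LS   :", np.linalg.norm(g_ls - g_true))
print("estimation error, regularized:", np.linalg.norm(g_hat - g_true))
```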

Journal ArticleDOI
TL;DR: The support vector regression (SVR) method based on statistical learning theory (SLT) was employed as a supervised learning algorithm to estimate Poisson's ratio from conventional well log data, and the results indicated that the SVR-predicted Poisson's ratio values are in good agreement with measured values.

Journal ArticleDOI
TL;DR: Numerical results on seven real world biomedical datasets support the effectiveness of the proposed approach compared to other commonly-used sparse SVM methods, including L1-SVM and recent approximated L0-SVM approaches.

Posted Content
TL;DR: In this paper, the authors study the generalization capabilities of regularization in the framework of statistical learning theory and show that the upper and lower bounds of learning rates for $l^q$ regularization learning are asymptotically identical for all $0
Abstract: Regularization is a well recognized powerful strategy to improve the performance of a learning machine and $l^q$ regularization schemes with $0

Book ChapterDOI
01 Jan 2013
TL;DR: The finite sample distribution of many nonparametric methods from statistical learning theory is unknown because the distribution P from which the data were generated is unknown and because there often exist only asymptotic results on the behaviour of such methods.
Abstract: The finite sample distribution of many nonparametric methods from statistical learning theory is unknown because the distribution P from which the data were generated is unknown and because there often exist only asymptotic results on the behaviour of such methods.

01 Nov 2013
TL;DR: This work presents two algorithms that incorporate association rules that can be used both for sequential event prediction and for supervised classification, and provides generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory.
Abstract: We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called "sequential event prediction." In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer's online shopping cart, given his/her past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the "cold start" problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an "adjusted confidence" measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.

Book ChapterDOI
01 Jan 2013
TL;DR: Information-theoretically reformulate two measures of capacity from statistical learning theory: empirical VC-entropy and empirical Rademacher complexity and show these capacity measures count the number of hypotheses about a dataset that a learning algorithm falsifies when it finds the classifier in its repertoire minimizing empirical risk.
Abstract: We information-theoretically reformulate two measures of capacity from statistical learning theory: empirical VC-entropy and empirical Rademacher complexity. We show these capacity measures count the number of hypotheses about a dataset that a learning algorithm falsifies when it finds the classifier in its repertoire minimizing empirical risk. It then follows that the future performance of predictors on unseen data is controlled in part by how many hypotheses the learner falsifies. As a corollary we show that empirical VC-entropy quantifies the message length of the true hypothesis in the optimal code of a particular probability distribution, the so-called actual repertoire.

Book ChapterDOI
01 Jan 2013
TL;DR: In this article, the operator equation for the estimator is viewed as a perturbed version of the operator equation for the ideal estimator; this view suggests multi-parameter regularization methods, namely dual regularized total least squares and multi-penalty regularization, for constructing better extrapolating estimators.
Abstract: One-parameter regularization methods, such as the Tikhonov regularization, are used to solve the operator equation for the estimator in the statistical learning theory. Recently, there has been a lot of interest in the construction of the so called extrapolating estimators, which approximate the input–output relationship beyond the scope of the empirical data. The standard Tikhonov regularization produces rather poor extrapolating estimators. In this paper, we propose a novel view on the operator equation for the estimator where this equation is seen as a perturbed version of the operator equation for the ideal estimator. This view suggests the dual regularized total least squares (DRTLS) and multi-penalty regularization (MPR), which are multi-parameter regularization methods, as methods of choice for constructing better extrapolating estimators. We propose and test several realizations of DRTLS and MPR for constructing extrapolating estimators. It will be seen that, among the considered realizations, a realization of MPR gives best extrapolating estimators. For this realization, we propose a rule for the choice of the used regularization parameters that allows an automatic selection of the suitable extrapolating estimator.
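
A generic two-parameter (multi-penalty) regularization sketch in the spirit of the chapter, not its DRTLS or MPR realizations: a kernel estimator is fit with both an RKHS-norm penalty and a plain ridge penalty, and then evaluated beyond the range of the data. The kernel, penalty choices and "extrapolation" check are illustrative assumptions.

```python
# Hedged sketch of multi-penalty (two-parameter) regularization for a kernel estimator:
# minimise ||K c - y||^2 + lam1 * c'Kc + lam2 * ||c||^2, whose minimiser is
# c = (K^2 + lam1*K + lam2*I)^{-1} K y. Toy instance only.
import numpy as np

rng = np.random.default_rng(6)

def gauss_kernel(a, b, width=0.5):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * width ** 2))

x = np.sort(rng.uniform(0.0, 1.0, size=40))          # empirical data on [0, 1]
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=40)
K = gauss_kernel(x, x)

def fit(lam1, lam2):
    A = K @ K + lam1 * K + lam2 * np.eye(len(x))
    return np.linalg.solve(A, K @ y)

x_out = np.linspace(1.0, 1.3, 7)                     # beyond the scope of the data
for lam1, lam2 in [(1e-2, 0.0), (1e-2, 1e-2)]:       # single- vs multi-penalty
    c = fit(lam1, lam2)
    pred = gauss_kernel(x_out, x) @ c
    print(f"lam1={lam1}, lam2={lam2}: extrapolated values {np.round(pred, 3)}")
```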

Proceedings Article
29 Apr 2013
TL;DR: A formalism of localization for online learning problems, which, similarly to statistical learning theory, can be used to obtain fast rates, is introduced and a novel upper bound on regret in terms of classical Rademacher complexity is established.
Abstract: We introduce a formalism of localization for online learning problems, which, similarly to statistical learning theory, can be used to obtain fast rates. In particular, we introduce local sequential Rademacher complexities and other local measures. Based on the idea of relaxations for deriving algorithms, we provide a template method that takes advantage of localization. Furthermore, we build a general adaptive method that can take advantage of the suboptimality of the observed sequence. We illustrate the utility of the introduced concepts on several problems. Among them is a novel upper bound on regret in terms of classical Rademacher complexity when the data are i.i.d.

Journal ArticleDOI
TL;DR: This paper uses tools of statistical learning in order to design a more accurate prediction operator in Harten's framework based on a training sample, resulting in multiresolution decompositions with enhanced sparsity.

Journal Article
TL;DR: This paper develops the deviation inequalities and the symmetrization inequality for the learning process, and develops the risk bounds based on the covering number, and studies the asymptotic convergence and the rate of convergence of the learning process for Lévy process.
Abstract: Lévy processes refer to a class of stochastic processes, for example, Poisson processes and Brownian motions, and play an important role in stochastic processes and machine learning. Therefore, it is essential to study risk bounds of the learning process for time-dependent samples drawn from a Lévy process (or briefly called learning process for Lévy process). It is noteworthy that samples in this learning process are not independently and identically distributed (i.i.d.). Therefore, results in traditional statistical learning theory are not applicable (or at least cannot be applied directly), because they are obtained under the sample-i.i.d. assumption. In this paper, we study risk bounds of the learning process for time-dependent samples drawn from a Lévy process, and then analyze the asymptotic behavior of the learning process. In particular, we first develop the deviation inequalities and the symmetrization inequality for the learning process. By using the resultant inequalities, we then obtain the risk bounds based on the covering number. Finally, based on the resulting risk bounds, we study the asymptotic convergence and the rate of convergence of the learning process for Lévy process. Meanwhile, we also give a comparison to the related results under the sample-i.i.d. assumption.

Journal ArticleDOI
TL;DR: This paper addresses the elastic-net regularization problem within the framework of statistical learning theory and shows the learning rate to be faster compared with existing results.
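
For reference, the elastic-net regularization scheme referred to in this TL;DR combines l1 and l2 penalties; a generic scikit-learn sketch on synthetic data is below. The paper's contribution is the learning-rate analysis, which a snippet obviously cannot reproduce.

```python
# Hedged sketch of elastic-net regularization as implemented in scikit-learn:
# minimise (1/2n)||y - Xw||^2 + alpha*(l1_ratio*||w||_1 + (1 - l1_ratio)/2 * ||w||_2^2).
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(7)
n, d = 100, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]                  # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

model = ElasticNet(alpha=0.05, l1_ratio=0.7).fit(X, y)
print("non-zero coefficients:", np.flatnonzero(model.coef_))
```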

Proceedings ArticleDOI
01 Aug 2013
TL;DR: The support vector machine's classification mechanism and its application in mechanical fault diagnosis are introduced and some of the shortcomings of the machine learning algorithm are put forward.
Abstract: The support vector machine is a machine learning algorithm developed by Vapnik from statistical learning theory for data classification that can learn from a small sample of fault data. For fault data, it can isolate the fault categories accurately even when only a small sample is available. In the present work, the support vector machine's classification mechanism and its application in mechanical fault diagnosis are introduced, and an example is given in which the support vector machine classifies the faults of a coal mine scraper conveyor. Finally, some shortcomings of the support vector machine are pointed out, and directions for the future development of support vector machine fault diagnosis are discussed.

Posted Content
TL;DR: Experimental results reveal that a prototype system developed using SLT-based methods outperforms seven existing fake website detection systems on a test bed encompassing 900 real and fake websites.
Abstract: Existing fake website detection systems are unable to effectively detect fake websites. In this study, we advocate the development of fake website detection systems that employ classification methods grounded in statistical learning theory (SLT). Experimental results reveal that a prototype system developed using SLT-based methods outperforms seven existing fake website detection systems on a test bed encompassing 900 real and fake websites.

Journal ArticleDOI
TL;DR: A theoretical analysis for prediction algorithms based on association rules, which introduces a problem for which rules are particularly natural, called "sequential...
Abstract: We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called "sequential...

Journal ArticleDOI
TL;DR: The results of this study showed that the GA-SVM approach has the potential to be a practical tool for predicting compression index of soil.
Abstract: The compression index is an important soil property that is essential to many geotechnical designs, and its determination from consolidation tests is relatively time-consuming. The Support Vector Machine (SVM) is a method based on statistical learning theory and the structural risk minimization principle, which minimizes both error and weight terms. Considering that the parameters of an SVM model are difficult to decide, a genetic SVM was presented in which the parameters of the SVM method are optimized by a Genetic Algorithm (GA). Taking the plasticity index, water content, void ratio and density of soil as primary influencing factors, a prediction model of the compression index based on the GA-SVM approach was obtained. The results of this study showed that the GA-SVM approach has the potential to be a practical tool for predicting the compression index of soil.

Book ChapterDOI
01 Jan 2013
TL;DR: The basics of kernel methods and their position in the generalized data-driven fault diagnostic framework are reviewed, and unsupervised kernel methods, such as kernel principal component analysis, are considered in detail, analogous to the application of linear principal components analysis in multivariate statistical process control.
Abstract: The basics of kernel methods and their position in the generalized data-driven fault diagnostic framework are reviewed. The review starts out with statistical learning theory, covering concepts such as loss functions, overfitting and structural and empirical risk minimization. This is followed by linear margin classifiers, kernels and support vector machines. Transductive support vector machines are discussed and illustrated by way of an example related to multivariate image analysis of coal particles on conveyor belts. Finally, unsupervised kernel methods, such as kernel principal component analysis, are considered in detail, analogous to the application of linear principal component analysis in multivariate statistical process control. Fault diagnosis in a simulated nonlinear system by the use of kernel principal component analysis is included as an example to illustrate the concepts.
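
A hedged sketch of the unsupervised monitoring idea described in the chapter: kernel PCA is fitted on normal operating data and new samples are flagged when a reconstruction-error statistic exceeds an empirical control limit. The toy process, kernel width and 99% limit are assumptions for illustration, not the chapter's simulated system or coal-particle images.

```python
# Hedged sketch: kernel PCA trained on normal operating data only; a simple SPE/Q-like
# reconstruction-error statistic with an empirical control limit flags faulty samples.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(8)

def process(n, fault=0.0):
    t = rng.uniform(-1, 1, size=n)
    x1 = t + 0.05 * rng.normal(size=n)
    x2 = t ** 2 + 0.05 * rng.normal(size=n) + fault     # nonlinear relation x2 ~ x1^2
    return np.column_stack([x1, x2])

X_normal = process(300)                                  # normal operating condition
X_test = np.vstack([process(50), process(50, fault=0.8)])

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=5.0,
                 fit_inverse_transform=True).fit(X_normal)

def spe(X):                                              # squared reconstruction error
    X_rec = kpca.inverse_transform(kpca.transform(X))
    return np.sum((X - X_rec) ** 2, axis=1)

threshold = np.percentile(spe(X_normal), 99)             # empirical 99% control limit
alarms = spe(X_test) > threshold
print(f"alarm rate on normal test data: {alarms[:50].mean():.2f}, "
      f"on faulty data: {alarms[50:].mean():.2f}")
```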

Book ChapterDOI
01 Jan 2013
TL;DR: In this article, the authors present an overview of statistical learning theory, and describe key results regarding uniform convergence of empirical means and related sample complexity, and provide an extension of the probability inequalities studied in Chap. 8 to the case when parameterized families of functions are considered, instead of a fixed function.
Abstract: This chapter presents an overview of statistical learning theory, and describes key results regarding uniform convergence of empirical means and related sample complexity. This theory provides a fundamental extension of the probability inequalities studied in Chap. 8 to the case when parameterized families of functions are considered, instead of a fixed function. The chapter formally studies the UCEM (uniform convergence of empirical means) property and the VC dimension in the context of the Vapnik–Chervonenkis theory. Extensions to the Pollard theory for continuous-valued functions are also discussed.
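
For orientation, a standard form of the uniform-convergence result at the heart of this theory is the Vapnik-Chervonenkis bound combined with Sauer's lemma; constants vary between texts, and the chapter may state a different version.

```latex
% For a binary function class F with VC dimension h and growth function Pi_F:
\[
  \Pr\!\left( \sup_{f \in \mathcal{F}} \bigl| \hat R_n(f) - R(f) \bigr| > \varepsilon \right)
  \;\le\; 4\, \Pi_{\mathcal{F}}(2n)\, e^{-n\varepsilon^2/8},
  \qquad
  \Pi_{\mathcal{F}}(2n) \le \Bigl(\frac{2en}{h}\Bigr)^{h} \ \ (2n \ge h).
\]
```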

Dissertation
06 Dec 2013
TL;DR: In this thesis, the author gives an overview of high-dimensional statistics and statistical learning under various sparsity assumptions, and explicitly provides the estimators used and optimal oracle inequalities satisfied by these estimators.
Abstract: The aim of this habilitation thesis is to give an overview of my works on high-dimensional statistics and statistical learning, under various sparsity assumptions. In a first part, I will describe the major challenges of high-dimensional statistics in the context of the generic linear regression model. After a brief review of existing results, I will present the theoretical study of aggregated estimators that was done in (Alquier & Lounici 2011). The second part essentially aims at providing extensions of the various theories presented in the first part to the estimation of time series models (Alquier & Doukhan 2011, Alquier & Wintenberger 2013, Alquier & Li 2012, Alquier, Wintenberger & Li 2012). Finally, the third part presents various extensions to nonparametric models, or to specific applications such as quantum statistics (Alquier & Biau 2013, Guedj & Alquier 2013, Alquier, Meziani & Peyre 2013, Alquier, Butucea, Hebiri, Meziani & Morimae 2013, Alquier 2013, Alquier 2008). In each section, we explicitly provide the estimators used and, as much as possible, optimal oracle inequalities satisfied by these estimators.

Proceedings ArticleDOI
16 Nov 2013
TL;DR: A novel analog circuit fault diagnosis method called the wavelet kernel support vector machine is proposed; it is based on statistical learning theory, which offers better classification ability and generalization performance than neural networks.
Abstract: Analog circuit fault diagnosis can be regarded as a pattern recognition issue and addressed by machine learning theory. Compared with neural networks, the support vector machine (SVM) is based on statistical learning theory and has the advantages of better classification ability and generalization performance. The Marr wavelet kernel is proposed, and its existence is proven by theoretical analysis and demonstration. Based on this, a novel analog circuit fault diagnosis method called the wavelet kernel support vector machine is proposed in the paper. Using principal component analysis (PCA) as a tool for extracting fault features, the WSVM is then applied to analog circuit fault diagnosis. The effectiveness of the proposed method is verified by the experimental results.
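
A hedged sketch of the pipeline described above: PCA for fault-feature extraction followed by an SVM whose kernel is a product-form Marr (Mexican hat) wavelet kernel, passed to scikit-learn as a callable. The kernel form, bandwidth and toy fault data are assumptions made for the example; the paper's admissibility proof is not reproduced here.

```python
# Hedged sketch: PCA feature extraction + SVM with a Marr ("Mexican hat") wavelet kernel
# supplied as a custom kernel callable K(x, z) = prod_i (1 - d_i^2/a^2) exp(-d_i^2/(2 a^2)).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def marr_wavelet_kernel(X, Z, a=1.5):
    D = X[:, None, :] - Z[None, :, :]                       # pairwise differences
    return np.prod((1 - D ** 2 / a ** 2) * np.exp(-D ** 2 / (2 * a ** 2)), axis=2)

rng = np.random.default_rng(9)
n_per_class, n_features = 40, 8                             # stand-ins for sampled circuit responses
centers = rng.normal(scale=2.0, size=(4, n_features))       # 4 synthetic fault classes
X = np.vstack([c + 0.5 * rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(4), n_per_class)

clf = make_pipeline(StandardScaler(), PCA(n_components=3),
                    SVC(kernel=marr_wavelet_kernel, C=10.0))
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```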