
Showing papers in "Machine Learning in 2007"


Journal ArticleDOI
TL;DR: Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement; the main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Abstract: In an online convex optimization problem a decision-maker makes a sequence of decisions, i.e., chooses a sequence of points in Euclidean space, from a fixed feasible set. After each point is chosen, it encounters a sequence of (possibly unrelated) convex cost functions. Zinkevich (ICML 2003) introduced this framework, which models many natural repeated decision-making problems and generalizes many existing problems such as Prediction from Expert Advice and Cover's Universal Portfolios. Zinkevich showed that a simple online gradient descent algorithm achieves additive regret $O(\sqrt{T})$, for an arbitrary sequence of T convex cost functions (of bounded gradients), with respect to the best single decision in hindsight. In this paper, we give algorithms that achieve regret $O(\log T)$ for an arbitrary sequence of strictly convex functions (with bounded first and second derivatives). This mirrors what has been done for the special cases of prediction from expert advice by Kivinen and Warmuth (EuroCOLT 1999), and Universal Portfolios by Cover (Math. Finance 1:1-19, 1991). We propose several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement. The main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field. Our analysis shows a surprising connection between the natural follow-the-leader approach and the Newton method. We also analyze other algorithms, which tie together several different previous approaches including follow-the-leader, exponential weighting, Cover's algorithm and gradient descent.
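To make the regret setting concrete, here is a minimal Python sketch of the online gradient descent baseline the paper improves on: the step size eta_t of order 1/sqrt(t) recovers Zinkevich's O(sqrt(T)) regret, while eta_t of order 1/(H t) gives the logarithmic rate for H-strongly convex losses. The feasible-set radius, loss functions and constants below are assumptions made for the example, not taken from the paper.

import numpy as np

def project_to_ball(x, radius):
    # Euclidean projection onto the feasible set (a ball of the given radius).
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_descent(grad_oracles, dim, radius, step_size):
    # grad_oracles: one function per round, g_t(x) = gradient of f_t at x.
    # step_size: function t -> eta_t.
    x = np.zeros(dim)
    plays = []
    for t, grad in enumerate(grad_oracles, start=1):
        plays.append(x.copy())
        x = project_to_ball(x - step_size(t) * grad(x), radius)
    return plays

# Usage: strongly convex quadratic losses f_t(x) = ||x - z_t||^2 (H = 2),
# played with the 1/(H t) step size that yields O(log T) regret.
rng = np.random.default_rng(0)
targets = [rng.normal(size=3) for _ in range(100)]
grad_oracles = [lambda x, z=z: 2.0 * (x - z) for z in targets]
plays = online_gradient_descent(grad_oracles, dim=3, radius=5.0,
                                step_size=lambda t: 1.0 / (2.0 * t))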

1,124 citations


Journal ArticleDOI
TL;DR: An improved algorithm that theoretically converges and avoids numerical difficulties is proposed for Platt’s probabilistic outputs for Support Vector Machines.
Abstract: Platt's probabilistic outputs for Support Vector Machines (Platt, J. in Smola, A., et al. (eds.) Advances in large margin classifiers. Cambridge, 2000) has been popular for applications that require posterior class probabilities. In this note, we propose an improved algorithm that theoretically converges and avoids numerical difficulties. A simple and ready-to-use pseudo code is included.
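For context, a rough sketch of the underlying calibration problem: fit P(y=1|f) = 1/(1+exp(A f + B)) to held-out decision values by minimizing the cross-entropy against Platt's smoothed targets, written with logaddexp for numerical stability. This only illustrates the idea, not the paper's improved pseudo code; the use of scipy's Nelder-Mead optimizer here is an assumption of the sketch.

import numpy as np
from scipy.optimize import minimize

def fit_sigmoid(scores, labels):
    # scores: SVM decision values f_i; labels: 0/1 array from a validation set.
    labels = np.asarray(labels)
    prior1 = labels.sum()
    prior0 = len(labels) - prior1
    # Platt's smoothed targets instead of raw 0/1 labels.
    t = np.where(labels == 1, (prior1 + 1.0) / (prior1 + 2.0), 1.0 / (prior0 + 2.0))

    def nll(params):
        a, b = params
        z = a * np.asarray(scores) + b
        # Negative log-likelihood of P(y=1|f) = 1/(1+exp(z)), computed stably.
        return np.sum(np.logaddexp(0.0, z) - (1.0 - t) * z)

    x0 = np.array([0.0, np.log((prior0 + 1.0) / (prior1 + 1.0))])
    return minimize(nll, x0, method="Nelder-Mead").x  # returns (A, B)

def predict_proba(scores, a, b):
    return 1.0 / (1.0 + np.exp(a * np.asarray(scores) + b))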

926 citations


Journal ArticleDOI
TL;DR: The variance reduction method known in experimental design circles as ‘A-optimality’ is re-derived, and comparisons are run against different variations of the most widely used heuristic schemes to discover which methods work best for different classes of problems and why.
Abstract: Which active learning methods can we expect to yield good performance in learning binary and multi-category logistic regression classifiers? Addressing this question is a natural first step in providing robust solutions for active learning across a wide variety of exponential models including maximum entropy, generalized linear, log-linear, and conditional random field models. For the logistic regression model we re-derive the variance reduction method known in experimental design circles as 'A-optimality.' We then run comparisons against different variations of the most widely used heuristic schemes: query by committee and uncertainty sampling, to discover which methods work best for different classes of problems and why. We find that among the strategies tested, the experimental design methods are most likely to match or beat a random sample baseline. The heuristic alternatives produced mixed results, with an uncertainty sampling variant called margin sampling and a derivative method called QBB-MM providing the most promising performance at very low computational cost. Computational running times of the experimental design methods were a bottleneck to the evaluations. Meanwhile, evaluation of the heuristic methods led to an accumulation of negative results. We explore alternative evaluation design parameters to test whether these negative results are merely an artifact of settings where experimental design methods can be applied. The results demonstrate a need for improved active learning methods that will provide reliable performance at a reasonable computational cost.
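As a point of reference for the heuristics mentioned above, the margin-sampling rule can be stated in a few lines: query the unlabeled example whose two most probable classes are closest in predicted probability. The scikit-learn logistic regression model and data pool used below are assumptions of this sketch, not part of the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

def margin_sampling_query(model, pool_X):
    # Return the index of the pool example with the smallest margin between
    # the top two class probabilities (the most "uncertain" example).
    proba = np.sort(model.predict_proba(pool_X), axis=1)
    margins = proba[:, -1] - proba[:, -2]
    return int(np.argmin(margins))

# Usage sketch: fit on a labeled seed set, then repeatedly query, label, refit.
# model = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)
# idx = margin_sampling_query(model, X_pool)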

326 citations


Journal ArticleDOI
TL;DR: In this article, the authors present AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium), the first algorithm guaranteed both to learn to play optimally against stationary opponents and to converge to a Nash equilibrium in self-play in games with arbitrary numbers of actions and players.
Abstract: Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games--assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.

261 citations


Journal ArticleDOI
TL;DR: This paper introduces and analyzes a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget, and shows that the randomized algorithm strikes the optimal trade-off between budget B and norm U of the largest classifier in the comparison sequence.
Abstract: Shifting bounds for on-line classification algorithms ensure good performance on any sequence of examples that is well predicted by a sequence of changing classifiers. When proving shifting bounds for kernel-based classifiers, one also faces the problem of storing a number of support vectors that can grow unboundedly, unless an eviction policy is used to keep this number under control. In this paper, we show that shifting and on-line learning on a budget can be combined surprisingly well. First, we introduce and analyze a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget. Second, we show that by applying to the Perceptron algorithm the simplest possible eviction policy, which discards a random support vector each time a new one comes in, we achieve a shifting bound close to the one we obtained with no budget restrictions. More importantly, we show that our randomized algorithm strikes the optimal trade-off $$U = \Theta(\sqrt{B})$$ between budget B and norm U of the largest classifier in the comparison sequence. Experiments are presented comparing several linear-threshold algorithms on chronologically-ordered textual datasets. These experiments support our theoretical findings in that they show to what extent randomized budget algorithms are more robust than deterministic ones when learning shifting target data streams.
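The random-eviction idea is simple enough to sketch directly; the following toy kernel Perceptron (with an RBF kernel and constants chosen arbitrarily) discards a uniformly random support vector whenever a mistake would push the support set past the budget B. It illustrates the eviction policy discussed above, not the paper's exact algorithm or analysis.

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def randomized_budget_perceptron(stream, budget, kernel=rbf_kernel, seed=0):
    # stream: iterable of (x, y) pairs with y in {-1, +1}.
    rng = np.random.default_rng(seed)
    support = []          # stored (x_i, y_i) pairs defining the classifier
    mistakes = 0
    for x, y in stream:
        score = sum(yi * kernel(xi, x) for xi, yi in support)
        if y * score <= 0:                       # prediction mistake
            mistakes += 1
            if len(support) >= budget:           # budget full: evict at random
                support.pop(rng.integers(len(support)))
            support.append((x, y))
    return support, mistakes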

182 citations


Journal ArticleDOI
TL;DR: A novel framework for the design and analysis of online learning algorithms, based on the notion of duality in constrained optimization, is described; it ties the primal objective value to the number of prediction mistakes via the increase in the dual objective.
Abstract: We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem we reduce the process of online learning to the task of incrementally increasing the dual objective function. The amount by which the dual increases serves as a new and natural notion of progress for analyzing online learning algorithms. We are thus able to tie the primal objective value and the number of prediction mistakes using the increase in the dual.

174 citations


Journal ArticleDOI
TL;DR: This article studied external regret in sequential prediction games with both positive and negative payoffs and derived new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule.
Abstract: This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
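For readers unfamiliar with the forecaster being analysed, here is a compact sketch of the exponentially weighted average forecaster in the payoff (reward) setting. The fixed learning rate eta is an illustrative assumption; the point of the paper is precisely that it can be tuned from second-order (squared-payoff) quantities without prior knowledge of the payoff range.

import numpy as np

def exponential_weights(payoff_rounds, n_actions, eta=0.1, seed=0):
    # payoff_rounds: iterable of length-n_actions payoff vectors, one per round.
    rng = np.random.default_rng(seed)
    cum_payoff = np.zeros(n_actions)
    total = 0.0
    for payoffs in payoff_rounds:
        w = np.exp(eta * (cum_payoff - cum_payoff.max()))   # numerically stable
        action = rng.choice(n_actions, p=w / w.sum())
        total += payoffs[action]
        cum_payoff += np.asarray(payoffs, dtype=float)
    # External regret: best fixed action in hindsight minus what we obtained.
    return total, cum_payoff.max() - total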

166 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that the convergence rate of the reconstruction error for kernel principal component analysis (KPCA) can typically be faster than $n^{-1/2}$, where n is the sample size (the dimension of the kernel Gram matrix).
Abstract: The main goal of this paper is to prove inequalities on the reconstruction error for kernel principal component analysis. With respect to previous work on this topic, our contribution is twofold: (1) we give bounds that explicitly take into account the empirical centering step in this algorithm, and (2) we show that a "localized" approach allows us to obtain more accurate bounds. In particular, we show faster rates of convergence towards the minimum reconstruction error; more precisely, we prove that the convergence rate can typically be faster than $n^{-1/2}$. We also obtain a new relative bound on the error. A secondary goal, for which we present similar contributions, is to obtain convergence bounds for the partial sums of the biggest or smallest eigenvalues of the kernel Gram matrix towards eigenvalues of the corresponding kernel operator. These quantities are naturally linked to the KPCA procedure; furthermore these results can have applications to the study of various other kernel algorithms. The results are presented in a functional analytic framework, which is suited to deal rigorously with reproducing kernel Hilbert spaces of infinite dimension.
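As a concrete illustration of the quantity being bounded, the snippet below computes the empirical reconstruction error of kernel PCA with d components from the centered kernel Gram matrix (the centering step the bounds explicitly account for). The RBF kernel, data and parameters are assumptions made for the example.

import numpy as np

def rbf_gram(X, gamma=0.5):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def kpca_reconstruction_error(X, d, gamma=0.5):
    # Average squared distance in feature space to the span of the top-d
    # empirical kernel principal components.
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n        # empirical centering matrix
    Kc = H @ rbf_gram(X, gamma) @ H
    eigvals = np.sort(np.linalg.eigvalsh(Kc))[::-1]
    return eigvals[d:].sum() / n               # tail eigenvalue mass / n

rng = np.random.default_rng(0)
print(kpca_reconstruction_error(rng.normal(size=(50, 3)), d=5))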

152 citations


Journal ArticleDOI
TL;DR: It is shown that, surprisingly, isotonic regression based calibration using the Pool Adjacent Violators algorithm is equivalent to the ROC convex hull method.
Abstract: Classifier calibration is the process of converting classifier scores into reliable probability estimates. Recently, a calibration technique based on isotonic regression has gained attention within machine learning as a flexible and effective way to calibrate classifiers. We show that, surprisingly, isotonic regression based calibration using the Pool Adjacent Violators algorithm is equivalent to the ROC convex hull method.
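The Pool Adjacent Violators step at the heart of this equivalence is short enough to sketch: sort validation examples by score and repeatedly pool adjacent blocks until the fitted probabilities are non-decreasing in the score. This is a plain illustration of PAV calibration, not of the ROC convex hull correspondence itself.

import numpy as np

def pav_calibrate(scores, labels):
    # Returns (scores sorted ascending, calibrated probabilities) from 0/1 labels.
    order = np.argsort(scores)
    y = np.asarray(labels, dtype=float)[order]
    blocks = [[yi, 1.0] for yi in y]           # each block: [sum of labels, count]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            blocks[i][0] += blocks[i + 1][0]   # pool the violating pair
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            i = max(i - 1, 0)                  # re-check against the previous block
        else:
            i += 1
    probs = np.concatenate([[s / n] * int(n) for s, n in blocks])
    return np.asarray(scores)[order], probs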

109 citations


Journal ArticleDOI
TL;DR: This work presents two generic approaches for constructing invariant kernels and proposes a more distinguishing treatment in particular in the active field of kernel methods for machine learning and pattern analysis, to enable a smooth interpolation between invariant and non-invariant pattern analysis.
Abstract: In many learning problems prior knowledge about pattern variations can be formalized and beneficially incorporated into the analysis system. The corresponding notion of invariance is commonly used in conceptually different ways. We propose a more distinguishing treatment, in particular in the active field of kernel methods for machine learning and pattern analysis. Additionally, the fundamental relation of invariant kernels and traditional invariant pattern analysis by means of invariant representations will be clarified. After addressing these conceptual questions, we focus on practical aspects and present two generic approaches for constructing invariant kernels. The first approach is based on a technique called invariant integration. The second approach builds on invariant distances. In principle, our approaches support general transformations, in particular covering discrete, non-group, or even infinite sets of pattern transformations. Additionally, both enable a smooth interpolation between invariant and non-invariant pattern analysis, i.e., they provide a general framework covering both. The wide applicability and various possible benefits of invariant kernels are demonstrated in different kernel methods.
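A toy instance of the invariant-integration construction: average a base kernel over a finite transformation set (here, cyclic shifts of a 1-D signal) applied to both arguments, which yields a kernel invariant to those shifts. The base RBF kernel and the choice of transformations are assumptions of this sketch.

import numpy as np

def base_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def shift_invariant_kernel(x, z, kernel=base_kernel):
    # Average kernel(g(x), h(z)) over all pairs of cyclic shifts g, h.
    n = len(x)
    vals = [kernel(np.roll(x, i), np.roll(z, j))
            for i in range(n) for j in range(n)]
    return float(np.mean(vals))

# The value is unchanged if x or z is replaced by any cyclic shift of itself,
# and the construction stays positive semidefinite (it is the inner product of
# group-averaged feature maps).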

91 citations


Journal ArticleDOI
TL;DR: The theoretical importance of the SLMM model is demonstrated by showing that it generalizes existing approaches, such as SVMs and M4s, provides novel insight into learning models, and lays a foundation for conceiving other “structured” classifiers.
Abstract: This paper proposes a new large margin classifier--the structured large margin machine (SLMM)--that is sensitive to the structure of the data distribution. The SLMM approach incorporates the merits of "structured" learning models, such as radial basis function networks and Gaussian mixture models, with the advantages of "unstructured" large margin learning schemes, such as support vector machines and maxi-min margin machines. We derive the SLMM model from the concepts of "structured degree" and "homospace", based on an analysis of existing structured and unstructured learning models. Then, by using Ward's agglomerative hierarchical clustering on input data (or data mappings in the kernel space) to extract the underlying data structure, we formulate SLMM training as a sequential second-order cone programming problem. Many promising features of the SLMM approach are illustrated, including its accuracy, scalability, extensibility, and noise tolerance. We also demonstrate the theoretical importance of the SLMM model by showing that it generalizes existing approaches, such as SVMs and M4s, provides novel insight into learning models, and lays a foundation for conceiving other "structured" classifiers.

Journal ArticleDOI
TL;DR: This paper presents a reformulation of the problem of learning an optimal kernel in a prescribed convex set of kernels within a feature space environment, and relates this problem in a special case to regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space with a mixed norm.
Abstract: In this paper, we continue our study of learning an optimal kernel in a prescribed convex set of kernels (Micchelli & Pontil, 2005). We present a reformulation of this problem within a feature space environment. This leads us to study regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space with a mixed norm. We also relate this problem in a special case to $${\cal L}^p$$ regularization.

Journal ArticleDOI
Peter Grünwald, John Langford
TL;DR: It is shown that forms of Bayesian and MDL inference that are often applied to classification problems can be inconsistent, which means that there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error.
Abstract: We show that forms of Bayesian and MDL inference that are often applied to classification problems can be inconsistent. This means that there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error. From a Bayesian point of view, the result can be reinterpreted as saying that Bayesian inference can be inconsistent under misspecification, even for countably infinite models. We extensively discuss the result from both a Bayesian and an MDL perspective.

Journal ArticleDOI
TL;DR: New lower bounds are proved for learning intersections of halfspaces, one of the most important concept classes in computational learning theory: any statistical-query algorithm for learning the intersection of $\sqrt{n}$ halfspaces in n dimensions must make $2^{\varOmega(\sqrt{n})}$ queries, the first non-trivial lower bound on the statistical query dimension for this concept class.
Abstract: We prove new lower bounds for learning intersections of halfspaces, one of the most important concept classes in computational learning theory. Our main result is that any statistical-query algorithm for learning the intersection of $\sqrt{n}$ halfspaces in n dimensions must make $2^{\varOmega (\sqrt{n})}$ queries. This is the first non-trivial lower bound on the statistical query dimension for this concept class (the previous best lower bound was $n^{\varOmega(\log n)}$). Our lower bound holds even for intersections of low-weight halfspaces. In the latter case, it is nearly tight. We also show that the intersection of two majorities (low-weight halfspaces) cannot be computed by a polynomial threshold function (PTF) with fewer than $n^{\varOmega(\log n/\log\log n)}$ monomials. This is the first super-polynomial lower bound on the PTF length of this concept class, and is nearly optimal. For intersections of $k=\omega(\log n)$ low-weight halfspaces, we improve our lower bound to $\min\{2^{\varOmega (\sqrt{n})},n^{\varOmega (k/\log k)}\},$ which too is nearly optimal. As a consequence, intersections of even two halfspaces are not computable by polynomial-weight PTFs, the most expressive class of functions known to be efficiently learnable via Jackson's Harmonic Sieve algorithm. Finally, we report our progress on the weak learnability of intersections of halfspaces under the uniform distribution.

Journal ArticleDOI
TL;DR: A new algorithm building an optimal dyadic decision tree (ODT) that combines guaranteed performance in the learning theoretical sense and optimal search from the algorithmic point of view and improves performance over classical approaches such as CART/C4.5.
Abstract: We introduce a new algorithm building an optimal dyadic decision tree (ODT). The method combines guaranteed performance in the learning theoretical sense and optimal search from the algorithmic point of view. Furthermore it inherits the explanatory power of tree approaches, while improving performance over classical approaches such as CART/C4.5, as shown on experiments on artificial and benchmark data.

Journal ArticleDOI
TL;DR: The main technical tool is a uniform convergence result for center based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k.
Abstract: We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling based clustering algorithms that approximate the optimal clustering. We show that the K-median clustering, as well as K-means and the Vector Quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the dependence of the running time of our algorithm on the Euclidean dimension is only linear. Our main technical tool is a uniform convergence result for center based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k.
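The sample-based scheme described above can be mimicked in a few lines: draw a small i.i.d. sample, compute centers on the sample only, and evaluate the induced clustering cost on the full set. Using scikit-learn's KMeans as a stand-in for an (approximately) optimal solver, and the specific sample size, are assumptions of this illustration.

import numpy as np
from sklearn.cluster import KMeans

def sample_based_kmeans(X, k, sample_size, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    # Centers are computed from the sample alone...
    centers = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx]).cluster_centers_
    # ...and the resulting clustering is evaluated on the full domain set.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    cost = np.mean(dists.min(axis=1) ** 2)
    return centers, cost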

Journal ArticleDOI
Tadeusz Pietraszek
TL;DR: This work proposes a method to optimally build a specific type of abstaining binary classifiers using ROC analysis and presents a simple and efficient algorithm for finding the optimal classifier in these models, namely, the bounded-abstention and bounded-improvement models.
Abstract: Classifiers that refrain from classification in certain cases can significantly reduce the misclassification cost. However, the parameters for such abstaining classifiers are often set in a rather ad-hoc manner. We propose a method to optimally build a specific type of abstaining binary classifiers using ROC analysis. These classifiers are built based on optimization criteria in the following three models: cost-based, bounded-abstention and bounded-improvement. We show that selecting the optimal classifier in the first model is similar to known iso-performance lines and uses only the slopes of ROC curves, whereas selecting the optimal classifier in the remaining two models is not straightforward. We investigate the properties of the convex-down ROCCH (ROC Convex Hull) and present a simple and efficient algorithm for finding the optimal classifier in these models, namely, the bounded-abstention and bounded-improvement models. We demonstrate the application of these models to effectively reduce misclassification cost in real-life classification systems. The method has been validated with an ROC building algorithm and cross-validation on 15 UCI KDD datasets.
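To make the cost-based model concrete, a toy decision rule: given a calibrated score p = P(y=1|x), predict the class with the lower expected misclassification cost unless abstaining is cheaper still. This is a simplified decision-theoretic view with made-up costs; the paper's contribution is choosing the operating points optimally from ROC analysis.

import numpy as np

def abstaining_decisions(probs, cost_fp=1.0, cost_fn=5.0, cost_abstain=0.5):
    # Returns an array with entries 0, 1, or -1 (-1 means "abstain").
    probs = np.asarray(probs, dtype=float)
    expected = np.stack([
        probs * cost_fn,                    # expected cost of predicting class 0
        (1.0 - probs) * cost_fp,            # expected cost of predicting class 1
        np.full_like(probs, cost_abstain),  # flat cost of abstaining
    ])
    choice = expected.argmin(axis=0)
    return np.where(choice == 2, -1, choice)

# e.g. abstaining_decisions([0.05, 0.4, 0.95]) -> array([ 0, -1,  1])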

Journal ArticleDOI
TL;DR: This model encompasses the original RVM as a special case, but the empirical results show that it can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance in many cases.
Abstract: Enforcing sparsity constraints has been shown to be an effective and efficient way to obtain state-of-the-art results in regression and classification tasks. Unlike the support vector machine (SVM) the relevance vector machine (RVM) explicitly encodes the criterion of model sparsity as a prior over the model weights. However the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing--possibly even both at the same time (e.g. for the multiscale Doppler data). We detail an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. We present an empirical evaluation of the effects of choice of prior structure on a selection of popular data sets and elucidate the link between Bayesian wavelet shrinkage and RVM regression. Our model encompasses the original RVM as a special case, but our empirical results show that we can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance in many cases. The code is freely available.

Journal ArticleDOI
TL;DR: A modular approach for achieving effective agent-centric learning in multi-agent systems that consists of a number of basic algorithmic building blocks, which can be instantiated and composed differently depending on the environment setting as well as the target class of opponents.
Abstract: We offer a new formal criterion for agent-centric learning in multi-agent systems, that is, learning that maximizes one's rewards in the presence of other agents who might also be learning (using the same or other learning algorithms). This new criterion takes in as a parameter the class of opponents. We then provide a modular approach for achieving effective agent-centric learning; the approach consists of a number of basic algorithmic building blocks, which can be instantiated and composed differently depending on the environment setting (for example, 2- versus n-player games) as well as the target class of opponents. We then provide several specific instances of the approach: an algorithm for stationary opponents, and two algorithms for adaptive opponents with bounded memory, one algorithm for the n-player case and another optimized for the 2-player case. We prove our algorithms correct with respect to the formal criterion, and furthermore show the algorithms to be experimentally effective via comprehensive computer testing.

Journal ArticleDOI
TL;DR: A novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which is capable of delivering strong prediction accuracy with little CPU time and utilizing additional CPU time to increase classification accuracy.
Abstract: In many online applications of machine learning, the computational resources available for classification will vary from time to time. Most techniques are designed to operate within the constraints of the minimum expected resources and fail to utilize further resources when they are available. We propose a novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which is capable of delivering strong prediction accuracy with little CPU time and utilizing additional CPU time to increase classification accuracy. The idea is to run an ordered sequence of very efficient Bayesian probabilistic estimators (single improvement steps) until classification time runs out. Theoretical studies and empirical validations reveal that by properly identifying, ordering, invoking and ensembling single improvement steps, AAPE is able to accomplish accurate classification whenever it is interrupted. It is also able to output class probability estimates beyond simple 0/1-loss classifications, as well as adeptly handle incremental learning.
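The anytime contract is easy to illustrate: run an ordered list of increasingly expensive probability estimators until a deadline, then average whatever estimates were completed. The estimator list and the time budget below are placeholders; AAPE's actual single improvement steps are the specific Bayesian estimators described in the paper.

import time
import numpy as np

def anytime_predict(estimator_steps, x, budget_seconds):
    # estimator_steps: ordered list of functions x -> class-probability vector,
    # cheapest first, so that an early interruption still yields a prediction.
    deadline = time.monotonic() + budget_seconds
    estimates = []
    for step in estimator_steps:
        estimates.append(step(x))
        if time.monotonic() >= deadline:     # interrupted: stop refining
            break
    probs = np.mean(estimates, axis=0)       # ensemble the completed steps
    return int(np.argmax(probs)), probs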

Journal ArticleDOI
TL;DR: In this paper, the authors propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that they introduce.
Abstract: We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right expressiveness for Web information extraction, and prove that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial time algorithm in RPNI-style that learns monadic queries defined by deterministic NSTTs from completely annotated examples, where all selected nodes are distinguished. In practice, users prefer to provide partial annotations. We propose to account for partial annotations by intelligent tree pruning heuristics. We introduce pruning NSTTs--a formalism that shares many advantages of NSTTs. This leads us to an interactive learning algorithm for monadic queries defined by pruning NSTTs, which satisfies a new formal active learning model in the style of Angluin (1987). We have implemented our interactive learning algorithm and integrated it into a visually interactive Web information extraction system--called SQUIRREL--by plugging it into the Mozilla Web browser. Experiments on realistic Web documents confirm excellent quality with very few user interactions during wrapper induction.

Proceedings ArticleDOI
TL;DR: This paper details the implementation of a persistent union-find data structure as efficient as its imperative counterpart, a significant example of a data structure whose side effects are safely hidden behind a persistent interface.
Abstract: The problem of disjoint sets, also known as union-find, consists in maintaining a partition of a finite set within a data structure. This structure provides two operations: a function find returning the class of an element and a function union merging two classes. An optimal and imperative solution has been known since 1975. However, the imperative nature of this data structure may be a drawback when it is used in a backtracking algorithm. This paper details the implementation of a persistent union-find data structure as efficient as its imperative counterpart. To achieve this result, our solution makes heavy use of imperative features and thus it is a significant example of a data structure whose side effects are safely hidden behind a persistent interface. To strengthen this last claim, we also detail a formalization using the Coq proof assistant which shows both the correctness of our solution and its observational persistence.
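The persistent interface the abstract refers to looks roughly like the following deliberately naive Python sketch, in which every union returns a new version and old versions remain usable (handy in backtracking). It only illustrates the API by copying the parent map; the paper's point is achieving this interface with imperative, near-optimal performance underneath.

class PersistentUF:
    def __init__(self, parent=None):
        self._parent = dict(parent or {})

    def find(self, x):
        # Representative of x's class (unknown elements are their own class).
        while self._parent.get(x, x) != x:
            x = self._parent[x]
        return x

    def union(self, x, y):
        # Returns a NEW structure with the classes of x and y merged;
        # the receiver is left untouched.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return self
        new_parent = dict(self._parent)
        new_parent[rx] = ry
        return PersistentUF(new_parent)

u0 = PersistentUF()
u1 = u0.union(1, 2)
assert u1.find(1) == u1.find(2) and u0.find(1) != u0.find(2)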

Proceedings ArticleDOI
TL;DR: An overview of the design and a report on the status of the implementation effort of Manticore, a heterogeneous language that supports parallelism at multiple levels, combining CML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs.
Abstract: The Manticore project is an effort to design and implement a new functional language for parallel programming. Unlike many earlier parallel languages, Manticore is a heterogeneous language that supports parallelism at multiple levels. Specifically, we combine CML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs. We have been working on an implementation of Manticore for the past six months; this paper gives an overview of our design and a report on the status of the implementation effort.

Journal ArticleDOI
TL;DR: Like other stochastic algorithms, ASAMC requires longer training time than the gradient-based algorithms, but it provides an efficient approach to train MLPs for which the energy landscape is rugged.
Abstract: We propose a general-purpose stochastic optimization algorithm, the so-called annealing stochastic approximation Monte Carlo (ASAMC) algorithm, for neural network training. ASAMC can be regarded as a space annealing version of the stochastic approximation Monte Carlo (SAMC) algorithm. Under mild conditions, we show that ASAMC can converge weakly at a rate of $\Omega(1/\sqrt{t})$ toward a neighboring set (in the space of energy) of the global minimizers. ASAMC is compared with simulated annealing, SAMC, and the BFGS algorithm for training MLPs on a number of examples. The numerical results indicate that ASAMC outperforms the other algorithms in both training and test errors. Like other stochastic algorithms, ASAMC requires longer training time than do the gradient-based algorithms. It provides, however, an efficient approach to train MLPs for which the energy landscape is rugged.

Journal ArticleDOI
TL;DR: This work proposes an EM-based algorithm that estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques.
Abstract: There is much active research into the design of automated bidding agents, particularly for environments that involve multiple decoupled auctions. These settings are complex partly because an agent's strategy depends on information about other bidders' interests. When bidders' valuation distributions are not known ex ante, machine learning techniques can be used to approximate them from historical data. It is a characteristic feature of auctions, however, that information about some bidders' valuations is systematically concealed. This occurs in the sense that some bidders may fail to bid at all because the asking price exceeds their valuations, and also in the sense that a high bidder may not be compelled to reveal her valuation. Ignoring these "hidden bids" can introduce bias into the estimation of valuation distributions. To overcome this problem, we propose an EM-based algorithm. We validate the algorithm experimentally using agents that react to their environments both decision-theoretically and game-theoretically, using both synthetic and real-world (eBay) datasets. We show that our approach estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques.

Journal ArticleDOI
TL;DR: It is proved that these estimators achieve the global minimax risk over sets of functions built from Vapnik-Chervonenkis classes; Rademacher penalties can be seen as special examples of the bootstrap type penalties considered.
Abstract: We consider the binary classification problem. Given an i.i.d. sample drawn from the distribution of an $\mathcal{X}\times\{0,1\}$-valued random pair, we propose to estimate the so-called Bayes classifier by minimizing the sum of the empirical classification error and a penalty term based on Efron's or i.i.d. weighted bootstrap samples of the data. We obtain exponential inequalities for such bootstrap type penalties, which allow us to derive non-asymptotic properties for the corresponding estimators. In particular, we prove that these estimators achieve the global minimax risk over sets of functions built from Vapnik-Chervonenkis classes. The obtained results generalize Koltchinskii's (2001) and Bartlett et al.'s (2002) ones for Rademacher penalties, which can thus be seen as special examples of bootstrap type penalties. To illustrate this, we carry out an experimental study in which we compare the different methods for an intervals model selection problem.

Journal ArticleDOI
TL;DR: This paper proposes a new boosting algorithm, named “MSmoothBoost”, which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem with AdaBoost.OC.
Abstract: AdaBoost.OC has been shown to be an effective method in boosting "weak" binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks with the AdaBoost.OC algorithm is that it is sensitive to the noisy examples and tends to overfit training examples when they are noisy. In this paper, we propose a new boosting algorithm, named "MSmoothBoost", which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem with AdaBoost.OC. We proved the bounds for both the empirical training error and the marginal training error of the proposed boosting algorithm. Empirical studies with seven UCI datasets and one real-world application have indicated that the proposed boosting algorithm is more robust and effective than the AdaBoost.OC algorithm for multi-class learning.

Journal ArticleDOI
TL;DR: A context-free grammatical inference algorithm operating on positive data only is described, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency.
Abstract: This paper describes the winning entry to the Omphalos context free grammar learning competition. We describe a context-free grammatical inference algorithm operating on positive data only, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency. The competition is discussed from the perspective of a competitor. We discuss a class of deterministic grammars, the Non-terminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to larger classes of languages.

Journal ArticleDOI
TL;DR: Under the statistical statement of machine translation, how modeling, learning and search problems can be solved by using stochastic finite-state transducers is overviewed and the results achieved by the systems developed under this paradigm are reviewed.
Abstract: In formal language theory, finite-state transducers are well-known models for simple "input-output" mappings between two languages. Even if more powerful, recursive models can be used to account for more complex mappings, it has been argued that the input-output relations underlying most usual natural language pairs can essentially be modeled by finite-state devices. Moreover, the relative simplicity of these mappings has recently led to the development of techniques for learning finite-state transducers from a training set of input-output sentence pairs of the languages considered. In recent years, these techniques have led to the development of a number of machine translation systems. Under the statistical statement of machine translation, we overview here how modeling, learning and search problems can be solved by using stochastic finite-state transducers. We also review the results achieved by the systems we have developed under this paradigm. As a main conclusion of this review we argue that, as task complexity and training data scarcity increase, those systems which rely more on statistical techniques tend to produce the best results.

Proceedings ArticleDOI
TL;DR: The design and specification of JavaScript are described, along with the experience so far using Standard ML for this purpose.
Abstract: The Ecma TC39-TG1 working group is using ML as the specification language for the next generation of JavaScript, the popular programming language for browser-based web applications. This "definitional interpreter" serves many purposes: a high-level and readable specification language, an executable and testable specification, a reference implementation, and an aid in driving the design process. We describe the design and specification of JavaScript and our experience so far using Standard ML for this purpose.