
Showing papers in "Machine Learning in 2007"


Journal ArticleDOI
TL;DR: Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement; the main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Abstract: In an online convex optimization problem a decision-maker makes a sequence of decisions, i.e., chooses a sequence of points in Euclidean space, from a fixed feasible set. After each point is chosen, it encounters a sequence of (possibly unrelated) convex cost functions. Zinkevich (ICML 2003) introduced this framework, which models many natural repeated decision-making problems and generalizes many existing problems such as Prediction from Expert Advice and Cover's Universal Portfolios. Zinkevich showed that a simple online gradient descent algorithm achieves additive regret $O(\sqrt{T})$, for an arbitrary sequence of T convex cost functions (of bounded gradients), with respect to the best single decision in hindsight. In this paper, we give algorithms that achieve regret $O(\log T)$ for an arbitrary sequence of strictly convex functions (with bounded first and second derivatives). This mirrors what has been done for the special cases of prediction from expert advice by Kivinen and Warmuth (EuroCOLT 1999), and Universal Portfolios by Cover (Math. Finance 1:1-19, 1991). We propose several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement. The main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field. Our analysis shows a surprising connection between the natural follow-the-leader approach and the Newton method. We also analyze other algorithms, which tie together several different previous approaches including follow-the-leader, exponential weighting, Cover's algorithm and gradient descent.
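To make the regret setting concrete, here is a minimal Python sketch of the online gradient descent baseline the paper improves on: the step size eta_t of order 1/sqrt(t) recovers Zinkevich's O(sqrt(T)) regret, while eta_t of order 1/(H t) gives the logarithmic rate for H-strongly convex losses. The feasible-set radius, loss functions and constants below are assumptions made for the example, not taken from the paper.

import numpy as np

def project_to_ball(x, radius):
    # Euclidean projection onto the feasible set (a ball of the given radius).
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_descent(grad_oracles, dim, radius, step_size):
    # grad_oracles: one function per round, g_t(x) = gradient of f_t at x.
    # step_size: function t -> eta_t.
    x = np.zeros(dim)
    plays = []
    for t, grad in enumerate(grad_oracles, start=1):
        plays.append(x.copy())
        x = project_to_ball(x - step_size(t) * grad(x), radius)
    return plays

# Usage: strongly convex quadratic losses f_t(x) = ||x - z_t||^2 (H = 2),
# played with the 1/(H t) step size that yields O(log T) regret.
rng = np.random.default_rng(0)
targets = [rng.normal(size=3) for _ in range(100)]
grad_oracles = [lambda x, z=z: 2.0 * (x - z) for z in targets]
plays = online_gradient_descent(grad_oracles, dim=3, radius=5.0,
                                step_size=lambda t: 1.0 / (2.0 * t))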

1,124 citations


Journal ArticleDOI
TL;DR: An improved algorithm that theoretically converges and avoids numerical difficulties is proposed for Platt’s probabilistic outputs for Support Vector Machines.
Abstract: Platt's probabilistic outputs for Support Vector Machines (Platt, J. in Smola, A., et al. (eds.) Advances in large margin classifiers. Cambridge, 2000) has been popular for applications that require posterior class probabilities. In this note, we propose an improved algorithm that theoretically converges and avoids numerical difficulties. A simple and ready-to-use pseudo code is included.
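For context, a rough sketch of the underlying calibration problem: fit P(y=1|f) = 1/(1+exp(A f + B)) to held-out decision values by minimizing the cross-entropy against Platt's smoothed targets, written with logaddexp for numerical stability. This only illustrates the idea, not the paper's improved pseudo code; the use of scipy's Nelder-Mead optimizer here is an assumption of the sketch.

import numpy as np
from scipy.optimize import minimize

def fit_sigmoid(scores, labels):
    # scores: SVM decision values f_i; labels: 0/1 array from a validation set.
    labels = np.asarray(labels)
    prior1 = labels.sum()
    prior0 = len(labels) - prior1
    # Platt's smoothed targets instead of raw 0/1 labels.
    t = np.where(labels == 1, (prior1 + 1.0) / (prior1 + 2.0), 1.0 / (prior0 + 2.0))

    def nll(params):
        a, b = params
        z = a * np.asarray(scores) + b
        # Negative log-likelihood of P(y=1|f) = 1/(1+exp(z)), computed stably.
        return np.sum(np.logaddexp(0.0, z) - (1.0 - t) * z)

    x0 = np.array([0.0, np.log((prior0 + 1.0) / (prior1 + 1.0))])
    return minimize(nll, x0, method="Nelder-Mead").x  # returns (A, B)

def predict_proba(scores, a, b):
    return 1.0 / (1.0 + np.exp(a * np.asarray(scores) + b))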

926 citations


Journal ArticleDOI
TL;DR: The variance reduction method known in experimental design circles as ‘A-optimality’ is re-derived, and comparisons are run against different variations of the most widely used heuristic schemes to discover which methods work best for different classes of problems and why.
Abstract: Which active learning methods can we expect to yield good performance in learning binary and multi-category logistic regression classifiers? Addressing this question is a natural first step in providing robust solutions for active learning across a wide variety of exponential models including maximum entropy, generalized linear, log-linear, and conditional random field models. For the logistic regression model we re-derive the variance reduction method known in experimental design circles as 'A-optimality.' We then run comparisons against different variations of the most widely used heuristic schemes: query by committee and uncertainty sampling, to discover which methods work best for different classes of problems and why. We find that among the strategies tested, the experimental design methods are most likely to match or beat a random sample baseline. The heuristic alternatives produced mixed results, with an uncertainty sampling variant called margin sampling and a derivative method called QBB-MM providing the most promising performance at very low computational cost. Computational running times of the experimental design methods were a bottleneck to the evaluations. Meanwhile, evaluation of the heuristic methods led to an accumulation of negative results. We explore alternative evaluation design parameters to test whether these negative results are merely an artifact of settings where experimental design methods can be applied. The results demonstrate a need for improved active learning methods that will provide reliable performance at a reasonable computational cost.
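As a point of reference for the heuristics mentioned above, the margin-sampling rule can be stated in a few lines: query the unlabeled example whose two most probable classes are closest in predicted probability. The scikit-learn logistic regression model and data pool used below are assumptions of this sketch, not part of the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

def margin_sampling_query(model, pool_X):
    # Return the index of the pool example with the smallest margin between
    # the top two class probabilities (the most "uncertain" example).
    proba = np.sort(model.predict_proba(pool_X), axis=1)
    margins = proba[:, -1] - proba[:, -2]
    return int(np.argmin(margins))

# Usage sketch: fit on a labeled seed set, then repeatedly query, label, refit.
# model = LogisticRegression(max_iter=1000).fit(X_seed, y_seed)
# idx = margin_sampling_query(model, X_pool)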

326 citations


Journal ArticleDOI
TL;DR: In this article, the authors present AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium), the first algorithm guaranteed both to learn to play optimally against stationary opponents and to converge to a Nash equilibrium in self-play in games with arbitrary numbers of actions and players.
Abstract: Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games--assuming that the opponent's mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents' mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others' strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.

261 citations


Journal ArticleDOI
TL;DR: This paper introduces and analyzes a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget, and shows that the randomized algorithm strikes the optimal trade-off between budget B and norm U of the largest classifier in the comparison sequence.
Abstract: Shifting bounds for on-line classification algorithms ensure good performance on any sequence of examples that is well predicted by a sequence of changing classifiers. When proving shifting bounds for kernel-based classifiers, one also faces the problem of storing a number of support vectors that can grow unboundedly, unless an eviction policy is used to keep this number under control. In this paper, we show that shifting and on-line learning on a budget can be combined surprisingly well. First, we introduce and analyze a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget. Second, we show that by applying to the Perceptron algorithm the simplest possible eviction policy, which discards a random support vector each time a new one comes in, we achieve a shifting bound close to the one we obtained with no budget restrictions. More importantly, we show that our randomized algorithm strikes the optimal trade-off $$U = \Theta(\sqrt{B})$$ between budget B and norm U of the largest classifier in the comparison sequence. Experiments are presented comparing several linear-threshold algorithms on chronologically-ordered textual datasets. These experiments support our theoretical findings in that they show to what extent randomized budget algorithms are more robust than deterministic ones when learning shifting target data streams.
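The random-eviction idea is simple enough to sketch directly; the following toy kernel Perceptron (with an RBF kernel and constants chosen arbitrarily) discards a uniformly random support vector whenever a mistake would push the support set past the budget B. It illustrates the eviction policy discussed above, not the paper's exact algorithm or analysis.

import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def randomized_budget_perceptron(stream, budget, kernel=rbf_kernel, seed=0):
    # stream: iterable of (x, y) pairs with y in {-1, +1}.
    rng = np.random.default_rng(seed)
    support = []          # stored (x_i, y_i) pairs defining the classifier
    mistakes = 0
    for x, y in stream:
        score = sum(yi * kernel(xi, x) for xi, yi in support)
        if y * score <= 0:                       # prediction mistake
            mistakes += 1
            if len(support) >= budget:           # budget full: evict at random
                support.pop(rng.integers(len(support)))
            support.append((x, y))
    return support, mistakes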

182 citations


Journal ArticleDOI
TL;DR: A novel framework for the design and analysis of online learning algorithms, based on the notion of duality in constrained optimization, is described; it ties the primal objective value to the number of prediction mistakes via the increase in the dual objective.
Abstract: We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem we reduce the process of online learning to the task of incrementally increasing the dual objective function. The amount by which the dual increases serves as a new and natural notion of progress for analyzing online learning algorithms. We are thus able to tie the primal objective value and the number of prediction mistakes using the increase in the dual.

174 citations


Journal ArticleDOI
TL;DR: This article studied external regret in sequential prediction games with both positive and negative payoffs and derived new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule.
Abstract: This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no preliminary knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
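For readers unfamiliar with the forecaster being analysed, here is a compact sketch of the exponentially weighted average forecaster in the payoff (reward) setting. The fixed learning rate eta is an illustrative assumption; the point of the paper is precisely that it can be tuned from second-order (squared-payoff) quantities without prior knowledge of the payoff range.

import numpy as np

def exponential_weights(payoff_rounds, n_actions, eta=0.1, seed=0):
    # payoff_rounds: iterable of length-n_actions payoff vectors, one per round.
    rng = np.random.default_rng(seed)
    cum_payoff = np.zeros(n_actions)
    total = 0.0
    for payoffs in payoff_rounds:
        w = np.exp(eta * (cum_payoff - cum_payoff.max()))   # numerically stable
        action = rng.choice(n_actions, p=w / w.sum())
        total += payoffs[action]
        cum_payoff += np.asarray(payoffs, dtype=float)
    # External regret: best fixed action in hindsight minus what we obtained.
    return total, cum_payoff.max() - total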

166 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that the convergence rate of the reconstruction error for kernel principal component analysis (KPCA) can typically be faster than $n^{-1/2}$, where n is the sample size (the dimension of the kernel Gram matrix).
Abstract: The main goal of this paper is to prove inequalities on the reconstruction error for kernel principal component analysis. With respect to previous work on this topic, our contribution is twofold: (1) we give bounds that explicitly take into account the empirical centering step in this algorithm, and (2) we show that a "localized" approach allows us to obtain more accurate bounds. In particular, we show faster rates of convergence towards the minimum reconstruction error; more precisely, we prove that the convergence rate can typically be faster than $n^{-1/2}$. We also obtain a new relative bound on the error. A secondary goal, for which we present similar contributions, is to obtain convergence bounds for the partial sums of the biggest or smallest eigenvalues of the kernel Gram matrix towards eigenvalues of the corresponding kernel operator. These quantities are naturally linked to the KPCA procedure; furthermore these results can have applications to the study of various other kernel algorithms. The results are presented in a functional analytic framework, which is suited to deal rigorously with reproducing kernel Hilbert spaces of infinite dimension.
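As a concrete illustration of the quantity being bounded, the snippet below computes the empirical reconstruction error of kernel PCA with d components from the centered kernel Gram matrix (the centering step the bounds explicitly account for). The RBF kernel, data and parameters are assumptions made for the example.

import numpy as np

def rbf_gram(X, gamma=0.5):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def kpca_reconstruction_error(X, d, gamma=0.5):
    # Average squared distance in feature space to the span of the top-d
    # empirical kernel principal components.
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n        # empirical centering matrix
    Kc = H @ rbf_gram(X, gamma) @ H
    eigvals = np.sort(np.linalg.eigvalsh(Kc))[::-1]
    return eigvals[d:].sum() / n               # tail eigenvalue mass / n

rng = np.random.default_rng(0)
print(kpca_reconstruction_error(rng.normal(size=(50, 3)), d=5))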

152 citations


Journal ArticleDOI
TL;DR: It is shown that, surprisingly, isotonic regression based calibration using the Pool Adjacent Violators algorithm is equivalent to the ROC convex hull method.
Abstract: Classifier calibration is the process of converting classifier scores into reliable probability estimates. Recently, a calibration technique based on isotonic regression has gained attention within machine learning as a flexible and effective way to calibrate classifiers. We show that, surprisingly, isotonic regression based calibration using the Pool Adjacent Violators algorithm is equivalent to the ROC convex hull method.
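The Pool Adjacent Violators step at the heart of this equivalence is short enough to sketch: sort validation examples by score and repeatedly pool adjacent blocks until the fitted probabilities are non-decreasing in the score. This is a plain illustration of PAV calibration, not of the ROC convex hull correspondence itself.

import numpy as np

def pav_calibrate(scores, labels):
    # Returns (scores sorted ascending, calibrated probabilities) from 0/1 labels.
    order = np.argsort(scores)
    y = np.asarray(labels, dtype=float)[order]
    blocks = [[yi, 1.0] for yi in y]           # each block: [sum of labels, count]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            blocks[i][0] += blocks[i + 1][0]   # pool the violating pair
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            i = max(i - 1, 0)                  # re-check against the previous block
        else:
            i += 1
    probs = np.concatenate([[s / n] * int(n) for s, n in blocks])
    return np.asarray(scores)[order], probs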

109 citations


Journal ArticleDOI
TL;DR: This work presents two generic approaches for constructing invariant kernels and proposes a more distinguishing treatment in particular in the active field of kernel methods for machine learning and pattern analysis, to enable a smooth interpolation between invariant and non-invariant pattern analysis.
Abstract: In many learning problems prior knowledge about pattern variations can be formalized and beneficially incorporated into the analysis system. The corresponding notion of invariance is commonly used in conceptually different ways. We propose a more distinguishing treatment, in particular in the active field of kernel methods for machine learning and pattern analysis. Additionally, the fundamental relation of invariant kernels and traditional invariant pattern analysis by means of invariant representations will be clarified. After addressing these conceptual questions, we focus on practical aspects and present two generic approaches for constructing invariant kernels. The first approach is based on a technique called invariant integration. The second approach builds on invariant distances. In principle, our approaches support general transformations, in particular covering discrete, non-group, or even infinite sets of pattern transformations. Additionally, both enable a smooth interpolation between invariant and non-invariant pattern analysis, i.e., they provide a general framework covering both. The wide applicability and various possible benefits of invariant kernels are demonstrated in different kernel methods.
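A toy instance of the invariant-integration construction: average a base kernel over a finite transformation set (here, cyclic shifts of a 1-D signal) applied to both arguments, which yields a kernel invariant to those shifts. The base RBF kernel and the choice of transformations are assumptions of this sketch.

import numpy as np

def base_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def shift_invariant_kernel(x, z, kernel=base_kernel):
    # Average kernel(g(x), h(z)) over all pairs of cyclic shifts g, h.
    n = len(x)
    vals = [kernel(np.roll(x, i), np.roll(z, j))
            for i in range(n) for j in range(n)]
    return float(np.mean(vals))

# The value is unchanged if x or z is replaced by any cyclic shift of itself,
# and the construction stays positive semidefinite (it is the inner product of
# group-averaged feature maps).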

91 citations


Journal ArticleDOI
TL;DR: The theoretical importance of the SLMM model is demonstrated by showing that it generalizes existing approaches, such as SVMs and M4s, provides novel insight into learning models, and lays a foundation for conceiving other “structured” classifiers.
Abstract: This paper proposes a new large margin classifier--the structured large margin machine (SLMM)--that is sensitive to the structure of the data distribution. The SLMM approach incorporates the merits of "structured" learning models, such as radial basis function networks and Gaussian mixture models, with the advantages of "unstructured" large margin learning schemes, such as support vector machines and maxi-min margin machines. We derive the SLMM model from the concepts of "structured degree" and "homospace", based on an analysis of existing structured and unstructured learning models. Then, by using Ward's agglomerative hierarchical clustering on input data (or data mappings in the kernel space) to extract the underlying data structure, we formulate SLMM training as a sequential second-order cone programming problem. Many promising features of the SLMM approach are illustrated, including its accuracy, scalability, extensibility, and noise tolerance. We also demonstrate the theoretical importance of the SLMM model by showing that it generalizes existing approaches, such as SVMs and M4s, provides novel insight into learning models, and lays a foundation for conceiving other "structured" classifiers.

Journal ArticleDOI
TL;DR: This paper presents a reformulation of the problem of learning an optimal kernel in a prescribed convex set of kernels within a feature space environment, and relates this problem in a special case to regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space with a mixed norm.
Abstract: In this paper, we continue our study of learning an optimal kernel in a prescribed convex set of kernels (Micchelli & Pontil, 2005). We present a reformulation of this problem within a feature space environment. This leads us to study regularization in the dual space of all continuous functions on a compact domain with values in a Hilbert space with a mixed norm. We also relate this problem in a special case to $${\cal L}^p$$ regularization.

Journal ArticleDOI
Peter Grünwald, John Langford
TL;DR: It is shown that forms of Bayesian and MDL inference that are often applied to classification problems can be inconsistent, which means that there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error.
Abstract: We show that forms of Bayesian and MDL inference that are often applied to classification problems can be inconsistent. This means that there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error. From a Bayesian point of view, the result can be reinterpreted as saying that Bayesian inference can be inconsistent under misspecification, even for countably infinite models. We extensively discuss the result from both a Bayesian and an MDL perspective.

Journal ArticleDOI
TL;DR: New lower bounds are proved for learning intersections of halfspaces, one of the most important concept classes in computational learning theory: any statistical-query algorithm for learning the intersection of $\sqrt{n}$ halfspaces in n dimensions must make $2^{\varOmega(\sqrt{n})}$ queries, the first non-trivial lower bound on the statistical query dimension for this concept class.
Abstract: We prove new lower bounds for learning intersections of halfspaces, one of the most important concept classes in computational learning theory. Our main result is that any statistical-query algorithm for learning the intersection of $\sqrt{n}$ halfspaces in n dimensions must make $2^{\varOmega (\sqrt{n})}$ queries. This is the first non-trivial lower bound on the statistical query dimension for this concept class (the previous best lower bound was $n^{\varOmega(\log n)}$). Our lower bound holds even for intersections of low-weight halfspaces. In the latter case, it is nearly tight. We also show that the intersection of two majorities (low-weight halfspaces) cannot be computed by a polynomial threshold function (PTF) with fewer than $n^{\varOmega(\log n/\log\log n)}$ monomials. This is the first super-polynomial lower bound on the PTF length of this concept class, and is nearly optimal. For intersections of $k=\omega(\log n)$ low-weight halfspaces, we improve our lower bound to $\min\{2^{\varOmega (\sqrt{n})},n^{\varOmega (k/\log k)}\},$ which too is nearly optimal. As a consequence, intersections of even two halfspaces are not computable by polynomial-weight PTFs, the most expressive class of functions known to be efficiently learnable via Jackson's Harmonic Sieve algorithm. Finally, we report our progress on the weak learnability of intersections of halfspaces under the uniform distribution.

Journal ArticleDOI
TL;DR: A new algorithm building an optimal dyadic decision tree (ODT) that combines guaranteed performance in the learning theoretical sense and optimal search from the algorithmic point of view and improves performance over classical approaches such as CART/C4.5.
Abstract: We introduce a new algorithm building an optimal dyadic decision tree (ODT). The method combines guaranteed performance in the learning theoretical sense and optimal search from the algorithmic point of view. Furthermore it inherits the explanatory power of tree approaches, while improving performance over classical approaches such as CART/C4.5, as shown on experiments on artificial and benchmark data.

Journal ArticleDOI
TL;DR: The main technical tool is a uniform convergence result for center based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k.
Abstract: We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d. by some unknown arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling based clustering algorithms that approximate the optimal clustering. We show that the K-median clustering, as well as K-means and the Vector Quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the dependence of the running time of our algorithm on the Euclidean dimension is only linear. Our main technical tool is a uniform convergence result for center based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k.
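The sample-based scheme described above can be mimicked in a few lines: draw a small i.i.d. sample, compute centers on the sample only, and evaluate the induced clustering cost on the full set. Using scikit-learn's KMeans as a stand-in for an (approximately) optimal solver, and the specific sample size, are assumptions of this illustration.

import numpy as np
from sklearn.cluster import KMeans

def sample_based_kmeans(X, k, sample_size, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    # Centers are computed from the sample alone...
    centers = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx]).cluster_centers_
    # ...and the resulting clustering is evaluated on the full domain set.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    cost = np.mean(dists.min(axis=1) ** 2)
    return centers, cost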

Journal ArticleDOI
Tadeusz Pietraszek
TL;DR: This work proposes a method to optimally build a specific type of abstaining binary classifiers using ROC analysis and presents a simple and efficient algorithm for finding the optimal classifier in these models, namely, the bounded-abstention and bounded-improvement models.
Abstract: Classifiers that refrain from classification in certain cases can significantly reduce the misclassification cost. However, the parameters for such abstaining classifiers are often set in a rather ad-hoc manner. We propose a method to optimally build a specific type of abstaining binary classifiers using ROC analysis. These classifiers are built based on optimization criteria in the following three models: cost-based, bounded-abstention and bounded-improvement. We show that selecting the optimal classifier in the first model is similar to known iso-performance lines and uses only the slopes of ROC curves, whereas selecting the optimal classifier in the remaining two models is not straightforward. We investigate the properties of the convex-down ROCCH (ROC Convex Hull) and present a simple and efficient algorithm for finding the optimal classifier in these models, namely, the bounded-abstention and bounded-improvement models. We demonstrate the application of these models to effectively reduce misclassification cost in real-life classification systems. The method has been validated with an ROC building algorithm and cross-validation on 15 UCI KDD datasets.
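To make the cost-based model concrete, a toy decision rule: given a calibrated score p = P(y=1|x), predict the class with the lower expected misclassification cost unless abstaining is cheaper still. This is a simplified decision-theoretic view with made-up costs; the paper's contribution is choosing the operating points optimally from ROC analysis.

import numpy as np

def abstaining_decisions(probs, cost_fp=1.0, cost_fn=5.0, cost_abstain=0.5):
    # Returns an array with entries 0, 1, or -1 (-1 means "abstain").
    probs = np.asarray(probs, dtype=float)
    expected = np.stack([
        probs * cost_fn,                    # expected cost of predicting class 0
        (1.0 - probs) * cost_fp,            # expected cost of predicting class 1
        np.full_like(probs, cost_abstain),  # flat cost of abstaining
    ])
    choice = expected.argmin(axis=0)
    return np.where(choice == 2, -1, choice)

# e.g. abstaining_decisions([0.05, 0.4, 0.95]) -> array([ 0, -1,  1])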

Journal ArticleDOI
TL;DR: This model encompasses the original RVM as a special case, but the empirical results show that it can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance in many cases.
Abstract: Enforcing sparsity constraints has been shown to be an effective and efficient way to obtain state-of-the-art results in regression and classification tasks. Unlike the support vector machine (SVM) the relevance vector machine (RVM) explicitly encodes the criterion of model sparsity as a prior over the model weights. However the lack of an explicit prior structure over the weight variances means that the degree of sparsity is to a large extent controlled by the choice of kernel (and kernel parameters). This can lead to severe overfitting or oversmoothing--possibly even both at the same time (e.g. for the multiscale Doppler data). We detail an efficient scheme to control sparsity in Bayesian regression by incorporating a flexible noise-dependent smoothness prior into the RVM. We present an empirical evaluation of the effects of choice of prior structure on a selection of popular data sets and elucidate the link between Bayesian wavelet shrinkage and RVM regression. Our model encompasses the original RVM as a special case, but our empirical results show that we can surpass RVM performance in terms of goodness of fit and achieved sparsity as well as computational performance in many cases. The code is freely available.

Journal ArticleDOI
TL;DR: A modular approach for achieving effective agent-centric learning in multi-agent systems that consists of a number of basic algorithmic building blocks, which can be instantiated and composed differently depending on the environment setting as well as the target class of opponents.
Abstract: We offer a new formal criterion for agent-centric learning in multi-agent systems, that is, learning that maximizes one's rewards in the presence of other agents who might also be learning (using the same or other learning algorithms). This new criterion takes in as a parameter the class of opponents. We then provide a modular approach for achieving effective agent-centric learning; the approach consists of a number of basic algorithmic building blocks, which can be instantiated and composed differently depending on the environment setting (for example, 2- versus n-player games) as well as the target class of opponents. We then provide several specific instances of the approach: an algorithm for stationary opponents, and two algorithms for adaptive opponents with bounded memory, one algorithm for the n-player case and another optimized for the 2-player case. We prove our algorithms correct with respect to the formal criterion, and furthermore show the algorithms to be experimentally effective via comprehensive computer testing.

Journal ArticleDOI
TL;DR: A novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which is capable of delivering strong prediction accuracy with little CPU time and utilizing additional CPU time to increase classification accuracy.
Abstract: In many online applications of machine learning, the computational resources available for classification will vary from time to time. Most techniques are designed to operate within the constraints of the minimum expected resources and fail to utilize further resources when they are available. We propose a novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which is capable of delivering strong prediction accuracy with little CPU time and utilizing additional CPU time to increase classification accuracy. The idea is to run an ordered sequence of very efficient Bayesian probabilistic estimators (single improvement steps) until classification time runs out. Theoretical studies and empirical validations reveal that by properly identifying, ordering, invoking and ensembling single improvement steps, AAPE is able to accomplish accurate classification whenever it is interrupted. It is also able to output class probability estimates beyond simple 0/1-loss classifications, as well as adeptly handle incremental learning.
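The anytime contract is easy to illustrate: run an ordered list of increasingly expensive probability estimators until a deadline, then average whatever estimates were completed. The estimator list and the time budget below are placeholders; AAPE's actual single improvement steps are the specific Bayesian estimators described in the paper.

import time
import numpy as np

def anytime_predict(estimator_steps, x, budget_seconds):
    # estimator_steps: ordered list of functions x -> class-probability vector,
    # cheapest first, so that an early interruption still yields a prediction.
    deadline = time.monotonic() + budget_seconds
    estimates = []
    for step in estimator_steps:
        estimates.append(step(x))
        if time.monotonic() >= deadline:     # interrupted: stop refining
            break
    probs = np.mean(estimates, axis=0)       # ensemble the completed steps
    return int(np.argmax(probs)), probs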

Journal ArticleDOI
TL;DR: In this paper, the authors propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that they introduce.
Abstract: We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right expressiveness for Web information extraction, and prove that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial time algorithm in RPNI-style that learns monadic queries defined by deterministic NSTTs from completely annotated examples, where all selected nodes are distinguished. In practice, users prefer to provide partial annotations. We propose to account for partial annotations by intelligent tree pruning heuristics. We introduce pruning NSTTs--a formalism that shares many advantages of NSTTs. This leads us to an interactive learning algorithm for monadic queries defined by pruning NSTTs, which satisfies a new formal active learning model in the style of Angluin (1987). We have implemented our interactive learning algorithm and integrated it into a visually interactive Web information extraction system--called SQUIRREL--by plugging it into the Mozilla Web browser. Experiments on realistic Web documents confirm excellent quality with very few user interactions during wrapper induction.

Proceedings ArticleDOI
TL;DR: This paper details the implementation of a persistent union-find data structure as efficient as its imperative counterpart, a significant example of a data structure whose side effects are safely hidden behind a persistent interface.
Abstract: The problem of disjoint sets, also known as union-find, consists in maintaining a partition of a finite set within a data structure. This structure provides two operations: a function find returning the class of an element and a function union merging two classes. An optimal and imperative solution has been known since 1975. However, the imperative nature of this data structure may be a drawback when it is used in a backtracking algorithm. This paper details the implementation of a persistent union-find data structure as efficient as its imperative counterpart. To achieve this result, our solution makes heavy use of imperative features and thus it is a significant example of a data structure whose side effects are safely hidden behind a persistent interface. To strengthen this last claim, we also detail a formalization using the Coq proof assistant which shows both the correctness of our solution and its observational persistence.
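The persistent interface the abstract refers to looks roughly like the following deliberately naive Python sketch, in which every union returns a new version and old versions remain usable (handy in backtracking). It only illustrates the API by copying the parent map; the paper's point is achieving this interface with imperative, near-optimal performance underneath.

class PersistentUF:
    def __init__(self, parent=None):
        self._parent = dict(parent or {})

    def find(self, x):
        # Representative of x's class (unknown elements are their own class).
        while self._parent.get(x, x) != x:
            x = self._parent[x]
        return x

    def union(self, x, y):
        # Returns a NEW structure with the classes of x and y merged;
        # the receiver is left untouched.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return self
        new_parent = dict(self._parent)
        new_parent[rx] = ry
        return PersistentUF(new_parent)

u0 = PersistentUF()
u1 = u0.union(1, 2)
assert u1.find(1) == u1.find(2) and u0.find(1) != u0.find(2)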

Proceedings ArticleDOI
TL;DR: An overview of the design and a report on the status of the implementation effort of Manticore, a heterogeneous language that supports parallelism at multiple levels, combining CML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs.
Abstract: The Manticore project is an effort to design and implement a new functional language for parallel programming. Unlike many earlier parallel languages, Manticore is a heterogeneous language that supports parallelism at multiple levels. Specifically, we combine CML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs. We have been working on an implementation of Manticore for the past six months; this paper gives an overview of our design and a report on the status of the implementation effort.

Journal ArticleDOI
TL;DR: Like other stochastic algorithms, ASAMC requires longer training time than the gradient-based algorithms, but it provides an efficient approach to train MLPs for which the energy landscape is rugged.
Abstract: We propose a general-purpose stochastic optimization algorithm, the so-called annealing stochastic approximation Monte Carlo (ASAMC) algorithm, for neural network training. ASAMC can be regarded as a space annealing version of the stochastic approximation Monte Carlo (SAMC) algorithm. Under mild conditions, we show that ASAMC can converge weakly at a rate of $\Omega(1/\sqrt{t})$ toward a neighboring set (in the space of energy) of the global minimizers. ASAMC is compared with simulated annealing, SAMC, and the BFGS algorithm for training MLPs on a number of examples. The numerical results indicate that ASAMC outperforms the other algorithms in both training and test errors. Like other stochastic algorithms, ASAMC requires longer training time than do the gradient-based algorithms. It provides, however, an efficient approach to train MLPs for which the energy landscape is rugged.

Journal ArticleDOI
TL;DR: This work proposes an EM-based algorithm that estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques.
Abstract: There is much active research into the design of automated bidding agents, particularly for environments that involve multiple decoupled auctions. These settings are complex partly because an agent's strategy depends on information about other bidders' interests. When bidders' valuation distributions are not known ex ante, machine learning techniques can be used to approximate them from historical data. It is a characteristic feature of auctions, however, that information about some bidders' valuations is systematically concealed. This occurs in the sense that some bidders may fail to bid at all because the asking price exceeds their valuations, and also in the sense that a high bidder may not be compelled to reveal her valuation. Ignoring these "hidden bids" can introduce bias into the estimation of valuation distributions. To overcome this problem, we propose an EM-based algorithm. We validate the algorithm experimentally using agents that react to their environments both decision-theoretically and game-theoretically, using both synthetic and real-world (eBay) datasets. We show that our approach estimates bidders' valuation distributions and the distribution over the true number of bidders significantly more accurately than more straightforward density estimation techniques.

Journal ArticleDOI
TL;DR: It is proved that these estimators achieve the global minimax risk over sets of functions built from Vapnik-Chervonenkis classes; Rademacher penalties can be seen as special examples of the bootstrap type penalties considered.
Abstract: We consider the binary classification problem. Given an i.i.d. sample drawn from the distribution of an $\mathcal{X}\times\{0,1\}$-valued random pair, we propose to estimate the so-called Bayes classifier by minimizing the sum of the empirical classification error and a penalty term based on Efron's or i.i.d. weighted bootstrap samples of the data. We obtain exponential inequalities for such bootstrap type penalties, which allow us to derive non-asymptotic properties for the corresponding estimators. In particular, we prove that these estimators achieve the global minimax risk over sets of functions built from Vapnik-Chervonenkis classes. The obtained results generalize Koltchinskii's (2001) and Bartlett et al.'s (2002) ones for Rademacher penalties, which can thus be seen as special examples of bootstrap type penalties. To illustrate this, we carry out an experimental study in which we compare the different methods for an intervals model selection problem.

Journal ArticleDOI
TL;DR: This paper proposes a new boosting algorithm, named “MSmoothBoost”, which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem with AdaBoost.OC.
Abstract: AdaBoost.OC has been shown to be an effective method in boosting "weak" binary classifiers for multi-class learning. It employs the Error-Correcting Output Code (ECOC) method to convert a multi-class learning problem into a set of binary classification problems, and applies the AdaBoost algorithm to solve them efficiently. One of the main drawbacks with the AdaBoost.OC algorithm is that it is sensitive to the noisy examples and tends to overfit training examples when they are noisy. In this paper, we propose a new boosting algorithm, named "MSmoothBoost", which introduces a smoothing mechanism into the boosting procedure to explicitly address the overfitting problem with AdaBoost.OC. We proved the bounds for both the empirical training error and the marginal training error of the proposed boosting algorithm. Empirical studies with seven UCI datasets and one real-world application have indicated that the proposed boosting algorithm is more robust and effective than the AdaBoost.OC algorithm for multi-class learning.

Journal ArticleDOI
TL;DR: A context-free grammatical inference algorithm operating on positive data only is described, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency.
Abstract: This paper describes the winning entry to the Omphalos context free grammar learning competition. We describe a context-free grammatical inference algorithm operating on positive data only, which integrates an information theoretic constituent likelihood measure together with more traditional heuristics based on substitutability and frequency. The competition is discussed from the perspective of a competitor. We discuss a class of deterministic grammars, the Non-terminally Separated (NTS) grammars, that have a property relied on by our algorithm, and consider the possibilities of extending the algorithm to larger classes of languages.

Journal ArticleDOI
TL;DR: Under the statistical statement of machine translation, how modeling, learning and search problems can be solved by using stochastic finite-state transducers is overviewed and the results achieved by the systems developed under this paradigm are reviewed.
Abstract: In formal language theory, finite-state transducers are well-known models for simple "input-output" mappings between two languages. Even if more powerful, recursive models can be used to account for more complex mappings, it has been argued that the input-output relations underlying most usual natural language pairs can essentially be modeled by finite-state devices. Moreover, the relative simplicity of these mappings has recently led to the development of techniques for learning finite-state transducers from a training set of input-output sentence pairs of the languages considered. In recent years, these techniques have led to the development of a number of machine translation systems. Under the statistical statement of machine translation, we overview here how modeling, learning and search problems can be solved by using stochastic finite-state transducers. We also review the results achieved by the systems we have developed under this paradigm. As a main conclusion of this review we argue that, as task complexity and training data scarcity increase, those systems which rely more on statistical techniques tend to produce the best results.

Proceedings ArticleDOI
TL;DR: The design and specification of JavaScript are described, along with the experience so far using Standard ML for this purpose.
Abstract: The Ecma TC39-TG1 working group is using ML as the specification language for the next generation of JavaScript, the popular programming language for browser-based web applications. This "definitional interpreter" serves many purposes: a high-level and readable specification language, an executable and testable specification, a reference implementation, and an aid in driving the design process. We describe the design and specification of JavaScript and our experience so far using Standard ML for this purpose.