
Showing papers on "Generalization published in 1999"


Journal ArticleDOI
TL;DR: This paper addresses two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input.
Abstract: Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.
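As a concrete illustration of the point about combining confidences rather than hard predictions, the sketch below builds a small stacked ensemble in which the level-1 model is trained on out-of-fold class probabilities of the level-0 models. The dataset, base learners, and meta-learner are illustrative assumptions using scikit-learn, not the setup from the paper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

level0 = [DecisionTreeClassifier(random_state=0), GaussianNB(),
          KNeighborsClassifier()]

# Level-1 training data: out-of-fold predicted probabilities ("confidences"),
# not the hard class labels of the level-0 models.
meta_tr = np.hstack([cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")
                     for m in level0])
meta_model = LogisticRegression(max_iter=1000).fit(meta_tr, y_tr)

# At prediction time the level-0 models are refit on the full training set.
meta_te = np.hstack([m.fit(X_tr, y_tr).predict_proba(X_te) for m in level0])
print("stacked test accuracy:", meta_model.score(meta_te, y_te))
```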

662 citations


Book
01 Jun 1999
TL;DR: In this article, the problem of optimizing queries in the presence of materialized views is analyzed and a comprehensive, efficient solution is provided; the solution is a simple generalization of the traditional query optimization algorithm.
Abstract: While much work has addressed the problem of maintaining materialized views, the important question of optimizing queries in the presence of materialized views has not been resolved. In this paper, we analyze the optimization question and provide a comprehensive and efficient solution. Our solution has the desirable property that it is a simple generalization of the traditional query optimization algorithm.

410 citations



Book ChapterDOI
01 Aug 1999
TL;DR: The Analytic Hierarchy Process (AHP) as discussed by the authors provides objective mathematics to process the inescapably subjective and personal preferences of an individual or a group in making a decision, and the Analytic Network Process (ANP) is a generalization of the AHP.
Abstract: The Analytic Hierarchy Process (AHP) provides the objective mathematics to process the inescapably subjective and personal preferences of an individual or a group in making a decision. With the AHP and its generalization, the Analytic Network Process (ANP), one constructs hierarchies or feedback networks, then makes judgments or performs measurements on pairs of elements with respect to a controlling element to derive ratio scales that are then synthesized throughout the structure to select the best alternative.
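A minimal numeric sketch of the pairwise-comparison step described above: derive a ratio-scale priority vector as the principal eigenvector of a reciprocal comparison matrix and check consistency. The matrix entries and the random-index value are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Reciprocal pairwise-comparison matrix for three alternatives (1-9 scale).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = int(np.argmax(eigvals.real))            # principal eigenvalue
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                                # normalized priority (ratio-scale) vector

n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)        # consistency index
ri = 0.58                                   # Saaty's random index for n = 3 (assumed value)
print("priorities:", np.round(w, 3), " consistency ratio:", round(ci / ri, 3))
```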

290 citations


Journal ArticleDOI
TL;DR: In this paper, a generalization of the stability of Jensen's equation is proposed in the spirit of Hyers, Ulam, Rassias, and Gavruta.

190 citations


01 Jan 1999
TL;DR: In this paper, a generalized resolution criterion is defined and used for assessing nonregular fractional factorials, notably Plackett-Burman designs. It is intended to capture projection properties, complementing the criterion of Webb (1964), whose concept of resolution concerns the estimability of lower order effects under the assumption that higher order effects are negligible.
Abstract: Resolution has been the most widely used criterion for comparing regular fractional factorials since it was introduced in 1961 by Box and Hunter. In this paper, we examine how a generalized resolution criterion can be defined and used for assessing nonregular fractional factorials, notably Plackett-Burman designs. Our generalization is intended to capture projection properties, complementing that of Webb (1964) whose concept of resolution concerns the estimability of lower order effects under the assumption that higher order effects are negligible. Our generalized resolution provides a fruitful criterion for ranking different designs while Webb's resolution is mainly useful as a classification rule. An additional advantage of our approach is that the idea leads to a natural generalization of minimum aberration. Examples are given to illustrate the usefulness of the new criteria.

188 citations


Journal ArticleDOI
TL;DR: This paper extends the concept of M-convex function to functions on generalized polymatroids with a view to providing a unified framework for efficiently solvable nonlinear discrete optimization problems.
Abstract: The concept of M-convex function, introduced by Murota (1996), is a quantitative generalization of the set of integral points in an integral base polyhedron as well as an extension of the valuated matroid of Dress and Wenzel (1990). In this paper, we extend this concept to functions on generalized polymatroids with a view to providing a unified framework for efficiently solvable nonlinear discrete optimization problems.

187 citations


Journal ArticleDOI
TL;DR: Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample size, target functions and types of approximating functions, demonstrating the advantages of VC-based complexity control with finite samples.
Abstract: It is well known that for a given sample size there exists a model of optimal complexity corresponding to the smallest prediction (generalization) error. Hence, any method for learning from finite samples needs to have some provisions for complexity control. Existing implementations of complexity control include penalization (or regularization), weight decay (in neural networks), and various greedy procedures (aka constructive, growing, or pruning methods). There are numerous proposals for determining optimal model complexity (aka model selection) based on various (asymptotic) analytic estimates of the prediction risk and on resampling approaches. Nonasymptotic bounds on the prediction risk based on Vapnik-Chervonenkis (VC)-theory have been proposed by Vapnik. This paper describes application of VC-bounds to regression problems with the usual squared loss. An empirical study is performed for settings where the VC-bounds can be rigorously applied, i.e., linear models and penalized linear models where the VC-dimension can be accurately estimated, and the empirical risk can be reliably minimized. Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample size, target functions and types of approximating functions. Our results demonstrate the advantages of VC-based complexity control with finite samples.
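The sketch below shows how such a VC bound can drive model selection for polynomial regression under squared loss. The penalization factor is the practical form popularized by Cherkassky and Mulier, assumed here; the data, the candidate models, and the use of the number of free parameters as the VC dimension are illustrative choices.

```python
import numpy as np

def vc_penalized_risk(emp_risk, h, n):
    # Practical VC bound for regression (assumed form):
    #   R <= R_emp / (1 - sqrt(p - p*ln(p) + ln(n)/(2n)))_+ ,  with p = h/n.
    p = h / n
    denom = 1.0 - np.sqrt(p - p * np.log(p) + np.log(n) / (2 * n))
    return np.inf if denom <= 0 else emp_risk / denom

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(-1, 1, n)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)

best = None
for degree in range(1, 15):
    coef = np.polyfit(x, y, degree)
    emp = np.mean((np.polyval(coef, x) - y) ** 2)     # empirical risk (training MSE)
    bound = vc_penalized_risk(emp, degree + 1, n)     # penalized estimate of prediction risk
    if best is None or bound < best[1]:
        best = (degree, bound)
print("degree selected by the VC bound:", best[0])
```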

186 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied a sequence of generalizations of the Tsetlin library and developed a formula analogous to Theorem 1.1 for the distinct eigenvalues and multiplicities for this more general class of Markov chains.
Abstract: 1. Introduction. Imagine a collection of books labeled 1 through n arranged in a row in some order. We reorganize the row of books by successively choosing a book at random: choosing book i with probability w_i and moving it to the front of the row. This "move-to-front rule" determines an interesting Markov chain on the set of arrangements of the books. If σ and τ denote any two orderings of the books, then the probability of transition from σ to τ is w_i if and only if τ is obtained from σ by moving book i to the front. This Markov chain is commonly called the Tsetlin library or move-to-front scheme. Due to its use in computer science as a standard scheme for dynamic file maintenance as well as cache maintenance (cf. [Do], [FHo], and [P]), the move-to-front rule is a very well-studied Markov chain. A primary resource for this problem is Fill's comprehensive paper [F], which derives the transition probabilities for any number of steps of the chain and the eigenvalues with corresponding idempotents, and discusses the rate of convergence to stationarity. Its thorough bibliography contains a wealth of pointers to the relevant literature. Of particular interest is the spectrum of this Markov chain. In general, knowledge of the eigenvalues for the transition matrix of a Markov chain can give some indication of the rate at which the chain converges to its equilibrium distribution. In the case of the Tsetlin library, the eigenvalues have an elegant formula (Theorem 1.1), discovered independently. Theorem 1.1. The distinct eigenvalues for the move-to-front rule are indexed by subsets A ⊆ {1, ..., n} and given by λ_A = Σ_{i∈A} w_i. The multiplicity of λ_A is the number of derangements (permutations with no fixed points) of the set {1, ..., n − |A|}. In this paper we study a sequence of generalizations of the Tsetlin library, culminating in a generalization to the setting of central hyperplane arrangements. In each case we develop a formula analogous to Theorem 1.1 for the distinct eigenvalues and multiplicities for this more general class of Markov chains. Our first generalization comes from viewing move-to-front as the operation of moving the books in the subset {i} to the front and then moving the subset [n] − {i} behind {i} while retaining their relative order; this is all done with probability w_i …
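A small numerical check of Theorem 1.1 as quoted above: enumerate the subsets A, compute λ_A = Σ_{i∈A} w_i, attach the derangement-count multiplicities, and confirm that the multiplicities sum to n!, the size of the state space. The weights are an illustrative assumption.

```python
from itertools import combinations
from math import factorial

def derangements(m):
    # D(0)=1, D(1)=0, D(m) = (m-1)*(D(m-1)+D(m-2))
    d = [1, 0]
    for k in range(2, m + 1):
        d.append((k - 1) * (d[-1] + d[-2]))
    return d[m]

w = [0.5, 0.3, 0.2]                      # book weights (illustrative)
n = len(w)
total_multiplicity = 0
for size in range(n + 1):
    for A in combinations(range(n), size):
        lam = sum(w[i] for i in A)       # eigenvalue lambda_A
        mult = derangements(n - size)    # its multiplicity
        total_multiplicity += mult
        if mult:
            print(f"A={A}  eigenvalue={lam:.2f}  multiplicity={mult}")
print("sum of multiplicities =", total_multiplicity, "= n! =", factorial(n))
```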

183 citations


Journal ArticleDOI
TL;DR: It is shown how the generalization error can be used to select the number of principal components in two analyses of functional magnetic resonance imaging activation sets.

176 citations


Journal ArticleDOI
TL;DR: A generalization of the classical Gruss's integral inequality in inner product spaces is given in this article, where applications for positive linear functionals and integrals are also pointed out.

Journal ArticleDOI
TL;DR: A generalization of Ostrowski's inequality for Lipschitzian mappings is given in this paper, together with applications in numerical analysis and for Euler's Beta function.
Abstract: A generalization of Ostrowski's inequality for lipschitzian mappings and applications in Numerical Analysis and for Euler's Beta function are given.
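A quick numerical check of the Ostrowski-type bound for Lipschitzian mappings in the form in which it is usually quoted (an assumption here, not copied from the paper): for an L-Lipschitz f on [a, b] and every x in [a, b], |f(x) - (1/(b-a)) ∫_a^b f(t) dt| <= L (b-a) [1/4 + ((x - (a+b)/2)/(b-a))^2].

```python
import numpy as np

a, b, L = 0.0, 2.0, 1.0
f = np.sin                                  # sin is 1-Lipschitz

t = np.linspace(a, b, 200001)
mean_f = f(t).mean()                        # fine-grid approximation of the integral mean

xs = np.linspace(a, b, 201)
lhs = np.abs(f(xs) - mean_f)
rhs = L * (b - a) * (0.25 + ((xs - (a + b) / 2) / (b - a)) ** 2)
print("bound holds at all sampled x:", bool(np.all(lhs <= rhs)))
```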

Journal ArticleDOI
01 Jun 1999
TL;DR: In this article, it was shown that when the function is convex, the generalized Bernstein polynomials Bn are monotonic in n, as in the classical case.
Abstract: This paper is concerned with a generalization of the classical Bernstein polynomials where the function is evaluated at intervals which are in geometric progression. It is shown that, when the function is convex, the generalized Bernstein polynomials Bn are monotonic in n, as in the classical case.
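The monotonicity statement is easy to observe numerically in the classical case, as the sketch below does for a convex function; the paper's generalized polynomials with nodes in geometric progression are not reproduced here.

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    # Classical Bernstein polynomial B_n(f; x) = sum_k f(k/n) * C(n,k) * x^k * (1-x)^(n-k)
    k = np.arange(n + 1)
    basis = np.array([comb(n, j) for j in k]) * x[:, None] ** k * (1 - x[:, None]) ** (n - k)
    return basis @ f(k / n)

f = np.exp                                  # convex on [0, 1]
x = np.linspace(0.0, 1.0, 201)
b5, b6 = bernstein(f, 5, x), bernstein(f, 6, x)
print("B_5 >= B_6 everywhere:", bool(np.all(b5 >= b6 - 1e-12)))
print("B_6 >= f everywhere:  ", bool(np.all(b6 >= f(x) - 1e-12)))
```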

Journal ArticleDOI
TL;DR: This work investigates the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks, and finds that SVMs overfit only weakly.
Abstract: Using methods of Statistical Physics, we investigate the generalization performance of support vector machines (SVMs), which have been recently introduced as a general alternative to neural networks. For nonlinear classification rules, the generalization error saturates on a plateau, when the number of examples is too small to properly estimate the coefficients of the nonlinear part. When trained on simple rules, we find that SVMs overfit only weakly. The performance of SVMs is strongly enhanced, when the distribution of the inputs has a gap in feature space.

01 Jan 1999
TL;DR: This paper presents a framework and an architecture that provides a canonical generalization of algorithms, i.e., if the generalized algorithms are run on a single table database, they give the same results as their single-table counterparts.
Abstract: An important aspect of data mining algorithms and systems is that they should scale well to large databases. A consequence of this is that most data mining tools are based on machine learning algorithms that work on data in attribute-value format. Experience has proven that such 'single-table' mining algorithms indeed scale well. The downside of this format is, however, that more complex patterns are simply not expressible in this format and, thus, cannot be discovered. One way to enlarge the expressiveness is to generalize, as in ILP, from one-table mining to multiple-table mining, i.e., to support mining on full relational databases. The key step in such a generalization is to ensure that the search space does not explode and that efficiency and, thus, scalability are maintained. In this paper we present a framework and an architecture that provide such a generalization. In this framework the semantic information in the database schema, e.g., foreign keys, is exploited to prune the search space and, in the architecture, database primitives are defined to ensure efficiency. Moreover, the framework induces a canonical generalization of algorithms, i.e., if the generalized algorithms are run on a single-table database, they give the same results as their single-table counterparts. The framework is illustrated by the Warmr algorithm, which is a multi-relational generalization of the Apriori algorithm.

Book ChapterDOI
01 Jan 1999
TL;DR: Three results are obtained, which provide partial remedies for shortcomings of Hilbert series and degree bounds in the modular case; one of them is a generalization of Göbel's degree bound to the case of monomial representations.
Abstract: The Hilbert series and degree bounds play significant roles in computational invariant theory. In the modular case, neither of these tools is available in general. In this article three results are obtained, which provide partial remedies for these shortcomings. First, it is shown that the so-called extended Hilbert series, which can always be calculated by a Molien type formula, yields strong constraints on the degrees of primary invariants. Then it is shown that for a trivial source module the (ordinary) Hilbert series coincides with that of a lift to characteristic 0 and can hence be calculated by Molien's formula. The last result is a generalization of Göbel's degree bound to the case of monomial representations.
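For background, the sketch below evaluates the classical Molien formula for the Hilbert series of an invariant ring in characteristic 0, H(t) = (1/|G|) Σ_g 1/det(I - t g); the group is an illustrative choice, and the paper's extended Hilbert series for the modular case is not reproduced here.

```python
import sympy as sp

t = sp.symbols('t')
g = sp.Matrix([[-1, 0], [0, -1]])               # generator of C_2 acting on a 2-dimensional space
group = [sp.eye(2), g]

# Molien's formula: average of 1/det(I - t*g) over the group elements.
molien = sp.simplify(sum(1 / (sp.eye(2) - t * M).det() for M in group) / len(group))
print(molien)                                    # closed form of the Hilbert series
print(sp.series(molien, t, 0, 8))                # 1 + 3*t**2 + 5*t**4 + ... counts invariants by degree
```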

Journal ArticleDOI
TL;DR: A generalization of Blaschke's Rolling Theorem for not necessarily convex sets is proved in this article, which exhibits an intimate connection between a generalized notion of convexity, various concepts in mathematical morphology and image processing, and a certain smoothness condition.
Abstract: A generalization of Blaschke's Rolling Theorem for not necessarily convex sets is proved that exhibits an intimate connection between a generalized notion of convexity, various concepts in mathematical morphology and image processing, and a certain smoothness condition. As a consequence, a geometric characterization of Serra's regular model is obtained and various problems in image processing arising from the smoothing of surfaces with Sternberg's rolling ball algorithm are addressed. Copyright © 1999 John Wiley & Sons, Ltd.
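The rolling-ball smoothing mentioned above is commonly implemented as a grey-scale morphological opening with a ball-shaped structuring element; the sketch below applies that standard route (an assumption, not the paper's construction) to a noisy 1-D height profile.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
surface = np.sin(3 * x) + 0.3 * rng.standard_normal(200)

r = 8                                           # ball radius in samples
offsets = np.arange(-r, r + 1)
ball = np.sqrt(r**2 - offsets**2)               # 1-D "ball" height profile

# Opening = erosion then dilation with the ball; it clips upward features
# narrower than the ball, the discrete analogue of rolling a ball under the surface.
smoothed = ndimage.grey_opening(surface, structure=ball)
print(np.round(surface[:5], 2), np.round(smoothed[:5], 2))
```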

Journal ArticleDOI
TL;DR: In this article, the pointwise ergodic theorem for general locally compact amenable groups along Folner sequences was shown to hold for all amenable group along folner sequences that obey some restrictions.
Abstract: In this paper we prove the pointwise ergodic theorem for general locally compact amenable groups along Folner sequences that obey some restrictions. These restrictions are mild enough so that such sequences exist for all amenable groups. We also prove a generalization of the Shannon-McMillan-Breiman theorem to all discrete amenable groups. -->

Journal ArticleDOI
TL;DR: In this paper, the concept of pseudo-degrees of freedom (PDF) is defined for all models which assume independent and identically distributed errors, including variable selection and partial least squares (PLS).
Abstract: This paper considers models that are relatively complex considering the extent of the calibration data. Such data frequently arise in chemometric applications, near-infrared spectroscopy (NIRS) being a well-known example. Commonly used models are multiple linear regression (MLR) with variable selection or partial least squares (PLS) regression. The concept of degrees of freedom is undefined for such models; this paper proposes a definition for pseudo-degrees of freedom (PDF) based on predictive performance and an analogy with the standard linear model. The generalization is intended for all models which assume independent and identically distributed errors. Pseudo-degrees of freedom are very easily calculated from ordinary and cross-validation residuals. An example from a real-life NIRS application is given to illustrate the new concept. Copyright © 1999 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A general bound is proved for a class of approximation schemes that includes radial basis functions and multilayer perceptrons, and it is shown how the total error can be decomposed into two parts: an approximation part due to the finite number of parameters of the approximation scheme used, and an estimation part due to the finite number of data available.
Abstract: We consider the problem of approximating functions from scattered data using linear superpositions of non-linearly parameterized functions. We show how the total error (generalization error) can be decomposed into two parts: an approximation part that is due to the finite number of parameters of the approximation scheme used; and an estimation part that is due to the finite number of data available. We bound each of these two parts under certain assumptions and prove a general bound for a class of approximation schemes that include radial basis functions and multilayer perceptrons.

Proceedings Article
29 Nov 1999
TL;DR: This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework.
Abstract: This paper argues that two apparently distinct modes of generalizing concepts - abstracting rules and computing similarity to exemplars - should both be seen as special cases of a more general Bayesian learning framework. Bayes explains the specific workings of these two modes - which rules are abstracted, how similarity is measured - as well as why generalization should appear rule- or similarity-based in different situations. This analysis also suggests why the rules/similarity distinction, even if not computationally fundamental, may still be useful at the algorithmic level as part of a principled approximation to fully Bayesian learning.
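A minimal sketch of the Bayesian framework argued for above: hypotheses are candidate concepts, the likelihood uses the size principle (smaller hypotheses consistent with the examples are favored), and generalization to a new item averages over the posterior. The toy number-concept hypotheses are an illustrative assumption, not the paper's experiments.

```python
UNIVERSE = range(1, 101)
hypotheses = {
    "even numbers":   {n for n in UNIVERSE if n % 2 == 0},
    "powers of two":  {1, 2, 4, 8, 16, 32, 64},
    "multiples of 4": {n for n in UNIVERSE if n % 4 == 0},
}

def posterior(examples):
    # Uniform prior; size-principle likelihood (1/|h|)^k for hypotheses consistent with the examples.
    scores = {name: ((1.0 / len(h)) ** len(examples) if all(x in h for x in examples) else 0.0)
              for name, h in hypotheses.items()}
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

def p_generalize(y, examples):
    # Probability that a new item y belongs to the concept, averaged over hypotheses.
    return sum(p for name, p in posterior(examples).items() if y in hypotheses[name])

print(p_generalize(32, [2, 8, 16]))   # close to 1: behaves like the rule "powers of two"
print(p_generalize(10, [2, 8, 16]))   # small: only the broad "even numbers" hypothesis covers 10
```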

Journal ArticleDOI
TL;DR: A new machine learning method is presented that, given a set of training examples, induces a definition of the target concept in terms of a hierarchy of intermediate concepts and their definitions, which effectively decomposes the problem into smaller, less complex problems.

Journal ArticleDOI
TL;DR: In this paper, the authors examine some assumptions and results of cartographic line simplification in the digital realm, focusing upon two major aspects of map generalization: scale-specificity and the concept of characteristic points.
Abstract: This paper examines some assumptions and results of cartographic line simplification in the digital realm, focusing upon two major aspects of map generalization: scale-specificity and the concept of characteristic points. These are widely regarded as critical controls to generalization but, in our estimation, they are rarely well considered or properly applied. First, a look at how scale and shape are treated in various research papers identifies some important conceptual and methodological issues that either have been misconstrued or inadequately treated. We then conduct an empirical analysis with a set of line generalization experiments that control resolution, detail, and sinuosity using four source datasets. The tests yield about 100 different versions of two island coastlines digitized at two scales, exploring systematically the consequences of linking scale with spatial resolution as well as a variety of point selection strategies. The generalized results are displayed (at scale and enlarged) along w...
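As an illustration of one classical point-selection strategy of the kind these experiments compare, the sketch below implements the Douglas-Peucker algorithm, which retains 'characteristic points' whose offset from the current baseline exceeds a tolerance; it is given only as an example, not as the authors' specific procedure.

```python
import numpy as np

def point_line_distance(p, a, b):
    # Perpendicular distance from point p to the line through a and b.
    if np.allclose(a, b):
        return float(np.linalg.norm(p - a))
    d = b - a
    return abs(d[0] * (p - a)[1] - d[1] * (p - a)[0]) / float(np.linalg.norm(d))

def douglas_peucker(points, tol):
    points = np.asarray(points, dtype=float)
    dists = [point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    if not dists or max(dists) <= tol:
        return [tuple(points[0]), tuple(points[-1])]
    k = int(np.argmax(dists)) + 1                       # most "characteristic" point
    left = douglas_peucker(points[: k + 1], tol)
    right = douglas_peucker(points[k:], tol)
    return left[:-1] + right                            # avoid duplicating the split point

line = [(0, 0), (1, 0.9), (2, -0.2), (3, 1.1), (4, 0)]
print(douglas_peucker(line, tol=0.6))
```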

Journal ArticleDOI
TL;DR: A new generalization of the DST is put forward that gives a fuzzy-valued definition for the belief, plausibility, and probability functions over a finite referential set; it is capable of modeling the uncertainties in the real world and eliminates the need for extra preassumptions and preprocessing.
Abstract: The Dempster-Shafer theory (DST) may be considered as a generalization of the probability theory, which assigns mass values to the subsets of the referential set and suggests an interval-valued probability measure. There have been several attempts at fuzzy generalization of the DST by assigning mass (probability) values to the fuzzy subsets of the referential set. The interval-valued probability measures thus obtained are not equivalent to the original fuzzy body of evidence. In this paper, a new generalization of the DST is put forward that gives a fuzzy-valued definition for the belief, plausibility, and probability functions over a finite referential set. These functions are all equivalent to one another and to the original fuzzy body of evidence. The advantage of the proposed model is shown in three application examples. It can be seen that the proposed generalization is capable of modeling the uncertainties in the real world and eliminating the need for extra preassumptions and preprocessing.
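For reference, the sketch below computes the classical crisp belief and plausibility functions from a mass assignment over subsets of a finite referential set, i.e., the quantities whose fuzzy-valued counterparts the paper proposes. The body of evidence is an illustrative assumption.

```python
FRAME = frozenset({"a", "b", "c"})
mass = {                                    # illustrative body of evidence (masses sum to 1)
    frozenset({"a"}): 0.4,
    frozenset({"a", "b"}): 0.3,
    FRAME: 0.3,
}

def belief(A):
    # Bel(A) = sum of masses of focal elements contained in A
    return sum(m for B, m in mass.items() if B <= A)

def plausibility(A):
    # Pl(A) = sum of masses of focal elements intersecting A
    return sum(m for B, m in mass.items() if B & A)

A = frozenset({"a", "b"})
print("Bel =", belief(A), " Pl =", plausibility(A))     # 0.7 and 1.0 here
```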

Journal ArticleDOI
TL;DR: A generalization of the k-order additive discrete fuzzy measures recently introduced by Grabisch is presented, and the connection of the proposed generalization with the general Mobius transform of Shafer is shown.
Abstract: A generalization of k-order additive discrete fuzzy measures recently introduced by Grabisch is shown. k-order additive fuzzy measures on general spaces are introduced. Connection of the proposed generalization with the general Mobius transform of Shafer is shown. General evaluation formula for the Choquet integral is given. Further generalizations concerning the type of applied arithmetics are proposed.
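The sketch below computes the two classical ingredients the paper builds on, the Mobius transform of a discrete fuzzy measure (capacity) and the Choquet integral with respect to it. The capacity is an illustrative assumption; it happens to be 2-additive, i.e., its Mobius transform vanishes on the three-element set.

```python
from itertools import combinations

X = ("x1", "x2", "x3")
mu = {frozenset(): 0.0, frozenset({"x1"}): 0.3, frozenset({"x2"}): 0.4,
      frozenset({"x3"}): 0.2, frozenset({"x1", "x2"}): 0.8,
      frozenset({"x1", "x3"}): 0.5, frozenset({"x2", "x3"}): 0.6,
      frozenset(X): 1.0}

def subsets(A):
    A = tuple(A)
    for r in range(len(A) + 1):
        yield from (frozenset(c) for c in combinations(A, r))

def mobius(A):
    # m(A) = sum over B subset of A of (-1)^{|A \ B|} mu(B)
    return sum((-1) ** (len(A) - len(B)) * mu[B] for B in subsets(A))

def choquet(f):
    # Sort criteria by increasing score and telescope the increments against mu.
    order = sorted(X, key=lambda x: f[x])
    total, prev = 0.0, 0.0
    for i, x in enumerate(order):
        total += (f[x] - prev) * mu[frozenset(order[i:])]
        prev = f[x]
    return total

print({tuple(sorted(A)): round(mobius(A), 3) for A in subsets(X)})
print("Choquet integral:", round(choquet({"x1": 0.2, "x2": 0.9, "x3": 0.5}), 3))
```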

Journal ArticleDOI
TL;DR: A necessary and sufficient condition is given for the exact reduction of systems modeled by linear fractional transformations on structured operator sets, based on the existence of a rank-deficient solution to either of a pair of linear matrix inequalities which generalize Lyapunov equations.
Abstract: A necessary and sufficient condition is given for the exact reduction of systems modeled by linear fractional transformations (LFTs) on structured operator sets. This condition is based on the existence of a rank-deficient solution to either of a pair of linear matrix inequalities which generalize Lyapunov equations; the notion of Gramians is thus also generalized to uncertain systems, as well as Kalman-like decomposition structures. A related minimality condition, the converse of the reducibility condition, may then be inferred from these results and the equivalence class of all minimal LFT realizations defined. These results comprise the first stage of a complete generalization of realization theory concepts to uncertain systems. Subsequent results, such as the definition of and rank tests on structured controllability and observability matrices, are also given. The minimality results described are applicable to multidimensional system realizations as well as to uncertain systems; connections to formal power series representations also exist.

Journal ArticleDOI
TL;DR: In this article, a method known as conflict probability estimation (CPE) is presented to estimate the probability of conflict for pairs of aircraft with uncertain predicted trajectories.
Abstract: A method known as conflict probability estimation (CPE) is presented to estimate the probability of conflict for pairs of aircraft with uncertain predicted trajectories. It is a generalization of a...
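A Monte Carlo sketch of the underlying idea (not the paper's analytic method): with Gaussian prediction errors for the relative position of two aircraft, estimate the probability that their predicted separation falls below a protected-zone radius. All numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sep_radius_nm = 5.0                        # required horizontal separation (nautical miles)
nominal_miss_nm = np.array([6.0, 2.0])     # nominal relative position at closest approach
sigma_nm = np.array([2.0, 1.5])            # combined prediction-error standard deviations (x, y)

relative_pos = nominal_miss_nm + sigma_nm * rng.standard_normal((n, 2))
miss_distance = np.linalg.norm(relative_pos, axis=1)
print("estimated conflict probability:", np.mean(miss_distance < sep_radius_nm))
```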

Journal Article
Michael Rosen
TL;DR: Mertens proved an interesting and useful theorem about the partial product of the Riemann zeta function at s = 1; this paper is concerned with a generalization of that result.
Abstract: Mertens proved an interesting and useful theorem about the partial product of the Riemann zeta function at s = 1. Namely.
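A quick numerical check of the classical statement referred to above (Mertens' third theorem): the partial product over primes p <= x of (1 - 1/p)^(-1) grows like e^gamma * ln x.

```python
from math import exp, log
from sympy import primerange

EULER_GAMMA = 0.5772156649015329

for x in (10**3, 10**5, 10**6):
    prod = 1.0
    for p in primerange(2, x + 1):
        prod *= 1.0 / (1.0 - 1.0 / p)
    print(x, prod / (exp(EULER_GAMMA) * log(x)))   # ratio should approach 1
```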

Journal ArticleDOI
TL;DR: This paper generalizes the interpolant using Voronoi diagrams in two directions: one is to general-dimensional data, and the other is to data distributed continuously on curves.
Abstract: Recently, the authors found an interpolant using Voronoi diagrams that differs from Sibson's interpolant. This paper generalizes our interpolant in two directions: one is to general-dimensional data, and the other is to data distributed continuously on curves. Minkowski's theorem is used as the basic principle in the generalization.

01 Mar 1999
TL;DR: A generalization of the classical Gruss integral inequality in inner product spaces is given in this article, where applications for positive linear functionals and integrals are also pointed out.
Abstract: A generalization of the classical Gruss' integral inequality in inner product spaces is given. Some applications for positive linear functionals and integrals are also pointed out.
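A quick numerical check of the classical scalar Gruss inequality that the paper generalizes: if m <= f <= M and n <= g <= N on [a, b], then |(1/(b-a)) ∫ f g - (1/(b-a)) ∫ f * (1/(b-a)) ∫ g| is at most (M - m)(N - n)/4.

```python
import numpy as np

a, b = 0.0, 1.0
t = np.linspace(a, b, 100001)
f, g = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
m, M = f.min(), f.max()
nlo, N = g.min(), g.max()

avg = lambda h: h.mean()                   # uniform grid, so the mean approximates the integral mean
lhs = abs(avg(f * g) - avg(f) * avg(g))
rhs = (M - m) * (N - nlo) / 4
print(f"{lhs:.4f} <= {rhs:.4f} :", lhs <= rhs)
```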