
Showing papers on "Generalization published in 2003"


Journal ArticleDOI
TL;DR: The authors propose to learn a distributed representation for words that allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, together with a probability function for word sequences expressed in terms of these representations.
Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows the model to take advantage of longer contexts.
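To make the architecture concrete, here is a minimal sketch (in Python/NumPy) of the kind of model the abstract describes: each word is mapped to a learned embedding, the embeddings of the previous words are concatenated and fed through a hidden layer, and a softmax produces next-word probabilities. All sizes and the random initialization are illustrative placeholders, not values from the paper, and training (gradient descent on the log-likelihood) is omitted.

    import numpy as np

    # Sketch of a feed-forward neural language model: learned word embeddings,
    # a hidden layer over the concatenated context, and a softmax over the
    # vocabulary.  Sizes are illustrative placeholders.
    rng = np.random.default_rng(0)
    vocab_size, emb_dim, context, hidden = 1000, 32, 3, 64

    C = rng.normal(0.0, 0.1, (vocab_size, emb_dim))          # word embeddings
    W1 = rng.normal(0.0, 0.1, (context * emb_dim, hidden))    # hidden layer
    W2 = rng.normal(0.0, 0.1, (hidden, vocab_size))           # output layer

    def next_word_probs(context_ids):
        """P(w_t | previous `context` words) for one context window."""
        x = C[context_ids].reshape(-1)       # concatenate the embeddings
        h = np.tanh(x @ W1)
        logits = h @ W2
        e = np.exp(logits - logits.max())    # numerically stable softmax
        return e / e.sum()

    probs = next_word_probs([4, 17, 256])
    print(probs.shape, probs.sum())          # shape (1000,), sums to 1

Because similar words end up with nearby embeddings, a never-seen word sequence made of such words receives a probability close to that of sequences actually observed in training, which is the generalization mechanism the abstract describes.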

6,832 citations


Posted Content
TL;DR: In this article, the authors used quantum walks to construct a new O(N^{2/3}) query quantum algorithm for element distinctness; for its generalization of finding k equal items among N, they obtain an O(N^{k/(k+1)}) query algorithm.
Abstract: We use quantum walks to construct a new quantum algorithm for element distinctness and its generalization. For element distinctness (the problem of finding two equal items among N given items), we get an O(N^{2/3}) query quantum algorithm. This improves the previous O(N^{3/4}) query quantum algorithm of Buhrman et al. (quant-ph/0007016) and matches the lower bound by Shi (quant-ph/0112086). The algorithm also solves the generalization of element distinctness in which we have to find k equal items among N items. For this problem, we get an O(N^{k/(k+1)}) query quantum algorithm.
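For orientation only, the classical version of the problem is easy to state in code; the abstract's result concerns the quantum query model, where the input is accessed through O(N^{2/3}) queries rather than read in full. The snippet below is a plain classical baseline, not the quantum-walk algorithm.

    def has_duplicate(items):
        """Classical element distinctness: are two of the N items equal?
        Sorting solves it in O(N log N) time; the quantum-walk algorithm in
        the abstract instead needs only O(N^(2/3)) queries to the input."""
        s = sorted(items)
        return any(a == b for a, b in zip(s, s[1:]))

    print(has_duplicate([3, 1, 4, 1, 5]))   # True
    print(has_duplicate([3, 1, 4, 5, 9]))   # False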

524 citations


Journal ArticleDOI
TL;DR: The authors present a theory of how relational inference and generalization can be accomplished within a cognitive architecture that is psychologically and neurally realistic and demonstrate the sufficiency of the model by using it to simulate a body of empirical phenomena concerning analogical inference and relational generalization.
Abstract: The authors present a theory of how relational inference and generalization can be accomplished within a cognitive architecture that is psychologically and neurally realistic. Their proposal is a form of symbolic connectionism: a connectionist system based on distributed representations of concept meanings, using temporal synchrony to bind fillers and roles into relational structures. The authors present a specific instantiation of their theory in the form of a computer simulation model, Learning and Inference with Schemas and Analogies (LISA). By using a kind of self-supervised learning, LISA can make specific inferences and form new relational generalizations and can hence acquire new schemas by induction from examples. The authors demonstrate the sufficiency of the model by using it to simulate a body of empirical phenomena concerning analogical inference and relational generalization.

491 citations


Journal ArticleDOI
TL;DR: It is suggested that basis elements representing the internal model of dynamics are sensitive to limb velocity with bimodal tuning; however, it is also possible that during adaptation the error metric itself adapts, which affects the implied shape of the basis elements.
Abstract: During reaching movements, the brain's internal models map desired limb motion into predicted forces. When the forces in the task change, these models adapt. Adaptation is guided by generalization: errors in one movement influence prediction in other types of movement. If the mapping is accomplished with population coding, combining basis elements that encode different regions of movement space, then generalization can reveal the encoding of the basis elements. We present a theory that relates encoding to generalization using trial-by-trial changes in behavior during adaptation. We consider adaptation during reaching movements in various velocity-dependent force fields and quantify how errors generalize across direction. We find that the measurement of error is critical to the theory. A typical assumption in motor control is that error is the difference between a current trajectory and a desired trajectory (DJ) that does not change during adaptation. Under this assumption, in all force fields that we examined, including one in which force randomly changes from trial to trial, we found a bimodal generalization pattern, perhaps reflecting basis elements that encode direction bimodally. If the DJ was allowed to vary, bimodality was reduced or eliminated, but the generalization function accounted for nearly twice as much variance. We suggest, therefore, that basis elements representing the internal model of dynamics are sensitive to limb velocity with bimodal tuning; however, it is also possible that during adaptation the error metric itself adapts, which affects the implied shape of the basis elements.
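The flavor of the trial-by-trial, population-coding account can be illustrated with a small simulation: the predicted force for a movement direction is a weighted sum of direction-tuned basis functions, and the error on each trial updates the weights, so learning in one direction spills over to neighbouring directions according to the shape of the bases. This is a hedged sketch of the general idea only; the tuning curves, learning rate, and error metric below are illustrative, not the quantities fitted in the paper.

    import numpy as np

    prefs = np.deg2rad(np.arange(0, 360, 30))   # preferred directions of the bases
    kappa = 2.0                                 # tuning width (illustrative)
    lr = 0.1                                    # learning rate (illustrative)

    def basis(theta):
        # von Mises-like tuning around each preferred direction
        return np.exp(kappa * (np.cos(theta - prefs) - 1.0))

    w = np.zeros_like(prefs)                    # weights of the internal model

    def trial(theta, actual_force):
        global w
        error = actual_force - basis(theta) @ w  # prediction error on this trial
        w = w + lr * error * basis(theta)        # trial-by-trial weight update

    # Adapt to a force experienced only for movements at 90 degrees ...
    for _ in range(50):
        trial(np.deg2rad(90), 1.0)

    # ... then read out how the learned prediction generalizes across directions.
    for d in (0, 45, 90, 135, 180):
        print(d, round(float(basis(np.deg2rad(d)) @ w), 3))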

440 citations


Journal ArticleDOI
TL;DR: For regression problems, based on scale-space theory, the existence of a certain range of σ within which the generalization performance is stable is demonstrated, and an appropriate σ within that range can be obtained via dynamic evaluation.

359 citations


Proceedings ArticleDOI
20 Jul 2003
TL;DR: This paper presents a framework for exact incremental learning and adaptation of support vector machine (SVM) classifiers that allows one to learn and unlearn individual or multiple examples, adapt the current SVM to changes in regularization and kernel parameters, and evaluate generalization performance through exact leave-one-out error estimation.
Abstract: The objective of machine learning is to identify a model that yields good generalization performance. This involves repeatedly selecting a hypothesis class, searching the hypothesis class by minimizing a given objective function over the model's parameter space, and evaluating the generalization performance of the resulting model. This search can be computationally intensive as training data continuously arrives, or as one needs to tune hyperparameters in the hypothesis class and the objective function. In this paper, we present a framework for exact incremental learning and adaptation of support vector machine (SVM) classifiers. The approach is general and allows one to learn and unlearn individual or multiple examples, adapt the current SVM to changes in regularization and kernel parameters, and evaluate generalization performance through exact leave-one-out error estimation.
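As a point of contrast with the exact incremental approach, the snippet below computes the naive leave-one-out error estimate by retraining an SVM once per held-out example (scikit-learn is used purely for illustration; the dataset is synthetic). The framework in the abstract obtains the same quantity exactly by decrementally "unlearning" one example at a time instead of retraining from scratch.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Naive leave-one-out error estimate: one full retraining per held-out point.
    X, y = make_classification(n_samples=60, n_features=5, random_state=0)

    errors = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        clf = SVC(kernel="rbf", C=1.0).fit(X[mask], y[mask])
        errors += int(clf.predict(X[i:i + 1])[0] != y[i])

    print("LOO error estimate:", errors / len(y))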

271 citations


Journal ArticleDOI
TL;DR: Novel and fundamental improvements of fMRI data analysis are introduced, including a technique termed constrained canonical correlation analysis, which can be viewed as a natural extension and generalization of the popular general linear model method.

221 citations


Proceedings Article
09 Aug 2003
TL;DR: This paper presents an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs), and proves that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space.
Abstract: A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for direct planning. In this paper, we present an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs). An RMDP can model a set of similar environments by representing objects as instances of different classes. In order to generalize plans to multiple environments, we define an approximate value function specified in terms of classes of objects and, in a multiagent setting, by classes of agents. This class-based approximate value function is optimized relative to a sampled subset of environments, and computed using an efficient linear programming method. We prove that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space. Our experimental results show that our method generalizes plans successfully to new, significantly larger, environments, with minimal loss of performance relative to environment-specific planning. We demonstrate our approach on a real strategic computer war game.

214 citations


Journal ArticleDOI
TL;DR: The theory of gap functions, developed in the literature for variational inequalities, is extended to a general equilibrium problem, and descent methods with exact and inexact line-search rules are proposed.
Abstract: The theory of gap functions, developed in the literature for variational inequalities, is extended to a general equilibrium problem. Descent methods, with exact and inexact line-search rules, are proposed. It is shown that these methods are a generalization of the gap function algorithms for variational inequalities and optimization problems.
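For background, the classical gap function for a variational inequality VI(F, K) (find x* in K with ⟨F(x*), y − x*⟩ ≥ 0 for all y in K), and its natural analogue for an equilibrium problem EP(f, K) (find x* in K with f(x*, y) ≥ 0 for all y in K, where f(x, x) = 0), are usually written as

    g(x) = \sup_{y \in K} \langle F(x),\, x - y \rangle ,
    \qquad
    \varphi(x) = \sup_{y \in K} \bigl( -f(x, y) \bigr),

both nonnegative on K and vanishing exactly at solutions. These are standard textbook forms, stated here for orientation; the paper's precise definitions and the descent schemes built on them are not reproduced.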

196 citations


Journal ArticleDOI
TL;DR: This model leads to non-classical shocks and enjoys an unexpected stability in spite of the presence of umbilic points, and allows for a description of phenomena often neglected by other models, such as overtaking.
Abstract: We present an n-population generalization of the Lighthill–Whitham and Richards traffic flow model. This model is analytically interesting because of several non-standard features. For instance, it leads to non-classical shocks and enjoys an unexpected stability in spite of the presence of umbilic points. Furthermore, while satisfying all the minimal ‘common sense’ requirements, it also allows for a description of phenomena often neglected by other models, such as overtaking.
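Schematically, n-population extensions of the LWR model are systems of conservation laws in which each class of vehicles is advected at its own density-dependent speed, for example

    \partial_t \rho_i + \partial_x \bigl( \rho_i \, v_i(r) \bigr) = 0,
    \qquad i = 1, \dots, n,
    \qquad r = \rho_1 + \dots + \rho_n,

where ρ_i is the density of the i-th population (say, cars versus trucks) and each speed law v_i is a decreasing function of the total density r. This is only the generic shape of such models, stated for orientation; the paper's specific constitutive choices for the v_i are what produce the non-classical shocks and the behaviour at umbilic points mentioned above.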

181 citations


Journal ArticleDOI
TL;DR: A way to generate new finite elements in the absolute nodal coordinate formulation (ANCF) is proposed, using a generalization of the displacement fields and degrees of freedom of ordinary finite elements from structural mechanics; a 48-d.o.f. Hermitian plate element is studied in detail.
Abstract: We propose a way to generate new finite elements in the absolute nodal coordinate formulation (ANCF) and use a generalization of displacement fields and degrees of freedom (d.o.f.) of ordinary finite elements used in structural mechanics. Application of this approach to 16- and 12-d.o.f. rectangle plate elements as well as to a 9-d.o.f. triangle element gives, accordingly, 48-, 36- and 27-d.o.f. ANCF plate elements. We perform a thorough study of a 48-d.o.f. Hermitian element. Its shape function set is a Cartesian product of sets of one-dimensional shape functions for beam elements. Arguments of the shape functions are decoupled, which is why an explicit calculation of the terms of the equations of motion leads to single integration only. We develop several models of elastic forces of different complexity with their Jacobian matrices. Convergence and accuracy of the finite element are demonstrated in geometrically nonlinear static and dynamic test problems, as well as in linear analysis of natural frequencies.

Journal ArticleDOI
TL;DR: In this paper, a generalization of Picard groups to derived categories of algebras is introduced, and braid group actions on these groups are constructed for particular classes of algebras.
Abstract: We introduce in this paper a generalization of Picard groups to derived categories of algebras. We study general properties of those groups and we construct braid group actions on these groups for particular classes of algebras.

Journal ArticleDOI
TL;DR: The definition of a fuzzy subgroup with thresholds is given, which is a generalization of Rosenfeld's fuzzy subgroup and of Bhakat and Das's fuzzy subgroup, and relations between two fuzzy subgroups are discussed.

Journal ArticleDOI
TL;DR: The use of this model for the ground-holding problem improves upon prior models by allowing for easy integration into the newly developed ground-delay program procedures based on the Collaborative Decision-Making paradigm.
Abstract: In this paper, we analyze a generalization of a classic network-flow model. The generalization involves the replacement of deterministic demand with stochastic demand. While this generalization destroys the original network structure, we show that the matrix underlying the stochastic model is dual network. Thus, the integer program associated with the stochastic model can be solved efficiently using network-flow or linear-programming techniques. We also develop an application of this model to the ground-holding problem in air-traffic management. The use of this model for the ground-holding problem improves upon prior models by allowing for easy integration into the newly developed ground-delay program procedures based on the Collaborative Decision-Making paradigm.

Journal ArticleDOI
TL;DR: The result of this paper shows that the mixture model can attain more precise prediction than regular statistical models if Bayesian estimation is applied in statistical inference.

Journal ArticleDOI
01 Jul 2003-Analysis
TL;DR: In this article, a probabilistic measure of coherence based on a modification of Kemeny and Oppenheim's (1952) measure of factual support is proposed.
Abstract: Desideratum (1) captures the qualitative features that a probabilistic generalization of logical coherence should satisfy — it requires C to respect the extreme deductive cases, and to be properly sensitive to probabilistic dependence (a general notion of probabilistic dependence will be defined precisely, and in a slightly non-standard way, below). I propose a probabilistic measure of coherence C based on a slight modification of Kemeny and Oppenheim’s (1952) measure of factual support F. The formulation of C is somewhat intricate. We begin with some preliminary definitions. First, we define the two-place function F(X,Y). F(X,Y) may be interpreted as the degree to which one proposition Y supports another proposition X (relative to a finitely additive, regular, Kolmogorov (1956) probability function Pr).
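For reference, Kemeny and Oppenheim's measure of factual support, in its standard statement, is

    F(X, Y) = \frac{\Pr(Y \mid X) - \Pr(Y \mid \neg X)}{\Pr(Y \mid X) + \Pr(Y \mid \neg X)},

read as the degree to which Y supports X; it ranges over [−1, 1], reaching 1 when Y rules out ¬X and −1 when Y rules out X. The coherence measure C proposed in the paper is built from a slight modification of this F, which is not reproduced here.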

Journal ArticleDOI
TL;DR: In this article, a class of functions unifying all singular limits for the emission of a given number of soft or collinear gluons in tree-level gauge-theory amplitudes was derived.
Abstract: I derive a class of functions unifying all singular limits for the emission of a given number of soft or collinear gluons in tree-level gauge-theory amplitudes. Each function is a generalization of the single-emission antenna function of my earlier paper. The helicity-summed squares of these functions are thus also generalizations to multiple singular emission of the Catani-Seymour dipole factorization function.

Journal ArticleDOI
TL;DR: This paper investigates odor recognition in this new perspective by using a novel learning scheme known as support vector machines (SVM) which guarantees high generalization ability on the test set and illustrates the basics of the theory of SVM.
Abstract: Pattern recognition techniques have widely been used in the context of odor recognition. The recognition of mixtures and simple odors as separate clusters is an intractable problem with some of the classical supervised methods. Recently, a new paradigm has been introduced in which the detection problem can be seen as a learning from examples problem. In this paper, we investigate odor recognition in this new perspective and in particular by using a novel learning scheme known as support vector machines (SVM), which guarantees high generalization ability on the test set. We illustrate the basics of the theory of SVM and show its performance in comparison with a radial basis network and the error backpropagation training method. The leave-one-out procedure has been used for all classifiers in order to find the near-optimal SVM parameters, to reduce the generalization error, and to avoid outliers.
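The leave-one-out model selection described in the abstract can be sketched in a few lines with standard tooling; the data, kernel, and parameter grid below are placeholders for illustration, not the electronic-nose data or settings used in the paper.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, LeaveOneOut
    from sklearn.svm import SVC

    # Leave-one-out selection of RBF-SVM hyperparameters (illustrative grid).
    X, y = make_classification(n_samples=40, n_features=8, random_state=1)

    search = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
        cv=LeaveOneOut(),            # one fold per example
        scoring="accuracy",
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)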

Journal ArticleDOI
TL;DR: Three forms of inductive generalization are insufficient concerning case-to-case generalization, which is a form of analogical generalization and needs to be reinforced by setting up explicit analogical argumentation.
Abstract: Three forms of inductive generalization - statistical generalization, variation-based generalization and theory-carried generalization - are insufficient concerning case-to-case generalization, which is a form of analogical generalization. The quality of case-to-case generalization needs to be reinforced by setting up explicit analogical argumentation. To evaluate analogical argumentation six criteria are discussed. Good analogical reasoning is an indispensable support to forms of communicative generalization - receptive and responsive (participative) generalization — as well as exemplary generalization.

01 Jan 2003
TL;DR: In this paper, the authors prove an upper semicontinuity result for perturbations of cocycle attractors and study the relationship between non-autonomous and global attractors.
Abstract: In this paper we prove an upper semicontinuity result for perturbations of cocycle attractors. In particular, we study the relationship between non-autonomous and global attractors. In this sense, we show that the concept of a cocycle attractor is a sensible generalization to non-autonomous and random dynamical systems of that of the global attractor.

Journal ArticleDOI
TL;DR: In this paper, a generalization of the yield index proposed by Boyles was proposed for processes with multiple characteristics, and a control chart based on the proposed generalization was developed to display all the characteristic measures in one single chart.
Abstract: Process capability indices have been widely used in the manufacturing industry, providing numerical measures of process precision, process accuracy, and process performance. Capability measures for processes with a single characteristic have been investigated extensively. However, capability measures for processes with multiple characteristics are comparatively neglected. In this paper, we consider a generalization of the yield index proposed by Boyles for processes with multiple characteristics. We establish a relationship between the generalization and the process yield. We also develop a control chart based on the proposed generalization, which displays all the characteristic measures in one single chart. Using the chart, engineers can effectively monitor and control the performance of all process characteristics simultaneously.
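A hedged sketch of the underlying computation: Boyles' yield index maps the conforming fraction of a normally distributed characteristic through Spk = (1/3) Φ⁻¹((yield + 1)/2), and one natural way to aggregate several characteristics is to multiply their yields (assuming independence) and apply the same transform to the total. The aggregation below is illustrative only; the paper's exact generalization of Boyles' index may differ.

    from scipy.stats import norm

    def char_yield(mu, sigma, lsl, usl):
        """Conforming fraction of one normally distributed characteristic."""
        return norm.cdf((usl - mu) / sigma) + norm.cdf((mu - lsl) / sigma) - 1.0

    def spk(p_yield):
        """Boyles' yield index: Spk = (1/3) * Phi^-1((yield + 1) / 2)."""
        return norm.ppf((p_yield + 1.0) / 2.0) / 3.0

    # (mu, sigma, LSL, USL) for each characteristic -- illustrative numbers.
    chars = [(10.0, 0.5, 8.5, 11.5), (5.0, 0.2, 4.4, 5.6)]
    yields = [char_yield(*c) for c in chars]
    print("per-characteristic Spk:", [round(spk(p), 3) for p in yields])

    total = 1.0
    for p in yields:
        total *= p                   # overall yield under independence
    print("overall index:", round(spk(total), 3))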

Book ChapterDOI
01 Jan 2003
TL;DR: This chapter surveys an efficient algorithm that is based on an additive operator splitting (AOS) that is suitable for geometric and geodesic active contour models as well as for mean curvature motion, and proves that the scheme satisfies a discrete maximum-minimum principle which implies unconditional stability if no balloon forces are present.
Abstract: Implicit active contour models belong to the most popular level set methods in computer vision. Typical implementations, however, suffer from poor efficiency. In this chapter we survey an efficient algorithm that is based on an additive operator splitting (AOS). It is suitable for geometric and geodesic active contour models as well as for mean curvature motion. It uses harmonic averaging and does not require computing the distance function in each iteration step. We prove that the scheme satisfies a discrete maximum-minimum principle which implies unconditional stability if no balloon forces are present. Moreover, it possesses all typical advantages of AOS schemes: simple implementation, equal treatment of all axes, suitability for parallel computing, and straightforward generalization to higher dimensions. Experiments show that one can gain a speed-up of one order of magnitude compared to the widely used explicit time discretization.
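To show what an AOS step looks like in practice, here is a hedged sketch for plain linear diffusion of an image: each axis is handled by a tridiagonal (banded) solve and the results are averaged, which is the additive operator splitting idea. The chapter's scheme for active contours adds image-dependent diffusivities, harmonic averaging, and level-set terms that are omitted here, so treat this as a toy illustration of the splitting only.

    import numpy as np
    from scipy.linalg import solve_banded

    def aos_linear_diffusion(u, tau, steps):
        """Semi-implicit AOS step u <- 0.5 * sum_l (I - 2*tau*A_l)^{-1} u
        for linear diffusion with unit diffusivity and reflecting boundaries."""
        n = u.shape[0]
        assert u.shape[0] == u.shape[1], "square image, for brevity"
        # Banded storage of (I - 2*tau*A) with A the 1-D Laplacian.
        ab = np.zeros((3, n))
        ab[0, 1:] = -2.0 * tau                  # superdiagonal
        ab[2, :-1] = -2.0 * tau                 # subdiagonal
        ab[1, :] = 1.0 + 4.0 * tau
        ab[1, 0] = ab[1, -1] = 1.0 + 2.0 * tau  # Neumann (reflecting) ends
        for _ in range(steps):
            ux = solve_banded((1, 1), ab, u)        # 1-D solves along axis 0
            uy = solve_banded((1, 1), ab, u.T).T    # 1-D solves along axis 1
            u = 0.5 * (ux + uy)                     # additive operator splitting
        return u

    img = np.zeros((64, 64))
    img[28:36, 28:36] = 1.0
    print(round(float(aos_linear_diffusion(img, tau=5.0, steps=10).max()), 4))

Because each sub-step is an unconditionally stable implicit 1-D solve, the time step tau is not restricted by a stability condition, which is where the efficiency gain over explicit time stepping comes from.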

Book ChapterDOI
01 Mar 2003
TL;DR: The results indicate that any universal learning machine, which transforms data into the Euclidean space and then applies linear (or large margin) classification, cannot enjoy any meaningful generalization guarantees that are based on either VC dimension or margins considerations.
Abstract: The notion of embedding a class of dichotomies in a class of linear half spaces is central to the support vector machines paradigm. We examine the question of determining the minimal Euclidean dimension and the maximal margin that can be obtained when the embedded class has a finite VC dimension. We show that an overwhelming majority of the family of finite concept classes of any constant VC dimension cannot be embedded in low-dimensional half spaces. (In fact, we show that the Euclidean dimension must be almost as high as the size of the instance space.) We strengthen this result even further by showing that an overwhelming majority of the family of finite concept classes of any constant VC dimension cannot be embedded in half spaces (of arbitrarily high Euclidean dimension) with a large margin. (In fact, the margin cannot be substantially larger than the margin achieved by the trivial embedding.) Furthermore, these bounds are robust in the sense that allowing each image half space to err on a small fraction of the instances does not imply a significant weakening of these dimension and margin bounds. Our results indicate that any universal learning machine, which transforms data into the Euclidean space and then applies linear (or large margin) classification, cannot enjoy any meaningful generalization guarantees that are based on either VC dimension or margins considerations. This failure of generalization bounds applies even to classes for which "straightforward" empirical risk minimization does enjoy meaningful generalization guarantees.

Journal ArticleDOI
01 May 2003-Metrika
TL;DR: In this paper, an alternative randomized response technique is presented which improves upon the pioneering work of Warner (1965), the procedure includes Warner's method as a special case for a specific choice of the parameters.
Abstract: To eliminate a major source of bias in surveys of human populations resulting from respondents' refusal to cooperate in cases where a question of a sensitive nature is involved, the idea of “randomized response” was introduced by Warner (1965). In this paper, an alternative randomized response technique is presented which improves upon the pioneering work of Warner (1965). The procedure includes Warner's method as a special case for a specific choice of the parameters. In addition, a generalization of the proposed method is presented.
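Warner's original design, which the proposed technique includes as a special case, is easy to simulate: each respondent answers the sensitive question with probability p and its negation with probability 1 − p, so no individual answer is revealing, yet the population prevalence remains estimable. The simulation below illustrates Warner's baseline only; the paper's alternative technique is not reproduced.

    import random

    def warner_survey(true_pi, p, n, seed=0):
        """Simulate Warner's (1965) randomized response design and return the
        standard estimator pi_hat = (lambda_hat + p - 1) / (2p - 1), where
        lambda_hat is the observed proportion of 'yes' answers."""
        rng = random.Random(seed)
        yes = 0
        for _ in range(n):
            has_trait = rng.random() < true_pi   # respondent's true status
            asked_direct = rng.random() < p      # which question the device picks
            answer = has_trait if asked_direct else not has_trait
            yes += int(answer)
        lam = yes / n
        return (lam + p - 1.0) / (2.0 * p - 1.0)

    print(warner_survey(true_pi=0.20, p=0.7, n=100000))   # close to 0.20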

Journal ArticleDOI
TL;DR: In this article, the authors investigated whether a functional relationship exists between self-monitoring with self-recruited reinforcement and an increase in both on-task behavior and assignment completion.
Abstract: This study investigates whether a functional relationship exists between self-monitoring with self-recruited reinforcement and an increase in both on-task behavior and assignment completion. The study further assesses whether self-monitoring with self-recruited reinforcement is associated with generalization of performance gains to untrained settings. Training in self-management procedures included systematic instruction of behavior and general case programming to promote generalization of skills. An ABCAC design was used to assess the effects of self-management procedures in the training setting, and a multiple-baseline-across-settings design was used to assess generalization effects. The results demonstrated that a functional relationship existed between self-monitoring with self-recruited reinforcement and an increase in on-task behavior and assignment completion. Generalization of self-management skills to novel school contexts varied. The role of self-management procedures in promoting generalization is discussed.

Journal ArticleDOI
01 Jun 2003
TL;DR: The paper considers the application of Quantum Computing to solve the problem of effective SVM training, especially in the case of digital implementations, and compares the behavioral aspects of conventional and enhanced SVMs.
Abstract: Refined concepts, such as Rademacher estimates of model complexity and nonlinear criteria for weighting empirical classification errors, represent recent and promising approaches to characterize the generalization ability of Support Vector Machines (SVMs). The advantages of those techniques lie in both improving the SVM representation ability and yielding tighter generalization bounds. On the other hand, they often make Quadratic-Programming algorithms no longer applicable, and SVM training cannot benefit from efficient, specialized optimization techniques. The paper considers the application of Quantum Computing to solve the problem of effective SVM training, especially in the case of digital implementations. The presented research compares the behavioral aspects of conventional and enhanced SVMs; experiments on both synthetic and real-world problems support the theoretical analysis. At the same time, the related differences between Quadratic-Programming and Quantum-based optimization techniques are considered.

Journal ArticleDOI
TL;DR: Knuth's generalization of Dijkstra's algorithm for the shortest-path problem offers a general method to solve this problem and is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
Abstract: We discuss weighted deductive parsing and consider the problem of finding the derivation with the lowest weight. We show that Knuth's generalization of Dijkstra's algorithm for the shortest-path problem offers a general method to solve this problem. Our approach is modular in the sense that Knuth's algorithm is formulated independently from the weighted deduction system.
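A compact way to see the connection: view the weighted deduction system as a hypergraph whose items are vertices and whose rules are hyperedges, and run Dijkstra-style greedy extraction on it. The sketch below assumes the simplest superior weight function (rule weight plus the sum of the antecedents' best weights); the items, rules, and weights in the example are hypothetical, and the paper's treatment allows more general monotone weight functions.

    import heapq
    from collections import defaultdict

    def knuth_best_weights(rules, axioms):
        """Knuth's generalization of Dijkstra to weighted deduction.
        rules:  list of (antecedent_items, consequent_item, rule_weight)
        axioms: dict item -> weight
        Returns the lowest derivation weight for every derivable item."""
        best = dict(axioms)                      # tentative best weights
        by_antecedent = defaultdict(list)
        for rule in rules:
            for a in rule[0]:
                by_antecedent[a].append(rule)
        agenda = [(w, item) for item, w in axioms.items()]
        heapq.heapify(agenda)
        final = set()
        while agenda:
            w, item = heapq.heappop(agenda)
            if item in final:
                continue                         # stale agenda entry
            final.add(item)
            for ants, cons, rw in by_antecedent[item]:
                if cons in final or not all(a in final for a in ants):
                    continue
                cand = rw + sum(best[a] for a in ants)
                if cand < best.get(cons, float("inf")):
                    best[cons] = cand
                    heapq.heappush(agenda, (cand, cons))
        return best

    # Hypothetical deduction system: derive S from A and B.
    rules = [(("A", "B"), "S", 1.0), (("A",), "B", 2.0), (("B",), "S", 5.0)]
    print(knuth_best_weights(rules, {"A": 0.5}))  # {'A': 0.5, 'B': 2.5, 'S': 4.0}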

Journal ArticleDOI
TL;DR: The behavior of the maximum likelihood estimator (MLE), in the case that the true parameter cannot be identified uniquely, is discussed, and a larger order is proved if the true function is given by a smaller model.
Abstract: This paper discusses the behavior of the maximum likelihood estimator (MLE), in the case that the true parameter cannot be identified uniquely. Among many statistical models with unidentifiability, neural network models are the main concern of this paper. It is known in some models with unidentifiability that the asymptotics of the likelihood ratio of the MLE has an unusually large order. Using the framework of locally conic models put forth by Dacunha-Castelle and Gassiat as a generalization of Hartigan's idea, a useful sufficient condition for such larger orders is derived. This result is applied to neural network models, and a larger order is proved if the true function is given by a smaller model. Also, under the condition that the model has at least two redundant hidden units, a log n lower bound for the likelihood ratio is derived.

Book ChapterDOI
01 Jan 2003
TL;DR: In this article, it was shown that for sufficiently large N, every subset of [N]^2 of size at least δN^2 contains three points of the form (a, b), (a + d, b), (a, b + d).
Abstract: We give a simple proof that for sufficiently large N, every subset of [N]^2 of size at least δN^2 contains three points of the form (a, b), (a + d, b), (a, b + d).

Journal ArticleDOI
TL;DR: The generalization of covering problems such as the set cover problem to partial covering problems, where one only wants to cover a given number k of elements rather than all elements, is studied.
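The partial-covering objective is easy to illustrate with a greedy baseline: repeatedly pick the set covering the most still-uncovered elements and stop once at least k elements are covered. This is a standard heuristic shown for illustration, not necessarily the algorithm studied in the paper.

    def greedy_partial_cover(sets, k):
        """Greedy heuristic for partial set cover: stop once k elements are covered."""
        covered, chosen = set(), []
        while len(covered) < k:
            best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
            if not sets[best] - covered:
                return None                 # fewer than k elements are coverable
            chosen.append(best)
            covered |= sets[best]
        return chosen

    sets = [{1, 2, 3}, {3, 4}, {5, 6, 7, 8}, {1, 8}]
    print(greedy_partial_cover(sets, k=6))  # [2, 0]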