
Showing papers on "Generalization published in 2004"


Proceedings Article
01 Dec 2004
TL;DR: This paper presents a new algorithm for automatically learning hypernym (is-a) relations from text; using "dependency path" features extracted from parse trees, it introduces a general-purpose formalization and generalization of earlier hand-crafted hypernym patterns.
Abstract: Semantic taxonomies such as WordNet provide a rich source of knowledge for natural language processing applications, but are expensive to build, maintain, and extend. Motivated by the problem of automatically constructing and extending such taxonomies, in this paper we present a new algorithm for automatically learning hypernym (is-a) relations from text. Our method generalizes earlier work that had relied on using small numbers of hand-crafted regular expression patterns to identify hypernym pairs. Using "dependency path" features extracted from parse trees, we introduce a general-purpose formalization and generalization of these patterns. Given a training set of text containing known hypernym pairs, our algorithm automatically extracts useful dependency paths and applies them to new corpora to identify novel pairs. On our evaluation task (determining whether two nouns in a news article participate in a hypernym relationship), our automatically extracted database of hypernyms attains both higher precision and higher recall than WordNet.
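For concreteness, a minimal Python sketch of the general idea (illustrative only, not the authors' system; the dependency-path strings, toy pairs, and choice of logistic regression below are assumptions): each noun pair is represented by counts of the dependency paths observed between the two nouns, and a standard classifier is trained on pairs with known labels.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data: {dependency-path: count} per noun pair (paths are made up).
pair_features = [
    {"N:such_as:N": 3, "N:and_other:N": 1},   # (dog, animal)    -> hypernym
    {"N:appos:N": 2},                          # (Paris, capital) -> hypernym
    {"N:conj_and:N": 4},                       # (cat, dog)       -> not hypernym
    {"N:nsubj^VB^dobj:N": 2},                  # (team, game)     -> not hypernym
]
labels = [1, 1, 0, 0]

vec = DictVectorizer()
X = vec.fit_transform(pair_features)
clf = LogisticRegression().fit(X, labels)

# Score a new noun pair by the dependency paths observed between its members.
new_pair = vec.transform([{"N:such_as:N": 1, "N:appos:N": 1}])
print(clf.predict_proba(new_pair)[0, 1])   # estimated probability of "is-a"
```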

789 citations


Journal ArticleDOI
TL;DR: This paper proves tight data-dependent bounds for the risk of the extracted hypothesis in terms of an easily computable statistic M_n associated with the on-line performance of the ensemble, and obtains risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix.
Abstract: In this paper, it is shown how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on an independent and identically distributed (i.i.d.) sample of data. Using a simple large deviation argument, we prove tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic M_n associated with the on-line performance of the ensemble. Via sharp pointwise bounds on M_n, we then obtain risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix. These bounds reveal that the linear hypotheses found via our approach achieve optimal tradeoffs between hinge loss and margin size over the class of all linear functions, an issue that was left open by previous results. A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds.
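For orientation, a hedged Python sketch of the online-to-batch idea this line of work builds on (a simplification, not the paper's estimator: the penalized suffix error below is only a crude stand-in for the statistic M_n): run the perceptron once over an i.i.d. sample, keep every intermediate hypothesis, and select one by a penalized empirical criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]))

w = np.zeros(5)
hypotheses = []                    # ensemble produced by the online algorithm
for x_t, y_t in zip(X, y):
    if y_t * (w @ x_t) <= 0:       # perceptron mistake -> update
        w = w + y_t * x_t
    hypotheses.append(w.copy())

def suffix_error(w_i, i):
    """Empirical error of hypothesis i on the examples it has not yet seen."""
    Xs, ys = X[i + 1:], y[i + 1:]
    return np.mean(np.sign(Xs @ w_i) != ys) if len(ys) else 1.0

delta = 0.05
penalized = [
    suffix_error(w_i, i)
    + np.sqrt(np.log(1 / delta) / (2 * max(len(X) - i - 1, 1)))
    for i, w_i in enumerate(hypotheses)
]
print("selected hypothesis index:", int(np.argmin(penalized)))
```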

580 citations


Journal ArticleDOI
TL;DR: In this paper, the Generalized Interpolation Material Point (GIMP) method is generalized using a variational form and a Petrov-Galerkin discretization scheme, resulting in a family of methods named the GIMP methods.
Abstract: The Material Point Method (MPM) discrete solution procedure for computational solid mechanics is generalized using a variational form and a Petrov-Galerkin discretization scheme, resulting in a family of methods named the Generalized Interpolation Material Point (GIMP) methods. The generalization permits identification with aspects of other point or node based discrete solution techniques which do not use a body-fixed grid, i.e. the "meshless methods". Similarities are noted and some practical advantages relative to some of these methods are identified. Examples are used to demonstrate and explain numerical artifact noise which can be expected in MPM calculations. This noise results in nonphysical local variations at the material points, where constitutive response is evaluated. It is shown to destroy the explicit solution in one case, and seriously degrade it in another. History dependent, inelastic constitutive laws can be expected to evolve erroneously and report inaccurate stress states because of noisy input. The noise is due to the lack of smoothness of the interpolation functions, and occurs due to material points crossing computational grid boundaries. The next degree of smoothness available in the GIMP methods is shown to be capable of eliminating cell crossing noise. Keywords: MPM, PIC, meshless methods, Petrov-Galerkin discretization.

550 citations


01 Dec 2004
TL;DR: In this article, a novel technique for detecting salient regions in an image is described, which is a generalization to affine invariance of the method introduced by Kadir and Brady.
Abstract: In this paper we describe a novel technique for detecting salient regions in an image. The detector is a generalization to affine invariance of the method introduced by Kadir and Brady [10]. The detector deems a region salient if it exhibits unpredictability in both its attributes and its spatial scale.
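For illustration, a simplified, isotropic Python sketch of the underlying Kadir-Brady scale-saliency idea (the affine-invariant extension that is the paper's contribution is not shown; window shapes, the histogram settings, and the weighting below are assumptions): saliency is high where the entropy of the local intensity histogram peaks across scales and the histogram changes rapidly between scales.

```python
import numpy as np

def scale_saliency(image, x, y, scales, bins=16):
    """Entropy-based saliency of pixel (x, y) over circular windows of several radii."""
    entropies, hists = [], []
    for s in scales:
        yy, xx = np.ogrid[-s:s + 1, -s:s + 1]
        mask = xx ** 2 + yy ** 2 <= s ** 2
        patch = image[y - s:y + s + 1, x - s:x + s + 1][mask]
        p, _ = np.histogram(patch, bins=bins, range=(0, 1))
        p = p / p.sum()
        hists.append(p)
        entropies.append(-np.sum(p[p > 0] * np.log(p[p > 0])))
    best = 0.0
    for i in range(1, len(scales)):
        # inter-scale weight: how strongly the local statistics change with scale
        w = np.abs(hists[i] - hists[i - 1]).sum() * scales[i] ** 2 / (2 * scales[i] - 1)
        best = max(best, entropies[i] * w)
    return best

img = np.random.rand(64, 64)
print(scale_saliency(img, 32, 32, scales=[3, 5, 7, 9]))
```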

501 citations


Journal ArticleDOI
TL;DR: This work examines a number of optimization criteria, and extends their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement.
Abstract: Discriminant analysis has been used for decades to extract features that preserve class separability. It is commonly defined as an optimization problem involving covariance matrices that represent the scatter within and between clusters. The requirement that one of these matrices be nonsingular limits its application to data sets with certain relative dimensions. We examine a number of optimization criteria, and extend their applicability by using the generalized singular value decomposition to circumvent the nonsingularity requirement. The result is a generalization of discriminant analysis that can be applied even when the sample size is smaller than the dimension of the sample data. We use classification results from the reduced representation to compare the effectiveness of this approach with some alternatives, and conclude with a discussion of their relative merits.

358 citations


Proceedings ArticleDOI
01 Nov 2004
TL;DR: This paper investigates data mining as a technique for masking data, therefore termed data mining based privacy protection, and adapts an iterative bottom-up generalization from data mining to generalize the data.
Abstract: Well-known privacy-preserving data mining approaches modify existing data mining techniques to work on randomized data. In this paper, we investigate data mining as a technique for masking data, therefore termed data mining based privacy protection. This approach partially incorporates the requirement of a targeted data mining task into the process of masking data so that essential structure is preserved in the masked data. The idea is simple but novel: we explore the data generalization concept from data mining as a way to hide detailed information, rather than discover trends and patterns. Once the data is masked, standard data mining techniques can be applied without modification. Our work demonstrates another positive use of data mining technology: not only can it discover useful patterns, but it can also mask private information. We consider the following privacy problem: a data holder wants to release a version of data for building classification models, but wants to protect against linking the released data to an external source for inferring sensitive information. We adapt an iterative bottom-up generalization from data mining to generalize the data. The generalized data remains useful to classification but becomes difficult to link to other sources. The generalization space is specified by a hierarchical structure of generalizations. A key step is identifying the best generalization to climb up the hierarchy at each iteration. Enumerating all candidate generalizations is impractical. We present a scalable solution that examines at most one generalization in each iteration for each attribute involved in the linking.
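A toy Python sketch of the direction of the method (illustrative only: the taxonomy, the k-anonymity-style stopping rule, and the single-attribute climb below are assumptions; the paper's actual algorithm scores candidate generalizations by an information/privacy trade-off, which is omitted here): values of a quasi-identifier are replaced by their parents in a hierarchy until released records can no longer be linked to small groups.

```python
from collections import Counter

# Toy taxonomy: each specific value maps to its parent (more general) value.
parent = {
    "software engineer": "engineer", "civil engineer": "engineer",
    "engineer": "professional", "lawyer": "professional",
    "professional": "ANY",
}

def generalize(records, attr):
    """Replace every value of `attr` by its parent in the taxonomy."""
    return [dict(r, **{attr: parent.get(r[attr], r[attr])}) for r in records]

def min_group_size(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"job": "software engineer", "salary": "high"},
    {"job": "civil engineer",    "salary": "high"},
    {"job": "lawyer",            "salary": "low"},
]
qi = ["job"]          # quasi-identifier used for linking; "salary" is kept for modelling
k = 2                 # target anonymity level against linking

# Bottom-up: keep climbing the hierarchy until every group has at least k rows.
while min_group_size(records, qi) < k and any(r["job"] in parent for r in records):
    records = generalize(records, "job")

print(records)
```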

330 citations


Proceedings ArticleDOI
04 Jul 2004
TL;DR: In this article, the authors proposed a unified approach that systematically integrates all available training information such as past user-item ratings as well as attributes of items or users to learn a prediction function.
Abstract: Collaborative and content-based filtering are two paradigms that have been applied in the context of recommender systems and user preference prediction. This paper proposes a novel, unified approach that systematically integrates all available training information such as past user-item ratings as well as attributes of items or users to learn a prediction function. The key ingredient of our method is the design of a suitable kernel or similarity function between user-item pairs that allows simultaneous generalization across the user and item dimensions. We propose an on-line algorithm (JRank) that generalizes perceptron learning. Experimental results on the EachMovie data set show significant improvements over standard approaches.
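A hedged Python sketch of the key ingredient (not the JRank algorithm itself; the feature matrices, product form of the kernel, and perceptron-style update are assumptions): a kernel between user-item pairs built from a user kernel and an item kernel, used inside a kernelized online learner.

```python
import numpy as np

rng = np.random.default_rng(1)
user_feats = rng.normal(size=(10, 4))      # per-user attributes / rating profiles
item_feats = rng.normal(size=(15, 6))      # per-item attributes

def joint_kernel(u, i, v, j):
    """Kernel between user-item pairs, factored as K_user * K_item."""
    return (user_feats[u] @ user_feats[v]) * (item_feats[i] @ item_feats[j])

# Kernel perceptron over (user, item, label) triples; label = likes / dislikes.
train = [(0, 3, +1), (0, 7, -1), (2, 3, +1), (5, 7, -1), (2, 9, +1)]
alpha = np.zeros(len(train))
for _ in range(10):                                    # a few passes over the data
    for t, (u, i, y) in enumerate(train):
        score = sum(a * yy * joint_kernel(uu, ii, u, i)
                    for a, (uu, ii, yy) in zip(alpha, train))
        if y * score <= 0:
            alpha[t] += 1.0                            # perceptron-style update

def predict(u, i):
    score = sum(a * yy * joint_kernel(uu, ii, u, i)
                for a, (uu, ii, yy) in zip(alpha, train))
    return int(np.sign(score))

print(predict(0, 9))   # prediction for a user-item pair not seen in training
```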

313 citations


Journal ArticleDOI
TL;DR: About eighty MATLAB functions from plot and sum to svd and cond have been overloaded so that one can work with "chebfun" objects using almost exactly the usual MATLAB syntax.
Abstract: An object-oriented MATLAB system is described for performing numerical linear algebra on continuous functions and operators rather than the usual discrete vectors and matrices. About eighty MATLAB functions from plot and sum to svd and cond have been overloaded so that one can work with our "chebfun" objects using almost exactly the usual MATLAB syntax. All functions live on [-1,1] and are represented by values at sufficiently many Chebyshev points for the polynomial interpolant to be accurate to close to machine precision. Each of our overloaded operations raises questions about the proper generalization of familiar notions to the continuous context and about appropriate methods of interpolation, differentiation, integration, zerofinding, or transforms. Applications in approximation theory and numerical analysis are explored, and possible extensions for more substantial problems of scientific computing are mentioned.
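Chebfun itself is a MATLAB system; for illustration, a minimal Python sketch of its core mechanism (tolerances, the doubling strategy, and the convergence test below are assumptions): sample a function at Chebyshev points on [-1, 1] and refine until the Chebyshev coefficients have decayed to near machine precision.

```python
import numpy as np

def chebfun_like(f, tol=1e-14, max_degree=2 ** 16):
    """Return Chebyshev coefficients of f on [-1, 1], adaptively refined."""
    n = 8
    while n <= max_degree:
        x = np.cos(np.pi * np.arange(n + 1) / n)        # Chebyshev points
        c = np.polynomial.chebyshev.chebfit(x, f(x), n)
        # Converged when the trailing coefficients are negligible.
        if np.max(np.abs(c[-2:])) < tol * np.max(np.abs(c)):
            return c
        n *= 2
    raise RuntimeError("function not resolved to the requested tolerance")

c = chebfun_like(lambda x: np.sin(5 * x) + np.exp(x))
x_test = np.linspace(-1, 1, 7)
err = np.abs(np.polynomial.chebyshev.chebval(x_test, c)
             - (np.sin(5 * x_test) + np.exp(x_test)))
print(len(c), err.max())   # number of coefficients and max interpolation error
```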

250 citations


Journal ArticleDOI
TL;DR: This work starts from Wilson's generalization hypothesis, which states that XCS has an intrinsic tendency to evolve accurate, maximally general classifiers, and derives a simple equation that supports the hypothesis theoretically.
Abstract: This paper takes initial steps toward a theory of generalization and learning in the learning classifier system XCS. We start from Wilson's generalization hypothesis, which states that XCS has an intrinsic tendency to evolve accurate, maximally general classifiers. We analyze the different evolutionary pressures in XCS and derive a simple equation that supports the hypothesis theoretically. The equation is tested with a number of experiments that confirm the model of generalization pressure that we provide. Then, we focus on the conditions, termed "challenges," that must be satisfied for the existence of effective fitness or accuracy pressure in XCS. We derive two equations that suggest how to set the population size and the covering probability so as to ensure the development of fitness pressure. We argue that when the challenges are met, XCS is able to evolve problem solutions reliably. When the challenges are not met, a problem may provide intrinsic fitness guidance or the reward may be biased in such a way that the problem will still be solved. The equations and the influence of intrinsic fitness guidance and biased reward are tested on large Boolean multiplexer problems. The paper is a contribution to understanding how XCS functions and lays the foundation for research on XCS's learning complexity.

235 citations


Book ChapterDOI
01 Jan 2004
TL;DR: The theory of Anosov systems results from generalizing certain properties of geodesic flows on manifolds of negative curvature, properties connected with the asymptotic behavior of variational equations along the trajectories of Anosov systems, as discussed by the authors.
Abstract: The theory of Anosov systems is a result of the generalization of certain properties, which hold on geodesic flows on manifolds of negative curvature. It turned out that these properties alone are sufficient to ensure ergodicity, mixing, and, moreover, existence of K-partitions. All above-mentioned properties are connected with the asymptotical behavior of variational equations along the trajectories of Anosov systems. Therefore, it would be appropriate to propose that other asymptotical properties of geodesic flows on manifolds of negative curvature hold for the class of Anosov systems, too. However, it would be more rational to consider not all of the Anosov flows, but the class L of Anosov flows that preserve some integral invariant and have no continuous eigenfunctions.

234 citations


Posted Content
29 Apr 2004
TL;DR: In this article, a geometric interpretation of the Neutrosophic Set is given using a Neutrosophic Cube, and distinctions between NS and IFS are underlined.
Abstract: In this paper we generalize the intuitionistic fuzzy set (IFS), paraconsistent set, and intuitionistic set to the neutrosophic set (NS). Several examples are presented. Also, a geometric interpretation of the Neutrosophic Set is given using a Neutrosophic Cube. Many distinctions between NS and IFS are underlined.
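As a toy illustration of the structural difference (a sketch under simplified, standard-interval assumptions; the paper itself uses a nonstandard unit interval): in an intuitionistic fuzzy set the truth and falsity degrees satisfy T + F <= 1 with the indeterminacy fixed at 1 - T - F, while in a neutrosophic set the three degrees are independent.

```python
from dataclasses import dataclass

@dataclass
class NeutrosophicDegree:
    """Degrees of truth, indeterminacy and falsity for one element (illustrative)."""
    T: float
    I: float
    F: float

    def is_valid_ifs(self) -> bool:
        # An IFS requires T + F <= 1 and indeterminacy equal to 1 - T - F;
        # a neutrosophic degree need not satisfy either constraint.
        return self.T + self.F <= 1 and abs(self.I - (1 - self.T - self.F)) < 1e-9

d = NeutrosophicDegree(T=0.7, I=0.4, F=0.5)   # admissible as NS, not a valid IFS
print(d.is_valid_ifs())                        # False
```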

Journal ArticleDOI
TL;DR: In this article, a generalization of multiscale finite element methods (Ms-FEM) to nonlinear problems is proposed and the convergence of the proposed method for nonlinear elliptic equations is studied.
Abstract: In this paper we propose a generalization of multiscale finite element methods (Ms-FEM) to nonlinear problems. We study the convergence of the proposed method for nonlinear elliptic equations and propose an oversampling technique. Numerical examples demonstrate that the over-sampling technique greatly reduces the error. The application of MsFEM to porous media flows is considered. Finally, we describe further generalizations of MsFEM to nonlinear time-dependent equations and discuss the convergence of the method for various kinds of heterogeneities.

Journal ArticleDOI
V. Singh1
TL;DR: A novel linear matrix inequality (LMI)-based criterion for the global asymptotic stability and uniqueness of the equilibrium point of a class of delayed cellular neural networks (CNNs) is presented and turns out to be a generalization and improvement over some previous criteria.
Abstract: A novel linear matrix inequality (LMI)-based criterion for the global asymptotic stability and uniqueness of the equilibrium point of a class of delayed cellular neural networks (CNNs) is presented. The criterion turns out to be a generalization and improvement over some previous criteria.

Journal ArticleDOI
TL;DR: The surface body is a generalization of the floating body and its relation to the p-affine surface area is studied in this article, where it is shown that the surface body can be decomposed into two parts.

Proceedings ArticleDOI
21 Jul 2004
TL;DR: A new IE method is presented that employs Relational Markov Networks (a generalization of CRFs), which can represent arbitrary dependencies between extractions; this allows for "collective information extraction" that exploits the mutual influence between possible extractions.
Abstract: Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We present a new IE method that employs Relational Markov Networks (a generalization of CRFs), which can represent arbitrary dependencies between extractions. This allows for "collective information extraction" that exploits the mutual influence between possible extractions. Experiments on learning to extract protein names from biomedical text demonstrate the advantages of this approach.

Journal ArticleDOI
TL;DR: The XOR problem, the detection of symmetry problem, and the fading equalization problem can be successfully solved by the two-layered complex-valued neural network with the highest generalization ability, which reveals a potent computational power of complex-valued neural nets.
Abstract: This letter presents some results of an analysis on the decision boundaries of complex-valued neural networks whose weights, threshold values, input and output signals are all complex numbers. The main results may be summarized as follows. (1) A decision boundary of a single complex-valued neuron consists of two hypersurfaces that intersect orthogonally, and divides a decision region into four equal sections. The XOR problem and the detection of symmetry problem that cannot be solved with two-layered real-valued neural networks, can be solved by two-layered complex-valued neural networks with the orthogonal decision boundaries, which reveals a potent computational power of complex-valued neural nets. Furthermore, the fading equalization problem can be successfully solved by the two-layered complex-valued neural network with the highest generalization ability. (2) A decision boundary of a three-layered complex-valued neural network has the orthogonal property as a basic structure, and its two hypersurfaces approach orthogonality as all the net inputs to each hidden neuron grow. In particular, most of the decision boundaries in the three-layered complex-valued neural network intersect orthogonally when the network is trained using the Complex-BP algorithm. As a result, the orthogonality of the decision boundaries improves its generalization ability. (3) The average of the learning speed of the Complex-BP is several times faster than that of the Real-BP. The standard deviation of the learning speed of the Complex-BP is smaller than that of the Real-BP. It seems that the complex-valued neural network and the related algorithm are natural for learning complex-valued patterns for the above reasons.
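A minimal Python sketch of result (1) for a single neuron (the weight, threshold, and step activation are illustrative assumptions): with a "split" activation that thresholds the real and imaginary parts of the complex net input separately, the two boundaries Re(u) = 0 and Im(u) = 0 are orthogonal lines that divide the plane into four regions.

```python
import numpy as np

w = 1.0 - 0.5j          # complex weight (illustrative values)
theta = 0.2 + 0.1j      # complex threshold

def neuron(z):
    u = w * z + theta                       # complex net input
    return (u.real > 0), (u.imag > 0)       # two binary outputs -> 4 regions

# With w = a + bi, the boundary Re(u) = 0 has normal (a, -b) and Im(u) = 0 has
# normal (b, a); their dot product is zero, so the two lines are orthogonal.
for z in [0.5 + 0.5j, -1 + 0.3j, 0.1 - 1j, -0.7 - 0.7j]:
    print(z, neuron(z))
```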

Journal ArticleDOI
TL;DR: Novel bounds on the stability of combinations of any classifiers are derived that can be used to formally show that, for example, bagging increases the stability of unstable learning machines.
Abstract: We study the leave-one-out and generalization errors of voting combinations of learning machines. A special case considered is a variant of bagging. We analyze in detail combinations of kernel machines, such as support vector machines, and present theoretical estimates of their leave-one-out error. We also derive novel bounds on the stability of combinations of any classifiers. These bounds can be used to formally show that, for example, bagging increases the stability of unstable learning machines. We report experiments supporting the theoretical findings.
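A rough empirical illustration of the kind of stability the paper's bounds formalize (not the paper's analysis; the data, kernels, and perturbation scheme are assumptions, and the printed numbers vary with the seed): compare how much a single kernel machine's prediction at a fixed test point changes when one training example is removed, versus a bagged ensemble of such machines.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
x_test = np.array([[0.05, -0.02]])          # a point near the decision boundary

def perturbed_prediction(bagging, n_machines=25):
    """Prediction on x_test after deleting one random training point."""
    keep = np.delete(np.arange(len(X)), rng.integers(len(X)))
    Xp, yp = X[keep], y[keep]
    if not bagging:
        return SVC(kernel="rbf").fit(Xp, yp).predict(x_test)[0]
    votes = []
    for _ in range(n_machines):             # bagging: vote over bootstrap replicates
        idx = rng.integers(len(Xp), size=len(Xp))
        votes.append(SVC(kernel="rbf").fit(Xp[idx], yp[idx]).predict(x_test)[0])
    return int(np.mean(votes) >= 0.5)

single = [perturbed_prediction(bagging=False) for _ in range(20)]
bagged = [perturbed_prediction(bagging=True) for _ in range(20)]
print("variance, single SVM:", np.var(single))
print("variance, bagged SVMs:", np.var(bagged))   # typically the smaller of the two
```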

Journal ArticleDOI
TL;DR: Good agreement is found with the values that had previously been predicted by a theoretical argument based on a the asymptotic efficiency of a simplified model of SV regression of Support Vector regression.

Book ChapterDOI
01 Jan 2004
TL;DR: This chapter discusses the statistical theory underlying various parameter-estimation methods, and gives algorithms which depend on alternatives to maximum-likelihood estimation, and describes parameter estimation algorithms which are motivated by these generalization bounds.
Abstract: A fundamental problem in statistical parsing is the choice of criteria and algo-algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximum-likelihood estimation. The assumptions under which maximum-likelihood estimation is justified are arguably quite strong. This chapter discusses the statistical theory underlying various parameter-estimation methods, and gives algorithms which depend on alternatives to (smoothed) maximum-likelihood estimation. We first give an overview of results from statistical learning theory. We then show how important concepts from the classification literature - specifically, generalization results based on margins on training data - can be derived for parsing models. Finally, we describe parameter estimation algorithms which are motivated by these generalization bounds.

Journal ArticleDOI
TL;DR: A probabilistic active learning strategy for support vector machine (SVM) design in large data applications is presented; it queries for a set of points according to a distribution determined by the current separating hyperplane and a newly defined adaptive confidence factor.
Abstract: The paper describes a probabilistic active learning strategy for support vector machine (SVM) design in large data applications. The learning strategy is motivated by the statistical query model. While most existing methods of active SVM learning query for points based on their proximity to the current separating hyperplane, the proposed method queries for a set of points according to a distribution as determined by the current separating hyperplane and a newly defined concept of an adaptive confidence factor. This enables the algorithm to have more robust and efficient learning capabilities. The confidence factor is estimated from local information using the k nearest neighbor principle. The effectiveness of the method is demonstrated on real-life data sets in terms of generalization performance, query complexity, and training time.
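A hedged Python sketch of the flavor of this strategy (not the paper's exact formulas: the confidence estimate, weighting, and query sizes below are assumptions): instead of always querying the points closest to the hyperplane, sample queries from a distribution shaped by both distance to the current hyperplane and a k-nearest-neighbour confidence estimate computed from already-labelled points.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)          # true labels (the "oracle")

labelled = list(rng.choice(len(X), size=20, replace=False))
for _ in range(10):                                    # active-learning rounds
    clf = SVC(kernel="rbf").fit(X[labelled], y[labelled])
    pool = [i for i in range(len(X)) if i not in labelled]

    # Confidence factor: agreement among the k nearest labelled neighbours.
    nn = NearestNeighbors(n_neighbors=5).fit(X[labelled])
    _, idx = nn.kneighbors(X[pool])
    neigh_labels = y[np.asarray(labelled)][idx]
    conf = np.abs(neigh_labels.mean(axis=1) - 0.5) * 2   # 0 = uncertain, 1 = sure

    # Query distribution: favour points near the hyperplane with low confidence.
    margin = np.abs(clf.decision_function(X[pool]))
    weights = np.exp(-margin) * (1.0 - conf) + 1e-6
    queries = rng.choice(pool, size=10, replace=False, p=weights / weights.sum())
    labelled.extend(int(q) for q in queries)

print("final labelled set size:", len(labelled))
print("accuracy on the full pool:",
      SVC(kernel="rbf").fit(X[labelled], y[labelled]).score(X, y))
```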

Journal ArticleDOI
TL;DR: This paper introduces a generalization of cover-free families which includes as special cases all of the previously used definitions, and gives several bounds and some efficient constructions for these generalized cover- free families.

Journal ArticleDOI
01 Dec 2004
TL;DR: The concept of positive causality is introduced and its utility in the axiomatic correctness of the RNOR is demonstrated; concepts describing the ways in which dependent causes can work together, as either "synergistic" or "interfering," are also provided.
Abstract: This paper focuses on approaches that address the intractability of knowledge acquisition of conditional probability tables in causal or Bayesian belief networks. We state a rule that we term the "recursive noisy OR" (RNOR) which allows combinations of dependent causes to be entered and later used for estimating the probability of an effect. In the development of this paper, we investigate the axiomatic correctness and semantic meaning of this rule and show that the recursive noisy OR is a generalization of the well-known noisy OR. We introduce the concept of positive causality and demonstrate its utility in axiomatic correctness of the RNOR. We also introduce concepts describing the ways in which dependent causes can work together as being either "synergistic" or "interfering." We provide a formalization to quantify these concepts and show that they are preserved by the RNOR. Finally, we present a method for the determination of Conditional Probability Tables from this causal theory.
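For reference, a short Python sketch of the classical noisy OR that the RNOR generalizes (the cause names and probabilities are made up; the recursive rule for dependent cause combinations introduced in the paper is not reproduced here):

```python
def noisy_or(present_causes, p_single):
    """Classical noisy-OR: probability of the effect given independent causes.

    present_causes : iterable of cause names that are active
    p_single       : dict mapping each cause to P(effect | only that cause)
    """
    q = 1.0
    for c in present_causes:
        q *= 1.0 - p_single[c]          # probability that cause c fails to trigger
    return 1.0 - q

p_single = {"flu": 0.6, "cold": 0.3, "allergy": 0.2}
print(noisy_or(["flu", "cold"], p_single))        # 1 - 0.4 * 0.7 = 0.72
```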

Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work reformulates the algorithm for a generalization of the multiple-instance learning model using a kernel for a support vector machine, reducing its time complexity from exponential to polynomial, and gives a fully polynomial randomized approximation scheme (FPRAS) for the underlying counting problem.
Abstract: The multiple-instance learning (MIL) model has been very successful in application areas such as drug discovery and content-based image-retrieval. Recently, a generalization of this model and an algorithm for this generalization were introduced, showing significant advantages over the conventional MIL model in certain application areas. Unfortunately, this algorithm is inherently inefficient, preventing scaling to high dimensions. We reformulate this algorithm using a kernel for a support vector machine, reducing its time complexity from exponential to polynomial. Computing the kernel is equivalent to counting the number of axis-parallel boxes in a discrete, bounded space that contain at least one point from each of two multisets P and Q. We show that this problem is #P-complete, but then give a fully polynomial randomized approximation scheme (FPRAS) for it. Finally, we empirically evaluate our kernel.
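A brute-force Python illustration of the combinatorial quantity the kernel computes (the point sets and grid size are made up; this enumeration is exponential in the dimension, which is exactly why the paper develops an FPRAS instead): count the axis-parallel boxes in a discrete bounded space that contain at least one point from each of two multisets.

```python
from itertools import combinations_with_replacement, product

def count_common_boxes(P, Q, size):
    """Count boxes in {0..size-1}^d containing a point of P and a point of Q."""
    d = len(P[0])
    # A box is a choice of (low, high) with low <= high in every dimension.
    intervals = list(combinations_with_replacement(range(size), 2))
    count = 0
    for box in product(intervals, repeat=d):
        def inside(pt):
            return all(lo <= pt[k] <= hi for k, (lo, hi) in enumerate(box))
        if any(inside(p) for p in P) and any(inside(q) for q in Q):
            count += 1
    return count

P = [(0, 1), (2, 2)]
Q = [(1, 1), (2, 0)]
print(count_common_boxes(P, Q, size=3))
```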

Journal ArticleDOI
TL;DR: An extension and analysis of the original Shu-Osher representation is given, by means of which questions can be settled regarding properties which are referred to, in the literature, by the terms monotonicity and strong-stability-preserving (SSP).
Abstract: In the context of solving nonlinear partial differential equations, Shu and Osher introduced representations of explicit Runge-Kutta methods, which lead to stepsize conditions under which the numerical process is total-variation-diminishing (TVD). Much attention has been paid to these representations in the literature. In general, a Shu-Osher representation of a given Runge-Kutta method is not unique. Therefore, of special importance are representations of a given method which are best possible with regard to the stepsize condition that can be derived from them. Several basic questions are still open, notably regarding the following issues: (1) the formulation of a simple and general strategy for finding a best possible Shu-Osher representation for any given Runge-Kutta method; (2) the question of whether the TVD property of a given Runge-Kutta method can still be guaranteed when the stepsize condition, corresponding to a best possible Shu-Osher representation of the method, is violated; (3) the generalization of the Shu-Osher approach to general (possibly implicit) Runge-Kutta methods. In this paper we give an extension and analysis of the original Shu-Osher representation, by means of which the above questions can be settled. Moreover, we clarify analogous questions regarding properties which are referred to, in the literature, by the terms monotonicity and strong-stability-preserving (SSP).
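To make the notion of a Shu-Osher representation concrete, here is the classical third-order SSP (TVD) Runge-Kutta scheme written as convex combinations of forward-Euler steps, applied to a toy advection problem (the spatial discretization and stepsize are illustrative assumptions; the paper's best-possible representations and the implicit extension are not reproduced):

```python
import numpy as np

def ssp_rk3_step(L, u, dt):
    """One step of the classical Shu-Osher SSP RK3 scheme.

    Each stage is a convex combination of the previous solution and a
    forward-Euler step, so TVD/SSP properties of Euler carry over under a
    suitable stepsize restriction.
    """
    u1 = u + dt * L(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * L(u1))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * L(u2))

# Example: linear advection u_t + u_x = 0 with upwind differences, periodic BCs.
n = 100
dx = 1.0 / n
x = np.arange(n) * dx
u = np.exp(-200 * (x - 0.5) ** 2)
L = lambda v: -(v - np.roll(v, 1)) / dx        # first-order upwind in space

dt = 0.5 * dx                                   # within the SSP stepsize limit
for _ in range(100):
    u = ssp_rk3_step(L, u, dt)
print(u.max())   # the profile is advected without creating spurious new extrema
```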

Journal ArticleDOI
TL;DR: A generalization of dimensional analysis and its corollary, the Π-theorem, is introduced for the class of problems in which some of the quantities that define the problem have fixed values in all the cases that are of interest.
Abstract: This article introduces a generalization of dimensional analysis and its corollary, the Π-theorem, to the class of problems in which some of the quantities that define the problem have fixed values in all the cases that are of interest. The procedure can reduce the number of dimensionless similarity variables beyond the prediction of Buckingham's theorem. The generalized Π-theorem tells when and how large a reduction is attainable.
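For readers unfamiliar with the classical result being generalized, a standard worked example of Buckingham's Π-theorem (the pendulum example below is illustrative and not taken from the article):

```latex
% Classical Buckingham \Pi example: the period T of a simple pendulum depends
% on its length L, the gravitational acceleration g and the mass m.
% Dimensions: [T] = \mathrm{T},\ [L] = \mathrm{L},\ [g] = \mathrm{L\,T^{-2}},\ [m] = \mathrm{M}.
% Four quantities and three base dimensions give 4 - 3 = 1 dimensionless group:
\Pi = T \sqrt{g / L}, \qquad \text{hence} \qquad T = C \sqrt{L / g}
% for a dimensionless constant C; the mass cannot enter, since no other quantity
% carries the dimension M. The article's generalization reduces the count of
% dimensionless variables further when some defining quantities are fixed
% across all cases of interest.
```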

Journal ArticleDOI
01 Sep 2004 - K-Theory
TL;DR: Support varieties for any finite dimensional algebra over a field are introduced using graded subalgebras of the Hochschild cohomology, and many of the standard results from the theory of support varieties for finite groups generalize to this situation.
Abstract: Support varieties for any finite dimensional algebra over a field were introduced in (20) using graded subalgebras of the Hochschild cohomology. We mainly study these varieties for selfinjective algebras under appropriate finite generation hypotheses. Then many of the standard results from the theory of support varieties for finite groups generalize to this situation. In particular, the complexity of the module equals the dimension of its corresponding variety, all closed homogeneous varieties occur as the variety of some module, the variety of an indecomposable module is connected, periodic modules are lines and for symmetric algebras a generalization of Webb's theorem is true.

Journal ArticleDOI
TL;DR: This paper studies a simple learning algorithm for binary classification that predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error, and shows that the prediction is much more stable than the prediction of an algorithm that predicts with the best hypothesis.
Abstract: We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction of an algorithm that predicts with the best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it makes when it does not abstain are very reliable. Finally, we show that the probability that the algorithm abstains is comparable to the generalization error of the best hypothesis in the class.
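A small Python sketch of the scheme on a finite hypothesis class (the threshold classifiers, the temperature parameter, and the abstention cutoff are illustrative assumptions, not the paper's exact setup): each hypothesis is weighted exponentially in its training error, predictions are a weighted vote, and the learner abstains when the vote is too close to call.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=120))

# A small finite hypothesis class: signed axis-aligned threshold classifiers.
hypotheses = [(j, t, s)
              for j in range(3)
              for t in np.linspace(-1, 1, 9)
              for s in (-1, 1)]

def h_predict(h, X):
    j, t, s = h
    return s * np.where(X[:, j] > t, 1, -1)

n_train = 80
train_err = np.array([np.mean(h_predict(h, X[:n_train]) != y[:n_train])
                      for h in hypotheses])

eta = 2.0                                        # temperature parameter (illustrative)
weights = np.exp(-eta * n_train * train_err)
weights /= weights.sum()

def vote(x_row):
    """Weighted vote of all hypotheses; abstain when the vote is too close."""
    s = sum(w * h_predict(h, x_row[None, :])[0] for w, h in zip(weights, hypotheses))
    return 0 if abs(s) < 0.5 else int(np.sign(s))   # 0 means "abstain"

preds = [vote(x) for x in X[n_train:]]
print("abstention rate:", np.mean([p == 0 for p in preds]))
```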

Journal ArticleDOI
TL;DR: The authors review the book "Validity Generalization: A Critical Review," edited by Kevin R. Murphy.
Abstract: The article reviews the book “Validity Generalization: A Critical Review,” edited by Kevin R. Murphy.

01 Jan 2004
TL;DR: This is a self-contained presentation of adaptive classification seen from the PAC-Bayesian point of view; the main improvements brought here are more localized bounds and the use of exchangeable prior distributions.
Abstract: This is meant to be a self-contained presentation of adaptive classification seen from the PAC-Bayesian point of view. Although most of the results are original, some review materials about the VC dimension and support vector machines are also included. This study falls in the field of statistical learning theory, where complex data have to be analyzed from a limited amount of information, drawn from a finite sample. It relies on non-asymptotic deviation inequalities, where the complexity of models is captured through the use of prior measures. The main improvements brought here are more localized bounds and the use of exchangeable prior distributions. Interesting consequences are drawn for the generalization properties of support vector machines and the design of new classification algorithms. 2000 Mathematics Subject Classification: 62H30, 68T05, 62B10.
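For orientation, one standard (non-localized) PAC-Bayesian bound of the kind this work refines; this is a classical statement, not the manuscript's sharper localized or exchangeable-prior versions:

```latex
% A standard PAC-Bayesian deviation bound (McAllester-style, Maurer's form).
% For any prior \pi on the hypothesis space, with probability at least 1-\delta
% over an i.i.d. sample of size n, simultaneously for all posteriors \rho:
\mathbb{E}_{\rho}[R(h)] \;\le\; \mathbb{E}_{\rho}[r_n(h)]
  \;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
% where R is the true risk and r_n the empirical risk on the sample.
```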

Posted Content
TL;DR: In this paper, a new asymptotic method for the analysis of matrix Riemann-Hilbert problems is proposed, which is a generalization of the steepest descent method first proposed by Deift and Zhou.
Abstract: We develop a new asymptotic method for the analysis of matrix Riemann-Hilbert problems. Our method is a generalization of the steepest descent method first proposed by Deift and Zhou; however our method systematically handles jump matrices that need not be analytic. The essential technique is to introduce nonanalytic extensions of certain functions appearing in the jump matrix, and to therefore convert the Riemann-Hilbert problem into a dbar problem. We use our method to study several asymptotic problems of polynomials orthogonal with respect to a measure given on the unit circle, obtaining new detailed uniform convergence results, and for some classes of nonanalytic weights, complete information about the asymptotic behavior of the individual zeros.