
Showing papers in "Statistics and Computing in 1992"


Journal ArticleDOI
TL;DR: In this paper, a new approximation for the coefficients required to calculate the Shapiro-Wilk W-test is derived, which is easy to calculate and applies for any sample size greater than 3.
Abstract: A new approximation for the coefficients required to calculate the Shapiro-Wilk W-test is derived. It is easy to calculate and applies for any sample size greater than 3. A normalizing transformation for the W statistic is given, enabling its P-value to be computed simply. The distribution of the new approximation to W agrees well with published critical points which use exact coefficients.

573 citations
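
A minimal sketch of how such a test is applied in practice. scipy.stats.shapiro is used here as a stand-in implementation; modern routines for the W-test rest on coefficient approximations and a normalizing transformation of the kind the abstract describes, but this is not the paper's own code.

```python
# Minimal sketch: computing the Shapiro-Wilk W statistic and its P-value.
# scipy.stats.shapiro is a stand-in implementation, not the paper's code.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # a sample of size n > 3

w_stat, p_value = stats.shapiro(x)
print(f"W = {w_stat:.4f}, P-value = {p_value:.4f}")
# A small P-value would suggest departure from normality.
```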


Journal ArticleDOI
TL;DR: This paper introduces Bayesian techniques for splitting, smoothing, and tree averaging; the splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning.
Abstract: Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, c4 (Quinlan et al., 1987) and cart (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.

418 citations
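
Since the paper's splitting rule is compared to Quinlan's information gain, a short sketch of that familiar criterion (not the Bayesian rule derived in the paper) may help fix ideas for a binary split:

```python
# Quinlan-style information gain for a candidate binary split. This is the
# comparison criterion named in the abstract, not the paper's Bayesian rule.
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """Entropy reduction from splitting `labels` by a boolean mask."""
    n = len(labels)
    left, right = labels[split_mask], labels[~split_mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Example: does x > 2.5 usefully separate the two classes?
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(information_gain(y, x > 2.5))   # 1.0 bit: a perfect split here
```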


Journal ArticleDOI
TL;DR: This paper analyses in detail a ‘flow-propagation’ algorithm for calculating marginal and conditional distributions in a probabilistic expert system, and shows how it can be modified to perform other tasks, including maximization of the joint density and simultaneous ‘fast retraction’ of evidence entered on several variables.
Abstract: A probabilistic expert system provides a graphical representation of a joint probability distribution which can be used to simplify and localize calculations. Jensenet al. (1990) introduced a ‘flow-propagation’ algorithm for calculating marginal and conditional distributions in such a system. This paper analyses that algorithm in detail, and shows how it can be modified to perform other tasks, including maximization of the joint density and simultaneous ‘fast retraction’ of evidence entered on several variables.

269 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative definition of a principal curve, based on a mixture model, is presented. A principal curve is a smooth curve passing through the ‘middle’ of a distribution or data cloud; estimation under the new definition is carried out through an EM algorithm.
Abstract: A principal curve (Hastie and Stuetzle, 1989) is a smooth curve passing through the ‘middle’ of a distribution or data cloud, and is a generalization of linear principal components. We give an alternative definition of a principal curve, based on a mixture model. Estimation is carried out through an EM algorithm. Some comparisons are made to the Hastie-Stuetzle definition.

247 citations


Journal ArticleDOI
TL;DR: This paper investigates the applicability of a Monte Carlo technique known as ‘simulated annealing’ to achieve optimum or sub-optimum decompositions of probabilistic networks under bounded resources and proves that cost-function changes can be computed locally.
Abstract: This paper investigates the applicability of a Monte Carlo technique known as ‘simulated annealing’ to achieve optimum or sub-optimum decompositions of probabilistic networks under bounded resources. High-quality decompositions are essential for performing efficient inference in probabilistic networks. Optimum decomposition of probabilistic networks is known to be NP-hard (Wen, 1990). The paper proves that cost-function changes can be computed locally, which is essential to the efficiency of the annealing algorithm. Pragmatic control schedules which reduce the running time of the annealing algorithm are presented and evaluated. Apart from the conventional temperature parameter, these schedules involve the radius of the search space as a new control parameter. The evaluation suggests that the inclusion of this new parameter is important for the success of the annealing algorithm for the present problem.

102 citations
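
A generic simulated annealing skeleton with the two controls the abstract highlights, a temperature schedule and a search-space radius, may make the control schedules concrete. The cost function and neighbourhood below are placeholders, not the paper's network-decomposition objective or its locally computed cost changes.

```python
# Generic simulated annealing skeleton with a temperature schedule and a
# search "radius" limiting how far a proposed move may stray. The cost and
# neighbour functions are placeholders, not the paper's decomposition problem.
import math
import random

def anneal(cost, neighbour, state, t0=1.0, cooling=0.95, radius0=5,
           steps_per_level=100, levels=50, seed=0):
    rng = random.Random(seed)
    best, best_cost = state, cost(state)
    current, current_cost = state, best_cost
    temperature, radius = t0, radius0
    for _ in range(levels):
        for _ in range(steps_per_level):
            candidate = neighbour(current, radius, rng)
            delta = cost(candidate) - current_cost        # local cost change
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= cooling                            # cool down
        radius = max(1, int(radius * cooling))            # shrink search radius
    return best, best_cost

# Toy usage: minimise a quadratic over the integers.
result = anneal(cost=lambda s: (s - 17) ** 2,
                neighbour=lambda s, r, rng: s + rng.randint(-r, r),
                state=0)
print(result)
```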


Journal ArticleDOI
TL;DR: A model-theoretic definition of causation is proposed and it is shown that, contrary to common folklore, genuine causal influences can be distinguished from spurious covariations following standard norms of inductive reasoning.
Abstract: We propose a model-theoretic definition of causation, and show that, contrary to common folklore, genuine causal influences can be distinguished from spurious covariations following standard norms of inductive reasoning. We also establish a sound characterization of the conditions under which such a distinction is possible. Finally, we provide a proof-theoretical procedure for inductive causation and show that, for a large class of data and structures, effective algorithms exist that uncover the direction of causal influences as defined above.

47 citations


Journal ArticleDOI
TL;DR: A propagation algorithm is presented and justified to facilitate the simultaneous calculation, for every node in a probabilistic expert system, of the distribution of the associated random quantity, conditional on all the evidence obtained about the remaining nodes.
Abstract: We present and justify a propagation algorithm to facilitate the simultaneous calculation, for every node in a probabilistic expert system, of the distribution of the associated random quantity, conditional on all the evidence obtained about the remaining nodes.

30 citations


Journal ArticleDOI
TL;DR: A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed, which occupies an intermediate position between multinomial discrimination, the first-order independence model and kernel discrimination.
Abstract: A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived in the small-sample, high-dimensional setting. This method occupies an intermediate position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first; then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is compared with that of other classical methods through application to data.

22 citations
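
The convex combination governed by the complexity parameter can be sketched directly from the abstract's description. The function and variable names below are hypothetical, and the Aitchison-Aitken smoothing and the cross-validated choice of both parameters are omitted.

```python
# Sketch of the convex combination described in the abstract: class-conditional
# probabilities blended between the full multinomial model and the first-order
# independence model via a complexity parameter `alpha`. Names are hypothetical;
# kernel smoothing and cross-validation of the parameters are not shown.
import numpy as np

def blended_class_conditional(p_full, p_independence, alpha):
    """Convex combination of full-multinomial and independence-model probabilities."""
    assert 0.0 <= alpha <= 1.0
    return alpha * p_full + (1.0 - alpha) * p_independence

# Toy example with three discrete cells for one class:
p_full = np.array([0.70, 0.20, 0.10])          # cell frequencies, full model
p_indep = np.array([0.50, 0.30, 0.20])         # product of first-order marginals
print(blended_class_conditional(p_full, p_indep, alpha=0.8))
```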


Journal ArticleDOI
TL;DR: Simulations indicate that the temperature level of the annealing schedule significantly affects the convergence behavior of the training process and that, to achieve a balanced performance of these BMPRs, a medium to high level of annealing temperatures is recommended.
Abstract: Boltzmann machines (BM), a type of neural networking algorithm, have been proven to be useful in pattern recognition. Patterns on quality control charts have long been recognized as providing useful information for correcting process performance problems. In computer-integrated manufacturing environments, where the control charts are monitored by computer algorithms, the potential for using pattern-recognition algorithms is considerable. The main purpose of this paper is to formulate a Boltzmann machine pattern recognizer (BMPR) and demonstrate its utility in control chart pattern recognition. It is not the intent of this paper to make comparisons between existing related algorithms. A factorial design of experiments was conducted to study the effects of numerous factors on the convergence behavior and performance of these BMPRs. These factors include the number of hidden nodes used in the network and the annealing schedule. Simulations indicate that the temperature level of the annealing schedule significantly affects the convergence behavior of the training process and that, to achieve a balanced performance of these BMPRs, a medium to high level of annealing temperatures is recommended. Numerical results for cyclical and stratification patterns illustrate that the classification capability of these BMPRs is quite powerful.

21 citations
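
A minimal sketch of the temperature-dependent stochastic update at the heart of a Boltzmann machine may clarify why the annealing temperature matters: the probability of switching a unit on follows a sigmoid of its energy gap scaled by the temperature. This shows only the core update rule, not the pattern-recognizer architecture, training procedure or factorial experiment of the paper.

```python
# Core Boltzmann machine unit update: higher temperature T means noisier, more
# exploratory updates. This illustrates the role of the annealing schedule only;
# it is not the BMPR of the paper.
import numpy as np

def update_unit(states, weights, biases, i, temperature, rng):
    """Stochastically set unit i given the states of the other units."""
    energy_gap = weights[i] @ states + biases[i]      # gain from turning unit i on
    p_on = 1.0 / (1.0 + np.exp(-energy_gap / temperature))
    states[i] = 1 if rng.random() < p_on else 0
    return states

rng = np.random.default_rng(0)
n = 5
weights = rng.normal(size=(n, n)); weights = (weights + weights.T) / 2
np.fill_diagonal(weights, 0.0)                        # no self-connections
biases = rng.normal(size=n)
states = rng.integers(0, 2, size=n)

for temperature in (5.0, 1.0, 0.2):                   # a crude annealing schedule
    for i in range(n):
        update_unit(states, weights, biases, i, temperature, rng)
print(states)
```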


Journal ArticleDOI
TL;DR: A probabilistic model of text understanding is developed, using probability theory to handle the uncertainty which arises in this abductive inference process, and all aspects of natural language processing are treated in the same framework, allowing syntactic, semantic and pragmatic constraints to be integrated.
Abstract: We discuss a new framework for text understanding. Three major design decisions characterize this approach. First, we take the problem of text understanding to be a particular case of the general problem of abductive inference. Second, we use probability theory to handle the uncertainty which arises in this abductive inference process. Finally, all aspects of natural language processing are treated in the same framework, allowing us to integrate syntactic, semantic and pragmatic constraints. In order to apply probability theory to this problem, we have developed a probabilistic model of text understanding. To make it practical to use this model, we have devised a way of incrementally constructing and evaluating belief networks. We have written a program, wimp3, to experiment with this framework. To evaluate this program, we have developed a simple ‘single-blind’ testing method.

19 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose an alternative approach to constructing double bootstrap confidence intervals that involves replacing the inner level of resampling by an analytical approximation, based on saddlepoint methods and a tail probability approximation of DiCiccio and Martin.
Abstract: Standard algorithms for the construction of iterated bootstrap confidence intervals are computationally very demanding, requiring nested levels of bootstrap resampling. We propose an alternative approach to constructing double bootstrap confidence intervals that involves replacing the inner level of resampling by an analytical approximation. This approximation is based on saddlepoint methods and a tail probability approximation of DiCiccio and Martin (1991). Our technique significantly reduces the computational expense of iterated bootstrap calculations. A formal algorithm for the construction of our approximate iterated bootstrap confidence intervals is presented, and some crucial practical issues arising in its implementation are discussed. Our procedure is illustrated in the case of constructing confidence intervals for ratios of means using both real and simulated data. We repeat an experiment of Schenker (1985) involving the construction of bootstrap confidence intervals for a variance and demonstrate that our technique makes feasible the construction of accurate bootstrap confidence intervals in that context. Finally, we investigate the use of our technique in a more complex setting, that of constructing confidence intervals for a correlation coefficient.
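
The computational burden the paper addresses comes from the nested resampling of the standard double bootstrap. The sketch below shows that baseline, calibrating the nominal level of a percentile interval for a mean so that its estimated coverage matches the target; it illustrates the expense of the inner level rather than the saddlepoint approximation that replaces it.

```python
# Baseline nested (double) bootstrap: calibrate the nominal level of a
# percentile interval for the mean so its estimated coverage hits the target.
# Illustrative only; not the paper's saddlepoint-based algorithm.
import numpy as np

def double_bootstrap_interval(sample, target=0.95, n_outer=200, n_inner=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(sample)
    theta_hat = sample.mean()
    levels = np.linspace(0.80, 0.999, 40)              # candidate nominal levels
    coverage = np.zeros_like(levels)
    for _ in range(n_outer):                           # outer resampling level
        resample = rng.choice(sample, size=n, replace=True)
        inner = np.array([rng.choice(resample, size=n, replace=True).mean()
                          for _ in range(n_inner)])    # inner resampling level
        for j, level in enumerate(levels):
            a = (1.0 - level) / 2.0
            lo, hi = np.quantile(inner, [a, 1.0 - a])
            coverage[j] += (lo <= theta_hat <= hi)
    coverage /= n_outer
    calibrated = levels[np.argmin(np.abs(coverage - target))]
    a = (1.0 - calibrated) / 2.0
    outer = np.array([rng.choice(sample, size=n, replace=True).mean()
                      for _ in range(2000)])
    return calibrated, np.quantile(outer, [a, 1.0 - a])

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=40)
print(double_bootstrap_interval(data))
```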

Journal ArticleDOI
TL;DR: In this article, the properties of a parameterized form of generalized simulated annealing for function minimization are investigated by studying properties of repeated minimizations from random starting points, which leads to the comparison of distributions of function values and of numbers of function evaluations.
Abstract: The properties of a parameterized form of generalized simulated annealing for function minimization are investigated by studying the properties of repeated minimizations from random starting points. This leads to the comparison of distributions of function values and of numbers of function evaluations. Parameter values which yield searches repeatedly terminating close to the global minimum may require unacceptably many function evaluations. If computational resources are a constraint, the total number of function evaluations may be limited. A sensible strategy is then to restart at a random point any search which terminates, until the total allowable number of function evaluations has been exhausted. The response is now the minimum of the function values obtained. This strategy yields a surprisingly stable solution for the parameter values of the simulated annealing algorithm. The algorithm can be further improved by segmentation in which each search is limited to a maximum number of evaluations, perhaps no more than a fifth of the total available. The main tool for interpreting the distributions of function values is the boxplot. The application is to the optimum design of experiments.
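
The restart-and-segmentation strategy can be sketched directly: any search that terminates is restarted from a random point until the total budget of function evaluations is exhausted, with each single search capped at a fraction of that budget. The inner optimiser below is a plain random-walk annealer, not the parameterized generalized simulated annealing studied in the paper.

```python
# Restart-and-segmentation strategy: restart terminated searches from random
# points until the total evaluation budget is used, capping each search at a
# fraction of the budget. The inner annealer is a simple stand-in.
import math
import random

def single_search(f, start, max_evals, rng, t0=1.0, cooling=0.99, step=0.5):
    x, fx = start, f(start)
    best_x, best_f, t, evals = x, fx, t0, 1
    while evals < max_evals:
        cand = x + rng.gauss(0.0, step)
        fc = f(cand); evals += 1
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f, evals

def restarted_search(f, total_evals=10_000, segment_fraction=0.2, seed=0):
    rng = random.Random(seed)
    budget, best = total_evals, (None, float("inf"))
    while budget > 0:
        cap = min(budget, int(total_evals * segment_fraction))  # segmentation
        x0 = rng.uniform(-10.0, 10.0)                           # random restart
        bx, bf, used = single_search(f, x0, cap, rng)
        if bf < best[1]:
            best = (bx, bf)
        budget -= used
    return best

print(restarted_search(lambda x: (x - 3.0) ** 2 + math.sin(5 * x)))
```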

Journal ArticleDOI
TL;DR: In this article, the authors compare the performances of simulated annealing and EM algorithms in problems of decomposition of normal mixtures according to the likelihood approach, considering a suitable reformulation of the problem which yields an optimization problem having a global solution and at least a smaller number of spurious maxima.
Abstract: We compare the performances of the simulated annealing and the EM algorithms in problems of decomposition of normal mixtures according to the likelihood approach. In this case the likelihood function has multiple maxima and singularities, and we consider a suitable reformulation of the problem which yields an optimization problem having a global solution and at least a smaller number of spurious maxima. The results are compared considering some distance measures between the estimated distributions and the true ones. No overwhelming superiority of either method has been demonstrated, though in one of our cases simulated annealing achieved better results.
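
For reference, a minimal EM iteration for a two-component univariate normal mixture, the kind of likelihood-based decomposition the comparison concerns. The reformulation the paper uses to remove singularities and reduce spurious maxima is not reproduced; this is the plain EM baseline.

```python
# Plain EM for a two-component univariate normal mixture. The constrained
# reformulation the paper studies is not included.
import numpy as np
from scipy.stats import norm

def em_two_normals(x, n_iter=200):
    pi, mu1, mu2 = 0.5, x.min(), x.max()     # crude initialisation
    s1 = s2 = x.std()
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each point
        d1 = pi * norm.pdf(x, mu1, s1)
        d2 = (1 - pi) * norm.pdf(x, mu2, s2)
        r = d1 / (d1 + d2)
        # M-step: weighted updates of the mixture parameters
        pi = r.mean()
        mu1 = np.average(x, weights=r)
        mu2 = np.average(x, weights=1 - r)
        s1 = np.sqrt(np.average((x - mu1) ** 2, weights=r))
        s2 = np.sqrt(np.average((x - mu2) ** 2, weights=1 - r))
    return pi, (mu1, s1), (mu2, s2)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 0.5, 200)])
print(em_two_normals(x))
```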

Journal ArticleDOI
TL;DR: Two probabilistic model induction techniques, cart and constructor, are compared, via a series of experiments, in terms of their ability to induce models that are both interpretable and predictive.
Abstract: Two probabilistic model induction techniques, cart and constructor, are compared, via a series of experiments, in terms of their ability to induce models that are both interpretable and predictive. The experiments show that, although both algorithms are able to deliver classifiers with predictive performance close to that of the optimal Bayes rule, constructor is able to generate a probabilistic model that is more easily interpretable than the cart model. On the other hand, cart is a more mature algorithm and is capable of handling many more situations (e.g., real-valued training sets) than constructor. A variety of characteristics of both algorithms are compared, and suggestions for future research are made.

Journal ArticleDOI
TL;DR: In this article, the authors investigate how well the maximum likelihood estimation procedure and the parametric bootstrap behave in the case of the very well-known software reliability model suggested by Jelinski and Moranda (1972).
Abstract: In software reliability theory many different models have been proposed and investigated. Some of these models intuitively match reality better than others. The properties of certain statistical estimation procedures in connection with these models are also model-dependent. In this paper we investigate how well the maximum likelihood estimation procedure and the parametric bootstrap behave in the case of the very well-known software reliability model suggested by Jelinski and Moranda (1972). For this study we will make use of simulated data.
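
A sketch of the two procedures under study, assuming the standard formulation of the Jelinski-Moranda model in which the i-th inter-failure time is exponential with rate phi*(N - i + 1), N being the initial number of faults and phi a per-fault failure rate. This is an illustration of the general approach, not the paper's simulation study.

```python
# Maximum likelihood fitting and a parametric bootstrap for the Jelinski-
# Moranda model, assuming its standard formulation: the i-th inter-failure
# time is exponential with rate phi * (N - i + 1). Illustrative sketch only.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, t):
    n_faults, phi = params
    k = len(t)
    if phi <= 0 or n_faults < k:
        return np.inf
    rates = phi * (n_faults - np.arange(k))          # N, N-1, ..., N-k+1
    return -np.sum(np.log(rates) - rates * t)

def fit_jm(t):
    k = len(t)
    start = np.array([k + 5.0, 1.0 / (np.mean(t) * k)])
    return minimize(neg_log_lik, x0=start, args=(t,), method="Nelder-Mead").x

def parametric_bootstrap(t, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n_hat, phi_hat = fit_jm(t)
    k = len(t)
    estimates = []
    for _ in range(n_boot):
        rates = phi_hat * (n_hat - np.arange(k))
        sim = rng.exponential(1.0 / rates)           # simulate from fitted model
        estimates.append(fit_jm(sim))
    return np.array(estimates)

rng = np.random.default_rng(1)
true_rates = 0.02 * (30 - np.arange(20))             # N = 30, phi = 0.02
times = rng.exponential(1.0 / true_rates)
boot = parametric_bootstrap(times)
print(boot.mean(axis=0), boot.std(axis=0))           # bootstrap mean and spread
```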

Journal ArticleDOI
TL;DR: DEXPERT is an expert system, built using KEE, for the design and analysis of experiments, which provides a layout sheet for the collection of the data and then analyzes and interprets the results using analytical and graphical methods.
Abstract: DEXPERT is an expert system, built using KEE, for the design and analysis of experiments. From a mathematical model, expected mean squares are computed, tests are determined, and the power of the tests computed. Comparisons between designs are aided by suggestions and verbal interpretations provided by DEXPERT. DEXPERT provides a layout sheet for the collection of the data and then analyzes and interprets the results using analytical and graphical methods.

Journal ArticleDOI
Jan Raes
TL;DR: It is concluded that although the technology and concepts that drive these systems could still benefit from further improvement, the real challenge lies in defining and constructing the statistical knowledge and strategy that should be incorporated and in presenting the results to the user's full advantage.
Abstract: A decade of research into the applications of artificial intelligence in statistics has finally resulted in the appearance of commercially available statistical expert systems. This paper takes a closer look at two of these systems, which are now commercially available on microcomputers, and shows what knowledge they actually contain and how they operate. It is concluded that although the technology and concepts that drive these systems could still benefit from further improvement, the real challenge lies in defining and constructing the statistical knowledge and strategy that should be incorporated and in presenting the results to the user's full advantage.

Journal ArticleDOI
TL;DR: This paper solves two problems, the first by providing a simplified solution to the Liapunov matrix equation which can be written in a few lines of code in computer languages such as SAS PROC MATRIX/IML™ or GAUSS™; the second, by bootstrapping the parameter covariance matrix.
Abstract: The solution to a Liapunov matrix equation (LME) has been proposed to estimate the parameters of the demand equations derived from the Translog, the Almost Ideal Demand System and the Rotterdam demand models. When compared to traditional seemingly unrelated regression (SUR) methods, the LME approach saves both computer time and space, and it provides parameter estimates that are less likely to suffer from round-off error. However, the LME method is difficult to implement without the use of specially written computer programs and, unlike traditional SUR methods, it does not automatically provide an estimate of the covariance of the parameters. This paper solves these two problems, the first by providing a simplified solution to the Liapunov matrix equation which can be written in a few lines of code in computer languages such as SAS PROC MATRIX/IML™ or GAUSS™; the second, by bootstrapping the parameter covariance matrix.
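
The paper's first point, that the Liapunov matrix equation can be solved in a few lines of a matrix language, carries over to other environments. As a sketch, scipy.linalg.solve_continuous_lyapunov solves AX + XAᵀ = Q directly; the SUR estimation context and the bootstrap of the parameter covariance matrix are not reproduced here.

```python
# Solving a Liapunov (Lyapunov) matrix equation A X + X A^T = Q in a few
# lines, in the spirit of the paper's point about matrix languages.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-2.0, 1.0],
              [ 0.0, -3.0]])
Q = np.array([[-1.0, 0.0],
              [ 0.0, -1.0]])

X = solve_continuous_lyapunov(A, Q)
print(X)
print(np.allclose(A @ X + X @ A.T, Q))   # verify the solution
```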

Journal ArticleDOI
TL;DR: Methods are developed to test the performance of software for calculating the standard normal distribution function, and results are presented on a selection of available codes, showing a wide variation in performance.
Abstract: Methods are developed to test the performance of software for calculating the standard normal distribution function. The accuracy and implementation details of the tests are described. Results are presented on a selection of available codes, showing a wide variation in performance. At least one published code is shown to have severe defects.
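
The kind of defect such tests expose can be illustrated briefly: computing the upper tail as 1 − Φ(x) loses all relative accuracy far in the tail through cancellation, whereas a complementary-error-function formulation does not. scipy's survival function serves as the reference value in this sketch; it is not one of the codes examined in the paper.

```python
# Illustration of a typical defect: the naive upper tail 1 - Phi(x) suffers
# catastrophic cancellation for large x, while an erfc-based form does not.
# scipy's survival function is used as the reference here.
import math
from scipy.stats import norm

def upper_tail_naive(x):
    """1 - Phi(x): loses relative accuracy for large x."""
    return 1.0 - 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def upper_tail_erfc(x):
    """Phi's complement via erfc: accurate in the far tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (1.0, 5.0, 8.0, 10.0):
    reference = norm.sf(x)
    print(f"x={x:5.1f}  naive={upper_tail_naive(x):.3e}  "
          f"erfc={upper_tail_erfc(x):.3e}  reference={reference:.3e}")
# In the far tail the naive version loses all relative accuracy and eventually
# returns 0, while the erfc version still agrees with the reference.
```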

Journal ArticleDOI
TL;DR: The notion of admissible models is defined as a function of problem complexity, the number of data points N, and prior belief, and is used to derive general bounds relating classifier complexity with data-dependent parameters such as sample size, class entropy and the optimal Bayes error rate.
Abstract: In this paper we investigate the application of stochastic complexity theory to classification problems. In particular, we define the notion of admissible models as a function of problem complexity, the number of data points N, and prior belief. This allows us to derive general bounds relating classifier complexity with data-dependent parameters such as sample size, class entropy and the optimal Bayes error rate. We discuss the application of these results to a variety of problems, including decision tree classifiers, Markov models for image segmentation, and feedforward multilayer neural network classifiers.

Journal ArticleDOI
TL;DR: The density of the quotient of two non-negative quadratic forms in normal variables, useful for studying the robustness of the F-test with respect to errors of the first and second kind, is considered, and an explicit expression for this density is given in the form of a proper Riemann integral on a finite interval, suitable for numerical calculation.
Abstract: The density of the quotient of two non-negative quadratic forms in normal variables is considered. The covariance matrix of these variables is arbitrary. The result is useful in the study of the robustness of the F-test with respect to errors of the first and second kind. An explicit expression for this density is given in the form of a proper Riemann integral on a finite interval, suitable for numerical calculation.

Journal ArticleDOI
TL;DR: This system is an intelligent workbench for construction of knowledge bases for classification tasks by domain experts themselves and aims at integrating similarity-based inductive learning and explanation-based deductive reasoning by guiding inductive inference with theoretical and/or heuristic knowledge about the domain.
Abstract: This paper describes the design philosophy of and current issues concerning a knowledge acquisition system named kaiser. This system is an intelligent workbench for construction of knowledge bases for classification tasks by domain experts themselves. It first learns classification knowledge inductively from the examples given by a human expert, then analyzes the result and process based on abstract domain knowledge which is also given by the expert. Based on this analysis, it asks sophisticated questions for acquiring new knowledge. The queries stimulate the human expert and help him to revise the learned results, control the learning process and prepare new examples and domain knowledge. Viewed from an AI aspect, it aims at integrating similarity-based inductive learning and explanation-based deductive reasoning by guiding inductive inference with theoretical and/or heuristic knowledge about the domain. This interactive induce-evaluate-ask cycle produces a rational interview which promotes incremental acquisition of domain knowledge as well as efficient induction of operational and reasonable knowledge proved by the domain knowledge.

Journal ArticleDOI
TL;DR: The problem of computing p-values for the asymptotic distribution of certain goodness-of-fit test statistics based on the empirical distribution is approached via quadrature, and it is shown that this approach can lead to considerable time savings over the standard practice of discretizing the underlying eigenvalue problem.
Abstract: In this paper the problem of computing p-values for the asymptotic distribution of certain goodness-of-fit test statistics based on the empirical distribution is approached via quadrature. Through examples it is shown that this approach can lead to considerable time savings over the standard practice of discretizing the underlying eigenvalue problem.

Journal ArticleDOI
TL;DR: Five methods for forming empirical frequency distributions are outlined, and theoretical comparison of their speed and storage is supplemented by simulation data to give a series of recommendations about the appropriateness of each for different situations.
Abstract: Five methods for forming empirical frequency distributions are outlined. A specific implementation of each is described, and theoretical comparison of their speed and storage is supplemented by simulation data to give a series of recommendations about the appropriateness of each for different situations. The index method is the fastest of those considered, but often uses excessive space. A method based on height-balanced trees is economical of space, and still has good speed. A method based on Quicksort is faster than the tree method, but uses more space.
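
A brief sketch contrasting two of the approaches discussed: an index (direct addressing) method, which is fast but allocates a counter for every possible value, and a dictionary-based method standing in for the tree approach, which stores counters only for values actually observed. This is illustrative only, not the paper's implementations.

```python
# Two ways of forming an empirical frequency distribution: direct addressing
# (fast, possibly wasteful of space) versus storing only observed values.
# Illustrative stand-ins, not the paper's implementations.
from collections import defaultdict

def index_method(data, max_value):
    """Direct addressing: one counter per possible value 0..max_value."""
    counts = [0] * (max_value + 1)      # may be mostly wasted space
    for value in data:
        counts[value] += 1
    return counts

def dictionary_method(data):
    """Store counters only for values that actually occur."""
    counts = defaultdict(int)
    for value in data:
        counts[value] += 1
    return dict(counts)

data = [3, 7, 3, 3, 9, 7]
print(index_method(data, max_value=10))   # [0, 0, 0, 3, 0, 0, 0, 2, 0, 1, 0]
print(dictionary_method(data))            # {3: 3, 7: 2, 9: 1}
```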