
Showing papers in "Statistics and Computing in 1992"


Journal ArticleDOI
TL;DR: In this paper, a new approximation for the coefficients required to calculate the Shapiro-Wilk W-test is derived, which is easy to calculate and applies for any sample size greater than 3.
Abstract: A new approximation for the coefficients required to calculate the Shapiro-Wilk W-test is derived. It is easy to calculate and applies for any sample size greater than 3. A normalizing transformation for the W statistic is given, enabling its P-value to be computed simply. The distribution of the new approximation to W agrees well with published critical points which use exact coefficients.

573 citations
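
A minimal sketch of how such a test is applied in practice. scipy.stats.shapiro is used here as a stand-in implementation; modern routines for the W-test rest on coefficient approximations and a normalizing transformation of the kind the abstract describes, but this is not the paper's own code.

```python
# Minimal sketch: computing the Shapiro-Wilk W statistic and its P-value.
# scipy.stats.shapiro is a stand-in implementation, not the paper's code.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # a sample of size n > 3

w_stat, p_value = stats.shapiro(x)
print(f"W = {w_stat:.4f}, P-value = {p_value:.4f}")
# A small P-value would suggest departure from normality.
```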


Journal ArticleDOI
TL;DR: This paper introduces Bayesian techniques for splitting, smoothing, and tree averaging; the splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning.
Abstract: Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, c4 (Quinlan et al., 1987) and cart (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.

418 citations
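
Since the paper's splitting rule is compared to Quinlan's information gain, a short sketch of that familiar criterion (not the Bayesian rule derived in the paper) may help fix ideas for a binary split:

```python
# Quinlan-style information gain for a candidate binary split. This is the
# comparison criterion named in the abstract, not the paper's Bayesian rule.
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """Entropy reduction from splitting `labels` by a boolean mask."""
    n = len(labels)
    left, right = labels[split_mask], labels[~split_mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Example: does x > 2.5 usefully separate the two classes?
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(information_gain(y, x > 2.5))   # 1.0 bit: a perfect split here
```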


Journal ArticleDOI
TL;DR: This paper analyses in detail a ‘flow-propagation’ algorithm for calculating marginal and conditional distributions in a probabilistic expert system, and shows how it can be modified to perform other tasks, including maximization of the joint density and simultaneous ‘fast retraction’ of evidence entered on several variables.
Abstract: A probabilistic expert system provides a graphical representation of a joint probability distribution which can be used to simplify and localize calculations. Jensenet al. (1990) introduced a ‘flow-propagation’ algorithm for calculating marginal and conditional distributions in such a system. This paper analyses that algorithm in detail, and shows how it can be modified to perform other tasks, including maximization of the joint density and simultaneous ‘fast retraction’ of evidence entered on several variables.

269 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative definition of a principal curve, based on a mixture model, is presented. A principal curve is a smooth curve passing through the ‘middle’ of a distribution or data cloud; estimation under the new definition is carried out through an EM algorithm.
Abstract: A principal curve (Hastie and Stuetzle, 1989) is a smooth curve passing through the ‘middle’ of a distribution or data cloud, and is a generalization of linear principal components. We give an alternative definition of a principal curve, based on a mixture model. Estimation is carried out through an EM algorithm. Some comparisons are made to the Hastie-Stuetzle definition.

247 citations


Journal ArticleDOI
TL;DR: This paper investigates the applicability of a Monte Carlo technique known as ‘simulated annealing’ to achieve optimum or sub-optimum decompositions of probabilistic networks under bounded resources and proves that cost-function changes can be computed locally.
Abstract: This paper investigates the applicability of a Monte Carlo technique known as ‘simulated annealing’ to achieve optimum or sub-optimum decompositions of probabilistic networks under bounded resources. High-quality decompositions are essential for performing efficient inference in probabilistic networks. Optimum decomposition of probabilistic networks is known to be NP-hard (Wen, 1990). The paper proves that cost-function changes can be computed locally, which is essential to the efficiency of the annealing algorithm. Pragmatic control schedules which reduce the running time of the annealing algorithm are presented and evaluated. Apart from the conventional temperature parameter, these schedules involve the radius of the search space as a new control parameter. The evaluation suggests that the inclusion of this new parameter is important for the success of the annealing algorithm for the present problem.

102 citations
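
A generic simulated annealing skeleton with the two controls the abstract highlights, a temperature schedule and a search-space radius, may make the control schedules concrete. The cost function and neighbourhood below are placeholders, not the paper's network-decomposition objective or its locally computed cost changes.

```python
# Generic simulated annealing skeleton with a temperature schedule and a
# search "radius" limiting how far a proposed move may stray. The cost and
# neighbour functions are placeholders, not the paper's decomposition problem.
import math
import random

def anneal(cost, neighbour, state, t0=1.0, cooling=0.95, radius0=5,
           steps_per_level=100, levels=50, seed=0):
    rng = random.Random(seed)
    best, best_cost = state, cost(state)
    current, current_cost = state, best_cost
    temperature, radius = t0, radius0
    for _ in range(levels):
        for _ in range(steps_per_level):
            candidate = neighbour(current, radius, rng)
            delta = cost(candidate) - current_cost        # local cost change
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= cooling                            # cool down
        radius = max(1, int(radius * cooling))            # shrink search radius
    return best, best_cost

# Toy usage: minimise a quadratic over the integers.
result = anneal(cost=lambda s: (s - 17) ** 2,
                neighbour=lambda s, r, rng: s + rng.randint(-r, r),
                state=0)
print(result)
```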


Journal ArticleDOI
TL;DR: A model-theoretic definition of causation is proposed and it is shown that, contrary to common folklore, genuine causal influences can be distinguished from spurious covariations following standard norms of inductive reasoning.
Abstract: We propose a model-theoretic definition of causation, and show that, contrary to common folklore, genuine causal influences can be distinguished from spurious covariations following standard norms of inductive reasoning. We also establish a sound characterization of the conditions under which such a distinction is possible. Finally, we provide a proof-theoretical procedure for inductive causation and show that, for a large class of data and structures, effective algorithms exist that uncover the direction of causal influences as defined above.

47 citations


Journal ArticleDOI
TL;DR: A propagation algorithm is presented and justified to facilitate the simultaneous calculation, for every node in a probabilistic expert system, of the distribution of the associated random quantity, conditional on all the evidence obtained about the remaining nodes.
Abstract: We present and justify a propagation algorithm to facilitate the simultaneous calculation, for every node in a probabilistic expert system, of the distribution of the associated random quantity, conditional on all the evidence obtained about the remaining nodes.

30 citations


Journal ArticleDOI
TL;DR: A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed, which occupies an intermediate position between multinomial discrimination, the first-order independence model and kernel discrimination.
Abstract: A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived in the small-sample, high-dimensional setting. This method occupies an intermediate position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first; then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is compared with that of other classical methods through application to data.

22 citations
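
The convex combination governed by the complexity parameter can be sketched directly from the abstract's description. The function and variable names below are hypothetical, and the Aitchison-Aitken smoothing and the cross-validated choice of both parameters are omitted.

```python
# Sketch of the convex combination described in the abstract: class-conditional
# probabilities blended between the full multinomial model and the first-order
# independence model via a complexity parameter `alpha`. Names are hypothetical;
# kernel smoothing and cross-validation of the parameters are not shown.
import numpy as np

def blended_class_conditional(p_full, p_independence, alpha):
    """Convex combination of full-multinomial and independence-model probabilities."""
    assert 0.0 <= alpha <= 1.0
    return alpha * p_full + (1.0 - alpha) * p_independence

# Toy example with three discrete cells for one class:
p_full = np.array([0.70, 0.20, 0.10])          # cell frequencies, full model
p_indep = np.array([0.50, 0.30, 0.20])         # product of first-order marginals
print(blended_class_conditional(p_full, p_indep, alpha=0.8))
```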


Journal ArticleDOI
TL;DR: Simulations indicate that the temperature level of the annealing schedule significantly affects the convergence behavior of the training process and that, to achieve a balanced performance of these BMPRs, a medium to high level of annealing temperatures is recommended.
Abstract: Boltzmann machines (BM), a type of neural networking algorithm, have been proven to be useful in pattern recognition. Patterns on quality control charts have long been recognized as providing useful information for correcting process performance problems. In computer-integrated manufacturing environments, where the control charts are monitored by computer algorithms, the potential for using pattern-recognition algorithms is considerable. The main purpose of this paper is to formulate a Boltzmann machine pattern recognizer (BMPR) and demonstrate its utility in control chart pattern recognition. It is not the intent of this paper to make comparisons between existing related algorithms. A factorial design of experiments was conducted to study the effects of numerous factors on the convergence behavior and performance of these BMPRs. These factors include the number of hidden nodes used in the network and the annealing schedule. Simulations indicate that the temperature level of the annealing schedule significantly affects the convergence behavior of the training process and that, to achieve a balanced performance of these BMPRs, a medium to high level of annealing temperatures is recommended. Numerical results for cyclical and stratification patterns illustrate that the classification capability of these BMPRs is quite powerful.

21 citations
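
A minimal sketch of the temperature-dependent stochastic update at the heart of a Boltzmann machine may clarify why the annealing temperature matters: the probability of switching a unit on follows a sigmoid of its energy gap scaled by the temperature. This shows only the core update rule, not the pattern-recognizer architecture, training procedure or factorial experiment of the paper.

```python
# Core Boltzmann machine unit update: higher temperature T means noisier, more
# exploratory updates. This illustrates the role of the annealing schedule only;
# it is not the BMPR of the paper.
import numpy as np

def update_unit(states, weights, biases, i, temperature, rng):
    """Stochastically set unit i given the states of the other units."""
    energy_gap = weights[i] @ states + biases[i]      # gain from turning unit i on
    p_on = 1.0 / (1.0 + np.exp(-energy_gap / temperature))
    states[i] = 1 if rng.random() < p_on else 0
    return states

rng = np.random.default_rng(0)
n = 5
weights = rng.normal(size=(n, n)); weights = (weights + weights.T) / 2
np.fill_diagonal(weights, 0.0)                        # no self-connections
biases = rng.normal(size=n)
states = rng.integers(0, 2, size=n)

for temperature in (5.0, 1.0, 0.2):                   # a crude annealing schedule
    for i in range(n):
        update_unit(states, weights, biases, i, temperature, rng)
print(states)
```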


Journal ArticleDOI
TL;DR: A probabilistic model of text understanding is developed, using probability theory to handle the uncertainty which arises in this abductive inference process, and all aspects of natural language processing are treated in the same framework, allowing syntactic, semantic and pragmatic constraints to be integrated.
Abstract: We discuss a new framework for text understanding. Three major design decisions characterize this approach. First, we take the problem of text understanding to be a particular case of the general problem of abductive inference. Second, we use probability theory to handle the uncertainty which arises in this abductive inference process. Finally, all aspects of natural language processing are treated in the same framework, allowing us to integrate syntactic, semantic and pragmatic constraints. In order to apply probability theory to this problem, we have developed a probabilistic model of text understanding. To make it practical to use this model, we have devised a way of incrementally constructing and evaluating belief networks. We have written a program, wimp3, to experiment with this framework. To evaluate this program, we have developed a simple ‘single-blind’ testing method.

19 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose an alternative approach to constructing double bootstrap confidence intervals that involves replacing the inner level of resampling by an analytical approximation, based on saddlepoint methods and a tail probability approximation of DiCiccio and Martin.
Abstract: Standard algorithms for the construction of iterated bootstrap confidence intervals are computationally very demanding, requiring nested levels of bootstrap resampling. We propose an alternative approach to constructing double bootstrap confidence intervals that involves replacing the inner level of resampling by an analytical approximation. This approximation is based on saddlepoint methods and a tail probability approximation of DiCiccio and Martin (1991). Our technique significantly reduces the computational expense of iterated bootstrap calculations. A formal algorithm for the construction of our approximate iterated bootstrap confidence intervals is presented, and some crucial practical issues arising in its implementation are discussed. Our procedure is illustrated in the case of constructing confidence intervals for ratios of means using both real and simulated data. We repeat an experiment of Schenker (1985) involving the construction of bootstrap confidence intervals for a variance and demonstrate that our technique makes feasible the construction of accurate bootstrap confidence intervals in that context. Finally, we investigate the use of our technique in a more complex setting, that of constructing confidence intervals for a correlation coefficient.
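
The computational burden the paper addresses comes from the nested resampling of the standard double bootstrap. The sketch below shows that baseline, calibrating the nominal level of a percentile interval for a mean so that its estimated coverage matches the target; it illustrates the expense of the inner level rather than the saddlepoint approximation that replaces it.

```python
# Baseline nested (double) bootstrap: calibrate the nominal level of a
# percentile interval for the mean so its estimated coverage hits the target.
# Illustrative only; not the paper's saddlepoint-based algorithm.
import numpy as np

def double_bootstrap_interval(sample, target=0.95, n_outer=200, n_inner=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(sample)
    theta_hat = sample.mean()
    levels = np.linspace(0.80, 0.999, 40)              # candidate nominal levels
    coverage = np.zeros_like(levels)
    for _ in range(n_outer):                           # outer resampling level
        resample = rng.choice(sample, size=n, replace=True)
        inner = np.array([rng.choice(resample, size=n, replace=True).mean()
                          for _ in range(n_inner)])    # inner resampling level
        for j, level in enumerate(levels):
            a = (1.0 - level) / 2.0
            lo, hi = np.quantile(inner, [a, 1.0 - a])
            coverage[j] += (lo <= theta_hat <= hi)
    coverage /= n_outer
    calibrated = levels[np.argmin(np.abs(coverage - target))]
    a = (1.0 - calibrated) / 2.0
    outer = np.array([rng.choice(sample, size=n, replace=True).mean()
                      for _ in range(2000)])
    return calibrated, np.quantile(outer, [a, 1.0 - a])

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=40)
print(double_bootstrap_interval(data))
```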

Journal ArticleDOI
TL;DR: In this article, the properties of a parameterized form of generalized simulated annealing for function minimization are investigated by studying properties of repeated minimizations from random starting points, which leads to the comparison of distributions of function values and of numbers of function evaluations.
Abstract: The properties of a parameterized form of generalized simulated annealing for function minimization are investigated by studying the properties of repeated minimizations from random starting points. This leads to the comparison of distributions of function values and of numbers of function evaluations. Parameter values which yield searches repeatedly terminating close to the global minimum may require unacceptably many function evaluations. If computational resources are a constraint, the total number of function evaluations may be limited. A sensible strategy is then to restart at a random point any search which terminates, until the total allowable number of function evaluations has been exhausted. The response is now the minimum of the function values obtained. This strategy yields a surprisingly stable solution for the parameter values of the simulated annealing algorithm. The algorithm can be further improved by segmentation in which each search is limited to a maximum number of evaluations, perhaps no more than a fifth of the total available. The main tool for interpreting the distributions of function values is the boxplot. The application is to the optimum design of experiments.
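
The restart-and-segmentation strategy can be sketched directly: any search that terminates is restarted from a random point until the total budget of function evaluations is exhausted, with each single search capped at a fraction of that budget. The inner optimiser below is a plain random-walk annealer, not the parameterized generalized simulated annealing studied in the paper.

```python
# Restart-and-segmentation strategy: restart terminated searches from random
# points until the total evaluation budget is used, capping each search at a
# fraction of the budget. The inner annealer is a simple stand-in.
import math
import random

def single_search(f, start, max_evals, rng, t0=1.0, cooling=0.99, step=0.5):
    x, fx = start, f(start)
    best_x, best_f, t, evals = x, fx, t0, 1
    while evals < max_evals:
        cand = x + rng.gauss(0.0, step)
        fc = f(cand); evals += 1
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f, evals

def restarted_search(f, total_evals=10_000, segment_fraction=0.2, seed=0):
    rng = random.Random(seed)
    budget, best = total_evals, (None, float("inf"))
    while budget > 0:
        cap = min(budget, int(total_evals * segment_fraction))  # segmentation
        x0 = rng.uniform(-10.0, 10.0)                           # random restart
        bx, bf, used = single_search(f, x0, cap, rng)
        if bf < best[1]:
            best = (bx, bf)
        budget -= used
    return best

print(restarted_search(lambda x: (x - 3.0) ** 2 + math.sin(5 * x)))
```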

Journal ArticleDOI
TL;DR: In this article, the authors compare the performances of simulated annealing and EM algorithms in problems of decomposition of normal mixtures according to the likelihood approach, considering a suitable reformulation of the problem which yields an optimization problem having a global solution and at least a smaller number of spurious maxima.
Abstract: We compare the performances of the simulated annealing and the EM algorithms in problems of decomposition of normal mixtures according to the likelihood approach. In this case the likelihood function has multiple maxima and singularities, and we consider a suitable reformulation of the problem which yields an optimization problem having a global solution and at least a smaller number of spurious maxima. The results are compared considering some distance measures between the estimated distributions and the true ones. No overwhelming superiority of either method has been demonstrated, though in one of our cases simulated annealing achieved better results.
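
For reference, a minimal EM iteration for a two-component univariate normal mixture, the kind of likelihood-based decomposition the comparison concerns. The reformulation the paper uses to remove singularities and reduce spurious maxima is not reproduced; this is the plain EM baseline.

```python
# Plain EM for a two-component univariate normal mixture. The constrained
# reformulation the paper studies is not included.
import numpy as np
from scipy.stats import norm

def em_two_normals(x, n_iter=200):
    pi, mu1, mu2 = 0.5, x.min(), x.max()     # crude initialisation
    s1 = s2 = x.std()
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each point
        d1 = pi * norm.pdf(x, mu1, s1)
        d2 = (1 - pi) * norm.pdf(x, mu2, s2)
        r = d1 / (d1 + d2)
        # M-step: weighted updates of the mixture parameters
        pi = r.mean()
        mu1 = np.average(x, weights=r)
        mu2 = np.average(x, weights=1 - r)
        s1 = np.sqrt(np.average((x - mu1) ** 2, weights=r))
        s2 = np.sqrt(np.average((x - mu2) ** 2, weights=1 - r))
    return pi, (mu1, s1), (mu2, s2)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 0.5, 200)])
print(em_two_normals(x))
```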

Journal ArticleDOI
TL;DR: Two probabilistic model induction techniques, cart and constructor, are compared, via a series of experiments, in terms of their ability to induce models that are both interpretable and predictive.
Abstract: Two probabilistic model induction techniques, cart and constructor, are compared, via a series of experiments, in terms of their ability to induce models that are both interpretable and predictive. The experiments show that, although both algorithms are able to deliver classifiers with predictive performance close to that of the optimal Bayes rule, constructor is able to generate a probabilistic model that is more easily interpretable than the cart model. On the other hand, cart is a more mature algorithm and is capable of handling many more situations (e.g., real-valued training sets) than constructor. A variety of characteristics of both algorithms are compared, and suggestions for future research are made.

Journal ArticleDOI
TL;DR: In this article, the authors investigate how well the maximum likelihood estimation procedure and the parametric bootstrap behave in the case of the very well-known software reliability model suggested by Jelinski and Moranda (1972).
Abstract: In software reliability theory many different models have been proposed and investigated. Some of these models intuitively match reality better than others. The properties of certain statistical estimation procedures in connection with these models are also model-dependent. In this paper we investigate how well the maximum likelihood estimation procedure and the parametric bootstrap behave in the case of the very well-known software reliability model suggested by Jelinski and Moranda (1972). For this study we will make use of simulated data.
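
A sketch of the two procedures under study, assuming the standard formulation of the Jelinski-Moranda model in which the i-th inter-failure time is exponential with rate phi*(N - i + 1), N being the initial number of faults and phi a per-fault failure rate. This is an illustration of the general approach, not the paper's simulation study.

```python
# Maximum likelihood fitting and a parametric bootstrap for the Jelinski-
# Moranda model, assuming its standard formulation: the i-th inter-failure
# time is exponential with rate phi * (N - i + 1). Illustrative sketch only.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, t):
    n_faults, phi = params
    k = len(t)
    if phi <= 0 or n_faults < k:
        return np.inf
    rates = phi * (n_faults - np.arange(k))          # N, N-1, ..., N-k+1
    return -np.sum(np.log(rates) - rates * t)

def fit_jm(t):
    k = len(t)
    start = np.array([k + 5.0, 1.0 / (np.mean(t) * k)])
    return minimize(neg_log_lik, x0=start, args=(t,), method="Nelder-Mead").x

def parametric_bootstrap(t, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n_hat, phi_hat = fit_jm(t)
    k = len(t)
    estimates = []
    for _ in range(n_boot):
        rates = phi_hat * (n_hat - np.arange(k))
        sim = rng.exponential(1.0 / rates)           # simulate from fitted model
        estimates.append(fit_jm(sim))
    return np.array(estimates)

rng = np.random.default_rng(1)
true_rates = 0.02 * (30 - np.arange(20))             # N = 30, phi = 0.02
times = rng.exponential(1.0 / true_rates)
boot = parametric_bootstrap(times)
print(boot.mean(axis=0), boot.std(axis=0))           # bootstrap mean and spread
```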

Journal ArticleDOI
TL;DR: DEXPERT is an expert system, built using KEE, for the design and analysis of experiments, which provides a layout sheet for the collection of the data and then analyzes and interprets the results using analytical and graphical methods.
Abstract: DEXPERT is an expert system, built using KEE, for the design and analysis of experiments. From a mathematical model, expected mean squares are computed, tests are determined, and the power of the tests computed. Comparisons between designs are aided by suggestions and verbal interpretations provided by DEXPERT. DEXPERT provides a layout sheet for the collection of the data and then analyzes and interprets the results using analytical and graphical methods.

Journal ArticleDOI
Jan Raes
TL;DR: It is concluded that although the technology and concepts that drive these systems could still benefit from further improvement, the real challenge lies in defining and constructing the statistical knowledge and strategy that should be incorporated and in presenting the results to the user's full advantage.
Abstract: A decade of research into the applications of artificial intelligence in statistics has finally resulted in the appearance of commercially available statistical expert systems. This paper takes a closer look at two of these systems, which are now commercially available on microcomputers, and shows what knowledge they actually contain and how they operate. It is concluded that although the technology and concepts that drive these systems could still benefit from further improvement, the real challenge lies in defining and constructing the statistical knowledge and strategy that should be incorporated and in presenting the results to the user's full advantage.

Journal ArticleDOI
TL;DR: This paper solves two problems, the first by providing a simplified solution to the Liapunov matrix equation which can be written in a few lines of code in computer languages such as SAS PROC MATRIX/IML™ or GAUSS™; the second, by bootstrapping the parameter covariance matrix.
Abstract: The solution to a Liapunov matrix equation (LME) has been proposed to estimate the parameters of the demand equations derived from the Translog, the Almost Ideal Demand System and the Rotterdam demand models. When compared to traditional seemingly unrelated regression (SUR) methods, the LME approach saves both computer time and space, and it provides parameter estimates that are less likely to suffer from round-off error. However, the LME method is difficult to implement without the use of specially written computer programs and, unlike traditional SUR methods, it does not automatically provide an estimate of the covariance of the parameters. This paper solves these two problems, the first by providing a simplified solution to the Liapunov matrix equation which can be written in a few lines of code in computer languages such as SAS PROC MATRIX/IML™ or GAUSS™; the second, by bootstrapping the parameter covariance matrix.
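
The paper's first point, that the Liapunov matrix equation can be solved in a few lines of a matrix language, carries over to other environments. As a sketch, scipy.linalg.solve_continuous_lyapunov solves AX + XAᵀ = Q directly; the SUR estimation context and the bootstrap of the parameter covariance matrix are not reproduced here.

```python
# Solving a Liapunov (Lyapunov) matrix equation A X + X A^T = Q in a few
# lines, in the spirit of the paper's point about matrix languages.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-2.0, 1.0],
              [ 0.0, -3.0]])
Q = np.array([[-1.0, 0.0],
              [ 0.0, -1.0]])

X = solve_continuous_lyapunov(A, Q)
print(X)
print(np.allclose(A @ X + X @ A.T, Q))   # verify the solution
```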

Journal ArticleDOI
TL;DR: Methods are developed to test the performance of software for calculating the standard normal distribution function, and results are presented on a selection of available codes, showing a wide variation in performance.
Abstract: Methods are developed to test the performance of software for calculating the standard normal distribution function. The accuracy and implementation details of the tests are described. Results are presented on a selection of available codes, showing a wide variation in performance. At least one published code is shown to have severe defects.
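
The kind of defect such tests expose can be illustrated briefly: computing the upper tail as 1 − Φ(x) loses all relative accuracy far in the tail through cancellation, whereas a complementary-error-function formulation does not. scipy's survival function serves as the reference value in this sketch; it is not one of the codes examined in the paper.

```python
# Illustration of a typical defect: the naive upper tail 1 - Phi(x) suffers
# catastrophic cancellation for large x, while an erfc-based form does not.
# scipy's survival function is used as the reference here.
import math
from scipy.stats import norm

def upper_tail_naive(x):
    """1 - Phi(x): loses relative accuracy for large x."""
    return 1.0 - 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def upper_tail_erfc(x):
    """Phi's complement via erfc: accurate in the far tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (1.0, 5.0, 8.0, 10.0):
    reference = norm.sf(x)
    print(f"x={x:5.1f}  naive={upper_tail_naive(x):.3e}  "
          f"erfc={upper_tail_erfc(x):.3e}  reference={reference:.3e}")
# In the far tail the naive version loses all relative accuracy and eventually
# returns 0, while the erfc version still agrees with the reference.
```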

Journal ArticleDOI
TL;DR: The notion of admissible models is defined as a function of problem complexity, the number of data points N, and prior belief, and is used to derive general bounds relating classifier complexity with data-dependent parameters such as sample size, class entropy and the optimal Bayes error rate.
Abstract: In this paper we investigate the application of stochastic complexity theory to classification problems. In particular, we define the notion of admissible models as a function of problem complexity, the number of data points N, and prior belief. This allows us to derive general bounds relating classifier complexity with data-dependent parameters such as sample size, class entropy and the optimal Bayes error rate. We discuss the application of these results to a variety of problems, including decision tree classifiers, Markov models for image segmentation, and feedforward multilayer neural network classifiers.

Journal ArticleDOI
TL;DR: The density of the quotient of two non-negative quadratic forms in normal variables, useful for studying the robustness of the F-test with respect to errors of the first and second kind, is considered, and an explicit expression for this density is given in the form of a proper Riemann integral on a finite interval, suitable for numerical calculation.
Abstract: The density of the quotient of two non-negative quadratic forms in normal variables is considered. The covariance matrix of these variables is arbitrary. The result is useful in the study of the robustness of the F-test with respect to errors of the first and second kind. An explicit expression for this density is given in the form of a proper Riemann integral on a finite interval, suitable for numerical calculation.

Journal ArticleDOI
TL;DR: This system is an intelligent workbench for construction of knowledge bases for classification tasks by domain experts themselves and aims at integrating similarity-based inductive learning and explanation-based deductive reasoning by guiding inductive inference with theoretical and/or heuristic knowledge about the domain.
Abstract: This paper describes the design philosophy of and current issues concerning a knowledge acquisition system named kaiser. This system is an intelligent workbench for construction of knowledge bases for classification tasks by domain experts themselves. It first learns classification knowledge inductively from the examples given by a human expert, then analyzes the result and process based on abstract domain knowledge which is also given by the expert. Based on this analysis, it asks sophisticated questions for acquiring new knowledge. The queries stimulate the human expert and help him to revise the learned results, control the learning process and prepare new examples and domain knowledge. Viewed from an AI aspect, it aims at integrating similarity-based inductive learning and explanation-based deductive reasoning by guiding inductive inference with theoretical and/or heuristic knowledge about the domain. This interactive induce-evaluate-ask cycle produces a rational interview which promotes incremental acquisition of domain knowledge as well as efficient induction of operational and reasonable knowledge proved by the domain knowledge.

Journal ArticleDOI
TL;DR: The problem of computing p-values for the asymptotic distribution of certain goodness-of-fit test statistics based on the empirical distribution is approached via quadrature, and it is shown that this approach can lead to considerable time savings over the standard practice of discretizing the underlying eigenvalue problem.
Abstract: In this paper the problem of computing p-values for the asymptotic distribution of certain goodness-of-fit test statistics based on the empirical distribution is approached via quadrature. Through examples it is shown that this approach can lead to considerable time savings over the standard practice of discretizing the underlying eigenvalue problem.

Journal ArticleDOI
TL;DR: Five methods for forming empirical frequency distributions are outlined, and theoretical comparison of their speed and storage is supplemented by simulation data to give a series of recommendations about the appropriateness of each for different situations.
Abstract: Five methods for forming empirical frequency distributions are outlined. A specific implementation of each is described, and theoretical comparison of their speed and storage is supplemented by simulation data to give a series of recommendations about the appropriateness of each for different situations. The index method is the fastest of those considered, but often uses excessive space. A method based on height-balanced trees is economical of space, and still has good speed. A method based on Quicksort is faster than the tree method, but uses more space.
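
A brief sketch contrasting two of the approaches discussed: an index (direct addressing) method, which is fast but allocates a counter for every possible value, and a dictionary-based method standing in for the tree approach, which stores counters only for values actually observed. This is illustrative only, not the paper's implementations.

```python
# Two ways of forming an empirical frequency distribution: direct addressing
# (fast, possibly wasteful of space) versus storing only observed values.
# Illustrative stand-ins, not the paper's implementations.
from collections import defaultdict

def index_method(data, max_value):
    """Direct addressing: one counter per possible value 0..max_value."""
    counts = [0] * (max_value + 1)      # may be mostly wasted space
    for value in data:
        counts[value] += 1
    return counts

def dictionary_method(data):
    """Store counters only for values that actually occur."""
    counts = defaultdict(int)
    for value in data:
        counts[value] += 1
    return dict(counts)

data = [3, 7, 3, 3, 9, 7]
print(index_method(data, max_value=10))   # [0, 0, 0, 3, 0, 0, 0, 2, 0, 1, 0]
print(dictionary_method(data))            # {3: 3, 7: 2, 9: 1}
```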