
Showing papers in "Journal of Computational and Graphical Statistics in 1996"


Journal ArticleDOI
TL;DR: The authors discuss their experience designing and implementing a statistical computing language that combines what they felt were useful features from two existing computer languages; they argue the new language offers advantages in portability, computational efficiency, memory management, and scoping.
Abstract: In this article we discuss our experience designing and implementing a statistical computing language. In developing this new language, we sought to combine what we felt were useful features from two existing computer languages. We feel that the new language provides advantages in the areas of portability, computational efficiency, memory management, and scoping.

9,446 citations


Journal ArticleDOI
TL;DR: A new algorithm based on a Monte Carlo method is presented that applies to a broad class of nonlinear, non-Gaussian, higher-dimensional state space models, provided that the dimensions of the system noise and the observation noise are relatively low.
Abstract: A new algorithm for the prediction, filtering, and smoothing of non-Gaussian nonlinear state space models is presented. The algorithm is based on a Monte Carlo method in which the successive prediction, filtering, and smoothing conditional probability density functions are approximated by many of their realizations. The particular contribution of this algorithm is that it can be applied to a broad class of nonlinear, non-Gaussian, higher-dimensional state space models, provided that the dimensions of the system noise and the observation noise are relatively low. Several numerical examples are given.
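
As a rough illustration of the Monte Carlo filtering idea, the sketch below implements a bootstrap-style particle filter for a toy univariate nonlinear model; the transition and observation equations, noise choices, and particle count are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def state_transition(x):
    # toy nonlinear transition with heavy-tailed (non-Gaussian) system noise
    return 0.5 * x + 25.0 * x / (1.0 + x**2) + rng.standard_t(3, size=x.shape)

def obs_loglik(y, x):
    # observation y_t = x_t^2 / 20 + standard normal noise
    return -0.5 * (y - x**2 / 20.0) ** 2

def particle_filter(ys, n_particles=5000):
    particles = rng.normal(0.0, 5.0, n_particles)   # draws from an initial prior
    means = []
    for y in ys:
        particles = state_transition(particles)    # prediction: push realizations forward
        logw = obs_loglik(y, particles)             # filtering: weight by the observation
        w = np.exp(logw - logw.max())
        w /= w.sum()
        particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
        means.append(particles.mean())
    return np.array(means)

# simulate a short series from the same toy model, then filter it
xs = [0.0]
for _ in range(50):
    xs.append(float(state_transition(np.array([xs[-1]]))[0]))
ys = np.array(xs[1:]) ** 2 / 20.0 + rng.normal(0.0, 1.0, 50)
print(particle_filter(ys)[:5])                      # filtered state means
```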

2,406 citations


Journal ArticleDOI
TL;DR: In this paper, a general definition of residuals for regression models with independent responses is given, which produces residuals that are exactly normal, apart from sampling variability in the estimated parameters, by inverting the fitted distribution function for each response value and finding the equivalent standard normal quantile.
Abstract: In this article we give a general definition of residuals for regression models with independent responses. Our definition produces residuals that are exactly normal, apart from sampling variability in the estimated parameters, by inverting the fitted distribution function for each response value and finding the equivalent standard normal quantile. Our definition includes some randomization to achieve continuous residuals when the response variable is discrete. Quantile residuals are easily computed in computer packages such as SAS, S-Plus, GLIM, or LispStat, and allow residual analyses to be carried out in many commonly occurring situations in which the customary definitions of residuals fail. Quantile residuals are applied in this article to three example data sets.
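
The construction is easy to reproduce. Below is a minimal sketch for a Poisson response, where the discrete distribution function is randomized between F(y-1) and F(y) before inverting the normal quantile; the Poisson case and all tuning choices are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def quantile_residuals_poisson(y, mu, rng=np.random.default_rng(0)):
    # For a discrete response, draw u uniformly between F(y-1) and F(y) so the
    # residuals are continuous and, under a correct model, close to N(0, 1)
    # apart from parameter-estimation variability.
    lower = stats.poisson.cdf(y - 1, mu)   # F(y-1); equals 0 when y == 0
    upper = stats.poisson.cdf(y, mu)       # F(y)
    u = rng.uniform(lower, upper)
    return stats.norm.ppf(u)

# Data drawn from the fitted model should give roughly standard normal residuals.
rng = np.random.default_rng(1)
mu = rng.uniform(1.0, 10.0, 500)           # fitted means from some model
y = rng.poisson(mu)
r = quantile_residuals_poisson(y, mu)
print(r.mean(), r.std())                   # approximately 0 and 1
```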

838 citations


Journal ArticleDOI
TL;DR: A rudimentary taxonomy of interactive data visualization is proposed, based on a triad of data analytic tasks (finding Gestalt, posing queries, and making comparisons) and illustrated with three XGobi tools: high-dimensional projections, linked scatterplot brushing, and matrices of conditional plots.
Abstract: We propose a rudimentary taxonomy of interactive data visualization based on a triad of data analytic tasks: finding Gestalt, posing queries, and making comparisons. These tasks are supported by three classes of interactive view manipulations: focusing, linking, and arranging views. This discussion extends earlier work on the principles of focusing and linking and sets them on a firmer base. Next, we give a high-level introduction to a particular system for multivariate data visualization—XGobi. This introduction is not comprehensive but emphasizes XGobi tools that are examples of focusing, linking, and arranging views; namely, high-dimensional projections, linked scatterplot brushing, and matrices of conditional plots. Finally, in a series of case studies in data visualization, we show the powers and limitations of particular focusing, linking, and arranging tools. The discussion is dominated by high-dimensional projections that form an extremely well-developed part of XGobi. Of particular inter...

389 citations


Journal ArticleDOI
Abstract: We consider the kernel estimator of conditional density and derive its asymptotic bias, variance, and mean squared error. Optimal bandwidths (with respect to integrated mean squared error) are found, and it is shown that the convergence rate of the density estimator is of order n^(-2/3). We also note that the conditional mean function obtained from the estimator is equivalent to a kernel smoother. Given the undesirable bias properties of kernel smoothers, we seek a modified conditional density estimator whose mean is equivalent to some other nonparametric regression smoother with better bias properties. It is also shown that our modified estimator has smaller mean squared error than the standard estimator in some commonly occurring situations. Finally, three graphical methods for visualizing conditional density estimators are discussed and applied to a data set consisting of maximum daily temperatures in Melbourne, Australia.
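
For concreteness, here is a minimal sketch of the standard kernel conditional density estimator that the article starts from, with product Gaussian kernels; the fixed bandwidths h and b are placeholder choices, not the optimal values derived in the article.

```python
import numpy as np

def gauss_kernel(u, bw):
    return np.exp(-0.5 * (u / bw) ** 2) / (bw * np.sqrt(2.0 * np.pi))

def cond_density(x0, y_grid, x, y, h=0.3, b=0.3):
    # f_hat(y | x0) = sum_i K_h(x0 - x_i) K_b(y - y_i) / sum_i K_h(x0 - x_i)
    wx = gauss_kernel(x0 - x, h)                        # weights in the x direction
    ky = gauss_kernel(y_grid[:, None] - y[None, :], b)  # kernels in the y direction
    return ky @ wx / wx.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)
grid = np.linspace(-3.0, 3.0, 61)
dens = cond_density(1.0, grid, x, y)                    # estimate of f(y | x = 1)
print(dens.sum() * (grid[1] - grid[0]))                 # integrates to roughly 1
```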

384 citations


Journal ArticleDOI
TL;DR: Trellis display provides a powerful mechanism for understanding interactions in studies of how a response depends on explanatory variables; in three examples, important discoveries are made that were not appreciated in the original analyses.
Abstract: Trellis display is a framework for the visualization of data. Its most prominent aspect is an overall visual design, reminiscent of a garden trelliswork, in which panels are laid out into rows, columns, and pages. On each panel of the trellis, a subset of the data is graphed by a display method such as a scatterplot, curve plot, boxplot, 3-D wireframe, normal quantile plot, or dot plot. Each panel shows the relationship of certain variables conditional on the values of other variables. A number of display methods employed in the visual design of Trellis display enable it to succeed in uncovering the structure of data even when the structure is quite complicated. For example, Trellis display provides a powerful mechanism for understanding interactions in studies of how a response depends on explanatory variables. Three examples demonstrate this; in each case, we make important discoveries not appreciated in the original analyses. Several control methods are also essential to Trellis display. A con...
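
A rough imitation of the paneling idea (not the S-PLUS Trellis library itself): panels of y versus x conditional on intervals of a third variable z, where the synthetic data and the three-interval split are assumptions for illustration. Because the slope of y on x changes with z, the interaction is visible across panels.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
z = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) * z + rng.normal(0, 0.1, 300)  # x-z interaction

edges = np.quantile(z, [0.0, 1 / 3, 2 / 3, 1.0])         # condition on 3 z-intervals
fig, axes = plt.subplots(1, 3, figsize=(9, 3), sharex=True, sharey=True)
for ax, lo, hi in zip(axes, edges[:-1], edges[1:]):
    m = (z >= lo) & (z <= hi)
    ax.plot(x[m], y[m], "o", ms=3)
    ax.set_title(f"z in [{lo:.2f}, {hi:.2f}]")           # each panel: y vs x given z
plt.tight_layout()
plt.show()
```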

294 citations


Journal ArticleDOI
TL;DR: It is shown how the DWT breaks down an fdGn, and the exact correlation structure of the resulting coefficients is given for different wavelets (Daubechies' minimum-phase and least-asymmetric, and Haar).
Abstract: The discrete wavelet transform (DWT) can be interpreted as a filtering of a time series by a set of octave band filters such that the width of each band as a proportion of its center frequency is constant. A long-memory process having a power spectrum that plots as a straight line on log-frequency/log-power scales over many octaves of frequency is intrinsically related to such a structure. As an example of such processes, we focus on one class of discrete-time, stationary, long-memory processes, the fractionally differenced Gaussian white noise processes (fdGn). We show how the DWT breaks down an fdGn, and show the exact correlation structure of the resulting coefficients for different wavelets (Daubechies' minimum-phase and least-asymmetric, and Haar). The DWT is an impressive “whitening filter.” A discrete wavelet-based scheme for simulating fdGns is discussed and is shown to be equivalent to a spectral decomposition of the covariance matrix of the process; however, it can be carried out using o...
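
A small sketch of the whitening claim, assuming PyWavelets is available: an approximate fdGn path is generated from a truncated MA representation of (1-B)^(-d), and the lag-1 autocorrelation of the DWT detail coefficients is compared with that of the original series. The truncation length, sample size, and db4 wavelet are illustrative choices, not the paper's exact setup.

```python
import numpy as np
from scipy.special import gammaln
import pywt

rng = np.random.default_rng(0)
d, n, trunc = 0.4, 4096, 2000

k = np.arange(trunc)
# MA coefficients psi_k = Gamma(k + d) / (Gamma(k + 1) Gamma(d)) of (1 - B)^(-d)
psi = np.exp(gammaln(k + d) - gammaln(k + 1) - gammaln(d))
eps = rng.standard_normal(n + trunc)
x = np.convolve(eps, psi, mode="valid")[:n]        # approximate fdGn path

def lag1_corr(v):
    return np.corrcoef(v[:-1], v[1:])[0, 1]

print("series lag-1 autocorrelation:", round(lag1_corr(x), 3))  # strongly correlated
for level, coeffs in zip(range(5, 0, -1), pywt.wavedec(x, "db4", level=5)[1:]):
    # detail coefficients within each level should be nearly uncorrelated
    print(f"level-{level} details: lag-1 autocorrelation {lag1_corr(coeffs):.3f}")
```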

158 citations


Journal ArticleDOI
TL;DR: The MANET software has been developed for keeping track of missing values in interactive graphics analyses and for investigating new interactive graphics tools.
Abstract: Missing values are a problem for statistical methods. This applies just as much to modern methods such as interactive graphics as to more classical methods. The MANET software has been developed for keeping track of missing values in interactive graphics analyses and for investigating new interactive graphics tools.

115 citations


Journal ArticleDOI
TL;DR: This article describes a set of pixel-oriented visualization techniques that use each pixel of the display to visualize one data value and therefore allow the visualization of the largest amount of data possible.
Abstract: An important goal of visualization technology is to support the exploration and analysis of very large amounts of data. This article describes a set of pixel-oriented visualization techniques that use each pixel of the display to visualize one data value and therefore allow the visualization of the largest amount of data possible. Most of the techniques have been specifically designed for visualizing and querying large data bases. The techniques may be divided into query-independent techniques that directly visualize the data (or a certain portion of it) and query-dependent techniques that visualize the data in the context of a specific query. Examples for the class of query-independent techniques are the screen-filling curve and recursive pattern techniques. The screen-filling curve techniques are based on the well-known Morton and Peano–Hilbert curve algorithms, and the recursive pattern technique is based on a generic recursive scheme, which generalizes a wide range of pixel-oriented arrangeme...
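
A minimal sketch of the screen-filling-curve idea, using the standard Hilbert curve index-to-coordinate mapping to place one data value per pixel; the grid size and data are illustrative, and this is not the authors' implementation. Values adjacent in the sorted data stay adjacent on screen, which is the point of using a space-filling curve rather than row-major order.

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2^order grid."""
    x = y = 0
    s, t = 1, d
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate/reflect the quadrant as needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

order = 5                                  # 32 x 32 = 1,024 pixels
side = 1 << order
values = np.sort(np.random.default_rng(0).normal(size=side * side))  # one value per pixel
img = np.empty((side, side))
for pos, v in enumerate(values):
    xx, yy = hilbert_d2xy(order, pos)
    img[yy, xx] = v                        # display with, e.g., plt.imshow(img)
```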

91 citations


Journal ArticleDOI
TL;DR: It is shown that leverage effects in regression models make convergence of the Gibbs sampling algorithm very difficult on data sets with strong masking.
Abstract: This article discusses the convergence of the Gibbs sampling algorithm when it is applied to the problem of outlier detection in regression models. Theoretically, given any vector of initial conditions, the algorithm converges to the true posterior distribution; however, the speed of convergence may slow down in a high-dimensional parameter space where the parameters are highly correlated. We show that leverage effects in regression models make convergence of the Gibbs sampling algorithm very difficult on data sets with strong masking. The problem is illustrated with examples.

44 citations


Journal ArticleDOI
TL;DR: The basic idea, which may be described as “sequential linearization of constraints,” is a very simple one, but it could have significant ramifications for the implementation and practical use of empirical likelihood methodology.
Abstract: Empirical likelihood for a mean is straightforward to compute, but for nonlinear statistics significant computational difficulties arise because of the presence of nonlinear constraints in the underlying optimization problem. It is certainly the case that these difficulties can be overcome with sufficient time, care, and programming effort. However, they do make it difficult to write general software for implementing empirical likelihood, and therefore these difficulties are likely to hinder the widespread use of empirical likelihood in applied work. The purpose of this article is to suggest an approximate approach that sidesteps the difficult computational issues. The basic idea, which may be described as “sequential linearization of constraints,” is a very simple one, but we believe it could have significant ramifications for the implementation and practical use of empirical likelihood methodology. One application of the linearization approach, which we consider in this article, is to the probl...
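
For reference, the "straightforward" base case mentioned above, empirical likelihood for a mean, reduces to a one-dimensional root-finding problem in the Lagrange multiplier. A sketch follows; this is the easy case, not the article's sequential linearization scheme, and the epsilon-padded bracketing interval is a stability assumption.

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    """-2 log empirical likelihood ratio for the mean mu (asymptotically chi^2_1)."""
    z = x - mu
    if z.max() <= 0 or z.min() >= 0:
        return np.inf                      # mu outside the convex hull of the data
    eps = 1e-10
    # weights w_i = 1 / (n (1 + lam z_i)); lam solves sum z_i / (1 + lam z_i) = 0
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)),
                 -1.0 / z.max() + eps, -1.0 / z.min() - eps)
    return 2.0 * np.sum(np.log1p(lam * z))

x = np.random.default_rng(0).exponential(size=50)
print(el_log_ratio(x, x.mean()))           # 0 at the sample mean
print(el_log_ratio(x, x.mean() + 0.3))     # grows as mu moves away
```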

Journal ArticleDOI
Robert Gray1
TL;DR: In this paper, a method for nonparametric estimation of hazard rates as a function of time and possibly multiple covariates is proposed, based on dividing the time axis into intervals and calculating the number of events and the follow-up time contributed within each interval.
Abstract: This article proposes a method for nonparametric estimation of hazard rates as a function of time and possibly multiple covariates. The method is based on dividing the time axis into intervals and calculating the number of events and the follow-up time contributed within each interval. The event counts and follow-up times are then separately smoothed on time and the covariates, and the hazard rate estimator is obtained by taking the ratio. Pointwise consistency and asymptotic normality are shown for the hazard rate estimator for a certain class of smoothers, which includes some standard approaches to locally weighted regression and kernel regression. It is shown through simulation that a variance estimator based on this asymptotic distribution is reasonably reliable in practice. The problem of how to select the smoothing parameter is considered, but a satisfactory resolution to this problem has not been identified. The method is illustrated using data from several breast cancer clinical t...
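
A minimal sketch of the count/exposure idea in the time-only case (no covariates): bin the time axis, accumulate event counts and follow-up time per bin, smooth both with the same kernel weights, and take the ratio. The bin width, Gaussian kernel, and bandwidth are illustrative assumptions.

```python
import numpy as np

def hazard_estimate(times, events, grid, bin_width=0.25, bandwidth=1.0):
    edges = np.arange(0.0, times.max() + bin_width, bin_width)
    mids = 0.5 * (edges[:-1] + edges[1:])
    d = np.histogram(times[events == 1], bins=edges)[0]   # events per bin
    # follow-up time each subject contributes to each bin
    r = np.array([np.clip(times - lo, 0.0, bin_width).sum() for lo in edges[:-1]])
    K = np.exp(-0.5 * ((grid[:, None] - mids[None, :]) / bandwidth) ** 2)
    return (K @ d) / (K @ r)                              # smoothed counts / exposure

rng = np.random.default_rng(0)
t_event = rng.exponential(2.0, 300)           # true constant hazard 0.5
t_cens = rng.uniform(0.0, 6.0, 300)
times = np.minimum(t_event, t_cens)
events = (t_event <= t_cens).astype(int)
grid = np.linspace(0.5, 3.0, 6)
print(hazard_estimate(times, events, grid))   # roughly 0.5 across the grid
```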

Journal ArticleDOI
TL;DR: In this article, an algorithm for isotonic regression on ordered rectangular grids is presented, with running time no more than cubic in the number of grid points, which makes bivariate isotonic regression a practical choice in some data analyses.
Abstract: In this article, we give an algorithm for isotonic regression on ordered rectangular grids. The running time of the algorithm is no more than cubic in the number of grid points. This algorithm makes bivariate isotonic regression a practical choice in some data analyses.
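
The grid algorithm itself is not reproduced here; as background, the sketch below shows the classic pool-adjacent-violators algorithm for the one-dimensional case that the bivariate grid problem generalizes.

```python
def pava(y):
    """Least-squares fit of a nondecreasing sequence to y (1-D isotonic regression)."""
    values, weights = [], []
    for v in y:
        values.append(float(v))
        weights.append(1)
        # pool adjacent blocks while they violate monotonicity
        while len(values) > 1 and values[-2] > values[-1]:
            w = weights[-2] + weights[-1]
            values[-2] = (weights[-2] * values[-2] + weights[-1] * values[-1]) / w
            weights[-2] = w
            del values[-1], weights[-1]
    fit = []
    for v, w in zip(values, weights):
        fit.extend([v] * w)                # expand each pooled block
    return fit

print(pava([1, 3, 2, 2, 5, 4]))            # [1.0, 2.33.., 2.33.., 2.33.., 4.5, 4.5]
```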

Journal ArticleDOI
TL;DR: Voyager is an extensible data analysis system, based on Oberon, that exploits recent advances in software technology for statistical computing, in particular dynamic loading and type-safety across module boundaries, even at run time.
Abstract: Recent changes in software technology have opened new possibilities for statistical computing. Conditions for creating efficient and reliable extensible systems have been largely improved by programming languages and systems that provide dynamic loading and type-safety across module boundaries, even at run time. We introduce Voyager, an extensible data analysis system based on Oberon, which tries to exploit some of these possibilities.

Journal ArticleDOI
TL;DR: In this article, a polynomial multiplication algorithm is proposed to compute exact distributions and tail areas for the family of stratum-additive statistics, which includes the score, likelihood ratio, and other statistics.
Abstract: The investigation of interaction in a series of 2 × 2 tables is warranted in a variety of research endeavors. Though many large-sample approaches for such investigations are available, the exact analysis of the problem has been formulated for the probability statistic only. We present several alternative statistics applicable in this context. We also give an efficient polynomial multiplication algorithm to compute exact distributions and tail areas for the family of stratum-additive statistics. Besides the probability statistic, these include the score, likelihood ratio, and other statistics. In addition to comparing, in empirical terms, the diverse computational strategies for exact interaction analysis, we also explore the theoretical linkages between them. Data from published papers are used for illustration.
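
The polynomial multiplication idea can be sketched directly: represent each stratum's contribution as a mapping from statistic values to probabilities and convolve across strata. In the sketch below the statistic is simply the (1,1) cell count with hypergeometric stratum distributions, a deliberately simple stand-in for the paper's general stratum-additive family; the stratum margins are made-up numbers.

```python
from collections import defaultdict
from scipy.stats import hypergeom

def stratum_poly(n1, n2, m):
    """Value -> probability pairs for one 2x2 table with row sums n1, n2 and
    first-column sum m; the 'statistic' here is the (1,1) cell count."""
    lo, hi = max(0, m - n2), min(n1, m)
    return {x: hypergeom.pmf(x, n1 + n2, n1, m) for x in range(lo, hi + 1)}

def multiply(polys):
    dist = {0: 1.0}
    for poly in polys:                      # one polynomial multiplication per stratum
        new = defaultdict(float)
        for v1, p1 in dist.items():
            for v2, p2 in poly.items():
                new[v1 + v2] += p1 * p2     # exponents add, coefficients multiply
        dist = dict(new)
    return dist

strata = [(5, 7, 4), (6, 6, 5), (8, 4, 6)]  # (n1, n2, m) per stratum, illustrative
dist = multiply([stratum_poly(*s) for s in strata])
tail = sum(p for v, p in dist.items() if v >= 10)   # exact upper tail area
print(round(tail, 4))
```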

Journal ArticleDOI
TL;DR: In this article, the authors investigate the extension of binning methodology to fast computation of several auxiliary quantities that arise in local polynomial smoothing, such as degrees of freedom measures, cross-validation functions, variance estimates, and exact measures of error.
Abstract: We investigate the extension of binning methodology to fast computation of several auxiliary quantities that arise in local polynomial smoothing. Examples include degrees of freedom measures, cross-validation functions, variance estimates, and exact measures of error. It is shown that the computational effort required for such approximations is of the same order of magnitude as that required for a binned local polynomial smooth.
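
The flavor of binned computation can be seen in the degree-zero case: once the data are reduced to bin counts and bin sums, the numerator and denominator of the smooth at every bin center are two discrete convolutions. The sketch below shows this for a Nadaraya-Watson smooth (local polynomial of degree 0), with bin count and bandwidth as illustrative choices; it is not the authors' local polynomial code.

```python
import numpy as np

def binned_smooth(x, y, n_bins=400, bandwidth=0.05):
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)              # bin counts
    sums = np.bincount(idx, weights=y, minlength=n_bins)     # bin sums of y
    mids = 0.5 * (edges[:-1] + edges[1:])
    delta = mids[1] - mids[0]
    half = int(np.ceil(4 * bandwidth / delta))               # kernel support in bins
    kern = np.exp(-0.5 * (np.arange(-half, half + 1) * delta / bandwidth) ** 2)
    num = np.convolve(sums, kern, mode="same")               # one pass over bins
    den = np.convolve(counts, kern, mode="same")
    return mids, num / np.maximum(den, 1e-12)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 5000)
y = np.sin(4 * np.pi * x) + rng.normal(0, 0.3, 5000)
mids, fit = binned_smooth(x, y)            # smooth evaluated at all bin centers
```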

Journal ArticleDOI
TL;DR: It is shown that the use of an improper prior leads to an improper posterior even though the conditionals are proper, so a formal Gibbs sampler can still be constructed; the problem is solved by using a vague but proper prior.
Abstract: In this article we examine the use of Gibbs sampling to estimate the autocorrelation coefficient in a linear regression model. Researchers had previously experienced difficulty with moderately to highly positively autocorrelated errors; estimates could be unstable and sometimes failed to converge. We show that the cause of this problem is that the use of an improper prior leads to an improper posterior, even though the conditionals are proper and hence a formal Gibbs sampler can be constructed. The problem is solved by the use of a vague but proper prior. In this simple case many of the calculations can be done analytically, and the example serves as a warning against the uncritical use of improper priors with Gibbs sampling.
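
A stripped-down sketch of the vague-but-proper-prior fix, with the regression part omitted so that only an AR(1) series and its two conditionals remain; the N(0, 100) prior on rho and the InvGamma(0.01, 0.01) prior on sigma^2 are illustrative stand-ins for the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
rho_true, n = 0.9, 400
u = np.zeros(n)
for t in range(1, n):                       # simulate an AR(1) error series
    u[t] = rho_true * u[t - 1] + rng.standard_normal()

tau2, a0, b0 = 100.0, 0.01, 0.01            # vague but proper hyperparameters
rho, sig2 = 0.0, 1.0
draws = []
for it in range(3000):
    # rho | sigma^2: normal prior times normal likelihood gives a normal posterior
    sxx = np.sum(u[:-1] ** 2)
    sxy = np.sum(u[:-1] * u[1:])
    v = 1.0 / (1.0 / tau2 + sxx / sig2)
    rho = rng.normal(v * sxy / sig2, np.sqrt(v))
    # sigma^2 | rho: inverse gamma posterior
    sse = np.sum((u[1:] - rho * u[:-1]) ** 2)
    sig2 = 1.0 / rng.gamma(a0 + (n - 1) / 2.0, 1.0 / (b0 + sse / 2.0))
    if it >= 500:                           # discard burn-in
        draws.append(rho)
print(np.mean(draws))                       # close to 0.9
```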

Journal ArticleDOI
TL;DR: In this paper, a jump process is used to explain irregular fluctuations, such as discontinuities and nonlinear drift in the mean, that are not plausibly modeled by fractal processes.
Abstract: Arguably the best-known applications of fractal methods are in relatively homogeneous, stationary settings, where the environment is controllable by scientists or engineers. For example, in applications to surface science, an unblemished portion of a surface is selected for analysis; and in environmental science, an artificial soil bed of controlled homogeneity is subjected to uniformly distributed water droplets, to model the effect of actual rain on a real soil surface. In some applications, however, the environment is uncontrollable, with the result that measurements are subject to irregular fluctuations that are not so plausibly modeled by fractal processes. The fluctuations may include discontinuities and nonlinear drift in the mean. Some approaches to analysis do not distinguish between this nonstationary contamination and the “background,” with the result that a jump process may provide a significantly better explanation of the data than a fractal process. In this article we suggest decomp...

Journal ArticleDOI
TL;DR: An index is constructed for a collection of nearly 30,000 digital images; the software is regarded as an early development of statistical exploratory tools for studying collections of complex objects.
Abstract: We are interested in the exploratory analysis of large collections of complex objects. As an example, we are studying a large collection of digital images that has nearly 30,000 members. We regard each image in the collection as an individual observation. To facilitate our study we construct an index of the images in the collection. The index uses a small copy of each image (an icon or a “thumbnail”) to represent the full-size version. A large number of these thumbnails are laid out in a workstation window. We can interactively arrange and rearrange the thumbnails within the window. For example, we can sort the thumbnails by the values of a function computed from them or by the values of data associated with each of them. By the use of specialized equipment (a single-frame video disk recorder/player), we can instantly access any individual full-size image in the collection as a video image. We regard our software as an early development of statistical exploratory tools for studying collections of...

Journal ArticleDOI
TL;DR: In the TWI-Stat project, a computer-aided instruction course was developed to help students become more familiar with modern statistical analysis; the course presents itself as a dynamic, interactive, personal book.
Abstract: At Delft University of Technology many students experience difficulties in mastering basic concepts of probability and statistics. In the past few years the lectures have undergone a radical change—the lecture notes now contain modern data analysis techniques, like kernel density estimation, simulation, and bootstrapping. In the TWI-Stat project, a computer-aided instruction course was developed to help students become more familiar with modern statistical analysis. The course presents itself as a dynamic, interactive, personal book. Highly interactive analysis tools are available. The software will be available for MS-Windows.


Journal ArticleDOI
Allan R. Wilks1
TL;DR: Pictor describes graphs as graphical objects whose component pieces are related by several sorts of constraints; this article describes that constraint system in detail.
Abstract: Pictor is an environment for statistical graphics that promotes simple commands for common uses and offers the ability to experiment with whole new paradigms. Pictor describes graphs as graphical objects whose component pieces are related by several sorts of constraints. This article describes in detail the constraint system that Pictor uses.

Journal ArticleDOI
TL;DR: Lisp-Stat is an extensible statistical computing environment based on the Lisp language that is currently being revised on the basis of experience gained from several years of use.
Abstract: Lisp-Stat is an extensible statistical computing environment based on the Lisp language. The system is currently being revised on the basis of experience gained from several years of use. This article outlines some of the changes that have been completed and others that are under consideration.

Journal ArticleDOI
TL;DR: This article introduces the Oberon system from the perspective of an extensible system and discusses the hierarchies (modularity, the type system, runtime system organization, and persistency) through which its requirements are implemented.
Abstract: Extensible software systems play an important role in prototyping environments where a fast compile-and-test turnaround is required. Typically, extensible software systems combine ways to reuse code, an approach to object-oriented programming, and ways to preserve state from one session to another. In this article we introduce the Oberon system from the perspective of an extensible system. In Oberon, the stated requirements manifest themselves as separate hierarchies related to modularity, the type system, runtime system organization, and persistency. We discuss issues related to these hierarchies and the approaches selected in Oberon for their implementation. The article is mainly a short introduction to Oberon and a summary of what has been accomplished with this system.