
Showing papers in "AStA Advances in Statistical Analysis in 2010"


Journal ArticleDOI
TL;DR: The included papers present an interesting mixture of recent developments in the field as they cover fundamental research on the design of experiments, models and analysis methods as well as more applied research connected to real-life applications.
Abstract: The design and analysis of computer experiments as a relatively young research field is not only of high importance for many industrial areas but also presents new challenges and open questions for statisticians. This editorial introduces a special issue devoted to the topic. The included papers present an interesting mixture of recent developments in the field as they cover fundamental research on the design of experiments, models and analysis methods as well as more applied research connected to real-life applications.

2,583 citations


Journal ArticleDOI
TL;DR: This paper provides a broad introduction to the topic of computer experiments by briefly presenting a number of applications with different types of output or different goals, and reviewing modelling strategies, including the popular Gaussian process approach, as well as variations and modifications.
Abstract: In this paper we provide a broad introduction to the topic of computer experiments. We begin by briefly presenting a number of applications with different types of output or different goals. We then review modelling strategies, including the popular Gaussian process approach, as well as variations and modifications. Other strategies that are reviewed are based on polynomial regression, non-parametric regression and smoothing spline ANOVA. The issue of multi-level models, which combine simulators of different resolution in the same experiment, is also addressed. Special attention is given to modelling techniques that are suitable for functional data. To conclude the modelling section, we discuss calibration, validation and verification. We then review design strategies including Latin hypercube designs and space-filling designs and their adaptation to computer experiments. We comment on a number of special issues, such as designs for multi-level simulators, nested factors and determination of experiment size.

95 citations
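As a rough illustration of the Gaussian process modelling strategy reviewed above, the following minimal sketch (plain NumPy, a squared-exponential kernel with fixed, arbitrary hyperparameters, and a toy stand-in simulator; none of this is taken from the paper) computes the GP predictive mean and variance at untried input points.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=0.2, sigma2=1.0):
    """Squared-exponential covariance between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma2 * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_new, noise=1e-8):
    """Zero-mean GP conditional (predictive) mean and variance at X_new."""
    K = sq_exp_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    k_star = sq_exp_kernel(X_train, X_new)
    alpha = np.linalg.solve(K, y_train)
    mean = k_star.T @ alpha
    v = np.linalg.solve(K, k_star)
    var = sq_exp_kernel(X_new, X_new).diagonal() - np.einsum('ij,ij->j', k_star, v)
    return mean, var

# Toy "computer code": a cheap deterministic stand-in for an expensive simulator.
rng = np.random.default_rng(0)
X = rng.uniform(size=(10, 1))            # 10 design points in [0, 1]
y = np.sin(6 * X[:, 0])                  # simulator output at the design
X_new = np.linspace(0, 1, 5)[:, None]
mu, var = gp_predict(X, y, X_new)
print(np.round(mu, 3), np.round(var, 4))
```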


Journal ArticleDOI
TL;DR: It is shown that SDR can be effectively combined with the “classical” approach to obtain a more accurate and efficient estimation of smoothing spline ANOVA models to be applied for emulation purposes.
Abstract: In this paper we present a unified discussion of different approaches to the identification of smoothing spline analysis of variance (ANOVA) models: (i) the “classical” approach (in the line of Wahba in Spline Models for Observational Data, 1990; Gu in Smoothing Spline ANOVA Models, 2002; Storlie et al. in Stat. Sin., 2011) and (ii) the State-Dependent Regression (SDR) approach of Young in Nonlinear Dynamics and Statistics (2001). The latter is a nonparametric approach which is very similar to smoothing splines and kernel regression methods, but based on recursive filtering and smoothing estimation (the Kalman filter combined with fixed interval smoothing). We will show that SDR can be effectively combined with the “classical” approach to obtain a more accurate and efficient estimation of smoothing spline ANOVA models to be applied for emulation purposes. We will also show that such an approach can compare favorably with kriging.

76 citations


Journal ArticleDOI
TL;DR: In this paper, a new algorithm called constrained Latin hypercube sampling (cLHS) is proposed, which takes into account inequality constraints between the sampled variables by performing permutations on an initial LHS to honor the desired monotonic constraints.
Abstract: In some studies requiring predictive and CPU-time consuming numerical models, the sampling design of the model input variables has to be chosen with caution. For this purpose, Latin hypercube sampling has a long history and has shown its robustness capabilities. In this paper we propose and discuss a new algorithm to build a Latin hypercube sample (LHS) taking into account inequality constraints between the sampled variables. This technique, called constrained Latin hypercube sampling (cLHS), consists of performing permutations on an initial LHS to honor the desired monotonic constraints. The relevance of this approach is shown on a real example concerning numerical welding simulation, where the inequality constraints arise because some material properties physically decrease as a function of temperature.

66 citations
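The permutation idea can be illustrated with a toy version of the problem. The sketch below uses a hypothetical constraint x2 ≥ 0.8·x1 and a crude random swap search, not the authors' cLHS algorithm; it only exploits the fact that swapping two entries within a column of an LHS preserves the Latin hypercube structure.

```python
import numpy as np

def lhs(n, d, rng):
    """Basic Latin hypercube sample on [0, 1]^d: one point per stratum in each column."""
    cols = [(rng.permutation(n) + rng.uniform(size=n)) / n for _ in range(d)]
    return np.column_stack(cols)

def constrain(X, rng, n_iter=20000):
    """Crude stochastic search: swap entries within column 2 (which preserves the
    Latin hypercube structure) until the illustrative constraint x2 >= 0.8 * x1
    holds in every row.  Toy stand-in only; the cLHS algorithm of the paper
    handles general monotonic inequality constraints more systematically."""
    X = X.copy()
    n = len(X)
    for _ in range(n_iter):
        viol = np.sum(X[:, 1] < 0.8 * X[:, 0])
        if viol == 0:
            break
        i, j = rng.choice(n, size=2, replace=False)
        X[[i, j], 1] = X[[j, i], 1]                   # tentative swap
        if np.sum(X[:, 1] < 0.8 * X[:, 0]) > viol:    # worse: undo the swap
            X[[i, j], 1] = X[[j, i], 1]
    return X

rng = np.random.default_rng(1)
X0 = lhs(20, 2, rng)
X1 = constrain(X0, rng)
print("violations before/after:",
      np.sum(X0[:, 1] < 0.8 * X0[:, 0]), np.sum(X1[:, 1] < 0.8 * X1[:, 0]))
```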


Journal ArticleDOI
TL;DR: In this article, the authors explore the biasinducing effects of rounding and show that under appropriate conditions, these effects can be approximately rectified by versions of Sheppard's correction formula.
Abstract: Using rounded data to estimate moments and regression coefficients typically biases the estimates. We explore the bias-inducing effects of rounding, thereby reviewing widely dispersed and often half forgotten results in the literature. Under appropriate conditions, these effects can be approximately rectified by versions of Sheppard’s correction formula. We discuss the conditions under which these approximations are valid and also investigate the efficiency loss caused by rounding. The rounding error, which corresponds to the measurement error of a measurement error model, has a marginal distribution, which can be approximated by the uniform distribution, but is not independent of the true value. In order to take account of rounding preferences (heaping), we generalize the concept of simple rounding to that of asymmetric rounding and consider its effect on the mean and variance of a distribution.

48 citations
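For the variance, the classical form of Sheppard's correction subtracts h²/12 from the naive estimate, where h is the rounding width. A small Monte Carlo check of this second-moment version is sketched below (simple symmetric rounding of normal data assumed; the paper also treats regression coefficients, efficiency loss and asymmetric rounding).

```python
import numpy as np

rng = np.random.default_rng(0)
h = 1.0                                   # rounding width
x = rng.normal(loc=10.0, scale=2.0, size=200_000)
x_rounded = h * np.round(x / h)           # simple rounding to the nearest multiple of h

var_naive = x_rounded.var()
var_sheppard = var_naive - h**2 / 12      # Sheppard's correction for the variance

print(f"true var      : {x.var():.4f}")
print(f"naive var     : {var_naive:.4f}")
print(f"corrected var : {var_sheppard:.4f}")
```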


Journal ArticleDOI
TL;DR: A space-filling criterion based on the Kullback–Leibler information is used to build a new class of Latin hypercube designs that appear to perform well.
Abstract: Space-filling designs are commonly used for selecting the input values of time-consuming computer codes. Computer experiment context implies two constraints on the design. First, the design points should be evenly spread throughout the experimental region. A space-filling criterion (for instance, the maximin distance) is used to build optimal designs. Second, the design should avoid replication when projecting the points onto a subset of input variables (non-collapsing). The Latin hypercube structure is often enforced to ensure good projective properties. In this paper, a space-filling criterion based on the Kullback–Leibler information is used to build a new class of Latin hypercube designs. The new designs are compared with several traditional optimal Latin hypercube designs and appear to perform well.

45 citations
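One simple way to score a design in this spirit is to note that the Kullback–Leibler divergence of the design's empirical distribution from the uniform law on the unit cube equals the negative differential entropy, which can be estimated from nearest-neighbour distances. The sketch below drops the additive constants (they do not affect the ranking of designs) and is only loosely modelled on the paper's criterion, not the authors' exact estimator.

```python
import numpy as np
from scipy.spatial import cKDTree

def kl_uniform_score(X):
    """Space-filling score: the KL divergence of the design's empirical
    distribution from the uniform law on [0,1]^d is the negative differential
    entropy, estimated here (up to additive constants that do not change the
    ranking of designs) from nearest-neighbour distances.  Smaller is better."""
    n, d = X.shape
    rho, _ = cKDTree(X).query(X, k=2)     # rho[:, 1] = distance to the nearest other point
    return -(d / n) * np.sum(np.log(rho[:, 1]))

rng = np.random.default_rng(0)
n, d = 50, 3
X_random = rng.uniform(size=(n, d))
X_lhs = np.column_stack([(rng.permutation(n) + rng.uniform(size=n)) / n for _ in range(d)])
print("random design:", round(kl_uniform_score(X_random), 3))
print("LHS design   :", round(kl_uniform_score(X_lhs), 3))
```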


Journal ArticleDOI
TL;DR: In this article, the RFA/LMS method and the MGFA method are compared in detecting uniform and nonuniform measurement bias under various conditions, varying the size of uniform bias, the size of nonuniform bias, the sample size, and the ability distribution.
Abstract: Factor analysis is an established technique for the detection of measurement bias. Multigroup factor analysis (MGFA) can detect both uniform and nonuniform bias. Restricted factor analysis (RFA) can also be used to detect measurement bias, albeit only uniform measurement bias. Latent moderated structural equations (LMS) enable the estimation of nonlinear interaction effects in structural equation modelling. By extending the RFA method with LMS, the RFA method should be suited to detect nonuniform bias as well as uniform bias. In a simulation study, the RFA/LMS method and the MGFA method are compared in detecting uniform and nonuniform measurement bias under various conditions, varying the size of uniform bias, the size of nonuniform bias, the sample size, and the ability distribution. For each condition, 100 sets of data were generated and analysed through both detection methods. The RFA/LMS and MGFA methods turned out to perform equally well. Percentages of correctly identified items as biased (true positives) generally varied between 92% and 100%, except in small sample size conditions in which the bias was nonuniform and small. For both methods, the percentages of false positives were generally higher than the nominal levels of significance.

40 citations


Journal ArticleDOI
TL;DR: In this article, an integer-valued stationary symmetric AR(1) process which can have either a positive or a negative lag-one autocorrelation is presented; its symmetric innovations are the difference of two independent identically distributed Poisson random variables, so that, unlike most integer-valued time series models, it is not restricted to non-negative counts.
Abstract: We construct an integer-valued stationary symmetric AR(1) process which can have either a positive or a negative lag-one autocorrelation. Nearly all integer-valued time series models are designed for observations which are non-negative integers or counts. They have innovations which are distributed on the non-negative integers and therefore obviously non-symmetric. We build our model using innovations that come from the difference of two independent identically distributed Poisson random variables. These innovations have a symmetric distribution, which has many advantages; in particular, they will allow us to model negative correlations. For our AR(1) process, we examine its basic properties and consider estimation via conditional least squares.

36 citations


Journal ArticleDOI
TL;DR: This paper establishes the strong consistency and asymptotic normality of the least squares estimators in simple linear errors-in-variables (EV) regression models when the errors form a stationary α-mixing sequence of random variables.
Abstract: In this paper, we establish the strong consistency and asymptotic normality of the least squares (LS) estimators in simple linear errors-in-variables (EV) regression models when the errors form a stationary α-mixing sequence of random variables. The quadratic-mean consistency is also considered.

35 citations


Journal ArticleDOI
TL;DR: This paper provides a survey of copulae in which different copula classes, estimation and simulation techniques, and goodness-of-fit tests are considered; different copulae are then applied to the static and dynamic Value-at-Risk of portfolio returns and the Profit-and-Loss function.
Abstract: Normal distribution of residuals is a traditional assumption in multivariate models. It is, however, not very often consistent with real data. Copulae allow for an extension of dependency models to nonellipticity and for separation of margins from the dependency. This paper provides a survey of copulae where different copula classes, estimation and simulation techniques and goodness-of-fit tests are considered. In the empirical section we apply different copulae to the static and dynamic Value-at-Risk of portfolio returns and Profit-and-Loss function.

32 citations
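As a hedged illustration of the empirical part, the sketch below computes a Monte Carlo Value-at-Risk for a linear portfolio under a Gaussian copula with empirical margins; the data, weights and the choice of the Gaussian family are placeholders, whereas the paper surveys several copula classes, estimation methods and goodness-of-fit tests.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_var(returns, weights, alpha=0.01, n_sim=100_000, seed=0):
    """Monte Carlo VaR of a linear portfolio under a Gaussian copula with
    empirical margins.  Sketch only, not the paper's methodology."""
    rng = np.random.default_rng(seed)
    n, d = returns.shape
    # 1. Pseudo-observations (ranks) and the correlation of the normal scores.
    u = (np.argsort(np.argsort(returns, axis=0), axis=0) + 0.5) / n
    corr = np.corrcoef(norm.ppf(u), rowvar=False)
    # 2. Simulate from the Gaussian copula.
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_sim)
    u_sim = norm.cdf(z)
    # 3. Map back through the empirical marginal quantiles.
    sim = np.column_stack([np.quantile(returns[:, j], u_sim[:, j]) for j in range(d)])
    pnl = sim @ weights
    return -np.quantile(pnl, alpha)       # loss exceeded with probability alpha

rng = np.random.default_rng(1)
fake_returns = rng.standard_t(df=5, size=(750, 3)) * 0.01   # stand-in for asset returns
w = np.array([0.4, 0.4, 0.2])
print("1% VaR:", round(gaussian_copula_var(fake_returns, w), 4))
```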


Journal ArticleDOI
TL;DR: The CS-C(M−1) change model as discussed by the authors allows investigators to study inter-individual differences in intra-individual change over time, to separate true change from random measurement error, and to analyse change simultaneously for different methods.
Abstract: Geiser (Multitrait-multimethod-multioccasion modeling, 2009) recently presented the Correlated State-Correlated (Methods-Minus-1) [CS-C(M−1)] model for analysing longitudinal multitrait-multimethod (MTMM) data. In the present article, the authors discuss the extension of the CS-C(M−1) model to a model that includes latent difference variables, called CS-C(M−1) change model. The CS-C(M−1) change model allows investigators to study inter-individual differences in intra-individual change over time, to separate true change from random measurement error, and to analyse change simultaneously for different methods. Change in a reference method can be contrasted with change in other methods to analyse convergent validity of change.

Journal ArticleDOI
TL;DR: The behavior of various LH designs is examined under the Gaussian assumption with exponential correlation, with the aim of minimizing the total prediction error at the points of a regular lattice.
Abstract: In Computer Experiments (CE), a careful selection of the design points is essential for predicting the system response at untried points, based on the values observed at tried points. In physical experiments, the protocol is based on Design of Experiments, a methodology whose basic principles are questioned in CE. When the responses of a CE are modeled as jointly Gaussian random variables with their covariance depending on the distance between points, the use of the so called space-filling designs (random designs, stratified designs and Latin Hypercube designs) is a common choice, because it is expected that the nearer the untried point is to the design points, the better is the prediction. In this paper we focus on the class of Latin Hypercube (LH) designs. The behavior of various LH designs is examined according to the Gaussian assumption with exponential correlation, in order to minimize the total prediction error at the points of a regular lattice. In such a special case, the problem is reduced to an algebraic statistical model, which is solved using both symbolic algebraic software and statistical software. We provide closed-form computation of the variance of the Gaussian linear predictor as a function of the design, in order to make a comparison between LH designs. In principle, the method applies to any number of factors and any number of levels, and also to classes of designs other than LHs. In our current implementation, the applicability is limited by the high computational complexity of the algorithms involved.
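A numerical (rather than closed-form) version of this comparison can be sketched as follows: for each candidate LH design, sum the simple-kriging prediction variance σ²(1 − r(x)ᵀR⁻¹r(x)) over the lattice points, using an exponential correlation function. The hyperparameters and the use of numerical inversion below are assumptions for illustration only; the paper derives closed-form algebraic expressions instead.

```python
import numpy as np

def exp_corr(A, B, theta=3.0):
    """Exponential correlation exp(-theta * ||x - x'||_1) between rows of A and B."""
    d1 = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-theta * d1)

def total_pred_variance(design, lattice, theta=3.0, sigma2=1.0, nugget=1e-10):
    """Sum over the lattice of the simple-kriging prediction variance
    sigma^2 * (1 - r(x)' R^{-1} r(x)).  Numerical illustration only."""
    R = exp_corr(design, design, theta) + nugget * np.eye(len(design))
    r = exp_corr(design, lattice, theta)
    var = sigma2 * (1.0 - np.einsum('ij,ij->j', r, np.linalg.solve(R, r)))
    return var.sum()

def lhs(n, d, rng):
    """Basic Latin hypercube sample on [0, 1]^d."""
    return np.column_stack([(rng.permutation(n) + rng.uniform(size=n)) / n for _ in range(d)])

rng = np.random.default_rng(0)
n, d, m = 9, 2, 15
lattice = np.stack(np.meshgrid(*[np.linspace(0, 1, m)] * d), axis=-1).reshape(-1, d)
scores = [total_pred_variance(lhs(n, d, rng), lattice) for _ in range(5)]
print(np.round(scores, 3))   # compare several random LH designs by this criterion
```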

Journal ArticleDOI
TL;DR: In this paper, the performance of LISREL-PI and LMS was compared to PLS-PI results previously reported in Chin et al. (2003) and Goodhue et al (2007) for identical conditions.
Abstract: Nonlinear structural equation modeling provides many advantages over analyses based on manifest variables only. Several approaches for the analysis of latent interaction effects have been developed within the last 15 years, including the partial least squares product indicator approach (PLS-PI), the constrained product indicator approach using the LISREL software (LISREL-PI), and the distribution-analytic latent moderated structural equations approach (LMS) using the Mplus program. An assumed advantage of PLS-PI is that it is able to deal with very large numbers of indicators, while LISREL-PI and LMS have not been investigated under such conditions. In a Monte Carlo study, the performance of LISREL-PI and LMS was compared to PLS-PI results previously reported in Chin et al. (2003) and Goodhue et al. (2007) for identical conditions. The latent interaction model included six indicator variables for the measurement of each latent predictor variable and the latent criterion, and sample size was N=100. The results showed that PLS-PI’s linear and interaction parameter estimates were downward biased, while parameter estimates were unbiased for LISREL-PI and LMS. True standard errors were smallest for PLS-PI, while the power to detect the latent interaction effect was higher for LISREL-PI and LMS. Compared to the symmetric distributions of interaction parameter estimates for LISREL-PI and LMS, PLS-PI showed a distribution that was symmetric for positive values, but included outlying negative estimates. Possible explanations for these findings are discussed.

Journal ArticleDOI
TL;DR: In this article, a three-step procedure is proposed to investigate measurement bias and response shift in longitudinal health-related quality-of-life (QoL) data of HIV/AIDS patients, collected at four semi-annual measurement occasions.
Abstract: We propose a three-step procedure to investigate measurement bias and response shift, a special case of measurement bias in longitudinal data. Structural equation modelling is used in each of the three steps, which can be described as (1) establishing a measurement model using confirmatory factor analysis, (2) detecting measurement bias by testing the equivalence of model parameters across measurement occasions, (3) detecting measurement bias with respect to additional exogenous variables by testing their direct effects on the indicator variables. The resulting model can be used to investigate true change in the attributes of interest, by testing changes in common factor means. Solutions for the issue of constraint interaction and for chance capitalisation in model specification searches are discussed as part of the procedure. The procedure is illustrated by applying it to longitudinal health-related quality-of-life data of HIV/AIDS patients, collected at four semi-annual measurement occasions.

Journal ArticleDOI
TL;DR: In this article, the authors propose a measure that quantifies the heteroscedasticity of residuals in structural equation models, based on a comparison of the likelihood for the residuals under the assumption of heteroscedasticity with the likelihood under the assumption of homoscedasticity; the measure is designed to respond to omitted nonlinear terms in the structural part of the model.
Abstract: The model chi-square that is used in linear structural equation modeling compares the fitted covariance matrix of a target model to an unstructured covariance matrix to assess global fit. For models with nonlinear terms, i.e., interaction or quadratic terms, this comparison is very problematic because these models are not nested within the saturated model that is represented by the unstructured covariance matrix. We propose a novel measure that quantifies the heteroscedasticity of residuals in structural equation models. It is based on a comparison of the likelihood for the residuals under the assumption of heteroscedasticity with the likelihood under the assumption of homoscedasticity. The measure is designed to respond to omitted nonlinear terms in the structural part of the model that result in heteroscedastic residual scores. In a small Monte Carlo study, we demonstrate that the measure appears to detect omitted nonlinear terms reliably when falsely a linear model is analyzed and the omitted nonlinear terms account for substantial nonlinear effects. The results also indicate that the measure did not respond when the correct model or an overparameterized model were used.

Journal ArticleDOI
TL;DR: In this article, the authors estimate individual potential income with stochastic earnings frontiers to measure overqualification as the ratio between actual income and potential income, and remove a drawback of the IAB employment sample, the censoring of the income data, by multiple imputation.
Abstract: We estimate individual potential income with stochastic earnings frontiers to measure overqualification as the ratio between actual income and potential income. To do this, we remove a drawback of the IAB employment sample, the censoring of the income data, by multiple imputation. The measurement of overqualification by the income ratio is also a valuable addition to the overeducation literature because the well-established objective or subjective overeducation measures focus on some ordinal matching aspects and ignore the metric income and efficiency aspects of overqualification.

Journal ArticleDOI
TL;DR: It is concluded that measurement bias implies multidimensionality, whereas multidimensionality shows up as measurement bias only if multidimensionality is not properly accounted for in the measurement model.
Abstract: Restricted factor analysis can be used to investigate measurement bias. A prerequisite for the detection of measurement bias through factor analysis is the correct specification of the measurement model. We applied restricted factor analysis to two subtests of a Dutch cognitive ability test. These two examples serve to illustrate the relationship between multidimensionality and measurement bias. We conclude that measurement bias implies multidimensionality, whereas multidimensionality shows up as measurement bias only if multidimensionality is not properly accounted for in the measurement model.

Journal ArticleDOI
TL;DR: In this article, the loss of information caused by grouping continuous populations is investigated through analytical and numerical analyses for some typical symmetric and skew population distributions often found in applications.
Abstract: Continuous populations are grouped in many social, economic, medical, or technical fields of research. However, by grouping them, a lot of information provided by the continuous population is lost. Especially the median split, which is still adopted by many researchers, and its generalization to an equiprobable k-group split lead to a high efficiency loss. Here, this loss of information is investigated by analytical and numerical analyses for some typical symmetric and skew population distributions often found in applications. Various distribution parameters, numbers of groups, and split methods are taken from theoretical considerations and real data sets. Losses sometimes in excess of 50% can be reduced by optimal grouping.
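The efficiency loss caused by a median split is easy to reproduce for the normal case: dichotomizing a normally distributed predictor at its median attenuates its correlation with a criterion by the factor √(2/π) ≈ 0.798, i.e., roughly a 36% loss in explained variance. A small Monte Carlo check is given below (normal predictor and linear relationship assumed; the paper covers further distributions, k-group splits and split methods).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)            # continuous criterion related to x

x_split = (x > np.median(x)).astype(float)  # median split of the predictor

r_cont = np.corrcoef(x, y)[0, 1]
r_dich = np.corrcoef(x_split, y)[0, 1]
print(f"correlation, continuous x : {r_cont:.3f}")
print(f"correlation, median split : {r_dich:.3f}")
print(f"attenuation factor        : {r_dich / r_cont:.3f}   (theory for a normal x: sqrt(2/pi) ~ 0.798)")
print(f"loss in explained variance: {1 - (r_dich / r_cont) ** 2:.1%}")
```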

Journal ArticleDOI
TL;DR: In this article, results developed for cointegration analysis with state space models by Bauer and Wagner in a series of papers are presented and exemplified for empirical applications; a canonical representation is developed, and some available statistical results are briefly discussed.
Abstract: This paper presents and exemplifies results developed for cointegration analysis with state space models by Bauer and Wagner in a series of papers. Unit root processes, cointegration, and polynomial cointegration are defined. Based upon these definitions, the major part of the paper discusses how state space models, which are equivalent to VARMA models, can be fruitfully employed for cointegration analysis. By detailing the cases most relevant for empirical applications, the I(1), multiple frequency I(1), and I(2) cases, a canonical representation is developed and thereafter some available statistical results are briefly discussed.

Journal ArticleDOI
TL;DR: The important problem of the ratio of Weibull random variables is considered in this article, where the authors derive exact expressions for the probability density function, cumulative distribution function, hazard rate function, shape characteristics, moments, factorial moments, skewness, kurtosis and percentiles of the ratio.
Abstract: The important problem of the ratio of Weibull random variables is considered. Two motivating examples from engineering are discussed. Exact expressions are derived for the probability density function, cumulative distribution function, hazard rate function, shape characteristics, moments, factorial moments, skewness, kurtosis and percentiles of the ratio. Estimation procedures by the methods of moments and maximum likelihood are provided. The performances of the estimates from these methods are compared by simulation. Finally, an application is discussed for aspect and performance ratios of systems.
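For the special case in which both Weibull variables share the same shape parameter k, the ratio follows a log-logistic law with CDF F(t) = (tλ₂/λ₁)^k / (1 + (tλ₂/λ₁)^k); this standard result is quoted here only for orientation and is not the paper's general expressions. The sketch below checks it by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
k, lam1, lam2 = 2.0, 1.5, 1.0          # common shape k, scales lam1 (numerator) and lam2 (denominator)
n = 500_000

x = lam1 * rng.weibull(k, n)           # NumPy's weibull draws have unit scale
y = lam2 * rng.weibull(k, n)
r = x / y

def cdf_ratio(t, k, lam1, lam2):
    """CDF of X/Y for independent Weibulls sharing shape k (log-logistic form)."""
    z = (t * lam2 / lam1) ** k
    return z / (1.0 + z)

for t in (0.5, 1.0, 2.0, 4.0):
    print(t, round(np.mean(r <= t), 4), round(cdf_ratio(t, k, lam1, lam2), 4))
```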

Journal ArticleDOI
TL;DR: Measurements from experiments and results of a finite element analysis (FEA) are combined in order to compute accurate empirical models for the temperature distribution before a thermomechanically coupled forming process.
Abstract: In this paper, measurements from experiments and results of a finite element analysis (FEA) are combined in order to compute accurate empirical models for the temperature distribution before a thermomechanically coupled forming process. To accomplish this, Design and Analysis of Computer Experiments (DACE) is used to separately compute models for the measurements and the functional output of the FEA. Based on a hierarchical approach, a combined model of the process is computed. In this combined modelling approach, the model for the FEA is corrected by taking into account the systematic deviations from the experimental measurements. The large number of observations based on the functional output hinders the direct computation of the DACE models due to the internal inversion of the correlation matrix. Thus, different techniques for identifying a relevant subset of the observations are proposed. The application of the resulting procedure is presented, and a statistical validation of the empirical models is performed.

Journal ArticleDOI
TL;DR: In this article, the continuous-time autoregressive latent trajectory (CALT) model is presented as an improved version of the autoregressive latent trajectory (ALT) model; problems related to the linear components of both models are discussed, and the first-order derivative in a second-order stochastic differential equation model is proposed as an alternative.
Abstract: The paper first discusses the autoregressive latent trajectory (ALT) model and presents in detail its improved version, the continuous-time autoregressive latent trajectory (CALT) model. Next, serious problems related to the linear components in the ALT and CALT models are dealt with. As an alternative for the linear component, the first-order derivative in a second-order stochastic differential equation model is proposed. This is applied to Marital Satisfaction data, collected in four consecutive years (2002–2005). It is pointed out that the first-order derivative as explanatory variable has none of the problems associated with the linear component.

Journal ArticleDOI
TL;DR: In this paper, a modified signed likelihood ratio statistic that follows a standard normal distribution with a high degree of accuracy is derived for structural errors-in-variables models in which the error terms are allowed to follow a multivariate distribution in the class of elliptical distributions.
Abstract: In this paper we deal with the issue of performing accurate testing inference on a scalar parameter of interest in structural errors-in-variables models. The error terms are allowed to follow a multivariate distribution in the class of the elliptical distributions, which has the multivariate normal distribution as special case. We derive a modified signed likelihood ratio statistic that follows a standard normal distribution with a high degree of accuracy. Our Monte Carlo results show that the modified test is much less size distorted than its unmodified counterpart. An application is presented.
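For orientation, the generic (Barndorff-Nielsen) form of the signed and modified signed likelihood ratio statistics is recalled below; the specific correction term u for the elliptical errors-in-variables model is what the paper itself derives.

```latex
% Generic form of the signed and modified signed likelihood ratio statistics;
% psi is the scalar parameter of interest, lambda the nuisance parameter, and
% the correction term u(psi) is model-specific (derived in the paper).
r(\psi) = \operatorname{sign}(\hat\psi - \psi)\,
          \sqrt{2\bigl[\ell(\hat\psi,\hat\lambda) - \ell(\psi,\hat\lambda_\psi)\bigr]},
\qquad
r^{*}(\psi) = r(\psi) + \frac{1}{r(\psi)}\log\frac{u(\psi)}{r(\psi)},
\qquad
r^{*} \overset{\text{approx.}}{\sim} N(0,1).
```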

Journal ArticleDOI
TL;DR: A three-step methodological approach for predicting simulated IRS dispersion of imperfectly known aircraft is proposed, which gives satisfactory estimation of the infrared signature dispersion.
Abstract: Existing computer simulations of aircraft InfraRed Signature (IRS) do not account for the dispersion induced by uncertainty on input data such as aircraft aspect angles and meteorological conditions. As a result, they are of little use to estimate the detection performance of optronic systems: in that case, the scenario encompasses a lot of possible situations that must indeed be addressed, but cannot be singly simulated. In this paper, a three-step methodological approach for predicting simulated IRS dispersion of imperfectly known aircraft is proposed. The first step is a sensitivity analysis. The second step consists in a Quasi-Monte Carlo survey of the code output dispersion. In the last step, a neural network metamodel of the IRS simulation code is constructed. It will allow carrying out thorough computationally demanding tasks, such as those required for optimization of an optronic sensor. This method is illustrated in a typical scenario, namely an air-to-ground full-frontal attack by a generic combat aircraft, and gives satisfactory estimation of the infrared signature dispersion.
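A generic version of steps two and three might look like the sketch below, using SciPy's Sobol sampler and a small scikit-learn neural network; the simulator function, input dimension and network size are placeholders, not the IRS code or the authors' settings.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.neural_network import MLPRegressor

def expensive_simulator(X):
    """Placeholder for the IRS code: any deterministic function of the
    uncertain inputs (aspect angles, meteorological conditions, ...)."""
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

d = 2
sampler = qmc.Sobol(d=d, scramble=True, seed=0)
X = sampler.random(n=2 ** 10)                 # quasi-Monte Carlo sample of the inputs
y = expensive_simulator(X)                    # step 2: survey of the output dispersion
print("output mean / std:", y.mean().round(3), y.std().round(3))

# Step 3: cheap neural-network metamodel of the code, usable for later optimisation.
meta = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0).fit(X, y)
X_new = np.random.default_rng(1).uniform(size=(5, d))
print(np.round(meta.predict(X_new), 3), np.round(expensive_simulator(X_new), 3))
```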

Journal ArticleDOI
TL;DR: This editorial introduces the Special Issue “Advances in Structural Equation Modeling”, which provides a snapshot of the different research activities performed by members of the working group “Structural Equation Modeling” at the 2009 annual meeting in Berlin at Humboldt University.
Abstract: This editorial introduces the Special Issue “Advances in Structural Equation Modeling” which provides a snapshot of the different research activities performed by members of the working group “Structural Equation Modeling”. More specifically, this issue contains a selection of papers presented at the 2009 annual meeting in Berlin at Humboldt University.

Journal ArticleDOI
TL;DR: In an earlier contribution to this journal, Kauermann and Weihs (2007) addressed the lack of procedural understanding in statistical consulting: “Even though there seems to be a consensus that statistical consulting should be well structured and target-orientated, the range of activity and the process itself seem to be less well-understood.”
Abstract: In an earlier contribution to this journal, Kauermann and Weihs (Adv. Stat. Anal. 91(4):344, 2007) addressed the lack of procedural understanding in statistical consulting: “Even though there seems to be a consensus that statistical consulting should be well structured and target-orientated, the range of activity and the process itself seem to be less well-understood.” While this issue appears to be rather new to statistical consultants, other consulting disciplines—in particular management consultants—have long come up with a viable approach that divides the typical consulting process into seven successive steps. Using this model as a frame allows for reflecting on the approaches to statistical consulting suggested by authors published in AStA volume 91, number 4, and for adding value to statistical consulting in general.