
Showing papers on "Linear model published in 1999"


Journal ArticleDOI
TL;DR: The authors argue that distance-based RDA will be extremely useful to ecologists measuring multispecies responses to structured multifactorial experimental designs.
Abstract: We present a new multivariate technique for testing the significance of individual terms in a multifactorial analysis-of-variance model for multispecies response variables. The technique will allow researchers to base analyses on measures of association (distance measures) that are ecologically relevant. In addition, unlike other distance-based hypothesis-testing techniques, this method allows tests of significance of interaction terms in a linear model. The technique uses the existing method of redundancy analysis (RDA) but allows the analysis to be based on Bray-Curtis or other ecologically meaningful measures through the use of principal coordinate analysis (PCoA). Steps in the procedure include: (1) calculating a matrix of distances among replicates using a distance measure of choice (e.g., Bray-Curtis); (2) determining the principal coordinates (including a correction for negative eigenvalues, if necessary), which preserve these distances; (3) creating a matrix of dummy variables corresponding to the design of the experiment (i.e., individual terms in a linear model); (4) analyzing the relationship between the principal coordinates (species data) and the dummy variables (model) using RDA; and (5) implementing a test by permutation for particular statistics corresponding to the particular terms in the model. This method has certain advantages not shared by other multivariate testing procedures. We demonstrate the use of this technique with experimental ecological data from intertidal assemblages and show how the presence of significant multivariate interactions can be interpreted. It is our view that distance-based RDA will be extremely useful to ecologists measuring multispecies responses to structured multifactorial experimental designs.
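The five steps above map almost directly onto code. Below is a minimal numpy/scipy sketch for a one-way design, assuming a replicate-by-species table Y and a one-hot group-membership matrix X; the function names and the pseudo-F construction are illustrative and are not the authors' exact implementation (which handles multifactorial designs and a correction for negative eigenvalues).

```python
# Minimal sketch of distance-based RDA for a one-way design (illustrative only).
import numpy as np
from scipy.spatial.distance import pdist, squareform

def bray_curtis(Y):
    """Bray-Curtis distance matrix among replicates (rows of the species table Y)."""
    return squareform(pdist(Y, metric="braycurtis"))

def pcoa(D):
    """Principal coordinates of a distance matrix (negative eigenvalues simply dropped here)."""
    n = D.shape[0]
    A = -0.5 * D**2
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    G = J @ A @ J                                # Gower-centred matrix
    vals, vecs = np.linalg.eigh(G)
    keep = vals > 1e-10
    return vecs[:, keep] * np.sqrt(vals[keep])   # principal coordinates

def pseudo_f(coords, X):
    """Pseudo-F for the RDA of PCoA coordinates on a one-hot group matrix X (no intercept)."""
    n, q = X.shape
    H = X @ np.linalg.pinv(X.T @ X) @ X.T        # hat matrix for the model
    fitted_ss = np.trace(coords.T @ H @ coords)
    resid_ss = np.trace(coords.T @ (np.eye(n) - H) @ coords)
    return (fitted_ss / (q - 1)) / (resid_ss / (n - q))

def permutation_test(Y, X, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    coords = pcoa(bray_curtis(Y))
    obs = pseudo_f(coords, X)
    perm = [pseudo_f(coords[rng.permutation(len(coords))], X) for _ in range(n_perm)]
    p = (1 + sum(f >= obs for f in perm)) / (n_perm + 1)
    return obs, p
```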

2,193 citations


Journal ArticleDOI
TL;DR: An approach based on transformation and fractional polynomials, yielding simple regression models with interpretable curves, is proposed; applied to the Whitehall I data, it shows that non-linear risk models fit the data better than linear models.
Abstract: Background The traditional method of analysing continuous or ordinal risk factors by categorization or linear models may be improved. Methods We propose an approach based on transformation and fractional polynomials which yields simple regression models with interpretable curves. We suggest a way of presenting the results from such models which involves tabulating the risks estimated from the model at convenient values of the risk factor. We discuss how to incorporate several continuous risk and confounding variables within a single model. The approach is exemplified with data from the Whitehall I study of British Civil Servants. We discuss the approach in relation to categorization and non-parametric regression models. Results We show that non-linear risk models fit the data better than linear models. We discuss the difficulties introduced by categorization and the advantages of the new approach. Conclusions Our approach based on fractional polynomials should be considered as an important alternative to the traditional approaches for the analysis of continuous variables in epidemiological studies.
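As a rough illustration of the kind of model advocated here, the sketch below fits first-degree fractional polynomials (FP1) for a binary outcome by trying each power in the conventional FP set and keeping the fit with the smallest deviance; the variable names, the grid of "convenient values", and the use of statsmodels are assumptions, not the authors' algorithm.

```python
# Illustrative FP1 selection for a binary outcome: try each power transform of a
# positive risk factor x and keep the one with the smallest deviance.
import numpy as np
import statsmodels.api as sm

FP_POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]      # conventional fractional-polynomial set

def fp_transform(x, p):
    x = np.asarray(x, dtype=float)
    return np.log(x) if p == 0 else x**p         # p = 0 is defined as log(x)

def best_fp1(x, y):
    """Fit logit(risk) = b0 + b1 * x**p for each power p and return the best fit."""
    fits = {}
    for p in FP_POWERS:
        X = sm.add_constant(fp_transform(x, p))
        fits[p] = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    best_p = min(fits, key=lambda p: fits[p].deviance)
    return best_p, fits[best_p]

# Tabulating fitted risks at convenient values of the risk factor, as the paper suggests:
# best_p, fit = best_fp1(x, y)
# grid = np.array([100.0, 120.0, 140.0, 160.0, 180.0])   # hypothetical risk-factor values
# risks = fit.predict(sm.add_constant(fp_transform(grid, best_p)))
```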

956 citations


Journal ArticleDOI
TL;DR: In this article, a relatively simple analog method is described and applied for downscaling purposes, where the large scale circulation simulated by a GCM is associated with the local variables observed simultaneously with the most similar large-scale circulation pattern in a pool of historical observations.
Abstract: The derivation of local scale information from integrations of coarse-resolution general circulation models (GCM) with the help of statistical models fitted to present observations is generally referred to as statistical downscaling. In this paper, a relatively simple analog method is described and applied for downscaling purposes. According to this method, the large-scale circulation simulated by a GCM is associated with the local variables observed simultaneously with the most similar large-scale circulation pattern in a pool of historical observations. The similarity of the large-scale circulation patterns is defined in terms of their coordinates in the space spanned by the leading observed empirical orthogonal functions. The method can be checked by replicating the evolution of the local variables in an independent period. Its performance for monthly and daily winter rainfall in the Iberian Peninsula is compared to more complicated techniques, each belonging to one of the broad families of existing statistical downscaling techniques: a method based on canonical correlation analysis, as representative of linear methods; a method based on classification and regression trees, as representative of a weather generator based on classification methods; and a neural network, as an example of deterministic nonlinear methods. It is found in these applications that the analog method performs in general as well as the more complicated methods, and it can be applied to both normally and nonnormally distributed local variables. Furthermore, it produces the right level of variability of the local variable and preserves the spatial covariance between local variables. On the other hand, linear multivariate methods offer a clearer physical interpretation, which more strongly supports their validity in an altered climate. Classification methods and neural networks are generally more complicated and do not directly offer a physical interpretation.
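A schematic version of the analog method can be written in a few lines: compute leading EOFs from historical circulation fields, project both historical and GCM patterns onto them, and copy the local observation paired with the nearest historical analog. The sketch below (plain numpy, Euclidean distance in EOF space) is a hedged illustration; choices such as the number of EOFs and the distance metric are assumptions.

```python
# Schematic analog downscaling: project circulation fields onto leading EOFs and
# take the local observation paired with the nearest historical analog.
import numpy as np

def eof_basis(hist_fields, n_eofs=10):
    """Leading EOFs of the historical large-scale fields (time x gridpoints)."""
    anom = hist_fields - hist_fields.mean(axis=0)
    _, _, vt = np.linalg.svd(anom, full_matrices=False)
    return vt[:n_eofs], hist_fields.mean(axis=0)

def analog_downscale(gcm_fields, hist_fields, hist_local, n_eofs=10):
    """For each GCM circulation pattern, return the local variable observed with
    the most similar historical pattern (similarity measured in EOF space)."""
    eofs, mean = eof_basis(hist_fields, n_eofs)
    hist_scores = (hist_fields - mean) @ eofs.T       # coordinates of observed patterns
    gcm_scores = (gcm_fields - mean) @ eofs.T         # coordinates of GCM patterns
    out = np.empty((gcm_fields.shape[0],) + hist_local.shape[1:])
    for t, score in enumerate(gcm_scores):
        nearest = np.argmin(((hist_scores - score) ** 2).sum(axis=1))
        out[t] = hist_local[nearest]                  # e.g. daily rainfall at stations
    return out
```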

759 citations


Book
01 Jul 1999
TL;DR: This book is a hands-on introduction to the JMP statistical software, covering data tables and the formula calculator before moving on to statistical methods ranging from univariate distributions and analysis of variance to regression, linear models, design of experiments, quality control, and time series analysis.
Abstract: Part I: JMPing IN with both feet: 1. Jump Right In. First Session. Modelling Type. Analyze and Graph. Getting Help: The JMP Help System. 2. JMP Data Tables. The Ins and Outs of a JMP Data Table. Moving Data and Results Out of JMP. Juggling Data Tables. The Group/Summary Command. 3. Calculator Adventures. The Calculator Window. A Quick Example. Calculator Pieces and Parts. Terms Functions. Conditional Expressions and Comparison Operators. Summarize Down a Column or Summarize Across Rows. Random Number Functions. Parameters. Tips on Building Formulas. Caution and Error Messages. Part II: Statistical Sleuthing: 4. What are Statistics? Ponderings. Preparations. Statistical Terms. 5. Univariate Distribution: One Variable, One Sample. Looking at Distributions. Review: Probability Distributions. Describing Distributions of Values. Statistical Inference on the Mean. Special Topic: Testing for Normality. Special Topic: Simulating the Central Limit Theorem. 6. Differences Between Two Means. Two Independent Groups. Testing Means for Matched Pairs. Review. A Nonparametric Approach. 7. Comparing Many Means: One-Way Analysis of Variance. What is a One-Way Layout? Comparing and Testing Means. Special Topic: Adjusting for Multiple Comparisons. Special Topic: Power. Special Topic: Unequal Variances. Special Topic: Nonparametric Methods. 8. Fitting Curves Through Points: Regression. Regression. Why Graphics are Important. Why It's Called Regression. Curiosities. 9. Categorical Distributions. Categorical Situations. Categorical Responses and Count Data: Two Outlooks. A Simulated Categorical Response. The Pearson Chi-Square Test Statistic. The G-Square Likelihood Ratio Chi-Square Test Statistic. Univariate Categorical Chi-Square Tests. 10. Categorical Models. Fitting Categorical Responses to Categorical Factors. Correspondence Analysis: Looking at Data with Many Levels. Continuous Factors for Categorical Responses: Logistic Regression. Special Topics. Surprise: Simpson's Paradox: Aggregate Data versus Grouped Data. 11. Multiple Regression. Parts of a Regression Model. A Multiple Regression Example. Special Topic: Collinearity. Special Topic: The Case of the Hidden Leverage Point. Special Topic: Mining Data with Stepwise Regression. 12. Fitting Linear Models. The General Linear Model. Two-Way Analysis of Variance and Interactions. Optional Topic: Random Effects and Nested Effects. 13. Bivariate and Multivariate Relationships. Bivariate Distributions. Correlations and the Bivariate Normal. Three and More Dimensions. 14. Design of Experiments. Introduction. Generating an Experimental Design in JMP. Two-Level Screening Designs. Screening for Main Effects: The Flour Paste Experiment. Screening for Interactions. Response Surface Designs. 15. Statistical Quality Control. Control Charts and Shewhart Charts. The Control Chart Dialog. Pareto Charts. 16. Time Series Analysis. Introduction. Graphing and Fitting by Time. Lagging and Autocorrelation.

675 citations


Journal ArticleDOI
TL;DR: It is shown that such a one-step method cannot be optimal when different coefficient functions admit different degrees of smoothness, and this drawback can be repaired by using the proposed two-step estimation procedure.
Abstract: Varying coefficient models are a useful extension of classical linear models. They arise naturally when one wishes to examine how regression coefficients change over different groups characterized by certain covariates such as age. The appeal of these models is that the coefficient functions can easily be estimated via a simple local regression. This yields a simple one-step estimation procedure. We show that such a one-step method cannot be optimal when different coefficient functions admit different degrees of smoothness. This drawback can be repaired by using our proposed two-step estimation procedure. The asymptotic mean-squared error for the two-step procedure is obtained and is shown to achieve the optimal rate of convergence. A few simulation studies show that the gain by the two-step procedure can be quite substantial. The methodology is illustrated by an application to an environmental data set.
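For concreteness, here is a hedged sketch of the simple one-step estimator the abstract refers to: kernel-weighted least squares at each grid point for a model y = a(u) + b(u)x + error. The two-step refinement studied in the paper (re-estimating the smoother coefficient function with a different bandwidth) is not reproduced, and the Gaussian kernel and single bandwidth are assumptions.

```python
# One-step local linear estimate of varying coefficients a(u), b(u) in
# y = a(u) + b(u) * x + error, via kernel-weighted least squares at each grid point.
import numpy as np

def local_varying_coefficients(u, x, y, u_grid, bandwidth):
    """Return estimated coefficient functions evaluated on u_grid."""
    coefs = np.empty((len(u_grid), 2))
    X = np.column_stack([np.ones_like(x, dtype=float), x])   # intercept and slope design
    for i, u0 in enumerate(u_grid):
        w = np.exp(-0.5 * ((u - u0) / bandwidth) ** 2)        # Gaussian kernel weights
        XtW = X.T * w
        coefs[i] = np.linalg.solve(XtW @ X, XtW @ y)          # weighted least squares
    return coefs   # column 0: a(u_grid), column 1: b(u_grid)
```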

643 citations


Book
09 Feb 1999
TL;DR: A unified account of the most popular approaches to nonparametric regression smoothing can be found in this book, including boundary corrections for trigonometric series estimators, detailed asymptotics for polynomial regression, testing goodness-of-fit, estimation in partially linear models, practical aspects, problems and methods for co
Abstract: Provides a unified account of the most popular approaches to nonparametric regression smoothing. This edition contains discussions of boundary corrections for trigonometric series estimators; detailed asymptotics for polynomial regression; testing goodness-of-fit; estimation in partially linear models; practical aspects, problems and methods for co

625 citations


Journal ArticleDOI
TL;DR: In this article, an estimator of the regression operator based on a functional principal component analysis, analogous to the one introduced by Bosq for Hilbertian AR processes, is proposed, and both convergence in probability and almost sure convergence of this estimator are established.
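The summary is terse, so the sketch below shows only the generic functional-principal-component regression idea on discretized curves (scores from the leading eigenfunctions, least squares on the scores); it is not the authors' precise estimator or its convergence framework.

```python
# Functional linear regression via functional PCA on discretized predictor curves X(t).
import numpy as np

def fpca_regression(X_curves, y, n_components=4):
    """X_curves: (n_samples, n_gridpoints) discretized predictor curves."""
    mean_curve = X_curves.mean(axis=0)
    anom = X_curves - mean_curve
    _, s, vt = np.linalg.svd(anom, full_matrices=False)
    phi = vt[:n_components]                    # estimated eigenfunctions (discretized)
    scores = anom @ phi.T                      # principal component scores
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    beta_curve = phi.T @ coef[1:]              # discretized regression function beta(t)
    return coef[0], beta_curve, mean_curve     # prediction: coef[0] + (X - mean) @ beta_curve
```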

539 citations


Journal ArticleDOI
01 Jan 1999-Ecology
TL;DR: The authors proposed regression quantiles, which extend the concept of one-sample quantiles to the linear model by solving an optimization problem and provide estimates for linear models fit to any part of a response distribution, including near the upper bounds.
Abstract: In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem ...
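In practice, regression quantiles of this kind can be fitted with standard software; a minimal sketch using statsmodels' quantile regression is shown below, with illustrative column names ("abundance", "habitat") standing in for a response and a measured limiting factor.

```python
# Regression quantiles with statsmodels: fit the upper part of the response
# distribution (e.g. the 90th percentile) rather than the conditional mean.
import statsmodels.formula.api as smf

def upper_bound_fit(df, q=0.90):
    """df is a pandas DataFrame with columns 'abundance' (response) and 'habitat'
    (measured limiting factor); the column names are illustrative."""
    model = smf.quantreg("abundance ~ habitat", df)
    return model.fit(q=q)

# res = upper_bound_fit(df, q=0.90)
# print(res.params)   # slope near the upper bound of the response distribution
```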

510 citations


Journal ArticleDOI
TL;DR: A new approach to modeling the second part of two-part models is presented, utilizing extensions of the generalized linear model; maximum likelihood is the primary method of estimation, and the quasi-likelihood and extended quasi-likelihood generalizations are discussed.
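For orientation, a generic two-part model is sketched below: a logistic regression for the probability of a non-zero response and a gamma GLM with log link for the positive part. The paper's quasi-likelihood and extended quasi-likelihood extensions are not reproduced, and the statsmodels link spelling is an assumption about the installed version.

```python
# Generic two-part model: part 1 models P(y > 0) with logistic regression,
# part 2 models E[y | y > 0] with a gamma GLM and log link.
import statsmodels.api as sm

def fit_two_part(X, y):
    Xc = sm.add_constant(X)
    any_use = (y > 0).astype(float)
    part1 = sm.GLM(any_use, Xc, family=sm.families.Binomial()).fit()
    pos = y > 0
    part2 = sm.GLM(y[pos], Xc[pos],
                   family=sm.families.Gamma(link=sm.families.links.Log())).fit()
    # note: the log link is spelled links.log() in older statsmodels releases
    return part1, part2

def predict_mean(part1, part2, X_new):
    Xc = sm.add_constant(X_new)
    return part1.predict(Xc) * part2.predict(Xc)   # overall E[y]
```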

444 citations


Journal ArticleDOI
TL;DR: Minimum norm algorithms for EEG source reconstruction are studied in view of their spatial resolution, regularization, and lead-field normalization properties, and their computational efforts.
Abstract: Minimum norm algorithms for EEG source reconstruction are studied in view of their spatial resolution, regularization, and lead-field normalization properties, and their computational efforts. Two classes of minimum norm solutions are examined: linear least squares methods and nonlinear L1-norm approaches. Two special cases of linear algorithms, the well known Minimum Norm Least Squares and an implementation with Laplacian smoothness constraints, are compared to two nonlinear algorithms comprising sparse and standard L1-norm methods. In a signal-to-noise-ratio framework, two of the methods allow automatic determination of the optimum regularization parameter. Compensation methods for the different depth dependencies of all approaches by lead-field normalization are discussed. Simulations with tangentially and radially oriented test dipoles at two different noise levels are performed to reveal and compare the properties of all approaches. Finally, cortically constrained versions of the algorithms are applied to two epileptic spike data sets and compared to results of single equivalent dipole fits and spatiotemporal source models.
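As a point of reference for the linear least-squares class of methods discussed here, a Tikhonov-regularized minimum-norm estimate and a crude column-norm lead-field normalization can be written as follows; the specific depth weighting is illustrative and is not one of the paper's compensation schemes.

```python
# Tikhonov-regularized minimum-norm estimate of source amplitudes:
# given lead field L (sensors x sources) and data y (sensors), solve
# argmin ||y - L s||^2 + lam * ||s||^2.
import numpy as np

def minimum_norm(L, y, lam):
    n_sensors = L.shape[0]
    return L.T @ np.linalg.solve(L @ L.T + lam * np.eye(n_sensors), y)

def depth_normalized_minimum_norm(L, y, lam):
    """Crude lead-field (column-norm) normalization to reduce depth bias;
    the specific weighting is illustrative."""
    w = np.linalg.norm(L, axis=0)             # per-source lead-field norms
    s_w = minimum_norm(L / w, y, lam)         # solve with column-weighted lead field
    return s_w / w                            # back-transform to the original scaling
```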

423 citations


Journal ArticleDOI
TL;DR: The authors compared empirical type I error and power of different permutation techniques for the test of significance of a single partial regression coefficient in a multiple regression model, using simulations, and found that two methods that had been identified as equivalent formulations of permutation under the reduced model were actually quite different.
Abstract: This study compared empirical type I error and power of different permutation techniques for the test of significance of a single partial regression coefficient in a multiple regression model, using simulations. The methods compared were permutation of raw data values, two alternative methods proposed for permutation of residuals under the reduced model, and permutation of residuals under the full model. The normal-theory t-test was also included in simulations. We investigated effects of (1) the sample size, (2) the degree of collinearity between the predictor variables, (3) the size of the covariable’s parameter, (4) the distribution of the added random error and (5) the presence of an outlier in the covariable on these methods. We found that two methods that had been identified as equivalent formulations of permutation under the reduced model were actually quite different. One of these methods resulted in consistently inflated type 1 error. In addition, when the covariable contained an extreme outlier,...
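One of the compared strategies, permutation of residuals under the reduced model, can be sketched as follows for a single covariable z and a predictor of interest x; the use of the raw coefficient (rather than a t-statistic) as the permutation statistic is a simplification, not necessarily the statistic used in the study.

```python
# Permutation test for a single partial regression coefficient by permuting
# residuals under the reduced model.
import numpy as np

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def perm_test_partial_coef(x, z, y, n_perm=999, seed=0):
    """Test the coefficient of x in y ~ x + z, treating z as the covariable."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Z = np.column_stack([np.ones(n), z])            # reduced model: y ~ z
    X = np.column_stack([np.ones(n), z, x])         # full model: y ~ z + x
    stat_obs = abs(fit_ols(X, y)[-1])               # observed coefficient of x

    gamma = fit_ols(Z, y)                           # fit the reduced model
    fitted_red = Z @ gamma
    resid_red = y - fitted_red

    count = 0
    for _ in range(n_perm):
        y_star = fitted_red + rng.permutation(resid_red)   # permuted pseudo-data
        if abs(fit_ols(X, y_star)[-1]) >= stat_obs:
            count += 1
    return (count + 1) / (n_perm + 1)               # permutation p-value
```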

Journal ArticleDOI
TL;DR: In this article, a set of analytically derived significance tests allowing a null hypothesis of no spatial parameter drift to be investigated is introduced, and the degree of parameter smoothing used in GWR is determined based on the Mallows Cp statistic.
Abstract: The technique of geographically weighted regression (GWR) is used to model spatial 'drift' in linear model coefficients. In this paper we extend the ideas of GWR in a number of ways. First, we introduce a set of analytically derived significance tests allowing a null hypothesis of no spatial parameter drift to be investigated. Second, we discuss 'mixed' GWR models where some parameters are fixed globally but others vary geographically. Again, models of this type may be assessed using significance tests. Finally, we consider a means of deciding the degree of parameter smoothing used in GWR based on the Mallows Cp statistic. To complete the paper, we analyze an example data set based on house prices in Kent in the U.K. using the techniques introduced.
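The core of GWR is a separate weighted least-squares fit at each location; a bare-bones sketch with a Gaussian distance kernel is given below. Bandwidth selection (e.g. via the Mallows Cp criterion discussed in the paper) and the significance tests are not included, and the kernel choice is an assumption.

```python
# Schematic geographically weighted regression: a separate weighted least-squares
# fit at each location, with Gaussian kernel weights that decay with distance.
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """coords: (n, 2) locations; X: (n, p) design matrix including an intercept column.
    Returns an (n, p) array of locally estimated coefficients."""
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)        # geographical kernel weights
        XtW = X.T * w
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)   # local weighted least squares
    return betas
```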


Journal ArticleDOI
TL;DR: A generalized linear systems framework for PCA based on the singular value decomposition (SVD) model for representation of spatio-temporal fMRI data sets is presented and illustrated in the setting of dynamic time-series response data from fMRI experiments involving pharmacological stimulation of the dopaminergic nigro-striatal system in primates.
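The SVD model referred to here decomposes a scans-by-voxels data matrix into temporal and spatial modes; a minimal sketch (plain numpy, voxel-wise mean removal assumed) is:

```python
# PCA of a spatio-temporal data matrix via the SVD: rows are time points (scans),
# columns are voxels; temporal modes, singular values, and spatial modes fall out directly.
import numpy as np

def svd_pca(data):
    centered = data - data.mean(axis=0)                 # remove the voxel-wise mean
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    time_courses = U * s                                # component time courses
    spatial_maps = Vt                                   # component spatial patterns
    explained = s**2 / np.sum(s**2)                     # variance explained per component
    return time_courses, spatial_maps, explained
```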

Journal ArticleDOI
TL;DR: This tutorial provides an introduction to the hierarchical linear models technique in general terms, then specifies model notation and assumptions in detail, elaborates on model interpretation, and provides guidelines for model checking.
Abstract: Hierarchical linear models are useful for understanding relationships in hierarchical data structures, such as patients within hospitals or physicians within hospitals. In this tutorial we provide an introduction to the technique in general terms, and then specify model notation and assumptions in detail. We describe estimation techniques and hypothesis testing procedures for the three types of parameters involved in hierarchical linear models: fixed effects, covariance components, and random effects. We illustrate the application using an example from the Type II Diabetes Patient Outcomes Research Team (PORT) study and use two popular PC-based statistical computing packages, HLM/2L and SAS Proc Mixed, to perform two-level hierarchical analysis. We compare output from the two packages applied to our example data as well as to simulated data. We elaborate on model interpretation and provide guidelines for model checking.
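For readers using Python rather than HLM/2L or SAS Proc Mixed, an analogous two-level random-intercept fit can be obtained with statsmodels MixedLM, as in the hedged sketch below; the column names are illustrative.

```python
# Two-level hierarchical (random-intercept) model with statsmodels MixedLM:
# patients nested within hospitals, as in the tutorial's patient-outcome setting.
import statsmodels.formula.api as smf

def fit_two_level(df):
    """df is a pandas DataFrame with assumed columns 'outcome', 'age' (patient-level
    covariate), and 'hospital' (group identifier); column names are illustrative."""
    model = smf.mixedlm("outcome ~ age", df, groups=df["hospital"])
    return model.fit()   # fixed effects, variance components, and random effects

# res = fit_two_level(df)
# print(res.summary())
# print(res.random_effects)   # estimated hospital-level deviations
```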

Journal ArticleDOI
TL;DR: The new approach involves estimating the smoothness of standardized residual fields, which approximates the smoothness of the component fields of the associated t-field and eschews bias due to deviation from the null hypothesis.

Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of nine methods for estimating the parameters of the power-form model that expresses a flood quantile as a function of basin area, assessing each method by its ability to predict quantiles at an ungaged site in the region.
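The power-form model in question is Q = a * A^b. The simplest of the candidate estimation strategies, ordinary least squares on the log-log scale, is sketched below purely for orientation; it is only one of the nine methods the paper compares.

```python
# Regional power-form model Q = a * A**b fitted by ordinary least squares on the
# log-log scale (one of the simpler estimation strategies for such models).
import numpy as np

def fit_power_model(area, quantile):
    """area: basin areas; quantile: at-site flood quantile estimates."""
    X = np.column_stack([np.ones(len(area)), np.log(area)])
    coef, *_ = np.linalg.lstsq(X, np.log(quantile), rcond=None)
    a, b = np.exp(coef[0]), coef[1]
    return a, b

def predict_ungaged(a, b, area_new):
    return a * area_new**b     # predicted flood quantile for an ungaged basin
```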

Journal ArticleDOI
TL;DR: In this article, the authors describe a new empirical watershed model, the prime feature of which is its parsimony, which involves only three free parameters, a characteristic unparalleled by continuous process models able to work on a wide array of catchments.
Abstract: This paper describes a new empirical watershed model, the prime feature of which is its parsimony. It involves only three free parameters, a characteristic unparalleled by continuous process models able to work on a wide array of catchments. In spite of its crude simplicity, it achieved, on average, worthwhile results on a set of 140 French catchments and overwhelmingly outperformed a linear model involving 16 parameters. It performed roughly as well as a conceptual model with five free parameters, derived from the well-known TOPMODEL.

Book ChapterDOI
01 Jan 1999
TL;DR: Linear models form the core of classical statistics and are still the basis of much of statistical practice; many modern modelling and analytical techniques build on the methodology developed for linear models.

Journal ArticleDOI
TL;DR: In this article, the authors investigated four models for minimizing the tracking error between the returns of a portfolio and a benchmark, and showed that linear tracking error optimization is equivalent to expected utility maximization and lower partial moment minimization.
Abstract: This article investigates four models for minimizing the tracking error between the returns of a portfolio and a benchmark. Due to linear performance fees of fund managers, we can argue that linear deviations give a more accurate description of the investors’ risk attitude than squared deviations. All models have in common that absolute deviations are minimized instead of squared deviations as is the case for traditional optimization models. Linear programs are formulated to derive explicit solutions. The models are applied to a portfolio containing six national stock market indexes (USA, Japan, UK, Germany, France, Switzerland) and the tracking error with respect to the MSCI (Morgan Stanley Capital International Index) world stock market index is minimized. The results are compared to those of a quadratic tracking error optimization technique. The portfolio weights of the optimized portfolio and its risk/return properties are different across the models which implies that optimization models should be targeted to the specific investment objective. Finally, it is shown that linear tracking error optimization is equivalent to expected utility maximization and lower partial moment minimization.
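Minimizing absolute (rather than squared) deviations leads to a linear program; the sketch below sets up a standard mean-absolute-deviation tracking formulation with scipy's linprog, under assumed long-only, fully-invested constraints. It illustrates the general approach, not any one of the paper's four models.

```python
# Linear tracking-error minimization: choose portfolio weights w to minimize the
# mean absolute deviation between portfolio returns R @ w and benchmark returns b.
import numpy as np
from scipy.optimize import linprog

def min_abs_tracking_error(R, b):
    """R: (T, n) asset returns; b: (T,) benchmark returns. Long-only, fully invested."""
    T, n = R.shape
    # variables: [w_1..w_n, d_1..d_T] with d_t >= |(R w - b)_t|
    c = np.concatenate([np.zeros(n), np.ones(T) / T])
    A1 = np.hstack([R, -np.eye(T)])        # d_t >= (R w - b)_t   ->   R w - d <= b
    A2 = np.hstack([-R, -np.eye(T)])       # d_t >= -(R w - b)_t  ->  -R w - d <= -b
    A_ub = np.vstack([A1, A2])
    b_ub = np.concatenate([b, -b])
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, T))])   # weights sum to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (n + T)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.fun     # optimal weights and minimized mean absolute deviation
```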

Journal ArticleDOI
TL;DR: Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample sizes, target functions, and types of approximating functions, demonstrating the advantages of VC-based complexity control with finite samples.
Abstract: It is well known that for a given sample size there exists a model of optimal complexity corresponding to the smallest prediction (generalization) error. Hence, any method for learning from finite samples needs to have some provisions for complexity control. Existing implementations of complexity control include penalization (or regularization), weight decay (in neural networks), and various greedy procedures (aka constructive, growing, or pruning methods). There are numerous proposals for determining optimal model complexity (aka model selection) based on various (asymptotic) analytic estimates of the prediction risk and on resampling approaches. Nonasymptotic bounds on the prediction risk based on Vapnik-Chervonenkis (VC)-theory have been proposed by Vapnik. This paper describes application of VC-bounds to regression problems with the usual squared loss. An empirical study is performed for settings where the VC-bounds can be rigorously applied, i.e., linear models and penalized linear models where the VC-dimension can be accurately estimated, and the empirical risk can be reliably minimized. Empirical comparisons between model selection using VC-bounds and classical methods are performed for various noise levels, sample size, target functions and types of approximating functions. Our results demonstrate the advantages of VC-based complexity control with finite samples.
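A sketch of this style of complexity control is given below, using one commonly quoted form of the VC penalization factor for regression with squared loss and polynomial (linear-in-parameters) models; the exact constants and bound form used by the authors may differ, so treat the formula as an assumption.

```python
# Model selection by penalizing empirical risk with a VC-type factor.
# For a linear estimator with h free parameters fitted to n samples, one commonly
# quoted penalization of the mean squared residual is
#     R_est = R_emp / max(1 - sqrt(p - p*ln(p) + ln(n)/(2*n)), 0)   with p = h / n.
# (Treat this exact form as an assumption; consult the paper for its version.)
import numpy as np

def vc_penalized_risk(r_emp, h, n):
    p = h / n
    denom = 1.0 - np.sqrt(p - p * np.log(p) + np.log(n) / (2 * n))
    return np.inf if denom <= 0 else r_emp / denom

def select_polynomial_degree(x, y, max_degree=10):
    """Pick the polynomial degree minimizing the VC-penalized empirical risk."""
    n = len(y)
    best_degree, best_risk = None, np.inf
    for d in range(1, max_degree + 1):
        coef = np.polyfit(x, y, d)
        r_emp = np.mean((np.polyval(coef, x) - y) ** 2)   # empirical (training) risk
        h = d + 1                                         # free parameters as VC dimension
        est = vc_penalized_risk(r_emp, h, n)
        if est < best_risk:
            best_degree, best_risk = d, est
    return best_degree, best_risk
```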

Journal ArticleDOI
TL;DR: Issues in estimating population size N with capture-recapture models when there is variable catchability among subjects are examined and a logistic-normal mixed model is examined, for which the logit of the probability of capture is an additive function of a random subject and a fixed sampling occasion parameter.
Abstract: We examine issues in estimating population size N with capture-recapture models when there is variable catchability among subjects. We focus on a logistic-normal mixed model, for which the logit of the probability of capture is an additive function of a random subject parameter and a fixed sampling-occasion parameter. When the probability of capture is small or the degree of heterogeneity is large, the log-likelihood surface is relatively flat and it is difficult to obtain much information about N. We also discuss a latent class model and a log-linear model that account for heterogeneity and show that the log-linear model has greater scope. Models assuming homogeneity provide much narrower intervals for N but are usually overly optimistic, with the actual coverage probability much lower than the nominal level.

Journal ArticleDOI
TL;DR: The authors derived general expressions for the magnitude of the bias due to errors in the response and showed that, unless both the sensitivity and specificity are very high, ignoring errors in responses will yield highly biased covariate effect estimators.
Abstract: Methods that ignore errors in binary responses yield biased estimators of the associations of covariates with response. This paper derives general expressions for the magnitude of the bias due to errors in the response and shows that, unless both the sensitivity and specificity are very high, ignoring errors in the responses will yield highly biased covariate effect estimators. When the true, error-free response follows a generalised linear model and misclassification probabilities are known and independent of covariate values, responses observed with error also follow such a model with a modified link function. We describe a simple method to obtain consistent estimators of covariate effects and associated errors in this case, and derive an expression for the asymptotic relative efficiency of covariate effect estimators from the correct likelihood for the responses with errors with respect to estimates based on the true, error-free responses. This expression shows that errors in the response can lead to substantial losses of information about covariate effects. Data from a study on infection with human papilloma virus among women and simulation studies motivate this work and illustrate the findings.
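The modified-link observation can be turned into a direct likelihood: with known sensitivity and specificity, P(observed y = 1 | x) = (1 - specificity) + (sensitivity + specificity - 1) * P(true y = 1 | x). The sketch below maximizes this likelihood for a logistic true-response model; it is a generic illustration, not the authors' exact estimator.

```python
# Logistic regression when the binary response is observed with known sensitivity
# and specificity: P(observed y = 1 | x) = (1 - spec) + (sens + spec - 1) * expit(x @ beta).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_misclassified_logit(X, y_obs, sens, spec):
    """X: (n, p) design matrix including an intercept column; y_obs: observed 0/1 responses."""
    def neg_loglik(beta):
        p_true = expit(X @ beta)
        p_obs = (1 - spec) + (sens + spec - 1) * p_true    # modified link, known error rates
        p_obs = np.clip(p_obs, 1e-10, 1 - 1e-10)
        return -np.sum(y_obs * np.log(p_obs) + (1 - y_obs) * np.log(1 - p_obs))
    res = minimize(neg_loglik, x0=np.zeros(X.shape[1]), method="BFGS")
    return res.x     # covariate-effect estimates that account for response errors
```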

Journal ArticleDOI
TL;DR: In this paper, a review of anisotropic models of turbulent flows is presented, where the authors show that even with significant nonlinearity, many features of turbulence can, at least qualitatively, be understood using linear theory alone, e.g. the directionality of velocity fluctuations and correlation lengths induced by strong mean shear near a wall or straining by duct flow.
Abstract: Because of mean distortion, most turbulent flows are anisotropic. Two-point descriptions, forming the heart of this review of anisotropic models, capture the continuum of anisotropically structured turbulent scales and, moreover, allow exact treatment of the linear terms representing mean distortion, only needing closure assumptions for the nonlinear part of the model. The rapid-distortion limit, in which nonlinear terms are neglected, is the main subject of Section 2, while Section 3 introduces nonlinearity. It is shown that, even with significant nonlinearity, many features of turbulence can, at least qualitatively, be understood using linear theory alone, e.g. the directionality of velocity fluctuations and correlation lengths induced by strong mean shear near a wall or straining by duct flow, whereas some, e.g. wave resonances in rotating turbulence, involve a subtle combination of linear and nonlinear terms. The importance of linear effects is reflected in the triadic models of Section 3, ...

Journal ArticleDOI
TL;DR: A fuzzy linear regression model that uses symmetric triangular coefficients is extended to one with non-symmetric fuzzy triangular coefficients, eliminating the inflexibility of existing fuzzy linear regression models.

Book
01 Jan 1999
TL;DR: This book covers the elements of statistical inference, including statistical models, prior distributions, point and interval estimation, hypothesis testing, prediction, and both Bayesian and classical estimation of linear models, along with analytical approximations and simulation methods.
Abstract: Introduction Information The concept of probability Assessing subjective probabilities An example Linear algebra and probability Notation Outline of the book Elements of Inference Common statistical models Likelihood-based functions Bayes theorem Exchangeability Sufficiency and exponential family Parameter elimination Prior Distribution Entirely subjective specification Specification through functional forms Conjugacy with the exponential family Non-informative priors Hierarchical priors Estimation Introduction to decision theory Bayesian point estimation Classical point estimation Empirical Bayes estimation Comparison of estimators Interval estimation Estimation in the Normal model Approximating Methods The general problem of inference Optimization techniques Asymptotic theory Other analytical approximations Numerical integration methods Simulation methods Hypothesis Testing Introduction Classical hypothesis testing Bayesian hypothesis testing Hypothesis testing and confidence intervals Asymptotic tests Prediction Bayesian prediction Classical prediction Prediction in the Normal model Linear prediction Introduction to Linear Models The linear model Classical estimation of linear models Bayesian estimation of linear models Hierarchical linear models Dynamic linear models Linear models with constraints Sketched Solutions to Selected Exercises List of Distributions References Index Exercises appear at the end of each chapter.

Journal ArticleDOI
TL;DR: An alternative method based on artificial neural networks (ANNs) is described for building a model for the analysis of vehicular accidents in Milan; the degree of danger of urban intersections under different scenarios is quantified by the ANN model.

Journal ArticleDOI
TL;DR: A new computing technique, feasible in Jacobi and conjugate gradient based iterative methods using iteration on data, is presented; its good performance is due to fast computing time per iteration and quick convergence to the final solutions.

Journal ArticleDOI
TL;DR: The method of weights is an implementation of the EM algorithm for general maximum-likelihood analysis of regression models, including generalized linear models (GLMs) with incomplete covariates.
Abstract: Missing data is a common occurrence in most medical research data collection enterprises. There is an extensive literature concerning missing data, much of which has focused on missing outcomes. Covariates in regression models are often missing, particularly if information is being collected from multiple sources. The method of weights is an implementation of the EM algorithm for general maximum-likelihood analysis of regression models, including generalized linear models (GLMs) with incomplete covariates. In this paper, we will describe the method of weights in detail, illustrate its application with several examples, discuss its advantages and limitations, and review extensions and applications of the method.

Journal ArticleDOI
TL;DR: In this article, the authors show that even though the Liang-Zeger approach in many situations yields consistent estimators for the regression parameters, these estimators are usually inefficient as compared to the regression estimators obtained by using the independence estimating equations approach.
Abstract: Liang & Zeger (1986) introduced a generalised estimating equations approach based on a 'working' correlation matrix to obtain consistent and efficient estimators of regression parameters in the class of generalised linear models for repeated measures data. As demonstrated by Crowder (1995), because of the uncertainty of definition of the working correlation matrix, the Liang-Zeger approach may in some cases lead to a complete breakdown of the estimation of the regression parameters. In this paper we show that, even though the Liang-Zeger approach in many situations yields consistent estimators for the regression parameters, these estimators are usually inefficient as compared to the regression estimators obtained by using the independence estimating equations approach.