
Showing papers on "Linear model published in 2005"


Book ChapterDOI
01 Jan 2005
TL;DR: This chapter starts with the simplest replicated designs and progresses through experiments with two or more groups, direct designs, factorial designs and time course experiments with technical as well as biological replication.
Abstract: A survey is given of differential expression analyses using the linear modeling features of the limma package. The chapter starts with the simplest replicated designs and progresses through experiments with two or more groups, direct designs, factorial designs and time course experiments. Experiments with technical as well as biological replication are considered. Empirical Bayes test statistics are explained. The use of quality weights, adaptive background correction and control spots in conjunction with linear modelling is illustrated on the β7 data.
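
As a hedged illustration of the workflow this chapter surveys, the sketch below runs a two-group limma analysis on simulated data; the expression matrix, group labels, and contrast are illustrative stand-ins, not the β7 data.

```r
# Minimal limma sketch on simulated data (not the beta7 data): genewise linear
# models followed by empirical Bayes moderated t-statistics.
library(limma)

set.seed(1)
exprs <- matrix(rnorm(600 * 6), nrow = 600, ncol = 6)  # 600 genes, 6 arrays
group <- factor(rep(c("WT", "Mut"), each = 3), levels = c("WT", "Mut"))
design <- model.matrix(~ group)       # intercept + Mut-vs-WT coefficient

fit <- lmFit(exprs, design)           # fit one linear model per gene
fit <- eBayes(fit)                    # empirical Bayes moderated statistics
topTable(fit, coef = 2, number = 5)   # top-ranked genes for the comparison
```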

5,920 citations


Journal ArticleDOI
TL;DR: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them; this greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes.
Abstract: Motivation: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. The usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This results in the loss of valuable information about genewise variability. Results: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. Availability: The methodology is implemented in the limma software package for R, available from the CRAN repository http://www.r-project.org Contact: [email protected]
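
A short sketch of the common-correlation method using the limma functions named in the abstract; the data are simulated duplicate spots, and the single-column design simply tests the mean log-ratio.

```r
# Hedged sketch: 100 probes printed in duplicate (ndups = 2) on 4 arrays.
library(limma)

set.seed(2)
M <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)  # duplicates in adjacent rows
design <- matrix(1, nrow = 4, ncol = 1)            # test the mean log-ratio

corfit <- duplicateCorrelation(M, design, ndups = 2, spacing = 1)
fit <- lmFit(M, design, ndups = 2, spacing = 1,
             correlation = corfit$consensus.correlation)  # common correlation
fit <- eBayes(fit)                       # optional EB moderation of variances
```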

1,384 citations


Book
20 Dec 2005
TL;DR: The second edition of Extending the Linear Model with R provides a comprehensive overview of extensions to the linear model, including generalized linear models (GLMs), mixed effect models, and nonparametric regression models.
Abstract: Start Analyzing a Wide Range of Problems Since the publication of the bestselling, highly recommended first edition, R has considerably expanded both in popularity and in the number of packages available. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, Second Edition takes advantage of the greater functionality now available in R and substantially revises and adds several topics. New to the Second Edition Expanded coverage of binary and binomial responses, including proportion responses, quasibinomial and beta regression, and applied considerations regarding these models New sections on Poisson models with dispersion, zero inflated count models, linear discriminant analysis, and sandwich and robust estimation for generalized linear models (GLMs) Revised chapters on random effects and repeated measures that reflect changes in the lme4 package and show how to perform hypothesis testing for the models using other methods New chapter on the Bayesian analysis of mixed effect models that illustrates the use of STAN and presents the approximation method of INLA Revised chapter on generalized linear mixed models to reflect the much richer choice of fitting software now available Updated coverage of splines and confidence bands in the chapter on nonparametric regression New material on random forests for regression and classification Revamped R code throughout, particularly the many plots using the ggplot2 package Revised and expanded exercises with solutions now included Demonstrates the Interplay of Theory and Practice This textbook continues to cover a range of techniques that grow from the linear regression model. It presents three extensions to the linear framework: GLMs, mixed effect models, and nonparametric regression models. The book explains data analysis using real examples and includes all the R commands necessary to reproduce the analyses.

1,289 citations


Journal ArticleDOI
TL;DR: This paper uses a stagewise fitting process to construct logistic regression models that can select relevant attributes in the data in a natural way, and shows how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree.
Abstract: Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into `model trees', i.e. trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea for classification problems, using logistic regression instead of linear regression. We use a stagewise fitting process to construct the logistic regression models that can select relevant attributes in the data in a natural way, and show how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers.
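
The following is a simplified sketch of the tree-with-logistic-leaves idea on simulated data, not the authors' stagewise fitting algorithm; rpart and the leaf-wise glm() refits are stand-ins for illustration.

```r
# Simplified sketch (not the authors' stagewise process): grow a shallow
# classification tree, then fit a logistic regression in each leaf.
library(rpart)

set.seed(3)
n <- 400
x1 <- runif(n); x2 <- runif(n)
y <- rbinom(n, 1, plogis(-2 + 3 * x1 + 2 * (x2 > 0.5)))
d <- data.frame(x1, x2, y = factor(y))

tree <- rpart(y ~ x1 + x2, data = d, method = "class",
              control = rpart.control(maxdepth = 2))
d$leaf <- factor(tree$where)                     # leaf membership per observation
leaf_fits <- lapply(split(d, d$leaf), function(s)
  glm(y ~ x1 + x2, data = s, family = binomial)) # one logistic model per leaf
```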

1,200 citations


Journal ArticleDOI
TL;DR: This algorithm provides an objective and noise-resistant method for quantification of qRT-PCR results that is independent of the specific equipment used to perform PCR reactions.
Abstract: Quantitative real-time polymerase chain reactions (qRT-PCR) have become the method of choice for rapid, sensitive, quantitative comparison of RNA transcript abundance. Useful data from this method depend on fitting data to theoretical curves that allow computation of mRNA levels. Calculating accurate mRNA levels requires important parameters such as reaction efficiency and the fractional cycle number at threshold (CT) to be used; however, many algorithms currently in use estimate these important parameters. Here we describe an objective method for quantifying qRT-PCR results using calculations based on the kinetics of individual PCR reactions, without the need for a standard curve and independent of any assumptions or subjective judgments, which allows direct calculation of efficiency and CT. We use a four-parameter logistic model to fit the raw fluorescence data as a function of PCR cycles to identify the exponential phase of the reaction. Next, we use a three-parameter simple exponent model to fit the exponential phase using an iterative nonlinear regression algorithm. Within the exponential portion of the curve, our technique automatically identifies candidate regression values using the P-value of regression and then uses a weighted average to compute a final efficiency for quantification. For CT determination, we chose the first positive second derivative maximum from the logistic model. This algorithm provides an objective and noise-resistant method for quantification of qRT-PCR results that is independent of the specific equipment used to perform PCR reactions.
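
A hedged sketch of the two-stage curve-fitting idea on simulated fluorescence data: fit a four-parameter logistic and take the first positive second-derivative maximum as CT. The self-starting SSfpl model and the numeric derivative are stand-ins for the paper's exact procedure.

```r
# Fit a four-parameter logistic to simulated qPCR fluorescence, then estimate
# CT from the maximum of the numeric second derivative.
set.seed(4)
cycles <- 1:40
fluor <- SSfpl(cycles, A = 0.1, B = 10, xmid = 22, scal = 2) +
  rnorm(40, sd = 0.05)
dat <- data.frame(cycles, fluor)

fit <- nls(fluor ~ SSfpl(cycles, A, B, xmid, scal), data = dat)

grid <- seq(5, 35, by = 0.01)
curve_hat <- predict(fit, newdata = data.frame(cycles = grid))
d2 <- diff(curve_hat, differences = 2) / 0.01^2  # numeric second derivative
ct <- grid[which.max(d2) + 1]                    # approximate CT
```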

1,186 citations


Book
01 Jan 2005
TL;DR: This book reviews basic statistics with SPSS and presents methods ranging from measures of reliability and exploratory factor analysis to multiple regression, factorial and repeated measures ANOVAs, MANOVA, and multilevel linear modeling.
Abstract: Introduction and Review of Basic Statistics With SPSS. Data Coding and Exploratory Analysis (EDA). Several Measures of Reliability. Exploratory Factor Analysis and Principal Components Analysis. Selecting and Interpreting Inferential Statistics. Multiple Regression. Logistic Regression and Discriminant Analysis. Factorial ANOVA and ANCOVA. Repeated Measures and Mixed ANOVAs. Multivariate Analysis of Variance (MANOVA) and Canonical Correlation. Multilevel Linear Modeling/Hierarchical Linear Modeling. Appendices.

986 citations


Journal ArticleDOI
TL;DR: This investigation proposes a hybrid methodology that exploits the unique strength of the ARIMA model and the SVMs model in forecasting stock price problems, and the results of computational tests are very promising.
Abstract: Traditionally, the autoregressive integrated moving average (ARIMA) model has been one of the most widely used linear models in time series forecasting. However, the ARIMA model cannot easily capture nonlinear patterns. Support vector machines (SVMs), a novel neural network technique, have been successfully applied in solving nonlinear regression estimation problems. Therefore, this investigation proposes a hybrid methodology that exploits the unique strengths of the ARIMA model and the SVMs model in forecasting stock price problems. Real data sets of stock prices were used to examine the forecasting accuracy of the proposed model. The results of computational tests are very promising.
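
A hedged sketch of the hybrid idea on simulated data: ARIMA captures the linear structure, and an SVM regressor is trained on lagged residuals to capture what is left. The lag order and kernel defaults are illustrative choices, not the paper's settings.

```r
# ARIMA for the linear part, SVM (e1071) on lagged residuals for the rest.
library(e1071)

set.seed(5)
y <- as.numeric(arima.sim(list(ar = 0.7), n = 300)) + 0.3 * sin((1:300) / 5)
lin <- arima(y, order = c(1, 0, 0))
res <- as.numeric(residuals(lin))

k <- 3                                   # illustrative lag order
Z <- embed(res, k + 1)                   # column 1 = current residual, rest = lags
svm_fit <- svm(x = Z[, -1], y = Z[, 1])

lin_fc <- predict(lin, n.ahead = 1)$pred           # linear forecast
nl_fc  <- predict(svm_fit, t(rev(tail(res, k))))   # residual forecast
hybrid <- as.numeric(lin_fc) + as.numeric(nl_fc)   # combined one-step forecast
```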

847 citations


Book
John Geweke
14 Sep 2005
TL;DR: This book develops the elements of Bayesian inference and posterior simulation, including Markov chain Monte Carlo methods, and applies them to linear models, models with latent variables, and time series.
Abstract: Preface. 1. Introduction. 1.1 Two Examples. 1.1.1 Public School Class Sizes. 1.1.2 Value at Risk. 1.2 Observables, Unobservables, and Objects of Interest. 1.3 Conditioning and Updating. 1.4 Simulators. 1.5 Modeling. 1.6 Decisionmaking. 2. Elements of Bayesian Inference. 2.1 Basics. 2.2 Sufficiency, Ancillarity, and Nuisance Parameters. 2.2.1 Sufficiency. 2.2.2 Ancillarity. 2.2.3 Nuisance Parameters. 2.3 Conjugate Prior Distributions. 2.4 Bayesian Decision Theory and Point Estimation. 2.5 Credible Sets. 2.6 Model Comparison. 2.6.1 Marginal Likelihoods. 2.6.2 Predictive Densities. 3. Topics in Bayesian Inference. 3.1 Hierarchical Priors and Latent Variables. 3.2 Improper Prior Distributions. 3.3 Prior Robustness and the Density Ratio Class. 3.4 Asymptotic Analysis. 3.5 The Likelihood Principle. 4. Posterior Simulation. 4.1 Direct Sampling. 4.2 Acceptance and Importance Sampling. 4.2.1 Acceptance Sampling. 4.2.2 Importance Sampling. 4.3 Markov Chain Monte Carlo. 4.3.1 The Gibbs Sampler. 4.3.2 The Metropolis-Hastings Algorithm. 4.4 Variance Reduction. 4.4.1 Concentrated Expectations. 4.4.2 Antithetic Sampling. 4.5 Some Continuous State Space Markov Chain Theory. 4.5.1 Convergence of the Gibbs Sampler. 4.5.2 Convergence of the Metropolis-Hastings Algorithm. 4.6 Hybrid Markov Chain Monte Carlo Methods. 4.6.1 Transition Mixtures. 4.6.2 Metropolis within Gibbs. 4.7 Numerical Accuracy and Convergence in Markov Chain Monte Carlo. 5. Linear Models. 5.1 BACC and the Normal Linear Regression Model. 5.2 Seemingly Unrelated Regressions Models. 5.3 Linear Constraints in the Linear Model. 5.3.1 Linear Inequality Constraints. 5.3.2 Conjectured Linear Restrictions, Linear Inequality Constraints, and Covariate Selection. 5.4 Nonlinear Regression. 5.4.1 Nonlinear Regression with Smoothness Priors. 5.4.2 Nonlinear Regression with Basis Functions. 6. Modeling with Latent Variables. 6.1 Censored Normal Linear Models. 6.2 Probit Linear Models. 6.3 The Independent Finite State Model. 6.4 Modeling with Mixtures of Normal Distributions. 6.4.1 The Independent Student-t Linear Model. 6.4.2 Normal Mixture Linear Models. 6.4.3 Generalizing the Observable Outcomes. 7. Modeling for Time Series. 7.1 Linear Models with Serial Correlation. 7.2 The First-Order Markov Finite State Model. 7.2.1 Inference in the Nonstationary Model. 7.2.2 Inference in the Stationary Model. 7.3 Markov Normal Mixture Linear Model. 8. Bayesian Investigation. 8.1 Implementing Simulation Methods. 8.1.1 Density Ratio Tests. 8.1.2 Joint Distribution Tests. 8.2 Formal Model Comparison. 8.2.1 Bayes Factors for Modeling with Common Likelihoods. 8.2.2 Marginal Likelihood Approximation Using Importance Sampling. 8.2.3 Marginal Likelihood Approximation Using Gibbs Sampling. 8.2.4 Density Ratio Marginal Likelihood Approximation. 8.3 Model Specification. 8.3.1 Prior Predictive Analysis. 8.3.2 Posterior Predictive Analysis. 8.4 Bayesian Communication. 8.5 Density Ratio Robustness Bounds. Bibliography. Author Index. Subject Index.
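
In the spirit of Chapters 4 and 5, here is a minimal Gibbs sampler sketch for the normal linear regression model with conjugate priors; the prior settings and simulated data are illustrative assumptions, and the BACC software itself is not used.

```r
# Gibbs sampler for y = X beta + e with a normal prior on beta and an
# inverse-gamma prior on the error variance.
set.seed(6)
n <- 100
X <- cbind(1, rnorm(n))
y <- X %*% c(1, 2) + rnorm(n, sd = 0.5)

B0 <- diag(1e-2, 2)            # prior precision for beta (prior mean zero)
a0 <- 1; b0 <- 1               # inverse-gamma prior for sigma^2
beta <- c(0, 0); sig2 <- 1
XtX <- crossprod(X); Xty <- crossprod(X, y)
draws <- matrix(NA_real_, 1000, 3)

for (s in 1:1000) {
  V <- solve(B0 + XtX / sig2)                 # conditional posterior covariance
  m <- V %*% (Xty / sig2)                     # conditional posterior mean
  beta <- as.numeric(m + t(chol(V)) %*% rnorm(2))
  rss <- sum((y - X %*% beta)^2)
  sig2 <- 1 / rgamma(1, a0 + n / 2, b0 + rss / 2)
  draws[s, ] <- c(beta, sig2)
}
colMeans(draws[-(1:200), ])    # posterior means after burn-in
```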

665 citations


Journal ArticleDOI
Leif Engqvist
TL;DR: In this paper, the basic assumptions underlying one-factor ANCOVA are discussed; the same general problem applies to all linear models with one or more covariates, including generalized linear models (GLMs) such as logistic regression, and even survival analysis.
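
A small hedged sketch of the point at issue: the homogeneity-of-slopes assumption of one-factor ANCOVA can be checked by testing the group-by-covariate interaction.

```r
# Simulated data with heterogeneous slopes; a significant group:x term
# signals that the standard ANCOVA model is misspecified.
set.seed(7)
d <- data.frame(group = gl(2, 30), x = rnorm(60))
d$y <- 1 + 0.5 * d$x + ifelse(d$group == "2", 0.4 * d$x, 0) + rnorm(60)
anova(lm(y ~ group * x, data = d))   # inspect the group:x interaction row
```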

628 citations


Journal ArticleDOI
TL;DR: In this paper, a profile least-squares technique was proposed for estimating the parametric component and the asymptotic normality of the profile least squares estimator was studied.
Abstract: Varying-coefficient partially linear models are frequently used in statistical modelling, but their estimation and inference have not been systematically studied. This paper proposes a profile least-squares technique for estimating the parametric component and studies the asymptotic normality of the profile least-squares estimator. The main focus is the examination of whether the generalized likelihood technique developed by Fan et al. is applicable to the testing problem for the parametric component of semiparametric models. We introduce the profile likelihood ratio test and demonstrate that it follows an asymptotically χ2 distribution under the null hypothesis. This not only unveils a new Wilks type of phenomenon, but also provides a simple and useful method for semiparametric inferences. In addition, the Wald statistic for semiparametric models is introduced and demonstrated to possess a sampling property similar to the profile likelihood ratio statistic. A new and simple bandwidth selection technique is proposed for semiparametric inferences on partially linear models and numerical examples are presented to illustrate the proposed methods.
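
As a hedged sketch only, a varying-coefficient partially linear model can be fit with penalized splines in mgcv; this is a stand-in for the paper's profile least-squares estimator, not the proposed method itself.

```r
# y = 2*x + f(u)*z + error: parametric component x, varying coefficient on z.
library(mgcv)

set.seed(8)
n <- 300
u <- runif(n); x <- rnorm(n); z <- rnorm(n)
y <- 2 * x + sin(2 * pi * u) * z + rnorm(n, sd = 0.3)

fit <- gam(y ~ x + s(u, by = z))     # s(u, by = z) is the varying coefficient
summary(fit)$p.coeff["x"]            # estimate of the parametric component
```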

603 citations


Journal ArticleDOI
TL;DR: This paper describes a simple set of "recipes" for the analysis of high spatial density EEG, and demonstrates how corresponding algorithms can be used to remove eye-motion artifacts, extract strong evoked responses, and decompose temporally overlapping components.

Journal ArticleDOI
TL;DR: Locally weighted projection regression is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Abstract: Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of—possibly redundant—inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
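
The core local-model idea, in a hedged one-dimensional sketch (not the full LWPR algorithm with its projection directions and incremental updates): fit a kernel-weighted linear regression around each query point.

```r
# Locally weighted linear regression with a Gaussian kernel.
local_linear <- function(x, y, x0, h = 0.2) {
  w <- exp(-(x - x0)^2 / (2 * h^2))          # weights centred at the query x0
  fit <- lm(y ~ x, weights = w)
  predict(fit, newdata = data.frame(x = x0))
}

set.seed(9)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.1)
yhat <- sapply(seq(0, 1, by = 0.1), function(x0) local_linear(x, y, x0))
```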

Journal ArticleDOI
TL;DR: The conditional Akaike information criterion (CAIC) is proposed for both maximum likelihood and residual maximum likelihood estimation of linear mixed-effects models in the analysis of clustered data; its penalty term is related to the effective degrees of freedom ρ for a linear mixed model proposed by Hodges & Sargent (2001), where ρ reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects.
Abstract: This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, CAIC. The penalty term in CAIC is related to the effective degrees of freedom ρ for a linear mixed model proposed by Hodges & Sargent (2001); ρ reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The CAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection.
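
A rough, hedged sketch of the conditional AIC idea for a random-intercept model: penalize the conditional log-likelihood with the effective degrees of freedom ρ (the trace of the mixed-model hat matrix). This ignores the paper's refinements beyond the extra term for the error variance.

```r
# Conditional AIC sketch using lme4's sleepstudy data.
library(lme4)

fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
rho <- sum(hatvalues(fit))                    # effective degrees of freedom
condLL <- sum(dnorm(sleepstudy$Reaction, fitted(fit),
                    sigma(fit), log = TRUE))  # conditional log-likelihood
cAIC <- -2 * condLL + 2 * (rho + 1)           # +1 for the error variance
```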

Journal ArticleDOI
TL;DR: In this paper, a generalized functional linear regression model for a regression situation where the response variable is a scalar and the predictor is a random function is proposed, where a linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function and the expected value of the response is related to this linear predictor via a link function.
Abstract: We propose a generalized functional linear regression model for a regression situation where the response variable is a scalar and the predictor is a random function. A linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function, and the expected value of the response is related to this linear predictor via a link function. If, in addition, a variance function is specified, this leads to a functional estimating equation which corresponds to maximizing a functional quasi-likelihood. This general approach includes the special cases of the functional linear model, as well as functional Poisson regression and functional binomial regression. The latter leads to procedures for classification and discrimination of stochastic processes and functional data. We also consider the situation where the link and variance functions are unknown and are estimated nonparametrically from the data, using a semiparametric quasi-likelihood procedure. An essential step in our proposal is dimension reduction by approximating the predictor processes with a truncated Karhunen-Loeve expansion. We develop asymptotic inference for the proposed class of generalized regression models. In the proposed asymptotic approach, the truncation parameter increases with sample size, and a martingale central limit theorem is applied to establish the resulting increasing dimension asymptotics. We establish asymptotic normality for a properly scaled distance between estimated and true functions that corresponds to a suitable L 2 metric and is defined through a generalized covariance operator. As a consequence, we obtain asymptotic tests and simultaneous confidence bands for the parameter function that determines the model. The proposed estimation, inference and classification procedures and variants with unknown link and variance functions are investigated in a simulation study. We find that the practical selection of the number of components works well with the AIC criterion, and this finding is supported by theoretical considerations. We include an application to the classification of medflies regarding their remaining longevity status, based on the observed initial egg-laying curve for each of 534 female medflies.
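
A hedged sketch of the truncation step on simulated curves: expand each predictor function in a few principal components (a truncated Karhunen-Loeve basis) and fit a binomial GLM on the scores. The truncation level and simulated basis are illustrative assumptions.

```r
# Functional binomial regression via principal component scores.
set.seed(10)
n <- 200
t <- seq(0, 1, length.out = 50)
curves <- outer(rnorm(n), sin(2 * pi * t)) + outer(rnorm(n), cos(2 * pi * t)) +
  matrix(rnorm(n * 50, sd = 0.1), n)

scores <- prcomp(curves)$x[, 1:3]             # truncated Karhunen-Loeve scores
eta <- 0.8 * scores[, 1] - 0.5 * scores[, 2]  # true linear predictor
ybin <- rbinom(n, 1, plogis(eta))             # binary response (e.g. class label)
fit <- glm(ybin ~ scores, family = binomial)
```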

Book
01 Jan 2005
TL;DR: This book surveys the art of statistical modeling, covering linear and nonlinear regression, variance models, weighting and transformations, and linear and nonlinear mixed effects models, with accompanying case studies.
Abstract: The Art of Modeling.- Linear Models and Regression.- Nonlinear Models and Regression.- Variance Models, Weighting, and Transformations.- Case Studies in Linear and Nonlinear Modeling.- Linear Mixed Effects Models.- Nonlinear Mixed Effects Models: Theory.- Nonlinear Mixed Effects Models: Practical Issues.- Nonlinear Mixed Effects Models: Case Studies.- Appendix.- References.- Index.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: This work considers the problem of multi-task learning, that is, learning multiple related functions, and presents a hierarchical Bayesian framework that exploits the equivalence between parametric linear models and nonparametric Gaussian processes.
Abstract: We consider the problem of multi-task learning, that is, learning multiple related functions. Our approach is based on a hierarchical Bayesian framework, that exploits the equivalence between parametric linear models and nonparametric Gaussian processes (GPs). The resulting models can be learned easily via an EM-algorithm. Empirical studies on multi-label text categorization suggest that the presented models allow accurate solutions of these multi-task problems.
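
A small sketch of the equivalence the paper exploits, under assumed unit prior variance and noise variance 0.04: a GP with linear kernel K = XXᵀ gives the same predictive mean as Bayesian ridge regression.

```r
# GP with linear kernel versus Bayesian ridge: identical predictions.
set.seed(11)
n <- 50
X <- matrix(rnorm(n * 3), n)
y <- X %*% c(1, -1, 0.5) + rnorm(n, sd = 0.2)

K <- tcrossprod(X)                       # linear kernel K = X X'
alpha <- solve(K + 0.04 * diag(n), y)
xstar <- matrix(rnorm(3), 1)
gp_pred <- xstar %*% t(X) %*% alpha      # GP predictive mean

beta <- solve(crossprod(X) + 0.04 * diag(3), crossprod(X, y))
lin_pred <- xstar %*% beta               # equal to gp_pred up to numerics
```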

Journal ArticleDOI
TL;DR: This paper reviews the use of more specialized regression methods for twin data, based on generalized least squares or linear mixed models, and explains the relationship between these methods and the commonly used approach of analysing within-twin-pair difference values.
Abstract: Twin studies have long been recognized for their value in learning about the aetiology of disease and specifically for their potential for separating genetic effects from environmental effects. The recent upsurge of interest in life-course epidemiology and the study of developmental influences on later health has provided a new impetus to study twins as a source of unique insights. Twins are of special interest because they provide naturally matched pairs where the confounding effects of a large number of potentially causal factors (such as maternal nutrition or gestation length) may be removed by comparisons between twins who share them. The traditional tool of epidemiological 'risk factor analysis' is the regression model, but it is not straightforward to transfer standard regression methods to twin data, because the analysis needs to reflect the paired structure of the data, which induces correlation between twins. This paper reviews the use of more specialized regression methods for twin data, based on generalized least squares or linear mixed models, and explains the relationship between these methods and the commonly used approach of analysing within-twin-pair difference values. Methods and issues of interpretation are illustrated using an example from a recent study of the association between birth weight and cord blood erythropoietin. We focus on the analysis of continuous outcome measures but review additional complexities that arise with binary outcomes. We recommend the use of a general model that includes separate regression coefficients for within-twin-pair and between-pair effects, and provide guidelines for the interpretation of estimates obtained under this model.
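
A hedged sketch of the recommended model on simulated twin pairs: separate coefficients for the between-pair component (the pair mean of the exposure) and the within-pair component (the deviation from the pair mean), with a random intercept per pair.

```r
# Within-pair and between-pair regression coefficients for paired data.
library(lme4)

set.seed(12)
pair <- gl(100, 2)                        # 100 twin pairs
x <- rnorm(200)
xbar <- ave(x, pair)                      # between-pair component
xdev <- x - xbar                          # within-pair component
y <- 0.5 * xbar + 1.0 * xdev + rnorm(100)[pair] + rnorm(200, sd = 0.5)

fit <- lmer(y ~ xbar + xdev + (1 | pair))
fixef(fit)                                # between- and within-pair effects
```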

Reference EntryDOI
15 Oct 2005
TL;DR: Some formulas are given to obtain insight into the design aspects that are most influential for standard errors and power in multilevel designs.
Abstract: Sample size determination in multilevel designs requires attention to the fact that statistical power depends on the total sample sizes for each level. It is usually desirable to have as many units as possible at the top level of the multilevel hierarchy. Some formulas are given to obtain insight into the design aspects that are most influential for standard errors and power. Keywords: power; statistical tests; design; multilevel analysis; sample size; multisite trial; cluster randomization
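
In the spirit of the formulas referred to above, a hedged sketch under equal allocation: the standard error of a treatment effect in a two-arm cluster-randomized design scales with the design effect 1 + (m - 1)ρ, so adding clusters usually helps more than enlarging them.

```r
# Standard error of a two-arm cluster-randomized treatment effect,
# assuming total variance sigma2 and intraclass correlation rho.
se_cluster <- function(n_clusters, m, rho, sigma2 = 1) {
  deff <- 1 + (m - 1) * rho               # design effect
  sqrt(4 * sigma2 * deff / (n_clusters * m))
}

se_cluster(n_clusters = 40, m = 20, rho = 0.05)  # many small clusters
se_cluster(n_clusters = 20, m = 40, rho = 0.05)  # few large clusters: larger SE
```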

Journal ArticleDOI
TL;DR: An effective linear matrix inequality approach is developed to solve the neuron state estimation problem for neural networks with time-varying delays and can be easily extended to cope with the traditional stability analysis problem for delayed neural networks.
Abstract: In this letter, the state estimation problem is studied for neural networks with time-varying delays. The interconnection matrix and the activation functions are assumed to be norm-bounded. The problem addressed is to estimate the neuron states, through available output measurements, such that for all admissible time-delays, the dynamics of the estimation error is globally exponentially stable. An effective linear matrix inequality approach is developed to solve the neuron state estimation problem. In particular, we derive the conditions for the existence of the desired estimators for the delayed neural networks. We also parameterize the explicit expression of the set of desired estimators in terms of linear matrix inequalities (LMIs). Finally, it is shown that the main results can be easily extended to cope with the traditional stability analysis problem for delayed neural networks. Numerical examples are included to illustrate the applicability of the proposed design method.
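
As a hedged illustration of the Lyapunov machinery that underlies such LMI conditions (not the paper's delayed-network estimator), the sketch below solves AᵀP + PA = -Q by vectorization and checks that P is positive definite.

```r
# Lyapunov stability certificate for dx/dt = A x: solve A'P + P A = -Q.
A <- matrix(c(-2, 1, 0, -3), 2, 2)            # a stable system matrix
Q <- diag(2)
n <- nrow(A)

M <- diag(n) %x% t(A) + t(A) %x% diag(n)      # vec(A'P + P A) = M vec(P)
P <- matrix(solve(M, -as.vector(Q)), n, n)
all(eigen(P)$values > 0)                      # TRUE: stability certified
```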

Journal ArticleDOI
TL;DR: This paper introduces an information criterion for model selection based on composite likelihood, and describes applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful dataset.
Abstract: A composite likelihood consists of a combination of valid likelihood objects, usually related to small subsets of data. The merit of composite likelihood is to reduce the computational complexity so that it is possible to deal with large datasets and very complex models, even when the use of standard likelihood or Bayesian methods is not feasible. In this paper, we aim to suggest an integrated, general approach to inference and model selection using composite likelihood methods. In particular, we introduce an information criterion for model selection based on composite likelihood. We also describe applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful geyser dataset.
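
A hedged sketch of a pairwise composite likelihood: estimate a common correlation in an exchangeable multivariate normal by summing bivariate log-likelihoods over all coordinate pairs. The mvtnorm density and the exchangeable structure are illustrative assumptions, not the paper's applications.

```r
# Pairwise composite likelihood for a common correlation.
library(mvtnorm)

set.seed(13)
d <- 5; n <- 200; rho_true <- 0.4
S <- matrix(rho_true, d, d); diag(S) <- 1
Y <- rmvnorm(n, sigma = S)

pair_ll <- function(rho) {
  S2 <- matrix(c(1, rho, rho, 1), 2)
  ll <- 0
  for (j in 1:(d - 1)) for (k in (j + 1):d)
    ll <- ll + sum(dmvnorm(Y[, c(j, k)], sigma = S2, log = TRUE))
  ll
}
optimize(pair_ll, c(-0.9, 0.9), maximum = TRUE)$maximum  # close to rho_true
```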

Journal ArticleDOI
TL;DR: This analysis focuses on the simplest and most widely used implementation of linear combiners, which consists of assigning a nonnegative weight to each individual classifier, and considers the ideal performance of this combining rule, i.e., that achievable when the optimal values of the weights are used.
Abstract: In this paper, a theoretical and experimental analysis of linear combiners for multiple classifier systems is presented. Although linear combiners are the most frequently used combining rules, many important issues related to their operation for pattern classification tasks lack a theoretical basis. After a critical review of the framework developed in works by Turner and Ghosh [1996], [1999] on which our analysis is based, we focus on the simplest and most widely used implementation of linear combiners, which consists of assigning a nonnegative weight to each individual classifier. Moreover, we consider the ideal performance of this combining rule, i.e., that achievable when the optimal values of the weights are used. We do not consider the problem of weights estimation, which has been addressed in the literature. Our theoretical analysis shows how the performance of linear combiners, in terms of misclassification probability, depends on the performance of individual classifiers, and on the correlation between their outputs. In particular, we evaluate the ideal performance improvement that can be achieved using the weighted average over the simple average combining rule and investigate in what way it depends on the individual classifiers. Experimental results on real data sets show that the behavior of linear combiners agrees with the predictions of our analytical model. Finally, we discuss the contribution to the state of the art and the practical relevance of our theoretical and experimental analysis of linear combiners for multiple classifier systems.
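
The combining rule under study, in a minimal sketch with simulated classifier scores and fixed nonnegative weights (weight estimation is deliberately out of scope here, as in the paper).

```r
# Weighted average of three correlated classifier scores.
set.seed(14)
n <- 500
truth <- rbinom(n, 1, 0.5)
s1 <- truth + rnorm(n, sd = 1.0)   # scores from three individual classifiers
s2 <- truth + rnorm(n, sd = 1.2)
s3 <- truth + rnorm(n, sd = 1.4)

w <- c(0.5, 0.3, 0.2)              # nonnegative weights, one per classifier
combined <- w[1] * s1 + w[2] * s2 + w[3] * s3
mean((combined > 0.5) == truth)    # compare with mean((s1 > 0.5) == truth)
```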

Journal ArticleDOI
Tae-Kyun Kim, J. Kittler
TL;DR: A novel gradient-based learning algorithm is proposed for finding the optimal set of local linear bases for multiclass nonlinear discrimination and it is computationally highly efficient as compared to GDA.
Abstract: We present a novel method of nonlinear discriminant analysis involving a set of locally linear transformations called "Locally Linear Discriminant Analysis" (LLDA). The underlying idea is that global nonlinear data structures are locally linear and local structures can be linearly aligned. Input vectors are projected into each local feature space by linear transformations found to yield locally linearly transformed classes that maximize the between-class covariance while minimizing the within-class covariance. In face recognition, linear discriminant analysis (LDA) has been widely adopted owing to its efficiency, but it does not capture nonlinear manifolds of faces which exhibit pose variations. Conventional nonlinear classification methods based on kernels such as generalized discriminant analysis (GDA) and support vector machine (SVM) have been developed to overcome the shortcomings of the linear method, but they have the drawback of high computational cost of classification and overfitting. Our method is for multiclass nonlinear discrimination and it is computationally highly efficient as compared to GDA. The method does not suffer from overfitting by virtue of the linear base structure of the solution. A novel gradient-based learning algorithm is proposed for finding the optimal set of local linear bases. The optimization does not exhibit a local-maxima problem. The transformation functions facilitate robust face recognition in a low-dimensional subspace, under pose variations, using a single model image. The classification results are given for both synthetic and real face data.
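
The following conveys only the underlying intuition ("globally nonlinear, locally linear"), not the authors' LLDA algorithm or its gradient-based optimization: partition the input space and apply a separate linear discriminant in each local region.

```r
# Two concentric rings (globally nonlinear), handled by per-region LDA.
library(MASS)

set.seed(15)
theta <- runif(400, 0, 2 * pi)
cls <- factor(rep(1:2, each = 200))
r <- ifelse(cls == 1, 1, 2) + rnorm(400, sd = 0.1)
X <- cbind(r * cos(theta), r * sin(theta))

region <- cut(atan2(X[, 2], X[, 1]), breaks = 4)       # four angular sectors
fits <- lapply(levels(region), function(k)
  lda(X[region == k, ], grouping = cls[region == k]))  # one LDA per region
```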

Journal ArticleDOI
TL;DR: The author makes amply clear the cases where R2 is not to be used, such as a clustered scatterplot where R2 is virtually determined by a single point, making a fitted line unreliable if not meaningless.
Abstract: ingenious uses and revealing examples of scatterplot matrices and smoothers like loess smooth mean function estimates. I find the titles and the contents of Chapter 4 (“Drawing Conclusions”), Chapter 9 (“Outliers and Influence”), and Chapter 10 (“Variable Selection”) user-friendly and helpful for a practitioner who is not necessarily a statistician by keeping away from the jargon of statistics thus easing the use of statistics. These are among the hot issues that can confront practitioners, and the userfriendly style will help make applied statistics amenable to them. A prominent supplementary feature of the book is the rich and extensive data and the datasets that have been made available via the Internet. The datasets are used throughout the book without interrupting the flow of the text’s presentation. This intelligent use of the Internet will make it possible for students, applied statisticians, and practitioners to readily relate the concepts to actual statistical data drawn from a wide spectrum of real sources without making the book voluminous or requiring time-consuming manual entry of the data. Some of these data have historical flavors, like the height inheritance traits data from Pearson and Lee (1903). All of the datasets used to illustrate examples throughout the book or needed for the exercise problems are made available on-line at the webpage created for the book, http://www.stat.umn.edu/alr. The website also accommodates an errata section for the typos in this edition. Many regression books inadvertently tempt practitioners to make inappropriate use of R2 as a “useful” summary of regression. This author makes amply clear the cases where R2 is not to be used, such as in the case of a clustered scatterplot where R2 is virtually determined by a point, making a line unreliable if not meaningless. It may seem odd to have Chapters 11 (“Nonlinear Regression”) and Chapter 12 (“Logistic Regression”) in a linear regression book, but I was pleased to see their inclusion. This is because many practitioners often seek solutions to their problems in the linear regression arena as the natural and most popular initial candidate. However, the analysis or the nature of the data may force them to consider nonlinear models or logistic regression, as in the case of categorical data. In these cases readers and practitioners may not need to consult another specialized book, because of the thoughtful coverage provided in this text. The book is intended to be focused on methodology, and hence no extensive discussion is directly provided about computer packages and graphical software, even though computation and simulation are used throughout. This is facilitated by the very intelligent use of the Internet as an optional supplement to the book using the previously mentioned URL. There the readers can also learn how to use many popular statistical packages in applying the examples of the book. Supported with on-line R and S–PLUS libraries, one finds these and the website materials invaluable.

Journal ArticleDOI
TL;DR: The authors examined the forecast accuracy of linear autoregressive, smooth transition autoregressive (STAR), and neural network (NN) time series models for 47 macroeconomic variables of the G7 economies.

Journal ArticleDOI
TL;DR: A new method to detect and adjust for noise and artifacts in functional MRI time series data is presented, which results in significantly increased sensitivity in the ability to detect regions of activation.

Book
01 Jan 2005
TL;DR: In this book, the authors present an overview of statistical methods for forecasting, covering regression analysis, time series regression, decomposition methods, exponential smoothing, and Box-Jenkins seasonal modeling.
Abstract: Part I: INTRODUCTION AND REVIEW OF BASIC STATISTICS. 1. An Introduction to Forecasting. Forecasting and Data. Forecasting Methods. Errors in Forecasting. Choosing a Forecasting Technique. An Overview of Quantitative Forecasting Techniques. 2. Basic Statistical Concepts. Populations. Probability. Random Samples and Sample Statistics. Continuous Probability Distributions. The Normal Probability Distribution. The t-Distribution, the F-Distribution, the Chi-Square Distribution. Confidence Intervals for a Population Mean. Hypothesis Testing for a Population Mean. Exercises. Part II: REGRESSION ANALYSIS. 3. Simple Linear Regression. The Simple Linear Regression Model. The Least Squares Point Estimates. Point Estimates and Point Predictions. Model Assumptions and the Standard Error. Testing the Significance of the Slope and y Intercept. Confidence and Prediction Intervals. Simple Coefficients of Determination and Correlation. An F Test for the Model. Exercises. 4. Multiple Linear Regression. The Linear Regression Model. The Least Squares Estimates, and Point Estimation and Prediction. The Mean Square Error and the Standard Error. Model Utility: R2, Adjusted R2, and the Overall F Test. Testing the Significance of an Independent Variable. Confidence and Prediction Intervals. The Quadratic Regression Model. Interaction. Using Dummy Variables to Model Qualitative Independent Variables. The Partial F Test: Testing the Significance of a Portion of a Regression Model. Exercises. 5. Model Building and Residual Analysis. Model Building and the Effects of Multicollinearity. Residual Analysis in Simple Regression. Residual Analysis in Multiple Regression. Diagnostics for Detecting Outlying and Influential Observations. Exercises. Part III: TIME SERIES REGRESSION, DECOMPOSITION METHODS, AND EXPONENTIAL SMOOTHING. 6. Time Series Regression. Modeling Trend by Using Polynomial Functions. Detecting Autocorrelation. Types of Seasonal Variation. Modeling Seasonal Variation by Using Dummy Variables and Trigonometric Functions. Growth Curves. Handling First-Order Autocorrelation. Exercises. 7. Decomposition Methods. Multiplicative Decomposition. Additive Decomposition. The X-12-ARIMA Seasonal Adjustment Method. Exercises. 8. Exponential Smoothing. Simple Exponential Smoothing. Tracking Signals. Holt's Trend Corrected Exponential Smoothing. Holt-Winters Methods. Damped Trends and Other Exponential Smoothing Methods. Models for Exponential Smoothing and Prediction Intervals. Exercises. Part IV: THE BOX-JENKINS METHODOLOGY. 9. Nonseasonal Box-Jenkins Models and Their Tentative Identification. Stationary and Nonstationary Time Series. The Sample Autocorrelation and Partial Autocorrelation Functions: The SAC and SPAC. An Introduction to Nonseasonal Modeling and Forecasting. Tentative Identification of Nonseasonal Box-Jenkins Models. Exercises. 10. Estimation, Diagnostic Checking, and Forecasting for Nonseasonal Box-Jenkins Models. Estimation. Diagnostic Checking. Forecasting. A Case Study. Box-Jenkins Implementation of Exponential Smoothing. Exercises. 11. Box-Jenkins Seasonal Modeling. Transforming a Seasonal Time Series into a Stationary Time Series. Three Examples of Seasonal Modeling and Forecasting. Box-Jenkins Error Term Models in Time Series Regression. Exercises. 12. Advanced Box-Jenkins Modeling. The General Seasonal Model and Guidelines for Tentative Identification. Intervention Models. A Procedure for Building a Transfer Function Model. Exercises.
Appendix A: Statistical Tables. Appendix B: Matrix Algebra for Regression Calculations. Matrices and Vectors. The Transpose of a Matrix. Sums and Differences of Matrices. Matrix Multiplication. The Identity Matrix. Linear Dependence and Linear Independence. The Inverse of a Matrix. The Least Squares Point Estimates. The Unexplained Variation and Explained Variation. The Standard Error of the Estimate b. The Distance Value. Using Squared Terms. Using Interaction Terms. Using Dummy Variables. The Standard Error of the Estimate of a Linear Combination of Regression Parameters. Exercises. Appendix C: References.
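
As a worked complement to Appendix B, the least squares point estimates in matrix form, b = (X'X)^{-1} X'y, verified against lm() on a built-in dataset:

```r
# Matrix-algebra least squares versus lm().
X <- cbind(1, mtcars$wt)                    # design matrix with intercept
y <- mtcars$mpg
b <- solve(crossprod(X), crossprod(X, y))   # b = (X'X)^{-1} X'y
all.equal(as.numeric(b), as.numeric(coef(lm(mpg ~ wt, data = mtcars))))
```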

Journal ArticleDOI
TL;DR: In this paper, the relationship between chlorophyll-a (CHL-a) and 16 chemical, physical, and biological water quality variables in the Camlidere reservoir (Ankara, Turkey) was studied by using principal component scores (PCS) in multiple linear regression analysis (MLR) to predict CHL-a levels.
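
A hedged sketch of the approach described, with simulated stand-ins for the 16 water-quality variables: compute principal component scores and use the leading scores as regressors.

```r
# Principal component scores as predictors in multiple linear regression.
set.seed(16)
n <- 120
W <- matrix(rnorm(n * 16), n)                 # 16 standardized variables
chl <- as.numeric(2 + W %*% rnorm(16, sd = 0.3) + rnorm(n))

pcs <- prcomp(W, scale. = TRUE)$x[, 1:4]      # leading component scores
fit <- lm(chl ~ pcs)
summary(fit)$r.squared
```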

Journal ArticleDOI
TL;DR: The basic concepts for obtaining a posteriori error estimates for the finite element solution of an elliptic linear model problem are reviewed; it is concluded that the error estimation techniques used in practice do not provide mathematically proven bounds on the error and need to be used with care.

Posted Content
TL;DR: In this article, a number of often applied nonlinear conditional mean models are introduced and their main properties discussed, and some empirical studies that compare forecasts from linear and nonlinear models are reviewed.
Abstract: This article is concerned with forecasting from nonlinear conditional mean models. First, a number of often applied nonlinear conditional mean models are introduced and their main properties discussed. The next section is devoted to techniques of building nonlinear models. Ways of computing multi-step ahead forecasts from nonlinear models are surveyed. Tests of forecast accuracy in the case where the models generating the forecasts are nested are discussed. There is a numerical example, showing that even when a stationary nonlinear process generates the observations, future observations may in some situations be better forecast by a linear model with a unit root. Finally, some empirical studies that compare forecasts from linear and nonlinear models are discussed.
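
One of the techniques surveyed, in a hedged sketch: multi-step forecasts from a nonlinear conditional mean model computed by Monte Carlo simulation of future paths (the threshold-type mean function here is an illustrative assumption).

```r
# Monte Carlo multi-step forecasts from a nonlinear conditional mean.
set.seed(17)
g <- function(x) 0.8 * x * (x < 0) + 0.2 * x * (x >= 0)  # threshold-type mean

y_last <- 1.5                      # last observed value
H <- 5; R <- 5000                  # horizon and number of simulated paths
paths <- replicate(R, {
  x <- y_last
  out <- numeric(H)
  for (h in 1:H) { x <- g(x) + rnorm(1); out[h] <- x }
  out
})
rowMeans(paths)                    # h-step-ahead forecasts, h = 1..5
```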