
Showing papers on "Unit-weighted regression" published in 2007


Book
06 Sep 2007
TL;DR: The book defines robustness and robust regression, examines the implications of unusual cases for OLS estimates and standard errors, and surveys robust regression estimators for the linear model (L-, R-, M-, GM-, S-, generalized S-, and MM-estimators), comparing the various estimators and revisiting diagnostics with robust-regression-based methods for detecting outliers.
Abstract (contents): List of Figures. List of Tables. Series Editor's Introduction. Acknowledgments.
1. Introduction: Defining Robustness. Defining Robust Regression. A Real-World Example: Coital Frequency of Married Couples in the 1970s.
2. Important Background: Bias and Consistency. Breakdown Point. Influence Function. Relative Efficiency. Measures of Location. Measures of Scale. M-Estimation. Comparing Various Estimates. Notes.
3. Robustness, Resistance, and Ordinary Least Squares Regression: Ordinary Least Squares Regression. Implications of Unusual Cases for OLS Estimates and Standard Errors. Detecting Problematic Observations in OLS Regression. Notes.
4. Robust Regression for the Linear Model: L-Estimators. R-Estimators. M-Estimators. GM-Estimators. S-Estimators. Generalized S-Estimators. MM-Estimators. Comparing the Various Estimators. Diagnostics Revisited: Robust Regression-Related Methods for Detecting Outliers. Notes.
5. Standard Errors for Robust Regression: Asymptotic Standard Errors for Robust Regression Estimators. Bootstrapped Standard Errors. Notes.
6. Influential Cases in Generalized Linear Models: The Generalized Linear Model. Detecting Unusual Cases in Generalized Linear Models. Robust Generalized Linear Models. Notes.
7. Conclusions.
Appendix: Software Considerations for Robust Regression. References. Index. About the Author.

322 citations


Proceedings ArticleDOI
10 Apr 2007
TL;DR: A Bayesian way of dealing with outlier-infested sensory data is introduced and a "black box" approach to removing outliers in real-time and expressing confidence in the estimated data is developed.
Abstract: In order to achieve reliable autonomous control in advanced robotic systems like entertainment robots, assistive robots, humanoid robots and autonomous vehicles, sensory data needs to be absolutely reliable, or some measure of reliability must be available. Bayesian statistics can offer favorable ways of accomplishing such robust sensory data pre-processing. In this paper, we introduce a Bayesian way of dealing with outlier-infested sensory data and develop a "black box" approach to removing outliers in real time and expressing confidence in the estimated data. We develop our approach in the framework of Bayesian linear regression with heteroscedastic noise. Essentially, every measured data point is assumed to have its individual variance, and the final estimate is achieved by a weighted regression over the observed data. An expectation-maximization algorithm allows us to estimate the variance of each data point in an incremental algorithm. With the exception of a time horizon (window size) over which the estimation process is averaged, no open parameters need to be tuned, and no special assumption about the generative structure of the data is required. The algorithm works efficiently in real time. We evaluate our method on synthetic data and on a pose estimation problem of a quadruped robot, demonstrating its ease of use, its competitiveness with well-tuned alternative algorithms, and its advantages in terms of robust outlier removal.
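
The per-point-variance idea translates directly into code. Below is a minimal batch sketch of the approach (the paper's algorithm is incremental with a sliding window, and the function name and the inverse-gamma prior parameters a0, b0 are illustrative assumptions, not taken from the paper): the E-step re-estimates each point's precision from its residual, and the M-step refits by weighted least squares.

```python
import numpy as np

def em_weighted_regression(X, y, a0=1.0, b0=1.0, n_iter=50, tol=1e-8):
    """Robust weighted regression via EM (batch sketch).

    Each observation is assumed to carry its own noise variance with an
    inverse-gamma prior IG(a0, b0).  The E-step replaces each precision
    1/s_i^2 by its posterior mean given the current residual; the M-step
    refits beta by weighted least squares.  Outliers end up with large
    estimated variance and therefore negligible weight.
    """
    n, d = X.shape
    w = np.ones(n)                                  # current precisions
    beta = np.zeros(d)
    for _ in range(n_iter):
        WX = X * w[:, None]                         # M-step: WLS refit
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)
        r = y - X @ beta_new                        # E-step: update precisions
        w = (a0 + 0.5) / (b0 + 0.5 * r**2)
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, w

# Toy usage: a clean linear trend with a few gross outliers injected.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=200)
y[:10] += 5.0
beta_hat, weights = em_weighted_regression(X, y)    # outliers get tiny weights
```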

50 citations


Book
01 Jan 2007
TL;DR: A textbook treatment of simple and multiple linear regression, including special problems in simple linear regression such as determining x from y, serial correlation, and curve fitting.
Abstract (contents): Basic Statistical Concepts. Simple Linear Regression. Special Problems in Simple Linear Regression: Determining x from y, Serial Correlation, and Curve Fitting. Some Aspects and Examples in Constructing a Valid Simple Regression Study. Multiple Linear Regression. Correlation in Multiple Regression. Issues in Multiple Linear Regression. Polynomial Regression. Special Topics in Multiple Regression. Indicator (Dummy) Variable Regression. Model Building/Model Selection. Analysis of Covariance. Logistic Regression. Appendices.

9 citations


Journal ArticleDOI
TL;DR: The idea behind regression weighting is to include in the regression model all the variables and interactions that are related to the outcome values and that affect the sample selection and the response probabilities, so that the sampling and response mechanisms become ignorable under the model fitted to the observed data.
Abstract: This is an intriguing paper that raises important questions, and I feel privileged to have been invited to discuss it. The paper deals with a very basic problem of sample surveys: how to weight the survey data in order to estimate finite population quantities of interest like means, differences of means or regression coefficients. The paper focuses for the most part on the common estimator of a population mean, $\bar{y}_w = \sum_{i=1}^{n} w_i y_i / \sum_{i=1}^{n} w_i$, and discusses different approaches to constructing the weights by use of linear regression models. These models vary in terms of the number and nature of the regressors in the model and in the assumptions regarding the regression coefficients, whether fixed or random with prespecified distributions. The idea behind regression weighting is to include in the regression model all the variables and interactions that are related to the outcome values and affect the sample selection and the response probabilities, such that the sampling and response mechanisms are ignorable in the sense that the model fitted to the observed data is …
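
For concreteness, here is how the estimator and one common regression-based weighting scheme look in code. This is a minimal sketch: calibrated_weights implements generic GREG-style linear calibration as an illustration, not any of the specific weighting models the paper compares, and the function names are assumptions.

```python
import numpy as np

def weighted_mean(y, w):
    """The common survey estimator of a population mean:
    y_bar_w = sum_i(w_i * y_i) / sum_i(w_i)."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return np.sum(w * y) / np.sum(w)

def calibrated_weights(Z, t_z, w_base):
    """GREG-style linear calibration (illustrative): tilt the base
    design weights so the weighted sample totals of the auxiliary
    variables Z hit their known population totals t_z.  The adjusted
    weights have the form w_i * (1 + z_i' lambda)."""
    Z, w = np.asarray(Z, float), np.asarray(w_base, float)
    A = (w[:, None] * Z).T @ Z              # sum_i w_i z_i z_i'
    lam = np.linalg.solve(A, t_z - Z.T @ w)
    return w * (1.0 + Z @ lam)
```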

6 citations


Journal ArticleDOI
TL;DR: The authors used regression methods to predict the expected monthly return on stocks and the covariance matrix of returns, the predictor variables being a company's "fundamentals" such as dividend yield and the history of previous returns.
Abstract: We use regression methods to predict the expected monthly return on stocks and the covariance matrix of returns, the predictor variables being a company's ‘fundamentals’, such as dividend yield and the history of previous returns. Predictions are evaluated out of sample for shares traded on the London Stock Exchange from 1976 to 2005. We explore and evaluate many modelling and inferential approaches, including the use of weighted regression, discounted regression, shrinkage of regression coefficients and the transformation to normality of predictor variables. We also investigate alternative covariance matrix models, such as a two-index model and a shrinkage model. Using suitable statistics to enable the out-of-sample performance of competing methodologies to be compared is crucial, and we develop some new statistics and a graphical aid for this purpose. What is original in this paper is an evaluation of many modelling and inferential procedures for which conflicting claims have been made in the literature and the development of new measures of portfolio performance.
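
Among the approaches compared, discounted regression is the most direct to illustrate: past observations are downweighted geometrically with age before an ordinary weighted least-squares fit. A minimal sketch follows (the discount factor and the function name are illustrative assumptions):

```python
import numpy as np

def discounted_regression(X, y, delta=0.99):
    """Discounted least squares: an observation t periods older than the
    most recent one receives weight delta**t, so recent months dominate
    the fit.  Rows of X and y must be in time order, oldest first."""
    n = len(y)
    w = delta ** np.arange(n - 1, -1, -1)   # oldest row gets delta**(n-1)
    WX = X * w[:, None]
    return np.linalg.solve(X.T @ WX, WX.T @ y)
```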

5 citations


Journal ArticleDOI
TL;DR: This paper demonstrates that asymptotic estimates of standard errors provided by multiple regression are not always accurate, and a resampling permutation procedure is used to estimate the standard errors.
Abstract: In the vast majority of psychological research utilizing multiple regression analysis, asymptotic probability values are reported. This paper demonstrates that asymptotic estimates of standard errors provided by multiple regression are not always accurate. A resampling permutation procedure is used to estimate the standard errors. In some cases the results differ substantially from the traditional least squares regression estimates.
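
A resampling scheme of this kind is straightforward to sketch. The version below resamples cases with replacement (a nonparametric bootstrap), which is one standard way to obtain empirical standard errors; it is an illustration, not necessarily the exact permutation procedure the paper uses.

```python
import numpy as np

def resampled_standard_errors(X, y, n_boot=2000, seed=0):
    """Empirical standard errors for regression coefficients obtained by
    resampling cases with replacement and refitting.  The spread of the
    refitted coefficients can then be compared with the asymptotic
    standard errors reported by ordinary least squares."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample rows
        betas[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return betas.std(axis=0, ddof=1)
```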

4 citations



06 Dec 2007
TL;DR: This article analyses the stability of parameter coefficient estimates for Geographically Weighted Regression (GWR) models and concludes that results from GWR must be carefully considered in terms of the form of the data, the assumed coefficient surface being modelled, and the confidence of the resulting parameter estimates.
Abstract: This paper describes preliminary work analysing the stability of parameter coefficient estimates for Geographically Weighted Regression (GWR). Based on a large dataset (35,721 points), various random samplings of this data were performed and models built using GWR. An analysis of the coefficient values for the independent variables showed that these values could vary significantly both between runs and between sampling sizes. This suggests that the results from GWR must be carefully considered in terms of the form of the data, the assumed coefficient surface being modelled, and the confidence of the resulting parameter estimates.
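
At a single target location, a GWR fit is just kernel-weighted least squares, which makes the paper's experiment easy to picture: refit something like the sketch below on random subsamples and examine how much the returned coefficients move. This is a minimal sketch with a Gaussian kernel; the function name and a fixed bandwidth are assumptions.

```python
import numpy as np

def gwr_coefficients(coords, X, y, target, bandwidth):
    """Local GWR coefficients at one target location: a Gaussian kernel
    on geographic distance downweights far-away observations, then
    weighted least squares gives the location-specific coefficients."""
    d = np.linalg.norm(coords - target, axis=1)     # distances to target
    w = np.exp(-0.5 * (d / bandwidth) ** 2)         # Gaussian kernel weights
    WX = X * w[:, None]
    return np.linalg.solve(X.T @ WX, WX.T @ y)
```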

3 citations


Proceedings ArticleDOI
11 May 2007
TL;DR: A recursive method is presented for estimating a parameterised form of the cross-correlation between the regression model errors, the variance of these errors, and the regression model parameters.
Abstract: The use of the generalised least squares (GLS) technique for estimation of hydrological regression models has become good practice in hydrology. Through a regression model, a simple link between a particular hydrological variable and a set of catchment descriptors can be established. The regression residuals can be treated as the sum of sampling errors in the hydrological variable and errors in the regression model. This paper presents a recursive method for estimating a parameterised form of the cross-correlation between the regression model errors, the variance of these errors and the regression model parameters. A re-weighted set of regression residuals can be defined such that the covariance of these residuals is essentially similar to that of the model error. The cross products of the re-weighted regression residuals, pooled within bins, can be used to identify a structure and to fit a parameterised form for the cross-correlations of the regression errors. The procedure has been tested successfully on annual maximum flow data from 602 catchments located throughout the UK.
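
The GLS step at the heart of such schemes is standard, and a minimal sketch is shown below. It covers only the estimation of the regression parameters for a given residual covariance; the paper's contribution, the recursive estimation of a parameterised form for that covariance from binned cross products of re-weighted residuals, is not reproduced here.

```python
import numpy as np

def gls(X, y, Sigma):
    """Generalised least squares: with residual covariance Sigma
    (sampling-error covariance plus model-error covariance),
    beta_hat = (X' Sigma^-1 X)^-1 X' Sigma^-1 y."""
    Si = np.linalg.inv(Sigma)
    XtSi = X.T @ Si
    beta = np.linalg.solve(XtSi @ X, XtSi @ y)
    return beta, y - X @ beta               # coefficients and residuals
```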

1 citation


Journal ArticleDOI
TL;DR: Cook's distance is generalized to multiple linear regression with linear constraints on the regression coefficients and is used for identifying influential observations in constrained regression models; a numerical example is provided for illustration.
Abstract: Cook's distance is generalized to the multiple linear regression with linear constraints on regression coefficients. It is used for identifying influential observations in constrained regression models. A numerical example is provided for illustration.
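
For reference, the ordinary Cook's distance that the paper generalises can be computed as follows (a minimal sketch of the unconstrained case only; the constrained version requires the restricted fit and is not reproduced here).

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for unconstrained least squares:
    D_i = r_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2),
    where h_ii are leverages and s^2 the residual variance estimate."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat (projection) matrix
    h = np.diag(H)                          # leverages
    r = y - H @ y                           # residuals
    s2 = r @ r / (n - p)
    return (r**2 * h) / (p * s2 * (1 - h) ** 2)
```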

1 citation


Posted Content
TL;DR: The authors employ Geographical Information Systems and a spatial econometric technique, Geographically Weighted Regression, integrated in a dichotomous-choice CV, in order to improve both the sampling design and the econometric analysis of a CV survey, by fitting local changes and highlighting spatial non-stationarity in the relationships between estimated WTP and explanatory variables.
Abstract: The paper uses Contingent Valuation to investigate the externalities from linear infrastructures, with a particular concern for their dependence on characteristics of the local context within which they are perceived. We employ Geographical Information Systems and a spatial econometric technique, Geographically Weighted Regression, integrated in a dichotomous-choice CV, in order to improve both the sampling design and the econometric analysis of a CV survey. These tools are helpful when local factors with an important spatial variability may have a crucial explanatory role in the structure of individual preferences. Geographically Weighted Regression is introduced, alongside GIS, as a way to enhance the flexibility of a stated preference analysis, by fitting local changes and highlighting spatial non-stationarity in the relationships between estimated WTP and explanatory variables. This local approach is compared with a standard double-bounded contingent valuation through an empirical study about high-voltage transmission lines. The GWR methodology has not been applied before in environmental economics. The paper shows its significance in testing the consistency of the standard approach by monitoring the spatial patterns in the distribution of the WTP and the spatial stability of the estimated parameters, in order to compute the conditional WTPs.

Dissertation
01 Mar 2007
TL;DR: A holistic cluster analysis is presented: clusters are accurately unearthed within large datasets; an estimate of the natural number of clusters is obtained; and the variables important in defining the clusters are also established.
Abstract: The increasing size of datasets is particularly evident in the field of bioinformatics. It is unlikely that analyzing these large datasets with a single model will produce an accurate solution. This has led to the ensemble approach, where many models are averaged to give a consensus representation of the data. Taking a weighted average of the individual models has improved the accuracy of both classification and regression ensembles. However, weighting models within a cluster ensemble has remained relatively undeveloped because there is no gold standard available for comparison. This thesis explores a technique of weighting cluster ensembles. A regression technique, multivariate regression trees, is shown to produce an accurate clustering solution. Each solution (tree) is then weighted purely in terms of its predictive accuracy. Various weighting strategies are trialed to determine the superior technique. After each individual tree is assigned a weight, the trees’ co-occurrence matrices are obtained. The co-occurrence matrices are then aggregated together, weighted according to the trees’ predictive weights. The final result is a single weighted co-occurrence matrix. A new technique, similarity-based k-means, is developed in order to partition the weighted co-occurrence matrix. Similarity-based k-means is demonstrated to produce accurate partitions of similarity matrices. The resulting clusters agree with the known groups in the investigated datasets. Furthermore, this thesis develops two other techniques so that maximal information can be obtained in conjunction with the weighted cluster ensemble. The first method suggests an estimate of the natural number of clusters in a dataset, by assessing the predictive performance and variability of similarity-based k-means for various numbers of clusters. The estimates agree with the known numbers of groups within the investigated datasets. The second method elucidates the variables that define the clusters. These variables have high classification power within the studied datasets. Therefore, this thesis presents a holistic cluster analysis: clusters are accurately unearthed within large datasets; an estimate of the natural number of clusters is obtained; and the variables important in defining the clusters are also established. The weighted cluster ensemble technique is applied to a variety of small and large datasets. All results demonstrate the power of weighting the individual models within the ensemble: the developed weighted cluster ensemble technique consistently outperforms the other techniques. The results of analyzing two DNA microarray datasets are particularly promising. The discovered clusters overlap with the known diagnoses in the datasets, and the variables deemed important in defining the clusters have previously been suggested as biomarkers. Whilst the size of contemporary datasets presents unique statistical challenges, the potential information within them is immense. Statistical techniques must be developed in order to accurately analyze these datasets. Motivated by the success of weighted regression and classification ensembles applied to large datasets, this thesis suggests a technique of weighting models within a cluster ensemble. The results highlight the potential of weighted cluster ensembles in high dimensional settings, such as the analysis of DNA microarrays.
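
The aggregation step at the centre of the thesis, combining each tree's co-occurrence matrix weighted by its predictive accuracy, can be sketched as follows (an illustrative reconstruction; the function name and the normalisation of the weights are assumptions). The resulting matrix would then be partitioned, for example by the thesis's similarity-based k-means.

```python
import numpy as np

def weighted_cooccurrence(labelings, weights):
    """Weighted co-occurrence matrix for a cluster ensemble: entry
    (i, j) is the weighted fraction of ensemble members that place
    points i and j in the same cluster."""
    labelings = np.asarray(labelings)       # shape (n_models, n_points)
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()       # normalise predictive weights
    n = labelings.shape[1]
    C = np.zeros((n, n))
    for lab, w in zip(labelings, weights):
        C += w * (lab[:, None] == lab[None, :])
    return C
```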

01 Jan 2007
TL;DR: A procedure based on M-estimation is proposed to determine the number of regression models for the problem of regression clustering; the true classification is attained as n increases to infinity under certain mild conditions, for instance without assuming normality of the random errors in each regression model.
Abstract: In this paper, a procedure based on M-estimation to determine the number of regression models for the problem of regression clustering is proposed. We show that the true classification is attained as n increases to infinity under certain mild conditions, for instance without assuming normality of the distribution of the random errors in each regression model.
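
One concrete way to picture regression clustering with M-estimation is an alternating scheme: assign each point to the regression model that fits it best, then refit each model robustly. The sketch below uses Huber-loss IRLS for the refit step; it is an illustrative reconstruction for a fixed number of models k, not the paper's procedure for selecting k.

```python
import numpy as np

def huber_weights(r, c=1.345):
    """IRLS weights for the Huber loss: 1 inside the threshold, c/|r| outside."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def regression_clustering(X, y, k, n_iter=100, seed=0):
    """Alternating regression clustering with M-estimation (sketch):
    assign each point to the component whose fitted line explains it
    best, then refit each component by IRLS with Huber weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, k, size=n)     # random initial assignment
    betas = np.zeros((k, d))
    for _ in range(n_iter):
        for j in range(k):
            idx = labels == j
            if idx.sum() < d:               # too few points to fit
                continue
            Xj, yj = X[idx], y[idx]
            beta = np.linalg.lstsq(Xj, yj, rcond=None)[0]
            for _ in range(10):             # IRLS refinement
                r = yj - Xj @ beta
                s = np.median(np.abs(r)) / 0.6745 + 1e-12  # robust scale
                w = huber_weights(r / s)
                WX = Xj * w[:, None]
                beta = np.linalg.solve(Xj.T @ WX, WX.T @ yj)
            betas[j] = beta
        # Reassign each point to its best-fitting component
        new_labels = np.argmin(np.abs(y[:, None] - X @ betas.T), axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas
```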