
Showing papers on "Unit-weighted regression published in 2010"


Book ChapterDOI
01 Jan 2010
TL;DR: Geographically weighted regression (GWR) as mentioned in this paper is based on the nonparametric technique of locally weighted regression developed in statistics for curve-fitting and smoothing applications, where local regression parameters are estimated using subsets of data proximate to a model estimation point in variable space.
Abstract: Geographically weighted regression (GWR) was introduced to the geography literature by Brunsdon et al. (1996) to study the potential for relationships in a regression model to vary in geographical space, or what is termed parametric nonstationarity. GWR is based on the non-parametric technique of locally weighted regression developed in statistics for curve-fitting and smoothing applications, where local regression parameters are estimated using subsets of data proximate to a model estimation point in variable space. The innovation with GWR is using a subset of data proximate to the model calibration location in geographical space instead of variable space. While the emphasis in traditional locally weighted regression in statistics has been on curve-fitting, that is, estimating or predicting the response variable, GWR has been presented as a method to conduct inference on spatially varying relationships, in an attempt to extend the original emphasis on prediction to confirmatory analysis (Paez and Wheeler 2009).
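
To make the local fit concrete, here is a minimal sketch of a GWR estimate at a single calibration location, on synthetic data; the Gaussian distance-decay kernel and fixed bandwidth are common illustrative choices, not the specific calibration of Brunsdon et al. (1996).

```python
import numpy as np

# Synthetic data with a spatially varying slope: the x-y relationship
# drifts with the first geographic coordinate.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))        # geographic locations
x = rng.normal(size=n)
beta = 1.0 + 0.2 * coords[:, 0]                 # slope varies over space
y = beta * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])            # intercept + predictor
bandwidth = 2.0                                 # illustrative fixed bandwidth

def gwr_at(point):
    """Weighted least squares using data near `point` in geographic space."""
    d = np.linalg.norm(coords - point, axis=1)  # geographic distances
    w = np.exp(-0.5 * (d / bandwidth) ** 2)     # Gaussian distance-decay kernel
    XtW = X.T * w                               # apply weights to each sample
    return np.linalg.solve(XtW @ X, XtW @ y)    # local (intercept, slope)

print(gwr_at(np.array([1.0, 5.0])))             # local slope should be near 1.2
print(gwr_at(np.array([9.0, 5.0])))             # local slope should be near 2.8
```

Repeating this fit over a grid of calibration locations yields the surface of spatially varying parameter estimates that GWR is used to interpret.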

163 citations


Journal ArticleDOI
TL;DR: In this article, the authors used stepwise multiple regression (SMR), a combination of the forward selection and backward elimination methods, to select suitable control variables for forecasting fish landings.
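
As a rough sketch of such a forward-selection-plus-backward-elimination loop, assuming statsmodels and a pandas DataFrame of candidate predictors; the significance thresholds are illustrative, not the authors' settings:

```python
import statsmodels.api as sm

def stepwise(X, y, enter=0.05, stay=0.10):
    """Stepwise selection: at each round, add the most significant remaining
    candidate, then drop any selected term whose p-value exceeds `stay`."""
    selected, candidates = [], list(X.columns)
    while candidates:
        # forward step: p-value of each candidate when added to the model
        pvals = {}
        for c in candidates:
            fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
            pvals[c] = fit.pvalues[c]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= enter:
            break                      # nothing left worth adding
        selected.append(best)
        candidates.remove(best)
        # backward step: eliminate variables that lost their significance
        fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
        for c in list(selected):
            if fit.pvalues[c] > stay:
                selected.remove(c)
    return selected
```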

96 citations


Journal ArticleDOI
TL;DR: It is demonstrated that a significant improvement in the quality of the regression models can be obtained with weighted regression, with exploratory regression on a compressed subset of similar information content, or with exploratory weighted regression on the compressed subset, weighted with one of the proposed weighting schemes.
Abstract: Symbolic regression of input-output data conventionally treats data records equally. We suggest a framework for automatic assignment of weights to data samples which takes into account each sample's relative importance. In this paper, we study the possibilities of improving symbolic regression on real-life data by incorporating weights into the fitness function. We introduce four weighting schemes defining the importance of a point relative to proximity, surrounding, remoteness, and nonlinear deviation from its k nearest-in-input-space neighbors. For enhanced analysis and modeling of large imbalanced data sets we introduce a simple multidimensional iterative technique for subsampling. This technique allows a sensible partitioning (and compression) of the data into nested subsets of arbitrary size, in such a way that the subsets are balanced with respect to any of the presented weighting schemes. For cases where a given input-output data set contains some redundancy, we suggest an approach that considerably improves the effectiveness of regression by applying more modeling effort to a smaller subset of the data with similar information content. The improvement is achieved through better exploration of the search space of potential solutions at the same number of function evaluations. We compare the different approaches to regression on five benchmark problems with a fixed budget allocation and demonstrate that a significant improvement in the quality of the regression models can be obtained with weighted regression, with exploratory regression on a compressed subset of similar information content, or with exploratory weighted regression on the compressed subset, weighted with one of the proposed weighting schemes.
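
As one plausible illustration of a remoteness-style scheme, each sample can be weighted by its mean distance to its k nearest input-space neighbors, so sparsely sampled regions count more; this is a sketch of the idea, not the paper's exact formulas:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def remoteness_weights(X, k=5):
    """Weight each sample by its mean distance to its k nearest neighbours
    in input space, so points in sparsely sampled regions count more."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)        # column 0 is the point itself (d = 0)
    w = dist[:, 1:].mean(axis=1)
    return w / w.sum()                # normalise the weights to sum to 1

X = np.random.default_rng(1).normal(size=(100, 3))
w = remoteness_weights(X)             # multiplies each sample's fitness error
```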

38 citations



Journal ArticleDOI
TL;DR: In this paper, an improved response surface method (RSM) based on weighted regression was designed and constructed for the anti-slide reliability analysis of concrete gravity dams; the algorithm not only saves arithmetic operations but also greatly enhances calculation and storage efficiency.
Abstract: The aim of this study was to design and construct an improved response surface method (RSM) based on weighted regression for the anti-slide reliability analysis of concrete gravity dams. The limitations and gaps of the traditional RSM are briefly analyzed. First, an improved RSM using singular value decomposition techniques is developed from a small set of experimental points. The method then uses weighted regression and deviation-coefficient correction to reduce the number of iterations and experimental points and to improve the computation of the checking point. Finally, a test example verifies the method. Compared with conventional algorithms, this method has strong advantages: it not only saves arithmetic operations but also greatly enhances calculation and storage efficiency.
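
A minimal sketch of the core step, fitting a quadratic response surface by weighted least squares solved through SVD (NumPy's lstsq); the quadratic basis without cross terms and the weight scheme favoring points near the limit state are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fit_response_surface(X, g, w):
    """X: (n, 2) experimental points; g: limit-state responses; w: weights."""
    x1, x2 = X[:, 0], X[:, 1]
    # quadratic polynomial without cross terms, a common RSM basis
    A = np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2])
    sw = np.sqrt(w)
    # weighted least squares, solved via SVD inside numpy's lstsq
    coef, *_ = np.linalg.lstsq(A * sw[:, None], g * sw, rcond=None)
    return coef

# Illustrative weighting: experimental points whose response is close to the
# limit state g(x) = 0 get more influence (an assumed scheme for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
g = 3.0 - X[:, 0] ** 2 - X[:, 1] + rng.normal(scale=0.1, size=30)
w = np.exp(-np.abs(g) / np.abs(g).std())
print(fit_response_surface(X, g, w))   # coefficients of the fitted surface
```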

24 citations


Journal ArticleDOI
TL;DR: Because of its simplicity and ease of use, 1/x² weighting is recommended for general application; if the instrument signal variance is too high to be managed by statistical techniques, the only solution is to control that variance through laboratory-based measures.
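
A quick sketch of fitting a calibration line with 1/x² weights via statsmodels WLS; the simulated heteroscedastic signal data are illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
conc = np.linspace(1, 100, 20)                        # nominal concentrations
signal = 2.0 * conc + rng.normal(scale=0.05 * conc)   # noise grows with x

X = sm.add_constant(conc)
fit = sm.WLS(signal, X, weights=1.0 / conc ** 2).fit()  # w = 1/x^2
print(fit.params)   # intercept and slope of the weighted calibration line
```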

22 citations



Journal ArticleDOI
TL;DR: In this article, a robust version of the locally weighted scatterplot smoothing (LOWESS) regression approach is used to obtain projections of mean monthly maximum and minimum temperatures (Tmax and Tmin) for the Pichola watershed in an arid region of India.
Abstract: Downscaling models are developed using a robust version of the locally weighted scatterplot smoothing (LOWESS) regression approach to obtain projections of mean monthly maximum and minimum temperatures (Tmax and Tmin) for the Pichola watershed in an arid region of India. The Variable Importance in the Projection (VIP) score from Partial Least Squares (PLS) regression is used to select the predictor variables. A comparison is also made with the standard LOWESS regression approach. The results show an increasing trend in Tmax and Tmin for the A1B, A2 and B1 scenarios, whereas no trend is discerned with the COMMIT scenario.
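
A minimal sketch of the robust LOWESS smoother at the heart of such a model, using statsmodels; the synthetic data and the frac/it settings are illustrative assumptions (it > 0 adds the robustness iterations):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 120))             # stand-in large-scale predictor
tmax = 30 + 2 * np.sin(x) + rng.normal(scale=0.8, size=x.size)
tmax[::25] += 8                                  # inject a few outliers

# it=3 re-weights residuals on each pass, giving the robust version of LOWESS
smoothed = lowess(tmax, x, frac=0.3, it=3)       # columns: sorted x, fitted Tmax
```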

14 citations


Book
10 May 2010
TL;DR: In this paper, the authors introduce linear regression models and describe other linear models including Poisson regression, logistic regression, proportional hazards regression, and nonparametric regression for a second course in basic statistics for undergraduates or first-year graduate students.
Abstract: This textbook for a second course in basic statistics for undergraduates or first-year graduate students introduces linear regression models and describes other linear models including Poisson regression, logistic regression, proportional hazards regression, and nonparametric regression. Numerous examples drawn from the news and current events with an emphasis on health issues illustrate these concepts. Assuming only a pre-calculus background, the author keeps equations to a minimum and demonstrates all computations using SAS. Most of the programs and output are displayed in a self-contained way, with an emphasis on the interpretation of the output in terms of how it relates to the motivating example. Plenty of exercises conclude every chapter. All of the datasets and SAS programs are available from the book's website, along with other ancillary material.

12 citations


Journal ArticleDOI
TL;DR: Numerical illustrations and comparisons through simulations showed that the proposed dose assignment can yield better trend estimates in the regression analysis of two variants than those obtained by conventional assignment methods such as those that use median values.
Abstract: For regression analysis of summarized response data containing grouped intervals of exposure, many researchers use pre-assigned doses such as the median value of each interval. However, the trend estimate is considerably sensitive to the choice of the assigned values. In this paper, we propose a method for assigning the values that yields a more accurate regression coefficient by applying a likelihood approach. Numerical illustrations and comparisons through simulations showed that the proposed dose assignment can yield better trend estimates in the regression analysis of two variants than those obtained by conventional assignment methods such as those that use median values. In particular, for case-control data, the proposed dose improved the accuracy of procedures such as the one developed by Greenland and Longnecker (1992) compared with conventional pre-assigned doses.

10 citations


Journal ArticleDOI
TL;DR: The authors use regression depth to compare non-parametric regression lines corresponding to two independent groups when there is one predictor, and the focus is on a (conditional) robust measure of location.
Abstract: This paper deals with the problem of comparing non-parametric regression lines corresponding to two independent groups when there is one predictor. For the usual linear model, the goal reduces to testing the hypothesis that the slopes, as well as the intercepts, are equal. The approach is based in part on a slight generalization of the notion of regression depth as defined by Rousseeuw and Hubert [Regression depth, J. Am. Statis. Assoc. 94 (1999), pp. 388–402]. Roughly, the hypothesis testing strategy begins by fitting two robust non-parametric regression lines to the first group. The first is based on the data from the first group, and the other uses the data from the second group. If the null hypothesis is true, the difference between the resulting depths should be relatively small. The same is true when fitting non-parametric regression lines to the second group. In contrast to most methods for comparing non-parametric regression lines, the focus is on a (conditional) robust measure of location. Moreov...

Proceedings ArticleDOI
23 Apr 2010
TL;DR: A number of models have been used for estimating the frequency of accidents; in recent years artificial neural network models have also been used for accident prediction, and researchers need to select the models with the best performance, particularly those with the minimum mean square error.
Abstract: A number of models have been used for estimating the frequency of accidents. Weighted and simple linear regressions are common, and in recent years artificial neural network models have also been used as accident prediction models. Researchers need to select and use the models with the best performance, particularly those with the minimum mean square error. In this paper, traffic volume, surface condition, heavy traffic, and monthly accident data are analysed for two major Iranian freeways, Tehran-Qom and Karaj-Qazvin-Zanjan, and three kinds of models, simple linear regression, weighted linear regression, and an artificial neural network, are developed to estimate the number of monthly accidents from these input variables. MATLAB is used for the analysis, and principal component analysis (PCA) is used to ensure that the input variables are not inter-correlated. The principal components and loadings are calculated, and the PCA results show that all input variables should be considered in the modeling. The effect of the input variables is analysed with t-tests, and the results show that traffic volume and surface condition have the greatest effect on rural accidents. Model performance is compared by mean square error, and it can be concluded from the results that the artificial neural network performs best, with the minimum mean square error.

Journal Article
TL;DR: In this paper, the authors employed the technique of geographically weighted regression (GWR) to make an empirical study of China's R&D knowledge spillovers at the city level, and found a significant difference between OLS and GWR in estimating the parameters of R&D knowledge production, with the relationships between the level of regional innovation activity and various factors showing considerable spatial variability.
Abstract: The present paper employs the technique of geographically weighted regression (GWR) to make an empirical study of China's R&D knowledge spillovers at the city level. Conventional regression analysis can only produce 'average' and 'global' parameter estimates rather than 'local' parameter estimates that vary over space in some spatial systems. Geographically weighted regression (GWR), on the other hand, is a simple but useful new technique for the analysis of spatial nonstationarity. The results show that there is a significant difference between OLS and GWR in estimating the parameters of R&D knowledge production, and that the relationships between the level of regional innovation activity and various factors show considerable spatial variability.


Journal ArticleDOI
29 Nov 2010
TL;DR: In this paper, the coefficients of regression for modeling the paraboloid cones and the scale parameter are estimated using robust weighted M-estimators where the weights decrease quadratically from 1 in the middle to zero at the border of the selected neighborhood.
Abstract: The yield map is generated by fitting the yield surface shape of yield monitor data, mainly using paraboloid cones on floating neighborhoods. Each yield map value is determined by the fit of such a cone on an elliptical neighborhood that is wider across the harvest tracks than along them. The regression coefficients for modeling the paraboloid cones and the scale parameter are estimated using robust weighted M-estimators, where the weights decrease quadratically from 1 in the middle to zero at the border of the selected neighborhood. Estimating the model parameters robustly makes a separate outlier-detection procedure unnecessary. For a given neighborhood shape, this yield mapping method is implemented by the Fortran program paraboloidmapping.exe, which can be downloaded from the web. The size of the selected neighborhood is considered appropriate if the variance of the yield map values equals the variance of the true yields, which is the difference between the variance of the raw yield data and the error variance of the yield monitor; the latter is estimated using a robust variogram on data that have not had the trend removed.
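
A sketch of the neighborhood weight described above, falling quadratically from 1 at the center of the elliptical neighborhood to 0 at its border; the semi-axis lengths are illustrative assumptions:

```python
import numpy as np

def neighbourhood_weights(du, dv, a=6.0, b=2.0):
    """du: across-track offset, dv: along-track offset from the estimation
    point; a > b makes the neighbourhood wider across the harvest tracks.
    Weights fall quadratically from 1 at the centre to 0 at the border."""
    r2 = (du / a) ** 2 + (dv / b) ** 2   # squared elliptical distance
    return np.clip(1.0 - r2, 0.0, None)  # zero outside the neighbourhood
```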

Posted Content
TL;DR: A kind of weighted regression that can be used for econometric purposes is proposed, in which the initial inputs are multiplied by the neural network's final optimum input-to-hidden-layer weights after the training process.
Abstract: In this paper we present an autoregressive model with neural network modeling, trained with the standard error backpropagation algorithm, in order to predict the gross domestic product (GDP) growth rate of four countries. Specifically, we propose a kind of weighted regression that can be used for econometric purposes, in which the initial inputs are multiplied by the neural network's final optimum input-to-hidden-layer weights after the training process. The forecasts are compared with those of the ordinary autoregressive model, and we conclude that the proposed regression's forecasts significantly outperform those of the autoregressive model in the out-of-sample period. The idea behind this approach is to propose a parametric regression with weighted variables in order to test the statistical significance and magnitude of the estimated autoregressive coefficients and simultaneously to estimate the forecasts.
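
A rough sketch of the weighted-input idea under stated assumptions: train a small network on lagged values, summarize each input's trained input-to-hidden weights (here by the mean absolute weight, an assumed aggregation; the paper's exact scheme may differ), and fit an ordinary autoregression on the rescaled inputs:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
y = rng.normal(size=120).cumsum()                 # stand-in macro series
X = np.column_stack([y[2:-1], y[1:-2], y[:-3]])   # three autoregressive lags
target = y[3:]

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                   random_state=0).fit(X, target)
scale = np.abs(net.coefs_[0]).mean(axis=1)        # one weight per input lag
weighted_ar = sm.OLS(target, sm.add_constant(X * scale)).fit()
print(weighted_ar.params)                         # testable AR coefficients
```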


Dissertation
01 Jan 2010
TL;DR: A spatially localised online learning algorithm set up in a probabilistic framework with a principled Bayesian inference rule for the model parameters; it learns local models completely independently of each other, uses only local information, and adapts the local model complexity in a data-driven fashion.
Abstract: Locally weighted regression is a non-parametric technique of regression that is capable of coping with non-stationarity of the input distribution. Online algorithms like Receptive Field Weighted Regression and Locally Weighted Projection Regression use a sparse representation of the locally weighted model to approximate a target function, resulting in an efficient learning algorithm. However, these algorithms are fairly sensitive to parameter initializations and have multiple open learning parameters that are usually set using some insights of the problem and local heuristics. In this thesis, we attempt to alleviate these problems by using a probabilistic formulation of locally weighted regression followed by a principled Bayesian inference of the parameters. In the Randomly Varying Coefficient (RVC) model developed in this thesis, locally weighted regression is set up as an ensemble of regression experts that provide a local linear approximation to the target function. We train the individual experts independently and then combine their predictions using a Product of Experts formalism. Independent training of experts allows us to adapt the complexity of the regression model dynamically while learning in an online fashion. The local experts themselves are modeled using a hierarchical Bayesian probability distribution with Variational Bayesian Expectation Maximization steps to learn the posterior distributions over the parameters. The Bayesian modeling of the local experts leads to an inference procedure that is fairly insensitive to parameter initializations and avoids problems like overfitting. We further exploit the Bayesian inference procedure to derive efficient online update rules for the parameters. Learning in the regression setting is also extended to handle a classification task by making use of a logistic regression to model discrete class labels. The main contribution of the thesis is a spatially localised online learning algorithm set up in a probabilistic framework with principled Bayesian inference rule for the parameters of the model that learns local models completely independent of each other, uses only local information and adapts the local model complexity in a data driven fashion. This thesis, for the first time, brings together the computational efficiency and the adaptability of ‘non-competitive’ locally weighted learning schemes and the modelling guarantees of the Bayesian formulation.

Journal Article
TL;DR: In this paper, a response surface method is proposed for the reliability analysis of implicit limit state equations, with the structural response computed by the finite element method; the method aims to minimize computational time while producing satisfactory results.
Abstract: A response surface method is proposed for the reliability analysis of implicit limit state equations, with the structural response computed by the finite element method. Typically, the response surface method replaces the implicit limit state function with a polynomial of unknown coefficients, fitted from a number of sampling points. The locations of these points must be selected judiciously to reduce the computational time without degrading the quality of the polynomial approximation. To accelerate the iterative convergence, the authors propose several improvements. The response surface is fitted by a weighted regression technique, which weights the fitting points according to their distance from the true failure surface and from the estimated design point. This method aims to minimize computational time while producing satisfactory results. Its efficiency and accuracy are evaluated on examples taken from the literature.

Proceedings ArticleDOI
19 Jun 2010
TL;DR: Compared with the conventional PLS algorithm, the GA-WPLS algorithm can greatly improve the prediction ability of NIR multivariate models, with prediction errors decreasing by up to 72.3%, indicating that it is an efficient way of developing promising models from NIR spectra.
Abstract: To take advantage of the multiscale property of near infrared (NIR) spectra, a new hybrid algorithm (GA-WPLS) is proposed for developing the multivariate regression model in the wavelet domain instead of the spectral domain. First, the wavelet packet transform (WPT) and its reconstruction algorithm are employed to split the raw spectra into different frequency components in the wavelet domain. Prediction models are then developed with the WPT-based partial least squares (WPLS) algorithm, in which each component is characterized by a weighted regression coefficient. By comparing the performance of these WPLS-based models, the optimal decomposition level can be determined. Finally, based on the components obtained at the optimal decomposition level, a genetic algorithm is used to select the informative components as the input data of the WPLS-based regression model. To validate the GA-WPLS algorithm, it was applied to measure the original extract concentration of beer. Compared with the conventional PLS algorithm, the GA-WPLS algorithm can greatly improve the prediction ability of NIR multivariate models, with prediction errors decreasing by up to 72.3%, indicating that it is an efficient way of developing promising models from NIR spectra.

Journal ArticleDOI
TL;DR: To demonstrate the benefits of using a weighted analysis when some observations are pooled, the bias and confidence interval (CI) properties of ordinary least squares and weighted least squares t-based confidence intervals were compared; the CI lengths were smaller using a weighted analysis instead of an unweighted analysis.
Abstract: Smaller organisms may have too little tissue to allow assaying as individuals. To get a sufficient sample for assaying, a collection of smaller individual organisms is pooled together to produce a single observation for modeling and analysis. When a dataset contains a mix of pooled and individual organisms, the variances of the observations are not equal. An unweighted regression method is no longer appropriate because it assumes equal precision among the observations. A weighted regression method is more appropriate and yields more precise estimates because it assigns a weight to each pooled observation. To demonstrate the benefits of using a weighted analysis when some observations are pooled, the bias and confidence interval (CI) properties were compared for ordinary least squares and weighted least squares t-based confidence intervals. The slope and intercept estimates were unbiased for both weighted and unweighted analyses. While CIs for the slope and intercept achieved nominal coverage, the CI lengths were smaller using a weighted analysis instead of an unweighted analysis, implying that a weighted analysis yields greater precision. Environ. Toxicol. Chem. 2010;29:1168–1171. © 2010 SETAC
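
A minimal sketch of this weighting under the usual assumption that a pool of n organisms behaves like a mean of n individuals (variance σ²/n, hence weight n); the data and pool sizes below are simulated for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
dose = np.repeat([1.0, 2.0, 4.0, 8.0], 5)
pool_size = rng.integers(1, 6, size=dose.size)    # 1 = individual organism
# each observation is the mean over `pool_size` individuals, so its
# standard deviation shrinks by sqrt(pool_size)
tissue = 0.5 + 0.3 * dose + rng.normal(scale=0.2 / np.sqrt(pool_size))

X = sm.add_constant(dose)
wls = sm.WLS(tissue, X, weights=pool_size).fit()  # weight = pool size
ols = sm.OLS(tissue, X).fit()
print(wls.conf_int())   # narrower intervals than the unweighted fit
print(ols.conf_int())
```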

Journal Article
TL;DR: In this article, the authors proposed a geographically weighted autoregressive model for exploring spatial non-stationarity of a regression relationship when spatial autocorrelation is present.
Abstract: Geographically weighted regression (GWR), a useful method for exploring spatial non-stationarity of a regression relationship, has been applied in a variety of areas. This paper considers the estimation of this spatial econometrics model when spatial autocorrelation is present. First, we propose a geographically weighted autoregressive model and provide local-likelihood and two-step estimation procedures. Second, the estimation of geographically weighted regression models with spatially correlated errors is discussed.

22 Aug 2010
TL;DR: In this article, a weighted regression scheme is proposed to deal with the problem of missing values in classi cation-ranks, where consumers provide a ranking of some products instead of rating these products (i.e. explained variable presents missing values).
Abstract: Conjoint analysis seeks to explain an ordered categorical ordinal variable according to several variables using a multiple regression scheme. A common problem encountered, there, is the presence of missing values in classi cation-ranks. In this paper, we are interested in the cases where consumers provide a ranking of some products instead of rating these products (i.e. explained variable presents missing values).In order to deal with this problem, we propose a weighted regression scheme. We empirically show (in several cases of weighting) that, if the number of missing values is not too large, the data remain useful, and our results are close to those of the complete order. A simulation study con rms these ndings.