
Showing papers on "Unit-weighted regression" published in 2019


Journal ArticleDOI
TL;DR: Caution is warranted when undertaking regression analysis of RDS data, and even when reported degree is accurate, low reported degree can unduly influence regression estimates.
Abstract: It is unclear whether weighted or unweighted regression is preferred in the analysis of data derived from respondent driven sampling. Our objective was to evaluate the validity of various regression models, with and without weights and with various controls for clustering in the estimation of the risk of group membership from data collected using respondent-driven sampling (RDS). Twelve networked populations, with varying levels of homophily and prevalence, based on a known distribution of a continuous predictor were simulated using 1000 RDS samples from each population. Weighted and unweighted binomial and Poisson general linear models, with and without various clustering controls and standard error adjustments were modelled for each sample and evaluated with respect to validity, bias and coverage rate. Population prevalence was also estimated. In the regression analysis, the unweighted log-link (Poisson) models maintained the nominal type-I error rate across all populations. Bias was substantial and type-I error rates unacceptably high for weighted binomial regression. Coverage rates for the estimation of prevalence were highest using RDS-weighted logistic regression, except at low prevalence (10%) where unweighted models are recommended. Caution is warranted when undertaking regression analysis of RDS data. Even when reported degree is accurate, low reported degree can unduly influence regression estimates. Unweighted Poisson regression is therefore recommended.

60 citations


Journal ArticleDOI
TL;DR: In this paper, a rapid and simple RP-HPLC method for the estimation of chlorthalidone in spiked human plasma was described, and the calibration curve was found to be linear in the range of 100 to 3200 ng/mL.
Abstract: The accuracy of any bioanalytical method depends on the selection of an appropriate calibration model. The most commonly used calibration model is the unweighted linear regression, where the response (y-axis) is plotted against the corresponding concentration (x-axis). The degree of association between these two variables is expressed in terms of correlation coefficient (r²). However, a satisfactory r² alone is not adequate to accept the calibration model. The wide calibration curve range used in the bioanalytical methods is susceptible to the heteroscedasticity of the calibration curve data. The use of weighted linear regression with an appropriate weighting factor reduces the heteroscedasticity and improves the accuracy over the selected concentration range. The present work describes a rapid and simple RP-HPLC method for the estimation of chlorthalidone in spiked human plasma. The calibration curve standards were studied in the concentration range of 100–3200 ng/mL. The chromatography was performed on a C18 column (250 × 4.6 mm, 5 μm) in an isocratic mode at a flow rate of 1 mL/min using methanol:water (60:40%, v/v) as a mobile phase. The detection was carried out at 276 nm. Both the unweighted regression model and weighted regression models with different weighting factors (1/x, 1/√x, and 1/x²) were evaluated for heteroscedasticity. The statistical approach for the selection of a suitable regression model with appropriate weighting factors was discussed and the developed bioanalytical method was further validated, as per US-FDA guidelines. In calibration curve experiments, although the acceptable r² of 0.998 was obtained, the % residual plot showed that the data were susceptible to heteroscedasticity. When the weighted linear regression was applied to the same calibration curve data set, no significant difference between % relative residual (% RR) was observed. Furthermore, when % relative error (% RE) was calculated for different weighting factors, it was observed that the regression model with 1/x weighting factor gave a minimum % RE. The calibration curve was found to be linear in the range of 100 to 3200 ng/mL. The validation experiments proved good accuracy, and intra- and inter-day variability and acceptable recovery. Stability studies proved that the drug was stable under tested stability cycles. From the statistical reports obtained from the present work, it was observed that the calibration curve in bioanalytical experiments was susceptible to heteroscedasticity using the unweighted linear regression model. Hence, to obtain homoscedasticity in the calibration curve experiments, there is a need for a weighted linear regression model. The appropriate regression model was further selected by evaluating the % RE for different weighting factors.

40 citations
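
The weighting-factor comparison described in the chlorthalidone abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the calibration data are synthetic, the function names `weighted_line_fit` and `sum_abs_relative_error` are hypothetical, and summing the absolute % RE of back-calculated concentrations is assumed here as the selection criterion.

```python
import numpy as np

def weighted_line_fit(x, y, w):
    """Weighted least-squares line y = a + b*x, minimising sum(w * (y - a - b*x)**2)."""
    sw = np.sqrt(w)
    A = np.column_stack([np.ones_like(x), x]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
    return coef  # (intercept, slope)

def sum_abs_relative_error(x, y, w):
    """Back-calculate concentrations from the fitted line and sum |% RE|."""
    a, b = weighted_line_fit(x, y, w)
    x_back = (y - a) / b
    return float(np.sum(np.abs((x_back - x) / x) * 100.0))

# Synthetic calibration data (assumption): noise grows with concentration,
# which is the heteroscedastic pattern the abstract describes.
rng = np.random.default_rng(0)
conc = np.array([100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0])  # ng/mL
resp = 50.0 + 2.0 * conc + rng.normal(0.0, 0.03 * 2.0 * conc)

# Candidate weighting factors named in the abstract, plus the unweighted model.
schemes = {
    "unweighted": np.ones_like(conc),
    "1/x": 1.0 / conc,
    "1/sqrt(x)": 1.0 / np.sqrt(conc),
    "1/x^2": 1.0 / conc**2,
}
results = {name: sum_abs_relative_error(conc, resp, w) for name, w in schemes.items()}
best = min(results, key=results.get)
print(results, "-> smallest summed |% RE|:", best)
```

Which scheme comes out smallest depends on the data; the sketch only shows how the candidates can be compared on the same calibration set.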


Journal ArticleDOI
19 Feb 2019-Water SA
TL;DR: Wang et al. established a residential flood-damage function by interviewing residents of a region where flood disasters occur frequently, and applied the Geographically Weighted Regression (GWR) model to modify the traditional regression model, which cannot capture spatial variations, and to reduce the problem of spatial autocorrelation.
Abstract: Flood damage functions are necessary to ensure comprehensive flood-risk management. This study attempts to establish a residential flood-damage function through interviewing the residents living in the region where flood disasters occur frequently. Keelung River basin, near Taipei Metropolitan in Taiwan was selected as study area. Flood damages are related to the flood depths, which are the most commonly considered factor in previously published work. Ordinary least squares (OLS) regression was used to construct the flood-damage function at the beginning. Analytical results indicate that flood depth is the significant variable, but the spatial pattern of the residuals shows that residuals exhibit spatial autocorrelation. The Geographically Weighted Regression (GWR) Model was then applied to modify the traditional regression model, which cannot capture spatial variations, and to reduce the problem of spatial autocorrelation. The R-square value was found to increase from 0.15 to 0.24, and the spatial autocorrelation in the residuals was no longer evident. A modified OLS model with a dummy variable to capture the spatial autocorrelation pattern was also proposed for future applications. In conclusion, the residential flood damage is determined by flood depth and zone, and the GWR model not only captures the spatial variations of the affecting factors, but also helps to discover the independent variable to modify the traditional regression model.

36 citations
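
As a rough illustration of the GWR idea in the flood-damage abstract above, the sketch below fits one weighted least-squares regression per location, with weights that decay with distance. The Gaussian kernel, the fixed `bandwidth`, the function name `gwr_coefficients`, and the synthetic depth/damage data are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Return one (intercept, slopes...) row per location from locally weighted fits."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian kernel weights
        XtW = Xd.T * w                               # X^T W without forming W
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic example (assumption): the damage-depth slope increases eastwards.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(60, 2))
depth = rng.uniform(0.0, 3.0, size=60)
damage = 1.0 + (1.0 + 2.0 * coords[:, 0]) * depth + rng.normal(0.0, 0.1, size=60)

betas = gwr_coefficients(coords, depth, damage, bandwidth=0.2)
local_slopes = betas[:, 1]   # one depth coefficient per household location
```

A single global OLS fit would return one slope for the whole basin; here `local_slopes` varies across locations, which is the spatial variation a global model cannot capture.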


Journal ArticleDOI
TL;DR: The traditional OLS and GWR are inadequate for describing the non-stationarity of PM2.5, and evidence of spatial–temporal heterogeneity and possible solutions for modeling the relationships between PM2.5 and 5 criteria air pollutants is provided.
Abstract: Objective: This study investigated the relationships between PM2.5 and 5 criteria air pollutants (SO2, NO2, PM10, CO, and O3) in Heilongjiang, China, from 2015 to 2018 using global and geographically and temporally weighted regression models. Methods: Ordinary least squares regression (OLS), linear mixed models (LMM), geographically weighted regression (GWR), temporally weighted regression (TWR), and geographically and temporally weighted regression (GTWR) were applied to model the relationships between PM2.5 and 5 air pollutants. Results: The LMM and all GWR-based models (i.e., GWR, TWR, and GTWR) showed great advantages over OLS in terms of higher model R2 and more desirable model residuals, especially TWR and GTWR. The GWR, LMM, TWR, and GTWR improved the model explanation power by 3%, 5%, 12%, and 12%, respectively, from the R2 (0.85) of OLS. TWR yielded slightly better model performance than GTWR and reduced the root mean squared errors (RMSE) and mean absolute error (MAE) of the model residuals by 67% compared with OLS; while GWR only reduced RMSE and MAE by 15% against OLS. LMM performed slightly better than GWR by accounting for both temporal autocorrelation between observations over time and spatial heterogeneity across the 13 cities under study, which provided an alternative for modeling PM2.5. Conclusions: The traditional OLS and GWR are inadequate for describing the non-stationarity of PM2.5. The temporal dependence was more important and significant than spatial heterogeneity in our data. Our study provided evidence of spatial–temporal heterogeneity and possible solutions for modeling the relationships between PM2.5 and 5 criteria air pollutants for Heilongjiang province, China.

30 citations


Journal ArticleDOI
TL;DR: A new robust fuzzy regression modeling technique known as weighted least squares (LS) fuzzy regression to construct a model for crisp input-fuzzy output data and introduces a new weighted objective function to overcome the disadvantages of the ordinary LS approach in the presence of outliers.
Abstract: The weighted regression approach is one of the popular problems in robust regression analysis. Recently, robust fuzzy regression models have proven to be alternative approaches to fuzzy regression models attempting to identify, down-weight and/or ignore unusual points (outliers). This paper proposes a new robust fuzzy regression modeling technique known as weighted least squares (LS) fuzzy regression to construct a model for crisp input-fuzzy output data. We introduce a new weighted objective function to overcome the disadvantages of the ordinary LS approach in the presence of outliers. We derive and describe an iterative reweighted algorithm for minimization of the objective function. The algorithm is presented to approximate the weighted estimators of the fuzzy regression by solving the weighted optimization problem. The proposed algorithm decreases the effect of outliers on the model fit by attempting to identify and down-weight them. To this end, experiments on datasets with different numbers of outliers are performed. The accuracy of our approach in a real setting is also tested on establishing a predictive model for evaluation of suspended load based on a real-world dataset in hydrology engineering. The numerical results show that in the presence of unusual points the proposed weighted fit tracks the main body of the data considerably better than the ordinary LS fuzzy regression fit, both in terms of the selected performance criteria and in terms of identifying and down-weighting unusual data (outliers). The results of the numerical examples show that this approach has the capability to examine the behavior of value changes in the goodness-of-fit criteria of the fuzzy regression models when the down-weighted observations are omitted.

26 citations
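
The iterative reweighting idea in the fuzzy-regression abstract above can be illustrated on ordinary (crisp) data. The sketch below is not the paper's fuzzy algorithm or its objective function: it swaps in a standard Huber-type weight with a MAD scale estimate, and the function name `irls_line`, the tuning constant, and the synthetic data are assumptions.

```python
import numpy as np

def irls_line(x, y, c=1.345, n_iter=50, tol=1e-10):
    """Iteratively reweighted least squares for y = a + b*x with Huber-type weights."""
    A = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ new_beta
        # Robust scale from the median absolute deviation of the residuals
        scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = np.abs(r) / scale
        # Full weight for small residuals, shrinking weight for large ones
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, w

# Synthetic data (assumption): a clean line plus three gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.2, size=30)
outliers = [25, 27, 29]
y[outliers] += 15.0

ols_beta, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)
robust_beta, weights = irls_line(x, y)
```

The returned `weights` double as an outlier diagnostic: observations that end up with small weights are the ones the fit has chosen to discount.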


Journal ArticleDOI
TL;DR: In this article, the Akaike Information Criterion (AIC) is used to compare different weighting schemes as well as different models for hydrogen diffusion through an iron foil in a Devanathan-Stachurski cell.

19 citations


Journal ArticleDOI
TL;DR: In this article, a semiparametric generalized least squares estimator uses Support Vector Regression (SVR) to estimate the conditional variance function. The results show that the resulting estimator and an accompanying standard error correction offer substantially improved precision, nominal coverage rates, and shorter confidence intervals than Ordinary Least Squares with heteroskedasticity-consistent standard errors.

13 citations


Journal ArticleDOI
TL;DR: An automatic method to find trends in known instruments, by using a massive linear regression technique combined with a conventional machine learning proposal that works on optimizing the weights of the linear regressive structures.

8 citations


Journal ArticleDOI
TL;DR: A novel algorithm named the Distance Weighted Regression Classifier (DWLRC), based on the theory that images in the same class will also belong to same linear subspace and they can be represented through a linear equation, which can be used for face recognition under different expression and illumination conditions through a distance weighted method.
Abstract: Linear regression is an efficient technique for solving the face recognition problem. It is based on the theory that images in the same class also belong to the same linear subspace and can be represented through a linear equation. However, this method suffers from some misclassification problems because of the infinite ductility of the regression equation; moreover, it does not make full and proper use of the information in each sample. To overcome these problems, a novel algorithm named the Distance Weighted Regression Classifier (DWLRC) is proposed here. It can be used for face recognition under different expression and illumination conditions through a distance-weighted method, and it can also be used for optimizing the error in the final distance calculation stage. Experiments on three benchmarks show the better performance of our DWLRC compared with the traditional LRC and some state-of-the-art methods.

6 citations


Journal ArticleDOI
TL;DR: The results of a dam deformation modelling application show that the GTWR model can establish a unified spatiotemporal model which can represent the whole deformation trend of the dam and furthermore can predict the deformation of any point in time and space, with stronger flexibility and applicability.
Abstract: The geographically and temporally weighted regression (GTWR) model is a dynamic model which considers the spatiotemporal correlation and the spatiotemporal nonstationarity. Taking into account these advantages, we proposed a spatiotemporal deformation modelling method based on GTWR. In order to further improve the modelling accuracy and efficiency and considering the application characteristics of deformation modelling, the inverse window transformation method is used to search the optimal fitting window width and furthermore the local linear estimation method is used in the fitting coefficient function. Moreover, a comprehensive model for the statistical tests method is proposed in GTWR. The results of a dam deformation modelling application show that the GTWR model can establish a unified spatiotemporal model which can represent the whole deformation trend of the dam and furthermore can predict the deformation of any point in time and space, with stronger flexibility and applicability. Finally, the GTWR model improves the overall temporal prediction accuracy by 43.6% compared to the single-point time-weighted regression (TWR) model.

4 citations
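
To make the "any point in time and space" claim in the GTWR abstract above more concrete, the following sketch predicts at a single target location and time from one locally weighted fit. It assumes a common GTWR formulation in which the squared space-time distance is `lam * d_space**2 + mu * d_time**2` with a Gaussian kernel; the abstract does not state its kernel, and the window-search and local linear refinements it mentions are not implemented. All names and data are hypothetical.

```python
import numpy as np

def gtwr_predict(coords, times, X, y, target_xy, target_t, target_x,
                 bandwidth, lam=1.0, mu=1.0):
    """Predict y at (target_xy, target_t) from a space-time weighted linear fit."""
    # Squared spatiotemporal distance from every observation to the target
    d2 = lam * np.sum((coords - target_xy) ** 2, axis=1) + mu * (times - target_t) ** 2
    w = np.exp(-d2 / bandwidth**2)                   # Gaussian kernel weights
    Xd = np.column_stack([np.ones(len(y)), X])
    XtW = Xd.T * w
    beta = np.linalg.solve(XtW @ Xd, XtW @ y)        # local coefficients
    x0 = np.concatenate([[1.0], np.atleast_1d(target_x)])
    return float(x0 @ beta)

# Synthetic monitoring data (assumption): the response to water level drifts over time.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(80, 2))
times = rng.uniform(0.0, 10.0, size=80)
level = rng.uniform(0.0, 5.0, size=80)
deform = (0.5 + 0.1 * times) * level + rng.normal(0.0, 0.05, size=80)

pred = gtwr_predict(coords, times, level, deform,
                    target_xy=np.array([0.5, 0.5]), target_t=5.0, target_x=2.0,
                    bandwidth=2.0, lam=1.0, mu=1.0)
```

The ratio of `lam` to `mu` controls how much closeness in space counts relative to closeness in time; in practice both, along with `bandwidth`, would be tuned rather than fixed as here.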


Journal ArticleDOI
TL;DR: In this article, the authors present the African Centre for DNA Barcoding, University of Johannesburg, South Africa and the Canadian Department of Forest & Conservation Sciences, Faculty of Forestry and University of British Columbia, Vancouver, BC V6T 1Z4.
Abstract: 1 Department of Biology, McGill University, Montreal, Quebec, Canada; 2 African Centre for DNA Barcoding, University of Johannesburg, Johannesburg, South Africa; 3 Departments of Botany, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; 4 Forest & Conservation Sciences, Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; 5 National Center for Ecological Analysis and Synthesis, Santa Barbara, California; 6 Arnold Arboretum, Boston, Massachusetts; 7 Organismic & Evolutionary Biology, Harvard University, Cambridge, Massachusetts; 8 School of Biology and Ecology & Sustainability, Solutions Initiative, University of Maine, Orono, Maine


Journal ArticleDOI
31 Oct 2019
TL;DR: In this article, the authors used a mixed geographically and temporally weighted regression (MGTWR) model to determine the best global and local variables in MGTWR and determine the model to be used in North Sumatra's poverty cases in 2010 to 2015.
Abstract: Geographically and temporally weighted regression (GTWR) is a method used when observations exhibit spatial and temporal diversity. The GTWR model considers only the local influences of spatial-temporal independent variables on the dependent variable. In some cases, spatial-temporal variables also have global influences in addition to local ones, so a mixed geographically and temporally weighted regression (MGTWR) model is more suitable. This study aimed to determine the best global and local variables in MGTWR and to determine the model to be used for North Sumatra’s poverty cases from 2010 to 2015. The results show that the unemployment rate and the labor force participation rate are global variables, whereas the literacy rate, the school enrollment rate, and households buying rice for the poor (raskin) are local variables. Furthermore, the Root Mean Square Error (RMSE) and Akaike Information Criterion (AIC) show that MGTWR performs better than GTWR for North Sumatra’s poverty cases.


Proceedings ArticleDOI
01 Jul 2019
TL;DR: In this paper, a cross-sensor relative radiometric normalization (RRN) method is proposed for optical satellite images from Landsat 8 OLI (L8) and Landsat 7 ETM+ (L7) sensors.
Abstract: Relative radiometric normalization (RRN) minimizes radiometric differences among images caused by inconsistencies of acquisition condition. In this study, a cross-sensor RRN method is proposed for optical satellite images from Landsat 8 OLI (L8) and Landsat 7 ETM+ (L7) sensors. The data from these two sensors have different pixel depths. Therefore, a rescaling on the radiometry resolution is performed in the preprocessing. Then, multivariate alteration detection (MAD) based on kernel canonical correlation analysis (KCCA) is adopted, which is called KCCA-based MAD, to select pseudo-invariant features (PIFs). The process of RRN is performed by using polynomial regression with Gaussian weighted regression. In experiments, qualitative and quantitative analyses on images from different sensors are conducted. The experimental result demonstrates the superiority of the proposed nonlinear transformation, in terms of regression quality and radiometric consistency, compared with RRN using linear regression.

Journal Article
TL;DR: In this article, an S-estimator was used to handle outliers and estimate a robust GTWR (RGTWR) model, which has a coefficient of determination of 98.2%, an RMSE of 33.941, and a MAD of 4.994.
Abstract: Geographically weighted regression (GWR) is a model that can be used for spatially varying data. Geographically and Temporally Weighted Regression (GTWR) is a development of the GWR model for data that vary spatially and temporally. Parameter estimation in the GTWR model uses the weighted least squares method, which is very sensitive to outliers. Outliers bias the parameter estimates, so they must be handled by a robust GTWR (RGTWR). In this research, an S-estimator was used to handle outliers and estimate an RGTWR. Both GTWR and RGTWR are used to model the crime rate in East Java in 2011-2015. The crime rate is used as the response variable, and the percentage of poor people, population density, and human development index are used as explanatory variables. The best model in this research is RGTWR using the S-estimator, which has a coefficient of determination of 98.2%, an RMSE of 33.941, and a MAD of 4.994.

Journal ArticleDOI
TL;DR: The results show that the increase in the number of access points does not affect the accuracy of position determination, but the choice of the effective access point will be effective in reducing the error.
Abstract: As science and technology develop, new equipment, standards, and wave types spread. Various scholars have applied each of these standards and technologies to indoor positioning. Various methods have been developed based on different systems, each relying on specific methods and concepts. This research performs indoor positioning using local Wi-Fi fingerprints and signals. To reduce the error in collecting local fingerprints, RSS values are recorded in 4 directions and at two times. The geographically weighted regression method has been used to train the network. A genetic algorithm is used to select the appropriate parameters. Ultimately, the accuracy of the model has reached 1.76 cm. The results show that increasing the number of access points does not affect the accuracy of position determination, but choosing effective access points reduces the error.

Book ChapterDOI
17 Jul 2019
TL;DR: Classical machine learning algorithms are applied to a regression problem and their predictive power is compared on revenue data from a large Russian restaurant chain; methods for handling heterogeneity (observation weighting and estimating models on subsamples) are described.
Abstract: In this paper, we address several aspects of applying classical machine learning algorithms to a regression problem. To validate our approach, we compare predictive power on revenue data from a large Russian restaurant chain. We pay special attention to two problems: data heterogeneity and a high number of correlated features. We describe methods for handling heterogeneity: observation weighting and estimating models on subsamples. We define a weighting function via Mahalanobis distance in the feature space and show its predictive properties with the following methods: ordinary least squares regression, elastic net, support vector regression, and random forest.
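
One way to read the Mahalanobis-based observation weighting in the abstract above is sketched below. The exact weighting function is not given in the abstract, so the Gaussian form `exp(-d^2 / 2)`, the function names, and the synthetic two-feature data are assumptions chosen only to show the mechanics with ordinary least squares.

```python
import numpy as np

def mahalanobis_weights(X, x0):
    """Weight each row of X by its squared Mahalanobis distance to the target x0."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    cov_inv = np.linalg.pinv(cov)                    # pseudo-inverse tolerates correlated features
    diff = X - x0
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.exp(-0.5 * d2)                         # assumed kernel: closer rows weigh more

def weighted_ols(X, y, w):
    """Weighted least squares with an intercept; returns (intercept, slopes...)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    XtW = Xd.T * w
    return np.linalg.solve(XtW @ Xd, XtW @ y)

# Synthetic, correlated features (assumption), standing in for restaurant attributes.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 2))
X = np.column_stack([z[:, 0], 0.8 * z[:, 0] + 0.6 * z[:, 1]])
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.1, size=100)

target = X[0]                        # the observation we want to predict for
w = mahalanobis_weights(X, target)
beta = weighted_ols(X, y, w)
prediction = beta[0] + target @ beta[1:]
```

Because the distance accounts for feature covariance, two observations that differ along a strongly correlated direction are treated as closer than a Euclidean distance would suggest.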

Journal ArticleDOI
TL;DR: In this article, a cross-sensor relative radiometric normalization (RRN) method is proposed for optical satellite images from Landsat 8 OLI (L8) and Landsat 7 ETM+ (L7) sensors.
Abstract: Relative radiometric normalization (RRN) minimizes radiometric differences among images caused by inconsistencies of acquisition condition. In this study, a cross-sensor RRN method is proposed for optical satellite images from Landsat 8 OLI (L8) and Landsat 7 ETM+ (L7) sensors. The data from these two sensors have different pixel depths. Therefore, a rescaling on the radiometry resolution is performed in the preprocessing. Then, multivariate alteration detection (MAD) based on kernel canonical correlation analysis (KCCA) is adopted, which is called KCCA-based MAD, to select pseudo-invariant features (PIFs). The process of RRN is performed by using polynomial regression with Gaussian weighted regression. In experiments, qualitative and quantitative analyses on images from different sensors are conducted. The experimental result demonstrates the superiority of the proposed nonlinear transformation, in terms of regression quality and radiometric consistency, compared with RRN using linear regression.

Book ChapterDOI
21 Nov 2019
TL;DR: In this paper, two semi-supervised regression models based on the geographically weighted regression model, Self-GWR and CO-GWR, are proposed and evaluated on data of elevation, Aerosol Optical Depth (AOD), temperature, wind speed, humidity and pressure for the Beijing-Tianjin-Hebei area.
Abstract: To address the low prediction accuracy of geographically weighted regression (GWR) when the amount of training data is small, this paper combines the GWR model with semi-supervised learning theory, which lets unlabeled samples participate in the training process to enhance the performance of the regression model. Two semi-supervised regression models based on GWR are proposed: Self-GWR and CO-GWR. Based on data of elevation, Aerosol Optical Depth (AOD), temperature, wind speed, humidity and pressure for the Beijing-Tianjin-Hebei area, the GWR model, the self-training GWR model (Self-GWR) and the co-training GWR model (CO-GWR) are each applied in experiments. The experimental results show that CO-GWR effectively improves the accuracy of the regression model by using two regression models, whereas the accuracy of Self-GWR is slightly lower than that of the GWR model, which indicates that the model may accumulate errors in the self-learning process, ultimately resulting in poor regression accuracy.

Journal ArticleDOI
TL;DR: To improve the effectiveness of statistical analysis of mathematics education, a statistical analysis method for mathematics education based on locally weighted regression is proposed.
Abstract: To improve the effectiveness of statistical analysis of mathematics education, a statistical analysis method based on locally weighted regression is proposed. First, the attribute hierarchical model (AHM) is used to build a statistical analysis model of mathematics education. The mode of evaluation is diagnostic evaluation against a standard reference. Specific entries related to mathematics teaching items in curriculum standards serve as the measurement goal and are treated as the cognitive attributes of the AHM. Next, locally weighted regression is introduced: for each point to be fitted, a weighted polynomial is fitted to the locally observed data, and the point is estimated by the least squares method. Finally, the effectiveness of the proposed method is verified by a simulation experiment.
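
The locally weighted regression step described in the abstract above (a weighted polynomial fitted to nearby observations, estimated by least squares) can be sketched as follows. The tricube weight function, the neighbourhood fraction `frac`, the function name `local_poly_estimate`, and the synthetic data are assumptions in the style of classic LOESS; the abstract does not specify them.

```python
import numpy as np

def local_poly_estimate(x, y, x0, frac=0.5, degree=1):
    """Estimate y at x0 from a tricube-weighted polynomial fit to the nearest points."""
    n = len(x)
    k = min(n, max(degree + 2, int(np.ceil(frac * n))))   # neighbourhood size
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                               # k nearest observations
    h = d[idx].max()                                      # local bandwidth
    w = (1.0 - (d[idx] / h) ** 3) ** 3                    # tricube weights in [0, 1]
    # np.polyfit applies w to unsquared residuals, so pass sqrt(w) to weight squared errors
    coef = np.polyfit(x[idx] - x0, y[idx], degree, w=np.sqrt(w))
    return float(coef[-1])                                # constant term is the value at x0

# Synthetic noisy curve (assumption), smoothed point by point.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x) + rng.normal(0.0, 0.2, size=50)
smoothed = np.array([local_poly_estimate(x, y, xi, frac=0.3) for xi in x])
```

Centring the fit on `x0` means the polynomial's constant term is directly the fitted value, which avoids a separate evaluation step.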

Proceedings ArticleDOI
01 Jul 2019
TL;DR: A local weighted ridge regression algorithm based on particle swarm optimization (PSO) is proposed, which improves the accuracy and adaptability of the algorithm and solves the multicollinearity problem and the unmodeled dynamic problem in the ship maneuvering motion modeling.
Abstract: By improving the strategy of distance measure learning in traditional local weighted algorithm, a local weighted ridge regression algorithm based on particle swarm optimization (PSO) is proposed. Different from the previous strategies of global spatial optimal based distance measure learning, the different distance measures in different dimensions are learned by PSO, which improves the accuracy and adaptability of the algorithm. For the problem that the local weighted learning initial value selection can only rely on the experience, the idea of PSO is introduced to avoid the difficulty of selecting the initial value. Meanwhile, by using black-box identification modeling based on ridge regression, the multicollinearity problem and the unmodeled dynamic problem in the ship maneuvering motion modeling are solved. Through learning of 3-DOF mariner ship model, the effectiveness and generalization of the algorithm are verified, and the modeling of the nonlinear system is realized.

Journal ArticleDOI
TL;DR: The unsupervised classification based on the larger number of clusters, in particular when combined with linear regression, presents a low RMSE due to its ability to identify the spectral signature of the objects present in the scene.
Abstract: Due to sensor or ground station failure and poor atmospheric conditions, missing information is a common problem that reduces the usage of optical remote sensing data. A large number of algorithms have been developed to reconstruct missing information. Simple methods are compared for filling missing data in the case that one spectral band presents missing information but the other bands are complete. The methods are (1) spatial convolution filtering, (2) unsupervised classification using the complete bands and assignment of the cluster’s average value to missing stripe pixels, (3) global regression models, (4) geographically weighted regression (GWR) models, and (5) a combination of classification and linear models. To evaluate the performance of the different methods, missing line stripes of different sizes are simulated and reconstructed using the five methods. Then, root mean square error (RMSE), correlation, maximum deviance, and bias are computed. To identify the conditions related to the performance of each method, some characteristics of the missing lines, as correlation between spectral bands and spatial autocorrelation, are also calculated. The unsupervised classification based on the larger number of clusters, in particular when combined with linear regression, presents a low RMSE due to its ability to identify the spectral signature of the objects present in the scene. The GWR model performs better than global regression model because it is able to fit the relationship between the missing band and the complete bands locally, which is an important advantage in heterogeneous landscape. Spatial filtering is the most inaccurate method except for one pixel-width missing line.