
Showing papers on "Proper linear model published in 2016"


01 Jan 2016
TL;DR: In this article, a simple method for subset selection of independent variables in regression models is proposed, which expands the usual regression equation to an equation that incorporates all possible subsets of predictors by adding indicator variables as parameters.
Abstract: A simple method for subset selection of independent variables in regression models is proposed. We expand the usual regression equation to an equation that incorporates all possible subsets of predictors by adding indicator variables as parameters. The vector of indicator variables dictates which predictors to include. Several choices of priors can be employed for the unknown regression coefficients and the unknown indicator parameters. The posterior distribution of the indicator vector is approximated by means of the Markov chain Monte Carlo algorithm. We select subsets with high posterior probabilities. In addition to linear models, we consider generalized linear models. Many methods have been proposed for selecting suitable predictors in multiple regression. Classical methods for variable selection include backward elimination, forward selection, and stepwise regression. They sequentially delete or add predictors by means of mean squared error or modified mean squared error criteria. Various Bayesian methods have also been proposed. They include model determination by means of the following criteria: Bayesian information criterion (BIC, Schwarz, 1978), Akaike information criterion (AIC, Akaike, 1974), Bayes factor, and pseudo-Bayes factor. But the explosion of the number of possible submodels (2^p) considered for p predictors often handicaps the computation. A more automatic, data-driven tool is needed for the data analyst to identify a parsimonious model. Mitchell and Beauchamp (1988) proposed a Bayesian variable selection method assuming the prior distribution of each regression coefficient is a mixture of a point mass at 0 and a diffuse uniform distribution elsewhere. They also review other methods. Recently, George and McCulloch (1993) proposed a stochastic search variable selection approach.

369 citations
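
The indicator-vector mechanism this abstract describes can be sketched in a few lines of R. The toy below is not the authors' algorithm: it uses a Metropolis sampler that flips one indicator at a time and scores submodels by BIC as a crude stand-in for the marginal likelihood under their priors, and the data are simulated.

```r
# Metropolis sampling over the 0/1 indicator vector gamma; each position
# says whether the corresponding predictor enters the model. BIC is used
# here as a cheap approximation to the log marginal likelihood.
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 3] + rnorm(n)            # true subset: {1, 3}

log_score <- function(gamma) {
  if (sum(gamma) == 0) return(-BIC(lm(y ~ 1)) / 2)
  -BIC(lm(y ~ X[, gamma == 1])) / 2            # higher is better
}

iters <- 5000
gamma <- rep(0, p)
cur <- log_score(gamma)
draws <- matrix(NA, iters, p)
for (t in 1:iters) {
  prop <- gamma
  j <- sample(p, 1)
  prop[j] <- 1 - prop[j]                       # flip one indicator
  new <- log_score(prop)
  if (log(runif(1)) < new - cur) { gamma <- prop; cur <- new }
  draws[t, ] <- gamma
}
colMeans(draws)                                # posterior inclusion frequencies
```

Subsets visited most often (here, predictors 1 and 3) correspond to the high-posterior-probability models the paper selects.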


Proceedings ArticleDOI
01 Oct 2016
TL;DR: The features of the popular regression methods OLS regression, ridge regression, and LASSO regression are explored in terms of model fitting and prediction accuracy, using real data and a simulated environment with the help of an R package.
Abstract: Feature selection is one of the techniques in machine learning for selecting a subset of relevant features (variables) for model construction. It aims at removing redundant or irrelevant features, or features that are strongly correlated in the data, without much loss of information, and is broadly used to make models easier to interpret and to improve generalization by reducing variance. Regression analysis plays a vital role in statistical modeling and, in turn, in machine learning tasks. Traditional procedures such as Ordinary Least Squares (OLS) regression, stepwise regression, and partial least squares regression are very sensitive to random errors. Many alternatives have been established in the literature during the past few decades, such as ridge regression, the LASSO, and their variants. This paper explores the features of the popular regression methods OLS regression, ridge regression, and LASSO regression. The performance of these procedures is studied in terms of model fitting and prediction accuracy, using real data and a simulated environment with the help of an R package.

239 citations
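
The comparison the paper runs can be reproduced in outline with R's glmnet package; the package choice and the simulated data below are assumptions, since the abstract only says "R package".

```r
# OLS, ridge (alpha = 0) and LASSO (alpha = 1) fit to the same data,
# compared by out-of-sample mean squared error.
library(glmnet)
set.seed(1)
n <- 80; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)          # only two relevant predictors

ols   <- lm(y ~ X)
ridge <- cv.glmnet(X, y, alpha = 0)            # penalty chosen by cross-validation
lasso <- cv.glmnet(X, y, alpha = 1)

Xnew <- matrix(rnorm(n * p), n, p)             # fresh data from the same model
ynew <- Xnew[, 1] + 0.5 * Xnew[, 2] + rnorm(n)
mean((ynew - cbind(1, Xnew) %*% coef(ols))^2)
mean((ynew - predict(ridge, Xnew, s = "lambda.min"))^2)
mean((ynew - predict(lasso, Xnew, s = "lambda.min"))^2)
```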


Journal ArticleDOI
TL;DR: In this article, the authors proposed univariate models for short-term load forecasting based on linear regression and patterns of daily cycles of load time series, where the patterns used as input and output variables simplify the forecasting problem by filtering out the trend and seasonal variations of periods longer than the daily one.

220 citations


Journal ArticleDOI
TL;DR: This work investigates two particular algorithms, the so-called fast function extraction, which is a generalized linear regression algorithm, and genetic programming; both are capable of learning an analytically tractable model from data, a highly valuable property.
Abstract: We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms: the so-called fast function extraction, which is a generalized linear regression algorithm, and genetic programming, which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and, as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.

100 citations
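
The flavor of the generalized-regression branch of symbolic regression can be illustrated on the paper's harmonic oscillator example: build a library of candidate basis functions and select a sparse combination. The sketch below uses glmnet's penalized regression as a stand-in for fast function extraction's elastic-net path, with simulated measurements; none of it is the authors' code.

```r
# Learn the acceleration law of a harmonic oscillator (x'' = -omega^2 x)
# by sparse regression of a numerical derivative on candidate basis functions.
library(glmnet)
t <- seq(0, 10, by = 0.01); omega <- 2
x <- sin(omega * t); v <- omega * cos(omega * t)
a <- diff(v) / diff(t)                        # target: numerical derivative of v

B <- cbind(x = x[-1], v = v[-1],              # candidate terms, evaluated on
           x2 = x[-1]^2, v2 = v[-1]^2,        # the grid trimmed to match diff()
           xv = x[-1] * v[-1])
fit <- cv.glmnet(B, a)
coef(fit, s = "lambda.min")                   # should isolate a ~ -omega^2 * x
```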


Proceedings ArticleDOI
01 Nov 2016
TL;DR: In this article, the authors compared linear regression and support vector regression models, trained on the same data set, to determine which yields better prediction accuracy when forecasting business outcomes from current or historical data.
Abstract: In business, consumer interest, behavior, and product profits are the insights required to predict the future of the business from current or historical data. These insights can be generated with statistical techniques for the purpose of forecasting, and the candidate techniques can be evaluated as predictive models against the requirements of the data. Prediction and forecasting are widely done with time series data: most applications, such as weather forecasting, finance, and the stock market, combine historical data with current streaming data for better accuracy. Such time series data are commonly analyzed with regression models. In this paper, linear regression and support vector regression models are compared on the same training data set in order to select the model with better prediction accuracy.

85 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a partial functional linear regression model (PFLRM) for forecasting the daily power output of PV systems; the PFLRM generalizes the traditional multiple linear regression (MLR) model while also being able to model nonlinear structure.

82 citations


Journal ArticleDOI
TL;DR: This article introduces some basic knowledge of the Weibull regression model and illustrates how to fit the model with R software, which provides another way to report findings.
Abstract: The Weibull regression model is one of the most popular forms of parametric regression model in that it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature as compared to the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful for converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variables. The eha package provides an alternative way to fit the Weibull regression model, and its check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function; alternatively, backward elimination starting from a full model is an efficient way to develop the model. Visualization of the Weibull regression model after model development is also worthwhile, as it provides another way to report findings.

82 citations
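
The workflow the article walks through can be sketched directly, since it names the functions involved; the lung data from the survival package stands in here for the article's clinical data.

```r
library(survival)
library(SurvRegCensCov)

# Fit a parametric Weibull regression; lung ships with the survival package.
fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
ConvertWeibull(fit)          # re-express coefficients as HR and ETR
anova(fit)                   # importance of each covariate, as the article suggests

# Alternative parameterisation via the eha package, with a graphical
# goodness-of-fit check against a semi-parametric Cox fit.
library(eha)
pfit <- phreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
cfit <- coxreg(Surv(time, status) ~ age + sex, data = lung)
check.dist(cfit, pfit)
```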


Journal ArticleDOI
TL;DR: This work proposes an efficient rule-based multivariate regression method based on piece-wise functions that achieves better prediction performance than state-of-the-art approaches, and it can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
Abstract: Highlights: A novel piece-wise linear regression method is proposed in this work. The method partitions samples into multiple regions from a single attribute. Each region is fitted with a linear regression function. An optimisation model is proposed to decide break-points and regression functions. Benchmark examples are used to demonstrate its efficiency. In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables, by approximating their complex inner relationship. A large number of methods have been successfully proposed, based on various methodologies, including linear regression, support vector regression, neural networks, piece-wise regression, etc. In terms of piece-wise regression, the existing methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piece-wise linear regression method is introduced based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments, and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming both the single partition feature and the number of regions are known, a mixed integer linear model is proposed to simultaneously determine the locations of multiple break-points and the regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and the final number of break-points. Seven real-world problems covering several application domains are used to demonstrate the efficiency of the proposed method. It is shown that the proposed piece-wise regression method can be solved to global optimality for datasets of thousands of samples, and it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of easily interpretable if-then rules. Overall, this work proposes an efficient rule-based multivariate regression method based on piece-wise functions that achieves better prediction performance than state-of-the-art approaches. This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.

81 citations
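
The core idea, partitioning one input variable and fitting one linear model per segment, can be shown with a toy single break-point version. The paper solves a mixed integer linear program minimising total absolute error for multiple break-points; the grid search and squared-error fits below are simplifications for illustration.

```r
# One break-point on one partition feature: try candidate locations on a grid
# and keep the one whose two segment-wise linear fits have the smallest error.
set.seed(1)
x <- runif(200, 0, 10)
y <- ifelse(x < 4, 1 + 2 * x, 13 - x) + rnorm(200, sd = 0.3)

sse_at <- function(b) {
  left <- x < b
  sum(resid(lm(y[left] ~ x[left]))^2) + sum(resid(lm(y[!left] ~ x[!left]))^2)
}
grid <- seq(1, 9, by = 0.05)
best <- grid[which.min(sapply(grid, sse_at))]
best                                          # estimated break-point, near 4
```

The learned model then reads as two if-then rules: if x < best use the left-segment fit, otherwise the right one, which is the interpretability advantage the abstract emphasises.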


Journal ArticleDOI
01 Jun 2016 - Test
TL;DR: This paper develops a robust estimation procedure for the generalized linear models that can generate robust estimators with little loss in efficiency and explores two particular special cases in detail—Poisson regression for count data and logistic regression for binary data.
Abstract: The generalized linear model is a very important tool for analyzing real data in several application domains where the relationship between the response and explanatory variables may not be linear or the distributions may not be normal in all cases. Quite often such real data contain a significant number of outliers relative to the standard parametric model used in the analysis; in such cases inference based on the maximum likelihood estimator can be unreliable. In this paper, we develop a robust estimation procedure for generalized linear models that can generate robust estimators with little loss in efficiency. We also explore two particular special cases in detail, Poisson regression for count data and logistic regression for binary data, and illustrate the performance of the proposed estimators through some real-life examples.

68 citations


Journal ArticleDOI
TL;DR: This work extends four tests common in classical regression – Wald, score, likelihood ratio and F tests – to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate.
Abstract: We extend four tests common in classical regression – Wald, score, likelihood ratio and F tests – to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate.

59 citations


Journal ArticleDOI
TL;DR: A new regression model is introduced by considering the distribution proposed in this article, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters.
Abstract: Starting from the Johnson SB distribution pioneered by Johnson (1949), we propose a broad class of distributions with bounded support on the basis of the symmetric family of distributions. The new class provides a rich source of alternative distributions for analyzing univariate bounded data. A comprehensive account of the mathematical properties of the new family is provided. We briefly discuss estimation of the model parameters of the new class of distributions based on two estimation methods. Additionally, a new regression model is introduced by considering the distribution proposed in this article, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters. The regression model allows modeling both location and dispersion effects. We define two residuals for the proposed regression model to assess departures from model assumptions and to detect outlying observations, and we discuss some influence methods such as local influence and generalized leverage. Finally, an application to real data is presented to show the usefulness of the new regression model.

Proceedings ArticleDOI
06 Jul 2016
TL;DR: A new way to design parameter estimators with enhanced performance is proposed: applying a dynamic operator to the original regression, and then suitably mixing the resulting regressors, yields a new parameter estimator whose convergence is established without the usual requirement of persistency of excitation of the regressor.
Abstract: A new way to design parameter estimators with enhanced performance is proposed in the paper. The procedure consists of two stages: first, the generation of new regression forms via the application of a dynamic operator to the original regression; second, a suitable mix of these new regressors to obtain the final desired regression form. For classical linear regression forms the procedure yields a new parameter estimator whose convergence is established without the usual requirement of persistency of excitation of the regressor. The technique is also applied to nonlinear regressions with “partially” monotonic parameter dependence, giving rise again to estimators with enhanced performance. Simulation results illustrate the advantages of the proposed procedure in both scenarios.

Journal ArticleDOI
TL;DR: An asymmetric logistic regression model that uses a new parameter to account for data complexity and can enhance the applicability of a generalized linear model to various ecological problems using a slight modification, and significantly improves model fitting and model selection.
Abstract: Binary data are popular in ecological and environmental studies; however, due to various uncertainties and complexities present in data sets, the standard generalized linear model with a binomial error distribution often demonstrates insufficient predictive performance when analysing binary and proportional data. To address this difficulty, we propose an asymmetric logistic regression model that uses a new parameter to account for data complexity. We observe that this parameter controls the model's asymmetry and is important for adjusting the weights associated with observed data in order to improve model fitting. This model includes the ordinary logistic regression model as a special case. It is easily implemented using a slight modification of glm or glmer in statistical software R. Simulation studies suggest that our new approach outperforms a traditional approach in terms of both predictive accuracy and variable selection. In a case study involving fisheries data, we found that the annual catch amount had a greater impact on stock status prediction, and improved predictive capability was supported with a smaller AIC compared to a generalized linear model. In summary, our method can enhance the applicability of a generalized linear model to various ecological problems using a slight modification, and significantly improves model fitting and model selection.

Journal ArticleDOI
TL;DR: In this article, a locally weighted linear regression (LWLR) method is proposed to predict damping ratio of a dominant mode online, which is essentially proposed for nonlinear data fitting.
Abstract: In this study, a locally weighted linear regression (LWLR) method is proposed to predict damping ratio of a dominant mode online. The LWLR method, which is nonparametric and data-oriented, is essentially proposed for nonlinear data fitting; therefore, it can track the nonlinear power system operations and help damping ratio prediction in real power systems, which is hardly achieved by the conventional linear regression. To successfully implement this method, the measurement of weighting value and the choice of weighting function as well as its parameter setting, related to prediction accuracy and numerical conditions, are extensively discussed. Simulations are carried out in a two-area four-machine system and a large complex system, China Southern Grid. Both results validate the effectiveness of the proposed method.
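
A minimal sketch of the LWLR mechanism on synthetic data, with a Gaussian weighting function; the paper's measurement-based weighting and parameter tuning are not reproduced here.

```r
# Locally weighted linear regression at a query point x0: weight each
# observation by a Gaussian kernel of bandwidth h, then solve weighted
# least squares and predict at x0.
lwlr <- function(x0, x, y, h) {
  w <- exp(-(x - x0)^2 / (2 * h^2))
  fit <- lm(y ~ x, weights = w)
  predict(fit, newdata = data.frame(x = x0))
}

set.seed(1)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)
yhat <- sapply(x, function(x0) lwlr(x0, x, y, h = 0.5))  # tracks the nonlinearity
```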

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This work used the data from the Kaggle competition “Bosch Production Line Performance” as the data set for the analysis, and considered the use of machine learning, linear, and Bayesian models for logistic regression in manufacturing failure detection.
Abstract: In this work, we study the use of logistic regression in manufacturing failure detection. As a data set for the analysis, we used the data from the Kaggle competition “Bosch Production Line Performance”. We considered the use of machine learning, linear and Bayesian models. For the machine learning approach, we analyzed the XGBoost tree-based classifier to obtain high-scoring classification. Using the generalized linear model for logistic regression makes it possible to analyze the influence of the factors under study. The Bayesian approach to logistic regression gives the statistical distribution of the parameters of the model, which can be useful in probabilistic analysis, e.g. risk assessment.
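
Two of the three approaches the paper compares can be sketched with standard R tooling; the synthetic data below stand in for the Bosch set, and the Bayesian variant is omitted.

```r
# Logistic regression two ways: a generalized linear model, whose
# coefficients expose the influence of each factor, and XGBoost with a
# logistic objective for raw classification performance.
library(xgboost)
set.seed(1)
n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))

glm_fit <- glm(y ~ X, family = binomial)
summary(glm_fit)                        # factor influence via coefficients

bst <- xgboost(data = X, label = y, nrounds = 50,
               objective = "binary:logistic", verbose = 0)
head(predict(bst, X))                   # predicted failure probabilities
```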

Journal ArticleDOI
TL;DR: The procedure to find the optimal h value to maximize the system credibility of the fuzzy linear regression model with asymmetric triangular fuzzy coefficients is described, and it is shown that the system credibility in the asymmetric case will be higher than that in the symmetric case.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: ε-differentially private diagnostics for regression are developed, beginning to fill a gap in privacy-preserving data analysis and are adequate for diagnosing the fit and predictive power of regression models on representative datasets when the size of the dataset times the privacy parameter (ε) is at least 1000.
Abstract: Linear and logistic regression are popular statistical techniques for analyzing multivariate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. Instead, they first use a variety of diagnostic techniques to assess how well the model fits the relationships in the data and how well it can be expected to predict outcomes for out-of-sample records, revising the model as necessary to improve fit and predictive power. In this article, we develop ε-differentially private diagnostics for regression, beginning to fill a gap in privacy-preserving data analysis. Specifically, we create differentially private versions of residual plots for linear regression and of receiver operating characteristic (ROC) curves for logistic regression. The former helps determine whether or not the data satisfy the assumptions underlying the linear regression model, and the latter is used to assess the predictive power of the logistic regression model. These diagnostics improve the usefulness of algorithms for computing differentially private regression output, which alone does not allow analysts to assess the quality of the posited model. Our empirical studies show that these algorithms are adequate for diagnosing the fit and predictive power of regression models on representative datasets when the size of the dataset times the privacy parameter (ε) is at least 1000.
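
One way to picture the residual-plot diagnostic is as a noisy two-dimensional histogram; the sketch below is a generic Laplace-mechanism version under the assumption of data-independent bin ranges, not the paper's exact construction.

```r
# epsilon-DP residual "plot": bin (fitted, residual) pairs on a fixed grid
# and add Laplace noise to the counts. Each record affects exactly one bin,
# so the count vector has sensitivity 1 and scale 1/epsilon suffices.
rlaplace <- function(n, scale) {
  u <- runif(n, -0.5, 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

dp_residual_hist <- function(pred, res, epsilon, xlim, ylim, bins = 10) {
  bx <- seq(xlim[1], xlim[2], length.out = bins + 1)
  by <- seq(ylim[1], ylim[2], length.out = bins + 1)
  counts <- table(cut(pred, bx), cut(res, by))
  noisy <- unclass(counts) + rlaplace(length(counts), 1 / epsilon)
  pmax(round(noisy), 0)                 # post-processing preserves epsilon-DP
}

fit <- lm(dist ~ speed, data = cars)    # example model on a built-in dataset
dp_residual_hist(fitted(fit), resid(fit), epsilon = 1,
                 xlim = c(-20, 100), ylim = c(-40, 60))
```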

Journal ArticleDOI
08 Jun 2016 - Metrika
TL;DR: In this article, the authors propose a test procedure based on the residual sums of squares under the null and alternative hypotheses, and establish the asymptotic properties of the resulting test; simulations show the procedure has good size and power with finite sample sizes.
Abstract: This paper investigates the hypothesis test of the parametric component in partial functional linear regression. We propose a test procedure based on the residual sums of squares under the null and alternative hypothesis, and establish the asymptotic properties of the resulting test. A simulation study shows that the proposed test procedure has good size and power with finite sample sizes. Finally, we present an illustration through fitting the Berkeley growth data with a partial functional linear regression model and testing the effect of gender on the height of kids.

Journal ArticleDOI
TL;DR: This work proposes to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix, and develops a numerical algorithm to solve the penalized regression problem.
Abstract: Improving the predictive performance of multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure of the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition, we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

Journal ArticleDOI
TL;DR: In this paper, the IMLR and MPR models are validated and compared against each other based on measured data from a real collector field; the results show that the gained precision cannot be improved significantly further if the regression functions are linear in the input variables.

Journal ArticleDOI
TL;DR: In this article, the shrinkage ridge estimator and its positive part are defined for the regression coefficient vector in a partial linear model, and the differencing approach is used for its ease of parameter estimation after removing the nonparametric part of the model.
Abstract: In this paper, the shrinkage ridge estimator and its positive part are defined for the regression coefficient vector in a partial linear model. The differencing approach is used for its ease of parameter estimation after removing the nonparametric part of the model. Exact risk expressions, in addition to biases, are derived for the estimators under study, and the region of optimality of each estimator is exactly determined. The performance of the estimators is evaluated on simulated as well as real data sets.

Journal ArticleDOI
TL;DR: The proposed MARS-fuzzy regression model can be used for modelling natural phenomena whose available observations are reported as imprecise rather than crisp and performs better than the other models in suspended load estimation for the particular datasets.
Abstract: The problem of estimation of suspended load carried by a river is an important topic for many water resources projects. Conventional estimation methods are based on the assumption of exact observations. In practice, however, a major source of natural uncertainty is due to imprecise measurements and/or imprecise relationships between variables. In this paper, using the Multivariate Adaptive Regression Splines (MARS) technique, a novel fuzzy regression model for imprecise response and crisp explanatory variables is presented. The investigated fuzzy regression model is applied to forecast suspended load by discharge based on two real-world datasets. The accuracy of the proposed method is compared with two well-known parametric fuzzy regression models, namely, the fuzzy least-absolutes model and the fuzzy least-squares model. The comparison results reveal that the MARS-fuzzy regression model performs better than the other models in suspended load estimation for the particular datasets. This comparison...

Journal ArticleDOI
01 Sep 2016
TL;DR: This study demonstrates that the best error measures for estimating fuzzy linear regression model parameters with the Monte Carlo method are E1, E2, and the mean square error.
Abstract: Highlights: The study covers different error measures that have not previously been calculated in a Monte Carlo study of fuzzy linear regression models. We obtain the most useful and the worst error measures for estimating fuzzy regression parameters without using any mathematical programming or heavy fuzzy arithmetic operations. The focus of this study is the use of the Monte Carlo method in fuzzy linear regression. The purpose of the study is to determine the appropriate error measures for the estimation of fuzzy linear regression model parameters with the Monte Carlo method, since model parameters are estimated without any mathematical programming or heavy fuzzy arithmetic operations. In the literature, only two error measures (E1 and E2) are available for the estimation of fuzzy linear regression model parameters, and the accuracy of the available error measures under the Monte Carlo procedure has not been evaluated. In this article, the mean square error, mean percentage error, mean absolute percentage error, and symmetric mean absolute percentage error are proposed for the estimation of fuzzy linear regression model parameters with the Monte Carlo method, and the estimation accuracies of existing and proposed error measures are explored. The error measures are compared to each other in terms of estimation accuracy; this study demonstrates that the best error measures for estimating fuzzy linear regression model parameters with the Monte Carlo method are E1, E2, and the mean square error, while the worst is the mean percentage error. These results should be useful for enriching studies that focus on fuzzy linear regression models.

Journal ArticleDOI
TL;DR: It is found that the traditional linear regression results perform comparably to more sophisticated non-linear and two-stage models.

Journal ArticleDOI
TL;DR: This paper proposes a nonparametric additive approach to properly analyze interval-valued data with a possibly nonlinear pattern and demonstrates the proposed approach using a simulation study and a real data example, and also compares its performance with those of existing methods.
Abstract: Interval-valued data are observed as ranges instead of single values and frequently appear with advanced technologies in current data collection processes. Regression analysis of interval-valued data has been studied in the literature, but mostly focused on parametric linear regression models. In this paper, we study interval-valued data regression based on nonparametric additive models. By employing one of the current methods based on linear regression, we propose a nonparametric additive approach to properly analyze interval-valued data with a possibly nonlinear pattern. We demonstrate the proposed approach using a simulation study and a real data example, and also compare its performance with those of existing methods.

Posted Content
TL;DR: In this article, a threshold-based algorithm for selection of most informative observations is proposed to solve the problem of online active learning to collect data for regression modeling, where a decision maker with a limited experimentation budget must efficiently learn an underlying linear population model.
Abstract: We consider the problem of online active learning to collect data for regression modeling. Specifically, we consider a decision maker with a limited experimentation budget who must efficiently learn an underlying linear population model. Our main contribution is a novel threshold-based algorithm for selection of the most informative observations; we characterize its performance and fundamental lower bounds. We extend the algorithm and its guarantees to sparse linear regression in high-dimensional settings. Simulations suggest the algorithm is remarkably robust: it provides significant benefits over passive random sampling in real-world datasets that exhibit high nonlinearity and high dimensionality, significantly reducing both the mean and variance of the squared error.
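
A stripped-down version of the threshold idea is easy to simulate: spend a label only when the incoming covariate vector looks informative, then fit least squares on the points actually queried. The norm threshold and budget below are illustrative; the paper's rule and guarantees are more refined.

```r
# Online active selection: query the label only when ||x_t|| exceeds a
# threshold, until the experimentation budget is exhausted.
set.seed(1)
p <- 3; budget <- 40; threshold <- 2.0
theta <- c(1, -2, 0.5)                       # unknown population model
Xkeep <- NULL; ykeep <- NULL

for (t in 1:1000) {
  xt <- rnorm(p)
  if (length(ykeep) < budget && sqrt(sum(xt^2)) > threshold) {
    Xkeep <- rbind(Xkeep, xt)                # informative point: pay for a label
    ykeep <- c(ykeep, sum(theta * xt) + rnorm(1, sd = 0.1))
  }
}
coef(lm(ykeep ~ Xkeep - 1))                  # estimate of theta from 40 labels
```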

Journal ArticleDOI
01 Sep 2016
TL;DR: In this paper, a varying coefficient partially functional linear regression model (VCPFLM) is proposed, where the functional parameter is approximated by a polynomial spline, and the spline coefficients are estimated by the ordinary least squares method.
Abstract: By relaxing the linearity assumption in partial functional linear regression models, we propose a varying coefficient partially functional linear regression model (VCPFLM), which includes varying coefficient regression models and functional linear regression models as its special cases. We study the problem of functional parameter estimation in a VCPFLM. The functional parameter is approximated by a polynomial spline, and the spline coefficients are estimated by the ordinary least squares method. Under some regularity conditions, we obtain asymptotic properties of functional parameter estimators, including the global convergence rates and uniform convergence rates. Simulation studies are conducted to investigate the performance of the proposed methodologies.
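
The estimation step, approximating the varying coefficient by a polynomial spline and applying ordinary least squares, can be sketched for the varying-coefficient part alone (the functional covariate term is omitted, and the data are simulated):

```r
# Varying coefficient model y = beta(t) * x + error: expand beta(t) in a
# B-spline basis, absorb x into the design, and fit by ordinary least squares.
library(splines)
set.seed(1)
n <- 300
t <- runif(n); x <- rnorm(n)
y <- sin(2 * pi * t) * x + rnorm(n, sd = 0.2)

B <- bs(t, df = 8)                    # polynomial spline basis for beta(t)
Z <- B * x                            # row i of B scaled by x_i
fit <- lm(y ~ Z - 1)                  # spline coefficients via OLS
betahat <- B %*% coef(fit)            # estimated beta(t) at the sample points
```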

Proceedings ArticleDOI
01 May 2016
TL;DR: This paper defines PXR models using several patterns and local regression models, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships, and introduces a contrast pattern aided regression (CPXR) method to build accurate PXR models.
Abstract: This paper first introduces a new style of regression models, namely pattern aided regression (PXR) models, aimed at representing accurate and interpretable prediction models. It also introduces a contrast pattern aided regression (CPXR) method, to build accurate PXR models in an efficient manner. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by wide margins. From extensive experiments we also found that (1) regression modeling applications often involve complex diverse predictor-response relationships, which occur when the optimal regression models (of given regression model type) fitting distinct natural subgroups of data are highly different, and (2) state-of-the-art regression methods are often unable to adequately model such relationships. CPXR is also useful for analyzing how a given regression model makes prediction errors. This is an extended abstract of [6].

Posted Content
TL;DR: In this paper, a penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed, and a method is also proposed to obtain de-biased estimates of the regression coefficient that are asymptotically unbiased and have a joint multivariate normal distribution, which can be used to obtain the $p$-values.
Abstract: One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa at different taxonomic levels. This paper considers regression analysis with such compositional data as covariates. In order to satisfy the subcompositional coherence of the results, linear models with a set of linear constraints on the regression coefficients are introduced. Such models allow regression analysis for subcompositions and include the log-contrast model for compositional covariates as a special case. A penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed. A method is also proposed to obtain de-biased estimates of the regression coefficients that are asymptotically unbiased and have a joint asymptotic multivariate normal distribution. This provides valid confidence intervals of the regression coefficients and can be used to obtain the $p$-values. Simulation results show the validity of the confidence intervals and smaller variances of the de-biased estimates when the linear constraints are imposed. The proposed methods are applied to a gut microbiome data set and identify four bacterial genera that are associated with the body mass index after adjusting for the total fat and caloric intakes.

Journal ArticleDOI
TL;DR: In this paper, a methodology for statistical downscaling using local polynomial regression for obtaining the future projections of rainfall in a catchment was presented, where the model was applied to forecast the rainfall in the catchment of Idukky reservoir in Kerala, India.
Abstract: This article presents a methodology for statistical downscaling using local polynomial regression for obtaining future projections of rainfall in a catchment. Local polynomial regression offers a way to capture the nonlinearities in the input-output relationship that traditional regression misses, by identifying the nearest neighbors of the predictor point for a specified bandwidth. It fits a low-degree polynomial model to the subset of the data at each point by weighted least squares; the local regression fit is complete when the regression function values have been calculated for all the data points, yielding a smooth curve through the data. Mean sea level pressure, geopotential height at 500 mb, air temperature, relative humidity and wind speed are identified as the potential predictors for predicting the rainfall. Monthly data on the predictors for nine grid points around the study area are obtained from National Centre for Environmental Prediction (NCEP)/National Centre for Atmospheric Research (NCAR) Re-analysis data. The model was applied to forecast the rainfall in the catchment of the Idukky reservoir in Kerala, India, and its performance was compared with that of multiple linear regression and artificial neural network models. It is seen that the local polynomial regression model gives better performance in forecasting the rainfall. The methodology is computationally simple, easy to implement, and captures the linear and nonlinear features in the data set while preserving the dynamics of the atmosphere and the properties of the historical series.
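
Base R's loess() implements exactly this mechanism, a low-degree polynomial fit by weighted least squares within a neighbourhood of each point, so the method is simple to try; the synthetic data below are only for illustration.

```r
# Local polynomial regression: span plays the role of the bandwidth,
# degree the order of the local polynomial fit at each point.
set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + 0.1 * x + rnorm(200, sd = 0.2)

fit <- loess(y ~ x, degree = 2, span = 0.3)
plot(x, y)
lines(sort(x), fitted(fit)[order(x)], col = "red")  # smooth curve through the data
```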