
Showing papers on "Proper linear model published in 2016"


01 Jan 2016
TL;DR: In this article, a simple method for subset selection of independent variables in regression models is proposed, which expands the usual regression equation to an equation that incorporates all possible subsets of predictors by adding indicator variables as parameters.
Abstract: A simple method for subset selection of independent variables in regression models is proposed. We expand the usual regression equation to an equation that incorporates all possible subsets of predictors by adding indicator variables as parameters. The vector of indicator variables dictates which predictors to include. Several choices of priors can be employed for the unknown regression coefficients and the unknown indicator parameters. The posterior distribution of the indicator vector is approximated by means of the Markov chain Monte Carlo algorithm. We select subsets with high posterior probabilities. In addition to linear models, we consider generalized linear models. Many methods have been proposed for selecting suitable predictors in multiple regression. Classical methods for variable selection include backward elimination, forward selection, and stepwise regression. They sequentially delete or add predictors by means of mean squared error or modified mean squared error criteria. Various Bayesian methods have also been proposed. They include model determination by means of the following criteria: Bayesian information criterion (BIC, Schwarz, 1978), Akaike information criterion (AIC, Akaike, 1974), Bayes factor, and pseudo-Bayes factor. But the explosion of the number of possible submodels (2^p) considered for p predictors often handicaps the computation. A more automatic, data-driven tool is needed for the data analyst to identify a parsimonious model. Mitchell and Beauchamp (1988) proposed a Bayesian variable selection method assuming the prior distribution of each regression coefficient is a mixture of a point mass at 0 and a diffuse uniform distribution elsewhere. They also review other methods. Recently, George and McCulloch (1993) proposed a stochastic search variable selection approach.

369 citations
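
The indicator-vector mechanism this abstract describes can be sketched in a few lines of R. The toy below is not the authors' algorithm: it uses a Metropolis sampler that flips one indicator at a time and scores submodels by BIC as a crude stand-in for the marginal likelihood under their priors, and the data are simulated.

```r
# Metropolis sampling over the 0/1 indicator vector gamma; each position
# says whether the corresponding predictor enters the model. BIC is used
# here as a cheap approximation to the log marginal likelihood.
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 3] + rnorm(n)            # true subset: {1, 3}

log_score <- function(gamma) {
  if (sum(gamma) == 0) return(-BIC(lm(y ~ 1)) / 2)
  -BIC(lm(y ~ X[, gamma == 1])) / 2            # higher is better
}

iters <- 5000
gamma <- rep(0, p)
cur <- log_score(gamma)
draws <- matrix(NA, iters, p)
for (t in 1:iters) {
  prop <- gamma
  j <- sample(p, 1)
  prop[j] <- 1 - prop[j]                       # flip one indicator
  new <- log_score(prop)
  if (log(runif(1)) < new - cur) { gamma <- prop; cur <- new }
  draws[t, ] <- gamma
}
colMeans(draws)                                # posterior inclusion frequencies
```

Subsets visited most often (here, predictors 1 and 3) correspond to the high-posterior-probability models the paper selects.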


Proceedings ArticleDOI
01 Oct 2016
TL;DR: The features of the popular regression methods OLS regression, ridge regression, and LASSO regression are explored in terms of model fitting and prediction accuracy, using real data and a simulated environment with the help of an R package.
Abstract: Feature selection is one of the techniques in machine learning for selecting a subset of relevant features (variables) for model construction. It aims at removing redundant or irrelevant features, or features that are strongly correlated in the data, without much loss of information, and is broadly used to make models easier to interpret and to improve generalization by reducing variance. Regression analysis plays a vital role in statistical modeling and, in turn, in machine learning tasks. Traditional procedures such as Ordinary Least Squares (OLS) regression, stepwise regression, and partial least squares regression are very sensitive to random errors. Many alternatives have been established in the literature during the past few decades, such as ridge regression, the LASSO, and their variants. This paper explores the features of the popular regression methods OLS regression, ridge regression, and LASSO regression. The performance of these procedures is studied in terms of model fitting and prediction accuracy, using real data and a simulated environment with the help of an R package.

239 citations
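
The comparison the paper runs can be reproduced in outline with R's glmnet package; the package choice and the simulated data below are assumptions, since the abstract only says "R package".

```r
# OLS, ridge (alpha = 0) and LASSO (alpha = 1) fit to the same data,
# compared by out-of-sample mean squared error.
library(glmnet)
set.seed(1)
n <- 80; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)          # only two relevant predictors

ols   <- lm(y ~ X)
ridge <- cv.glmnet(X, y, alpha = 0)            # penalty chosen by cross-validation
lasso <- cv.glmnet(X, y, alpha = 1)

Xnew <- matrix(rnorm(n * p), n, p)             # fresh data from the same model
ynew <- Xnew[, 1] + 0.5 * Xnew[, 2] + rnorm(n)
mean((ynew - cbind(1, Xnew) %*% coef(ols))^2)
mean((ynew - predict(ridge, Xnew, s = "lambda.min"))^2)
mean((ynew - predict(lasso, Xnew, s = "lambda.min"))^2)
```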


Journal ArticleDOI
TL;DR: In this article, the authors proposed univariate models for short-term load forecasting based on linear regression and patterns of daily cycles of load time series, where the patterns used as input and output variables simplify the forecasting problem by filtering out the trend and seasonal variations of periods longer than the daily one.

220 citations


Journal ArticleDOI
TL;DR: This work investigates two particular algorithms, the so-called fast function extraction, which is a generalized linear regression algorithm, and genetic programming; both are capable of learning an analytically tractable model from data, a highly valuable property.
Abstract: We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms: the so-called fast function extraction, which is a generalized linear regression algorithm, and genetic programming, which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and, as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.

100 citations
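
The flavor of the generalized-regression branch of symbolic regression can be illustrated on the paper's harmonic oscillator example: build a library of candidate basis functions and select a sparse combination. The sketch below uses glmnet's penalized regression as a stand-in for fast function extraction's elastic-net path, with simulated measurements; none of it is the authors' code.

```r
# Learn the acceleration law of a harmonic oscillator (x'' = -omega^2 x)
# by sparse regression of a numerical derivative on candidate basis functions.
library(glmnet)
t <- seq(0, 10, by = 0.01); omega <- 2
x <- sin(omega * t); v <- omega * cos(omega * t)
a <- diff(v) / diff(t)                        # target: numerical derivative of v

B <- cbind(x = x[-1], v = v[-1],              # candidate terms, evaluated on
           x2 = x[-1]^2, v2 = v[-1]^2,        # the grid trimmed to match diff()
           xv = x[-1] * v[-1])
fit <- cv.glmnet(B, a)
coef(fit, s = "lambda.min")                   # should isolate a ~ -omega^2 * x
```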


Proceedings ArticleDOI
01 Nov 2016
TL;DR: In this article, the authors compared linear regression and support vector regression models, trained on the same data set, to determine which yields better prediction accuracy when forecasting business outcomes from current or historical data.
Abstract: In business, consumer interest, behavior, and product profits are the insights required to predict the future of the business from current or historical data. These insights can be generated with statistical techniques for the purpose of forecasting, and the candidate techniques can be evaluated as predictive models against the requirements of the data. Prediction and forecasting are widely done with time series data: most applications, such as weather forecasting, finance, and the stock market, combine historical data with current streaming data for better accuracy. Such time series data are commonly analyzed with regression models. In this paper, linear regression and support vector regression models are compared on the same training data set in order to select the model with better prediction accuracy.

85 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a partial functional linear regression model (PFLRM) for forecasting the daily power output of PV systems; the PFLRM generalizes the traditional multiple linear regression (MLR) model while also being able to model nonlinear structure.

82 citations


Journal ArticleDOI
TL;DR: This article introduces some basic knowledge of the Weibull regression model and illustrates how to fit the model with R software, which provides another way to report findings.
Abstract: The Weibull regression model is one of the most popular forms of parametric regression model in that it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature as compared to the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful for converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variables. The eha package provides an alternative way to fit the Weibull regression model, and its check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function; alternatively, backward elimination starting from a full model is an efficient way to develop the model. Visualization of the Weibull regression model after model development is also worthwhile, as it provides another way to report findings.

82 citations
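
The workflow the article walks through can be sketched directly, since it names the functions involved; the lung data from the survival package stands in here for the article's clinical data.

```r
library(survival)
library(SurvRegCensCov)

# Fit a parametric Weibull regression; lung ships with the survival package.
fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
ConvertWeibull(fit)          # re-express coefficients as HR and ETR
anova(fit)                   # importance of each covariate, as the article suggests

# Alternative parameterisation via the eha package, with a graphical
# goodness-of-fit check against a semi-parametric Cox fit.
library(eha)
pfit <- phreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
cfit <- coxreg(Surv(time, status) ~ age + sex, data = lung)
check.dist(cfit, pfit)
```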


Journal ArticleDOI
TL;DR: This work proposes an efficient rule-based multivariate regression method based on piece-wise functions that achieves better prediction performance than state-of-the-art approaches, and it can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
Abstract: Highlights: A novel piece-wise linear regression method is proposed in this work. The method partitions samples into multiple regions from a single attribute. Each region is fitted with a linear regression function. An optimisation model is proposed to decide break-points and regression functions. Benchmark examples are used to demonstrate its efficiency. In data mining, regression analysis is a computational tool that predicts continuous output variables from a number of independent input variables, by approximating their complex inner relationship. A large number of methods have been successfully proposed, based on various methodologies, including linear regression, support vector regression, neural networks, piece-wise regression, etc. In terms of piece-wise regression, the existing methods in the literature are usually restricted to problems of very small scale, due to their inherent non-linear nature. In this work, a more efficient piece-wise linear regression method is introduced based on a novel integer linear programming formulation. The proposed method partitions one input variable into multiple mutually exclusive segments, and fits one multivariate linear regression function per segment to minimise the total absolute error. Assuming both the single partition feature and the number of regions are known, a mixed integer linear model is proposed to simultaneously determine the locations of multiple break-points and the regression coefficients for each segment. Furthermore, an efficient heuristic procedure is presented to identify the key partition feature and the final number of break-points. Seven real-world problems covering several application domains are used to demonstrate the efficiency of the proposed method. It is shown that the proposed piece-wise regression method can be solved to global optimality for datasets of thousands of samples, and it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of easily interpretable if-then rules. Overall, this work proposes an efficient rule-based multivariate regression method based on piece-wise functions that achieves better prediction performance than state-of-the-art approaches. This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.

81 citations
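
The core idea, partitioning one input variable and fitting one linear model per segment, can be shown with a toy single break-point version. The paper solves a mixed integer linear program minimising total absolute error for multiple break-points; the grid search and squared-error fits below are simplifications for illustration.

```r
# One break-point on one partition feature: try candidate locations on a grid
# and keep the one whose two segment-wise linear fits have the smallest error.
set.seed(1)
x <- runif(200, 0, 10)
y <- ifelse(x < 4, 1 + 2 * x, 13 - x) + rnorm(200, sd = 0.3)

sse_at <- function(b) {
  left <- x < b
  sum(resid(lm(y[left] ~ x[left]))^2) + sum(resid(lm(y[!left] ~ x[!left]))^2)
}
grid <- seq(1, 9, by = 0.05)
best <- grid[which.min(sapply(grid, sse_at))]
best                                          # estimated break-point, near 4
```

The learned model then reads as two if-then rules: if x < best use the left-segment fit, otherwise the right one, which is the interpretability advantage the abstract emphasises.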


Journal ArticleDOI
01 Jun 2016 - Test
TL;DR: This paper develops a robust estimation procedure for the generalized linear models that can generate robust estimators with little loss in efficiency and explores two particular special cases in detail—Poisson regression for count data and logistic regression for binary data.
Abstract: The generalized linear model is a very important tool for analyzing real data in several application domains where the relationship between the response and explanatory variables may not be linear or the distributions may not be normal in all cases. Quite often such real data contain a significant number of outliers relative to the standard parametric model used in the analysis; in such cases inference based on the maximum likelihood estimator can be unreliable. In this paper, we develop a robust estimation procedure for generalized linear models that can generate robust estimators with little loss in efficiency. We also explore two particular special cases in detail, Poisson regression for count data and logistic regression for binary data, and illustrate the performance of the proposed estimators through some real-life examples.

68 citations


Journal ArticleDOI
TL;DR: This work extends four tests common in classical regression – Wald, score, likelihood ratio and F tests – to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate.
Abstract: We extend four tests common in classical regression – Wald, score, likelihood ratio and F tests – to functional linear regression, for testing the null hypothesis that there is no association between a scalar response and a functional covariate.

59 citations


Journal ArticleDOI
TL;DR: A new regression model is introduced by considering the distribution proposed in this article, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters.
Abstract: Starting from the Johnson SB distribution pioneered by Johnson (1949), we propose a broad class of distributions with bounded support on the basis of the symmetric family of distributions. The new class provides a rich source of alternative distributions for analyzing univariate bounded data. A comprehensive account of the mathematical properties of the new family is provided. We briefly discuss estimation of the model parameters of the new class of distributions based on two estimation methods. Additionally, a new regression model is introduced by considering the distribution proposed in this article, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters. The regression model allows modeling both location and dispersion effects. We define two residuals for the proposed regression model to assess departures from model assumptions and to detect outlying observations, and we discuss some influence methods such as local influence and generalized leverage. Finally, an application to real data is presented to show the usefulness of the new regression model.

Proceedings ArticleDOI
06 Jul 2016
TL;DR: A new way to design parameter estimators with enhanced performance is proposed: applying a dynamic operator to the original regression, and then suitably mixing the resulting regressors, yields a new parameter estimator whose convergence is established without the usual requirement of persistency of excitation of the regressor.
Abstract: A new way to design parameter estimators with enhanced performance is proposed in the paper. The procedure consists of two stages: first, the generation of new regression forms via the application of a dynamic operator to the original regression; second, a suitable mix of these new regressors to obtain the final desired regression form. For classical linear regression forms the procedure yields a new parameter estimator whose convergence is established without the usual requirement of persistency of excitation of the regressor. The technique is also applied to nonlinear regressions with “partially” monotonic parameter dependence, giving rise again to estimators with enhanced performance. Simulation results illustrate the advantages of the proposed procedure in both scenarios.

Journal ArticleDOI
TL;DR: An asymmetric logistic regression model that uses a new parameter to account for data complexity and can enhance the applicability of a generalized linear model to various ecological problems using a slight modification, and significantly improves model fitting and model selection.
Abstract: Binary data are popular in ecological and environmental studies; however, due to various uncertainties and complexities present in data sets, the standard generalized linear model with a binomial error distribution often demonstrates insufficient predictive performance when analysing binary and proportional data. To address this difficulty, we propose an asymmetric logistic regression model that uses a new parameter to account for data complexity. We observe that this parameter controls the model's asymmetry and is important for adjusting the weights associated with observed data in order to improve model fitting. This model includes the ordinary logistic regression model as a special case. It is easily implemented using a slight modification of glm or glmer in statistical software R. Simulation studies suggest that our new approach outperforms a traditional approach in terms of both predictive accuracy and variable selection. In a case study involving fisheries data, we found that the annual catch amount had a greater impact on stock status prediction, and improved predictive capability was supported with a smaller AIC compared to a generalized linear model. In summary, our method can enhance the applicability of a generalized linear model to various ecological problems using a slight modification, and significantly improves model fitting and model selection.

Journal ArticleDOI
TL;DR: In this article, a locally weighted linear regression (LWLR) method is proposed to predict damping ratio of a dominant mode online, which is essentially proposed for nonlinear data fitting.
Abstract: In this study, a locally weighted linear regression (LWLR) method is proposed to predict damping ratio of a dominant mode online. The LWLR method, which is nonparametric and data-oriented, is essentially proposed for nonlinear data fitting; therefore, it can track the nonlinear power system operations and help damping ratio prediction in real power systems, which is hardly achieved by the conventional linear regression. To successfully implement this method, the measurement of weighting value and the choice of weighting function as well as its parameter setting, related to prediction accuracy and numerical conditions, are extensively discussed. Simulations are carried out in a two-area four-machine system and a large complex system, China Southern Grid. Both results validate the effectiveness of the proposed method.
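
A minimal sketch of the LWLR mechanism on synthetic data, with a Gaussian weighting function; the paper's measurement-based weighting and parameter tuning are not reproduced here.

```r
# Locally weighted linear regression at a query point x0: weight each
# observation by a Gaussian kernel of bandwidth h, then solve weighted
# least squares and predict at x0.
lwlr <- function(x0, x, y, h) {
  w <- exp(-(x - x0)^2 / (2 * h^2))
  fit <- lm(y ~ x, weights = w)
  predict(fit, newdata = data.frame(x = x0))
}

set.seed(1)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.2)
yhat <- sapply(x, function(x0) lwlr(x0, x, y, h = 0.5))  # tracks the nonlinearity
```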

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This work used the data from the Kaggle competition “Bosch Production Line Performance” as the data set for the analysis, and considered the use of machine learning, linear, and Bayesian models for logistic regression in manufacturing failure detection.
Abstract: In this work, we study the use of logistic regression in manufacturing failure detection. As a data set for the analysis, we used the data from the Kaggle competition “Bosch Production Line Performance”. We considered the use of machine learning, linear and Bayesian models. For the machine learning approach, we analyzed the XGBoost tree-based classifier to obtain high-scoring classification. Using the generalized linear model for logistic regression makes it possible to analyze the influence of the factors under study. The Bayesian approach to logistic regression gives the statistical distribution of the parameters of the model, which can be useful in probabilistic analysis, e.g. risk assessment.
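
Two of the three approaches the paper compares can be sketched with standard R tooling; the synthetic data below stand in for the Bosch set, and the Bayesian variant is omitted.

```r
# Logistic regression two ways: a generalized linear model, whose
# coefficients expose the influence of each factor, and XGBoost with a
# logistic objective for raw classification performance.
library(xgboost)
set.seed(1)
n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))

glm_fit <- glm(y ~ X, family = binomial)
summary(glm_fit)                        # factor influence via coefficients

bst <- xgboost(data = X, label = y, nrounds = 50,
               objective = "binary:logistic", verbose = 0)
head(predict(bst, X))                   # predicted failure probabilities
```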

Journal ArticleDOI
TL;DR: The procedure to find the optimal h value to maximize the system credibility of the fuzzy linear regression model with asymmetric triangular fuzzy coefficients is described, and it is shown that the system credibility in the asymmetric case will be higher than that in the symmetric case.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: ε-differentially private diagnostics for regression are developed, beginning to fill a gap in privacy-preserving data analysis and are adequate for diagnosing the fit and predictive power of regression models on representative datasets when the size of the dataset times the privacy parameter (ε) is at least 1000.
Abstract: Linear and logistic regression are popular statistical techniques for analyzing multivariate data. Typically, analysts do not simply posit a particular form of the regression model, estimate its parameters, and use the results for inference or prediction. Instead, they first use a variety of diagnostic techniques to assess how well the model fits the relationships in the data and how well it can be expected to predict outcomes for out-of-sample records, revising the model as necessary to improve fit and predictive power. In this article, we develop ε-differentially private diagnostics for regression, beginning to fill a gap in privacy-preserving data analysis. Specifically, we create differentially private versions of residual plots for linear regression and of receiver operating characteristic (ROC) curves for logistic regression. The former helps determine whether or not the data satisfy the assumptions underlying the linear regression model, and the latter is used to assess the predictive power of the logistic regression model. These diagnostics improve the usefulness of algorithms for computing differentially private regression output, which alone does not allow analysts to assess the quality of the posited model. Our empirical studies show that these algorithms are adequate for diagnosing the fit and predictive power of regression models on representative datasets when the size of the dataset times the privacy parameter (ε) is at least 1000.
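
One way to picture the residual-plot diagnostic is as a noisy two-dimensional histogram; the sketch below is a generic Laplace-mechanism version under the assumption of data-independent bin ranges, not the paper's exact construction.

```r
# epsilon-DP residual "plot": bin (fitted, residual) pairs on a fixed grid
# and add Laplace noise to the counts. Each record affects exactly one bin,
# so the count vector has sensitivity 1 and scale 1/epsilon suffices.
rlaplace <- function(n, scale) {
  u <- runif(n, -0.5, 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

dp_residual_hist <- function(pred, res, epsilon, xlim, ylim, bins = 10) {
  bx <- seq(xlim[1], xlim[2], length.out = bins + 1)
  by <- seq(ylim[1], ylim[2], length.out = bins + 1)
  counts <- table(cut(pred, bx), cut(res, by))
  noisy <- unclass(counts) + rlaplace(length(counts), 1 / epsilon)
  pmax(round(noisy), 0)                 # post-processing preserves epsilon-DP
}

fit <- lm(dist ~ speed, data = cars)    # example model on a built-in dataset
dp_residual_hist(fitted(fit), resid(fit), epsilon = 1,
                 xlim = c(-20, 100), ylim = c(-40, 60))
```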

Journal ArticleDOI
08 Jun 2016 - Metrika
TL;DR: In this article, the authors propose a test procedure based on the residual sums of squares under the null and alternative hypotheses, and establish the asymptotic properties of the resulting test; simulations show the procedure has good size and power with finite sample sizes.
Abstract: This paper investigates the hypothesis test of the parametric component in partial functional linear regression. We propose a test procedure based on the residual sums of squares under the null and alternative hypothesis, and establish the asymptotic properties of the resulting test. A simulation study shows that the proposed test procedure has good size and power with finite sample sizes. Finally, we present an illustration through fitting the Berkeley growth data with a partial functional linear regression model and testing the effect of gender on the height of kids.

Journal ArticleDOI
TL;DR: This work proposes to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix, and develops a numerical algorithm to solve the penalized regression problem.
Abstract: Improving the predictive performance of multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure of the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition, we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

Journal ArticleDOI
TL;DR: In this paper, the IMLR and MPR models are validated and compared against each other based on measured data from a real collector field; the results show that the gained precision cannot be improved significantly further if the regression functions are linear in the input variables.

Journal ArticleDOI
TL;DR: In this article, the shrinkage ridge estimator and its positive part are defined for the regression coefficient vector in a partial linear model, and the differencing approach is used for its ease of parameter estimation after removing the nonparametric part of the model.
Abstract: In this paper, the shrinkage ridge estimator and its positive part are defined for the regression coefficient vector in a partial linear model. The differencing approach is used for its ease of parameter estimation after removing the nonparametric part of the model. Exact risk expressions, in addition to biases, are derived for the estimators under study, and the region of optimality of each estimator is exactly determined. The performance of the estimators is evaluated on simulated as well as real data sets.

Journal ArticleDOI
TL;DR: The proposed MARS-fuzzy regression model can be used for modelling natural phenomena whose available observations are reported as imprecise rather than crisp and performs better than the other models in suspended load estimation for the particular datasets.
Abstract: The problem of estimation of suspended load carried by a river is an important topic for many water resources projects. Conventional estimation methods are based on the assumption of exact observations. In practice, however, a major source of natural uncertainty is due to imprecise measurements and/or imprecise relationships between variables. In this paper, using the Multivariate Adaptive Regression Splines (MARS) technique, a novel fuzzy regression model for imprecise response and crisp explanatory variables is presented. The investigated fuzzy regression model is applied to forecast suspended load by discharge based on two real-world datasets. The accuracy of the proposed method is compared with two well-known parametric fuzzy regression models, namely, the fuzzy least-absolutes model and the fuzzy least-squares model. The comparison results reveal that the MARS-fuzzy regression model performs better than the other models in suspended load estimation for the particular datasets. This comparison...

Journal ArticleDOI
01 Sep 2016
TL;DR: This study demonstrates that the best error measures for estimating fuzzy linear regression model parameters with the Monte Carlo method are E1, E2, and the mean square error.
Abstract: Highlights: The study covers different error measures that have not previously been calculated in a Monte Carlo study of fuzzy linear regression models. We obtain the most useful and the worst error measures for estimating fuzzy regression parameters without using any mathematical programming or heavy fuzzy arithmetic operations. The focus of this study is the use of the Monte Carlo method in fuzzy linear regression. The purpose of the study is to determine the appropriate error measures for the estimation of fuzzy linear regression model parameters with the Monte Carlo method, since model parameters are estimated without any mathematical programming or heavy fuzzy arithmetic operations. In the literature, only two error measures (E1 and E2) are available for the estimation of fuzzy linear regression model parameters, and the accuracy of the available error measures under the Monte Carlo procedure has not been evaluated. In this article, the mean square error, mean percentage error, mean absolute percentage error, and symmetric mean absolute percentage error are proposed for the estimation of fuzzy linear regression model parameters with the Monte Carlo method, and the estimation accuracies of existing and proposed error measures are explored. The error measures are compared to each other in terms of estimation accuracy; this study demonstrates that the best error measures for estimating fuzzy linear regression model parameters with the Monte Carlo method are E1, E2, and the mean square error, while the worst is the mean percentage error. These results should be useful for enriching studies that focus on fuzzy linear regression models.

Journal ArticleDOI
TL;DR: It is found that the traditional linear regression results perform comparably to more sophisticated non-linear and two-stage models.

Journal ArticleDOI
TL;DR: This paper proposes a nonparametric additive approach to properly analyze interval-valued data with a possibly nonlinear pattern and demonstrates the proposed approach using a simulation study and a real data example, and also compares its performance with those of existing methods.
Abstract: Interval-valued data are observed as ranges instead of single values and frequently appear with advanced technologies in current data collection processes. Regression analysis of interval-valued data has been studied in the literature, but mostly focused on parametric linear regression models. In this paper, we study interval-valued data regression based on nonparametric additive models. By employing one of the current methods based on linear regression, we propose a nonparametric additive approach to properly analyze interval-valued data with a possibly nonlinear pattern. We demonstrate the proposed approach using a simulation study and a real data example, and also compare its performance with those of existing methods.

Posted Content
TL;DR: In this article, a threshold-based algorithm for selection of most informative observations is proposed to solve the problem of online active learning to collect data for regression modeling, where a decision maker with a limited experimentation budget must efficiently learn an underlying linear population model.
Abstract: We consider the problem of online active learning to collect data for regression modeling. Specifically, we consider a decision maker with a limited experimentation budget who must efficiently learn an underlying linear population model. Our main contribution is a novel threshold-based algorithm for selection of the most informative observations; we characterize its performance and fundamental lower bounds. We extend the algorithm and its guarantees to sparse linear regression in high-dimensional settings. Simulations suggest the algorithm is remarkably robust: it provides significant benefits over passive random sampling in real-world datasets that exhibit high nonlinearity and high dimensionality, significantly reducing both the mean and variance of the squared error.
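
A stripped-down version of the threshold idea is easy to simulate: spend a label only when the incoming covariate vector looks informative, then fit least squares on the points actually queried. The norm threshold and budget below are illustrative; the paper's rule and guarantees are more refined.

```r
# Online active selection: query the label only when ||x_t|| exceeds a
# threshold, until the experimentation budget is exhausted.
set.seed(1)
p <- 3; budget <- 40; threshold <- 2.0
theta <- c(1, -2, 0.5)                       # unknown population model
Xkeep <- NULL; ykeep <- NULL

for (t in 1:1000) {
  xt <- rnorm(p)
  if (length(ykeep) < budget && sqrt(sum(xt^2)) > threshold) {
    Xkeep <- rbind(Xkeep, xt)                # informative point: pay for a label
    ykeep <- c(ykeep, sum(theta * xt) + rnorm(1, sd = 0.1))
  }
}
coef(lm(ykeep ~ Xkeep - 1))                  # estimate of theta from 40 labels
```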

Journal ArticleDOI
01 Sep 2016
TL;DR: In this paper, a varying coefficient partially functional linear regression model (VCPFLM) is proposed, where the functional parameter is approximated by a polynomial spline, and the spline coefficients are estimated by the ordinary least squares method.
Abstract: By relaxing the linearity assumption in partial functional linear regression models, we propose a varying coefficient partially functional linear regression model (VCPFLM), which includes varying coefficient regression models and functional linear regression models as its special cases. We study the problem of functional parameter estimation in a VCPFLM. The functional parameter is approximated by a polynomial spline, and the spline coefficients are estimated by the ordinary least squares method. Under some regularity conditions, we obtain asymptotic properties of functional parameter estimators, including the global convergence rates and uniform convergence rates. Simulation studies are conducted to investigate the performance of the proposed methodologies.
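
The estimation step, approximating the varying coefficient by a polynomial spline and applying ordinary least squares, can be sketched for the varying-coefficient part alone (the functional covariate term is omitted, and the data are simulated):

```r
# Varying coefficient model y = beta(t) * x + error: expand beta(t) in a
# B-spline basis, absorb x into the design, and fit by ordinary least squares.
library(splines)
set.seed(1)
n <- 300
t <- runif(n); x <- rnorm(n)
y <- sin(2 * pi * t) * x + rnorm(n, sd = 0.2)

B <- bs(t, df = 8)                    # polynomial spline basis for beta(t)
Z <- B * x                            # row i of B scaled by x_i
fit <- lm(y ~ Z - 1)                  # spline coefficients via OLS
betahat <- B %*% coef(fit)            # estimated beta(t) at the sample points
```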

Proceedings ArticleDOI
01 May 2016
TL;DR: This paper defines PXR models using several patterns and local regression models, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships, and introduces a contrast pattern aided regression (CPXR) method to build accurate PXR models.
Abstract: This paper first introduces a new style of regression models, namely pattern aided regression (PXR) models, aimed at representing accurate and interpretable prediction models. It also introduces a contrast pattern aided regression (CPXR) method, to build accurate PXR models in an efficient manner. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by wide margins. From extensive experiments we also found that (1) regression modeling applications often involve complex diverse predictor-response relationships, which occur when the optimal regression models (of given regression model type) fitting distinct natural subgroups of data are highly different, and (2) state-of-the-art regression methods are often unable to adequately model such relationships. CPXR is also useful for analyzing how a given regression model makes prediction errors. This is an extended abstract of [6].

Posted Content
TL;DR: In this paper, a penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed, and a method is also proposed to obtain de-biased estimates of the regression coefficient that are asymptotically unbiased and have a joint multivariate normal distribution, which can be used to obtain the $p$-values.
Abstract: One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa at different taxonomic levels. This paper considers regression analysis with such compositional data as covariates. In order to satisfy the subcompositional coherence of the results, linear models with a set of linear constraints on the regression coefficients are introduced. Such models allow regression analysis for subcompositions and include the log-contrast model for compositional covariates as a special case. A penalized estimation procedure for estimating the regression coefficients and for selecting variables under the linear constraints is developed. A method is also proposed to obtain de-biased estimates of the regression coefficients that are asymptotically unbiased and have a joint asymptotic multivariate normal distribution. This provides valid confidence intervals of the regression coefficients and can be used to obtain the $p$-values. Simulation results show the validity of the confidence intervals and smaller variances of the de-biased estimates when the linear constraints are imposed. The proposed methods are applied to a gut microbiome data set and identify four bacterial genera that are associated with the body mass index after adjusting for the total fat and caloric intakes.

Journal ArticleDOI
TL;DR: In this paper, a methodology for statistical downscaling using local polynomial regression for obtaining the future projections of rainfall in a catchment was presented, where the model was applied to forecast the rainfall in the catchment of Idukky reservoir in Kerala, India.
Abstract: This article presents a methodology for statistical downscaling using local polynomial regression for obtaining future projections of rainfall in a catchment. Local polynomial regression offers a way to capture the nonlinearities in the input-output relationship that traditional regression misses, by identifying the nearest neighbors of the predictor point for a specified bandwidth. It fits a low-degree polynomial model to the subset of the data at each point by weighted least squares; the local regression fit is complete when the regression function values have been calculated for all the data points, yielding a smooth curve through the data. Mean sea level pressure, geopotential height at 500 mb, air temperature, relative humidity and wind speed are identified as the potential predictors for predicting the rainfall. Monthly data on the predictors for nine grid points around the study area are obtained from National Centre for Environmental Prediction (NCEP)/National Centre for Atmospheric Research (NCAR) Re-analysis data. The model was applied to forecast the rainfall in the catchment of the Idukky reservoir in Kerala, India, and its performance was compared with that of multiple linear regression and artificial neural network models. It is seen that the local polynomial regression model gives better performance in forecasting the rainfall. The methodology is computationally simple, easy to implement, and captures the linear and nonlinear features in the data set while preserving the dynamics of the atmosphere and the properties of the historical series.
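
Base R's loess() implements exactly this mechanism, a low-degree polynomial fit by weighted least squares within a neighbourhood of each point, so the method is simple to try; the synthetic data below are only for illustration.

```r
# Local polynomial regression: span plays the role of the bandwidth,
# degree the order of the local polynomial fit at each point.
set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + 0.1 * x + rnorm(200, sd = 0.2)

fit <- loess(y ~ x, degree = 2, span = 0.3)
plot(x, y)
lines(sort(x), fitted(fit)[order(x)], col = "red")  # smooth curve through the data
```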