Author

Shaohua Wu

Bio: Shaohua Wu is an academic researcher from Honeywell. The author has contributed to research on the topics of mean squared error and estimation theory, has an h-index of 6, and has co-authored 7 publications receiving 223 citations. Previous affiliations of Shaohua Wu include Queen's University and Honeywell Aerospace.

Papers
Journal ArticleDOI
TL;DR: In this paper, the authors summarize extensive quantitative and qualitative results from the literature on using simplified or misspecified models, develop a practical strategy, based on confidence intervals and hypothesis tests, to help modellers decide whether a simplified model should be used, and point out the difficulty of making such a decision.
Abstract: Simplified models have many appealing properties and sometimes give better parameter estimates and model predictions, in the sense of mean squared error, than extended models, especially when the data are not informative. In this paper, we summarize extensive quantitative and qualitative results in the literature concerned with using simplified or misspecified models. Based on confidence intervals and hypothesis tests, we develop a practical strategy to help modellers decide whether a simplified model should be used, and point out the difficulty in making such a decision. We also evaluate several methods of statistical inference for simplified or misspecified models.

60 citations
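The bias-variance tradeoff described in the abstract above can be illustrated with a short Monte Carlo sketch (all values here are illustrative, not taken from the paper): a deliberately simplified linear fit is compared against the correctly specified quadratic model when the data are sparse and noisy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process: a quadratic with a small curvature
# term (illustrative coefficients, not from the paper).
b0, b1, b2 = 1.0, 2.0, 0.05
x = np.linspace(0.0, 1.0, 8)           # few, uninformative data points
true_mean = b0 + b1 * x + b2 * x**2
sigma = 1.0                             # large measurement noise

n_reps = 500
mse_full = 0.0
mse_simple = 0.0
for _ in range(n_reps):
    y = true_mean + rng.normal(0.0, sigma, size=x.size)
    # Extended (correctly specified) model: quadratic; simplified: linear.
    fit_full = np.polyval(np.polyfit(x, y, deg=2), x)
    fit_simple = np.polyval(np.polyfit(x, y, deg=1), x)
    mse_full += np.mean((fit_full - true_mean) ** 2)
    mse_simple += np.mean((fit_simple - true_mean) ** 2)

mse_full /= n_reps
mse_simple /= n_reps
print(f"extended model MSE:   {mse_full:.4f}")
print(f"simplified model MSE: {mse_simple:.4f}")
```

With this setup the biased simplified model typically achieves a lower prediction MSE than the correctly specified one, because the variance saved by dropping the weakly identified curvature term outweighs the small bias it introduces.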

Journal ArticleDOI
TL;DR: In this article, a mean squared error (MSE)-based model selection criterion is used to determine the optimal number of parameters to estimate from the ranked parameter list, so that the most reliable model predictions can be obtained.
Abstract: Parameter estimation in complex mathematical models is difficult, especially when there are too many unknown parameters to estimate, and the available data for parameter estimation are limited. Estimability analysis ranks parameters from most estimable to least estimable based on the model structure, uncertainties in initial parameter guesses, measurement uncertainties, and experimental settings. Difficulties associated with poor numerical conditioning are avoided by only estimating those parameters that are most estimable. The remaining parameters are left at their initial values or can be removed from the model via simplification. In this paper, a mean squared error (MSE)-based model-selection criterion is used to determine the optimal number of parameters to estimate from the ranked parameter list, so that the most reliable model predictions can be obtained. This methodology is illustrated using a dynamic chemical reactor model.

59 citations

Journal ArticleDOI
TL;DR: In this paper, an orthogonalization algorithm combined with a mean squared error (MSE) based selection criterion has been used to rank parameters from most to least estimable and to determine the parameter subset that should be estimated to obtain the best predictions.
Abstract: Engineers who develop fundamental models for chemical processes are often unable to estimate all of the parameters, especially when available data are limited or noisy. In these situations, modelers may decide to select only a subset of the parameters for estimation. An orthogonalization algorithm combined with a mean squared error (MSE) based selection criterion has been used to rank parameters from most to least estimable and to determine the parameter subset that should be estimated to obtain the best predictions. A robustness test is proposed and applied to a batch reactor model to assess the sensitivity of the selected parameter subset to initial parameter guesses. A new ranking and selection technique is also developed based on the MSE criterion and is compared with existing techniques in the literature. Results obtained using the proposed ranking and selection techniques agree with those from leave-one-out cross-validation but are more computationally attractive.

49 citations
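The leave-one-out cross-validation baseline mentioned above can be sketched as follows (polynomial fitting stands in for the batch reactor model; all data and values are illustrative): each observation is held out once, the model is refit on the rest, and the held-out prediction errors are averaged.

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Leave-one-out cross-validated prediction MSE for a polynomial fit."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        coefs = np.polyfit(x[mask], y[mask], degree)
        pred = np.polyval(coefs, x[i])
        errs.append((y[i] - pred) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=x.size)  # truly linear + noise

for deg in (1, 2, 3):
    print(deg, loocv_mse(x, y, deg))
```

The cost is one refit per data point per candidate model, which illustrates why a criterion-based ranking and selection technique that avoids refitting is more computationally attractive.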

Journal ArticleDOI
TL;DR: In this article, the authors propose a new criterion to help modellers select the best simplified model, so that predictions with the lowest expected mean squared error can be obtained, and compare its effectiveness with that of the Bayesian Information Criterion (BIC).
Abstract: Simplified models (SMs) with a reduced set of parameters are used in many practical situations, especially when the available data for parameter estimation are limited. A variety of candidate models are often considered during the model formulation, simplification, and parameter estimation processes. We propose a new criterion to help modellers select the best SM, so that predictions with the lowest expected mean squared error can be obtained. The effectiveness of the proposed criterion for selecting simplified nonlinear univariate and multivariate models is demonstrated using Monte Carlo simulations and is compared with the effectiveness of the Bayesian Information Criterion (BIC).

46 citations
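The BIC benchmark used for comparison can be sketched as follows, assuming Gaussian errors so that BIC reduces to n ln(SSE/n) + k ln n up to an additive constant (the data here are illustrative, not from the paper):

```python
import numpy as np

def bic(y, y_hat, n_params):
    """BIC for a least-squares fit with Gaussian errors (up to a constant)."""
    n = len(y)
    sse = float(np.sum((y - y_hat) ** 2))
    return n * np.log(sse / n) + n_params * np.log(n)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 30)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0.0, 0.3, size=x.size)

# Score three candidate polynomial models of increasing complexity.
scores = {}
for deg in (1, 2, 3):
    coefs = np.polyfit(x, y, deg)
    scores[deg] = bic(y, np.polyval(coefs, x), n_params=deg + 1)

print(scores)   # the badly misspecified linear model scores worst
```

BIC trades goodness of fit against a ln(n)-weighted penalty on the number of parameters; the proposed expected-MSE criterion targets prediction error directly rather than this asymptotic approximation.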

Journal ArticleDOI
TL;DR: Mean-squared error (MSE) is used to analyse the performance of nine commonly used model-selection criteria (MSC) when selecting simplified models (SMs).
Abstract: Mean-squared error (MSE) is used to analyse nine commonly used model-selection criteria (MSC) for their performance when selecting simplified models (SMs). Expressions are derived to enable exact calculation of the probability that a particular MSC will select a SM. For several common MSC, the relative propensities to select SMs are independent of model structure and data. It is shown that MSC that are effective in preventing overfitting are prone to underfitting when the information content of the data is low. In a subsequent article, these results are extended to develop a new MSE-based MSC for selecting nonlinear multi-response SMs.

27 citations


Cited by
Journal ArticleDOI
TL;DR: The distinction between explanatory and predictive models is discussed in this paper, along with its sources, its practical implications for each step of the modeling process, and the differences that arise when modeling for an explanatory versus a predictive goal.
Abstract: Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

1,747 citations

Journal ArticleDOI
TL;DR: The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
Abstract: Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

1,384 citations

Journal ArticleDOI
TL;DR: It is proposed that principles and techniques from the field of machine learning can help psychology become a more predictive science, and that an increased focus on prediction, rather than explanation, can ultimately lead to a greater understanding of behavior.
Abstract: Psychology has historically been concerned, first and foremost, with explaining the causal mechanisms that give rise to behavior. Randomized, tightly controlled experiments are enshrined as the gold standard of psychological research, and there are endless investigations of the various mediating and moderating variables that govern various behaviors. We argue that psychology's near-total focus on explaining the causes of behavior has led much of the field to be populated by research programs that provide intricate theories of psychological mechanism but that have little (or unknown) ability to predict future behaviors with any appreciable accuracy. We propose that principles and techniques from the field of machine learning can help psychology become a more predictive science. We review some of the fundamental concepts and tools of machine learning and point out examples where these concepts have been used to conduct interesting and important psychological research that focuses on predictive research questions.

1,026 citations

01 Sep 2011
TL;DR: To show that predictive analytics and explanatory statistical modeling are fundamentally disparate, it is shown that they differ at each step of the modeling process, and that these differences translate into different final models: a pure explanatory statistical model is best tuned for testing causal hypotheses, while a pure predictive model is best in terms of predictive power.
Abstract: This research essay highlights the need to integrate predictive analytics into information systems research and shows several concrete ways in which this goal can be accomplished. Predictive analytics include empirical methods (statistical and other) that generate data predictions as well as methods for assessing predictive power. Predictive analytics not only assist in creating practically useful models, they also play an important role alongside explanatory modeling in theory building and theory testing. We describe six roles for predictive analytics: new theory generation, measurement development, comparison of competing theories, improvement of existing models, relevance assessment, and assessment of the predictability of empirical phenomena. Despite the importance of predictive analytics, we find that they are rare in the empirical IS literature. Extant IS literature relies nearly exclusively on explanatory statistical modeling, where statistical inference is used to test and evaluate the explanatory power of underlying causal models, and predictive power is assumed to follow automatically from the explanatory model. However, explanatory power does not imply predictive power and thus predictive analytics are necessary for assessing predictive power and for building empirical models that predict well. To show that predictive analytics and explanatory statistical modeling are fundamentally disparate, we show that they are different in each step of the modeling process. These differences translate into different final models, so that a pure explanatory statistical model is best tuned for testing causal hypotheses and a pure predictive model is best in terms of predictive power. We convert a well-known explanatory paper on TAM to a predictive context to illustrate these differences and show how predictive analytics can add theoretical and practical value to IS research.

558 citations

Journal ArticleDOI
TL;DR: The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
Abstract: Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this paper is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

441 citations