# Papers published in "Computational Statistics & Data Analysis" in 2006

••

TL;DR: An investigation of the performance of information criteria in selecting latent class analysis models, which are often used in phenotype-identification research, finds sample size and model dimensionality to be influential effects in the simulation study.

479 citations
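
The selection logic described above can be sketched with Gaussian mixtures standing in for latent class models (real LCA operates on categorical indicators, but the information-criterion comparison works the same way); the data, candidate range, and component count below are illustrative assumptions, not taken from the paper:

```python
# Hedged sketch: choosing the number of latent components by BIC.
# Gaussian mixtures stand in here for latent class models; the
# lower-BIC-wins comparison is the generic selection rule.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated components, n = 500 (illustrative data)
X = np.vstack([rng.normal(0.0, 1.0, (250, 2)),
               rng.normal(5.0, 1.0, (250, 2))])

bics = {}
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gm.bic(X)          # lower BIC = better fit/complexity trade-off

best_k = min(bics, key=bics.get)
```

The paper's point is precisely that conclusions from such comparisons shift with sample size and model dimensionality, so a single run like this should not be over-interpreted.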

••

TL;DR: An overview of numerical possibility theory is proposed, showing that some notions in statistics are naturally interpreted in the language of this theory and providing a natural definition of a subjective possibility distribution that sticks to the Bayesian framework of exchangeable bets.

411 citations

••

TL;DR: Evaluations are based on: number of iterations necessary to reach convergence, time consumption, quality of the solution and amount of resources required for the calculations (primarily memory).

393 citations

••

TL;DR: This article presents a simple and automatic procedure to accomplish this goal by maximizing a simple profile likelihood function and gives a wide variety of both simulated and real examples.

384 citations

••

TL;DR: A Bayesian version of GAMs and extensions to generalized structured additive regression (STAR) are developed, and, for the first time, Bayesian semiparametric inference for the widely used multinomial logit model is presented.

372 citations

••

TL;DR: As the results reveal, CART and MARS outperform traditional discriminant analysis, logistic regression, neural networks, and support vector machine (SVM) approaches in terms of credit scoring accuracy and hence provide efficient alternatives in implementing credit scoring tasks.

366 citations

••

TL;DR: The generalized theory of uncertainty (GTU) departs from existing theories in essential ways, and one of the principal objectives of GTU is achievement of NL-capability, that is, the capability to operate on information described in natural language.

350 citations

••

TL;DR: The recently proposed approach to investigating possible non-linear functional relationships, based on fractional polynomials combined with backward elimination, is introduced, and its advantages are shown in two examples.

317 citations

••

TL;DR: The New York Stock Exchange is chosen to provide evidence of problems affecting ultra high-frequency data sets and several methods of aggregation of the data are suggested, according to which corresponding time series of interest for econometric analysis can be constructed.

311 citations

••

TL;DR: The empirical results show that DE is clearly and consistently superior compared to GAs and PSO for hard clustering problems, both with respect to precision as well as robustness (reproducibility) of the results.

310 citations
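
A minimal sketch of DE-based hard clustering, assuming a simplified DE scheme in which each population member encodes all centroids and fitness is the within-cluster sum of squared errors; the settings for population size, mutation factor `F`, and crossover rate `CR` are illustrative, not taken from the paper:

```python
# Hedged sketch of differential evolution (DE) for hard clustering.
# Each candidate solution is a flat vector of k centroids; greedy
# selection keeps a trial vector only if its SSE improves.
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, .5, (50, 2)), rng.normal(4, .5, (50, 2))])
k, dim, npop, F, CR = 2, 2, 20, 0.8, 0.9

def sse(flat):
    cents = flat.reshape(k, dim)
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
    return d.min(1).sum()        # assign each point to nearest centroid

pop = rng.uniform(X.min(), X.max(), (npop, k * dim))
fit = np.array([sse(p) for p in pop])

for _ in range(100):
    for i in range(npop):
        a, b, c = pop[rng.choice([j for j in range(npop) if j != i],
                                 3, replace=False)]
        mutant = a + F * (b - c)                 # DE mutation
        cross = rng.random(k * dim) < CR          # simplified crossover
        trial = np.where(cross, mutant, pop[i])
        f = sse(trial)
        if f < fit[i]:                            # greedy selection
            pop[i], fit[i] = trial, f

best = pop[fit.argmin()].reshape(k, dim)
```

The paper's comparison against GAs and PSO concerns precision and reproducibility over repeated runs; this sketch shows only a single DE run.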

••

TL;DR: This paper introduces a Type-II progressively hybrid censoring scheme, where the experiment terminates at a pre-specified time and obtains the maximum-likelihood estimator of the unknown parameter in an exact form.

••

TL;DR: The statistical discrimination and clustering literature has studied the problem of identifying similarities in time series data and the use of both hierarchical and non-hierarchical clustering algorithms is considered.
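
One way to set this up, sketched under assumptions: measure similarity between series by the Euclidean distance between their estimated autocorrelation functions, then apply hierarchical clustering. The `acf` helper and the AR(1) test data are illustrative, not the paper's metric or data:

```python
# Hedged sketch: hierarchical clustering of time series using an
# ACF-based feature vector as the similarity representation.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)

def sim_ar1(phi, n=300):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def acf(x, nlags=10):
    x = x - x.mean()
    c0 = (x * x).mean()
    return np.array([(x[:-l] * x[l:]).mean() / c0
                     for l in range(1, nlags + 1)])

# Six series: three strongly persistent, three near white noise
series = [sim_ar1(0.9) for _ in range(3)] + [sim_ar1(0.0) for _ in range(3)]
feats = np.array([acf(s) for s in series])

Z = linkage(pdist(feats), method="average")   # hierarchical clustering
labels = fcluster(Z, t=2, criterion="maxclust")
```

Non-hierarchical alternatives (e.g. k-means on the same ACF features) fit the same framework; the choice of feature vector is what encodes "similarity" here.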

••

TL;DR: A Bayesian method to select the most probable copula family among a given set of tested copulas is proposed, and the frequency of successful identification approaches 100% as the sample size increases, and for weakly correlated variables, larger samples are necessary for reliable identification.

••

TL;DR: A range of estimators are surveyed and a mean group version of the common-correlated-effects estimator stands out as the most robust since it is the preferred choice in rather general (non)stationary settings where regressors and errors share common factors and their factor loadings are possibly dependent.

••

TL;DR: A methodological and computational framework for centroid-based partitioning cluster analysis using arbitrary distance or similarity measures is presented and a new variant of centroid neighborhood graphs is introduced which gives insight into the relationships between adjacent clusters.

••

TL;DR: A framework of penalized generalized linear models and tensor products of B-splines with roughness penalties allows effective smoothing of data in multidimensional arrays and takes advantage of the special structure of both the data as an array and the model matrix as a tensor product.
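
A one-dimensional sketch of the underlying P-spline idea (B-spline basis plus a difference roughness penalty); the paper's contribution, the efficient tensor-product extension to multidimensional arrays, is not reproduced here, and the knot layout and penalty weight below are illustrative:

```python
# Hedged sketch: penalized B-spline (P-spline) smoothing in 1-D.
# Fit minimizes ||y - B a||^2 + lam * ||D a||^2, with D a
# second-order difference matrix on the spline coefficients.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

degree, n_basis = 3, 20
# Clamped uniform knot vector (repeated boundary knots)
inner = np.linspace(0, 1, n_basis - degree + 1)
knots = np.r_[[0] * degree, inner, [1] * degree]
B = np.column_stack([BSpline(knots, np.eye(n_basis)[i], degree)(x)
                     for i in range(n_basis)])

lam = 1.0
D = np.diff(np.eye(n_basis), n=2, axis=0)    # 2nd-order difference penalty
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
yhat = B @ coef
```

In the multidimensional case the design matrix becomes a tensor product of such bases, and the paper exploits that structure to keep the computation tractable.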

••

TL;DR: Fitting of non-Gaussian hierarchical random effects models by approximate maximum likelihood can be made automatic to the same extent that Bayesian model fitting can be automated by the program BUGS.

••

TL;DR: The compact representation of incomplete probabilistic knowledge which can be encountered in risk evaluation problems, for instance in environmental studies is considered and the respective appropriateness of pairs of cumulative distributions, continuous possibility distributions or discrete random sets for representing information about the mean value, the mode, the median and other fractiles of ill-known probability distributions is discussed.

••

TL;DR: A multivariate extension of the well-known wavelet denoising procedure, widely examined for scalar-valued signals, is proposed; it combines a straightforward multivariate generalization of a classical univariate procedure with principal component analysis.

••

TL;DR: It is proposed to use a reduced set of optimum principal components of the original predictors as covariates of the logistic model, to improve the estimation of the logistic model parameters under multicollinearity and to reduce the dimension of the problem with continuous covariates.
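
The basic pipeline can be sketched as follows; the paper's procedure for choosing the *optimum* components is not reproduced, and the collinear test data and the fixed choice of two components are illustrative assumptions:

```python
# Hedged sketch: logistic regression on a reduced set of principal
# components, which tempers multicollinearity among the predictors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 400
z = rng.normal(size=n)
# Five highly collinear predictors driven by one latent factor
X = np.column_stack([z + rng.normal(0, 0.05, n) for _ in range(5)])
y = (z + rng.normal(0, 0.5, n) > 0).astype(int)

model = make_pipeline(StandardScaler(), PCA(n_components=2),
                      LogisticRegression())
model.fit(X, y)
acc = model.score(X, y)
```

Fitting the logit directly on all five near-duplicate columns would inflate the coefficient variances; projecting onto a few components sidesteps that while keeping the predictive signal.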

••

TL;DR: The Mixture Modeling (MIXMOD) program fits mixture models to a given data set for the purposes of density estimation, clustering or discriminant analysis, and fourteen different Gaussian models can be distinguished according to different assumptions regarding the component variance matrix eigenvalue decomposition.

••

TL;DR: The Markov chain Monte Carlo (MCMC) algorithm is the first data-driven bandwidth selector for multivariate kernel density estimation that is applicable to data of any dimension and is superior to the normal reference rule.

••

TL;DR: A Monte Carlo study analyzing the performance of the bootstrap confidence bands (obtained with different resampling methods) of several functional estimators is presented, providing some insights on the asymptotic validity of the bootstrap methodology when functional data, as well as a functional parameter, are involved.

••

TL;DR: An approximate confidence interval is proposed for a robust measure of relative dispersion, the coefficient of quartile variation, which provides an alternative to interval estimates for other measures of relative dispersion.
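
The statistic itself is simple, CQV = (Q3 − Q1) / (Q3 + Q1); the sketch below pairs it with a percentile-bootstrap interval as a stand-in, since the paper's analytic approximate interval is not reproduced here, and the lognormal sample is illustrative:

```python
# Hedged sketch: coefficient of quartile variation with a simple
# percentile-bootstrap confidence interval (not the paper's formula).
import numpy as np

rng = np.random.default_rng(5)
data = rng.lognormal(mean=0.0, sigma=0.5, size=200)

def cqv(x):
    q1, q3 = np.percentile(x, [25, 75])
    return (q3 - q1) / (q3 + q1)

point = cqv(data)
boots = np.array([cqv(rng.choice(data, data.size, replace=True))
                  for _ in range(2000)])
lo, hi = np.percentile(boots, [2.5, 97.5])    # 95% interval
```

Because CQV depends only on the quartiles, it is robust to outliers in the tails, which is the motivation for preferring it over, say, the coefficient of variation on skewed data.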

••

TL;DR: It is shown that the stylized fact of slow decay in the autocorrelation function of squared returns can be described much better by means of hidden semi-Markov models.

••

TL;DR: Various models for time series of counts which can account for discreteness, overdispersion and serial correlation are compared, including observation- and parameter-driven models based upon corresponding conditional Poisson distributions.
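
One of the observation-driven models in such comparisons can be illustrated by a conditional Poisson (INGARCH-type) recursion, in which the intensity reacts to past counts; the parameter values here are illustrative assumptions:

```python
# Hedged sketch: observation-driven conditional Poisson counts.
# lam[t] = omega + alpha * y[t-1] + beta * lam[t-1]; each draw is
# Poisson, yet the feedback induces serial correlation and
# overdispersion in the marginal counts.
import numpy as np

rng = np.random.default_rng(6)
omega, alpha, beta, n = 1.0, 0.3, 0.4, 2000

lam = np.empty(n)
y = np.empty(n, dtype=int)
lam[0] = omega / (1 - alpha - beta)   # stationary mean intensity
y[0] = rng.poisson(lam[0])
for t in range(1, n):
    lam[t] = omega + alpha * y[t - 1] + beta * lam[t - 1]
    y[t] = rng.poisson(lam[t])
```

Parameter-driven alternatives instead put the dynamics in a latent process; the comparison in the paper weighs how well each family captures discreteness, overdispersion, and serial correlation at once.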

••

TL;DR: A new bootstrap procedure to obtain prediction densities of returns and volatilities of GARCH processes is proposed, which allows incorporation of parameter uncertainty and does not rely on distributional assumptions.
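
The core residual-bootstrap idea can be sketched as follows: resample standardized residuals instead of assuming their distribution. The GARCH(1,1) parameters are taken as given and the data are simulated; the paper additionally re-estimates the parameters on each bootstrap series to capture parameter uncertainty, which this sketch omits:

```python
# Hedged sketch: one-step-ahead prediction density for a GARCH(1,1)
# return via resampling of standardized residuals (no distributional
# assumption on the innovations).
import numpy as np

rng = np.random.default_rng(7)
omega, alpha, beta, n = 0.1, 0.1, 0.8, 1000

# Simulate a GARCH(1,1) path to play the role of observed returns
sig2 = np.empty(n)
r = np.empty(n)
sig2[0] = omega / (1 - alpha - beta)
r[0] = np.sqrt(sig2[0]) * rng.normal()
for t in range(1, n):
    sig2[t] = omega + alpha * r[t - 1] ** 2 + beta * sig2[t - 1]
    r[t] = np.sqrt(sig2[t]) * rng.normal()

z = r / np.sqrt(sig2)                        # standardized residuals

# Bootstrap the one-step-ahead return density by resampling z
sig2_next = omega + alpha * r[-1] ** 2 + beta * sig2[-1]
draws = np.sqrt(sig2_next) * rng.choice(z, 5000, replace=True)
band = np.percentile(draws, [2.5, 97.5])     # 95% prediction interval
```

Repeating the recursion forward with resampled residuals extends the same idea to multi-step prediction densities for both returns and volatilities.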

••

TL;DR: A general linear regression model for studying the dependence of a LR fuzzy response variable on a set of crisp explanatory variables, along with a suitable iterative least squares estimation procedure, is introduced.

••

TL;DR: An extension of the stochastic approximation expectation-maximization (SAEM) algorithm is proposed to estimate parameters of nonlinear mixed-effects models, to model the decrease of the viral load after initiation of treatment, and to evaluate the intra- and inter-patient variability.

••

TL;DR: The procedure is shown to be able to deal successfully with the estimation of the parameters of homogeneous and heterogeneous generalized Pareto distributions, even when maximum likelihood and other estimation methods fail.