Showing papers in "Computational Statistics & Data Analysis in 2005"
••
TL;DR: PLS path modeling can be used to analyze multiple tables, which relates it to more classical data analysis methods used in this field; some new improvements are proposed.
4,839 citations
••
TL;DR: Bartlett's test is used to test the significance of the first principal component, indicating whether or not at least two variables share common variation in the entire data set, and a two-step approach appears to be highly effective.
738 citations
••
TL;DR: A stochastic approximation version of EM for maximum likelihood estimation of a wide class of nonlinear mixed effects models is proposed, able to provide an estimator close to the MLE in very few iterations.
452 citations
••
TL;DR: This work conducts a statistical analysis of this graph and shows that it follows the power-law model. It also detects cliques and independent sets in the graph, which allows a new data mining technique for classifying financial instruments based on stock price data and provides deeper insight into the internal structure of the stock market.
359 citations
••
TL;DR: Different estimation procedures have been used to estimate the unknown parameter(s) and their performances are compared using Monte Carlo simulations, and it is observed that this particular skewed distribution can be used quite effectively in analyzing lifetime data.
302 citations
••
TL;DR: The approach proposed for PLS generalised linear regression is simple and easy to implement and can be easily generalised to any model that is linear at the level of the explanatory variables.
252 citations
••
TL;DR: In regression models, appropriate bootstrap methods for inference robust to heteroskedasticity of unknown form are the wild bootstrap and the pairs bootstrap, and simulation results suggest that one specific version of the wild bootstrap outperforms the other versions of the wild bootstrap.
192 citations
••
TL;DR: A multivariate approach based on projections—PCA and PLS—was introduced to cope with the rapidly increasing volumes of data produced in chemical laboratories and showed promising results.
189 citations
••
TL;DR: A mixture model for preferences data, which adequately represents the composite nature of the elicitation mechanism in ranking processes, is proposed and empirical evidence from different data sets confirming the goodness of fit of the proposed model to many real preferences data is shown.
177 citations
••
TL;DR: A new algorithm is presented for fitting the plaid model, a biclustering method developed for clustering gene expression data, and a benchmark for future evaluation of biclustering methods is established.
170 citations
••
TL;DR: The existence of the PLS components as eigenvectors of some operator and convergence properties of the PLS approximation are proved, and the results of an application to stock-exchange data are compared with those obtained by other methods.
••
TL;DR: In this article, classification trees are employed to bundle their predictions for the bootstrap sample, and a combined classifier is developed, which is superior to any of the single classifiers in many applications.
••
TL;DR: A new presentation of discriminant analysis consists in setting up patterns associated with the various groups and deriving latent variables in such a way that scores in each group are as highly clustered about their pattern as possible.
••
TL;DR: The L-PLSR is applied to the analysis of consumer liking data Y of six products assessed by 125 persons, in light of 10 other product descriptors X and 15 other person descriptors Z.
••
TL;DR: A hybrid routine is suggested which combines the mixed model idea with a classical Akaike information criterion and is evaluated with simulations and applied to data on the success and failure of newly founded companies.
••
TL;DR: In order to maximize the pairwise likelihood, a new expectation-maximization-type algorithm which uses numerical quadrature is introduced and is found to give reasonable parameter estimates and to be computationally efficient.
••
TL;DR: Clusterwise linear regression is studied when the set of predictor variables forms an L2-continuous stochastic process; the number of clusters is treated as unknown, and the convergence of the clusterwise algorithm is discussed.
••
TL;DR: The forward-backward algorithm, which in particular enables an efficient implementation of the E-step of the EM algorithm, and the Viterbi algorithm for the restoration of the most likely state sequence are derived.
••
TL;DR: Powerful omnibus tests of normality based on the likelihood ratio are proposed, which outperform the best tests in the literature, including the Shapiro-Wilk and Anderson-Darling tests.
••
TL;DR: An alternative bootstrap method is proposed which is both computationally simple and robust and a simulation study shows that this method performs well, particularly regarding confidence intervals for the regression parameters.
••
TL;DR: In this paper, the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution, is analyzed and the exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived.
••
TL;DR: Bayes and classical estimators have been obtained for the two-parameter exponentiated-Weibull distribution when the sample is available from a type-II censoring scheme. The estimators obtained are not available in nice closed forms, although they can be easily evaluated for the given sample by using suitable numerical methods.
••
TL;DR: The latent class model for mixed binary and metric variables is extended to accommodate any type of data (including ordinal and nominal) and its use in archaeometry for classifying archaeological findings/objects into groups is discussed.
••
TL;DR: Results on the estimation of this index of stochastic dependence in a continuous setting are presented and computationally more efficient approximations of the mutual information based on the notion of k-additive truncation are proposed.
••
TL;DR: The well-known product partition model (PPM) is considered for the identification of multiple change points in the means and variances of normal data sequences and the posterior distributions of the partitions and the number of change points are extended.
••
TL;DR: Numerical results show that the maximum absolute error associated with the new transformation is substantially lower than that found for other power transformations of a chi-square random variable for all the degrees of freedom considered.
••
TL;DR: Monte Carlo evidence reported in this paper indicates that asymptotic critical values fail to give good control of finite sample significance levels of heteroskedasticity-robust versions of the standard Lagrange multiplier test, a Hausman-type check, and a new procedure.
••
TL;DR: Double bootstrap confidence intervals can be estimated using computational algorithms incorporating simple deterministic stopping rules that avoid unnecessary computations. Efficiency gains are examined by means of a Monte Carlo study for examples of confidence intervals for a mean and for the cumulative impulse response in a second order autoregressive model.
••
TL;DR: An iterative scheme that generally improves on the default solution is suggested, and this scheme is compared with the "best of 20 random starts" method favoured by many users.
••
TL;DR: This paper modifies the tree procedure for a univariate response and suggests a new tree-based method that can analyze any type of multiple responses by using generalized estimating equations techniques.