Author

Henk A.L. Kiers

Bio: Henk A.L. Kiers is an academic researcher from the University of Groningen. The author has contributed to research in topics including Principal component analysis and Matrix (mathematics). The author has an h-index of 48 and has co-authored 219 publications receiving 9,224 citations.


Papers
Journal Article
TL;DR: The core consistency diagnostic (CORCONDIA), as discussed by the authors, determines the appropriate number of components for multiway models by scrutinizing how appropriate the structural model is, given the data and the estimated parameters of gradually augmented models.
Abstract: A new diagnostic called the core consistency diagnostic (CORCONDIA) is suggested for determining the proper number of components for multiway models. It applies especially to the parallel factor analysis (PARAFAC) model, but also to other models that can be considered as restricted Tucker3 models. It is based on scrutinizing the ‘appropriateness’ of the structural model based on the data and the estimated parameters of gradually augmented models. A PARAFAC model (employing dimension-wise combinations of components for all modes) is called appropriate if adding other combinations of the same components does not improve the fit considerably. It is proposed to choose the largest model that is still sufficiently appropriate. Using examples from a range of different types of data, it is shown that the core consistency diagnostic is an effective tool for determining the appropriate number of components in e.g. PARAFAC models. However, it is also shown, using simulated data, that the theoretical understanding of CORCONDIA is not yet complete. Copyright © 2003 John Wiley & Sons, Ltd.
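
The abstract does not reproduce the formula, but the core consistency is conventionally computed by comparing the least-squares Tucker3 core, obtained with the fitted PARAFAC loadings held fixed, against a superdiagonal array of ones. A minimal numpy sketch under that reading (the loading matrices A, B and C are assumed to have been estimated elsewhere; this is an illustration, not the authors' code):

import numpy as np

def core_consistency(X, A, B, C):
    # X: I x J x K data array; A (I x F), B (J x F), C (K x F) are PARAFAC
    # loading matrices assumed to have been estimated elsewhere.
    F = A.shape[1]
    # Least-squares Tucker3 core G for the fixed loadings, via the brute-force
    # Kronecker design matrix (fine for small F and modest array sizes).
    design = np.kron(A, np.kron(B, C))                  # (I*J*K) x F**3
    g, *_ = np.linalg.lstsq(design, X.ravel(), rcond=None)
    G = g.reshape(F, F, F)
    # Ideal PARAFAC structure: a superdiagonal core of ones.
    T = np.zeros((F, F, F))
    T[np.arange(F), np.arange(F), np.arange(F)] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / np.sum(T ** 2))

Values near 100% indicate an essentially superdiagonal core, i.e. the trilinear PARAFAC structure holds; values near or below zero suggest that too many components have been extracted.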

1,110 citations

Journal Article
TL;DR: This article presents a standardized notation and terminology for three- and multiway analyses, especially when these involve (variants of) the CANDECOMP/PARAFAC model and the Tucker model.
Abstract: This paper presents a standardized notation and terminology to be used for three- and multiway analyses, especially when these involve (variants of) the CANDECOMP/PARAFAC model and the Tucker model. The notation also deals with basic aspects such as symbols for different kinds of products, and terminology for three- and higher-way data. The choices for terminology and symbols to be used have to some extent been based on earlier (informal) conventions. Simplicity and reduction of the possibility of confusion have also played a role in the choices made. Copyright © 2000 John Wiley & Sons, Ltd.
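
For reference, the two models the notation covers can be written elementwise as follows. The typefaces and product symbols standardized in the paper are not reproduced here; this is only a reminder of the models themselves, in generic LaTeX:

x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}   \quad \text{(CANDECOMP/PARAFAC)}

x_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr}\, a_{ip}\, b_{jq}\, c_{kr} + e_{ijk}   \quad \text{(Tucker3)}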

673 citations

Journal Article
TL;DR: A procedure for fitting the PARAFAC2 model directly to the set of data matrices is proposed and it is shown that this algorithm is more efficient than the indirect fitting algorithm.
Abstract: PARAFAC is a generalization of principal component analysis (PCA) to the situation where a set of data matrices is to be analysed. If each data matrix has the same row and column units, the resulting data are three-way data and can be modelled by the PARAFAC1 model. If each data matrix has the same column units but different (numbers of) row units, the PARAFAC2 model can be used. Like the PARAFAC1 model, the PARAFAC2 model gives unique solutions under certain mild assumptions, whereas it is less severely constrained than PARAFAC1. It may therefore also be used for regular three-way data in situations where the PARAFAC1 model is too restricted. Usually the PARAFAC2 model is fitted to the set of matrices of cross-products between the column units. However, this model-fitting procedure is computationally complex and inefficient. In the present paper a procedure for fitting the PARAFAC2 model directly to the set of data matrices is proposed. It is shown that this algorithm is more efficient than the indirect fitting algorithm. Moreover, it is more easily adjusted so as to allow for constraints on the parameter matrices, to handle missing data, as well as to handle generalizations to sets of three- and higher-way data. Furthermore, with the direct fitting approach we also gain information on the row units, in the form of 'factor scores'. As will be shown, this elaboration of the model in no way limits the feasibility of the method. Even though full information on the row units becomes available, the algorithm is based on the usually much smaller cross-product matrices only. Copyright © 1999 John Wiley & Sons, Ltd.
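
As a sketch of what "direct fitting" means here: each slab X_k is modelled as X_k ≈ P_k F D_k A^T with P_k column-orthonormal, and the algorithm alternates a Procrustes-type update of the P_k with a PARAFAC pass on the projected slabs P_k^T X_k. The numpy code below illustrates that structure under my own naming, initialization and stopping rule; it is not the authors' implementation:

import numpy as np

def parafac2_direct(X_list, R, n_iter=200, seed=0):
    # X_list: list of K matrices X_k, each I_k x J (same columns, row counts
    # may differ; each I_k is assumed to be at least R).
    # Model: X_k ~ P_k @ F @ diag(C[k]) @ A.T with P_k column-orthonormal.
    rng = np.random.default_rng(seed)
    K, J = len(X_list), X_list[0].shape[1]
    F = np.eye(R)
    A = rng.standard_normal((J, R))
    C = np.ones((K, R))
    for _ in range(n_iter):
        # Procrustes-type update of each orthonormal P_k.
        P = []
        for k, Xk in enumerate(X_list):
            Z = Xk @ A @ np.diag(C[k]) @ F.T            # I_k x R
            U, _, Vt = np.linalg.svd(Z, full_matrices=False)
            P.append(U @ Vt)
        # Project every slab down to R x J and stack: a small K x R x J array.
        Y = np.stack([P[k].T @ X_list[k] for k in range(K)])
        # One round of PARAFAC (ALS) updates on the projected array.
        F = np.einsum('krj,jf,kf->rf', Y, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        A = np.einsum('krj,rf,kf->jf', Y, F, C) @ np.linalg.pinv((F.T @ F) * (C.T @ C))
        C = np.einsum('krj,rf,jf->kf', Y, F, A) @ np.linalg.pinv((F.T @ F) * (A.T @ A))
    return P, F, A, C

The point made in the abstract is visible in the loop: only the small R x J projected slabs enter the PARAFAC step, so the cost does not grow with the (possibly large and unequal) numbers of row units, while the P_k carry the 'factor scores' for those rows.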

393 citations

Journal Article
TL;DR: The Hull method, which aims to find a model with an optimal balance between model fit and number of parameters, is examined in an extensive simulation study in which the simulated data are based on major and minor factors.
Abstract: A common problem in exploratory factor analysis is how many factors need to be extracted from a particular data set. We propose a new method for selecting the number of major common factors: the Hull method, which aims to find a model with an optimal balance between model fit and number of parameters. We examine the performance of the method in an extensive simulation study in which the simulated data are based on major and minor factors. The study compares the method with four other methods, such as parallel analysis and the minimum average partial test, which were selected because they have been proven to perform well and/or are frequently used in applied research. The Hull method outperformed all four methods at recovering the correct number of major factors. Its usefulness was further illustrated by its assessment of the dimensionality of the Five-Factor Personality Inventory (Hendriks, Hofstee, & De Raad, 1999). This inventory has 100 items, and the typical methods for assessing dimensionality prove to be useless: the large number of factors they suggest has no theoretical justification. The Hull method, however, suggested retaining the number of factors that the theoretical background to the inventory actually proposes.
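
A rough sketch of the selection rule described here, under its usual formulation: given fit values and numbers of free parameters for a series of candidate solutions, retain the solutions on the upper convex hull of the (parameters, fit) plot and choose the one with the largest scree-type ratio. Function and variable names below are mine, not from the paper:

import numpy as np

def hull_select(fit, n_params):
    # fit: goodness-of-fit values (assumed to increase with model size);
    # n_params: corresponding numbers of free parameters.
    # Returns the index (into the original lists) of the selected solution.
    order = np.argsort(n_params)
    f = np.asarray(fit, dtype=float)[order]
    p = np.asarray(n_params, dtype=float)[order]
    keep = list(range(len(f)))
    # Retain only solutions on the upper boundary of the convex hull of (p, f).
    changed = True
    while changed and len(keep) > 2:
        changed = False
        for i in range(1, len(keep) - 1):
            a, b, c = keep[i - 1], keep[i], keep[i + 1]
            interp = f[a] + (f[c] - f[a]) * (p[b] - p[a]) / (p[c] - p[a])
            if f[b] <= interp:          # b lies on or below the hull boundary
                del keep[i]
                changed = True
                break
    # Scree-type ratio st on the retained solutions; pick the largest.
    best, best_st = keep[0], -np.inf
    for i in range(1, len(keep) - 1):
        a, b, c = keep[i - 1], keep[i], keep[i + 1]
        st = ((f[b] - f[a]) / (p[b] - p[a])) / ((f[c] - f[b]) / (p[c] - p[b]))
        if st > best_st:
            best, best_st = b, st
    return int(order[best])

In practice the fit measure and the way free parameters are counted follow the paper; this sketch only shows the hull-and-scree logic.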

337 citations

Journal Article
TL;DR: In this article, a new study involving 525 subjects in four samples (men and women, in Fall and Spring terms) was conducted. Results from traditional factor analyses of the separate groups showed that the loadings of corresponding factors were highly related, and that sets of common factors defined over all four groups had virtually the same explanatory power as components computed for each group separately.

336 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: A textbook covering probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
TL;DR: This survey provides an overview of higher-order tensor decompositions, their applications, and available software.
Abstract: This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions of higher-order tensors (i.e., N-way arrays with N ≥ 3) have applications in psychometrics, chemometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, neuroscience, graph analysis, and elsewhere. Two particular tensor decompositions can be considered to be higher-order extensions of the matrix singular value decomposition: CANDECOMP/PARAFAC (CP) decomposes a tensor as a sum of rank-one tensors, and the Tucker decomposition is a higher-order form of principal component analysis. There are many other tensor decompositions, including INDSCAL, PARAFAC2, CANDELINC, DEDICOM, and PARATUCK2 as well as nonnegative variants of all of the above. The N-way Toolbox, Tensor Toolbox, and Multilinear Engine are examples of software packages for working with tensors.
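
To make the two decompositions named here concrete, a small numpy illustration (random factors, purely for shape bookkeeping; a real analysis would use a dedicated package such as those listed above):

import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3

# CP / CANDECOMP-PARAFAC: a sum of R rank-one tensors built from three
# factor matrices.
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
X_cp = np.einsum('ir,jr,kr->ijk', A, B, C)

# Tucker: a (usually small) core array multiplied by a factor matrix along
# each mode.
P, Q, S = 2, 2, 2
G = rng.standard_normal((P, Q, S))
U = rng.standard_normal((I, P))
V = rng.standard_normal((J, Q))
W = rng.standard_normal((K, S))
X_tucker = np.einsum('pqs,ip,jq,ks->ijk', G, U, V, W)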

9,227 citations

01 Jan 1999
TL;DR: The Big Five taxonomy, as discussed by the authors, is a taxonomy of personality dimensions derived from analyses of the natural-language terms people use to describe themselves and others, and it has been used for personality assessment.
Abstract: Taxonomy is always a contentious issue because the world does not come to us in neat little packages. Personality has been conceptualized from a variety of theoretical perspectives, and at various levels of abstraction. Each of these levels has made unique contributions to our understanding of individual differences in behavior and experience. However, the number of personality traits, and scales designed to measure them, escalated without an end in sight (Goldberg, 1971). Researchers, as well as practitioners in the field of personality assessment, were faced with a bewildering array of personality scales from which to choose, with little guidance and no overall rationale at hand. What made matters worse was that scales with the same name often measure concepts that are not the same, and scales with different names often measure concepts that are quite similar. Although diversity and scientific pluralism are useful, the systematic accumulation of findings and the communication among researchers became difficult amidst the Babel of concepts and scales. Many personality researchers had hoped that they might devise the structure that would transform the Babel into a community speaking a common language. However, such an integration was not to be achieved by any one researcher or by any one theoretical perspective. As Allport once put it, "each assessor has his own pet units and uses a pet battery of diagnostic devices" (1958, p. 258). What personality psychology needed was a descriptive model, or taxonomy, of its subject matter. One of the central goals of scientific taxonomies is the definition of overarching domains within which large numbers of specific instances can be understood in a simplified way. Thus, in personality psychology, a taxonomy would permit researchers to study specified domains of personality characteristics, rather than examining separately the thousands of particular attributes that make human beings individual and unique. Moreover, a generally accepted taxonomy would greatly facilitate the accumulation and communication of empirical findings by offering a standard vocabulary, or nomenclature. After decades of research, the field is approaching consensus on a general taxonomy of personality traits, the "Big Five" personality dimensions. These dimensions do not represent a particular theoretical perspective but were derived from analyses of the natural-language terms people use to describe themselves and others. Rather than replacing all previous systems, the Big Five taxonomy serves an integrative function because it can represent the various and diverse systems of personality …

7,787 citations

Journal Article
TL;DR: Methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM with threshold-based pre-selection; the results highlight the value of GLM combined with penalised methods and threshold-based pre-selection when omitted variables are considered in the final interpretation.
Abstract: Collinearity refers to the non-independence of predictor variables, usually in a regression-type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time, and predicted to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and through time. Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors, threshold-based pre-selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor-response relationships of increasing complexity and eight levels of collinearity we compared ways to address collinearity with standard multiple regression and machine-learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating its performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold-based pre-selection when omitted variables are considered in the final interpretation. However, all approaches tested yielded degraded predictions under change in collinearity structure, and the ‘folk lore’ threshold for correlation coefficients between predictor variables of |r| > 0.7 was an appropriate indicator of when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre-analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.
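
The |r| > 0.7 rule mentioned above is usually applied as a greedy pre-selection step. A minimal numpy sketch of that heuristic (names are mine; the paper's actual protocol also uses ecological knowledge to decide which member of a correlated pair to keep):

import numpy as np

def drop_collinear(X, names, threshold=0.7):
    # X: (n_samples, n_predictors) array; names: list of predictor names.
    # While the most strongly correlated remaining pair exceeds the threshold,
    # drop one member of the pair (here, simply the later column).
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= threshold:
            break
        del keep[max(i, j)]
    return [names[k] for k in keep]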

6,199 citations