# Papers in Psychometrika, 2009

••

TL;DR: This discussion paper argues that the use of Cronbach’s alpha, both as a reliability estimate and as a measure of internal consistency, suffers from major problems.

Abstract: This discussion paper argues that the use of Cronbach’s alpha, both as a reliability estimate and as a measure of internal consistency, suffers from major problems. First, alpha always has a value, which cannot be equal to the test score’s reliability given the interitem covariance matrix and the usual assumptions about measurement error. Second, in practice, alpha is used more often as a measure of the test’s internal consistency than as an estimate of reliability. However, it can be shown easily that alpha is unrelated to the internal structure of the test. It is further discussed that statistics based on a single test administration do not convey much information about the accuracy of individuals’ test performance. The paper ends with a list of conclusions about the usefulness of alpha.

2,017 citations
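As a concrete illustration of the quantity under discussion, the standard formula for alpha from a k x k inter-item covariance matrix can be sketched in a few lines of Python; the covariance matrix below is hypothetical, chosen so that all items have equal variances and covariances:

```python
def cronbach_alpha(cov):
    """Cronbach's alpha from a k x k inter-item covariance matrix."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)      # variance of the sum score
    item_vars = sum(cov[i][i] for i in range(k))  # sum of the item variances
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical 3-item covariance matrix (equal variances, equal covariances).
cov = [[1.0, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
print(round(cronbach_alpha(cov), 3))  # -> 0.75
```

Under the usual assumptions, alpha only equals the reliability for special covariance structures such as this one; in general it is a lower bound, which is the starting point of the paper's critique.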

••

TL;DR: The authors agree with Sijtsma that α is not appropriate when considering how well a test measures one concept, but recommend ωt rather than the glb, and argue that end users need procedures that are readily available in open source software.

Abstract: There are three fundamental problems in Sijtsma (Psychometrika, 2008): (1) contrary to the name, the glb is not the greatest lower bound of reliability but rather is systematically less than ωt (McDonald, Test theory: A unified treatment, Erlbaum, Hillsdale, 1999), (2) we agree with Sijtsma that when considering how well a test measures one concept, α is not appropriate, but recommend ωt rather than the glb, and (3) the end user needs procedures that are readily available in open source software.

1,194 citations
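For readers who want the flavor of the recommended alternative, a one-factor special case of McDonald's omega (ωt reduces to this when there is a single common factor) can be computed directly from standardized factor loadings; the loadings below are hypothetical:

```python
def omega_one_factor(loadings, uniquenesses):
    """McDonald's omega for a one-factor model: the share of total-score
    variance attributable to the common factor (a one-factor special
    case of omega_t)."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))

loadings = [0.8, 0.7, 0.6]                    # hypothetical standardized loadings
uniquenesses = [1 - l ** 2 for l in loadings]  # implied unique variances
print(round(omega_one_factor(loadings, uniquenesses), 3))  # -> 0.745
```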

••

TL;DR: Because the log-linear model with latent variables is a general model for cognitive diagnosis, new alternatives to modeling the functional relationship between attribute mastery and the probability of a correct response are discussed.

Abstract: This paper uses log-linear models with latent variables (Hagenaars, in Loglinear Models with Latent Variables, 1993) to define a family of cognitive diagnosis models. In doing so, the relationship between many common models is explicitly defined and discussed. In addition, because the log-linear model with latent variables is a general model for cognitive diagnosis, new alternatives to modeling the functional relationship between attribute mastery and the probability of a correct response are discussed.

464 citations
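One familiar member of the family of cognitive diagnosis models is the DINA model, in which mastery of all attributes required by an item is linked to a correct response through slip and guess parameters. The sketch below is an illustration of that functional relationship, not necessarily the parameterization used in the paper; all parameter values are hypothetical:

```python
def dina_prob(alpha, q, slip, guess):
    """P(correct) under the DINA model for an examinee with attribute
    vector alpha on an item requiring attributes q (both 0/1 vectors)."""
    eta = all(a >= r for a, r in zip(alpha, q))  # masters every required attribute?
    return (1 - slip) if eta else guess

# Hypothetical item requiring attributes 1 and 3, with slip .1 and guess .2.
q = [1, 0, 1]
print(dina_prob([1, 1, 1], q, 0.1, 0.2))  # master -> 0.9
print(dina_prob([1, 1, 0], q, 0.1, 0.2))  # non-master -> 0.2
```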

••

TL;DR: An old dimension-free coefficient and coefficients based on structural equation models are proposed to improve the routine reporting of psychometric internal consistency.

Abstract: As pointed out by Sijtsma (in press), coefficient alpha is inappropriate as a single summary of the internal consistency of a composite score. Better estimators of internal consistency are available. In addition to those mentioned by Sijtsma, an old dimension-free coefficient and coefficients based on structural equation models are proposed to improve the routine reporting of psychometric internal consistency. The various ways to measure internal consistency are also shown to be appropriate to binary and polytomous items.

336 citations

••

TL;DR: In this article, a method for estimating reliability using structural equation modeling (SEM) that allows for nonlinearity between factors and item scores is presented, assuming the focus is on consistency of summed item scores.

Abstract: A method is presented for estimating reliability using structural equation modeling (SEM) that allows for nonlinearity between factors and item scores. Assuming the focus is on consistency of summed item scores, this method for estimating reliability is preferred to those based on linear SEM models and to the most commonly reported estimate of reliability, coefficient alpha.

270 citations

••

TL;DR: In this article, structural equation modeling was discussed as an informative process both to assess the assumptions underlying coefficient alpha and to estimate reliability, and it was shown that violation of these assumptions can result in nontrivial negative or positive bias.

Abstract: The general use of coefficient alpha to assess reliability should be discouraged on a number of grounds. The assumptions underlying coefficient alpha are unlikely to hold in practice, and violation of these assumptions can result in nontrivial negative or positive bias. Structural equation modeling was discussed as an informative process both to assess the assumptions underlying coefficient alpha and to estimate reliability.

254 citations

••

TL;DR: K-means or hierarchical agglomerative cluster analysis is used to cluster subjects who possess the same skills; asymptotic classification accuracy results are given, along with simulations comparing effects of test length and clustering method, and an application to a language examination.

Abstract: Latent class models for cognitive diagnosis often begin with specification of a matrix that indicates which attributes or skills are needed for each item. Then by imposing restrictions that take this into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, but does require an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied with the purpose of clustering subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing effects of test length and method of clustering. An application to a language examination is provided to illustrate how the methods can be implemented in practice.

191 citations
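The two-step procedure described in the abstract (summarize each examinee by a vector of sum scores, then cluster) can be sketched as follows. Attribute-wise sum scores based on the Q-matrix are one natural choice of summary; the paper's exact statistic may differ, and the Q-matrix and response patterns below are hypothetical:

```python
import random

def attribute_sum_scores(responses, Q):
    """Sum-score vector W for each examinee: W[k] = total score on the
    items that require attribute k according to the Q-matrix."""
    n_items, n_attr = len(Q), len(Q[0])
    return [[sum(resp[i] for i in range(n_items) if Q[i][k])
             for k in range(n_attr)] for resp in responses]

def kmeans(points, k, iters=20, seed=0):
    """A minimal K-means on the sum-score vectors (squared Euclidean distance)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for j, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            assign[j] = dists.index(min(dists))
        for c in range(k):
            members = [points[j] for j in range(len(points)) if assign[j] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

# Hypothetical Q-matrix: items 0-1 require attribute 0, items 2-3 attribute 1.
Q = [[1, 0], [1, 0], [0, 1], [0, 1]]
responses = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 1]]
W = attribute_sum_scores(responses, Q)
assign, centers = kmeans(W, 2)
```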

••

TL;DR: The critical reactions of Bentler, Green and Yang, and Revelle and Zinbarg to Sijtsma's paper on Cronbach's alpha are addressed and the dissemination of psychometric knowledge among substantive researchers is discussed.

Abstract: The critical reactions of Bentler (2009, doi:10.1007/s11336-008-9100-1), Green and Yang (2009a, doi:10.1007/s11336-008-9098-4; 2009b, doi:10.1007/s11336-008-9099-3), and Revelle and Zinbarg (2009, doi:10.1007/s11336-008-9102-z) to Sijtsma’s (2009, doi:10.1007/s11336-008-9101-0) paper on Cronbach’s alpha are addressed. The dissemination of psychometric knowledge among substantive researchers is discussed.

153 citations

••

TL;DR: The model is extended with a multivariate multilevel regression structure which allows the incorporation of covariates to explain the variance in speed and accuracy between individuals and groups of test takers.

Abstract: Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel regression structure which allows the incorporation of covariates to explain the variance in speed and accuracy between individuals and groups of test takers. A Bayesian approach with Markov chain Monte Carlo (MCMC) computation enables straightforward estimation of all model parameters. Model-specific implementations of a Bayes factor (BF) and deviance information criterium (DIC) for model selection are proposed which are easily calculated as byproducts of the MCMC computation. Both results from simulation studies and real-data examples are given to illustrate several novel analyses possible with this modeling framework.

141 citations

••

TL;DR: This paper showcases the application of the optimal sequential selection methodology in item selection for CAT built upon cognitive diagnostic models, and proposes two new heuristics that are compared against the randomized item selection method and the two heuristics investigated in Xu et al. (2003).

Abstract: Computerized adaptive testing (CAT) is a mode of testing which enables more efficient and accurate recovery of one or more latent traits. Traditionally, CAT is built upon Item Response Theory (IRT) models that assume unidimensionality. However, the problem of how to build CAT upon latent class models (LCM) has not been investigated until recently, when Tatsuoka (J. R. Stat. Soc., Ser. C, Appl. Stat. 51:337–350, 2002) and Tatsuoka and Ferguson (J. R. Stat., Ser. B 65:143–157, 2003) established a general theorem on the asymptotically optimal sequential selection of experiments to classify finite, partially ordered sets. Xu, Chang, and Douglas (Paper presented at the annual meeting of National Council on Measurement in Education, Montreal, Canada, 2003) then tested two heuristics in a simulation study based on Tatsuoka’s theoretical work in the context of computerized adaptive testing. One of the heuristics was developed based on Kullback–Leibler information, and the other based on Shannon entropy. In this paper, we showcase the application of the optimal sequential selection methodology in item selection of CAT that is built upon cognitive diagnostic models. Two new heuristics are proposed, and are compared against the randomized item selection method and the two heuristics investigated in Xu et al. (Paper presented at the annual meeting of National Council on Measurement in Education, Montreal, Canada, 2003). Finally, we show the connection between the Kullback–Leibler-information-based approaches and the Shannon-entropy-based approach, as well as the connection between algorithms built upon LCM and those built upon IRT models.

133 citations
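The Shannon-entropy heuristic mentioned above has a simple core: administer the item whose response is expected to leave the posterior over latent classes with the least entropy. A minimal sketch for two latent classes with a hypothetical two-item bank:

```python
from math import log

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

def expected_entropy(post, item_probs):
    """Expected posterior entropy after administering an item,
    where item_probs[c] = P(correct | latent class c)."""
    p_correct = sum(post[c] * item_probs[c] for c in range(len(post)))
    out = 0.0
    for x, px in ((1, p_correct), (0, 1 - p_correct)):
        if px == 0:
            continue
        upd = [post[c] * (item_probs[c] if x else 1 - item_probs[c]) / px
               for c in range(len(post))]
        out += px * entropy(upd)
    return out

def pick_item(post, bank):
    """Choose the item minimizing expected posterior entropy."""
    return min(range(len(bank)), key=lambda i: expected_entropy(post, bank[i]))

# Two latent classes; item 0 is uninformative, item 1 separates the classes.
bank = [[0.5, 0.5], [0.9, 0.2]]
print(pick_item([0.5, 0.5], bank))  # -> 1
```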

••

TL;DR: Both the theoretical analyses and the studies of simulated data in this paper suggest that the criteria of A-optimality and D-optimality lead to the most accurate estimates when all abilities are intentional, with the former slightly outperforming the latter.

Abstract: Several criteria from the optimal design literature are examined for use with item selection in multidimensional adaptive testing. In particular, it is examined which criteria are appropriate for adaptive testing in which all abilities are intentional, some should be considered as a nuisance, or the interest is in the testing of a composite of the abilities. Both the theoretical analyses and the studies of simulated data in this paper suggest that the criteria of A-optimality and D-optimality lead to the most accurate estimates when all abilities are intentional, with the former slightly outperforming the latter. The criterion of E-optimality showed occasional erratic behavior for this case of adaptive testing, and its use is not recommended. If some of the abilities are nuisances, application of the criterion of As-optimality (or Ds-optimality), which focuses on the subset of intentional abilities, is recommended. For the measurement of a linear combination of abilities, the criterion of c-optimality yielded the best results. The preferences of each of these criteria for items with specific patterns of parameter values were also assessed. It was found that the criteria differed mainly in their preferences for items with different patterns of values for their discrimination parameters.
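For a two-dimensional test, the three main criteria reduce to simple functions of the 2x2 Fisher information matrix: A-optimality minimizes the trace of the inverse, D-optimality maximizes the determinant, and E-optimality maximizes the smallest eigenvalue. A sketch with a hypothetical information matrix:

```python
def a_d_e_criteria(info):
    """A-, D-, and E-optimality values for a symmetric 2x2 information
    matrix: trace of the inverse (minimize), determinant (maximize),
    smallest eigenvalue (maximize)."""
    (a, b), (c, d) = info
    det = a * d - b * c
    trace_inv = (a + d) / det
    # eigenvalues of a symmetric 2x2 matrix
    mean = (a + d) / 2
    disc = ((a - d) / 2) ** 2 + b * c
    e_min = mean - disc ** 0.5
    return trace_inv, det, e_min

info = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical 2x2 information matrix
tr_inv, det, e_min = a_d_e_criteria(info)
```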

••

TL;DR: In this paper, the authors present a diffusion model for the analysis of continuous-time change in multivariate longitudinal data, in which the data from a single person are modeled with an Ornstein–Uhlenbeck diffusion process.

Abstract: In this paper, we present a diffusion model for the analysis of continuous-time change in multivariate longitudinal data. The central idea is to model the data from a single person with an Ornstein–Uhlenbeck diffusion process. We extend it hierarchically by allowing the parameters of the diffusion process to vary randomly over different persons. With this approach, both intra- and interindividual differences are analyzed simultaneously. Furthermore, the individual difference parameters can be regressed on covariates, thereby providing an explanation of between-person differences. Unstructured and unbalanced data pose no problem for the model to be applied. We demonstrate the method on data from an experience sampling study to investigate changes in core affect. It can be concluded that different factors from the five factor model of personality are related to features of the trajectories in the core affect space, such as the cross-correlation and variability of the changes.
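The building block of the model, a single person's Ornstein–Uhlenbeck process, is easy to simulate with its exact discretization; the parameter values below are hypothetical:

```python
import math
import random

def simulate_ou(theta, mu, sigma, x0, dt, n, seed=1):
    """Exact discretization of the Ornstein-Uhlenbeck process
    dX = theta * (mu - X) dt + sigma dW, sampled every dt time units."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    decay = math.exp(-theta * dt)
    sd = sigma * math.sqrt((1 - math.exp(-2 * theta * dt)) / (2 * theta))
    for _ in range(n):
        x = mu + (x - mu) * decay + sd * rng.gauss(0, 1)  # mean-reverting step
        path.append(x)
    return path

# Start far from the attractor mu = 0; the path reverts toward it.
path = simulate_ou(theta=1.0, mu=0.0, sigma=0.5, x0=3.0, dt=0.1, n=200)
```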

••

TL;DR: A regularized extension of GSCA is proposed that integrates a ridge type of regularization into GSCA in a unified framework, thereby enabling it to handle multi-collinearity problems effectively.

Abstract: Generalized structured component analysis (GSCA) has been proposed as a component-based approach to structural equation modeling. In practice, GSCA may suffer from multi-collinearity, i.e., high correlations among exogenous variables. GSCA has as yet no remedy for this problem. Thus, a regularized extension of GSCA is proposed that integrates a ridge type of regularization into GSCA in a unified framework, thereby enabling it to handle multi-collinearity problems effectively. An alternating regularized least squares algorithm is developed for parameter estimation. A Monte Carlo simulation study is conducted to investigate the performance of the proposed method as compared to its non-regularized counterpart. An application is also presented to demonstrate the empirical usefulness of the proposed method.
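The ridge idea the extension builds on can be illustrated outside the GSCA framework with ordinary least squares: adding lam to the diagonal of X'X stabilizes the solution when predictors are nearly collinear. A sketch with hypothetical data (this is plain ridge regression, not the paper's alternating algorithm):

```python
def ridge_2var(X, y, lam):
    """Ridge least squares for two predictors: beta = (X'X + lam*I)^-1 X'y."""
    # Accumulate X'X and X'y for the 2-column design matrix.
    s11 = sum(r[0] * r[0] for r in X)
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X)
    t1 = sum(r[0] * yi for r, yi in zip(X, y))
    t2 = sum(r[1] * yi for r, yi in zip(X, y))
    # Solve the regularized 2x2 normal equations in closed form.
    a, b, d = s11 + lam, s12, s22 + lam
    det = a * d - b * b
    return [(d * t1 - b * t2) / det, (a * t2 - b * t1) / det]

# Nearly collinear predictors: lam > 0 keeps the 2x2 system well conditioned.
X = [[1.0, 1.0], [1.0, 1.01], [2.0, 2.0], [2.0, 1.99]]
y = [2.0, 2.0, 4.0, 4.0]
print(ridge_2var(X, y, 0.1))
```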

••

TL;DR: The authors derive an analytic model of the inter-judge correlation as a function of five underlying parameters: inter-cue correlation and the number of cues capture assumptions about the environment, while differentiations between cues, the weights attached to the cues, and (un)reliability describe assumptions about the judges.

Abstract: We derive an analytic model of the inter-judge correlation as a function of five underlying parameters. Inter-cue correlation and the number of cues capture our assumptions about the environment, while differentiations between cues, the weights attached to the cues, and (un)reliability describe assumptions about the judges. We study the relative importance of, and interrelations between these five factors with respect to inter-judge correlation. Results highlight the centrality of the inter-cue correlation. We test the model’s predictions with empirical data and illustrate its relevance. For example, we show that, typically, additional judges increase efficacy at a greater rate than additional cues.

••

TL;DR: In this paper, the authors show that the classical chi-square goodness-of-fit test is unable to detect the presence of nonlinear terms in the model and explain this phenomenon by exploiting results on asymptotic robustness in structural equation models.

Abstract: In this paper, we show that for some structural equation models (SEM), the classical chi-square goodness-of-fit test is unable to detect the presence of nonlinear terms in the model. As an example, we consider a regression model with latent variables and interaction terms. Not only does the model test have zero power against that type of misspecification, but even the theoretical (chi-square) distribution of the test is not distorted when severe interaction term misspecification is present in the postulated model. We explain this phenomenon by exploiting results on asymptotic robustness in structural equation models. The importance of this paper is to warn against the conclusion that if a proposed linear model fits the data well according to the chi-square goodness-of-fit test, then the underlying model is linear indeed; it will be shown that the underlying model may, in fact, be severely nonlinear. In addition, the present paper shows that such insensitivity to nonlinear terms is only a particular instance of a more general problem, namely, the incapacity of the classical chi-square goodness-of-fit test to detect deviations from zero correlation among exogenous regressors (whether observable or latent) when the structural part of the model is just saturated.

••

TL;DR: In this article, the appropriateness of multidimensional item response methods for assigning scores in high-stakes testing is called into question, and it is shown that many response models and statistical estimates can produce paradoxical results and that in the popular class of linearly compensatory models, maximum likelihood estimates are guaranteed to do so.

Abstract: In multidimensional item response theory (MIRT), it is possible for the estimate of a subject’s ability in some dimension to decrease after they have answered a question correctly. This paper investigates how and when this type of paradoxical result can occur. We demonstrate that many response models and statistical estimates can produce paradoxical results and that in the popular class of linearly compensatory models, maximum likelihood estimates are guaranteed to do so. In light of these findings, the appropriateness of multidimensional item response methods for assigning scores in high-stakes testing is called into question.

••

TL;DR: In this paper, the authors propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale, defined on a population-based model.

Abstract: We propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined on a population-based model, extends the classical Cohen’s kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the coefficient are also given and their sampling variance is determined by the Jackknife method. The method is illustrated on medical education data which motivated the research.
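The classical two-rater Cohen's kappa that the proposed coefficient extends compares observed agreement to chance agreement; a sketch with a hypothetical 2x2 cross-classification table:

```python
def cohens_kappa(table):
    """Classical Cohen's kappa from a square cross-classification table
    (rows: rater 1's categories, columns: rater 2's categories)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    po = sum(table[i][i] for i in range(k)) / n                 # observed agreement
    pe = sum(sum(table[i]) * sum(row[i] for row in table)       # chance agreement
             for i in range(k)) / n ** 2
    return (po - pe) / (1 - pe)

table = [[20, 5], [10, 15]]  # hypothetical two-category example
print(round(cohens_kappa(table), 3))  # -> 0.4
```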

••

TL;DR: In this paper, the authors present an approach for putting the estimates on a common scale to facilitate relative comparisons between models fit to binary or ordinal outcomes for both population-average and unit-specific models.

Abstract: When using linear models for cluster-correlated or longitudinal data, a common modeling practice is to begin by fitting a relatively simple model and then to increase the model complexity in steps. New predictors might be added to the model, or a more complex covariance structure might be specified for the observations. When fitting models for binary or ordered-categorical outcomes, however, comparisons between such models are impeded by the implicit rescaling of the model estimates that takes place with the inclusion of new predictors and/or random effects. This paper presents an approach for putting the estimates on a common scale to facilitate relative comparisons between models fit to binary or ordinal outcomes. The approach is developed for both population-average and unit-specific models.

••

TL;DR: In this paper, it was shown that their model is a special case of the threshold autoregressive (TAR) model and proposed a new estimation procedure and performed a simulation study to compare it to the estimation procedure developed by Gottman and Murray.

Abstract: Considering a dyad as a dynamic system whose current state depends on its past state has allowed researchers to investigate whether and how partners influence each other. Some researchers have also focused on how differences between dyads in their interaction patterns are related to other differences between them. A promising approach in this area is the model that was proposed by Gottman and Murray, which is based on nonlinear coupled difference equations. In this paper, it is shown that their model is a special case of the threshold autoregressive (TAR) model. As a consequence, we can make use of existing knowledge about TAR models with respect to parameter estimation, model alternatives and model selection. We propose a new estimation procedure and perform a simulation study to compare it to the estimation procedure developed by Gottman and Murray. In addition, we include an empirical example based on interaction data of three dyads.
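A two-regime TAR(1) process, the model class the dyadic model is shown to belong to, switches its intercept and slope depending on whether the previous value is below a threshold; a simulation sketch with hypothetical parameters:

```python
import random

def simulate_tar(n, r, regime_lo, regime_hi, sigma=0.1, x0=0.0, seed=2):
    """Simulate a two-regime TAR(1): which (intercept, slope) pair applies
    depends on whether the previous value is at or below the threshold r."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n):
        a, b = regime_lo if x <= r else regime_hi
        x = a + b * x + rng.gauss(0, sigma)
        path.append(x)
    return path

# Hypothetical regimes that push the series up when low and down when high.
path = simulate_tar(500, r=0.0, regime_lo=(0.5, 0.3), regime_hi=(-0.5, 0.3))
```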

••

TL;DR: This work reviews existing theory and methods for the clique partitioning problem and proposes two versions of a new neighborhood search algorithm for efficient solution, which are compared to simulated annealing and tabu search algorithms from the CPP literature.

Abstract: The clique partitioning problem (CPP) requires the establishment of an equivalence relation for the vertices of a graph such that the sum of the edge costs associated with the relation is minimized. The CPP has important applications for the social sciences because it provides a framework for clustering objects measured on a collection of nominal or ordinal attributes. In such instances, the CPP incorporates edge costs obtained from an aggregation of binary equivalence relations among the attributes. We review existing theory and methods for the CPP and propose two versions of a new neighborhood search algorithm for efficient solution. The first version (NS-R) uses a relocation algorithm in the search for improved solutions, whereas the second (NS-TS) uses an embedded tabu search routine. The new algorithms are compared to simulated annealing (SA) and tabu search (TS) algorithms from the CPP literature. Although the heuristics yielded comparable results for some test problems, the neighborhood search algorithms generally yielded the best performances for large and difficult instances of the CPP.

••

TL;DR: A simulation study shows that the constrained BLIM is more effective than the unconstrained one in recovering a probabilistic knowledge structure.

Abstract: In the Basic Local Independence Model (BLIM) of Doignon and Falmagne (Knowledge Spaces, Springer, Berlin, 1999), the probabilistic relationship between the latent knowledge states and the observable response patterns is established by the introduction of a pair of parameters for each of the problems: a lucky guess probability and a careless error probability. In estimating the parameters of the BLIM with an empirical data set, it is desirable that such probabilities remain reasonably small. A special case of the BLIM is proposed where the parameter space of such probabilities is constrained. A simulation study shows that the constrained BLIM is more effective than the unconstrained one in recovering a probabilistic knowledge structure.
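The local-independence structure of the BLIM makes the likelihood of a response pattern given a knowledge state a simple product over items; a sketch with hypothetical parameter values:

```python
def blim_prob(response, state, beta, eta):
    """P(response pattern | knowledge state) under the BLIM, where
    beta[q] is the careless-error probability and eta[q] the lucky-guess
    probability of item q; `state` is the set of mastered items."""
    p = 1.0
    for q, r in enumerate(response):
        if q in state:                      # item is in the knowledge state
            p *= (1 - beta[q]) if r else beta[q]
        else:                               # item is not mastered
            p *= eta[q] if r else (1 - eta[q])
    return p

beta = [0.1, 0.1, 0.1]  # hypothetical careless-error probabilities
eta = [0.2, 0.2, 0.2]   # hypothetical lucky-guess probabilities
print(round(blim_prob([1, 1, 0], state={0, 1}, beta=beta, eta=eta), 3))  # -> 0.648
```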

••

TL;DR: In this paper, the authors investigated the strategies students used in solving the division problems in the two most recent assessments carried out in 1997 and in 2004, and found that the three main strategies were significantly less accurate in 2004 than they were in 1997.

Abstract: In the Netherlands, national assessments at the end of primary school (Grade 6) show a decline of achievement on problems of complex or written arithmetic over the last two decades. The present study aims at contributing to an explanation of the large achievement decrease on complex division, by investigating the strategies students used in solving the division problems in the two most recent assessments carried out in 1997 and in 2004. The students’ strategies were classified into four categories. A data set resulted with two types of repeated observations within students: the nominal strategies and the dichotomous achievement scores (correct/incorrect) on the items administered. It is argued that latent variable modeling methodology is appropriate to analyze these data. First, latent class analyses with year of assessment as a covariate were carried out on the multivariate nominal strategy variables. Results showed a shift from application of the traditional long division algorithm in 1997, to the less accurate strategy of stating an answer without writing down any notes or calculations in 2004, especially for boys. Second, explanatory IRT analyses showed that the three main strategies were significantly less accurate in 2004 than they were in 1997.

••

TL;DR: In this article, a semiparametric IRT model using a Dirichlet process mixture logistic distribution was proposed to deal with more types of item response patterns than the existing methods, such as the one-parameter normal ogive models or the two- or three-parameter logistic models.

Abstract: In Item Response Theory (IRT), item characteristic curves (ICCs) are illustrated through logistic models or normal ogive models, and the probability that examinees give the correct answer is usually a monotonically increasing function of their ability parameters. However, since only limited patterns of shapes can be obtained from logistic models or normal ogive models, there is a possibility that the model applied does not fit the data. As a result, the existing method can be rejected because it cannot deal with various item response patterns. To overcome these problems, we propose a new semiparametric IRT model using a Dirichlet process mixture logistic distribution. Our method does not rely on assumptions but only requires that the ICCs be a monotonically nondecreasing function; that is, our method can deal with more types of item response patterns than the existing methods, such as the one-parameter normal ogive models or the two- or three-parameter logistic models. We conducted two simulation studies whose results indicate that the proposed method can express more patterns of shapes for ICCs and can estimate the ability parameters more accurately than the existing parametric and nonparametric methods. The proposed method has also been applied to Facial Expression Recognition data with noteworthy results.
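The parametric baseline being generalized is easy to state: under the two-parameter logistic model, every ICC is a scaled and shifted logistic curve, which is exactly the restriction on shapes that the semiparametric model relaxes:

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: P(correct | ability theta),
    with discrimination a and difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Whatever a and b are, the curve is monotone and passes through 0.5 at theta = b.
print(round(icc_2pl(0.0, a=1.2, b=0.0), 3))  # -> 0.5
```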

••

TL;DR: In this article, the authors propose a new measure, RΛ, which expresses the reliability of an entire longitudinal sequence of measurements, in contrast to an average or occasion-specific measure.

Abstract: Reliability captures the influence of error on a measurement and, in the classical setting, is defined as one minus the ratio of the error variance to the total variance. Laenen, Alonso, and Molenberghs (Psychometrika 73:443–448, 2007) proposed an axiomatic definition of reliability and introduced the RT coefficient, a measure of reliability extending the classical approach to a more general longitudinal scenario. The RT coefficient can be interpreted as the average reliability over different time points and can also be calculated for each time point separately. In this paper, we introduce a new and complementary measure, the so-called RΛ, which implies a new way of thinking about reliability. In a longitudinal context, each measurement brings additional knowledge and leads to more reliable information. The RΛ captures this intuitive idea and expresses the reliability of the entire longitudinal sequence, in contrast to an average or occasion-specific measure. We study the measure’s properties using both theoretical arguments and simulations, establish its connections with previous proposals, and elucidate its performance in a real case study.

••

TL;DR: In this paper, a method for marginal maximum likelihood estimation of the nonlinear random coefficient model when the response function has some linear parameters is presented, which is done by writing the marginal distribution of the repeated measures as a conditional distribution given the non-linear random effects.

Abstract: A method is presented for marginal maximum likelihood estimation of the nonlinear random coefficient model when the response function has some linear parameters. This is done by writing the marginal distribution of the repeated measures as a conditional distribution of the response given the nonlinear random effects. The resulting distribution then requires an integral equation that is of dimension equal to the number of nonlinear terms. For nonlinear functions that have linear coefficients, the improvement in computational speed and accuracy using the new algorithm can be dramatic. An illustration of the method with repeated measures data from a learning experiment is presented.

••

TL;DR: The salient findings from the experiments are that the new method substantially outperforms a previous implementation of simulated annealing and is competitive with the most effective metaheuristics for the p-median problem.

Abstract: Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of “exemplars” as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed a new simulated annealing heuristic for the p-median problem and completed a thorough investigation of its computational performance. The salient findings from our experiments are that our new method substantially outperforms a previous implementation of simulated annealing and is competitive with the most effective metaheuristics for the p-median problem.
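The p-median objective itself is simple: choose p objects as medians so that the summed distance of every object to its nearest median is minimal. The sketch below pairs the objective with a plain interchange (swap) descent on a hypothetical toy instance; the paper's heuristic of interest is simulated annealing, not this baseline:

```python
def pmedian_cost(D, medians):
    """Sum over all objects of the distance to the nearest chosen median."""
    return sum(min(D[i][m] for m in medians) for i in range(len(D)))

def interchange_descent(D, medians):
    """First-improvement swap heuristic: exchange a current median with a
    non-median whenever that lowers the cost."""
    medians = set(medians)
    improved = True
    while improved:
        improved = False
        for out in list(medians):
            for inc in range(len(D)):
                if inc in medians:
                    continue
                cand = (medians - {out}) | {inc}
                if pmedian_cost(D, cand) < pmedian_cost(D, medians):
                    medians, improved = cand, True
                    break
            if improved:
                break
    return medians, pmedian_cost(D, medians)

# Hypothetical one-dimensional toy instance: four objects at 0, 1, 10, 11.
pts = [0, 1, 10, 11]
D = [[abs(a - b) for b in pts] for a in pts]
meds, cost = interchange_descent(D, {0, 1})  # start from a poor pair of medians
```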

••

TL;DR: In this article, the authors discuss methodological challenges that arise when investigating long-term trends in student achievement, such as the need for adequate operationalizations, the influence of the time of measurement, the necessity of comparability of assessments, the effect of the assessment format, and the importance of including relevant covariates in item response models.

Abstract: This article discusses large-scale assessment of change in student achievement and takes the study by Hickendorff, Heiser, Van Putten, and Verhelst (2009) as an example. This study compared the achievement of students in the Netherlands in 1997 and 2004 on written division problems. Based on this comparison, they claim that there is a performance decline in this subdomain of mathematics, and that there is a move from applying the digit-based long division algorithm to a less accurate way of working without writing down anything. In our discussion of this study, we address methodological challenges that arise when investigating long-term trends in student achievement, such as the need for adequate operationalizations, the influence of the time of measurement and the necessity of the comparability of assessments, the effect of the assessment format, and the importance of including relevant covariates in item response models. All these issues matter when assessing change in student achievement.

••

TL;DR: A cluster-MDS model for two-way one-mode continuous rating dissimilarity data that aims at partitioning the objects into classes and simultaneously representing the cluster centers in a low-dimensional space is proposed.

Abstract: In this paper, we propose a cluster-MDS model for two-way one-mode continuous rating dissimilarity data. The model aims at partitioning the objects into classes and simultaneously representing the cluster centers in a low-dimensional space. Under the normal distribution assumption, a latent class model is developed in terms of the set of dissimilarities in a maximum likelihood framework. In each iteration, the probability that a dissimilarity belongs to each of the blocks conforming to a partition of the original dissimilarity matrix, and the rest of the parameters, are estimated in a simulated annealing based algorithm. A model selection strategy is used to test the number of latent classes and the dimensionality of the problem. Both simulated and classical dissimilarity data are analyzed to illustrate the model.

••

TL;DR: The conditions that provide an easy interpretation are summarized, and it is shown that in maximum dimensionality they can be obtained without any loss.

Abstract: Ideal point discriminant analysis is a classification tool which uses highly intuitive multidimensional scaling procedures. However, in the last paper Takane wrote about it, he concludes that the interpretation is rather intricate and calls that a weakness of the model. We summarize the conditions that provide an easy interpretation and show that in maximum dimensionality they can be obtained without any loss. For reduced dimensionality, it is conjectured that the loss is minor, which is examined using several data sets.

••

TL;DR: In this article, variable neighborhood search (VNS) is used to develop two heuristics for variable selection in principal component analysis (PCA); their performance is compared to that of a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics.

Abstract: The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the components. In this paper, we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors. In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to demonstrate the importance of variable selection in the context of PCA.
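The VNS paradigm the heuristics adapt alternates "shaking" (a random jump in a progressively larger neighborhood) with local search, returning to the smallest neighborhood after each improvement. The skeleton below applies it to a toy subset-selection objective; the values and redundancy penalty are hypothetical and stand in for the PCA criterion used in the paper:

```python
import random

def vns_subset(n, k, score, kmax=3, iters=50, seed=3):
    """Variable neighborhood search skeleton for choosing k of n variables:
    shake in the m-th neighborhood (swap m variables), run a local search
    of single swaps, and return to neighborhood 1 on improvement."""
    rng = random.Random(seed)

    def local_search(sel):
        sel = set(sel)
        improved = True
        while improved:
            improved = False
            for out in list(sel):
                for inc in set(range(n)) - sel:
                    cand = (sel - {out}) | {inc}
                    if score(cand) > score(sel):
                        sel, improved = cand, True
                        break
                if improved:
                    break
        return sel

    best = local_search(rng.sample(range(n), k))
    for _ in range(iters):
        m = 1
        while m <= kmax:
            shaken = set(best)
            for _ in range(min(m, k)):  # swap m random members (the shake)
                out = rng.choice(sorted(shaken))
                inc = rng.choice(sorted(set(range(n)) - shaken))
                shaken = (shaken - {out}) | {inc}
            cand = local_search(shaken)
            if score(cand) > score(best):
                best, m = cand, 1       # improvement: restart at neighborhood 1
            else:
                m += 1                  # no improvement: enlarge the neighborhood
    return best, score(best)

# Toy objective: value of each variable minus a pairwise redundancy penalty.
values = [5, 4, 4, 3, 1, 1]
penalty = {(1, 2): 6}  # hypothetical: variables 1 and 2 are redundant together

def score(sel):
    s = sum(values[i] for i in sel)
    return s - sum(p for pair, p in penalty.items() if set(pair) <= sel)

best, val = vns_subset(6, 3, score)
```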