
Showing papers in "Psychological test and assessment modeling in 2013"


Journal Article
TL;DR: A systematic review of the methodology for person fit research targeted specifically at methodologists in training can be found in this paper, where the author analyzes the ways in which researchers in the area of person fit have conducted simulation studies for parametric and nonparametric unidimensional IRT models.
Abstract: This paper is a systematic review of the methodology for person fit research targeted specifically at methodologists in training. I analyze the ways in which researchers in the area of person fit have conducted simulation studies for parametric and nonparametric unidimensional IRT models since the seminal review paper by Meijer and Sijtsma (2001). I specifically review how researchers have operationalized different types of aberrant responding for particular testing conditions in order to compare these simulation design characteristics with features of the real-life testing situations for which person fit analyses are officially reported. I discuss the alignment between the theoretical and practical work and the implications for future simulation work and guidelines for best practice.

Key words: person fit, systematic review, aberrant responding, item response theory, simulation study, generalizability, experimental design.

This paper is situated in the conceptual space of research on person fit, which is one aspect of the comprehensive enterprise of critiquing the alignment of the structure of a particular statistical model with a particular data set using residual-based statistics (Engelhard Jr., 2009). I first analyze the ways in which researchers in the area of person fit have conducted simulation studies in nonparametric (e.g., Sijtsma & Molenaar, 2002; van der Ark, Hemker, & Sijtsma, 2002) and parametric unidimensional item response theory (IRT) models (e.g., De Ayala, 2009; Yen & Fitzpatrick, 2006) since the seminal review paper by Meijer and Sijtsma (2001). I then discuss the alignment between the theoretical and practical work and the implications for future simulation work and guidelines for best practice.

This paper is primarily intended for methodologists in training but should also prove useful for practitioners who are curious about the statistical foundations of proposed guidelines for best practice. The information in this paper may be of less interest to the relatively few specialists who are already conducting advanced simulation studies in this area. However, it should give the many other researchers and practitioners who want to be critical consumers of this work some useful insight into how these studies are conducted.

Simulation studies are designed statistical experiments that can provide reliable scientific evidence about the performance of statistical methods. As noted concisely by Cook and Teo (2011): "In evaluating methodologies, simulation studies: (i) provide a cost-effective way to quantify potential performance for a large range of scenarios, spanning different combinations of sample sizes and underlying parameters, (ii) allow average performance to be estimated under repeat Monte Carlo sampling and (iii) facilitate comparison of estimates against the 'true' system underlying the simulations, none of which is really achievable via genuine applications, as gratifying as those are" (p. I).

In the context of person fit research, simulation studies are most commonly used to quantify Type I and Type II error rates and the associated power under a variety of test design and model misspecification conditions. Researchers who publish in this area clearly make some concerted and thoughtful efforts to summarize findings from simulation studies, especially when they are trying to situate their particular theoretical work within a relevant part of the literature.
Thus, I initially started out writing this paper as a more "traditional" review paper focused on what researchers had learned about person fit in roughly the last 10 years. However, while reviewing the recent body of work it quickly became clear that there is perhaps a more urgent need to examine the methodology of simulation research with more scrutiny in order to help methodologists in training understand the kinds of generalizations that can and cannot be made based on this work. …
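To make the kind of simulation under review concrete, here is a minimal sketch of one cell of such an experiment: Rasch responses are generated, one group of simulees is made aberrant, and the standardized log-likelihood person-fit statistic l_z (Drasgow, Levine, & Williams, 1985) is used to flag them. The specific choices (Python, 40 items, guessing on the hardest quarter of items, true abilities used in place of estimates) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def lz(x, theta, b):
    """Standardized log-likelihood person-fit statistic l_z."""
    p = rasch_p(theta, b)
    ll = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))   # observed log-lik
    e = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))    # its expectation
    v = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)     # its variance
    return (ll - e) / np.sqrt(v)

n_items, n_persons = 40, 1000
b = rng.normal(0.0, 1.0, n_items)        # item difficulties
theta = rng.normal(0.0, 1.0, n_persons)  # person abilities

# Model-conforming responders ...
x = (rng.random((n_persons, n_items)) < rasch_p(theta[:, None], b)).astype(int)

# ... and aberrant responders who guess randomly on the hardest 25% of items
x_ab = x.copy()
hard = np.argsort(b)[-n_items // 4:]
x_ab[:, hard] = (rng.random((n_persons, hard.size)) < 0.25).astype(int)

lz_norm = np.array([lz(x[i], theta[i], b) for i in range(n_persons)])
lz_ab = np.array([lz(x_ab[i], theta[i], b) for i in range(n_persons)])

# The quantities such studies tabulate, at the conventional one-sided cut
print("Type I error:", np.mean(lz_norm < -1.645))
print("Power:       ", np.mean(lz_ab < -1.645))
```

A full study would cross test length, sample size, type and proportion of aberrance, and the ability estimator, which is exactly the design space whose alignment with real-life testing the review scrutinizes.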

65 citations


Journal Article
TL;DR: In this paper, the authors examine the impact of ability-difficulty fit on test-taking motivation and emotion, which is rarely considered when interpreting test results, and discuss the application of computerized adaptive testing as a possible solution.
Abstract: Usually, it is assumed that achievement tests measure maximum performance. However, test performance is associated not only with ability but also with motivational and emotional aspects of test-taking. These aspects are influenced by individual success probability, which in turn depends on the ratio of individual ability to item difficulty (ability-difficulty fit). The impact of ability-difficulty fit on test-taking motivation and emotion is unknown and rarely considered when interpreting test results. N = 9,452 ninth-graders in Germany (PISA 2006) completed a mathematics test and a questionnaire on test-taking effort (motivation) and boredom/daydreaming (emotion). Overall, mean item difficulty exceeded individual ability. Ability-difficulty fit was positively and linearly related to effort and boredom/daydreaming. The results suggest that low-ability students may not show maximum performance in a sequential achievement test. Thus, test score interpretation for this subsample may be invalid. As a solution to this problem, the application of computerized adaptive testing is discussed.
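In IRT terms, the success probability behind ability-difficulty fit is typically a function of the gap between ability θ and item difficulty b. The following sketch assumes a simple Rasch model (the actual PISA scaling model is not reproduced here) to show how a difficulty distribution centered above a student's ability drives down the expected share of items solved:

```python
import numpy as np

def p_success(theta, b):
    # Rasch model: success probability depends on ability minus difficulty
    return 1.0 / (1.0 + np.exp(-(theta - b)))

b = np.linspace(-1.0, 3.0, 30)     # difficulties with mean above 0: a hard test
for theta in (-1.0, 0.0, 1.0):     # low-, average-, high-ability student
    share = p_success(theta, b).mean()
    print(f"theta = {theta:+.1f}  expected share of items solved: {share:.2f}")
```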

46 citations


Journal Article
Roza Leikin
TL;DR: In this article, an integrative theoretical framework for the evaluation of mathematical creativity was developed based on works devoted to both general and mathematical creativity, and the model was used to examine differences in creativity of students with different levels of excellence in mathematics.
Abstract: This paper presents an original model for evaluation of mathematical creativity. I describe different stages of the model's development and justify critical decisions taken throughout, based on the analysis of the model's implementation. The model incorporates an integrative theoretical framework that was developed based on works devoted to both general and mathematical creativity. The scoring scheme for the evaluation of creativity, which is an important part of the model, combines an examination of both divergent and convergent thinking as reflected in problem solving processes and outcomes. The theoretical connection between creativity and divergent thinking is reflected in the multiplicity component of the model, which is based on the explicit requirement to solve mathematical problems in multiple ways. It is evaluated for fluency and flexibility. The connection between creativity and convergent thinking is reflected in the component of insight, which is based on the possibility to produce insight-based solutions to mathematical problems. I provide examples from the study in which the model is used to examine differences in creativity of students with different levels of excellence in mathematics and different levels of general giftedness.

44 citations


Journal Article
TL;DR: In this article, the authors investigated some psychometric properties of the Creative Scientific Ability Test (C-SAT), a domain-specific test of scientific creativity which was developed based on the Scientific Discovery as Dual Search model and pioneering works on divergent thinking.
Abstract: The assessment of creativity has been a controversial issue in the studies of creativity. Contrary to old paradigms, contemporary researchers support the use of domain-specific tests to measure creativity. The purpose of this study was to investigate some psychometric properties of the Creative Scientific Ability Test (C-SAT), a domain-specific test of scientific creativity. The C-SAT was developed based on the Scientific Discovery as Dual Search model and pioneering works on divergent thinking. The test is composed of five subtests and measures fluency, flexibility, and creativity, as well as hypothesis generation, hypothesis testing, and evidence evaluation, in five areas of science. In the study, the C-SAT was administered to 288 sixth-grade students in a city in central Turkey. Factor validity analysis revealed the presence of one component, and concurrent validity analysis showed that mathematically talented students scored significantly higher on the C-SAT than did average students. Reliability values of the C-SAT ranged from good (.85) to excellent (.96), and all of the item discrimination correlations were medium or large. Research findings show that the C-SAT can be used as an objective measure of scientific creativity.

44 citations


Journal Article
TL;DR: The Actiotope Model of Giftedness as discussed by the authors focuses on the person-environment interactions and postulates that successful learning requires necessary resources, termed educational and learning capital, located both in the environment and the individual.
Abstract: Unlike traditional person-centered models of giftedness, the Actiotope Model of Giftedness focuses on person-environment interactions. It postulates that successful learning requires necessary resources, termed educational and learning capital, located both in the environment and the individual. The Questionnaire of Educational and Learning Capital (QELC) is introduced. The results of a validation study with students from China, Turkey and Germany are reported, showing that the QELC has satisfactory psychometric properties as well as construct and concurrent validity.

36 citations


Journal Article
TL;DR: In this paper, the authors analyzed the robustness of the t and the Wilcoxon-Mann-Whitney U tests under various degrees of positive and negative correlation, population distributions, sample sizes, and true differences in location.
Abstract: A large part of previous work dealt with the robustness of parametric significance tests against non-normality, heteroscedasticity, or a combination of both. The behavior of tests under violations of the independence assumption has received comparatively less attention. Therefore, in applications, researchers may overlook that the robustness and power properties of tests can vary with the sign and the magnitude of the correlation between samples. The common paired t test is known to be less powerful in cases of negative between-group correlations. In this case, Bortz and Schuster (2010) recommend the application of the nonparametric Wilcoxon test. Using Monte Carlo simulations, we analyzed the behavior of the t and the Wilcoxon tests for the one- and two-sample problem under various degrees of positive and negative correlation, population distributions, sample sizes, and true differences in location. It is shown that even minimal departures from independence heavily affect the Type I error rates of the two-sample tests. In addition, results for the one-sample tests clearly suggest that the sign of the underlying correlation cannot be used as a basis to decide whether to use the t test or the Wilcoxon test. Both tests show a dramatic power loss when samples are negatively correlated. Finally, in these cases, the well-known power advantage of the Wilcoxon test diminishes when distributions are skewed and samples are small.

Key words: robustness, power, independence assumption, t test, Wilcoxon test

Ever since the work of W. S. Gosset ('Student', 1908) and R. A. Fisher (1925) on statistical inference about differences in means (Student's t test), a good deal of research has focused on the properties of the t statistic. When the assumptions of normality, homoscedasticity, and independence of observations are met, Student's two-sample t test was shown to be the optimal procedure for the comparison of means from independent samples (Hodges & Lehmann, 1956; Randles & Wolfe, 1979). However, in empirical data, violations of one or more assumptions might exist, and the robustness properties of significance tests are of great interest. Early theoretical findings suggest that the two-sample t test is fairly robust against violations of the normality assumption (e.g., Bartlett, 1935). This result was confirmed in numerous simulation studies (e.g., Boneau, 1960; Neave & Granger, 1968; Posten, 1978, 1984; Rasch & Guiard, 2004). Although the two-sample t test is able to protect the nominal significance level α under non-normality, considerable evidence exists that the nonparametric Wilcoxon-Mann-Whitney U test is robust and even more powerful under various non-normal distributions (Hodges & Lehmann, 1956; Neave & Granger, 1968; Randles & Wolfe, 1979). In addition, it has been demonstrated that the two-sample t test is robust against violations of equality of variances when sample sizes are equal (e.g., Hsu, 1938; Scheffé, 1970; Posten, Yeh, & Owen, 1982; Tuchscherer & Pierer, 1985; Zimmerman, 2006). When both variances and sample sizes are unequal, the probability of the Type I error exceeds the nominal significance level if the larger variance is associated with the smaller sample size, and vice versa (Moder, 2010; Wiedermann & Alexandrowicz, 2007; Zimmerman, 2006).
In this case, Welch's t test (Welch, 1938, 1947) is recommended as an adequate alternative (see also a recent reminder by Rasch, Kubinger, & Moder, 2011).

Although it is well known that the two-sample t test assumes independent observations, less attention has been paid to non-independence. Here, a distinction between between-group and within-group dependency has to be made. Between-group dependence refers to the fact that the observations of two samples are correlated (for example, data obtained from a matched-samples design or repeated observations). For the analysis of repeated measurements the term "one-sample problem" is commonly used, which underlines the fact that only one sample of research units is drawn from the underlying population of interest and the construct of interest is measured repeatedly (for details see Rasch, Kubinger, & Yanagida, 2011). …
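The core of the two-sample result is easy to reproduce. A hedged Monte Carlo sketch (not the authors' code; normal data, n = 30 per group, and 5,000 replications are arbitrary choices) applies the independent two-sample t test to samples that are in fact correlated and records the empirical Type I error:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def type1_rate(rho, n=30, reps=5000, alpha=0.05):
    """Rejection rate of the independent two-sample t test under H0
    when the two samples are correlated with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    hits = 0
    for _ in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        _, p = stats.ttest_ind(xy[:, 0], xy[:, 1])  # (wrongly) assumes independence
        hits += p < alpha
    return hits / reps

for rho in (-0.4, -0.2, 0.0, 0.2, 0.4):
    print(f"rho = {rho:+.1f}  empirical Type I error: {type1_rate(rho):.3f}")
```

Negative correlations should inflate the rejection rate well above the nominal .05, while positive correlations make the test overly conservative, which is the pattern the abstract describes.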

21 citations


Journal Article
TL;DR: In this paper, the authors examined the structure of the relationship between intelligence and mathematical giftedness, built a comprehensive model to describe this relationship and the nature of mathematical giftedness, and found that intelligence is a predictor of mathematical giftedness.
Abstract: This study aims to examine the structure of the relationship between intelligence and mathematical giftedness and to build a comprehensive model describing this relationship and the nature of mathematical giftedness. This study also purports to clarify the structure of the components of mathematical ability. The third objective is to examine whether students who were identified by two different instruments - (a) a mathematical ability and creativity instrument and (b) an intelligence instrument - have statistically significant differences across the components of mathematical ability. That is, we want to investigate whether variance in identification may be explained by variance in the mathematical abilities exhibited by these individuals. To achieve these goals, this study proposes a new domain-specific identification instrument for the assessment of mathematical giftedness, assessing mathematical abilities and creativity. The study was conducted among 359 4th, 5th and 6th grade elementary school students in Cyprus, using two instruments, one measuring mathematical ability and mathematical creativity and one measuring fluid intelligence. The results revealed that mathematical giftedness can be described in terms of mathematical ability and mathematical creativity. Moreover, the analysis illustrated that intelligence is a predictor of mathematical giftedness. Furthermore, the analysis revealed that different groups of students are identified by each type of testing, that is, through the mathematical instrument and the intelligence instrument. This variance may be explained by performance in specific categories of tasks.

Key words: giftedness, creativity, mathematical ability, intelligence

Introduction

In recent years, intelligence testing as the exclusive means of identification of giftedness has received extensive criticism from a number of researchers (Dai, 2010; Lohman & Rocklin, 1995). Contemporary conceptualizations of giftedness acknowledge the multidimensionality (Gagné, 2003; Renzulli, 1978, 2002) and the domain specificity of the concept (Csikszentmihalyi, 2000; Clark, 2002). Hence, former identification processes measuring giftedness solely with intelligence instruments should be enriched with other domain-specific instruments measuring all dimensions of giftedness.

In the field of mathematical giftedness, identification was in many cases conducted through intelligence testing with subtests designed to assess mathematical giftedness, such as the Naglieri Non-Verbal Ability Test (Naglieri, 1997), the Wechsler Intelligence Scale for Children Matrix Reasoning Test (Wechsler, 1999) and the Raven Progressive Matrices (Raven, Raven, & Court, 2003). These subtests focus on visual perception, spatial ability and the ability to distinguish patterns and find missing elements. At the same time, instruments designed to capture mathematical giftedness, such as the TOMAGS (Ryser & Johnsen, 1998), include tasks that are aligned with curriculum standards in mathematics, hence measuring mathematical knowledge more than mathematical reasoning processes. In contrast, we would argue that identification should attempt to capture students' mathematical reasoning abilities rather than mathematical knowledge, because reasoning skills distinguish gifted from non-gifted students in mathematics.

Given these facts, the concept of giftedness should be expanded in order to encompass contemporary conceptions, models and approaches. Thus, mathematical giftedness may be expressed as a multidimensional construct that is domain specific.
To this end, it is the purpose of this study to integrate mathematical abilities and creativity into the assessment of mathematical giftedness through a theoretical model and to translate it into an empirically examined identification process of mathematical giftedness. At the same time, we will show that there is a discrepancy in identifying mathematically gifted students with conventional intelligence tests and specifically designed mathematical instruments, thus suggesting a new way of identifying giftedness in mathematics. …

19 citations


Journal Article
TL;DR: Tirri and Nokelainen as mentioned in this paper introduced a conceptual definition of multiple intelligences based on the Multiple Intelligences theory of Howard Gardner and developed several self-assessment instruments that can be used in educational settings.
Abstract: This paper is about issues relating to the assessment of multiple intelligences. The first section introduces the authors' work on building measures of multiple intelligences and moral sensitivities. It also provides a conceptual definition of multiple intelligences based on Multiple Intelligences theory by Howard Gardner (1983). The second section discusses the context specificity of intelligences and alternative approaches to measuring multiple intelligences. The third section analyses the validity of self-evaluation instruments and provides a case example of building such an instrument. The paper ends with concluding remarks.

Key words: Giftedness, multiple intelligences theory, MIPQ, CFA, Bayesian modeling

Introduction

In this paper, we introduce our work on building measures of multiple intelligences and moral sensitivities based on the Multiple Intelligences theory of Howard Gardner (1983, 1993). We have developed several instruments for self-assessment that can be used in educational settings (Tirri & Nokelainen, 2011). Gardner's theory of Multiple Intelligences (MI) focuses on the concept of an 'intelligence', which he defines as "the ability to solve problems, or to create products, that are valued within one or more cultural settings" (Gardner, 1993, p. x). Gardner lists seven intelligences that meet his criteria for an intelligence, namely linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, and intrapersonal (Gardner, 1993, p. xi).

In a broad sense, Gardner views his theory as a contribution to the tradition advocated by Thurstone (1960) and Guilford (1967) because all these theories argue for the existence of a number of factors, or components, of intelligence. All these theories also view intelligence as being broader and multidimensional rather than a single, general capacity for conceptualization and problem-solving. Gardner differs from the other pluralists, however, in his attempt to base MI theory upon neurological, evolutionary, and cross-cultural evidence (Gardner, 1993, p. xii). In the first edition of his MI theory, thirty years ago, Gardner (1983) adopted a very individualistic point of view in exploring various intelligences. In a newer edition of MI theory, however, Gardner (1993) places more emphasis on the cultural and contextual factors involved in the development of the seven intelligences. Gardner retained the original seven intelligences, but acknowledged the possibility of adding new intelligences to the list. For example, he has worked on an eighth intelligence - the intelligence of the naturalist - to be included in his list of multiple intelligences (Gardner, 1995, p. 206).

Robert Sternberg identifies Gardner's MI theory as a systems approach, similar to his own triarchic theory. Although he appreciates Gardner's assessments at a theoretical level, he believes them to be a psychometric nightmare. The biggest challenge for advocates of Gardner's approach, then, is to demonstrate the psychometric soundness of their instrument. Sternberg is calling for hard data that would show that the theory works operationally in a way that will satisfy scientists as well as teachers. Sternberg's own theory promises the broader measurement implied by the triarchic theory (Sternberg, 1985).
His theory provides "process scores for componential processing, coping with novelty, automatization, and practical-contextual intelligence, and content scores for the verbal, quantitative, and figural content domains" (Sternberg, 1991, p. 266).

Sternberg's observations on Gardner's theory should be kept in mind in attempts to create tests based on his theory. However, in the educational setting his theory can be used as a framework in planning a program that would meet the needs of different learners (Tirri, 1997). Gardner has shown a special interest in how schools encourage the different intelligences in students (Gardner, 1991). …

18 citations


Journal Article
TL;DR: Schneider et al. as discussed by the authors modeled the position effect in a test with a time limit and found that additionally modeling interruption due to the time limit and working speed substantially improved model fit; the best-fitting model combined a linearly increasing position effect with a logistic decrease in the more difficult items.
Abstract: The position effect is a possible source of impairment of the structural validity of a test concerning model fit. In the case of tests with a time limit, the situation is further complicated by the decreasing number of participants completing the last few items of the test. Therefore, it is assumed that an appropriate representation of the position effect must additionally consider interruption due to the time limit and the effect of working speed. Interruption can be represented by the same latent variable as the position effect, whereas the contribution of working speed requires another one. Confirmatory factor models including a representation of the position effect as a linear, quadratic or logarithmic increase were compared with models additionally considering interruption as a logistic decrease or simply as immediate interruption. Furthermore, there were models additionally considering working speed. In the sample of 305 participants, the investigation of probability-based covariances made apparent that modeling interruption and also working speed substantially improved model fit. The best-fitting model was characterized by a linearly increasing representation of the position effect combined with a logistic decrease in the more difficult items and a contribution due to working speed.

Key words: position effect, confirmatory factor analysis, tau-equivalent model, method effect

The investigation of the position effect started in the 1950s (Campbell & Mohr, 1950). Since that time, a position effect has been observed in the items of many personality and ability measures. The position effect describes the dependency of the responses to the items on the responses to items that have been processed immediately before. The position effect is especially obvious in the change of item reliability, which was found to increase from the first to the last items of a scale (Knowles, 1988). Since this effect is observable whenever items representing the same ability or trait must be processed successively, it is considered by some researchers as a source of method variance (Spector, 2006; Spector & Brannick, 2010). What has not so far been considered in this research are the consequences of a time limit on test administration. Since a position effect can only be expected to occur when test takers actively process items, a time limit may impair or even eliminate the position effect, and individual differences in working speed may become apparent instead.

In the late 1980s, the position effect came into the focus of IRT research. Since that time, a number of studies have accumulated demonstrating that it is possible to identify a position effect by means of IRT models (e.g., Embretson, 1991; Gittler & Wild, 1989; Hohensinn, Kubinger, Reif, Holocher-Ertl, Khorramdel, & Frebort, 2008; Hohensinn, Kubinger, Reif, Schleich, & Khorramdel, 2011; Kubinger, 2003; Kubinger, Formann, & Farkas, 1991; Verguts & De Boeck, 2000). The models were mostly either multidimensional Rasch models or linear logistic test models. In the multidimensional Rasch models, the ability or trait on one hand and the position effect on the other hand are represented independently of each other, whereas in the linear logistic test models one dimension serves the simultaneous representation of the ability or trait and the position effect.
A specificity of the linear logistic test models (LLTM) is the modeling of the position effect by appropriately selected numbers (e.g., Kubinger, 2003). Within the IRT approach of investigating the position effect, a specific source of the effect was proposed: learning as the result of becoming familiar with completing the items (Embretson, 1991; Verguts & De Boeck, 2000).

An even more recent development is the investigation of the position effect in the framework of confirmatory factor analysis (CFA) (Hartig, Holzl, & Moosbrugger, 2007; Ren, Goldhammer, Moosbrugger, & Schweizer, 2012; Schweizer, 2012a; Schweizer, Schreiner, & Gold, 2009; Schweizer, Troche, & Rammsayer, 2011). …
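The competing CFA specifications can be pictured as alternative fixed-loading patterns on a position-effect factor. The sketch below constructs such patterns for a hypothetical 20-item test; the scaling of the loadings and the logistic midpoint and steepness are invented for illustration and are not the parameter values from the study:

```python
import numpy as np

n_items = 20
pos = np.arange(1, n_items + 1)

# Candidate fixed loadings for the position-effect factor (scaled to max 1)
linear = pos / n_items
quadratic = (pos / n_items) ** 2
logarithmic = np.log(pos) / np.log(n_items)

# Interruption under a time limit: a logistic decrease around the item
# position where many examinees run out of time (hypothetical values)
midpoint, steepness = 15.0, 1.5
interruption = 1.0 / (1.0 + np.exp(steepness * (pos - midpoint)))

# Combined pattern: the position effect builds up, then is damped
combined = linear * interruption
print(np.round(combined, 2))
```

In the reported best-fitting model, such a combined increasing-then-damped pattern (plus a separate working-speed factor) outperformed the purely increasing specifications.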

12 citations


Journal Article
TL;DR: An approach to detecting and accounting for DIF using longitudinal data in which covariation within individuals over time is accounted for by clustering on person is developed and applied to data from English speakers in the Canadian Study of Health and Aging.
Abstract: Many constructs are measured using multi-item data collection instruments. Differential item functioning (DIF) occurs when construct-irrelevant covariates interfere with the relationship between construct levels and item responses. DIF assessment is an active area of research, and several techniques are available to identify and account for DIF in cross-sectional settings. Many studies include data collected from individuals over time; yet appropriate methods for identifying and accounting for items with DIF in these settings are not widely available. We present an approach to this problem and apply it to longitudinal Modified Mini-Mental State Examination (3MS) data from English speakers in the Canadian Study of Health and Aging. We analyzed 3MS items for DIF with respect to sex, birth cohort and education. First, we focused on cross-sectional data from a subset of Canadian Study of Health and Aging participants who had complete data at all three data collection periods. We performed cross-sectional DIF analyses at each time point using an iterative hybrid ordinal logistic regression/item response theory (OLR/IRT) framework. We found that item-level findings differed at the three time points. We then developed and applied an approach to detecting and accounting for DIF using longitudinal data in which covariation within individuals over time is accounted for by clustering on person. We applied this approach to data for the "entire" dataset of English speaking participants including people who later dropped out or died. Accounting for longitudinal DIF modestly attenuated differences between groups defined by educational attainment. We conclude with a discussion of further directions for this line of research.
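The logistic-regression side of the hybrid OLR/IRT approach reduces, for a single dichotomous item, to a nested-model comparison: does a group covariate improve prediction of the item response beyond the trait estimate? The sketch below illustrates that uniform-DIF test with simulated data and statsmodels; the full method in the paper is ordinal, iterative, anchor-purified, and clustered on person, none of which is shown here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
trait = rng.normal(size=n)           # e.g., an anchor-based trait estimate
group = rng.integers(0, 2, size=n)   # e.g., education: 0 = low, 1 = high

# Simulate one item with uniform DIF: easier for group 1 at equal trait levels
logit = 1.2 * trait + 0.6 * group - 0.3
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Nested logistic models: trait only vs. trait + group
m0 = sm.Logit(y, sm.add_constant(trait)).fit(disp=0)
m1 = sm.Logit(y, sm.add_constant(np.column_stack([trait, group]))).fit(disp=0)

lr = 2 * (m1.llf - m0.llf)  # likelihood-ratio statistic, 1 df
print(f"LR = {lr:.2f} (compare to the chi-square(1) cut-off of 3.84)")
```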

11 citations


Journal Article
Abstract: Despite the widespread use of standardized IQ tests to measure human intelligence, problems with such measures have led some to suggest that better indices may derive from measurement of cognitive processes underlying performance on IQ tests (e.g., working memory capacity). However, measures from both approaches may exhibit performance biases in favour of majority groups, due to the influence of prior learning and experience. Mental attentional (M-) capacity is proposed to be a causal factor underlying developmental growth in working memory. Measures of M-capacity index important cognitive variance underlying performance on standardized intelligence tests. These measures appear to be reasonably culture-fair and invariant across content domains. The current study tested theoretical predictions regarding the content-invariance of M-measures and the development of M-capacity for groups of children differing in performance on standardized IQ tests. Ninety-one participants differentiated on the basis of academic stream (intellectually gifted vs. mainstream) and age (grade 4 vs. grade 8) received measures of M-capacity in the verbal and visuo-spatial domains. Children identified as gifted scored about one stage higher on both measures. Results suggest that measures of M-capacity may be useful adjuncts to standardized intelligence measures.

Key words: mental attention, working memory, intelligence, IQ, giftedness

Development of the IQ test to measure human intelligence has been lauded as one of the greatest achievements in the history of psychology (Nisbett et al., 2012). Advocates of IQ testing point to evidence that IQ scores in childhood are predictive of length of schooling (Neisser et al., 1996), academic success (Brody, 1997; Deary, Strand, Smith, & Fernandes, 2007; Gottfredson, 2004; Neisser et al., 1996; Nisbett et al., 2012), socioeconomic and vocational success (Firkowska-Mankiewicz, 2011; Gottfredson, 2004; Neisser et al., 1996; Schmidt & Hunter, 1998, 2004; Strenze, 2007), and even cognitive declines in late adulthood (Bourne, Fox, Deary, & Whalley, 2007). Intellectually precocious children, as identified by exceptionally high scores on standardized intelligence tests, display heightened performance in areas such as mathematics (Hoard, Geary, Byrd-Craven, & Nugent, 2007), speed and efficiency of cognitive processing (Jausovec, 1998; Johnson, Im-Bolter, & Pascual-Leone, 2003; Saccuzzo, Johnson, & Guertin, 1994), and resistance to interfering stimuli (Johnson et al., 2003). On the strength of these findings, IQ measures have been widely adopted for selection, placement, and decision-making in educational, vocational, clinical, and research settings (Richardson, 2002; Weinberg, 1989).

From this perspective, intelligence is viewed as a cognitive trait that can be reliably measured by IQ tests to yield scores that are related (perhaps causally) to superior cognitive performances and achievements across the lifespan. It has been argued that this cognitive trait is highly stable and largely resistant to meaningful long-term change (Herrnstein & Murray, 1994; Murray, 1996; Rushton, 1995).
Numerous studies have demonstrated, however, both the short-term and long-term malleability of intelligence (as measured by IQ tests), as in the case of increased IQ scores after adoption into a more affluent family (Capron & Duyme, 1989; Duyme, Dumaret, & Tomkiewicz, 1999; van Ijzendoorn, Juffer, & Poelhuis, 2005), initial IQ gains and occasional later regression after cognitive training (Campbell, Pungello, Miller-Johnson, Burchinal, & Ramey, 2001; Jaeggi, Buschkuehl, Jonides, & Perrig, 2008; Klingberg, Forssberg, & Westerberg, 2002; Mackey, Hill, Stone, & Bunge, 2011; Rueda, Rothbart, McCandliss, Saccomanno, & Posner, 2005; Wasik, Ramey, Bryant, & Sparling, 1990), change in IQ as a result of various non-cognitive interventions (e.g., nutritional changes, curing infection, increasing motivation; Duckworth, Quinn, Lynam, Loeber, & Stouthamer-Loeber, 2011; Johnson, Swank, Howie, Baldwin, & Owen, 1996; Nokes & Bundy, 1994; Schoenthaler, Amos, Eysenck, Peritz, & Yudkin, 1991), and the rise and decline of IQ scores with continued or delayed/disrupted schooling, respectively (Baltes & Reinert, 1969; Bedard & Dhuey, 2006; Brinch & Galloway, 2012; Ceci, 1991; Ceci & Gilstrap, 2000). …

Journal Article
TL;DR: Several latent trait models for the joint distribution of the responses and response times in rating scales are compared; according to the AIC index, the generalization of the model of Ranger and Ortner (2011) represents the data best.
Abstract: In this article several latent trait models for the joint distribution of the responses and response times in rating scales are compared. Among these models are two generalizations of established models for binary items, namely a generalization of the approach of Ferrando and Lorenzo-Seva (2007a) and a generalization of the approach of Ranger and Ortner (2011). Two new models and a variant of the hierarchical model of van der Linden (2007) are also considered. All these models combine the graded response model with a response time model based on the log-normal distribution. The models differ in the assumed relationship between the expected log response time and the underlying latent traits. Although the proposed models have different interpretations and implications they can all be calibrated within the same general framework using marginal maximum likelihood estimation and an application of the ECM-algorithm. The models are used for the analysis of an empirical data set. According to the AIC index, the generalization of the model of Ranger and Ortner (2011) can represent the data best.
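The log-normal building block shared by these models can be written down compactly. The sketch below evaluates the log-likelihood of a lognormal response time model in the spirit of van der Linden, in which log response time is normal with mean "item time intensity minus person speed"; the parameter values are hypothetical, and the paper's models extend this block by linking its mean to the latent traits in different ways:

```python
import numpy as np

def lognormal_rt_loglik(log_t, tau, beta, alpha):
    """Log-likelihood of observed log response times under the lognormal
    model: log T_i ~ Normal(beta_i - tau, 1 / alpha_i**2)."""
    resid = log_t - (beta - tau)
    return np.sum(np.log(alpha) - 0.5 * np.log(2 * np.pi)
                  - 0.5 * (alpha * resid) ** 2)

tau = 0.3                           # person speed (higher = faster)
beta = np.array([1.0, 1.4, 0.8])    # item time intensities
alpha = np.array([1.2, 1.0, 1.5])   # item time discriminations
log_t = np.log([2.1, 3.5, 1.6])     # observed response times

print(f"log-likelihood: {lognormal_rt_loglik(log_t, tau, beta, alpha):.3f}")
```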

Journal Article
TL;DR: In this article, the authors suggest that instruments of neuro-cognitive research enable the evaluation of giftedness in mathematics and suggest that mathematically gifted students are those who are both generally gifted and excel in mathematics.
Abstract: In this paper we suggest that instruments of neuro-cognitive research enable the evaluation of giftedness in mathematics. We start with a literature review on the related topics presented so as to situate our suggestions within the existing research on giftedness and excellence in mathematics. This literature review allows us later to discuss our findings, which are based on neurocognitive data collected in a large-scale multidimensional examination of mathematical giftedness. The sampling procedure in the study was performed based on two orthogonal (in our view) characteristics: general giftedness (G) and excellence in mathematics (EM). In this paper we present findings that lead to a definition of the mathematically gifted population, and we present selected results to provide evidence for these findings. We demonstrate three major findings:

A. Effects of G and EM factors are task-dependent both in behavioral and neurophysiological measures: the EM factor has significant main effects on tasks that require implementation of knowledge familiar to students from school mathematics. By contrast, the G factor has a significant main effect on insight-based problems which are not part of the school mathematical curriculum and, thus, require original mathematical reasoning.

B. Mathematical performance in gifted students who excel in mathematics (G-EM students) on insight-based tasks has specific characteristics in both behavioral and electrophysiological results.

C. G-EM participants exhibited superior performance in all the tests, showing a constant neuro-efficiency effect.

Based on these observations we suggest that mathematically gifted students are those who are both generally gifted and excel in mathematics.

Key words: Giftedness, Excellence in mathematics, Neurocognition, Evaluation, Problem solving

Introduction

Exceptional performance in mathematics, or mathematical giftedness, appears to be a topic of great interest to researchers and educators. In the scientific and pedagogical literature, there is wide and multilateral discussion on the subject of mathematical giftedness, its nature and main characteristics, educational challenges and perspectives associated with this phenomenon, and last (but not least) on the principles and methods of identification and assessment of mathematical giftedness. Even so, evaluation of individuals who are presented as overperformers or excelling in the field of mathematics is not an easy matter due to the lack of strong definitions of the phenomenon of mathematical giftedness. Consequently, development of tools for the evaluation of individual abilities (especially high abilities) in the field of mathematics is not sufficient. Applying brain research to the study of mathematical giftedness seems to be of timely importance and can lead to an operative definition of mathematical giftedness and consequently to the development of tools that enable researchers to identify mathematical giftedness.

1. Background

1.1 Giftedness and excellence in mathematics

Mathematical giftedness is an extremely complex construct which implies high mathematical abilities. The construct of mathematical giftedness bridges two fields of educational psychology: gifted education and mathematics education. In the field of gifted education, mathematical giftedness is usually considered to be a distinct type of specific giftedness which is opposed to general giftedness (Piirto, 1999).
General giftedness is usually measured by means of IQ tests, whereas mathematical giftedness implies high achievement in mathematics demonstrating strong mathematical skills. Mathematical giftedness among research mathematicians is expressed in the generation of original mathematical ideas and proofs that lead to advancements in mathematical theory. This leads to the question of how mathematical giftedness can be identified in school children. This paper examines brain activity associated with mathematical problem solving in 10th and 11th grade students. …

Journal Article
TL;DR: In this article, the influence of fine motor skills (FMS) and attention on underachievement and achievement was explored. And the results indicated that underachievers had lower attention and FMS, and that attention mediated the relation between FMS and maths achievement.
Abstract: Underachievers are children who show academic performance that is lower than what would be expected for their IQ. Previous research has investigated a number of variables that might explain underachievement and recently fine motor skills (FMS) have been implicated as playing an important role. We extend this work by exploring the influence of FMS and attention on underachievement and achievement. Fourth-grade children in Germany (n = 357, age = 10.8) were tested on measures of intelligence, attention, and FMS, and teachers were asked to report grades in mathematics. Amongst other findings, analyses indicated that underachievers had lower attention and FMS and that attention mediated the relation between FMS and maths achievement. Overall, the current findings contribute to the growing body of evidence that FMS play an important role in underachievement and are, therefore, a candidate for inclusion in the identification processes.

Journal Article
TL;DR: In this paper, the authors examined whether the unidimensional sequential probability ratio test (SPRT) can be productively combined with multidimensional adaptive testing (MAT) and concluded that MAT will result in a higher percentage of correct classifications than UCAT when more than two dimensions are measured.
Abstract: It is examined whether the unidimensional Sequential Probability Ratio Test (SPRT) can be productively combined with multidimensional adaptive testing (MAT). With a simulation study, it is investigated whether this combination results in more accurate simultaneous classifications on two or three dimensions compared to several instances of unidimensional adaptive testing (UCAT) in combination with the SPRT. The number of cut scores and the correlation between the dimensions measured were varied. The average test length was mainly influenced by the number of cut scores (one, four) and the adaptive algorithm (MAT, UCAT). With MAT, a lower average test length was achieved in comparison to the UCAT. It is concluded that MAT will result in a higher percentage of correct classifications than UCAT when more than two dimensions are measured.

Key words: classification, computerized adaptive testing, item response theory, multidimensional adaptive testing, sequential probability ratio test

Multidimensional adaptive testing (MAT) is a special approach to the assessment of two or more latent abilities in which the selection of the test items presented to the examinee is based on the responses given by the examinee to previously administered items (e.g., Frey & Seitz, 2009). The main advantage of MAT is its capacity to substantially increase measurement efficiency compared to sequential testing or unidimensional computerized adaptive testing (UCAT). Most studies on MAT focus on its application for assessing individual abilities located on continuous scales. Currently, only very little is known about the capabilities of MAT regarding the classification of test takers into one of several ability categories (e.g., pass vs. fail). To fill this gap, the present paper focuses on the combination of MAT with the sequential probability ratio test (SPRT; e.g., Kingsbury & Weiss, 1983; Reckase, 1983). The SPRT is a classification method that has already been used successfully in combination with UCAT (e.g., Eggen, 1999; Eggen & Straetmans, 2000; Spray & Reckase, 1996; Thompson, 2007b).

Regarding MAT, Spray, Abdel-fattah, Huang, and Lau (1997) made an attempt to modify the SPRT in order to use it with MAT based on items with within-item multidimensionality. Items with within-item multidimensionality are allowed to measure more than one dimension simultaneously (Wang, Wilson, & Adams, 1997). Dealing with within-item multidimensionality, the multidimensional item response theory (IRT) model used with MAT is a compensatory model (e.g., Reckase, 2009). With such an IRT model, the linear combination of the abilities measured leads to a curvilinear function. Therefore, the test statistic of the SPRT, which is a likelihood ratio test, cannot be updated by the two unique values required by the SPRT. For details, see Spray et al. (1997). Considering multidimensional pass-fail tests, Spray and colleagues did not find a satisfactory solution for implementing a multidimensional SPRT into such a MAT.

Nevertheless, from a practical point of view, tests entailing items measuring exactly one dimension each (between-item multidimensionality) are much more common than tests based on an item pool with within-item multidimensionality. Hence, the present paper focuses on the combination of MAT and the SPRT for items with between-item multidimensionality.
Note that when the MAT approach of Segall (1996) is used for items with between-item multidimensionality, information from items which measure one dimension is used as information about the person's score on other dimensions. This is done by incorporating assumptions about the multivariate ability distribution in terms of correlations between the measured dimensions. Several studies showed that using this information results in a substantial increase in measurement efficiency compared to using several unidimensional adaptive tests (e.g., …
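For between-item multidimensionality, the SPRT itself stays unidimensional: on each dimension, the log-likelihood ratio of two abilities placed symmetrically around the cut score is compared with Wald's thresholds after every item. The sketch below shows this decision rule under a Rasch model; the indifference-region half-width delta and the error rates are illustrative values, not those used in the study.

```python
import numpy as np

def sprt_decision(x, b, theta_cut, delta=0.3, alpha=0.05, beta=0.05):
    """SPRT for a pass/fail decision under the Rasch model:
    H0: theta = theta_cut - delta  vs.  H1: theta = theta_cut + delta."""
    def loglik(theta):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

    llr = loglik(theta_cut + delta) - loglik(theta_cut - delta)
    if llr >= np.log((1 - beta) / alpha):   # upper Wald bound: accept H1
        return "pass"
    if llr <= np.log(beta / (1 - alpha)):   # lower Wald bound: accept H0
        return "fail"
    return "continue testing"

# Hypothetical interim state after ten administered items
b = np.linspace(-1.0, 1.0, 10)                # item difficulties
x = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0])  # responses so far
print(sprt_decision(x, b, theta_cut=0.0))
```

With several dimensions, one such test runs per dimension, and testing continues until every dimension has reached a classification or the maximum test length is hit.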

Journal Article
TL;DR: In this article, the authors investigated the measurement invariance of the PISA 2000 and PISA 2009 reading instruments using Item Response Theory models and found that the instruments are not measurement invariant and that some link items show large differences in item difficulty.
Abstract: An important pre-requisite of trend analyses in large scale educational assessments is the measurement invariance of the testing instruments across cycles. This paper investigates the measurement invariance of the PISA 2000 and PISA 2009 reading instruments using Item Response Theory models. Links between the PISA 2000 and PISA 2009 instruments were analyzed using data from a sample tested in 2009 which took both the PISA 2000 and PISA 2009 instruments and additionally using part of the German PISA 2000 sample. Model fit comparisons showed that the instruments are not measurement invariant and that some link items show large differences in item difficulty. Position effects may explain some of these differences and may also influence the size of the link error.
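A standard way to quantify the uncertainty that such non-invariant link items introduce is the PISA-style link error: the standard deviation of the link items' difficulty shifts divided by the square root of the number of link items. The sketch below implements this common formulation with invented difficulty values; it is illustrative and not the exact estimator the paper evaluates:

```python
import numpy as np

def link_error(b_old, b_new):
    """Link error as used in PISA trend reporting: sd of the item
    difficulty shifts of the link items divided by sqrt(L)."""
    d = np.asarray(b_new) - np.asarray(b_old)
    L = d.size
    return np.sqrt(np.sum((d - d.mean()) ** 2) / (L * (L - 1)))

# Hypothetical difficulties (logits) of six link items in two calibrations
b_2000 = [-0.8, -0.2, 0.1, 0.5, 0.9, 1.3]
b_2009 = [-0.7, -0.1, 0.4, 0.5, 1.2, 1.3]
print(f"link error: {link_error(b_2000, b_2009):.3f} logits")
```

Link items with large difficulty shifts, such as the position-affected items discussed in the paper, directly inflate this quantity.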

Journal Article
TL;DR: In this paper, the authors present a method for calculating the classification accuracy of composite scores using a two-stage application of the polytomous extension of the Lord-Wingersky recursive algorithm.
Abstract: Presented is a demonstration of an intuitively simple, flexible and computationally inexpensive approach to estimating classification accuracy indices for composite score scales formed from the aggregation of performance on two or more assessments. This approach uses a two-stage application of the polytomous extension of the Lord-Wingersky recursive algorithm and can be driven by any IRT model with desired simplicity or required complexity to best represent the properties of the tests. The approach is demonstrated using operational data from a high-stakes mathematics qualification which is formed from two tests administered on distinct occasions. To provide the simplest representation of a test containing both dichotomous and polytomous items, the partial credit model is applied to model behaviour on the two tests. As an extension to this, a testlet model is applied to allow joint calibration of parameters from both tests. This model provides more information to the calibration process at the expense of some added computational complexity. Further to this, the potential application of this approach in the absence of operational data is investigated using a comparison of simulated data to the observed data.

Key words: Classification accuracy, IRT, composite scores

1 Introduction

The purpose of this paper is to present a new method for calculating the classification accuracy of composite scores. Wherever scores are reported as classifications, such as pass/fail or grade A to grade E, users of those scores have an interest in understanding how accurate those classification decisions are. Classification accuracy approaches provide an estimate of the accuracy of the grading through a comparison of the degree to which observed classifications agree with those based on examinees' true scores (Lee, Hanson, & Brennan, 2002; Livingston & Lewis, 1995). Composite score classification accuracy refers to the accuracy of classification when scores have been scaled or aggregated across multiple assessments (Livingston & Lewis, 1995). There are a number of benefits to understanding the extent of misclassification and the factors that influence it. These include being aware of the potential consequences when designing assessments (or combinations of assessments) used for a qualification, as part of assessment quality control monitoring processes, and also in educating users of qualification results in areas such as the over-interpretation of grades.

Many previous studies have considered classification accuracy for single assessments. Wheadon and Stockford (2011) presented an empirical evaluation of the classification accuracy and consistency of single assessments forming high-stakes qualifications in England. This adopted both a Classical Test Theory (CTT) and Item Response Theory (IRT) approach as previously implemented in other assessment contexts by Livingston and Lewis (1995) (CTT) and Lee (2008) (IRT). For applications of CTT approaches to classification accuracy see also Breyer and Lewis (1994), Hanson and Brennan (1990), Woodruff and Sawyer (1989), and Peng and Subkoviak (1980).
In greater depth, Verstralen and Verhelst (1991) have investigated the consequences of applying different IRT based measurement models for item calibration and accuracy calculation in an item banking scheme, with Lee, Hanson, and Brennan (2002), Wang, Kolen, and Harris (2000), and Bramley and Dhawan (2010) considering further IRT based approaches at the test level.

Regarding articulations of classification accuracy at the composite score level, He (2009) considers the extensions available for more conventional reliability indicators; however, composite score classification accuracy has only been considered in a limited number of studies. Van Rijn, Verstralen, and Beguin (2009), Douglas and Mislevy (2010) and Chester (2003) looked at the consequences of different decision rules applied to classify candidates based on composite scores, including consideration of the validity of the rules dependent on the content and aims of the assessment. …
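The recursion at the heart of the method is short enough to state in full. Given one examinee's category probabilities for each item at a fixed ability, the polytomous Lord-Wingersky algorithm builds the distribution of the summed score by repeated convolution; the item probabilities below are invented for illustration. In the paper's two-stage application, the same recursion is run for each of the two tests and the resulting score distributions are then combined into the composite distribution.

```python
import numpy as np

def lord_wingersky(category_probs):
    """Distribution of the summed score for one examinee at fixed ability.

    category_probs: list of 1-D arrays; the array for item i holds the
    probabilities of item scores 0..m_i (any IRT model can supply these)."""
    dist = np.array([1.0])                   # P(total = 0) before any item
    for p in category_probs:
        new = np.zeros(dist.size + p.size - 1)
        for s, q in enumerate(dist):         # current total score s
            for k, pk in enumerate(p):       # item score k
                new[s + k] += q * pk         # convolve the item into the total
        dist = new
    return dist

# Hypothetical examinee: two dichotomous items and one 0/1/2 item
probs = [np.array([0.3, 0.7]),
         np.array([0.5, 0.5]),
         np.array([0.2, 0.5, 0.3])]
dist = lord_wingersky(probs)
print(np.round(dist, 3), "sums to", dist.sum())
```

Classification accuracy then follows by comparing, at each quadrature point of the ability distribution, the mass of this score distribution falling in each grade band with the grade implied by the true score.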

Journal Article
TL;DR: In this article, the authors investigated the effect of item order on item bank construction, item calibration, and ability estimation for a computer adaptive test for anxiety and found that item order had little impact on item calibration and ability.
Abstract: Item banks are typically constructed from responses to items that are presented in one fixed order; therefore, order effects between subsequent items may violate the independence assumption. We investigated the effect of item order on item bank construction, item calibration, and ability estimation. 15 polytomous items similar to items used in a pilot version of a computer adaptive test for anxiety (Walter et al., 2005; Walter et al., 2007) were presented in one fixed order or in an order randomly generated for each respondent. A total of n = 520 out-patients participated in the study. Item calibration (Generalized Partial Credit Model) yielded only small differences in slope and location parameters. Simulated test runs using either the full item bank or an adaptive algorithm produced very similar ability estimates (expected a posteriori estimation). These results indicate that item order had little impact on item calibration and ability estimation for this item set.

Key words: item response theory; computer adaptive testing; local independence; item bank construction

1. Introduction

Local item independence is a central assumption of almost any application of Item Response Theory models. Items are locally independent if, for respondents at the same level of the underlying latent trait θ, responses to any given item are independent of responses to other items of the test (Henning, 1989). Local independence does not prevent items from correlating across the range of all observed ability levels, but it does imply a lack of correlation among items if the ability level is fixed. Therefore, local independence is a way to state that it is indeed the latent trait that explains the relations between item responses. Local independence may be violated if other person parameters, such as other latent traits, are involved in the responses. If this is the case, responses have to be explained by multiple latent variables rather than by one underlying latent trait only, and, therefore, the application of a unidimensional item response model may no longer be appropriate. Lack of independence can also ensue if the response to one item is no longer independent of the responses to previous items. This type of response dependence can occur when previous items contain clues to following items, and item order obviously plays an important role here. In the literature, these two types of item dependence, trait multidimensionality and response dependence, are often not clearly distinguished from each other, and checking an item bank for local independence is often simply referred to as "ensuring unidimensionality". This is particularly true when unidimensional item response models are used, which, despite the rising interest in multidimensional item response models (e.g., Reckase, 2009), are still dominant in practical applications of item response theory such as the construction of item banks for computer adaptive testing. Table 1 shows the steps required to construct an item bank for unidimensional computer adaptive testing (Walter, 2010). Local item independence and item order play a crucial role in this process. For the construction of the item bank, the order of presentation of the items is typically fixed. In an adaptive test, the item selection algorithm determines the order of presentation, and this order can vary for each respondent.

The purpose of the present study is to investigate the impact of item order on item bank construction.
The general idea is to compare item parameter estimates obtained from responses given to items presented in fixed order with item parameters estimated from responses given to items that were presented in random order. Numerical differences in item parameter estimates may or may not have significant impact on ability estimates. Practitioners are usually much more interested in the ability levels of respondents than in item parameters. The focus of this study is, therefore, on quantifying how much ability estimates differ when item banks are constructed from responses to items given in fixed versus random order. …
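The "expected a posteriori" step mentioned in the abstract is a small numerical integration. The sketch below shows EAP estimation for dichotomous Rasch items with a standard normal prior; the study itself used the generalized partial credit model for polytomous items, so this illustrates the machinery rather than reproducing the analysis:

```python
import numpy as np

def eap_estimate(x, b, nodes=61):
    """EAP ability estimate under the Rasch model, standard normal prior."""
    theta = np.linspace(-4.0, 4.0, nodes)               # quadrature grid
    prior = np.exp(-0.5 * theta ** 2)                   # N(0, 1), unnormalized
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    lik = np.prod(np.where(x.astype(bool), p, 1.0 - p), axis=1)
    post = prior * lik
    post /= post.sum()                                  # normalize the posterior
    return np.sum(theta * post)                         # posterior mean

b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # hypothetical item locations
x = np.array([1, 1, 1, 0, 0])               # one respondent's answers
print(f"EAP ability estimate: {eap_estimate(x, b):.2f}")
```

Comparing such estimates, person by person, between the fixed-order and random-order calibrations is exactly the comparison the study reports as showing little difference.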

Journal Article
TL;DR: This editorial introduces a special issue on assessing giftedness, opening with Howard, Johnson, and Pascual-Leone's paper on measuring mental attentional capacity in gifted and mainstream school students and Stoeger, Suggate, and Ziegler's paper on the role of fine motor skills and attention in underachievement.
Abstract: The terms 'giftedness' and 'intelligence' are widely used in everyday parlance. But as psychological terms both have been embroiled in heated debate, at times, as to their definition and their measurement. In the early part of the twentieth century, giftedness was equated with intelligence, strongly influenced by Lewis Terman's work in the development of the Stanford-Binet Intelligence Test and the longitudinal study published as Genetic Studies of Genius (Burks, Jensen, & Terman, 1930; Cox, 1926; Terman, 1926; Terman & Oden, 1947, 1959). It is not surprising, therefore, that the IQ test became the default means for identifying giftedness.

IQ tests, themselves, have engendered polarised viewpoints ranging from the laudatory to the denigratory. For example, the field of psychology has long regarded the IQ test as a success story and the American Academy for the Advancement of Science included it as one of the twenty most significant scientific discoveries of the twentieth century (Benson, 2003). At the other end of the spectrum, Stephen Jay Gould (1981) argued that the tests promoted racist agendas derived from their underlying principle of 'biological determinism'. Gould described biological determinism as "the abstraction of intelligence as a single entity, its location within the brain, its quantification as one number for each individual, and the use of these numbers to rank people in a single series of worthiness, invariably to find that oppressed and disadvantaged groups - races, classes, or sexes - are innately inferior and deserve their status" (pp. 24-25).

While debate about the validity of IQ tests - their value and limitations - continued throughout the twentieth century, the conceptualisations of intelligence were steadily expanding as researchers concluded that the richness and complexity of intelligence was not adequately captured by such tests. By the middle of the twentieth century, theorists were defining intelligence more broadly (see, e.g., Gardner, 1983; Guilford, 1967; Sternberg, 1984).

Given the link between intelligence and giftedness, it is not surprising that the concept of giftedness was also broadening during the twentieth century. The expanded notions of giftedness demanded means of identification that went beyond IQ tests, largely so that children from disadvantaged backgrounds could be more readily identified for gifted programs.

It is beyond the scope of this editorial to comment on the ongoing issues related to the definitions of intelligence and giftedness, respectively. Suffice it to observe that both concepts have expanded in the last few decades and, as a result, research effort has been directed at developing assessments that capture the complexity of the concepts. The papers in this special issue share the aim of assessing - and, indeed, understanding - giftedness in some of its manifestations.

The first paper in the issue is entitled "Measurement of mental attention: Assessing a cognitive component underlying performance on standardized intelligence tests" and was contributed by Howard, Johnson, and Pascual-Leone (2013). The researchers address one of the limitations of IQ tests, which is that related to cultural 'fairness'.
They focus, instead, on mental attentional capacity, which they examine by comparing this capacity to intelligence in a sample of gifted and mainstream school students. The second paper, "Identifying the causes of underachievement: A plea for the inclusion of fine motor skills", by Stoeger, Suggate, and Ziegler (2013), considers the issue of discrepancies between a child's potential, as might be measured by an IQ test, and his or her school achievement. Where such discrepancies exist, the term 'underachievement' has been widely adopted in the giftedness literature (see, e.g., McCoach & Siegle, 2003). Stoeger and her colleagues explore the role of fine motor skills and attention in such under-performance. …

Journal Article
TL;DR: In this paper, three new predictions about the dynamics of top positions are formulated and tested with two samples from the world of sports: the best male chess players (individual sport) and male national soccer teams.
Abstract: The performance differences between successively ranked individuals tend to increase towards the top. However, the mathematical foundations of this effect remain largely unexplored. This article develops such foundations and shows that the effect is stable across various natural distributions of eminent achievement. Three new predictions about the dynamics of top positions are formulated and tested with two samples from the world of sports: the best male chess players (an individual sport) and male national soccer teams. The stabilization effect describes the phenomenon that the stability of ranks is higher among the top ranks. The reversed Matthew effect asserts that achievement gains among elite players and elite teams are positively correlated with their ranks (i.e., diminishing towards the top). In contrast, the Heraclitus effect predicts that the performance gains among the top ranks are nevertheless bigger than what can mathematically be expected from the position in the ranking. All three effects are empirically corroborated.
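The opening claim, that performance gaps between adjacent ranks widen towards the top, can be checked with a quick order-statistics simulation. The normal achievement distribution below is an assumption for illustration, not the distribution used by the author.

```python
import numpy as np

rng = np.random.default_rng(1)

# Average the gaps between adjacent ranks over many simulated "seasons"
# of 10,000 players with normally distributed achievement.
reps, n = 200, 10_000
gaps = np.zeros(n - 1)
for _ in range(reps):
    x = np.sort(rng.normal(size=n))[::-1]  # rank 1 = best
    gaps += -np.diff(x)                    # gap between rank k and rank k+1
gaps /= reps

print(f"rank 1-2 gap:     {gaps[0]:.3f}")
print(f"rank 10-11 gap:   {gaps[9]:.3f}")
print(f"rank 100-101 gap: {gaps[99]:.3f}")  # gaps shrink away from the top
```

Running this shows the rank 1-2 gap is several times larger than the rank 100-101 gap, which is the basic effect whose foundations the article formalizes.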

Journal Article
TL;DR: In this paper, the recovery of item and person parameters of the one-parameter logistic model for short tests administered to small samples was investigated, and the degree of mismatch likely to occur in practice has a relatively modest effect on parameter recovery.
Abstract: This simulation study investigated the recovery of item and person parameters of the one-parameter logistic model for short tests administered to small samples. A potential problem with such small-scale testing is a mismatch between the item and person location parameter distributions. In our study, we manipulated the match of these distributions as well as test length, sample size, and item discrimination. Results showed that the degree of mismatch likely to occur in practice has a relatively modest effect on parameter recovery. As expected, accuracy in parameter estimation decreased as sample size and test length decreased. Nevertheless, researchers investigating small-scale tests are likely to view parameter recovery as acceptable if a study has at least 100 subjects and 8 items.
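A stripped-down version of such a recovery study is easy to set up: simulate Rasch responses for a small sample, re-estimate the parameters, and compute the RMSE. The crude gradient-ascent joint maximum likelihood routine below is only a sketch (real studies would use proper CML/MML estimation), and the 100-subject, 8-item, one-logit mismatch scenario is taken from the abstract's own numbers.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_rasch(theta, b):
    """Simulate 0/1 responses under the one-parameter logistic model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def jml_estimate(X, n_iter=500, lr=1.0):
    """Crude joint maximum likelihood by gradient ascent (sketch only)."""
    theta = np.zeros(X.shape[0])
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = X - p                     # log-likelihood score contributions
        theta += lr * resid.mean(axis=1)  # ability updates
        b -= lr * resid.mean(axis=0)      # difficulty updates
        b -= b.mean()                     # identification: mean difficulty 0
    return theta, b

# 100 subjects, 8 items, item locations shifted one logit above the persons
theta_true = rng.normal(0.0, 1.0, size=100)
b_true = rng.normal(1.0, 1.0, size=8)
X = simulate_rasch(theta_true, b_true)
_, b_hat = jml_estimate(X)
rmse = np.sqrt(np.mean((b_hat - (b_true - b_true.mean())) ** 2))
print(f"item difficulty RMSE: {rmse:.3f}")
```

Repeating this over many replications, and crossing the mismatch shift with sample size and test length, reproduces the basic design the abstract describes.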

Journal Article
TL;DR: In this article, a study is presented that tries to explain and predict high academic achievement in children and adolescents on the basis of intellectual and non-intellectual determinants, in this case, performance-relevant personality traits as well as the social environment of stimulation.
Abstract: This paper presents a study that attempts to explain and predict high academic achievement in children and adolescents on the basis of intellectual and non-intellectual determinants – in this case, performance-relevant personality traits as well as the social environment of stimulation. The prognosis of high academic achievement is based on a new diagnostic model, the Viennese Diagnostic Model of High Achievement Potential, which undergoes its first empirical validation here. The results provide strong evidence that performance-relevant personality traits and categories of the social environment of stimulation contribute to high academic achievement in children and adolescents of above-average intelligence.

Journal Article
TL;DR: In this article, the effect of item order on item calibration and item bank construction for computer adaptive tests is investigated, as is the correlation of the individual difference between the estimated ability and the mean difficulty of the processed items with self-reported effort and boredom.
Abstract: Guest editorial. Part 2 of the special topic "Current issues in educational and psychological measurement: Design, calibration, and adaptive testing" of Psychological Test and Assessment Modeling continues the series of research papers dealing with empirical research questions related to calibration designs and computerized adaptive testing. This part includes three papers that add to the foregoing publications. The first paper, entitled "Effect of item order on item calibration and item bank construction for computer adaptive tests" by Walter and Rose (2013), focuses on the central independence assumption and its relation to item calibration designs, bridging the gap to the papers by Yousfi and Bohme (2012), Kubinger, Steinfeld, Reif and Yanagida (2012), and Frey and Bernhardt (2012) published in Part 1 of this special issue. Walter and Rose (2013) provide an experimental comparison investigating the effect of two different calibration designs on the estimated item parameters and on the ability estimates of simulated adaptive tests using the resulting item banks. The second paper, entitled "Too hard, too easy, or just right? The relationship between effort or boredom and ability-difficulty fit" by Asseburg and Frey (2013), turns the attention to motivational and emotional aspects of achievement tests and their relation to test performance. Similar to Hartig and Buchholz (2012), individual differences in measures derived from Item Response Theory are investigated. Analyzing data from a second testing day of the PISA 2006 assessment in Germany, Asseburg and Frey (2013) show how the individual difference between the estimated ability and the mean difficulty of the processed items correlates with self-reported effort and boredom. The final paper, entitled "The sequential probability ratio test for multidimensional adaptive testing with between-item multidimensionality" by Seitz and Frey (2013), analyzes the sequential probability ratio test (SPRT) that was also addressed by Patton, Cheng, Yuan and Diao (2012) in Part 1. In contrast to Patton et al. (2012), who use the SPRT in combination with unidimensional adaptive testing, Seitz and Frey (2013) examine this method for classifying individuals into one of several ability categories within multidimensional adaptive testing with between-item multidimensionality. As guest editors of both parts of this special topic, we would again like to thank the contributing authors for their elaborate and highly interesting articles, which provide new and important insights in the field of "Educational and Psychological Measurement". …
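The ability-difficulty fit measure that Asseburg and Frey (2013) relate to effort and boredom can be written down directly: the difference between a person's ability estimate and the mean difficulty of the items that person worked on, correlated with the self-report scale. All arrays in the sketch below are randomly generated placeholders, not PISA data.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

theta = rng.normal(size=n)              # placeholder ability estimates
mean_b = rng.normal(scale=0.5, size=n)  # placeholder mean difficulty of each
                                        # person's processed items
effort = rng.normal(size=n)             # placeholder self-reported effort

fit = theta - mean_b                    # ability-difficulty fit per person
r = np.corrcoef(fit, effort)[0, 1]
print(f"correlation of ability-difficulty fit with effort: {r:.2f}")
```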

Journal Article
TL;DR: As discussed in this article, he received his PhD from the psychometrics research group at the University of Vienna, and the next influential step in his formation was the postgraduate program at the Institute of Advanced Studies in Vienna, where he took courses from, among others, John Nelder and became familiar with the unifying framework of generalized linear models.
Abstract: He received his PhD from the University of Vienna in the famous and seminal psychometrics research group there, with Klaus Kubinger and Gerhard Fischer as his advisors. The next influential step in his formation was the postgraduate program at the Institute of Advanced Studies in Vienna, where he took courses, among others, from John Nelder and where he became familiar with the unifying framework of generalized linear models.

Journal Article
TL;DR: In this article, a rank ordering of preference of multiattributive choice alternatives is suggested, where the choice alternatives are characterised by several attributes (dimensions) which themselves are given by strict partial orders (rank orders with possible ties, or rating scales).
Abstract: A rank ordering of preference of multiattributive choice alternatives is suggested. The choice alternatives are characterised by several attributes (dimensions), which themselves are assumed to be given by strict partial orders (rank orders with possible ties, or "rating scales"). A dominance relation is defined on the alternatives: an alternative dominates another if it is at least as good as the other in all dimensions and strictly superior in at least one dimension. The result is a multidimensional partial order. The problem is to choose a single best alternative, or a given number of best alternatives, from the choice set. The solution must not involve comparisons of ranks across different dimensions, if the decision maker is a single individual, or of rank orders across different individuals, if the decision maker is a social group (social choice function). The (modified) percentile rank score (Scheiblechner, 2002, 2003) is suggested as the scaling function. The performance of the (modified) percentile rank score is illustrated with the decathlon results of the Olympic Games at Beijing 2008 and the World Championships at Berlin 2009.
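The dominance relation is stated precisely enough to code up directly; the scaling function is not, so the sketch below pairs the abstract's dominance check with a plain mean of per-dimension percentile ranks as an assumed simplification of Scheiblechner's (modified) percentile rank score.

```python
import numpy as np

def dominates(a, b):
    """True if a dominates b: at least as good in every dimension and
    strictly better in at least one (higher values taken as better)."""
    return bool(np.all(a >= b) and np.any(a > b))

def percentile_rank_scores(X):
    """Mean per-dimension percentile rank for each alternative.

    A simplified stand-in for the (modified) percentile rank score of
    Scheiblechner (2002, 2003); ties get the midpoint of their block.
    """
    n, d = X.shape
    scores = np.zeros((n, d))
    for j in range(d):
        col = X[:, j]
        scores[:, j] = [(np.sum(col < v) + 0.5 * np.sum(col == v)) / n
                        for v in col]
    return scores.mean(axis=1)

# Four alternatives rated on three dimensions (higher = better).
X = np.array([[3, 2, 5],
              [1, 4, 4],
              [3, 3, 5],   # dominates the first alternative
              [2, 1, 1]])
print(dominates(X[2], X[0]))        # True
print(percentile_rank_scores(X))    # pick the alternative(s) with top score
```

Because each dimension is converted to a percentile rank before averaging, no raw rank from one dimension is ever compared with a raw rank from another, which is the constraint the abstract places on admissible solutions.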