Topic

Score test

About: Score test is a research topic. Over its lifetime, 2,989 publications have been published within this topic, receiving 137,479 citations. The topic is also known as: Lagrange multiplier test & LM test.


Papers
Journal ArticleDOI
TL;DR: In this article, the author presents the likelihood analysis of vector autoregressive models allowing for cointegration, shows that the maximum likelihood estimator of the cointegrating relations can be found by reduced rank regression, and derives likelihood ratio tests for the cointegrating rank and for structural hypotheses about these relations.
Abstract: This paper contains the likelihood analysis of vector autoregressive models allowing for cointegration. The author derives the likelihood ratio test for the cointegrating rank and finds its asymptotic distribution. He shows that the maximum likelihood estimator of the cointegrating relations can be found by reduced rank regression and derives the likelihood ratio test of structural hypotheses about these relations. The author shows that the asymptotic distribution of the maximum likelihood estimator is mixed Gaussian, allowing inference for hypotheses on the cointegrating relations to be conducted using the chi-squared distribution. Copyright 1991 by The Econometric Society.

9,112 citations
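As a quick, hedged illustration of the trace test for cointegrating rank described in the abstract above, the sketch below applies the Johansen procedure from statsmodels to two simulated series that share a random-walk trend; the data, the constant-only deterministic term (det_order=0) and the single lagged difference (k_ar_diff=1) are assumptions chosen for the example, not settings taken from the paper.

```python
# Sketch: Johansen trace test for cointegrating rank (simulated data).
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(0)
n = 500
trend = np.cumsum(rng.normal(size=n))             # shared random-walk component
y1 = trend + rng.normal(scale=0.5, size=n)        # both series load on the same trend,
y2 = 0.8 * trend + rng.normal(scale=0.5, size=n)  # so one cointegrating relation exists
data = np.column_stack([y1, y2])

# det_order=0: constant term; k_ar_diff=1: one lagged difference in the VECM
res = coint_johansen(data, det_order=0, k_ar_diff=1)

# res.lr1 holds the trace statistics, res.cvt the 90/95/99% critical values
for r, (stat, cv) in enumerate(zip(res.lr1, res.cvt)):
    print(f"H0: rank <= {r}   trace = {stat:7.2f}   5% cv = {cv[1]:7.2f}")
```

Rejecting the first null (rank 0) but not the second would point to a single cointegrating relation, mirroring the rank test the abstract describes.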

Journal ArticleDOI
TL;DR: The meaning of Cronbach’s alpha, the most widely used objective measure of reliability, and the assumptions underlying it are explained in order to promote its more effective use.
Abstract: Medical educators attempt to create reliable and valid tests and questionnaires in order to enhance the accuracy of their assessment and evaluations. Validity and reliability are two fundamental elements in the evaluation of a measurement instrument. Instruments can be conventional knowledge, skill or attitude tests, clinical simulations or survey questionnaires. Instruments can measure concepts, psychomotor skills or affective values. Validity is concerned with the extent to which an instrument measures what it is intended to measure. Reliability is concerned with the ability of an instrument to measure consistently.1 It should be noted that the reliability of an instrument is closely associated with its validity. An instrument cannot be valid unless it is reliable. However, the reliability of an instrument does not depend on its validity.2 It is possible to objectively measure the reliability of an instrument, and in this paper we explain the meaning of Cronbach’s alpha, the most widely used objective measure of reliability. Calculating alpha has become common practice in medical education research when multiple-item measures of a concept or construct are employed. This is because it is easier to use in comparison to other estimates (e.g. test-retest reliability estimates)3 as it only requires one test administration. However, in spite of the widespread use of alpha in the literature, the meaning, proper use and interpretation of alpha are not clearly understood.2, 4, 5 We feel it is important, therefore, to further explain the underlying assumptions behind alpha in order to promote its more effective use. It should be emphasised that the purpose of this brief overview is just to focus on Cronbach’s alpha as an index of reliability. Alternative methods of measuring reliability based on other psychometric methods, such as generalisability theory or item-response theory, can be used for monitoring and improving the quality of OSCE examinations 6-10, but will not be discussed here.

What is Cronbach alpha?

Alpha was developed by Lee Cronbach in 1951 to provide a measure of the internal consistency of a test or scale;11 it is expressed as a number between 0 and 1. Internal consistency describes the extent to which all the items in a test measure the same concept or construct and hence it is connected to the inter-relatedness of the items within the test. Internal consistency should be determined before a test can be employed for research or examination purposes to ensure validity. In addition, reliability estimates show the amount of measurement error in a test. Put simply, this interpretation of reliability is the correlation of the test with itself. Squaring this correlation and subtracting it from 1.00 produces the index of measurement error. For example, if a test has a reliability of 0.80, there is 0.36 error variance (random error) in the scores (0.80 × 0.80 = 0.64; 1.00 – 0.64 = 0.36).12 As the estimate of reliability increases, the fraction of a test score that is attributable to error will decrease.2 It is of note that the reliability of a test reveals the effect of measurement error on the observed score of a student cohort rather than on an individual student. To calculate the effect of measurement error on the observed score of an individual student, the standard error of measurement (SEM) must be calculated.13 If the items in a test are correlated with each other, the value of alpha is increased.
However, a high coefficient alpha does not always mean a high degree of internal consistency. This is because alpha is also affected by the length of the test. If the test length is too short, the value of alpha is reduced.2, 14 Thus, to increase alpha, more related items testing the same concept should be added to the test. It is also important to note that alpha is a property of the scores on a test from a specific sample of testees. Therefore investigators should not rely on published alpha estimates and should measure alpha each time the test is administered.14

Use of Cronbach’s alpha

Improper use of alpha can lead to situations in which either a test or scale is wrongly discarded or the test is criticised for not generating trustworthy results. To avoid this situation an understanding of the associated concepts of internal consistency, homogeneity and unidimensionality can help to improve the use of alpha. Internal consistency is concerned with the interrelatedness of a sample of test items, whereas homogeneity refers to unidimensionality. A measure is said to be unidimensional if its items measure a single latent trait or construct. Internal consistency is a necessary but not sufficient condition for measuring homogeneity or unidimensionality in a sample of test items.5, 15 Fundamentally, the concept of reliability assumes that unidimensionality exists in a sample of test items,16 and if this assumption is violated it does cause a major underestimate of reliability. It has been well documented that a multidimensional test does not necessarily have a lower alpha than a unidimensional test. Thus a more rigorous view of alpha is that it cannot simply be interpreted as an index for the internal consistency of a test.5, 15, 17 Factor analysis can be used to identify the dimensions of a test.18 Other reliable techniques have been used and we encourage the reader to consult the paper “Applied Dimensionality and Test Structure Assessment with the START-M Mathematics Test” and to compare methods for assessing the dimensionality and underlying structure of a test.19 Alpha, therefore, does not simply measure the unidimensionality of a set of items, but can be used to confirm whether or not a sample of items is actually unidimensional.5 On the other hand, if a test has more than one concept or construct, it may not make sense to report alpha for the test as a whole, as the larger number of questions will inevitably inflate the value of alpha. In principle, therefore, alpha should be calculated for each of the concepts rather than for the entire test or scale.2, 3 The implication for a summative examination containing heterogeneous, case-based questions is that alpha should be calculated for each case. More importantly, alpha is grounded in the ‘tau-equivalent model’, which assumes that each test item measures the same latent trait on the same scale. Therefore, if multiple factors/traits underlie the items on a scale, as revealed by factor analysis, this assumption is violated and alpha underestimates the reliability of the test.17 If the number of test items is too small, it will also violate the assumption of tau-equivalence and will underestimate reliability.20 When test items meet the assumptions of the tau-equivalent model, alpha approaches a better estimate of reliability.
In practice, Cronbach’s alpha is a lower-bound estimate of reliability because heterogeneous test items would violate the assumptions of the tau-equivalent model.5 If the “standardised item alpha” calculated in SPSS is higher than “Cronbach’s alpha”, a further examination of the tau-equivalence of the measurements in the data may be essential.

Numerical values of alpha

As pointed out earlier, the number of test items, item inter-relatedness and dimensionality affect the value of alpha.5 There are different reports about the acceptable values of alpha, ranging from 0.70 to 0.95.2, 21, 22 A low value of alpha could be due to a low number of questions, poor inter-relatedness between items or heterogeneous constructs. For example, if a low alpha is due to poor correlation between items, then some should be revised or discarded. The easiest method to find them is to compute the correlation of each test item with the total test score; items with low correlations (approaching zero) are deleted. If alpha is too high, it may suggest that some items are redundant as they are testing the same question but in a different guise. A maximum alpha value of 0.90 has been recommended.14

Summary

High quality tests are important to evaluate the reliability of data supplied in an examination or a research study. Alpha is a commonly employed index of test reliability. Alpha is affected by the test length and dimensionality. Alpha as an index of reliability should follow the assumptions of the essentially tau-equivalent approach. A low alpha appears if these assumptions are not met. Alpha does not simply measure test homogeneity or unidimensionality, since test reliability is a function of test length: a longer test increases the reliability of a test regardless of whether the test is homogeneous or not. A high value of alpha (> 0.90) may suggest redundancies and show that the test length should be shortened.

8,701 citations
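Since the abstract leans on the standard formula for alpha, a minimal sketch of that computation may help; the 6 × 5 score matrix below is invented purely for illustration, and the formula used is the usual alpha = k/(k−1) · (1 − Σ item variances / variance of total score).

```python
# Sketch: Cronbach's alpha for a respondents-by-items score matrix (invented data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = respondents, columns = test items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

scores = np.array([
    [4, 5, 4, 3, 4],
    [2, 3, 2, 3, 2],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 2, 3],
    [4, 4, 5, 4, 4],
    [1, 2, 1, 2, 1],
])
print(f"alpha = {cronbach_alpha(scores):.3f}")

# The abstract's error-variance example: reliability 0.80 implies
# 1 - 0.80**2 = 0.36 of the score variance is measurement error.
print(f"error variance at reliability 0.80: {1 - 0.80**2:.2f}")
```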

Journal ArticleDOI
TL;DR: In this paper, a reliability coefficient is proposed to indicate the quality with which a maximum likelihood factor analysis represents the interrelations among the attributes in a battery; it can indicate a very close representation even when, for a large sample, the likelihood ratio statistic would reject an otherwise acceptable factor model as not exactly representing those interrelations in the population.
Abstract: Maximum likelihood factor analysis provides an effective method for estimation of factor matrices and a useful test statistic in the likelihood ratio for rejection of overly simple factor models. A reliability coefficient is proposed to indicate quality of representation of interrelations among attributes in a battery by a maximum likelihood factor analysis. Usually, for a large sample of individuals or objects, the likelihood ratio statistic could indicate that an otherwise acceptable factor model does not exactly represent the interrelations among the attributes for a population. The reliability coefficient could indicate a very close representation in this case and be a better indication as to whether to accept or reject the factor solution.

6,359 citations
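The reliability coefficient proposed here is computed from the likelihood-ratio chi-square statistics of the fitted maximum likelihood factor model and of a null (zero-factor) model; the short sketch below shows that ratio-of-misfit form, with placeholder chi-square and degrees-of-freedom values that are not results from the paper.

```python
# Sketch: Tucker-Lewis reliability coefficient from model and null chi-squares.
def tucker_lewis(chi2_null: float, df_null: int,
                 chi2_model: float, df_model: int) -> float:
    ratio_null = chi2_null / df_null      # misfit per degree of freedom, null model
    ratio_model = chi2_model / df_model   # misfit per degree of freedom, factor model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)

# Placeholder values: in a large sample the LR test might still reject this
# factor model, yet the coefficient indicates a close representation (~0.97).
print(f"{tucker_lewis(chi2_null=2000.0, df_null=45, chi2_model=60.0, df_model=25):.3f}")
```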

Journal ArticleDOI
TL;DR: The Lagrange multiplier (LM) statistic presented in this paper tests the effect that imposing the null hypothesis has on the first-order conditions for a maximum of the likelihood, and in many econometric specifications it can be computed from a simple auxiliary regression on the residuals of the fitted model.
Abstract: Many econometric models are susceptible to analysis only by asymptotic techniques and there are three principles, based on asymptotic theory, for the construction of tests of parametric hypotheses. These are: (i) the Wald (W) test which relies on the asymptotic normality of parameter estimators, (ii) the maximum likelihood ratio (LR) procedure and (iii) the Lagrange multiplier (LM) method which tests the effect on the first order conditions for a maximum of the likelihood of imposing the hypothesis. In the econometric literature, most attention seems to have been centred on the first two principles. Familiar "t-tests" usually rely on the W principle for their validity while there have been a number of papers advocating and illustrating the use of the LR procedure. However, all three are equivalent in well-behaved problems in the sense that they give statistics with the same asymptotic distribution when the null hypothesis is true and have the same asymptotic power characteristics. Choice of any one principle must therefore be made by reference to other criteria such as small sample properties or computational convenience. In many situations the W test is attractive for this latter reason because it is constructed from the unrestricted estimates of the parameters and their estimated covariance matrix. The LM test is based on estimation with the hypothesis imposed as parametric restrictions so it seems reasonable that a choice between W or LM be based on the relative ease of estimation under the null and alternative hypotheses. Whenever it is easier to estimate the restricted model, the LM test will generally be more useful. It then provides applied researchers with a simple technique for assessing the adequacy of their particular specification. This paper has two aims. The first is to exposit the various forms of the LM statistic and to collect together some of the relevant research reported in the mathematical statistics literature. The second is to illustrate the construction of LM tests by considering a number of particular econometric specifications as examples. It will be found that in many instances the LM statistic can be computed by a regression using the residuals of the fitted model which, because of its simplicity, is itself estimated by OLS. The paper contains five sections. In Section 2, the LM statistic is outlined and some alternative versions of it are discussed. Section 3 gives the derivation of the statistic for

5,826 citations
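The abstract's point that the LM statistic often reduces to an auxiliary regression on residuals can be made concrete with a small, hedged example: a Breusch-Pagan-style test for heteroskedasticity where LM = n · R² from regressing the squared OLS residuals on the regressors. The simulated data and the chi-squared degrees of freedom (number of regressors excluding the constant) are assumptions for the example only.

```python
# Sketch: LM test via an auxiliary regression on squared OLS residuals.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
# Error variance grows with x, so the homoskedasticity null is false here.
y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(0.5 + 0.5 * x**2), size=n)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Auxiliary regression of squared residuals on the regressors; LM = n * R^2.
aux = sm.OLS(resid**2, X).fit()
lm_stat = n * aux.rsquared
p_value = stats.chi2.sf(lm_stat, df=X.shape[1] - 1)
print(f"LM = {lm_stat:.2f}, p = {p_value:.4f}")
```

statsmodels also ships this particular check as statsmodels.stats.diagnostic.het_breuschpagan, which can be used to cross-check the hand-rolled statistic.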

Journal ArticleDOI
TL;DR: The propensity score, defined as the conditional probability of being treated given the covariates, can be used to balance the covariates in the treated and control groups and therefore reduce bias, as discussed in this paper.
Abstract: In observational studies, investigators have no control over the treatment assignment. The treated and non-treated (that is, control) groups may have large differences on their observed covariates, and these differences can lead to biased estimates of treatment effects. Even traditional covariance analysis adjustments may be inadequate to eliminate this bias. The propensity score, defined as the conditional probability of being treated given the covariates, can be used to balance the covariates in the two groups, and therefore reduce this bias. In order to estimate the propensity score, one must model the distribution of the treatment indicator variable given the observed covariates. Once estimated, the propensity score can be used to reduce bias through matching, stratification (subclassification), regression adjustment, or some combination of all three. In this tutorial we discuss the uses of propensity score methods for bias reduction, give references to the literature and illustrate the uses through applied examples.

4,948 citations
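To make the workflow in the abstract concrete, here is a hedged sketch of one of the strategies it lists, subclassification on the estimated propensity score; the simulated covariates, treatment assignment model, outcome equation and the choice of quintile strata are all assumptions made for the example.

```python
# Sketch: propensity-score estimation and quintile subclassification (simulated data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=(n, 2))                                   # observed covariates
p_treat = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))  # true assignment model
t = rng.binomial(1, p_treat)                                  # treatment indicator
y = 2.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)    # true effect = 2.0

# Step 1: model P(treated | covariates) to get the estimated propensity score.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Step 2: subclassify on propensity-score quintiles and average the
# within-stratum differences in mean outcome between treated and controls.
edges = np.quantile(ps, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(ps, edges)
effects = [y[(strata == s) & (t == 1)].mean() - y[(strata == s) & (t == 0)].mean()
           for s in range(5)]
print(f"naive difference:    {y[t == 1].mean() - y[t == 0].mean():.2f}")
print(f"stratified estimate: {np.mean(effects):.2f}")
```

With this setup the naive treated-minus-control difference overstates the true effect of 2.0 because assignment depends on the covariates, while the stratified estimate should land much closer to it.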


Network Information
Related Topics (5)
Statistical hypothesis testing: 19.5K papers, 1M citations, 78% related
Nonparametric statistics: 19.9K papers, 844.1K citations, 78% related
Missing data: 21.3K papers, 784.9K citations, 74% related
Sample size determination: 21.3K papers, 961.4K citations, 74% related
Multivariate statistics: 18.4K papers, 1M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    11
2022    25
2021    41
2020    53
2019    34
2018    54