
Showing papers on "Sample size determination published in 1994"




Journal ArticleDOI
TL;DR: In this article, the authors demonstrate how a non-recursive, a simple recursive, a modified recursive, and a hybrid outlier elimination procedure are influenced by population skew and sample size.
Abstract: Results from a Monte Carlo study demonstrate how a non-recursive, a simple recursive, a modified recursive, and a hybrid outlier elimination procedure are influenced by population skew and sample size. All the procedures are based on computing a mean and a standard deviation from a sample in order to determine whether an observation is an outlier. Miller (1991) showed that the estimated mean produced by the simple non-recursive procedure can be affected by sample size and that this effect can produce a bias in certain kinds of experiments. We extended this result to the other three procedures. We also created two new procedures in which the criterion used to identify outliers is adjusted as a function of sample size so as to produce results that are unaffected by sample size.
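As a rough illustration of the family of procedures being compared, the sketch below implements a plain non-recursive trim and a simple recursive variant based on a mean ± k·SD criterion; the cutoff k = 2.5 and the simulated skewed data are illustrative assumptions, not the paper's exact procedures.

```python
import numpy as np

def nonrecursive_trim(x, k=2.5):
    """Keep values within k sample SDs of the sample mean (single pass)."""
    x = np.asarray(x, dtype=float)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    return x[z <= k]

def recursive_trim(x, k=2.5):
    """Re-apply the same criterion until no further observations are removed."""
    x = np.asarray(x, dtype=float)
    while True:
        kept = nonrecursive_trim(x, k)
        if len(kept) == len(x):
            return kept
        x = kept

rng = np.random.default_rng(1)
sample = rng.exponential(scale=300, size=15) + 300   # skewed "response times"
print(nonrecursive_trim(sample).mean(), recursive_trim(sample).mean())
```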

817 citations


Journal ArticleDOI
TL;DR: A general method for statistical testing in experiments with an adaptive interim analysis based on the observed error probabilities from the disjoint subsamples before and after the interim analysis, and rules for assessing the sample size in the second stage of the trial are given.
Abstract: SUMMARY A general method for statistical testing in experiments with an adaptive interim analysis is proposed. The method is based on the observed error probabilities from the disjoint subsamples before and after the interim analysis. Formally, an intersection of individual null hypotheses is tested by combining the two p-values into a global test statistic. Stopping rules for Fisher's product criterion in terms of critical limits for the p-value in the first subsample are introduced, including early stopping in the case of missing effects. The control of qualitative treatment-stage interactions is considered. A generalization to three stages is outlined. The loss of power when using the product criterion instead of the optimal classical test on the whole sample is calculated for the test of the mean of a normal distribution, depending on increasing proportions of the first subsample in relation to the total sample size. An upper bound on the loss of power due to early stopping is derived. A general example is presented and rules for assessing the sample size in the second stage of the trial are given. The problems of interpretation and precautions to be taken for applications are discussed. Finally, the sources of bias for estimation in such designs are described.
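For readers unfamiliar with Fisher's product criterion, the combination step itself is short; this sketch combines the two stage-wise p-values and refers the statistic to a χ² distribution with 4 degrees of freedom (the stopping limits for the first-stage p-value described in the paper are not reproduced here).

```python
from math import log
from scipy.stats import chi2

def fisher_combined_test(p1, p2, alpha=0.05):
    """Combine the p-values from the two disjoint subsamples via Fisher's product."""
    statistic = -2.0 * (log(p1) + log(p2))   # ~ chi-square with 4 df under the global null
    critical = chi2.ppf(1 - alpha, df=4)
    return statistic, statistic > critical   # True -> reject the global null hypothesis

print(fisher_combined_test(0.08, 0.03))
```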

765 citations


Journal ArticleDOI
TL;DR: Empirical best linear unbiased prediction as well as empirical and hierarchical Bayes seem to have a distinct advantage over other methods in small area estimation.
Abstract: Small area estimation is becoming important in survey sampling due to a growing demand for reliable small area statistics from both public and private sectors. It is now widely recognized that direct survey estimates for small areas are likely to yield unacceptably large standard errors due to the smallness of sample sizes in the areas. This makes it necessary to "borrow strength" from related areas to find more accurate estimates for a given area or, simultaneously, for several areas. This has led to the development of alternative methods such as synthetic, sample size dependent, empirical best linear unbiased prediction, empirical Bayes and hierarchical Bayes estimation. The present article is largely an appraisal of some of these methods. The performance of these methods is also evaluated using some synthetic data resembling a business population. Empirical best linear unbiased prediction as well as empirical and hierarchical Bayes, for most purposes, seem to have a distinct advantage over other methods.

738 citations


Journal ArticleDOI
TL;DR: The problems with the framework of statistical power are elucidated in the course of explaining why post hoc estimates of power are of little help in interpreting results and why the focus of attention should be exclusively on confidence intervals.
Abstract: Although there is a growing understanding of the importance of statistical power considerations when designing studies and of the value of confidence intervals when interpreting data, confusion exists about the reverse arrangement: the role of confidence intervals in study design and of power in interpretation. Confidence intervals should play an important role when setting sample size, and power should play no role once the data have been collected, but exactly the opposite procedure is widely practiced. In this commentary, we present the reasons why the calculation of power after a study is over is inappropriate and how confidence intervals can be used during both study design and study interpretation.
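A small illustration of letting confidence-interval width, rather than power, drive sample size at the design stage: choose n so that the expected half-width of a 95% CI for a mean does not exceed a target d (σ is assumed known for the sketch; the numbers are made up).

```python
from math import ceil
from scipy.stats import norm

def n_for_ci_halfwidth(sigma, d, conf=0.95):
    """Smallest n with z * sigma / sqrt(n) <= d, i.e. CI half-width at most d."""
    z = norm.ppf(1 - (1 - conf) / 2)
    return ceil((z * sigma / d) ** 2)

print(n_for_ci_halfwidth(sigma=10, d=2))   # about 97 subjects
```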

595 citations


Journal ArticleDOI
13 Jul 1994-JAMA
TL;DR: The pattern over time in the level of statistical power and the reporting of sample size calculations in published randomized controlled trials (RCTs) with negative results is described; few such trials discussed whether the observed differences were clinically important.
Abstract: Objective. —To describe the pattern over time in the level of statistical power and the reporting of sample size calculations in published randomized controlled trials (RCTs) with negative results. Design. —Our study was a descriptive survey. Power to detect 25% and 50% relative differences was calculated for the subset of trials with negative results in which a simple two-group parallel design was used. Criteria were developed both to classify trial results as positive or negative and to identify the primary outcomes. Power calculations were based on results from the primary outcomes reported in the trials. Population. —We reviewed all 383 RCTs published in JAMA, Lancet, and the New England Journal of Medicine in 1975, 1980, 1985, and 1990. Results. —Twenty-seven percent of the 383 RCTs (n=102) were classified as having negative results. The number of published RCTs more than doubled from 1975 to 1990, with the proportion of trials with negative results remaining fairly stable. Of the simple two-group parallel design trials having negative results with dichotomous or continuous primary outcomes (n=70), only 16% and 36% had sufficient statistical power (80%) to detect a 25% or 50% relative difference, respectively. These percentages did not consistently increase over time. Overall, only 32% of the trials with negative results reported sample size calculations, but the percentage doing so has improved over time from 0% in 1975 to 43% in 1990. Only 20 of the 102 reports made any statement related to the clinical significance of the observed differences. Conclusions. —Most trials with negative results did not have large enough sample sizes to detect a 25% or a 50% relative difference. This result has not changed over time. Few trials discussed whether the observed differences were clinically important. There are important reasons to change this practice. The reporting of statistical power and sample size also needs to be improved. (JAMA. 1994;272:122-124)
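To reproduce the flavour of these power calculations, the following uses a standard normal-approximation formula for a two-group comparison of proportions; it is a textbook approximation chosen for illustration, not necessarily the exact method the authors applied.

```python
from math import sqrt
from scipy.stats import norm

def power_two_proportions(p_control, rel_diff, n_per_group, alpha=0.05):
    """Approximate power to detect a given relative difference in event rates."""
    p1, p2 = p_control, p_control * (1 - rel_diff)
    pbar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)
    num = abs(p1 - p2) * sqrt(n_per_group) - z_alpha * sqrt(2 * pbar * (1 - pbar))
    den = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return norm.cdf(num / den)

# e.g. power to detect a 25% relative reduction from a 30% control event rate
print(power_two_proportions(0.30, 0.25, n_per_group=100))   # roughly 0.22
```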

544 citations


Journal ArticleDOI
TL;DR: A nonparametric estimation technique is proposed which uses the concept of sample coverage in order to estimate the size of a closed population for capture-recapture models where time, behavior, or heterogeneity may affect the capture probabilities.
Abstract: A nonparametric estimation technique is proposed which uses the concept of sample coverage in order to estimate the size of a closed population for capture-recapture models where time, behavior, or heterogeneity may affect the capture probabilities. The technique also provides a unified approach to catch-effort models that allows for heterogeneity among removal probabilities. Real data examples are given for illustration. A simulation study investigates the behavior of the proposed procedure.
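The basic "sample coverage" idea can be sketched as follows: estimate coverage from the fraction of captures that are singletons, then divide the number of distinct animals seen by that coverage. This is only the simplest coverage estimator; the paper's estimators add adjustments for heterogeneous capture probabilities.

```python
from collections import Counter

def coverage_estimate_population(capture_ids):
    """Basic coverage-based estimate of a closed population's size.

    capture_ids: one entry per capture event, identifying the animal caught.
    """
    counts = Counter(capture_ids)
    n = sum(counts.values())                            # total number of captures
    d = len(counts)                                     # distinct animals observed
    f1 = sum(1 for c in counts.values() if c == 1)      # animals caught exactly once
    coverage = 1 - f1 / n                               # Good-Turing style coverage estimate
    return d / coverage if coverage > 0 else float("inf")

print(coverage_estimate_population([1, 2, 2, 3, 4, 4, 4, 5, 6, 6]))
```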

460 citations


Journal Article
TL;DR: In this paper, the authors discuss the features of study design and analysis which are necessary to derive valid reference centiles for fetal size, and describe a study which meets the stated criteria.

341 citations


Journal ArticleDOI
TL;DR: The features of study design and analysis which are necessary to derive valid reference centiles for fetal size, and a study which meets the stated criteria are discussed.

336 citations


Journal ArticleDOI
TL;DR: The power calculations indicate that sampling all concordant and half of the discordant pairs would be efficient, as long as the cost of screening is not too high, and a useful approach may be to assess the population with a screening instrument.
Abstract: We explore the power of the twin study to resolve sources of familial resemblance when the data are measured at the binary or ordinal level. Four components of variance were examined: additive genetic, nonadditive genetic, and common and specific environment. Curves are presented to compare the power of the continuous case with those of threshold models corresponding to different prevalences in the population: 1, 5, 10, 25, and 50%. Approximately three times the sample size is needed for equivalent power to the continuous case when the threshold is at the optimal 50%, and this ratio increases to about 10 times when 10% are above threshold. Some power may be recovered by subdividing those above threshold to form three or more ordered classes, but power is determined largely by the lowest threshold. Non-random ascertainment of twins (i) through affected twins and examining their cotwins or (ii) through ascertainment of all pairs in which at least one twin is affected increases power. In most cases, strategy i is more efficient than strategy ii. Though powerful for the rarer disorders, these methods suffer the disadvantage that they rely on prior knowledge of the population prevalence. Furthermore, sampling from hospital cases may introduce biases, reducing their value. A useful approach may be to assess the population with a screening instrument; the power calculations indicate that sampling all concordant and half of the discordant pairs would be efficient, as long as the cost of screening is not too high.

294 citations


Journal ArticleDOI
James R. Lewis1
TL;DR: Virzi (1992) presented data supporting three claims regarding sample sizes for usability studies; an independent study re-examines these claims and provides evidence that the binomial probability formula may provide a good model for predicting problem discovery curves, given an estimate of the average likelihood of problem detection.
Abstract: Recently, Virzi (1992) presented data that support three claims regarding sample sizes for usability studies: (1) observing four or five participants will allow a usability practitioner to discover 80% of a product's usability problems, (2) observing additional participants will reveal fewer and fewer new usability problems, and (3) more severe usability problems are easier to detect with the first few participants. Results from an independent usability study clearly support the second claim, partially support the first, but fail to support the third. Problem discovery shows diminishing returns as a function of sample size. Observing four to five participants will uncover about 80% of a product's usability problems as long as the average likelihood of problem detection ranges between 0.32 and 0.42, as in Virzi. If the average likelihood of problem detection is lower, then a practitioner will need to observe more than five participants to discover 80% of the problems. Using behavioral categories for problem severity (or impact), these data showed no correlation between problem severity (impact) and rate of discovery. The data provided evidence that the binomial probability formula may provide a good model for predicting problem discovery curves, given an estimate of the average likelihood of problem detection. Finally, data from economic simulations that estimated return on investment (ROI) under a variety of settings showed that only the average likelihood of problem detection strongly influenced the range of sample sizes for maximum ROI.
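The binomial model referred to here is simple: if each participant independently detects a given problem with probability p, the expected proportion of problems discovered by n participants is 1 − (1 − p)^n. A quick sketch (the values of p are taken from the abstract; the 80% target is Virzi's claim):

```python
def proportion_discovered(p, n):
    """Expected proportion of problems found by n participants, assuming each
    problem is detected independently with probability p per participant."""
    return 1 - (1 - p) ** n

for p in (0.16, 0.32, 0.42):
    needed = next(n for n in range(1, 100) if proportion_discovered(p, n) >= 0.80)
    print(f"p={p:.2f}: {needed} participants to reach 80% discovery")
```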

Journal ArticleDOI
TL;DR: The idea of using data from the first 'few' patients entered in a clinical trial to estimate the final trial size needed to have specified power for rejecting H0 in favour of H1 if a real difference exists is developed.
Abstract: We develop the idea of using data from the first 'few' patients entered in a clinical trial to estimate the final trial size needed to have specified power for rejecting H0 in favour of H1 if a real difference exists. When comparing means derived from Normally distributed data, there is no important effect on test size, power or expected trial size, provided that a minimum of about 20 degrees of freedom are used to estimate residual variance. Relative advantages and disadvantages of using larger internal pilot studies are presented. These revolve around crude expectations of the final study size, recruitment rate, duration of follow-up and practical constraints on the ability to prevent the circulation of unblinded randomization codes to investigators and those involved in editing and checking data.
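A sketch of the internal-pilot idea for comparing two normal means: plug the residual variance estimated from the first patients into the usual per-group sample-size formula for the targeted difference. The formula below is the standard normal-approximation one and is used purely for illustration.

```python
from math import ceil
from scipy.stats import norm

def per_group_n(s2, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-sample comparison of means,
    given an (internal pilot) estimate s2 of the residual variance."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return ceil(2 * s2 * (z_a + z_b) ** 2 / delta ** 2)

# the internal pilot gave s^2 = 40; we want to detect a mean difference of 5
print(per_group_n(s2=40.0, delta=5.0))   # about 26 per group
```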

Journal ArticleDOI
TL;DR: In this article, the authors consider the properties of the X chart when the size of each sample depends on what is observed in the preceding sample, rather than following the usual practice of taking samples of size n from the process every h hours.
Abstract: The usual practice in using a control chart to monitor a process is to take samples of size n from the process every h hours. This article considers the properties of the X chart when the size of each sample depends on what is observed in the preceding ..

Journal ArticleDOI
TL;DR: In this paper, the authors consider the asymptotic null distribution of 2 log λ_n, where λ_n is the likelihood ratio statistic, and show by simulation that it appears pivotal in the sense of having constant percentiles over the unknown parameter.
Abstract: We here consider testing the hypothesis of homogeneity against the alternative of a two-component mixture of densities. The paper focuses on the asymptotic null distribution of 2 log λ_n, where λ_n is the likelihood ratio statistic. The main result, obtained by simulation, is that its limiting distribution appears pivotal (in the sense of constant percentiles over the unknown parameter), but model specific (it differs if the model is changed from Poisson to normal, say), and is not at all well approximated by the conventional χ²_2 distribution obtained by counting parameters. In Section 3, the binomial with sample size parameter 2 is considered. Via a simple geometric characterization, the case for which the likelihood ratio is 1 can easily be identified and the corresponding probability is found. Closed-form expressions for the likelihood ratio λ_n are possible, and the asymptotic distribution of 2 log λ_n is shown to be the mixture giving equal weights to the one-point distribution with all its mass at zero and the χ²_1 distribution. A similar result is reached in Section 4 for the Poisson with a small parameter value (θ ≤ 0.1), although the geometric characterization is different. In Section 5 we consider the Poisson case in full generality. There is still a positive asymptotic probability that the likelihood ratio is 1. The upper percentiles of the null distribution of 2 log λ_n are found by simulation for various populations and shown to be nearly independent of the population parameter, and approximately equal to the (1 − 2α)100th percentiles of χ²_1. In Sections 6 and 7, we close with a study of two continuous densities, the exponential and the normal with known variance. In these models the asymptotic distribution of 2 log λ_n is pivotal. Selected (1 − α)100th percentiles are presented and shown to differ between the two models.
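A rough sketch of how such a null distribution can be simulated for the Poisson case: fit a two-component Poisson mixture by EM, form 2 log λ_n against the single-Poisson fit, and repeat over data generated under homogeneity. The EM details, number of restarts and simulation sizes below are illustrative choices, not the authors'.

```python
import numpy as np
from scipy.stats import poisson

def mixture_loglik(x, p, l1, l2):
    return np.sum(np.log(p * poisson.pmf(x, l1) + (1 - p) * poisson.pmf(x, l2) + 1e-300))

def fit_mixture(x, n_starts=3, n_iter=100, seed=None):
    """Crude EM fit of a two-component Poisson mixture; returns the best log-likelihood."""
    rng = np.random.default_rng(seed)
    best = -np.inf
    for _ in range(n_starts):
        p = rng.uniform(0.2, 0.8)
        l1 = x.mean() * rng.uniform(0.3, 0.9)
        l2 = x.mean() * rng.uniform(1.1, 1.7)
        for _ in range(n_iter):
            w1 = p * poisson.pmf(x, l1)
            w2 = (1 - p) * poisson.pmf(x, l2)
            r = w1 / (w1 + w2 + 1e-300)                   # E-step: responsibilities
            p = r.mean()                                  # M-step updates
            l1 = np.sum(r * x) / max(np.sum(r), 1e-300)
            l2 = np.sum((1 - r) * x) / max(np.sum(1 - r), 1e-300)
        best = max(best, mixture_loglik(x, p, l1, l2))
    return best

def lr_statistic(x):
    ll0 = np.sum(poisson.logpmf(x, x.mean()))             # homogeneity: single Poisson
    return max(0.0, 2 * (fit_mixture(x) - ll0))           # clip: EM may miss the optimum

rng = np.random.default_rng(0)
null_stats = [lr_statistic(rng.poisson(2.0, 100)) for _ in range(200)]
print(np.percentile(null_stats, [90, 95, 99]))
```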

Journal ArticleDOI
TL;DR: Using the Ewens sampling distribution of selectively neutral alleles in a finite population, it is possible to develop an exact test of neutrality by finding the probability of each configuration with the same sample size and observed number of allelic classes.
Abstract: Using the Ewens sampling distribution of selectively neutral alleles in a finite population, it is possible to develop an exact test of neutrality by finding the probability of each configuration with the same sample size and observed number of allelic classes. The exact test provides the probability of obtaining a configuration with the same or smaller probability as the observed configuration under the null hypothesis. The results from the exact test may be quite different from those from the Ewens-Watterson test based on the homozygosity in the sample. The advantages and disadvantages of using an exact test in this and other population genetic contexts are discussed.

Journal ArticleDOI
TL;DR: In this article, a Monte Carlo simulation assessed the relative power of two techniques that are commonly used to test for moderating effects and found that MMR was more powerful than SCC for virtually all of the 81 conditions.
Abstract: A Monte Carlo simulation assessed the relative power of 2 techniques that are commonly used to test for moderating effects. The authors drew 500 samples from simulation-based populations for each of 81 conditions in a design that varied sample size, the reliabilities of 2 predictor variables (1 of which was the moderator variable), and the magnitude of the moderating effect. They tested the null hypothesis of no interaction effect by using moderated multiple regression (MMR). They then successively polychotomized each sample into 2, 3, 4, 6, and 8 subgroups and tested the equality of the subgroup-based correlation coefficients (SCC). Results showed MMR to be more powerful than the SCC strategy for virtually all of the 81 conditions.
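Moderated multiple regression here amounts to testing the product term in y = b0 + b1·x + b2·z + b3·x·z. The following self-contained sketch fits that model by ordinary least squares and t-tests b3; the simulated data and effect sizes are illustrative.

```python
import numpy as np
from scipy import stats

def mmr_interaction_test(x, z, y):
    """OLS fit of y on x, z and x*z; returns the t statistic and p-value for the
    interaction coefficient (the moderated-multiple-regression test)."""
    x, z, y = map(np.asarray, (x, z, y))
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t = beta[3] / np.sqrt(cov[3, 3])
    return t, 2 * stats.t.sf(abs(t), dof)

rng = np.random.default_rng(2)
n = 200
x, z = rng.normal(size=n), rng.normal(size=n)
y = 0.5 * x + 0.3 * z + 0.25 * x * z + rng.normal(size=n)   # true moderating effect
print(mmr_interaction_test(x, z, y))
```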

Journal ArticleDOI
TL;DR: Conventional and alternative strategies for cross-validation are discussed as methods for evaluating overall discrepancy of a model fit to a particular sample, where overall discrepancy arises from the combined influences of discrepancy of approximation and discrepancy of estimation.
Abstract: Alternative strategies for two-sample cross-validation of covariance structure models are described and investigated. The strategies vary according to whether all (tight strategy) or some (partial strategy) of the model parameters are held constant when a calibration sample solution is re-fit to a validation sample covariance matrix. Justification is provided for three partial strategies. Conventional and alternative strategies for cross-validation are discussed as methods for evaluating overall discrepancy of a model fit to a particular sample, where overall discrepancy arises from the combined influences of discrepancy of approximation and discrepancy of estimation (Cudeck & Henly, 1991). Results of a sampling study using empirical data show that for tighter strategies simpler models are preferred in smaller samples. However, when partial cross-validation is employed, a more complex model may be supported even in a small sample. Implications for model comparison and evaluation, as well as the issues of model complexity and sample size are discussed.

Journal ArticleDOI
TL;DR: This work proposes practical Bayesian guidelines for deciding whether E is promising relative to S in settings where patient response is binary and the data are monitored continuously, and provides decision boundaries, a probability distribution for the sample size at termination, and operating characteristics under fixed response probabilities with E.
Abstract: A Phase IIB clinical trial typically is a single-arm study aimed at deciding whether a new treatment E is sufficiently promising, relative to a standard therapy, S, to include in a large-scale randomized trial. Thus, Phase IIB trials are inherently comparative even though a standard therapy arm usually is not included. Uncertainty regarding the response rate θ_S of S is rarely made explicit, either in planning the trial or interpreting its results. We propose practical Bayesian guidelines for deciding whether E is promising relative to S in settings where patient response is binary and the data are monitored continuously. The design requires specification of an informative prior for θ_S, a targeted improvement for E, and bounds on the allowed sample size. No explicit specification of a loss function is required. Sampling continues until E is shown to be either promising or not promising relative to S with high posterior probability, or the maximum sample size is reached. The design provides decision boundaries, a probability distribution for the sample size at termination, and operating characteristics under fixed response probabilities with E.
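A sketch of the monitoring rule under stated assumptions: a Beta prior on θ_S, a Beta posterior on θ_E updated with the responses observed so far, and stopping when the posterior probability that θ_E exceeds θ_S by the targeted improvement is very high or very low. The priors, target and cutoffs below are made-up values, not those of the paper.

```python
import numpy as np

def monitor(responses, prior_s=(30, 70), prior_e=(1, 1), target=0.15,
            upper=0.95, lower=0.05, n_draws=100_000, seed=0):
    """Continuously monitored single-arm trial with binary responses.

    prior_s: informative Beta prior for the standard therapy's response rate.
    target:  targeted improvement of E over S.
    Returns 'promising', 'not promising', or 'continue' given the data so far.
    """
    rng = np.random.default_rng(seed)
    a_e = prior_e[0] + sum(responses)
    b_e = prior_e[1] + len(responses) - sum(responses)
    theta_s = rng.beta(*prior_s, n_draws)
    theta_e = rng.beta(a_e, b_e, n_draws)
    prob = np.mean(theta_e > theta_s + target)   # posterior P(theta_E > theta_S + target)
    if prob >= upper:
        return "promising", prob
    if prob <= lower:
        return "not promising", prob
    return "continue", prob

print(monitor([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]))   # 7/10 responses observed so far
```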

Journal ArticleDOI
TL;DR: The technique of cross-validation is extended to the case where observations form a general stationary sequence; the proposed h-block cross-validation reduces the training set by removing the h observations preceding and following the observation in the test set, with h taken to be a fixed fraction of the sample size.
Abstract: SUMMARY In this paper we extend the technique of cross-validation to the case where observations form a general stationary sequence. We call it h-block cross-validation, because the idea is to reduce the training set by removing the h observations preceding and following the observation in the test set. We propose taking h to be a fixed fraction of the sample size, and we add a term to our h-block cross-validated estimate to compensate for the underuse of the sample. The advantages of the proposed modification over the cross-validation technique are demonstrated via simulation.
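The index bookkeeping behind h-block cross-validation is straightforward; a minimal sketch follows (the compensation term mentioned in the abstract for the underuse of the sample is omitted).

```python
def h_block_splits(n, h):
    """For each test point t, the training indices exclude t and the h
    observations on either side of it (h-block cross-validation)."""
    for t in range(n):
        train = [i for i in range(n) if abs(i - t) > h]
        yield train, t

# e.g. with 12 observations and h = 2, the test point t = 5 drops indices 3..7
for train, t in h_block_splits(12, 2):
    if t == 5:
        print(t, train)
```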

Journal ArticleDOI
TL;DR: In this article, a class of boundaries for group sequential clinical trials is considered that allows early stopping also when small treatment differences are observed; these boundaries can easily be applied to both one- and two-sided hypothesis testing.

Journal ArticleDOI
TL;DR: This paper examined the influence of sample size and model parsimony on a set of 22 goodness-of-fit indices including those typically used in confirmatory factor analysis and some recently developed indices.
Abstract: The purpose of the present investigation is to examine the influence of sample size (N) and model parsimony on a set of 22 goodness-of-fit indices including those typically used in confirmatory factor analysis and some recently developed indices. For sample data simulated from two known population data structures, values for 6 of 22 fit indices were reasonably independent of N and were not significantly affected by estimating parameters known to have zero values in the population: two indices based on noncentrality described by McDonald (1989; McDonald and Marsh, 1990), a relative (incremental) index based on noncentrality (Bentler, 1990; McDonald & Marsh, 1990), unbiased estimates of LISREL's GFI and AGFI (Joreskog & Sorbom, 1981) presented by Steiger (1989, 1990) that are based on noncentrality, and the widely known relative index developed by Tucker and Lewis (1973). Penalties for model complexity designed to control for sampling fluctuations and to address the inevitable compromise between goodness of fit and model parsimony were evaluated.

Journal ArticleDOI
TL;DR: In this article, the authors present two methods for stepwise selection of sampling units, and corresponding schemes for removal of units that can be used in connection with sample rotation, and describe practical, geometrically convergent algorithms for computing the wi from the 7i.
Abstract: SUMMARY Attention is drawn to a method of sampling a finite population of N units with unequal probabilities and without replacement. The method was originally proposed by Stern & Cover (1989) as a model for lotteries. The method can be characterized as maximizing entropy given coverage probabilities π_i, or equivalently as having the probability of a selected sample proportional to the product of a set of 'weights' w_i. We show the essential uniqueness of the w_i given the π_i, and describe practical, geometrically convergent algorithms for computing the w_i from the π_i. We present two methods for stepwise selection of sampling units, and corresponding schemes for removal of units that can be used in connection with sample rotation. Inclusion probabilities of any order can be written explicitly in closed form. Second-order inclusion probabilities π_ij satisfy the condition 0 < π_ij < π_i π_j, which guarantees that the Yates & Grundy variance estimator is unbiased, definable for all samples and always nonnegative for any sample size.

Journal ArticleDOI
TL;DR: In this article, the authors show that the familiar bootstrap plug-in rule of Efron has a natural analog in finite population settings, and they show that their method can be used to generate second-order correct confidence intervals for smooth functions of population means, a property that has not been established for other resampling methods suggested in the literature.
Abstract: We show that the familiar bootstrap plug-in rule of Efron has a natural analog in finite population settings. In our method a characteristic of the population is estimated by the average value of the characteristic over a class of empirical populations constructed from the sample. Our method extends that of Gross to situations in which the stratum sizes are not integer multiples of their respective sample sizes. Moreover, we show that our method can be used to generate second-order correct confidence intervals for smooth functions of population means, a property that has not been established for other resampling methods suggested in the literature. A second resampling method is proposed that also leads to second-order correct confidence intervals and is less computationally intensive than our bootstrap. But a simulation study reveals that the second method can be quite unstable in some situations, whereas our bootstrap performs very well.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the finite sample properties of likelihood ratio tests for stochastic cointegration and found that these asymptotic test procedures are not very powerful for sample sizes that are typical for economic time series.
Abstract: This paper investigates the finite sample properties of likelihood ratio tests for 'stochastic cointegration' that have recently been proposed by S. Johansen and P. Perron and J. Y. Campbell. The author transforms the model into a canonical form and conducts a comprehensive simulation study. He finds that the test performance is very sensitive to the value of the stationary root(s) of the process and to the correlation between the innovations that drive the stationary and nonstationary components of the process. Unfortunately, the simulation results suggest that these asymptotic test procedures are not very powerful for sample sizes that are typical for economic time series. Copyright 1994 by MIT Press.

Proceedings ArticleDOI
24 May 1994
TL;DR: In this article, the authors consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M, and give upper and lower bounds on the sample sizes required so that, with high probability, all sentences with error at least e are detected as false from the sample.
Abstract: We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M. We define two different measures for the error of a universal sentence in a relation. For a set of n universal sentences each with at most k universal quantifiers, we give upper and lower bounds for the sample sizes required for having a high probability that all the sentences with error at least e can be detected as false by considering the sample. The sample sizes are O((log n)/e) or O(|M|^(1−1/k) (log n)/e), depending on the error measure used. We also consider universal-existential sentences.

Posted Content
TL;DR: In this paper, the authors examine the small sample properties of the GMM estimator for models of covariance structures, where the technique is often referred to as the optimal minimum distance (OMD) estimator.
Abstract: We examine the small sample properties of the GMM estimator for models of covariance structures, where the technique is often referred to as the optimal minimum distance (OMD) estimator. We present a variety of Monte Carlo experiments based on simulated data and on the data used by Abowd and Card (1987, 1990) in an examination of the covariance structure of hours and earnings changes. Our main finding is that OMD is seriously biased in small samples for many distributions and in relatively large samples for poorly behaved distributions. The bias is almost always downward in absolute value. It arises because sampling errors in the second moments are correlated with sampling errors in the weighting matrix used by OMD. Furthermore, OMD usually has a larger root mean square error and median absolute error than equally weighted minimum distance (EWMD). We also propose and investigate an alternative estimator, which we call independently weighted optimal minimum distance (IWOMD). IWOMD is a split sample estimator using separate groups of observations to estimate the moments and the weights. IWOMD has identical large sample properties to the OMD estimator but is unbiased regardless of sample size. However, the Monte Carlo evidence indicates that IWOMD is usually dominated by EWMD.

Journal ArticleDOI
TL;DR: A comprehensive design equation is provided, relating sample size to precision for cohort and cross-sectional designs, and it is shown that the follow-up cost and selection bias attending a cohort design may outweigh any theoretical advantage in precision.
Abstract: In planning large longitudinal field trials, one is often faced with a choice between a cohort design and a cross-sectional design, with attendant issues of precision, sample size, and bias. To provide a practical method for assessing these trade-offs quantitatively, we present a unifying statistical model that embraces both designs as special cases. The model takes account of continuous and discrete endpoints, site differences, and random cluster and subject effects of both a time-invariant and a time-varying nature. We provide a comprehensive design equation, relating sample size to precision for cohort and cross-sectional designs, and show that the follow-up cost and selection bias attending a cohort design may outweigh any theoretical advantage in precision. We provide formulae for the minimum number of clusters and subjects. We relate this model to the recently published prevalence model for COMMIT, a multi-site trial of smoking cessation programmes. Finally, we tabulate parameter estimates for some physiological endpoints from recent community-based heart-disease prevention trials, work an example, and discuss the need for compiling such estimates as a basis for informed design of future field trials.

Journal ArticleDOI
TL;DR: Results presented here demonstrate that case-control designs can be used to detect gene-environment interaction when there is both a common exposure and a highly polymorphic marker of susceptibility.
Abstract: As genetic markers become more available, case-control studies will be increasingly important in defining the role of genetic factors in disease causality. The authors estimate the minimum sample size needed to assure adequate statistical power to detect gene-environment interaction. One assumption is made: the prevalence of exposure is independent of marker genotypes among controls. Given the assumption, six parameters (three odds ratios, the prevalence of exposure, the proportion of those with the susceptible genotype, and the ratio of controls to cases) dictate the expected cell sizes in a 2 x 2 x 2 table contrasting genetic susceptibility, exposure, and disease. The three odds ratios reflect the association between disease and 1) exposure among non-susceptibles; 2) susceptible genotypes among nonexposed individuals; and 3) the gene-environment interaction itself, respectively. Given these parameters, the number of cases and controls needed to assure any particular Type I and Type II error rates can be estimated. Results presented here demonstrate that case-control designs can be used to detect gene-environment interaction when there is both a common exposure and a highly polymorphic marker of susceptibility.

Journal ArticleDOI
TL;DR: In this paper, the authors derive exact finite-sample distributions and characterize the tail behavior of maximum likelihood estimators of the cointegrating coefficients in error correction models, showing that extreme outliers occur more frequently for the reduced rank regression estimator than for alternative asymptotically efficient procedures based on the triangular representation.
Abstract: The author derives some exact finite-sample distributions and characterizes the tail behavior of maximum likelihood estimators of the cointegrating coefficients in error correction models. The reduced rank regression estimator has a distribution with Cauchy-like tails and no finite moments of integer order. The maximum likelihood estimator of the coefficients in a particular triangular system representation has matrix t-distribution tails with finite integer moments to order T - n + r, where T is the sample size, n is the total number of variables, and r is the dimension of the cointegration space. This helps explain some recent simulation studies where extreme outliers occur more frequently for the reduced rank regression estimator than for alternative asymptotically efficient procedures based on the triangular representation. Copyright 1994 by The Econometric Society.

Journal ArticleDOI
TL;DR: In this paper, two nonparametric procedures, the Mantel-Haenszel (MH) procedure and the simultaneous item bias (SIB) procedure, were compared with respect to their Type I error rates and power.
Abstract: Two nonparametric procedures for detecting differential item functioning (DIF)—the Mantel-Haenszel (MH) procedure and the simultaneous item bias (SIB) procedure—were compared with respect to their Type I error rates and power. Data were simulated to reflect conditions varying in sample size, ability distribution differences between the focal and reference groups, proportion of DIF items in the test, DIF effect sizes, and type of item. 1,296 conditions were studied. The SIB and MH procedures were equally powerful in detecting uniform DIF for equal ability distributions. The SIB procedure was more powerful than the MH procedure in detecting DIF for unequal ability distributions. Both procedures had sufficient power to detect DIF for a sample size of 300 in each group. Ability distribution did not have a significant effect on the SIB procedure but did affect the MH procedure. This is important because ability distribution differences between two groups often are found in practice. The Type I error rates f...
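For reference, the Mantel-Haenszel statistic being compared here is computed from the 2×2 correct/incorrect-by-group tables at each total-score level; below is a compact sketch of the continuity-corrected MH chi-square, using the standard formula with illustrative data.

```python
import numpy as np

def mantel_haenszel_chi2(tables):
    """tables: list of 2x2 tables [[A, B], [C, D]] per score level, where rows are
    reference/focal group and columns are correct/incorrect on the studied item."""
    A = np.array([t[0][0] for t in tables], dtype=float)                 # reference correct
    n1 = np.array([t[0][0] + t[0][1] for t in tables], dtype=float)      # reference total
    n2 = np.array([t[1][0] + t[1][1] for t in tables], dtype=float)      # focal total
    m1 = np.array([t[0][0] + t[1][0] for t in tables], dtype=float)      # total correct
    N = n1 + n2
    expected = n1 * m1 / N
    variance = n1 * n2 * m1 * (N - m1) / (N ** 2 * (N - 1))
    chi2 = (abs(A.sum() - expected.sum()) - 0.5) ** 2 / variance.sum()
    return chi2   # refer to a chi-square distribution with 1 degree of freedom

tables = [[[20, 10], [15, 15]], [[30, 5], [25, 10]], [[40, 2], [35, 6]]]
print(mantel_haenszel_chi2(tables))
```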