
Showing papers in "Communications in Statistics - Simulation and Computation in 2009"


Journal ArticleDOI
Peter C. Austin
TL;DR: The utility and interpretation of the standardized difference for comparing the prevalence of dichotomous variables between two groups are explored, and a standardized difference of 10% is equivalent to having a phi coefficient of 0.05 for the correlation between treatment group and the binary variable.
Abstract: Researchers are increasingly using the standardized difference to compare the distribution of baseline covariates between treatment groups in observational studies. Standardized differences were initially developed in the context of comparing the mean of continuous variables between two groups. However, in medical research, many baseline covariates are dichotomous. In this article, we explore the utility and interpretation of the standardized difference for comparing the prevalence of dichotomous variables between two groups. We examined the relationship between the standardized difference and each of the following: the maximal difference in the prevalence of the binary variable between the two groups, the relative risk relating the prevalence of the binary variable in one group to the prevalence in the other group, and the phi coefficient measuring the correlation between the treatment group and the binary variable. We found that a standardized difference of 10% (or 0.1) is equivalent to having a phi coefficient of 0.05 for the correlation between treatment group and the binary variable.
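
For reference (a standard formula in this literature, not quoted from the article), the standardized difference for a dichotomous covariate with observed prevalences p̂₁ and p̂₂ in the two groups is

$$ d = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bigl[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)\bigr]/2}}, $$

and it is on this scale that the article reports d = 0.1 as corresponding to a phi coefficient of roughly 0.05.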

1,532 citations


Journal ArticleDOI
TL;DR: This article reviewed and proposed some estimators based on Kibria (2003) and Khalaf and Shukur (2005); some of them performed well compared to the ordinary least squares (OLS) estimator and some existing popular estimators.
Abstract: In ridge regression analysis, the estimation of the ridge parameter k is an important problem. Many methods are available for estimating such a parameter. This article reviewed and proposed some estimators based on Kibria (2003) and Khalaf and Shukur (2005). A simulation study was conducted, and the mean squared error (MSE) criterion was used to compare the performances of the estimators. We observed that under certain conditions some of the proposed estimators performed well compared to the ordinary least squares (OLS) estimator and some existing popular estimators. Finally, a numerical example has been considered to illustrate the performance of the estimators.
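
The article's specific estimators are not reproduced here. As a minimal sketch of the general setup only, the Python code below computes the ridge estimator (X'X + kI)^{-1}X'y together with the classical Hoerl–Kennard–Baldwin choice k = p·σ̂²/β̂'β̂, which many estimators of this type modify; the data and parameters are purely illustrative.

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimator (X'X + k I)^{-1} X'y for a given ridge parameter k."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def hkb_k(X, y):
    """Hoerl-Kennard-Baldwin ridge parameter k = p * sigma^2 / (beta' beta),
    computed from the OLS fit (illustrative; not the article's proposals)."""
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_ols
    sigma2 = resid @ resid / (n - p)
    return p * sigma2 / (beta_ols @ beta_ols)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=100)   # induce collinearity
y = X @ np.array([1.0, 0.5, -0.5, 1.0]) + rng.normal(size=100)
k = hkb_k(X, y)
print("k =", k, "ridge beta =", ridge_fit(X, y, k))
```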

249 citations


Journal ArticleDOI
TL;DR: A simulation study compares the performance of four major hierarchical methods for clustering functional data and yields concrete suggestions to help future researchers determine the best method for clustering their functional data.
Abstract: Functional data analysis (FDA)—the analysis of data that can be considered a set of observed continuous functions—is an increasingly common class of statistical analysis. One of the most widely used FDA methods is the cluster analysis of functional data; however, little work has been done to compare the performance of clustering methods on functional data. In this article, a simulation study compares the performance of four major hierarchical methods for clustering functional data. The simulated data varied in three ways: the nature of the signal functions (periodic, non-periodic, or mixed), the amount of noise added to the signal functions, and the pattern of the true cluster sizes. The Rand index was used to compare the performance of each clustering method. As a secondary goal, clustering methods were also compared when the number of clusters was misspecified. To illustrate the results, a real set of functional data was clustered where the true clustering structure is believed to be known. Compari...
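
The article's simulation design is not reproduced here; the sketch below only illustrates the basic workflow under simple, assumed settings: noisy sinusoidal curves from two groups are clustered with several hierarchical linkage methods and agreement with the true labels is scored with the (unadjusted) Rand index.

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster

def rand_index(a, b):
    """Plain (unadjusted) Rand index between two labelings."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)
# Two groups of noisy curves with different signal functions (toy example).
curves = np.vstack([np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=(20, 50)),
                    np.cos(2 * np.pi * t) + 0.3 * rng.normal(size=(20, 50))])
truth = np.repeat([0, 1], 20)

for method in ["single", "complete", "average", "ward"]:
    labels = fcluster(linkage(curves, method=method), t=2, criterion="maxclust")
    print(method, round(rand_index(truth, labels), 3))
```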

210 citations


Journal ArticleDOI
TL;DR: In this article, acceptance sampling plans are developed for the Birnbaum–Saunders distribution percentiles when the life test is truncated at a pre-specified time to ensure the specified life percentile is obtained under a given customer's risk.
Abstract: Time to failure due to fatigue is one of the common quality characteristics in material engineering applications. In this article, acceptance sampling plans are developed for the Birnbaum–Saunders distribution percentiles when the life test is truncated at a pre-specified time. The minimum sample size necessary to ensure the specified life percentile is obtained under a given customer's risk. The operating characteristic values (and curves) of the sampling plans as well as the producer's risk are presented. The R package named spbsq is developed to implement the developed sampling plans. Two examples with real data sets are also given as illustration.
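
The article's tables and percentile-based plans are not reproduced here; the sketch below only illustrates the generic logic of a truncated-life-test plan under stated assumptions: the probability p that an item fails before the truncation time is computed from scipy's Birnbaum–Saunders (fatiguelife) distribution with assumed shape and scale, and the smallest sample size n is found such that the probability of acceptance with at most c failures does not exceed the consumer's risk β.

```python
from scipy import stats

def min_sample_size(p_fail, c, beta):
    """Smallest n with P(at most c failures among n) <= beta, so the
    consumer's risk is controlled when the true failure probability is p_fail."""
    n = c + 1
    while stats.binom.cdf(c, n, p_fail) > beta:
        n += 1
    return n

# Assumed (illustrative) Birnbaum-Saunders parameters and truncation time.
shape, scale, t0 = 0.5, 1.0, 1.0
p_fail = stats.fatiguelife.cdf(t0, shape, loc=0.0, scale=scale)
print("failure prob:", round(p_fail, 3),
      "min n:", min_sample_size(p_fail, c=2, beta=0.10))
```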

109 citations


Journal ArticleDOI
TL;DR: An algorithm is obtained to compute all the coherent systems with n components and their signatures; using it, the authors show that there exist 180 coherent systems with 5 components and compute their signatures.
Abstract: The signatures of coherent systems are useful tools to compute the system reliability functions, the system expected lifetimes and to compare different systems using stochastic orderings. It is well known that there exist 2, 5, and 20 different coherent systems with 2, 3, and 4 components, respectively. The signatures for these systems were given in Shaked and Suarez-Llorens (2003). In this article, we obtain an algorithm to compute all the coherent systems with n components and their signatures. Using this algorithm we show that there exist 180 coherent systems with 5 components and we compute their signatures.
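
The enumeration algorithm of the article is not reproduced here. As a brief illustration of what a signature is, the sketch below computes the signature of a given coherent system by brute force over all orderings of component failures: s_i is the proportion of orderings in which the system fails exactly at the i-th component failure (the example structure function is an assumption chosen for illustration).

```python
from itertools import permutations
from fractions import Fraction

def signature(n, works):
    """Signature of a coherent system with n components.
    `works(up)` returns True when the system functions given the set `up`
    of working components; s[i] = P(system fails at the (i+1)-th failure)."""
    counts = [0] * n
    for order in permutations(range(1, n + 1)):        # order of component failures
        up = set(range(1, n + 1))
        for i, comp in enumerate(order):
            up.discard(comp)
            if not works(up):                           # system dies at failure i+1
                counts[i] += 1
                break
    return [Fraction(c, len(counts) and sum(counts)) for c in counts]

# Example: component 1 in series with the parallel pair {2, 3}.
print(signature(3, lambda up: 1 in up and (2 in up or 3 in up)))
# -> [Fraction(1, 3), Fraction(2, 3), Fraction(0, 1)], i.e. signature (1/3, 2/3, 0)
```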

87 citations


Journal ArticleDOI
TL;DR: A series of different copula models providing various residual dependence structures are considered for vectors of count response variables whose marginal distributions depend on covariates through negative binomial regressions.
Abstract: Multivariate count data occur in several different disciplines. However, existing models do not offer great flexibility for dependence modeling. Models based on copulas nowadays are widely used for continuous data dependence modeling. Modeling count data via copulas is still in its infancy; see the recent article of Genest and Neslehova (2007). A series of different copula models providing various residual dependence structures are considered for vectors of count response variables whose marginal distributions depend on covariates through negative binomial regressions. A real data application related to the number of purchases of different products is provided.
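
The copula families and regression structure studied in the article are not reproduced here. As a minimal sketch of the general idea, the code below simulates a bivariate count vector by pushing Gaussian-copula uniforms through negative binomial quantile functions; the Gaussian copula and the fixed (non-regression) margins are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, rho = 5000, 0.6

# Gaussian copula: correlated normals -> uniforms.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
u = stats.norm.cdf(z)

# Negative binomial margins (size and success probability chosen for illustration).
y1 = stats.nbinom.ppf(u[:, 0], 3, 0.4)
y2 = stats.nbinom.ppf(u[:, 1], 5, 0.6)

print("sample means:", y1.mean(), y2.mean())
print("Spearman correlation:", stats.spearmanr(y1, y2)[0])
```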

77 citations


Journal ArticleDOI
TL;DR: A spectral domain method for handling time series of unequal length is proposed to make the spectral estimates comparable by producing statistics at the same frequency.
Abstract: In statistical data analysis it is often important to compare, classify, and cluster different time series. For these purposes various methods have been proposed in the literature, but they usually assume time series with the same sample size. In this article, we propose a spectral domain method for handling time series of unequal length. The method makes the spectral estimates comparable by producing statistics at the same frequencies. The procedure is compared with other methods proposed in the literature by a Monte Carlo simulation study. As an illustrative example, the proposed spectral method is applied to cluster industrial production series of some developed countries.
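
The article's exact construction is not reproduced here; the sketch below shows one simple way (an assumption chosen for illustration, not the proposed procedure) to make spectra of unequal-length series comparable: estimate each series' smoothed periodogram and interpolate it onto a common grid of frequencies before computing a distance.

```python
import numpy as np
from scipy.signal import welch

def spectrum_on_grid(x, grid, fs=1.0):
    """Smoothed periodogram of x interpolated onto a common frequency grid."""
    f, pxx = welch(x, fs=fs, nperseg=min(64, len(x)))
    return np.interp(grid, f, pxx)

rng = np.random.default_rng(3)

def ar1(n, phi):
    """Simulate a simple AR(1) series of length n."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

x1, x2 = ar1(300, 0.8), ar1(500, 0.8)                # unequal lengths
grid = np.linspace(0.01, 0.5, 50)                    # common frequencies
s1, s2 = spectrum_on_grid(x1, grid), spectrum_on_grid(x2, grid)
# A simple log-spectral distance between the two series.
print("distance:", np.sqrt(np.mean((np.log(s1) - np.log(s2)) ** 2)))
```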

59 citations


Journal ArticleDOI
TL;DR: Variations of a method derived by Zou performed relatively well in simulations for the dependent case; for the independent case, the Wilcox–Muska method performed best.
Abstract: Methods for computing a confidence interval for the difference between two Pearson correlations are compared when dealing with nonnormality and heteroscedasticity. Variations of a method derived by Zou performed relatively well in simulations for the dependent case. For the independent case, the Wilcox–Muska method performed best.
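
For the independent case, Zou's (2007) approach combines the Fisher-z confidence limits of the two correlations; the sketch below is a minimal implementation of that idea only (the dependent-case variations studied in the article and the Wilcox–Muska method are not shown).

```python
import numpy as np
from scipy import stats

def fisher_ci(r, n, alpha=0.05):
    """Fisher-z confidence interval for a single Pearson correlation."""
    z = np.arctanh(r)
    half = stats.norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    return np.tanh(z - half), np.tanh(z + half)

def zou_diff_ci(r1, n1, r2, n2, alpha=0.05):
    """Zou-style CI for rho1 - rho2 with two independent samples (illustrative)."""
    l1, u1 = fisher_ci(r1, n1, alpha)
    l2, u2 = fisher_ci(r2, n2, alpha)
    d = r1 - r2
    lower = d - np.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = d + np.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper

print(zou_diff_ci(0.5, 60, 0.2, 80))
```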

58 citations


Journal ArticleDOI
TL;DR: A variables sampling plan based on L_e is proposed to handle processes requiring low process loss and provides a feasible policy, which can be applied to products requiring low process loss where classical sampling plans cannot be applied.
Abstract: For the implementation of an acceptance sampling plan, a problem quality practitioners have to deal with is the determination of the critical acceptance values and inspection sample sizes that provide the desired levels of protection to both vendors and buyers. Traditionally, most acceptance sampling plans focus on the percentage of defective products rather than on the process loss, and therefore do not distinguish among products that fall within the specification limits. However, the quality of products that fall within the specification limits may still differ considerably, so it is necessary to design acceptance sampling plans that take process loss into consideration. In this article, a variables sampling plan based on L_e is proposed to handle processes requiring low process loss. The required sample sizes n and the critical acceptance value c for various combinations of acceptance quality levels are tabulated. The proposed sampling plan provides a feasible policy that can be applied to products requiring low process loss, where classical sampling plans cannot be applied.
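
For background (a commonly used definition of the process expected loss index; the article's notation may differ), with target value T and half specification width d the index is

$$ L_e = \frac{E\bigl[(X-T)^2\bigr]}{d^2} = \frac{(\mu - T)^2 + \sigma^2}{d^2}, $$

so smaller values indicate lower expected loss, and a plan of this type accepts a lot when the estimated index does not exceed the critical acceptance value c.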

48 citations


Journal ArticleDOI
TL;DR: The Archimedean two-parameter BB7 copula is adopted to describe the underlying dependence structure between two consecutive returns, while the log-Dagum distribution is employed to model the margins marked by skewness and kurtosis.
Abstract: In financial analysis it is useful to study the dependence between two or more time series as well as the temporal dependence in a univariate time series. This article is concerned with the statistical modeling of the dependence structure in a univariate financial time series using the concept of copula. We treat the series of financial returns as a first order Markov process. The Archimedean two-parameter BB7 copula is adopted to describe the underlying dependence structure between two consecutive returns, while the log-Dagum distribution is employed to model the margins marked by skewness and kurtosis. A simulation study is carried out to evaluate the performance of the maximum likelihood estimates. Furthermore, we apply the model to the daily returns of four stocks and, finally, we illustrate how its fitting to data can be improved when the dependence between consecutive returns is described through a copula function.
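
For reference (a standard form from the copula literature, not quoted from the article, whose parameterization may differ), the BB7 (Joe–Clayton) copula with parameters θ ≥ 1 and δ > 0 can be written as

$$ C(u,v) = 1 - \Bigl\{ 1 - \bigl[ \bigl(1-(1-u)^{\theta}\bigr)^{-\delta} + \bigl(1-(1-v)^{\theta}\bigr)^{-\delta} - 1 \bigr]^{-1/\delta} \Bigr\}^{1/\theta}, $$

where θ governs upper-tail dependence and δ governs lower-tail dependence, which is why it is attractive for asymmetric dependence between consecutive returns.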

37 citations


Journal ArticleDOI
TL;DR: The comparison shows that the DS chart performs as well as the VP chart and is more effective than the Shewhart chart in detecting small process mean shifts.
Abstract: In this study, the performance of the double sampling (DS) chart under non-normality is presented and compared with the Shewhart chart and the VP chart. The comparison shows that the DS chart performs as well as the VP chart and is more effective than the Shewhart chart in detecting small process mean shifts.

Journal ArticleDOI
TL;DR: The main contribution of this study lies in showing that the Kenward–Roger method corrects the liberal Type I error rates obtained with the Between–Within and Satterthwaite approaches, especially with positive pairings between group sizes and covariance matrices.
Abstract: This research examines the Type I error rates obtained when using the mixed model with the Kenward–Roger correction and compares them with the Between–Within and Satterthwaite approaches in split-plot designs. A simulation study was conducted to generate repeated measures data with small samples under normal distribution conditions. The data were obtained via three covariance matrices (unstructured, heterogeneous first-order auto-regressive, and random coefficients), the one with the best fit being selected according to the Akaike criterion. The results of the simulation study showed the Kenward–Roger test to be more robust, particularly when the population covariance matrices were unstructured or heterogeneous first-order auto-regressive. The main contribution of this study lies in showing that the Kenward–Roger method corrects the liberal Type I error rates obtained with the Between–Within and Satterthwaite approaches, especially with positive pairings between group sizes and covariance matrices.

Journal ArticleDOI
TL;DR: Pitman closeness of sample order statistics to population quantiles of a location-scale family of distributions is discussed and explicit expressions are derived for some specific families such as uniform, exponential, and power function.
Abstract: In this article, Pitman closeness of sample order statistics to population quantiles of a location-scale family of distributions is discussed. Explicit expressions are derived for some specific families such as uniform, exponential, and power function. Numerical results are then presented for these families for sample sizes n = 10, 15, and for the choices of p = 0.10, 0.25, 0.75, 0.90. The Pitman-closest order statistic is also determined in these cases and presented.
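
For context (the standard definition of the criterion, not quoted from the article), the order statistic X_(i) is Pitman-closer to the population quantile ξ_p than X_(j) if

$$ \Pr\bigl( \lvert X_{(i)} - \xi_p \rvert < \lvert X_{(j)} - \xi_p \rvert \bigr) \ge \tfrac{1}{2}, $$

and the Pitman-closest order statistic is the one that is Pitman-closer than every other order statistic in the sample.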

Journal ArticleDOI
TL;DR: This study considers using an estimate of the variance of sample median and applying the bootstrap methods to determine the control limits of the median control chart.
Abstract: In this study, we propose a median control chart. In order to determine the control limits, we consider using an estimate of the variance of the sample median. We also consider applying bootstrap methods. We then illustrate the proposed median control chart with an example and compare the bootstrap methods by a simulation study. Finally, we discuss some peculiar features of the median control chart as concluding remarks.
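
The article's construction of the limits is not reproduced here; the sketch below shows one simple percentile-bootstrap way (an illustrative assumption, not the proposed chart) to set control limits for subgroup medians from an in-control reference sample.

```python
import numpy as np

def bootstrap_median_limits(reference, n_sub, alpha=0.0027, B=20000, seed=0):
    """Percentile-bootstrap control limits for the median of subgroups of size n_sub,
    resampled from an in-control reference sample (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    meds = np.median(rng.choice(reference, size=(B, n_sub), replace=True), axis=1)
    return np.quantile(meds, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(4)
reference = rng.normal(10.0, 2.0, size=200)           # in-control data
lcl, ucl = bootstrap_median_limits(reference, n_sub=5)
new_subgroup = rng.normal(11.5, 2.0, size=5)          # possibly shifted process
print("limits:", lcl, ucl, "signal:", not lcl <= np.median(new_subgroup) <= ucl)
```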

Journal ArticleDOI
TL;DR: Power and size of some tests for exogeneity of a binary explanatory variable in count models are investigated by conducting extensive Monte Carlo simulations; the results indicate that the tests that are simpler to estimate often outperform tests that are more demanding.
Abstract: This article investigates power and size of some tests for exogeneity of a binary explanatory variable in count models by conducting extensive Monte Carlo simulations. The tests under consideration are Hausman contrast tests as well as univariate Wald tests, including a new test of notably easy implementation. Performance of the tests is explored under misspecification of the underlying model and under different conditions regarding the instruments. The results indicate that often the tests that are simpler to estimate outperform tests that are more demanding. This is especially the case for the new test.

Journal ArticleDOI
TL;DR: Simulation results show that the proposed parametric bootstrap (PB) approach for testing equality of several inverse Gaussian means with unknown and arbitrary variances performs very satisfactorily regardless of the number of samples and sample sizes.
Abstract: The inverse Gaussian distribution provides a flexible model for analyzing positive, right-skewed data. The generalized variable test for equality of several inverse Gaussian means with unknown and arbitrary variances has satisfactory Type-I error rate when the number of samples (k) is small (Tian, 2006). However, the Type-I error rate tends to be inflated when k goes up. In this article, we propose a parametric bootstrap (PB) approach for this problem. Simulation results show that the proposed test performs very satisfactorily regardless of the number of samples and sample sizes. This method is illustrated by an example.

Journal ArticleDOI
TL;DR: A synthetic scaled weighted variance (synthetic SWV-X̄) control chart is proposed to monitor the process mean of skewed populations and to improve the detection of a negative shift in the mean.
Abstract: In this article, a synthetic scaled weighted variance (synthetic SWV-X̄) control chart is proposed to monitor the process mean of skewed populations. This control chart is an improvement over the synthetic weighted variance (synthetic WV-X̄) chart suggested by Khoo et al. (2008) in the detection of a negative shift in the mean. A comparison between the performances of the synthetic SWV-X̄ and synthetic WV-X̄ charts is made in terms of the average run length (ARL) values for various levels of skewness as well as different magnitudes of positive and negative shifts in the mean. A method to construct the synthetic SWV-X̄ chart is explained in detail. An illustrative example is also given to show the implementation of the synthetic SWV-X̄ chart.

Journal ArticleDOI
TL;DR: It is numerically shown that the biases of variance component estimates by PQL are systematically related to the biases of regression coefficient estimates by PQL, and that the biases increase as the random effects become more heterogeneous.
Abstract: The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized linear mixed model (GLMM). However, it has been noticed that the PQL tends to underestimate variance components as well as regression coefficients in the previous literature. In this article, we numerically show that the biases of variance component estimates by PQL are systematically related to the biases of regression coefficient estimates by PQL, and also show that the biases of variance component estimates by PQL increase as random effects become more heterogeneous.

Journal ArticleDOI
TL;DR: Assessment of the performance of asymptotic confidence intervals for Zenga's new inequality measure shows that the coverage accuracy and the size of the confidence intervals for the two measures are very similar in samples from economic size distributions.
Abstract: This work aims at assessing, by simulation methods, the performance of asymptotic confidence intervals for Zenga's new inequality measure. The results are compared with those obtained on Gini's measure, perhaps the most widely used index for measuring inequality in income and wealth distributions. Our findings show that the coverage accuracy and the size of the confidence intervals for the two measures are very similar in samples from economic size distributions.

Journal ArticleDOI
TL;DR: The VP control scheme generally has better performance in detecting small mean shifts than the standard and the other adaptive charts, but when the observations are highly autocorrelated, the complexity of the VP chart has a negative effect on its performance.
Abstract: The variable parameters (VP) control chart varies all control parameters from the current sample information, and results in more effective monitoring based on statistical and economic criteria. The usual assumption for designing a control chart is that the observations from the process are independent. However, for many processes, such as chemical processes, consecutive measurements are often highly correlated. In the present article, the observations are modeled as an AR(1) process plus a random error, and the properties of the VP charts are evaluated and studied under this model. Based on the study, the VP control scheme generally has better performance in detecting small mean shifts than the standard and the other adaptive charts. However, when the observations are highly autocorrelated, the complexity of the VP chart has a negative effect on its performance.

Journal ArticleDOI
TL;DR: This article shows that the two alternative optimal thresholds obtained by using the true rate are identical, and that this single threshold coincides with the score corresponding to the Kolmogorov–Smirnov statistic used to test the homogeneity of the distribution functions of the defaults and non-defaults.
Abstract: Receiver Operating Characteristic (ROC) and Cumulative Accuracy Profile (CAP) curves are used to assess the discriminatory power of different credit-rating approaches. The thresholds of optimal classification accuracy on an ROC curve and of maximal profit on a CAP curve can be found by using iso-performance tangent lines, which are based on the standard notion of accuracy. In this article, we propose another accuracy measure called the true rate. Using this rate, one can obtain alternative optimal thresholds on both ROC and CAP curves. For most real populations of borrowers, the number of defaults is much smaller than the number of non-defaults, and in such cases using the true rate may be more efficient than using the accuracy rate in terms of cost functions. Moreover, it is shown that the two alternative optimal thresholds obtained using the true rate are identical, and that this single threshold coincides with the score corresponding to the Kolmogorov–Smirnov statistic used to test the homogeneity of the distribution functions of the defaults and non-defaults.
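
The cost-function analysis of the article is not reproduced here; the sketch below only illustrates the stated connection, using simulated scores: the score at which the Kolmogorov–Smirnov distance between the default and non-default score distributions is attained is the threshold that maximizes TPR − FPR.

```python
import numpy as np

def ks_threshold(scores_default, scores_nondefault):
    """Score maximizing the KS distance between the two empirical CDFs,
    equivalently maximizing TPR - FPR over all thresholds."""
    grid = np.unique(np.concatenate([scores_default, scores_nondefault]))
    # Low scores are treated as 'bad': TPR = share of defaults at or below the threshold.
    tpr = np.searchsorted(np.sort(scores_default), grid, side="right") / len(scores_default)
    fpr = np.searchsorted(np.sort(scores_nondefault), grid, side="right") / len(scores_nondefault)
    best = np.argmax(tpr - fpr)
    return grid[best], (tpr - fpr)[best]

rng = np.random.default_rng(5)
defaults = rng.normal(-1.0, 1.0, size=200)            # defaults receive lower scores
nondefaults = rng.normal(1.0, 1.0, size=2000)
print(ks_threshold(defaults, nondefaults))
```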

Journal ArticleDOI
TL;DR: This article proposes the application of the Chen (1995) t-test modification to the EL ratio test and shows that the Chen approach leads to a location change of the observed data, whereas the classical Bartlett method is known to be a scale correction of the data distribution.
Abstract: The empirical likelihood (EL) technique has been well addressed in both the theoretical and applied literature in the context of powerful nonparametric statistical methods for testing and interval estimation. A nonparametric version of Wilks' theorem (Wilks, 1938) can usually provide an asymptotic evaluation of the Type I error of EL ratio-type tests. In this article, we examine the performance of this asymptotic result when the EL is based on finite samples that are from various distributions. In the context of Type I error control, we show that the classical EL procedure and the Student's t-test have asymptotically a similar structure. Thus, we conclude that modifications of t-type tests can be adopted to improve the EL ratio test. We propose the application of the Chen (1995) t-test modification to the EL ratio test. We show that the Chen approach leads to a location change of the observed data, whereas the classical Bartlett method is known to be a scale correction of the data distribution. Finally,...

Journal ArticleDOI
TL;DR: In general, the simulation results show that the proposed chart performs better than the existing multivariate charts for skewed populations and the standard T² chart, in terms of false alarm rates as well as moderate and large mean shift detection rates for various degrees of skewness.
Abstract: This article proposes a multivariate synthetic control chart for skewed populations based on the weighted standard deviation method. The proposed chart incorporates the weighted standard deviation method into the standard multivariate synthetic control chart. The standard multivariate synthetic chart consists of the Hotelling's T² chart and the conforming run length chart. The weighted standard deviation method adjusts the variance–covariance matrix of the quality characteristics and approximates the probability density function using several multivariate normal distributions. The proposed chart reduces to the standard multivariate synthetic chart when the underlying distribution is symmetric. In general, the simulation results show that the proposed chart performs better than the existing multivariate charts for skewed populations and the standard T² chart, in terms of false alarm rates as well as moderate and large mean shift detection rates for various degrees of skewness.

Journal ArticleDOI
TL;DR: A nonlinear logistic discriminant model is introduced based on Gaussian basis functions constructed by the self-organizing map based on information-theoretic and Bayesian approaches for multi-class classification methods for analyzing data with complex structure.
Abstract: We consider the problem of constructing multi-class classification methods for analyzing data with complex structure. A nonlinear logistic discriminant model is introduced based on Gaussian basis functions constructed by the self-organizing map. In order to select adjusted parameters, we employ model selection criteria derived from information-theoretic and Bayesian approaches. Numerical examples are conducted to investigate the performance of the proposed multi-class discriminant procedure. Our modeling procedure is also applied to protein structure recognition in life science. The results indicate the effectiveness of our strategy in terms of prediction accuracy.

Journal ArticleDOI
TL;DR: This work uses a simulation study to investigate the performance of several sequential regression imputation methods when the error distribution is flat or heavy tailed, and suggests that all methods can perform poorly for the regression coefficient because they cannot accommodate extreme values well.
Abstract: Sequential regression multiple imputation has emerged as a popular approach for handling incomplete data with complex features. In this approach, imputations for each missing variable are produced based on a regression model using the other variables as predictors, in a cyclic manner. A normality assumption is frequently imposed on the error distributions in the conditional regression models for continuous variables, even though it rarely holds in real scenarios. We use a simulation study to investigate the performance of several sequential regression imputation methods when the error distribution is flat or heavy tailed. The methods evaluated include the sequential normal imputation and several of its extensions that adjust for non-normal error terms. The results show that all methods perform well for estimating the marginal mean and proportion, as well as the regression coefficient when the error distribution is flat or moderately heavy tailed. When the error distribution is strongly heavy tailed, all methods ...
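
The specific adjustments studied in the article are not reproduced here; the sketch below is a bare-bones sequential (chained) normal-error imputation loop under illustrative assumptions: each variable with missing values is regressed on the others in turn, and missing entries are replaced by predictions plus a normal draw.

```python
import numpy as np

def sequential_impute(X, n_cycles=10, seed=0):
    """Very small sketch of sequential regression imputation with normal errors.
    X is an (n, p) array with np.nan marking missing entries."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])   # initial fill with column means
    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(Z[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - Z[obs] @ beta
            sigma = resid.std(ddof=Z.shape[1])
            X[miss[:, j], j] = Z[miss[:, j]] @ beta + rng.normal(0, sigma, miss[:, j].sum())
    return X

rng = np.random.default_rng(6)
data = rng.multivariate_normal([0, 0, 0], np.eye(3) + 0.5, size=300)
data[rng.random(data.shape) < 0.15] = np.nan          # 15% missing completely at random
print(np.isnan(sequential_impute(data)).sum())        # 0 -> all entries imputed
```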

Journal ArticleDOI
TL;DR: Characteristics of the data, such as the covariance structure, parameter values, and sample size, greatly impacted the performance of the various model selection criteria, and no single criterion was consistently better than the others.
Abstract: Predictive criteria, including the adjusted squared multiple correlation coefficient, the adjusted concordance correlation coefficient, and the predictive error sum of squares, are available for model selection in the linear mixed model. These criteria all involve some sort of comparison of observed values and predicted values, adjusted for the complexity of the model. The predicted values can be conditional on the random effects or marginal, i.e., based on averages over the random effects. These criteria have not been investigated for model selection success. We used simulations to investigate selection success rates for several versions of these predictive criteria as well as several versions of Akaike's information criterion and the Bayesian information criterion, and the pseudo F-test. The simulations involved the simple scenario of selection of a fixed parameter when the covariance structure is known. Several variance–covariance structures were used. For compound symmetry structures, higher success r...

Journal ArticleDOI
TL;DR: This article presents a parametric bootstrap (PB) approach and compares its performance to that of another simulation-based approach, namely, the generalized variable approach to testing equality of regression coefficients in several regression models.
Abstract: Testing equality of regression coefficients in several regression models is a common problem encountered in many applied fields. This article presents a parametric bootstrap (PB) approach and compares its performance to that of another simulation-based approach, namely, the generalized variable approach. Simulation studies indicate that the PB approach controls the Type I error rates satisfactorily regardless of the number of regression models and sample sizes, whereas the generalized variable approach tends to be very liberal as the number of regression models goes up. The proposed PB approach is illustrated using a data set from a stability study.

Journal ArticleDOI
TL;DR: The simulation results show that if the SDs are missing under the Missing Completely at Random and Missing at Random mechanisms, imputation is recommended; with non-random missingness, imputation can lead to overestimation of the SE of the estimate.
Abstract: A common problem in the meta-analysis of continuous data is that some studies do not report sufficient information to calculate the standard deviations (SDs) of the treatment effect. One approach to handling this problem is imputation. This article examines the empirical implications of imputing the missing SDs on the standard error (SE) of the overall meta-analysis estimate. The simulation results show that if the SDs are missing under the Missing Completely at Random and Missing at Random mechanisms, imputation is recommended. With non-random missingness, imputation can lead to overestimation of the SE of the estimate.

Journal ArticleDOI
TL;DR: An alternative test of discordancy in samples of univariate circular data is presented, based on the effect an outlier has on the sum of circular distances from the point of interest to all other points; it performs relatively better than other known tests.
Abstract: In this article, we present an alternative test of discordancy in samples of univariate circular data. The new technique is based on the effect the existence of an outlier has on the sum of circular distances from the point of interest to all other points. The percentage points are calculated and the performance is examined. We compare the performance of the test in detecting an outlier with that of other tests and show that the new approach performs relatively better than other known tests. As an illustration, a practical example is presented.
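
The percentage points of the article are not reproduced here; the sketch below only computes the underlying quantity as described: for each observation, sum the circular distances to all other points, and flag the observation with the largest sum as the candidate outlier (the von Mises sample and the planted outlier are illustrative assumptions).

```python
import numpy as np

def circular_distance_sums(theta):
    """Sum of circular distances (in radians) from each point to all others."""
    diff = np.abs(theta[:, None] - theta[None, :])
    d = np.pi - np.abs(np.pi - diff)                  # circular distance in [0, pi]
    return d.sum(axis=1)

rng = np.random.default_rng(7)
sample = rng.vonmises(mu=0.0, kappa=5.0, size=20)     # concentrated circular data
sample = np.append(sample, np.pi)                     # plant a point on the opposite side
sums = circular_distance_sums(sample)
print("candidate outlier index:", np.argmax(sums), "statistic:", sums.max())
```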

Journal ArticleDOI
Marco Bee
TL;DR: A defensive mixture is used and a method of choosing its parameters via the EM algorithm is developed; the technique that assumes the importance sampling density belongs to the same parametric family as the random variables to be summed is also considered.
Abstract: In this paper we use Importance Sampling to estimate tail probabilities for a finite sum of lognormal distributions. We use a defensive mixture, and develop a method of choosing the parameters via the EM algorithm; we also consider the technique which assumes the importance sampling density to belong to the same parametric family as the random variables to be summed. In both cases, the instrumental density is found by minimizing Cross-Entropy. A comparison based on several simulation experiments shows that the defensive mixture has the best performance. Finally, we study the Poisson-lognormal compound distribution framework and present a real-data application.
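
The EM-based parameter choice and the cross-entropy minimization are not reproduced here; the sketch below only illustrates the defensive-mixture idea under illustrative parameters: each summand is drawn from a mixture αp + (1−α)g, where p is the nominal lognormal density and g is a shifted, heavier proposal, and each draw is weighted by the ratio of nominal to mixture density.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
d, mu, sigma = 5, 0.0, 1.0                            # nominal: sum of 5 LN(0,1) terms
threshold, n = 50.0, 200_000
alpha, mu_g = 0.3, 2.0                                # defensive weight and shifted proposal

p = stats.lognorm(s=sigma, scale=np.exp(mu))          # nominal component density
g = stats.lognorm(s=sigma, scale=np.exp(mu_g))        # heavier proposal (illustrative)

# Sample each component from the mixture alpha*p + (1 - alpha)*g.
from_p = rng.random((n, d)) < alpha
x = np.where(from_p, p.rvs((n, d), random_state=rng), g.rvs((n, d), random_state=rng))

# Importance weights: product over components of p(x) / (alpha*p(x) + (1 - alpha)*g(x)).
w = np.prod(p.pdf(x) / (alpha * p.pdf(x) + (1 - alpha) * g.pdf(x)), axis=1)
est = np.mean(w * (x.sum(axis=1) > threshold))

crude = np.mean(p.rvs((n, d), random_state=rng).sum(axis=1) > threshold)
print("IS estimate:", est, "crude Monte Carlo:", crude)
```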