
Showing papers in "Journal of statistical theory and practice in 2018"


Journal ArticleDOI
TL;DR: In this article, the authors proposed a systematic sampling method for populations with a periodic component and showed that the efficiency of systematic sampling estimators is highly dependent on the relation between the length of the period and the sampling interval.
Abstract: Systematic sampling is one of the most prevalent sampling techniques. The popularity of the systematic design is mainly due to its practicality. Compared with simple random sampling, it is easier to draw a systematic sample, especially when the selection of sample units is done in the field. In addition, systematic sampling can provide more precise estimators than simple random sampling when explicit or implicit stratification is present in the sampling frame. However, the systematic design has two major drawbacks. First, if the population size is not an integral multiple of the desired sample size, the actual sample size will be random. Second, a single systematic sample cannot provide an unbiased estimator for the sampling variance. Another limitation in the systematic design is that for populations with a periodic component, the efficiency of systematic sampling estimators will be highly dependent on the relation between the length of the period and the sampling interval. In the literature, one...
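A minimal sketch of the first drawback mentioned above, with hypothetical numbers: under linear systematic sampling with interval k = ⌊N/n⌋, the realized sample size depends on the random start whenever N is not an integral multiple of k.

```python
import numpy as np

def systematic_sample(N, k, rng):
    """Linear systematic sample: random start r in 0..k-1, then every k-th unit."""
    r = rng.integers(k)           # random start
    return np.arange(r, N, k)     # indices of the selected units

rng = np.random.default_rng(1)
N, n = 103, 10                    # N is not an integral multiple of n
k = N // n                        # sampling interval

sizes = {len(systematic_sample(N, k, rng)) for _ in range(1000)}
print(sizes)                      # realized sample size is random: {10, 11}
```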

29 citations


Journal ArticleDOI
TL;DR: In this work, a unified measure of model quality for a quantitative RRT model is proposed and several competing models with respect to this measure are compared.
Abstract: Model efficiency and respondent privacy are two important considerations while comparing two randomized response technique (RRT) models. Either the model efficiency level is kept fixed and the model offering greater privacy is preferred, or the privacy level is kept fixed and we prefer the model with better efficiency. In our current work, we propose a unified measure of model quality for a quantitative RRT model and compare several competing models with respect to this measure.
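The unified measure itself is not reproduced here, but a small simulation under an assumed additive scrambling RRT model (reported response Z = Y + S with independent scrambling noise S, a standard quantitative RRT setup) illustrates the trade-off the abstract describes: larger scrambling variance increases respondent privacy, often quantified as E[(Z − Y)²], at the cost of a less efficient mean estimator.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 4000
mu_y, sd_y = 50.0, 10.0                  # hypothetical sensitive variable

for sd_s in (2.0, 5.0, 10.0):            # scrambling noise levels (assumed)
    estimates = []
    for _ in range(reps):
        y = rng.normal(mu_y, sd_y, n)    # true sensitive values
        s = rng.normal(0.0, sd_s, n)     # scrambling noise with E[S] = 0
        z = y + s                        # reported responses
        estimates.append(z.mean())       # unbiased for mu_y since E[S] = 0
    privacy = sd_s ** 2                  # E[(Z - Y)^2] under this model
    print(f"privacy={privacy:6.1f}  var(estimator)={np.var(estimates):.3f}")
```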

27 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed to use an expectation maximization (EM) algorithm to estimate the unknown parameters for the univariate and bivariate classes of discrete generalized exponential distributions.
Abstract: In 1997, Marshall and Olkin introduced a very powerful method of adding an extra parameter to a class of continuous distribution functions, bringing more flexibility to the model. They demonstrated their method for the exponential and Weibull classes. In the same paper they briefly indicated its bivariate extension. The main aim of this article is to apply the same method, for the first time, to the class of discrete generalized exponential distributions, both for the univariate and bivariate cases. We investigate several properties of the proposed univariate and bivariate classes. The univariate class has three parameters, whereas the bivariate class has five parameters. It is observed that depending on the parameter values, the univariate class can be zero inflated as well as heavy tailed. We propose to use an expectation-maximization (EM) algorithm to estimate the unknown parameters. Small simulation experiments have been performed to see the effectiveness of the proposed EM algorithm, and a bivariate data set has been analyzed; it is observed that the proposed models and the EM algorithm work quite well in practice.
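For reference, the Marshall-Olkin device takes a baseline survival function S(x) to αS(x) / (1 − (1 − α)S(x)) for a new parameter α > 0, recovering the baseline at α = 1. A minimal sketch applying it to a discrete baseline (a geometric pmf, chosen purely for illustration; the paper's baseline is the discrete generalized exponential):

```python
import numpy as np

def mo_survival(surv, alpha):
    """Marshall-Olkin extension of a baseline survival function."""
    return alpha * surv / (1.0 - (1.0 - alpha) * surv)

x = np.arange(15)
p = 0.3
base_surv = (1 - p) ** (x + 1)           # geometric baseline: S(x) = P(X > x)

for alpha in (0.5, 1.0, 2.0):
    s = mo_survival(base_surv, alpha)
    pmf = np.concatenate(([1.0 - s[0]], s[:-1] - s[1:]))  # pmf from survival
    print(alpha, np.round(pmf[:5], 4))   # alpha = 1 recovers the baseline pmf
```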

20 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed Stein type two-stage and Chow and Robbins type purely sequential strategies to estimate the unknown variance under a modified Linex loss function, controlling the associated risk function per unit cost by bounding it from above with a fixed preassigned positive number.
Abstract: In a normal distribution with its mean unknown, we have developed Stein type two-stage and Chow and Robbins type purely sequential strategies to estimate the unknown variance under a modified Linex loss function. We control the associated risk function per unit cost by bounding it from above with a fixed preassigned positive number. Under both proposed estimation strategies, we have emphasized (i) exact calculations of the distributions and moments of the stopping times as well as the biases and risks associated with our terminal estimators of the variance, along with (ii) selected asymptotic properties. In developing asymptotic second-order properties under the purely sequential estimation methodology, we have relied upon nonlinear renewal theory. We report extensive data analysis carried out via (i) exact calculations as well as (ii) simulations when requisite sample sizes range from small to moderate to large. Both estimation methodologies have been implemented and illustrated with the help of real data ...

11 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a more efficient sampling method of recruiting subjects for survival analysis using a Moving Extreme Ranked Set Sampling (MERSS) or an Extreme Ranked Set Sampling (ERSS) scheme, with ranking based on an easy-to-evaluate baseline auxiliary variable known to be associated with survival time.
Abstract: Survival data are time-to-event data, such as time to death, time to appearance of a tumor, or time to recurrence of a disease. Accelerated failure time (AFT) models provide a linear relationship between the log of the failure time and covariates that affect the expected time to failure by contracting or expanding the time scale. The AFT model has intensive application in the field of social, medical, behavioral, and public health sciences. In this article we propose a more efficient sampling method of recruiting subjects for survival analysis. We propose using a Moving Extreme Ranked Set Sampling (MERSS) or an Extreme Ranked Set Sampling (ERSS) scheme with ranking based on an easy-to-evaluate baseline auxiliary variable known to be associated with survival time. This article demonstrates that these approaches provide a more powerful testing procedure, as well as a more efficient estimate of hazard ratio, than that based on simple random sampling (SRS). Theoretical derivation and simulation studies are provided. The Iowa 65+ Rural Health Study data are used to illustrate the methods developed in this article.
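A sketch of the ERSS recruitment step under stated assumptions (set size k, ranking done only on a cheap auxiliary variable): half of the sets contribute their auxiliary minimum and half their maximum, and only the selected subjects are followed for survival.

```python
import numpy as np

def erss_select(aux, k, rng):
    """Extreme ranked set sampling: from k random sets of k units each,
    keep the auxiliary-minimum unit in the first k//2 sets and the
    auxiliary-maximum unit in the remaining sets."""
    idx = rng.permutation(len(aux))[: k * k].reshape(k, k)
    chosen = []
    for i, row in enumerate(idx):
        ordered = row[np.argsort(aux[row])]          # rank by auxiliary only
        chosen.append(ordered[0] if i < k // 2 else ordered[-1])
    return np.array(chosen)

rng = np.random.default_rng(3)
aux = rng.normal(size=500)     # e.g., a baseline biomarker (hypothetical)
print(erss_select(aux, k=4, rng=rng))    # indices of recruited subjects
```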

10 citations


Journal ArticleDOI
TL;DR: In this article, a multistage median ranked set sampling method is proposed for estimating the population ratio; the mean squared errors and bias equations of the suggested estimators are obtained, and the new estimators are compared with their counterparts under simple random sampling, ranked set sampling, and median ranked set sampling.
Abstract: This article aims to estimate the population ratio using a multistage median ranked set sampling method. The mean squared errors and bias equations of the suggested estimators are obtained. The new estimators are compared with their counterparts using simple random sampling, ranked set sampling, and median ranked set sampling methods. A real data set is used for illustration. The results revealed that the multistage median ranked set sampling estimators are approximately unbiased for the population ratio and that their efficiencies increase with the number of stages for a fixed sample size. Also, it is found that the multistage median ranked set sampling estimators are more efficient than their counterparts based on simple random sampling, ranked set sampling, and median ranked set sampling.
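A single-stage version of the selection step is easy to sketch for the ratio R = Ȳ/X̄, assuming an odd set size and perfect ranking on x; the multistage scheme of the paper iterates this selection over several stages.

```python
import numpy as np

def mrss_ratio(x, y, k, cycles, rng):
    """One-stage median RSS estimate of R = mean(y)/mean(x), ranking on x."""
    xs, ys = [], []
    for _ in range(cycles):
        for _ in range(k):                           # k sets per cycle
            pick = rng.choice(len(x), size=k, replace=False)
            med = pick[np.argsort(x[pick])][k // 2]  # median-ranked unit
            xs.append(x[med]); ys.append(y[med])
    return np.mean(ys) / np.mean(xs)

rng = np.random.default_rng(5)
x = rng.gamma(4.0, 2.0, 10_000)                      # hypothetical population
y = 3.0 * x + rng.normal(0.0, 2.0, x.size)
print("true R:", y.mean() / x.mean())
print("MRSS estimate:", mrss_ratio(x, y, k=5, cycles=20, rng=rng))
```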

10 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that if the regression model assumed by the synthetic data producer is correctly specified, then synthetic data have the same joint distribution as the original data, and therefore one can use standard regression methodology and software to analyze the synthesized data.
Abstract: In this article we show, under the normal multiple linear regression model, how synthetic data can be generated using the principle of sufficiency. An advantage of this approach is that if the regression model assumed by the synthetic data producer is correctly specified, then the synthetic data have the same joint distribution as the original data, and therefore one can use standard regression methodology and software to analyze the synthetic data. If the same regression model used to generate the synthetic data is also used for data analysis, and the data are analyzed using standard regression methodology, then the synthetic data yield identical inference to that of the original data. We also study the effects of overfitting or underfitting the linear regression model. We show that even if the data producer overspecifies the regression model when creating the synthetic data, the synthetic data will still have the same distribution as the original data, and hence valid inference can be obtained. However, if the data producer underspecifies the linear regression model, then one cannot expect to obtain valid inference from the synthetic data. The disclosure risk of the proposed method relative to a standard synthetic data method is also examined.
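A stripped-down sketch of model-based synthesis under the normal linear regression model: fit the model, then draw synthetic responses from the fitted conditional distribution and release them with the original covariates. This plug-in version only illustrates the general idea; the paper's sufficiency-based generator additionally conditions on the sufficient statistics, which is what yields the exact distributional match described above.

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(0.0, 1.5, n)        # "original" confidential data

# Fit the normal linear model assumed by the data producer.
bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ bhat) ** 2) / (n - X.shape[1])

# Synthetic responses drawn from the fitted model; X is released as-is.
y_syn = X @ bhat + rng.normal(0.0, np.sqrt(sigma2), n)

b_syn, *_ = np.linalg.lstsq(X, y_syn, rcond=None)
print(np.round(bhat, 3))                      # original-data fit
print(np.round(b_syn, 3))                     # synthetic-data fit, close
```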

9 citations


Journal ArticleDOI
TL;DR: In this paper, a normal semiparametric mixture regression model is proposed for longitudinal data, which contains one smooth term and a set of possible linear predictors, and model terms are estimated using the penalized likelihood method with the EM algorithm.
Abstract: A normal semiparametric mixture regression model is proposed for longitudinal data. The proposed model contains one smooth term and a set of possible linear predictors. Model terms are estimated using the penalized likelihood method with the EM algorithm. A computationally feasible alternative method that provides an approximate solution is also introduced. Simulation experiments and a real data example are used to illustrate the methods.
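The penalized smooth term is beyond a short sketch, but the parametric core, EM for a two-component normal mixture of linear regressions, fits in a few lines (all data and settings here are illustrative):

```python
import numpy as np

def em_mixture_regression(X, y, n_iter=200, seed=0):
    """EM for a two-component normal mixture of linear regressions."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(0.25, 0.75, len(y))        # initial responsibilities
    for _ in range(n_iter):
        comps = []
        for w in (r, 1.0 - r):                 # M-step: weighted least squares
            XtW = X.T * w                      # equivalent to X.T @ diag(w)
            b = np.linalg.solve(XtW @ X, XtW @ y)
            s2 = np.sum(w * (y - X @ b) ** 2) / w.sum()
            comps.append((b, s2, w.mean()))
        dens = [pi * np.exp(-(y - X @ b) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
                for b, s2, pi in comps]
        r = dens[0] / (dens[0] + dens[1])      # E-step
    return comps

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 400)
X = np.column_stack([np.ones_like(x), x])
z = rng.random(400) < 0.5                      # latent component labels
y = np.where(z, 1 + 2 * x, 8 - x) + rng.normal(0, 0.7, 400)
for b, s2, pi in em_mixture_regression(X, y):
    print(np.round(b, 2), round(s2, 2), round(pi, 2))
```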

9 citations


Journal ArticleDOI
TL;DR: In this article, the design performance of orthogonal arrays in which one or more runs are missing at random is considered, and the performance of the 18-run ternary arrays is investigated.
Abstract: This article considers the design performance of orthogonal arrays in which one or more runs are missing at random. We focus on orthogonal arrays of index unity and on the 18-run ternary arrays.
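A small illustration of the kind of question studied: construct an index-unity ternary array OA(9, 3, 3, 2) over GF(3) and compare the determinant of the main-effects information matrix with and without a deleted run. The construction and the determinant-based measure are standard textbook choices, not necessarily the criteria used in the paper.

```python
import numpy as np
from itertools import product

# OA(9, 3, 3, 2) of index unity: columns a, b, a+b over GF(3).
runs = np.array([[a, b, (a + b) % 3] for a, b in product(range(3), repeat=2)])

def model_matrix(runs):
    """Dummy-coded main-effects matrix: intercept plus 2 columns per factor."""
    cols = [np.ones(len(runs))]
    for j in range(runs.shape[1]):
        for level in (1, 2):
            cols.append((runs[:, j] == level).astype(float))
    return np.column_stack(cols)

d_full = np.linalg.det(model_matrix(runs).T @ model_matrix(runs))
for drop in range(len(runs)):                # delete each run in turn
    X = model_matrix(np.delete(runs, drop, axis=0))
    print(drop, round(np.linalg.det(X.T @ X) / d_full, 4))
```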

9 citations


Journal ArticleDOI
TL;DR: In this paper, the defective Dagum distribution (DDD) is introduced to accommodate survival data in the presence of a cure fraction, which is defined as the proportion of patients who are cured of disease and become long-term survivors.
Abstract: In this article we introduce a new distribution, namely, the defective Dagum distribution (DDD). This improper distribution can be seen as an extension of the Type I Dagum distribution and it is useful to accommodate survival data in the presence of a cure fraction. In the applications of survival methods to medical data, the cure fraction is defined as the proportion of patients who are cured of disease and become long-term survivors. The great advantage of the DDD is that the cure fraction can be written as a function of only one parameter. We also considered the presence of censored data and covariates. Maximum likelihood and Bayesian methods for estimation of the model parameters are presented. A simulation study is provided to evaluate the performance of the maximum likelihood method in estimating parameters. In the Bayesian analysis, posterior distributions of the parameters are estimated using the Markov-chain Monte Carlo (MCMC) method. An example involving a real data set is presented. The model based on the new distribution is easy to use and it is a good alternative for the analysis of real time-to-event data in the presence of censored information and a cure fraction.

9 citations


Journal ArticleDOI
TL;DR: In this article, the authors considered the asymptotic distribution of Hotelling's T²-type test statistic when a two-step monotone missing data set is drawn from a multivariate normal population under a large-sample asymptotic framework.
Abstract: In this article, we consider the asymptotic distribution of Hotelling's T²-type test statistic when a two-step monotone missing data set is drawn from a multivariate normal population under a large-sample asymptotic framework. In particular, asymptotic expansions for the distribution and upper percentiles are derived using a perturbation method up to the terms of order n⁻¹, where n = N − 2 and N denotes the total sample size. Furthermore, making use of Fujikoshi's transformations, we also obtain Bartlett-type corrections of the test statistic considered in this article. Finally, we investigate the performance of the proposed approximation to the upper percentiles and the Bartlett-type correction for the test statistic by conducting Monte Carlo simulations for some selected parameters.

Journal ArticleDOI
TL;DR: In this paper, a progressive type-II censored sample from the generalized Pareto (GP) distribution is used to predict the remaining level exceedances by the River Nidd in North Yorkshire, England.
Abstract: The prediction of the unobserved units is typically based on the derivations of the predictive distributions of the individual observations. This technique is of little interest when one wishes to predict a function of missing or unobserved data such as the remaining testing time. In this article, based on a progressive type-II censored sample from the generalized Pareto (GP) distribution, we consider the problem of predicting times to failure of units in multiple stages. Importance sampling is used to estimate the model parameters, and Gibbs and Metropolis samplers are used to predict the testing times of the removed unfailed units. Data analyses involving the water-level exceedances by the River Nidd in North Yorkshire, England, have been performed and predictions of the total remaining level exceedances are discussed.
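For orientation, simulating a progressive Type-II censored sample from a generalized Pareto distribution can be done with the standard Balakrishnan and Sandhu uniform algorithm; the censoring scheme and parameters below are arbitrary, and the paper's importance-sampling and Gibbs/Metropolis prediction steps are not reproduced.

```python
import numpy as np

def progressive_type2_uniforms(R, rng):
    """Balakrishnan-Sandhu: uniform progressive Type-II order statistics
    for censoring scheme R = (R_1, ..., R_m)."""
    m = len(R)
    W = rng.uniform(size=m)
    tail_sums = np.cumsum(R[::-1])                    # R_m, R_m + R_{m-1}, ...
    V = W ** (1.0 / (np.arange(1, m + 1) + tail_sums))
    return 1.0 - np.cumprod(V[::-1])                  # U_1 < ... < U_m

def gp_quantile(u, shape, scale):
    """Generalized Pareto quantile function (shape != 0)."""
    return scale / shape * ((1.0 - u) ** (-shape) - 1.0)

rng = np.random.default_rng(21)
R = np.array([2, 0, 1, 0, 2])          # units withdrawn at each failure time
U = progressive_type2_uniforms(R, rng)
print(gp_quantile(U, shape=0.2, scale=1.0))   # increasing failure times
```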

Journal ArticleDOI
TL;DR: In this paper, a bias-corrected predictor for small-area quantities under a log-transformed Fay-Herriot model is proposed and shown to have both smaller bias and better efficiency than existing small-area predictors.
Abstract: Area-level models are often used for small-area estimation when auxiliary data are available only as area aggregates. A popular area-level model used for small-area estimation is the Fay-Herriot model. In many small-area applications, the Fay-Herriot model is fitted on a logarithm (log) scale and model parameters are estimated under this model. This is followed by back-transformation to obtain the estimates for small-area quantities in the original scale. However, back-transformation leads to biased estimates of small-area quantities. This article develops a bias-corrected predictor for small-area quantities under a log-transformed Fay-Herriot model. Simulation studies based on empirical data show that the bias-corrected small-area predictor has both smaller bias and better efficiency as compared to the existing small-area predictors.
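The source of the bias is the familiar lognormal-mean effect: if an estimate θ̂ is approximately normal on the log scale with variance v, then E[exp(θ̂)] = exp(θ + v/2), so naive back-transformation is biased upward. The toy correction below removes exactly that term; the paper's predictor is built for the Fay-Herriot setting and is more refined.

```python
import numpy as np

rng = np.random.default_rng(9)
theta, v = 2.0, 0.25                # true log-scale value, sampling variance
target = np.exp(theta)              # original-scale quantity of interest

theta_hat = rng.normal(theta, np.sqrt(v), 200_000)   # log-scale estimates
naive = np.exp(theta_hat)                            # naive back-transform
corrected = np.exp(theta_hat - v / 2)                # lognormal bias correction

print("target         :", target)
print("naive mean     :", naive.mean())       # about exp(theta + v/2), too big
print("corrected mean :", corrected.mean())   # about exp(theta)
```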

Journal ArticleDOI
TL;DR: Both Bayesian and frequentist approaches using a data-dependent Jeffreys-type prior to handle the monotone partial likelihood problem are developed and an efficient Markov-chain Monte Carlo algorithm is developed to carry out posterior computation.
Abstract: In medical studies, the monotone partial likelihood is frequently encountered in the analysis of time-to-event data using the Cox model. For example, with a binary covariate, the subjects can be classified into two groups. If the event of interest does not occur (zero event) for all the subjects in one of the groups, the resulting partial likelihood is monotone and consequently, the covariate effects are difficult to estimate. In this article, we develop both Bayesian and frequentist approaches using a data-dependent Jeffreys-type prior to handle the monotone partial likelihood problem. We first carry out an in-depth examination of the conditions of the monotone partial likelihood and then characterize sufficient and necessary conditions for the propriety of the Jeffreys-type prior. We further study several theoretical properties of the Jeffreys-type prior for the Cox model. In addition, we propose two variations of the Jeffreys-type prior: the shifted Jeffreys-type prior and the Jeffreys-type prior based on the first risk set. An efficient Markov-chain Monte Carlo algorithm is developed to carry out posterior computation. We perform extensive simulations to examine the performance of parameter estimates and demonstrate the applicability of the proposed method by analyzing real data from the SEER prostate cancer study.
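A compact numerical demonstration of the monotone-likelihood problem itself (not of the Jeffreys-type prior): with a binary covariate and zero events in one group, the Cox partial log-likelihood is strictly increasing in the coefficient, so no finite maximizer exists. The risk-set counts below are hypothetical.

```python
import numpy as np

# At each of five event times, all events occur in the x = 1 group;
# n1[i] subjects with x = 1 and n0[i] with x = 0 are still at risk.
n1 = np.array([5, 4, 3, 2, 1])
n0 = np.array([5, 5, 5, 5, 5])          # group 0 never fails (zero events)

def partial_loglik(beta):
    # each event has covariate x = 1: contribution beta - log(risk-set total)
    return np.sum(beta - np.log(n1 * np.exp(beta) + n0))

for beta in (0.0, 1.0, 2.0, 5.0, 10.0, 20.0):
    print(beta, round(partial_loglik(beta), 4))   # strictly increasing in beta
```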

Journal ArticleDOI
TL;DR: Due to the flexibility of its hazard rate function, the Lomax-exponential distribution provides a good alternative to some existing life distributions for modeling positive real data.
Abstract: Due to the flexibility of the hazard rate function of the Lomax-exponential distribution, it provides a good alternative for some existing life distributions in modelling positive real data...

Journal ArticleDOI
TL;DR: It has been found that the elastic net model performs better than latent variable models when considering fewer than ten principal components for each method, but Y-aware principal component regression predicts more accurately and captures more of the desired structure when the number of principal components increases to 20.
Abstract: DNA methylation of specific dinucleotides has been shown to be strongly linked with tissue age. The goal of this research is to explore different analysis techniques for microarray data in order to...
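A sketch contrasting the two pipelines on synthetic data, with scikit-learn assumed. "Y-aware" scaling is read here as rescaling each predictor by its univariate regression slope on the response before PCA, which is one common meaning of the term; the paper's exact pipeline may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(0)
n, p = 300, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]   # sparse signal
y = X @ beta + rng.normal(0.0, 1.0, n)

# Elastic net on the raw predictors.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

# Y-aware PCR: scale column j by its univariate slope on y, then run PCA.
slopes = np.array([np.polyfit(X[:, j], y, 1)[0] for j in range(p)])
scores = PCA(n_components=5).fit_transform(X * slopes)
pcr = LinearRegression().fit(scores, y)

print("elastic net in-sample R^2 :", round(enet.score(X, y), 3))
print("y-aware PCR in-sample R^2 :", round(pcr.score(scores, y), 3))
```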

Journal ArticleDOI
TL;DR: In this paper, the authors extended the traditional Cochran-Mantel-Haenszel (CMH) suite of tests applicable to tables of count data to test for higher order moment effects.
Abstract: The Cochran-Mantel-Haenszel (CMH) methodology is a suite of tests applicable to tables of count data. The traditional CMH tests assess association, mean, and correlation effects. Here, testing for mean effects is extended to testing for higher order moment effects. Of especial interest are dispersion effects, reflecting, for example, market segmentation in certain scenarios. Correlation testing is extended to testing for generalized correlations. Of especial interest are correlations of order (1, 2) and (2, 1), which may reveal umbrella effects.

Journal ArticleDOI
TL;DR: In this article, the asymptotic distributional behavior of a class of location-invariant reduced-bias tail index estimators is derived under a convenient third-order framework.
Abstract: Under a convenient third-order framework, the asymptotic distributional behavior of a class of location-invariant reduced-bias tail index estimators is derived. Such a class is based on the PORT methodology, with PORT standing for peaks over random thresholds, and combines a PORT version of one of the pioneering classes of minimum-variance reduced-bias tail index estimators with two classes of location-invariant estimators of adequate second-order parameters, recently introduced in the literature. An application to simulated Student-t data and to the log-exchange rates of the Euro against the U.S. dollar and the Euro against the GB pound is also provided.
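A minimal PORT-flavored sketch: shift the sample by a random threshold (an empirical quantile, which makes the statistic location invariant) and apply the ordinary Hill estimator to the top excesses. Only the basic, non-bias-reduced version is shown; the minimum-variance reduced-bias class and the second-order parameter estimators in the paper are more involved.

```python
import numpy as np

def port_hill(x, k, q=0.1):
    """Hill estimator applied to the sample shifted by its q-quantile (PORT)."""
    z = np.sort(x - np.quantile(x, q))    # peaks over a random threshold
    top = z[-(k + 1):]                    # X_(n-k) <= ... <= X_(n), all > 0 here
    return np.mean(np.log(top[1:] / top[0]))

rng = np.random.default_rng(13)
x = rng.standard_t(df=3, size=5000)       # Student-t(3): true tail index 1/3
for k in (50, 100, 200):
    print(k, round(port_hill(x, k), 3))
```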

Journal ArticleDOI
TL;DR: In this paper, a smooth alternative to the normal distribution is specified using Legendre polynomials, and the score statistic is derived under two scenarios: a common smooth alternative across different groups, or different smooth alternatives across the different groups.
Abstract: This article investigates the problem of simultaneously testing the normality and homoscedasticity assumptions in a linear fixed effects model when we have grouped data. This has been facilitated by the assumption of a smooth alternative to the normal distribution. The smooth alternative is specified using Legendre polynomials, and the score statistic is derived under two scenarios: a common smooth alternative across the different groups, or different smooth alternatives across the different groups. A data-driven approach available in the literature is used for determining the order of the polynomials. For the null distribution of the score statistic, the accuracy of the asymptotic chi-squared distribution is numerically investigated under a one-way fixed effects model with balanced and unbalanced data. The results are illustrated with an example.
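The building blocks can be shown directly: shifted Legendre polynomials give orthonormal score components on (0, 1), and a Neyman-type smooth statistic sums their squared standardized means. This is the generic one-sample version with a fixed order; the paper's grouped fixed-effects setting and data-driven order selection add machinery on top, and with estimated parameters the chi-squared reference is only approximate.

```python
import numpy as np
from scipy.special import eval_legendre
from scipy.stats import chi2, norm

def smooth_test_stat(x, order=4):
    """Neyman-type smooth test components from shifted Legendre polynomials."""
    u = norm.cdf((x - x.mean()) / x.std(ddof=1))   # PIT under a fitted normal
    stat = 0.0
    for j in range(1, order + 1):
        phi = np.sqrt(2 * j + 1) * eval_legendre(j, 2 * u - 1)  # orthonormal
        stat += len(x) * np.mean(phi) ** 2
    return stat

rng = np.random.default_rng(2)
for sample in (rng.normal(size=300), rng.exponential(size=300)):
    t = smooth_test_stat(sample)
    print(round(t, 2), "approx p =", round(chi2.sf(t, df=4), 4))
```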

Journal ArticleDOI
TL;DR: In this paper, the equality of several quantiles based on progressive Type II censored samples when the populations have two-parameter exponential distributions was considered and four approaches were presented: an approximation test, a parametric bootstrap, a generalized test approach, and a fiducial approach.
Abstract: To compare several populations, comparison of particular quantiles of these populations may be more relevant in some contexts. In this article, we consider testing equality of several quantiles based on progressive Type II censored samples when the populations have two-parameter exponential distributions. There is no approach for this problem in the literature. Here, we present four approaches: an approximation test, a parametric bootstrap, a generalized test approach, and a fiducial approach. The actual sizes and powers of these approaches are compared using Monte Carlo simulation. Finally, two examples are presented to illustrate these approaches.

Journal ArticleDOI
TL;DR: In this paper, a new flexible family of distributions defined by means of a quantile function is introduced; the proposed quantile function is the sum of the quantile functions of the half-logistic and exponential distributions.
Abstract: This article introduces a new flexible family of distributions, defined by means of a quantile function. The quantile function proposed is the sum of quantile functions of the half logistic and exp...
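Taking the truncated text at face value, the second summand is read here as an exponential quantile function; that reading is an assumption on our part. Quantile-function families are convenient because the sum of two quantile functions is again a quantile function and simulation is immediate by inverse transform:

```python
import numpy as np

def q_half_logistic(u, sigma):
    return sigma * np.log((1 + u) / (1 - u))   # half-logistic quantile

def q_exponential(u, lam):
    return -np.log(1 - u) / lam                # exponential quantile (assumed)

def q_family(u, sigma, lam):
    # sum of two quantile functions is increasing on (0, 1): valid quantile fn
    return q_half_logistic(u, sigma) + q_exponential(u, lam)

rng = np.random.default_rng(17)
u = rng.uniform(size=100_000)
x = q_family(u, sigma=1.0, lam=2.0)            # inverse-transform sampling
print(round(x.mean(), 3), np.round(np.quantile(x, [0.25, 0.5, 0.75]), 3))
```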

Journal ArticleDOI
TL;DR: Data do not support the association between aegeline and acute hepatitis and liver failure suggested by the data in Hawaii and a small number of spontaneous adverse reports in the mainland United States.
Abstract: The Centers for Disease Control and Prevention (CDC) identified an outbreak of acute hepatitis and liver failure in Hawaii in which a large proportion of patients were using the dietary supplement OxyELITE Pro, which contained aegeline, an extract from the bael tree and fruit. In response to Food and Drug Administration (FDA) regulatory action, USPLabs voluntarily recalled all OxyELITE Pro products and destroyed its remaining stock. To date, the majority of attention has focused on Hawaii despite the fact that the majority of the sales were in the mainland United States. Rates of acute hepatitis and liver failure were compared in equivalent 10-month periods before and after the introduction of aegeline to the product in the entire United States, prior to media attention related to the reported outbreak. The association between sales and liver injury was examined at the state level. Claims for ICD-9 code 570 (acute necrosis of the liver) obtained from both private insurance claims (MarketScan) and Medicaid were analyzed using mixed-effects Poisson regression models. Patients with private health insurance revealed a significant decrease in rates of acute hepatitis and liver failure following the introduction of aegeline in the United States and no association with per-capita sales. Patients with Medicaid showed no change in rate of acute hepatitis and liver failure following the introduction of aegeline. These data do not support the association between aegeline and acute hepatitis and liver failure suggested by the data in Hawaii and a small number of spontaneous adverse reports in the mainland United States.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new integer-valued autoregressive process with generalized Poisson difference marginal distributions based on difference of two quasi-binomial thinning operators.
Abstract: In this article, we propose a new integer-valued autoregressive process with generalized Poisson difference marginal distributions based on the difference of two quasi-binomial thinning operators. This model is suitable for data sets on ℤ = {..., -2, -1, 0, 1, 2,...} and can be viewed as a generalization of the Poisson difference INAR(1) process. An advantage of the difference of two generalized Poisson random variables is that it can have longer or shorter tails compared to the Poisson difference distribution. We present some basic properties of the process, such as mean, variance, skewness, and kurtosis, and conditional properties of the process are derived. Yule-Walker estimators are considered for the unknown parameters of the model, and a Monte Carlo simulation is presented to study the performance of the estimators. An application to a real data set is discussed to show the practical potential of our model.
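For context, the plain INAR(1) recursion with ordinary binomial thinning is easy to simulate. The sketch below uses that simpler operator as a stand-in; the paper instead takes the difference of two quasi-binomial thinning operators, which is what produces marginals on all of ℤ.

```python
import numpy as np

def inar1_binomial_thinning(alpha, lam, T, rng):
    """X_t = alpha ∘ X_{t-1} + eps_t with Poisson(lam) innovations."""
    x = np.empty(T, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))       # start near stationary mean
    for t in range(1, T):
        survivors = rng.binomial(x[t - 1], alpha)   # binomial thinning
        x[t] = survivors + rng.poisson(lam)
    return x

rng = np.random.default_rng(4)
path = inar1_binomial_thinning(alpha=0.6, lam=2.0, T=500, rng=rng)
print(path[:20])
print(path.mean())        # stationary mean lam / (1 - alpha) = 5
```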

Journal ArticleDOI
TL;DR: In this paper, a generalized estimator for finite population mean in stratified random sampling when observations are contaminated with measurement errors is introduced, and the bias and mean square error of the proposed family of estimators are derived.
Abstract: A generalized estimator is introduced for finite population mean in stratified random sampling when observations are contaminated with measurement errors. Many special cases of the proposed estimator are possible. The bias and mean square error of the proposed family of estimators are derived. The performance of the proposed estimator is evaluated both theoretically and empirically in the presence and absence of measurement error.
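A small simulation of the underlying phenomenon, with made-up strata: under additive measurement error the usual stratified mean stays unbiased, but its variance is inflated by the within-stratum error variances. The generalized estimator proposed in the article, which exploits auxiliary information, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(8)
Wh = np.array([0.5, 0.3, 0.2])        # stratum weights (hypothetical)
mu = np.array([10.0, 20.0, 40.0])     # stratum means; overall mean = 19
nh = np.array([40, 30, 20])           # stratum sample sizes

def stratified_mean_mc(sd_err, reps=5000):
    ests = []
    for _ in range(reps):
        est = 0.0
        for w, m, n in zip(Wh, mu, nh):
            y_true = rng.normal(m, 3.0, n)
            y_obs = y_true + rng.normal(0.0, sd_err, n)   # measurement error
            est += w * y_obs.mean()
        ests.append(est)
    return np.mean(ests), np.var(ests)

print("no error  (mean, var):", stratified_mean_mc(0.0))
print("with error(mean, var):", stratified_mean_mc(4.0))  # unbiased, inflated
```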

Journal ArticleDOI
TL;DR: In this paper, the authors considered the statistical inference of the six-parameter McDonald extended Weibull distribution (McEW) based on the progressively Type-II censored sample.
Abstract: In recent years, there have been many efforts to develop a new statistical distribution with more flexibility that can be fitted well to complex data. In this article we consider the statistical inference of the six-parameter McDonald extended Weibull distribution (McEW) based on the progressively Type-II censored sample. The maximum likelihood estimates (MLEs) of the six parameters and their asymptotic distribution are obtained. Based on the asymptotic distribution, the asymptotic confidence limits of its parameters can be computed. We also propose bootstrap confidence intervals of the parameters. The Bayes estimates and the associated highest posterior density credible intervals are computed using the Markov-chain Monte Carlo (MCMC) method, including the Gibbs sampling technique and Metropolis-Hastings algorithm. Simulation experiments are performed to compare the proposed methods and the corresponding confidence intervals under the different censoring schemes. Finally, concluding remarks are given.

Journal ArticleDOI
TL;DR: For large-scale one-sided sequential tests from independent panels, the detection of sparse signals is considered when the signals appear in only a small portion of the panels.
Abstract: For large-scale one-sided sequential tests from independent panels, the detection of sparse signals is considered when the signals only appear in a small portion of panels. By treating the ...

Journal ArticleDOI
TL;DR: The RB-G family of distributions as discussed by the authors is a special class of univariate distributions that has the same parameters as the baseline distribution plus an additional positive shape parameter a. In this paper, we focus on the characterizations of this family and discuss some structural properties.
Abstract: Over the last few decades, a significant development has been made toward the augmentation of some well-known lifetime distributions by various strategies. These newly developed models have enjoyed a considerable amount of success in modeling various real life phenomena. Motivated by this, Ristic and Balakrishnan developed a special class of univariate distributions. We call this family of distributions the RB-G family of distributions. The RB-G family has the same parameters as the baseline distribution plus an additional positive shape parameter a. Several RB-G distributions can be obtained from a specified G distribution. For a = 1, the baseline G distribution is a basic exemplar of the RB-G family, with a continuous crossover toward cases with various shapes. In this article we focus our attention on the characterizations of this family and discuss some structural properties of the bivariate RB-G family of distributions that are not discussed in detail by Ristic and Balakrishnan.
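Concretely, the Ristic and Balakrishnan generator can be written with the regularized lower incomplete gamma function as F(x) = 1 − γ(a, −log G(x))/Γ(a), which reduces to the baseline G at a = 1. A quick numerical check with an exponential baseline, chosen here only for illustration:

```python
import numpy as np
from scipy.special import gammainc    # regularized lower incomplete gamma

def rbg_cdf(x, a, base_cdf):
    """Ristic-Balakrishnan family: F(x) = 1 - gammainc(a, -log G(x))."""
    g = base_cdf(x)
    return 1.0 - gammainc(a, -np.log(g))

base = lambda x: 1.0 - np.exp(-x)     # exponential baseline G (illustrative)
x = np.linspace(0.1, 5.0, 5)
print(np.round(rbg_cdf(x, 1.0, base), 4))   # a = 1: baseline recovered
print(np.round(base(x), 4))
print(np.round(rbg_cdf(x, 2.0, base), 4))   # a = 2: shape changes
```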

Journal ArticleDOI
TL;DR: For three-level designs, the general minimum lower order confounding (GMC) criterion aims to choose optimal designs by treating the aliased component-number pattern (ACNP) as a set.
Abstract: For three-level designs, the general minimum lower order confounding (GMC) criterion aims to choose optimal designs by treating the aliased component-number pattern (ACNP) as a set. In this article, we develop some theoretical results of a three-level GMC criterion. The characterizations of three-level GMC designs are studied in terms of complementary sets. All GMC 3^(n−m) designs with N = 3^(n−m) runs and factor number n = (N − 3^r)/2 + i are constructed for r < n − m and i = 0, 1, 2, 3. Furthermore, the confounding information of lower order component effects of GMC 3^(n−m) designs is obtained.

Journal ArticleDOI
TL;DR: In this article, the second-order risk of estimators under the squared error loss function was studied and an improved estimator was proposed for the Weibull estimator with closed expressions.
Abstract: This article deals with improved estimation of a Weibull (a) shape parameter, (b) scale parameter, and (c) quantiles in a decision-theoretic setup. Though several convenient types of estimators have been proposed in the literature, we rely only on the maximum likelihood estimation of a parameter since it is based on the sufficient statistics (and hence there is no loss of information). However, the MLEs of the parameters just described do not have closed expressions, and hence studying their exact sampling properties analytically is impossible. To overcome this difficulty we follow the approach of second-order risk of estimators under the squared error loss function and study their second-order optimality. Among the interesting results that we have obtained, it has been shown that (a) the MLE of the shape parameter is always second-order inadmissible (and hence an improved estimator has been proposed); (b) the MLE of the scale parameter is always second-order admissible; and (c) the MLE of the p-t...
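The closed-form difficulty is concrete: the shape MLE solves a nonlinear profile score equation. A sketch of the standard profile-likelihood computation (the second-order risk corrections of the article are not shown):

```python
import numpy as np
from scipy.optimize import brentq

def weibull_mle(x):
    """Profile-likelihood MLE of the Weibull shape k and scale lam."""
    logx = np.log(x)

    def profile_score(k):         # root of this equation gives the shape MLE
        xk = x ** k
        return np.sum(xk * logx) / np.sum(xk) - 1.0 / k - logx.mean()

    k = brentq(profile_score, 0.05, 50.0)
    lam = np.mean(x ** k) ** (1.0 / k)
    return k, lam

rng = np.random.default_rng(6)
x = 1.5 * rng.weibull(2.5, size=400)    # true shape 2.5, scale 1.5
print(weibull_mle(x))                   # numerical MLE, no closed form
```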

Journal ArticleDOI
TL;DR: In this article, an improved class of estimators for estimating the finite population mean is proposed using supplementary information on an auxiliary attribute, and the mathematical expressions for the bias and mean squared error of the proposed estimator are derived under the first order of the approximation using simple random sampling.
Abstract: In this article, an improved class of estimators for estimating the finite population mean is proposed using supplementary information on an auxiliary attribute. The mathematical expressions for the bias and mean squared error of the proposed estimator are derived under the first order of the approximation using simple random sampling. An empirical study is conducted to investigate the performances of the proposed and the existing estimators. It turns out that the proposed estimator is considerably more efficient than the recent estimators suggested by Koyuncu and by Haq and Shabbir.
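For orientation, the classical Naik-Gupta-type ratio estimator using an auxiliary attribute (known population proportion P) is sketched below on made-up data; the class proposed in the article generalizes estimators of this kind.

```python
import numpy as np

rng = np.random.default_rng(15)
N, n = 20_000, 200
attr = (rng.random(N) < 0.4)                     # auxiliary attribute (0/1)
y = 5.0 + 4.0 * attr + rng.normal(0.0, 1.0, N)   # study variable, associated
P = attr.mean()                                  # known population proportion

est_srs, est_ratio = [], []
for _ in range(5000):
    s = rng.choice(N, n, replace=False)
    ybar, p = y[s].mean(), attr[s].mean()
    est_srs.append(ybar)                         # usual sample mean
    est_ratio.append(ybar * P / p)               # Naik-Gupta ratio estimator

print("true mean:", y.mean())
print("SRS   mean, var:", np.mean(est_srs), np.var(est_srs))
print("ratio mean, var:", np.mean(est_ratio), np.var(est_ratio))
```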