
Showing papers in "Communications in Statistics-theory and Methods in 2003"


Journal ArticleDOI
Kejian Liu1
TL;DR: Under severe collinearity, the shrinkage parameter selected by existing methods for ridge regression may not fully address the ill-conditioning problem, so a new two-parameter estimator is proposed to solve it.
Abstract: The linear regression model and the least squares method are widely used in many fields of the natural and social sciences. In the presence of collinearity, the least squares estimator is unstable and often gives misleading information. Ridge regression is the most common method for overcoming this problem. We find that when severe collinearity exists, the shrinkage parameter selected by existing methods for ridge regression may not fully address the ill-conditioning problem. To solve this problem, we propose a new two-parameter estimator. We show, using both theoretical results and simulation, that our new estimator has two advantages over ridge regression. First, our estimator has smaller mean squared error (MSE). Second, our estimator can fully address the ill-conditioning problem. A numerical example from the literature is used to illustrate the results.
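The shrinkage the abstract describes can be illustrated with ordinary ridge regression, β̂(k) = (X′X + kI)⁻¹X′y. The sketch below applies generic ridge to a deliberately collinear design; it is an illustration of shrinkage, not the paper's two-parameter estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
beta_true = np.array([1.0, 1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

XtX = X.T @ X
# OLS: unstable because X'X is severely ill conditioned
beta_ols = np.linalg.solve(XtX, X.T @ y)
# Ridge: (X'X + k I)^{-1} X'y with shrinkage parameter k
k = 1.0
beta_ridge = np.linalg.solve(XtX + k * np.eye(2), X.T @ y)

cond_ols = np.linalg.cond(XtX)
cond_ridge = np.linalg.cond(XtX + k * np.eye(2))
```

Adding kI lifts the small eigenvalues of X′X, which both shrinks the coefficient vector and sharply reduces the condition number.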

250 citations


Journal ArticleDOI
TL;DR: New classes of distributions based on the random variable X_t are defined and their interrelations studied; a new ordering based on the mean of X_t is introduced and its relationship with the reversed hazard rate ordering is established [cf. Shaked and Shanthikumar (1994)].
Abstract: If the random variable X denotes the lifetime (X ≥ 0, with probability one) of a unit, then the random variable X_t = (t − X | X ≤ t), for a fixed t > 0, is known as 'time since failure', which is analogous to the residual lifetime random variable used in reliability and survival analysis. The reversed hazard rate function, which is related to the random variable X_t, has received the attention of many researchers in the recent past [cf. Shaked, M., Shanthikumar, J. G. (1994). Stochastic Orders and Their Applications. New York: Academic Press]. In this paper, we define some new classes of distributions based on the random variable X_t and study their interrelations. We also define a new ordering based on the mean of the random variable X_t and establish its relationship with the reversed hazard rate ordering.

144 citations


Journal ArticleDOI
TL;DR: The exponentiated Weibull distribution, an important extension of the Weibull family, is reviewed with various new statistical measures; an explicit expression for the mode is derived, and a comparison between the authors' and Mudholkar and Hutson's results is tabulated for various values of the parameters of the distribution.
Abstract: The important extension of the Weibull family—the exponentiated Weibull distribution—is reviewed with various new statistical measures. An explicit expression for the mode is derived, and a comparison between the authors' and Mudholkar and Hutson's results is tabulated for various values of the parameters of the distribution. A general formula for the mean residual life function is obtained.

138 citations


Journal ArticleDOI
Dilip Roy1
TL;DR: In this paper, the authors proposed a discrete version of the continuous normal distribution for stochastic models of complex multicomponent systems made of normal variates and established a direct link between the discrete normal distribution and its continuous counterpart.
Abstract: The normal distribution has been playing a key role in stochastic modeling for a continuous setup. But its distribution function does not have an analytical form. Moreover, the distribution of a complex multicomponent system made of normal variates occasionally poses derivational difficulties. It may be worth exploring the possibility of developing a discrete version of the normal distribution so that it can be used for modeling discrete data. Keeping the above requirement in mind, we propose a discrete version of the continuous normal distribution. The Increasing Failure Rate property in the discrete setup has been ensured. Characterization results have also been obtained to establish a direct link between the discrete normal distribution and its continuous counterpart. The corresponding concept of a discrete approximator for the normal deviate has been suggested. An application of the discrete normal distributions for evaluating the reliability of complex systems has been elaborated as an alte...
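One natural way to discretize the normal distribution is to assign each integer k the normal probability mass on (k − ½, k + ½]. This is a hedged sketch of that generic construction, which may differ from Roy's exact definition:

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discrete_normal_pmf(k, mu=0.0, sigma=1.0):
    """P(K = k): the normal mass assigned to the interval (k - 1/2, k + 1/2]."""
    return norm_cdf(k + 0.5, mu, sigma) - norm_cdf(k - 0.5, mu, sigma)

support = range(-20, 21)
pmf = {k: discrete_normal_pmf(k) for k in support}
total = sum(pmf.values())                 # ~1 once the support covers the bulk of the mass
mean = sum(k * p for k, p in pmf.items())  # ~0 by symmetry
```

The resulting pmf inherits the unimodal, symmetric shape of its continuous counterpart.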

136 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the generalized ridge regression estimator with the generalized Liu estimator in the matrix mean square error sense in the linear regression model and show that the improved ridge and Liu estimators outperform the ordinary least squares estimates.
Abstract: Consider the linear regression model in the usual notation. In the presence of multicollinearity certain biased estimators like the ordinary ridge regression estimator and the Liu estimator introduced by Liu (Liu, Ke Jian. (1993). A new class of biased estimate in linear regression. Communications in Statistics-Theory and Methods 22(2):393–402) or improved ridge and Liu estimators are used to outperform the ordinary least squares estimates in the linear regression model. In this article we compare the (almost unbiased) generalized ridge regression estimator with the (almost unbiased) generalized Liu estimator in the matrix mean square error sense.

115 citations


Journal ArticleDOI
TL;DR: The choice sets in the D-optimal design for a choice experiment, for testing main effects and for testing main effects and two-factor interactions, are established when there are k attributes, each with two levels, for choice set size m.
Abstract: In this article we establish the choice sets in the D-optimal design for a choice experiment for testing main effects and for testing main effects and two-factor interactions, when there are k attributes, each with two levels, for choice set size m. We also give a method to construct optimal and near-optimal designs with small numbers of choice sets.

106 citations


Journal ArticleDOI
TL;DR: In this article, it is observed that for a given gamma distribution there exists a generalized exponential distribution so that the two distribution functions are almost identical, and for all practical purposes it is possible to generate approximate gamma random numbers using generalized exponential distributions.
Abstract: Recently a new distribution, named the generalized exponential distribution or exponentiated exponential distribution, was introduced and studied quite extensively by the authors. It is observed that the generalized exponential distribution can be used as an alternative to the gamma distribution in many situations. Different properties like monotonicity of the hazard functions and tail behaviors of the gamma distribution and the generalized exponential distribution are quite similar in nature, but the latter has a nice compact distribution function. It is observed that for a given gamma distribution there exists a generalized exponential distribution such that the two distribution functions are almost identical. Since the gamma distribution function does not have a compact form, efficiently generating gamma random numbers is known to be problematic. We observe that for all practical purposes it is possible to generate approximate gamma random numbers using the generalized exponential distribution and ...
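The "compact distribution function" the abstract refers to is F(x) = (1 − e^{−λx})^α, which inverts in closed form, so sampling by inversion is immediate. The sketch below draws generalized exponential variates this way; the paper's matching of a GE distribution to a given gamma is not reproduced:

```python
import math
import random

def rgen_exp(alpha, lam, rng):
    """Inverse-CDF draw from GE(alpha, lam): F(x) = (1 - exp(-lam*x))**alpha,
    so x = -log(1 - u**(1/alpha)) / lam for u ~ Uniform(0, 1)."""
    u = rng.random()
    return -math.log(1.0 - u ** (1.0 / alpha)) / lam

rng = random.Random(42)
alpha, lam = 2.0, 1.0
sample = [rgen_exp(alpha, lam, rng) for _ in range(200_000)]
sample_mean = sum(sample) / len(sample)
# For GE(alpha, lam) the mean is (psi(alpha + 1) - psi(1)) / lam;
# for alpha = 2 this reduces to (1 + 1/2) / lam = 1.5.
```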

85 citations


Journal ArticleDOI
TL;DR: In this paper, a procedure for the estimation of probability density functions of positive random variables by their fractional moments is presented, where all the available information is provided by population fractional moments.
Abstract: A procedure for the estimation of probability density functions of positive random variables by their fractional moments is presented. When all the available information is provided by population fractional moments, a criterion for choosing the fractional moments themselves is identified. When only a sample is known, Jaynes' maximum entropy procedure and Akaike's estimation procedure are joined together to determine, respectively, which and how many sample fractional moments have to be used in the estimation of the density. Some numerical experiments are provided.

85 citations


Journal ArticleDOI
TL;DR: In this article, a simple technique based on a control chart approach is proposed for selecting the number of principal components to retain for principal component analysis, which accounts for the sampling variability which can lead to the selection of components that are not in fact statistically significant.
Abstract: A vast literature has been devoted to the assessment of the proper number of eigenvalues that have to be retained in Principal Components Analysis. Most of the publications are based either on distributional assumptions for the underlying populations or on empirical evidence. In addition, techniques based on the bootstrap or on cross-validation have been proposed, despite the computational effort they imply. In this paper a simple technique based on a control chart approach is proposed for selecting the number of principal components to retain for the analysis. This approach accounts for the sampling variability, which can lead to the selection of components that are not in fact statistically significant. The method is compared with other methods and is found to be superior regardless of the underlying distributional properties of the population as well as the existing structure. An illustrative example is provided.

65 citations


Journal ArticleDOI
TL;DR: In this article, the authors present various diagnostic methods for elliptical multivariate regression models and show that the expressions, and consequently the distributions, of some usual standardized residuals are invariant in the class of elliptical models.
Abstract: In this paper we present various diagnostic methods for elliptical multivariate regression models. We show that the expressions, and consequently the distributions, of some usual standardized residuals are invariant in the class of elliptical models. This invariance is also verified for some influence measures for dropping observations, such as Cook's distance. We also discuss the computation of the likelihood displacement as well as the normal curvature in the local influence method. An example with real data is given for illustration.
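Cook's distance for observation i in the ordinary (normal-theory) linear model is D_i = e_i² h_ii / (p s² (1 − h_ii)²), with h_ii the leverage and s² the residual mean square. A minimal numpy sketch of that baseline case, not of the elliptical extension studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(scale=0.5, size=n)
y[0] += 10.0                              # plant one gross outlier

H = X @ np.linalg.solve(X.T @ X, X.T)     # hat (projection) matrix
h = np.diag(H)                            # leverages h_ii
e = y - H @ y                             # residuals
s2 = e @ e / (n - p)                      # residual mean square
cook = e**2 * h / (p * s2 * (1 - h)**2)   # Cook's distance for each observation
```

The planted outlier should dominate the Cook's distance profile.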

57 citations


Journal ArticleDOI
TL;DR: In this paper, the run-length distribution for a Shewhart chart with runs and scans rules is computed, and the results show that the run length distribution is highly skewed.
Abstract: In order to increase the power of the classical Shewhart control charts for detecting small shifts, several supplementary rules based on runs and scans were introduced by the Western Electric Company in 1956. In this article we introduce a new method for computing the run-length distribution for a Shewhart chart with runs and scans rules. Our method yields an exact expression for the run-length generating function. We can then use either one of two techniques for extracting the probability function. One leads to recursive formulas and the other to non-recursive formulas. We investigate the performance of some popular runs and scans rules and show that the run-length distribution is highly skewed. Comparing the entire distributions of different rules, rather than simply the widely-used expectations (ARLs), leads to important new conclusions on the advantages of applying each of these rules vs. using a simple chart. Finally, we introduce a Web application that incorporates these theoretical results ...
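The heavy right skew of run-length distributions is easy to see by simulation. The sketch below uses one illustrative rule set (signal if |z| > 3, or if two of the last three points fall beyond 2σ on the same side); it is a Monte Carlo illustration, not the exact generating-function method of the article:

```python
import random
import statistics

def run_length(rng):
    """Steps until an in-control chart signals under two combined rules:
    (1) a single point beyond +/-3 sigma, or
    (2) two of the last three points beyond 2 sigma on the same side."""
    recent = []                      # sliding window of the last three points
    t = 0
    while True:
        t += 1
        z = rng.gauss(0.0, 1.0)
        recent = (recent + [z])[-3:]
        if abs(z) > 3.0:
            return t
        if sum(x > 2.0 for x in recent) >= 2 or sum(x < -2.0 for x in recent) >= 2:
            return t

rng = random.Random(7)
rls = [run_length(rng) for _ in range(5000)]
mean_rl = statistics.mean(rls)       # the ARL
median_rl = statistics.median(rls)   # well below the ARL: the distribution is skewed
```

The gap between mean and median is exactly the point the abstract makes: the ARL alone hides most of the distribution's shape.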

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a procedure that keeps the powerful features of previous methods but improves the initial parameter estimate, avoids the confusion between innovative outliers and level shifts and includes joint tests for sequences of additive outliers in order to solve the masking problem.
Abstract: There are three main problems in the existing procedures for detecting outliers in ARIMA models. The first one is the biased estimation of the initial parameter values that may strongly affect the power to detect outliers. The second problem is the confusion between level shifts and innovative outliers when the series has a level shift. The third problem is masking. We propose a procedure that keeps the powerful features of previous methods but improves the initial parameter estimate, avoids the confusion between innovative outliers and level shifts and includes joint tests for sequences of additive outliers in order to solve the masking problem. A Monte Carlo study and one example of the performance of the proposed procedure are presented.

Journal ArticleDOI
TL;DR: Chen (2000) proposed a two-parameter model for bathtub-shaped failure rates; a generalization including a scale parameter is analyzed in detail, and asymptotic confidence intervals for the parameters are derived from the Fisher information matrix.
Abstract: Recently, Chen (Chen, Z. (2000). A new two-parameter lifetime distribution with bathtub-shape or increasing failure rate function. Statistics & Probability Letters 49:155–161.) proposed a two-parameter model that can be used to model bathtub-shaped failure rates. Although this model has several interesting properties, it does not contain a scale parameter and hence is not flexible in modeling real data. A generalized model including the scale parameter has been shown to be interesting, and it has the traditional Weibull distribution as an asymptotic case. In this article, a detailed analysis of this model is presented. Shapes of the density and failure rate function are studied. The asymptotic confidence intervals for the parameters are also derived from the Fisher information matrix. The likelihood ratio test is applied to test the goodness of fit of the Weibull extension model. Some examples are shown to illustrate the application of the model and analysis.
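Assuming the commonly stated parameterization of Chen's (2000) base model, F(t) = 1 − exp{λ(1 − e^{t^β})}, the failure rate is h(t) = λβt^{β−1}e^{t^β}, which is bathtub-shaped for β < 1. A quick numerical check of that shape (a sketch of the base model only, not of the scale-extended model analyzed in the article):

```python
import math

def chen_hazard(t, lam, beta):
    """Hazard of Chen (2000), assuming F(t) = 1 - exp(lam * (1 - exp(t**beta))):
    h(t) = lam * beta * t**(beta - 1) * exp(t**beta)."""
    return lam * beta * t ** (beta - 1.0) * math.exp(t ** beta)

# With beta < 1 the hazard first decreases, then increases (bathtub shape).
lam, beta = 1.0, 0.5
h_early, h_mid, h_late = (chen_hazard(t, lam, beta) for t in (0.01, 1.0, 9.0))
```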

Journal ArticleDOI
TL;DR: Zero-inflated distributions (ZID) are studied from the Bayesian point of view using the data augmentation algorithm; the zero-inflated Poisson distribution (ZIP) and an illustrative example via an MCMC algorithm are considered.
Abstract: In this paper zero-inflated distributions (ZID) are studied from the Bayesian point of view using the data augmentation algorithm. This type of discrete model arises in count data with an excess of zeros. The zero-inflated Poisson distribution (ZIP) and an illustrative example via an MCMC algorithm are considered.
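A ZIP variable places extra mass π at zero on top of a Poisson(λ) component, so P(K = 0) = π + (1 − π)e^{−λ} and E[K] = (1 − π)λ. A minimal pmf and sampler sketch (illustrative only; the article's Bayesian data-augmentation scheme is not reproduced):

```python
import math
import random

def zip_pmf(k, pi, lam):
    """P(K = k) for the zero-inflated Poisson."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson   # structural zeros add mass at zero
    return (1.0 - pi) * poisson

def zip_sample(pi, lam, rng):
    """Draw from ZIP via its mixture representation."""
    if rng.random() < pi:
        return 0                           # structural zero
    # ordinary Poisson draw by CDF inversion
    u, k, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

rng = random.Random(3)
pi, lam = 0.3, 2.0
draws = [zip_sample(pi, lam, rng) for _ in range(100_000)]
zero_frac = draws.count(0) / len(draws)    # should be near pi + (1 - pi) * exp(-lam)
mean_est = sum(draws) / len(draws)         # should be near (1 - pi) * lam
```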

Journal ArticleDOI
TL;DR: A generalization of Chauvenet's test suitable for the problem of detecting r outliers in a univariate data set is proposed; the exponential case is considered.
Abstract: A generalization of Chauvenet's test (see Bol'shev, L. N. (1969). On tests for rejecting outlying observations. Trudy In-ta prikladnoi Mat. Tblissi Gosudart. univ. 2:159–177 (in Russian); Voinov, V. G., Nikulin, M. N. (1996). Unbiased Estimators and Their Applications. Vol. 2. Kluwer Academic Publishers.), suitable for the problem of detecting r outliers in a univariate data set, is proposed. In the exponential case, Chauvenet's test can be used. Various modifications of this test were considered by Bol'shev, Ibrakimov and Khalfina (Ibrakimov, I. A., Khalfina (1978). Some asymptotic results concerning the Chauvenet test. Teor. Veroyatnost. i Primenen. 23(3):593–597.) and Greenwood and Nikulin (Greenwood, P. E., Nikulin, M. N. (1996). A Guide to Chi-Squared Testing. New York: John Wiley and Sons, Inc.), depending on the choice of the estimation method used: MLE or MVUE. As procedures for testing one outlier in the exponential model have been investigated by a number of authors including Chikkagoudar and Kunchu...

Journal ArticleDOI
TL;DR: A general form of a family of bounded two-sided continuous distributions is introduced in this article, and the uniform and triangular distributions are possibly the simplest and best known members of this family.
Abstract: A general form of a family of bounded two-sided continuous distributions is introduced. The uniform and triangular distributions are possibly the simplest and best known members of this family. We also describe families of continuous distribution on a bounded interval generated by convolutions of these two-sided distributions. Examples of various forms of convolutions of triangular distributions are presented and analyzed.

Journal ArticleDOI
TL;DR: The pseudo maximum likelihood (PML) method is used to estimate a multilevel linear model fitted to dependent observations coming from a finite population; it is computationally simpler than the iterative procedures suggested in the literature.
Abstract: An application of the pseudo maximum likelihood method to estimation of a multilevel linear model fitted to dependent observations coming from a finite population is demonstrated. The proposed approach provides a closed-form solution for estimating the model parameters. It is computationally simpler than the iterative procedures suggested in the literature (e.g., the iterative probability weighted least squares method of Pfeffermann et al. (Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society B 60:23–40)). Issues related to model and sample design hierarchies and their impact on estimation are discussed. A problem of weighting at different levels is addressed. A small simulation study shows that the proposed procedure is efficient even for small within-group sample sizes.

Journal ArticleDOI
TL;DR: Satten and Datta (2001) showed that the Kaplan–Meier (product-limit) estimator can be expressed as an inverse-probability-weighted average; here the same is shown for the truncation PLE and the censoring-truncation PLE, for data subject to left-truncation or to both left-truncation and right-censoring.
Abstract: For randomly censored data, Satten and Datta (Satten, G. A., Datta, S. (2001). The Kaplan–Meier estimator as an inverse-probability-of-censoring weighted average. The American Statistician 55:207–210) showed that the Kaplan–Meier estimator (product-limit estimator (PLE)) can be expressed as an inverse-probability-weighted average. In this article, we consider the other two PLEs: the truncation PLE and the censoring-truncation PLE. For data subject to left-truncation or to both left-truncation and right-censoring, it is shown that these two PLEs can be expressed as inverse-probability-weighted averages.

Journal ArticleDOI
TL;DR: For the one-way ANOVA with unequal variances, where James (1951) and Welch (1951) weighted the terms in the numerator sum of squares by the respective inverses of the sample mean variances, approximations to the nonnull distributions of the weighted statistics are provided, useful for approximating the power of the Welch F-test.
Abstract: The classical F-test for unequal means in a one-way ANOVA is known to be misleading when the populations have different variances. To overcome this, James (James, G. S. (1951). The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika 38:324–329) and Welch (Welch, B. L. (1951). On the comparison of several mean values: an alternative approach. Biometrika 38:330–336) weighted the terms in the numerator sum of squares by the respective inverses of the sample mean variances, and they proposed equivalent tests based on F or χ² approximations to the null distribution of the weighted sum of squares for moderate sample sizes. We provide approximations for the nonnull distributions of their weighted statistics, which are found to be useful in obtaining approximations to the power of the Welch F-test.
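Welch's (1951) statistic, as commonly stated, weights each group mean by w_i = n_i/s_i² and refers the result to an F distribution with approximate denominator degrees of freedom. A sketch on hypothetical data (the article's nonnull approximations are not implemented here):

```python
def welch_anova(groups):
    """Welch's (1951) test of equal means under unequal variances.
    Returns (F_statistic, df1, df2)."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [sum(g) / len(g) for g in groups]
    v = [sum((x - mi) ** 2 for x in g) / (len(g) - 1) for g, mi in zip(groups, m)]
    w = [ni / vi for ni, vi in zip(n, v)]            # weights 1 / Var(mean_i)
    W = sum(w)
    mw = sum(wi * mi for wi, mi in zip(w, m)) / W    # weighted grand mean
    A = sum(wi * (mi - mw) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    C = sum((1 - wi / W) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    F = A / (1 + 2 * (k - 2) * C / (k * k - 1))
    df2 = (k * k - 1) / (3 * C)                      # approximate denominator df
    return F, k - 1, df2

# Hypothetical groups with unequal spreads and separated means
g1 = [4.9, 5.1, 5.0, 5.2, 4.8]
g2 = [5.4, 5.6, 5.5, 5.7, 5.3]
g3 = [6.0, 6.4, 6.2, 6.6, 5.8]
F, df1, df2 = welch_anova([g1, g2, g3])
```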

Journal ArticleDOI
TL;DR: In this article, the authors characterize three types of possible heterogeneity of variance in nonlinear regression models (NLMs), one of which is to introduce a variance function for the model, which is the extension of Cook and Weisberg (1983) and the other two are based on the randomization of regres...
Abstract: Homogeneity of variance is one of standard assumptions in regression analysis. However, this assumption is not necessarily appropriate. Cook and Weisberg (Cook, R. D., Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika 70:1–10) provided a score test for heteroscedasticity in linear regression. Smith and Heitjan (Smith, P. J., Heitjan, D. F. (1993). Testing and adjusting for departures from nominal dispersion in generalized linear models. Applied Statistics 42:31–41) proposed a method based on the randomization of regression coefficients for detecting departures from nominal dispersion in generalized linear models. This paper is devoted to the tests for non-constant variance in the framework of nonlinear regression models (NLMs). We characterize three types of possible heterogeneity of variance in NLMs. One type is to introduce a variance function for the model, which is the extension of Cook and Weisberg (1983). The other two are based on the randomization of regres...

Journal ArticleDOI
TL;DR: In this article, the authors used bilinear transformations to map points on the unit circle in the complex plane into points x on the real line and showed that x has a Cauchy distribution in (−∞, ∞) when α is uniformly distributed on (−π, π).
Abstract: We use bilinear transformations to map points z = cos(α) + i sin(α) on the unit circle in the complex plane into points x on the real line. Given any density function g(α) on the interval (−π, π), we show how a corresponding density function f(x) on (−∞, ∞) is induced. When α is uniformly distributed on (−π, π), we show that x has a Cauchy distribution on (−∞, ∞). When g(α) = K_n(1 + cos(α))^n, we show that x has a t-distribution on (−∞, ∞).
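One concrete map of this kind is the half-angle map x = tan(α/2), which sends the unit circle (minus the point z = −1) to the real line; when α is uniform on (−π, π), x is standard Cauchy. This is a hedged illustration of that classical fact, not necessarily the exact transformation family of the paper, and it is easy to check numerically:

```python
import math
import random

rng = random.Random(11)
alpha = [rng.uniform(-math.pi, math.pi) for _ in range(200_000)]
# Half-angle (Moebius-type) map from the unit circle to the real line
x = [math.tan(a / 2.0) for a in alpha]

# For a standard Cauchy: the median is 0 and P(|X| <= 1) = 1/2
inside = sum(abs(v) <= 1.0 for v in x) / len(x)
median = sorted(x)[len(x) // 2]
```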

Journal ArticleDOI
TL;DR: In this article, under the assumption of linear relationship between two variables, the authors provided alternative simple method of proving the existing result connecting correlation coefficient with those of skewness of response and explanatory variables.
Abstract: In this paper, under the assumption of linear relationship between two variables we provide alternative simple method of proving the existing result connecting correlation coefficient with those of skewness of response and explanatory variables. Further we have given a relationship between correlation coefficient and coefficient of kurtosis of response and explanatory variables assuming the linear relationship between the two variables. Simple alternative way of deriving the formula, which helps in finding the direction dependence in linear regression, is discussed.
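For the skewness result: under y = a + bx + e with e symmetric and independent of x, the third central moment of y is b³μ₃(x), so skew(y)/skew(x) = (bσₓ/σ_y)³ = ρ³. A Monte Carlo check of this identity (a hedged illustration; the paper's own derivation is not reproduced):

```python
import random
import statistics

def skew(v):
    """Sample coefficient of skewness (third standardized moment)."""
    m = statistics.fmean(v)
    s = statistics.pstdev(v)
    return statistics.fmean([(u - m) ** 3 for u in v]) / s ** 3

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = statistics.fmean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

rng = random.Random(5)
n = 200_000
x = [rng.expovariate(1.0) for _ in range(n)]   # skewed explanatory variable
e = [rng.gauss(0.0, 1.0) for _ in range(n)]    # symmetric, independent error
y = [xi + ei for xi, ei in zip(x, e)]          # y = x + e, so b = 1

rho = corr(x, y)
ratio = skew(y) / skew(x)   # should be close to rho**3
```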

Journal ArticleDOI
TL;DR: Based on the notion of synchronized permutations (Salmaso, 2000), a new fully exact solution is provided for testing effects in replicated 2^k factorial designs within a nonparametric framework, allowing all effects to be tested separately by shuffling the appropriate set of sufficient statistics.
Abstract: In this article, based on the notion of synchronized permutations (Salmaso, L. (2000). Orthogonal Two-Level Factorial Designs and Permutation Tests for Effects. Ph.D. thesis, Department of Statistics, University of Padova.), we provide a new fully exact solution for testing effects in replicated 2^k factorial designs within a nonparametric framework. Synchronized permutations allow for testing all effects separately by shuffling the appropriate set of sufficient statistics. Such tests are exact, unbiased, and consistent. Furthermore, they are uncorrelated with each other and asymptotically at least as powerful as the parametric solution when the latter is appropriate. A simulation study shows that even for small sample sizes their power is close to that of the parametric counterparts based on normality of errors. This approach preserves the exchangeability of error components for testing all 2^k − 1 effects.

Journal ArticleDOI
TL;DR: In this article, the authors derived the distribution of stock returns for a security in an upgrade (or downgrade) market with the assumption that the log stock returns of the market proxy follow a mixture of normal distributions.
Abstract: For some investments, the relation between stock returns and the market proxy is conventionally described by a linear regression model with the normality assumption. This paper derives the distribution of stock returns for a security in an upgrade (or downgrade) market with the assumption that the log stock returns of the market proxy follow a mixture of normal distributions. We discuss MLE and the method of moments estimation for parameters involved in the model. An analysis of stock data from the Johannesburg Stock Exchange is included to illustrate the model. This note explains the phenomenon in financial analysis regarding the shape of the distribution of long-run stock returns restricted to an upgrade or downgrade market index.

Journal ArticleDOI
TL;DR: Under the TFR model of Bhattacharyya and Soejoeti (1989) for step-stress accelerated life tests, the maximum likelihood estimate of the Weibull shape parameter is proved to be unique in a multiple step-stress accelerated life test, and its accuracy is investigated by Monte Carlo simulation.
Abstract: Bhattacharyya and Soejoeti (Bhattacharyya, G. K., Soejoeti, Z. A. (1989). Tampered failure rate model for step-stress accelerated life test. Commun. Statist.—Theory Meth. 18(5):1627–1643.) proposed the TFR model for step-stress accelerated life tests. Under the TFR model, this article proves that the maximum likelihood estimate of the shape parameter is unique for the Weibull distribution in a multiple step-stress accelerated life test, and investigates the accuracy of the maximum likelihood estimate using Monte Carlo simulation.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the issue of choosing the right model through a simulation-based Bayesian study using the output from the Gibbs sampler and found that the two models quite nicely represent a given data set although the concerned analyses and the related inferential procedures may differ drastically.
Abstract: The Weibull and the lognormal distributions are the most widely used models for analyzing a variety of data from different fields. It is often seen that the two models quite nicely represent a given data set although the concerned analyses and the related inferential procedures may differ drastically. It is, therefore, highly desirable to study the behavior of the two models for a given set of observations in the light of recent tool-kits of model comparison/model choice. The article considers examining the issue of choosing the right model through a simulation based Bayesian study using the output from the Gibbs sampler.

Journal ArticleDOI
TL;DR: In this article, an iterative EM algorithm is presented to compute maximum likelihood estimates (MLEs) for the linear hazard rate distribution (LHRD) based on records and inter-record times.
Abstract: The linear hazard rate distribution (LHRD) is a two-parameter distribution that contains exponential and generalized Rayleigh distributions as special cases. It has applications in a number of fields including reliability improvement, life testing, and survival analysis. An iterative EM algorithm is presented to compute maximum likelihood estimates (MLEs) for the LHRD based on records and inter-record times. Simulation results indicate that the estimates obtained by maximum likelihood method are better than those obtained by the least-squares type estimation and by the elemental percentile method. We also evaluate the expected values and variances of the MLEs for various sample sizes in order to determine the unbiasing factors of the MLEs which can be utilized in performing tests of exponentiality and also for examining the appropriateness of Rayleigh model to data at hand.
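The LHRD has hazard h(t) = a + bt and survival S(t) = exp(−at − bt²/2), so its CDF inverts in closed form and sampling is a one-liner (a sampling sketch only, not the EM algorithm of the article):

```python
import math
import random

def rlinear_hazard(a, b, rng):
    """Inverse-CDF draw from the LHRD with hazard h(t) = a + b*t,
    i.e. S(t) = exp(-a*t - b*t**2/2): solve a*t + b*t**2/2 = -log(1 - u)."""
    w = -math.log(1.0 - rng.random())
    if b == 0.0:
        return w / a                                  # exponential special case
    return (-a + math.sqrt(a * a + 2.0 * b * w)) / b  # positive quadratic root

rng = random.Random(9)

# b = 0 reduces to the exponential distribution: mean 1/a
exp_sample = [rlinear_hazard(1.0, 0.0, rng) for _ in range(100_000)]
exp_mean = sum(exp_sample) / len(exp_sample)

# a = 0 reduces to a Rayleigh-type law; with b = 2, S(t) = exp(-t**2),
# whose mean is sqrt(pi)/2
ray_sample = [rlinear_hazard(0.0, 2.0, rng) for _ in range(100_000)]
ray_mean = sum(ray_sample) / len(ray_sample)
```

The two limiting cases checked here are exactly the special cases (exponential and generalized Rayleigh) the abstract mentions.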

Journal ArticleDOI
TL;DR: Monotone failure rate models [Barlow, Marshall, and Proschan] have become one of the most important models of failure time for reliability practitioners to consider and use; the same authors also developed models and bounds for monotone increasing failure rates.
Abstract: Monotone failure rate models [Barlow, Richard E., Marshall, A. W., Proschan, Frank (1963). Properties of probability distributions with monotone failure rate. Annals of Mathematical Statistics 34:375–389; Barlow, Richard E., Proschan, Frank (1965). Mathematical Theory of Reliability. New York: John Wiley & Sons; Barlow, Richard E., Proschan, Frank (1966a). Tolerance and confidence limits for classes of distributions based on failure rate. Annals of Mathematical Statistics 37(6):1593–1601; Barlow, Richard E., Proschan, Frank (1966b). Inequalities for linear combinations of order statistics from restricted families. Annals of Mathematical Statistics 37(6):1574–1592; Barlow, Richard E., Proschan, Frank (1975). Statistical Theory of Reliability and Life Testing. New York: Holt, Rinehart and Winston, Inc.] have become one of the most important models of failure time for reliability practitioners to consider and use. The above authors also developed models and bounds for monotone increasing fa...

Journal ArticleDOI
TL;DR: In this article, the maximum likelihood estimators for the location and the scale parameters of a generalized t (GT) distribution with known shape parameters were investigated and the uniqueness of the estimators was established.
Abstract: In this paper, we consider the univariate generalized t (GT) distribution introduced by McDonald and Newey (McDonald, J. B., Newey, W. K. (1988). Partially adaptive estimation of regression models via the generalized t distribution. Econometric Theory 4:428–457.). We show that the maximum likelihood estimators for the location and scale parameters of a GT distribution with known shape parameters can provide alternative robust estimators for the location and scale parameters of a data set. We investigate the existence and uniqueness of the maximum likelihood estimators. We show that the likelihood function can be unimodal or multimodal depending on the choice of the shape parameters.

Journal ArticleDOI
TL;DR: In this paper, the authors developed an empirical Bayes (EB) procedure to estimate p using a beta-type prior distribution and a squared-error loss function, which is preferred over the usual maximum likelihood estimator (MLE) for small group sizes and small p.
Abstract: Group testing has long been recognized as a safe and sensible alternative to one-at-a-time testing in applications wherein the prevalence rate p is small. In this article, we develop an empirical Bayes (EB) procedure to estimate p using a beta-type prior distribution and a squared-error loss function. We show that the EB estimator is preferred over the usual maximum likelihood estimator (MLE) for small group sizes and small p. In addition, we also discuss interval estimation and consider the use of other loss functions perhaps more appropriate in public health studies. The proposed methods are illustrated using group-testing data from a prospective hepatitis C virus study conducted in Xuzhou City, China.
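The usual MLE the article compares against has a closed form: with y positive groups out of m, each of size s, independence gives P(group negative) = (1 − p)^s, so p̂ = 1 − (1 − y/m)^{1/s}. A quick sketch of this comparator (the EB estimator itself is not reproduced):

```python
import random

def group_test_mle(y, m, s):
    """MLE of prevalence p from y positive groups out of m, each of size s:
    p_hat = 1 - (1 - y/m)**(1/s)."""
    return 1.0 - (1.0 - y / m) ** (1.0 / s)

# Simulate group testing: a group is positive if any member is positive
rng = random.Random(13)
p_true, s, m = 0.02, 10, 2000
y = sum(any(rng.random() < p_true for _ in range(s)) for _ in range(m))

p_hat = group_test_mle(y, m, s)
```

For small p this MLE is nearly unbiased but, as the abstract notes, the EB estimator can do better for small group sizes.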