
Showing papers in "Austrian Journal of Statistics in 2016"


Journal ArticleDOI
TL;DR: In this article, the authors generalize the Rayleigh distribution using the quadratic rank transmutation map studied by Shaw et al. They provide a comprehensive description of the mathematical properties of the subject distribution along with its reliability behavior.
Abstract: In this article, we generalize the Rayleigh distribution using the quadratic rank transmutation map studied by Shaw et al. (2009) to develop a transmuted Rayleigh distribution. We provide a comprehensive description of the mathematical properties of the subject distribution along with its reliability behavior. The usefulness of the transmuted Rayleigh distribution for modeling data is illustrated using real data.

128 citations
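
The quadratic rank transmutation map (QRTM) referenced in this abstract is simple enough to sketch. The following Python sketch assumes the usual QRTM form G(x) = (1 + λ)F(x) − λF(x)², |λ| ≤ 1, with a Rayleigh baseline F of scale sigma (parameter names are illustrative, not the paper's notation), and shows the density plus inversion sampling:

```python
# Sketch of the transmuted Rayleigh distribution via the quadratic rank
# transmutation map (QRTM): G(x) = (1 + lam) * F(x) - lam * F(x)**2, |lam| <= 1.
import numpy as np

def rayleigh_cdf(x, sigma):
    return 1.0 - np.exp(-x**2 / (2.0 * sigma**2))

def transmuted_rayleigh_pdf(x, sigma, lam):
    f = (x / sigma**2) * np.exp(-x**2 / (2.0 * sigma**2))  # Rayleigh pdf
    F = rayleigh_cdf(x, sigma)
    return f * (1.0 + lam - 2.0 * lam * F)

def transmuted_rayleigh_rvs(n, sigma, lam, seed=None):
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    if lam == 0.0:
        F = u
    else:  # invert the QRTM (quadratic in F); this root lies in [0, 1]
        F = ((1.0 + lam) - np.sqrt((1.0 + lam)**2 - 4.0 * lam * u)) / (2.0 * lam)
    return sigma * np.sqrt(-2.0 * np.log1p(-F))  # invert the Rayleigh CDF

x = transmuted_rayleigh_rvs(10_000, sigma=1.0, lam=0.5, seed=0)
```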


Journal ArticleDOI
TL;DR: Since the introduction of fuzzy set theory, many studies have combined statistical methods with fuzzy sets; this body of work, called fuzzy statistics, is reviewed here across its main branches.
Abstract: After the introduction and development of fuzzy set theory, many studies have been done to combine statistical methods and fuzzy set theory. This work, called fuzzy statistics, has been developed in several branches. In this article we review essential works on fuzzy estimation, fuzzy hypotheses testing, fuzzy regression, fuzzy Bayesian statistics, and some relevant fields.

121 citations


Journal ArticleDOI
TL;DR: A hierarchical model for multiple count responses that can take into account complex correlation structures is developed; it is a discrete multivariate response approach regarding the left side of the model equations.
Abstract: The aim of this paper is to develop a model for analyzing multiple count responses that may take into account complex correlation structures. The model is specified hierarchically in several layers and can be used for sparse data, as is shown in the second part of the paper. It is a discrete multivariate response approach regarding the left side of the model equations. Markov Chain Monte Carlo techniques are needed for extracting inferential results. The possible correlation between different counts is more general than the one used in the repeated measurements or longitudinal studies framework.

103 citations


Journal ArticleDOI
TL;DR: Exponential ratio and product estimators for estimating the finite population mean using auxiliary information in double sampling are presented and their properties analyzed; they are compared for precision with the simple mean per unit and the usual double-sampling ratio and product estimators.
Abstract: This paper presents exponential ratio and product estimators for estimating finite population mean using auxiliary information in double sampling and analyzes their properties. These estimators are compared for their precision with simple mean per unit, usual double sampling ratio and product estimators. An empirical study is also carried out to judge the merits of the suggested estimators.

91 citations
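
A minimal sketch of what an exponential ratio (and product) estimator in two-phase (double) sampling can look like, in the spirit of Bahl and Tuteja's exponential estimators; the exact estimators proposed in the paper may differ, and all variable names here are illustrative:

```python
# x1bar is the first-phase mean of the auxiliary variable; (ybar, xbar) come
# from the second-phase subsample. Exponential ratio/product forms assumed.
import numpy as np

def exp_ratio_estimator(ybar, xbar, x1bar):
    return ybar * np.exp((x1bar - xbar) / (x1bar + xbar))

def exp_product_estimator(ybar, xbar, x1bar):
    return ybar * np.exp((xbar - x1bar) / (xbar + x1bar))

rng = np.random.default_rng(1)
x = rng.gamma(4.0, 2.0, size=5000)                   # auxiliary variable
y = 3.0 * x + rng.normal(0, 4, size=5000)            # study variable, correlated with x
first = rng.choice(5000, size=800, replace=False)    # first-phase sample
second = rng.choice(first, size=200, replace=False)  # second-phase subsample
t = exp_ratio_estimator(y[second].mean(), x[second].mean(), x[first].mean())
print(t, y.mean())  # estimator vs. true finite-population mean
```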


Journal ArticleDOI
TL;DR: Three methods for the identification of multivariate outliers are compared; all are based on the Mahalanobis distance, made resistant against outliers and model deviations by robust estimation of location and covariance.
Abstract: Three methods for the identification of multivariate outliers (Rousseeuw and Van Zomeren, 1990; Becker and Gather, 1999; Filzmoser et al., 2005) are compared. They are based on the Mahalanobis distance that will be made resistant against outliers and model deviations by robust estimation of location and covariance. The comparison is made by means of a simulation study. Not only the case of multivariate normally distributed data, but also heavy-tailed and asymmetric distributions will be considered. The simulations are focused on low-dimensional (p = 5) and high-dimensional (p = 30) data.

87 citations
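
A short sketch of the underlying idea: compute Mahalanobis distances from a robust location/covariance estimate and flag points above a chi-square cutoff. Here scikit-learn's MinCovDet (MCD) stands in for the specific estimators compared in the paper:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
p = 5
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=500)
X[:25] += 6.0                                # plant a cluster of outliers

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)                      # squared robust distances
cutoff = chi2.ppf(0.975, df=p)               # usual chi-square cutoff
outliers = np.flatnonzero(d2 > cutoff)
print(len(outliers))
```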


Journal ArticleDOI
TL;DR: In this article, it is shown that under some mild assumptions, two scatter matrices may be used together to find the independent components; the scatter matrices must then have the so-called independence property.
Abstract: In independent component analysis (ICA) it is assumed that the components of the multivariate independent and identically distributed observations are linear transformations of latent independent components. The problem then is to find the (linear) transformation which transforms the observations back to independent components. In the paper the ICA is discussed and it is shown that, under some mild assumptions, two scatter matrices may be used together to find the independent components. The scatter matrices must then have the so-called independence property. The theory is illustrated by examples.

85 citations


Journal ArticleDOI
TL;DR: The fuzzy quality is discussed and fuzzy process capability indices are introduced, where instead of precise quality the authors have two membership functions for specification limits, which are helpful for comparing manufacturing processes with fuzzy specification limits.
Abstract: Most of the traditional methods for assessing the capability of manufacturing processes deal with crisp quality. In this paper we discuss fuzzy quality and introduce fuzzy process capability indices, where instead of precise quality we have two membership functions for the specification limits. These indices are necessary when the specification limits are fuzzy, and they are helpful for comparing manufacturing processes with fuzzy specification limits. Some interesting relations among the introduced indices are obtained. Numerical examples are given to clarify the method.

79 citations
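
As a hedged illustration of how fuzzy specification limits can enter a capability index, the sketch below evaluates an interval-valued C_p over α-cuts of triangular membership functions; this is one plausible construction, and the paper's exact index definitions may differ:

```python
# Triangular fuzzy spec limits (left, mode, right); each alpha-cut of the
# limits yields an interval of ordinary C_p = (USL - LSL) / (6 sigma) values.
import numpy as np

def tri_alpha_cut(left, mode, right, alpha):
    return (left + alpha * (mode - left), right - alpha * (right - mode))

def fuzzy_cp(sigma, lsl, usl, alphas=np.linspace(0.0, 1.0, 5)):
    out = {}
    for a in alphas:
        l_lo, l_hi = tri_alpha_cut(*lsl, a)
        u_lo, u_hi = tri_alpha_cut(*usl, a)
        out[round(a, 2)] = ((u_lo - l_hi) / (6 * sigma), (u_hi - l_lo) / (6 * sigma))
    return out

print(fuzzy_cp(sigma=0.8, lsl=(1.5, 2.0, 2.5), usl=(7.5, 8.0, 8.5)))
```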


Journal ArticleDOI
TL;DR: This paper compares the MCMC implementations for several spike and slab priors with regard to posterior inclusion probabilities and their sampling efficiency for simulated data and investigates posterior inclusion probability analytically for different slabs in two simple settings.
Abstract: An important task in building regression models is to decide which regressors should be included in the final model. In a Bayesian approach, variable selection can be performed using mixture priors with a spike and a slab component for the effects subject to selection. As the spike is concentrated at zero, variable selection is based on the probability of assigning the corresponding regression effect to the slab component. These posterior inclusion probabilities can be determined by MCMC sampling. In this paper we compare the MCMC implementations for several spike and slab priors with regard to posterior inclusion probabilities and their sampling efficiency for simulated data. Further, we investigate posterior inclusion probabilities analytically for different slabs in two simple settings. Application of variable selection with spike and slab priors is illustrated on a data set of psychiatric patients where the goal is to identify covariates affecting metabolism.

73 citations
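
A minimal sketch of how posterior inclusion probabilities arise from MCMC in one common spike-and-slab setup: a Dirac spike at zero with a Gaussian N(0, tau2) slab, and σ² treated as known to keep it short (the priors compared in the paper may differ):

```python
import numpy as np

def spike_slab_gibbs(X, y, tau2=10.0, sigma2=1.0, w=0.5, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    incl = np.zeros(p)                            # running inclusion counts
    for it in range(iters):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]  # residual without x_j
            v = 1.0 / (X[:, j] @ X[:, j] / sigma2 + 1.0 / tau2)
            m = v * (X[:, j] @ r) / sigma2
            # posterior odds of inclusion = prior odds * marginal likelihood ratio
            log_odds = np.log(w / (1 - w)) + 0.5 * np.log(v / tau2) + m**2 / (2 * v)
            if rng.uniform() < 1.0 / (1.0 + np.exp(-log_odds)):
                beta[j] = rng.normal(m, np.sqrt(v))
                incl[j] += 1
            else:
                beta[j] = 0.0
    return incl / iters                           # posterior inclusion probabilities

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=100)
print(spike_slab_gibbs(X, y).round(2))
```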


Journal ArticleDOI
TL;DR: In this article, the mathematics of compositional analysis based on an equivalence relation is presented; the two main principles are essential attributes of the corresponding quotient space, and a logarithmic isomorphism between quotient spaces induces a metric space structure for compositions.
Abstract: The term compositional data analysis is historically associated with the approach based on the logratio transformations introduced in the eighties. Two main principles of this methodology are scale invariance and subcompositional coherence. New developments and concepts that emerged in the last decade revealed the need to clarify the concepts of compositions, compositional sample space and subcomposition. In this work the mathematics of compositional analysis based on an equivalence relation is presented. The two principles are essential attributes of the corresponding quotient space. A logarithmic isomorphism between quotient spaces induces a metric space structure for compositions. Using this structure, the statistical analysis of compositions consists of analysing logratio coordinates.

68 citations
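
The statistical analysis "of logratio coordinates" can be made concrete with the centred log-ratio (clr) transform, one standard choice of coordinates; a minimal numpy sketch:

```python
import numpy as np

def closure(x):
    # pick the unit-sum representative of the composition's equivalence class
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    g = np.exp(np.mean(np.log(x), axis=-1, keepdims=True))  # geometric mean
    return np.log(x / g)

comp = closure(np.array([3.0, 1.0, 4.0, 2.0]))
z = clr(comp)
print(z, z.sum())   # clr coordinates sum to zero
```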


Journal ArticleDOI
TL;DR: The approach illustrated here, which merges Bayesian estimation with principles of compositional data analysis, should be generally useful for high-dimensional count compositional data of the type generated by high-throughput sequencing.

Abstract: High throughput sequencing generates sparse compositional data, yet these datasets are rarely analyzed using a compositional approach. In addition, the variation inherent in these datasets is rarely acknowledged, but ignoring it can result in many false positive inferences. We demonstrate that examination of point estimates of the data can result in false positive results, even with appropriate zero replacement approaches, using an in vitro selection dataset with an outside standard of truth. The variation inherent in real high-throughput sequencing datasets is demonstrated, and we show that this variation can be approximated, and hence accounted for, by Monte-Carlo sampling from the Dirichlet distribution. This approximation is problematic when used by itself, but becomes useful when coupled with a log-ratio approach commonly used in compositional data analysis. Thus, the approach illustrated here that merges Bayesian estimation with principles of compositional data analysis should be generally useful for high-dimensional count compositional data of the type generated by high throughput sequencing.

60 citations
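
The recipe described here, Monte-Carlo sampling from a Dirichlet posterior over the counts followed by a log-ratio transform, fits in a few lines; the prior mass of 0.5 per component below is an illustrative choice, not necessarily the paper's:

```python
import numpy as np

def dirichlet_clr_draws(counts, n_draws=1000, prior=0.5, seed=0):
    rng = np.random.default_rng(seed)
    draws = rng.dirichlet(counts + prior, size=n_draws)   # posterior samples
    g = np.exp(np.mean(np.log(draws), axis=1, keepdims=True))
    return np.log(draws / g)                              # clr per draw

counts = np.array([120, 3, 0, 57, 9])      # sparse counts, zeros allowed
z = dirichlet_clr_draws(counts)
print(z.mean(axis=0), z.std(axis=0))       # point estimate plus its variation
```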


Journal ArticleDOI
TL;DR: In this article, a transmuted additive Weibull distribution that extends the additive Weibull distribution is proposed and studied, and structural properties of the new distribution, including explicit expressions for the moments, random number generation and order statistics, are derived.
Abstract: In this article a continuous distribution, the so-called transmuted additive Weibull distribution, that extends the additive Weibull distribution and some other distributions is proposed and studied. We use the quadratic rank transmutation map proposed by Shaw and Buckley (2009) in order to generate the transmuted additive Weibull distribution. Various structural properties of the new distribution, including explicit expressions for the moments, random number generation and order statistics, are derived. Maximum likelihood estimation of the unknown parameters of the new model for complete samples is also discussed. It is shown that the analytical results are applicable to modeling real-world data.

Journal ArticleDOI
TL;DR: Three procedures for robust principal component analysis (PCA) are compared: ROBPCA (Hubert et al., 2005), which combines projection pursuit with robust covariance estimation; an adjusted ROBPCA algorithm that yields several PCA models in a single run; and the LTS-subspace estimator. The comparison is made by means of a simulation study.
Abstract: In this paper we compare three procedures for robust Principal Components Analysis (PCA). The first method is called ROBPCA (see Hubert et al., 2005). It combines projection pursuit ideas with robust covariance estimation. The original algorithm for its computation is designed to construct an optimal PCA subspace of a fixed dimension k. If instead the optimal PCA subspace is searched within a whole range of dimensions k, this algorithm is not computationally efficient. Hence we present an adjusted algorithm that yields several PCA models in one single run. A different approach is the LTS-subspace estimator (see Wolbers, 2002; Maronna, 2005). It seeks the subspace that minimizes an objective function based on the squared orthogonal distances of the observations to this subspace. It can be computed in analogy with the computation of the LTS regression estimator (see Rousseeuw and Van Driessen, 2000). The three approaches are compared by means of a simulation study.

Journal ArticleDOI
TL;DR: This article describes estimating the AUC with a web-based bootstrap (resampling) application; results indicate that bootstrap confidence intervals are usually narrower than nonparametric ones, mainly for small data samples.

Abstract: The accuracy of binary discrimination models (discrimination between cases with and without a condition) is usually summarized by a classification matrix (also called a confusion, assignment, or prediction matrix). The receiver operating characteristic (ROC) curve visualizes the association between the probabilities of incorrect classification of cases from the group without the condition (false positives) and the probabilities of correct classification of cases from the group with the condition (true positives) across all possible cut-point values of the discrimination score. The area under the ROC curve (AUC) is one such summary measure. This article describes estimating the AUC with a web-based bootstrap (resampling) application. The bootstrap is useful mainly for data for which distributional assumptions are not appropriate. The quality of the bootstrap application was evaluated with a special program written in C# (.NET) that automates the repetition of different experiments. Estimates of the AUC and confidence limits given by the bootstrap method were compared with bi-normal and nonparametric estimates. Results indicate that bootstrap confidence intervals are usually narrower than nonparametric ones, mainly for small data samples.
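
A minimal percentile-bootstrap sketch for an AUC confidence interval; the paper's web application and its C#/.NET evaluation harness are not reproduced here:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, score, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():
            continue                         # resample must contain both classes
        aucs.append(roc_auc_score(y_true[idx], score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, score), (lo, hi)

rng = np.random.default_rng(1)
y = np.r_[np.zeros(40, int), np.ones(30, int)]
s = np.r_[rng.normal(0, 1, 40), rng.normal(1, 1, 30)]   # discrimination score
print(bootstrap_auc_ci(y, s))
```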

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new method for constructing simultaneous confidence intervals for all pairwise ratios of means of lognormal distributions based on FGPQ for vector parameters, which has satisfactory small sample performance and correct asymptotic coverage.
Abstract: In this paper, we construct Fiducial Generalized Confidence Intervals (FGCI) for ratio of means of two lognormal distributions based on independent observations from the two distributions. We compared the proposed method with another method, the Z-Score method. A simulation study showed that the FGCI method performs much better than the Z-Score method, especially for small and medium samples. We also prove that the confidence intervals constructed using FGCI method have correct asymptotic coverage. In this paper we propose a new method for constructing simultaneous confidence intervals for all pairwise ratios of means of lognormal distributions. Our approach is based on Fiducial Generalized Pivotal Quantities (FGPQ) for vector parameters. Simulation studies show that the constructed confidence intervals have satisfactory small sample performance. We also prove that they have correct asymptotic coverage. The result has applications in bioequivalence studies for comparing three or more drug formulations.
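
A sketch of the fiducial/generalized-pivot idea for the ratio of two lognormal means, following the standard FGPQ construction for η = μ + σ²/2 (as in Krishnamoorthy and Mathew); the paper's exact pivots may differ in detail:

```python
import numpy as np

def fgpq_log_mean(logx, n_draws, rng):
    n = len(logx)
    xbar, s2 = logx.mean(), logx.var(ddof=1)
    r_sig2 = (n - 1) * s2 / rng.chisquare(n - 1, n_draws)   # pivot for sigma^2
    r_mu = xbar - rng.standard_normal(n_draws) * np.sqrt(r_sig2 / n)
    return r_mu + r_sig2 / 2.0                              # pivot for log(mean)

def ratio_ci(x1, x2, alpha=0.05, n_draws=100_000, seed=0):
    rng = np.random.default_rng(seed)
    r = np.exp(fgpq_log_mean(np.log(x1), n_draws, rng)
               - fgpq_log_mean(np.log(x2), n_draws, rng))
    return np.quantile(r, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(1)
x1 = rng.lognormal(1.0, 0.8, size=25)
x2 = rng.lognormal(0.7, 0.6, size=30)
print(ratio_ci(x1, x2))   # interval for E[X1] / E[X2]
```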

Journal ArticleDOI
TL;DR: A fuzzy test for testing statistical hypotheses about an imprecise parameter is proposed for the case when the available data are also imprecise, and leads not to a binary decision but to a fuzzy decision showing the degrees of acceptability of the null and alternative hypotheses.
Abstract: A fuzzy test for testing statistical hypotheses about an imprecise parameter is proposed for the case when the available data are also imprecise. The proposed method is based on the relationship between the acceptance region of statistical tests at level β and confidence intervals for the parameter of interest at confidence level 1 − β. First, a fuzzy confidence interval is constructed for the fuzzy parameter of interest. Then, using such a fuzzy confidence interval, a fuzzy test function is constructed. The obtained fuzzy test, contrary to the classical approach, leads not to a binary decision (i.e. to reject or to accept the given null hypothesis) but to a fuzzy decision showing the degrees of acceptability of the null and alternative hypotheses. Numerical examples are given to demonstrate the theoretical results, and show the possible applications in testing hypotheses based on fuzzy observations.

Journal ArticleDOI
TL;DR: A record-linkage toolbox is developed in order to compare the performance of various string-similarity measures for German surnames and has been used successfully in sociological, economical and epidemiological research projects.
Abstract: We developed a record-linkage toolbox in order to compare the performance of various string-similarity measures for German surnames. This "Matching Tool-Box" (MTB) is made up of independent, highly portable JAVA programs. MTB is currently used for prototyping pre-processing tools and for the empirical comparison of string-similarity measures. Furthermore, MTB has been used successfully in sociological, economical and epidemiological research projects.
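
MTB itself is a JAVA toolbox; as a standalone illustration of the kind of string-similarity measure it compares, here is a normalized Levenshtein similarity in Python:

```python
def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance, one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    m = max(len(a), len(b))
    return 1.0 if m == 0 else 1.0 - levenshtein(a, b) / m

print(similarity("Meier", "Meyer"), similarity("Schmidt", "Schmitt"))
```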

Journal ArticleDOI
TL;DR: An alternative test statistic for ANOVA under heterogeneous variances is proposed; its underlying distribution is Hotelling's T², which can be approximated by Fisher's F-distribution, making the alternative test very similar to an ordinary analysis of variance.

Abstract: One essential prerequisite of ANOVA is homogeneity of variances in the underlying populations. Violating this assumption may lead to an increased type I error rate. This undesirable effect stems from the calculation of the corresponding F-value. A slightly different test statistic keeps the level α. The underlying distribution of this alternative method is Hotelling's T². As Hotelling's T² can be approximated by Fisher's F-distribution, this alternative test is very similar to an ordinary analysis of variance.

Journal ArticleDOI
TL;DR: Oja, Sirkiä, and Eriksson (2006) and Ollila, Oja, and Koivunen (2007) showed that, under general assumptions, any two scatter matrices with the so-called independent components property can be used to estimate the unmixing matrix for independent component analysis (ICA); a simulation study compares different choices of the two scatter matrices.

Abstract: Oja, Sirkiä, and Eriksson (2006) and Ollila, Oja, and Koivunen (2007) showed that, under general assumptions, any two scatter matrices with the so-called independent components property can be used to estimate the unmixing matrix for the independent component analysis (ICA). The method is a generalization of Cardoso's FOBI estimate (Cardoso, 1989), which uses the regular covariance matrix and a scatter matrix based on fourth moments. Different choices of the two scatter matrices are compared in a simulation study. Based on the study, we recommend always using two robust scatter matrices. For possibly asymmetric independent components, symmetrized versions of the scatter matrix estimates should be used.
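
FOBI, the baseline method mentioned here, uses the regular covariance matrix together with a fourth-moment scatter matrix; a compact numpy sketch of the estimate:

```python
import numpy as np

def fobi(X):
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W1 = (evecs / np.sqrt(evals)) @ evecs.T      # whitening matrix cov^{-1/2}
    Z = Xc @ W1.T
    # fourth-moment scatter of the whitened data: E[||z||^2 z z']
    cov4 = (Z * (Z**2).sum(axis=1, keepdims=True)).T @ Z / len(Z)
    _, V = np.linalg.eigh(cov4)
    return V.T @ W1                              # rows: unmixing directions

rng = np.random.default_rng(0)
S = np.c_[rng.uniform(-1, 1, 2000), rng.laplace(size=2000)]  # independent sources
A = np.array([[2.0, 1.0], [1.0, 3.0]])                       # mixing matrix
X = S @ A.T
W = fobi(X)
print(np.round(W @ A, 2))   # approximately a scaled permutation matrix
```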

Journal ArticleDOI
TL;DR: This work introduces and studies general mathematical properties of a new generator of continuous distributions with two extra parameters, called the generalized transmuted family of distributions, and introduces a bivariate extension of the new family.

Abstract: We introduce and study general mathematical properties of a new generator of continuous distributions with two extra parameters, called the generalized transmuted family of distributions. We present some special models and investigate the asymptotes and shapes. The new density function can be expressed as a linear combination of exponentiated densities based on the same baseline distribution. We obtain explicit expressions for the ordinary and incomplete moments and generating functions, Bonferroni and Lorenz curves, asymptotic distribution of the extreme values, Shannon and Rényi entropies, and order statistics, which hold for any baseline model; certain characterisations are also presented. Further, we introduce a bivariate extension of the new family. We discuss different methods of estimation of the model parameters and illustrate the potential of the family by means of two applications to real data. A brief simulation study evaluating the maximum likelihood estimators is also carried out.

Journal ArticleDOI
TL;DR: The theory and applications related to fractionally integrated generalized autoregressive conditional heteroscedastic (FIGARCH) models, mainly for describing the observed persistence in the volatility of a time series are reviewed.
Abstract: This paper reviews the theory and applications related to fractionally integrated generalized autoregressive conditional heteroscedastic (FIGARCH) models, mainly for describing the observed persistence in the volatility of a time series. The long memory nature of FIGARCH models makes them better candidates than other conditional heteroscedastic models for modeling volatility in exchange rates, option prices, stock market returns and inflation rates. We discuss some of the important properties of FIGARCH models in this review. We also compare the FIGARCH with the autoregressive fractionally integrated moving average (ARFIMA) model. Problems related to parameter estimation and forecasting using a FIGARCH model are presented. The application of a FIGARCH model to exchange rate data is discussed. We briefly introduce some other models that are closely related to FIGARCH models. The paper ends with some concluding remarks and future directions of research.
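
In practice a FIGARCH model can be fitted with the Python arch package, assuming a recent version that ships the FIGARCH volatility process; a sketch with simulated placeholder returns standing in for the paper's exchange-rate data:

```python
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_t(df=6, size=2000) * 0.5   # placeholder return series

# FIGARCH(1, d, 1) with a constant mean; the fractional-integration
# parameter d governs the long-memory decay of volatility shocks
am = arch_model(returns, mean="Constant", vol="FIGARCH", p=1, q=1)
res = am.fit(disp="off")
print(res.params)
```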

Journal ArticleDOI
TL;DR: The main conclusion is that down-weighting some parts approaches the geometry of the corresponding subcomposition, thus preserving a kind of coherence between standard and down-weighted analyses.

Abstract: Standard analysis of compositional data under the assumption that the Aitchison geometry holds assumes a uniform distribution as the reference measure of the space. Weighting of parts can be done by changing the reference measure. The changes that appear in the algebraic-geometric structure of the simplex are analysed, as a step towards understanding the implications for elementary statistics of random compositions. Some of the standard tools in exploratory compositional data analysis, such as the center, the variation matrix and biplots, are studied in some detail, although further research is still needed. The main conclusion is that down-weighting some parts approaches the geometry of the corresponding subcomposition, thus preserving a kind of coherence between standard and down-weighted analyses.

Journal ArticleDOI
Thomas Ledl
TL;DR: Insight is given into the most popular bandwidth selectors as well as into the performance of the kernel density estimator as a classification method, compared with classical linear and quadratic discriminant analysis.

Abstract: Nowadays one can find a huge set of methods to estimate the density function of a random variable nonparametrically. Since the first version of the most elementary nonparametric density estimator (the histogram), researchers have produced a vast amount of ideas, especially concerning the choice of the bandwidth parameter in a kernel density estimator model. To focus not only on descriptive applications, the model seems quite suitable for use in discriminant analysis, where (multivariate) class densities are the basis for the assignment of a vector to a given class. This article gives insight into the most popular bandwidth selectors as well as into the performance of the kernel density estimator as a classification method, compared with classical linear and quadratic discriminant analysis, respectively. Both direct estimation in a multivariate space and an application of the concept to marginal normalizations of the single variables are taken into consideration. The report also points out the gap between theory and application.
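
The comparison described here, KDE as a plug-in classifier versus LDA/QDA, can be sketched with scikit-learn; the fixed bandwidth below deliberately sidesteps the selector question that the article focuses on:

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 200)
X1 = rng.multivariate_normal([2, 1], [[1, -0.3], [-0.3, 2]], 200)
X, y = np.vstack([X0, X1]), np.r_[np.zeros(200), np.ones(200)]

def kde_classify(X_train, y_train, X_test, bandwidth=0.5):
    # one KDE per class; assign to the class maximizing log density + log prior
    classes = np.unique(y_train)
    log_post = np.column_stack([
        KernelDensity(bandwidth=bandwidth).fit(X_train[y_train == c])
        .score_samples(X_test) + np.log(np.mean(y_train == c))
        for c in classes])
    return classes[log_post.argmax(axis=1)]

for name, pred in [("KDE", kde_classify(X, y, X)),
                   ("LDA", LinearDiscriminantAnalysis().fit(X, y).predict(X)),
                   ("QDA", QuadraticDiscriminantAnalysis().fit(X, y).predict(X))]:
    print(name, (pred == y).mean())   # in-sample accuracy, illustration only
```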

Journal ArticleDOI
TL;DR: In this article, several properties of the standard deviation are highlighted, including its relationship to the mean absolute deviation and the range of the data, its role in Chebyshev's inequality and the coefficient of variation.
Abstract: Unlike the mean, the standard deviation σ is a vague concept. In this paper, several properties of σ are highlighted. These properties include the minimum and the maximum of σ, its relationship to the mean absolute deviation and the range of the data, its role in Chebyshev's inequality and the coefficient of variation. The hidden information in the formula itself is extracted. The confusion about the denominator of the sample variance being n − 1 is also addressed. Some properties of the sample mean and variance of normal data are carefully explained. Pointing out these and other properties in classrooms may have significant effects on the understanding and the retention of the concept.
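
A quick numerical illustration (not from the paper) of properties of this kind, checking that the mean absolute deviation never exceeds σ, that σ never exceeds half the range, and that Chebyshev's bound holds:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(2.0, size=10_000)
sigma = x.std()                          # population-style sd (denominator n)
mad = np.mean(np.abs(x - x.mean()))      # mean absolute deviation
print(mad <= sigma)                                   # MAD <= sigma
print(sigma <= (x.max() - x.min()) / 2)               # sigma <= range / 2
k = 2.0
print(np.mean(np.abs(x - x.mean()) < k * sigma) >= 1 - 1 / k**2)  # Chebyshev
```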

Journal ArticleDOI
TL;DR: In this paper, the selection of industry branches by employees in the Austrian labor market was analyzed using the standard logit model and the heteroscedastic extreme value model, and it was shown that the likelihood ratio test rejects the multinomial logit model in favor of the heteroscedastic specification.
Abstract: In this paper we analyze the selection of industry branches by employees in the Austrian labor market. For this purpose we use the standard logit model and the heteroscedastic extreme value model. We show that the likelihood ratio test rejects the multinomial logit model in favor of the heteroscedastic specification. Consequently, we concentrate on estimation results of the heteroscedastic extreme value model. In our investigation we use 1997 social security records provided by the Hauptverband der Sozialversicherungen.


Journal ArticleDOI
TL;DR: In this article, the shape of a probability distribution is summarized by the distribution's skewness and kurtosis; the authors follow up a proposal of Jones (2004) and choose the Beta distribution as the underlying weighting function w.
Abstract: The shape of a probability distribution is often summarized by the distribution’s skewness and kurtosis. Starting from a symmetric “parent” density f on the real line, we can modify its shape (i.e. introduce skewness and in-/decrease kurtosis) if f is appropriately weighted. In particular, every density w on the interval (0, 1) is a specific weighting function. Within this work, we follow up a proposal of Jones (2004) and choose the Beta distribution as underlying weighting function w. “Parent” distributions like the Student-t, the logistic and the normal distribution have already been investigated in the literature. Based on the assumption that f is the density of a hyperbolic secant distribution, we introduce the Beta-hyperbolic secant (BHS) distribution. In contrast to the Beta-normal distribution and to the Beta-Student-t distribution, BHS densities are always unimodal and all moments exist. In contrast to the Beta-logistic distribution, the BHS distribution is more flexible regarding the range of skewness and leptokurtosis combinations. Moreover, we propose a generalization which nests both the Beta-logistic and the BHS distribution. Finally, the goodness-of-fit between all above-mentioned distributions is compared for glass fibre data and aluminium returns.
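
The Beta-generated construction behind the BHS distribution is short to state in code: weight the parent density f by a Beta density evaluated at F(x), i.e. g(x) = w(F(x)) f(x). SciPy's hypsecant provides the hyperbolic secant parent:

```python
import numpy as np
from scipy.stats import beta, hypsecant
from scipy.integrate import quad

def bhs_pdf(x, a, b):
    # Beta(a, b) weighting of the hyperbolic secant parent density
    return beta.pdf(hypsecant.cdf(x), a, b) * hypsecant.pdf(x)

x = np.linspace(-6, 6, 7)
print(bhs_pdf(x, a=2.0, b=3.0))      # a != b introduces skewness
# sanity check: the density integrates to one
print(quad(lambda t: bhs_pdf(t, 2.0, 3.0), -np.inf, np.inf)[0])
```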

Journal ArticleDOI
TL;DR: For the analysis of square contingency tables with ordered categories, Agresti (1983) introduced the linear diagonals-parameter symmetry (LDPS) model and Tomizawa (1991) an extended LDPS (ELDPS) model with one more parameter; this paper proposes decompositions of the symmetry model that differ from those of Caussinus and Agresti.
Abstract: For the analysis of square contingency tables with ordered categories, Agresti (1983) introduced the linear diagonals-parameter symmetry (LDPS) model. Tomizawa (1991) considered an extended LDPS (ELDPS) model, which has one more parameter than the LDPS model. These models are special cases of Caussinus’ (1965) quasi-symmetry (QS) model. Caussinus showed that the symmetry (S) model is equivalent to the QS model and the marginal homogeneity (MH) model holding simultaneously. For square tables with ordered categories, Agresti (2002, p. 430) gave a decomposition of the S model into the ordinal quasi-symmetry and MH models. This paper proposes some decompositions which are different from Caussinus’ and Agresti’s decompositions. It gives (i) two kinds of decomposition theorems of the S model for two-way tables, (ii) extended models corresponding to the LDPS and ELDPS, and a further generalized model for multi-way tables, and (iii) three kinds of decomposition theorems of the S model into these models and marginal equimoment models for multi-way tables. The proposed decompositions may be useful if it is reasonable to assume an underlying multivariate normal distribution.
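
For orientation, the classical test of the symmetry (S) model that these decompositions refine is Bowker's chi-square test of H0: p_ij = p_ji; a small sketch:

```python
import numpy as np
from scipy.stats import chi2

def bowker_symmetry_test(n):
    # Bowker statistic: sum over i<j of (n_ij - n_ji)^2 / (n_ij + n_ji)
    n = np.asarray(n, float)
    i, j = np.triu_indices_from(n, k=1)
    num, den = (n[i, j] - n[j, i])**2, n[i, j] + n[j, i]
    stat = np.sum(num[den > 0] / den[den > 0])
    df = int(np.sum(den > 0))
    return stat, df, chi2.sf(stat, df)

table = np.array([[50, 10, 5],
                  [20, 40, 8],
                  [ 9, 12, 30]])
print(bowker_symmetry_test(table))
```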

Journal ArticleDOI
TL;DR: In this article, the distribution and moments of the ratio of independent inverted gamma variates have been considered and unbiased estimators of the parameter involved in the distribution have been proposed.
Abstract: In this paper the distribution and moments of the ratio of independent inverted gamma variates have been considered. Unbiased estimators of the parameter involved in the distribution have been proposed. As a particular case, the ratio of independent Lévy variates has been studied.

Journal ArticleDOI
TL;DR: It is argued that the statistical computing community needs a more common understanding of software quality, and better domain-specific semantic resources.
Abstract: The number of R extension packages available from the CRAN repository has tremendously grown over the past 10 years. We look at this phenomenon in more detail, and discuss some of its consequences. In particular, we argue that the statistical computing community needs a more common understanding of software quality, and better domain-specific semantic resources.

Journal ArticleDOI
TL;DR: In this paper, the influence of meteorological and anthropogenic factors on particulate matter PM10 in Graz is discussed and a prediction model using current information and meteorological forecasts is introduced to predict the average concentration of PM10 for the next day.
Abstract: We summarize the investigations and results of recent empirical analysis on particulate matter PM10 in Graz. The influence of meteorological as well as anthropogenic factors is presented and discussed. Moreover we introduce a prediction model using current information and meteorological forecasts to predict the average concentration of particulate matter PM10 for the next day. Finally, we report on experiences with a test run carried out from December 16, 2004 until April 15, 2005.