Showing papers in "arXiv: Methodology in 2012"


Posted Content
TL;DR: It is shown how to evaluate the density of arbitrary regular vine specifications, which opens the vine copula methodology to the flexible modeling of complex dependencies even in larger dimensions.
Abstract: Regular vine distributions which constitute a flexible class of multivariate dependence models are discussed. Since multivariate copulae constructed through pair-copula decompositions were introduced to the statistical community, interest in these models has been growing steadily and they are finding successful applications in various fields. Research so far has however been concentrating on so-called canonical and D-vine copulae, which are more restrictive cases of regular vine copulae. It is shown how to evaluate the density of arbitrary regular vine specifications. This opens the vine copula methodology to the flexible modeling of complex dependencies even in larger dimensions. In this regard, a new automated model selection and estimation technique based on graph theoretical considerations is presented. This comprehensive search strategy is evaluated in a large simulation study and applied to a 16-dimensional financial data set of international equity, fixed income and commodity indices which were observed over the last decade, in particular during the recent financial crisis. The analysis provides economically well interpretable results and interesting insights into the dependence structure among these indices.

478 citations


Journal ArticleDOI
TL;DR: A unifying framework is presented that links two classes of statistics used in two-sample and independence testing: energy distances and distance covariances from the statistics literature, and maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces.
Abstract: We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
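The stated equivalence can be checked numerically. The sketch below is illustrative only and not taken from the paper: it assumes the Euclidean distance as the negative-type semimetric, an arbitrary centre point `x0`, the normalization D_E = 2E[rho(X,Y)] - E[rho(X,X')] - E[rho(Y,Y')], and V-statistic (diagonal included) estimates. Under those assumptions the empirical energy distance equals twice the squared MMD computed with the distance-induced kernel. All function names are hypothetical.

```python
import math
import random


def rho(x, y):
    """Euclidean distance, used here as a semimetric of negative type."""
    return math.dist(x, y)


def distance_kernel(x, y, x0):
    """Distance-induced kernel centred at x0 (the choice of x0 is arbitrary)."""
    return 0.5 * (rho(x, x0) + rho(y, x0) - rho(x, y))


def mean_pairwise(f, A, B):
    """Average of f(a, b) over all pairs (V-statistic, diagonal included)."""
    return sum(f(a, b) for a in A for b in B) / (len(A) * len(B))


def energy_distance(X, Y):
    """Empirical energy distance under the normalization assumed above."""
    return (2 * mean_pairwise(rho, X, Y)
            - mean_pairwise(rho, X, X)
            - mean_pairwise(rho, Y, Y))


def mmd_squared(X, Y, x0):
    """Empirical squared MMD using the distance-induced kernel."""
    def k(a, b):
        return distance_kernel(a, b, x0)
    return (mean_pairwise(k, X, X)
            + mean_pairwise(k, Y, Y)
            - 2 * mean_pairwise(k, X, Y))


rng = random.Random(0)
X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(40)]
Y = [(rng.gauss(0.5, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
x0 = (0.0, 0.0)

energy = energy_distance(X, Y)
mmd2 = mmd_squared(X, Y, x0)
print(energy, 2 * mmd2)  # the two values agree up to rounding error
```

The terms involving `x0` cancel in the MMD, which is why the choice of centre does not affect the result.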

453 citations


Journal ArticleDOI
TL;DR: A new family of tensor regression models that efficiently exploit the special structure of tensor covariates is proposed, and ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction.
Abstract: Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form such as multidimensional arrays (tensors). Traditional statistical and computational methods are proving insufficient for analysis of these high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this article, we propose a new family of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. A fast and highly scalable estimation algorithm is proposed for maximum likelihood estimation and its associated asymptotic properties are studied. Effectiveness of the new methods is demonstrated on both synthetic and real MRI imaging data.

368 citations


Journal ArticleDOI
TL;DR: A precise characterization of the effect of a hierarchy constraint added to the lasso is given, it is proved that hierarchy holds with probability one, and a bound on an unbiased estimate of the degrees of freedom reveals the amount of fitting "saved" by the hierarchy constraint.
Abstract: We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. We distinguish between parameter sparsity - the number of nonzero coefficients - and practical sparsity - the number of raw variables one must measure to make a new prediction. Hierarchy focuses on the latter, which is more closely tied to important data collection concerns such as cost, time and effort. We develop an algorithm, available in the R package hierNet, and perform an empirical study of our method.

355 citations


Journal ArticleDOI
TL;DR: Simulation results suggest the imputation by fully conditional specification proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible.
Abstract: Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation (MI). Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of MI may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing MI, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it to existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible.

264 citations


Posted Content
TL;DR: In this paper, an unbiased estimator of the likelihood is used within a Metropolis-Hastings chain, and it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of averages computed under this chain.
Abstract: When an unbiased estimator of the likelihood is used within a Metropolis--Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of averages computed under this chain. Many Monte Carlo samples will typically result in Metropolis--Hastings averages with lower asymptotic variances than the corresponding Metropolis--Hastings averages using fewer samples. However, the computing time required to construct the likelihood estimator increases with the number of Monte Carlo samples. Under the assumption that the distribution of the additive noise introduced by the log-likelihood estimator is Gaussian with variance inversely proportional to the number of Monte Carlo samples and independent of the parameter value at which it is evaluated, we provide guidelines on the number of samples to select. We demonstrate our results by considering a stochastic volatility model applied to stock index returns.
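The trade-off described above can be explored with a toy pseudo-marginal Metropolis-Hastings chain. The sketch below is a minimal illustration under assumptions that are not from the paper: a single observation `y`, a latent-variable likelihood p(y | theta) = E_z[N(y; theta + z, 1)] with z ~ N(0, 1), a flat prior and a Gaussian random-walk proposal. The parameter `n_samples` is the number of Monte Carlo samples used per likelihood estimate; all names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def likelihood_estimate(theta, y, n_samples, rng):
    """Unbiased Monte Carlo estimate of p(y | theta) for the toy model."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)  # latent variable drawn from its prior
        total += normal_pdf(y, theta + z, 1.0)
    return total / n_samples


def pseudo_marginal_mh(y, n_iter, n_samples, step, rng):
    """Random-walk MH in which the likelihood is replaced by its estimate."""
    theta = 0.0
    lik = likelihood_estimate(theta, y, n_samples, rng)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        lik_prop = likelihood_estimate(proposal, y, n_samples, rng)
        # Flat prior and symmetric proposal: accept with probability
        # min(1, lik_prop / lik). The estimate for the current state is
        # stored and reused, never refreshed.
        if rng.random() * lik < lik_prop:
            theta, lik = proposal, lik_prop
            accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter


chain, acceptance_rate = pseudo_marginal_mh(
    y=1.0, n_iter=2000, n_samples=20, step=1.0, rng=random.Random(1)
)
print(len(chain), acceptance_rate)
```

Rerunning with a smaller `n_samples` makes each iteration cheaper but the likelihood estimate noisier, which is the trade-off the abstract addresses.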

215 citations


Journal ArticleDOI
TL;DR: The authors examined the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects and estimation of random effects variances.
Abstract: Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects and estimation of random effects variances. We describe examples, theoretical calculations and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations.

171 citations


Posted Content
TL;DR: In this article, a new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods is proposed, which appeals to a new class of Polya-Gamma distributions, which are constructed in detail.
Abstract: We propose a new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods. The approach appeals to a new class of Polya-Gamma distributions, which are constructed in detail. A variety of examples are presented to show the versatility of the method, including logistic regression, negative binomial regression, nonlinear mixed-effects models, and spatial models for count data. In each case, our data-augmentation strategy leads to simple, effective methods for posterior inference that: (1) circumvent the need for analytic approximations, numerical integration, or Metropolis-Hastings; and (2) outperform other known data-augmentation strategies, both in ease of use and in computational efficiency. All methods, including an efficient sampler for the Polya-Gamma distribution, are implemented in the R package BayesLogit. In the technical supplement appended to the end of the paper, we provide further details regarding the generation of Polya-Gamma random variables; the empirical benchmarks reported in the main manuscript; and the extension of the basic data-augmentation framework to contingency tables and multinomial outcomes.

164 citations


Journal ArticleDOI
TL;DR: An updated overview of Thurstonian and Bradley-Terry extensions, including how to account for object- and subject-specific covariates and how to deal with ordinal paired comparison data is provided.
Abstract: Thurstonian and Bradley-Terry models are the most commonly applied models in the analysis of paired comparison data. Since their introduction, numerous developments have been proposed in different areas. This paper provides an updated overview of these extensions, including how to account for object- and subject-specific covariates and how to deal with ordinal paired comparison data. Special emphasis is given to models for dependent comparisons. Although these models are more realistic, their use is complicated by numerical difficulties. We therefore concentrate on implementation issues. In particular, a pairwise likelihood approach is explored for models for dependent paired comparison data, and a simulation study is carried out to compare the performance of maximum pairwise likelihood with other limited information estimation methods. The methodology is illustrated throughout using a real data set about university paired comparisons performed by students.

144 citations


Posted Content
TL;DR: This paper proposes non-backtracking random walk with re-weighting (NBRW-rw) and MH algorithm with delayed acceptance (MHDA) which are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than the SRW-rw and the MH algorithm, respectively.
Abstract: Graph sampling via crawling has been actively considered as a generic and important tool for collecting uniform node samples so as to consistently estimate and uncover various characteristics of complex networks. The so-called simple random walk with re-weighting (SRW-rw) and Metropolis-Hastings (MH) algorithm have been popular in the literature for such unbiased graph sampling. However, an unavoidable downside of their core random walks, slow diffusion over the space, can cause poor estimation accuracy. In this paper, we propose non-backtracking random walk with re-weighting (NBRW-rw) and MH algorithm with delayed acceptance (MHDA) which are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than the SRW-rw and the MH algorithm, respectively. In particular, a remarkable feature of the MHDA is its applicability for any non-uniform node sampling like the MH algorithm, but ensuring better sampling efficiency than the MH algorithm. We also provide simulation results to confirm our theoretical findings.
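A minimal sketch of the non-backtracking walk with re-weighting follows. It is not the authors' implementation: the toy graph, the function names and the choice of estimating the mean degree are assumptions made for illustration. The walk avoids returning to the node it just left (unless that node is the only neighbour), visits nodes with long-run frequency proportional to degree, and the 1/degree weights correct for that bias.

```python
import random

# Hypothetical toy graph as an adjacency list (undirected, connected).
GRAPH = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}


def non_backtracking_walk(graph, start, n_steps, rng):
    """Walk that never returns to the previous node unless it has no choice."""
    prev, cur = None, start
    path = []
    for _ in range(n_steps):
        candidates = [v for v in graph[cur] if v != prev]
        if not candidates:  # degree-one node: backtracking is forced
            candidates = graph[cur]
        prev, cur = cur, rng.choice(candidates)
        path.append(cur)
    return path


def reweighted_mean(graph, path, f):
    """Self-normalized estimate of the uniform-node average of f."""
    weights = [1.0 / len(graph[v]) for v in path]
    return sum(w * f(v) for w, v in zip(weights, path)) / sum(weights)


rng = random.Random(0)
path = non_backtracking_walk(GRAPH, start=0, n_steps=20000, rng=rng)
estimated_mean_degree = reweighted_mean(GRAPH, path, lambda v: len(GRAPH[v]))
true_mean_degree = sum(len(nb) for nb in GRAPH.values()) / len(GRAPH)
print(estimated_mean_degree, true_mean_degree)
```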

131 citations


Posted Content
TL;DR: In this article, the authors address the prediction of stationary functional time series and make a connection between functional and multivariate predictions for the important case of vector and functional autoregressions.
Abstract: This paper addresses the prediction of stationary functional time series. Existing contributions to this problem have largely focused on the special case of first-order functional autoregressive processes because of their technical tractability and the current lack of advanced functional time series methodology. It is shown here how standard multivariate prediction techniques can be utilized in this context. The connection between functional and multivariate predictions is made precise for the important case of vector and functional autoregressions. The proposed method is easy to implement, making use of existing statistical software packages, and may therefore be attractive to a broader, possibly non-academic, audience. Its practical applicability is enhanced through the introduction of a novel functional final prediction error model selection criterion that allows for an automatic determination of the lag structure and the dimensionality of the model. The usefulness of the proposed methodology is demonstrated in a simulation study and an application to environmental data, namely the prediction of daily pollution curves describing the concentration of particulate matter in ambient air. It is found that the proposed prediction method often significantly outperforms existing methods.

Journal ArticleDOI
TL;DR: It is shown how equality information can be transformed into inequality information, which enables the use of any structural reliability method (SRM), including those based on simulation, for reliability updating by solving a standard structural system reliability problem.
Abstract: In many instances, information on engineering systems can be obtained through measurements, monitoring or direct observations of system performances and can be used to update the system reliability estimate. In structural reliability analysis, such information is expressed either by inequalities (e.g. for the observation that no defect is present) or by equalities (e.g. for quantitative measurements of system characteristics). When information Z is of the equality type, the a-priori probability of Z is zero and most structural reliability methods (SRM) are not directly applicable to the computation of the updated reliability. Hitherto, the computation of the reliability of engineering systems conditional on equality information was performed through first- and second order approximations. In this paper, it is shown how equality information can be transformed into inequality information, which enables reliability updating by solving a standard structural system reliability problem. This approach enables the use of any SRM, including those based on simulation, for reliability updating with equality information. It is demonstrated on three numerical examples, including an application to fatigue reliability.

Posted Content
TL;DR: Simulations reveal that the new approach has better power than existing approaches when the dimension of the data is moderate to high, and is illustrated on two applications: the determination of authorship of a classic novel, and the detection of change in a network over time.
Abstract: We consider the testing and estimation of change-points -- locations where the distribution abruptly changes -- in a data sequence. A new approach, based on scan statistics utilizing graphs representing the similarity between observations, is proposed. The graph-based approach is non-parametric, and can be applied to any data set as long as an informative similarity measure on the sample space can be defined. Accurate analytic approximations to the significance of graph-based scan statistics for both the single change-point and the changed interval alternatives are provided. Simulations reveal that the new approach has better power than existing approaches when the dimension of the data is moderate to high. The new approach is illustrated on two applications: The determination of authorship of a classic novel, and the detection of change in a network over time.

Posted Content
TL;DR: In this article, the authors proposed new methods to release aggregate genome-wide association studies data without compromising an individual's privacy, and compared these approaches on simulated data and on a GWAS study of canine hair length involving 685 dogs.
Abstract: Traditional statistical methods for confidentiality protection of statistical databases do not scale well to deal with GWAS (genome-wide association studies) databases especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach which provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees come at a serious price in terms of data utility. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual's privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. We compare these approaches on simulated data and on a GWAS study of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially-private approach to penalized logistic regression.

Posted Content
TL;DR: Two classes of transformations of cosine similarity and Pearson and Spearman correlations into metric distances are investigated, utilising the simple tool of metric-preserving functions.
Abstract: We investigate two classes of transformations of cosine similarity and Pearson and Spearman correlations into metric distances, utilising the simple tool of metric-preserving functions. The first class puts anti-correlated objects maximally far apart. Previously known transforms fall within this class. The second class collates correlated and anti-correlated objects. An example of such a transformation that yields a metric distance is the sine function when applied to centered data.
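As an illustration of the two classes, the sketch below uses one commonly cited representative of each: sqrt(1 - r), which places anti-correlated objects at the maximum distance, and sqrt(1 - r^2), which equals the sine of the angle between centered vectors and collates correlated with anti-correlated objects. These specific formulas and all names are assumptions chosen for the example; the paper covers broader classes. The code checks the triangle inequality empirically on random vectors.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dist_anti_far(r):
    """First class: anti-correlated objects (r = -1) are maximally far apart."""
    return math.sqrt(max(0.0, 1.0 - r))


def dist_collated(r):
    """Second class: r and -r map to the same distance (sine of the angle)."""
    return math.sqrt(max(0.0, 1.0 - r * r))


def triangle_holds(dist, vectors, tol=1e-9):
    """Check d(a, c) <= d(a, b) + d(b, c) for every triple of vectors."""
    for a in vectors:
        for b in vectors:
            for c in vectors:
                d_ac = dist(pearson(a, c))
                d_ab = dist(pearson(a, b))
                d_bc = dist(pearson(b, c))
                if d_ac > d_ab + d_bc + tol:
                    return False
    return True


rng = random.Random(0)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]
print(triangle_holds(dist_anti_far, vectors), triangle_holds(dist_collated, vectors))
```

By contrast, the untransformed quantity 1 - r is not a metric in general, which is why a transformation such as the square root is applied.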

Posted Content
Art B. Owen
TL;DR: A new method for estimating Sobol' indices makes use of 3 independent input vectors rather than the usual 2, and attains much greater accuracy on problems where the target Sobol' index is small, even outperforming some oracles that adjust using the true but unknown mean of the function.
Abstract: A new method for estimating Sobol' indices is proposed. The new method makes use of 3 independent input vectors rather than the usual 2. It attains much greater accuracy on problems where the target Sobol' index is small, even outperforming some oracles which adjust using the true but unknown mean of the function. When the target Sobol' index is quite large, the oracles do better than the new method.
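The sketch below contrasts a standard two-vector estimator with one possible three-vector estimator. The three-vector form shown is an assumption that matches the description above and is unbiased for Var(E[f | x_u]); the exact estimator in the paper may differ. The test function `f`, the uniform inputs and all names are hypothetical.

```python
import random


def f(x):
    """Toy function on [0, 1]^2; the index for variable 1 is small by design."""
    return x[0] + 0.1 * x[1]


def hybrid(a, b, u):
    """Point taking the coordinates in u from a and the remaining ones from b."""
    return [a[j] if j in u else b[j] for j in range(len(a))]


def sobol_two_vectors(func, u, d, n, rng):
    """Standard estimate: mean of f(x) * f(x_u : z_-u) minus the squared mean."""
    sum_prod = sum_f = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        z = [rng.random() for _ in range(d)]
        fx = func(x)
        sum_prod += fx * func(hybrid(x, z, u))
        sum_f += fx
    return sum_prod / n - (sum_f / n) ** 2


def sobol_three_vectors(func, u, d, n, rng):
    """Assumed three-vector form: both factors have mean zero by construction."""
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        y = [rng.random() for _ in range(d)]
        z = [rng.random() for _ in range(d)]
        left = func(x) - func(hybrid(z, x, u))    # f(x) - f(z_u : x_-u)
        right = func(hybrid(x, y, u)) - func(y)   # f(x_u : y_-u) - f(y)
        total += left * right
    return total / n


u = {1}                  # index set for the second input variable
true_value = 0.01 / 12   # Var(0.1 * U) for U uniform on [0, 1]
est_two = sobol_two_vectors(f, u, d=2, n=20000, rng=random.Random(0))
est_three = sobol_three_vectors(f, u, d=2, n=20000, rng=random.Random(0))
print(true_value, est_two, est_three)
```

Because each factor in the three-vector product is centered without estimating the mean, the estimate does not inherit the noise of the squared-mean correction, which matters most when the target index is small.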

Posted Content
TL;DR: A new approach to inference for extreme-value parameters, based on a multivariate peaks-over-threshold method, is introduced for observations in the max-domain of attraction (MDA) of a multivariate max-stable distribution, a setting in which estimation commonly uses aggregated data such as block maxima.
Abstract: Estimation of extreme-value parameters from observations in the max-domain of attraction (MDA) of a multivariate max-stable distribution commonly uses aggregated data such as block maxima. Since we expect that additional information is contained in the non-aggregated, single "large" observations, we introduce a new approach of inference based on a multivariate peaks-over-threshold method. We show that for any process in the MDA of the frequently used Husler-Reiss model or its spatial extension, the Brown-Resnick process, suitably defined conditional increments asymptotically follow a multivariate Gaussian distribution. This leads to computationally efficient estimates of the Husler-Reiss parameter matrix. Further, the results enable parametric inference for Brown-Resnick processes. A simulation study compares the performance of the new estimators to other commonly used methods. As an application, we fit a non-isotropic Brown-Resnick process to the extremes of 12 year data of daily wind speed measurements.

Posted Content
TL;DR: A method for finding regions where a studied process exceeds a certain level, as well as for the related problem of finding uncertainty regions for contour curves, is proposed for latent Gaussian models, based on using a parametric family for the excursion sets in combination with a sequential importance sampling method for estimating joint probabilities.
Abstract: An interesting statistical problem is to find regions where some studied process exceeds a certain level. Estimating such regions so that the probability for exceeding the level in the entire set is equal to some predefined value is a difficult problem that occurs in several areas of applications ranging from brain imaging to astrophysics. In this work, a method for solving this problem, as well as the related problem of finding uncertainty regions for contour curves, for latent Gaussian models is proposed. The method is based on using a parametric family for the excursion sets in combination with a sequential importance sampling method for estimating joint probabilities. The accuracy of the method is investigated using simulated data and two environmental applications are presented. In the first application, areas where the air pollution in the Piemonte region in northern Italy exceeds the daily limit value, set by the European Union for human health protection, are estimated. In the second application, regions in the African Sahel that experienced an increase in vegetation after the drought period in the early 1980s are estimated.

Journal ArticleDOI
TL;DR: The paper addresses shortcomings of current approaches to a fundamental problem of how to perform valid statistical inference from data released by privacy mechanisms, and lays a foundational groundwork on how to achieve optimal and private statistical inference in a principled manner by modeling the privacy mechanism.
Abstract: The $\beta$-model of random graphs is an exponential family model with the degree sequence as a sufficient statistic. In this paper, we contribute three key results. First, we characterize conditions that lead to a quadratic time algorithm to check for the existence of MLE of the $\beta$-model, and show that the MLE never exists for the degree partition $\beta$-model. Second, motivated by privacy problems with network data, we derive a differentially private estimator of the parameters of $\beta$-model, and show it is consistent and asymptotically normally distributed - it achieves the same rate of convergence as the nonprivate estimator. We present an efficient algorithm for the private estimator that can be used to release synthetic graphs. Our techniques can also be used to release degree distributions and degree partitions accurately and privately, and to perform inference from noisy degrees arising from contexts other than privacy. We evaluate the proposed estimator on real graphs and compare it with a current algorithm for releasing degree distributions and find that it does significantly better. Finally, our paper addresses shortcomings of current approaches to a fundamental problem of how to perform valid statistical inference from data released by privacy mechanisms, and lays a foundational groundwork on how to achieve optimal and private statistical inference in a principled manner by modeling the privacy mechanism; these principles should be applicable to a class of models beyond the $\beta$-model.

Posted Content
TL;DR: It is shown that coupled with an efficiency augmentation procedure, this method produces clinically meaningful estimators in a variety of settings and can be useful for practicing personalized medicine: determining from a large set of biomarkers, the subset of patients that can potentially benefit from a treatment.
Abstract: We consider a setting in which we have a treatment and a large number of covariates for a set of observations, and wish to model their relationship with an outcome of interest. We propose a simple method for modeling interactions between the treatment and covariates. The idea is to modify the covariate in a simple way, and then fit a standard model using the modified covariates and no main effects. We show that coupled with an efficiency augmentation procedure, this method produces valid inferences in a variety of settings. It can be useful for personalized medicine: determining from a large set of biomarkers the subset of patients that can potentially benefit from a treatment. We apply the method to both simulated datasets and gene expression studies of cancer. The modified data can be used for other purposes, for example large scale hypothesis testing for determining which of a set of covariates interact with a treatment variable.

Book ChapterDOI
TL;DR: A new metric for RMHMC that is not limited to analytically convenient models is proposed, and its success on a distribution that emulates many hierarchical and latent models is verified.
Abstract: Markov Chain Monte Carlo (MCMC) is an invaluable means of inference with complicated models, and Hamiltonian Monte Carlo, in particular Riemannian Manifold Hamiltonian Monte Carlo (RMHMC), has demonstrated impressive success in many challenging problems. Current RMHMC implementations, however, rely on a Riemannian metric that limits their application to analytically-convenient models. In this paper I propose a new metric for RMHMC without these limitations and verify its success on a distribution that emulates many hierarchical and latent models.

Posted Content
TL;DR: In this article, a thresholded variant of the Group Lasso estimator for discovering Granger causal interactions among the nodes of the network is introduced, and the performance of the proposed methodology is assessed through an extensive set of simulation studies and comparisons with existing techniques.
Abstract: The problem of estimating high-dimensional network models arises naturally in the analysis of many physical, biological and socio-economic systems. Examples include stock price fluctuations in financial markets and gene regulatory networks representing effects of regulators (transcription factors) on regulated genes in genetics. We aim to learn the structure of the network over time employing the framework of Granger causal models under the assumptions of sparsity of its edges and inherent grouping structure among its nodes. We introduce a thresholded variant of the Group Lasso estimator for discovering Granger causal interactions among the nodes of the network. Asymptotic results on the consistency of the new estimation procedure are developed. The performance of the proposed methodology is assessed through an extensive set of simulation studies and comparisons with existing techniques.

Posted Content
TL;DR: Pairwise likelihood estimation, in which the pairwise density of the process is used to estimate the model parameters, is described for max-stable space-time processes, and strong consistency and asymptotic normality of the parameter estimates are proved for an increasing space-time dimension, that is, as the joint number of spatial locations and time points tends to infinity.
Abstract: Max-stable processes have proved to be useful for the statistical modelling of spatial extremes. Several representations of max-stable random fields have been proposed in the literature. One such representation is based on a limit of normalized and scaled pointwise maxima of stationary Gaussian processes that was first introduced by Kabluchko, Schlather and de Haan (2009). This paper deals with statistical inference for max-stable space-time processes that are defined in an analogous fashion. We describe pairwise likelihood estimation, where the pairwise density of the process is used to estimate the model parameters and prove strong consistency and asymptotic normality of the parameter estimates for an increasing space-time dimension, i.e., as the joint number of spatial locations and time points tends to infinity. A simulation study shows that the proposed method works well for these models.

Posted Content
TL;DR: The control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models are discussed.
Abstract: This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particular, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models.

Journal ArticleDOI
TL;DR: The goal is to address a statistical audience, and to provide a primarily statistical treatment of the lessons that have been learned from this remarkable set of data on collaborative filtering and recommender systems.
Abstract: Inspired by the legacy of the Netflix contest, we provide an overview of what has been learned---from our own efforts, and those of others---concerning the problems of collaborative filtering and recommender systems. The data set consists of about 100 million movie ratings (from 1 to 5 stars) involving some 480 thousand users and some 18 thousand movies; the associated ratings matrix is about 99% sparse. The goal is to predict ratings that users will give to movies; systems which can do this accurately have significant commercial applications, particularly on the world wide web. We discuss, in some detail, approaches to "baseline" modeling, singular value decomposition (SVD), as well as kNN (nearest neighbor) and neural network models; temporal effects, cross-validation issues, ensemble methods and other considerations are discussed as well. We compare existing models in a search for new models, and also discuss the mission-critical issues of penalization and parameter shrinkage which arise when the dimension of a parameter space reaches into the millions. Although much work on such problems has been carried out by the computer science and machine learning communities, our goal here is to address a statistical audience, and to provide a primarily statistical treatment of the lessons that have been learned from this remarkable set of data.

Posted Content
TL;DR: A new class of distributions is obtained by compounding the extended Weibull and power series distributions; it defines at least 68 new sub-models and includes some well-known mixing distributions.
Abstract: In this paper, we introduce a new class of distributions which is obtained by compounding the extended Weibull and power series distributions. The compounding procedure follows the same set-up carried out by Adamidis and Loukas (1998) and defines at least 68 new sub-models. This class includes some well-known mixing distributions, such as the Weibull power series (Morais and Barreto-Souza, 2010) and exponential power series (Chahkandi and Ganjali, 2009) distributions. Some mathematical properties of the new class are studied, including moments and the generating function. We provide the density function of the order statistics and obtain their moments. The method of maximum likelihood is used for estimating the model parameters and an EM algorithm is proposed for computing the estimates. Special distributions are investigated in some detail. An application to a real data set is given to show the flexibility and potentiality of the new class of distributions.

Posted Content
TL;DR: In this paper, the quantile autocorrelation function (QACF) and the quantile partial autocorrelation function (QPACF) are proposed to identify the autoregressive order and assess the adequacy of quantile autoregressive models.
Abstract: In this paper, we propose two important measures, quantile correlation (QCOR) and quantile partial correlation (QPCOR). We then apply them to quantile autoregressive (QAR) models, and introduce two valuable quantities, the quantile autocorrelation function (QACF) and the quantile partial autocorrelation function (QPACF). This allows us to extend the classical Box-Jenkins approach to quantile autoregressive models. Specifically, the QPACF of an observed time series can be employed to identify the autoregressive order, while the QACF of residuals obtained from the fitted model can be used to assess the model adequacy. We not only demonstrate the asymptotic properties of QCOR, QPCOR, QACF, and QPACF, but also show the large sample results of the QAR estimates and the quantile version of the Ljung-Box test. Simulation studies indicate that the proposed methods perform well in finite samples, and an empirical example is presented to illustrate usefulness.
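
The abstract does not spell out how QCOR is defined. As a rough illustration only, the sketch below assumes the form qcor_tau(Y, X) = E[psi_tau(Y - Q_tau(Y)) (X - E X)] / sqrt((tau - tau^2) Var(X)), with psi_tau(w) = tau - 1{w < 0}, and computes its plug-in sample version. The function name `sample_qcor` and the choice of empirical quantile are hypothetical and may differ from the paper.

```python
import math

def sample_qcor(y, x, tau):
    """Plug-in sample quantile correlation of y (at level tau) with x.

    Assumed definition (not stated in the abstract):
      qcor = mean(psi(y_i - q) * (x_i - xbar)) / sqrt((tau - tau^2) * var(x)),
      psi(w) = tau - 1{w < 0}, q = empirical tau-quantile of y.
    """
    n = len(y)
    if n != len(x) or n == 0:
        raise ValueError("y and x must be non-empty and of equal length")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

    # Empirical quantile: smallest order statistic with ECDF >= tau.
    q = sorted(y)[max(1, math.ceil(n * tau)) - 1]

    x_bar = sum(x) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n  # population-style variance

    # Sample quantile covariance between y (at level tau) and x.
    qcov = sum((tau - (1.0 if yi - q < 0 else 0.0)) * (xi - x_bar)
               for yi, xi in zip(y, x)) / n

    return qcov / math.sqrt((tau - tau * tau) * var_x)

# Usage: a perfectly increasing relationship gives a positive value,
# a perfectly decreasing one a negative value.
print(sample_qcor([1, 2, 3, 4], [1, 2, 3, 4], 0.5))
print(sample_qcor([4, 3, 2, 1], [1, 2, 3, 4], 0.5))
```

Under this assumed definition, applying the same function to a series and its lagged copy would give a sample analogue of the QACF at that lag.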

Posted Content
TL;DR: In this paper, a framework for conditional simulations of max-stable processes is proposed, together with closed forms for Brown-Resnick and Schlather processes; the framework can handle real-sized problems.
Abstract: Since many environmental processes such as heat waves or precipitation are spatial in extent, it is likely that a single extreme event affects several locations, and the areal modelling of extremes is therefore essential if the spatial dependence of extremes has to be appropriately taken into account. This paper proposes a framework for conditional simulations of max-stable processes and gives closed forms for Brown-Resnick and Schlather processes. We test the method on simulated data and give an application to extreme rainfall around Zurich and extreme temperature in Switzerland. Results show that the proposed framework provides accurate conditional simulations and can handle real-sized problems.

Journal ArticleDOI
TL;DR: Power to detect association in plausible genetic scenarios is low for studies of medium size unless a high proportion of the chosen variants are causal, and considerable attention must be given to relevant biological information that can guide the selection of variants for testing.
Abstract: In the search for genetic factors that are associated with complex heritable human traits, considerable attention is now being focused on rare variants that individually have small effects. In response, numerous recent papers have proposed testing strategies to assess association between a group of rare variants and a trait, with competing claims about the performance of various tests. The power of a given test in fact depends on the nature of any association and on the rareness of the variants in question. We review such tests within a general framework that covers a wide range of genetic models and types of data. We study the performance of specific tests through exact or asymptotic power formulas and through novel simulation studies of over 10,000 different models. The tests considered are also applied to real sequence data from the 1000 Genomes project and provided by the GAW17. We recommend a testing strategy, but our results show that power to detect association in plausible genetic scenarios is low for studies of medium size unless a high proportion of the chosen variants are causal. Consequently, considerable attention must be given to relevant biological information that can guide the selection of variants for testing.

Posted Content
TL;DR: In this paper, the Kumaraswamy Pareto distribution is introduced and studied, which can have a decreasing and upside-down bathtub failure rate function depending on the values of its parameters.
Abstract: The modeling and analysis of lifetimes is an important aspect of statistical work in a wide variety of scientific and technological fields. For the first time, the so-called Kumaraswamy Pareto distribution is introduced and studied. The new distribution can have a decreasing and upside-down bathtub failure rate function depending on the values of its parameters. It includes as special sub-models the Pareto and exponentiated Pareto (Gupta et al. [12]) distributions. Some structural properties of the proposed distribution are studied, including explicit expressions for the moments and generating function. We provide the density function of the order statistics and obtain their moments. The method of maximum likelihood is used for estimating the model parameters and the observed information matrix is derived. A real data set is used to compare the new model with widely known distributions.
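
The abstract does not give the distribution function. A minimal sketch, assuming the standard Kumaraswamy-G construction F(x) = 1 - (1 - G(x)^a)^b applied to a Pareto baseline G(x) = 1 - (beta/x)^k for x >= beta, shows how the two named sub-models arise: b = 1 gives the exponentiated Pareto form G(x)^a, and a = b = 1 gives the Pareto itself. The parameter names `a`, `b`, `beta`, `k` and all function names are assumptions for illustration.

```python
import math

def pareto_cdf(x, beta, k):
    """Pareto baseline CDF: G(x) = 1 - (beta / x)^k for x >= beta, else 0."""
    if x < beta:
        return 0.0
    return 1.0 - (beta / x) ** k

def kw_pareto_cdf(x, a, b, beta, k):
    """Assumed Kumaraswamy-G CDF with Pareto baseline: 1 - (1 - G^a)^b."""
    g = pareto_cdf(x, beta, k)
    return 1.0 - (1.0 - g ** a) ** b

def kw_pareto_quantile(u, a, b, beta, k):
    """Inverse of kw_pareto_cdf for 0 < u < 1, obtained by solving F(x) = u."""
    g = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)  # baseline probability
    return beta / (1.0 - g) ** (1.0 / k)             # invert the Pareto CDF

# Usage: the quantile function inverts the CDF, and a = b = 1
# recovers the plain Pareto CDF.
x = kw_pareto_quantile(0.3, a=2.0, b=3.0, beta=1.0, k=1.5)
print(x, kw_pareto_cdf(x, a=2.0, b=3.0, beta=1.0, k=1.5))
print(kw_pareto_cdf(2.0, 1.0, 1.0, 1.0, 1.5), pareto_cdf(2.0, 1.0, 1.5))
```

Feeding uniform random numbers through `kw_pareto_quantile` would give a simple inverse-transform sampler under the same assumptions.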