
Showing papers in "Statistics and Computing in 2014"


Journal ArticleDOI
TL;DR: The Akaike, deviance, and Watanabe-Akaike information criteria are reviewed from a Bayesian perspective, and small examples clarify how these methods can apply in practice.
Abstract: We review the Akaike, deviance, and Watanabe-Akaike information criteria from a Bayesian perspective, where the goal is to estimate expected out-of-sample-prediction error using a bias-corrected adjustment of within-sample error. We focus on the choices involved in setting up these measures, and we compare them in three simple examples, one theoretical and two applied. The contribution of this paper is to put all these information criteria into a Bayesian predictive context and to better understand, through small examples, how these methods can apply in practice.
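A minimal sketch of the bias-corrected predictive measure discussed above: computing WAIC from an S x n matrix of pointwise log-likelihoods evaluated at S posterior draws. This is a generic illustration of the standard WAIC formula, not code from the paper, and the array layout is an assumption.

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    # log_lik[s, i] = log p(y_i | theta_s) for posterior draw s and observation i
    S, n = log_lik.shape
    # log pointwise predictive density: log of the posterior-mean likelihood, per observation
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))
    # effective number of parameters: pointwise posterior variance of the log-likelihood
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)  # deviance scale, comparable to AIC/DIC
```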

1,654 citations


Journal ArticleDOI
TL;DR: Estimation strategies to reduce the computational burden and inefficiency associated with the Monte Carlo EM algorithm are discussed, and a combination of Gaussian quadrature approximations and non-smooth optimization algorithms is presented.
Abstract: Dependent data arise in many studies. Frequently adopted sampling designs, such as cluster, multilevel, spatial, and repeated measures, may induce this dependence, which the analysis of the data needs to take into due account. In a previous publication (Geraci and Bottai in Biostatistics 8:140–154, 2007), we proposed a conditional quantile regression model for continuous responses where subject-specific random intercepts were included to account for within-subject dependence in the context of longitudinal data analysis. The approach hinged upon the link existing between the minimization of weighted absolute deviations, typically used in quantile regression, and the maximization of a Laplace likelihood. Here, we consider an extension of those models to more complex dependence structures in the data, which are modeled by including multiple random effects in the linear conditional quantile functions. We also discuss estimation strategies to reduce the computational burden and inefficiency associated with the Monte Carlo EM algorithm we have proposed previously. In particular, the estimation of the fixed regression coefficients and of the random effects’ covariance matrix is based on a combination of Gaussian quadrature approximations and non-smooth optimization algorithms. Finally, a simulation study and a number of applications of our models are presented.
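A small sketch of the link the abstract refers to, between minimizing weighted absolute deviations (the quantile-regression check loss) and maximizing an asymmetric Laplace likelihood. The function names are illustrative, not from the paper.

```python
import numpy as np

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - 1{u < 0}), the quantile-regression loss
    return u * (tau - (u < 0))

def asym_laplace_loglik(y, mu, sigma, tau):
    # Log-likelihood of the asymmetric Laplace distribution with location mu,
    # scale sigma and skewness tau; maximizing it in mu is equivalent to
    # minimizing the summed check loss, i.e. quantile regression at level tau.
    return np.sum(np.log(tau * (1 - tau) / sigma) - check_loss((y - mu) / sigma, tau))
```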

249 citations


Journal ArticleDOI
TL;DR: Comparisons are presented to illustrate the relative performance of the restricted and unrestricted models, and to demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture through applications to three real datasets.
Abstract: Finite mixtures of multivariate skew t (MST) distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy tail behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. A number of algorithms for the computation of the maximum likelihood (ML) estimates for the model parameters of mixtures of MST distributions have been put forward in recent years. These implementations use various characterizations of the MST distribution, which are similar but not identical. While exact implementation of the expectation-maximization (EM) algorithm can be achieved for `restricted' characterizations of the component skew t-distributions, Monte Carlo (MC) methods have been used to fit the `unrestricted' models. In this paper, we review several recent fitting algorithms for finite mixtures of multivariate skew t-distributions, at the same time clarifying some of the connections between the various existing proposals. In particular, recent results have shown that the EM algorithm can be implemented exactly for faster computation of ML estimates for mixtures with unrestricted MST components. The gain in computational time is effected by noting that the semi-infinite integrals on the E-step of the EM algorithm can be put in the form of moments of the truncated multivariate non-central t-distribution, similar to the restricted case, which subsequently can be expressed in terms of the non-truncated form of the central t-distribution function for which fast algorithms are available. We present comparisons to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture, by some applications to three real datasets.

233 citations


Journal ArticleDOI
TL;DR: The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously, and a gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models with reduced complexity.
Abstract: Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to few covariates, because the presence of many predictors yields unstable estimates. The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models with reduced complexity. In contrast to common procedures, it can be used in high-dimensional settings where a large number of potentially influential explanatory variables is available. The method is investigated in simulation studies and illustrated by use of real data sets.
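To make the penalty concrete, here is a minimal proximal-gradient sketch of L1-penalized logistic regression with fixed effects only. It illustrates the soft-thresholding shrinkage induced by an L1 term; it is not the paper's gradient ascent algorithm for the full mixed model, and the step size and iteration count are illustrative assumptions.

```python
import numpy as np

def l1_logistic_prox_grad(X, y, lam, step=0.01, iters=2000):
    # Maximize the average logistic log-likelihood minus lam * ||beta||_1
    # by gradient steps followed by the L1 proximal operator (soft-thresholding).
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))       # fitted probabilities
        grad = X.T @ (y - mu) / n                   # gradient of the average log-likelihood
        beta = beta + step * grad                   # gradient ascent step
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)  # soft-threshold
    return beta
```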

195 citations


Journal ArticleDOI
TL;DR: A new variable importance measure is presented that is applicable to any kind of data, whether or not it contains missing values; it takes the occurrence of missing values into account, so its results also differ from those obtained under multiple imputation.
Abstract: Random forests are widely used in many research fields for prediction and interpretation purposes. Their popularity is rooted in several appealing characteristics, such as their ability to deal with high dimensional data, complex interactions and correlations between variables. Another important feature is that random forests provide variable importance measures that can be used to identify the most important predictor variables. Though there are alternatives like complete case analysis and imputation, existing methods for the computation of such measures cannot be applied straightforwardly when the data contain missing values. This paper presents a solution to this pitfall by introducing a new variable importance measure that is applicable to any kind of data, whether it does or does not contain missing values. An extensive simulation study shows that the new measure meets sensible requirements and shows good variable ranking properties. An application to two real data sets also indicates that the new approach may provide a more sensible variable ranking than the widespread complete case analysis. It takes the occurrence of missing values into account, which makes results also differ from those obtained under multiple imputation.
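For context, a sketch of the classical permutation importance that the new measure generalizes; it presupposes complete data, which is exactly the limitation the paper addresses. The model object, metric signature, and all names here are generic placeholders, not the paper's implementation.

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=10, rng=None):
    # model: any fitted object with a .predict(X) method
    # metric(y_true, y_pred): a score where larger is better (e.g. accuracy)
    rng = np.random.default_rng(rng)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between feature j and y
            drops.append(baseline - metric(y, model.predict(Xp)))
        importances[j] = np.mean(drops)            # average drop in score
    return importances
```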

152 citations


Journal ArticleDOI
TL;DR: An in-depth description is provided of several highly efficient sampling schemes that allow complex models with several hierarchy levels and a large number of observations to be estimated within a couple of minutes (often even seconds).
Abstract: Models with structured additive predictor provide a very broad and rich framework for complex regression modeling. They can deal simultaneously with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity and complex interactions between covariates of different type. In this paper, we propose a hierarchical or multilevel version of regression models with structured additive predictor where the regression coefficients of a particular nonlinear term may obey another regression model with structured additive predictor. In that sense, the model is composed of a hierarchy of complex structured additive regression models. The proposed model may be regarded as an extended version of a multilevel model with nonlinear covariate terms in every level of the hierarchy. The model framework is also the basis for generalized random slope modeling based on multiplicative random effects. Inference is fully Bayesian and based on Markov chain Monte Carlo simulation techniques. We provide an in-depth description of several highly efficient sampling schemes that allow complex models with several hierarchy levels and a large number of observations to be estimated within a couple of minutes (often even seconds). We demonstrate the practicability of the approach in a complex application on childhood undernutrition with large sample size and three hierarchy levels.

105 citations


Journal ArticleDOI
TL;DR: A family of multivariate heavy-tailed distributions is proposed that allows variable marginal amounts of tailweight, can account for a variety of shapes, and has a simple tractable form with a closed-form probability density function whatever the dimension.
Abstract: We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.

103 citations


Journal ArticleDOI
TL;DR: A novel predictive statistical modeling technique called Hybrid Radial Basis Function Neural Networks (HRBF-NN) is introduced, a flexible forecasting technique that integrates regression trees and ridge regression with radial basis function (RBF) neural networks (NN) as a forecaster.
Abstract: We introduce a novel predictive statistical modeling technique called Hybrid Radial Basis Function Neural Networks (HRBF-NN) as a forecaster. HRBF-NN is a flexible forecasting technique that integrates regression trees and ridge regression with radial basis function (RBF) neural networks (NN). We develop a new computational procedure that uses model selection based on information-theoretic principles as the fitness function of the genetic algorithm (GA) to carry out subset selection of the best predictors. Due to the dynamic and chaotic nature of the underlying stock market process, as is well known, the task of generating economically useful stock market forecasts is difficult, if not impossible. HRBF-NN is well suited for modeling complex non-linear relationships and dependencies between the stock indices. We propose HRBF-NN as our forecaster and a predictive modeling tool to study the daily movements of stock indices. We show numerical examples to determine a predictive relationship between the Istanbul Stock Exchange National 100 Index (ISE100) and seven other international stock market indices. We select the best subset of predictors by minimizing the information complexity (ICOMP) criterion as the fitness function within the GA. Using the best subset of variables, we construct out-of-sample forecasts for the ISE100 index to determine the daily directional movements. Our results demonstrate the utility and the flexibility of HRBF-NN as a clever predictive modeling tool for highly dependent and nonlinear data.

102 citations


Journal ArticleDOI
TL;DR: The results obtained using simulated and real data show the superiority of this method for the modeling of non-stationary count data with overdispersion compared with competing models, such as global regressions (e.g., Poisson and negative binomial) and Geographically Weighted Poisson Regression (GWPR).
Abstract: Global regression assumes that a single model adequately describes all parts of a study region. However, the heterogeneity in the data may be sufficiently strong that relationships between variables cannot be spatially constant. In addition, the factors involved are often sufficiently complex that it is difficult to identify them in the form of explanatory variables. As a result, Geographically Weighted Regression (GWR) was introduced as a tool for the modeling of non-stationary spatial data. Using kernel functions, the GWR methodology allows the model parameters to vary spatially and produces non-parametric surfaces of their estimates. To model count data with overdispersion, it is more appropriate to use a negative binomial distribution instead of a Poisson distribution. Therefore, we propose the Geographically Weighted Negative Binomial Regression (GWNBR) method for the modeling of data with overdispersion. The results obtained using simulated and real data show the superiority of this method for the modeling of non-stationary count data with overdispersion compared with competing models, such as global regressions (e.g., Poisson and negative binomial) and Geographically Weighted Poisson Regression (GWPR). Moreover, we illustrate that these competing models are special cases of the more robust GWNBR model.
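A minimal sketch of the geographically weighted idea behind GWR and GWNBR: observations are down-weighted by a kernel of their distance to a focal location, and a local model is fitted with those weights. Here a weighted least-squares fit stands in for the local negative binomial likelihood that GWNBR actually maximizes; the function and argument names are illustrative assumptions.

```python
import numpy as np

def gwr_fit_at(u0, coords, X, y, bandwidth):
    # u0: focal location (2,), coords: observation locations (n, 2)
    d = np.linalg.norm(coords - u0, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)            # Gaussian kernel weights
    W = np.diag(w)
    # Local weighted least-squares coefficients at u0 (GWNBR would instead
    # maximize a weighted negative binomial likelihood here).
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta
```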

81 citations


Journal ArticleDOI
TL;DR: A robust probabilistic mixture model is presented based on the multivariate skew-t-normal distribution, a skew extension of the multivariate Student's t distribution with more powerful abilities in modelling data whose distribution seriously deviates from normality.
Abstract: This paper presents a robust probabilistic mixture model based on the multivariate skew-t-normal distribution, a skew extension of the multivariate Student's t distribution with more powerful abilities in modelling data whose distribution seriously deviates from normality. The proposed model includes mixtures of normal, t and skew-normal distributions as special cases and provides a flexible alternative to recently proposed skew t mixtures. We develop two analytically tractable EM-type algorithms for computing maximum likelihood estimates of model parameters in which the skewness parameters and degrees of freedom are asymptotically uncorrelated. Standard errors for the parameter estimates can be obtained via a general information-based method. We also present a procedure of merging mixture components to automatically identify the number of clusters by fitting piecewise linear regression to the rescaled entropy plot. The effectiveness and performance of the proposed methodology are illustrated by two real-life examples.

75 citations


Journal ArticleDOI
TL;DR: It is shown that both of these splitting approaches can reduce the computational cost of sampling from the posterior distribution for a logistic regression model, using either a Gaussian approximation centered on the posterior mode, or a Hamiltonian split into a term that depends on only a small number of critical cases, and another term that involves the larger number of cases whose influence on the posterior distribution is small.
Abstract: We show how the Hamiltonian Monte Carlo algorithm can sometimes be speeded up by “splitting” the Hamiltonian in a way that allows much of the movement around the state space to be done at low computational cost. One context where this is possible is when the log density of the distribution of interest (the potential energy function) can be written as the log of a Gaussian density, which is a quadratic function, plus a slowly-varying function. Hamiltonian dynamics for quadratic energy functions can be analytically solved. With the splitting technique, only the slowly-varying part of the energy needs to be handled numerically, and this can be done with a larger stepsize (and hence fewer steps) than would be necessary with a direct simulation of the dynamics. Another context where splitting helps is when the most important terms of the potential energy function and its gradient can be evaluated quickly, with only a slowly-varying part requiring costly computations. With splitting, the quick portion can be handled with a small stepsize, while the costly portion uses a larger stepsize. We show that both of these splitting approaches can reduce the computational cost of sampling from the posterior distribution for a logistic regression model, using either a Gaussian approximation centered on the posterior mode, or a Hamiltonian split into a term that depends on only a small number of critical cases, and another term that involves the larger number of cases whose influence on the posterior distribution is small.
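A minimal sketch of the second splitting idea (splitting by cost): the expensive part of the potential energy is updated with a large outer step, while the cheap part plus the kinetic term is integrated with several smaller inner leapfrog sub-steps. This is a generic symmetric splitting with a unit mass matrix, not the paper's code, and it omits the variant where the Gaussian part is solved analytically.

```python
import numpy as np

def split_leapfrog(q, p, grad_cheap, grad_costly, eps, n_outer, n_inner):
    # grad_cheap, grad_costly: gradients of the two potential-energy terms
    for _ in range(n_outer):
        p = p - 0.5 * eps * grad_costly(q)     # half step on the costly term
        h = eps / n_inner
        for _ in range(n_inner):               # inner leapfrog on the cheap term + kinetic
            p = p - 0.5 * h * grad_cheap(q)
            q = q + h * p                      # unit mass assumed
            p = p - 0.5 * h * grad_cheap(q)
        p = p - 0.5 * eps * grad_costly(q)     # second half step on the costly term
    return q, p
```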

Journal ArticleDOI
TL;DR: It is shown that permutation or bootstrap schemes which neglect the dependency structure in the data are asymptotically valid, and simulation studies show that these new tests improve the power of the t-test under non-normality.
Abstract: We study various bootstrap and permutation methods for matched pairs, whose distributions can have different shapes even under the null hypothesis of no treatment effect. Although the data may not be exchangeable under the null, we investigate different permutation approaches as valid procedures for finite sample sizes. It will be shown that permutation or bootstrap schemes, which neglect the dependency structure in the data, are asymptotically valid. Simulation studies show that these new tests improve the power of the t-test under non-normality.
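For illustration, a minimal sign-flipping permutation test for matched pairs based on the plain mean difference. The studentized statistics analysed in the paper are replaced here by this simpler statistic, so treat this only as a sketch of the resampling scheme; all names are illustrative.

```python
import numpy as np

def paired_permutation_test(x, y, n_perm=10000, rng=None):
    # Under no treatment effect, the within-pair differences can have their
    # signs flipped at random; the p-value compares the observed mean
    # difference with its sign-flipped distribution.
    rng = np.random.default_rng(rng)
    d = np.asarray(x) - np.asarray(y)
    obs = d.mean()
    count = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=d.size)
        if abs((signs * d).mean()) >= abs(obs):
            count += 1
    return count / n_perm   # two-sided p-value
```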

Journal ArticleDOI
TL;DR: A result from the numerical analysis literature is presented which can reduce the bias in the estimate of the evidence by addressing the error arising from numerically integrating across the inverse temperatures.
Abstract: The statistical evidence (or marginal likelihood) is a key quantity in Bayesian statistics, allowing one to assess the probability of the data given the model under investigation. This paper focuses on refining the power posterior approach to improve estimation of the evidence. The power posterior method involves transitioning from the prior to the posterior by powering the likelihood by an inverse temperature. In common with other tempering algorithms, the power posterior involves some degree of tuning. The main contributions of this article are twofold: we present a result from the numerical analysis literature which can reduce the bias in the estimate of the evidence by addressing the error arising from numerically integrating across the inverse temperatures. We also tackle the selection of the inverse temperature ladder, applying this approach additionally to the Stepping Stone sampler estimation of evidence. A key practical point is that both of these innovations incur virtually no extra cost.
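A minimal sketch of the standard power-posterior (thermodynamic integration) estimator that the paper refines: the log evidence equals the integral over t in [0, 1] of the posterior expectation of the log-likelihood under the likelihood tempered by t. Here `expected_loglik[i]` is assumed to be a Monte Carlo estimate of that expectation at inverse temperature `temps[i]`, and the plain trapezoidal rule stands in for the corrected quadrature proposed in the paper.

```python
import numpy as np

def power_posterior_evidence(temps, expected_loglik):
    # temps: increasing inverse temperatures from 0 to 1 (the "ladder")
    # expected_loglik: E_{theta | y, t}[log p(y | theta)] estimated at each temperature
    return np.trapz(expected_loglik, temps)   # estimate of log p(y)
```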

Journal ArticleDOI
TL;DR: By measuring the extent of possible valleys of the density along the segment connecting pairs of observations, the proposed procedure shifts the formulation from a space with arbitrary dimension to a univariate one, thus leading to benefits both in computation and visualization.
Abstract: Density-based clustering methods hinge on the idea of associating groups to the connected components of the level sets of the density underlying the data, to be estimated by a nonparametric method. These methods claim some desirable properties and generally good performance, but they involve a non-trivial computational effort, required for the identification of the connected regions. In a previous work, the use of spatial tessellation such as the Delaunay triangulation has been proposed, because it suitably generalizes the univariate procedure for detecting the connected components. However, its computational complexity grows exponentially with the dimensionality of data, thus making the triangulation unfeasible for high dimensions. Our aim is to overcome the limitations of Delaunay triangulation. We discuss the use of an alternative procedure for identifying the connected regions associated to the level sets of the density. By measuring the extent of possible valleys of the density along the segment connecting pairs of observations, the proposed procedure shifts the formulation from a space with arbitrary dimension to a univariate one, thus leading to benefits both in computation and visualization.

Journal ArticleDOI
TL;DR: This paper analyzes the performance of Dagpunar's algorithm and combines it with a new rejection method which ensures a uniformly fast generator that is also suitable for the varying parameter case.
Abstract: The generalized inverse Gaussian distribution has become quite popular in financial engineering. The most popular random variate generator is due to Dagpunar (Commun. Stat., Simul. Comput. 18:703–710, 1989). It is an acceptance-rejection algorithm based on the ratio-of-uniforms method. However, it is not uniformly fast, as it has a prohibitively large rejection constant when the distribution is close to the gamma distribution. Recently some papers have discussed universal methods that are suitable for this distribution. However, these methods require an expensive setup and are therefore not suitable for the varying parameter case which occurs in, e.g., Gibbs sampling. In this paper we analyze the performance of Dagpunar's algorithm and combine it with a new rejection method which ensures a uniformly fast generator. As its setup is rather short, it is in particular suitable for the varying parameter case.
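A generic sketch of the ratio-of-uniforms principle underlying Dagpunar's generator: if (U, V) is uniform on the set {(u, v) : 0 < u <= sqrt(f(v/u))}, then V/U has density proportional to f. Enclosing that set in a rectangle and rejecting gives a simple sampler. The rectangle bounds must be supplied and are assumptions of this illustration; this is not the paper's tuned algorithm.

```python
import numpy as np

def ratio_of_uniforms(f, u_max, v_min, v_max, size, rng=None):
    # u_max >= sup sqrt(f(x)); v_min/v_max bound x * sqrt(f(x)) over the support.
    rng = np.random.default_rng(rng)
    out = []
    while len(out) < size:
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(v_min, v_max)
        if u == 0.0:
            continue
        x = v / u
        if u * u <= f(x):          # accept if the point lies in the RoU region
            out.append(x)
    return np.array(out)
```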

Journal ArticleDOI
TL;DR: A variational approach for fitting the mixture of latent trait models is developed; the model is shown to yield intuitive clustering results and gives a much better fit than either latent class analysis or latent trait analysis alone.
Abstract: Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary data and/or categorical data, but due to an assumed local independence structure there may not be a correspondence between the estimated latent classes and groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the mixture of latent trait models and this provides an efficient model fitting strategy. The mixture of latent trait analyzers model is demonstrated on the analysis of data from the National Long Term Care Survey (NLTCS) and voting in the U.S. Congress. The model is shown to yield intuitive clustering results and it gives a much better fit than either latent class analysis or latent trait analysis alone.

Journal ArticleDOI
TL;DR: In this paper, the determinant of the large, sparse, symmetric positive definite precision matrix is computed using matrix functions, Krylov subspaces, and probing vectors to construct an iterative numerical method for computing the log likelihood.
Abstract: In order to compute the log-likelihood for high dimensional Gaussian models, it is necessary to compute the determinant of the large, sparse, symmetric positive definite precision matrix. Traditional methods for evaluating the log-likelihood, which are typically based on Cholesky factorisations, are not feasible for very large models due to the massive memory requirements. We present a novel approach for evaluating such likelihoods that only requires the computation of matrix-vector products. In this approach we utilise matrix functions, Krylov subspaces, and probing vectors to construct an iterative numerical method for computing the log-likelihood.
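A small sketch of the identity the approach builds on, log det(Q) = tr(log Q), with the trace estimated from probe vectors. For illustration the matrix logarithm is formed densely and random Rademacher probes stand in for the structured probing vectors of the paper; the paper instead approximates each product (log Q)v with Krylov-subspace matrix-function methods, so only matrix-vector products with Q are ever needed.

```python
import numpy as np
from scipy.linalg import logm

def logdet_hutchinson(Q, n_probe=50, rng=None):
    # Q: symmetric positive definite matrix, so log(Q) is real.
    rng = np.random.default_rng(rng)
    L = np.real(logm(Q))                      # dense log(Q), for illustration only
    n = Q.shape[0]
    est = 0.0
    for _ in range(n_probe):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        est += z @ L @ z                      # unbiased estimate of tr(log Q)
    return est / n_probe                      # approximates log det(Q)
```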

Journal ArticleDOI
TL;DR: A new quantile regression model is proposed by combining multiple sets of unbiased estimating equations that can account for correlations between the repeated measurements and produce more efficient estimates.
Abstract: Quantile regression has become a powerful complement to the usual mean regression. A simple approach to use quantile regression in marginal analysis of longitudinal data is to assume working independence. However, this may incur potential efficiency loss. On the other hand, correctly specifying a working correlation in quantile regression can be difficult. We propose a new quantile regression model by combining multiple sets of unbiased estimating equations. This approach can account for correlations between the repeated measurements and produce more efficient estimates. Because the objective function is discrete and non-convex, we propose induced smoothing for fast and accurate computation of the parameter estimates, as well as their asymptotic covariance, using Newton-Raphson iteration. We further develop a robust quantile rank score test for hypothesis testing. We show that the resulting estimate is asymptotically normal and more efficient than the simple estimate using working independence. Extensive simulations and a real data analysis show the usefulness of the method.

Journal ArticleDOI
Luc Devroye
TL;DR: A uniformly efficient and simple random variate generator for the entire parameter range of the generalized inverse Gaussian distribution and a general algorithm is provided that works for all densities that are proportional to a log-concave function φ.
Abstract: We provide a uniformly efficient and simple random variate generator for the entire parameter range of the generalized inverse Gaussian distribution. A general algorithm is provided as well that works for all densities that are proportional to a log-concave function φ, even if the normalization constant is not known. It requires only black box access to φ and its derivative.

Journal ArticleDOI
TL;DR: An exact algorithm is proposed from the view of cutting a convex polytope with hyperplanes to compute the projection depth and most of its associated estimators of dimension p≥2, including Stahel-Donoho location and scatter estimators, projection trimmed mean, projection depth contours and median, etc.
Abstract: To facilitate the application of projection depth, an exact algorithm is proposed from the view of cutting a convex polytope with hyperplanes. Based on this algorithm, one can obtain a finite number of optimal direction vectors, which are x-free and therefore enable us (Liu et al., Preprint, 2011) to compute the projection depth and most of its associated estimators of dimension p ≥ 2, including Stahel-Donoho location and scatter estimators, projection trimmed mean, projection depth contours and median, etc. Both real and simulated examples are also provided to illustrate the performance of the proposed algorithm.

Journal ArticleDOI
TL;DR: It is shown that the standard EM algorithm can be adapted to infer the model parameters and a semiparametric model where the emission distributions are a mixture of parametric distributions is proposed to get a higher flexibility.
Abstract: In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often supposed to belong to some parametric family. In this paper, a semiparametric model where the emission distributions are a mixture of parametric distributions is proposed to get a higher flexibility. We show that the standard EM algorithm can be adapted to infer the model parameters. For the initialization step, starting from a large number of components, a hierarchical method to combine them into the hidden states is proposed. Three likelihood-based criteria to select the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination between the combining criteria and the model selection criteria and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.

Journal ArticleDOI
TL;DR: This work presents and implements algorithms that use an accelerated line search for optimization on the orthogonal Stiefel manifold and shows that the ‘extra’ models that these decompositions facilitate outperform the current state of the art when applied to two benchmark data sets.
Abstract: Within the mixture model-based clustering literature, parsimonious models with eigen-decomposed component covariance matrices have dominated for over a decade. Although originally introduced as a fourteen-member family of models, the current state of the art is to utilize just ten of these models; the rationale for not using the other four models usually centers around parameter estimation difficulties. Following close examination of these four models, we find that two are actually easily implemented using existing algorithms but that two benefit from a novel approach. We present and implement algorithms that use an accelerated line search for optimization on the orthogonal Stiefel manifold. Furthermore, we show that the `extra' models that these decompositions facilitate outperform the current state of the art when applied to two benchmark data sets.

Journal ArticleDOI
TL;DR: The intention is to use a family of univariate distribution functions, to replace the normal, for which the only constraint is unimodality, and to devise a new family of nonparametric unimodal distributions, which has large support over the space of univariate unimodal distributions.
Abstract: Within the context of mixture modeling, the normal distribution is typically used as the components distribution. However, if a cluster is skewed or heavy tailed, then the normal distribution will be inefficient and many may be needed to model a single cluster. In this paper, we present an attempt to solve this problem. We define a cluster, in the absence of further information, to be a group of data which can be modeled by a unimodal density function. Hence, our intention is to use a family of univariate distribution functions, to replace the normal, for which the only constraint is unimodality. With this aim, we devise a new family of nonparametric unimodal distributions, which has large support over the space of univariate unimodal distributions. The difficult aspect of the Bayesian model is to construct a suitable MCMC algorithm to sample from the correct posterior distribution. The key will be the introduction of strategic latent variables and the use of the Product Space view of Reversible Jump methodology.

Journal ArticleDOI
TL;DR: This paper discusses how a newly developed local dependence measure, the local Gaussian correlation, can be used to construct local and global tests of independence; properties of this measure and the asymptotics of the corresponding estimate are also discussed.
Abstract: It is well known that the traditional Pearson correlation in many cases fails to capture non-linear dependence structures in bivariate data. Other scalar measures capable of capturing non-linear dependence exist. A common disadvantage of such measures, however, is that they cannot distinguish between negative and positive dependence, and typically the alternative hypothesis of the accompanying test of independence is simply "dependence". This paper discusses how a newly developed local dependence measure, the local Gaussian correlation, can be used to construct local and global tests of independence. A global measure of dependence is constructed by aggregating local Gaussian correlation on subsets of $\mathbb{R}^{2}$ , and an accompanying test of independence is proposed. Choice of bandwidth is based on likelihood cross-validation. Properties of this measure and asymptotics of the corresponding estimate are discussed. A bootstrap version of the test is implemented and tried out on both real and simulated data. The performance of the proposed test is compared to the Brownian distance covariance test. Finally, when the hypothesis of independence is rejected, local independence tests are used to investigate the cause of the rejection.

Journal ArticleDOI
TL;DR: A new method for flexible fitting of D-vines using penalized Bernstein polynomials or constant and linear B-splines as spline bases in each knot of the D-vine throughout each level is presented.
Abstract: The paper presents a new method for flexible fitting of D-vines. Pair-copulas are estimated semi-parametrically using penalized Bernstein polynomials or constant and linear B-splines, respectively, as spline bases in each knot of the D-vine throughout each level. A penalty induces smoothness of the fit, while the high dimensional spline basis guarantees flexibility. To ensure uniform univariate margins of each pair-copula, linear constraints are placed on the spline coefficients and quadratic programming is used to fit the model. The amount of penalization for each pair-copula is driven by a penalty parameter which is selected in a numerically efficient way. Simulations and practical examples accompany the presentation.

Journal ArticleDOI
TL;DR: A new optimization method based on coordinate descent is developed; it has a number of advantages over the majorize-minimize approach, including simplicity, computing speed, and numerical stability, and its cyclic version is more efficient than the greedy version.
Abstract: Bien and Tibshirani (Biometrika, 98(4):807–820, 2011) have proposed a covariance graphical lasso method that applies a lasso penalty on the elements of the covariance matrix. This method is definitely useful because it not only produces sparse and positive definite estimates of the covariance matrix but also discovers marginal independence structures by generating exact zeros in the estimated covariance matrix. However, the objective function is not convex, making the optimization challenging. Bien and Tibshirani (Biometrika, 98(4):807–820, 2011) described a majorize-minimize approach to optimize it. We develop a new optimization method based on coordinate descent. We discuss the convergence property of the algorithm. Through simulation experiments, we show that the new algorithm has a number of advantages over the majorize-minimize approach, including its simplicity, computing speed and numerical stability. Finally, we show that the cyclic version of the coordinate descent algorithm is more efficient than the greedy version.
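To illustrate the style of update involved, here is a minimal cyclic coordinate descent for the ordinary lasso, soft-thresholding one coefficient at a time. This is a generic sketch of coordinate descent, not the paper's algorithm, whose non-convex objective updates elements of the covariance matrix rather than regression coefficients.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, iters=100):
    # Minimize (1 / 2n) * ||y - X beta||^2 + lam * ||beta||_1 by cyclic sweeps.
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(iters):
        for j in range(p):                            # cyclic sweep over coordinates
            r = y - X @ beta + X[:, j] * beta[j]      # partial residual excluding j
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
    return beta
```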

Journal ArticleDOI
TL;DR: This work applies semi-infinite programming (SIP) to solve minimax design problems for nonlinear models in a systematic way using a discretization based strategy and solvers from the General Algebraic Modeling System.
Abstract: Minimax optimal experimental designs are notoriously difficult to study largely because the optimality criterion is not differentiable and there is no effective algorithm for generating them. We apply semi-infinite programming (SIP) to solve minimax design problems for nonlinear models in a systematic way using a discretization based strategy and solvers from the General Algebraic Modeling System (GAMS). Using popular models from the biological sciences, we show our approach produces minimax optimal designs that coincide with the few theoretical and numerical optimal designs in the literature. We also show our method can be readily modified to find standardized maximin optimal designs and minimax optimal designs for more complicated problems, such as when the ranges of plausible values for the model parameters are dependent and we want to find a design to minimize the maximal inefficiency of estimates for the model parameters.

Journal ArticleDOI
TL;DR: A semiparametric method to estimate mixed-effects ODE models is proposed; rather than using the numerical ODE solution directly, which requires providing initial conditions, the method estimates a spline function that approximates the dynamic process using smoothing splines.
Abstract: Ordinary differential equations (ODEs) are popular tools for modeling complicated dynamic systems in many areas. When multiple replicates of measurements are available for the dynamic process, it is of great interest to estimate mixed-effects in the ODE model for the process. We propose a semiparametric method to estimate mixed-effects ODE models. Rather than using the ODE numeric solution directly, which requires providing initial conditions, this method estimates a spline function to approximate the dynamic process using smoothing splines. A roughness penalty term is defined using the ODEs, which measures the fidelity of the spline function to the ODEs. The smoothing parameter, which controls the trade-off between fitting the data and maintaining fidelity to the ODEs, can be specified by users or selected objectively by generalized cross validation. The spline coefficients, the ODE random effects, and the ODE fixed effects are estimated in three nested levels of optimization. Two simulation studies show that the proposed method obtains good estimates for mixed-effects ODE models. The semiparametric method is demonstrated with an application of a pharmacokinetic model in a study of HIV combination therapy.

Journal ArticleDOI
TL;DR: Methods for detecting communities in undirected graphs have recently been reviewed by Fortunato; this work extends that review to methods and algorithms for detecting essentially structurally homogeneous subsets of vertices in binary or weighted, directed or undirected graphs.
Abstract: The analysis of complex networks is a rapidly growing topic with many applications in different domains. The analysis of large graphs is often made via unsupervised classification of vertices of the graph. Community detection is the main way to divide a large graph into smaller ones that can be studied separately. However, another definition of a cluster is possible, which is based on the structural distance between vertices. This definition includes the case of community clusters but is more general in the sense that two vertices may be in the same group even if they are not connected. Methods for detecting communities in undirected graphs have recently been reviewed by Fortunato. In this paper we expand Fortunato's work and review methods and algorithms for detecting essentially structurally homogeneous subsets of vertices in binary or weighted, directed or undirected graphs.

Journal ArticleDOI
TL;DR: This work lays out general algorithms, namely, the simplex algorithm and its variant for generating regularized solution paths for the feature selection problems, which allow a complete exploration of the model space along the paths and provide a broad view of persistent features in the data.
Abstract: We consider statistical procedures for feature selection defined by a family of regularization problems with convex piecewise linear loss functions and penalties of l1 nature. Many known statistical procedures (e.g. quantile regression and support vector machines with l1-norm penalty) are subsumed under this category. Computationally, the regularization problems are linear programming (LP) problems indexed by a single parameter, which are known as `parametric cost LP' or `parametric right-hand-side LP' in the optimization theory. Exploiting the connection with the LP theory, we lay out general algorithms, namely, the simplex algorithm and its variant for generating regularized solution paths for the feature selection problems. The significance of such algorithms is that they allow a complete exploration of the model space along the paths and provide a broad view of persistent features in the data. The implications of the general path-finding algorithms are outlined for several statistical procedures, and they are illustrated with numerical examples.