# Showing papers in "arXiv: Methodology in 2007"

•

TL;DR: The mixed membership stochastic block model as discussed by the authors extends block models for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation.

Abstract: Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.

1,546 citations
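
The generative side of the mixed membership stochastic blockmodel above can be illustrated with a minimal sketch. The abstract does not spell out the sampling steps, so the process below (a Dirichlet membership vector per node, one sender and one receiver group indicator per ordered pair, and a Bernoulli edge governed by a block matrix) follows the commonly described form of the model and should be read as an assumption. The names `sample_mmsb`, `alpha` and `block_matrix` are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet vector by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_category(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_mmsb(n_nodes, alpha, block_matrix, seed=0):
    """Sample mixed memberships and a directed 0/1 adjacency matrix.

    Assumed process (not stated in the abstract): each node has its own
    membership vector, and each ordered pair picks fresh group indicators.
    """
    rng = random.Random(seed)
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    adjacency = [[0] * n_nodes for _ in range(n_nodes)]
    for p in range(n_nodes):
        for q in range(n_nodes):
            if p == q:
                continue  # no self-relations in this sketch
            z_send = sample_category(memberships[p], rng)
            z_recv = sample_category(memberships[q], rng)
            edge_prob = block_matrix[z_send][z_recv]
            adjacency[p][q] = int(rng.random() < edge_prob)
    return memberships, adjacency


# Two groups with dense within-group and sparse between-group relations.
memberships, adjacency = sample_mmsb(
    n_nodes=5, alpha=[0.5, 0.5], block_matrix=[[0.9, 0.1], [0.1, 0.9]], seed=1
)
```

Each row of `memberships` is the object-specific low-dimensional representation the abstract refers to; the variational inference step is not sketched here.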

•

TL;DR: This article explores nonstationary modeling methodologies that couple stationary Gaussian processes with treed partitioning and shows that this approach is effective in other arenas as well.

Abstract: Motivated by a computer experiment for the design of a rocket booster, this paper explores nonstationary modeling methodologies that couple stationary Gaussian processes with treed partitioning. Partitioning is a simple but effective method for dealing with nonstationarity. The methodological developments and statistical computing details which make this approach efficient are described in detail. In addition to providing an analysis of the rocket booster simulator, our approach is demonstrated to be effective in other arenas.

462 citations

••

TL;DR: In this paper, the authors considered the issue of modeling fractional data observed in the interval [0,1), (0,1] or [0,1] and proposed mixed continuous-discrete distributions.

Abstract: This paper considers the issue of modeling fractional data observed in the interval [0,1), (0,1] or [0,1]. Mixed continuous-discrete distributions are proposed. The beta distribution is used to describe the continuous component of the model since its density can have quite different shapes depending on the values of the two parameters that index the distribution. Properties of the proposed distributions are examined. Also, maximum likelihood and method of moments estimation are discussed. Finally, practical applications that employ real data are presented.

228 citations
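
As a hedged illustration of the mixed continuous-discrete construction in the entry above, the following writes one possible density for data on [0,1) or (0,1], with a point mass at a single endpoint $c \in \{0, 1\}$. The symbols $\alpha$ (mass at $c$) and $p, q$ (beta shape parameters) are assumptions; the abstract does not fix a parameterisation.

```latex
\[
  g(y;\alpha,p,q) =
  \begin{cases}
    \alpha, & y = c,\\[4pt]
    (1-\alpha)\,\dfrac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\,
      y^{p-1}(1-y)^{q-1}, & y \in (0,1),
  \end{cases}
  \qquad 0 < \alpha < 1,\; p, q > 0 .
\]
```

For data on the closed interval [0,1], the same idea would need point masses at both endpoints.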

•

TL;DR: This paper defines measures of linear dependence (coherence) and nonlinear dependence (phase synchronization) between any number of multivariate time series, expressed as the sum of lagged dependence and instantaneous dependence.

Abstract: Measures of linear dependence (coherence) and nonlinear dependence (phase synchronization) between any number of multivariate time series are defined. The measures are expressed as the sum of lagged dependence and instantaneous dependence. The measures are non-negative, and take the value zero only when there is independence of the pertinent type. These measures are defined in the frequency domain and are applicable to stationary and non-stationary time series. These new results extend and refine significantly those presented in a previous technical report (Pascual-Marqui 2007, arXiv:0706.1776 [stat.ME]), and have been largely motivated by the seminal paper on linear feedback by Geweke (1982, JASA 77:304-313). One important field of application is neurophysiology, where the time series consist of electric neuronal activity at several brain locations. Coherence and phase synchronization are interpreted as "connectivity" between locations. However, any measure of dependence is highly contaminated with an instantaneous, non-physiological contribution due to volume conduction and low spatial resolution. The new techniques remove this confounding factor considerably. Moreover, the measures of dependence can be applied to any number of brain areas jointly, i.e. distributed cortical networks, whose activity can be estimated with eLORETA (Pascual-Marqui 2007, arXiv:0710.3341 [math-ph]).

207 citations

•

TL;DR: A latent variable model for inference and prediction of symmetric relational data, based on the idea of the eigenvalue decomposition, that generalizes other popular latent variable models.

Abstract: This article discusses a latent variable model for inference and prediction of symmetric relational data. The model, based on the idea of the eigenvalue decomposition, represents the relationship between two nodes as the weighted inner-product of node-specific vectors of latent characteristics. This "eigenmodel" generalizes other popular latent variable models, such as latent class and distance models: It is shown mathematically that any latent class or distance model has a representation as an eigenmodel, but not vice-versa. The practical implications of this are examined in the context of three real datasets, for which the eigenmodel has out-of-sample predictive performance as good as or better than that of the other two models.

194 citations
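
The "weighted inner-product of node-specific vectors" in the eigenmodel entry above can be sketched in a few lines. This is only an illustration of that one expression: the function name `eigenmodel_score` and the example numbers are hypothetical, and allowing negative weights is an assumption suggested by the eigenvalue-decomposition analogy rather than something the abstract states.

```python
def eigenmodel_score(u_i, u_j, weights):
    """Weighted inner product sum_k weights[k] * u_i[k] * u_j[k].

    u_i, u_j: latent characteristic vectors of two nodes (same length).
    weights:  one weight per latent dimension; signs are unrestricted here.
    """
    if not (len(u_i) == len(u_j) == len(weights)):
        raise ValueError("latent vectors and weights must share one length")
    return sum(w * a * b for w, a, b in zip(weights, u_i, u_j))


# Hypothetical two-dimensional latent characteristics for two nodes.
node_a = [1.0, 2.0]
node_b = [0.5, -1.0]
dimension_weights = [2.0, -1.0]

score_ab = eigenmodel_score(node_a, node_b, dimension_weights)
score_ba = eigenmodel_score(node_b, node_a, dimension_weights)
```

The score is symmetric in the two nodes, which matches the symmetric relational data the model targets.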

•

TL;DR: In this paper, the use of O'Sullivan penalized splines in contemporary semiparametric regression, including mixed model and Bayesian formulations, is discussed, and exact expressions for the O'Sullivan penalty matrix are obtained.

Abstract: This is an exposé on the use of O'Sullivan penalised splines in contemporary semiparametric regression, including mixed model and Bayesian formulations. O'Sullivan penalised splines are similar to P-splines, but have an advantage of being a direct generalisation of smoothing splines. Exact expressions for the O'Sullivan penalty matrix are obtained. Comparison between the two reveals that O'Sullivan penalised splines more closely mimic the natural boundary behaviour of smoothing splines. Implementation in modern computing environments such as Matlab, R and BUGS is discussed.

150 citations

••

TL;DR: This work reviews the historical evolution of hospital profiling with special emphasis on outcomes; presents a detailed history of cardiac surgery report cards, the paradigm for modern provider profiling; discusses the potential unintended negative consequences of public report cards; and describes various statistical methodologies for quantifying the relative performance of cardiac surgery programs.

Abstract: Hospital profiling involves a comparison of a health care provider's structure, processes of care, or outcomes to a standard, often in the form of a report card. Given the ubiquity of report cards and similar consumer ratings in contemporary American culture, it is notable that these are a relatively recent phenomenon in health care. Prior to the 1986 release of Medicare hospital outcome data, little such information was publicly available. We review the historical evolution of hospital profiling with special emphasis on outcomes; present a detailed history of cardiac surgery report cards, the paradigm for modern provider profiling; discuss the potential unintended negative consequences of public report cards; and describe various statistical methodologies for quantifying the relative performance of cardiac surgery programs. Outstanding statistical issues are also described.

145 citations

•

TL;DR: In this paper, the authors proposed a particle filter scheme for a class of partially-observed multivariate diffusions, which does not require approximations of the transition and/or the observation density using time-discretisations.

Abstract: In this paper we introduce a novel particle filter scheme for a class of partially-observed multivariate diffusions. We consider a variety of observation schemes, including diffusion observed with error, observation of a subset of the components of the multivariate diffusion and arrival times of a Poisson process whose intensity is a known function of the diffusion (Cox process). Unlike currently available methods, our particle filters do not require approximations of the transition and/or the observation density using time-discretisations. Instead, they build on recent methodology for the exact simulation of the diffusion process and the unbiased estimation of the transition density as described in Beskos et al. (2006). We introduce the Generalised Poisson Estimator, which generalises the Poisson Estimator of Beskos et al. (2006). A central limit theorem is given for our particle filter scheme.

120 citations

•

TL;DR: A simple algorithm, using a coordinate descent procedure for the lasso, is developed that solves a 1000 node problem in at most a minute, and is 30 to 4000 times faster than competing methods.

Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm, the Graphical Lasso, that is remarkably fast: it solves a 1000 node problem (~500,000 parameters) in at most a minute, and is 30 to 4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen & Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.

99 citations
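
For orientation, the "lasso penalty applied to the inverse covariance matrix" in the Graphical Lasso entry above is usually written as the penalised Gaussian log-likelihood below. The symbols are not given in the abstract and are assumptions here: $S$ is the empirical covariance matrix, $\Theta$ the inverse covariance (precision) matrix, and $\rho \ge 0$ the penalty weight.

```latex
\[
  \hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \Bigl\{ \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta)
          \;-\; \rho\,\lVert \Theta \rVert_{1} \Bigr\},
  \qquad
  \lVert \Theta \rVert_{1} = \sum_{j,k} \lvert \Theta_{jk} \rvert .
\]
```

Zero entries of $\hat{\Theta}$ correspond to missing edges in the estimated graph, and about $p(p+1)/2$ free parameters for $p = 1000$ nodes is consistent with the roughly 500,000 parameters quoted in the abstract.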

•

TL;DR: An active set algorithm for the maximum likelihood estimation of a log-concave density based on complete data and an EM algorithm to treat arbitrarily censored or binned data are developed.

Abstract: We develop an active set algorithm for the maximum likelihood estimation of a log-concave density based on complete data. Building on this fast algorithm, we indicate an EM algorithm to treat arbitrarily censored or binned data.

69 citations

••

TL;DR: Treelets, as discussed by the authors, extend wavelets to nonsmooth signals and return a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data; they are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification.

Abstract: In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered--with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper we present treelets--a novel construction of multi-scale bases that extends wavelets to nonsmooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification, in situations where sample sizes are small and the data are sparse with unknown groupings of correlated or collinear variables. The method is also simple to implement and analyze theoretically. Here we describe a variety of situations where treelets perform better than principal component analysis, as well as some common variable selection and cluster averaging schemes. We illustrate treelets on a blocked covariance model and on several data sets (hyperspectral image data, DNA microarray data, and internet advertisements) with highly complex dependencies between variables.

•

TL;DR: The new connectivity measures proposed here can be applied to pairs of univariate EEG/MEG signals, as is traditional in the published literature, but these calculations cannot be interpreted as connectivity, since it is in general incorrect to associate an extracranial electrode or sensor to the underlying cortex.

Abstract: Coherence and phase synchronization between time series corresponding to different spatial locations are usually interpreted as indicators of the connectivity between locations. In neurophysiology, time series of electric neuronal activity are essential for studying brain interconnectivity. Such signals can either be invasively measured from depth electrodes, or computed from very high time resolution, non-invasive, extracranial recordings of scalp electric potential differences (EEG: electroencephalogram) and magnetic fields (MEG: magnetoencephalogram) by means of a tomography such as sLORETA (standardized low resolution brain electromagnetic tomography). There are two problems in this case. First, in the usual situation of unknown cortical geometry, the estimated signal at each brain location is a vector with three components (i.e. a current density vector), which means that coherence and phase synchronization must be generalized to pairs of multivariate time series. Second, the inherent low spatial resolution of the EEG/MEG tomography introduces artificially high zero-lag coherence and phase synchronization. In this report, solutions to both problems are presented. Two additional generalizations are briefly mentioned: (1) conditional coherence and phase synchronization; and (2) non-stationary time-frequency analysis. Finally, a non-parametric randomization method for connectivity significance testing is outlined. The new connectivity measures proposed here can be applied to pairs of univariate EEG/MEG signals, as is traditional in the published literature. However, these calculations cannot be interpreted as connectivity, since it is in general incorrect to associate an extracranial electrode or sensor to the underlying cortex.

••

TL;DR: A new visualization is proposed, which shows the statistician the range of trade-offs that are available in SiZer, and demonstrates the effectiveness of the method.

Abstract: Smoothing methods and SiZer are useful statistical tools for discovering statistically significant structure in data. Based on scale space ideas originally developed in the computer vision literature, SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical device to assess which observed features are 'really there' and which are just spurious sampling artifacts. In this paper, we develop SiZer-like ideas in time series analysis to address the important issue of significance of trends. This is not a straightforward extension, since one data set does not contain the information needed to distinguish 'trend' from 'dependence'. A new visualization is proposed, which shows the statistician the range of trade-offs that are available. Simulation and real data results illustrate the effectiveness of the method.

•

TL;DR: In this paper, a deterministic scan Gibbs sampler alternating between missing data in the unobserved solution components, and parameters is used to model a variety of phenomena in applications ranging from molecular dynamics to audio signal analysis.

Abstract: Hypoelliptic diffusion processes can be used to model a variety of phenomena in applications ranging from molecular dynamics to audio signal analysis. We study parameter estimation for such processes in situations where we observe some components of the solution at discrete times. Since exact likelihoods for the transition densities are typically not known, approximations are used that are expected to work well in the limit of small inter-sample times $\Delta t$ and large total observation times $N\Delta t$. Hypoellipticity together with partial observation leads to ill-conditioning requiring a judicious combination of approximate likelihoods for the various parameters to be estimated. We combine these in a deterministic scan Gibbs sampler alternating between missing data in the unobserved solution components, and parameters. Numerical experiments illustrate asymptotic consistency of the method when applied to simulated data. The paper concludes with application of the Gibbs sampler to molecular dynamics data.

•

TL;DR: In this article, a stratified sampling algorithm is proposed in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of further drawings in each stratum.

Abstract: In this paper, we propose a stratified sampling algorithm in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of further drawings in each stratum. These proportions converge to the optimal allocation in terms of variance reduction. And our stratified estimator is asymptotically normal with asymptotic variance equal to the minimal one. Numerical experiments confirm the efficiency of our algorithm.
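
The adaptive idea in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the strata are equal-width sub-intervals of [0, 1), the target is E[f(U)] for U uniform on [0, 1), and the allocation is moved towards proportions p_i * sigma_i (stratum probability times estimated stratum standard deviation), which is the classical variance-minimising allocation. The function name and defaults are hypothetical.

```python
import random
import statistics


def adaptive_stratified_mean(f, n_strata=4, n_rounds=5, draws_per_round=400, seed=0):
    """Estimate E[f(U)], U ~ Uniform(0, 1), reallocating draws between rounds."""
    rng = random.Random(seed)
    prob = 1.0 / n_strata  # probability of each equal-width stratum
    samples = [[] for _ in range(n_strata)]
    proportions = [prob] * n_strata  # start from proportional allocation

    for _round in range(n_rounds):
        for i in range(n_strata):
            # Keep at least two draws per stratum so its variance stays estimable.
            n_i = max(2, round(proportions[i] * draws_per_round))
            for _draw in range(n_i):
                u = (i + rng.random()) * prob  # uniform draw inside stratum i
                samples[i].append(f(u))
        # Reuse the same draws to update the allocation for the next round.
        weights = [prob * statistics.stdev(s) for s in samples]
        total = sum(weights)
        if total > 0:
            proportions = [w / total for w in weights]

    estimate = sum(prob * statistics.mean(s) for s in samples)
    return estimate, proportions


estimate, proportions = adaptive_stratified_mean(lambda u: u * u)
```

For f(u) = u^2 the within-stratum variability grows with u, so the learned proportions shift draws towards the upper strata.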

•

TL;DR: This work proposes a model‐based analysis of binary trait data and presents a Markov chain Monte Carlo algorithm that can sample from the resulting posterior distribution, based on using a birth–death process for the evolution of the elements of sets of traits.

Abstract: Binary trait data record the presence or absence of distinguishing traits in individuals. We treat the problem of estimating ancestral trees with time depth from binary trait data. Simple analysis of such data is problematic. Each homology class of traits has a unique birth event on the tree, and the birth event of a trait visible at the leaves is biased towards the leaves. We propose a model-based analysis of such data, and present an MCMC algorithm that can sample from the resulting posterior distribution. Our model is based on using a birth-death process for the evolution of the elements of sets of traits. Our analysis correctly accounts for the removal of singleton traits, which are commonly discarded in real data sets. We illustrate Bayesian inference for two binary-trait data sets which arise in historical linguistics. The Bayesian approach allows for the incorporation of information from ancestral languages. The marginal prior distribution of the root time is uniform. We present a thorough analysis of the robustness of our results to model misspecification, through analysis of predictive distributions for external data, and fitting data simulated under alternative observation models. The reconstructed ages of tree nodes are relatively robust, whilst posterior probabilities for topology are not reliable.

•

TL;DR: In this article, the convergence of the Gibbs sampler is studied in hierarchical linear models with arbitrary symmetric error distributions, and it is shown that the convergence can be uniform, geometric or sub-geometric depending on the relative tail behaviour of the error distributions, and on the parametrisation chosen.

Abstract: We characterise the convergence of the Gibbs sampler which samples from the joint posterior distribution of parameters and missing data in hierarchical linear models with arbitrary symmetric error distributions. We show that the convergence can be uniform, geometric or sub-geometric depending on the relative tail behaviour of the error distributions, and on the parametrisation chosen. Our theory is applied to characterise the convergence of the Gibbs sampler on latent Gaussian process models. We indicate how the theoretical framework we introduce will be useful in analyzing more complex models.

••

TL;DR: The fundamental statistical foundations for predictive modeling and the general questions associated with unlabeled data are overviewed, highlighting the relevance of venerable concepts of sampling design and prior specification.

Abstract: The incorporation of unlabeled data in regression and classification analysis is an increasing focus of the applied statistics and machine learning literatures, with a number of recent examples demonstrating the potential for unlabeled data to contribute to improved predictive accuracy. The statistical basis for this semisupervised analysis does not appear to have been well delineated; as a result, the underlying theory and rationale may be underappreciated, especially by nonstatisticians. There is also room for statisticians to become more fully engaged in the vigorous research in this important area of intersection of the statistical and computer sciences. Much of the theoretical work in the literature has focused, for example, on geometric and structural properties of the unlabeled data in the context of particular algorithms, rather than probabilistic and statistical questions. This paper overviews the fundamental statistical foundations for predictive modeling and the general questions associated with unlabeled data, highlighting the relevance of venerable concepts of sampling design and prior specification. This theory, illustrated with a series of central illustrative examples and two substantial real data analyses, shows precisely when, why and how unlabeled data matter.

•

TL;DR: In this article, sparse signals are recovered by solving a sequence of weighted L1-minimization problems, where the weights used for the next iteration are computed from the value of the current solution, and a series of experiments demonstrates the remarkable performance and broad applicability of this algorithm in the areas of sparse signal recovery, statistical estimation, error correction and image processing.

Abstract: It is now well understood that (1) it is possible to reconstruct sparse signals exactly from what appear to be highly incomplete sets of linear measurements and (2) that this can be done by constrained L1 minimization. In this paper, we study a novel method for sparse signal recovery that in many situations outperforms L1 minimization in the sense that substantially fewer measurements are needed for exact recovery. The algorithm consists of solving a sequence of weighted L1-minimization problems where the weights used for the next iteration are computed from the value of the current solution. We present a series of experiments demonstrating the remarkable performance and broad applicability of this algorithm in the areas of sparse signal recovery, statistical estimation, error correction and image processing. Interestingly, superior gains are also achieved when our method is applied to recover signals with assumed near-sparsity in overcomplete representations--not by reweighting the L1 norm of the coefficient sequence as is common, but by reweighting the L1 norm of the transformed object. An immediate consequence is the possibility of highly efficient data acquisition protocols by improving on a technique known as compressed sensing.
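
The iteration described in the abstract above can be written compactly. The abstract only says that the weights are "computed from the value of the current solution"; the inverse-magnitude update below is one common choice and is an assumption here, as are the symbols $\Phi$ (measurement matrix), $y$ (measurements) and $\varepsilon > 0$ (a small stabilising constant).

```latex
\begin{align*}
  x^{(\ell)} &= \arg\min_{x}\; \sum_{i} w_i^{(\ell)}\,\lvert x_i \rvert
      \quad \text{subject to } y = \Phi x, \\
  w_i^{(\ell+1)} &= \frac{1}{\lvert x_i^{(\ell)} \rvert + \varepsilon},
      \qquad w_i^{(0)} = 1 .
\end{align*}
```

With all weights equal to one, the first step is ordinary constrained L1 minimization; later steps penalise small entries more heavily and large entries less.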

•

TL;DR: It is shown empirically that the prior that assigns equal probability over graph sizes outperforms the prior that assigns equal probability over all graphs, both in identifying the correct decomposable graph and in estimating the covariance matrix more efficiently.

Abstract: A Bayesian approach is used to estimate the covariance matrix of Gaussian data. Ideas from Gaussian graphical models and model selection are used to construct a prior for the covariance matrix that is a mixture over all decomposable graphs. For this prior the probability of each graph size is specified by the user and graphs of equal size are assigned equal probability. Most previous approaches assume that all graphs are equally probable. We show empirically that the prior that assigns equal probability over graph sizes outperforms the prior that assigns equal probability over all graphs, both in identifying the correct decomposable graph and in more efficiently estimating the covariance matrix.

••

TL;DR: In this article, a family of robust estimates for the parametric and nonparametric components under a generalized partially linear model is introduced, where the data are modeled by $y_i|(\mathbf{x}_i,t_i)\sim F(\cdot,\mu_i)$ with $\mu_i=H(\eta(t_i)+\mathbf{x}_i^{\mathrm{T}}\beta)$.

Abstract: In this paper, we introduce a family of robust estimates for the parametric and nonparametric components under a generalized partially linear model, where the data are modeled by $y_i|(\mathbf{x}_i,t_i)\sim F(\cdot,\mu_i)$ with $\mu_i=H(\eta(t_i)+\mathbf{x}_i^{\mathrm{T}}\beta)$, for some known distribution function $F$ and link function $H$. It is shown that the estimates of $\beta$ are root-n consistent and asymptotically normal. Through a Monte Carlo study, the performance of these estimators is compared with that of the classical ones.

•

TL;DR: In this article, the basic latent class model proposed originally by the sociologist Paul F. Lazarsfeld for categorical variables is studied and its geometric structure is explained. The authors draw parallels between the statistical and geometric properties of latent class models and illustrate geometrically the causes of many problems associated with maximum likelihood estimation and related statistical inference.

Abstract: Statistical models with latent structure have a history going back to the 1950s and have seen widespread use in the social sciences and, more recently, in computational biology and in machine learning. Here we study the basic latent class model proposed originally by the sociologist Paul F. Lazarsfeld for categorical variables, and we explain its geometric structure. We draw parallels between the statistical and geometric properties of latent class models and we illustrate geometrically the causes of many problems associated with maximum likelihood estimation and related statistical inference. In particular, we focus on issues of non-identifiability and determination of the model dimension, of maximization of the likelihood function and on the effect of symmetric data. We illustrate these phenomena with a variety of synthetic and real-life tables, of different dimension and complexity. Much of the motivation for this work stems from the “100 Swiss Francs” problem, which we introduce and describe in detail.

•

TL;DR: In this article, the Girsanov theorem is used for evaluating the likelihood ratios needed in importance sampling in a continuous-discrete optimal filtering problem, where the system model is a stochastic differential equation and noisy measurements are obtained at discrete instances of time.

Abstract: This article considers the application of particle filtering to continuous-discrete optimal filtering problems, where the system model is a stochastic differential equation, and noisy measurements of the system are obtained at discrete instances of time. It is shown how the Girsanov theorem can be used for evaluating the likelihood ratios needed in importance sampling. It is also shown how the methodology can be applied to a class of models where the driving noise process is lower in dimensionality than the state, and thus the laws of state and noise are not absolutely continuous. Rao-Blackwellization of conditionally Gaussian models and unknown static parameter models is also considered.

••

TL;DR: In this article, a method for computing distributions associated with patterns in the state sequence of a hidden Markov model, conditional on observing all or part of the observation sequence, is presented.

Abstract: This paper gives a method for computing distributions associated with patterns in the state sequence of a hidden Markov model, conditional on observing all or part of the observation sequence. Probabilities are computed for very general classes of patterns (competing patterns and generalized later patterns), and thus, the theory includes as special cases results for a large class of problems that have wide application. The unobserved state sequence is assumed to be Markovian with a general order of dependence. An auxiliary Markov chain is associated with the state sequence and is used to simplify the computations. Two examples are given to illustrate the use of the methodology. Whereas the first application is more to illustrate the basic steps in applying the theory, the second is a more detailed application to DNA sequences, and shows that the methods can be adapted to include restrictions related to biological knowledge.

••

TL;DR: This paper is concerned with active network tomography where the goal is to recover information about quality-of-service parameters at the link level from aggregate data measured on end-to-end network paths.

Abstract: The analysis of computer and communication networks gives rise to some interesting inverse problems. This paper is concerned with active network tomography where the goal is to recover information about quality-of-service (QoS) parameters at the link level from aggregate data measured on end-to-end network paths. The estimation and monitoring of QoS parameters, such as loss rates and delays, are of considerable interest to network engineers and Internet service providers. The paper provides a review of the inverse problems and recent research on inference for loss rates and delay distributions. Some new results on parametric inference for delay distributions are also developed. In addition, a real application on Internet telephony is discussed.

••

TL;DR: The notion of data depth has long been in use to obtain robust location and scale estimates in a multivariate setting and can also be used to screen for extreme observations or outliers (the observations with low data depth).

Abstract: The notion of data depth has long been in use to obtain robust location and scale estimates in a multivariate setting. The depth of an observation is a measure of its centrality, with respect to a data set or a distribution. The data depths of a set of multivariate observations translate to a center-outward ordering of the data. Thus, data depth provides a generalization of the median to a multivariate setting (the deepest observation), and can also be used to screen for extreme observations or outliers (the observations with low data depth). Data depth has been used in the development of a wide range of robust and non-parametric methods for multivariate data, such as non-parametric tests of location and scale (Li and Liu (2004)), multivariate rank-tests (Liu and Singh (1993)), non-parametric classification and clustering (Jornsten (2004)), and robust regression (Rousseeuw and Hubert (1999)). Many different notions of data depth have been developed for multivariate data. In contrast, data depth measures for functional data have only recently been proposed (Fraiman and Muniz (1999), Lopez-Pintado and Romo (2006a)). While the definitions of both of these data depth measures are motivated by the functional aspect of the data, the measures themselves are in fact invariant with respect to permutations of the domain (i.e. the compact interval on which the functions are defined). Thus, these measures are equally applicable to multivariate data where there is no explicit ordering of the data dimensions. In this paper we explore some extensions of functional data depths, so as to take the ordering of the data dimensions into account.

•

TL;DR: In this article, the authors presented a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method is valid in a large variety of circumstances.

Abstract: We present a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method reduces the problem of constructing confidence sets for the ratio of two random variables to the problem of constructing confidence sets for the means of one-dimensional random variables. It is valid in a large variety of circumstances. In the case of normally distributed random variables, the confidence sets so constructed coincide with the standard Fieller confidence sets. Generalizations of our construction lead to definitions of exact and conservative confidence sets for very general classes of distributions, provided the joint expectation of (X,Y) exists and the linear combinations of the form aX + bY are well-behaved. Finally, our geometric method allows us to derive a very simple bootstrap approach for constructing conservative confidence sets for ratios which perform favorably in certain situations, in particular in the asymmetric heavy-tailed regime.
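
The reduction to one-dimensional means mentioned in the abstract above can be illustrated with the classical Fieller-style argument: a candidate ratio rho equals E(Y)/E(X) exactly when the one-dimensional variable Y - rho * X has mean zero, so rho is kept whenever zero lies in a confidence interval for that mean. The sketch below is not the authors' geometric construction. It assumes paired observations, uses a normal quantile (`z = 1.96`) instead of a t quantile, and scans a user-supplied grid; all names are hypothetical.

```python
import math
import statistics


def ratio_in_confidence_set(xs, ys, rho, z=1.96):
    """Return True if zero lies in the interval for the mean of Y - rho * X."""
    diffs = [y - rho * x for x, y in zip(xs, ys)]
    mean = statistics.mean(diffs)
    stderr = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return abs(mean) <= z * stderr


def ratio_confidence_grid(xs, ys, grid, z=1.96):
    """Collect the candidate ratios on the grid that are not rejected."""
    return [rho for rho in grid if ratio_in_confidence_set(xs, ys, rho, z)]


# Hypothetical paired data with ys roughly equal to 2 * xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
candidates = [k / 100 for k in range(100, 401)]  # ratios from 1.00 to 4.00
accepted = ratio_confidence_grid(xs, ys, candidates)
```

Because membership is tested ratio by ratio, the accepted set need not be a bounded interval, which is a known feature of Fieller-type confidence sets.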

•

TL;DR: Three bandwidth selection methods based on mean integrated squared error are introduced for weighted kernel density estimates; the least-squares cross-validation method, the adaptive weight kernel density estimator, and boundary problems are also studied.

Abstract: Weighted kernel-density estimates (wKDE) are broadly used in many statistical areas, for instance, density estimation under right-censoring. However, reweighting the kernels can complicate bandwidth selection. In this paper, we investigate the methods of bandwidth selection for wKDE. Three mean integrated squared error based bandwidth selection methods are introduced. The least-squares cross-validation method, the adaptive weight kernel density estimator and boundary problems are also studied. Monte Carlo simulations were conducted to demonstrate the performance of the proposed bandwidth selection methods. Finally, the performance of wKDE is illustrated via an application to a biased sampling problem and a real data application.
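
For orientation, a weighted kernel density estimate can be sketched as below. The abstract does not specify the kernel or the weights, so this assumes a Gaussian kernel and weights normalized to sum to one. It does not implement any of the paper's bandwidth selectors, and `weighted_kde` is a hypothetical name.

```python
from math import exp, pi, sqrt


def weighted_kde(x, data, weights, bandwidth):
    """Evaluate a Gaussian weighted KDE at the point x.

    Computes sum_i w_i * K((x - x_i) / h) / h with the weights rescaled
    to sum to one, so the estimate integrates to one.
    """
    total = sum(weights)
    density = 0.0
    for xi, wi in zip(data, weights):
        u = (x - xi) / bandwidth
        density += (wi / total) * exp(-0.5 * u * u) / (bandwidth * sqrt(2 * pi))
    return density


# Illustrative data: the middle observation carries twice the weight.
data = [0.0, 1.0, 3.0]
weights = [1.0, 2.0, 1.0]
value_at_one = weighted_kde(1.0, data, weights, bandwidth=0.5)
```

With equal weights this reduces to the ordinary kernel density estimate. The bandwidth is passed in by hand here, whereas choosing it is the subject of the paper.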

•

TL;DR: In this paper, a new frequentist 1-alpha confidence interval for theta is proposed that utilizes uncertain prior information that tau = 0. The interval is optimal in the sense that it has minimum weighted average expected length, where the largest weight is given to the expected length when tau = 0.

Abstract: We consider a linear regression model with regression parameter beta=(beta_1,...,beta_p) and independent and identically distributed N(0,sigma^2) errors. Suppose that the parameter of interest is theta = a^T beta where a is a specified vector. Define the parameter tau=c^T beta-t where the vector c and the number t are specified and a and c are linearly independent. Also suppose that we have uncertain prior information that tau = 0. We present a new frequentist 1-alpha confidence interval for theta that utilizes this prior information. We require this confidence interval to (a) have endpoints that are continuous functions of the data and (b) coincide with the standard 1-alpha confidence interval when the data strongly contradict this prior information. This interval is optimal in the sense that it has minimum weighted average expected length where the largest weight is given to this expected length when tau=0. This minimization leads to an interval that has the following desirable properties. This interval has expected length that (a) is relatively small when the prior information about tau is correct and (b) has a maximum value that is not too large. The following problem will be used to illustrate the application of this new confidence interval. Consider a 2-by-2 factorial experiment with 20 replicates. Suppose that the parameter of interest theta is a specified simple effect and that we have uncertain prior information that the two-factor interaction is zero. Our aim is to find a frequentist 0.95 confidence interval for theta that utilizes this prior information.

•

TL;DR: In this paper, a new method is presented for computing the posterior distribution of the number k of components in a finite mixture. Two aspects of prior specification are also studied: an argument is made for the use of a Poisson(1) distribution as the prior for k, and methods are given for selecting hyperparameter values in the mixture of normals model with natural conjugate priors on the component parameters.

Abstract: A new method for the computation of the posterior distribution of the number k of components in a finite mixture is presented. Two aspects of prior specification are also studied: an argument is made for the use of a Poisson(1) distribution as the prior for k; and methods are given for the selection of hyperparameter values in the mixture of normals model, with natural conjugate priors on the component parameters.
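
To make the role of the Poisson(1) prior concrete, the sketch below combines it with log marginal likelihoods through Bayes' rule. It does not reproduce the paper's computational method, which the abstract does not describe. The inputs are hypothetical placeholders, restricting k to 1..K is an assumption, and `posterior_over_k` is an illustrative name.

```python
from math import exp, factorial, log


def posterior_over_k(log_marginals):
    """Posterior over k = 1..K under a Poisson(1) prior on k.

    log_marginals[k - 1] is a (hypothetical) log marginal likelihood of the
    data under a k-component mixture. The Poisson(1) pmf at k is
    exp(-1) / k!, and the constant exp(-1) cancels on normalization.
    """
    log_post = [lm - log(factorial(k))
                for k, lm in enumerate(log_marginals, start=1)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    unnormalized = [exp(v - shift) for v in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# With equal marginal likelihoods the posterior is just the truncated prior,
# proportional to 1/1!, 1/2!, 1/3!.
flat = posterior_over_k([0.0, 0.0, 0.0])
```

The 1/k! decay penalizes additional components sharply, so larger k needs clear support from the data to receive appreciable posterior mass.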