
Showing papers in "Bayesian Analysis in 2006"


Journal ArticleDOI
TL;DR: In this paper, a folded-noncentral-$t$ family of conditionally conjugate priors for hierarchical standard deviation parameters is proposed, and weakly informative priors in this family are considered.
Abstract: Various noninformative prior distributions have been suggested for scale parameters in hierarchical models. We construct a new folded-noncentral-$t$ family of conditionally conjugate priors for hierarchical standard deviation parameters, and then consider noninformative and weakly informative priors in this family. We use an example to illustrate serious problems with the inverse-gamma family of "noninformative" prior distributions. We suggest instead to use a uniform prior on the hierarchical standard deviation, using the half-$t$ family when the number of groups is small and in other settings where a weakly informative prior is desired. We also illustrate the use of the half-$t$ family for hierarchical modeling of multiple variance parameters such as arise in the analysis of variance.
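As a concrete point of reference, a half-$t$ prior can be simulated through a multiplicative redundant parameterization of the kind used for conditional conjugacy: $\sigma = |\xi|\,\sigma_\eta$ with $\xi$ normal and $\sigma_\eta^2$ inverse-gamma. The Python toy below is my own sketch (arbitrary scale and degrees of freedom), not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def half_t_draws(scale, df, size):
    """Draw sigma from a half-t(df) prior with the given scale via the
    conditionally conjugate parameterization sigma = |xi| * sigma_eta,
    where xi ~ N(0, scale^2) and sigma_eta^2 ~ Inv-Gamma(df/2, df/2)."""
    xi = rng.normal(0.0, scale, size)
    sigma_eta2 = 1.0 / rng.gamma(shape=df / 2.0, scale=2.0 / df, size=size)
    return np.abs(xi) * np.sqrt(sigma_eta2)

# df = 1 gives the half-Cauchy special case often used as a weakly informative prior
sigma = half_t_draws(scale=5.0, df=1, size=100_000)
print(np.percentile(sigma, [50, 90, 99]))
```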

3,012 citations


Journal ArticleDOI
TL;DR: A variational inference algorithm for DP mixtures is presented, together with experiments comparing it to Gibbs sampling algorithms for DP mixtures of Gaussians and an application to a large-scale image analysis problem.
Abstract: Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Markov chain Monte Carlo (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.
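For orientation only, here is a small Python sketch (my own, with arbitrary hyperparameters, not the authors' variational algorithm) of the truncated stick-breaking construction of a DP mixture of Gaussians that this kind of variational approximation works with:

```python
import numpy as np

rng = np.random.default_rng(1)

def truncated_stick_breaking(alpha, T):
    """Stick-breaking weights pi_1..pi_T for a DP(alpha) truncated at T components."""
    v = rng.beta(1.0, alpha, size=T)
    v[-1] = 1.0                                   # force the truncated weights to sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

alpha, T = 2.0, 20
weights = truncated_stick_breaking(alpha, T)
atoms = rng.normal(0.0, 3.0, size=T)              # component means drawn from the base measure
z = rng.choice(T, size=500, p=weights)            # component assignments
data = rng.normal(atoms[z], 1.0)                  # draws from the truncated DP mixture of Gaussians
print(weights.round(3))
```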

1,471 citations


Journal ArticleDOI
TL;DR: Nested sampling as mentioned in this paper estimates directly how the likelihood function relates to prior mass, and the evidence (alternatively the marginal likelihood, marginal density of the data, or the prior predictive) is immediately obtained by summation.
Abstract: Nested sampling estimates directly how the likelihood function relates to prior mass. The evidence (alternatively the marginal likelihood, marginal density of the data, or the prior predictive) is immediately obtained by summation. It is the prime result of the computation, and is accompanied by an estimate of numerical uncertainty. Samples from the posterior distribution are an optional by-product, obtainable for any temperature. The method relies on sampling within a hard constraint on likelihood value, as opposed to the softened likelihood of annealing methods. Progress depends only on the shape of the "nested" contours of likelihood, and not on the likelihood values. This invariance (over monotonic relabelling) allows the method to deal with a class of phase-change problems which effectively defeat thermal annealing.
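A bare-bones illustration of the summation the abstract describes, for a toy two-dimensional Gaussian likelihood under a uniform prior; this is a deliberately naive sketch of my own (new points are found by brute-force rejection under the hard likelihood constraint), not Skilling's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_like(theta):
    return -0.5 * np.sum(theta ** 2)          # toy Gaussian log-likelihood

N, n_iter, dim = 100, 700, 2
live = rng.uniform(-5, 5, size=(N, dim))      # live points drawn from the U(-5, 5)^2 prior
live_logL = np.array([log_like(t) for t in live])

log_terms, X_prev = [], 1.0
for i in range(1, n_iter + 1):
    worst = int(np.argmin(live_logL))
    X_i = np.exp(-i / N)                      # deterministic estimate of remaining prior mass
    log_terms.append(live_logL[worst] + np.log(X_prev - X_i))   # L times the prior-mass shrinkage
    X_prev = X_i
    # replace the worst point by a prior draw satisfying the hard likelihood constraint
    # (brute-force rejection; real implementations use smarter constrained moves)
    threshold = live_logL[worst]
    while True:
        cand = rng.uniform(-5, 5, size=dim)
        if log_like(cand) > threshold:
            break
    live[worst], live_logL[worst] = cand, log_like(cand)

logZ = np.logaddexp.reduce(log_terms)
print("log evidence estimate:", logZ)         # roughly log(2*pi/100) ~ -2.77 for this toy problem
```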

1,118 citations


Journal ArticleDOI
TL;DR: The deviance information criterion is reassessed for missing data models, testing the behaviour of various extensions in the cases of mixture and random effect models.
Abstract: The deviance information criterion (DIC) introduced by Spiegelhalter et al. (2002) is directly inspired by linear and generalised linear models, but it is not so naturally defined for missing data models. In this paper, we reassess the criterion for such models, testing the behaviour of various extensions in the cases of mixture and random effect models.
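For readers who have not met DIC before, the basic quantity being reassessed here can be computed from posterior draws as DIC $= D(\bar{\theta}) + 2 p_D$ with $p_D = \overline{D(\theta)} - D(\bar{\theta})$. The toy computation below uses a simple Gaussian model of my own choosing (not the mixture or random effect models of the paper) just to show the bookkeeping:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Toy data and (pretend) posterior draws for a N(mu, 1) model with a flat prior,
# whose posterior is N(ybar, 1/n) -- purely to show the DIC bookkeeping.
y = rng.normal(1.0, 1.0, size=50)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=4000)

def deviance(mu):
    return -2.0 * np.sum(stats.norm.logpdf(y, loc=mu, scale=1.0))

D_draws = np.array([deviance(m) for m in mu_draws])
D_bar = D_draws.mean()                  # posterior mean deviance
D_at_mean = deviance(mu_draws.mean())   # plug-in deviance at the posterior mean
p_D = D_bar - D_at_mean                 # effective number of parameters (near 1 here)
DIC = D_at_mean + 2.0 * p_D             # equivalently D_bar + p_D
print(f"p_D = {p_D:.2f}, DIC = {DIC:.1f}")
```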

860 citations


Journal ArticleDOI
TL;DR: It is suggested that the statistical community should accept formal objective Bayesian techniques with confidence, but should be more cautious about casual objectiveBayesian techniques.
Abstract: Bayesian statistical practice makes extensive use of versions of objective Bayesian analysis. We discuss why this is so, and address some of the criticisms that have been raised concerning objective Bayesian analysis. The dangers of treating the issue too casually are also considered. In particular, we suggest that the statistical community should accept formal objective Bayesian techniques with confidence, but should be more cautious about casual objective Bayesian techniques.

659 citations


Journal ArticleDOI
TL;DR: The likelihood-based methods studied are considerably faster computationally than MCMC, but steady improvements in hardware speed and Monte Carlo algorithm efficiency, together with the lack of calibration of likelihood-based methods in some common hierarchical settings, make MCMC-based Bayesian fitting of multilevel models attractive.
Abstract: We use simulation studies, whose design is realistic for educational and medical research (as well as other fields of inquiry), to compare Bayesian and likelihood-based methods for fitting variance-components (VC) and random-effects logistic regression (RELR) models. The likelihood (and approximate likelihood) approaches we examine are based on the methods most widely used in current applied multilevel (hierarchical) analyses: maximum likelihood (ML) and restricted ML (REML) for Gaussian outcomes, and marginal and penalized quasi-likelihood (MQL and PQL) for Bernoulli outcomes. Our Bayesian methods use Markov chain Monte Carlo (MCMC) estimation, with adaptive hybrid Metropolis-Gibbs sampling for RELR models, and several diffuse prior distributions ($\Gamma^{-1}(\epsilon, \epsilon)$ and $U(0, 1/\epsilon)$ priors for variance components). For evaluation criteria we consider bias of point estimates and nominal versus actual coverage of interval estimates in repeated sampling. In two-level VC models we find that (a) both likelihood-based and Bayesian approaches can be made to produce approximately unbiased estimates, although the automatic manner in which REML accomplishes this is an advantage, but (b) both approaches had difficulty achieving nominal coverage in small samples and with small values of the intraclass correlation. With the three-level RELR models we examine we find that (c) quasi-likelihood methods for estimating random-effects variances perform badly with respect to bias and coverage in the example we simulated, and (d) Bayesian diffuse-prior methods lead to well-calibrated point and interval RELR estimates. While it is true that the likelihood-based methods we study are considerably faster computationally than MCMC, (i) steady improvements in recent years in both hardware speed and efficiency of Monte Carlo algorithms and (ii) the lack of calibration of likelihood-based methods in some common hierarchical settings combine to make MCMC-based Bayesian fitting of multilevel models an attractive approach, even with rather large datasets. Other analytic strategies based on less approximate likelihood methods are also possible but would benefit from further study of the type summarized here.
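The evaluation criteria named in the abstract (bias of point estimates, nominal versus actual interval coverage in repeated sampling) reduce to simple summaries of repeated-sampling output. The sketch below uses entirely made-up stand-in estimates and intervals just to show that bookkeeping; it is not a re-run of the paper's simulation study.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical repeated-sampling output: a point estimate and a 95% interval for the
# level-2 variance from each of n_rep simulated datasets; the true value is known by design.
true_var, n_rep = 1.0, 1000
est = true_var + rng.normal(0.0, 0.2, n_rep)          # stand-in point estimates
lower, upper = est - 0.39, est + 0.39                 # stand-in nominal 95% intervals

bias = est.mean() - true_var
coverage = np.mean((lower <= true_var) & (true_var <= upper))
print(f"bias = {bias:+.3f}, actual coverage of nominal 95% intervals = {coverage:.3f}")
```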

522 citations


Journal ArticleDOI
TL;DR: A simple technique using joint updating improves the performance of the conventional probit regression algorithm, and the logistic method is shown to extend easily to multinomial regression models.
Abstract: In this paper we discuss auxiliary variable approaches to Bayesian binary and multinomial regression. These approaches are ideally suited to automated Markov chain Monte Carlo simulation. In the first part we describe a simple technique using joint updating that improves the performance of the conventional probit regression algorithm. In the second part we discuss auxiliary variable methods for inference in Bayesian logistic regression, including covariate set uncertainty. Finally, we show how the logistic method is easily extended to multinomial regression models. All of the algorithms are fully automatic with no user-set parameters and no necessary Metropolis-Hastings accept/reject steps. © 2006 International Society for Bayesian Analysis.
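The "conventional probit regression algorithm" that the first part improves on is the familiar truncated-normal data augmentation scheme; a minimal single-site version of that baseline (my own sketch with a flat prior on the coefficients, not the paper's joint-updating or logistic algorithms) is:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)

# Toy data for a probit regression with an (improper) flat prior on beta.
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
beta = np.zeros(p)
draws = []
for it in range(2000):
    # 1. Latent z_i ~ N(x_i' beta, 1), truncated to (0, inf) if y_i = 1 and (-inf, 0) if y_i = 0.
    mean = X @ beta
    lo = np.where(y == 1, -mean, -np.inf)      # standardized lower truncation bounds
    hi = np.where(y == 1, np.inf, -mean)       # standardized upper truncation bounds
    z = mean + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2. beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}) under the flat prior.
    beta = rng.multivariate_normal(XtX_inv @ (X.T @ z), XtX_inv)
    draws.append(beta)

print(np.mean(draws[500:], axis=0))            # posterior means after burn-in (near beta_true)
```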

470 citations


Journal ArticleDOI
TL;DR: An overview of key Bayesian developments, beginning with Bayes' posthumously published 1763 paper and continuing up through approximately 1970, including the period of time when "Bayesian" emerged as the label of choice for those who advocated Bayesian methods, can be found in this paper.
Abstract: While Bayes' theorem has a 250-year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective "Bayesian" was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesian developments, beginning with Bayes' posthumously published 1763 paper and continuing up through approximately 1970, including the period of time when "Bayesian" emerged as the label of choice for those who advocated Bayesian methods.

422 citations


Journal ArticleDOI
TL;DR: It is argued that the subjectivist Bayes approach is the only feasible method for tackling many important practical problems, and possible modifications to the Bayesian approach from a subjectivist viewpoint are considered.
Abstract: We address the position of subjectivism within Bayesian statistics. We argue, first, that the subjectivist Bayes approach is the only feasible method for tackling many important practical problems. Second, we describe the essential role of the subjectivist approach in scientific analysis. Third, we consider possible modifications to the Bayesian approach from a subjectivist viewpoint. Finally, we address the issue of pragmatism in implementing the subjectivist approach.

204 citations


Journal ArticleDOI
TL;DR: How to assess whether or not a prior-data conflict exists, and when its effects can be ignored for inferences, is examined.
Abstract: Inference proceeds from ingredients chosen by the analyst and data. To validate any inferences drawn it is essential that the inputs chosen be deemed appropriate for the data. In the Bayesian context these inputs consist of both the sampling model and the prior. There are thus two possibilities for failure: the data may not have arisen from the sampling model, or the prior may place most of its mass on parameter values that are not feasible in light of the data (referred to here as prior-data conflict). Failure of the sampling model can only be fixed by modifying the model, while prior-data conflict can be overcome if sufficient data is available. We examine how to assess whether or not a prior-data conflict exists, and how to assess when its effects can be ignored for inferences. The concept of prior-data conflict is seen to lead to a partial characterization of what is meant by a noninformative prior or a noninformative sequence of priors.
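Checks of this kind compare the observed value of a (minimal sufficient) statistic with its prior predictive distribution. The toy below is a simplified two-sided tail-probability version for a normal mean, with numbers of my own choosing, and is not the authors' exact measure:

```python
import numpy as np

rng = np.random.default_rng(6)

# Sampling model: y_1..y_n ~ N(mu, 1); prior: mu ~ N(0, 1).  A minimal sufficient statistic
# is ybar, whose prior predictive distribution is N(0, 1 + 1/n).  Locating the observed ybar
# in that distribution flags prior-data conflict when the tail probability is tiny.
n = 20
ybar_obs = 4.2                                       # hypothetical observed sample mean

mu_prior = rng.normal(0.0, 1.0, size=100_000)        # draws from the prior
ybar_pred = rng.normal(mu_prior, 1.0 / np.sqrt(n))   # prior predictive draws of ybar

tail = np.mean(np.abs(ybar_pred) >= abs(ybar_obs))   # simple two-sided tail probability
print(f"prior predictive tail probability for ybar: {tail:.5f}")
```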

168 citations


Journal ArticleDOI
TL;DR: In this article, a two-stage, spatially explicit, Hierarchical Logistic Regression (HRL) was used to model the suitability or potential presence for each species at each cell, given species attributes along with grid cell (site-level) climate, precipitation, topography and geology data using species-level coecien ts, and a spatial random eect.
Abstract: Understanding spatial patterns of species diversity and the distri- butions of individual species is a consuming problem in biogeography and con- servation. The Cape Floristic Region (CFR) of South Africa is a global hotspot of diversity and endemism, and the Protea Atlas Project, with some 60,000 site records across the region, provides an extraordinarily rich data set to analyze bio- diversity patterns. Analysis for the region is developed at the spatial scale of one minute grid-cells ( 37; 000 cells total for the region). We report on results for 40 species of a o wering plant family Proteaceae (of about 330 in the CFR) for a dened subregion. Using a Bayesian framework, we develop a two stage, spatially explicit, hierar- chical logistic regression. Stage one models the suitability or potential presence for each species at each cell, given species attributes along with grid cell (site-level) climate, precipitation, topography and geology data using species-level coecien ts, and a spatial random eect. The second level of the hierarchy models, for each species, observed presence=absence at a sampling site through a conditional speci- cation of the probability of presence at an arbitrary location in the grid cell given that the location is suitable. Because the atlas data are not evenly distributed across the landscape, grid cells contain variable numbers of sampling localities. Indeed, some grid cells are entirely unsampled; others have been transformed by human intervention (agriculture, urbanization) such that none of the species are there though some may have the potential to be present in the absence of distur- bance. Thus the modeling takes the sampling intensity at each site into account by assuming that the total number of times that a particular species was observed within a site follows a binomial distribution. In fact, a range of models can be examined incorporating dieren t rst and second stage specications. This necessitates model comparison in a misaligned multilevel setting. All models are tted using MCMC methods. A \best" model is selected. Parameter summaries oer considerable insight. In addition, results
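To make the two-stage structure concrete, here is a deliberately stripped-down, non-spatial forward simulation of the generative story (stage-one suitability from environmental covariates, presence given suitability, binomial observation under uneven sampling effort). All covariates, coefficients and the detection setup are invented for illustration, and the spatial random effect and species hierarchy are omitted:

```python
import numpy as np

rng = np.random.default_rng(7)

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stage one: cell suitability from an environmental covariate (spatial random effect omitted).
n_cells = 1000
climate = rng.normal(size=n_cells)                  # stand-in environmental covariate
beta0, beta1 = -0.5, 1.2                            # stand-in species-level coefficients
p_suitable = inv_logit(beta0 + beta1 * climate)
suitable = rng.binomial(1, p_suitable)

# Stage two: presence at a sampled location given that the cell is suitable, observed
# through a binomial number of detections under uneven sampling intensity per cell.
theta = 0.6                                         # P(present at a location | cell suitable)
n_visits = rng.poisson(3, size=n_cells)             # uneven sampling effort (some cells unsampled)
detections = rng.binomial(n_visits, suitable * theta)

print("cells sampled at least once:", int((n_visits > 0).sum()),
      "| cells with detections:", int((detections > 0).sum()))
```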

Journal ArticleDOI
TL;DR: In this paper, a generalised iterative algorithm for variational Bayesian estimates for a normal mixture model was proposed, and it was shown theoretically that the variational estimator converges locally to the maximum likelihood estimator at the rate of O(1/n) in the large sample limit.
Abstract: In this paper we propose a generalised iterative algorithm for calculating variational Bayesian estimates for a normal mixture model and investigate its convergence properties. It is shown theoretically that the variational Bayesian estimator converges locally to the maximum likelihood estimator at the rate of O(1/n) in the large sample limit.

Journal ArticleDOI
TL;DR: It is proved that a VB approximation can always be constructed in such a way that it is guaranteed to be more accurate than the CS approximation; comparisons are also made to a sampling-based gold standard, Annealed Importance Sampling (AIS).
Abstract: A key problem in statistics and machine learning is inferring suitable structure of a model given some observed data. A Bayesian approach to model comparison makes use of the marginal likelihood of each candidate model to form a posterior distribution over models; unfortunately for most models of interest, notably those containing hidden or latent variables, the marginal likelihood is intractable to compute. We present the variational Bayesian (VB) algorithm for directed graphical models, which optimises a lower bound approximation to the marginal likelihood in a procedure similar to the standard EM algorithm. We show that for a large class of models, which we call conjugate exponential, the VB algorithm is a straightforward generalisation of the EM algorithm that incorporates uncertainty over model parameters. In a thorough case study using a small class of bipartite DAGs containing hidden variables, we compare the accuracy of the VB approximation to existing asymptotic-data approximations such as the Bayesian Information Criterion (BIC) and the Cheeseman-Stutz (CS) criterion, and also to a sampling-based gold standard, Annealed Importance Sampling (AIS). We find that the VB algorithm is empirically superior to CS and BIC, and much faster than AIS. Moreover, we prove that a VB approximation can always be constructed in such a way that guarantees it to be more accurate than the CS approximation.

Journal ArticleDOI
Joseph B. Kadane, Galit Shmueli, Tom Minka, Sharad Borle, Peter Boatwright
TL;DR: In this article, a Bayesian analysis of a generalization of the Poisson distribution is presented, and a necessary and sufficient condition on the hyperparameters of the conjugate family for the prior to be proper is established.
Abstract: This article explores a Bayesian analysis of a generalization of the Poisson distribution. By choice of a second parameter, both under-dispersed and over-dispersed data can be modeled. The Conway-Maxwell-Poisson distribution forms an exponential family of distributions, so it has sufficient statistics of fixed dimension as the sample size varies, and a conjugate family of prior distributions. The article displays and proves a necessary and sufficient condition on the hyperparameters of the conjugate family for the prior to be proper, and it discusses methods of sampling from the conjugate distribution. An elicitation program to find the hyperparameters from the predictive distribution is also discussed.
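For reference, the Conway-Maxwell-Poisson pmf is $P(Y=y) \propto \lambda^y/(y!)^{\nu}$, with a normalizing constant given by an infinite series. The small numerical sketch below is my own (truncating the series), not the paper's conjugate-prior machinery:

```python
import numpy as np
from scipy.special import gammaln

def com_poisson_pmf(y, lam, nu, max_terms=500):
    """Conway-Maxwell-Poisson pmf P(Y=y) = lam^y / ((y!)^nu * Z(lam, nu)),
    with the normalizing constant Z approximated by truncating its series."""
    j = np.arange(max_terms)
    logZ = np.logaddexp.reduce(j * np.log(lam) - nu * gammaln(j + 1))
    return np.exp(y * np.log(lam) - nu * gammaln(y + 1) - logZ)

y = np.arange(10)
print(com_poisson_pmf(y, lam=2.0, nu=1.0).round(4))   # nu = 1 recovers the Poisson(2) pmf
print(com_poisson_pmf(y, lam=2.0, nu=0.5).round(4))   # nu < 1 gives over-dispersion
```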

Journal ArticleDOI
TL;DR: In this paper, the authors assess the sensitivity of free surface velocity to variations in the uncertain inputs, constrain the values of these inputs to be consistent with experiment, and predict free surface velocity based on the constrained inputs.
Abstract: A flyer plate experiment involves forcing a plane shock wave through stationary test samples of material and measuring the free surface velocity of the target as a function of time. These experiments are conducted to learn about the behavior of materials subjected to high strain rate environments. Computer simulations of flyer plate experiments are conducted with a two-dimensional hydrodynamic code developed under the Advanced Strategic Computing (ASC) program at Los Alamos National Laboratory. This code incorporates physical models that contain parameters having uncertain values. The objectives of the analyses presented in this paper are to assess the sensitivity of free surface velocity to variations in the uncertain inputs, to constrain the values of these inputs to be consistent with experiment, and to predict free surface velocity based on the constrained inputs. We implement a Bayesian approach that combines detailed physics simulations with experimental data for the desired statistical inference (Kennedy and O'Hagan 2001; Higdon, Kennedy, Cavendish, Cafeo, and Ryne 2004). The approach given here allows for: uncertainty regarding model inputs (i.e. calibration); accounting for uncertainty due to limitations on the number of simulations that can be carried out; discrepancy between the simulation code and the actual physical system; and uncertainty in the observation process that yields the actual field data on the true physical system.

Journal ArticleDOI
TL;DR: In this article, a new skew-probit link for item response theory was introduced by considering an accumulated skew-normal distribution, and an efficiency study in the estimation of the model parameters was undertaken for a data set from a Mathematical Test applied in Peruvian schools.
Abstract: We introduce a new skew-probit link for item response theory (IRT) by considering an accumulated skew-normal distribution. The model extends the symmetric probit-normal IRT model by considering a new item (or skewness) parameter for the item characteristic curve. A special interpretation is given for this parameter, and a latent linear structure is indicated for the model when an augmented likelihood is considered. A Bayesian MCMC inference approach is developed and an efficiency study in the estimation of the model parameters is undertaken for a data set from (Tanner 1996, pg. 190) by using the notion of effective sample size (ESS) as defined in Kass et al. (1999) and the sample size per second (ESS/s) as considered in Sahu (2002). The methodology is illustrated using a data set corresponding to a Mathematical Test applied in Peruvian schools, for which a sensitivity analysis of the chosen priors and a comparison with seven parametric IRT models are conducted. The main conclusion is that the skew-probit item response model seems to provide the best fit.
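The "accumulated skew-normal" link simply replaces the normal CDF of the probit item characteristic curve with a skew-normal CDF. The sketch below uses my own shorthand (discrimination a, difficulty b, skewness parameter), which is not necessarily the paper's parameterization:

```python
import numpy as np
from scipy.stats import skewnorm, norm

def skew_probit_icc(theta, a, b, skew):
    """Item characteristic curve under an accumulated skew-normal (skew-probit) link;
    skew = 0 recovers the usual symmetric probit ICC."""
    return skewnorm.cdf(a * theta - b, skew)

theta = np.linspace(-3, 3, 7)
print(skew_probit_icc(theta, a=1.2, b=0.3, skew=2.0).round(3))
print(norm.cdf(1.2 * theta - 0.3).round(3))    # symmetric probit curve for comparison
```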

Journal ArticleDOI
TL;DR: The power prior has emerged as a useful informative prior for the incorporation of historical data in a Bayesian analysis. Viewing hierarchical modeling as the "gold standard" for combining information across studies, this article provides a formal justification of the power prior by examining analytical relationships between the power prior and hierarchical modeling in linear models.
Abstract: The power prior has emerged as a useful informative prior for the incorporation of historical data in a Bayesian analysis. Viewing hierarchical modeling as the "gold standard" for combining information across studies, we provide a formal justification of the power prior by examining formal analytical relationships between the power prior and hierarchical modeling in linear models. Asymptotic relationships between the power prior and hierarchical modeling are obtained for non-normal models, including generalized linear models, for example. These analytical relationships unify the theory of the power prior, demonstrate the generality of the power prior, shed new light on benchmark analyses, and provide insights into the elicitation of the power parameter in the power prior. Several theorems are presented establishing these formal connections, as well as a formal methodology for eliciting a guide value for the power parameter $a_0$ via hierarchical models.
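For readers new to it, the power prior raises the historical-data likelihood to a power $a_0 \in [0,1]$: $\pi(\theta \mid D_0, a_0) \propto L(\theta \mid D_0)^{a_0}\,\pi_0(\theta)$. The toy below (a normal mean with known variance and invented data, not the paper's linear or generalized linear model development) shows how the posterior interpolates between ignoring and fully pooling the historical data as $a_0$ varies:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
y_hist = rng.normal(0.8, 1.0, size=40)     # historical data D0 (invented)
y_curr = rng.normal(1.2, 1.0, size=25)     # current data D (invented)

def log_post(mu, a0, prior_sd=10.0):
    """Power-prior log posterior for a normal mean with known unit variance:
    log L(mu | D) + a0 * log L(mu | D0) + log pi0(mu), up to a constant."""
    return (np.sum(stats.norm.logpdf(y_curr, mu, 1.0))
            + a0 * np.sum(stats.norm.logpdf(y_hist, mu, 1.0))
            + stats.norm.logpdf(mu, 0.0, prior_sd))

# The posterior mean moves from the current-data mean (a0 = 0) toward the pooled mean (a0 = 1).
grid = np.linspace(-1.0, 3.0, 2001)
for a0 in (0.0, 0.5, 1.0):
    lp = np.array([log_post(m, a0) for m in grid])
    post = np.exp(lp - lp.max())
    post /= post.sum()
    print(f"a0 = {a0}: posterior mean of mu ~ {np.sum(grid * post):.3f}")
```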

Journal ArticleDOI
TL;DR: For a scalar random-effect variance, Browne and Draper (2005) found that the uniform prior works well; for the vector case, this comment suggests an inverse Wishart prior for the random-effects variance matrix, with the scale matrix determined from the first-stage variance.
Abstract: For a scalar random-effect variance, Browne and Draper (2005) have found that the uniform prior works well. It would be valuable to know more about the vector case, in which a second-stage prior on the random effects variance matrix ${\bf D}$ is needed. We suggest consideration of an inverse Wishart prior for ${\bf D}$ where the scale matrix is determined from the first-stage variance.
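A minimal sketch of the kind of prior being suggested, assuming scipy's inverse Wishart and a stand-in scale matrix; exactly how the scale matrix should be built from the first-stage variance follows the authors' recipe, which is not reproduced here:

```python
import numpy as np
from scipy.stats import invwishart

q = 2                                   # dimension of the random-effects vector
S_first_stage = np.array([[1.0, 0.3],   # stand-in matrix playing the role of the
                          [0.3, 2.0]])  # first-stage (level-1) variance information
df = q + 2                              # with df = q + 2 the prior mean of D equals the scale

prior = invwishart(df=df, scale=S_first_stage)
D_draw = prior.rvs(random_state=np.random.default_rng(9))
print(D_draw)                           # one draw of the random-effects variance matrix D
```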

Journal ArticleDOI
TL;DR: In this article, a model-based approach is proposed to identify clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered.
Abstract: We discuss a model-based approach to identifying clusters of objects based on subsets of attributes, so that the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The method is based on a Polya urn cluster model for multivariate means and variances, resulting in a multivariate Dirichlet process mixture model. This particular model-based approach accommodates outliers and allows for the incorporation of application-specific data features into the clustering scheme. For example, in an analysis of genetic CGH array data we are able to design a clustering method that accounts for spatial dependence of chromosomal abnormalities.

Journal ArticleDOI
TL;DR: Results show that SEL-optimal ranks perform well over a broad class of loss functions but can be improved upon when classifying units above or below a percentile cut-point, and even optimal rank estimates can perform poorly in many real-world settings.
Abstract: Performance evaluations of health services providers are burgeoning. Similarly, analyzing spatially related health information, ranking teachers and schools, and identification of differentially expressed genes are increasing in prevalence and importance. Goals include valid and efficient ranking of units for profiling and league tables, identification of excellent and poor performers, the most differentially expressed genes, and determining "exceedances" (how many and which unit-specific true parameters exceed a threshold). These data and inferential goals require a hierarchical, Bayesian model that accounts for nesting relations and identifies both population values and random effects for unit-specific parameters. Furthermore, the Bayesian approach coupled with optimizing a loss function provides a framework for computing non-standard inferences such as ranks and histograms. Estimated ranks that minimize Squared Error Loss (SEL) between the true and estimated ranks have been investigated. The posterior mean ranks minimize SEL and are "general purpose," relevant to a broad spectrum of ranking goals. However, other loss functions and optimizing ranks that are tuned to application-specific goals require identification and evaluation. For example, when the goal is to identify the relatively good (e.g., in the upper 10%) or relatively poor performers, a loss function that penalizes classification errors produces estimates that minimize the error rate. We construct loss functions that address this and other goals, developing a unified framework that facilitates generating candidate estimates, comparing approaches and producing data analytic performance summaries. We compare performance for a fully parametric, hierarchical model with Gaussian sampling distribution under Gaussian and a mixture of Gaussians prior distributions. We illustrate approaches via analysis of standardized mortality ratio data from the United States Renal Data System. Results show that SEL-optimal ranks perform well over a broad class of loss functions but can be improved upon when classifying units above or below a percentile cut-point. Importantly, even optimal rank estimates can perform poorly in many real-world settings; therefore, data-analytic performance summaries should always be reported.
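The SEL-optimal ("general purpose") estimate the abstract refers to is the posterior mean of each unit's rank. The sketch below computes it from pretend posterior draws and also shows one natural estimate for the above/below-a-cut-point classification goal (classify a unit as "top" when its posterior probability of lying above the cut-point exceeds one half, appropriate for a symmetric misclassification loss). The data and model here are invented for illustration:

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(10)

# Pretend posterior draws for K unit-specific parameters (rows = MCMC draws, columns = units).
K, n_draws = 8, 4000
true_theta = np.linspace(-1, 1, K)
theta_draws = true_theta + rng.normal(0.0, 0.5, size=(n_draws, K))

# Rank the units within each posterior draw, then average over draws: the posterior mean
# rank is the estimate that minimizes squared-error loss (SEL) on the ranks.
ranks_per_draw = np.apply_along_axis(rankdata, 1, theta_draws)
posterior_mean_ranks = ranks_per_draw.mean(axis=0)
print(posterior_mean_ranks.round(2))

# For classifying units above a cut-point (here the top 25%), classify unit k as "top"
# when its posterior probability of being above the cut-point exceeds 1/2.
cut = 0.75 * K
prob_top = (ranks_per_draw > cut).mean(axis=0)
print((prob_top > 0.5).astype(int))
```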

Journal ArticleDOI
TL;DR: The magnitude of the posterior odds of authorship led us to conclude that Ronald Reagan drafted 167 speeches and was aided in the preparation of the remaining 145, and the cross-validated accuracy of the best fully Bayesian model was above 90 percent in all cases.
Abstract: In his campaign for the U.S. presidency from 1975 to 1979, Ronald Reagan delivered over 1000 radio broadcasts. For over 600 of these we have direct evidence of Reagan's authorship. The aim of this study was to determine the authorship of 312 of the broadcasts for which no direct evidence is available. We addressed the prediction problem for speeches delivered in different epochs and we explored a wide range of off-the-shelf classification methods and fully Bayesian generative models. Eventually we produced separate sets of predictions using the most accurate classifiers, based on non-contextual words as well as on semantic features, for the 312 speeches of uncertain authorship. All the predictions agree on 135 of the "unknown" speeches, whereas the fully Bayesian models agree on an additional 154 of them. The magnitude of the posterior odds of authorship led us to conclude that Ronald Reagan drafted 167 speeches and was aided in the preparation of the remaining 145. Our inferences were not sensitive to "reasonable" variations in the sets of constants underlying the prior distributions, and the cross-validated accuracy of our best fully Bayesian model was above 90 percent in all cases. The agreement of multiple methods for predicting the authorship for the "unknown" speeches reinforced our confidence in the accuracy of our classifications.

Journal ArticleDOI
TL;DR: In this article, a Bayesian semiparametric accelerated failure time (AFT) model is proposed, where the baseline survival distribution is modeled as a Dirichlet process mixture of gamma densities.
Abstract: We propose a Bayesian semiparametric accelerated failure time (AFT) model in which the baseline survival distribution is modeled as a Dirichlet process mixture of gamma densities. The model is highly flexible and readily captures features such as multimodality in predictive survival densities. The approach can be used in a "black-box" manner in that the prior information needed to fit the model can be quite vague, and we recommend a particular prior in the absence of information on the baseline survival distribution. The resulting posterior baseline distribution has mass only on the positive reals, a desirable feature in a failure-time model. The formulae needed to fit the model are available in closed-form and the model is relatively easy to code and implement. We provide both simulated and real data examples, including data on the cosmetic effects of cancer therapy.

Journal ArticleDOI
TL;DR: In this article, the authors use regular variation theory to establish sufficient conditions in the pure scale parameter structure under which it is possible to resolve conflicts among the sources of information, and they also note some important differences between the scale and location parameter cases.
Abstract: Bayesian robustness modelling using heavy-tailed distributions provides a flexible approach to resolving problems of conflicts between the data and prior distributions. See Dawid (1973) and O'Hagan (1979, 1988, 1990), who provided sufficient conditions on the distributions in the model in order to reject the conflicting data or the prior distribution in favour of the other source of information. However, the literature has concentrated almost exclusively on robustness of the posterior distribution of location parameters; little attention has been given to scale parameters. In this paper we propose a new approach for Bayesian robustness modelling, in which we use the class of regularly varying distributions. Regular variation provides a very natural description of tail thickness in heavy-tailed distributions. Using regular variation theory, we establish sufficient conditions in the pure scale parameter structure under which it is possible to resolve conflicts amongst the sources of information. We also note some important differences between the scale and the location parameter cases. Finally, we obtain new conditions in the pure location parameter structure which may be easier to verify than those proposed by Dawid and O'Hagan.
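For reference (the standard definition, not a statement specific to this paper): a measurable function $f$ is regularly varying at infinity with index $\rho$ if $\lim_{t\to\infty} f(tx)/f(t) = x^{\rho}$ for every $x > 0$, and slowly varying when $\rho = 0$. Heavy-tailed densities such as the Student-$t$ have regularly varying tails, which is the sense in which regular variation describes tail thickness here.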

Journal ArticleDOI
TL;DR: The results indicate that the proposed one-pass scheme delivers accurate parameter estimates on the examples presented in Ridgeway and Madigan (2002), namely a mixture model for Markov chains and Bayesian logistic regression.
Abstract: For Bayesian analysis of massive data, Markov chain Monte Carlo (MCMC) techniques often prove infeasible due to computational resource constraints. Standard MCMC methods generally require a complete scan of the dataset for each iteration. Ridgeway and Madigan (2002) and Chopin (2002b) recently presented importance sampling algorithms that combined simulations from a posterior distribution conditioned on a small portion of the dataset with a reweighting of those simulations to condition on the remainder of the dataset. While these algorithms drastically reduce the number of data accesses as compared to traditional MCMC, they still require substantially more than a single pass over the dataset. In this paper, we present "1PFS," an efficient, one-pass algorithm. The algorithm employs a simple modification of the Ridgeway and Madigan (2002) particle filtering algorithm that replaces the MCMC-based "rejuvenation" step with a more efficient "shrinkage" kernel-smoothing-based step. To show proof of concept and to enable a direct comparison, we demonstrate 1PFS on the same examples presented in Ridgeway and Madigan (2002), namely a mixture model for Markov chains and Bayesian logistic regression. Our results indicate the proposed scheme delivers accurate parameter estimates while employing only a single pass through the data.
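To give a feel for what a "shrinkage" rejuvenation step does (shrink resampled particles toward their weighted mean and add just enough noise to preserve the first two moments, in the spirit of kernel smoothing), here is a schematic one-pass filter for a trivially simple model. This is my own illustration with invented data and tuning constants, not the 1PFS algorithm itself:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# One-pass inference for the mean of N(mu, 1) data arriving in blocks.
data = rng.normal(1.5, 1.0, size=5000)
batches = np.array_split(data, 50)

n_part, a = 2000, 0.98                       # number of particles; shrinkage factor
particles = rng.normal(0.0, 3.0, n_part)     # draws from a diffuse N(0, 9) prior
for batch in batches:
    # reweight the particles by the likelihood of the newly seen block of data
    log_w = stats.norm.logpdf(batch[:, None], particles, 1.0).sum(axis=0)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # resample, then rejuvenate with a shrinkage kernel: pull each particle toward the
    # weighted mean and add noise so the weighted mean and variance are preserved
    m = np.sum(w * particles)
    v = np.sum(w * (particles - m) ** 2)
    idx = rng.choice(n_part, size=n_part, p=w)
    particles = a * particles[idx] + (1 - a) * m + rng.normal(0.0, np.sqrt((1 - a ** 2) * v), n_part)

print("posterior mean ~", round(particles.mean(), 3), " posterior sd ~", round(particles.std(), 4))
```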

Journal ArticleDOI
TL;DR: A hierarchical dynamic Bayesian network is proposed to model the spiking patterns of neuronal ensembles over time, introducing at separate model stages the parameters characterizing the discrete-time spiking process, the unknown structure of the functional connections among the analysed neurons, and its dependence on their spatial arrangement.
Abstract: This paper illustrates a novel hierarchical dynamic Bayesian network modelling the spiking patterns of neuronal ensembles over time. We introduce, at separate model stages, the parameters characterizing the discrete-time spiking process, the unknown structure of the functional connections among the analysed neurons and its dependence on their spatial arrangement. Estimates for all model parameters and predictions for future spiking states are computed under the Bayesian paradigm via the standard Gibbs sampler using a shrinkage prior. The adequacy of the model is investigated by plotting the residuals and by applying the time-rescaling theorem. We analyse a simulated dataset and a set of experimental multiple spike trains obtained from a culture of neurons in vitro. For the latter data, we find that one neuron plays a pivotal role for the initiation of each cycle of network activity and that the estimated network structure significantly depends on the spatial arrangement of the neurons. © 2006 International Society for Bayesian Analysis.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the occurrence of two words, \Bayes" and \Bayesian," since 1970 in journal articles in a variety of disciplines, with a focus on economics and statistics.
Abstract: To measure the impact of Bayesian reasoning, this paper investigates the occurrence of two words, \Bayes" and \Bayesian," since 1970 in journal articles in a variety of disciplines, with a focus on economics and statistics. The growth in statistics is documented, but the growth in economics is largely conned to economic theory/mathematical economics rather than econometrics.

Journal ArticleDOI
TL;DR: A class of multi-scale models for time series is introduced that exhibits a variety of autocorrelation structures based on a parsimonious parameterization, has the ability to combine information across levels of resolution, and has the capacity to emulate long memory processes.
Abstract: We introduce a class of multi-scale models for time series. The novel framework couples standard linear models at different levels of resolution via stochastic links across scales. Jeffrey's rule of conditioning is used to revise the implied distributions and ensure that the probability distributions at the different levels are strictly compatible. This results in a new class of models for time series with three key characteristics: this class exhibits a variety of autocorrelation structures based on a parsimonious parameterization, it has the ability to combine information across levels of resolution, and it also has the capacity to emulate long memory processes. The potential applications of such multi-scale models include problems in which it is of interest to develop consistent stochastic models across time-scales and levels of resolution, in order to coherently combine and integrate information arising at different levels of resolution. Bayesian estimation based on MCMC analysis and forecasting based on simulation are developed. One application to the analysis of the flow of a river illustrates the new class of models and its utility.

Journal ArticleDOI
TL;DR: As stated by the authors, the use of multilevel models has grown substantially over the last few years, but there are a number of competing methods proposed for their estimation, both Bayesian and likelihood based.
Abstract: As stated by the authors, the use of multilevel models has grown substantially over the last few years. However, as listed in the first paragraph of section 1, there are a number of competing methods proposed for their estimation, both Bayesian and likelihood based. Within the Bayesian framework there is of course the added issue of the choice of prior distributions for the various model parameters. It is worth noting here that the increased use of Bayesian methods over the last decade or so has not necessarily been due to a philosophical shift, but rather a desire to fit complex models, with software such as WinBUGS enabling users to do this. Many of these users want their ‘data to dominate’ and therefore want all prior distributions to be non-informative. However, this is rarely straightforward and in hierarchical models it is the choice of prior distribution for the hierarchical variance parameters that has been shown to be most crucial, particularly in small samples. In earlier work we conducted a simulation study on the choice of prior distribution for the variance component (between study variance) in a meta-analysis of aggregated data (Lambert et al. 2005). One of the advantages of using aggregated data is that models are quicker to fit and we were able to compare 13 different prior distributions for 9 different scenarios. When the number of level 2 units is large the choice of prior distribution becomes less important. However, for many real applications in medicine one would expect the number of level 2 units to be small, for example meta-analysis (Sutton and Abrams 2001) and cluster randomised trials (Turner et al. 2001). It is to the situations where there are only a small number of level 2 units that I wish to address most of my comments.

Journal ArticleDOI
TL;DR: A novel Bayesian method for the analysis of comparative experiments performed with oligonucleotide microarrays that models gene expression data by log-normal and gamma distributions with hierarchical prior distributions on the parameters of interest, and uses model averaging to compute the posterior probability of differential expression.
Abstract: A major challenge to the statistical analysis of microarray data is the small number of samples, limited by both cost and sample availability, compared to the large number of genes, now soaring into the tens of thousands per experiment. This situation is made even more difficult by the complex nature of the empirical distributions of gene expression measurements and the necessity to limit the number of false detections due to multiple comparisons. This paper introduces a novel Bayesian method for the analysis of comparative experiments performed with oligonucleotide microarrays. Our method models gene expression data by log-normal and gamma distributions with hierarchical prior distributions on the parameters of interest, and uses model averaging to compute the posterior probability of differential expression. An initial approximate Bayesian analysis is used to identify genes that have a large probability of differential expression, and this list of candidate genes is further refined by using stochastic computations. We assess the performance of this method using real data sets and show that it has an almost negligible false positive rate in small sample experiments that leads to a better detection performance.

Journal ArticleDOI
TL;DR: The subjective-objective dialogue between Goldstein and Berger as mentioned in this paper lays out strong cases for what seem to be two schools of Bayesian thought. But a closer look suggests to me that while both authors address the pragmatics of their approaches, only one qualifies as a school of thought.
Abstract: The subjective-objective dialogue between Goldstein (2006) and Berger (2006) lays out strong cases for what seem to be two schools of Bayesian thought. But a closer look suggests to me that while both authors address the pragmatics of their approaches, only one qualifies as a school of thought. In these comments I address briefly seven dimensions: the history of Bayesian thought, the different roles for a Bayesian approach, the subjectivity of scientists and the illusion of objectivity, the subjectivity of the likelihood function, the difficulty in separating likelihood from prior, pragmatism, and the fruitless search for the objective prior.