
Showing papers on "Bayes' theorem" published in 2000


Journal ArticleDOI
TL;DR: A new framework for discovering interactions between genes based on multiple expression measurements is proposed and a method for recovering gene interactions from microarray data is described using tools for learning Bayesian networks.
Abstract: DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
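To make the factorization idea concrete, here is a hedged Python sketch (hypothetical three-gene network and simulated binary expression data, not the paper's method or its yeast data): it estimates conditional probability tables for a fixed structure A -> B, A -> C and evaluates the joint probability of an expression profile.

```python
# Toy sketch: a Bayesian network over three binary "genes" A -> B, A -> C.
# The structure, data, and gene names are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Simulated binary expression snapshots (columns = A, B, C).
A = rng.integers(0, 2, size=200)
B = (A ^ (rng.random(200) < 0.1)).astype(int)          # B tends to follow A
C = ((1 - A) ^ (rng.random(200) < 0.2)).astype(int)    # C tends to oppose A
data = np.column_stack([A, B, C])

def cpt(child, parents, data, alpha=1.0):
    """Conditional probability table P(child = 1 | parent configuration),
    estimated with Laplace smoothing alpha."""
    table = {}
    parent_vals = data[:, parents] if parents else np.zeros((len(data), 0), dtype=int)
    for config in {tuple(row) for row in parent_vals}:
        mask = np.all(parent_vals == config, axis=1)
        ones = data[mask, child].sum()
        table[config] = (ones + alpha) / (mask.sum() + 2 * alpha)
    return table

# Structure: A has no parents; B and C each have parent A.
p_a = cpt(0, [], data)
p_b = cpt(1, [0], data)
p_c = cpt(2, [0], data)

def joint(a, b, c):
    """Joint probability under the factorization P(A) P(B|A) P(C|A)."""
    pa = p_a[()] if a == 1 else 1 - p_a[()]
    pb = p_b[(a,)] if b == 1 else 1 - p_b[(a,)]
    pc = p_c[(a,)] if c == 1 else 1 - p_c[(a,)]
    return pa * pb * pc

print("P(A=1, B=1, C=0) ~", round(joint(1, 1, 0), 3))
```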

3,507 citations


Journal ArticleDOI
TL;DR: For variable selection in the normal linear model, the authors propose empirical Bayes selection criteria that use hyperparameter estimates instead of fixed choices; their performance is seen to approximate adaptively the performance of the best fixed-penalty criterion across a variety of orthogonal and nonorthogonal set-ups, including wavelet regression.
Abstract: For the problem of variable selection for the normal linear model, selection criteria such as AIC, Cp, BIC and RIC have fixed dimensionality penalties. Such criteria are shown to correspond to selection of maximum posterior models under implicit hyperparameter choices for a particular hierarchical Bayes formulation. Based on this calibration, we propose empirical Bayes selection criteria that use hyperparameter estimates instead of fixed choices. For obtaining these estimates, both marginal and conditional maximum likelihood methods are considered. As opposed to traditional fixed penalty criteria, these empirical Bayes criteria have dimensionality penalties that depend on the data. Their performance is seen to approximate adaptively the performance of the best fixed-penalty criterion across a variety of orthogonal and nonorthogonal set-ups, including wavelet regression. Empirical Bayes shrinkage estimators of the selected coefficients are also proposed.
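For readers unfamiliar with the fixed-penalty criteria the abstract contrasts with, the following hedged sketch (simulated data, not from the paper) scores every subset of a small design under the usual AIC, BIC, and RIC penalties with known error variance; the empirical Bayes criteria proposed in the paper would replace these fixed penalties with data-dependent ones.

```python
# Hedged sketch: comparing fixed dimensionality penalties (AIC, BIC, RIC)
# for subset selection in a normal linear model with known error variance.
# Data and coefficients are simulated, not from the paper.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 50, 6, 1.0
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + np.sqrt(sigma2) * rng.standard_normal(n)

# per-coefficient penalties: AIC = 2, BIC = log n, RIC = 2 log p
penalties = {"AIC": 2.0, "BIC": np.log(n), "RIC": 2.0 * np.log(p)}

def rss(subset):
    if not subset:
        return float(y @ y)
    Xs = X[:, list(subset)]
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return float(resid @ resid)

best = {}
for name, pen in penalties.items():
    scores = {
        subset: rss(subset) / sigma2 + pen * len(subset)
        for k in range(p + 1)
        for subset in itertools.combinations(range(p), k)
    }
    best[name] = min(scores, key=scores.get)

for name, subset in best.items():
    print(f"{name}: selected variables {subset}")
```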

493 citations


Journal ArticleDOI
TL;DR: The foundations of the theory rest on a system of three axioms, in addition to Kolmogorov's axioms, together with definitions of independence and of conditional probability; the resulting theory does not depend on any particular interpretation of the probability concept.

384 citations


Journal ArticleDOI
TL;DR: A full structured review of applications of Bayesian methods to randomised controlled trials, observational studies, and the synthesis of evidence, in a form which should be reasonably straightforward to update is provided.
Abstract: Background Bayesian methods may be defined as the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting of a health technology assessment. In outline, the methods involve formal combination through the use of Bayes's theorem of: 1. a prior distribution or belief about the value of a quantity of interest (for example, a treatment effect) based on evidence not derived from the study under analysis, with 2. a summary of the information concerning the same quantity available from the data collected in the study (known as the likelihood), to yield 3. an updated or posterior distribution of the quantity of interest. These methods thus directly address the question of how new evidence should change what we currently believe. They extend naturally into making predictions, synthesising evidence from multiple sources, and designing studies: in addition, if we are willing to quantify the value of different consequences as a 'loss function', Bayesian methods extend into a full decision-theoretic approach to study design, monitoring and eventual policy decision-making. Nonetheless, Bayesian methods are a controversial topic in that they may involve the explicit use of subjective judgements in what is conventionally supposed to be a rigorous scientific exercise. Objectives This report is intended to provide: 1. a brief review of the essential ideas of Bayesian analysis 2. a full structured review of applications of Bayesian methods to randomised controlled trials, observational studies, and the synthesis of evidence, in a form which should be reasonably straightforward to update 3. a critical commentary on similarities and differences between Bayesian and conventional approaches 4. criteria for assessing the reporting of a Bayesian analysis 5. a comprehensive list of published 'three-star' examples, in which a proper prior distribution has been used for the quantity of primary interest 6. tutorial case studies of a variety of types 7. recommendations on how Bayesian methods and approaches may be assimilated into health technology assessments in a variety of contexts and by a variety of participants in the research process. Methods The BIDS ISI database was searched using the terms 'Bayes' or 'Bayesian'. This yielded almost 4000 papers published in the period 1990-98. All resultant abstracts were reviewed for relevance to health technology assessment; about 250 were so identified, and used as the basis for forward and backward searches. In addition EMBASE and MEDLINE databases were searched, along with websites of prominent authors, and available personal collections of references, finally yielding nearly 500 relevant references. A comprehensive review of all references describing use of 'proper' Bayesian methods in health technology assessment (those which update an informative prior distribution through the use of Bayes's theorem) has been attempted, and around 30 such papers are reported in structured form. There has been very limited use of proper Bayesian methods in practice, and relevant studies appear to be relatively easily identified. Results Bayesian methods in the health technology assessment context 1. Different contexts may demand different statistical approaches. Prior opinions are most valuable when the assessment forms part of a series of similar studies. A decision-theoretic approach may be appropriate where the consequences of a study are reasonably predictable. 2. 
The prior distribution is important and not unique, and so a range of options should be examined in a sensitivity analysis. Bayesian methods are best seen as a transformation from initial to final opinion, rather than providing a single 'correct' inference. 3. The use of a prior is based on judgement, and hence a degree of subjectivity cannot be avoided. However, subjective priors tend to show predictable biases, and archetypal priors may be useful for identifying a reasonable range of prior opinion.
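The prior/likelihood/posterior recipe in items 1-3 above has a simple closed form in the normal-normal conjugate case. The sketch below uses hypothetical numbers, not anything from the report, to update a sceptical prior about a treatment effect with a study estimate.

```python
# Hedged sketch of Bayes's theorem for a treatment effect (e.g. a log hazard
# ratio) with a normal prior and a normal likelihood; all numbers are made up.
import math

def normal_posterior(prior_mean, prior_sd, est, se):
    """Combine a N(prior_mean, prior_sd^2) prior with a study estimate `est`
    whose sampling distribution is N(effect, se^2)."""
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2          # precisions
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, math.sqrt(post_var)

# Sceptical prior centred on "no effect"; the study suggests a benefit.
post_mean, post_sd = normal_posterior(prior_mean=0.0, prior_sd=0.3,
                                      est=-0.35, se=0.15)
print(f"posterior effect: {post_mean:.3f} +/- {post_sd:.3f}")
print(f"95% credible interval: ({post_mean - 1.96*post_sd:.3f}, "
      f"{post_mean + 1.96*post_sd:.3f})")
```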

364 citations


Journal ArticleDOI
TL;DR: A revision of the penalty term in BIC is proposed so that it is defined in terms of the number of uncensored events instead of the number of observations, which corresponds to a more realistic prior on the parameter space and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
Abstract: We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995, Journal of the American Statistical Association 90, 928-934) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the penalty term in BIC so that it is defined in terms of the number of uncensored events instead of the number of observations. For a simple censored data model, this revision results in a better approximation to the exact Bayes factor based on a conjugate unit-information prior. In the Cox proportional hazards regression model, we propose defining BIC in terms of the maximized partial likelihood. Using the number of deaths rather than the number of individuals in the BIC penalty term corresponds to a more realistic prior on the parameter space and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
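A minimal sketch of the proposed penalty change, with hypothetical maximized partial log-likelihood values rather than Cardiovascular Health Study results: BIC is computed once with log(number of subjects) and once with log(number of events) in the penalty.

```python
# Hedged sketch: BIC for a Cox model from the maximized partial log-likelihood,
# with the penalty based on the number of events rather than observations.
# The log-likelihood values below are hypothetical placeholders.
import numpy as np

def bic(log_partial_lik, n_params, n_units):
    return -2.0 * log_partial_lik + n_params * np.log(n_units)

n_subjects, n_events = 4000, 250          # heavy censoring
candidates = {
    # model name: (maximized partial log-likelihood, number of coefficients)
    "age + sbp":           (-1502.3, 2),
    "age + sbp + smoking": (-1499.8, 3),
}

for name, (ll, k) in candidates.items():
    print(f"{name:>20}: BIC(subjects) = {bic(ll, k, n_subjects):8.1f}   "
          f"BIC(events) = {bic(ll, k, n_events):8.1f}")
```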

273 citations


Journal ArticleDOI
TL;DR: This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called LBR, which can be justified by a variant of Bayes' theorem that supports a weaker conditional attribute independence assumption than is required by naive Bayes.
Abstract: The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and generates a local naive Bayesian classifier at each leaf. The tests leading to a leaf can alleviate attribute inter-dependencies for the local naive Bayesian classifier. However, Bayesian tree learning still suffers from the small disjunct problem of tree learning. While inferred Bayesian trees demonstrate low average prediction error rates, there is reason to believe that error rates will be higher for those leaves with few training examples. This paper proposes the application of lazy learning techniques to Bayesian tree induction and presents the resulting lazy Bayesian rule learning algorithm, called LBR. This algorithm can be justified by a variant of Bayes theorem which supports a weaker conditional attribute independence assumption than is required by naive Bayes. For each test example, it builds a most appropriate rule with a local naive Bayesian classifier as its consequent. It is demonstrated that the computational requirements of LBR are reasonable in a wide cross-section of natural domains. Experiments with these domains show that, on average, this new algorithm obtains lower error rates significantly more often than the reverse in comparison to a naive Bayesian classifier, C4.5, a Bayesian tree learning algorithm, a constructive Bayesian classifier that eliminates attributes and constructs new attributes using Cartesian products of existing nominal attributes, and a lazy decision tree learning algorithm. It also outperforms, although the result is not statistically significant, a selective naive Bayesian classifier.
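The sketch below illustrates only the core idea behind a lazy Bayesian rule: condition the training data on an antecedent relevant to the test case, then apply a local naive Bayes to the remaining attributes. LBR's actual antecedent search is omitted and the toy data are invented.

```python
# Hedged sketch of the idea behind a lazy Bayesian rule: filter the training
# data by an antecedent (attribute-value tests), then apply naive Bayes to the
# attributes not fixed by the antecedent. Not the LBR algorithm itself.
from collections import Counter

train = [  # (attributes, class) -- made-up data
    ({"outlook": "sunny", "wind": "weak",   "humid": "high"},   "no"),
    ({"outlook": "sunny", "wind": "strong", "humid": "high"},   "no"),
    ({"outlook": "rain",  "wind": "weak",   "humid": "normal"}, "yes"),
    ({"outlook": "rain",  "wind": "strong", "humid": "normal"}, "no"),
    ({"outlook": "sunny", "wind": "weak",   "humid": "normal"}, "yes"),
]

def local_naive_bayes(antecedent, x, alpha=1.0):
    """P(class | x): naive Bayes over the data matching the antecedent, using
    only the attributes not fixed by the antecedent."""
    subset = [(a, c) for a, c in train
              if all(a[k] == v for k, v in antecedent.items())]
    classes = Counter(c for _, c in subset)
    rest = [k for k in x if k not in antecedent]
    scores = {}
    for c, n_c in classes.items():
        score = (n_c + alpha) / (len(subset) + alpha * len(classes))
        for attr in rest:
            match = sum(1 for a, cc in subset if cc == c and a[attr] == x[attr])
            score *= (match + alpha) / (n_c + alpha * 2)   # assumes 2-valued attrs
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

test = {"outlook": "sunny", "wind": "strong", "humid": "normal"}
print(local_naive_bayes({"outlook": "sunny"}, test))
```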

262 citations



Journal ArticleDOI
TL;DR: This book introduces statistical concepts, prior structures, posterior smoothing, and Bayes-Stein estimation, and discusses models with several unknown parameters.
Abstract: Contents: 1. Introductory statistical concepts; 2. The discrete version of Bayes' theorem; 3. Models with a single unknown parameter; 4. The expected utility hypothesis and its alternatives; 5. Models with several unknown parameters; 6. Prior structures, posterior smoothing, and Bayes-Stein estimation. Includes a guide to worked examples and a guide to self-study exercises.

221 citations


Posted Content
TL;DR: This paper explored the similarities and differences between classical and Bayesian methods and showed that they result in virtually equivalent conditional estimates of partworths for customers, and that the choice between Bayesian and classical estimation becomes one of implementation convenience and philosophical orientation, rather than pragmatic usefulness.
Abstract: An exciting development in modeling has been the ability to estimate reliable individual-level parameters for choice models. Individual partworths derived from these parameters have been very useful in segmentation, in identifying extreme individuals, and in creating appropriate choice simulators. In marketing, hierarchical Bayes models have taken the lead in combining information about the aggregate distribution of tastes with the individual's choices to arrive at a conditional estimate of the individual's parameters. In economics, the same behavioral model has been derived from a classical rather than a Bayesian perspective. That is, instead of Gibbs sampling, the method of maximum simulated likelihood provides estimates of both the aggregate and the individual parameters. This paper explores the similarities and differences between classical and Bayesian methods and shows that they result in virtually equivalent conditional estimates of partworths for customers. Thus, the choice between Bayesian and classical estimation becomes one of implementation convenience and philosophical orientation, rather than pragmatic usefulness.

213 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose empirical Bayes (EB) prior selection methods for various error distributions, including the normal and the heavier-tailed Student t-distribution, and obtain threshold shrinkage estimators based on model selection and multiple-shrinkage estimators based on model averaging.
Abstract: Summary. Wavelet shrinkage estimation is an increasingly popular method for signal denoising and compression. Although Bayes estimators can provide excellent mean-squared error (MSE) properties, the selection of an effective prior is a difficult task. To address this problem, we propose empirical Bayes (EB) prior selection methods for various error distributions including the normal and the heavier-tailed Student t-distributions. Under such EB prior distributions, we obtain threshold shrinkage estimators based on model selection, and multiple-shrinkage estimators based on model averaging. These EB estimators are seen to be computationally competitive with standard classical thresholding methods, and to be robust to outliers in both the data and wavelet domains. Simulated and real examples are used to illustrate the flexibility and improved MSE performance of these methods in a wide variety of settings.

188 citations


Journal ArticleDOI
TL;DR: This paper shows how to apply the naive Bayes methodology to numeric prediction tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces “model trees”—decision trees with linear regression functions at the leaves.
Abstract: Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates. This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e., regression) tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces “model trees”—decision trees with linear regression functions at the leaves. Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than locally weighted linear regression and model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, while for another it is worse. We also show that standard naive Bayes applied to regression problems by discretizing the target value performs similarly badly. We then present empirical evidence that isolates naive Bayes' independence assumption as the culprit for its poor performance in the regression setting. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.
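The sketch below follows the general recipe the abstract describes, with the class-conditional densities of naive Bayes replaced by kernel density estimates and the prediction taken as the mean of the approximate posterior over a grid of target values; the data are simulated and the estimator details are not necessarily the paper's.

```python
# Hedged sketch of naive Bayes for regression: p(y | x) is approximated as
# p(y) * prod_j p(x_j | y), with every density replaced by a kernel density
# estimate, and the prediction is the posterior mean over a grid of y values.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
n = 400
X = rng.uniform(-2, 2, size=(n, 2))
y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.standard_normal(n)

kde_y = gaussian_kde(y)                                   # p(y)
kde_xy = [gaussian_kde(np.vstack([X[:, j], y])) for j in range(X.shape[1])]

def predict(x, grid=np.linspace(y.min(), y.max(), 200)):
    log_w = np.log(kde_y(grid))                           # log p(y)
    for j, kde in enumerate(kde_xy):
        pts = np.vstack([np.full_like(grid, x[j]), grid])
        # log p(x_j | y) = log p(x_j, y) - log p(y)
        log_w += np.log(kde(pts)) - np.log(kde_y(grid))
    w = np.exp(log_w - log_w.max())                       # stabilise, then average
    return float(np.sum(grid * w) / np.sum(w))

print("prediction at (1.0, -1.0):", round(predict([1.0, -1.0]), 3))
print("true mean at (1.0, -1.0): ", 1.0 - 0.5 * (-1.0))
```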

Journal ArticleDOI
TL;DR: This work proposes the use of the cross-validation posterior predictive distribution, obtained by reanalyzing the data without a suspect small area, as a method for assessing whether the observed count in the area is consistent with the model.
Abstract: Disease incidence or disease mortality rates for small areas are often displayed on maps. Maps of raw rates, disease counts divided by the total population at risk, have been criticized as unreliable due to non-constant variance associated with heterogeneity in base population size. This has led to the use of model-based Bayes or empirical Bayes point estimates for map creation. Because the maps have important epidemiological and political consequences, for example, they are often used to identify small areas with unusually high or low unexplained risk, it is important that the assumptions of the underlying models be scrutinized. We review the use of posterior predictive model checks, which compare features of the observed data to the same features of replicate data generated under the model, for assessing model fitness. One crucial issue is whether extrema are potentially important epidemiological findings or merely evidence of poor model fit. We propose the use of the cross-validation posterior predictive distribution, obtained by reanalyzing the data without a suspect small area, as a method for assessing whether the observed count in the area is consistent with the model. Because it may not be feasible to actually reanalyze the data for each suspect small area in large data sets, two methods for approximating the cross-validation posterior predictive distribution are described.
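A hedged sketch of the cross-validation idea in a deliberately simplified Poisson-gamma disease-mapping model: the suspect area is left out, the gamma hyperparameters are crudely moment-matched from the remaining areas (an empirical Bayes plug-in), and the observed count is compared with the resulting gamma-Poisson (negative binomial) predictive. The data are simulated, not the paper's.

```python
# Hedged sketch of a cross-validation predictive check for one small area,
# using a simplified model: y_i ~ Poisson(theta_i * E_i), theta_i ~ Gamma(a, b).
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(3)
E = rng.uniform(5, 50, size=30)                      # expected counts per area
theta = rng.gamma(shape=8, scale=1 / 8, size=30)     # true relative risks
y = rng.poisson(theta * E)
y[7] = int(3 * E[7])                                 # plant a suspicious excess

def cv_predictive_pvalue(i):
    """P(Y_i >= y_i) under the predictive fitted without area i."""
    rates = np.delete(y, i) / np.delete(E, i)
    m, v = rates.mean(), rates.var()                 # crude moment matching,
    b = m / v                                        # ignoring Poisson noise
    a = m * b
    p = b / (b + E[i])                               # gamma-Poisson -> neg. binomial
    return nbinom.sf(y[i] - 1, a, p)

for i in [0, 7]:
    print(f"area {i}: observed {y[i]}, expected {E[i]:.1f}, "
          f"cross-validation predictive P(Y >= obs) = {cv_predictive_pvalue(i):.4f}")
```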

Journal ArticleDOI
01 Jul 2000-Genetics
TL;DR: This study models the binary trait under the classical threshold model of quantitative genetics, and shows that Bayesian statistics are particularly useful for mapping QTL for complex binary traits.
Abstract: A complex binary trait is a character that has a dichotomous expression but with a polygenic genetic background. Mapping quantitative trait loci (QTL) for such traits is difficult because of the discrete nature and the reduced variation in the phenotypic distribution. Bayesian statistics are proved to be a powerful tool for solving complicated genetic problems, such as multiple QTL with nonadditive effects, and have been successfully applied to QTL mapping for continuous traits. In this study, we show that Bayesian statistics are particularly useful for mapping QTL for complex binary traits. We model the binary trait under the classical threshold model of quantitative genetics. The Bayesian mapping statistics are developed on the basis of the idea of data augmentation. This treatment allows an easy way to generate the value of a hypothetical underlying variable (called the liability) and a threshold, which in turn allow the use of existing Bayesian statistics. The reversible jump Markov chain Monte Carlo algorithm is used to simulate the posterior samples of all unknowns, including the number of QTL, the locations and effects of identified QTL, genotypes of each individual at both the QTL and markers, and eventually the liability of each individual. The Bayesian mapping ends with an estimation of the joint posterior distribution of the number of QTL and the locations and effects of the identified QTL. Utilities of the method are demonstrated using a simulated outbred full-sib family. A computer program written in FORTRAN language is freely available on request.
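The liability and data-augmentation ideas can be shown in miniature with the Albert-Chib Gibbs sampler for a probit threshold model; this is a stand-in, not the paper's reversible-jump QTL sampler, and the marker covariates are simulated.

```python
# Hedged sketch of data augmentation under a threshold model: a Gibbs sampler
# for probit regression that alternates between drawing the latent liabilities
# (truncated normals) and the effects. Simulated "marker" covariates.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)
n, p = 300, 3
X = np.column_stack([np.ones(n), rng.integers(0, 2, size=(n, p - 1))])
beta_true = np.array([-0.3, 1.0, 0.6])
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(int)   # observed binary trait

beta = np.zeros(p)
XtX_inv = np.linalg.inv(X.T @ X)         # flat prior on beta, for simplicity
draws = []
for it in range(1000):
    # 1) liabilities: N(mu, 1) truncated to (0, inf) if y = 1, (-inf, 0) if y = 0
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)
    hi = np.where(y == 1, np.inf, -mu)
    liab = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
    # 2) effects given liabilities: conditional normal
    mean = XtX_inv @ X.T @ liab
    beta = rng.multivariate_normal(mean, XtX_inv)
    if it >= 200:
        draws.append(beta)

print("posterior mean effects:", np.round(np.mean(draws, axis=0), 2))
print("true effects:          ", beta_true)
```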

Journal Article
TL;DR: This paper discusses how a standard Inductive Logic Programming (ILP) system, Progol, has been modified to support learning of SLPs and shows that maximising the Bayesian posterior function involves finding SLPs with short derivations of the examples.
Abstract: Stochastic Logic Programs (SLPs) have been shown to be a generalisation of Hidden Markov Models (HMMs), stochastic context-free grammars, and directed Bayes’ nets. A stochastic logic program consists of a set of labelled clauses p:C where p is in the interval [0,1] and C is a first-order range-restricted definite clause. This paper summarises the syntax, distributional semantics and proof techniques for SLPs and then discusses how a standard Inductive Logic Programming (ILP) system, Progol, has been modified to support learning of SLPs. The resulting system 1) finds an SLP with uniform probability labels on each definition and near-maximal Bayes posterior probability and then 2) alters the probability labels to further increase the posterior probability. Stage 1) is implemented within CProgol4.5, which differs from previous versions of Progol by allowing user-defined evaluation functions written in Prolog. It is shown that maximising the Bayesian posterior function involves finding SLPs with short derivations of the examples. Search pruning with the Bayesian evaluation function is carried out in the same way as in previous versions of CProgol. The system is demonstrated with worked examples involving the learning of probability distributions over sequences as well as the learning of simple forms of uncertain knowledge.

Journal ArticleDOI
TL;DR: A review article surveying the past, present, and future of empirical Bayes methods.
Abstract: Empirical Bayes: Past, Present and Future. Journal of the American Statistical Association, Vol. 95, No. 452 (2000), pp. 1286-1289.

Journal ArticleDOI
TL;DR: All Bayes estimators for proper Gaussian priors have zero asymptotic efficiency in this minimax sense, and a class of priors whose Bayes procedures attain the optimal minimax rate of convergence is presented.
Abstract: We study the Bayesian approach to nonparametric function estimation problems such as nonparametric regression and signal estimation. We consider the asymptotic properties of Bayes procedures for conjugate (= Gaussian) priors. We show that so long as the prior puts nonzero measure on the very large parameter set of interest then the Bayes estimators are not satisfactory. More specifically, we show that these estimators do not achieve the correct minimax rate over norm bounded sets in the parameter space. Thus all Bayes estimators for proper Gaussian priors have zero asymptotic efficiency in this minimax sense. We then present a class of priors whose Bayes procedures attain the optimal minimax rate of convergence. These priors may be viewed as compound, or hierarchical, mixtures of suitable Gaussian distributions.

Journal ArticleDOI
TL;DR: The efficient numerical solution of the FPKE presented here relies on adaptively calculating the domain over which the state probability density function is to be evaluated, which is done using Chebyshev's inequality.
Abstract: The Fokker-Planck-Kolmogorov equation (FPKE) in conjunction with Bayes conditional density update formula provides optimal estimates for a general continuous-discrete nonlinear filtering problem. It is well known that the analytical solution of FPKE and Bayes formula are extremely difficult to obtain except in a few special cases. Hence, we address this problem using numerical approaches. The efficient numerical solution of FPKE presented relies on the key issue of adaptively calculating the domain over which the state probability density function is to be evaluated, which is done using Chebyshev's inequality. Application to a passive tracking example shows that this approach can provide consistent estimators when measurement nonlinearities and noise levels are high.
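A deliberately crude 1-D sketch of the predict/update cycle, not the paper's FPKE solver: the density lives on a grid whose extent is chosen from Chebyshev's inequality, the prediction step diffuses it with the process-noise kernel, and Bayes' formula folds in each measurement.

```python
# Hedged sketch: 1-D grid-based Bayes filter for a random-walk state with
# Gaussian measurement noise. The grid is re-centred each step using
# Chebyshev's inequality: mean +/- k*std contains at least 1 - 1/k^2 of the mass.
import numpy as np

rng = np.random.default_rng(5)
q, r = 0.2, 0.5                       # process and measurement noise variances
eps = 0.01                            # allowed truncated probability mass
k = 1.0 / np.sqrt(eps)                # P(|X - mu| >= k*sigma) <= 1/k^2 = eps

x_true = 0.0
grid = np.linspace(-5.0, 5.0, 401)
dx = grid[1] - grid[0]
p = np.exp(-0.5 * grid**2)            # initial density ~ N(0, 1)
p /= p.sum() * dx

for t in range(20):
    # simulate the system and a measurement
    x_true += rng.normal(0.0, np.sqrt(q))
    z = x_true + rng.normal(0.0, np.sqrt(r))

    # adapt the grid to mean +/- k*std (Chebyshev bound), with a diffusion margin
    mean = (grid * p).sum() * dx
    std = np.sqrt(((grid - mean) ** 2 * p).sum() * dx)
    new_grid = np.linspace(mean - k * std - 3 * np.sqrt(q),
                           mean + k * std + 3 * np.sqrt(q), 401)
    p = np.interp(new_grid, grid, p, left=0.0, right=0.0)
    grid, dx = new_grid, new_grid[1] - new_grid[0]

    # prediction step: diffuse the density with the process-noise kernel
    offsets = dx * np.arange(-100, 101)
    kernel = np.exp(-0.5 * offsets**2 / q)
    p = np.convolve(p, kernel, mode="same")

    # Bayes update: multiply by the measurement likelihood and renormalize
    p *= np.exp(-0.5 * (z - grid) ** 2 / r)
    p /= p.sum() * dx

post_mean = (grid * p).sum() * dx
print(f"true state: {x_true:.3f}   posterior mean: {post_mean:.3f}")
```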

Proceedings Article
29 Jun 2000
TL;DR: A new model for studying mul-titask learning is presented, linking theoretical results to practical simulations, and experimental curves for "learning to learn" that can be linked to theoretical results obtained elsewhere are derived.
Abstract: We present a new model for studying mul-titask learning, linking theoretical results to practical simulations. In our model all tasks are combined in a single feedforward neu-ral network. Learning is implemented in a Bayesian fashion. In this Bayesian framework the hidden-to-output weights, being speciic to each task, play the role of model parameters. The input-to-hidden weights, which are shared between all tasks, are treated as hyperparameters. Other hyper-parameters describe error variance and correlations and priors for the model parameters. An important feature of our model is that the probability of these hyperparam-eters given the data can be computed ex-plicitely and only depends on a set of suu-cient statistics. None of these statistics scales with the number of tasks or patterns, which makes empirical Bayes for multitask learning a relatively straightforward optimization problem. Simulations on real-world data sets on single-copy newspaper and magazine sales illustrate properties of multitask learning. Most notably we derive experimental curves for \learning to learn" that can be linked to theoretical results obtained elsewhere.

12 Sep 2000
TL;DR: Evidence is given that the posterior distribution of Naive Bayes goes to zero or one exponentially with document length, and one parametric family is investigated that attempts to downweight the growth rate.
Abstract: In this paper, we give evidence that the posterior distribution of Naive Bayes goes to zero or one exponentially with document length. While exponential change may be expected as new bits of information are added, adding new words does not always correspond to new information. Essentially as a result of its independence assumption, the estimates grow too quickly. We investigate one parametric family that attempts to downweight the growth rate. The parameters of this family are estimated using a maximum likelihood scheme, and the results are evaluated.
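A hedged illustration of the phenomenon with made-up unigram models: the naive Bayes log odds is a sum over words, so it grows roughly linearly with document length, and a simple length-based damping factor stands in for the paper's parametric downweighting family.

```python
# Hedged sketch of why naive Bayes posteriors saturate with document length,
# plus a simple length-based damping as a stand-in for the paper's family.
import numpy as np

rng = np.random.default_rng(6)
vocab = 1000
# two hypothetical class-conditional unigram models that differ only slightly
p_a = rng.dirichlet(np.full(vocab, 50.0))
p_b = rng.dirichlet(np.full(vocab, 50.0))

def posterior_class_a(doc, damp=1.0):
    """P(class A | doc) with equal priors; damp < 1 shrinks the per-word
    evidence before it is summed."""
    log_odds = damp * np.sum(np.log(p_a[doc]) - np.log(p_b[doc]))
    return 1.0 / (1.0 + np.exp(-log_odds))

for length in [10, 100, 1000, 10000]:
    doc = rng.choice(vocab, size=length, p=p_a)       # document drawn from class A
    raw = posterior_class_a(doc)
    damped = posterior_class_a(doc, damp=1.0 / np.sqrt(length))
    print(f"length {length:6d}: raw posterior {raw:.4f}   damped {damped:.4f}")
```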

Journal ArticleDOI
TL;DR: A schema is proposed that enables the use of classification methods - including machine learning classifiers - for survival analysis, and it is applied to the problem of building prognostic models for prostate cancer recurrence, where only the prediction of the probability of the event (and not its dependency on time) is of interest.


Journal ArticleDOI
TL;DR: It is shown that uncertainty in decisions is taken into account under a Bayesian formalism and that this may be used to reject uncertain samples, thus dramatically improving system performance.
Abstract: Preliminary results from real-time 'brain-computer interface' experiments are presented. The analysis is based on autoregressive modelling of a single EEG channel coupled with classification and temporal smoothing under a Bayesian paradigm. It is shown that uncertainty in decisions is taken into account under such a formalism and that this may be used to reject uncertain samples, thus dramatically improving system performance. Using the strictest rejection method, a classification performance of 86.5 +/- 6.9% is achieved over a set of seven subjects in two-way cursor movement experiments.
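The rejection idea can be sketched without EEG data or AR modelling: simulate Gaussian class-conditional features, compute posterior class probabilities, and refuse to classify samples whose posterior is close to 0.5. The numbers below are simulated and are not the paper's results.

```python
# Hedged sketch of posterior-based rejection: a two-class Gaussian classifier on
# simulated features; samples with posteriors near 0.5 are rejected, trading
# coverage for accuracy. Not the paper's EEG/autoregressive pipeline.
import numpy as np

rng = np.random.default_rng(7)
n = 2000
labels = rng.integers(0, 2, size=n)
features = rng.normal(loc=np.where(labels == 1, 0.8, -0.8)[:, None],
                      scale=1.0, size=(n, 3))

def posterior_class1(x, mu0=-0.8, mu1=0.8, sigma=1.0):
    """P(class 1 | x) for equal priors and isotropic Gaussian class conditionals."""
    log_odds = np.sum((x - mu0) ** 2 - (x - mu1) ** 2, axis=1) / (2 * sigma**2)
    return 1.0 / (1.0 + np.exp(-log_odds))

post = posterior_class1(features)
pred = (post > 0.5).astype(int)

for reject_band in [0.0, 0.1, 0.3]:
    keep = np.abs(post - 0.5) >= reject_band
    acc = np.mean(pred[keep] == labels[keep])
    print(f"reject band +/-{reject_band:.1f}: accepted {keep.mean():5.1%}, "
          f"accuracy on accepted {acc:5.1%}")
```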

Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach to inference in reliability studies based on type II doubly censored data from a Rayleigh distribution is presented, which can be used to predict the failure-time of a k-out-of-m system.

Journal ArticleDOI
TL;DR: A Bayesian procedure for drawing inferences about specific models for spatial clustering is developed; it yields estimates of disease rates and allows for greater flexibility in both the type of clusters and the number of clusters that may be considered.
Abstract: Many current statistical methods for disease clustering studies are based on a hypothesis testing paradigm. These methods typically do not produce useful estimates of disease rates or cluster risks. In this paper, we develop a Bayesian procedure for drawing inferences about specific models for spatial clustering. The proposed methodology incorporates ideas from image analysis, from Bayesian model averaging, and from model selection. With our approach, we obtain estimates for disease rates and allow for greater flexibility in both the type of clusters and the number of clusters that may be considered. We illustrate the proposed procedure through simulation studies and an analysis of the well-known New York leukemia data.

Journal ArticleDOI
TL;DR: This paper investigates the use of the delta method for obtaining an approximate variance estimate for DIC, in order to attach significance to apparent differences between models, and illustrates the approach using a spatially misaligned data set relating a measure of traffic density to paediatric asthma hospitalizations in San Diego County, California.
Abstract: Bayes and empirical Bayes methods have proven effective in smoothing crude maps of disease risk, eliminating the instability of estimates in low-population areas while maintaining overall geographic trends and patterns. Recent work extends these methods to the analysis of areal data which are spatially misaligned, that is, involving variables (typically counts or rates) which are aggregated over differing sets of regional boundaries. The addition of a temporal aspect complicates matters further, since now the misalignment can arise either within a given time point, or across time points (as when the regional boundaries themselves evolve over time). Hierarchical Bayesian methods (implemented via modern Markov chain Monte Carlo computing methods) enable the fitting of such models, but a formal comparison of their fit is hampered by their large size and often improper prior specifications. In this paper, we accomplish this comparison using the deviance information criterion (DIC), a recently proposed generalization of the Akaike information criterion (AIC) designed for complex hierarchical model settings like ours. We investigate the use of the delta method for obtaining an approximate variance estimate for DIC, in order to attach significance to apparent differences between models. We illustrate our approach using a spatially misaligned data set relating a measure of traffic density to paediatric asthma hospitalizations in San Diego County, California.
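A hedged sketch of how DIC is computed from posterior draws for a toy normal-mean model; the paper's delta-method variance estimate for DIC is not reproduced here.

```python
# Hedged sketch of the deviance information criterion (DIC) from posterior
# draws for a toy normal-mean model: DIC = Dbar + pD, where Dbar is the
# posterior mean deviance and pD = Dbar - D(posterior mean).
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(1.0, 1.0, size=50)                     # data, known variance 1

# posterior for the mean under a flat prior: N(ybar, 1/n)
post_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=5000)

def deviance(mu):
    """-2 * log-likelihood of a N(mu, 1) model."""
    return np.sum((y[None, :] - np.atleast_1d(mu)[:, None]) ** 2, axis=1) \
        + len(y) * np.log(2 * np.pi)

d_bar = deviance(post_draws).mean()                   # posterior mean deviance
d_at_mean = deviance(post_draws.mean())[0]            # deviance at posterior mean
p_d = d_bar - d_at_mean                               # effective number of parameters
dic = d_bar + p_d

print(f"Dbar = {d_bar:.2f}, pD = {p_d:.2f}, DIC = {dic:.2f}")
```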

Proceedings Article
30 Jun 2000
TL;DR: It is argued that a person's utility value for a given outcome can be treated as the authors treat other domain attributes: as a random variable with a density function over its possible values.
Abstract: Decision theory does not traditionally include uncertainty over utility functions. We argue that a person's utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply statistical density estimation techniques to learn such a density function from a database of partially elicited utility functions. In particular, we define a Bayesian learning framework for this problem, assuming the distribution over utilities is a mixture of Gaussians, where the mixture components represent statistically coherent subpopulations. We can also extend our techniques to the problem of discovering generalized additivity structure in the utility functions in the population. We define a Bayesian model selection criterion for utility function structure and a search procedure over structures. The factorization of the utilities in the learned model, and the generalization obtained from density estimation, allows us to provide robust estimates of utilities using a significantly smaller number of utility elicitation questions. We experiment with our technique on synthetic utility data and on a real database of utility functions in the domain of prenatal diagnosis.
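A hedged sketch of the density-estimation step using scikit-learn's GaussianMixture on synthetic elicited utility values for a single outcome; it is not the paper's learning framework and says nothing about the additivity-structure search.

```python
# Hedged sketch: treat utility values for one outcome as a random variable, fit
# a mixture of Gaussians to (synthetic) elicited values, and read off the
# subpopulation responsibilities for a new respondent.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)
# two hypothetical subpopulations with different attitudes to the same outcome
utilities = np.concatenate([rng.normal(0.25, 0.05, size=120),
                            rng.normal(0.70, 0.10, size=80)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(utilities)
print("component means:  ", np.round(gmm.means_.ravel(), 2))
print("component weights:", np.round(gmm.weights_, 2))

new_value = np.array([[0.5]])                  # a partially elicited utility
print("subpopulation responsibilities for 0.5:",
      np.round(gmm.predict_proba(new_value)[0], 2))
```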

Journal ArticleDOI
TL;DR: In this paper, a generalized version of the BOD decay model is proposed, where the reaction is allowed to assume an order other than one, by making the exponent on BOD concentration a free parameter to be determined by the data.

Journal ArticleDOI
TL;DR: In this article, a simple expression for the retrodictive density operator is derived on the basis of Bayes' theorem for the premeasurement state associated with the result of any measurement.
Abstract: We derive on the basis of Bayes' theorem a simple but general expression for the retrodicted premeasurement state associated with the result of any measurement. The retrodictive density operator is the normalized probability operator measure element associated with the result. We examine applications to quantum optical cryptography and to the optical beam splitter.
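A hedged numerical illustration of retrodiction on a simple qubit example of our own choosing: given prior preparation probabilities, the prepared states, and the POVM element for the observed result, Bayes' theorem gives the probability of each preparation, and with a uniform prior the retrodictive density operator is the normalized POVM element, as the abstract states.

```python
# Hedged illustration of quantum retrodiction via Bayes' theorem:
# P(prepared i | result j) is proportional to p_i * Tr(rho_i Pi_j).
# The states and measurement form a simple qubit setup, not the paper's example.
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ketp = (ket0 + ket1) / np.sqrt(2)
ketm = (ket0 - ket1) / np.sqrt(2)

def proj(ket):
    return np.outer(ket, ket.conj())

preparations = {"|0>": proj(ket0), "|1>": proj(ket1),
                "|+>": proj(ketp), "|->": proj(ketm)}
priors = {name: 0.25 for name in preparations}          # uniform preparation prior

Pi_0 = proj(ket0)                                       # POVM element for result "0"

# Bayes' theorem: P(prep | result 0) proportional to p_prep * Tr(rho_prep Pi_0)
joint = {name: priors[name] * np.trace(rho @ Pi_0).real
         for name, rho in preparations.items()}
total = sum(joint.values())
for name, val in joint.items():
    print(f"P({name} prepared | result 0) = {val / total:.3f}")

# With a uniform prior, the retrodictive density operator is Pi_0 / Tr(Pi_0)
print("retrodictive density operator:\n", Pi_0 / np.trace(Pi_0))
```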

Journal ArticleDOI
TL;DR: In this paper, the authors describe the theoretical basis of the Bayesian-based approach and illustrate its application with a practical example that investigates the prevalence of major cardiac defects in a cohort of children born using the assisted reproduction technique known as ICSI (intracytoplasmic sperm injection).
Abstract: Statistical analysis of both experimental and observational data is central to medical research. Unfortunately, the process of conventional statistical analysis is poorly understood by many medical scientists. This is due, in part, to the counter-intuitive nature of the basic tools of traditional (frequency-based) statistical inference. For example, the proper definition of a conventional 95% confidence interval is quite confusing. It is based upon the imaginary results of a series of hypothetical repetitions of the data generation process and subsequent analysis. Not surprisingly, this formal definition is often ignored and a 95% confidence interval is widely taken to represent a range of values that is associated with a 95% probability of containing the true value of the parameter being estimated. Working within the traditional framework of frequency-based statistics, this interpretation is fundamentally incorrect. It is perfectly valid, however, if one works within the framework of Bayesian statistics and assumes a 'prior distribution' that is uniform on the scale of the main outcome variable. This reflects a limited equivalence between conventional and Bayesian statistics that can be used to facilitate a simple Bayesian interpretation based on the results of a standard analysis. Such inferences provide direct and understandable answers to many important types of question in medical research. For example, they can be used to assist decision making based upon studies with unavoidably low statistical power, where non-significant results are all too often, and wrongly, interpreted as implying 'no effect'. They can also be used to overcome the confusion that can result when statistically significant effects are too small to be clinically relevant. This paper describes the theoretical basis of the Bayesian-based approach and illustrates its application with a practical example that investigates the prevalence of major cardiac defects in a cohort of children born using the assisted reproduction technique known as ICSI (intracytoplasmic sperm injection).
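A hedged sketch of the limited equivalence described above, with hypothetical numbers rather than the ICSI data: under a prior that is uniform on the outcome scale and a normal approximation, the posterior for the effect is N(estimate, SE^2), so direct probability statements are ordinary normal tail areas.

```python
# Hedged sketch: Bayesian reading of a conventional estimate under a flat prior.
# The estimate and standard error below are hypothetical.
from scipy.stats import norm

estimate, se = 1.8, 1.1        # e.g. an estimated difference in prevalence (%)

ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"conventional 95% CI: ({ci_low:.2f}, {ci_high:.2f})")

# Under a flat prior, the posterior for the effect is N(estimate, se^2)
print(f"P(effect > 0 | data)   = {1 - norm.cdf(0, loc=estimate, scale=se):.3f}")
print(f"P(effect > 2 | data)   = {1 - norm.cdf(2, loc=estimate, scale=se):.3f}")
print(f"P(|effect| < 1 | data) = "
      f"{norm.cdf(1, estimate, se) - norm.cdf(-1, estimate, se):.3f}")
```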

Proceedings Article
30 Jul 2000
TL;DR: This paper investigates two particular instantiations of the notion of restricted Bayes optimal classifiers. The first uses a non-parametric density estimator (Parzen windows with Gaussian kernels) and hyperplane decision boundaries, and is shown to be asymptotically equivalent to a maximal margin hyperplane classifier, a highly successful discriminative classifier.
Abstract: We introduce the notion of restricted Bayes optimal classifiers . These classifiers attempt to combine the flexibility of the generative approach to classification with the high accuracy associated with discriminative learning. They first create a model of the joint distribution over class labels and features. Instead of choosing the decision boundary induced directly from the model, they restrict the allowable types of decision boundaries and learn the one that minimizes the probability of misclassification relative to the estimated joint distribution. In this paper, we investigate two particular instantiations of this approach. The first uses a non-parametric density estimator — Parzen Windows with Gaussian kernels — and hyperplane decision boundaries. We show that the resulting classifier is asymptotically equivalent to a maximal margin hyperplane classifier, a highly successful discriminative classifier. We therefore provide an alternative justification for maximal margin hyperplane classifiers. The second instantiation uses a mixture of Gaussians as the estimated density; in experiments on real-world data, we show that this approach allows data with missing values to be handled in a principled manner, leading to improved performance over regular discriminative approaches.
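A hedged 1-D analogue of a restricted Bayes optimal classifier (simulated data, thresholds instead of the paper's hyperplanes): each class density is estimated with Parzen windows and the decision rule is restricted to a threshold, chosen to minimize the misclassification probability under the estimated joint distribution.

```python
# Hedged 1-D sketch of a restricted Bayes optimal classifier: Parzen-window
# (Gaussian KDE) class densities, decision rules restricted to thresholds
# "predict class 1 if x > t", threshold chosen to minimize estimated error.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(10)
x0 = rng.normal(-1.0, 1.0, size=150)          # class 0 samples
x1 = rng.normal(1.2, 0.8, size=100)           # class 1 samples
pi0, pi1 = len(x0) / 250, len(x1) / 250       # class priors

kde0, kde1 = gaussian_kde(x0), gaussian_kde(x1)

grid = np.linspace(-5, 5, 2001)
dx = grid[1] - grid[0]
f0, f1 = kde0(grid), kde1(grid)

def estimated_error(t):
    """P(error) under the KDE model for the rule 'class 1 iff x > t'."""
    false_pos = pi0 * f0[grid > t].sum() * dx      # class 0 mass above t
    false_neg = pi1 * f1[grid <= t].sum() * dx     # class 1 mass below t
    return false_pos + false_neg

errors = np.array([estimated_error(t) for t in grid])
best_t = grid[errors.argmin()]
print(f"chosen threshold: {best_t:.2f}, estimated error: {errors.min():.3f}")
```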