Author

Dipak K. Dey

Bio: Dipak K. Dey is an academic researcher from the University of Connecticut. The author has contributed to research in topics: Bayesian probability & Bayesian inference. The author has an h-index of 41 and has co-authored 310 publications receiving 9310 citations. Previous affiliations of Dipak K. Dey include the University of Rochester & the University of Connecticut Health Center.


Papers
Journal ArticleDOI
TL;DR: A general predictive density is presented which includes all proposed Bayesian approaches the authors are aware of and using Laplace approximations they can conveniently assess and compare asymptotic behavior of these approaches.
Abstract: Model determination is a fundamental data analytic task. Here we consider the problem of choosing amongst a finite (without loss of generality we assume two) set of models. After briefly reviewing classical and Bayesian model choice strategies we present a general predictive density which includes all proposed Bayesian approaches we are aware of. Using Laplace approximations we can conveniently assess and compare asymptotic behavior of these approaches. Concern regarding the accuracy of these approximations for small to moderate sample sizes encourages the use of Monte Carlo techniques to carry out exact calculations. A data set fit with nested nonlinear models enables comparison between proposals and between exact and asymptotic values.

1,233 citations

04 Dec 1992
TL;DR: Model determination is divided into the issues of model adequacy and model selection and it is proposed to validate conditional predictive distributions arising from single point deletion against observed responses.
Abstract: Model determination is divided into the issues of model adequacy and model selection. Predictive distributions are used to address both issues. This seems natural since, typically, prediction is a primary purpose for the chosen model. A cross-validation viewpoint is argued for. In particular, for a given model, it is proposed to validate conditional predictive distributions arising from single point deletion against observed responses. Sampling based methods are used to carry out required calculations. An example investigates the adequacy of and rather subtle choice between two sigmoidal growth models of the same dimension.
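
The single point deletion idea in this abstract can be sketched with posterior draws. The example below is only an illustration under stated assumptions: a toy normal model with known unit variance and a flat prior on the mean, simulated data, and a hypothetical helper `log_cpo`. It uses the standard identity that the leave-one-out predictive density of observation i equals the harmonic mean of its likelihood over draws from the full-data posterior. It is not a reproduction of the authors' own computations or example.

```python
import numpy as np

def log_cpo(y, mu_draws, sigma=1.0):
    """Log conditional predictive ordinate for each observation.

    Assumes y_i ~ N(mu, sigma^2) independently given mu, and that
    mu_draws are samples from the posterior of mu given all of y.
    CPO_i = 1 / mean_s( 1 / p(y_i | mu_s) ).
    """
    # Log-likelihood matrix: rows are posterior draws, columns are observations.
    z = (y[None, :] - mu_draws[:, None]) / sigma
    loglik = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * z**2
    # Stable log of the mean of exp(-loglik) over draws (log-mean-exp).
    neg = -loglik
    shift = neg.max(axis=0)
    log_mean_inv = shift + np.log(np.mean(np.exp(neg - shift), axis=0))
    return -log_mean_inv

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(1)
y = rng.normal(loc=0.5, scale=1.0, size=50)

# With a flat prior and known sigma = 1, the posterior of mu is N(ybar, 1/n).
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=20000)

lcpo = log_cpo(y, mu_draws)
# Observations with unusually low log CPO are poorly predicted by the rest.
worst = int(np.argmin(lcpo))
```

Here each entry of `lcpo` estimates how well the model, fitted without that observation, predicts it; comparing these values across observations or across competing models is one way to act on the cross-validation viewpoint described above.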

671 citations

Journal ArticleDOI
TL;DR: In this article, a general class of multivariate skew-elliptical distributions is proposed, which contains the multivariate normal, Student's t, exponential power, and Pearson type II, with an extra parameter to regulate skewness.

629 citations

Journal ArticleDOI
TL;DR: In this paper, a new class of distributions is developed by introducing skewness into multivariate elliptically symmetric distributions; the class is obtained by using transformation and conditioning.
Abstract: The authors develop a new class of distributions by introducing skewness in multivariate elliptically symmetric distributions. The class, which is obtained by using transformation and conditioning, contains many standard families including the multivariate skew-normal and t distributions. The authors obtain analytical forms of the densities and study distributional properties. They give practical applications in Bayesian regression models and results on the existence of the posterior distributions and moments under improper priors for the regression coefficients. They illustrate their methods using practical examples.

616 citations

Journal ArticleDOI
TL;DR: This work describes a Bayesian method that allows direct estimates of FST from dominant markers and illustrates the method with a reanalysis of RAPD data from 14 populations of a North American orchid, Platanthera leucophaea.
Abstract: Molecular markers derived from polymerase chain reaction (PCR) amplification of genomic DNA are an important part of the toolkit of evolutionary geneticists. Random amplified polymorphic DNA markers (RAPDs), amplified fragment length polymorphisms (AFLPs) and intersimple sequence repeat (ISSR) polymorphisms allow analysis of species for which previous DNA sequence information is lacking, but dominance makes it impossible to apply standard techniques to calculate F-statistics. We describe a Bayesian method that allows direct estimates of FST from dominant markers. In contrast to existing alternatives, we do not assume previous knowledge of the degree of within-population inbreeding. In particular, we do not assume that genotypes within populations are in Hardy-Weinberg proportions. Our estimate of FST incorporates uncertainty about the magnitude of within-population inbreeding. Simulations show that samples from even a relatively small number of loci and populations produce reliable estimates of FST. Moreover, some information about the degree of within-population inbreeding (FIS) is available from data sets with a large number of loci and populations. We illustrate the method with a reanalysis of RAPD data from 14 populations of a North American orchid, Platanthera leucophaea.

459 citations


Cited by
Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. They derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest; adding pD to the posterior mean deviance gives a deviance information criterion, which is related to other information criteria and has an approximate decision theoretic justification.
Abstract: Summary. We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the ‘hat’ matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
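
As a rough illustration of the quantities named in this abstract, the sketch below computes the posterior mean deviance, pD, and the deviance information criterion from posterior draws. The toy model (normal data with known unit variance and a flat prior on the mean), the simulated data, and the function names `deviance` and `dic` are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def deviance(y, mu, sigma=1.0):
    """Deviance, taken here as -2 * log-likelihood of y under N(mu, sigma^2)."""
    n = y.size
    return n * np.log(2.0 * np.pi * sigma**2) + np.sum((y - mu) ** 2) / sigma**2

def dic(y, mu_draws, sigma=1.0):
    """Return (posterior mean deviance, pD, DIC) from posterior draws of mu."""
    # Posterior mean of the deviance, averaged over the draws.
    d_bar = np.mean([deviance(y, m, sigma) for m in mu_draws])
    # Deviance evaluated at the posterior mean of the parameter.
    d_hat = deviance(y, np.mean(mu_draws), sigma)
    # Effective number of parameters.
    p_d = d_bar - d_hat
    return d_bar, p_d, d_bar + p_d

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)

# With a flat prior and known sigma = 1, the posterior of mu is N(ybar, 1/n),
# so exact posterior draws stand in for Markov chain Monte Carlo output.
mu_draws = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(y.size), size=20000)

d_bar, p_d, dic_value = dic(y, mu_draws)
print(f"mean deviance = {d_bar:.2f}, pD = {p_d:.2f}, DIC = {dic_value:.2f}")
```

In this one-parameter model pD should come out close to 1, which matches the intuition that it counts effective parameters. With real Markov chain Monte Carlo output, `mu_draws` would be replaced by the sampled parameter values and `deviance` by the model's own deviance function.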

11,691 citations

Journal ArticleDOI
TL;DR: Various facets of such multimodel inference are presented here, particularly methods of model averaging, which can be derived as a non-Bayesian result.
Abstract: The model selection literature has been generally poor at reflecting the deep foundations of the Akaike information criterion (AIC) and at making appropriate comparisons to the Bayesian information...

8,933 citations

Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations