Author

Isabella Verdinelli

Other affiliations: Sapienza University of Rome
Bio: Isabella Verdinelli is an academic researcher from Carnegie Mellon University. The author has contributed to research on the topics of cluster analysis and Bayesian probability, has an h-index of 20, and has co-authored 39 publications receiving 3,447 citations. Previous affiliations of Isabella Verdinelli include Sapienza University of Rome.

Papers
Journal ArticleDOI
TL;DR: This paper reviews the literature on Bayesian experimental design, for both linear and nonlinear models, and presents a unified view of the topic by placing experimental design in a decision-theoretic framework.
Abstract: This paper reviews the literature on Bayesian experimental design. A unified view of this topic is presented, based on a decision-theoretic approach. This framework casts criteria from the Bayesian design literature as part of a single coherent approach. The decision-theoretic structure incorporates both linear and nonlinear design problems and suggests possible new directions for the experimental design problem, motivated by the use of new utility functions. We show that, in some special cases of linear design problems, Bayesian solutions change in a sensible way when the prior distribution and the utility function are modified to allow for the specific structure of the experiment. The decision-theoretic approach also gives a mathematical justification for selecting the appropriate optimality criterion.
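To make the decision-theoretic recipe concrete, here is a minimal Monte Carlo sketch of preposterior expected utility for comparing two candidate designs, using a conjugate normal model and quadratic estimation loss; the model, designs, and utility are illustrative assumptions, not examples from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def expected_utility(noise_sd, prior_sd=1.0, n_sim=100_000):
        """Monte Carlo estimate of the preposterior expected utility
        U(d) = E_{theta, y | d}[u(d, theta, y)] for a conjugate normal model,
        with quadratic estimation loss u = -(E[theta | y, d] - theta)^2."""
        theta = rng.normal(0.0, prior_sd, n_sim)        # theta ~ prior
        y = theta + rng.normal(0.0, noise_sd, n_sim)    # y | theta, design d
        w = prior_sd**2 / (prior_sd**2 + noise_sd**2)   # shrinkage weight
        post_mean = w * y                               # E[theta | y, d]
        return -np.mean((post_mean - theta) ** 2)

    # Two hypothetical designs: a noisy instrument versus a precise one.
    for name, sd in {"noisy design": 2.0, "precise design": 0.5}.items():
        print(f"{name}: U(d) = {expected_utility(sd):.4f}")

The design with the larger expected utility is preferred; changing the utility function changes which design wins, which is exactly the flexibility the decision-theoretic framing buys.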

1,903 citations

Journal ArticleDOI
TL;DR: The method derives from observing that in general, a Bayes factor can be written as the product of a quantity called the Savage-Dickey density ratio and a correction factor; both terms are easily estimated from posterior simulation.
Abstract: We present a simple method for computing Bayes factors. The method derives from observing that in general, a Bayes factor can be written as the product of a quantity called the Savage-Dickey density ratio and a correction factor; both terms are easily estimated from posterior simulation. In some cases it is possible to do these computations without ever evaluating the likelihood.
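As a concrete illustration, here is a minimal sketch of the Savage-Dickey computation for a point null nested in a conjugate normal model, the simple case where the correction factor equals 1; the data and prior are toy assumptions, and the kernel density step stands in for whatever posterior simulation output (e.g., MCMC draws) is available.

    import numpy as np
    from scipy.stats import gaussian_kde, norm

    rng = np.random.default_rng(1)

    # Toy conjugate setup: theta ~ N(0, 1) prior, y_i ~ N(theta, 1).
    y = rng.normal(0.3, 1.0, size=50)
    n, prior_var = len(y), 1.0
    post_var = 1.0 / (1.0 / prior_var + n)
    post_mean = post_var * y.sum()

    # Posterior draws (exact here; in practice these would come from MCMC).
    draws = rng.normal(post_mean, np.sqrt(post_var), size=20_000)

    theta0 = 0.0  # null value of the nested model H0: theta = theta0
    posterior_at_0 = gaussian_kde(draws)(theta0)[0]   # density estimated from draws
    prior_at_0 = norm.pdf(theta0, 0.0, np.sqrt(prior_var))

    bf_01 = posterior_at_0 / prior_at_0   # Savage-Dickey ratio; correction factor 1 here
    print(f"Bayes factor BF01 = {bf_01:.3f}")

When the prior does not factor conveniently, the correction factor is no longer 1, and estimating it from posterior simulation is the paper's contribution.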

502 citations

Journal ArticleDOI
TL;DR: This article extends false discovery rates to random fields, for which there are uncountably many hypothesis tests, and develops a method for finding regions in the field's domain where there is a significant signal while controlling either the proportion of area or the proportion of clusters in which false rejections occur.
Abstract: This article extends false discovery rates to random fields, for which there are uncountably many hypothesis tests. We develop a method for finding regions in the field's domain where there is a significant signal while controlling either the proportion of area or the proportion of clusters in which false rejections occur. The method produces confidence envelopes for the proportion of false discoveries as a function of the rejection threshold. From the confidence envelopes, we derive threshold procedures to control either the mean or the specified tail probabilities of the false discovery proportion. An essential ingredient of this construction is a new algorithm to compute a confidence superset for the set of all true-null locations. We demonstrate our method with applications to scan statistics and functional neuroimaging.
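The following toy sketch conveys the envelope idea in one dimension. It uses a crude Bonferroni-style confidence superset for the true-null locations (with probability at least 1 - gamma, every true null has p_i > gamma/m), which is far blunter than the algorithm in the paper but still yields a valid upper confidence bound on the false discovery proportion at each threshold.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)

    # Toy 1-d "field": 1000 locations, the first 100 carrying signal.
    m, m1 = 1000, 100
    z = rng.normal(size=m)
    z[:m1] += 3.0                       # elevated means in the signal region
    p = norm.sf(z)                      # one-sided p-values

    gamma = 0.05
    # Crude confidence superset for the true nulls: with probability
    # >= 1 - gamma, every true null satisfies p_i > gamma / m (Bonferroni).
    in_superset = p > gamma / m

    def fdp_envelope(t):
        """Upper confidence bound on the false discovery proportion at threshold t."""
        rejected = p <= t
        R = max(int(rejected.sum()), 1)
        V_bound = int((rejected & in_superset).sum())  # bound on false rejections
        return V_bound / R

    for t in (1e-4, 1e-3, 1e-2):
        print(f"t={t:g}: rejections={int((p <= t).sum())}, FDP bound={fdp_envelope(t):.3f}")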

173 citations

Journal ArticleDOI
TL;DR: Ridge estimation is an extension of mode finding and is useful for understanding the structure of a density; it can also be used to find hidden structure in point cloud data.
Abstract: We study the problem of estimating the ridges of a density function. Ridge estimation is an extension of mode finding and is useful for understanding the structure of a density. It can also be used to find hidden structure in point cloud data. We show that, under mild regularity conditions, the ridges of the kernel density estimator consistently estimate the ridges of the true density. When the data are noisy measurements of a manifold, we show that the ridges are close and topologically similar to the hidden manifold. To find the estimated ridges in practice, we adapt the modified mean-shift algorithm proposed by Ozertem and Erdogmus [J. Mach. Learn. Res. 12 (2011) 1249–1286]. Some numerical experiments verify that the algorithm is accurate.
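A compact sketch of a subspace-constrained mean-shift iteration in the spirit of Ozertem and Erdogmus, run on noisy samples from a circle (a 1-d hidden manifold in R^2); the bandwidth, iteration count, and use of the KDE Hessian rather than the Hessian of the log-density are simplifying choices for illustration, not the paper's tuned procedure.

    import numpy as np

    rng = np.random.default_rng(3)

    # Noisy samples from a circle: a hidden 1-d manifold in R^2.
    ang = rng.uniform(0, 2 * np.pi, 500)
    X = np.c_[np.cos(ang), np.sin(ang)] + rng.normal(0, 0.1, (500, 2))
    h, d, D = 0.3, 1, 2              # bandwidth, ridge dimension, ambient dimension

    def scms_step(x):
        """One subspace-constrained mean-shift update toward the ridge."""
        diff = X - x
        w = np.exp(-0.5 * (diff**2).sum(axis=1) / h**2)
        shift = (w[:, None] * X).sum(axis=0) / w.sum() - x  # ordinary mean shift
        # Hessian of the kernel density estimate at x (constants dropped).
        hess = (w[:, None, None]
                * (diff[:, :, None] * diff[:, None, :] / h**2 - np.eye(D))).sum(axis=0)
        vals, vecs = np.linalg.eigh(hess)                   # ascending eigenvalues
        V = vecs[:, : D - d]          # span of the D - d smallest eigenvalues
        return x + V @ (V.T @ shift)                        # project the step

    pts = X[:5].copy()
    for _ in range(100):
        pts = np.array([scms_step(x) for x in pts])
    print(np.linalg.norm(pts, axis=1))   # radii should end up close to 1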

126 citations

Journal ArticleDOI
TL;DR: The minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded, so the optimal rate of convergence is n^{-2/(2+d)}.
Abstract: We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in R^D given a noisy sample from the manifold. Under certain conditions, we show that the optimal rate of convergence is n^{-2/(2+d)}. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded.
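In display form (regularity conditions and constants suppressed; this simply restates the abstract's claim, with M_d standing for the assumed class of d-dimensional manifolds):

    \[
      \inf_{\hat{M}} \sup_{M \in \mathcal{M}_d}
        \mathbb{E}\,\mathrm{Haus}\bigl(\hat{M}, M\bigr) \asymp n^{-2/(2+d)}
    \]

So, for example, a surface (d = 2) is estimable at rate n^{-1/2} whether the ambient dimension D is 3 or 3000.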

118 citations


Cited by
01 Jan 2009
TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
Abstract: The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.
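A minimal pool-based sketch of one query strategy from the survey, uncertainty sampling with a logistic model; the synthetic data, seed set, and query budget are arbitrary illustrative choices.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)

    # Pool-based setting: a tiny labeled seed set and a large unlabeled pool.
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
    pool = [i for i in range(500) if i not in labeled]

    model = LogisticRegression()
    for _ in range(20):                          # budget: 20 oracle queries
        model.fit(X[labeled], y[labeled])
        proba = model.predict_proba(X[pool])[:, 1]
        # Uncertainty sampling: ask about the least confident instance.
        query = pool[int(np.argmin(np.abs(proba - 0.5)))]
        labeled.append(query)                    # the oracle reveals y[query]
        pool.remove(query)

    model.fit(X[labeled], y[labeled])
    print(f"accuracy on the full pool: {model.score(X, y):.3f}")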

5,227 citations

Proceedings Article
21 Aug 2003
TL;DR: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.
Abstract: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning problem is then formulated in terms of a Gaussian random field on this graph, where the mean of the field is characterized in terms of harmonic functions, and is efficiently obtained using matrix methods or belief propagation. The resulting learning algorithms have intimate connections with random walks, electric networks, and spectral graph theory. We discuss methods to incorporate class priors and the predictions of classifiers obtained by supervised learning. We also propose a method of parameter learning by entropy minimization, and show the algorithm's ability to perform feature selection. Promising experimental results are presented for synthetic data, digit classification, and text classification tasks.
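The core computation the abstract describes reduces to solving one linear system: the harmonic mean-field values on the unlabeled vertices are f_u = L_uu^{-1} W_ul f_l, where L = D - W is the graph Laplacian. A self-contained sketch on toy data follows; the RBF similarity and its bandwidth are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(5)

    # Two Gaussian blobs; only one labeled point per class.
    X = np.r_[rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))]
    f_l = np.array([0.0, 1.0])           # labels for the first point of each blob
    labeled = np.array([0, 50])
    unlabeled = np.setdiff1d(np.arange(100), labeled)

    # Weighted graph: RBF edge weights encoding similarity between instances.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / 2.0)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                            # graph Laplacian

    # Harmonic solution: f_u = L_uu^{-1} W_ul f_l (mean of the Gaussian field).
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled)]
    f_u = np.linalg.solve(L_uu, W_ul @ f_l)

    # Fraction of each blob assigned to its seed's class by thresholding at 1/2.
    print((f_u[:49] < 0.5).mean(), (f_u[49:] > 0.5).mean())

By the maximum principle, every harmonic value lies between the labeled extremes, so thresholding at 1/2 gives a natural classification rule.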

3,908 citations

Journal ArticleDOI
TL;DR: It is argued that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages.
Abstract: Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). (AIC; Bayes factors; BIC; likelihood ratio tests; model averaging; model uncertainty; model selection; multimodel inference.) It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role in molecular phylogenetics, particularly in the context of distance, maximum likelihood (ML), and Bayesian estimation. We know that the use of one model or another affects many, if not all, stages of phylogenetic inference. For example, estimates of phylogeny, substitution rates, bootstrap values, posterior probabilities, or tests of the molecular clock are clearly influenced by the model of evolution used in the analysis (Buckley, 2002).
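For instance, AIC-based model averaging weighs each candidate model by a simple transform of its AIC score. In the sketch below, the maximized log-likelihoods and free-parameter counts for three common substitution models are invented placeholders, not values from the paper.

    import numpy as np

    # Hypothetical maximized log-likelihoods and free-parameter counts for
    # three substitution models fit to the same alignment.
    models = {"JC69": (-3100.0, 0), "HKY85": (-3050.0, 4), "GTR": (-3045.0, 8)}

    aic = {m: -2 * ll + 2 * k for m, (ll, k) in models.items()}
    delta = {m: a - min(aic.values()) for m, a in aic.items()}
    # Akaike weights: relative support for each model, usable for
    # model-averaged (multimodel) inference across all candidates.
    raw = {m: np.exp(-0.5 * d) for m, d in delta.items()}
    weights = {m: r / sum(raw.values()) for m, r in raw.items()}

    for m in models:
        print(f"{m}: AIC={aic[m]:.1f}, dAIC={delta[m]:.1f}, weight={weights[m]:.3f}")

Unlike a hierarchical likelihood ratio test, this comparison handles nested and nonnested models in one pass and quantifies model selection uncertainty directly through the weights.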

3,712 citations

Journal ArticleDOI
01 Jan 2016
TL;DR: This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
Abstract: Big Data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools, real-time game engines, speech recognizers) thus involve many tunable configuration parameters. These parameters are often specified and hard-coded into the software by various developers or teams. If optimized jointly, these parameters can result in significant improvements. Bayesian optimization is a powerful tool for the joint optimization of design choices that is gaining great popularity in recent years. It promises greater automation so as to increase both product quality and human productivity. This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
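A self-contained sketch of the basic loop on a 1-d toy problem: a Gaussian-process surrogate plus an expected-improvement acquisition rule. The objective, kernel, and hyperparameters are arbitrary stand-ins for a real tuning task, not anything from the review.

    import numpy as np
    from scipy.stats import norm

    def f(x):                             # expensive black box being tuned (toy stand-in)
        return -np.sin(3 * x) - x**2 + 0.7 * x

    def rbf(a, b, ls=0.3):                # squared-exponential kernel, unit amplitude
        return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

    grid = np.linspace(-2, 2, 400)
    X_obs = np.array([-1.5, 0.0, 1.5])    # configurations already evaluated
    y_obs = f(X_obs)

    for _ in range(8):                    # sequential design loop
        K = rbf(X_obs, X_obs) + 1e-6 * np.eye(len(X_obs))
        K_s = rbf(grid, X_obs)
        mu = K_s @ np.linalg.solve(K, y_obs)                    # GP posterior mean
        var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
        sd = np.sqrt(np.clip(var, 1e-12, None))
        # Expected improvement over the best value seen so far.
        best = y_obs.max()
        z = (mu - best) / sd
        ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
        x_next = grid[np.argmax(ei)]      # evaluate where improvement looks likeliest
        X_obs = np.append(X_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))

    print(f"best x = {X_obs[np.argmax(y_obs)]:.3f}, best value = {y_obs.max():.3f}")

The acquisition step is what trades exploration (high posterior variance) against exploitation (high posterior mean), which is the core of the methodology the review surveys.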

3,703 citations

Journal ArticleDOI
TL;DR: The performance of the genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
Abstract: A dense set of single nucleotide polymorphisms (SNPs) covering the genome and an efficient method to assess SNP genotypes are expected to be available in the near future. An outstanding question is how to use these technologies efficiently to identify genes affecting liability to complex disorders. To achieve this goal, we propose a statistical method that has several optimal properties: It can be used with case control data and yet, like family-based designs, controls for population heterogeneity; it is insensitive to the usual violations of model assumptions, such as cases failing to be strictly independent; and, by using Bayesian outlier methods, it circumvents the need for Bonferroni correction for multiple tests, leading to better performance in many settings while still constraining risk for false positives. The performance of our genomic control method is quite good for plausible effects of liability genes, which bodes well for future genetic analyses of complex disorders.
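The central correction is simple enough to sketch: population structure inflates every 1-df association statistic by a roughly constant factor lambda, which can be estimated robustly from the median and divided out. The simulation below is a toy stand-in, and the Bayesian outlier machinery for flagging true signals is omitted.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(8)

    # Simulated 1-df association statistics at many SNPs under population
    # stratification, which inflates every statistic by a common factor.
    true_lambda = 1.4
    stats = true_lambda * rng.chisquare(df=1, size=5000)

    # Genomic control: estimate the inflation factor from the median, since
    # the median of a chi-square(1) variate is about 0.4549.
    lam = max(np.median(stats) / chi2.median(1), 1.0)
    adjusted = stats / lam               # deflated statistics, referred to chi2(1)

    print(f"estimated inflation = {lam:.2f}")
    print(f"top adjusted p-value: {chi2.sf(adjusted.max(), df=1):.2e}")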

3,130 citations