Author

David N. Reshef

Bio: David N. Reshef is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research in the topics of maximal information coefficient and mutual information. The author has an h-index of 11 and has co-authored 18 publications receiving 2,301 citations. Previous affiliations of David N. Reshef include the Broad Institute and the University of Oxford.

Papers
Journal Article
16 Dec 2011-Science
TL;DR: A measure of dependence for two-variable relationships: the maximal information coefficient (MIC), which captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination of the data relative to the regression function.
Abstract: Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

2,414 citations

01 Dec 2011
TL;DR: The maximal information coefficient (MIC) is a measure of dependence for two-variable relationships that captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function.

94 citations

Posted Content
TL;DR: This work presents an intuition behind the equitability of MIC through the exploration of the maximization and normalization steps in its definition, and examines the speed and optimality of the approximation algorithm used to compute MIC.
Abstract: A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of strongest associations within a dataset as opposed to finding as many non-zero associations as possible, which often are too many to sift through. Thus an equitable statistic, such as the maximal information coefficient (MIC), can be useful for analyzing high-dimensional data sets. Here, we explore both equitability and the properties of MIC, and discuss several aspects of the theory and practice of MIC. We begin by presenting an intuition behind the equitability of MIC through the exploration of the maximization and normalization steps in its definition. We then examine the speed and optimality of the approximation algorithm used to compute MIC, and suggest some directions for improving both. Finally, we demonstrate in a range of noise models and sample sizes that MIC is more equitable than natural alternatives, such as mutual information estimation and distance correlation.

72 citations
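The equitability paper above attributes MIC's behavior to the maximization and normalization steps in its definition. A minimal sketch of those two steps follows, under a loudly stated assumption: it is not the authors' approximation algorithm. Instead of searching over all partitions for each grid size, it cuts each axis into equal-frequency bins, so it scores one candidate grid per size and can only underestimate MIC as defined. The grid-size budget B(n) = n^0.6 and the division by log min(x, y) follow the original definition; the function names (`quantile_bins`, `grid_mutual_information`, `mic_sketch`) are hypothetical.

```python
import math
from collections import Counter


def quantile_bins(values, k):
    """Assign each value to one of k equal-frequency bins by rank.

    Assumption of this sketch: ties are broken by position, so equal
    values may land in different bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def grid_mutual_information(xbins, ybins):
    """Mutual information (in nats) of the point distribution over grid cells."""
    n = len(xbins)
    joint = Counter(zip(xbins, ybins))
    px, py = Counter(xbins), Counter(ybins)
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mic_sketch(xs, ys, alpha=0.6):
    """Simplified MIC: one equal-frequency grid per size instead of the best grid."""
    budget = len(xs) ** alpha  # grid-size budget B(n) = n ** 0.6
    best = 0.0
    for nx in range(2, int(budget) + 1):
        for ny in range(2, int(budget) + 1):
            if nx * ny >= budget:
                continue  # only grids with nx * ny < B(n) are considered
            mi = grid_mutual_information(quantile_bins(xs, nx), quantile_bins(ys, ny))
            # Normalization: divide by log(min(nx, ny)) so each entry lies in [0, 1].
            best = max(best, mi / math.log(min(nx, ny)))
    return best


# Usage: a noiseless but non-monotone relationship still scores about 1.
xs = [-1 + 2 * i / 199 for i in range(200)]
print(round(mic_sketch(xs, [x * x for x in xs]), 3))  # prints 1.0
```

The normalization is what makes scores comparable across grid shapes: without it, finer grids would always report larger mutual information, and the maximization would simply favor the largest grid allowed by the budget.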

Journal Article
TL;DR: This paper introduces and characterizes a population measure of dependence called MIC*, introduces an efficient approach for computing MIC* from the density of a pair of random variables, and defines a new consistent estimator MICe for MIC* that is efficiently computable.
Abstract: Given a high-dimensional data set, we often wish to find the strongest relationships within it. A common strategy is to evaluate a measure of dependence on every variable pair and retain the highest-scoring pairs for follow-up. This strategy works well if the statistic used (a) has good power to detect non-trivial relationships, and (b) is equitable, meaning that for some measure of noise it assigns similar scores to equally noisy relationships regardless of relationship type (e.g., linear, exponential, periodic). In this paper, we define and theoretically characterize two new statistics that together yield an efficient approach for obtaining both power and equitability. To do this, we first introduce a new population measure of dependence and show three equivalent ways that it can be viewed, including as a canonical "smoothing" of mutual information. We then introduce an efficiently computable consistent estimator of our population measure of dependence, and we empirically establish its equitability on a large class of noisy functional relationships. This new statistic has better bias/variance properties and better runtime complexity than a previous heuristic approach. Next, we derive a second, related statistic whose computation is a trivial side-product of our algorithm and whose goal is powerful independence testing rather than equitability. We prove that this statistic yields a consistent independence test and show in simulations that the test has good power against independence. Taken together, our results suggest that these two statistics are a valuable pair of tools for exploratory data analysis.

52 citations
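The abstract above describes a second statistic whose computation is a by-product of the first and whose goal is independence testing. The sketch below illustrates that idea under stated assumptions: it is not the paper's estimator MICe. It scores each grid size with simplified equal-frequency grids, takes the maximum of the normalized mutual-information entries as the equitability-oriented score, and, as an assumption about how the second statistic aggregates the same entries, sums them for testing. The permutation test and all function names are hypothetical illustrations.

```python
import math
import random
from collections import Counter


def quantile_bins(values, k):
    # Equal-frequency bins by rank (simplifying assumption; ties split by position).
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def grid_mi(xbins, ybins):
    # Mutual information (nats) of the empirical distribution over grid cells.
    n = len(xbins)
    joint, px, py = Counter(zip(xbins, ybins)), Counter(xbins), Counter(ybins)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def normalized_mi_entries(xs, ys, alpha=0.6):
    # One normalized MI value per grid size (nx, ny) with nx * ny < n ** alpha.
    budget = len(xs) ** alpha
    return [grid_mi(quantile_bins(xs, nx), quantile_bins(ys, ny)) / math.log(min(nx, ny))
            for nx in range(2, int(budget) + 1)
            for ny in range(2, int(budget) + 1)
            if nx * ny < budget]


def max_and_total(xs, ys):
    # The same entries give both scores: the maximum (equitability-oriented)
    # and, as the assumed by-product, their sum (power-oriented).
    entries = normalized_mi_entries(xs, ys)
    return max(entries), sum(entries)


def permutation_p_value(xs, ys, reps=99, seed=0):
    # Permutation test on the summed score: shuffling ys destroys any dependence.
    rng = random.Random(seed)
    observed = max_and_total(xs, ys)[1]
    shuffled = list(ys)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        if max_and_total(xs, shuffled)[1] >= observed:
            exceed += 1
    return (1 + exceed) / (1 + reps)


# Usage on a periodic relationship.
xs = [i / 99 for i in range(100)]
ys = [math.sin(6 * x) for x in xs]
print(max_and_total(xs, ys))
print(permutation_p_value(xs, ys))
```

One plausible reading of the design choice: the maximum rewards the single most informative grid, which suits ranking relationships by strength, whereas the sum lets many moderately informative grids add up, which can help detect weak departures from independence.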

Journal Article
TL;DR: Possible causes for the emergence of fluoroquinolone-resistant N. gonorrhoeae among men who have sex with men (MSM) and heterosexual men are investigated; prevention efforts should be directed toward both populations.
Abstract: Using data from the Gonococcal Isolate Surveillance Project, we studied changes in ciprofloxacin resistance in Neisseria gonorrhoeae isolates in the United States during 2002-2007. Compared with prevalence in heterosexual men, prevalence of ciprofloxacin-resistant N. gonorrhoeae infections showed a more pronounced increase in men who have sex with men (MSM), particularly through an increase in prevalence of strains also resistant to tetracycline and penicillin. Moreover, that multidrug resistance profile among MSM was negatively associated with recent travel. Across the surveillance project sites, first appearance of ciprofloxacin resistance in heterosexual men was positively correlated with such resistance for MSM. The increase in prevalence of ciprofloxacin resistance may have been facilitated by use of fluoroquinolones for treating gonorrhea and other conditions. The prominence of multidrug resistance suggests that using other classes of antimicrobial drugs for purposes other than treating gonorrhea helped increase the prevalence of ciprofloxacin-resistant strains that are also resistant to those drugs.

51 citations


Cited by
Journal Article

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at that time: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Book
17 May 2013
TL;DR: This research presents a novel and scalable approach called “Smartfitting” that automates the labor-intensive, and therefore time-consuming and expensive, process of designing and implementing statistical regression models.
Abstract: General Strategies.- Regression Models.- Classification Models.- Other Considerations.- Appendix.- References.- Indices.

3,672 citations

Journal Article
TL;DR: The large-scale dynamics of the microbiome can be described by many of the tools and observations used in the study of population ecology, and deciphering the metagenome and its aggregate genetic information can also be used to understand the functional properties of the microbial community.
Abstract: Interest in the role of the microbiome in human health has burgeoned over the past decade with the advent of new technologies for interrogating complex microbial communities. The large-scale dynamics of the microbiome can be described by many of the tools and observations used in the study of population ecology. Deciphering the metagenome and its aggregate genetic information can also be used to understand the functional properties of the microbial community. Both the microbiome and metagenome probably have important functions in health and disease; their exploration is a frontier in human genetics.

2,650 citations

Journal Article
TL;DR: This Review describes how metagenomics and 16S pyrosequencing techniques are opening the way towards global ecosystem network prediction and the development of ecosystem-wide dynamic models.
Abstract: Metagenomics and 16S pyrosequencing have enabled the study of ecosystem structure and dynamics to great depth and accuracy. Co-occurrence and correlation patterns found in these data sets are increasingly used for the prediction of species interactions in environments ranging from the oceans to the human microbiome. In addition, parallelized co-culture assays and combinatorial labelling experiments allow high-throughput discovery of cooperative and competitive relationships between species. In this Review, we describe how these techniques are opening the way towards global ecosystem network prediction and the development of ecosystem-wide dynamic models.

2,401 citations

Journal Article
03 Aug 2012-Cell
TL;DR: Host-microbial interactions that affect host metabolism can occur and may be beneficial in pregnancy; when transferred to germ-free mice, third-trimester (T3) microbiota induced greater adiposity and insulin insensitivity than first-trimester (T1) microbiota.

1,466 citations