Home
/
Authors
/
Michael Mitzenmacher

Author

Michael Mitzenmacher

Other affiliations: University of Paris-Sud, International Computer Science Institute, Arizona State University ...read more

Bio: Michael Mitzenmacher is an academic researcher from Harvard University. The author has contributed to research in topics: Hash function & Cuckoo hashing. The author has an hindex of 79, co-authored 422 publications receiving 36300 citations. Previous affiliations of Michael Mitzenmacher include University of Paris-Sud & International Computer Science Institute.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994

Papers

PDF

Open Access

More filters

Book•

Probability and Computing: Randomized Algorithms and Probabilistic Analysis

[...]

Michael Mitzenmacher¹, Eli Upfal²•Institutions (2)

Harvard University¹, Brown University²

01 Jan 2005

TL;DR: Preface 1. Events and probability 2. Discrete random variables and expectation 3. Moments and deviations 4. Chernoff bounds 5. Balls, bins and random graphs 6. Probabilistic method 7. Markov chains and random walks 8. Continuous distributions and the Poisson process

...read moreread less

Abstract: Preface 1. Events and probability 2. Discrete random variables and expectation 3. Moments and deviations 4. Chernoff bounds 5. Balls, bins and random graphs 6. The probabilistic method 7. Markov chains and random walks 8. Continuous distributions and the Poisson process 9. Entropy, randomness and information 10. The Monte Carlo method 11. Coupling of Markov chains 12. Martingales 13. Pairwise independence and universal hash functions 14. Balanced allocations References.

...read moreread less

2,543 citations

Journal Article•DOI•

Detecting Novel Associations in Large Data Sets

[...]

David N. Reshef¹, David N. Reshef², David N. Reshef³, Yakir A. Reshef⁴, Yakir A. Reshef¹, Hilary K. Finucane⁵, Sharon R. Grossman¹, Sharon R. Grossman⁴, Gilean McVean⁶, Gilean McVean³, Peter J. Turnbaugh⁴, Eric S. Lander⁴, Eric S. Lander², Eric S. Lander¹, Michael Mitzenmacher⁴, Pardis C. Sabeti¹, Pardis C. Sabeti⁴ - Show less +13 more•Institutions (6)

Broad Institute¹, Massachusetts Institute of Technology², University of Oxford³, Harvard University⁴, Weizmann Institute of Science⁵, Wellcome Trust Centre for Human Genetics⁶

16 Dec 2011-Science

TL;DR: A measure of dependence for two-variable relationships: the maximal information coefficient (MIC), which captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination of the data relative to the regression function.

...read moreread less

Abstract: Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

...read moreread less

2,414 citations

Journal Article•DOI•

Network Applications of Bloom Filters: A Survey

[...]

Andrei Z. Broder¹, Michael Mitzenmacher²•Institutions (2)

IBM¹, Harvard University²

01 Jan 2004-Internet Mathematics

TL;DR: The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.

...read moreread less

Abstract: A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.

...read moreread less

2,199 citations

Journal Article•DOI•

A Brief History of Generative Models for Power Law and Lognormal Distributions

[...]

Michael Mitzenmacher¹•Institutions (1)

Harvard University¹

01 Jan 2004-Internet Mathematics

TL;DR: A rich and long history is found of how lognormal distributions have arisen as a possible alternative to power law distributions across many fields, focusing on underlying generative models that lead to these distributions.

...read moreread less

Abstract: Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying to learn enough about these distributions to settle the question, I found a rich and long history, spanning many fields. Indeed, several recently proposed models from the computer science community have antecedents in work from decades ago. Here, I briefly survey some of this history, focusing on underlying generative models that lead to these distributions. One finding is that lognormal and power law distributions connect quite naturally, and hence, it is not surprising that lognormal distributions have arisen as a possible alternative to power law distributions across many fields.

...read moreread less

1,787 citations

Journal Article•DOI•

The power of two choices in randomized load balancing

[...]

Michael Mitzenmacher¹•Institutions (1)

Harvard University¹

01 Oct 2001-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This work uses a limiting, deterministic model representing the behavior as n/spl rarr//spl infin/ to approximate the behavior of finite systems and provides simulations that demonstrate that the method accurately predicts system behavior, even for relatively small systems.

...read moreread less

Abstract: We consider the following natural model: customers arrive as a Poisson stream of rate /spl lambda/n, /spl lambda/<1, at a collection of n servers. Each customer chooses some constant d servers independently and uniformly at random from the n servers and waits for service at the one with the fewest customers. Customers are served according to the first-in first-out (FIFO) protocol and the service time for a customer is exponentially distributed with mean 1. We call this problem the supermarket model. We wish to know how the system behaves and in particular we are interested in the effect that the parameter d has on the expected time a customer spends in the system in equilibrium. Our approach uses a limiting, deterministic model representing the behavior as n/spl rarr//spl infin/ to approximate the behavior of finite systems. The analysis of the deterministic model is interesting in its own right. Along with a theoretical justification of this approach, we provide simulations that demonstrate that the method accurately predicts system behavior, even for relatively small systems. Our analysis provides surprising implications. Having d=2 choices leads to exponential improvements in the expected time a customer spends in the system over d=1, whereas having d=3 choices is only a constant factor better than d=2. We discuss the possible implications for system design.

...read moreread less

1,444 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

I and i

[...]

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.

...read moreread less

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

...read moreread less

33,785 citations

Data Mining - Concepts and Techniques.

[...]

Petra Perner

01 Jan 2002

9,314 citations

Journal Article•DOI•

Power-Law Distributions in Empirical Data

[...]

Aaron Clauset¹, Aaron Clauset², Cosma Rohilla Shalizi³, Mark Newman•Institutions (3)

University of New Mexico¹, Santa Fe Institute², Carnegie Mellon University³

01 Nov 2009-Siam Review

TL;DR: This work proposes a principled statistical framework for discerning and quantifying power-law behavior in empirical data by combining maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios.

...read moreread less

Abstract: Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

...read moreread less

8,753 citations

Book•

Information Theory, Inference and Learning Algorithms

[...]

David J. C. MacKay

06 Oct 2003

TL;DR: A fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.

...read moreread less

Abstract: Fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.

...read moreread less

8,091 citations

Proceedings Article•DOI•

Random graphs

[...]

Alan Frieze¹•Institutions (1)

Carnegie Mellon University¹

22 Jan 2006

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

...read moreread less

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

...read moreread less

7,116 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse