Andrei Z. Broder

Other affiliations: AmeriCorps VISTA, IBM, Columbia University ...

Bio: Andrei Z. Broder is an academic researcher at Google. The author has contributed to research in the topics Web search query and Web query classification, has an h-index of 67, and has co-authored 241 publications that have received 27,310 citations. Previous affiliations of Andrei Z. Broder include AmeriCorps VISTA and IBM.

Papers published on a yearly basis (chart not reproduced; years listed: 1983-2015, 2017, 2018, 2020)

Papers

Journal Article

Counting Minimum Weight Spanning Trees

Andrei Z. Broder, Ernst W. Mayr¹•Institutions (1)

Ludwig Maximilian University of Munich¹

01 Jul 1997-Journal of Algorithms

TL;DR: An algorithm for counting the number of minimum weight spanning trees is presented, based on the fact that the generating function for the number of spanning trees of a given graph, by weight, can be expressed as a simple determinant.

24 citations
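
The TL;DR for "Counting Minimum Weight Spanning Trees" relies on spanning-tree counts being expressible as a determinant. The sketch below illustrates only the classical unweighted version of that fact (Kirchhoff's matrix-tree theorem: the number of spanning trees equals any cofactor of the graph Laplacian). It is not the weighted generating-function algorithm of the paper, and the function name and edge-list input format are assumptions made for illustration.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Count spanning trees of an undirected graph on vertices 0..n-1.

    Illustrative helper (name and signature are assumptions): applies
    Kirchhoff's matrix-tree theorem with exact rational arithmetic.
    """
    if n <= 1:
        return 1
    # Build the Laplacian L = D - A; parallel edges accumulate.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        if u == v:
            continue  # a self-loop never belongs to a spanning tree
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then compute the determinant of
    # the remaining minor by Gaussian elimination over the rationals.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)
```

For example, calling `count_spanning_trees(3, [(0, 1), (1, 2), (0, 2)])` on a triangle counts the three ways of dropping one edge.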

Patent

System and method for providing contextual actions on a search results page

Su-Lin Wu¹, Andrei Z. Broder, Evgeniy Gabrilovich, Ronny Lempel¹, Edward Bortnikov, Peter Mika, Debora Donato¹, Wei-Cheng Lai, Christopher LuVogt•Institutions (1)

Yahoo!¹

30 Dec 2010

TL;DR: A method and system for providing targeted applications within a search engine results page is presented, which includes receiving a search query from a user and interpreting the search query.

Abstract: The present invention provides a method and system for providing targeted applications within a search engine results page. The method and system includes receiving a search query from a user and interpreting the search query. The method and system then first maps the interpreted query to one or more action templates, wherein mapping the interpreted query to one or more action templates comprises selecting one or more actions associated with the interpreted query. The method and system then maps the selected one or more actions associated with the interpreted query to a plurality of applications and selecting one or more applications associated with the one or more actions. Finally, the method and system displays the one or more applications within a search results page.

24 citations

Proceedings Article

A search-based method for forecasting ad impression in contextual advertising

Xuerui Wang¹, Andrei Z. Broder², Marcus Fontoura², Vanja Josifovski²•Institutions (2)

University of Massachusetts Amherst¹, Yahoo!²

20 Apr 2009

TL;DR: Experimental results show that the approach can accurately forecast the expected number of impressions of contextual ads in real time, and how this method can be used in tools for bid selection and ad evaluation.

Abstract: Contextual advertising (also called content match) refers to the placement of small textual ads within the content of a generic web page. It has become a significant source of revenue for publishers ranging from individual bloggers to major newspapers. At the same time it is an important way for advertisers to reach their intended audience. This reach depends on the total number of exposures of the ad (impressions) and its click-through rate (CTR), which can be viewed as the probability of an end-user clicking on the ad when shown. These two orthogonal, critical factors are both difficult to estimate and even individually can still be very informative and useful in planning and budgeting advertising campaigns. In this paper, we address the problem of forecasting the number of impressions for new or changed ads in the system. Producing such forecasts, even within large margins of error, is quite challenging: 1) ad selection in contextual advertising is a complicated process based on tens or even hundreds of page and ad features; 2) the publishers' content and traffic vary over time; and 3) the scale of the problem is daunting: over the course of a week it involves billions of impressions, hundreds of millions of distinct pages, hundreds of millions of ads, and varying bids of other competing advertisers. We tackle these complexities by simulating the presence of a given ad with its associated bid over weeks of historical data. We obtain an impression estimate by counting how many times the ad would have been displayed if it were in the system over that period of time. We estimate this count by an efficient two-level search algorithm over the distinct pages in the data set. Experimental results show that our approach can accurately forecast the expected number of impressions of contextual ads in real time. We also show how this method can be used in tools for bid selection and ad evaluation.

23 citations

Posted Content

A Note on Double Pooling Tests.

Andrei Z. Broder¹, Ravi Kumar•Institutions (1)

Association for Computing Machinery¹

03 Apr 2020-arXiv: Discrete Mathematics

TL;DR: Double pooling is presented, a simple, easy-to-implement variation on test pooling, that in certain ranges for the a priori probability of a positive test, is significantly more efficient than the standard single pooling approach (the Dorfman method).

Abstract: We present double pooling, a simple, easy-to-implement variation on test pooling, that in certain ranges for the a priori probability of a positive test, is significantly more efficient than the standard single pooling approach (the Dorfman method).

23 citations
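
The abstract of "A Note on Double Pooling Tests" compares against standard single pooling (the Dorfman method). As a rough sketch of that baseline only, the code below computes the expected number of tests per person when samples are pooled in groups of k and every member of a positive pool is retested individually. It assumes each sample is positive independently with probability p; the function names are hypothetical, and the double-pooling variant itself is not spelled out in the abstract, so it is not reproduced here.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per person for single pooling with pool size k.

    Assumes independent positives with prior probability p (an
    assumption of this sketch). One pooled test is shared by k people;
    if the pool is positive, which happens with probability
    1 - (1 - p)**k, all k members are retested individually.
    """
    if k == 1:
        return 1.0  # no pooling: exactly one test per person
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_pool_size(p, max_k=100):
    """Pool size in 1..max_k that minimizes expected tests per person."""
    return min(range(1, max_k + 1),
               key=lambda k: dorfman_tests_per_person(p, k))
```

Under these assumptions, a prior probability of 1% gives an optimal pool size of 11 and roughly 0.196 tests per person, which is the kind of baseline figure a pooling variant would be measured against.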

Journal Article

Effective and efficient classification on a search-engine model

Aris Anagnostopoulos¹, Andrei Z. Broder¹, Kunal Punera²•Institutions (2)

Yahoo!¹, University of Texas at Austin²

28 Jul 2008-Knowledge and Information Systems

TL;DR: It is shown that surprisingly good classification accuracy can be achieved on average over multiple classes by queries with as few as 10 terms, and that optimizing the efficiency of query execution by careful selection of terms can further reduce the query costs.

Abstract: Traditional document classification frameworks, which apply the learned classifier to each document in a corpus one by one, are infeasible for extremely large document corpora, like the Web or large corporate intranets. We consider the classification problem on a corpus that has been processed primarily for the purpose of searching, and thus our access to documents is solely through the inverted index of a large scale search engine. Our main goal is to build the “best” short query that characterizes a document class using operators normally available within search engines. We show that surprisingly good classification accuracy can be achieved on average over multiple classes by queries with as few as 10 terms. As part of our study, we enhance some of the feature-selection techniques that are found in the literature by forcing the inclusion of terms that are negatively correlated with the target class and by making use of term correlations; we show that both of those techniques can offer significant advantages. Moreover, we show that optimizing the efficiency of query execution by careful selection of terms can further reduce the query costs. More precisely, we show that on our set-up the best 10-term query can achieve 93% of the accuracy of the best SVM classifier (14,000 terms), and if we are willing to tolerate a reduction to 89% of the best SVM, we can build a 10-term query that can be executed more than twice as fast as the best 10-term query.

23 citations

Cited by

Journal Article

Statistical mechanics of complex networks

Réka Albert¹, Albert-László Barabási¹•Institutions (1)

University of Notre Dame¹

01 Jan 2001-Reviews of Modern Physics

TL;DR: A simple model based on network growth and preferential attachment is proposed; it reproduces the power-law degree distribution of real networks and captures the evolution of networks, not just their static topology.

Abstract: The emergence of order in natural systems is a constant source of inspiration for both physical and biological sciences. While the spatial order characterizing, for example, crystals has been the basis of many advances in contemporary physics, most complex systems in nature do not offer such a high degree of order. Many of these systems form complex networks whose nodes are the elements of the system and edges represent the interactions between them. Traditionally complex networks have been described by the random graph theory founded in 1959 by Paul Erdős and Alfréd Rényi. One of the defining features of random graphs is that they are statistically homogeneous, and their degree distribution (characterizing the spread in the number of edges starting from a node) is a Poisson distribution. In contrast, recent empirical studies, including the work of our group, indicate that the topology of real networks is much richer than that of random graphs. In particular, the degree distribution of real networks is a power-law, indicating a heterogeneous topology in which the majority of the nodes have a small degree, but there is a significant fraction of highly connected nodes that play an important role in the connectivity of the network. The scale-free topology of real networks has very important consequences on their functioning. For example, we have discovered that scale-free networks are extremely resilient to the random disruption of their nodes. On the other hand, the selective removal of the nodes with highest degree induces a rapid breakdown of the network to isolated subparts that cannot communicate with each other. The non-trivial scaling of the degree distribution of real networks is also an indication of their assembly and evolution. Indeed, our modeling studies have shown us that there are general principles governing the evolution of networks.
Most networks start from a small seed and grow by the addition of new nodes which attach to the nodes already in the system. This process obeys preferential attachment: the new nodes are more likely to connect to nodes with already high degree. We have proposed a simple model based on these two principles which was able to reproduce the power-law degree distribution of real networks. Perhaps even more importantly, this model paved the way to a new paradigm of network modeling, trying to capture the evolution of networks, not just their static topology.

18,415 citations
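
The two principles named in the abstract above, growth from a small seed and preferential attachment, can be sketched in a few lines. This is a minimal illustration in the spirit of that description, not the authors' code: the function names, the choice of seed (m isolated nodes that the first new node links to), and the requirement that each new node attach to m distinct targets are all assumptions of this sketch.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=None):
    """Grow a graph to n nodes; each new node adds m edges to existing
    nodes chosen with probability proportional to their degree."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` selects a node proportionally to its degree.
    stubs = []
    targets = list(range(m))  # the first new node links to all seed nodes
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
        # Pick m distinct, degree-weighted targets for the next new node.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        targets = sorted(chosen)
    return edges


def degree_counts(edges):
    """Map each node to its degree."""
    return Counter(v for edge in edges for v in edge)
```

Sorting the values of `degree_counts(preferential_attachment_graph(10000, 3, seed=0))` gives a quick view of the heterogeneity the abstract describes: most nodes keep a small degree while a few early nodes accumulate many links.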

Journal Article

The Structure and Function of Complex Networks

Mark Newman

01 Jan 2003-Siam Review

TL;DR: Developments in this field are reviewed, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.

Abstract: Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.

17,647 citations

Journal Article

Community structure in social and biological networks

Michelle Girvan¹, Mark Newman•Institutions (1)

Santa Fe Institute¹

11 Jun 2002-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: This article proposes a method for detecting communities, built around the idea of using centrality indices to find community boundaries, and tests it on computer-generated and real-world graphs whose community structure is already known and finds that the method detects this known structure with high sensitivity and reliability.

Abstract: A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known—a collaboration network and a food web—and find that it detects significant and informative community divisions in both cases.

14,429 citations

Journal Article

Machine learning

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article

Finding and evaluating community structure in networks.

Mark Newman¹ ², Michelle Girvan¹ ³•Institutions (3)

Santa Fe Institute¹, University of Michigan², Cornell University³

26 Feb 2004-Physical Review E

TL;DR: It is demonstrated that the algorithms proposed are highly effective at discovering community structure in both computer-generated and real-world network data, and can be used to shed light on the sometimes dauntingly complex structure of networked systems.

Abstract: We propose and study a set of algorithms for discovering community structure in networks, that is, natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.

12,882 citations
