Author

Andrei Z. Broder

Other affiliations: AltaVista, IBM, Columbia University
Bio: Andrei Z. Broder is an academic researcher from Google. The author has contributed to research in topics: Web search query & Web query classification. The author has an h-index of 67 and has co-authored 241 publications receiving 27,310 citations. Previous affiliations of Andrei Z. Broder include AltaVista and IBM.


Papers
01 Jan 1985
TL;DR: A study of the general properties of permutation-invariant random mappings, combined with the analysis of a particular non-uniform distribution, makes possible the computation of the expected running time of the Pollard–Brent factorization method, settling an open conjecture of Pollard.
Abstract: A random mapping is a random graph where every vertex has outdegree one. Previous work was concerned mostly with a uniform probability distribution on these mappings. In contrast, this investigation assumes a non-uniform model, where different mappings have different probabilities. An important application is the analysis of a factorization heuristic due to Pollard and Brent. The model involved is a random mapping where every vertex has indegree either 0 or d. This distribution belongs to a class called permutation invariant. A study of the general properties of permutation invariant mappings combined with the analysis of this particular distribution made possible the computation of the expected running time of this factorization method, settling an open conjecture of Pollard.
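The factorization heuristic analyzed in this paper is the Pollard–Brent rho method, which iterates a random mapping such as x → x² + c (mod n) until the trajectory collides modulo a prime factor. A minimal Python sketch of the classic Floyd-cycle variant follows; the function name and the choice of c are illustrative assumptions, not details from the paper.

```python
from math import gcd

def pollard_rho(n: int, c: int = 1):
    """Sketch of Pollard's rho factorization (Floyd cycle detection).

    The iteration x -> x^2 + c (mod n) is a random mapping whose
    vertices have indegree 0 or 2; a collision modulo a prime factor
    p of n is expected after O(sqrt(p)) steps.
    """
    if n % 2 == 0:
        return 2
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n              # tortoise: one step
        y = ((y * y + c) ** 2 + c) % n   # hare: two steps
        d = gcd(abs(x - y), n)
    return d if d != n else None         # d == n: retry with a different c

print(pollard_rho(8051))  # 97, since 8051 = 83 * 97
```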

1 citation

Patent
05 May 2000
TL;DR: A method and system for detecting mirrored host pairs using information about a large set of pages, including one or more of URLs, IP addresses, and connectivity information.
Abstract: A method and system that detects mirrored host pairs using information about a large set of pages, including one or more of: URLs, IP addresses, and connectivity information. The identities of the detected mirrored hosts are then saved so that browsers, crawlers, proxy servers, or the like can correctly identify mirrored web sites. The described embodiments of the present invention use one or a combination of techniques to identify mirrors. A first group of techniques involves determining mirrors based on URLs and information about connectivity (i.e., hyperlinks) between pages. A second group of techniques looks at connectivity information at a higher granularity, considering all links from all pages on a host as one group and ignoring the target of each link beyond the host level.
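A minimal sketch of the second, host-level group of techniques: collapse every out-link from a host's pages to its target host and compare hosts by set overlap. The Jaccard similarity and the 0.9 threshold below are illustrative assumptions, not the patent's specification.

```python
from urllib.parse import urlparse

def host_link_signature(outlinks: list[str]) -> set[str]:
    """Keep only the target host of each link, ignoring everything
    beyond the host level, pooling links from all pages on the host."""
    return {urlparse(url).netloc for url in outlinks}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def likely_mirrors(links_a: list[str], links_b: list[str],
                   threshold: float = 0.9) -> bool:
    """Flag a candidate mirror pair when two hosts' host-level
    out-link sets are nearly identical (threshold is an assumption)."""
    return jaccard(host_link_signature(links_a),
                   host_link_signature(links_b)) >= threshold
```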

1 citation

Patent
13 May 2015
TL;DR: In this patent, the authors present a device, method, program, and system for providing purpose-oriented applications on a search engine result page.
Abstract: A device, a method, a program, and a system for providing purpose-oriented applications on the result page of a search engine. The system receives a search query from a user, selects one or more operations relevant to the query, selects one or more applications relevant to those operations, and displays the selected applications on the search result page.
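A minimal sketch of the query-to-applications pipeline the abstract describes: operations are selected for the query, then applications for the operations. The lookup tables and names are hypothetical stand-ins; the patent does not specify a representation.

```python
# Hypothetical tables mapping queries to operations and operations to apps.
QUERY_TO_OPERATIONS = {
    "book a flight to paris": ["flight_booking", "travel_planning"],
}
OPERATION_TO_APPS = {
    "flight_booking": ["AirlineApp", "TravelAggregator"],
    "travel_planning": ["ItineraryApp"],
}

def apps_for_query(query: str) -> list[str]:
    """Select operations relevant to the query, then applications
    relevant to those operations, preserving first-seen order."""
    apps: list[str] = []
    for op in QUERY_TO_OPERATIONS.get(query.lower(), []):
        for app in OPERATION_TO_APPS.get(op, []):
            if app not in apps:
                apps.append(app)
    return apps

print(apps_for_query("Book a flight to Paris"))
# ['AirlineApp', 'TravelAggregator', 'ItineraryApp']
```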

1 citation

Proceedings ArticleDOI
Andrei Z. Broder
02 Feb 2018
TL;DR: The main goal of this talk is to examine developments and to urge the WSDM community to increase its focus on assistive AI solutions that are becoming pertinent to a wide variety of information processing problems.
Abstract: A quarter-century ago Web search stormed the world: within a few years the Web search box became a standard tool of daily life, ready to satisfy the informational, transactional, and navigational queries needed for task completion. However, two recent trends are dramatically changing the box's role: first, the explosive spread of smartphones brings significant computational resources literally into the pockets of billions of users; second, recent technological advances in machine learning and artificial intelligence, in particular in speech processing, have led to the wide deployment of assistive AI systems, culminating in personal digital assistants. Along the way, the "Web search box" has become an "assistance request box" (implicit, in the case of voice-activated assistants), and likewise many other information processing systems (e.g., e-mail, navigation, personal search) have adopted assistive aspects. Formally, an assistive system can be viewed as a selection process within a base set of alternatives, driven by some user input; the output is either one alternative or a smaller set of alternatives, possibly subject to further selection. Hence, classic IR is a particular instance of this formulation, where the input is a textual query and the selection process is relevance ranking over the corpus. In increasing order of selection capability, assistive systems can be classified into three categories:

Subordinate: systems where the selection is fully specified by the request; if this results in a singleton the system provides it, otherwise the system provides a random alternative from the result set. The challenge for subordinate systems therefore consists only in the correct interpretation of the user request (e.g., weather information, simple personal schedule management, a "play jazz" request).

Conducive: systems that reduce the set of alternatives to a smaller set, possibly via an interactive process (e.g., the classic ten blue links, the three "smart replies" in Gmail, interactive recommendations).

Decisive: systems that make all necessary decisions to reach the desired goal, that is, select a single alternative from the set of possibilities, including resolving ambiguities and other substantive decisions without further input from the user (e.g., typical translation systems, self-driving cars).

The main goal of this talk is to examine these developments and to urge the WSDM community to increase its focus on assistive AI solutions, which are becoming pertinent to a wide variety of information processing problems. I will mostly present ideas and work in progress, and there will be many more open questions than definitive answers.
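The selection-process formulation above invites a tiny sketch; the three functions below mirror the three categories, with the random tie-break in the subordinate case taken from the abstract's wording. Everything else (function names, the scoring interface, the example data) is an illustrative assumption.

```python
import random
from typing import Callable

def subordinate(alternatives: list, matches: Callable) -> object:
    """Selection fully specified by the request: return the single match,
    or a random alternative from the result set when several match."""
    result = [a for a in alternatives if matches(a)]
    if not result:
        raise LookupError("request matched no alternative")
    return result[0] if len(result) == 1 else random.choice(result)

def conducive(alternatives: list, score: Callable, k: int = 10) -> list:
    """Reduce the alternatives to a smaller set, e.g. ten blue links."""
    return sorted(alternatives, key=score, reverse=True)[:k]

def decisive(alternatives: list, score: Callable) -> object:
    """Make all necessary decisions: commit to a single alternative."""
    return max(alternatives, key=score)

# Toy usage: rank three canned answers by a (hypothetical) length score.
answers = ["overcast, 18 C", "sunny", "rain likely"]
print(conducive(answers, score=len, k=2))  # top two alternatives
print(decisive(answers, score=len))        # single committed answer
```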

1 citation

Patent
05 Apr 2007
TL;DR: In this patent, the authors present a system and method for determining an event occurrence rate, in which each content item is associated with at least one region in a hierarchical data structure and a scale factor is applied to a first impression volume to generate a second impression volume.
Abstract: Described are a system and method for determining an event occurrence rate. A sample set of content items may be obtained. Each of the content items may be associated with at least one region in a hierarchical data structure. A first impression volume may be determined for the at least one region as a function of a number of impressions registered for the content items associated with the at least one region. A scale factor may be applied to the first impression volume to generate a second impression volume. The scale factor may be selected so that the second impression volume is within a predefined range of a third impression volume. A click-through-rate (CTR) may be estimated as a function of the second impression volume and a number of clicks on the content item.
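A minimal sketch of the volume-scaling step under the simplest reading of the abstract: pick a scale factor that lands the first impression volume inside a predefined range, then estimate CTR against the scaled volume. The function name, the range, and the choice of the smallest adjustment are illustrative assumptions.

```python
def estimate_ctr(clicks: int, impressions: int,
                 target_low: float, target_high: float) -> float:
    """Scale the first impression volume into a predefined range (the
    'second impression volume'), then estimate CTR against it."""
    # Assumption: use the smallest adjustment that lands the volume in
    # range; the abstract only requires the result to fall in the range.
    if impressions < target_low:
        scale = target_low / impressions
    elif impressions > target_high:
        scale = target_high / impressions
    else:
        scale = 1.0
    second_volume = impressions * scale
    return clicks / second_volume

print(estimate_ctr(clicks=12, impressions=400,
                   target_low=1000, target_high=2000))  # 0.012
```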

1 citation


Cited by
Journal ArticleDOI
TL;DR: In this paper, a simple model based on growth and preferential attachment was proposed that reproduces the power-law degree distribution of real networks and captures the evolution of networks, not just their static topology.
Abstract: The emergence of order in natural systems is a constant source of inspiration for both the physical and biological sciences. While the spatial order characterizing, for example, crystals has been the basis of many advances in contemporary physics, most complex systems in nature do not offer such a high degree of order. Many of these systems form complex networks whose nodes are the elements of the system and whose edges represent the interactions between them. Traditionally, complex networks have been described by random graph theory, founded in 1959 by Paul Erdős and Alfréd Rényi. One of the defining features of random graphs is that they are statistically homogeneous, and their degree distribution (characterizing the spread in the number of edges starting from a node) is a Poisson distribution. In contrast, recent empirical studies, including the work of our group, indicate that the topology of real networks is much richer than that of random graphs. In particular, the degree distribution of real networks is a power law, indicating a heterogeneous topology in which the majority of the nodes have a small degree, but there is a significant fraction of highly connected nodes that play an important role in the connectivity of the network. The scale-free topology of real networks has very important consequences for their functioning. For example, we have discovered that scale-free networks are extremely resilient to the random disruption of their nodes. On the other hand, the selective removal of the nodes with highest degree induces a rapid breakdown of the network into isolated subparts that cannot communicate with each other. The non-trivial scaling of the degree distribution of real networks is also an indication of their assembly and evolution. Indeed, our modeling studies have shown that there are general principles governing the evolution of networks. Most networks start from a small seed and grow by the addition of new nodes, which attach to the nodes already in the system. This process obeys preferential attachment: the new nodes are more likely to connect to nodes that already have high degree. We have proposed a simple model based on these two principles which was able to reproduce the power-law degree distribution of real networks. Perhaps even more importantly, this model paved the way to a new paradigm of network modeling that tries to capture the evolution of networks, not just their static topology.
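The growth-plus-preferential-attachment model sketched in this abstract is the Barabási–Albert model, and it is short enough to code directly. The sketch below uses the standard trick of sampling from a pool that repeats each node once per incident edge end, which makes uniform sampling degree-proportional; the parameter names and seed clique are illustrative choices.

```python
import random

def barabasi_albert(n: int, m: int) -> list[tuple[int, int]]:
    """Grow a network to n nodes; each new node attaches m edges to
    existing nodes chosen with probability proportional to degree.
    Assumes m >= 2 so the seed clique contains at least one edge."""
    edges = [(u, v) for u in range(m) for v in range(u + 1, m)]  # seed clique
    # Each node appears once per incident edge end, so a uniform choice
    # from this pool is degree-proportional (preferential attachment).
    pool = [node for edge in edges for node in edge]
    for new in range(m, n):
        targets = set()
        while len(targets) < m:
            targets.add(random.choice(pool))
        for t in targets:
            edges.append((new, t))
            pool += [new, t]
    return edges

edges = barabasi_albert(n=1000, m=2)  # degree sequence follows a power law
```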

18,415 citations

Journal ArticleDOI
TL;DR: Developments in this field are reviewed, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.
Abstract: Inspired by empirical studies of networked systems such as the Internet, social networks, and biological networks, researchers have in recent years developed a variety of techniques and models to help us understand or predict the behavior of these systems. Here we review developments in this field, including such concepts as the small-world effect, degree distributions, clustering, network correlations, random graph models, models of network growth and preferential attachment, and dynamical processes taking place on networks.

17,647 citations

Journal ArticleDOI
TL;DR: This article proposes a method for detecting communities, built around the idea of using centrality indices to find community boundaries, and tests it on computer-generated and real-world graphs whose community structure is already known and finds that the method detects this known structure with high sensitivity and reliability.
Abstract: A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known—a collaboration network and a food web—and find that it detects significant and informative community divisions in both cases.
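The centrality-based method described here is the Girvan–Newman algorithm, which repeatedly removes the edge of highest betweenness. A minimal sketch using networkx, whose community module ships an implementation, is below; the karate-club graph is just a convenient built-in test case, not data from the paper.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Zachary's karate club: a small real-world graph with known communities.
G = nx.karate_club_graph()

# girvan_newman removes the highest-betweenness edge, recomputes edge
# betweenness after each removal, and yields each successive split.
splits = girvan_newman(G)
first_split = next(splits)  # a tuple of two sets of nodes
print([sorted(c) for c in first_split])
```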

14,429 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
TL;DR: It is demonstrated that the algorithms proposed are highly effective at discovering community structure in both computer-generated and real-world network data, and can be used to shed light on the sometimes dauntingly complex structure of networked systems.
Abstract: We propose and study a set of algorithms for discovering community structure in networks-natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.
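The "measure for the strength of the community structure" proposed in this paper is modularity, Q = sum_i (e_ii - a_i^2), where e_ii is the fraction of edges falling inside community i and a_i the fraction of edge endpoints attached to it. A minimal sketch for undirected, unweighted graphs follows; the toy graph is an illustrative assumption.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity Q = sum_i (e_ii - a_i**2): e_ii is the
    fraction of edges inside community i, a_i the fraction of edge
    endpoints attached to community i (undirected, unweighted)."""
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    m = len(edges)
    e = [0.0] * len(communities)  # within-community edge fractions
    a = [0.0] * len(communities)  # endpoint fractions
    for u, v in edges:
        if label[u] == label[v]:
            e[label[u]] += 1 / m
        a[label[u]] += 1 / (2 * m)
        a[label[v]] += 1 / (2 * m)
    return sum(ei - ai ** 2 for ei, ai in zip(e, a))

# Two triangles joined by one bridge edge: an obvious two-community split.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 3))  # 0.357
```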

12,882 citations