Showing papers in "Internet Mathematics in 2004"


Journal ArticleDOI
TL;DR: This paper surveys the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.
Abstract: A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters allow false positives but the space savings often outweigh this drawback when the probability of an error is controlled. Bloom filters have been used in database applications since the 1970s, but only in recent years have they become popular in the networking literature. The aim of this paper is to survey the ways in which Bloom filters have been used and modified in a variety of network problems, with the aim of providing a unified mathematical and practical framework for understanding them and stimulating their use in future applications.

2,199 citations
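
For concreteness, here is a minimal Python sketch of the Bloom filter construction described above. The sizing formulas (m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions) are the standard ones; deriving the k hash positions from a single SHA-256 digest via double hashing is an illustrative choice, not something prescribed by the survey.

# Minimal Bloom filter sketch (illustrative; not code from the survey).
import hashlib
import math


class BloomFilter:
    def __init__(self, n_items, fp_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(1, int(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: k positions derived from one SHA-256 digest.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(n_items=1000, fp_rate=0.01)
for word in ("alice", "bob", "carol"):
    bf.add(word)
print("alice" in bf)    # True: members are always reported as present
print("mallory" in bf)  # usually False; a false positive occurs with probability ~1%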


Journal ArticleDOI
TL;DR: A rich and long history is traced of how lognormal distributions have arisen as a possible alternative to power law distributions across many fields, with a focus on the underlying generative models that lead to these distributions.
Abstract: Recently, I became interested in a current debate over whether file size distributions are best modelled by a power law distribution or a lognormal distribution. In trying to learn enough about these distributions to settle the question, I found a rich and long history, spanning many fields. Indeed, several recently proposed models from the computer science community have antecedents in work from decades ago. Here, I briefly survey some of this history, focusing on underlying generative models that lead to these distributions. One finding is that lognormal and power law distributions connect quite naturally, and hence, it is not surprising that lognormal distributions have arisen as a possible alternative to power law distributions across many fields.

1,787 citations
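
As a tiny illustration of the generative story the survey traces: a purely multiplicative process produces a lognormal, because the logarithm of the product is a sum of i.i.d. terms and the central limit theorem applies. The sketch below (numpy, arbitrary parameters) checks this numerically with uniform multiplicative factors.

# Sketch: X_t = F_1 * F_2 * ... * F_t with i.i.d. positive factors is
# approximately lognormal, since log X_t is a sum of i.i.d. terms (CLT).
# Parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
samples, steps = 100_000, 50
factors = rng.uniform(0.7, 1.4, size=(samples, steps))
log_x = np.log(factors).sum(axis=1)

# log X is close to normal even though each factor is uniform:
skew = np.mean((log_x - log_x.mean()) ** 3) / log_x.std() ** 3
print(f"mean={log_x.mean():.3f}  std={log_x.std():.3f}  skewness={skew:.3f} (about 0 for a normal)")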


Journal ArticleDOI
TL;DR: A comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, and suggested alternatives to the traditional solution methods.
Abstract: This paper serves as a companion or extension to the "Inside PageRank" paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.

910 citations
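
For reference alongside the survey, here is a minimal power-iteration PageRank in Python. The damping factor 0.85 and the uniform patch for dangling pages are conventional choices, not recommendations taken from the paper.

# Minimal PageRank by power iteration (illustrative sketch).
import numpy as np


def pagerank(adj, damping=0.85, tol=1e-12, max_iter=200):
    """adj[i][j] = 1 if page i links to page j."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)

    # Row-stochastic matrix; dangling pages link uniformly to everything.
    P = np.empty((n, n))
    for i in range(n):
        P[i] = A[i] / out_deg[i] if out_deg[i] > 0 else 1.0 / n

    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_pi = damping * (pi @ P) + (1.0 - damping) / n
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi


# Tiny 4-page web: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}, 3 -> {2}.
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0],
       [0, 0, 1, 0]]
print(pagerank(adj).round(3))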


Journal ArticleDOI
TL;DR: The clustering algorithms satisfy strong theoretical criteria and perform well in practice, and it is shown that the quality of the produced clusters is bounded by strong minimum cut and expansion criteria.
Abstract: In this paper, we introduce simple graph clustering methods based on minimum cuts within the graph. The clustering methods are general enough to apply to any kind of graph but are well suited for graphs where the link structure implies a notion of reference, similarity, or endorsement, such as web and citation graphs. We show that the quality of the produced clusters is bounded by strong minimum cut and expansion criteria. We also develop a framework for hierarchical clustering and present applications to real-world data. We conclude that the clustering algorithms satisfy strong theoretical criteria and perform well in practice.

380 citations
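
The clustering idea is easy to experiment with. The sketch below does recursive bipartitioning by global minimum cuts using networkx's Stoer-Wagner routine and a crude expansion-style stopping rule; it is a simplified stand-in for illustration, not the authors' exact procedure.

# Rough sketch: recursive bipartitioning by global minimum cut, stopping when
# the cheapest cut is expensive relative to the smaller side (a crude
# expansion-style criterion). A simplified stand-in, not the paper's algorithm.
import networkx as nx


def min_cut_clusters(G, min_size=3):
    if G.number_of_nodes() <= min_size:
        return [set(G.nodes())]
    if not nx.is_connected(G):
        return [set(c) for c in nx.connected_components(G)]
    cut_value, (part_a, part_b) = nx.stoer_wagner(G)
    if cut_value >= min(len(part_a), len(part_b)):
        return [set(G.nodes())]          # no cheap cut: keep as one cluster
    clusters = []
    for part in (part_a, part_b):
        clusters.extend(min_cut_clusters(G.subgraph(part).copy(), min_size))
    return clusters


# Two 5-cliques joined by a single edge should come apart into two clusters.
G = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))
G.add_edge(0, 5)
nx.set_edge_attributes(G, 1, "weight")   # Stoer-Wagner reads the "weight" attribute
print(min_cut_clusters(G))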


Journal ArticleDOI
TL;DR: It is shown that for certain families of random graphs with given expected degrees, the average distance is almost surely of order log n / log d̃, where d̃ is the weighted average of the sum of squares of the expected degrees.

Abstract: Random graph theory is used to examine the "small-world phenomenon" – any two strangers are connected through a short chain of mutual acquaintances. We will show that for certain families of random graphs with given expected degrees, the average distance is almost surely of order log n / log d̃, where d̃ is the weighted average of the sum of squares of the expected degrees. Of particular interest are power law random graphs in which the number of vertices of degree k is proportional to 1/k^β for some fixed exponent β. For the case of β > 3, we prove that the average distance of the power law graphs is almost surely of order log n / log d̃. However, many Internet, social, and citation networks are power law graphs with exponents in the range 2 < β < 3, for which the power law random graphs have average distance almost surely of order log log n, but have diameter of order log n (provided some mild constraints on the average distance and the maximum degree). In particular, these graphs contain a dense subgraph...

370 citations
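
To make the log n / log d̃ formula concrete, the sketch below builds a random graph with a given expected degree sequence (networkx's expected_degree_graph, a Chung-Lu style generator), computes d̃ as the sum of squared expected degrees divided by their sum, and compares the prediction with the measured average distance on the giant component. The expected-degree sequence is an arbitrary illustrative choice, so only order-of-magnitude agreement is expected.

# Numerical illustration of average distance ~ log n / log d~, where
# d~ = (sum of squared expected degrees) / (sum of expected degrees).
import math

import networkx as nx

n = 1000
w = [20.0] * (n // 4) + [3.0] * (3 * n // 4)         # expected degrees (arbitrary)

d_tilde = sum(x * x for x in w) / sum(w)              # second-order average degree
predicted = math.log(n) / math.log(d_tilde)

G = nx.expected_degree_graph(w, seed=1, selfloops=False)
giant = G.subgraph(max(nx.connected_components(G), key=len))
measured = nx.average_shortest_path_length(giant)

print(f"d~ = {d_tilde:.2f}   predicted order log n / log d~ = {predicted:.2f}   measured = {measured:.2f}")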


Journal ArticleDOI
TL;DR: Two basic characteristics of the LCD model are considered mathematically, namely robustness to random damage and vulnerability to malicious attack; it is shown that the LCD graph is much more robust than classical random graphs with the same number of edges, but also more vulnerable to attack.

Abstract: Recently many new "scale-free" random graph models have been introduced, motivated by the power-law degree sequences observed in many large-scale, real-world networks. Perhaps the best known, the Barabási-Albert model, has been extensively studied from heuristic and experimental points of view. Here we consider mathematically two basic characteristics of a precise version of this model, the LCD model, namely robustness to random damage, and vulnerability to malicious attack. We show that the LCD graph is much more robust than classical random graphs with the same number of edges, but also more vulnerable to attack. In particular, if vertices of the n-vertex LCD graph are deleted at random, then as long as any positive proportion remains, the graph induced on the remaining vertices has a component of order n. In contrast, if the deleted vertices are chosen maliciously, a constant fraction less than 1 can be deleted to destroy all large components. For the Barabási-Albert model, these questions have been st...

310 citations
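
The robustness/vulnerability contrast is easy to probe empirically. The sketch below uses networkx's Barabási-Albert generator as a stand-in for the LCD model (closely related, but not identical) and compares the giant component after deleting a fraction of vertices at random versus deleting the highest-degree vertices; all parameter values are arbitrary.

# Sketch: random deletion vs. targeted (highest-degree) deletion.
import random

import networkx as nx

random.seed(2)
n, m = 5000, 2
G = nx.barabasi_albert_graph(n, m, seed=2)


def giant_fraction(H):
    if H.number_of_nodes() == 0:
        return 0.0
    return max(len(c) for c in nx.connected_components(H)) / n


for frac in (0.5, 0.8):
    k = int(frac * n)
    random_victims = random.sample(list(G.nodes()), k)
    targeted_victims = sorted(G.nodes(), key=G.degree, reverse=True)[:k]

    H_random = G.copy()
    H_random.remove_nodes_from(random_victims)
    H_attack = G.copy()
    H_attack.remove_nodes_from(targeted_victims)

    print(f"delete {frac:.0%}:  random damage -> giant {giant_fraction(H_random):.3f},"
          f"  targeted attack -> giant {giant_fraction(H_attack):.3f}")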


Journal ArticleDOI
TL;DR: This work devises a version of randomized rounding that is incentive compatible, giving a truthful mechanism for combinatorial auctions with single parameter agents (e.g., "single minded bidders") that approximately maximizes the social value of the auction.

Abstract: Mechanism design seeks algorithms whose inputs are provided by selfish agents who would lie if it were to their advantage. Incentive-compatible mechanisms compel the agents to tell the truth by making it in their self-interest to do so. Often, as in combinatorial auctions, such mechanisms involve the solution of NP-hard problems. Unfortunately, approximation algorithms typically destroy incentive compatibility. Randomized rounding is a commonly used technique for designing approximation algorithms. We devise a version of randomized rounding that is incentive-compatible, giving a truthful mechanism for combinatorial auctions with single parameter agents (e.g., "single minded bidders") that approximately maximizes the social value of the auction. We discuss two orthogonal notions of truthfulness for a randomized mechanism–truthfulness with high probability and in expectation–and give a mechanism that achieves both simultaneously. We consider combinatorial auctions where multiple copies of many different item...

252 citations
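
Randomized rounding itself (solve a fractional relaxation, then include each item randomly with probability tied to its fractional value) is easy to demonstrate outside the auction setting. The sketch below applies the classic recipe to a tiny set-cover LP via scipy; it illustrates only the generic technique, not the paper's incentive-compatible mechanism, and the instance is made up for illustration.

# Generic randomized-rounding sketch on a tiny set-cover instance.
# NOT the paper's truthful auction mechanism; illustration of the technique only.
import math

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

universe = range(6)
sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}, {1, 4}]
costs = np.array([3.0, 1.0, 3.0, 2.0, 2.0])

# LP relaxation: minimize total cost subject to covering every element.
A = np.array([[1.0 if e in s else 0.0 for s in sets] for e in universe])
res = linprog(costs, A_ub=-A, b_ub=-np.ones(len(universe)), bounds=(0, 1))
x = res.x

# Round: include set j with probability min(1, x_j * ln|U|); repeat a few
# times and keep the cheapest feasible cover found.
best = None
for _ in range(20):
    picked = [j for j in range(len(sets))
              if rng.random() < min(1.0, x[j] * math.log(len(universe)))]
    covered = set().union(*(sets[j] for j in picked)) if picked else set()
    if covered >= set(universe):
        cost = costs[picked].sum()
        if best is None or cost < best[0]:
            best = (cost, picked)

print("LP value:", round(res.fun, 3), " best rounded cover:", best)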


Journal ArticleDOI
TL;DR: It is shown that (under certain conditions) the eigenvalues of the (normalized) Laplacian of a random power law graph follow the semicircle law while the spectrum of the adjacency matrix of a power law graph obeys the power law.

Abstract: In the study of the spectra of power law graphs, there are basically two competing approaches. One is to prove analogues of Wigner's semicircle law while the other predicts that the eigenvalues follow a power law distribution. Although the semicircle law and the power law have nothing in common, we will show that both approaches are essentially correct if one considers the appropriate matrices. We will show that (under certain conditions) the eigenvalues of the (normalized) Laplacian of a random power law graph follow the semicircle law while the spectrum of the adjacency matrix of a power law graph obeys the power law. Our results are based on the analysis of random graphs with given expected degrees and their relations to several key invariants. Of interest are a number of (new) values for the exponent β where phase transitions for eigenvalue distributions occur. The spectrum distributions have direct implications to numerous graph algorithms such as randomized algorithms that involve rapidly mixing Ma...

224 citations
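
The dichotomy can be seen numerically on a modest random graph with given expected degrees: the largest adjacency eigenvalues stick far out of the bulk, while the normalized Laplacian spectrum concentrates around 1. The sketch below uses networkx and numpy with an arbitrary power-law-ish weight sequence; it only illustrates the phenomenon, not the paper's proofs.

# Sketch: compare the adjacency spectrum with the normalized Laplacian
# spectrum for a random graph with given expected degrees.
import numpy as np
import networkx as nx

n, beta, i0 = 800, 2.5, 10
raw = [(i + i0) ** (-1.0 / (beta - 1)) for i in range(n)]
scale = 6.0 * n / sum(raw)                     # average expected degree about 6
w = [scale * r for r in raw]

G = nx.expected_degree_graph(w, seed=4, selfloops=False)
G.remove_nodes_from(list(nx.isolates(G)))      # avoid zero-degree vertices

A = nx.to_numpy_array(G)
L = nx.normalized_laplacian_matrix(G).toarray()
adj_eigs = np.sort(np.linalg.eigvalsh(A))
lap_eigs = np.linalg.eigvalsh(L)

print("largest adjacency eigenvalues (outliers tied to the largest degrees):", adj_eigs[-5:].round(2))
print("normalized Laplacian percentiles (bulk near 1):",
      np.percentile(lap_eigs, [5, 25, 50, 75, 95]).round(2))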


Journal ArticleDOI
TL;DR: The Recursive Forest File model, a new, dynamic generative user model that combines multiplicative models that generate lognormal distributions with recent work on random graph models for the web, explains the behavior of file size distributions, and may be useful for describing other power law phenomena in computer systems as well as other fields.
Abstract: In this paper, we introduce and analyze a new, dynamic generative user model to explain the behavior of file size distributions. Our Recursive Forest File model combines multiplicative models that generate lognormal distributions with recent work on random graph models for the web. Unlike similar previous work, our Recursive Forest File model allows new files to be created and old files to be deleted over time, and our analysis covers problematic issues such as correlation among file sizes. Moreover, our model allows natural variations where files that are copied or modified are more likely to be copied or modified subsequently. Previous empirical work suggests that file sizes tend to have a lognormal body but a Pareto tail. The Recursive Forest File model explains this behavior, yielding a double Pareto distribution, which has a Pareto tail but a body close to a lognormal. We believe the Recursive Forest model may be useful for describing other power law phenomena in computer systems as well as other fields.

152 citations
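
A stripped-down version of the underlying recipe (not the paper's exact Recursive Forest File process): run a multiplicative process for an exponentially distributed amount of time, so each size is lognormal given its age, while the mixture over ages develops Pareto-like tails. All parameters below are arbitrary.

# Sketch of the double Pareto recipe: log-size evolves like a random walk with
# drift, observed after an exponentially distributed age, giving a lognormal
# body with Pareto-like tails. Not the paper's exact model; parameters arbitrary.
import numpy as np

rng = np.random.default_rng(5)
n_files = 200_000
ages = rng.exponential(scale=20.0, size=n_files)
log_sizes = 8.0 + 0.1 * ages + 0.6 * np.sqrt(ages) * rng.standard_normal(n_files)
sizes = np.exp(log_sizes)

# The complementary CDF looks lognormal-shaped in the body but close to a
# straight line (power law) in the upper tail on log-log axes.
for q in (0.50, 0.90, 0.99, 0.999, 0.9999):
    print(f"P(size > {np.quantile(sizes, q):>14,.0f}) = {1 - q:g}")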


Journal ArticleDOI
TL;DR: A rigorous sensitivity analysis of Markov chains motivates a very efficient algorithm that incrementally computes good approximations to Google's PageRank as links evolve.
Abstract: We anticipate that future web search techniques will exploit changes in web structure and content. As a first step in this direction, we examine the problem of integrating observed changes in link structure into static hyperlink-based ranking computations. We present a very efficient algorithm to incrementally compute good approximations to Google's PageRank [Brin and Page 98], as links evolve. Our experiments reveal that this algorithm is both fast and yields excellent approximations to PageRank, even in light of large changes to the link structure. Our algorithm derives intuition and partial justification from a rigorous sensitivity analysis of Markov chains. Consider a regular Markov chain with stationary probability π, and suppose the transition probability into a state j is increased. We prove that this can only cause:
• π_j to increase – adding a link to a site can only cause the stationary probability of the target site to increase;
• the rank of j to improve – if the states are ordered according to thei...

97 citations
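
The monotonicity claim can be sanity-checked numerically: boost the transition probability into a state j from some source state (scaling the rest of that row down) and recompute the stationary distribution. The sketch below does this for a small random positive chain; it illustrates the statement, not the paper's incremental algorithm, and the chain is made up.

# Numerical sanity check: increasing the transition probability into state j
# (scaling the rest of that row down) should not decrease pi_j.
import numpy as np

rng = np.random.default_rng(6)


def stationary(P):
    """Stationary distribution of a regular row-stochastic matrix."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


n, j, source = 6, 2, 0
P = rng.random((n, n)) + 0.05            # strictly positive, hence regular
P /= P.sum(axis=1, keepdims=True)
pi_before = stationary(P)

Q = P.copy()
Q[source, j] += 0.3                      # more probability into state j ...
Q[source] /= Q[source].sum()             # ... the rest of the row scaled down
pi_after = stationary(Q)

print(f"pi_j before = {pi_before[j]:.4f}, after = {pi_after[j]:.4f} "
      "(per the result above, the second value should not be smaller)")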


Journal ArticleDOI
TL;DR: It is shown that for large k and t, the expected number of vertices of degree k is approximately d_k t, where as k → ∞, d_k ~ C k^(-1-β) for some constant C > 0.

Abstract: We study a dynamically evolving random graph which adds vertices and edges using preferential attachment and deletes vertices randomly. At time t, with probability α_1 > 0 we add a new vertex u_t and m random edges incident with u_t. The neighbours of u_t are chosen with probability proportional to degree. With probability α − α_1 ≥ 0 we add m random edges to existing vertices where the endpoints are chosen with probability proportional to degree. With probability 1 − α − α_0 we delete a random vertex, if there are vertices left to delete. With probability α_0 we delete m random edges. Assuming that α + α_1 + α_0 > 1 and α_0 is sufficiently small, we show that for large k and t, the expected number of vertices of degree k is approximately d_k t, where as k → ∞, d_k ~ C k^(-1-β) for a constant C > 0 and an exponent β determined by the model parameters. Note that β can take any value greater than 1.
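
The process is straightforward to simulate. The sketch below follows the described steps (probability α_1: add a vertex with m preferential edges; α − α_1: add m preferential edges; 1 − α − α_0: delete a random vertex; α_0: delete m random edges) with small arbitrary parameter values and an unoptimized preferential selection; it is an illustration, not the authors' code or analysis.

# Simulation sketch of the growth-deletion process described above
# (arbitrary small parameters; unoptimized; not the authors' code).
import random
from collections import Counter

import networkx as nx

random.seed(7)
m = 2
alpha, alpha1, alpha0 = 0.85, 0.60, 0.05
# add vertex: alpha1; add edges: alpha - alpha1; delete vertex: 1 - alpha - alpha0;
# delete edges: alpha0.  Here alpha + alpha1 + alpha0 = 1.5 > 1.

G = nx.complete_graph(m + 1)                 # small seed graph
next_label = m + 1


def preferential(G, k):
    """k endpoints chosen with probability proportional to degree."""
    nodes = list(G.nodes())
    weights = [G.degree(v) for v in nodes]
    if sum(weights) == 0:                    # safeguard if no edges are left
        return random.choices(nodes, k=k)
    return random.choices(nodes, weights=weights, k=k)


for _ in range(5000):
    r = random.random()
    if r < alpha1:                           # new vertex u_t with m preferential edges
        G.add_edges_from((next_label, t) for t in preferential(G, m))
        next_label += 1
    elif r < alpha:                          # m preferential edges among existing vertices
        ends = preferential(G, 2 * m)
        G.add_edges_from(zip(ends[::2], ends[1::2]))
    elif r < 1 - alpha0:                     # delete a random vertex
        if G.number_of_nodes() > 1:
            G.remove_node(random.choice(list(G.nodes())))
    else:                                    # delete m random edges
        if G.number_of_edges() >= m:
            G.remove_edges_from(random.sample(list(G.edges()), m))

hist = Counter(d for _, d in G.degree())
print("vertices:", G.number_of_nodes(), " edges:", G.number_of_edges())
print("degree counts for small k:", {k: hist[k] for k in sorted(hist)[:8]})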

Journal ArticleDOI
TL;DR: Six algorithmic problems that arise in web search engines and that are unsolved or only partially solved are described: uniformly sampling web pages; modeling the web graph; finding duplicate hosts; finding top gainers and losers in data streams; finding large dense bipartite graphs; and understanding how eigenvectors partition the web.

Abstract: In this paper, we describe six algorithmic problems that arise in web search engines and that are unsolved or only partially solved: (1) uniformly sampling web pages; (2) modeling the web graph; (3) finding duplicate hosts; (4) finding top gainers and losers in data streams; (5) finding large dense bipartite graphs; and (6) understanding how eigenvectors partition the web.

Journal ArticleDOI
TL;DR: A coupling technique for analyzing online models by using offline models that is especially effective for a growth-deletion model that generalizes and includes the preferential attachment model for generating large complex networks which simulate numerous realistic networks.
Abstract: We develop a coupling technique for analyzing online models by using offline models. This method is especially effective for a growth-deletion model that generalizes and includes the preferential attachment model for generating large complex networks which simulate numerous realistic networks. By coupling the online model with the offline model for random power law graphs, we derive strong bounds for a number of graph properties including diameter, average distances, connected components, and spectral bounds. For example, we prove that a power law graph generated by the growth-deletion model almost surely has diameter O(log n) and average distance O(log log n).

Journal ArticleDOI
TL;DR: Coupling techniques are used to show that in certain ways the LCD model is not too far from a standard random graph; in particular, the fractions of vertices that must be retained under an optimal attack in order to keep a giant component are within a constant factor for the scale-free and classical models.
Abstract: Recently many new "scale-free" random graph models have been introduced, motivated by the power-law degree sequences observed in many large-scale real-world networks. The most studied of these is the Barabási-Albert growth with "preferential attachment" model, made precise as the LCD model by the present authors. Here we use coupling techniques to show that in certain ways the LCD model is not too far from a standard random graph; in particular, the fractions of vertices that must be retained under an optimal attack in order to keep a giant component are within a constant factor for the scale-free and classical models.

Journal ArticleDOI
Jon Kleinberg
TL;DR: This work describes algorithms that yield provable guarantees for a particular problem of this type: detecting a network failure, and establishes a connection between graph separators and the notion of VC-dimension, using techniques based on matchings and disjoint paths.
Abstract: Measuring the properties of a large, unstructured network can be difficult: One may not have full knowledge of the network topology, and detailed global measurements may be infeasible. A valuable approach to such problems is to take measurements from selected locations within the network and then aggregate them to infer large-scale properties. One sees this notion applied in settings that range from Internet topology discovery tools to remote software agents that estimate the download times of popular web pages. Some of the most basic questions about this type of approach, however, are largely unresolved at an analytical level. How reliable are the results? How much does the choice of measurement locations affect the aggregate information one infers about the network? We describe algorithms that yield provable guarantees for a particular problem of this type: detecting a network failure. Suppose we want to detect events of the following form in an n-node network: An adversary destroys up to k nodes or edg...

Journal ArticleDOI
TL;DR: This work considers the problem of searching a randomly growing graph by a random walk, in two simple models of "web-graphs" where at each time step a new vertex is added and connected to the current graph by randomly chosen edges.
Abstract: We consider the problem of searching a randomly growing graph by a random walk. In particular we consider two simple models of "web-graphs." Thus at each time step a new vertex is added and it is connected to the current graph by randomly chosen edges. At the same time a "spider" S makes a number of steps of a random walk on the current graph. The parameter we consider is the expected proportion of vertices that have been visited by S up to time t.
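
The model is simple to simulate: grow the graph one vertex per step with a few random edges, let the spider take a fixed number of random-walk steps per time step, and track the fraction of vertices it has visited. Parameter values below are arbitrary, and the uniformly random edge choice is an assumption made for this sketch rather than the paper's specific variants.

# Simulation sketch: a "spider" doing a random walk on a growing graph.
import random

random.seed(8)
edges_per_vertex = 2     # edges connecting each new vertex to the current graph
walk_steps = 3           # random-walk steps the spider takes per time step
T = 20_000

adj = {0: [1], 1: [0]}   # seed graph: a single edge
visited = {0}
spider = 0

for t in range(2, T):
    # Add vertex t, joined to uniformly random existing vertices.
    adj[t] = []
    for _ in range(edges_per_vertex):
        u = random.randrange(t)
        adj[t].append(u)
        adj[u].append(t)
    # The spider takes a few random-walk steps on the current graph.
    for _ in range(walk_steps):
        spider = random.choice(adj[spider])
        visited.add(spider)

print(f"fraction of vertices visited after {T} steps: {len(visited) / len(adj):.3f}")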

Journal ArticleDOI
TL;DR: Using a new recursive technique, this work presents an explicit construction of an infinite family of N-superconcentrators of density 44, the most economical previously known explicit graphs of this type.
Abstract: Using a new recursive technique, we present an explicit construction of an infinite family of N-superconcentrators of density 44. The most economical previously known explicit graphs of this type have density around 60.

Journal ArticleDOI
TL;DR: It is proved that deterministic variations of the so-called copying model can lead to several nonisomorphic limits, and it is explained how limits of the copying model of the web graph share several properties with R that seem to reflect known properties of the web graph.
Abstract: Several stochastic models were proposed recently to model the dynamic evolution of the web graph. We study the infinite limits of the stochastic processes proposed to model the web graph when time goes to infinity. We prove that deterministic variations of the so-called copying model can lead to several nonisomorphic limits. Some models converge to the infinite random graph R, while the convergence of other models is sensitive to initial conditions or minor changes in the rules of the model. We explain how limits of the copying model of the web graph share several properties with R that seem to reflect known properties of the web graph.

Journal ArticleDOI
TL;DR: This paper shows that in a number of cases an online algorithm can in fact achieve a competitive ratio of 2 for rejections, and it gives matching Θ(√m) upper and lower bounds, where m is the number of edges, for arbitrary graphs with arbitrary edge capacities.
Abstract: Admission control (call control) is a well-studied online problem. We are given a fixed graph with edge capacities, and must process a sequence of calls that arrive over time, accepting some and rejecting others in order to stay within capacity limitations of the network. In the standard theoretical formulation, this problem is analyzed as a benefit problem: The goal is to devise an online algorithm that accepts at least a reasonable fraction of the maximum number of calls that could possibly have been accepted in hindsight. This formulation, however, has the property that even algorithms with optimal competitive ratios (typically O(log n) where n is the number of nodes) may end up rejecting the vast majority of calls even when it would have been possible in hindsight to reject only very few. In this paper, we instead consider the goal of approximately minimizing the number of calls rejected. This is much more appropriate for settings in which rejections are intended to be rare events. In order to avoid t...
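
To fix the setting, here is a sketch of the online admission problem with a naive greedy policy (accept a call whenever some path with spare capacity exists and route it along one such path). This is only the problem setup with a baseline policy, not the paper's algorithm or its analysis; the graph and call sequence are made up.

# Sketch of the online call-admission setting with a greedy baseline policy.
import networkx as nx

G = nx.cycle_graph(6)
nx.set_edge_attributes(G, 2, "capacity")
nx.set_edge_attributes(G, 0, "load")


def try_admit(G, s, t):
    # Consider only edges that still have spare capacity.
    free = [(u, v) for u, v, d in G.edges(data=True) if d["load"] < d["capacity"]]
    H = G.edge_subgraph(free)
    if s in H and t in H and nx.has_path(H, s, t):
        path = nx.shortest_path(H, s, t)
        for u, v in zip(path, path[1:]):
            G[u][v]["load"] += 1
        return True
    return False


calls = [(0, 3), (0, 3), (1, 4), (0, 3), (2, 5)]
print("accepted:", [try_admit(G, s, t) for s, t in calls])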

Journal ArticleDOI
TL;DR: In this article, the authors investigate the problem of extracting as much information as possible about the elements of a given subset X from the answers of a truthful adversary A. In particular, they investigate several aspects of this problem.
Abstract: We suppose we are given some fixed (but unknown) subset X of a set Ω = 𝔽_2^n, where 𝔽_2 denotes the field of two elements. Our goal is to learn as much as possible about the elements of X by asking certain binary questions. Each "question" Q is just some element of Ω, and the "answer" to Q is just the inner product Q · x ∈ 𝔽_2 for some x ∈ X. However, the choice of x is made by a truthful (but possibly malevolent) adversary A, who, we may assume, is trying to choose answers so as to yield as little information as possible about X. In this note, we investigate several aspects of this problem. In particular, we are interested in extracting as much information as possible about X from A's answers. Although A can prevent us from learning the identity of any particular element of X, with appropriate questions we can still learn quite a bit about X. We determine the maximum amount of information that can be recovered under these assumptions and describe explicit sets of questions for achieving this goal. For the c...
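
The query model itself is concrete enough to sketch: questions and hidden elements live in 𝔽_2^n, and each answer is an inner product mod 2 that the adversary may compute against any element of X. The adversary strategy below (answer with whichever bit more elements of X can justify) is a naive illustration of the mechanics, not the strategy or analysis from the paper; the set X and the questions are made up.

# Sketch of the query model over F_2^n: each question Q is answered with
# Q . x (mod 2) for some x in X chosen by the adversary.
import numpy as np

rng = np.random.default_rng(9)
n = 8
X = rng.integers(0, 2, size=(5, n))         # hidden subset X of F_2^n (5 elements)


def answer(Q):
    bits = (X @ Q) % 2                       # Q . x for every x in X
    zeros, ones = np.sum(bits == 0), np.sum(bits == 1)
    # Truthful but unhelpful: return whichever bit more elements of X attain.
    return 0 if zeros >= ones else 1


questions = np.eye(n, dtype=int)             # e.g., ask the n coordinate vectors
print("answers to coordinate questions:", [answer(q) for q in questions])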