
Showing papers by Michael Mitzenmacher published in 2007


Proceedings ArticleDOI
24 Oct 2007
TL;DR: This paper proposes the use of Traffic Dispersion Graphs (TDGs) as a way to monitor, analyze, and visualize network traffic.
Abstract: Monitoring network traffic and detecting unwanted applications has become a challenging problem, since many applications obfuscate their traffic using unregistered port numbers or payload encryption. Apart from some notable exceptions, most traffic monitoring tools use two types of approaches: (a) keeping traffic statistics such as packet sizes and interarrivals, flow counts, byte volumes, etc., or (b) analyzing packet content. In this paper, we propose the use of Traffic Dispersion Graphs (TDGs) as a way to monitor, analyze, and visualize network traffic. TDGs model the social behavior of hosts ("who talks to whom"), where the edges can be defined to represent different interactions (e.g. the exchange of a certain number or type of packets). With the introduction of TDGs, we are able to harness a wealth of tools and graph modeling techniques from a diverse set of disciplines.
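
As a rough illustration of the TDG idea (a minimal sketch, not the authors' implementation; the flow-record fields and the TCP-only edge filter are assumptions):

```python
# Minimal sketch of building a Traffic Dispersion Graph (TDG) from flow
# records: a directed edge src -> dst is added the first time a matching
# flow is observed between the pair. Fields and filter are illustrative.
from collections import namedtuple

Flow = namedtuple("Flow", ["src_ip", "dst_ip", "proto", "dst_port"])

def build_tdg(flows, edge_filter=lambda f: f.proto == "tcp"):
    edges = {}  # src_ip -> set of destination IPs
    for f in flows:
        if edge_filter(f):
            edges.setdefault(f.src_ip, set()).add(f.dst_ip)
    return edges

flows = [
    Flow("10.0.0.1", "10.0.0.2", "tcp", 80),
    Flow("10.0.0.1", "10.0.0.3", "tcp", 80),
    Flow("10.0.0.2", "10.0.0.1", "udp", 53),
]
print(build_tdg(flows))  # {'10.0.0.1': {'10.0.0.2', '10.0.0.3'}}
```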

205 citations


Journal ArticleDOI
TL;DR: This paper considers the capacity of binary deletion channels, where bits are deleted independently with probability d, and improves significantly upon the best previous framework used to obtain provable lower bounds on this capacity by utilizing a stronger definition of a typical output from the channel.
Abstract: This paper considers the capacity of binary deletion channels, where bits are deleted independently with probability d. It improves significantly upon the best previous framework used to obtain provable lower bounds on this capacity by utilizing a stronger definition of a typical output from the channel. The new results give the best known provable bounds on the capacity for all values of d. Moreover, the techniques presented here extend to yield lower bounds for channels with certain types of random insertions, namely, duplications, or combinations of duplications and deletions. To demonstrate these techniques in this context, two binary channels are analyzed: a channel where each transmitted bit is copied with probability ν and a channel where each transmitted bit is copied a geometrically distributed number of times.
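
For intuition, these channels are easy to simulate (an illustrative sketch, not the paper's analysis machinery; parameters are arbitrary):

```python
# Sketch: pass bits through an i.i.d. deletion channel (each bit deleted
# with probability d) and a duplication channel (each transmitted bit is
# copied once with probability nu). Illustrative only.
import random

def deletion_channel(bits, d, rng=random):
    return [b for b in bits if rng.random() >= d]

def duplication_channel(bits, nu, rng=random):
    out = []
    for b in bits:
        out.append(b)
        if rng.random() < nu:
            out.append(b)  # the transmitted bit is copied
    return out

random.seed(0)
x = [random.randint(0, 1) for _ in range(20)]
print(deletion_channel(x, d=0.1))
print(duplication_channel(x, nu=0.1))
```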

103 citations


Proceedings ArticleDOI
24 Jun 2007
TL;DR: Two upper bounds on the capacity of the i.i.d. binary deletion channel are presented, believed to be the first nontrivial upper bounds for this probabilistic deletion channel.
Abstract: We present two upper bounds on the capacity of the i.i.d. binary deletion channel, where each bit is independently deleted with a fixed probability d. The first can be numerically evaluated for any fixed d. The second provides an asymptotic upper bound as d goes to 1. These appear to be the first nontrivial upper bounds for this probabilistic deletion channel.
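
For context (an added note, not from the abstract): the trivial bound these results improve on follows from a genie argument, since revealing the deletion positions turns the channel into a binary erasure channel:

```latex
% Genie-aided comparison: side information can only increase capacity, so
C_{\mathrm{deletion}}(d) \;\le\; C_{\mathrm{BEC}}(d) \;=\; 1 - d.
```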

76 citations


01 Jan 2007
TL;DR: This work focuses on the potential of content-addressable memories and queueing techniques to provide a de-amortization of cuckoo hashing suitable for hardware, and in particular for high-performance routers.
Abstract: Cuckoo hashing combines multiple-choice hashing with the power to move elements, providing hash tables with very high space utilization and low probability of overflow. However, inserting a new object into such a hash table can take substantial time, requiring many elements to be moved. While these events are rare and the amortized performance of these data structures is excellent, this shortcoming is unacceptable in many applications, particularly those involving hardware router implementations. We address this difficulty, focusing on the potential of content-addressable memories and queueing techniques to provide a de-amortization of cuckoo hashing suitable for hardware, and in particular for high-performance routers.
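
A minimal software sketch of the de-amortization flavor described, with a small queue standing in for the content-addressable memory (table sizes, hash functions, and the per-operation move budget are assumptions):

```python
# Sketch: de-amortized cuckoo hashing. Each insert performs at most
# MOVE_BUDGET displacements; an item still unplaced waits in `pending`
# (modeling the CAM), so per-operation work stays bounded.
from collections import deque

SIZE, MOVE_BUDGET = 16, 4
t1, t2 = [None] * SIZE, [None] * SIZE
pending = deque()

def h1(key): return hash(("a", key)) % SIZE
def h2(key): return hash(("b", key)) % SIZE

def step(key):
    """Try to place key using at most MOVE_BUDGET evictions; return the
    key still in flight, or None if everything found a slot."""
    for _ in range(MOVE_BUDGET):
        for table, h in ((t1, h1), (t2, h2)):
            if table[h(key)] is None:
                table[h(key)] = key
                return None
        slot = h1(key)
        key, t1[slot] = t1[slot], key  # evict the current occupant
    return key

def insert(key):
    pending.append(key)
    leftover = step(pending.popleft())  # one bounded unit of work
    if leftover is not None:
        pending.append(leftover)

def lookup(key):
    return t1[h1(key)] == key or t2[h2(key)] == key or key in pending

for k in range(12):
    insert(k)
print(all(lookup(k) for k in range(12)))  # True
```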

51 citations


Journal ArticleDOI
TL;DR: It is shown via simulations that, in comparison with a standard Bloom filter, using the power of two choices can yield modest reductions in the false positive probability using the same amount of space and more hashing.
Abstract: We consider the combination of two ideas from the hashing literature: the power of two choices and Bloom filters. Specifically, we show via simulations that, in comparison with a standard Bloom filter, using the power of two choices can yield modest reductions in the false positive probability using the same amount of space and more hashing. While the improvements are sufficiently small that they may not be useful in most practical situations, the combination of ideas is instructive; in particular, it suggests that there may be ways to obtain improved results for Bloom filters while using the same basic approach they employ, as opposed to designing new, more complex data structures for the problem.
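
A hedged sketch of the natural way to combine the two ideas (each element is inserted with whichever of two hash-function groups sets fewer new bits, and a query accepts if either group fully matches; the exact scheme and parameters here are assumptions):

```python
# Sketch: Bloom filter with the power of two choices. Parameters are
# illustrative; note that lookups must check both hash groups.
import hashlib

M, K = 1024, 4  # filter bits, hashes per group
bits = [False] * M

def positions(group, item):
    digest = hashlib.sha256(f"{group}:{item}".encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M
            for i in range(K)]

def insert(item):
    cands = [positions(g, item) for g in (0, 1)]
    # Choose the group that would set fewer currently-unset bits.
    best = min(cands, key=lambda ps: sum(not bits[p] for p in ps))
    for p in best:
        bits[p] = True

def query(item):
    # The item may have used either group, so accept if either matches.
    return any(all(bits[p] for p in positions(g, item)) for g in (0, 1))

for w in ("alpha", "beta", "gamma"):
    insert(w)
print(query("alpha"), query("delta"))  # True, (almost surely) False
```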

45 citations


Proceedings ArticleDOI
24 Jun 2007
TL;DR: Under the segmentation assumption (disjoint b-bit segments with at most one error each), the paper demonstrates simple and computationally efficient deterministic encoding and decoding schemes that achieve a high provable rate even under worst-case errors.
Abstract: We consider deletion channels and insertion channels under an additional segmentation assumption: the input consists of disjoint segments of b consecutive bits, with at most one error per segment. Under this assumption, we demonstrate simple and computationally efficient deterministic encoding and decoding schemes that achieve a high provable rate even under worst-case errors. We also consider more complex schemes that experimentally achieve higher rates under random error.
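
To make the segmentation assumption concrete, the sketch below simulates such a channel (the per-segment deletion probability is an illustrative assumption; the paper's encoding and decoding schemes are not reproduced here):

```python
# Sketch: segmented deletion channel. The input is split into disjoint
# b-bit segments, and at most one bit is deleted per segment.
import random

def segmented_deletion(bits, b, p, rng=random):
    out = []
    for start in range(0, len(bits), b):
        seg = list(bits[start:start + b])
        if len(seg) == b and rng.random() < p:
            seg.pop(rng.randrange(b))  # at most one deletion per segment
        out.extend(seg)
    return out

random.seed(1)
x = [random.randint(0, 1) for _ in range(16)]
print(x)
print(segmented_deletion(x, b=4, p=0.5))
```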

43 citations


Proceedings ArticleDOI
05 Nov 2007
TL;DR: This approach, called history-based encoding, execution and addressing (HEXA), challenges the conventional assumption that graph data structures must store pointers of ⌈log2 n⌉ bits to identify successor nodes, and shows how the data structures can be organized so that implicit information can be used to locate successors, significantly reducing the amount of information that must be stored explicitly.
Abstract: Data structures representing directed graphs with edges labeled by symbols from a finite alphabet are used to implement packet processing algorithms in a variety of network applications. In this paper we present a novel approach to representing such data structures that significantly reduces the amount of memory required. This approach, called history-based encoding, execution and addressing (HEXA), challenges the conventional assumption that graph data structures must store pointers of ⌈log2 n⌉ bits to identify successor nodes. We show how the data structures can be organized so that implicit information can be used to locate successors, significantly reducing the amount of information that must be stored explicitly. We demonstrate that the binary tries used for IP route lookup can be implemented using just two bytes per stored prefix (roughly half the space required by Eatherton's tree bitmap data structure) and that string matching can be implemented using 20-30% of the space required by conventional data representations. Compact representations are useful because they allow the performance-critical part of packet processing algorithms to be implemented using fast, on-chip memory, eliminating the need to retrieve information from much slower off-chip memory. This can yield both substantially higher performance and lower power utilization. While enabling a compact representation, HEXA does not add significant complexity to graph traversal and update, thus maintaining high performance.
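
A toy rendering of the core HEXA idea (heavily simplified: trie nodes are identified by the input history that reaches them plus a few discriminator bits chosen to avoid hash collisions; the greedy assignment below stands in for the paper's more careful placement):

```python
# Toy HEXA-style addressing for a binary trie: a successor's memory
# address is computed from the traversal history, not read from a stored
# ceil(log2 n)-bit pointer. Discriminator bits resolve hash collisions.
import hashlib

TABLE_SIZE, DISC_BITS = 32, 3
paths = ["", "0", "1", "00", "01", "10", "101"]  # trie nodes by bit history

def addr(path, disc):
    digest = hashlib.md5(f"{path}|{disc}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_SIZE

used, disc_of = set(), {}
for path in paths:
    for disc in range(2 ** DISC_BITS):
        if addr(path, disc) not in used:
            used.add(addr(path, disc))
            disc_of[path] = disc
            break
    else:
        raise RuntimeError("no collision-free discriminator; grow the table")

node = ""  # traversal: follow input bits, computing addresses on the fly
for bit in "10":
    node += bit
    print(node, "->", addr(node, disc_of[node]))
```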

36 citations


Proceedings Article
01 Jan 2007
TL;DR: This paper addresses the practical problems of forming routing tables with imperfect node knowledge and churn, and examines query performance on non-Euclidean data sets.
Abstract: Routing substrates for overlay networks are an important building block for large distributed applications. Many existing substrates are based on a random identifier space and therefore do not respect node locality when routing data. This can lead to lower performance for locality-sensitive applications, such as web caching, distributed gaming, and resource discovery. This paper examines the problem of building a locality-aware routing substrate on top of a locality-based coordinate system, where the distance between coordinates approximates network latencies. As a starting point we take the scaled θ-routing proposal for geometric routing in a Euclidean space. We address the practical problems of forming routing tables with imperfect node knowledge and churn and examine query performance on non-Euclidean data sets.
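
As a generic illustration of routing over a latency-derived coordinate space (a plain greedy geometric step, not the scaled θ-routing scheme itself; the topology and coordinates are made up):

```python
# Sketch: greedy geometric forwarding over network coordinates; each hop
# forwards to the neighbor closest to the destination. Real schemes such
# as theta-routing add structure to guarantee delivery.
import math

coords = {"A": (0, 0), "B": (2, 1), "C": (4, 0), "D": (5, 3)}
neighbors = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}

def dist(u, v):
    return math.dist(coords[u], coords[v])

def greedy_route(src, dst):
    path, node = [src], src
    while node != dst:
        nxt = min(neighbors[node], key=lambda n: dist(n, dst))
        if dist(nxt, dst) >= dist(node, dst):
            break  # local minimum; a fallback mechanism would be needed
        path.append(nxt)
        node = nxt
    return path

print(greedy_route("A", "D"))  # ['A', 'B', 'D']
```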

30 citations


01 Jan 2007
TL;DR: In this article, the authors present an economically-principled generative model for Autonomous System graph connectivity, which incorporates a diffusion process to capture how ASes respond to direct and indirect externalities from changes in the network structure, which brings it closer to an equilibrium model.
Abstract: End-to-end packet delivery in the Internet is achieved through a system of interconnections between the network domains of independent entities called Autonomous Systems (ASes). Inter-domain connections are the result of a complex, dynamic process of negotiated business relationships between pairs of ASes. We present an economically-principled generative model for Autonomous System graph connectivity. While there is already a large literature devoted to understanding Internet connectivity at the AS level, many of these models are either static or based on generalized stochastics. In a thoughtful critique of such models, Li, Alderson, Doyle and Willinger [10] show that while many generative models reproduce certain statistical features of the AS graph, they fail to capture the good performance of realistic networks. In a study of the AS's intra-domain graph, Li, Alderson, Willinger and Doyle [11] define performance instead in terms of network throughput and show that it is very unlikely that randomized generative models will yield graphs that have the highly-optimized structure of real-world networks. The goal of this paper is to provide insight into the economic drivers that yield, over time, the rich and complex AS interconnection patterns that constitute today's Internet. Notable features of our model include the assignment of AS business models with an asymmetric gravity model of interdomain traffic demand [3], an explicit representation of AS utility that incorporates benefits for traffic routed, congestion costs, and payments between ASes, and a deterministic process for link revision that can cascade throughout the network. This is the first attempt at AS graph modeling that incorporates a diffusion process to capture how ASes respond to direct and indirect externalities from changes in the network structure, which brings it closer to an equilibrium model. We validate our model against other generative models. To do this, we define the social planner's problem, which is parameterized by the business models of the graph, and provide a method to compare earlier generative models with our model by optimizing the placement of business models on the network. We find that our model yields graphs that perform better than other dynamic generative models. We also show that our model yields a structured placement of nodes endogenously, where this placement generally reflects ASes' business models. This is some of the first evidence of the significance of the business competitive landscape in determining the structure of the AS graph.
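
To make the utility ingredients concrete (a stylized stand-in only; the functional forms, parameters, and congestion model below are my assumptions, not the paper's specification):

```python
# Stylized AS utility: benefit for traffic routed, minus congestion costs,
# plus net inter-AS payments. Linear benefit and quadratic congestion are
# illustrative choices.
def as_utility(traffic, capacity, payments_in, payments_out,
               benefit_per_unit=1.0, congestion_coeff=0.5):
    benefit = benefit_per_unit * traffic
    congestion_cost = congestion_coeff * (traffic / capacity) ** 2 * capacity
    return benefit - congestion_cost + payments_in - payments_out

# Example: 80 units routed over capacity 100, netting 5 in payments.
print(as_utility(80.0, 100.0, payments_in=10.0, payments_out=5.0))  # 53.0
```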

21 citations


01 Jan 2007
TL;DR: It is concluded that TDGs are powerful, useful, and can be implemented efficiently in hardware, and constitute a promising new chapter for network monitoring techniques.
Abstract: Monitoring network traffic and detecting unwanted applications has become a challenging problem, since many applications obfuscate their traffic using arbitrary port numbers or payload encryption. Apart from some notable exceptions, most traffic monitoring tools follow two types of approaches: (a) keeping traffic statistics such as packet sizes and inter-arrivals, flow counts, byte volumes, etc., or (b) analyzing packet content. In this work, we propose the use of Traffic Dispersion Graphs (TDGs) as a powerful way to monitor, analyze, and visualize network traffic. TDGs model the social behavior of hosts (“who talks to whom”), where the edges can be defined to represent different interactions (e.g. the exchange of a certain number or type of packets). With the introduction of TDGs, we are able to harness the wealth of tools and graph modeling techniques from a diverse set of disciplines. First, we fully explore the abilities of TDGs as an intuitive and visually powerful tool. Second, we demonstrate their usefulness in application classification and intrusion detection solutions. Finally, we provide a hardware-aware design and implementation for TDG-based techniques. We conclude that TDGs are powerful, useful, and can be implemented efficiently in hardware. They constitute a promising new chapter for network monitoring techniques.
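
Building on the adjacency-set sketch earlier, simple graph-level metrics of a TDG can serve as classification features (which metrics the authors actually use is not stated in this abstract; these are generic examples):

```python
# Sketch: TDG summary metrics usable as features for application
# classification. `edges` maps each source IP to its set of destinations.
def tdg_features(edges):
    out_degrees = [len(dsts) for dsts in edges.values()]
    nodes = set(edges) | {d for dsts in edges.values() for d in dsts}
    return {
        "num_nodes": len(nodes),
        "num_edges": sum(out_degrees),
        "max_out_degree": max(out_degrees, default=0),
        "avg_out_degree": sum(out_degrees) / max(len(edges), 1),
    }

example = {"10.0.0.1": {"10.0.0.2", "10.0.0.3"}, "10.0.0.2": {"10.0.0.1"}}
print(tdg_features(example))
```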

14 citations


Patent
21 Aug 2007
TL;DR: In this article, a method for finding an optimal path from a source to a destination is presented, where the possible paths from the source to the destination are represented as a stochastic graph of nodes connected by edges.
Abstract: A method finds an optimal path from a source to a destination. The possible paths from the source to the destination are represented as a stochastic graph of nodes connected by edges (210). Each edge has an independent probability distribution over a cost of the edge (220). A constraint for reaching the destination is defined (230). The graph is reduced to a relatively small set of deterministic minimum cost problems (240), which can be solved to determine an optimal path that maximizes a probability of reaching the destination within the constraint (250).
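
As a simple companion to the setting (not the patented reduction itself): given independent edge-cost distributions, the probability that one fixed path meets the deadline can be estimated by Monte Carlo, which helps explain why a deterministic reformulation is valuable:

```python
# Sketch: estimate the probability that a fixed path meets a deadline
# when each edge cost is an independent random variable. Distributions
# are illustrative; the patent instead reduces the stochastic problem to
# a small set of deterministic minimum-cost problems.
import random

def prob_within_deadline(edge_samplers, deadline, trials=100_000, rng=random):
    hits = sum(
        sum(sample(rng) for sample in edge_samplers) <= deadline
        for _ in range(trials)
    )
    return hits / trials

random.seed(0)
path = [lambda r: r.gauss(10, 2), lambda r: r.uniform(5, 15)]  # two edges
print(prob_within_deadline(path, deadline=22))
```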