
Showing papers by Michael Mitzenmacher published in 2007


Proceedings ArticleDOI
24 Oct 2007
TL;DR: This paper proposes the use of Traffic Dispersion Graphs (TDGs) as a way to monitor, analyze, and visualize network traffic.
Abstract: Monitoring network traffic and detecting unwanted applications has become a challenging problem, since many applications obfuscate their traffic using unregistered port numbers or payload encryption. Apart from some notable exceptions, most traffic monitoring tools use two types of approaches: (a) keeping traffic statistics such as packet sizes and interarrivals, flow counts, byte volumes, etc., or (b) analyzing packet content. In this paper, we propose the use of Traffic Dispersion Graphs (TDGs) as a way to monitor, analyze, and visualize network traffic. TDGs model the social behavior of hosts ("who talks to whom"), where the edges can be defined to represent different interactions (e.g. the exchange of a certain number or type of packets). With the introduction of TDGs, we are able to harness a wealth of tools and graph modeling techniques from a diverse set of disciplines.
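
As a rough illustration of the TDG idea (a minimal sketch, not the authors' implementation; the flow-record fields and the TCP-only edge filter are assumptions):

```python
# Minimal sketch of building a Traffic Dispersion Graph (TDG) from flow
# records: a directed edge src -> dst is added the first time a matching
# flow is observed between the pair. Fields and filter are illustrative.
from collections import namedtuple

Flow = namedtuple("Flow", ["src_ip", "dst_ip", "proto", "dst_port"])

def build_tdg(flows, edge_filter=lambda f: f.proto == "tcp"):
    edges = {}  # src_ip -> set of destination IPs
    for f in flows:
        if edge_filter(f):
            edges.setdefault(f.src_ip, set()).add(f.dst_ip)
    return edges

flows = [
    Flow("10.0.0.1", "10.0.0.2", "tcp", 80),
    Flow("10.0.0.1", "10.0.0.3", "tcp", 80),
    Flow("10.0.0.2", "10.0.0.1", "udp", 53),
]
print(build_tdg(flows))  # {'10.0.0.1': {'10.0.0.2', '10.0.0.3'}}
```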

205 citations


Journal ArticleDOI
TL;DR: This paper considers the capacity of binary deletion channels, where bits are deleted independently with probability d, and improves significantly upon the best previous framework used to obtain provable lower bounds on this capacity by utilizing a stronger definition of a typical output from the channel.
Abstract: This paper considers the capacity of binary deletion channels, where bits are deleted independently with probability d. It improves significantly upon the best previous framework used to obtain provable lower bounds on this capacity by utilizing a stronger definition of a typical output from the channel. The new results give the best known provable bounds on the capacity for all values of d. Moreover, the techniques presented here extend to yield lower bounds for channels with certain types of random insertions, namely, duplications, or combinations of duplications and deletions. To demonstrate these techniques in this context, two binary channels are analyzed: a channel where each transmitted bit is copied with probability ν and a channel where each transmitted bit is copied a geometrically distributed number of times.
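
For intuition, these channels are easy to simulate (an illustrative sketch, not the paper's analysis machinery; parameters are arbitrary):

```python
# Sketch: pass bits through an i.i.d. deletion channel (each bit deleted
# with probability d) and a duplication channel (each transmitted bit is
# copied once with probability nu). Illustrative only.
import random

def deletion_channel(bits, d, rng=random):
    return [b for b in bits if rng.random() >= d]

def duplication_channel(bits, nu, rng=random):
    out = []
    for b in bits:
        out.append(b)
        if rng.random() < nu:
            out.append(b)  # the transmitted bit is copied
    return out

random.seed(0)
x = [random.randint(0, 1) for _ in range(20)]
print(deletion_channel(x, d=0.1))
print(duplication_channel(x, nu=0.1))
```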

103 citations


Proceedings ArticleDOI
24 Jun 2007
TL;DR: Two upper bounds on the capacity of the i.i.d. binary deletion channel are presented, believed to be the first nontrivial upper bounds for this probabilistic deletion channel.
Abstract: We present two upper bounds on the capacity of the i.i.d. binary deletion channel, where each bit is independently deleted with a fixed probability d. The first can be numerically evaluated for any fixed d. The second provides an asymptotic upper bound as d goes to 1. These appear to be the first nontrivial upper bounds for this probabilistic deletion channel.
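
For context (an added note, not from the abstract): the trivial bound these results improve on follows from a genie argument, since revealing the deletion positions turns the channel into a binary erasure channel:

```latex
% Genie-aided comparison: side information can only increase capacity, so
C_{\mathrm{deletion}}(d) \;\le\; C_{\mathrm{BEC}}(d) \;=\; 1 - d.
```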

76 citations


01 Jan 2007
TL;DR: This work focuses on the potential of content-addressable memories and queueing techniques to provide a de-amortization of cuckoo hashing suitable for hardware, and in particular for high-performance routers.
Abstract: Cuckoo hashing combines multiple-choice hashing with the power to move elements, providing hash tables with very high space utilization and low probability of overflow. However, inserting a new object into such a hash table can take substantial time, requiring many elements to be moved. While these events are rare and the amortized performance of these data structures is excellent, this shortcoming is unacceptable in many applications, particularly those involving hardware router implementations. We address this difficulty, focusing on the potential of content-addressable memories and queueing techniques to provide a de-amortization of cuckoo hashing suitable for hardware, and in particular for high-performance routers.
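
A minimal software sketch of the de-amortization flavor described, with a small queue standing in for the content-addressable memory (table sizes, hash functions, and the per-operation move budget are assumptions):

```python
# Sketch: de-amortized cuckoo hashing. Each insert performs at most
# MOVE_BUDGET displacements; an item still unplaced waits in `pending`
# (modeling the CAM), so per-operation work stays bounded.
from collections import deque

SIZE, MOVE_BUDGET = 16, 4
t1, t2 = [None] * SIZE, [None] * SIZE
pending = deque()

def h1(key): return hash(("a", key)) % SIZE
def h2(key): return hash(("b", key)) % SIZE

def step(key):
    """Try to place key using at most MOVE_BUDGET evictions; return the
    key still in flight, or None if everything found a slot."""
    for _ in range(MOVE_BUDGET):
        for table, h in ((t1, h1), (t2, h2)):
            if table[h(key)] is None:
                table[h(key)] = key
                return None
        slot = h1(key)
        key, t1[slot] = t1[slot], key  # evict the current occupant
    return key

def insert(key):
    pending.append(key)
    leftover = step(pending.popleft())  # one bounded unit of work
    if leftover is not None:
        pending.append(leftover)

def lookup(key):
    return t1[h1(key)] == key or t2[h2(key)] == key or key in pending

for k in range(12):
    insert(k)
print(all(lookup(k) for k in range(12)))  # True
```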

51 citations


Journal ArticleDOI
TL;DR: It is shown via simulations that, in comparison with a standard Bloom filter, using the power of two choices can yield modest reductions in the false positive probability using the same amount of space and more hashing.
Abstract: We consider the combination of two ideas from the hashing literature: the power of two choices and Bloom filters. Specifically, we show via simulations that, in comparison with a standard Bloom filter, using the power of two choices can yield modest reductions in the false positive probability using the same amount of space and more hashing. While the improvements are sufficiently small that they may not be useful in most practical situations, the combination of ideas is instructive; in particular, it suggests that there may be ways to obtain improved results for Bloom filters while using the same basic approach they employ, as opposed to designing new, more complex data structures for the problem.
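
A hedged sketch of the natural way to combine the two ideas (each element is inserted with whichever of two hash-function groups sets fewer new bits, and a query accepts if either group fully matches; the exact scheme and parameters here are assumptions):

```python
# Sketch: Bloom filter with the power of two choices. Parameters are
# illustrative; note that lookups must check both hash groups.
import hashlib

M, K = 1024, 4  # filter bits, hashes per group
bits = [False] * M

def positions(group, item):
    digest = hashlib.sha256(f"{group}:{item}".encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % M
            for i in range(K)]

def insert(item):
    cands = [positions(g, item) for g in (0, 1)]
    # Choose the group that would set fewer currently-unset bits.
    best = min(cands, key=lambda ps: sum(not bits[p] for p in ps))
    for p in best:
        bits[p] = True

def query(item):
    # The item may have used either group, so accept if either matches.
    return any(all(bits[p] for p in positions(g, item)) for g in (0, 1))

for w in ("alpha", "beta", "gamma"):
    insert(w)
print(query("alpha"), query("delta"))  # True, (almost surely) False
```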

45 citations


Proceedings ArticleDOI
24 Jun 2007
TL;DR: Under the segmentation assumption (disjoint b-bit segments with at most one error each), the paper demonstrates simple and computationally efficient deterministic encoding and decoding schemes that achieve a high provable rate even under worst-case errors.
Abstract: We consider deletion channels and insertion channels under an additional segmentation assumption: the input consists of disjoint segments of b consecutive bits, with at most one error per segment. Under this assumption, we demonstrate simple and computationally efficient deterministic encoding and decoding schemes that achieve a high provable rate even under worst-case errors. We also consider more complex schemes that experimentally achieve higher rates under random error.
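
To make the segmentation assumption concrete, the sketch below simulates such a channel (the per-segment deletion probability is an illustrative assumption; the paper's encoding and decoding schemes are not reproduced here):

```python
# Sketch: segmented deletion channel. The input is split into disjoint
# b-bit segments, and at most one bit is deleted per segment.
import random

def segmented_deletion(bits, b, p, rng=random):
    out = []
    for start in range(0, len(bits), b):
        seg = list(bits[start:start + b])
        if len(seg) == b and rng.random() < p:
            seg.pop(rng.randrange(b))  # at most one deletion per segment
        out.extend(seg)
    return out

random.seed(1)
x = [random.randint(0, 1) for _ in range(16)]
print(x)
print(segmented_deletion(x, b=4, p=0.5))
```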

43 citations


Proceedings ArticleDOI
05 Nov 2007
TL;DR: This approach, called history-based encoding, execution and addressing (HEXA), challenges the conventional assumption that graph data structures must store pointers of ⌈log2 n⌉ bits to identify successor nodes, and shows how the data structures can be organized so that implicit information can be used to locate successors, significantly reducing the amount of information that must be stored explicitly.
Abstract: Data structures representing directed graphs with edges labeled by symbols from a finite alphabet are used to implement packet processing algorithms in a variety of network applications. In this paper we present a novel approach to representing such data structures that significantly reduces the amount of memory required. This approach, called history-based encoding, execution and addressing (HEXA), challenges the conventional assumption that graph data structures must store pointers of ⌈log2 n⌉ bits to identify successor nodes. We show how the data structures can be organized so that implicit information can be used to locate successors, significantly reducing the amount of information that must be stored explicitly. We demonstrate that the binary tries used for IP route lookup can be implemented using just two bytes per stored prefix (roughly half the space required by Eatherton's tree bitmap data structure) and that string matching can be implemented using 20-30% of the space required by conventional data representations. Compact representations are useful because they allow the performance-critical part of packet processing algorithms to be implemented using fast, on-chip memory, eliminating the need to retrieve information from much slower off-chip memory. This can yield both substantially higher performance and lower power utilization. While enabling a compact representation, HEXA does not add significant complexity to graph traversal and update, thus maintaining high performance.
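
A toy rendering of the core HEXA idea (heavily simplified: trie nodes are identified by the input history that reaches them plus a few discriminator bits chosen to avoid hash collisions; the greedy assignment below stands in for the paper's more careful placement):

```python
# Toy HEXA-style addressing for a binary trie: a successor's memory
# address is computed from the traversal history, not read from a stored
# ceil(log2 n)-bit pointer. Discriminator bits resolve hash collisions.
import hashlib

TABLE_SIZE, DISC_BITS = 32, 3
paths = ["", "0", "1", "00", "01", "10", "101"]  # trie nodes by bit history

def addr(path, disc):
    digest = hashlib.md5(f"{path}|{disc}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_SIZE

used, disc_of = set(), {}
for path in paths:
    for disc in range(2 ** DISC_BITS):
        if addr(path, disc) not in used:
            used.add(addr(path, disc))
            disc_of[path] = disc
            break
    else:
        raise RuntimeError("no collision-free discriminator; grow the table")

node = ""  # traversal: follow input bits, computing addresses on the fly
for bit in "10":
    node += bit
    print(node, "->", addr(node, disc_of[node]))
```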

36 citations


Proceedings Article
01 Jan 2007
TL;DR: This paper addresses the practical problems of forming routing tables with imperfect node knowledge and churn, and examines query performance on non-Euclidean data sets.
Abstract: Routing substrates for overlay networks are an important building block for large distributed applications. Many existing substrates are based on a random identifier space and therefore do not respect node locality when routing data. This can lead to lower performance for locality-sensitive applications, such as web caching, distributed gaming, and resource discovery. This paper examines the problem of building a locality-aware routing substrate on top of a locality-based coordinate system, where the distance between coordinates approximates network latencies. As a starting point we take the scaled θ-routing proposal for geometric routing in a Euclidean space. We address the practical problems of forming routing tables with imperfect node knowledge and churn and examine query performance on non-Euclidean data sets.
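
As a generic illustration of routing over a latency-derived coordinate space (a plain greedy geometric step, not the scaled θ-routing scheme itself; the topology and coordinates are made up):

```python
# Sketch: greedy geometric forwarding over network coordinates; each hop
# forwards to the neighbor closest to the destination. Real schemes such
# as theta-routing add structure to guarantee delivery.
import math

coords = {"A": (0, 0), "B": (2, 1), "C": (4, 0), "D": (5, 3)}
neighbors = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}

def dist(u, v):
    return math.dist(coords[u], coords[v])

def greedy_route(src, dst):
    path, node = [src], src
    while node != dst:
        nxt = min(neighbors[node], key=lambda n: dist(n, dst))
        if dist(nxt, dst) >= dist(node, dst):
            break  # local minimum; a fallback mechanism would be needed
        path.append(nxt)
        node = nxt
    return path

print(greedy_route("A", "D"))  # ['A', 'B', 'D']
```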

30 citations


01 Jan 2007
TL;DR: In this article, the authors present an economically-principled generative model for Autonomous System graph connectivity, which incorporates a diffusion process to capture how ASes respond to direct and indirect externalities from changes in the network structure, which brings it closer to an equilibrium model.
Abstract: End-to-end packet delivery in the Internet is achieved through a system of interconnections between the network domains of independent entities called Autonomous Systems (ASes). Inter-domain connections are the result of a complex, dynamic process of negotiated business relationships between pairs of ASes. We present an economically-principled generative model for Autonomous System graph connectivity. While there is already a large literature devoted to understanding Internet connectivity at the AS level, many of these models are either static or based on generalized stochastics. In a thoughtful critique of such models, Li, Alderson, Doyle and Willinger [10] show that while many generative models reproduce certain statistical features of the AS graph, they fail to capture the good performance of realistic networks. In a study of the AS's intra-domain graph, Li, Alderson, Willinger and Doyle [11] define performance instead in terms of network throughput and show that it is very unlikely that randomized generative models will yield graphs that have the highly-optimized structure of real-world networks. The goal of this paper is to provide insight into the economic drivers that yield, over time, the rich and complex AS interconnection patterns that constitute today's Internet. Notable features of our model include the assignment of AS business models with an asymmetric gravity model of interdomain traffic demand [3], an explicit representation of AS utility that incorporates benefits for traffic routed, congestion costs, and payments between ASes, and a deterministic process for link revision that can cascade throughout the network. This is the first attempt at AS graph modeling that incorporates a diffusion process to capture how ASes respond to direct and indirect externalities from changes in the network structure, which brings it closer to an equilibrium model. We validate our model against other generative models. To do this, we define the social planner's problem, which is parameterized by the business models of the graph, and provide a method to compare earlier generative models with our model by optimizing the placement of business models on the network. We find that our model yields graphs that perform better than other dynamic generative models. We also show that our model yields a structured placement of nodes endogenously, where this placement generally reflects ASes' business models. This is some of the first evidence of the significance of the business competitive landscape in determining the structure of the AS graph.
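
To make the utility ingredients concrete (a stylized stand-in only; the functional forms, parameters, and congestion model below are my assumptions, not the paper's specification):

```python
# Stylized AS utility: benefit for traffic routed, minus congestion costs,
# plus net inter-AS payments. Linear benefit and quadratic congestion are
# illustrative choices.
def as_utility(traffic, capacity, payments_in, payments_out,
               benefit_per_unit=1.0, congestion_coeff=0.5):
    benefit = benefit_per_unit * traffic
    congestion_cost = congestion_coeff * (traffic / capacity) ** 2 * capacity
    return benefit - congestion_cost + payments_in - payments_out

# Example: 80 units routed over capacity 100, netting 5 in payments.
print(as_utility(80.0, 100.0, payments_in=10.0, payments_out=5.0))  # 53.0
```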

21 citations


01 Jan 2007
TL;DR: It is concluded that TDGs are powerful, useful, and can be implemented efficiently in hardware, and constitute a promising new chapter for network monitoring techniques.
Abstract: Monitoring network traffic and detecting unwanted applications has become a challenging problem, since many applications obfuscate their traffic using arbitrary port numbers or payload encryption. Apart from some notable exceptions, most traffic monitoring tools follow two types of approaches: (a) keeping traffic statistics such as packet sizes and inter-arrivals, flow counts, byte volumes, etc., or (b) analyzing packet content. In this work, we propose the use of Traffic Dispersion Graphs (TDGs) as a powerful way to monitor, analyze, and visualize network traffic. TDGs model the social behavior of hosts (“who talks to whom”), where the edges can be defined to represent different interactions (e.g. the exchange of a certain number or type of packets). With the introduction of TDGs, we are able to harness the wealth of tools and graph modeling techniques from a diverse set of disciplines. First, we fully explore the abilities of TDGs as an intuitive and visually powerful tool. Second, we demonstrate their usefulness in application classification and intrusion detection solutions. Finally, we provide a hardware-aware design and implementation for TDG-based techniques. We conclude that TDGs are powerful, useful, and can be implemented efficiently in hardware. They constitute a promising new chapter for network monitoring techniques.
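
Building on the adjacency-set sketch earlier, simple graph-level metrics of a TDG can serve as classification features (which metrics the authors actually use is not stated in this abstract; these are generic examples):

```python
# Sketch: TDG summary metrics usable as features for application
# classification. `edges` maps each source IP to its set of destinations.
def tdg_features(edges):
    out_degrees = [len(dsts) for dsts in edges.values()]
    nodes = set(edges) | {d for dsts in edges.values() for d in dsts}
    return {
        "num_nodes": len(nodes),
        "num_edges": sum(out_degrees),
        "max_out_degree": max(out_degrees, default=0),
        "avg_out_degree": sum(out_degrees) / max(len(edges), 1),
    }

example = {"10.0.0.1": {"10.0.0.2", "10.0.0.3"}, "10.0.0.2": {"10.0.0.1"}}
print(tdg_features(example))
```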

14 citations


Patent
21 Aug 2007
TL;DR: In this article, a method for finding an optimal path from a source to a destination is presented, where the possible paths from the source to the destination are represented as a stochastic graph of nodes connected by edges.
Abstract: A method finds an optimal path from a source to a destination. The possible paths from the source to the destination are represented as a stochastic graph of nodes connected by edges (210). Each edge has an independent probability distribution over a cost of the edge (220). A constraint for reaching the destination is defined (230). The graph is reduced to a relatively small set of deterministic minimum cost problems (240), which can be solved to determine an optimal path that maximizes a probability of reaching the destination within the constraint (250).
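
As a simple companion to the setting (not the patented reduction itself): given independent edge-cost distributions, the probability that one fixed path meets the deadline can be estimated by Monte Carlo, which helps explain why a deterministic reformulation is valuable:

```python
# Sketch: estimate the probability that a fixed path meets a deadline
# when each edge cost is an independent random variable. Distributions
# are illustrative; the patent instead reduces the stochastic problem to
# a small set of deterministic minimum-cost problems.
import random

def prob_within_deadline(edge_samplers, deadline, trials=100_000, rng=random):
    hits = sum(
        sum(sample(rng) for sample in edge_samplers) <= deadline
        for _ in range(trials)
    )
    return hits / trials

random.seed(0)
path = [lambda r: r.gauss(10, 2), lambda r: r.uniform(5, 15)]  # two edges
print(prob_within_deadline(path, deadline=22))
```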