Author

# Alejandro López-Ortiz

Other affiliations: Open Text Corporation, University of New Brunswick

Bio: Alejandro López-Ortiz is an academic researcher from University of Waterloo. The author has contributed to research in topic(s): Competitive analysis & List update problem. The author has an hindex of 33, co-authored 193 publication(s) receiving 3719 citation(s). Previous affiliations of Alejandro López-Ortiz include Open Text Corporation & University of New Brunswick.

##### Papers published on a yearly basis

##### Papers

More filters

••

[...]

TL;DR: In this article, the authors consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream and present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which is best possible in the worst case.

Abstract: We consider a router on the Internet analyzing the statistical properties of a TCP/IP packet stream. A fundamental difficulty with measuring traffic behavior on the Internet is that there is simply too much data to be recorded for later analysis, on the order of gigabytes a second. As a result, network routers can collect only relatively few statistics about the data. The central problem addressed here is to use the limited memory of routers to determine essential features of the network traffic stream. A particularly difficult and representative subproblem is to determine the top k categories to which the most packets belong, for a desired value of k and for a given notion of categorization such as the destination IP address.We present an algorithm that deterministically finds (in particular) all categories having a frequency above 1/(m+1) using m counters, which we prove is best possible in the worst case. We also present a sampling-based algorithm for the case that packet categories follow an arbitrary distribution, but their order over time is permuted uniformly at random. Under this model, our algorithm identifies flows above a frequency threshold of roughly 1/?nm with high probability, where m is the number of counters and n is the number of packets observed. This guarantee is not far off from the ideal of identifying all flows (probability 1/n), and we prove that it is best possible up to a logarithmic factor. We show that the algorithm ranks the identified flows according to frequency within any desired constant factor of accuracy.

521 citations

••

[...]

TL;DR: This work develops a framework for designing and evaluating adaptive algorithms in the comparison model, and presents adaptive algorithms that make no a priori assumptions about the problem instance, and show that their running times are within a constant factor of optimal with respect to a natural measure of the difficulty of an instance.

Abstract: Motivated by boolean queries in text database systems, we consider the problems of finding the intersection, union, or difference of a collection of sorted sets. While the worst-case complexity of these problems is straightforward , we consider a notion of complexity that depends on the particular instance. We develop the idea of a proof that a given set is indeed the correct answer. Proofs, and in particular shortest proofs, are characterized. We present adaptive algorithms that make no a priori assumptions about the problem instance, and show that their running times are within a constant factor of optimal with respect to a natural measure of the difficulty of an instance. In the process, we develop a framework for designing and evaluating adaptive algorithms in the comparison model. 1 Introduction and Overview Our work can be seen in the general context of performing searches quickly in a database or data warehousing environment. The broad issue is that of characterizing what type of join operations can be performed without scanning the relations involved or actually materializing intermediate relations. The specific problem addressed here can be seen in that context or in the context of performing a web search, or a search in another large text database, for documents containing some or all of a set of keywords. For each keyword we are given the set of references to documents in which it occurs [2, 6, 9]. These sets are stored in some natural order, such as document date. In practice, the sets are large. For example, the average word from user query logs matches approximately a million documents on the AltaVista web search engine. Of course, one would hope that the answer to the query is small, particularly if the query is an intersection. It may also be expected that the elements of such an intersection are not spread uniformly through the initial

188 citations

••

[...]

TL;DR: This paper presents a deterministic algorithm for identifying frequent items in sliding windows defined over real-time packet streams that uses limited memory, requires constant processing time per packet, makes only one pass over the data, and is shown to work well when tested on TCP traffic logs.

Abstract: Internet traffic patterns are believed to obey the power law, implying that most of the bandwidth is consumed by a small set of heavy users. Hence, queries that return a list of frequently occurring items are important in the analysis of real-time Internet packet streams. While several results exist for computing frequent item queries using limited memory in the infinite stream model, in this paper we consider the limited-memory sliding window model. This model maintains the last $N$ items that have arrived at any given time and forbids the storage of the entire window in memory. We present a deterministic algorithm for identifying frequent items in sliding windows defined over real-time packet streams. The algorithm uses limited memory, requires constant processing time per packet (amortized), makes only one pass over the data, and is shown to work well when tested on TCP traffic logs.

142 citations

••

[...]

TL;DR: In this article, the minimum number of required beacons on a network under a BGP-like routing policy is shown to be NP-hard and at best Ω(log n)-approximable.

Abstract: Internet topology information is only made available in aggregate form by standard routing protocols. Connectivity information and latency characteristics must therefore be inferred using indirect techniques. In this paper we consider measurements using a distributed set of measurement points or beacons. We show that computing the minimum number of required beacons on a network under a BGP-like routing policy is NP-hard and at best Ω(log n)-approximable. In the worst case at least (n-1)/3 and at most (n+1)/3 beacons are required for a network with n nodes. We then introduce some observations that allow us to propose a relatively small candidate set of beacons for the current Internet topology. The set proposed has properties with relevant applications for all-paths routing on the public Internet and performance based routing.

93 citations

•

[...]

TL;DR: This paper presents a fast, simple algorithm for bounds consistency propagation of the alldifferent constraint and shows that this algorithm outperforms existing bounds consistency algorithms and also outperforms--on problems with an easily identifiable property-state-ofthe-art commercial implementations of propagators for stronger forms of local consistency.

Abstract: In constraint programming one models a problem by stating constraints on acceptable solutions. The constraint model is then usually solved by interleaving backtracking search and constraint propagation. Previous studies have demonstrated that designing special purpose constraint propagators for commonly occurring constraints can significantly improve the efficiency of a constraint programming approach. In this paper we present a fast, simple algorithm for bounds consistency propagation of the alldifferent constraint. The algorithm has the same worst case behavior as the previous best algorithm but is much faster in practice. Using a variety of benchmark and random problems, we show that our algorithm outperforms existing bounds consistency algorithms and also outperforms--on problems with an easily identifiable property-state-ofthe-art commercial implementations of propagators for stronger forms of local consistency.

79 citations

##### Cited by

More filters

••

[...]

TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.

Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

6,328 citations

••

[...]

05 Nov 2003

TL;DR: This work study and evaluate link estimator, neighborhood table management, and reliable routing protocol techniques, and narrow the design space through evaluations on large-scale, high-level simulations to 50-node, in-depth empirical experiments.

Abstract: The dynamic and lossy nature of wireless communication poses major challenges to reliable, self-organizing multihop networks. These non-ideal characteristics are more problematic with the primitive, low-power radio transceivers found in sensor networks, and raise new issues that routing protocols must address. Link connectivity statistics should be captured dynamically through an efficient yet adaptive link estimator and routing decisions should exploit such connectivity statistics to achieve reliability. Link status and routing information must be maintained in a neighborhood table with constant space regardless of cell density. We study and evaluate link estimator, neighborhood table management, and reliable routing protocol techniques. We focus on a many-to-one, periodic data collection workload. We narrow the design space through evaluations on large-scale, high-level simulations to 50-node, in-depth empirical experiments. The most effective solution uses a simple time averaged EWMA estimator, frequency based table management, and cost-based routing.

1,726 citations

••

[...]

TL;DR: An efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner is proposed, called CVFDT, which stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate.

Abstract: Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over months or years, and the underlying processes generating them changed during this time, sometimes radically. Although a number of algorithms have been proposed for learning time-changing concepts, they generally do not scale well to very large databases. In this paper we propose an efficient algorithm for mining decision trees from continuously-changing data streams, based on the ultra-fast VFDT decision tree learner. This algorithm, called CVFDT, stays current while making the most of old data by growing an alternative subtree whenever an old one becomes questionable, and replacing the old with the new when the new becomes more accurate. CVFDT learns a model which is similar in accuracy to the one that would be learned by reapplying VFDT to a moving window of examples every time a new example arrives, but with O(1) complexity per example, as opposed to O(w), where w is the size of the window. Experiments on a set of large time-changing data streams demonstrate the utility of this approach.

1,667 citations

••

[...]

TL;DR: Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and associated applications, which rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity.

Abstract: In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [1].

1,527 citations

•

[...]

TL;DR: In this paper, the authors present a survey of basic mathematical foundations for data streaming systems, including basic mathematical ideas, basic algorithms, and basic algorithms and algorithms for data stream processing.

Abstract: 1 Introduction 2 Map 3 The Data Stream Phenomenon 4 Data Streaming: Formal Aspects 5 Foundations: Basic Mathematical Ideas 6 Foundations: Basic Algorithmic Techniques 7 Foundations: Summary 8 Streaming Systems 9 New Directions 10 Historic Notes 11 Concluding Remarks Acknowledgements References

1,489 citations