scispace - formally typeset
Search or ask a question
Author

Ziv Bar-Yossef

Bio: Ziv Bar-Yossef is an academic researcher from Google. The author has contributed to research in topics: Communication complexity & Web page. The author has an hindex of 34, co-authored 57 publications receiving 5661 citations. Previous affiliations of Ziv Bar-Yossef include Hebrew University of Jerusalem & IBM.


Papers
More filters
Journal ArticleDOI
16 Nov 2002
TL;DR: This work presents a new method for proving strong lower bounds in communication complexity based on the notion of the conditional information complexity of a function, and shows that it also admits a direct sum theorem.
Abstract: We present a new method for proving strong lower bounds in communication complexity. This method is based on the notion of the conditional information complexity of a function which is the minimum amount of information about the inputs that has to be revealed by a communication protocol for the function. While conditional information complexity is a lower bound on the communication complexity, we show that it also admits a direct sum theorem. Direct sum decomposition reduces our task to that of proving (conditional) information complexity lower bounds for simple problems (such as the AND of two bits). For the latter, we develop novel techniques based on Hellinger distance and its generalizations.

724 citations

Journal ArticleDOI
TL;DR: A measure on graphs, the minrank, is identified, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes and for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDex codes.
Abstract: Motivated by a problem of transmitting supplemental data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R1,..., Rn. He holds an input x ∈ {0,01l}n and wishes to broadcast a single message so that each receiver Ri can recover the bit xi. Each Ri has prior side information about x, induced by a directed graph Grain nodes; Ri knows the bits of a; in the positions {j | (i,j) is an edge of G}.G is known to the sender and to the receivers. We call encoding schemes that achieve this goal INDEXcodes for {0,1}n with side information graph G. In this paper we identify a measure on graphs, the minrank, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes. We show that for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDEX codes. For arbitrary INDEX codes and arbitrary graphs, we obtain a lower bound in terms of the size of the maximum acyclic induced subgraph. This bound holds even for randomized codes, but has been shown not to be tight.

632 citations

Book ChapterDOI
13 Sep 2002
TL;DR: Three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± ?
Abstract: We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± ?. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.

561 citations

Proceedings ArticleDOI
06 Jan 2002
TL;DR: This work designs the first algorithm for the number of distinct elements in a data stream that achieves arbitrary approximation factors and develops the concept of list-efficient streaming algorithms that are essential to the design of efficient streaming algorithms through reductions.
Abstract: We introduce reductions in the streaming model as a tool in the design of streaming algorithms. We develop the concept of list-efficient streaming algorithms that are essential to the design of efficient streaming algorithms through reductions.Our results include a suite of list-efficient streaming algorithms for basic statistical primitives. Using the reduction paradigm along with these tools, we design streaming algorithms for approximately counting the number of triangles in a graph presented as a stream.A specific highlight of our work is the first algorithm for the number of distinct elements in a data stream that achieves arbitrary approximation factors. (Independently, Trevisan [Tre01] has solved this problem via a different approach; our algorithm has the advantage of being list-efficient.)

410 citations

Proceedings ArticleDOI
21 Oct 2006
TL;DR: A measure on graphs, the minrank, is identified, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes and for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDex codes.
Abstract: Motivated by a problem of transmitting data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R_1, . . . , R_n. He holds an input x \in {0, 1}^n and wishes to broadcast a single message so that each receiver R_i can recover the bit x_i. Each R_i has prior side information about x, induced by a directed graph G on n nodes; R_i knows the bits of x in the positions {j | (i, j) is an edge of G}. We call encoding schemes that achieve this goal INDEX codes for {0, 1}^n with side information graph G. In this paper we identify a measure on graphs, the minrank, which we conjecture to exactly characterize the minimum length of INDEX codes. We resolve the conjecture for certain natural classes of graphs. For arbitrary graphs, we show that the minrank bound is tight for both linear codes and certain classes of non-linear codes. For the general problem, we obtain a (weaker) lower bound that the length of an INDEX code for any graph G is at least the size of the maximum acyclic induced subgraph of G.

383 citations


Cited by
More filters
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Proceedings ArticleDOI
03 Jun 2002
TL;DR: The need for and research issues arising from a new model of data processing, where data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams are motivated.
Abstract: In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.

2,933 citations

01 Jan 2006
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].
Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations

Journal ArticleDOI
TL;DR: This purpose of this introductory paper is to introduce the Monte Carlo method with emphasis on probabilistic machine learning and review the main building blocks of modern Markov chain Monte Carlo simulation.
Abstract: This purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing and introduction to the remaining papers of this special issue. Lastly, it discusses new interesting research horizons.

2,579 citations

Proceedings ArticleDOI
Frank McSherry1, Kunal Talwar1
21 Oct 2007
TL;DR: It is shown that the recent notion of differential privacv, in addition to its own intrinsic virtue, can ensure that participants have limited effect on the outcome of the mechanism, and as a consequence have limited incentive to lie.
Abstract: We study the role that privacy-preserving algorithms, which prevent the leakage of specific information about participants, can play in the design of mechanisms for strategic agents, which must encourage players to honestly report information. Specifically, we show that the recent notion of differential privacv, in addition to its own intrinsic virtue, can ensure that participants have limited effect on the outcome of the mechanism, and as a consequence have limited incentive to lie. More precisely, mechanisms with differential privacy are approximate dominant strategy under arbitrary player utility functions, are automatically resilient to coalitions, and easily allow repeatability. We study several special cases of the unlimited supply auction problem, providing new results for digital goods auctions, attribute auctions, and auctions with arbitrary structural constraints on the prices. As an important prelude to developing a privacy-preserving auction mechanism, we introduce and study a generalization of previous privacy work that accommodates the high sensitivity of the auction setting, where a single participant may dramatically alter the optimal fixed price, and a slight change in the offered price may take the revenue from optimal to zero.

2,102 citations