Proceedings ArticleDOI

Information-theoretic analysis of function computation on streams

01 Sep 2010 - pp. 1147-1152
TL;DR: This paper extends the class of source distributions for which previously derived outer bounds on the rate region are achievable for all functions, and thereby characterizes the memory requirement for the associated streaming computation problem.
Abstract: We consider the problem of determining the memory required to compute functions of data streams. A streaming system with a memory constraint has to observe a collection of sources X_1, X_2, …, X_m sequentially, store synopses of the sources in memory, and compute a function of the sources based on the synopses. We are interested in the memory requirement: the number of bits of memory required to compute the function. In an earlier work, we established a correspondence between this problem and a functional source coding problem in cascade/line networks, and for the latter we derived inner and outer bounds on the rate region. In particular, we showed that the outer bounds are achievable for all functions and certain classes of distributions on the sources. In this paper we extend the class of distributions for which the outer bounds are achieved. By virtue of the correspondence between the two problems, this also characterizes the memory requirement for the associated streaming computation problem.
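
To make the streaming model concrete, here is a minimal Python sketch of the setup described in the abstract. It is an illustrative rendering only: the generic stream_compute driver and the running-maximum example are assumptions chosen for exposition, not the paper's scheme or notation.

    # Toy model of a memory-constrained streaming system: sources are
    # observed sequentially, only a small synopsis is kept in memory,
    # and the target function is computed from the synopsis alone.

    def stream_compute(sources, init, update, finish):
        """Observe each source in turn, maintaining only a synopsis."""
        synopsis = init()
        for source in sources:       # sources X_1, ..., X_m arrive in order
            for symbol in source:    # each source is itself a stream
                synopsis = update(synopsis, symbol)
        return finish(synopsis)

    # Example: the maximum over all sources needs O(1) stored symbols,
    # far less than buffering every observation.
    sources = [[3, 1, 4], [1, 5, 9], [2, 6]]
    print(stream_compute(sources,
                         init=lambda: float("-inf"),
                         update=max,
                         finish=lambda s: s))    # prints 9
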
Citations
Journal ArticleDOI
TL;DR: This paper establishes the capacity region for a class of source coding function computation setups, where sources of information are available at the nodes of a tree and where a function of these sources must be computed at its root.
Abstract: This paper establishes the capacity region for a class of source coding function computation setups, where sources of information are available at the nodes of a tree and where a function of these sources must be computed at its root. The capacity region holds for any function as long as the sources’ joint distribution satisfies a certain Markov criterion. This criterion is met, in particular, when the sources are independent. This result recovers the capacity regions of several function computation setups. These include the point-to-point communication setting with arbitrary sources, the noiseless multiple access network with conditionally independent sources, and the cascade network with Markovian sources.

32 citations

Posted Content
TL;DR: A general inner bound to the three-dimensional rate region of this problem is provided and shown to be tight in a number of interesting settings: partially invertible functions, full cooperation, one-round point-to-point communication, two-round point-to-point communication, and cascade.
Abstract: A receiver wants to compute a function of two correlated sources separately observed by two transmitters. One of the transmitters may send a possibly private message to the other transmitter in a cooperation phase before both transmitters communicate to the receiver. For this network configuration this paper investigates both a function computation setup, wherein the receiver wants to compute a given function of the sources exactly, and a rate distortion setup, wherein the receiver wants to compute a given function within some distortion. For the function computation setup, a general inner bound to the rate region is established and shown to be tight in a number of cases: partially invertible functions, full cooperation between transmitters, one-round point-to-point communication, two-round point-to-point communication, and the cascade setup where the transmitters and the receiver are aligned. In particular it is shown that the ratio of the total number of transmitted bits without cooperation to the total number of transmitted bits with cooperation can be arbitrarily large. Furthermore, one bit of cooperation suffices to arbitrarily reduce the amount of information both transmitters need to convey to the receiver. For the rate distortion version, an inner bound to the rate region is exhibited which always includes, and sometimes strictly contains, the convex hull of Kaspi-Berger's related inner bounds. The strict inclusion is shown via two examples.

12 citations

Proceedings ArticleDOI
01 Jul 2012
TL;DR: In this article, a general inner bound to the three-dimensional rate region is provided and shown to be tight in a number of interesting settings: partially invertible functions, full cooperation, one-round point-to-point communication, two-round point-to-point communication, and cascade.
Abstract: A receiver wants to compute a function of two correlated sources separately observed by two transmitters. One of the transmitters is allowed to cooperate with the other transmitter by sending it some data before both transmitters convey information to the receiver. Assuming noiseless communication, what is the minimum number of bits that needs to be communicated by each transmitter to the receiver for a given number of cooperation bits? In this paper, first a general inner bound to the above three dimensional rate region is provided and shown to be tight in a number of interesting settings: the function is partially invertible, full cooperation, one-round point-to-point communication, two-round point-to-point communication, and cascade. Second, the related Kaspi-Berger rate distortion problem is investigated where the receiver now wants to recover the sources within some distortion. By using ideas developed for establishing the above inner bound, a new rate distortion inner bound is proposed. This bound always includes the time sharing of Kaspi-Berger's inner bounds and inclusion is strict in certain cases.

12 citations

Posted Content
TL;DR: In this paper, the authors derived inner and outer bounds to the rate region of this problem which coincide in the cases where f is partially invertible and where the sources are independent given the side information.
Abstract: A receiver wants to compute a function f of two correlated sources X and Y and side information Z. What is the minimum number of bits that needs to be communicated by each transmitter? In this paper, we derive inner and outer bounds to the rate region of this problem which coincide in the cases where f is partially invertible and where the sources are independent given the side information. From the former case we recover the Slepian-Wolf rate region and from the latter case we recover Orlitsky and Roche’s single source result.
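
For context, the Slepian-Wolf rate region recovered in the partially invertible case is the classical lossless distributed-coding region, stated below in standard notation (this formulation is supplied here for the reader's convenience, not quoted from the paper):

    % Slepian-Wolf region for lossless distributed coding of correlated
    % sources (X, Y) encoded at rates (R_X, R_Y):
    R_X \ge H(X \mid Y),
    R_Y \ge H(Y \mid X),
    R_X + R_Y \ge H(X, Y).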

9 citations


Cites methods from "Information-theoretic analysis of f..."

  • ...Also, coding schemes and converses established in [13] have been used in other network configurations, such as cascade networks [3], [17]....


Proceedings ArticleDOI
01 Sep 2012
TL;DR: The main result is an inner bound to the rate region of this problem which is tight when X - Y - Z forms a Markov chain.
Abstract: A transmitter has access to X, a relay has access to Y, and a receiver has access to Z and wants to compute a given function ƒ(X, Y, Z). How many bits must be transmitted from the transmitter to the relay and from the relay to the receiver so that the latter can reliably recover ƒ(X, Y, Z)? The main result is an inner bound to the rate region of this problem which is tight when X - Y - Z forms a Markov chain.

8 citations

References
Journal ArticleDOI
TL;DR: Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and associated applications, which rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity.
Abstract: In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints on space, time, and number of passes. Some of the methods rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity. The applications for this scenario include IP network traffic analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated version of [1].

1,598 citations


"Information-theoretic analysis of f..." refers background or methods in this paper

  • ...To illustrate the gains that are possible when the synopsis is designed cleverly, we quote the following puzzle that was presented in [1] for the purpose of introducing the problem area: Consider a sequence of distinct integers x_1, x_2, .... (A sketch of the standard solution appears after these excerpts.)


  • ...The problem of computing functions of streaming data under a memory constraint on the synopsis has been the subject of extensive research in the past decade in theoretical computer science and database communities (see survey [1])....

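The excerpt above is cut off, but the puzzle in [1] is commonly stated as the missing-number problem: a permutation of {1, …, n} arrives with one number missing, and it must be identified without storing the stream. Below is a minimal Python sketch of the standard synopsis, a running sum occupying O(log n) bits, assuming that formulation:

    # Missing-number puzzle (assumed formulation from [1]): the stream is
    # a permutation of {1, ..., n} with exactly one element missing.
    # A single running sum suffices as the synopsis.

    def find_missing(stream, n):
        """Return the one number in {1, ..., n} absent from the stream."""
        running_sum = 0
        for x in stream:                   # one pass over the stream
            running_sum += x
        return n * (n + 1) // 2 - running_sum   # full sum minus observed

    print(find_missing([5, 1, 4, 2], n=5))      # prints 3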

Book
01 Jan 2005
TL;DR: A book-length survey of data streaming, covering the data stream phenomenon, its formal aspects, basic mathematical and algorithmic foundations, streaming systems, and new directions.
Abstract: Contents: 1. Introduction; 2. Map; 3. The Data Stream Phenomenon; 4. Data Streaming: Formal Aspects; 5. Foundations: Basic Mathematical Ideas; 6. Foundations: Basic Algorithmic Techniques; 7. Foundations: Summary; 8. Streaming Systems; 9. New Directions; 10. Historic Notes; 11. Concluding Remarks; Acknowledgements; References.

1,506 citations

Journal ArticleDOI
Noga Alon, Yossi Matias, Mario Szegedy
TL;DR: In this paper, the authors considered the space complexity of randomized algorithms that approximate the frequency moments of a sequence, where the elements of the sequence are given one by one and cannot be stored.

1,456 citations

Proceedings ArticleDOI
01 Jul 1996
TL;DR: It turns out that the numbers F_0, F_1, and F_2 can be approximated in logarithmic space, whereas the approximation of F_k for k ≥ 6 requires n^{Ω(1)} space.
Abstract: The frequency moments of a sequence containing m_i elements of type i, for 1 ≤ i ≤ n, are the numbers F_k = Σ_{i=1}^{n} m_i^k. We consider the space complexity of randomized algorithms that approximate the numbers F_k, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, it turns out that the numbers F_0, F_1, and F_2 can be approximated in logarithmic space, whereas the approximation of F_k for k ≥ 6 requires n^{Ω(1)} space. Applications to data bases are mentioned as well.
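
As a concrete illustration of the logarithmic-space regime, the Python sketch below renders the well-known AMS idea for F_2: map each item type to a random ±1 sign, maintain the single counter Z = Σ_i sign(i)·m_i, and output Z², whose expectation is F_2. This is a simplified single-estimator sketch, not the paper's full construction, which averages medians of many independent copies to control variance and draws the signs from a 4-wise independent hash family so they need not be stored explicitly.

    import random

    def ams_f2_estimate(stream, seed=0):
        """One-pass unbiased estimate of F_2 = sum_i m_i^2 (single copy)."""
        rng = random.Random(seed)
        signs = {}            # stand-in for a 4-wise independent +/-1 hash
        z = 0                 # running counter Z = sum_i sign(i) * m_i
        for item in stream:
            if item not in signs:
                signs[item] = rng.choice((-1, 1))
            z += signs[item]
        return z * z          # E[Z^2] = F_2 by pairwise independence

    stream = [1, 2, 2, 3, 3, 3]             # m = (1, 2, 3), so F_2 = 14
    estimates = [ams_f2_estimate(stream, seed=s) for s in range(1000)]
    print(sum(estimates) / len(estimates))  # close to 14 on average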

1,279 citations

Journal ArticleDOI
17 Sep 1995
TL;DR: It is shown that if only the sender can transmit, the number of bits required is a conditional entropy of a naturally defined graph.
Abstract: A sender communicates with a receiver who wishes to reliably evaluate a function of their combined data. We show that if only the sender can transmit, the number of bits required is a conditional entropy of a naturally defined graph. We also determine the number of bits needed when the communicators exchange two messages. Connections are made to rate distortion results for evaluating a function of two random variables.
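
The "conditional entropy of a naturally defined graph" is the conditional graph entropy of the characteristic graph G of the function: vertices are the sender's symbols, with x and x' joined whenever they must be distinguished, i.e., f(x, y) ≠ f(x', y) for some jointly possible y. A standard formulation, in notation that is a common convention rather than quoted from this page:

    % Conditional graph entropy (following Orlitsky-Roche): Gamma(G) is
    % the set of independent sets of G; the minimum is over auxiliaries W
    % with X \in W \in Gamma(G) and W - X - Y a Markov chain.
    H_G(X \mid Y) = \min_{\substack{W - X - Y \\ X \in W \in \Gamma(G)}} I(W; X \mid Y)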

455 citations