Author

T. S. Jayram

Other affiliations: University of Michigan, Microsoft
Bio: T. S. Jayram is an academic researcher from IBM. The author has contributed to research in topics: Communication complexity & Upper and lower bounds. The author has an h-index of 35 and has co-authored 62 publications receiving 5,031 citations. Previous affiliations of T. S. Jayram include the University of Michigan and Microsoft.


Papers
Journal ArticleDOI
16 Nov 2002
TL;DR: This work presents a new method for proving strong lower bounds in communication complexity based on the notion of the conditional information complexity of a function, and shows that it also admits a direct sum theorem.
Abstract: We present a new method for proving strong lower bounds in communication complexity. This method is based on the notion of the conditional information complexity of a function which is the minimum amount of information about the inputs that has to be revealed by a communication protocol for the function. While conditional information complexity is a lower bound on the communication complexity, we show that it also admits a direct sum theorem. Direct sum decomposition reduces our task to that of proving (conditional) information complexity lower bounds for simple problems (such as the AND of two bits). For the latter, we develop novel techniques based on Hellinger distance and its generalizations.
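
In a hedged sketch of the framework (notation reconstructed from the abstract, not quoted from the paper): with inputs (X, Y) and an auxiliary random variable D, the conditional information cost of a protocol \Pi is I(X, Y : \Pi(X, Y) \mid D), and the conditional information complexity \mathrm{CIC}(f \mid D) is its minimum over protocols that correctly compute f. This quantity lower-bounds communication complexity, and the direct sum theorem takes the shape

    \mathrm{CIC}(f_1, \dots, f_n \mid D^n) \;\ge\; \sum_{i=1}^{n} \mathrm{CIC}(f_i \mid D),

so it suffices to bound single-coordinate primitives such as the AND of two bits, for which the Hellinger distance h(P, Q) = \sqrt{1 - \sum_{\omega} \sqrt{P(\omega)\,Q(\omega)}} and its generalizations supply the tool.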

724 citations

Journal ArticleDOI
TL;DR: A measure on graphs, the minrank, is identified, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes; for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDEX codes.
Abstract: Motivated by a problem of transmitting supplemental data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R1, ..., Rn. He holds an input x ∈ {0,1}^n and wishes to broadcast a single message so that each receiver Ri can recover the bit xi. Each Ri has prior side information about x, induced by a directed graph G on n nodes; Ri knows the bits of x in the positions {j | (i,j) is an edge of G}. G is known to the sender and to the receivers. We call encoding schemes that achieve this goal INDEX codes for {0,1}^n with side information graph G. In this paper we identify a measure on graphs, the minrank, which exactly characterizes the minimum length of linear and certain types of nonlinear INDEX codes. We show that for natural classes of side information graphs, including directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes, minrank is the optimal length of arbitrary INDEX codes. For arbitrary INDEX codes and arbitrary graphs, we obtain a lower bound in terms of the size of the maximum acyclic induced subgraph. This bound holds even for randomized codes, but has been shown not to be tight.
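
To make the minrank measure concrete, here is a minimal brute-force sketch (a hypothetical helper, exponential in the number of edges and only usable on tiny graphs): minrk_2(G) is the minimum GF(2) rank over matrices with ones on the diagonal and zeros at every off-diagonal position that is not an edge of G.

    from itertools import product

    def gf2_rank(rows):
        # Gaussian elimination over GF(2); each row is an int bitmask.
        rank = 0
        for i in range(len(rows)):
            pivot = rows[i]
            if pivot == 0:
                continue
            rank += 1
            low = pivot & -pivot  # lowest set bit serves as the pivot column
            for j in range(i + 1, len(rows)):
                if rows[j] & low:
                    rows[j] ^= pivot
        return rank

    def minrank_gf2(n, edges):
        # Minimum GF(2) rank over all matrices with ones on the diagonal
        # and zeros at off-diagonal positions that are not edges of G.
        free = [(i, j) for i in range(n) for j in range(n)
                if i != j and (i, j) in edges]
        best = n  # the identity matrix always fits the pattern
        for bits in product([0, 1], repeat=len(free)):
            rows = [1 << i for i in range(n)]  # diagonal ones
            for (i, j), b in zip(free, bits):
                if b:
                    rows[i] |= 1 << j
            best = min(best, gf2_rank(rows))
        return best

    # Example: the 5-cycle (an odd hole) with symmetric side information.
    edges = {(i, (i + 1) % 5) for i in range(5)} | {((i + 1) % 5, i) for i in range(5)}
    print(minrank_gf2(5, edges))  # 3: two bits saved over the naive 5-bit broadcast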

632 citations

Book ChapterDOI
13 Sep 2002
TL;DR: Three algorithms are presented that count the number of distinct elements in a data stream to within a factor of 1 ± ε, improving upon known algorithms and offering a spectrum of time/space tradeoffs.
Abstract: We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± ε. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.
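
The paper's three algorithms are not reproduced here; as an illustrative stand-in for small-space distinct counting, a k-minimum-values (KMV) estimator keeps only the k smallest hash values and reads the count off their spread (a hypothetical helper, assuming the hash behaves like a uniform 64-bit value):

    import hashlib
    import heapq

    def kmv_distinct(stream, k=256):
        # Track the k smallest hash values; if the k-th smallest, normalized
        # to (0, 1), is v, estimate the distinct count as (k - 1) / v.
        heap = []     # max-heap via negation, holding the k smallest hashes
        seen = set()  # hashes currently in the heap
        for item in stream:
            h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
            if h in seen:
                continue
            if len(heap) < k:
                heapq.heappush(heap, -h)
                seen.add(h)
            elif h < -heap[0]:
                seen.discard(-heapq.heappushpop(heap, -h))
                seen.add(h)
        if len(heap) < k:
            return len(heap)        # fewer than k distinct items: exact count
        v = -heap[0] / 2**64        # k-th smallest hash as a fraction
        return int((k - 1) / v)

    print(kmv_distinct(i % 10_000 for i in range(1_000_000)))  # close to 10,000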

561 citations

Proceedings ArticleDOI
21 Oct 2006
TL;DR: A measure on graphs, the minrank, is identified and conjectured to exactly characterize the minimum length of INDEX codes; the conjecture is resolved for certain natural classes of graphs, and the minrank bound is shown to be tight for linear codes and certain classes of non-linear codes.
Abstract: Motivated by a problem of transmitting data over broadcast channels (Birk and Kol, INFOCOM 1998), we study the following coding problem: a sender communicates with n receivers R_1, ..., R_n. He holds an input x \in {0, 1}^n and wishes to broadcast a single message so that each receiver R_i can recover the bit x_i. Each R_i has prior side information about x, induced by a directed graph G on n nodes; R_i knows the bits of x in the positions {j | (i, j) is an edge of G}. We call encoding schemes that achieve this goal INDEX codes for {0, 1}^n with side information graph G. In this paper we identify a measure on graphs, the minrank, which we conjecture to exactly characterize the minimum length of INDEX codes. We resolve the conjecture for certain natural classes of graphs. For arbitrary graphs, we show that the minrank bound is tight for both linear codes and certain classes of non-linear codes. For the general problem, we obtain a (weaker) lower bound: the length of an INDEX code for any graph G is at least the size of the maximum acyclic induced subgraph of G.
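
Read together with the journal version above, the results can be summarized (in hedged shorthand, with \mathrm{len}(G) denoting the minimum length of an INDEX code for side information graph G) as the sandwich

    \mathrm{MAIS}(G) \;\le\; \mathrm{len}(G) \;\le\; \mathrm{minrk}_2(G),

where \mathrm{MAIS}(G) is the size of the maximum acyclic induced subgraph. The upper bound is achieved by a linear code, and for directed acyclic graphs, perfect graphs, odd holes, and odd anti-holes the two ends coincide; the journal version adds that the lower bound holds even for randomized codes but is not tight in general.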

383 citations

Journal ArticleDOI
TL;DR: An analysis of a closed-loop system using an integral control law with Lotus Notes as the target, using root-locus analysis from control theory, is able to predict the occurrence (or absence) of controller-induced oscillations in the system's response.
Abstract: A widely used approach to achieving service level objectives for a software system (e.g., an email server) is to add a controller that manipulates the target system's tuning parameters. We describe a methodology for designing such controllers for software systems that builds on classical control theory. The classical approach proceeds in two steps: system identification and controller design. In system identification, we construct mathematical models of the target system. Traditionally, this has been based on a first-principles approach, using detailed knowledge of the target system. Such models can be complex and difficult to build, validate, use, and maintain. In our methodology, a statistical (ARMA) model is fit to historical measurements of the target being controlled. These models are easier to obtain and use, and allow us to apply control-theoretic design techniques to a larger class of systems. When applied to a Lotus Notes groupware server, we obtain model fits with R^2 no lower than 75% and as high as 98%. In controller design, an analysis of the models leads to a controller that will achieve the service level objectives. We report on an analysis of a closed-loop system using an integral control law with Lotus Notes as the target. The objective is to maintain a reference queue length. Using root-locus analysis from control theory, we are able to predict the occurrence (or absence) of controller-induced oscillations in the system's response. Such oscillations are undesirable since they increase variability, thereby resulting in a failure to meet the service level objective. We implement this controller for a real Lotus Notes system, and observe a remarkable correspondence between the behavior of the real system and the predictions of the analysis. This indicates that the control-theoretic analysis is sufficient to select controller parameters that meet the desired goals, and the need for simulations is reduced.
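
A toy version of that root-locus prediction can be sketched as follows (a hypothetical first-order plant with made-up coefficients a, b and gains ki, not the paper's fitted Lotus Notes model): the closed loop of y(k+1) = a·y(k) + b·u(k) under the integral law u(k+1) = u(k) + ki·(r − y(k)) has characteristic polynomial z^2 − (1 + a)z + (a + b·ki), and complex roots signal controller-induced oscillation.

    import numpy as np

    def closed_loop_poles(a, b, ki):
        # Poles of: y(k+1) = a*y(k) + b*u(k);  u(k+1) = u(k) + ki*(r - y(k)).
        # Characteristic polynomial: z^2 - (1 + a) z + (a + b*ki).
        return np.roots([1.0, -(1.0 + a), a + b * ki])

    def simulate(a, b, ki, r=10.0, steps=60):
        y, u, trace = 0.0, 0.0, []
        for _ in range(steps):
            y, u = a * y + b * u, u + ki * (r - y)  # RHS uses the old y and u
            trace.append(y)
        return trace

    a, b = 0.8, 0.5           # hypothetical identified plant, not the paper's fit
    for ki in (0.01, 0.3):    # low gain: smooth response; higher gain: oscillation
        poles = closed_loop_poles(a, b, ki)
        kind = "oscillatory" if np.iscomplex(poles).any() else "smooth"
        print(f"ki={ki}: {kind}, poles={poles}")

With these numbers the discriminant (1 + a)^2 − 4(a + b·ki) changes sign at ki = (1 − a)^2 / (4b) = 0.02, so ki = 0.01 converges smoothly while ki = 0.3 rings before settling, which is the kind of threshold the paper reads off the root locus.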

270 citations


Cited by
Irina Rish
01 Jan 2001
TL;DR: This work analyzes the impact of the distribution entropy on the classification error, showing that low-entropy feature distributions yield good performance of naive Bayes, and demonstrates that naive Bayes works well for certain nearly functional feature dependencies.
Abstract: The naive Bayes classifier greatly simplifies learning by assuming that features are independent given the class. Although independence is generally a poor assumption, in practice naive Bayes often competes well with more sophisticated classifiers. Our broad goal is to understand the data characteristics which affect the performance of naive Bayes. Our approach uses Monte Carlo simulations that allow a systematic study of classification accuracy for several classes of randomly generated problems. We analyze the impact of the distribution entropy on the classification error, showing that low-entropy feature distributions yield good performance of naive Bayes. We also demonstrate that naive Bayes works well for certain nearly functional feature dependencies, thus reaching its best performance in two opposite cases: completely independent features (as expected) and functionally dependent features (which is surprising). Another surprising result is that the accuracy of naive Bayes is not directly correlated with the degree of feature dependence measured as the class-conditional mutual information between the features. Instead, a better predictor of naive Bayes accuracy is the amount of information about the class that is lost because of the independence assumption.
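
To make the surprising functional-dependence case concrete, here is a minimal Bernoulli naive Bayes sketch (an illustration, not Rish's Monte Carlo setup): one informative feature duplicated five times violates independence as strongly as possible, yet the decision rule, and hence the accuracy, is unchanged.

    import numpy as np

    rng = np.random.default_rng(0)

    def fit_bernoulli_nb(X, y):
        # Per-class feature probabilities with Laplace smoothing, plus priors.
        params = {}
        for c in (0, 1):
            Xc = X[y == c]
            params[c] = ((Xc.sum(axis=0) + 1) / (len(Xc) + 2), len(Xc) / len(X))
        return params

    def predict(params, X):
        scores = []
        for c in (0, 1):
            p, prior = params[c]
            scores.append(np.log(prior) + X @ np.log(p) + (1 - X) @ np.log(1 - p))
        return (scores[1] > scores[0]).astype(int)

    # One informative feature, duplicated 5 times: fully dependent features.
    n = 4000
    y = rng.integers(0, 2, n)
    base = (rng.random(n) < np.where(y == 1, 0.8, 0.2)).astype(int)
    X = np.tile(base[:, None], (1, 5))  # functional dependency among features
    params = fit_bernoulli_nb(X[:2000], y[:2000])
    acc = (predict(params, X[2000:]) == y[2000:]).mean()
    print(f"accuracy with fully dependent features: {acc:.2f}")  # about 0.80

Duplicating the feature multiplies the log-odds by five but never flips their sign, so the classifier's decisions match those of the single informative feature, illustrating why dependence per se need not hurt naive Bayes accuracy.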

2,046 citations

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a sublinear-space data structure called the count-min sketch for summarizing data streams. It allows fundamental queries such as point, range, and inner product queries to be answered approximately and very quickly, and it can be applied to several important data stream problems such as finding quantiles and frequent items.
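
A compact sketch of the data structure (the standard construction with width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉; an illustration rather than the authors' code, with Python's salted built-in hash standing in for pairwise-independent hash functions):

    import math
    import random

    class CountMinSketch:
        # estimate(x) >= true count of x, and estimate(x) <= true + eps * total
        # with probability at least 1 - delta (standard analysis).
        def __init__(self, eps=0.001, delta=0.01, seed=0):
            self.w = math.ceil(math.e / eps)           # width: ceil(e / eps)
            self.d = math.ceil(math.log(1.0 / delta))  # depth: ceil(ln(1/delta))
            self.table = [[0] * self.w for _ in range(self.d)]
            rnd = random.Random(seed)
            self.salts = [rnd.getrandbits(64) for _ in range(self.d)]

        def update(self, x, count=1):
            for row, salt in enumerate(self.salts):
                self.table[row][hash((salt, x)) % self.w] += count

        def estimate(self, x):
            # Collisions only inflate counters, so the minimum row is tightest.
            return min(self.table[row][hash((salt, x)) % self.w]
                       for row, salt in enumerate(self.salts))

    cms = CountMinSketch(eps=0.01, delta=0.01)
    for word in ["a", "b", "a", "c", "a"]:
        cms.update(word)
    print(cms.estimate("a"))  # 3; overestimates possible, never underestimates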

1,939 citations

Journal ArticleDOI
TL;DR: Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P value significance test, enabling the efficient clustering and search of massive sequence collections.
Abstract: Mash extends the MinHash dimensionality-reduction technique to include a pairwise mutation distance and P value significance test, enabling the efficient clustering and search of massive sequence collections. Mash reduces large sequences and sequence sets to small, representative sketches, from which global mutation distances can be rapidly estimated. We demonstrate several use cases, including the clustering of all 54,118 NCBI RefSeq genomes in 33 CPU h; real-time database search using assembled or unassembled Illumina, Pacific Biosciences, and Oxford Nanopore data; and the scalable clustering of hundreds of metagenomic samples by composition. Mash is freely released under a BSD license ( https://github.com/marbl/mash ).
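
The MinHash core that Mash builds on fits in a few lines (an illustrative variant with salted hashes; the k-mer length and sketch size below are hypothetical choices, not Mash's defaults):

    import hashlib

    def kmers(seq, k=16):
        # The k-mer set of a sequence.
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def minhash_signature(items, num_hashes=128):
        # The minimum under each salted hash approximates the minimum
        # of an independent random permutation of the item universe.
        sig = []
        for salt in range(num_hashes):
            sig.append(min(
                int.from_bytes(hashlib.sha1(f"{salt}:{it}".encode()).digest()[:8], "big")
                for it in items))
        return sig

    def jaccard_estimate(sig_a, sig_b):
        # P[min_a == min_b] equals the Jaccard similarity |A ∩ B| / |A ∪ B|.
        return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

Mash's contributions sit on top of this: it converts the Jaccard estimate into a pairwise mutation distance and attaches a P value significance test; those layers are specific to the paper and omitted from the sketch.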

1,886 citations

Journal ArticleDOI
TL;DR: This paper proposes a novel coded caching scheme that exploits both local and global caching gains, leading to a multiplicative improvement in the peak rate compared with previously known schemes, and argues that the performance of the proposed scheme is within a constant factor of the information-theoretic optimum for all values of the problem parameters.
Abstract: Caching is a technique to reduce peak traffic rates by prefetching popular content into memories at the end users. Conventionally, these memories are used to deliver requested content in part from a locally cached copy rather than through the network. The gain offered by this approach, which we term local caching gain, depends on the local cache size (i.e., the memory available at each individual user). In this paper, we introduce and exploit a second, global, caching gain not utilized by conventional caching schemes. This gain depends on the aggregate global cache size (i.e., the cumulative memory available at all users), even though there is no cooperation among the users. To evaluate and isolate these two gains, we introduce an information-theoretic formulation of the caching problem focusing on its basic structure. For this setting, we propose a novel coded caching scheme that exploits both local and global caching gains, leading to a multiplicative improvement in the peak rate compared with previously known schemes. In particular, the improvement can be on the order of the number of users in the network. In addition, we argue that the performance of the proposed scheme is within a constant factor of the information-theoretic optimum for all values of the problem parameters.
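
In hedged shorthand (symbols as commonly used for this setup: K users, N files, a cache of M files per user, rate measured in units of file size), the two gains appear as the two factors of the peak rate achieved by the scheme:

    R(M) \;=\; \underbrace{K\Bigl(1 - \frac{M}{N}\Bigr)}_{\text{local caching gain}} \cdot \underbrace{\frac{1}{1 + KM/N}}_{\text{global caching gain}}

The second factor scales with the aggregate cache size KM even without user cooperation, which is the multiplicative, up-to-order-K improvement over conventional uncoded caching that the abstract describes.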

1,857 citations

Journal ArticleDOI
TL;DR: This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data Fusion.
Abstract: The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation. This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely complete, concise, and consistent data, and highlights the challenges of data fusion, namely uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing whether and how data fusion is performed in each.
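
Although the article develops its techniques in relational algebra and SQL, the basic fuse step, one output record per real-world object with per-attribute conflict resolution, can be sketched as follows (hypothetical attribute names, data, and resolution strategies, for illustration only):

    from collections import Counter

    def fuse(records, strategies):
        # records: dicts describing the same real-world object.
        # strategies: attribute -> function resolving conflicting values.
        fused = {}
        for attr, resolve in strategies.items():
            values = [r[attr] for r in records if r.get(attr) is not None]
            fused[attr] = resolve(values) if values else None
        return fused

    sources = [  # made-up example records from three hypothetical sources
        {"name": "IBM Corp.", "employees": 282000, "founded": 1911},
        {"name": "IBM", "employees": 288000, "founded": 1911},
        {"name": "IBM", "employees": None, "founded": 1911},
    ]
    print(fuse(sources, {
        "name": lambda vs: Counter(vs).most_common(1)[0][0],  # majority vote
        "employees": lambda vs: max(vs),                      # prefer largest
        "founded": lambda vs: vs[0],                          # values agree
    }))

The choice of per-attribute strategy (voting, recency, source trust) is exactly where the uncertain and conflicting values the article highlights must be handled.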

1,797 citations