
Showing papers by "Chao Tian" published in 2021


Proceedings ArticleDOI
12 Jul 2021
TL;DR: In this paper, a new information-theoretic bound on generalization error based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou was proposed.
Abstract: We propose a new information-theoretic bound on generalization error based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou. In a previous work, Haghifam et al. proposed a different bound combining the two aforementioned techniques, which we refer to as the conditional individual mutual information (CIMI) bound. However, in a simple Gaussian setting, both the CMI and the CIMI bounds are order-wise worse than the bound of Bu et al. This observation motivated us to propose the new bound, which overcomes this issue by reducing the conditioning terms in the conditional mutual information. In the process of establishing this bound, a conditional decoupling lemma is established, which also leads to a meaningful dichotomy and comparison among these information-theoretic bounds.

20 citations


Posted Content
TL;DR: In this paper, the authors focus on three functions that take place in cyberspace: how to privately retrieve information, how to privately leverage large-scale distributed/parallel processing, and how to learn/train machine learning models from private data spread across multiple users.
Abstract: Most of our lives are conducted in the cyberspace. The human notion of privacy translates into a cyber notion of privacy on many functions that take place in the cyberspace. This article focuses on three such functions: how to privately retrieve information from cyberspace (privacy in information retrieval), how to privately leverage large-scale distributed/parallel processing (privacy in distributed computing), and how to learn/train machine learning models from private data spread across multiple users (privacy in distributed (federated) learning). The article motivates each privacy setting, describes the problem formulation, summarizes breakthrough results in the history of each problem, and gives recent results and discusses some of the major ideas that emerged in each field. In addition, the cross-cutting techniques and interconnections between the three topics are discussed along with a set of open problems and challenges.

12 citations


Proceedings ArticleDOI
12 Jul 2021
TL;DR: In this paper, the authors considered a two-level PIR system with heterogeneous privacy requirements for different messages and derived a lower bound to the capacity by proposing a novel coding scheme, namely the non-uniform successive cancellation scheme.
Abstract: In the conventional robust $T$-colluding private information retrieval (PIR) system, the user needs to retrieve one of the possible messages while keeping the identity of the requested message private from any $T$ colluding servers. Motivated by the possible heterogeneous privacy requirements for different messages, we consider the $(N, T_{1}: K_{1}, T_{2}: K_{2})$ two-level PIR system, where $K_{1}$ messages need to be retrieved privately against $T_{1}$ colluding servers, and all the messages need to be retrieved privately against $T_{2}$ colluding servers, where $T_{2}\leq T_{1}$. We obtain a lower bound to the capacity by proposing a novel coding scheme, namely the non-uniform successive cancellation scheme. A capacity upper bound is also derived. The gap between the upper bound and the lower bound is analyzed, and shown to vanish when $T_{1}=T_{2}$.

7 citations
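
As a reference point for the homogeneous case mentioned in the abstract above, the following minimal sketch evaluates the classical capacity of $T$-colluding PIR with $N$ servers and $K$ messages, $C = \left(1 + T/N + \dots + (T/N)^{K-1}\right)^{-1}$. This formula is recalled from the earlier PIR literature (Sun and Jafar) and is not stated in this listing; it is not the two-level bound of the paper, and the function name and parameter names are illustrative assumptions.

```python
from fractions import Fraction


def t_colluding_pir_capacity(num_servers, num_messages, collusion):
    """Classical T-colluding PIR capacity (background formula, an assumption
    brought in for illustration; not the two-level result of the paper)."""
    if num_messages < 1:
        raise ValueError("need at least one message")
    if not 1 <= collusion <= num_servers:
        raise ValueError("collusion size must satisfy 1 <= T <= N")
    ratio = Fraction(collusion, num_servers)
    # Capacity is the reciprocal of the geometric sum 1 + r + ... + r^(K-1).
    return 1 / sum(ratio ** k for k in range(num_messages))


if __name__ == "__main__":
    # Two servers, two messages, no collusion beyond single servers.
    print(t_colluding_pir_capacity(2, 2, 1))
```

Exact rational arithmetic is used so that capacities for different parameter choices can be compared without floating-point noise.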


Journal ArticleDOI
01 Mar 2021
TL;DR: In this article, the tradeoff between the storage cost and the retrieval cost is considered in a PIR system, where the user needs to retrieve one of the possible messages from a set of storage servers, but wishes to keep the identity of the requested message private from any given server.
Abstract: In a private information retrieval (PIR) system, the user needs to retrieve one of the possible messages from a set of storage servers, but wishes to keep the identity of the requested message private from any given server. Existing efforts in this area have made it clear that the efficiency of the retrieval will be impacted significantly by the amount of the storage space allowed at the servers. In this work, we consider the tradeoff between the storage cost and the retrieval cost. We first present three fundamental results: 1) a regime-wise approximate characterization of the optimal tradeoff with a factor of two, 2) a cyclic permutation lemma that can produce more sophisticated codes from simpler base codes, and 3) a relaxed entropic linear program (LP) lower bound that has a polynomial complexity. Equipped with the cyclic permutation lemma, we then propose two novel code constructions, and by applying the lemma, obtain new storage-retrieval points. Furthermore, we derive more explicit lower bounds by utilizing only a subset of the constraints in the relaxed entropic LP in a systematic manner. Though the new upper bound and lower bound do not lead to a better approximation factor uniformly, they are significantly tighter than the existing art in some regimes.

7 citations


Journal ArticleDOI
TL;DR: In this paper, the fundamental limits of coded caching with certain demand type restrictions are investigated, and a novel coding scheme is proposed, which can provide new operating points that are not covered by any previously known schemes.
Abstract: Caching is a technique to reduce the communication load in peak hours by prefetching contents during off-peak hours. An information theoretic framework for coded caching was introduced by Maddah-Ali and Niesen in a recent work, where it was shown that significant improvement can be obtained compared to uncoded caching. Considerable effort has been devoted to identifying the precise information theoretic fundamental limits of coded caching systems; however, the difficulty of this task has also become clear. One of the reasons for this difficulty is that the original coded caching setting allows all possible multiple demand types during delivery, which in fact introduces tension in the coding strategy. In this paper, we seek to develop a better understanding of the fundamental limits of coded caching by investigating systems with certain demand type restrictions. We first consider the canonical three-user three-file system, and show that, contrary to popular belief, the worst demand type is not the one in which all three files are requested. Motivated by these findings, we focus on coded caching systems where every file must be requested by at least one user. A novel coding scheme is proposed, which can provide new operating points that are not covered by any previously known schemes.

6 citations
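
For orientation, the baseline that the abstract above refers to can be tabulated. The sketch below lists the corner points of the original Maddah-Ali and Niesen scheme, which for $N \ge K$ achieves delivery rate $R = (K - t)/(t + 1)$ at cache size $M = tN/K$ for $t = 0, \dots, K$. The formula is recalled from that earlier work rather than from this listing, it does not reproduce the new scheme of the paper, and the function name is an illustrative assumption.

```python
from fractions import Fraction


def mn_corner_points(num_files, num_users):
    """(memory, rate) corner points of the Maddah-Ali and Niesen baseline.

    Assumes num_files >= num_users, where the rate simplifies to
    (K - t) / (t + 1) at memory M = t * N / K. Points in between are
    reachable by memory sharing (linear interpolation).
    """
    if num_files < num_users:
        raise ValueError("this simplified formula assumes N >= K")
    points = []
    for t in range(num_users + 1):
        memory = Fraction(t * num_files, num_users)
        rate = Fraction(num_users - t, t + 1)
        points.append((memory, rate))
    return points


if __name__ == "__main__":
    # The canonical three-user, three-file system from the abstract.
    for memory, rate in mn_corner_points(3, 3):
        print(f"M = {memory}, R = {rate}")
```

Any new operating point claimed for a restricted demand setting would be compared against the lower convex envelope of these baseline points.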


Posted Content
Tao Liu, Ruida Zhou, Dileep Kalathil, Panganamala Kumar, Chao Tian
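TL;DR: In this paper, it is shown that when a strictly safe policy is known, zero constraint violation can be achieved with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$ in $K$ episodes; when no strictly safe policy is known, a primal-dual algorithm with an optimistic primal estimate and a pessimistic dual update restricts the system to bounded constraint violation.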
Abstract: We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process. Existing results have shown that it is possible to achieve a reward regret of $\tilde{\mathcal{O}}(\sqrt{K})$ while allowing an $\tilde{\mathcal{O}}(\sqrt{K})$ constraint violation in $K$ episodes. A critical question that arises is whether it is possible to keep the constraint violation even smaller. We show that when a strictly safe policy is known, then one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$. The algorithm which does so employs the principle of optimistic pessimism in the face of uncertainty to achieve safe exploration. When no strictly safe policy is known, though one is known to exist, then it is possible to restrict the system to bounded constraint violation with arbitrarily high probability. This is shown to be realized by a primal-dual algorithm with an optimistic primal estimate and a pessimistic dual update.

3 citations


Posted Content
TL;DR: In this article, the authors considered a two-level PIR system with heterogeneous privacy requirements for different messages and derived a lower bound to the capacity by proposing two novel coding schemes, namely the non-uniform successive cancellation scheme and the non-uniform block cancellation scheme.
Abstract: In the conventional robust $T$-colluding private information retrieval (PIR) system, the user needs to retrieve one of the possible messages while keeping the identity of the requested message private from any $T$ colluding servers. Motivated by the possible heterogeneous privacy requirements for different messages, we consider the $(N, T_1:K_1, T_2:K_2)$ two-level PIR system, where $K_1$ messages need to be retrieved privately against $T_1$ colluding servers, and all the messages need to be retrieved privately against $T_2$ colluding servers where $T_2\leq T_1$. We obtain a lower bound to the capacity by proposing two novel coding schemes, namely the non-uniform successive cancellation scheme and the non-uniform block cancellation scheme. A capacity upper bound is also derived. The gap between the upper bound and the lower bounds is analyzed, and shown to vanish when $T_1=T_2$. Lastly, we show that the upper bound is in general not tight by providing a stronger bound for a special setting.

2 citations


Journal ArticleDOI
TL;DR: In this paper, the disjoint-set data structure is used to identify the reduction mapping, instead of relying on exhaustive enumeration in the equivalence classification, and four techniques to investigate the fundamental limits of information systems are proposed.
Abstract: Computer-aided methods, based on the entropic linear program framework, have been shown to be effective in assisting the study of information theoretic fundamental limits of information systems. One key element that significantly impacts their computation efficiency and applicability is the reduction of variables, based on problem-specific symmetry and dependence relations. In this work, we propose using the disjoint-set data structure to algorithmically identify the reduction mapping, instead of relying on exhaustive enumeration in the equivalence classification. Based on this reduced linear program, we consider four techniques to investigate the fundamental limits of information systems: (1) computing an outer bound for a given linear combination of information measures and providing the values of information measures at the optimal solution; (2) efficiently computing a polytope tradeoff outer bound between two information quantities; (3) producing a proof (as a weighted sum of known information inequalities) for a computed outer bound; and (4) providing the range for information quantities between which the optimal value does not change, i.e., sensitivity analysis. A toolbox, with an efficient JSON format input frontend, and either Gurobi or Cplex as the linear program solving engine, is implemented and open-sourced.

1 citation
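
The variable reduction described in the abstract above can be illustrated with a small sketch. It assumes that the LP variables are joint entropies indexed by subsets of random variables and that symmetries are given as permutations of variable names; subsets that map to each other are merged with a disjoint-set (union-find) structure. All names here (`DisjointSet`, `reduction_mapping`, the variable labels) are hypothetical and do not reflect the interface of the open-source toolbox.

```python
from itertools import combinations


class DisjointSet:
    """Union-find over hashable items with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def reduction_mapping(variables, permutations):
    """Map each nonempty subset of variables to a class representative.

    Two subsets share a representative when some composition of the given
    permutations carries one to the other, so their joint entropies can be
    treated as a single LP variable.
    """
    subsets = [
        frozenset(combo)
        for size in range(1, len(variables) + 1)
        for combo in combinations(variables, size)
    ]
    classes = DisjointSet()
    for subset in subsets:
        classes.find(subset)  # register the subset even if nothing merges
        for perm in permutations:
            classes.union(subset, frozenset(perm[v] for v in subset))
    return {subset: classes.find(subset) for subset in subsets}


if __name__ == "__main__":
    cyclic_shift = {"X1": "X2", "X2": "X3", "X3": "X1"}
    mapping = reduction_mapping(["X1", "X2", "X3"], [cyclic_shift])
    print(len(set(mapping.values())), "classes from", len(mapping), "subsets")
```

Only the generators of the symmetry group need to be supplied, because the union operations close the classes under composition; this is what avoids enumerating every group element.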


Posted Content
TL;DR: In this article, safe reinforcement learning is posed in a discounted infinite-horizon constrained Markov decision process framework, and a natural policy gradient-based algorithm is presented with a faster convergence rate of $\mathcal{O}(\log(T)/T)$ for both the optimality gap and the constraint violation.
Abstract: We address the issue of safety in reinforcement learning. We pose the problem in a discounted infinite-horizon constrained Markov decision process framework. Existing results have shown that gradient-based methods are able to achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate both for the optimality gap and the constraint violation. We exhibit a natural policy gradient-based algorithm that has a faster convergence rate $\mathcal{O}(\log(T)/T)$ for both the optimality gap and the constraint violation. When Slater's condition is satisfied and known a priori, zero constraint violation can be further guaranteed for a sufficiently large $T$ while maintaining the same convergence rate.