Author

Sharu Theresa Jose

Bio: Sharu Theresa Jose is an academic researcher from King's College London. The author has contributed to research in the topics of Generalization and Computer science, has an h-index of 6, and has co-authored 31 publications receiving 111 citations. Previous affiliations of Sharu Theresa Jose include the Indian Institute of Technology Bombay and the Indian Institutes of Technology.

Papers
Journal ArticleDOI
TL;DR: Novel information-theoretic upper bounds on the meta-generalization gap are presented, together with tighter individual task MI (ITMI) bounds, for two broad classes of meta-learning algorithms that use either separate within-task training and test sets, like model agnostic meta-learning (MAML), or joint within-task training and test sets, like Reptile.
Abstract: Meta-learning, or "learning to learn", refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like MAML, or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed, under given technical conditions, for the two classes via novel individual task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.
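For orientation, the conventional-learning bound that this line of work extends (for a $\sigma$-subgaussian loss, hypothesis $W$, and training set $S$ of $n$ i.i.d. samples, with $L_\mu$ the population loss and $L_S$ the empirical loss) takes the form below; the paper's meta-learning bounds replace $W$ and $S$ with the output of the meta-learner and the meta-training data of multiple tasks. This is a sketch of the known single-task result, not the paper's own statement:
$$\Big|\mathbb{E}\big[L_\mu(W) - L_S(W)\big]\Big| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)}.$$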

24 citations

Journal ArticleDOI
19 Jan 2021-Entropy
TL;DR: In this paper, information-theoretic upper bounds on the meta-generalization gap are derived for two broad classes of meta-learning algorithms, which use either separate within-task training and test sets, like model agnostic meta-learning (MAML), or joint within-task training and test sets, like Reptile.
Abstract: Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like model agnostic meta-learning (MAML), or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed for the two classes via novel individual task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.
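As a concrete illustration of the second class (joint within-task training and test sets) and of the noisy iterative algorithms to which the bounds are applied, here is a minimal Reptile-style sketch with Gaussian perturbations in the inner loop; the toy regression task, step sizes, and noise level are illustrative assumptions, not the algorithms analyzed in the paper.

```python
import numpy as np

# Minimal sketch of a noisy iterative meta-learner in the spirit of Reptile.
# The task distribution, step sizes, and noise level are illustrative assumptions.
rng = np.random.default_rng(0)

def sample_task(n_samples=20):
    """Toy 1-D regression task: y = w_true * x + observation noise."""
    w_true = rng.normal(loc=1.0, scale=0.5)
    x = rng.normal(size=n_samples)
    y = w_true * x + 0.1 * rng.normal(size=n_samples)
    return x, y

def inner_update(w0, x, y, steps=5, lr=0.1, noise_std=0.01):
    """Per-task adaptation: noisy gradient descent on the squared loss."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * np.mean((w * x - y) * x)   # d/dw of the mean squared error
        w = w - lr * grad + noise_std * rng.normal()
    return w

meta_w, meta_lr = 0.0, 0.5
for _ in range(200):
    x, y = sample_task()
    adapted_w = inner_update(meta_w, x, y)
    meta_w = meta_w + meta_lr * (adapted_w - meta_w)  # Reptile-style meta-step

print(f"meta-parameter after meta-training: {meta_w:.3f}")  # drifts toward the task mean ~1.0
```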

17 citations

Journal ArticleDOI
TL;DR: A linear programming (LP)-based framework is presented for obtaining converses for finite blocklength lossy joint source-channel coding problems, which applies for any loss criterion, generalizes certain previously known converses, and also extends to multi-terminal settings.
Abstract: A linear programming (LP)-based framework is presented for obtaining converses for finite blocklength lossy joint source-channel coding problems. The framework applies for any loss criterion, generalizes certain previously known converses, and also extends to multi-terminal settings. The finite blocklength problem is posed equivalently as a nonconvex optimization problem and using a lift-and-project-like method, a close but tractable LP relaxation of this problem is derived. Lower bounds on the original problem are obtained by the construction of feasible points for the dual of the LP relaxation. A particular application of this approach leads to new converses, which recover and improve on the converses of Kostina and Verdu for finite blocklength lossy joint source-channel coding and lossy source coding. For finite blocklength channel coding, the LP relaxation recovers the converse of Polyanskiy, Poor and Verdu and leads to a new improvement on the converse of Wolfowitz, showing thereby that our LP relaxation is asymptotically tight with increasing blocklengths for channel coding, lossless source coding, and joint source-channel coding with the excess distortion probability as the loss criterion. Using a duality-based argument, a new converse is derived for finite blocklength joint source-channel coding for a class of source-channel pairs. Employing this converse, the LP relaxation is also shown to be tight for all blocklengths for the minimization of the expected average symbolwise Hamming distortion of a $q$ -ary uniform source over a $q$ -ary symmetric memoryless channel for any $q\in {\mathbb {N}}$ . The optimization formulation and the lift-and-project method are extended to networked settings and demonstrated by obtaining an improvement on a converse of Zhou et al. for the successive refinement problem for successively refinable source-distortion measure triplets.
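Schematically, the converse mechanism rests on weak LP duality: if the nonconvex finite blocklength problem is a minimization with optimal value $\mathrm{OPT}$, and its LP relaxation has optimal value $\mathrm{LP}$ with dual objective $g$, then every explicitly constructed dual-feasible point certifies a converse (a sketch of the general principle described above, not the paper's specific LP):
$$\mathrm{OPT} \;\ge\; \mathrm{LP} \;\ge\; g(\lambda) \qquad \text{for every dual-feasible } \lambda.$$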

14 citations

Posted Content
TL;DR: In this article, a linear programming (LP)-based framework is presented for obtaining converses for finite blocklength lossy joint source-channel coding problems, which applies for any loss criterion, generalizes certain previously known converses and also extends to multi-terminal settings.
Abstract: A linear programming (LP)-based framework is presented for obtaining converses for finite blocklength lossy joint source-channel coding problems. The framework applies for any loss criterion, generalizes certain previously known converses, and also extends to multi-terminal settings. The finite blocklength problem is posed equivalently as a nonconvex optimization problem and using a lift-and-project-like method, a close but tractable LP relaxation of this problem is derived. Lower bounds on the original problem are obtained by the construction of feasible points for the dual of the LP relaxation. A particular application of this approach leads to new converses which recover and improve on the converses of Kostina and Verdu for finite blocklength lossy joint source-channel coding and lossy source coding. For finite blocklength channel coding, the LP relaxation recovers the converse of Polyanskiy, Poor and Verdu and leads to a new improvement on the converse of Wolfowitz, showing thereby that our LP relaxation is asymptotically tight with increasing blocklengths for channel coding, lossless source coding and joint source-channel coding with the excess distortion probability as the loss criterion. Using a duality-based argument, a new converse is derived for finite blocklength joint source-channel coding for a class of source-channel pairs. Employing this converse, the LP relaxation is also shown to be tight for all blocklengths for the minimization of the expected average symbol-wise Hamming distortion of a $q$-ary uniform source over a $q$-ary symmetric memoryless channel for any $q \in \mathbb{N}$. The optimization formulation and the lift-and-project method are extended to networked settings and demonstrated by obtaining an improvement on a converse of Zhou et al. for the successive refinement problem for successively refinable source-distortion measure triplets.
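A toy numerical illustration of the lift-and-project idea (not the paper's construction): a bilinear minimization is relaxed by introducing a lifted variable z standing in for the product xy, keeping only linear (McCormick) inequalities, and the resulting LP value lower-bounds the nonconvex optimum. The objective, constraints, and numbers are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Nonconvex toy problem: minimize x*y subject to x + y = 1.5, 0 <= x, y <= 1.
# Lift-and-project-style LP relaxation: introduce z ~ x*y and keep only the
# linear (McCormick) inequalities z >= 0, z >= x + y - 1, z <= x, z <= y.
c = [0.0, 0.0, 1.0]                 # minimize z; variables ordered (x, y, z)
A_ub = [[1.0, 1.0, -1.0],           # x + y - z <= 1   (i.e. z >= x + y - 1)
        [-1.0, 0.0, 1.0],           # z - x <= 0       (i.e. z <= x)
        [0.0, -1.0, 1.0]]           # z - y <= 0       (i.e. z <= y)
b_ub = [1.0, 0.0, 0.0]
A_eq = [[1.0, 1.0, 0.0]]            # x + y = 1.5
b_eq = [1.5]
bounds = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
lp = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
             bounds=bounds, method="highs")

# Brute-force the original nonconvex problem on a fine grid for comparison.
xs = np.linspace(0.5, 1.0, 10001)   # x + y = 1.5 and y <= 1 force x in [0.5, 1]
nonconvex_opt = np.min(xs * (1.5 - xs))

print(f"LP relaxation value : {lp.fun:.4f}")         # a valid lower bound
print(f"nonconvex optimum   : {nonconvex_opt:.4f}")  # happens to be tight in this toy case
```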

13 citations

Journal ArticleDOI
TL;DR: In this paper, a finite blocklength converse for the Slepian-Wolf problem is presented, which significantly improves on the best-known converse due to Miyake and Kanaya.
Abstract: A new finite blocklength converse for the Slepian–Wolf coding problem, which significantly improves on the best-known converse due to Miyake and Kanaya, is presented. To obtain this converse, an extension of the linear programming (LP)-based framework for finite blocklength point-to-point coding problems is employed. However, a direct application of this framework demands a complicated analysis for the Slepian–Wolf problem. An analytically simpler approach is presented, wherein LP-based finite blocklength converses for this problem are synthesized from point-to-point lossless source coding problems with perfect side-information at the decoder. New finite blocklength converses for these point-to-point problems are derived by employing the LP-based framework, and the new converse for Slepian–Wolf coding is obtained by an appropriate combination of these converses.
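For orientation, the asymptotic rate region that such finite blocklength converses for Slepian–Wolf coding sharpen is the classical one: for correlated sources $(X, Y)$ compressed separately at rates $R_1$ and $R_2$ with joint decoding,
$$R_1 \ge H(X \mid Y), \qquad R_2 \ge H(Y \mid X), \qquad R_1 + R_2 \ge H(X, Y).$$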

13 citations


Cited by
Journal ArticleDOI
TL;DR: A general and simple recipe for proving strong converse that is applicable for distributed problems as well and substitutes the hard Markov constraints implied by the distributed nature of the problem with a soft information cost using a variational formula introduced by Oohama.
Abstract: The strong converse for a coding theorem shows that the optimal asymptotic rate possible with vanishing error cannot be improved by allowing a fixed error. Building on a method introduced by Gu and Effros for centralized coding problems, we develop a general and simple recipe for proving strong converse that is applicable for distributed problems as well. Heuristically, our proof of strong converse mimics the standard steps for proving a weak converse, except that we apply those steps to a modified distribution obtained by conditioning the original distribution on the event that no error occurs. A key component of our recipe is the replacement of the hard Markov constraints implied by the distributed nature of the problem with a soft information cost using a variational formula introduced by Oohama. We illustrate our method by providing a short proof of the strong converse for the Wyner-Ziv problem and strong converse theorems for interactive function computation, common randomness and secret key agreement, and the wiretap channel; the latter three strong converse problems were open prior to this work.
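The quantitative fact behind the conditioning step is standard and worth recording: if $\mathcal{A}$ is the no-error event with $P(\mathcal{A}) \ge 1 - \varepsilon$, then the conditioned distribution is close to the original in relative entropy,
$$D\big(P(\cdot \mid \mathcal{A}) \,\big\|\, P\big) \;=\; \log \frac{1}{P(\mathcal{A})} \;\le\; \log \frac{1}{1-\varepsilon},$$
so the cost of swapping in the conditioned distribution stays bounded for any fixed error $\varepsilon < 1$ and vanishes per symbol as the blocklength grows. (A schematic identity for orientation, not a step quoted from the paper.)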

26 citations

Posted Content
TL;DR: A general optimization-based framework for stochastic control problems with nonclassical information structures is presented and insights are obtained on the relation between the structure of cost functions and of convex relaxations for inverse optimal control.
Abstract: We present an optimization-based approach to stochastic control problems with nonclassical information structures. We cast these problems equivalently as optimization problems on joint distributions. The resulting problems are necessarily nonconvex. Our approach to solving them is through convex relaxation. We solve the instance solved by Bansal and Basar with a particular application of this approach that uses the data processing inequality for constructing the convex relaxation. Using certain f-divergences, we obtain a new, larger set of inverse optimal cost functions for such problems. Insights are obtained on the relation between the structure of cost functions and of convex relaxations for inverse optimal control.
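The relaxation step rests on the data processing inequality: any joint distribution realizable under a Markov chain $X \to Y \to Z$ imposed by the information structure must satisfy
$$I(X; Z) \;\le\; I(X; Y),$$
a necessary condition that can be imposed in place of the exact consistency constraints when optimizing over joint distributions. (A schematic statement for orientation; the paper additionally works with certain $f$-divergences.)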

25 citations

Journal ArticleDOI
TL;DR: Meta-learning, as discussed in this paper, is an effective technique for overcoming weak generalization to unknown tasks by employing prior knowledge to assist the learning of new tasks; the methods fall mainly into three types: metric-based, model-based, and optimization-based meta-learning.

20 citations

Proceedings ArticleDOI
12 Jul 2021
TL;DR: In this paper, a new information-theoretic bound on generalization error based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou was proposed.
Abstract: We propose a new information-theoretic bound on generalization error based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou. In a previous work, Haghifam et al. proposed a different bound combining the two aforementioned techniques, which we refer to as the conditional individual mutual information (CIMI) bound. However, in a simple Gaussian setting, both the CMI and the CIMI bounds are order-wise worse than that by Bu et al. This observation motivated us to propose the new bound, which overcomes this issue by reducing the conditioning terms in the conditional mutual information. In the process of establishing this bound, a conditional decoupling lemma is established, which also leads to a meaningful dichotomy and comparison among these information-theoretic bounds.
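For reference, the error-decomposition benchmark of Bu et al. invoked here is the individual-sample bound, which for a $\sigma$-subgaussian loss, hypothesis $W$, and training samples $Z_1, \dots, Z_n$ (with $L_\mu$ the population loss and $L_S$ the empirical loss) reads as below; the new bound of this paper instead modifies the conditioning inside the CMI terms. Stated for orientation, not quoted from the paper:
$$\Big|\mathbb{E}\big[L_\mu(W) - L_S(W)\big]\Big| \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^2\, I(W; Z_i)}.$$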

20 citations